synchronization mechanism
Recently Published Documents



Author(s):  
Antonio Fuentes-Alventosa ◽  
Juan Gómez-Luna ◽  
José Maria González-Linares ◽  
Nicolás Guil ◽  
R. Medina-Carnicer

Abstract: CAVLC (Context-Adaptive Variable Length Coding) is a high-performance entropy coding method for video and image compression. It is the most commonly used entropy coding method in the H.264 video standard. In recent years, several hardware accelerators for CAVLC have been designed. In contrast, high-performance software implementations of CAVLC (e.g., GPU-based) are scarce. A high-performance GPU-based implementation of CAVLC is desirable in several scenarios. On the one hand, it can serve as the entropy coding component of GPU-based H.264 encoders, which are a suitable solution when the GPU's built-in H.264 hardware encoders lack necessary functionality such as data encryption and information hiding. On the other hand, a GPU-based implementation of CAVLC can be reused in a wide variety of GPU-based compression systems that encode images and videos in formats other than H.264, such as medical images. This is not possible with hardware implementations of CAVLC, as they are non-separable components of hardware H.264 encoders. In this paper, we present CAVLCU, an efficient implementation of CAVLC on GPU, which is based on four key ideas. First, we use only one kernel, which avoids both the long-latency global memory accesses required to pass intermediate results between kernels and the costly launches and terminations of additional kernels. Second, we apply an efficient synchronization mechanism that lets thread-blocks processing adjacent frame regions (in the horizontal and vertical dimensions) share results in global memory. (To prevent confusion, this paper refers to a block of frame pixels simply as a block and to a GPU thread block as a thread-block.) Third, we fully exploit the available global memory bandwidth by using vectorized loads to move the quantized transform coefficients directly to registers. Fourth, we use register tiling to implement the zigzag sorting, thus obtaining high instruction-level parallelism. An exhaustive experimental evaluation showed that our approach is between 2.5× and 5.4× faster than the only state-of-the-art GPU-based implementation of CAVLC.
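The zigzag sorting named in the fourth idea can be illustrated with a small CPU reference. This is a minimal sketch of what the step computes for one 4x4 block in frame mode, not the register-tiled GPU kernel the abstract describes. The function names `zigzag_scan` and `cavlc_stats` are hypothetical, and the statistics shown (number of nonzero coefficients, trailing ±1 values capped at three, zeros before the last nonzero coefficient) are the quantities CAVLC derives from the scanned block.

```python
# Zigzag scan order for a 4x4 block in frame mode, as raster indices.
ZIGZAG_4X4 = (0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15)


def zigzag_scan(block):
    """Reorder 16 raster-order coefficients into zigzag order."""
    if len(block) != 16:
        raise ValueError("expected 16 coefficients")
    return [block[i] for i in ZIGZAG_4X4]


def cavlc_stats(scanned):
    """Return (total_coeffs, trailing_ones, total_zeros) for a scanned block."""
    nonzero_positions = [i for i, c in enumerate(scanned) if c != 0]
    total_coeffs = len(nonzero_positions)

    # Walk the nonzero coefficients from highest frequency downwards and
    # count consecutive +/-1 values, capped at three.
    trailing_ones = 0
    for i in reversed(nonzero_positions):
        if abs(scanned[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Zeros located before the last nonzero coefficient in scan order.
    if nonzero_positions:
        total_zeros = nonzero_positions[-1] + 1 - total_coeffs
    else:
        total_zeros = 0
    return total_coeffs, trailing_ones, total_zeros


# Hypothetical quantized block, written in raster order.
block = [0, 3, -1, 0,
         0, -1, 1, 0,
         1, 0, 0, 0,
         0, 0, 0, 0]
scanned = zigzag_scan(block)
stats = cavlc_stats(scanned)
```

For this block the scan begins 0, 3, 0, 1, -1, -1, 0, 1, and `stats` holds five nonzero coefficients, three trailing ones, and three zeros before the last nonzero coefficient.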


2021 ◽  
Vol 2108 (1) ◽  
pp. 012076
Author(s):  
Jinliang Dong ◽  
Xu Zhang ◽  
Haijiang Li ◽  
Wenzhi Song ◽  
Jinglin Guo

Abstract: For the security monitoring of pumped-storage power stations, a model synchronization mechanism for a cloud-edge cooperation framework is proposed. The method uses a belief function to describe the threshold and a ping-pong operation strategy to update the model alternately, which solves the problem of synchronizing and updating artificial intelligence models on edge equipment. The cloud side is based on the Baidu BML platform, the edge side uses customized servers, and the average model update cycle is about three months.
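A ping-pong update strategy of the kind mentioned above can be sketched with two model slots: one serves inference while the other receives the new model, and the roles switch once the new model is accepted. This is a minimal sketch under stated assumptions. The class `PingPongModelStore` and its methods are hypothetical, and the plain numeric `threshold` is only a stand-in for the belief-function threshold, whose form the abstract does not give.

```python
import threading


class PingPongModelStore:
    """Two model slots: one serves inference while the other receives updates."""

    def __init__(self, initial_model):
        self._slots = [initial_model, None]
        self._active = 0
        self._lock = threading.Lock()

    def active_model(self):
        """Return the model currently used for inference."""
        with self._lock:
            return self._slots[self._active]

    def stage(self, new_model):
        """Write a newly received model into the inactive slot."""
        with self._lock:
            self._slots[1 - self._active] = new_model

    def swap_if_accepted(self, score, threshold):
        """Switch slots only if a staged model exists and its score passes.

        The numeric comparison is an assumption made for illustration.
        """
        with self._lock:
            standby = 1 - self._active
            if self._slots[standby] is None or score < threshold:
                return False
            self._active = standby
            return True


store = PingPongModelStore("model-v1")
store.stage("model-v2")
rejected = store.swap_if_accepted(score=0.60, threshold=0.80)
still_serving = store.active_model()
accepted = store.swap_if_accepted(score=0.90, threshold=0.80)
now_serving = store.active_model()
```

Because the staged model never overwrites the slot in use, inference keeps running on the old model until the swap, and the old model stays in the other slot as a fallback.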


Author(s):  
Ronald Jackson ◽  
Shamsul Aizam Zulkifli ◽  
Muhamed Benbouzid ◽  
Suriana Salimin ◽  
Mubashir Hayat Khan ◽  
...  

Author(s):  
Wenjie Huang ◽  
Antonio Chella ◽  
Angelo Cangelosi

Many theories of machine consciousness have been developed and many artificial systems implemented, but none has yet achieved it. As a possible approach, we are interested in implementing a system that integrates different theories. To this end, this paper proposes a model based on the global workspace theory and an attention mechanism, providing a fundamental framework for our future work. To examine this model, two experiments are conducted. The first demonstrates the agent's ability to shift attention over multiple stimuli, which accounts for the dynamics of conscious content. The second simulates the attentional blink and lag-1 sparing, two well-studied effects in the psychology and neuroscience of attention and consciousness, and aims to justify the agent's compatibility with human brains. In summary, the main contributions of this paper are (1) adapting the global workspace framework with separated workspace nodes, which reduces unnecessary computation while retaining the potential for global availability; (2) embedding an attention mechanism into the global workspace framework as the competition mechanism for access to consciousness; and (3) proposing a synchronization mechanism in the global workspace that supports the lag-1 sparing effect while retaining the attentional blink effect.


2021 ◽  
Author(s):  
Ayleen Schinko ◽  
Walter Vogler ◽  
Johannes Gareis ◽  
N. Tri Nguyen ◽  
Gerald Lüttgen

Abstract: Interface theories based on Interface Automata (IA) are formalisms for the component-based specification of concurrent systems. Extensions of their basic synchronization mechanism permit the modelling of data, but are studied in more complex settings involving modal transition systems or do not abstract from internal computation. In this article, we show how de Alfaro and Henzinger’s original IA theory can be conservatively extended by shared memory data, without sacrificing simplicity or imposing restrictions. Our extension IA for shared Memory (IAM) decorates transitions with pre- and post-conditions over algebraic expressions on shared variables, which are taken into account by IA’s notion of component compatibility. Simplicity is preserved as IAM can be embedded into IA and, thus, accurately lifts IA’s compatibility concept to shared memory. We also provide a ground semantics for IAM that demonstrates that our abstract handling of data within IA’s open systems view is faithful to the standard treatment of data in closed systems.


2021 ◽  
Author(s):  
Adrián Castelló ◽  
Enrique S. Quintana-Ortí ◽  
José Duato

Abstract: TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool for training deep neural networks on clusters of computers. HVD in turn uses a blocking Allreduce primitive to share information among processes, combined with a communication thread to overlap communication with computation. In this work, we perform a thorough experimental analysis to expose (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and (2) the performance acceleration that can be attained when replacing a blocking Allreduce with its non-blocking counterpart (while maintaining the blocking behaviour via the appropriate synchronization mechanism). Furthermore, (3) we explore the benefits of applying pipelining to the communication exchange, demonstrating that these improvements carry over to distributed training via TF+HVD. Finally, (4) we show that pipelining can also boost performance for applications that make heavy use of other collectives, such as Broadcast and Reduce-Scatter.
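Point (2) above, replacing a blocking reduction with a non-blocking one and then restoring blocking behaviour by waiting on it, can be illustrated without MPI. The following is a minimal single-process sketch, assuming a thread pool as a stand-in for the communication layer. The functions `allreduce_sum` and `iallreduce_sum` are hypothetical and are not MPI or Horovod calls.

```python
from concurrent.futures import ThreadPoolExecutor


def allreduce_sum(local_vectors):
    """Element-wise sum over the vectors contributed by each simulated rank."""
    return [sum(values) for values in zip(*local_vectors)]


def iallreduce_sum(executor, local_vectors):
    """Non-blocking variant: start the reduction and return a handle at once."""
    return executor.submit(allreduce_sum, local_vectors)


# One gradient vector per simulated rank (illustrative values).
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

with ThreadPoolExecutor(max_workers=1) as executor:
    request = iallreduce_sum(executor, gradients)  # returns immediately
    overlapped = sum(x * x for x in range(1000))   # independent work proceeds
    reduced = request.result()                     # wait: blocking semantics restored
```

The call to `request.result()` plays the role of the synchronization mechanism: the caller still observes a completed reduction before continuing, but independent work can run between the start of the operation and the wait.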


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sara Pérez-García ◽  
Mario García-Navarrete ◽  
Diego Ruiz-Sanchis ◽  
Cristina Prieto-Navarro ◽  
Merisa Avdovic ◽  
...  

Abstract: Synchronization is a recurring phenomenon in neuroscience, ecology, human sciences, and biology. However, controlling synchronization in complex eukaryotic consortia on extended spatial-temporal scales remains a major challenge. Here, to address this issue, we construct a minimal synthetic system that directly converts chemical signals into coherent gene expression synchronized among eukaryotic communities through rate-dependent hysteresis. Guided by chemical rhythms, isolated colonies of the yeast Saccharomyces cerevisiae oscillate in near-perfect synchrony despite the absence of intercellular coupling or intrinsic oscillations. Increasing the speed of the chemical rhythms and incorporating feedback into the system architecture can tune the synchronization and precision of the cell responses in growing cell collectives. This synchronization mechanism remains robust under stress in two-strain consortia composed of toxin-sensitive and toxin-producing strains. The sensitive cells can maintain spatial-temporal synchronization for extended periods under the rhythmic toxin dosages produced by killer cells. Our study provides a simple molecular framework for generating global coordination of eukaryotic gene expression through a dynamic environment.


Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 759
Author(s):  
Edel Díaz ◽  
Raúl Mateos ◽  
Emilio J. Bueno ◽  
Rubén Nieto

The current trend is to increase the number of cores per chip. This growth is evident in Multi-Processor Systems-on-Chip (MPSoCs), which in recent years have included more cores in both heterogeneous and homogeneous architectures. As a result, verifying this type of system has become considerably more difficult. Hardware/software co-simulation Virtual Platforms (VPs) are presented as a solution to this complexity, allowing software and hardware to be verified by simulation/emulation in the same environment. Some works have parallelized the software emulator to reduce verification times. An example of this parallelization is the QEMU (Quick EMUlator) tool. However, there is no solution for synchronizing QEMU with the hardware simulator in this new parallel mode. This work analyzes current software emulators and presents a new method that allows external synchronization of QEMU in its parallelized mode. Timing details of the cores are taken into account. In addition, a performance analysis of the software emulator with the new synchronization mechanism is presented, using: (1) a Linux boot on the Zynq-7000 MPSoC (dual-core ARM Cortex-A9) (Xilinx, San Jose, CA, USA); (2) an FPGA-Linux co-simulation of a power grid monitoring system that is subsequently implemented in an industrial application. The results show that the novel synchronization mechanism does not add any appreciable computational load and enables parallelized QEMU in hardware/software co-simulation virtual platforms.


2021 ◽  
Author(s):  
Ramesh Guntha ◽  
Maneesha Vinodini Ramesh

Substantially complete landslide inventories aid accurate modelling of a region's landslide susceptibility and landslide forecasting. Recording landslides soon after they occur is important because their traces can be quickly erased (e.g., the landslide is removed by people or through erosion/vegetation). In this paper, we present the technical software considerations that went into building a Landslide Tracker app, which helps non-technical local citizens, trained volunteers, and experts collect landslide information and so create more complete inventories on a real-time basis through crowdsourcing. The tracked landslide information is available for anyone across the world to view. The app is available for free on the Google Play Store and at http://landslides.amrita.edu, with software conceived and developed by Amrita University in the context of the UK NERC/FCDO funded LANDSLIP research project (http://www.landslip.org/).

The three technical themes we discuss in this paper are (i) security, (ii) performance, and (iii) network resilience. (i) Security considerations include authentication, authorization, and client/server-side enforcement. Authentication allows only registered users to record and view landslides, whereas authorization protects the data from unauthorized access. For example, landslides created by one user are not editable by others, and no user should be able to delete landslides. This validation is enforced on the client side (mobile and web apps) and also in the server-side software, to prevent both unintentional and intentional unauthorized access. (ii) Performance considerations include designing high-performance data structures, mobile databases, client-side caching, server-side caching, cache synchronization, and push notifications. The database is designed to ensure the best performance without sacrificing data integrity. Read-heavy data is then cached in memory on the server so that it can be retrieved with very low latency. Similarly, data that has been fetched is cached in memory on the app so that it can be reused without repeated calls to the server every time the user visits a screen. The data also persists in the mobile database so that the app loads faster when reopened. A cache-synchronization mechanism is implemented to prevent the cached data from becoming stale as new data comes into the database. The synchronization mechanism consists of push notifications and incremental data pulls. (iii) Network resilience is achieved with the help of local storage on the app. This allows landslides to be recorded even when there is no internet connection. The app automatically pushes the updates to the server as soon as connectivity resumes. During performance testing, we observed over 300% reduction in the time taken to load 2000 landslides between the no-cache mode and the cache mode.

The Landslide Tracker app was released during the 2020 monsoon season, and more than 250 landslides were recorded through the app across India and the world.
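The cache-synchronization mechanism described above, push notifications combined with incremental data pulls, can be sketched as follows. This is a minimal in-memory sketch, not the app's actual implementation. The classes `LandslideServer` and `ClientCache`, the version counter, and the record contents are all assumptions made for illustration.

```python
class LandslideServer:
    """In-memory stand-in for the server database, with a change counter."""

    def __init__(self):
        self.version = 0
        self._records = {}  # record id -> (version when last changed, data)

    def upsert(self, record_id, data):
        """Create or update a record and stamp it with a new version."""
        self.version += 1
        self._records[record_id] = (self.version, data)

    def changes_since(self, version):
        """Return the current version and only the records changed after `version`."""
        changed = {
            rid: data
            for rid, (changed_at, data) in self._records.items()
            if changed_at > version
        }
        return self.version, changed


class ClientCache:
    """App-side cache refreshed by incremental pulls instead of full reloads."""

    def __init__(self, server):
        self._server = server
        self.synced_version = 0
        self.records = {}

    def on_push_notification(self):
        """A notification only signals that something changed; pull the delta."""
        latest, changed = self._server.changes_since(self.synced_version)
        self.records.update(changed)
        self.synced_version = latest
        return len(changed)


server = LandslideServer()
server.upsert("ls-1", {"note": "first report"})
server.upsert("ls-2", {"note": "second report"})

cache = ClientCache(server)
first_pull = cache.on_push_notification()   # fetches both records

server.upsert("ls-2", {"note": "second report", "verified": True})
second_pull = cache.on_push_notification()  # fetches only the changed record
```

The sketch handles only inserts and updates. That matches the rule stated above that no user can delete landslides; a system with deletions would also need to propagate them, for example with tombstone records.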

