Distributed Execution
Recently Published Documents

TOTAL DOCUMENTS: 171 (five years: 26)
H-INDEX: 14 (five years: 2)

2021 ◽  
Vol 15 ◽  
Author(s):  
Florian Porrmann ◽  
Sarah Pilz ◽  
Alessandra Stella ◽  
Alexander Kleinjohann ◽  
Michael Denker ◽  
...  

The SPADE (spatio-temporal Spike PAttern Detection and Evaluation) method was developed to find recurring spatio-temporal patterns in neuronal spike activity (parallel spike trains). However, depending on the number of spike trains and the length of the recording, the method can exhibit long runtimes. Using a realistic benchmark data set, we identified that the combination of pattern mining (using the FP-Growth algorithm) and result filtering accounts for 85–90% of the method's total runtime. Therefore, in this paper, we propose a customized FP-Growth implementation tailored to the requirements of SPADE, which significantly accelerates pattern mining and result filtering. Our version allows for parallel and distributed execution, and, thanks to these improvements, execution on heterogeneous and low-power embedded devices is now also possible. The implementation has been evaluated using a traditional workstation based on an Intel Broadwell Xeon E5-1650 v4 as a baseline. Furthermore, the heterogeneous microserver platform RECS|Box has been used to evaluate the implementation on two HiSilicon Hi1616 (Kunpeng 916), an Intel Coffee Lake-ER Xeon E-2276ME, an Intel Broadwell Xeon D-1577, and three NVIDIA Tegra devices (Jetson AGX Xavier, Jetson Xavier NX, and Jetson TX2). Depending on the platform, our implementation is between 27 and 200 times faster than the original implementation, while reducing energy consumption by up to two orders of magnitude.
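As a rough illustration of the stage that dominates SPADE's runtime, the sketch below discretizes parallel spike trains into time bins and counts neuron subsets that co-occur often. The brute-force enumeration stands in for the FP-Growth mining that the paper accelerates, and all names are illustrative, not SPADE's actual API.

```python
# Minimal sketch of the pattern-mining bottleneck: bins become "transactions"
# of active neuron IDs, and frequent co-activation patterns are counted.
from itertools import combinations
from collections import Counter

def bin_spike_trains(spike_trains, bin_size, duration):
    """Turn per-neuron spike-time lists into per-bin sets of active neurons."""
    n_bins = int(duration / bin_size)
    transactions = [set() for _ in range(n_bins)]
    for neuron_id, spikes in enumerate(spike_trains):
        for t in spikes:
            b = int(t / bin_size)
            if b < n_bins:
                transactions[b].add(neuron_id)
    return transactions

def frequent_patterns(transactions, min_support, max_size=3):
    """Count neuron subsets that co-occur in at least `min_support` bins."""
    counts = Counter()
    for items in transactions:
        for k in range(2, min(max_size, len(items)) + 1):
            for pattern in combinations(sorted(items), k):
                counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Example: neurons 0 and 1 fire together repeatedly; neuron 2 does not.
trains = [[0.1, 1.1, 2.1], [0.12, 1.13, 2.09], [0.5]]
bins = bin_spike_trains(trains, bin_size=0.1, duration=3.0)
print(frequent_patterns(bins, min_support=3))   # {(0, 1): 3}
```

A real workload replaces the brute-force enumeration with FP-Growth's prefix-tree traversal, which is exactly the portion the authors customize, parallelize, and distribute.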


Author(s):  
Basireddy Ithihas Reddy

There has been great interest in data-intensive computing on shared-nothing clusters of commodity machines. Such workloads require multiple systems running in parallel and working closely together toward the same goal. In practice, the distributed execution engine MapReduce often handles the primary input/output workload for such clusters. Numerous file systems are in use, e.g., NTFS, ReFS, FAT, and FAT32 on Windows and their counterparts on Linux; we studied these and implemented a few distributed file systems (DFS). Distributed file systems are known to work well on many small files, but some do not perform as expected on large files. We implemented benchmark tests for small and large files on each distributed file system, and this paper puts forward the resulting analysis. The implementation issues we encountered across the various DFS are also discussed.
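A minimal sketch of the kind of small-file versus large-file I/O benchmark the paper describes; the directory, file counts, and sizes are placeholders, and a real run would target a DFS mount point rather than local temporary storage.

```python
# Illustrative write benchmark: many small files vs. one large file.
import os
import time
import tempfile

def time_write(path, n_files, file_size):
    """Write `n_files` files of `file_size` bytes and return elapsed seconds."""
    payload = os.urandom(file_size)
    start = time.perf_counter()
    for i in range(n_files):
        with open(os.path.join(path, f"bench_{i}.bin"), "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())   # force data to storage, not just the page cache
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as d:   # replace with a DFS mount point
    small = time_write(d, n_files=1000, file_size=4 * 1024)       # 1000 x 4 KiB
    large = time_write(d, n_files=1, file_size=4 * 1024 * 1024)   # 1 x 4 MiB
    print(f"many small files: {small:.3f}s, one large file: {large:.3f}s")
```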


2021 ◽  
Vol 37 (1-4) ◽  
pp. 1-37
Author(s):  
Youwei Zhuo ◽  
Jingji Chen ◽  
Gengyu Rao ◽  
Qinyi Luo ◽  
Yanzhi Wang ◽  
...  

To hide the complexity of the underlying system, graph processing frameworks ask programmers to specify graph computations in user-defined functions (UDFs) of a graph-oriented programming model. Due to the nature of distributed execution, current frameworks cannot precisely enforce the semantics of UDFs, leading to unnecessary computation and communication. This exemplifies a gap between the programming model and runtime execution. This article proposes novel graph processing frameworks for distributed systems and processing-in-memory (PIM) architectures that precisely enforce loop-carried dependencies; i.e., when a condition is satisfied by a neighbor, all following neighbors can be skipped. Our approach instruments the UDFs to express the loop-carried dependency; the distributed execution framework then enforces the precise semantics by performing dependency propagation dynamically. Enforcing a loop-carried dependency requires sequentially processing the neighbors of each vertex, which are distributed across different nodes. We propose circulant scheduling in the framework to allow different nodes to process disjoint sets of edges/vertices in parallel while satisfying the sequential requirement. The technique achieves an excellent trade-off between precise semantics and parallelism: the benefits of eliminating unnecessary computation and communication offset the reduced parallelism. We implement a new distributed graph processing framework, SympleGraph, and two runtime-system variants, GraphS and GraphSR, for a PIM-based graph processing architecture; both significantly outperform the state of the art.
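The following sketch (our naming, not SympleGraph's API) shows the loop-carried dependency in question: the sequential UDF can stop scanning a vertex's neighbors as soon as one satisfies the condition, whereas a framework that evaluates neighbors independently across nodes loses the short-circuit and performs unnecessary computation and communication.

```python
# UDF with a loop-carried dependency: later iterations depend on earlier
# ones through the early return.
def any_neighbor_active(vertex, neighbors, active):
    for n in neighbors[vertex]:
        if active[n]:        # condition satisfied by this neighbor...
            return True      # ...so all following neighbors can be skipped
    return False

# Toy graph: vertex 0 has five neighbors, the first of which is active.
neighbors = {0: [1, 2, 3, 4, 5]}
active = {1: True, 2: False, 3: False, 4: False, 5: False}

# Sequential semantics examine only neighbor 1; a naive distributed run
# would evaluate the condition on all five neighbors, possibly on remote nodes.
print(any_neighbor_active(0, neighbors, active))
```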


2021 ◽  
Vol 20 (4) ◽  
pp. 1-27
Author(s):  
Marten Lohstroh ◽  
Christian Menard ◽  
Soroush Bateni ◽  
Edward A. Lee

Many programming languages and programming frameworks focus on parallel and distributed computing. Several frameworks are based on actors, which provide a more disciplined model for concurrency than threads. The interactions between actors, however, if not constrained, admit nondeterminism. As a consequence, actor programs may exhibit unintended behaviors and are less amenable to rigorous testing. We show that nondeterminism can be handled in a number of ways, surveying dataflow dialects, process networks, synchronous-reactive models, and discrete-event models. These existing approaches, however, tend to require centralized control, pose challenges to modular system design, or introduce a single point of failure. We describe “reactors,” a new coordination model that combines ideas from several of these approaches to enable determinism while preserving much of the style of actors. Reactors promote modularity and allow for distributed execution. By using a logical model of time that can be associated with physical time, reactors also provide control over timing. Reactors also expose parallelism that can be exploited on multicore machines and in distributed configurations without compromising determinacy.
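A minimal sketch of the underlying idea, assuming nothing about the authors' actual reactor runtime: events carry logical timestamps and are processed in a total order, so the outcome does not depend on the nondeterministic message-arrival order that plain actors admit.

```python
# Deterministic event scheduling over logical time (illustrative construction).
import heapq

class Scheduler:
    def __init__(self):
        self._queue = []   # min-heap of (logical_time, seq, reaction, payload)
        self._seq = 0      # tie-breaker makes the order total, hence deterministic

    def schedule(self, logical_time, reaction, payload):
        heapq.heappush(self._queue, (logical_time, self._seq, reaction, payload))
        self._seq += 1

    def run(self):
        while self._queue:
            t, _, reaction, payload = heapq.heappop(self._queue)
            reaction(t, payload)   # reactions may schedule new, later events

sched = Scheduler()

def sensor(t, value):
    print(f"[t={t}] sensor read {value}")
    sched.schedule(t + 1, actuator, value * 2)   # logical delay of one tick

def actuator(t, command):
    print(f"[t={t}] actuator set to {command}")

sched.schedule(0, sensor, 21)
sched.run()   # output order is fixed by logical time, not by arrival order
```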


Author(s):  
Hatem Elshazly ◽  
Francesc Lordan ◽  
Jorge Ejarque ◽  
Rosa M. Badia

Task-based programming models offer a flexible way to express the unstructured parallelism patterns of today's complex applications. This expressive capability is required to achieve the maximum possible performance for applications executed on distributed platforms. In current task-based workflows, tasks are launched for execution when their data dependencies are satisfied. However, even when the data a task depends on has already been produced, its execution is delayed until its predecessor tasks have completely finished. As a consequence of this approach to releasing dependencies, the amount of parallelism inherent in applications is limited and performance improvement opportunities are wasted. To mitigate this limitation, we propose an eager approach to releasing data dependencies: instead of delaying a task until its predecessors completely finish, the task is launched for execution as soon as its data requirements are available. Hence, more parallelism is exposed and applications can achieve higher performance by overlapping the execution of tasks. Towards this goal, we propose two changes to task-based workflow systems. First, dependency relationships between tasks are specified not only in terms of predecessor and successor tasks but also in terms of the data that caused these dependencies. Second, the release of a dependency is triggered as soon as a predecessor task generates the corresponding output data, instead of waiting until the end of the predecessor's execution to release all of its dependencies. We realize this proposal in PyCOMPSs, a task-based programming model for parallelizing Python applications. Our experiments show that the eager approach to releasing dependencies improves total execution time by more than 50% compared to the default approach.
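A sketch of the opportunity that eager release exploits, written against PyCOMPSs' file-based task parameters (`FILE_IN`/`FILE_OUT`); actually running it requires a COMPSs runtime. Note that the eager policy itself is a runtime change, so the user code below is identical under both policies; only the moment `consume` may start differs.

```python
import time
from pycompss.api.task import task
from pycompss.api.parameter import FILE_IN, FILE_OUT
from pycompss.api.api import compss_barrier

@task(out_a=FILE_OUT, out_b=FILE_OUT)
def produce(out_a, out_b):
    with open(out_a, "w") as f:
        f.write("early\n")      # out_a is fully written here...
    time.sleep(10)              # ...but the task keeps running for a long time
    with open(out_b, "w") as f:
        f.write("late\n")

@task(in_f=FILE_IN)
def consume(in_f):
    with open(in_f) as f:
        print(f.read())

produce("a.txt", "b.txt")
consume("a.txt")   # default: starts after produce() ends entirely;
                   # eager release: can start as soon as a.txt is generated
compss_barrier()
```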


2021 ◽  
Vol 2 ◽  
Author(s):  
Lei-Xin Xu ◽  
Yang-Yang Chen

We present the application of deep reinforcement learning to arc welding by multi-robot systems, where the states and actions of each robot are continuous and obstacles are present in the welding environment. To adapt to the time-varying welding task and the local information available to each robot, the multi-agent deep deterministic policy gradient (MADDPG) algorithm is designed with a new set of rewards. Based on the paradigm of centralized training and distributed execution, the proposed MADDPG algorithm runs in a distributed fashion. Simulation results demonstrate the effectiveness of the proposed method.
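A minimal PyTorch sketch (our construction, not the paper's code) of MADDPG's centralized-training, distributed-execution split: each actor selects a continuous action from its local observation only, while each critic, used only during training, scores the joint observations and actions of all robots. All dimensions are illustrative.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# Distributed execution: one actor per robot, conditioned on local state only.
actors = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
# Centralized training: each critic sees the joint state-action of all robots.
critics = [mlp(N_AGENTS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_AGENTS)]

obs = torch.randn(N_AGENTS, OBS_DIM)                                  # per-robot observations
acts = torch.stack([torch.tanh(a(o)) for a, o in zip(actors, obs)])   # continuous actions
joint = torch.cat([obs.flatten(), acts.flatten()])                    # visible only in training
q_values = [c(joint) for c in critics]                                # centralized value estimates
print([q.item() for q in q_values])
```

At deployment time only the actors are kept, so each robot acts on its local sensing alone, which is what makes the execution distributed.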

