Apache Nemo: A Framework for Optimizing Distributed Data Processing

2020 ◽  
Vol 38 (3-4) ◽  
pp. 1-31
Author(s):  
Won Wook Song ◽  
Youngseok Yang ◽  
Jeongyoon Eo ◽  
Jangho Seo ◽  
Joo Yeon Kim ◽  
...  

Optimizing scheduling and communication of distributed data processing for resource and data characteristics is crucial for achieving high performance. Existing approaches to such optimizations largely fall into two categories. First, distributed runtimes provide low-level policy interfaces to apply the optimizations, but do not ensure the maintenance of correct application semantics and thus often require significant effort to use. Second, policy interfaces that extend a high-level application programming model ensure correctness, but do not provide sufficient fine control. We describe Apache Nemo, an optimization framework for distributed dataflow processing that provides fine control for high performance and also ensures correctness for ease of use. We combine several techniques to achieve this, including an intermediate representation of dataflow, compiler optimization passes, and runtime extensions. Our evaluation results show that Nemo enables composable and reusable optimizations that bring performance improvements on par with existing specialized runtimes tailored for a specific deployment scenario. Apache Nemo is open-sourced at https://nemo.apache.org as an Apache incubator project.
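The abstract's central ideas, a dataflow intermediate representation plus compile-time optimization passes that annotate rather than rewrite the user's logic, can be illustrated with a minimal Java sketch. The types below (IRVertex, IRDAG, OptimizationPass, LargeShufflePass) are hypothetical stand-ins, not Nemo's actual API; Nemo's real IR, vertex, and execution-property classes differ in detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical IR types, only mirroring the paper's idea of a dataflow IR
// whose vertices carry tunable execution properties.
final class IRVertex {
    final String id;
    final Map<String, Object> executionProperties = new HashMap<>();
    IRVertex(String id) { this.id = id; }
}

final class IRDAG {
    final List<IRVertex> topologicalOrder = new ArrayList<>();
}

// A compile-time pass: it adjusts scheduling knobs (here, parallelism)
// but never rewrites the user's dataflow logic, which is how correct
// application semantics are preserved by construction.
interface OptimizationPass {
    IRDAG apply(IRDAG dag);
}

final class ParallelismAnnotatingPass implements OptimizationPass {
    private final int defaultParallelism;
    ParallelismAnnotatingPass(int defaultParallelism) {
        this.defaultParallelism = defaultParallelism;
    }

    @Override
    public IRDAG apply(IRDAG dag) {
        for (IRVertex v : dag.topologicalOrder) {
            // Only annotate vertices that lack an explicit setting,
            // so user-specified properties are never overridden.
            v.executionProperties.putIfAbsent("parallelism", defaultParallelism);
        }
        return dag;
    }
}
```

A runtime would then read the "parallelism" annotation when scheduling tasks; because the pass touches only execution properties, the dataflow's semantics are unchanged no matter how passes are composed or reused.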

2013 ◽  
Vol 765-767 ◽  
pp. 941-944
Author(s):  
Peng Wang ◽  
Jia Nan Wang ◽  
Ji Ci Ba ◽  
Yu Tan

This paper presents an optimization of the SPRINT algorithm within the Hadoop core framework. Combining it with the data mining process, we study cloud computing under the MapReduce programming model, improve and optimize the SPRINT algorithm to fit that model, and port the optimized algorithm to the Hadoop platform for distributed data processing.
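As a hedged illustration of what such a port can look like (not the paper's actual code), the Hadoop job below computes the per-attribute class histograms that SPRINT, a decision-tree classifier, needs in order to evaluate candidate splits via the Gini index. The CSV record layout and the assumption that the class label is the last column are ours.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: for each record "f1,f2,...,class", emit one pair per attribute:
// key = "attrIndex=value,class", value = 1. The reducer then holds the
// class histogram needed for Gini-index split evaluation.
class SprintHistogramMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");  // assumed CSV layout
        String label = fields[fields.length - 1];      // assumed: class is last column
        for (int i = 0; i < fields.length - 1; i++) {
            outKey.set(i + "=" + fields[i] + "," + label);
            ctx.write(outKey, ONE);
        }
    }
}

// Reducer: sum the counts for each (attribute=value, class) key.
class SprintHistogramReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        ctx.write(key, new IntWritable(sum));
    }
}
```

The resulting counts feed the sequential split-selection step: for each candidate split, the class proportions p_c give Gini(S) = 1 - sum(p_c^2), and the split minimizing the weighted Gini of its partitions is chosen.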


Author(s):  
Breno A. de Melo Menezes ◽  
Nina Herrmann ◽  
Herbert Kuchen ◽  
Fernando Buarque de Lima Neto

Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten the execution time when solving complex optimization problems. When targeting a GPU environment, developing efficient parallel versions of such algorithms with CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of algorithmic skeletons simplifies parallel programs by abstracting from low-level features. This is realized by defining common programming patterns (e.g. map, fold, and zip) that are later converted into efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket cope with the development of a parallel implementation of ACO, and how the result compares to a low-level implementation. Our experimental results show that Musket suits the development of ACO. Besides making it easier for the programmer to deal with the parallelization aspects, Musket generates high-performance code with execution times similar to those of low-level implementations.
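As a rough illustration of the skeleton idea (Musket itself is a DSL that generates low-level parallel code, so this plain-Java sketch is only an analogy), the snippet below expresses one ACO decision step with a map skeleton for the per-edge desirability tau_j^alpha * eta_j^beta and a fold skeleton for normalization. All class names and numeric values here are illustrative, not Musket's syntax.

```java
import java.util.function.BinaryOperator;
import java.util.function.IntToDoubleFunction;
import java.util.stream.IntStream;

// Minimal "algorithmic skeleton" sketch: the skeleton hides *how* work is
// distributed, and the user supplies only the per-element function,
// which is the pattern Musket's map and fold provide.
final class Skeletons {
    // map: apply f to every index, potentially in parallel.
    static double[] map(int n, IntToDoubleFunction f) {
        return IntStream.range(0, n).parallel().mapToDouble(f).toArray();
    }

    // fold: combine all elements with an associative operator.
    static double fold(double[] xs, double identity, BinaryOperator<Double> op) {
        double acc = identity;
        for (double x : xs) acc = op.apply(acc, x);
        return acc;
    }
}

final class AcoStep {
    // One ACO decision step: weight_j = tau_j^alpha * eta_j^beta,
    // normalized into selection probabilities.
    public static void main(String[] args) {
        double[] tau = {0.5, 1.2, 0.8};  // pheromone levels (example data)
        double[] eta = {1.0, 0.4, 0.9};  // heuristic values (example data)
        double alpha = 1.0, beta = 2.0;

        double[] weights = Skeletons.map(tau.length,
                j -> Math.pow(tau[j], alpha) * Math.pow(eta[j], beta));
        double total = Skeletons.fold(weights, 0.0, Double::sum);

        for (int j = 0; j < weights.length; j++) {
            System.out.printf("P(edge %d) = %.3f%n", j, weights[j] / total);
        }
    }
}
```

The point of the pattern is that the two skeleton calls are the entire parallel surface of the program: a generating compiler such as Musket can lower exactly these calls to CUDA kernels while the user-written lambdas stay unchanged.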


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Onur Yukselen ◽  
Osman Turkyilmaz ◽  
Ahmet Rasit Ozturk ◽  
Manuel Garber ◽  
Alper Kucukural
