distributed and parallel computing
Recently Published Documents


Total documents: 48 (five years: 8)
H-index: 6 (five years: 1)

2021, pp. 17-46
Author(s): Sunilkumar Manvi, Gopal K. Shyam

2020, Vol 62 (5), pp. 435-450
Author(s): Dominik Filipiak, Krzysztof Węcel, Milena Stróżyna, Michał Michalak, Witold Abramowicz

Abstract: The presented method reconstructs a network (a graph) from AIS data, which reflects vessel traffic and can be used for route planning. The approach consists of three main steps: maneuvering point detection, waypoint discovery, and edge construction. Maneuvering point detection uses the CUSUM method and reduces the amount of data for further processing. A genetic algorithm with spatial partitioning is used for waypoint discovery. Finally, edges connecting these waypoints form the final maritime traffic network. The approach aims to advance the practice of maritime voyage planning, which is typically done manually by a ship’s navigation officer. The authors demonstrate an implementation using Apache Spark, a popular distributed and parallel computing framework. The method is evaluated by comparing its results with an online voyage planning application. The evaluation shows that the approach can generate a graph that resembles the real-world maritime traffic network.
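The CUSUM step can be illustrated with a minimal sketch. It assumes the input is a sequence of course-over-ground values in degrees along a single vessel track and applies a generic two-sided CUSUM to the heading change between consecutive reports. The functions `heading_delta` and `cusum_change_points` and the `drift` and `threshold` values are hypothetical illustrations, not the authors' parameters, and the sketch runs in plain Python rather than on Spark.

```python
from typing import List, Sequence


def heading_delta(a: float, b: float) -> float:
    """Smallest signed difference b - a between two headings, in [-180, 180) degrees."""
    return (b - a + 180.0) % 360.0 - 180.0


def cusum_change_points(
    courses: Sequence[float], drift: float = 2.0, threshold: float = 15.0
) -> List[int]:
    """Indices where a two-sided CUSUM on heading changes exceeds the threshold.

    drift and threshold (degrees) are illustrative values, not taken from the paper.
    """
    points: List[int] = []
    pos = neg = 0.0  # cumulative evidence for starboard / port turns
    for i in range(1, len(courses)):
        d = heading_delta(courses[i - 1], courses[i])
        pos = max(0.0, pos + d - drift)
        neg = max(0.0, neg - d - drift)
        if pos > threshold or neg > threshold:
            points.append(i)  # candidate maneuvering point
            pos = neg = 0.0  # restart the statistic after flagging a maneuver
    return points
```

For a track that holds 90° for eleven reports and then turns in 10° steps to 130°, `cusum_change_points` flags indices 12 and 14, while the straight legs produce no points; only the flagged reports would be passed on to waypoint discovery.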


Author(s): Cristian Ramon-Cortes, Ramon Amela, Jorge Ejarque, Philippe Clauss, Rosa M. Badia

Recent improvements in programming languages and models have focused on simplicity and abstraction, leading Python to the top of the list of programming languages. However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelisation of affine loop nests and executes them in parallel on a distributed computing infrastructure. It is based on sequential programming and requires only a single annotation (in the form of a Python decorator), so that anyone with intermediate-level programming skills can scale up an application to hundreds of cores. The evaluation demonstrates that AutoParallel goes one step further in easing the development of distributed applications. On the one hand, the programmability evaluation highlights the benefits of using a single Python decorator instead of manually annotating each task and its parameters or, even worse, having to develop the parallel code explicitly (e.g., using OpenMP or MPI). On the other hand, the performance evaluation demonstrates that AutoParallel can automatically generate task-based workflows from sequential Python code while achieving the same performance as manually taskified versions of established state-of-the-art algorithms (i.e., Cholesky, LU, and QR decompositions). Finally, AutoParallel can also automatically build data blocks to increase the tasks’ granularity, freeing the user from creating the data chunks and redesigning the algorithm. For advanced users, we believe that this feature can be useful as a baseline to design blocked algorithms.
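The idea of turning the iterations of an affine loop nest over data blocks into independent tasks can be sketched with the standard library. This is not AutoParallel's or PyCOMPSs' API: it uses `concurrent.futures.ThreadPoolExecutor` as a stand-in task runtime, and the function `blocked_matmul` and the block size `bs` are hypothetical. Each task computes one output block of C = A·B, so no two tasks write to the same data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Matrix = List[List[float]]


def blocked_matmul(a: Matrix, b: Matrix, bs: int) -> Matrix:
    """Multiply square n x n matrices as (n/bs)^2 independent block tasks."""
    n = len(a)
    if n % bs != 0:
        raise ValueError("matrix size must be a multiple of the block size")
    nb = n // bs

    def block(m: Matrix, bi: int, bj: int) -> Matrix:
        # Copy out the bs x bs block at block coordinates (bi, bj).
        return [row[bj * bs:(bj + 1) * bs] for row in m[bi * bs:(bi + 1) * bs]]

    def task(bi: int, bj: int) -> Tuple[int, int, Matrix]:
        # One task per output block: accumulate A[bi][bk] @ B[bk][bj] over bk.
        acc = [[0.0] * bs for _ in range(bs)]
        for bk in range(nb):
            ab, bb = block(a, bi, bk), block(b, bk, bj)
            for i in range(bs):
                for k in range(bs):
                    aik = ab[i][k]
                    for j in range(bs):
                        acc[i][j] += aik * bb[k][j]
        return bi, bj, acc

    c: Matrix = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task, bi, bj) for bi in range(nb) for bj in range(nb)]
        for fut in futures:
            bi, bj, acc = fut.result()
            for i in range(bs):
                # Write the finished block back into its slot in C.
                c[bi * bs + i][bj * bs:(bj + 1) * bs] = acc[i]
    return c
```

Because of CPython's global interpreter lock, the threads here show the task structure rather than a speedup; a distributed task runtime would ship each block task to a worker. The block size `bs` trades the number of tasks against the work per task, which is the granularity the abstract refers to.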


Author(s): Syed Muhammad Fawad Ali, Johannes Mey, Maik Thiele

Abstract: Today’s ETL tools provide capabilities to develop custom code as user-defined functions (UDFs) to extend the expressiveness of the standard ETL operators. However, while this allows us to easily add new functionality, it also carries the risk that the custom code is not written with optimization (e.g., parallelism) in mind and therefore performs poorly in data-intensive ETL workflows. In this paper we present a novel framework that allows the ETL developer to choose a design pattern in order to write parallelizable code, and that generates a configuration for the UDFs to be executed in a distributed environment. This enables ETL developers with minimal expertise in distributed and parallel computing to develop UDFs without dealing with parallelization configurations and complexities. We perform experiments on large-scale datasets based on TPC-DS and BigBench. The results show that our approach significantly reduces the effort of ETL developers and at the same time generates efficient parallel configurations to support complex and data-intensive ETL tasks.
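The abstract does not name the framework's design patterns, so the sketch below assumes two common ones: a record-wise map, where the UDF sees one row at a time, and a partitioned aggregation that folds each partition and then combines the partial results. The helpers `partition`, `run_map_udf`, and `run_aggregate_udf` are hypothetical, and a thread pool stands in for the distributed environment whose configuration the framework would generate.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")
A = TypeVar("A")


def partition(rows: Sequence[T], n: int) -> List[Sequence[T]]:
    """Split rows into at most n contiguous, order-preserving chunks."""
    size = max(1, -(-len(rows) // n))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def run_map_udf(udf: Callable[[T], R], rows: Sequence[T], workers: int = 4) -> List[R]:
    """Map pattern: the UDF sees one row at a time, so partitions run independently."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves partition order, so output order matches input order.
        chunks = list(pool.map(lambda part: [udf(r) for r in part], parts))
    return [r for chunk in chunks for r in chunk]


def run_aggregate_udf(
    seq_op: Callable[[A, T], A],
    comb_op: Callable[[A, A], A],
    zero: A,
    rows: Sequence[T],
    workers: int = 4,
) -> A:
    """Aggregation pattern: fold each partition locally, then combine the partials."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda part: reduce(seq_op, part, zero), parts))
    return reduce(comb_op, partials, zero)
```

`run_aggregate_udf` assumes `zero` is an identity for `comb_op` (for example, 0 for addition), since it seeds every partition as well as the final combine. A developer who writes a UDF in one of these shapes never touches partitioning or worker counts directly.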

