Massive Parallel Processing
Recently Published Documents


TOTAL DOCUMENTS: 22 (five years: 0)

H-INDEX: 3 (five years: 0)

2020, pp. 525-532
Author(s): Nelson Enrique Vera-Parra, Danilo Alfonso López-Sarmiento, Cristian Alejandro Rojas-Quintero

k-mer processing techniques that partition the data set on disk using minimizer-type seeds have led to a significant reduction in memory requirements; however, they add processes (the search for and distribution of super k-mers) that can be intensive given the large volume of data. This paper presents a massive parallel processing model that enables the efficient use of heterogeneous computation to accelerate the seed-based (minimizers or signatures) search for super k-mers. The model includes three main contributions: a new data structure, CISK, that represents super k-mers and their minimizers in an indexed and compact way, and two massive parallelization patterns: one for obtaining the canonical m-mers of a set of reads and another for searching for super k-mers based on minimizers. The model was implemented as two OpenCL kernels. Their evaluation shows favorable execution times and memory requirements for building heterogeneous solutions with simultaneous execution (workload distribution), which co-process using current super k-mer search methods on the CPU and the methods presented herein on the GPU. The implementation code is available in the repository: https://github.com/BioinfUD/K-mersCL.
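To make the two parallelization patterns concrete, here is a minimal serial Python sketch of canonical m-mer computation and minimizer-based super k-mer extraction. It illustrates only the underlying technique: the paper's CISK layout and OpenCL kernels (see the linked repository) are not reproduced here, and the example read is arbitrary.

# Serial sketch of the two steps the paper's kernels parallelize:
# canonical m-mers, then minimizer-based super k-mer search.

COMP = str.maketrans("ACGT", "TGCA")

def canonical(mer: str) -> str:
    """Lexicographically smaller of an m-mer and its reverse complement."""
    rc = mer.translate(COMP)[::-1]
    return min(mer, rc)

def minimizer(kmer: str, m: int) -> str:
    """Smallest canonical m-mer inside a k-mer (the seed)."""
    return min(canonical(kmer[i:i + m]) for i in range(len(kmer) - m + 1))

def super_kmers(read: str, k: int, m: int):
    """Split a read into super k-mers: maximal runs of consecutive
    k-mers that share the same minimizer."""
    start, current = 0, minimizer(read[0:k], m)
    for i in range(1, len(read) - k + 1):
        mz = minimizer(read[i:i + k], m)
        if mz != current:
            yield current, read[start:i + k - 1]
            start, current = i, mz
    yield current, read[start:]

for seed, skm in super_kmers("ACGTTGCATGCATTACG", k=7, m=3):
    print(seed, skm)

Grouping super k-mers by seed in this way is what allows the disk-based partitioning the abstract describes: all k-mers sharing a minimizer land in the same partition.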



2019, Vol 5 (1), pp. 65-79
Author(s): Yunhong Ji, Yunpeng Chai, Xuan Zhou, Lipeng Ren, Yajie Qin

Abstract: Intra-query fault tolerance has increasingly become a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massive parallel processing (MPP) databases do not support intra-query fault tolerance and may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can utilize the fault-tolerance support of low-level frameworks such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by checkpointing, i.e., materializing the intermediate results of selected operators. Unlike existing approaches, SIFT aims to maximize the query success rate within a given time. To achieve this goal, it needs to (1) minimize query rerunning time after a failure and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in a real-world MPP database system, we implemented it in Greenplum. The experimental results indicate that it improves the success rate of query processing effectively, especially on unreliable hardware.
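The trade-off SIFT balances can be shown with a toy sketch: checkpoint an operator's intermediate result only if the expected saving in rerun time (on failure) outweighs the materialization cost. The operator names, cost fields, and greedy rule below are hypothetical illustrations, not SIFT's or Greenplum's actual interfaces.

# Toy illustration of the checkpointing trade-off; all names and
# cost fields are hypothetical, not SIFT's API.
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    ckpt_cost: float    # time to materialize intermediate result (s)
    rerun_saved: float  # rerun time avoided if a later failure occurs (s)

def select_checkpoints(plan, failure_prob):
    """Greedy rule: checkpoint when expected rerun saving exceeds cost."""
    return [op.name for op in plan
            if failure_prob * op.rerun_saved > op.ckpt_cost]

plan = [Operator("scan", 5.0, 8.0),
        Operator("hash_join", 2.0, 60.0),
        Operator("aggregate", 1.0, 30.0)]
print(select_checkpoints(plan, failure_prob=0.1))  # ['hash_join', 'aggregate']

On reliable hardware (low failure probability) the rule checkpoints almost nothing, which matches the abstract's goal of keeping overhead small while still cutting rerun time when failures are likely.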





Author(s): В.П. Потапов, С.Е. Попов, М.А. Костылев

This paper addresses the creation of an information and computing system for processing radar images, with the ability to visualize, configure, and run the algorithms of the main stages of interferometric data processing by the Persistent Scatterer method, integrated with an MPP (massive parallel processing) system for high-performance monitoring of Earth-surface displacement over aerospace survey sites. The main data-flow routing schemes for job execution are presented. Based on an analysis of the approaches used in radar data processing and a review of distributed computing technologies, a distributed information system built on the massively parallel execution architecture of the Apache Hadoop ecosystem was proposed and implemented; it performs streaming post-processing of radar images and constructs displacement maps. The software is implemented as a web portal based on ReactJS components, including automated downloading and updating of the Sentinel-1A radar image database via a RESTful API. The novelty of the solution lies in the model of interaction between the developed processing modules, which relies on isolated execution contexts with HDFS data storage during both the preparation procedure and the complete Earth-surface displacement processing cycle. An integrated approach to developing scalable front-end and back-end components using ReactJS, Redux, and the Apache Spark framework was applied for the first time. Support for the WPS specification makes it possible to use almost any GIS that works with this standard. Evaluation shows high performance of the developed system while maintaining result quality: in a per-pixel comparison, the system returned arrays of processed interferometric data identical to those of the adapted and integrated ESA SNAP Toolbox, while completing the procedure several times faster.
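The distribution pattern described here (per-scene processing stages fanned out over a Hadoop/Spark cluster with inputs on HDFS) can be sketched minimally in PySpark. The process_scene function and the HDFS path are hypothetical stand-ins; the actual system runs adapted SNAP-based Persistent Scatterer stages in isolated execution contexts rather than this placeholder.

# Minimal PySpark sketch, assuming a hypothetical process_scene()
# and HDFS scene list; illustration only, not the portal's code.
from pyspark.sql import SparkSession

def process_scene(path: str) -> str:
    # Placeholder for coregistration / interferogram / PS-selection
    # stages applied to one Sentinel-1A scene.
    return f"processed:{path}"

spark = SparkSession.builder.appName("ps-insar-mpp").getOrCreate()

# Each line of the (hypothetical) list is one scene path on HDFS.
scenes = spark.sparkContext.textFile("hdfs:///sentinel1a/scene_list.txt")
for result in scenes.map(process_scene).collect():
    print(result)

spark.stop()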



Author(s): Vijayalakshmi Saravanan, Anpalagan Alagan, Isaac Woungang

With the advent of novel wireless technologies and cloud computing, large volumes of data are being produced by heterogeneous devices such as mobile phones, credit cards, and computers. Managing this data has become the de facto challenge for current information systems. Processor clock speeds are no longer doubling as Moore's law once suggested, yet processing power continues to grow rapidly through parallelism, giving rise to data-intensive scientific problems in every field, especially the Big Data domain. The Big Data revolution lies in improved statistical analysis and computational power, both of which depend on processing speed. Hence, putting massively multi-core systems on the job is vital to overcome the physical limits of complexity and speed. This shift also brings many challenges, such as difficulties in capturing data from massive applications, in data storage, and in analysis. This chapter discusses some of the architectural challenges of Big Data from the perspective of multi-core processors.




