Massive Parallel Processing
Recently Published Documents


TOTAL DOCUMENTS: 22 (five years: 0)

H-INDEX: 3 (five years: 0)

2020, pp. 525-532
Author(s): Nelson Enrique Vera-Parra, Danilo Alfonso López-Sarmiento, Cristian Alejandro Rojas-Quintero

k-mer processing techniques that partition the data set on disk using minimizer-type seeds have led to a significant reduction in memory requirements; however, they add processes (the search for and distribution of super k-mers) that can be intensive given the large volume of data. This paper presents a massive parallel processing model that enables the efficient use of heterogeneous computation to accelerate the seed-based (minimizers or signatures) search for super k-mers. The model includes three main contributions: a new data structure, CISK, that represents super k-mers and their minimizers in an indexed and compact way, and two massive parallelization patterns: one for obtaining the canonical m-mers of a set of reads and another for searching for super k-mers based on minimizers. The model was implemented as two OpenCL kernels. Their evaluation shows favorable execution times and memory requirements for building heterogeneous solutions with simultaneous execution (workload distribution), which co-process using current super k-mer search methods on the CPU and the methods presented herein on the GPU. The implementation code is available in the repository: https://github.com/BioinfUD/K-mersCL.
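To make the two parallelization patterns concrete, here is a minimal serial Python sketch of canonical m-mer computation and minimizer-based super k-mer extraction. It illustrates only the underlying technique: the paper's CISK layout and OpenCL kernels (see the linked repository) are not reproduced here, and the example read is arbitrary.

# Serial sketch of the two steps the paper's kernels parallelize:
# canonical m-mers, then minimizer-based super k-mer search.

COMP = str.maketrans("ACGT", "TGCA")

def canonical(mer: str) -> str:
    """Lexicographically smaller of an m-mer and its reverse complement."""
    rc = mer.translate(COMP)[::-1]
    return min(mer, rc)

def minimizer(kmer: str, m: int) -> str:
    """Smallest canonical m-mer inside a k-mer (the seed)."""
    return min(canonical(kmer[i:i + m]) for i in range(len(kmer) - m + 1))

def super_kmers(read: str, k: int, m: int):
    """Split a read into super k-mers: maximal runs of consecutive
    k-mers that share the same minimizer."""
    start, current = 0, minimizer(read[0:k], m)
    for i in range(1, len(read) - k + 1):
        mz = minimizer(read[i:i + k], m)
        if mz != current:
            yield current, read[start:i + k - 1]
            start, current = i, mz
    yield current, read[start:]

for seed, skm in super_kmers("ACGTTGCATGCATTACG", k=7, m=3):
    print(seed, skm)

Grouping super k-mers by seed in this way is what allows the disk-based partitioning the abstract describes: all k-mers sharing a minimizer land in the same partition.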



2019, Vol 5 (1), pp. 65-79
Author(s): Yunhong Ji, Yunpeng Chai, Xuan Zhou, Lipeng Ren, Yajie Qin

Abstract: Intra-query fault tolerance has increasingly become a concern for online analytical processing, as more and more enterprises migrate data analytical systems from mainframes to commodity computers. Most massive parallel processing (MPP) databases do not support intra-query fault tolerance and may suffer from prolonged query latency when running on unreliable commodity clusters. While SQL-on-Hadoop systems can utilize the fault-tolerance support of low-level frameworks such as MapReduce and Spark, their cost-effectiveness is not always acceptable. In this paper, we propose a smart intra-query fault tolerance (SIFT) mechanism for MPP databases. SIFT achieves fault tolerance by checkpointing, i.e., materializing the intermediate results of selected operators. Unlike existing approaches, SIFT aims to maximize the query success rate within a given time. To achieve this goal, it needs to (1) minimize query rerunning time after a failure and (2) introduce as little checkpointing overhead as possible. To evaluate SIFT in a real-world MPP database system, we implemented it in Greenplum. The experimental results indicate that it improves the success rate of query processing effectively, especially on unreliable hardware.
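The trade-off SIFT balances can be shown with a toy sketch: checkpoint an operator's intermediate result only if the expected saving in rerun time (on failure) outweighs the materialization cost. The operator names, cost fields, and greedy rule below are hypothetical illustrations, not SIFT's or Greenplum's actual interfaces.

# Toy illustration of the checkpointing trade-off; all names and
# cost fields are hypothetical, not SIFT's API.
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    ckpt_cost: float    # time to materialize intermediate result (s)
    rerun_saved: float  # rerun time avoided if a later failure occurs (s)

def select_checkpoints(plan, failure_prob):
    """Greedy rule: checkpoint when expected rerun saving exceeds cost."""
    return [op.name for op in plan
            if failure_prob * op.rerun_saved > op.ckpt_cost]

plan = [Operator("scan", 5.0, 8.0),
        Operator("hash_join", 2.0, 60.0),
        Operator("aggregate", 1.0, 30.0)]
print(select_checkpoints(plan, failure_prob=0.1))  # ['hash_join', 'aggregate']

On reliable hardware (low failure probability) the rule checkpoints almost nothing, which matches the abstract's goal of keeping overhead small while still cutting rerun time when failures are likely.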





Author(s): В.П. Потапов, С.Е. Попов, М.А. Костылев

This paper addresses the creation of an information and computing system for processing radar images, with the ability to visualize, configure, and run the algorithms of the main stages of interferometric data processing by the Persistent Scatterer method, integrated with an MPP (massive parallel processing) system for high-performance monitoring of Earth-surface displacement over aerospace survey sites. The main data-flow routing schemes for job execution are presented. Based on an analysis of the approaches used in radar data processing and a review of distributed computing technologies, a distributed information system built on the massively parallel execution architecture of the Apache Hadoop ecosystem was proposed and implemented; it performs streaming post-processing of radar images and constructs displacement maps. The software is implemented as a web portal based on ReactJS components, including automated downloading and updating of the Sentinel-1A radar image database via a RESTful API. The novelty of the solution lies in the model of interaction between the developed processing modules, which relies on isolated execution contexts with HDFS data storage during both the preparation procedure and the complete Earth-surface displacement processing cycle. An integrated approach to developing scalable front-end and back-end components using ReactJS, Redux, and the Apache Spark framework was applied for the first time. Support for the WPS specification makes it possible to use almost any GIS that works with this standard. Evaluation shows high performance of the developed system while maintaining result quality: in a per-pixel comparison, the system returned arrays of processed interferometric data identical to those of the adapted and integrated ESA SNAP Toolbox, while completing the procedure several times faster.
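The distribution pattern described here (per-scene processing stages fanned out over a Hadoop/Spark cluster with inputs on HDFS) can be sketched minimally in PySpark. The process_scene function and the HDFS path are hypothetical stand-ins; the actual system runs adapted SNAP-based Persistent Scatterer stages in isolated execution contexts rather than this placeholder.

# Minimal PySpark sketch, assuming a hypothetical process_scene()
# and HDFS scene list; illustration only, not the portal's code.
from pyspark.sql import SparkSession

def process_scene(path: str) -> str:
    # Placeholder for coregistration / interferogram / PS-selection
    # stages applied to one Sentinel-1A scene.
    return f"processed:{path}"

spark = SparkSession.builder.appName("ps-insar-mpp").getOrCreate()

# Each line of the (hypothetical) list is one scene path on HDFS.
scenes = spark.sparkContext.textFile("hdfs:///sentinel1a/scene_list.txt")
for result in scenes.map(process_scene).collect():
    print(result)

spark.stop()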



Author(s): Vijayalakshmi Saravanan, Anpalagan Alagan, Isaac Woungang

With the advent of novel wireless technologies and cloud computing, large volumes of data are being produced by heterogeneous devices such as mobile phones, credit cards, and computers. Managing this data has become the de facto challenge for current information systems. Processor clock speeds are no longer doubling as Moore's law once suggested, yet processing power continues to grow rapidly through parallelism, giving rise to data-intensive scientific problems in every field, especially the Big Data domain. The Big Data revolution lies in improved statistical analysis and computational power, both of which depend on processing speed. Hence, putting massively multi-core systems on the job is vital to overcome the physical limits of complexity and speed. This shift also brings many challenges, such as difficulties in capturing data from massive applications, in data storage, and in analysis. This chapter discusses some of the architectural challenges of Big Data from the perspective of multi-core processors.




