Distributed Data Processing: Latest Research Papers

Adaptive On-the-Fly Changes in Distributed Processing Pipelines

Frontiers in Big Data ◽

10.3389/fdata.2021.666174 ◽

2021 ◽

Vol 4 ◽

Author(s):

Toon Albers ◽

Elena Lazovik ◽

Mostafa Hadadian Nejad Yousefi ◽

Alexander Lazovik

Keyword(s):

Distributed Processing ◽

Big Data Analytics ◽

Distributed Data ◽

Proof Of Concept ◽

Planning Time ◽

Distributed Data Processing ◽

Development Experience ◽

A Chain ◽

Processing Steps ◽

Running Calculation

Distributed data processing systems have become the standard means for big data analytics. These systems are based on processing pipelines in which operations on data are performed in a chain of consecutive steps. Normally, the operations performed by these pipelines are set at design time, and any change to their functionality requires the application to be restarted. This is not always acceptable, for example, when we cannot afford downtime or when a long-running calculation would lose significant progress. Introducing variation points into distributed processing pipelines allows individual analysis steps to be updated on the fly. In this paper, we extend such basic variation point functionality to provide fully automated reconfiguration of the processing steps within a running pipeline through an automated planner. We have enabled pipeline modeling through constraints. Based on these constraints, we not only ensure that configurations are type-compatible but also verify that the expected pipeline functionality is achieved. Furthermore, automating the reconfiguration process simplifies its use, in turn allowing users with less development experience to make changes. The system can automatically generate and validate pipeline configurations that achieve a specified goal, selecting from the operation definitions available at planning time. It then automatically integrates these configurations into the running pipeline. We verify the system by testing a proof-of-concept implementation. The proof of concept also shows promising results when reconfiguration is performed frequently.

RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework (Extended Version)

10.21203/rs.3.rs-1079576/v1 ◽

2021 ◽

Author(s):

Pankaj Singh ◽

Sudhakar Singh ◽

P K Mishra ◽

Rakhi Garg

Keyword(s):

Data Processing ◽

Iterative Algorithms ◽

Frequent Itemset ◽

Experimental Results ◽

Distributed Data ◽

Data Intensive ◽

Hadoop Mapreduce ◽

Distributed Data Processing ◽

Benchmark Datasets ◽

Processing Framework

Frequent itemset mining (FIM) is a highly computation- and data-intensive task. Parallel and distributed FIM algorithms have therefore been designed to process large volumes of data in less time. Recently, a number of FIM algorithms have been designed on Hadoop MapReduce, a distributed big data processing framework. However, because of heavy disk I/O, MapReduce has been found inefficient for highly iterative FIM algorithms. Spark, a more efficient distributed data processing framework, was therefore developed with in-memory computation and resilient distributed dataset (RDD) features to support iterative algorithms. Apriori- and FP-Growth-based FIM algorithms have been designed on the Spark RDD framework, but an Eclat-based algorithm has not yet been explored. In this paper, RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, is proposed along with its five variants. The proposed algorithms are evaluated on various benchmark datasets, and the experimental results show that RDD-Eclat outperforms the Spark-based Apriori by many times. The results also show that the proposed algorithms scale as the number of cores and the size of the dataset increase.

C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

10.1109/ic2e52221.2021.00018 ◽

2021 ◽

Author(s):

Jonathan Will ◽

Lauritz Thamsen ◽

Dominik Scheinert ◽

Jonathan Bader ◽

Odei Kao

Keyword(s):

Data Processing ◽

Distributed Data ◽

Configuration Optimization ◽

Distributed Data Processing ◽

Cluster Configuration

ScienceIoT: Evolution of the Wireless Infrastructure of KREONET

Sensors ◽

10.3390/s21175852 ◽

2021 ◽

Vol 21 (17) ◽

pp. 5852

Author(s):

Cheonyong Kim ◽

Joobum Kim ◽

Ki-Hyeon Kim ◽

Sang-Kwon Lee ◽

Kiwook Kim ◽

...

Keyword(s):

High Performance ◽

Area Network ◽

Distributed Data ◽

Wide Area Network ◽

Scientific Applications ◽

Computing Platform ◽

Distributed Data Processing ◽

Research And Education ◽

Education Network ◽

Wireless Infrastructure

Here, we introduce the current stage and future directions of the wireless infrastructure of the Korea Research Environment Open NETwork (KREONET), a representative national research and education network in Korea. In 2018, ScienceLoRa, a pioneering wireless network infrastructure for scientific applications based on low-power wide-area network technology, was launched. Existing in-service applications in monitoring regions, research facilities, and universities prove the effectiveness of using wireless infrastructure in scientific areas. Furthermore, to support the more stringent requirements of various scientific scenarios, ScienceLoRa is evolving toward ScienceIoT by employing high-performance wireless technology and distributed computing capability. Specifically, by accommodating a private 5G network and an integrated edge computing platform, ScienceIoT is expected to support cutting-edge scientific applications requiring high-throughput and distributed data processing.

Distributed Data Processing for Large-Scale Simulations on Cloud

10.1109/emc/si/pi/emceurope52599.2021.9559316 ◽

2021 ◽

Author(s):

Tianjian Lu ◽

Stephan Hoyer ◽

Qing Wang ◽

Lily Hu ◽

Yi-Fan Chen

Keyword(s):

Data Processing ◽

Large Scale ◽

Distributed Data ◽

Distributed Data Processing ◽

Large Scale Simulations

Design and Implementation of Internet Ticketing System Based on Distributed Data Processing Platform

10.1109/icmsse53595.2021.00067 ◽

2021 ◽

Author(s):

Xiaomei Pei ◽

Hailin Tang

Keyword(s):

Data Processing ◽

Distributed Data ◽

Design And Implementation ◽

Distributed Data Processing ◽

Processing Platform

Compliant geo-distributed data processing in action

Proceedings of the VLDB Endowment ◽

10.14778/3476311.3476359 ◽

2021 ◽

Vol 14 (12) ◽

pp. 2843-2846

Author(s):

Kaustubh Beedkar ◽

David Brekardin ◽

Jorge-Arnulfo Quiané-Ruiz ◽

Volker Markl

Keyword(s):

Data Processing ◽

Distributed Data ◽

Distributed Data Processing

Research of Distributed Data Processing in Corporate Information Systems

2021 IEEE 16th International Conference on the Experience of Designing and Application of CAD Systems (CADSM) ◽

10.1109/cadsm52681.2021.9385244 ◽

2021 ◽

Author(s):

Mykhailo Klymash ◽

Ihor Chaikovskyi ◽

Nataliia Syvkova ◽

Olena Hordiichuk-Bublivska ◽

Marian Kyryk

Keyword(s):

Information Systems ◽

Data Processing ◽

Distributed Data ◽

Distributed Data Processing ◽

Corporate Information Systems ◽

Corporate Information

Keysystems in large systems implementing distributed data processing and storage technologies

Highly available systems ◽

10.18127/j20729472-202103-01 ◽

2021 ◽

Author(s):

V.G. Belenkov ◽

V.I. Korolev ◽

V.I. Budzko ◽

D.A. Melnikov

Keyword(s):

Data Processing ◽

Distributed Processing ◽

Specific Work ◽

Distributed Data ◽

Distributed Data Processing ◽

Technical Specifications ◽

The Creation ◽

Processing And Storage ◽

Storage Technologies ◽

And Storage

The article discusses the features of using cryptographic information protection means (CIPM) in the environment of distributed data processing and storage in large information and telecommunication systems (LITS). A brief characterization is given of the properties of the cryptographic protection control subsystem, the key system (KS). Symmetric and asymmetric cryptographic systems are described to the extent required to state the problem of using KS in LITS. Functional and structural models of the use of KS and CIPM in LITS are described, and generalized information about the features of using KS in LITS is given. The results form the basis for further work on the architecture and principles of KS construction in LITS that implement distributed data processing and storage technologies. They can be used as a methodological guide, when carrying out specific work on the creation and development of systems that implement these technologies, and when drawing up technical specifications for the creation of such systems.

Software Redocumentation Using Distributed Data Processing Technique to Support Program Understanding for Legacy System: A Proposed Approach

10.1007/978-3-030-90235-3_21 ◽

2021 ◽

pp. 239-252

Author(s):

Sugumaran Nallusamy ◽

Hoo Meei Hao ◽

Farizuwana Akma Zulkifle

Keyword(s):

Data Processing ◽

Processing Technique ◽

Support Program ◽

Distributed Data ◽

Program Understanding ◽

Legacy System ◽

Distributed Data Processing ◽

System A ◽

Data Processing Technique
