Adaptive On-the-Fly Changes in Distributed Processing Pipelines

2021 ◽  
Vol 4 ◽  
Author(s):  
Toon Albers ◽  
Elena Lazovik ◽  
Mostafa Hadadian Nejad Yousefi ◽  
Alexander Lazovik

Distributed data processing systems have become the standard means for big data analytics. These systems are based on processing pipelines where operations on data are performed in a chain of consecutive steps. Normally, the operations performed by these pipelines are set at design time, and any changes to their functionality require the applications to be restarted. This is not always acceptable, for example, when we cannot afford downtime or when a long-running calculation would lose significant progress. The introduction of variation points into distributed processing pipelines allows on-the-fly updating of individual analysis steps. In this paper, we extend such basic variation point functionality to provide fully automated reconfiguration of the processing steps within a running pipeline through an automated planner. We enable pipeline modeling through constraints: based on these constraints, we not only ensure that configurations are type-compatible but also verify that the expected pipeline functionality is achieved. Furthermore, automating the reconfiguration process simplifies its use, in turn allowing users with less development experience to make changes. The system can automatically generate and validate pipeline configurations that achieve a specified goal, selecting from operation definitions available at planning time, and then automatically integrates these configurations into the running pipeline. We verify the system by testing a proof-of-concept implementation, which also shows promising results when reconfiguration is performed frequently.
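The variation-point idea can be pictured as a pipeline step whose implementation is swappable at run time, subject to a type-compatibility constraint. The following is a minimal single-process sketch in Python; the names (Operation, VariationPoint) are hypothetical and the paper's actual planner and constraint language are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Operation:
    """A pipeline step with declared input/output types (hypothetical model)."""
    name: str
    in_type: type
    out_type: type
    fn: Callable[[Any], Any]

class VariationPoint:
    """A slot in the pipeline whose operation can be swapped while running."""
    def __init__(self, initial: Operation):
        self.current = initial

    def reconfigure(self, candidate: Operation) -> None:
        # Constraint check: the replacement must accept and produce the
        # same types as the operation it replaces.
        if (candidate.in_type, candidate.out_type) != (
            self.current.in_type, self.current.out_type
        ):
            raise TypeError(f"{candidate.name} is not type-compatible")
        self.current = candidate  # swap on the fly, no restart

    def __call__(self, item):
        return self.current.fn(item)

# Usage: swap a text-cleaning step without stopping the pipeline.
lower = Operation("lowercase", str, str, str.lower)
strip = Operation("strip", str, str, str.strip)
step = VariationPoint(lower)
assert step("  HELLO ") == "  hello "
step.reconfigure(strip)
assert step("  HELLO ") == "HELLO"
```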

Author(s):  
V.G. Belenkov ◽  
V.I. Korolev ◽  
V.I. Budzko ◽  
D.A. Melnikov

The article discusses the features of using cryptographic information protection means (CIPM) in the environment of distributed processing and storage of data in large information and telecommunication systems (LITS). A brief characterization is given of the properties of the cryptographic protection control subsystem, the key system (KS). Symmetric and asymmetric cryptographic systems are described to the extent required to state the problem of using the KS in LITS. Functional and structural models of the use of the KS and CIPM in LITS are described, and generalized information about the features of using the KS in LITS is given. The obtained results form the basis for further work on developing the architecture and principles of KS construction in LITS that implement distributed data processing and storage technologies. They can be used both as a methodological guide and when carrying out specific work on the creation and development of systems that implement these technologies, as well as when forming technical specifications for work on the creation of such systems.
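The symmetric/asymmetric distinction the abstract relies on can be illustrated briefly. The sketch below uses the widely available Python cryptography package; it is not the article's CIPM or key system, only the textbook contrast between a shared secret key and a public/private key pair.

```python
# Symmetric vs. asymmetric encryption in miniature
# (illustrative only; not the CIPM/KS described in the article).
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Symmetric: one shared key both encrypts and decrypts, so the key
# system must distribute it secretly to every participating node.
shared_key = Fernet.generate_key()
f = Fernet(shared_key)
token = f.encrypt(b"distributed-node payload")
assert f.decrypt(token) == b"distributed-node payload"

# Asymmetric: the public key encrypts, only the private key decrypts,
# so no secret needs to be shared in advance.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
ct = private_key.public_key().encrypt(b"session key material", oaep)
assert private_key.decrypt(ct, oaep) == b"session key material"
```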


Author(s):  
Dawn E. Holmes

‘Big data analytics’ argues that big data is only valuable if we can extract useful information from it. It looks at some of the techniques used to discover useful information in big data, such as customer preferences or how fast an epidemic is spreading. Big data analytics is changing rapidly as the size of the datasets increases and classical statistics makes room for this new paradigm. An example of big data analytics is MapReduce, a distributed data processing method that forms part of the core functionality of the Hadoop Ecosystem. Amazon, Google, Facebook, and many others use Hadoop to store and process their data.
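The MapReduce idea is easy to show in miniature: a map function emits key-value pairs, the framework groups the pairs by key (the "shuffle"), and a reduce function combines each group. A minimal single-process Python sketch follows; a real MapReduce system such as Hadoop distributes these same phases across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Mapper: emit (word, 1) for every word in the document.
    for word in doc.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Reducer: combine all counts emitted for one key.
    return word, sum(counts)

docs = ["big data analytics", "big data systems", "data pipelines"]

# Shuffle: group intermediate pairs by key (done by the framework in Hadoop).
groups = defaultdict(list)
for word, one in chain.from_iterable(map_phase(d) for d in docs):
    groups[word].append(one)

word_counts = dict(reduce_phase(w, c) for w, c in groups.items())
print(word_counts)  # {'big': 2, 'data': 3, 'analytics': 1, ...}
```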


2018 ◽  
Vol 15 (3) ◽  
Author(s):  
Blagoj Ristevski ◽  
Ming Chen

Abstract This paper surveys big data, highlighting big data analytics in medicine and healthcare. The big data characteristics value, volume, velocity, variety, veracity, and variability are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data, such as various omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data, and electronic health record data. We underline the challenging issues of big data privacy and security. Regarding the big data characteristics, some directions for choosing suitable and promising open-source distributed data processing software platforms are given.


1982 ◽  
Vol 65 (5) ◽  
pp. 1279-1282
Author(s):  
Thomas J Birkel ◽  
Laurence R Dusold

Abstract Distributed data processing has been accomplished by a computer system in which laboratory instrument data are collected on a PEAK-11 system for preliminary processing and generation of initial reports. When further processing is required, or when archival storage of raw or processed data is desired, data are transferred over telephone lines to an IBM 3033; an IBM 7406 Device Coupler is used to handle protocol conversion and "handshaking." User-written programs in APL.SV on the IBM machine and in Assembly Language on the PEAK-11 system effect the bidirectional transfer of data. The distributed processing approach allows efficient use of expensive peripherals while maintaining short response times.


2016 ◽  
Vol 58 (4) ◽  
Author(s):  
Wolfram Wingerath ◽  
Felix Gessert ◽  
Steffen Friedrich ◽  
Norbert Ritter

Abstract With the rise of Web 2.0 and the Internet of Things, it has become feasible to track all kinds of information over time, in particular fine-grained user activities and sensor data on users' environment and even their biometrics. However, while efficiency remains mandatory for any application trying to cope with huge amounts of data, only part of the potential of today's Big Data repositories can be exploited using traditional batch-oriented approaches, as the value of data often decays quickly and high latency becomes unacceptable in some applications. In the last couple of years, several distributed data processing systems have emerged that deviate from the batch-oriented approach and tackle data items as they arrive, thus acknowledging the growing importance of timeliness and velocity in Big Data analytics. In this article, we give an overview of the state of the art of stream processors for low-latency Big Data analytics and conduct a qualitative comparison of the most popular contenders, namely Storm and its abstraction layer Trident, Samza, and Spark Streaming. We describe their respective underlying rationales and the guarantees they provide, and discuss the trade-offs that come with selecting one of them for a particular task.
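To make the batch-versus-stream contrast concrete, here is the classic streaming word count in the DStream API of Spark Streaming, one of the four systems compared: data are processed in small batches as they arrive rather than over a finished dataset. This is a minimal sketch assuming a local Spark installation and a text source writing lines to port 9999 (e.g. `nc -lk 9999`).

```python
# Minimal Spark Streaming word count over a socket stream.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamingWordCount")
ssc = StreamingContext(sc, batchDuration=1)  # micro-batches of 1 second

lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each micro-batch's counts as it is computed

ssc.start()
ssc.awaitTermination()
```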


Author(s):  
Г.В. Петрухнова ◽  
И.Р. Болдырев

We present a set of technical means for creating a data collection system and formalize the processes that implement the monitoring functions of a technical object. The data collection system under consideration consists of functionally complete devices, each performing specific functions in the context of the system's operation. This system can, on the one hand, serve as one node of a distributed data collection system and, on the other hand, be used autonomously. We show the relevance of creating the system. The development is based on the STM32H743VIT6 RISC microcontroller of the ARM Cortex-M7 family, operating at frequencies up to 400 MHz. The main modules of the system include a 20-input voltage distributor; a power supply and configuration module; a digital control module; and a module for analyzing, storing, and transmitting data to a control computer. We consider the composition and purpose of these modules. Data collection in the system is handled by a chain of devices: sensor - matching circuit - ADC - microcontroller. Since the system includes not only an ADC but also a DAC, an object control system can be implemented on its basis. The choice of sensors is determined by the characteristics of the monitored object. The electrical parameters of the communication circuits can be measured manually, including checking the power supply of IDE and SATA devices. The presented data collection system is a tool that can be used to automate the monitoring of the condition of technical facilities.
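On the control-computer side, the analyze-store-transmit module implies a host program that reads samples arriving from the microcontroller and logs them. Below is a minimal host-side sketch in Python using the common pyserial package; the port name, baud rate, and "channel,value" line format are assumptions for illustration, not taken from the article.

```python
# Hypothetical host-side reader for the data collection system: pulls
# ADC samples from the microcontroller over a serial link and logs them.
# Port, baud rate, and "channel,value" line format are assumptions.
import csv
import serial  # pip install pyserial

PORT, BAUD = "/dev/ttyUSB0", 115200

with serial.Serial(PORT, BAUD, timeout=1) as link, \
        open("samples.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["channel", "raw_adc"])
    while True:
        line = link.readline().decode("ascii", errors="replace").strip()
        if not line:
            continue  # read timed out with no data
        channel, raw = line.split(",")  # e.g. "ch3,2047"
        writer.writerow([channel, int(raw)])
```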

