High-Level Declarative Stream Processing

Abstract The growing number of Internet of Things (IoT) devices provide a massive pool of sensing data. However, turning data into actionable insights is not a trivial task, especially in the context of IoT, where application development itself is complex. The process entails working with heterogeneous devices via various communication protocols to co-ordinate and fetch datasets, followed by a series of data transformations. Graphical mashup tools, based on the principles of flow-based programming paradigm, operating at a higher-level of abstraction are in widespread use to support rapid prototyping of IoT applications. Nevertheless, the current state-of-the-art mashup tools suffer from several architectural limitations which prevent composing in-flow data analytics pipelines. In response to this, the paper contributes by (i) designing novel flow-based programming concepts based on the actor model to support data analytics pipelines in mashup tools, prototyping the ideas in a new mashup tool called aFlux and providing a detailed comparison with the existing state-of-the-art and (ii) enabling easy prototyping of streaming applications in mashup tools by abstracting the behavioural configurations of stream processing via graphical flows and validating the ease as well as the effectiveness of composing stream processing pipelines from an end-user perspective in a traffic simulation scenario.

Download Full-text

SPar: A DSL for High-Level and Productive Stream Parallelism

Parallel Processing Letters ◽

10.1142/s0129626417400059 ◽

2017 ◽

Vol 27 (01) ◽

pp. 1740005 ◽

Cited By ~ 20

Author(s):

Dalvan Griebler ◽

Marco Danelutto ◽

Massimo Torquati ◽

Luiz Gustavo Fernandes

Keyword(s):

Shared Memory ◽

Stream Processing ◽

Parallel Applications ◽

Domain Specific Language ◽

Specific Language ◽

Domain Specific ◽

Implementation Techniques ◽

High Level

This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools process SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. Also, we show the results of experiments assessing the feasibility of the entire approach as well as SPar’s performance and expressiveness.

Download Full-text

Stream Processing of a Neural Classifier I

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch218 ◽

2011 ◽

pp. 1490-1496

Author(s):

M. Martínez-Zarzuela ◽

F. J. Díaz Pernas ◽

D. González Ortega ◽

J. F. Díez Higuera ◽

M. Antón Rodríguez

Keyword(s):

Neural Networks ◽

Real Time ◽

Stream Processing ◽

Neural Processing ◽

Real Time Processing ◽

Fuzzy Art ◽

Neural Classifier ◽

Single Data ◽

Artificial Neural Network Ann ◽

High Level

An Artificial Neural Network (ANN) is a computational structure inspired by the study of biological neural processing. Although neurons are considered as very simple computation units, inside the nervous system, an incredible amount of widely inter-connected neurons can process huge amounts of data working in a parallel fashion. There are many different types of ANNs, from relatively simple to very complex, just as there are many theories on how biological neural processing works. However, execution of ANNs is always a heavy computational task. Important kinds of ANNs are those devoted to pattern recognition such as Multi-Layer Perceptron (MLP), Self-Organizing Maps (SOM) or Adaptive Resonance Theory (ART) classifiers (Haykin, 2007). Traditional implementations of ANNs used by most of scientists have been developed in high level programming languages, so that they could be executed on common Personal Computers (PCs). The main drawback of these implementations is that though neural networks are intrinsically parallel systems, simulations are executed on a Central Processing Unit (CPU), a processor designed for the execution of sequential programs on a Single Instruction Single Data (SISD) basis. As a result, these heavy programs can take hours or even days to process large input data. For applications that require real-time processing, it is possible to develop small ad-hoc neural networks on specific hardware like Field Programmable Gate Arrays (FPGAs). However, FPGA-based realization of ANNs is somewhat expensive and involves extra design overheads (Zhu & Sutton, 2003). Using dedicated hardware to do machine learning was typically expensive; results could not be shared with other researchers and hardware became obsolete within a few years. This situation has changed recently with the popularization of Graphics Processing Units (GPUs) as low-cost and high-level programmable hardware platforms. GPUs are being increasingly used for speeding up computations in many research fields following a Stream Processing Model (Owens, Luebke, Govindaraju, Harris, Krüger, Lefohn & Purcell, 2007). This article presents a GPU-based parallel implementation of a Fuzzy ART ANN, which can be used both for training and testing processes. Fuzzy ART is an unsupervised neural classifier capable of incremental learning, widely used in a universe of applications as medical sciences, economics and finance, engineering and computer science. CPU-based implementations of Fuzzy ART lack efficiency and cannot be used for testing purposes in real-time applications. The GPU implementation of Fuzzy ART presented in this article speeds up computations more than 30 times with respect to a CPU-based C/C++ development when executed on an NVIDIA 7800 GT GPU.

Download Full-text

Stream Processing of a Neural Classifier I

Gaming and Simulations ◽

10.4018/978-1-60960-195-9.ch419 ◽

2011 ◽

pp. 1200-1207

Author(s):

M. Martínez-Zarzuela ◽

F. J. Díaz Pernas ◽

D. González Ortega ◽

J. F. Díez Higuera ◽

M. Antón Rodríguez

Keyword(s):

Neural Networks ◽

Real Time ◽

Stream Processing ◽

Neural Processing ◽

Real Time Processing ◽

Fuzzy Art ◽

Neural Classifier ◽

Single Data ◽

Artificial Neural Network Ann ◽

High Level

An Artificial Neural Network (ANN) is a computational structure inspired by the study of biological neural processing. Although neurons are considered as very simple computation units, inside the nervous system, an incredible amount of widely inter-connected neurons can process huge amounts of data working in a parallel fashion. There are many different types of ANNs, from relatively simple to very complex, just as there are many theories on how biological neural processing works. However, execution of ANNs is always a heavy computational task. Important kinds of ANNs are those devoted to pattern recognition such as Multi-Layer Perceptron (MLP), Self-Organizing Maps (SOM) or Adaptive Resonance Theory (ART) classifiers (Haykin, 2007). Traditional implementations of ANNs used by most of scientists have been developed in high level programming languages, so that they could be executed on common Personal Computers (PCs). The main drawback of these implementations is that though neural networks are intrinsically parallel systems, simulations are executed on a Central Processing Unit (CPU), a processor designed for the execution of sequential programs on a Single Instruction Single Data (SISD) basis. As a result, these heavy programs can take hours or even days to process large input data. For applications that require real-time processing, it is possible to develop small ad-hoc neural networks on specific hardware like Field Programmable Gate Arrays (FPGAs). However, FPGA-based realization of ANNs is somewhat expensive and involves extra design overheads (Zhu & Sutton, 2003). Using dedicated hardware to do machine learning was typically expensive; results could not be shared with other researchers and hardware became obsolete within a few years. This situation has changed recently with the popularization of Graphics Processing Units (GPUs) as low-cost and high-level programmable hardware platforms. GPUs are being increasingly used for speeding up computations in many research fields following a Stream Processing Model (Owens, Luebke, Govindaraju, Harris, Krüger, Lefohn & Purcell, 2007). This article presents a GPU-based parallel implementation of a Fuzzy ART ANN, which can be used both for training and testing processes. Fuzzy ART is an unsupervised neural classifier capable of incremental learning, widely used in a universe of applications as medical sciences, economics and finance, engineering and computer science. CPU-based implementations of Fuzzy ART lack efficiency and cannot be used for testing purposes in real-time applications. The GPU implementation of Fuzzy ART presented in this article speeds up computations more than 30 times with respect to a CPU-based C/C++ development when executed on an NVIDIA 7800 GT GPU.

Download Full-text

Revision Processing in a Stream Processing Engine: A High-Level Design

22nd International Conference on Data Engineering (ICDE'06) ◽

10.1109/icde.2006.130 ◽

2006 ◽

Cited By ~ 26

Author(s):

E. Ryvkina ◽

A.S. Maskey ◽

M. Cherniack ◽

S. Zdonik

Keyword(s):

Stream Processing ◽

Level Design ◽

High Level ◽

Processing Engine

Download Full-text

Composing high-level stream processing pipelines

Journal Of Big Data ◽

10.1186/s40537-020-00353-2 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Tanmaya Mahapatra

Keyword(s):

Data Analytics ◽

State Of The Art ◽

Stream Processing ◽

Detailed Comparison ◽

Application Development ◽

Actor Model ◽

Data Transformations ◽

Simulation Scenario ◽

Iot Devices ◽

High Level

Abstract The growing number of Internet of Things (IoT) devices provide a massive pool of sensing data. However, turning data into actionable insights is not a trivial task, especially in the context of IoT, where application development itself is complex. The process entails working with heterogeneous devices via various communication protocols to co-ordinate and fetch datasets, followed by a series of data transformations. Graphical mashup tools, based on the principles of flow-based programming paradigm, operating at a higher-level of abstraction are in widespread use to support rapid prototyping of IoT applications. Nevertheless, the current state-of-the-art mashup tools suffer from several architectural limitations which prevent composing in-flow data analytics pipelines. In response to this, the paper contributes by (i) designing novel flow-based programming concepts based on the actor model to support data analytics pipelines in mashup tools, prototyping the ideas in a new mashup tool called aFlux and providing a detailed comparison with the existing state-of-the-art and (ii) enabling easy prototyping of streaming applications in mashup tools by abstracting the behavioural configurations of stream processing via graphical flows and validating the ease as well as the effectiveness of composing stream processing pipelines from an end-user perspective in a traffic simulation scenario.

Download Full-text

Composing High-level Stream Processing Pipelines

10.21203/rs.3.rs-33951/v1 ◽

2020 ◽

Author(s):

Tanmaya Mahapatra

Keyword(s):

Data Analytics ◽

State Of The Art ◽

Stream Processing ◽

Detailed Comparison ◽

Application Development ◽

Actor Model ◽

Data Transformations ◽

Simulation Scenario ◽

Iot Devices ◽

High Level

Abstract The growing number of Internet of Things (IoT) devices provide a massive pool of sensing data. However, turning data into actionable insights is not a trivial task, especially in the context of IoT, where application development itself is complex. The process entails working with heterogeneous devices via various communication protocols to co-ordinate and fetch datasets, followed by a series of data transformations. Graphical mashup tools, based on the principles of flow-based programming paradigm, operating at a higher-level of abstraction are in widespread use to support rapid prototyping of IoT applications. Nevertheless, the current state-of-the-art mashup tools suffer from several architectural limitations which prevent composing in-flow data analytics pipelines. In response to this, the paper contributes by (i) designing novel flow-based programming concepts based on the actor model to support data analytics pipelines in mashup tools, prototyping the ideas in a new mashup tool called aFlux and providing a detailed comparison with the existing state-of-the-art and (ii) enabling easy prototyping of streaming applications in mashup tools by abstracting the behavioural configurations of stream processing via graphical flows and validating the ease as well as the effectiveness of composing stream processing pipelines from an end-user perspective in a traffic simulation scenario.

Download Full-text