A Survey of Challenges Facing Streaming Data

This survey enumerates and analyzes existing methods for data stream processing, organized around the challenges facing streaming data: preprocessing streaming data, detecting and handling concept drift, reducing data volume in the face of data streams, answering approximate queries, and handling blocking operations.
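As an illustration of the data-reduction challenge named above, here is a minimal Python sketch of reservoir sampling (Vitter's Algorithm R). It is one standard way to keep a fixed-size uniform random sample of a stream whose length is unknown in advance. It is not presented as a method from the survey, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Memory stays at k items regardless of how long the stream runs. This is why sampling is a common building block for approximate queries over unbounded streams.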

Analysis of Data Stream Processing At Edge Layer for Internet of Things

Journal of ISMAC - June 2019 ◽

10.36548/jismac.2020.1.003 ◽

2020 ◽

Vol 2 (1) ◽

pp. 26-37

Author(s):

Dr. Pasumponpandian

Keyword(s):

Internet Of Things ◽

Data Streams ◽

Data Stream ◽

Smart Cities ◽

Stream Processing ◽

Middle Layer ◽

Cloud Services ◽

Decentralized Systems ◽

Data Stream Processing ◽

Edge Layer

The rapid progress of the Internet of Things (IoT), together with advances in related technologies and processing capabilities, has paved the way for decentralized systems that rely on cloud services. Even though these decentralized systems are built on the cloud, transferring all of the information sensed by IoT devices to the cloud remains complex. This is because certain applications gather huge streams of information and expect timely responses with minimal delay, low computing energy, and improved reliability. This kind of decentralization has therefore led to a middle layer between the cloud and the IoT, termed the edge layer, which brings cloud services down to the user's edge. The paper analyzes data stream processing in the edge layer, taking into account the complexities involved in computing IoT data streams there, and presents real-time analytics in the edge layer that examine IoT data streams to offer data-driven insight for a parking system in smart cities.
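To make the edge-layer idea concrete, the following sketch aggregates raw parking-sensor readings into per-lot occupancy ratios over fixed (tumbling) time windows, so that only compact summaries would travel to the cloud. This is an assumption for illustration, not the paper's implementation. The `ParkingEvent` record, its fields, and `summarize_at_edge` are hypothetical. For simplicity the function consumes a finite batch rather than emitting each window as it closes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ParkingEvent:
    lot_id: str       # hypothetical parking-lot identifier
    timestamp: float  # seconds since epoch
    occupied: bool    # True if the sensed bay is occupied


def summarize_at_edge(
    events: Iterable[ParkingEvent], window_s: float = 60.0
) -> List[Tuple[str, int, float]]:
    """Aggregate raw sensor events into per-lot, per-window occupancy ratios.

    Only these summaries (not every raw reading) would be forwarded to the cloud.
    Returns (lot_id, window_start, occupancy_ratio) tuples sorted by lot and window.
    """
    # (lot_id, window_start) -> [occupied_count, total_count]
    counts: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for ev in events:
        # Align each event to the start of its tumbling window.
        window_start = int(ev.timestamp // window_s * window_s)
        bucket = counts[(ev.lot_id, window_start)]
        bucket[0] += int(ev.occupied)
        bucket[1] += 1
    return [(lot, start, occ / total) for (lot, start), (occ, total) in sorted(counts.items())]
```

Each window is reduced to one tuple per lot, so upstream traffic scales with the number of lots and windows rather than with the number of raw sensor readings.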

A Review on Big Data Stream Processing Applications: Contributions, Benefits, and Limitations

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.5.4.737 ◽

2021 ◽

Vol 5 (4) ◽

pp. 456

Author(s):

Shaimaa Safaa Ahmed Alwaisi ◽

Maan Nawaf Abbood ◽

Luma Fayeq Jalil ◽

Shahreen Kasim ◽

Mohd Farhan Mohd Fudzee ◽

...

Keyword(s):

Big Data ◽

Data Stream ◽

Learning Algorithm ◽

Stream Processing ◽

Streaming Data ◽

Data Streaming ◽

Data Stream Processing ◽

Efficient Processing ◽

Suitable Framework ◽

Online Streaming

The amount of data in our world keeps growing rapidly. In the era of big data, efficient processing and analysis of big data using machine learning algorithms is highly required, especially when the data arrives in the form of streams. There is no doubt that big data has become an important source of information and knowledge for decision making. Nevertheless, dealing with this kind of data comes with great difficulties; thus, several techniques have been used to analyze data in the form of streams. Many techniques have been proposed and studied to handle big data and make decisions based on offline batch analysis. Today, we need to make constructive decisions based on online analysis of streaming data. In recent years, many researchers have proposed different kinds of frameworks for processing big data streams. In this work, we explore and present in detail some of the recent achievements in big data streaming in terms of contributions, benefits, and limitations, as well as some recent platforms suitable for big data streaming analytics. Moreover, we highlight several issues that will be faced in big data stream processing. It is hoped that this study will help researchers choose the most suitable framework for big data streaming projects.
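The contrast between offline batch analysis and online streaming analysis can be illustrated with Welford's algorithm. It updates a running mean and variance one record at a time without storing past records. This is a generic textbook technique chosen for illustration, not one drawn from the review, and the class name `RunningStats` is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance with O(1) memory per update."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from both the old and the updated mean for numerical stability.
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as nan for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan
```

A batch approach would need the whole dataset before computing these statistics. The online version produces an up-to-date answer after every record.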

QoS-Aware Approximate Query Processing for Smart Cities Spatial Data Streams

Sensors ◽

10.3390/s21124160 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4160

Author(s):

Isam Mashhour Al Jawarneh ◽

Paolo Bellavista ◽

Antonio Corradi ◽

Luca Foschini ◽

Rebecca Montanari

Keyword(s):

Data Streams ◽

Spatial Data ◽

Data Stream ◽

Smart Cities ◽

Stream Processing ◽

Processing System ◽

Strategic Decision ◽

Approximate Query Processing ◽

Data Stream Processing ◽

Mobility Data

Large amounts of georeferenced data streams arrive daily at stream processing systems, owing to the abundance of affordable IoT devices. In addition, practitioners want to exploit Internet of Things (IoT) data streams for strategic decision-making. However, mobility data are highly skewed and their arrival rates fluctuate. This poses an extra challenge for data stream processing systems, which must meet pre-specified latency and accuracy goals. In this paper, we propose ApproxSSPS, a system for approximate processing of geo-referenced mobility data at scale with quality-of-service guarantees. We focus on stateful aggregations (e.g., means, counts) and top-N queries. ApproxSSPS features a controller that interactively learns latency statistics and calculates proper sampling rates to meet latency and/or accuracy targets. An overarching trait of ApproxSSPS is its ability to strike a plausible balance between latency and accuracy targets. We evaluate ApproxSSPS on Apache Spark Structured Streaming with real mobility data and compare it against a state-of-the-art online adaptive processing system. Our extensive experiments demonstrate that ApproxSSPS can fulfill latency and accuracy targets with varying sets of parameter configurations and load intensities (i.e., transient peaks in data loads versus slowly arriving streams). Moreover, our results show that ApproxSSPS outperforms the baseline by significant margins. In short, ApproxSSPS is a novel spatial data stream processing system that can deliver accurate results in a timely manner by dynamically specifying limits on data samples.
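The abstract does not give ApproxSSPS's control law, so the sketch below is a simplified, hypothetical stand-in. It rescales a sampling rate by the ratio of target to observed latency, assuming latency is roughly proportional to the number of sampled records. It then uses Bernoulli sampling to estimate a count (scaled by 1/rate) and a mean. The names `adjust_sampling_rate` and `estimate_count_and_mean` are illustrative only.

```python
import random
from typing import Iterable, Tuple


def adjust_sampling_rate(
    rate: float,
    observed_latency_ms: float,
    target_latency_ms: float,
    min_rate: float = 0.01,
    max_rate: float = 1.0,
) -> float:
    """Scale the sampling rate by target/observed latency, clamped to [min_rate, max_rate].

    Assumes (hypothetically) that latency grows roughly linearly with the number of sampled records.
    """
    if observed_latency_ms <= 0:
        return max_rate
    proposed = rate * target_latency_ms / observed_latency_ms
    return max(min_rate, min(max_rate, proposed))


def estimate_count_and_mean(values: Iterable[float], rate: float, seed: int = 0) -> Tuple[float, float]:
    """Bernoulli-sample the batch at `rate`; return (estimated total count, sample mean)."""
    rng = random.Random(seed)
    # random() is in [0, 1), so rate == 1.0 keeps every record.
    sample = [v for v in values if rng.random() < rate]
    if not sample:
        return 0.0, float("nan")
    # Each kept record stands for 1/rate records in the full batch.
    return len(sample) / rate, sum(sample) / len(sample)
```

A lower rate trades accuracy for latency. In a real system the rate would be re-evaluated for every micro-batch as arrival rates fluctuate.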

Design and Evaluation of an Autonomous Load Balancing System for Mobile Data Stream Processing Based On a Data Centric Publish Subscribe Approach

International Journal of Adaptive Resilient and Autonomic Systems ◽

10.4018/ijaras.2014070101 ◽

2014 ◽

Vol 5 (3) ◽

pp. 1-19 ◽

Cited By ~ 4

Author(s):

Rafael Oliveira Vasconcelos ◽

Markus Endler ◽

Berto de Tácio Pereira Gomes ◽

Francisco José da Silva e Silva

Keyword(s):

Load Balancing ◽

Data Streams ◽

Intelligent Transportation Systems ◽

Data Stream ◽

Stream Processing ◽

Industrial Process ◽

Transportation Systems ◽

Mobile Nodes ◽

Data Stream Processing ◽

Mobile Data

Several new applications of mobile computing environments, such as Intelligent Transportation Systems, Fleet Management and Logistics, and integrated Industrial Process Automation, share the requirement of remote monitoring and high-performance processing of huge data streams produced by large sets of mobile nodes. Two key requirements for deploying and operating such mobile infrastructures are handling large and variable numbers of wireless connections to the monitored mobile nodes, regardless of their current use or location, and automatically adapting to variations in the volume of the mobile data streams. This article describes the design, implementation, and evaluation of an autonomic mechanism for load balancing of mobile data streams. The autonomic capability has been incorporated into a scalable middleware system based on a Data Centric Publish Subscribe approach using the OMG Data Distribution Service (DDS) standard, aimed at real-time and adaptive handling of mobile connectivity and data stream processing for large sets of mobile nodes. A substantial set of evaluation experiments on the proposed infrastructure is presented, reinforcing its viability and the benefits of using an autonomic approach to handle the requirements of high variability and scalability.
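The article's mechanism builds on the OMG DDS standard, whose details the abstract does not cover. As a hedged illustration of the underlying load-balancing problem only, the sketch below assigns mobile-node streams to processing nodes with the greedy longest-processing-time (LPT) heuristic. The heaviest streams are placed first, each on the currently least-loaded processor. The stream ids, rates, and `assign_streams` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_streams(stream_rates: Dict[str, float], processors: List[str]) -> Dict[str, str]:
    """Greedily assign each mobile-node stream to the currently least-loaded processor.

    stream_rates maps a (hypothetical) mobile-node id to its message rate (msgs/s).
    Heaviest streams are placed first (LPT), which tends to give a more even split.
    """
    if not processors:
        raise ValueError("need at least one processor")
    # Min-heap of (current load, processor id).
    heap: List[Tuple[float, str]] = [(0.0, p) for p in processors]
    heapq.heapify(heap)
    assignment: Dict[str, str] = {}
    # Sort by descending rate; ties broken by node id for determinism.
    for node, rate in sorted(stream_rates.items(), key=lambda kv: (-kv[1], kv[0])):
        load, proc = heapq.heappop(heap)
        assignment[node] = proc
        heapq.heappush(heap, (load + rate, proc))
    return assignment
```

This is a static placement. Adapting to changing stream volumes, as the article targets, would require re-running or incrementally adjusting the assignment as rates change.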

GEOSPATIAL DATA STREAM PROCESSING IN PYTHON USING FOSS4G COMPONENTS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b7-931-2016 ◽

2016 ◽

Vol XLI-B7 ◽

pp. 931-937

Author(s):

G. McFerren ◽

T. van Zyl

Keyword(s):

Data Streams ◽

Data Model ◽

Data Stream ◽

Stream Processing ◽

Geospatial Data ◽

Common Data Model ◽

Spatial Relationships ◽

Software Components ◽

Data Stream Processing ◽

Software Libraries

One viewpoint of current and future IT systems holds that there is an increase in the scale and velocity at which data are acquired and analysed from heterogeneous, dynamic sources. In the earth observation and geoinformatics domains, this process is driven by the increase in the number and types of devices that report location and the proliferation of assorted sensors, from satellite constellations to oceanic buoy arrays. Many of these data will be encountered as self-contained messages on data streams: continuous, infinite flows of data. Spatial analytics over data streams concerns the search for spatial and spatio-temporal relationships within and amongst data “on the move”. In spatial databases, queries can assess a store of data to unpack spatial relationships; this is not the case on streams, where spatial relationships need to be established with the incomplete data available. Methods for spatially-based indexing, filtering, joining and transforming of streaming data need to be established and implemented in software components. This article describes the usage patterns and performance metrics of a number of well-known FOSS4G Python software libraries within the data stream processing paradigm. In particular, we consider the RTree library for spatial indexing, the Shapely library for geometric processing and transformation and the PyProj library for projection and geodesic calculations over streams of geospatial data. We introduce a message-oriented Python-based geospatial data streaming framework called Swordfish, which provides data stream processing primitives, functions, transports and a common data model for describing messages, based on the Open Geospatial Consortium Observations and Measurements (O&M) and Unidata Common Data Model (CDM) standards. We illustrate how the geospatial software components are integrated with the Swordfish framework.
Furthermore, we describe the tight temporal constraints under which geospatial functionality can be invoked when processing high velocity, potentially infinite geospatial data streams. The article discusses the performance of these libraries under simulated streaming loads (size, complexity and volume of messages) and how they can be deployed and utilised with Swordfish under real load scenarios, illustrated by a set of Vessel Automatic Identification System (AIS) use cases. We conclude that the described software libraries are able to perform adequately under geospatial data stream processing scenarios - many real application use cases will be handled sufficiently by the software.
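The paper evaluates RTree, Shapely and PyProj. To keep the example dependency-free, the sketch below uses only the Python standard library to show the same index-then-refine pattern on a stream of AIS-like position messages. A uniform longitude/latitude grid (a simple stand-in for an R-tree) yields candidate zones cheaply. An exact great-circle (haversine) test on a sphere then confirms them; PyProj's `Geod` would use an ellipsoid instead. None of this is Swordfish's API. `GeofenceGrid`, `filter_positions`, and the `(vessel_id, lon, lat)` message layout are hypothetical, and antimeridian wrap-around is ignored.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical approximation)
M_PER_DEG_LAT = math.pi * EARTH_RADIUS_M / 180  # ~111.2 km per degree of latitude


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


class GeofenceGrid:
    """Circular zones registered in a uniform lon/lat grid (a simple stand-in for an R-tree)."""

    def __init__(self, cell_deg: float = 0.1) -> None:
        self.cell_deg = cell_deg
        self.cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self.zones: Dict[str, Tuple[float, float, float]] = {}  # id -> (lon, lat, radius_m)

    def _cell(self, lon: float, lat: float) -> Tuple[int, int]:
        return math.floor(lon / self.cell_deg), math.floor(lat / self.cell_deg)

    def add_zone(self, zone_id: str, lon: float, lat: float, radius_m: float) -> None:
        self.zones[zone_id] = (lon, lat, radius_m)
        # Bounding box of the circle in degrees, padded by 10% to cover the approximation.
        dlat = 1.1 * radius_m / M_PER_DEG_LAT
        dlon = dlat / max(math.cos(math.radians(lat)), 1e-6)
        x0, y0 = self._cell(lon - dlon, lat - dlat)
        x1, y1 = self._cell(lon + dlon, lat + dlat)
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                self.cells[(x, y)].add(zone_id)

    def matches(self, lon: float, lat: float) -> Iterator[str]:
        """Cheap grid lookup for candidate zones, then an exact distance test."""
        for zone_id in self.cells.get(self._cell(lon, lat), ()):
            zlon, zlat, r = self.zones[zone_id]
            if haversine_m(lon, lat, zlon, zlat) <= r:
                yield zone_id


def filter_positions(
    messages: Iterable[Tuple[str, float, float]], grid: GeofenceGrid
) -> Iterator[Tuple[str, str]]:
    """Yield (vessel_id, zone_id) for each position message that falls inside a zone."""
    for vessel_id, lon, lat in messages:
        for zone_id in grid.matches(lon, lat):
            yield vessel_id, zone_id
```

Because `filter_positions` is a generator, each message is handled as it arrives and only the small, static zone index is held in memory. This matches the tight per-message time budget described for high-velocity streams.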