Configurable Distributed Data Management for the Internet of the Things

Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 360 ◽  
Author(s):  
Nikos Kefalakis ◽  
Aikaterini Roukounaki ◽  
John Soldatos

One of the main challenges in modern Internet of Things (IoT) systems is the efficient collection, routing and management of data streams from heterogeneous sources, including sources with high ingestion rates. Despite the existence of various IoT data streaming frameworks, there is still no easy way to collect and route IoT streams efficiently and configurably that is also easy to implement and deploy in realistic environments. In this paper, we introduce a programmable engine for Distributed Data Analytics (DDA), which eases the task of collecting IoT streams from different sources and routing them to the appropriate consumers. The engine also provides the means for preprocessing and analysis of data streams, which are two of the most important tasks in Big Data analytics applications. At the heart of the engine lies a Domain Specific Language (DSL) that enables the zero-programming definition of data routing and preprocessing tasks. This DSL is outlined in the paper, along with the middleware that supports its runtime execution. As part of the paper, we present the architecture of the engine, as well as the digital models that it uses for modelling data streams in the digital world. We also discuss the validation of the DDA in several data-intensive IoT use cases in industrial environments, including use cases in pilot production lines and in several real-life manufacturing environments. These use cases demonstrate the configurability, programmability and flexibility of the DDA engine, as well as its ability to support practical applications.
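The abstract describes declarative, zero-programming routing of streams from sources to consumers. As a purely illustrative sketch (the paper's actual DSL syntax is not reproduced here, and the route table, `preprocess` step and `deliver` callback below are hypothetical), such routing might be expressed as a declarative table interpreted by a small engine:

```python
# Hypothetical sketch of declarative source-to-consumer routing,
# in the spirit of the DSL described (not the paper's actual syntax).
routes = {
    "temperature-sensor": ["analytics", "archive"],
    "vibration-sensor":   ["analytics"],
}

def preprocess(record):
    # Example preprocessing step: normalise payload key names.
    return {k.lower(): v for k, v in record.items()}

def route(source, record, deliver):
    """Deliver a preprocessed record to every consumer registered for source."""
    for consumer in routes.get(source, []):
        deliver(consumer, preprocess(record))

received = []
route("temperature-sensor", {"Temp": 21.5},
      lambda consumer, rec: received.append((consumer, rec)))
```

The point of the declarative table is that adding a new consumer is a configuration change, not a code change, which matches the paper's "zero-programming" goal.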

2020 ◽  
pp. 100-117
Author(s):  
Sarah Brayne

This chapter looks at the promise and peril of police use of big data analytics for inequality. On the one hand, big data analytics may be a means by which to ameliorate persistent inequalities in policing. Data can be used to “police the police” and replace unparticularized suspicion of racial minorities and human exaggeration of patterns with less biased predictions of risk. On the other hand, data-intensive police surveillance practices are implicated in the reproduction of inequality in at least four ways: by deepening the surveillance of individuals already under suspicion, codifying a secondary surveillance network of individuals with no direct police contact, widening the criminal justice dragnet unequally, and leading people to avoid institutions that collect data and are fundamental to social integration. Crucially, as currently implemented, “data-driven” decision-making techwashes, both obscuring and amplifying social inequalities under a patina of objectivity.


Author(s):  
Chung-Min Chen

This paper examines the driving forces of big data analytics in the telecom domain and the benefits it offers. We provide example use cases of big data analytics and the associated challenges, with the hope to inspire new research ideas that can eventually benefit the practice of the telecommunication industry.


Author(s):  
Rosa Filguiera ◽  
Amrey Krause ◽  
Malcolm Atkinson ◽  
Iraklis Klampanos ◽  
Alexander Moreno

This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported, and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.
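The core idea, an abstract workflow of composable processing elements that can be mapped to Storm, MPI, threads, or a sequential runner, can be illustrated with a library-agnostic sketch (plain Python generators stand in for dispel4py's processing elements; this is not the dispel4py API itself):

```python
# Library-agnostic sketch of a stream-based workflow: processing
# elements (PEs) composed into a pipeline, so the same abstract graph
# could later be "mapped" to MPI, Storm, threads, or run sequentially.
def source():
    yield from range(5)           # emit a small test stream: 0..4

def square(stream):
    for item in stream:
        yield item * item         # transforming PE: square each element

def total(stream):
    return sum(stream)            # terminal PE: reduce the stream

# Sequential "mapping" of the abstract graph source -> square -> total.
result = total(square(source()))
```

Because each PE only consumes and emits stream items, the composition itself carries no assumption about where the PEs run, which is what makes the laptop-to-cluster transition smooth.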


2019 ◽  
Vol 8 (S3) ◽  
pp. 35-40
Author(s):  
S. Mamatha ◽  
T. Sudha

In this digital world, as organizations evolve rapidly around data-centric assets, the explosion of data and the size of databases have been growing exponentially. Data is generated from different sources such as business processes, transactions, social networking sites, web servers, etc., and exists in structured as well as unstructured form. The term "big data" is used for large data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data varies in size, ranging from a few dozen terabytes to many petabytes in a single data set. Difficulties include capture, storage, search, sharing, analytics and visualization. Big data is available in structured, unstructured and semi-structured formats, and relational databases fail to store this multi-structured data. Apache Hadoop is an efficient, robust, reliable and scalable framework to store, process, transform and extract big data. The Hadoop framework is open-source and free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, MapReduce and a c-means big data algorithm to minimize the effort of big data analysis using MapReduce code. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.
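The MapReduce pattern the abstract builds on can be shown without a Hadoop cluster. Below is a minimal in-memory sketch of its three phases (map, shuffle, reduce) using word counting as the classic example; the function names are illustrative, not Hadoop API:

```python
# Minimal in-memory sketch of the MapReduce pattern (no cluster needed).
from collections import defaultdict

def map_phase(records):
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)          # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)            # group values by key
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big analytics"])))
```

In Hadoop the same three phases are distributed: mappers and reducers run on different nodes, and the shuffle moves data between them, which is what lets the pattern scale to petabyte inputs.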


F1000Research ◽  
2022 ◽  
Vol 11 ◽  
pp. 17
Author(s):  
Shohel Sayeed ◽  
Abu Fuad Ahmad ◽  
Tan Choo Peng

The Internet of Things (IoT) is leading the physical and digital worlds of technology to converge. Real-time and massive-scale connections produce a large amount of versatile data, which is where Big Data comes into the picture. Big Data refers to large, diverse sets of information with dimensions that go beyond the capabilities of widely used database management systems or standard data processing software tools. Almost every big dataset is dirty and may contain missing data, mistyping, inaccuracies, and many more issues that impact Big Data analytics performance. One of the biggest challenges in Big Data analytics is to discover and repair dirty data; failure to do so can lead to inaccurate analytics results and unpredictable conclusions. We experimented with different missing value imputation techniques and compared machine learning (ML) model performances across imputation methods. We propose a hybrid model for missing value imputation combining ML and sample-based statistical techniques. Furthermore, we continued with the best imputed dataset, chosen based on ML model performance, for feature engineering and hyperparameter tuning. We used k-means clustering and principal component analysis. Accuracy improved dramatically, with the XGBoost model performing best at around 0.125 root mean squared logarithmic error (RMSLE). To mitigate overfitting, we used K-fold cross-validation.
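The comparison at the heart of the abstract, statistical imputation versus sample-based imputation, can be sketched in a few lines. The two functions below are simplified stand-ins for the techniques compared, not the authors' hybrid model:

```python
# Sketch of two missing-value imputation strategies: fill with the
# column mean vs. draw randomly from the observed values.
import random
from statistics import mean

def impute_mean(column):
    observed = [x for x in column if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in column]

def impute_sample(column, rng=random.Random(0)):
    observed = [x for x in column if x is not None]
    return [rng.choice(observed) if x is None else x for x in column]

col = [1.0, None, 3.0, None]
mean_filled = impute_mean(col)      # gaps become the mean, 2.0
sample_filled = impute_sample(col)  # gaps become draws from {1.0, 3.0}
```

Mean imputation preserves the column average but shrinks variance; sample-based imputation preserves the empirical distribution but adds noise, which is why comparing downstream model performance under each method, as the paper does, is a sensible selection criterion.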


Author(s):  
Arushi Jain ◽  
Vishal Bhatnagar

Interest in big data analytics has increased substantially in recent years, and one of the most prominent drivers is predicting customer purchase behavior. This analysis helps us understand what customers want to purchase, where they want to go, what they want to eat, and so on, so that valuable insights can be converted into actions. The knowledge thus gained helps in understanding the needs of every customer individually, making it easier to do business with them. This is a revolutionary change in building a customer-centric business. To build a customer-centric business, an organization must be observant about what customers are doing, keep a record of what customers are purchasing, and discover the insights that maximize value for the customer. In this chapter we discuss various approaches to big data management and the use cases where these approaches can be applied successfully.

