Configurable Distributed Data Management for the Internet of the Things

Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 360 ◽  
Author(s):  
Nikos Kefalakis ◽  
Aikaterini Roukounaki ◽  
John Soldatos

One of the main challenges in modern Internet of Things (IoT) systems is the efficient collection, routing and management of data streams from heterogeneous sources, including sources with high ingestion rates. Despite the existence of various IoT data streaming frameworks, there is still no easy way to collect and route IoT streams efficiently and configurably that is also easy to implement and deploy in realistic environments. In this paper, we introduce a programmable engine for Distributed Data Analytics (DDA), which eases the task of collecting IoT streams from different sources and routing them to the appropriate consumers. The engine also provides the means for preprocessing and analysis of data streams, which are two of the most important tasks in Big Data analytics applications. At the heart of the engine lies a Domain Specific Language (DSL) that enables the zero-programming definition of data routing and preprocessing tasks. This DSL is outlined in the paper, along with the middleware that supports its runtime execution. As part of the paper, we present the architecture of the engine, as well as the digital models that it uses for modelling data streams in the digital world. We also discuss the validation of the DDA in several data-intensive IoT use cases in industrial environments, including use cases in pilot production lines and in several real-life manufacturing environments. These use cases demonstrate the configurability, programmability and flexibility of the DDA engine, as well as its ability to support practical applications.
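The abstract describes declarative, zero-programming routing of streams from sources to consumers. As a purely illustrative sketch (the paper's actual DSL syntax is not reproduced here, and the route table, `preprocess` step and `deliver` callback below are hypothetical), such routing might be expressed as a declarative table interpreted by a small engine:

```python
# Hypothetical sketch of declarative source-to-consumer routing,
# in the spirit of the DSL described (not the paper's actual syntax).
routes = {
    "temperature-sensor": ["analytics", "archive"],
    "vibration-sensor":   ["analytics"],
}

def preprocess(record):
    # Example preprocessing step: normalise payload key names.
    return {k.lower(): v for k, v in record.items()}

def route(source, record, deliver):
    """Deliver a preprocessed record to every consumer registered for source."""
    for consumer in routes.get(source, []):
        deliver(consumer, preprocess(record))

received = []
route("temperature-sensor", {"Temp": 21.5},
      lambda consumer, rec: received.append((consumer, rec)))
```

The point of the declarative table is that adding a new consumer is a configuration change, not a code change, which matches the paper's "zero-programming" goal.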

2020 ◽  
pp. 100-117
Author(s):  
Sarah Brayne

This chapter looks at the promise and peril of police use of big data analytics for inequality. On the one hand, big data analytics may be a means by which to ameliorate persistent inequalities in policing. Data can be used to “police the police” and replace unparticularized suspicion of racial minorities and human exaggeration of patterns with less biased predictions of risk. On the other hand, data-intensive police surveillance practices are implicated in the reproduction of inequality in at least four ways: by deepening the surveillance of individuals already under suspicion, codifying a secondary surveillance network of individuals with no direct police contact, widening the criminal justice dragnet unequally, and leading people to avoid institutions that collect data and are fundamental to social integration. Crucially, as currently implemented, “data-driven” decision-making techwashes, both obscuring and amplifying social inequalities under a patina of objectivity.


Author(s):  
Chung-Min Chen

This paper examines the driving forces of big data analytics in the telecom domain and the benefits it offers. We provide example use cases of big data analytics and the associated challenges, with the hope to inspire new research ideas that can eventually benefit the practice of the telecommunication industry.


Author(s):  
Rosa Filguiera ◽  
Amrey Krause ◽  
Malcolm Atkinson ◽  
Iraklis Klampanos ◽  
Alexander Moreno

This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. These combine the familiarity of Python programming with the scalability of workflows. Data streaming is used to gain performance, rapid prototyping and applicability to live observations. dispel4py enables scientists to focus on their scientific goals, avoiding distracting details and retaining flexibility over the computing infrastructure they use. The implementation, therefore, has to map dispel4py abstract workflows optimally onto target platforms chosen dynamically. We present four dispel4py mappings: Apache Storm, message-passing interface (MPI), multi-threading and sequential, showing two major benefits: a) smooth transitions from local development on a laptop to scalable execution for production work, and b) scalable enactment on significantly different distributed computing infrastructures. Three application domains are reported, and measurements on multiple infrastructures show the optimisations achieved; they have provided demanding real applications and helped us develop effective training. dispel4py.org is an open-source project to which we invite participation. The effective mapping of dispel4py onto multiple target infrastructures demonstrates exploitation of data-intensive and high-performance computing (HPC) architectures and consistent scalability.
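The core idea, an abstract workflow of composable processing elements that can be mapped to Storm, MPI, threads, or a sequential runner, can be illustrated with a library-agnostic sketch (plain Python generators stand in for dispel4py's processing elements; this is not the dispel4py API itself):

```python
# Library-agnostic sketch of a stream-based workflow: processing
# elements (PEs) composed into a pipeline, so the same abstract graph
# could later be "mapped" to MPI, Storm, threads, or run sequentially.
def source():
    yield from range(5)           # emit a small test stream: 0..4

def square(stream):
    for item in stream:
        yield item * item         # transforming PE: square each element

def total(stream):
    return sum(stream)            # terminal PE: reduce the stream

# Sequential "mapping" of the abstract graph source -> square -> total.
result = total(square(source()))
```

Because each PE only consumes and emits stream items, the composition itself carries no assumption about where the PEs run, which is what makes the laptop-to-cluster transition smooth.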


2019 ◽  
Vol 8 (S3) ◽  
pp. 35-40
Author(s):  
S. Mamatha ◽  
T. Sudha

In this digital world, as organizations evolve rapidly around data-centric assets, the explosion of data and the size of databases have been growing exponentially. Data is generated from different sources such as business processes, transactions, social networking sites, web servers, etc., and exists in structured as well as unstructured form. The term "big data" is used for large data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data varies in size, ranging from a few dozen terabytes to many petabytes in a single data set. Difficulties include capture, storage, search, sharing, analytics and visualization. Big data is available in structured, unstructured and semi-structured formats, and relational databases fail to store this multi-structured data. Apache Hadoop is an efficient, robust, reliable and scalable framework to store, process, transform and extract big data. The Hadoop framework is open-source and free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, MapReduce and a c-means big data algorithm to minimize the effort of big data analysis using MapReduce code. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.
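The MapReduce pattern the abstract builds on can be shown without a Hadoop cluster. Below is a minimal in-memory sketch of its three phases (map, shuffle, reduce) using word counting as the classic example; the function names are illustrative, not Hadoop API:

```python
# Minimal in-memory sketch of the MapReduce pattern (no cluster needed).
from collections import defaultdict

def map_phase(records):
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)          # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)            # group values by key
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big analytics"])))
```

In Hadoop the same three phases are distributed: mappers and reducers run on different nodes, and the shuffle moves data between them, which is what lets the pattern scale to petabyte inputs.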


F1000Research ◽  
2022 ◽  
Vol 11 ◽  
pp. 17
Author(s):  
Shohel Sayeed ◽  
Abu Fuad Ahmad ◽  
Tan Choo Peng

The Internet of Things (IoT) is leading the physical and digital worlds of technology to converge. Real-time and massive-scale connections produce a large amount of versatile data, which is where Big Data comes into the picture. Big Data refers to large, diverse sets of information with dimensions that go beyond the capabilities of widely used database management systems or standard data processing software tools. Almost every big dataset is dirty and may contain missing data, mistyping, inaccuracies, and many more issues that impact Big Data analytics performance. One of the biggest challenges in Big Data analytics is to discover and repair dirty data; failure to do so can lead to inaccurate analytics results and unpredictable conclusions. We experimented with different missing value imputation techniques and compared machine learning (ML) model performances across imputation methods. We propose a hybrid model for missing value imputation combining ML and sample-based statistical techniques. Furthermore, we continued with the best imputed dataset, chosen based on ML model performance, for feature engineering and hyperparameter tuning. We used k-means clustering and principal component analysis. Accuracy improved dramatically, with the XGBoost model performing best at around 0.125 root mean squared logarithmic error (RMSLE). To mitigate overfitting, we used K-fold cross-validation.
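The comparison at the heart of the abstract, statistical imputation versus sample-based imputation, can be sketched in a few lines. The two functions below are simplified stand-ins for the techniques compared, not the authors' hybrid model:

```python
# Sketch of two missing-value imputation strategies: fill with the
# column mean vs. draw randomly from the observed values.
import random
from statistics import mean

def impute_mean(column):
    observed = [x for x in column if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in column]

def impute_sample(column, rng=random.Random(0)):
    observed = [x for x in column if x is not None]
    return [rng.choice(observed) if x is None else x for x in column]

col = [1.0, None, 3.0, None]
mean_filled = impute_mean(col)      # gaps become the mean, 2.0
sample_filled = impute_sample(col)  # gaps become draws from {1.0, 3.0}
```

Mean imputation preserves the column average but shrinks variance; sample-based imputation preserves the empirical distribution but adds noise, which is why comparing downstream model performance under each method, as the paper does, is a sensible selection criterion.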


Author(s):  
Arushi Jain ◽  
Vishal Bhatnagar

Interest in big data analytics has increased substantially in recent years, and one of the most prominent drivers is predicting customer purchase behavior. This analysis helps us understand what customers want to purchase, where they want to go, what they want to eat, and so on, so that valuable insights can be converted into actions. The knowledge thus gained helps in understanding the needs of every customer individually, making it easier to do business with them. This is a revolutionary change in building a customer-centric business. To build a customer-centric business, an organization must be observant about what customers are doing, keep a record of what customers are purchasing, and discover the insights that maximize value for the customer. In this chapter we discuss various approaches to big data management and the use cases where these approaches can be applied successfully.

