Harnessing Machine Learning and Big Data Analytics for Real-World Applications: A Comprehensive Survey

In discussions on the General Data Protection Regulation (GDPR), anonymisation and deletion are frequently mentioned as suitable technical and organisational methods (TOMs) for privacy protection. The major problem of distortion in machine learning environments and the related privacy issues are rarely mentioned. The Big Data Analytics project addresses these issues.

A Comprehensive Survey on Machine Learning-Based Big Data Analytics for IoT-Enabled Smart Healthcare System

Mobile Networks and Applications ◽

10.1007/s11036-020-01700-6 ◽

2021 ◽

Author(s):

Wei Li ◽

Yuanbo Chai ◽

Fazlullah Khan ◽

Syed Rooh Ullah Jan ◽

Sahil Verma ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

Healthcare System ◽

Data Analytics ◽

Big Data Analytics ◽

Smart Healthcare ◽

Comprehensive Survey

A Framework Using Binary Cross Entropy - Gradient Boost Hybrid Ensemble Classifier for Imbalanced Data Classification

Webology ◽

10.14704/web/v18i1/web18076 ◽

2021 ◽

Vol 18 (1) ◽

pp. 104-120

Author(s):

S. Josephine Isabella ◽

Sujatha Srinivasan ◽

G. Suseendran

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Imbalanced Data ◽

Ensemble Classifier ◽

Research Field ◽

Cross Entropy ◽

Target Variable ◽

Real World Applications

In the big data era, learning from imbalanced data has become a continuing line of research alongside data mining and machine learning. In recent years, Big Data and Big Data Analytics have gained prominence because many real-time applications rely on data exploration. Machine learning offers a strong way to address the difficulties that arise when learning from imbalanced data. Many real-world applications must make predictions on highly imbalanced datasets with an imbalanced target variable. In most cases, the target values of greatest interest to users occur least often (for example, stock changes, fraud detection, and network security). The growth in available data from networked systems such as security, internet transactions, financial operations, and CCTV or other surveillance devices creates an opportunity for critical study of how little knowledge can be drawn from imbalanced data when supporting decision-making processes. Data imbalance remains a challenge for the research field. Data-level and algorithm-level methods are being improved constantly, which has led to the development of new hybrid frameworks for this classification problem. Classifying imbalanced data is a challenging task in big data analytics. This study concentrates on a problem present in most real-world applications: imbalance in the data, which arises from a skewed data distribution. We analyse the data imbalance and propose a solution.
This paper focuses on finding a better solution to this problem with a proposed framework that uses a hybrid ensemble classifier based on Binary Cross-Entropy as the loss function together with the Gradient Boost algorithm.
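
The abstract does not give the details of the hybrid classifier, so the following is only a minimal sketch of the general idea it names: gradient boosting driven by the binary cross-entropy loss, applied to a small imbalanced dataset. The single-feature decision stumps, the toy data, and all function names (`fit_stump`, `gradient_boost`, `predict_proba`) are assumptions made for illustration and are not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy (log loss) over all samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def fit_stump(x, residuals):
    """Least-squares regression stump on one feature (hypothetical weak learner)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so the right side is non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def gradient_boost(x, y, rounds=20, learning_rate=0.5):
    """Boost stumps on the negative gradient of the cross-entropy loss."""
    positive_rate = sum(y) / len(y)
    base_score = math.log(positive_rate / (1.0 - positive_rate))  # prior log-odds
    scores = [base_score] * len(y)
    stumps = []
    for _ in range(rounds):
        # For log loss on raw scores, the negative gradient is y - sigmoid(score).
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        stumps.append((threshold, left_value, right_value))
        scores = [s + learning_rate * (left_value if xi <= threshold else right_value)
                  for xi, s in zip(x, scores)]
    return base_score, learning_rate, stumps

def predict_proba(model, xi):
    base_score, learning_rate, stumps = model
    score = base_score
    for threshold, left_value, right_value in stumps:
        score += learning_rate * (left_value if xi <= threshold else right_value)
    return sigmoid(score)

# Toy imbalanced data (assumed): 8 negatives, 2 positives.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 2.0, 2.2]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

initial_loss = binary_cross_entropy(y, [sum(y) / len(y)] * len(y))
model = gradient_boost(x, y)
final_loss = binary_cross_entropy(y, [predict_proba(model, xi) for xi in x])
print(f"loss before boosting: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Starting from the prior log-odds means the first prediction already reflects the class imbalance, and each round then fits the residuals `y - p` that the cross-entropy gradient produces. A practical system would more likely rely on an established gradient boosting library and add resampling or class weighting.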

A Two-Stage Big Data Analytics Framework with Real World Applications Using Spark Machine Learning and Long Short-Term Memory Network

Symmetry ◽

10.3390/sym10100485 ◽

2018 ◽

Vol 10 (10) ◽

pp. 485 ◽

Cited By ~ 6

Author(s):

Muhammad Ashfaq Khan ◽

Md. Rezaul Karim ◽

Yangwoo Kim

Keyword(s):

Machine Learning ◽

Big Data ◽

Real World ◽

Data Analytics ◽

Short Term Memory ◽

Big Data Analytics ◽

Short Term ◽

Two Stage ◽

Term Memory ◽

Long Short Term Memory

Every day we experience unprecedented data growth from numerous sources, which contribute to big data in terms of volume, velocity, and variability. These datasets impose great challenges on analytics frameworks and computational resources, making it difficult to extract meaningful information in a timely manner. Developing an efficient big data analytics framework is therefore an important research topic. To address these challenges by exploiting non-linear relationships in very large and high-dimensional datasets, machine learning (ML) and deep learning (DL) algorithms are being used in analytics frameworks. Apache Spark has been used as a fast big data processing engine that helps solve iterative ML tasks using its distributed ML library, Spark MLlib. For real-world research problems, DL architectures such as Long Short-Term Memory (LSTM) are an effective approach to overcoming practical issues such as reduced accuracy, long-term sequence dependency, and vanishing and exploding gradients in conventional deep architectures. In this paper, we propose an efficient analytics framework, which is technically a progressive machine learning technique merged with Spark-based linear models, Multilayer Perceptron (MLP) and LSTM, using a two-stage cascade structure in order to enhance the predictive accuracy. Our proposed architecture enables us to organize big data analytics in a scalable and efficient way. To show the effectiveness of our framework, we applied the cascading structure to two different real-life datasets to solve a multiclass and a binary classification problem, respectively. Experimental results show that our analytical framework outperforms state-of-the-art approaches with a high level of classification accuracy.
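
The abstract does not specify how the two stages are wired together, so the following is only a rough sketch of one common cascade pattern: a first-stage model is trained on the raw features, and its predicted probability is appended as an extra feature for a second-stage model. Both stages here are small logistic regressions written in plain Python as stand-ins for the Spark linear models and the MLP/LSTM named in the abstract. The toy data and all function names (`train_logistic`, `fit_cascade`, `cascade_predict`) are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=500, learning_rate=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(rows, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, row)) + bias) - y
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

def predict(model, row):
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, row)) + bias)

def fit_cascade(rows, labels):
    """Stage one sees raw features; stage two also sees stage one's output."""
    stage_one = train_logistic(rows, labels)
    augmented = [row + [predict(stage_one, row)] for row in rows]
    stage_two = train_logistic(augmented, labels)
    return stage_one, stage_two

def cascade_predict(cascade, row):
    stage_one, stage_two = cascade
    return predict(stage_two, row + [predict(stage_one, row)])

# Toy binary classification data (assumed for the example).
rows = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
labels = [0, 0, 0, 1, 1, 1]

cascade = fit_cascade(rows, labels)
high = cascade_predict(cascade, [0.9, 0.9])
low = cascade_predict(cascade, [0.1, 0.1])
print(f"P(class 1 | [0.9, 0.9]) = {high:.3f}, P(class 1 | [0.1, 0.1]) = {low:.3f}")
```

In this sketch the second stage is trained on first-stage outputs computed for the same rows, which can overstate how useful the extra feature is. Held-out or cross-validated first-stage predictions are the usual remedy.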

Data Driven Smart Proxy for CFD Application of Big Data Analytics & Machine Learning in Computational Fluid Dynamics, Report Two: Model Building at the Cell Level

10.2172/1431303 ◽

2018 ◽

Cited By ~ 1

Author(s):

A. Ansari ◽

S. Mohaghegh ◽

M. Shahnam ◽

J. F. Dietiker ◽

T. Li

Keyword(s):

Machine Learning ◽

Fluid Dynamics ◽

Computational Fluid Dynamics ◽

Big Data ◽

Data Analytics ◽

Model Building ◽

Big Data Analytics ◽

Data Driven ◽

Cell Level

Big Data Analytics of Identifying Geochemical Anomalies Supported by Machine Learning Methods

Natural Resources Research ◽

10.1007/s11053-017-9357-0 ◽

2017 ◽

Vol 27 (1) ◽

pp. 5-13 ◽

Cited By ~ 37

Author(s):

Renguang Zuo ◽

Yihui Xiong

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Learning Methods ◽

Machine Learning Methods ◽

Geochemical Anomalies

Biases in machine learning models and big data analytics: The international criminal and humanitarian law implications

International Review of the Red Cross ◽

10.1017/s1816383121000096 ◽

2020 ◽

Vol 102 (913) ◽

pp. 199-234

Author(s):

Nema Milaninia

Keyword(s):

Machine Learning ◽

Human Rights ◽

Big Data ◽

Data Analytics ◽

International Criminal ◽

Big Data Analytics ◽

International Criminal Law ◽

Gender Inequalities ◽

Humanitarian Law ◽

Mass Graves

Advances in mobile phone technology and social media have created a world where the volume of information generated and shared is outpacing the ability of humans to review and use that data. Machine learning (ML) models and “big data” analytical tools have the power to ease that burden by making sense of this information and providing insights that might not otherwise exist. In the context of international criminal and human rights law, ML is being used for a variety of purposes, including to uncover mass graves in Mexico, find evidence of homes and schools destroyed in Darfur, detect fake videos and doctored evidence, predict the outcomes of judicial hearings at the European Court of Human Rights, and gather evidence of war crimes in Syria. ML models are also increasingly being incorporated by States into weapon systems in order to better enable targeting systems to distinguish between civilians, allied soldiers and enemy combatants or even inform decision-making for military attacks. The same technology, however, also comes with significant risks. ML models and big data analytics are highly susceptible to common human biases. As a result of these biases, ML models have the potential to reinforce and even accelerate existing racial, political or gender inequalities, and can also paint a misleading and distorted picture of the facts on the ground. This article discusses how common human biases can impact ML models and big data analytics, and examines what legal implications these biases can have under international criminal law and international humanitarian law.