Special issue on Machine learning approaches and challenges of missing data in the era of big data

The presence of large-scale data systems can be felt, consciously or not, in almost every facet of modern life, whether through the simple act of selecting travel options online, purchasing products from online retailers, or navigating through the streets of an unfamiliar neighborhood using global positioning system (GPS) mapping. These systems operate through the momentum of big data, a term introduced by data scientists to describe a data-rich environment enabled by a superconvergence of advanced computer-processing speeds and storage capacities; advanced connectivity between people and devices through the Internet; the ubiquity of smart, mobile devices and wireless sensors; and the creation of accelerated data flows among systems in the global economy. Some researchers have suggested that big data represents the so-called fourth paradigm in science, wherein the first paradigm was marked by the evolution of the experimental method, the second was brought about by the maturation of theory, the third was marked by an evolution of statistical methodology as enabled by computational technology, while the fourth extended the benefits of the first three, but also enabled the application of novel machine-learning approaches to an evidence stream that exists in high volume, high velocity, high variety, and differing levels of veracity. In public health and medicine, the emergence of big data capabilities has followed naturally from the expansion of data streams from genome sequencing, protein identification, environmental surveillance, and passive patient sensing. In 2001, the National Committee on Vital and Health Statistics published a road map for connecting these evidence streams to each other through a national health information infrastructure. Since then, the road map has spurred national investments in electronic health records (EHRs) and motivated the integration of public surveillance data into analytic platforms for health situational awareness. 
More recently, the boom in consumer-oriented mobile applications and wireless medical sensing devices has opened up the possibility for mining new data flows directly from altruistic patients. In the broader public communication sphere, the ability to mine the digital traces of conversation on social media presents an opportunity to apply advanced machine learning algorithms as a way of tracking the diffusion of risk communication messages. In addition to utilizing big data for improving the scientific knowledge base in risk communication, there will be a need for health communication scientists and practitioners to work as part of interdisciplinary teams to improve the interfaces to these data for professionals and the public. Too much data, presented in disorganized ways, can lead to what some have referred to as “data smog.” Much work will be needed for understanding how to turn big data into knowledge, and just as important, how to turn data-informed knowledge into action.

Method for a cloud based remaining-service-life-prediction for vehicle-gearboxes based on big-data-analysis and machine learning

Forschung im Ingenieurwesen ◽

10.1007/s10010-020-00415-0 ◽

2020 ◽

Vol 84 (4) ◽

pp. 305-314

Author(s):

Daniel Vietze ◽

Michael Hein ◽

Karsten Stahl

Keyword(s):

Machine Learning ◽

Big Data ◽

Service Life ◽

Operating Time ◽

Learning Approaches ◽

State Of Health ◽

Remaining Service Life

Most vehicle gearboxes operating today are designed for a limited service life. On the one hand, this creates significant potential for decreasing cost and mass and for reducing the carbon footprint. On the other hand, it raises the risk of failure as the operating time of the machine increases. This creates a conflict of goals, especially if a failure can result in a high economic loss: the machine should only be maintained or replaced when necessary, yet the probability of failure increases with longer operating times. A method is therefore desirable that makes it possible to predict the remaining service life and state of health with as little effort as possible. The centerpiece of a gearbox is its gears. A failure of these components usually causes the whole gearbox to fail. Fatigue life analysis deals with the dimensioning of gears according to the expected loads and the required service life. Unfortunately, there is currently very little possibility of validating the technical design during operation. Hence, the goal of this paper is to present a method that enables the prediction of the remaining service life and state of health of gears during operation. The method uses big data and machine learning approaches and is designed to allow easy transfer to other machine elements and kinds of machinery.

Can a machine understand real estate pricing? – Evaluating machine learning approaches with big data

10.15396/eres2019_232 ◽

2019 ◽

Author(s):

Marcelo Cajias

Keyword(s):

Machine Learning ◽

Big Data ◽

Real Estate ◽

Learning Approaches

Enabling Cognitive Smart Cities Using Big Data and Machine Learning: Approaches and Challenges

IEEE Communications Magazine ◽

10.1109/mcom.2018.1700298 ◽

2018 ◽

Vol 56 (2) ◽

pp. 94-101 ◽

Cited By ~ 81

Author(s):

Mehdi Mohammadi ◽

Ala Al-Fuqaha

Keyword(s):

Machine Learning ◽

Big Data ◽

Smart Cities ◽

Learning Approaches

Comprehensive Contemplation of Probabilistic Aspects in Intelligent Analytics

International Journal of Service Science Management Engineering and Technology ◽

10.4018/ijssmet.2020010108 ◽

2020 ◽

Vol 11 (1) ◽

pp. 116-141 ◽

Cited By ~ 2

Author(s):

Neeti Sangwan ◽

Vishal Bhatnagar

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Big Data Analysis ◽

Future Research ◽

Learning Approaches ◽

Text Analytics ◽

Review Of Literature ◽

Classification Framework

In big data analysis, the application of machine learning has proven to be revolutionary. A systematic review of the literature shows that research has been carried out in the domain of big data analytics, particularly text analytics, with the inclusion of machine learning approaches. This extensive survey examines the approaches to, and the issues that arise in, combining machine learning with text data. During the course of the survey, various publications on the joint application of machine learning and text analytics were searched and studied. A classification framework is proposed to capture the contribution of machine learning to text analytics. The framework represents the various application areas, to motivate future research on the application of these two emerging technologies.

Machine learning approaches on map reduce for Big Data analytics

2015 International Conference on Green Computing and Internet of Things (ICGCIoT) ◽

10.1109/icgciot.2015.7380512 ◽

2015 ◽

Author(s):

J V N Lakshmi ◽

Ananthi Sheshasaayee

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Map Reduce ◽

Learning Approaches

Diabetes prediction by using Big Data Tool and Machine Learning Approaches

2020 3rd International Conference on Intelligent Sustainable Systems (ICISS) ◽

10.1109/iciss49785.2020.9315866 ◽

2020 ◽

Author(s):

Srinivasa Rao Swarna ◽

Sumati Boyapati ◽

Pooja Dixit ◽

Rashmi Agrawal

Keyword(s):

Machine Learning ◽

Big Data ◽

Learning Approaches ◽

Diabetes Prediction

Big Data Analytics to Increase the Agricultural Yield by Using Machine Learning Approaches

Asian Journal of Computer Science and Technology ◽

10.51983/ajcst-2018.7.s1.1799 ◽

2018 ◽

Vol 7 (S1) ◽

pp. 82-86

Author(s):

V. Sudha ◽

S. Mohan ◽

S. Arivalagan

Keyword(s):

Machine Learning ◽

Big Data ◽

Agricultural Research ◽

Research Field ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Modern Trend ◽

Cropping Pattern ◽

Learning Approaches ◽

Learning Techniques

Agriculture is the backbone of the Indian economy. Big data is emerging as a precise and viable analytical tool in the agricultural research field. This review paper aims to help farmers select the right crops and an appropriate cropping pattern for a particular locality. Modern trends in the agriculture domain have made people realize the importance of big data. The paper provides a methodology for facing challenges in agricultural production by applying machine learning techniques. A survey of the different machine learning techniques is presented, with the aim of realizing enhanced monetary benefits in a particular area. A study of machine learning algorithms for big data analytics is also presented.

A Survey On Missing Data in Machine Learning

10.21203/rs.3.rs-535520/v1 ◽

2021 ◽

Author(s):

Tlamelo Emmanuel ◽

Thabiso Maupong ◽

Dimane Mpoeleng ◽

Thabo Semong ◽

Mphago Banyatsang ◽

...

Keyword(s):

Machine Learning ◽

Missing Data ◽

Human Error ◽

Missing Values ◽

Nearest Neighbor ◽

Research Direction ◽

Machine Learning Techniques ◽

Future Research ◽

Learning Approaches ◽

K Nearest Neighbor

Machine learning has been a cornerstone in analysing and extracting information from data, and the problem of missing values is often encountered. Missing values are commonly described by their mechanism: missing completely at random, missing at random, or missing not at random. They may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. In the literature there have been several proposals for handling missing values. In this paper we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations, and the kind of data they are most suitable for. Finally, we experiment with the K nearest neighbor and random forest imputation techniques on novel power plant induced fan data and offer some possible future research directions.
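
As a rough illustration of the K nearest neighbor imputation idea mentioned in this abstract, the sketch below fills each missing value with the mean of that feature over the k most similar rows. It is a minimal pure-Python sketch and not the authors' implementation: the function name `knn_impute`, the choice of distance (root mean squared difference over the features both rows have observed), and the sample `readings` are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with None entries filled by KNN imputation.

    Assumption: distance between two rows is the root mean squared
    difference over the features observed in both rows.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row that has feature j observed.
                if m == i or other[j] is None:
                    continue
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                # Average feature j over the k closest neighbours.
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical sensor readings; None marks a missing value.
readings = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 9.0],
    [1.2, 1.9, 3.4],
]
filled = knn_impute(readings, k=2)
print(filled[1])
```

Here the second row is closest to the first and fourth rows, so its missing third feature is filled from those two rather than from the distant third row. A production pipeline would more likely use an established library imputer and scale the features before computing distances.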
