Comparison of Prediction Models for Delays of Trains by using Data Mining and Machine Learning Methods

2018 ◽  
Author(s):  
Dennis Leser ◽  
Matthias Wastian ◽  
Matthias Rößler ◽  
Michael Landsied
2019 ◽  
Vol 29 (1) ◽  
pp. 45-47
Author(s):  
Dennis Leser ◽  
Matthias Wastian ◽  
Matthias Rößler ◽  
Michael Landsiedl ◽  
Edmond Hajrizi

Author(s):  
Sook-Ling Chua ◽  
Stephen Marsland ◽  
Hans W. Guesgen

The problem of behaviour recognition based on data from sensors is essentially an inverse problem: given a set of sensor observations, identify the sequence of behaviours that gave rise to them. In a smart home, the behaviours are likely to be the standard human behaviours of living, and the observations will depend upon the sensors that the house is equipped with. There are two main approaches to identifying behaviours from the sensor stream. One is to use a symbolic approach, which explicitly models the recognition process. Another is to use a sub-symbolic approach to behaviour recognition, which is the focus in this chapter, using data mining and machine learning methods. While there have been many machine learning methods of identifying behaviours from the sensor stream, they have generally relied upon a labelled dataset, where a person has manually identified their behaviour at each time. This is particularly tedious to do, resulting in relatively small datasets, and is also prone to significant errors as people do not pinpoint the end of one behaviour and commencement of the next correctly. In this chapter, the authors consider methods to deal with unlabelled sensor data for behaviour recognition, and investigate their use. They then consider whether they are best used in isolation, or should be used as preprocessing to provide a training set for a supervised method.


10.2196/12001 ◽  
2018 ◽  
Vol 20 (11) ◽  
pp. e12001 ◽  
Author(s):  
Quazi Abidur Rahman ◽  
Tahir Janmohamed ◽  
Meysam Pirbaglou ◽  
Hance Clarke ◽  
Paul Ritvo ◽  
...  

Author(s):  
Jaime Lynn Speiser ◽  
Kathryn E Callahan ◽  
Denise K Houston ◽  
Jason Fanning ◽  
Thomas M Gill ◽  
...  

Abstract Background Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. Method We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Results Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated using data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Conclusions Machine learning methods offer an alternative to traditional approaches for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.


2018 ◽  
Author(s):  
Quazi Abidur Rahman ◽  
Tahir Janmohamed ◽  
Meysam Pirbaglou ◽  
Hance Clarke ◽  
Paul Ritvo ◽  
...  

BACKGROUND Measuring and predicting pain volatility (fluctuation or variability in pain scores over time) can help improve pain management. Perceptions of pain and its consequent disabling effects are often heightened under the conditions of greater uncertainty and unpredictability associated with pain volatility. OBJECTIVE This study aimed to use data mining and machine learning methods to (1) define a new measure of pain volatility and (2) predict future pain volatility levels from users of the pain management app, Manage My Pain, based on demographic, clinical, and app use features. METHODS Pain volatility was defined as the mean of absolute changes between 2 consecutive self-reported pain severity scores within the observation periods. The k-means clustering algorithm was applied to users’ pain volatility scores at the first and sixth month of app use to establish a threshold discriminating low from high volatility classes. Subsequently, we extracted 130 demographic, clinical, and app usage features from the first month of app use to predict these 2 volatility classes at the sixth month of app use. Prediction models were developed using 4 methods: (1) logistic regression with ridge estimators; (2) logistic regression with Least Absolute Shrinkage and Selection Operator; (3) Random Forests; and (4) Support Vector Machines. Overall prediction accuracy and accuracy for both classes were calculated to compare the performance of the prediction models. Training and testing were conducted using 5-fold cross validation. A class imbalance issue was addressed using a random subsampling of the training dataset. Users with at least five pain records in both the predictor and outcome periods (N=782 users) are included in the analysis. RESULTS k-means clustering algorithm was applied to pain volatility scores to establish a threshold of 1.6 to differentiate between low and high volatility classes. After validating the threshold using random subsamples, 2 classes were created: low volatility (n=611) and high volatility (n=171). In this class-imbalanced dataset, all 4 prediction models achieved 78.1% (611/782) to 79.0% (618/782) in overall accuracy. However, all models have a prediction accuracy of less than 18.7% (32/171) for the high volatility class. After addressing the class imbalance issue using random subsampling, results improved across all models for the high volatility class to greater than 59.6% (102/171). The prediction model based on Random Forests performs the best as it consistently achieves approximately 70% accuracy for both classes across 3 random subsamples. CONCLUSIONS We propose a novel method for measuring pain volatility. Cluster analysis was applied to divide users into subsets of low and high volatility classes. These classes were then predicted at the sixth month of app use with an acceptable degree of accuracy using machine learning methods based on the features extracted from demographic, clinical, and app use information from the first month.


2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 601
Author(s):  
Nelson K. Dumakor-Dupey ◽  
Sampurna Arya ◽  
Ankit Jha

Rock fragmentation in mining and construction industries is widely achieved using drilling and blasting technique. The technique remains the most effective and efficient means of breaking down rock mass into smaller pieces. However, apart from its intended purpose of rock breakage, throw, and heave, blasting operations generate adverse impacts, such as ground vibration, airblast, flyrock, fumes, and noise, that have significant operational and environmental implications on mining activities. Consequently, blast impact studies are conducted to determine an optimum blast design that can maximize the desirable impacts and minimize the undesirable ones. To achieve this objective, several blast impact estimation empirical models have been developed. However, despite being the industry benchmark, empirical model results are based on a limited number of factors affecting the outcomes of a blast. As a result, modern-day researchers are employing machine learning (ML) techniques for blast impact prediction. The ML approach can incorporate several factors affecting the outcomes of a blast, and therefore, it is preferred over empirical and other statistical methods. This paper reviews the various blast impacts and their prediction models with a focus on empirical and machine learning methods. The details of the prediction methods for various blast impacts—including their applications, advantages, and limitations—are discussed. The literature reveals that the machine learning methods are better predictors compared to the empirical models. However, we observed that presently these ML models are mainly applied in academic research.


Author(s):  
Ihor Ponomarenko ◽  
Oleksandra Lubkovska

The subject of the research is the approach to the possibility of using data science methods in the field of health care for integrated data processing and analysis in order to optimize economic and specialized processes The purpose of writing this article is to address issues related to the specifics of the use of Data Science methods in the field of health care on the basis of comprehensive information obtained from various sources. Methodology. The research methodology is system-structural and comparative analyzes (to study the application of BI-systems in the process of working with large data sets); monograph (the study of various software solutions in the market of business intelligence); economic analysis (when assessing the possibility of using business intelligence systems to strengthen the competitive position of companies). The scientific novelty the main sources of data on key processes in the medical field. Examples of innovative methods of collecting information in the field of health care, which are becoming widespread in the context of digitalization, are presented. The main sources of data in the field of health care used in Data Science are revealed. The specifics of the application of machine learning methods in the field of health care in the conditions of increasing competition between market participants and increasing demand for relevant products from the population are presented. Conclusions. The intensification of the integration of Data Science in the medical field is due to the increase of digitized data (statistics, textual informa- tion, visualizations, etc.). Through the use of machine learning methods, doctors and other health professionals have new opportunities to improve the efficiency of the health care system as a whole. Key words: Data science, efficiency, information, machine learning, medicine, Python, healthcare.


Sign in / Sign up

Export Citation Format

Share Document