Ensemble Data Mining Methods

Author(s):  
Nikunj C. Oza

Ensemble data mining methods, also known as committee methods or model combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: Each member of the committee should be as competent as possible, but the members should complement one another. If the members are not complementary, that is, if they always agree, then the committee is unnecessary — any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
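The committee analogy can be illustrated with a minimal plurality-vote combiner, a stdlib-only toy (the model names and labels below are hypothetical, not from the article): three classifiers that are each competent but err on a different input, so the vote corrects every single-member error.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine one predicted label per committee member by plurality vote."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers; ground truth is "pos" on every input,
# and each model is wrong on exactly one (different) input.
model_preds = {
    "A": {"x1": "pos", "x2": "pos", "x3": "neg"},
    "B": {"x1": "pos", "x2": "neg", "x3": "pos"},
    "C": {"x1": "neg", "x2": "pos", "x3": "pos"},
}

# Because the errors are complementary, the committee is right everywhere.
ensemble = {x: majority_vote([model_preds[m][x] for m in model_preds])
            for x in ("x1", "x2", "x3")}
```

If the three models always agreed, the vote would simply reproduce any one of them; it is the complementary errors that make the committee worthwhile.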

2008 ◽ 
pp. 356-363


2018 ◽  
Author(s):  
Quazi Abidur Rahman ◽  
Tahir Janmohamed ◽  
Meysam Pirbaglou ◽  
Hance Clarke ◽  
Paul Ritvo ◽  
...  

BACKGROUND: Measuring and predicting pain volatility (fluctuation or variability in pain scores over time) can help improve pain management. Perceptions of pain and its consequent disabling effects are often heightened under the conditions of greater uncertainty and unpredictability associated with pain volatility.

OBJECTIVE: This study aimed to use data mining and machine learning methods to (1) define a new measure of pain volatility and (2) predict future pain volatility levels for users of the pain management app Manage My Pain, based on demographic, clinical, and app use features.

METHODS: Pain volatility was defined as the mean of absolute changes between 2 consecutive self-reported pain severity scores within the observation periods. The k-means clustering algorithm was applied to users’ pain volatility scores at the first and sixth month of app use to establish a threshold discriminating low from high volatility classes. Subsequently, we extracted 130 demographic, clinical, and app usage features from the first month of app use to predict these 2 volatility classes at the sixth month of app use. Prediction models were developed using 4 methods: (1) logistic regression with ridge estimators; (2) logistic regression with the Least Absolute Shrinkage and Selection Operator; (3) Random Forests; and (4) Support Vector Machines. Overall prediction accuracy and accuracy for both classes were calculated to compare the performance of the prediction models. Training and testing were conducted using 5-fold cross-validation. The class imbalance issue was addressed using random subsampling of the training dataset. Users with at least 5 pain records in both the predictor and outcome periods (N=782) were included in the analysis.

RESULTS: The k-means clustering algorithm was applied to pain volatility scores to establish a threshold of 1.6 differentiating the low and high volatility classes. After validating the threshold using random subsamples, 2 classes were created: low volatility (n=611) and high volatility (n=171). In this class-imbalanced dataset, all 4 prediction models achieved 78.1% (611/782) to 79.0% (618/782) overall accuracy. However, all models had a prediction accuracy of less than 18.7% (32/171) for the high volatility class. After addressing the class imbalance using random subsampling, accuracy for the high volatility class improved across all models to greater than 59.6% (102/171). The prediction model based on Random Forests performed best, consistently achieving approximately 70% accuracy for both classes across 3 random subsamples.

CONCLUSIONS: We propose a novel method for measuring pain volatility. Cluster analysis was applied to divide users into low and high volatility classes. These classes were then predicted at the sixth month of app use with an acceptable degree of accuracy, using machine learning methods based on features extracted from demographic, clinical, and app use information from the first month.
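The volatility measure and the clustering threshold can be sketched in pure Python (toy numbers, not the study’s data; the real analysis ran k-means over 782 users and predicted the classes from 130 features): volatility is the mean absolute change between consecutive pain scores, and a 1-D two-means pass yields a cutoff between the low- and high-volatility groups.

```python
def pain_volatility(scores):
    """Mean absolute change between consecutive self-reported pain scores."""
    if len(scores) < 2:
        raise ValueError("need at least two pain records")
    diffs = [abs(b - a) for a, b in zip(scores, scores[1:])]
    return sum(diffs) / len(diffs)

def two_means_threshold(volatilities, iters=100):
    """1-D k-means with k=2: return the midpoint of the two cluster means."""
    lo, hi = min(volatilities), max(volatilities)
    for _ in range(iters):
        cut = (lo + hi) / 2
        low = [v for v in volatilities if v <= cut]
        high = [v for v in volatilities if v > cut]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # converged
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2
```

For example, a user reporting pain scores 3, 5, 2, 2 has volatility (2 + 3 + 0) / 3 ≈ 1.67, which the paper’s threshold of 1.6 would place in the high-volatility class.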


2019 ◽  
Vol 19 (292) ◽  
Author(s):  
Nan Hu ◽  
Jian Li ◽  
Alexis Meyer-Cirkel

We compared the predictive performance of a series of machine learning and traditional methods for monthly CDS spreads, using firms’ accounting-based, market-based, and macroeconomic variables over the period 2006 to 2016. We find that ensemble machine learning methods (Bagging, Gradient Boosting, and Random Forest) strongly outperform other estimators, and Bagging particularly stands out in terms of accuracy. Traditional credit risk models using OLS techniques have the lowest out-of-sample prediction accuracy. The results suggest that non-linear machine learning methods, especially the ensemble methods, add considerable value to existing credit risk prediction and enable CDS shadow pricing for companies missing those securities.
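Bagging, the standout method here, can be sketched with a stdlib-only toy (a one-split regression stump on made-up monotone data stands in for the paper’s base learners and CDS features): fit the base learner on bootstrap resamples of the training set and average the predictions.

```python
import random

def fit_stump(sample):
    """Fit a one-split regression stump on (x, y) pairs."""
    xs = sorted(x for x, _ in sample)
    split = xs[len(xs) // 2]
    overall = sum(y for _, y in sample) / len(sample)
    left = [y for x, y in sample if x < split] or [overall]
    right = [y for x, y in sample if x >= split] or [overall]
    lmean, rmean = sum(left) / len(left), sum(right) / len(right)
    return lambda x: lmean if x < split else rmean

def bagged_predict(train, x, base_fit, n_estimators=25, seed=0):
    """Bootstrap aggregation: average base learners fit on resampled data."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        boot = [rng.choice(train) for _ in train]  # sample with replacement
        preds.append(base_fit(boot)(x))
    return sum(preds) / len(preds)

# Toy monotone relationship standing in for a spread-vs-fundamentals link.
train = [(i, 2.0 * i) for i in range(10)]
```

Averaging over resamples smooths the stump’s hard step, which is the variance-reduction effect that helps bagged ensembles beat single estimators out of sample.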


2018 ◽  
Vol 117 (3) ◽  
pp. 387-423
Author(s):  
Marco Götz ◽  
Ferenc Leichsenring ◽  
Thomas Kropp ◽  
Peter Müller ◽  
Tobias Falk ◽  
...  

Author(s):  
Sook-Ling Chua ◽  
Stephen Marsland ◽  
Hans W. Guesgen

The problem of behaviour recognition based on sensor data is essentially an inverse problem: given a set of sensor observations, identify the sequence of behaviours that gave rise to them. In a smart home, the behaviours are likely to be the standard human behaviours of living, and the observations will depend upon the sensors the house is equipped with. There are two main approaches to identifying behaviours from the sensor stream. One is a symbolic approach, which explicitly models the recognition process. The other is a sub-symbolic approach using data mining and machine learning methods, which is the focus of this chapter. While there have been many machine learning methods for identifying behaviours from the sensor stream, they have generally relied upon a labelled dataset, where a person has manually identified their behaviour at each point in time. This labelling is tedious, resulting in relatively small datasets, and is also prone to significant errors, as people do not pinpoint the end of one behaviour and the commencement of the next correctly. In this chapter, the authors consider methods for dealing with unlabelled sensor data for behaviour recognition and investigate their use. They then consider whether these methods are best used in isolation or as preprocessing to provide a training set for a supervised method.
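The cluster-then-label idea, using unsupervised output as a training set for a supervised method, can be sketched with a minimal deterministic k-means (the sensor feature vectors below are hypothetical, not the chapter’s data): cluster unlabelled sensor windows, then treat the cluster indices as pseudo-labels for any downstream classifier.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: greedily pick points far from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
                             key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def kmeans(points, k, iters=20):
    """Lloyd's algorithm over feature-vector tuples."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Unlabelled sensor windows (hypothetical 2-feature summaries of activity).
windows = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0),
           (5.0, 5.1), (5.2, 4.9), (5.1, 5.0)]
centroids = kmeans(windows, k=2)
pseudo_labels = [nearest(w, centroids) for w in windows]
# pseudo_labels can now serve as a training set for a supervised classifier.
```

A human need only name each cluster once (e.g. "cooking", "sleeping") instead of annotating every time step, which is the labour saving the chapter is after.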


2016 ◽  
pp. 180-196
Author(s):  
Tu-Bao Ho ◽  
Siriwon Taewijit ◽  
Quang-Bach Ho ◽  
Hieu-Chi Dam

Big data is about handling huge and/or complex datasets that conventional technologies cannot handle, or cannot handle well. Big data is currently receiving tremendous attention from both industry and academia, as there is much more data around us than ever before. This chapter addresses the relationship between big data and service science, especially how big data can contribute to the process of co-creation of service value. In particular, value co-creation in the context of customer relationship management is discussed. The chapter starts with brief descriptions of big data, machine learning and data mining methods, and service science and its model of value co-creation, and then addresses the key idea of how big data can contribute to co-creating service value.

