Prediction of Antidepressant Treatment Response and Remission Using an Ensemble Machine Learning Framework

Following recent advances in machine learning research, the use of predictive algorithms in pharmacogenomics has emerged as a new application. In this work, we explored an ensemble machine learning approach to predict antidepressant treatment response and remission in major depressive disorder (MDD). To predict treatment status, we built an ensemble predictive model with a feature selection algorithm, based on the analysis of genetic variants and clinical variables from 421 patients treated with selective serotonin reuptake inhibitors. We also compared our ensemble machine learning framework with other state-of-the-art models, including multi-layer feedforward neural networks (MFNNs), logistic regression, support vector machine, C4.5 decision tree, naïve Bayes, and random forests. Our data revealed that the ensemble predictive algorithm with feature selection, while using fewer biomarkers, performed comparably to other predictive algorithms (such as MFNNs and logistic regression) in modeling the complex relationship between biomarkers and antidepressant treatment status. Our study suggests that the ensemble machine learning framework may be a useful technique for building bioinformatics tools that discriminate non-responders from responders before antidepressant treatment.
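
The abstract does not say how the base models are combined. As a minimal sketch, assuming hard majority voting over base classifiers whose predictions are already available (the encoding responder = 1, non-responder = 0 is hypothetical), the combination step could look like this:

```python
from collections import Counter


def majority_vote(predictions_by_model):
    """Combine per-patient labels from several base classifiers by hard majority vote.

    predictions_by_model: one list of labels per base model, all the same length.
    Ties go to whichever tied label appears first in model order, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not predictions_by_model:
        raise ValueError("need at least one base model")
    n_samples = len(predictions_by_model[0])
    if any(len(p) != n_samples for p in predictions_by_model):
        raise ValueError("all base models must predict the same patients")
    ensemble = []
    for votes in zip(*predictions_by_model):  # votes for one patient across models
        label, _count = Counter(votes).most_common(1)[0]
        ensemble.append(label)
    return ensemble


# Hypothetical predictions from three base models for four patients.
print(majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]]))  # [1, 0, 1, 0]
```

Using an odd number of binary base models avoids ties altogether. Soft voting (averaging predicted probabilities) is a common alternative; which scheme the authors actually used is not stated here.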

Prediction of functional outcomes of schizophrenia with genetic biomarkers using a bagging ensemble machine learning method with feature selection

Scientific Reports ◽

10.1038/s41598-021-89540-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Eugene Lin ◽

Chieh-Hsin Lin ◽

Hsien-Yuan Lane

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Functional Outcome ◽

Functional Outcomes ◽

Learning Algorithm ◽

Machine Learning Method ◽

Ensemble Machine Learning ◽

Predictive Algorithms ◽

Ensemble Algorithm ◽

Bagging Ensemble

Abstract Genetic variants such as single nucleotide polymorphisms (SNPs) have been suggested as potential molecular biomarkers for predicting the functional outcomes of psychiatric disorders. To assess functional outcomes of schizophrenia, measured by the Quality of Life Scale (QLS) and the Global Assessment of Functioning (GAF), we used a bagging ensemble machine learning method with a feature selection algorithm, based on the analysis of 11 SNPs (AKT1 rs1130233, COMT rs4680, DISC1 rs821616, DRD3 rs6280, G72 rs1421292, G72 rs2391191, 5-HT2A rs6311, MET rs2237717, MET rs41735, MET rs42336, and TPH2 rs4570625) in 302 schizophrenia patients from the Taiwanese population. We compared our bagging ensemble machine learning algorithm with other state-of-the-art models such as linear regression, support vector machine, multilayer feedforward neural networks, and random forests. The analysis showed that the bagging ensemble algorithm with feature selection outperformed the other predictive algorithms in forecasting the QLS functional outcome of schizophrenia using the G72 rs2391191 and MET rs2237717 SNPs. Furthermore, the bagging ensemble algorithm with feature selection surpassed the other predictive algorithms in forecasting the GAF functional outcome of schizophrenia using the AKT1 rs1130233 SNP. The study suggests that the bagging ensemble machine learning algorithm with feature selection might provide a practical approach for building software tools that forecast the functional outcomes of schizophrenia from molecular biomarkers.
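
Bagging fits one base learner per bootstrap resample of the training set and averages their predictions. The sketch below illustrates that idea under stated assumptions: genotypes are coded 0/1/2 (a hypothetical encoding), the base learner is a hypothetical per-genotype mean table standing in for whatever base learner the authors used, and the SNP indices in `features` are assumed to come from a prior feature selection step that is not reproduced here.

```python
import random
from collections import defaultdict
from statistics import mean


def fit_genotype_means(X, y, features):
    """Hypothetical base learner: mean outcome per combination of selected SNP genotypes.

    Genotype combinations unseen in training fall back to the overall mean of y.
    """
    groups = defaultdict(list)
    for row, target in zip(X, y):
        groups[tuple(row[j] for j in features)].append(target)
    table = {key: mean(values) for key, values in groups.items()}
    fallback = mean(y)
    return lambda row: table.get(tuple(row[j] for j in features), fallback)


def bagging_fit(X, y, features, n_estimators=50, seed=0):
    """Bagging: fit one base learner per bootstrap sample and average their outputs."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_genotype_means([X[i] for i in idx], [y[i] for i in idx], features))
    return lambda row: mean(model(row) for model in models)


# Toy data (hypothetical): two SNPs coded 0/1/2; the outcome depends only on SNP 0.
X = [[g, s] for g in (0, 1, 2) for s in (0, 1, 2) for _ in range(4)]
y = [50 + 10 * row[0] for row in X]
predict = bagging_fit(X, y, features=[0])  # indices assumed to come from an FS step
print(predict([2, 1]))
```

Averaging over bootstrap replicates mainly reduces variance, which is why bagging is often paired with unstable, high-variance base learners.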

A Comparison of Feature Selection and Forecasting Machine Learning Algorithms for Predicting Glycaemia in Type 1 Diabetes Mellitus

Applied Sciences ◽

10.3390/app11041742 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1742

Author(s):

Ignacio Rodríguez-Rodríguez ◽

José-Víctor Rodríguez ◽

Wai Lok Woo ◽

Bo Wei ◽

Domingo-Javier Pardo-Quiles

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Type 1 Diabetes ◽

Feature Selection ◽

Blood Glucose ◽

Type 1 Diabetes Mellitus ◽

Support Vector ◽

Chronic Hyperglycemia ◽

Predictive Algorithms

Type 1 diabetes mellitus (DM1) is a metabolic disease caused by a decline in pancreatic insulin production, resulting in chronic hyperglycemia. DM1 subjects usually have to measure their blood glucose levels several times a day with capillary glucometers to monitor blood glucose dynamics. In recent years, advances in technology have led to revolutionary biosensors and continuous glucose monitoring (CGM) techniques, which enable a subject’s blood glucose level to be monitored in real time. However, few attempts have been made to apply machine learning techniques to predicting glycaemia levels, and handling a database with such a large number of variables is problematic. To the best of the authors’ knowledge, proper feature selection (FS), the stage before predictive algorithms are applied, has not been discussed and compared in depth in past research on glycaemia forecasting. Therefore, to assess how a proper FS stage could improve the accuracy of glycaemia forecasts, this work implements six FS techniques alongside four predictive algorithms and applies them to a full dataset of biomedical features related to glycaemia. These data were collected through a wide-ranging passive monitoring process involving 25 patients with DM1 in practical real-life scenarios. From the results obtained, we conclude that Random Forest (RF), used as both the predictive algorithm and the FS strategy, offers the best average performance (Root Mean Square Error, RMSE = 18.54 mg/dL) across the 12 predictive horizons considered (up to 60 min in steps of 5 min), while Support Vector Machines (SVM) show the best accuracy as a forecasting algorithm when averaged over the six FS techniques applied (RMSE = 20.58 mg/dL).
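
The reported figures are RMSE values averaged over the 12 prediction horizons. A minimal sketch of that computation, assuming the forecasts and reference glucose values for each horizon are already aligned (the `results_by_horizon` layout is hypothetical):

```python
import math

# 12 prediction horizons: 5, 10, ..., 60 minutes.
HORIZONS_MIN = list(range(5, 61, 5))


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the inputs (here mg/dL)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_rmse(results_by_horizon):
    """Mean of per-horizon RMSEs; results_by_horizon maps minutes -> (y_true, y_pred)."""
    return sum(rmse(*results_by_horizon[h]) for h in HORIZONS_MIN) / len(HORIZONS_MIN)


# Two hypothetical readings, each forecast off by 10 mg/dL.
print(rmse([100, 120], [110, 110]))  # 10.0
```

Because errors are squared before averaging, RMSE weights large misses (for example, a missed hypoglycaemic dip) more heavily than mean absolute error would.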

UAV-Based Hyperspectral and Ensemble Machine Learning for Predicting Yield in Winter Wheat

Agronomy ◽

10.3390/agronomy12010202 ◽

2022 ◽

Vol 12 (1) ◽

pp. 202

Author(s):

Zhen Chen ◽

Qian Cheng ◽

Fuyi Duan ◽

Xiuqiao Huang ◽

Honggang Xu ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Winter Wheat ◽

Prediction Model ◽

Grain Filling ◽

Yield Prediction ◽

Spectral Indices ◽

Learner Model ◽

Ensemble Machine Learning ◽

Base Learner

Winter wheat is a widely grown cereal crop worldwide. Using growth-stage information to estimate winter wheat yields in a timely manner is essential for accurate crop management and rapid decision-making in sustainable agriculture, and for increasing productivity while reducing environmental impact. UAV remote sensing is widely used in precision agriculture because of its flexibility and high spatial and spectral resolution. Hyperspectral data are used to model crop traits because they provide continuous, rich spectral information with high spectral fidelity. In this study, hyperspectral images of the winter wheat canopy at the flowering and grain-filling stages were acquired by a low-altitude unmanned aerial vehicle (UAV), and machine learning was used to predict winter wheat yields. Specifically, a large number of spectral indices were extracted from the spectral data, and three feature selection methods, recursive feature elimination (RFE), Boruta feature selection, and the Pearson correlation coefficient (PCC), were used to filter the spectral indices and reduce the dimensionality of the data. Four base learner models, (1) support vector machine (SVM), (2) Gaussian process (GP), (3) linear ridge regression (LRR), and (4) random forest (RF), were also constructed, and an ensemble machine learning model was developed by combining them. The results showed that the SVM yield prediction model built on the selected features performed best among the base learner models, with an R² between 0.62 and 0.73. The proposed ensemble learner model was more accurate than each base learner model; moreover, the yield prediction model based on the features selected by Boruta achieved the highest R² (0.78) at the grain-filling stage.
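
The abstract states that the four base learners were combined but not how. The sketch below assumes the simplest option, an unweighted average of base-learner predictions (stacking with a trained meta-learner is another common choice), and computes R², the metric reported above. The two-model toy example is idealized: their errors have opposite signs, so averaging cancels them and the ensemble beats each base learner.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed yields are equal")
    return 1.0 - ss_res / ss_tot


def average_ensemble(predictions_by_model):
    """Assumed combiner: unweighted mean of the base learners' predictions per plot."""
    columns = zip(*predictions_by_model.values())
    return [sum(col) / len(col) for col in columns]


# Hypothetical yields and predictions from two base learners.
y_obs = [1.0, 2.0, 3.0, 4.0]
preds = {"model_a": [1.5, 2.5, 3.5, 4.5], "model_b": [0.5, 1.5, 2.5, 3.5]}
ensemble = average_ensemble(preds)
print(r_squared(y_obs, preds["model_a"]), r_squared(y_obs, ensemble))
```

In real data the base learners' errors are only partly uncorrelated, so the gain from combining is usually smaller than in this toy case.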

A Machine Learning Framework for Intrusion Detection System in IoT Networks Using an Ensemble Feature Selection Method

10.1109/iemcon53756.2021.9623082 ◽

2021 ◽

Author(s):

Ge Guo

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Feature Selection Method ◽

Selection Method ◽

Learning Framework

Non-Intrusive Load Monitoring of Residential Water-Heating Circuit Using Ensemble Machine Learning Techniques

Inventions ◽

10.3390/inventions5040057 ◽

2020 ◽

Vol 5 (4) ◽

pp. 57

Author(s):

Attique Ur Rehman ◽

Tek Tjing Lie ◽

Brice Vallès ◽

Shafiqur Rahman Tito

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Machine Learning Techniques ◽

Learning Models ◽

Water Heating ◽

Energy Monitoring ◽

Non Invasive ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Load Monitoring

Recent advances in computational capabilities and the deployment of smart meters have revived non-intrusive load monitoring as one of the promising techniques for energy monitoring. Toward effective energy monitoring, this paper presents a non-intrusive load inference approach assisted by feature selection and ensemble machine learning techniques. To evaluate and validate the proposed approach, water heating, one of the major residential loads with strong potential for energy efficiency applications, is considered. Moreover, to emulate real-life deployment, digital simulations are carried out on low-sampling-rate real-world load measurements from the New Zealand GREEN Grid Database. MATLAB and Python (Scikit-Learn) are used as simulation tools. The employed learning models, both standalone and ensemble, are trained on a single household’s load data and then tested rigorously on load data from a diverse set of households to validate their generalization capability. This paper presents a comprehensive performance evaluation of the approach in terms of event detection, feature selection, and learning models. Based on this study and the corresponding analysis of the results, it is concluded that the proposed approach generalizes well to unseen testing data and yields promising results for non-intrusive load inference.
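
The abstract lists event detection as one evaluated stage but does not describe the detector. A minimal sketch, assuming a simple step-change detector on aggregate household power with a hypothetical 1500 W threshold (resistive water-heater elements typically draw on the order of kilowatts, so their switching stands out from smaller loads):

```python
def detect_events(power_w, threshold_w=1500.0):
    """Flag step changes in aggregate power larger than threshold_w as on/off events.

    threshold_w is a hypothetical value, not taken from the paper.
    Returns (sample_index, "on" | "off", delta_w) tuples.
    """
    events = []
    for i in range(1, len(power_w)):
        delta = power_w[i] - power_w[i - 1]  # change between consecutive samples
        if delta >= threshold_w:
            events.append((i, "on", delta))
        elif delta <= -threshold_w:
            events.append((i, "off", delta))
    return events


# Hypothetical one-minute readings in watts: a heater switches on, then off.
print(detect_events([300, 310, 3300, 3290, 320]))  # [(2, 'on', 2990), (4, 'off', -2970)]
```

At low sampling rates a single switching event can be split across two samples or overlap with other appliances, which this sketch ignores; the detected events would then be described by features and passed to the standalone or ensemble classifiers.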

A machine-learning framework for predicting multiple air pollutants' concentrations via multi-target regression and feature selection

The Science of The Total Environment ◽

10.1016/j.scitotenv.2020.136991 ◽

2020 ◽

Vol 715 ◽

pp. 136991 ◽

Cited By ~ 2

Author(s):

Sahar Masmoudi ◽

Haytham Elghazel ◽

Dalila Taieb ◽

Orhan Yazar ◽

Amjad Kallel

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Air Pollutants ◽

Learning Framework

Decision letter for "Combining machine‐learning algorithms for prediction of antidepressant treatment response"

10.1111/acps.13250/v2/decision1 ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Treatment Response ◽

Antidepressant Treatment ◽

Learning Algorithms ◽

Machine Learning Algorithms

Feature Selection for Kick Detection With Machine Learning Using Laboratory Data

Volume 8: Polar and Arctic Sciences and Technology; Petroleum Technology ◽

10.1115/omae2019-95496 ◽

2019 ◽

Cited By ~ 1

Author(s):

Suranga C. H. Geekiyanage ◽

Adrian Ambrus ◽

Dan Sui

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Feature Selection ◽

Data Analysis ◽

Laboratory Data ◽

Surface Flow ◽

Full Scale ◽

Rate Of Change ◽

Laboratory Scale ◽

Detection Methods

Abstract Conventional kick detection methods mainly include monitoring pit gains, surface flow data (flow in and flow out), surface and down-hole pressure variations, and outputs from physics-based models. Kick detection times depend on a driller’s individual ability to interpret these drilling measurements, symptoms, and model predictions. Furthermore, testing a novel data-driven solution in a full-scale operation may cause non-productive time, safety risks, and crew fatigue, compounded by the false alarms that inevitably occur during testing. Therefore, developing better, faster, and less human-dependent kick detection on a laboratory-scale system is a valuable step before full-scale testing. We have generated a dataset containing seven typical drilling measurements and a sequence of gas kicks from experiments conducted at laboratory scale. First, we apply data analysis tools after the data pre-processing steps: data scaling, outlier detection, and natural feature selection. Next, we consider additional “engineered features” and apply different feature combinations to logistic regression with an ensemble method (boosting) to develop kick detection algorithms. In our data analysis, the designed features ‘Delta flow’ (the difference between flow in and flow out of the well) and ‘Rate of change of delta flow’, combined with logistic regression and boosting, give promising results in detecting kicks. Finally, we propose an intelligent algorithm and alarm architecture for a complete kick alarm system, which draws on both the data analysis and the machine learning models developed in this work.
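
The two engineered features can be sketched as follows, fed into a plain logistic regression fitted by gradient descent. This is a minimal illustration under stated assumptions: the sign convention (flow out minus flow in), the sample interval, the toy flows and labels are all hypothetical, and the boosting stage described in the abstract is not reproduced.

```python
import math


def engineer_features(flow_in, flow_out, dt_s):
    """Return (delta_flow, rate_of_change_of_delta_flow) per sample.

    Assumed sign convention: delta_flow = flow_out - flow_in, so a positive value
    means the well returns more fluid than is pumped in.
    """
    delta = [f_out - f_in for f_in, f_out in zip(flow_in, flow_out)]
    rate = [0.0] + [(delta[i] - delta[i - 1]) / dt_s for i in range(1, len(delta))]
    return list(zip(delta, rate))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain logistic regression fitted by batch gradient descent on log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def kick_probability(features, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


# Hypothetical run: constant pump rate, return flow rising as a kick develops.
flow_in = [10.0] * 8
flow_out = [10.0, 10.0, 10.0, 10.0, 12.0, 14.0, 16.0, 18.0]
X = engineer_features(flow_in, flow_out, dt_s=1.0)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, labels)
print(kick_probability(X[0], w, b), kick_probability(X[-1], w, b))
```

The rate-of-change feature reacts at the onset of a kick, before delta flow itself has grown large, which is one reason combining the two can shorten detection time.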

Improved landslide assessment using support vector machine with bagging, boosting, and stacking ensemble machine learning framework in a mountainous watershed, Japan

Landslides ◽

10.1007/s10346-019-01286-5 ◽

2019 ◽

Vol 17 (3) ◽

pp. 641-658 ◽

Cited By ~ 36

Author(s):

Jie Dou ◽

Ali P. Yunus ◽

Dieu Tien Bui ◽

Abdelaziz Merghadi ◽

Mehebub Sahana ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Support Vector ◽

Learning Framework ◽

Ensemble Machine Learning ◽

Mountainous Watershed

A Comparative Study using Feature Selection to Predict the Behaviour of Bank Customers

E3S Web of Conferences ◽

10.1051/e3sconf/202018401011 ◽

2020 ◽

Vol 184 ◽

pp. 01011

Author(s):

Sreethi Musunuru ◽

Mahaalakshmi Mukkamala ◽

Latha Kunaparaju ◽

N V Ganapathi Raju

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Random Forest Classifier ◽

Customer Behavior ◽

Machine Learning Algorithms ◽

The Status ◽

Personal Level ◽

Near Future ◽

Structure Communication

Although banks hold an abundance of data on their customers, it is common for them to track customers’ actions regularly, both to improve the services they offer and to understand why many customers leave for other banks. Analyzing customer behavior can be highly beneficial to banks: they can reach out to customers on a personal level and develop a business model that improves pricing structure, communication, advertising, and benefits for both their customers and themselves. Features such as the amount a customer credits every month, their annual salary, and their gender are used to classify customers with machine learning algorithms such as the K Neighbors Classifier and the Random Forest Classifier. By classifying customers, banks can estimate who will stay with them and who is likely to leave in the near future. Our study aims to remove features that are not influential in determining a customer’s future status without loss of accuracy, and to test whether removing them also increases the accuracy of the results.
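
Checking whether a feature can be dropped "without the loss of accuracy" amounts to comparing held-out accuracy with and without it. A minimal sketch using a hand-rolled k-nearest-neighbours classifier (not scikit-learn's `KNeighborsClassifier`), on hypothetical toy data where column 0 is informative and column 1 is constant; the label encoding (1 = customer leaves) is also hypothetical:

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training rows closest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def drop_feature(X, j):
    """Copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]


# Hypothetical toy data: column 0 separates the classes, column 1 carries no signal.
train_X = [[0.0, 5.0], [0.1, 5.0], [0.2, 5.0], [1.0, 5.0], [1.1, 5.0], [0.9, 5.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05, 5.0], [1.05, 5.0]]
test_y = [0, 1]
full = accuracy(train_X, train_y, test_X, test_y)
reduced = accuracy(drop_feature(train_X, 1), train_y, drop_feature(test_X, 1), test_y)
print(full, reduced)  # 1.0 1.0
```

Because k-NN is distance-based, features on very different scales (annual salary versus a binary gender flag) should be normalized first; otherwise the largest-valued feature dominates and the comparison says little about the others.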
