A machine-learning framework for predicting multiple air pollutants' concentrations via multi-target regression and feature selection

2020 ◽  
Vol 715 ◽  
pp. 136991 ◽  
Author(s):  
Sahar Masmoudi ◽  
Haytham Elghazel ◽  
Dalila Taieb ◽  
Orhan Yazar ◽  
Amjad Kallel
2020 ◽  
Vol 13 (10) ◽  
pp. 305
Author(s):  
Eugene Lin ◽  
Po-Hsiu Kuo ◽  
Yu-Li Liu ◽  
Younger W.-Y. Yu ◽  
Albert C. Yang ◽  
...  

In the wake of recent advances in machine learning research, the study of pharmacogenomics using predictive algorithms serves as a new paradigmatic application. In this work, our goal was to explore an ensemble machine learning approach which aims to predict probable antidepressant treatment response and remission in major depressive disorder (MDD). To discover the status of antidepressant treatments, we established an ensemble predictive model with a feature selection algorithm resulting from the analysis of genetic variants and clinical variables of 421 patients who were treated with selective serotonin reuptake inhibitors. We also compared our ensemble machine learning framework with other state-of-the-art models including multi-layer feedforward neural networks (MFNNs), logistic regression, support vector machine, C4.5 decision tree, naïve Bayes, and random forests. Our data revealed that the ensemble predictive algorithm with feature selection (using fewer biomarkers) performed comparably to other predictive algorithms (such as MFNNs and logistic regression) to derive the perplexing relationship between biomarkers and the status of antidepressant treatments. Our study demonstrates that the ensemble machine learning framework may present a useful technique to create bioinformatics tools for discriminating non-responders from responders prior to antidepressant treatments.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Peng-fei Ke ◽  
Dong-sheng Xiong ◽  
Jia-hui Li ◽  
Zhi-lin Pan ◽  
Jing Zhou ◽  
...  

Finding effective and objective biomarkers to inform the diagnosis of schizophrenia is of great importance yet remains challenging. Relatively little work has been conducted on multi-biological data for the diagnosis of schizophrenia. In this cross-sectional study, we extracted multiple features from three types of biological data, including gut microbiota data, blood data, and electroencephalogram data. Then, an integrated framework of machine learning consisting of five classifiers, three feature selection algorithms, and four cross validation methods was used to discriminate patients with schizophrenia from healthy controls. Our results show that the support vector machine classifier without feature selection using the input features of multi-biological data achieved the best performance, with an accuracy of 91.7% and an AUC of 96.5% (p < 0.05). These results indicate that multi-biological data showed better discriminative capacity for patients with schizophrenia than single biological data. The top 5% discriminative features selected from the optimal model include the gut microbiota features (Lactobacillus, Haemophilus, and Prevotella), the blood features (superoxide dismutase level, monocyte-lymphocyte ratio, and neutrophil count), and the electroencephalogram features (nodal local efficiency, nodal efficiency, and nodal shortest path length in the temporal and frontal-parietal brain areas). The proposed integrated framework may be helpful for understanding the pathophysiology of schizophrenia and developing biomarkers for schizophrenia using multi-biological data.


2021 ◽  
Author(s):  
Andrew Lamont Hinton ◽  
Peter J Mucha

The demand for tight integration of compositional data analysis and machine learning methodologies for predictive modeling in high-dimensional settings has increased dramatically with the increasing availability of metagenomics data. We develop the differential compositional variation machine learning framework (DiCoVarML) with robust multi-level log ratio bio-marker discovery for metagenomic datasets. Our framework makes use of the full set of pairwise log ratios, scoring ratios according to their variation between classes and then selecting out a small subset of log ratios to accurately predict classes. Importantly, DiCoVarML supports a targeted feature selection mode enabling researchers to define the number of predictors used to develop models. We demonstrate the performance of our framework for binary classification tasks using both synthetic and real datasets. Selecting from all pairwise log ratios within the DiCoVarML framework provides greater flexibility that can in demonstrated cases lead to higher accuracy and enhanced biological insight.
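
The selection idea described in this abstract (form every pairwise log ratio, score each ratio by how much it varies between classes, keep a small subset) can be sketched as follows. This is a minimal illustration and not the DiCoVarML implementation: the function names, the pseudocount of 1, and the scoring rule (absolute difference in class means divided by the pooled standard deviation) are assumptions made here for the example.

```python
import itertools
import math
from statistics import mean, pvariance


def pairwise_log_ratios(samples, pseudocount=1.0):
    """Return (ratios, pairs), where ratios[s][p] = log(x_i / x_j) for pair p = (i, j), i < j.

    The pseudocount is an assumption added here so that zero counts do not break the log.
    """
    n_features = len(samples[0])
    pairs = list(itertools.combinations(range(n_features), 2))
    logs = [[math.log(x + pseudocount) for x in row] for row in samples]
    ratios = [[row[i] - row[j] for i, j in pairs] for row in logs]
    return ratios, pairs


def score_ratio(values, labels):
    """Hypothetical score: |mean(class 0) - mean(class 1)| over the pooled standard deviation."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    pooled = math.sqrt((pvariance(a) + pvariance(b)) / 2.0) + 1e-12
    return abs(mean(a) - mean(b)) / pooled


def select_top_ratios(samples, labels, k):
    """Return the k feature pairs whose log ratios best separate the two classes."""
    ratios, pairs = pairwise_log_ratios(samples)
    scores = [score_ratio([row[p] for row in ratios], labels) for p in range(len(pairs))]
    ranked = sorted(range(len(pairs)), key=lambda p: scores[p], reverse=True)
    return [pairs[p] for p in ranked[:k]]
```

Passing a fixed `k` to `select_top_ratios` loosely mirrors the targeted feature selection mode mentioned in the abstract, in which the researcher sets the number of predictors in advance.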


2021 ◽  
Vol 16 (24) ◽  
pp. 255-272
Author(s):  
Edmund Evangelista

Virtual Learning Environments (VLE), such as Moodle and Blackboard, store vast data to help identify students' performance and engagement. As a result, researchers have been focusing their efforts on assisting educational institutions in providing machine learning models to predict at-risk students and improve their performance. However, it requires an efficient approach to construct a model that can ultimately provide accurate predictions. Consequently, this study proposes a hybrid machine learning framework to predict students' performance using eight classification algorithms and three ensemble methods (Bagging, Boosting, Voting) to determine the best-performing predictive model. In addition, this study used filter-based and wrapper-based feature selection techniques to select the best features of the dataset related to students' performance. The obtained results reveal that the ensemble methods recorded higher predictive accuracy when compared to single classifiers. Furthermore, the accuracy of the models improved due to the feature selection techniques utilized in this study.
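
Two of the ingredients named in this abstract, filter-based feature selection and a Voting ensemble, can be illustrated with a small sketch. It is not the study's pipeline: the helper names are hypothetical, the filter criterion (absolute Pearson correlation with the label) is an assumption chosen for simplicity, and the base classifiers are represented only by their already-computed predictions.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_select(rows, labels, k):
    """Filter-style selection: keep the k features most correlated with the label.

    The score depends only on the data, not on any classifier, which is what
    distinguishes a filter method from a wrapper method.
    """
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def majority_vote(predictions):
    """Hard voting: predictions holds one label list per classifier."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

A wrapper method would instead retrain a classifier on candidate feature subsets and keep the subset with the best validation score, which costs more computation but accounts for interactions between features.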


2021 ◽  
Author(s):  
Peng-fei Ke ◽  
Dong-sheng Xiong ◽  
Jia-hui Li ◽  
Shi-jia Li ◽  
Jie Song ◽  
...  

Finding effective and objective biomarkers to inform the diagnosis of schizophrenia is of great importance yet remains challenging. However, there is relatively little work on multi-biological data for the diagnosis of schizophrenia. This was a cross-sectional study in which we extracted multiple features from three types of biological data, including gut microbiota data, blood data, and electroencephalogram data. Then, an integrated framework of machine learning, consisting of five classifiers, three feature selection algorithms, and four cross-validation methods, was used to discriminate patients with schizophrenia from healthy controls. Our results showed that the performance of the classifier using multi-biological data was better than that of the classifiers using single biological data, with 91.7% accuracy and 96.5% AUC. The most discriminative features (top 5%) for the classification include gut microbiota features (Lactobacillus, Haemophilus, and Prevotella), blood features (superoxide dismutase, monocyte-lymphocyte ratio, and neutrophil), and electroencephalogram features (nodal local efficiency, nodal efficiency, and nodal shortest path length in the temporal and frontal-parietal areas). The proposed integrated framework may help in understanding the pathophysiology of schizophrenia and developing biomarkers for schizophrenia using multi-biological data.

