Machine Learning Approach Reveals the Assembly of Activated Sludge Microbiome with Different Carbon Sources during Microcosm Startup

Activated sludge (AS) microcosm experiments usually begin by inoculating a bioreactor with an AS mixed culture. During bioreactor startup, AS communities undergo some distortion in their characteristics (e.g., loss of diversity). This work aimed to provide a predictive understanding of the dynamic changes in community structure and diversity that occur during aerobic AS microcosm startups. AS microcosms were developed using three frequently used carbon sources: acetate (A), glucose (G), and starch (S). A mathematical modeling approach determined that a minimum of 1.7–2.4 times the solid retention time (SRT) was required for microcosm startup, during which substantial divergences in community biomass and diversity (33–45% reduction in species richness and diversity) were observed. A machine learning model trained on AS microbiome data predicted (>95% accuracy) the assembly pattern of aerobic AS microcosm communities in response to each carbon source. A feature importance analysis pinpointed specific taxa that were highly indicative of a microcosm feed source (A, G, or S) and contributed significantly to the ML-based predictive classification. The results of this study have important implications for the interpretation and validity of microcosm experiments using AS.
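
The reported minimum startup period scales with the SRT, so it can be worked out for any reactor setting. The sketch below only illustrates that arithmetic; the function name `startup_window_days` and the 10-day SRT are assumptions made for the example, not values from the study.

```python
def startup_window_days(srt_days, low_factor=1.7, high_factor=2.4):
    """Return the (shortest, longest) minimum startup period implied by the reported range."""
    if srt_days <= 0:
        raise ValueError("SRT must be positive")
    return (low_factor * srt_days, high_factor * srt_days)


# Hypothetical SRT of 10 days, chosen only to illustrate the arithmetic.
shortest, longest = startup_window_days(10.0)
```

Under that assumed SRT, the microcosm would need roughly 17 to 24 days before the community could be treated as started up.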

AB0652 MACHINE LEARNING TO PREDICT EARLY TNF INHIBITOR USERS IN PATIENTS WITH ANKYLOSING SPONDYLITIS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3743 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1620.1-1621

Author(s):

J. Lee ◽

H. Kim ◽

S. Y. Kang ◽

S. Lee ◽

Y. H. Eun ◽

...

Keyword(s):

Machine Learning ◽

Ankylosing Spondylitis ◽

TNF Inhibitors ◽

TNF Inhibitor ◽

ANN Model ◽

Learning Models ◽

Feature Importance ◽

Importance Analysis ◽

Baseline Characteristics ◽

Machine Learning Models

Background: Tumor necrosis factor (TNF) inhibitors are important drugs in treating patients with ankylosing spondylitis (AS). However, they are not used as a first-line treatment for AS. The first-line treatment, non-steroidal anti-inflammatory drugs (NSAIDs), produces an insufficient response in over 40% of patients. If we can predict who will need TNF inhibitors at an earlier phase, adequate treatment can be provided at an appropriate time and potential damage can be avoided. There is no precise predictive model at present. Recently, various machine learning methods have shown strong performance in predictions using clinical data. Objectives: We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. Methods: The baseline demographic and laboratory data of patients who visited the Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: those treated with TNF inhibitors within six months of their follow-up (early-TNF users), and the others (non-early-TNF users). Machine learning models were formulated to predict the early-TNF users using the baseline data. Additionally, feature importance analysis was performed to delineate significant baseline characteristics. Results: The numbers of early-TNF and non-early-TNF users were 90 and 509, respectively. The best-performing ANN model used 3 hidden layers with 50 hidden nodes each; its performance (area under the curve (AUC) = 0.75) was superior to that of the logistic regression, support vector machine, and random forest models (AUC = 0.72, 0.65, and 0.71, respectively) in predicting early-TNF users. Feature importance analysis revealed erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), and height as the top significant baseline characteristics for predicting early-TNF users. Among these characteristics, height was revealed by machine learning models but not by conventional statistical techniques. Conclusion: Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases. Disclosure of Interests: None declared
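
The AUC values compared in the abstract above summarise how well each model ranks early-TNF users above non-early-TNF users. A minimal sketch of that metric in plain Python follows, using the rank-based (Mann-Whitney) formulation. The labels and scores are made-up illustrative values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative toy data (hypothetical): 1 = early-TNF user, 0 = non-early-TNF user.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.65 to 0.75 values should be read.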

The Comparison and Interpretation of Machine-Learning Models in Post-Stroke Functional Outcome Prediction

Diagnostics ◽

10.3390/diagnostics11101784 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1784

Author(s):

Shih-Chieh Chang ◽

Chan-Lin Chu ◽

Chih-Kuang Chen ◽

Hsiang-Ning Chang ◽

Alice M. K. Wong ◽

...

Keyword(s):

Machine Learning ◽

Area Under The Curve ◽

Superior Performance ◽

Support Vector ◽

Balance Test ◽

Post Stroke ◽

Feature Importance ◽

Value Range ◽

Importance Analysis ◽

Partial Dependence

Prediction of post-stroke functional outcomes is crucial for allocating medical resources. In this study, a total of 577 patients were enrolled in the Post-Acute Care-Cerebrovascular Disease (PAC-CVD) program, and 77 predictors were collected at admission. The outcome was whether a patient could achieve a Barthel Index (BI) score of >60 upon discharge. Eight machine-learning (ML) methods were applied, and their results were integrated using a stacking method. The area under the curve (AUC) of the eight ML models ranged from 0.83 to 0.887, with random forest, stacking, logistic regression, and support vector machine demonstrating superior performance. The feature importance analysis indicated that the initial Berg Balance Test (BBS-I), initial BI (BI-I), and initial Concise Chinese Aphasia Test (CCAT-I) were the top three predictors of BI scores at discharge. The partial dependence plot (PDP) and individual conditional expectation (ICE) plot indicated that the predictors’ ability to predict outcomes was most pronounced within a specific value range (e.g., BBS-I < 40 and BI-I < 60). BI at discharge could be predicted from information collected at admission with the aid of various ML models, and the PDP and ICE plots indicated that the predictors were informative within a certain value range.
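
The PDP and ICE analysis mentioned above can be sketched in a few lines: an ICE curve tracks one patient's prediction as a single predictor is swept over a grid, and the PDP is the pointwise mean of those curves. The code below is an illustration under stated assumptions. `toy_model`, its coefficients, the feature names `bbs` and `bi`, and the toy rows are all hypothetical and are not the study's fitted model or data.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)  # copy so the original row is left untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """PDP = pointwise mean of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(column) / len(column) for column in zip(*curves)]


# Hypothetical stand-in model: the score stops rising once the balance score
# reaches 40, mimicking a predictor that is informative only in a limited range.
def toy_model(row):
    return 0.01 * min(row["bbs"], 40) + 0.005 * row["bi"]


toy_rows = [{"bbs": 10, "bi": 20}, {"bbs": 30, "bi": 40}, {"bbs": 50, "bi": 60}]
toy_grid = [0, 20, 40, 56]
toy_pdp = partial_dependence(toy_model, toy_rows, "bbs", toy_grid)
```

In this toy setup the PDP rises up to a balance score of 40 and is flat afterwards, which is the kind of pattern the abstract describes as predictive ability concentrated in a value range.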

Feature Importance Analysis of Non-coding DNA/RNA Sequences Based on Machine Learning Approaches

10.1007/978-3-030-91814-9_8 ◽

2021 ◽

pp. 81-92

Author(s):

Breno Lívio Silva de Almeida ◽

Alvaro Pedroso Queiroz ◽

Anderson Paulo Avila Santos ◽

Robson Parmezan Bonidia ◽

Ulisses Nunes da Rocha ◽

...

Keyword(s):

Machine Learning ◽

Learning Approaches ◽

RNA Sequences ◽

Feature Importance ◽

Importance Analysis

Predicting antibody affinity changes upon mutations by combining multiple predictors

Scientific Reports ◽

10.1038/s41598-020-76369-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Yoichi Kurumida ◽

Yutaka Saito ◽

Tomoshi Kameda

Keyword(s):

Machine Learning ◽

Molecular Mechanics ◽

Prediction Method ◽

Previous Method ◽

Antibody Engineering ◽

Antibody Affinity ◽

High Affinity ◽

Feature Importance ◽

Importance Analysis ◽

Improved Accuracy

Abstract: Antibodies are proteins working in our immune system with high affinity and specificity for target antigens, making them excellent tools for both biotherapeutic and bioengineering applications. The prediction of antibody affinity changes upon mutations ($\Delta\Delta G_{\mathrm{binding}}$) is important for antibody engineering. Numerous computational methods have been proposed based on different approaches, including molecular mechanics and machine learning. However, the accuracy of each individual predictor is not sufficient for efficient antibody development. In this study, we develop a new prediction method by combining multiple predictors based on machine learning. Our method was tested on the SiPMAB database, evaluating the Pearson’s correlation coefficient between predicted and experimental $\Delta\Delta G_{\mathrm{binding}}$. Our method achieved higher accuracy (R = 0.69) than previous molecular mechanics or machine-learning based methods (R = 0.59) and the previous method using the average of multiple predictors (R = 0.64). Feature importance analysis indicated that the improved accuracy was obtained by combining predictors with different importance, which have different protocols for calculating energies and for generating mutant and unbound state structures. This study demonstrates that machine learning is a powerful framework for combining different approaches to predict antibody affinity changes.
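
The evaluation described above compares predicted and experimental values by Pearson's correlation coefficient, and one of the baselines averages several predictors. The sketch below shows only that metric and the averaging baseline, not the learned combination developed in the paper. All numbers and the names `pearson_r` and `average_predictors` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_predictors(predictions):
    """Baseline combination: unweighted mean across predictors, per mutation."""
    return [sum(values) / len(values) for values in zip(*predictions)]


# Hypothetical values in kcal/mol, for illustration only.
experimental = [0.5, 1.2, -0.3, 2.0, 0.1]
predictor_a = [0.9, 0.8, -0.1, 1.5, 0.6]
predictor_b = [0.1, 1.5, 0.2, 1.9, -0.4]
combined = average_predictors([predictor_a, predictor_b])
r_a = pearson_r(predictor_a, experimental)
r_b = pearson_r(predictor_b, experimental)
r_combined = pearson_r(combined, experimental)
```

A learned combination would replace `average_predictors` with a model fitted on the individual predictor outputs, with the same correlation used to score it.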

Machine learning to predict early TNF inhibitor users in patients with ankylosing spondylitis

Scientific Reports ◽

10.1038/s41598-020-75352-7 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Seulkee Lee ◽

Yeonghee Eun ◽

Hyungjin Kim ◽

Hoon-Suk Cha ◽

Eun-Mi Koh ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Ankylosing Spondylitis ◽

TNF Inhibitor ◽

ANN Model ◽

Learning Models ◽

Feature Importance ◽

Importance Analysis ◽

Baseline Characteristics ◽

Machine Learning Models

Abstract: We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. The baseline demographic and laboratory data of patients who visited the Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early-TNF and non-early-TNF users. Machine learning models were formulated to predict the early-TNF users using the baseline data. Feature importance analysis was performed to delineate significant baseline characteristics. The numbers of early-TNF and non-early-TNF users were 90 and 505, respectively. The ANN model, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.783, was superior to the logistic regression, support vector machine, random forest, and XGBoost models (AUC of 0.719, 0.699, 0.761, and 0.713, respectively) in predicting early-TNF users. Feature importance analysis revealed CRP and ESR as the top significant baseline characteristics for predicting early-TNF users. Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases.
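
The abstract does not state which feature importance method was used. One common model-agnostic option is permutation importance, where a feature's importance is the drop in a performance score after its column is shuffled. The sketch below illustrates that technique only; `toy_classifier`, its cut-off, and the toy rows are hypothetical stand-ins, and accuracy is used instead of AUC to keep the example short.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_features, repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances


# Hypothetical stand-in classifier that depends only on feature 0
# (think of a marker exceeding a cut-off); feature 1 is ignored.
def toy_classifier(row):
    return 1 if row[0] > 1.0 else 0


toy_rows = [[0.2, 5.0], [0.4, 1.0], [1.5, 4.0], [2.5, 2.0], [0.9, 3.0], [3.1, 6.0]]
toy_labels = [toy_classifier(r) for r in toy_rows]
toy_importances = permutation_importance(
    toy_classifier, toy_rows, toy_labels, n_features=2
)
```

Shuffling the ignored feature leaves accuracy unchanged, so its importance is zero, while shuffling the feature the classifier relies on lowers accuracy.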

Predicting Depression from Smartphone Behavioral Markers Using Machine Learning Methods, Hyper-parameter Optimization, and Feature Importance Analysis: An Exploratory Study (Preprint)

JMIR mhealth and uhealth ◽

10.2196/26540 ◽

2020 ◽

Author(s):

Kennedy Opoku Asare ◽

Yannik Terhorst ◽

Julio Vega ◽

Ella Peltonen ◽

Eemil Lagerspetz ◽

...

Keyword(s):

Machine Learning ◽

Parameter Optimization ◽

Exploratory Study ◽

Learning Methods ◽

Machine Learning Methods ◽

Feature Importance ◽

Importance Analysis ◽

Behavioral Markers

Prediction of the effective reproduction number of COVID-19 in Greece. A machine learning approach using Google mobility data.

10.1101/2021.05.14.21257209 ◽

2021 ◽

Author(s):

Athanasios Arvanitis ◽

Irini Furxhi ◽

Thomas Tasioulis ◽

Konstantinos Karatzas

Keyword(s):

Machine Learning ◽

Reproduction Number ◽

Short Term ◽

Attribute Importance ◽

Mobility Data ◽

Machine Learning Approach ◽

Health Related ◽

Importance Analysis ◽

Short Term Prediction ◽

Of Greece

This paper demonstrates how a short-term prediction of the effective reproduction number (Rt) of COVID-19 in regions of Greece can be achieved based on online mobility data. Various machine learning methods are applied to predict Rt, and attribute importance analysis is performed to reveal the most important variables that affect the accurate prediction of Rt. Our results are based on an ensemble of diverse Rt methodologies to provide non-precautious and non-indulgent predictions. The model demonstrates robust results, and the methodology overall represents a promising approach towards COVID-19 outbreak prediction. This paper can help health-related authorities when deciding on non-nosocomial interventions to prevent the spread of COVID-19.

Machine Learning and DWI Brain Communicability Networks for Alzheimer’s Disease Detection

Applied Sciences ◽

10.3390/app10030934 ◽

2020 ◽

Vol 10 (3) ◽

pp. 934 ◽

Cited By ~ 5

Author(s):

Eufemia Lella ◽

Angela Lombardi ◽

Nicola Amoroso ◽

Domenico Diacono ◽

Tommaso Maggipinto ◽

...

Keyword(s):

Machine Learning ◽

Alzheimer's Disease ◽

Imaging Modality ◽

Machine Learning Techniques ◽

Support Vector ◽

Whole Brain ◽

Feature Importance ◽

Importance Analysis ◽

The Brain

Signal processing and machine learning techniques are changing clinical practice based on medical imaging from many perspectives. A major topic is the development of (i) computer-aided diagnosis systems that provide clinicians with novel, non-invasive and low-cost support tools, and (ii) new methodologies for the analysis of biomedical data to find new disease biomarkers. Advancements have recently been achieved in the context of Alzheimer’s disease (AD) diagnosis through the use of diffusion weighted imaging (DWI) data. When combined with tractography algorithms, this imaging modality enables the reconstruction of the physical connections of the brain, which can subsequently be investigated through a complex network-based approach. A graph metric particularly suited to describing the disruption of brain connectivity due to AD is communicability. In this work, we develop a machine learning framework for the classification and feature importance analysis of AD based on communicability at the whole brain level. We fairly compare the performance of three state-of-the-art classification models, namely support vector machines, random forests and artificial neural networks, on the connectivity networks of a balanced cohort of healthy control subjects and AD patients from the ADNI database. Moreover, we clinically validate the information content of the communicability metric by performing a feature importance analysis. Both performance comparison and feature importance analysis provide evidence of the robustness of the method. The results obtained confirm that the whole brain structural communicability alterations due to AD are a valuable biomarker for the characterization and investigation of pathological conditions.

Machine learning classification models for fetal skeletal development performance prediction using maternal bone metabolic proteins in goats

PeerJ ◽

10.7717/peerj.7840 ◽

2019 ◽

Vol 7 ◽

pp. e7840

Author(s):

Yong Liu ◽

Cristian R. Munteanu ◽

Qiongxian Yan ◽

Nieves Pedreira ◽

Jinhe Kang ◽

...

Keyword(s):

Machine Learning ◽

Efficient Method ◽

Bone Development ◽

Late Gestation ◽

Classification Models ◽

Experimental Conditions ◽

Fetal Bone ◽

Nutritional Conditions ◽

Feature Importance ◽

Importance Analysis

Background: In developing countries, maternal undernutrition is the major intrauterine environmental factor contributing to fetal development and adverse pregnancy outcomes. Maternal nutrition restriction (MNR) in gestation has been shown to impact overall growth, bone development, and proliferation and metabolism of mesenchymal stem cells in offspring. However, an efficient method for elucidating fetal bone development performance through maternal bone metabolic biochemical markers remains elusive. Methods: We used goats to elucidate the fetal bone development state with maternal serum bone metabolic proteins under malnutrition conditions in mid- and late-gestation stages. We used the experimental data to create 72 datasets by mixing different input features such as one-hot encoding of experimental conditions, metabolic original data, experimental-centered features and experimental condition probabilities. Seven machine learning methods were used to predict six fetal bone parameters (weight, length, and diameter of femur/humerus). Results: The results indicated that MNR influences fetal bone development (femur and humerus) and fetal bone metabolic protein levels (C-terminal telopeptides of collagen I, CTx, in middle-gestation and N-terminal telopeptides of collagen I, NTx, in late-gestation), and maternal bone metabolites (low bone alkaline phosphatase, BALP, in middle-gestation and high BALP in late-gestation). The results show the importance of encoding experimental conditions (ECs) by mixing this information with the serum metabolic data. The best classification models obtained for femur weight (Fw) and length (Fl), and humerus weight (Hw) are Support Vector Machine classifiers with a leave-one-out cross-validation accuracy of 1. The remaining accuracies are 0.98, 0.946 and 0.696 for the diameter of femur (Fd), and the diameter and length of humerus (Hd, Hl), respectively. According to the feature importance analysis, the moving averages mixed with ECs are generally more important for the majority of the models. The moving average of parathyroid hormone (PTH) within nutritional conditions (MA-PTH-experim) is important for the Fd, Hd and Hl prediction models, but its removal enhances the Fw, Fl and Hw model performance. Further, using one-feature models, it is possible to obtain even more accurate models compared with the feature importance analysis models. In conclusion, machine learning is an efficient method to confirm the important role of PTH and BALP mixed with nutritional conditions for fetal bone growth performance of goats. All the Python scripts including results and comments are available in an open repository at https://gitlab.com/muntisa/goat-bones-machine-learning.
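
Leave-one-out cross-validation, used for the accuracies reported above, holds out each sample once, trains on the remainder, and averages the hits. The sketch below illustrates the procedure with a nearest-centroid classifier swapped in for the Support Vector Machines of the study, purely to keep the example dependency-free. The data, labels, and function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the class whose mean feature vector is closest to `sample`."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [total / counts[label] for total in acc]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and average the hits."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)


# Hypothetical two-feature toy data with well-separated classes.
toy_x = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [3.0, 3.2], [2.9, 3.1], [3.2, 2.8]]
toy_y = ["low", "low", "low", "high", "high", "high"]
toy_loo_accuracy = leave_one_out_accuracy(toy_x, toy_y)
```

With n samples this procedure fits n models, which is affordable for small experimental datasets such as the one described.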

Interpretable Machine Learning for COVID-19 Diagnosis Through Clinical Variables

10.48011/asba.v2i1.1590 ◽

2020 ◽

Author(s):

Lucas M. Thimoteo ◽

Marley M. Vellasco ◽

Jorge M. do Amaral ◽

Karla Figueiredo ◽

Cátia Lie Yokoyama ◽

...

Keyword(s):

Machine Learning ◽

Linear Model ◽

Linear Models ◽

Learning Approach ◽

Clinical Variables ◽

Interpretable Machine Learning ◽

Machine Learning Approach ◽

Feature Importance ◽

Non Linear ◽

The Difference

This work proposes an interpretable machine learning approach to diagnose suspected COVID-19 cases based on clinical variables. The proposed models achieved an F2 measure above 0.80 and accuracy above 0.85. Interpretation of the linear model's feature importance brought insights about the most relevant features. Shapley Additive Explanations were used for the non-linear models. They were able to show the difference between positive and negative patients as well as offer a sense of the models' global interpretability.
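
The F2 measure reported above is the F-beta score with beta = 2, which weights recall more heavily than precision and so penalises missed positive cases more than false alarms. A minimal sketch of the formula from confusion-matrix counts follows; the counts are hypothetical and chosen only for illustration.

```python
def f_beta(true_pos, false_pos, false_neg, beta=2.0):
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denominator = (1 + b2) * true_pos + b2 * false_neg + false_pos
    return (1 + b2) * true_pos / denominator if denominator else 0.0


def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Share of all cases that were classified correctly."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total if total else 0.0


# Hypothetical confusion counts, for illustration only.
toy_f2 = f_beta(true_pos=80, false_pos=20, false_neg=10)
toy_accuracy = accuracy(true_pos=80, true_neg=90, false_pos=20, false_neg=10)
```

Setting `beta=1.0` in the same function gives the familiar F1 score, which makes the effect of the recall weighting easy to compare.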
