Feature Importance Analysis of Non-coding DNA/RNA Sequences Based on Machine Learning Approaches

2021 ◽  
pp. 81-92
Author(s):  
Breno Lívio Silva de Almeida ◽  
Alvaro Pedroso Queiroz ◽  
Anderson Paulo Avila Santos ◽  
Robson Parmezan Bonidia ◽  
Ulisses Nunes da Rocha ◽  
...  
2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 1620.1-1621
Author(s):  
J. Lee ◽  
H. Kim ◽  
S. Y. Kang ◽  
S. Lee ◽  
Y. H. Eun ◽  
...  

Background: Tumor necrosis factor (TNF) inhibitors are important drugs in treating patients with ankylosing spondylitis (AS). However, they are not used as a first-line treatment for AS. The first-line treatment, non-steroidal anti-inflammatory drugs (NSAIDs), produces an insufficient response in over 40% of patients. If we can predict who will need TNF inhibitors at an earlier phase, adequate treatment can be provided at an appropriate time and potential damage can be avoided. There is no precise predictive model at present. Recently, various machine learning methods have shown strong performance in predictions using clinical data. Objectives: We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. Methods: The baseline demographic and laboratory data of patients who visited Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early TNF inhibitor users treated by TNF inhibitors within six months of their follow-up (early-TNF users), and the others (non-early-TNF users). Machine learning models were formulated to predict the early-TNF users using the baseline data. Additionally, feature importance analysis was performed to delineate significant baseline characteristics. Results: The numbers of early-TNF and non-early-TNF users were 90 and 509, respectively. The best performing ANN model utilized 3 hidden layers with 50 hidden nodes each; its performance (area under curve (AUC) = 0.75) was superior to the logistic regression, support vector machine, and random forest models (AUC = 0.72, 0.65, and 0.71, respectively) in predicting early-TNF users. Feature importance analysis revealed erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), and height as the top significant baseline characteristics for predicting early-TNF users. Among these characteristics, height was revealed by machine learning models but not by conventional statistical techniques. Conclusion: Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases. Disclosure of Interests: None declared
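
The AUC values quoted in this abstract can be read as the probability that a randomly chosen early-TNF user receives a higher model score than a randomly chosen non-early-TNF user. The following is a minimal sketch of that computation in plain Python. The function name `roc_auc` and the six scores are illustrative assumptions, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for six patients (1 = early-TNF user).
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.2, 0.1, 0.7]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a rank-based formula would give the same value more cheaply.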


Diagnostics ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1784
Author(s):  
Shih-Chieh Chang ◽  
Chan-Lin Chu ◽  
Chih-Kuang Chen ◽  
Hsiang-Ning Chang ◽  
Alice M. K. Wong ◽  
...  

Prediction of post-stroke functional outcomes is crucial for allocating medical resources. In this study, a total of 577 patients were enrolled in the Post-Acute Care-Cerebrovascular Disease (PAC-CVD) program, and 77 predictors were collected at admission. The outcome was whether a patient could achieve a Barthel Index (BI) score of >60 upon discharge. Eight machine-learning (ML) methods were applied, and their results were integrated by a stacking method. The area under the curve (AUC) of the eight ML models ranged from 0.83 to 0.887, with random forest, stacking, logistic regression, and support vector machine demonstrating superior performance. The feature importance analysis indicated that the initial Berg Balance Test (BBS-I), initial BI (BI-I), and initial Concise Chinese Aphasia Test (CCAT-I) were the top three predictors of BI scores at discharge. The partial dependence plot (PDP) and individual conditional expectation (ICE) plot indicated that the predictors' ability to predict outcomes was most pronounced within a specific value range (e.g., BBS-I < 40 and BI-I < 60). BI at discharge could be predicted from information collected at admission with the aid of various ML models, and the PDP and ICE plots indicated that the predictors were informative mainly within a certain value range.
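
The PDP and ICE plots mentioned in this abstract are built from a simple procedure: for each patient, one predictor is swept over a grid of values while the others are held fixed (one ICE curve per patient), and the PDP is the pointwise average of those curves. The sketch below illustrates this in plain Python. The `toy_model` function, the field names `bbs` and `bi`, and all numbers are hypothetical stand-ins, not the study's fitted models or data.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the input row is untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """PDP = pointwise average of the ICE curves."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


# Hypothetical stand-in for a fitted model: the predicted probability rises
# with the balance score only until it reaches 40, then flattens.
def toy_model(row):
    return min(row["bbs"], 40) / 80 + row["bi"] / 200


rows = [{"bbs": 10, "bi": 20}, {"bbs": 30, "bi": 60}]
grid = [0, 20, 40, 56]
ice = ice_curves(toy_model, rows, "bbs", grid)
pdp = partial_dependence(ice)
```

In this toy setup the curves are flat beyond a score of 40, which is the kind of pattern a PDP or ICE plot makes visible: the predictor only changes the prediction within part of its range.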


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Yoichi Kurumida ◽  
Yutaka Saito ◽  
Tomoshi Kameda

Abstract Antibodies are proteins working in our immune system with high affinity and specificity for target antigens, making them excellent tools for both biotherapeutic and bioengineering applications. The prediction of antibody affinity changes upon mutations ($\Delta\Delta G_{\mathrm{binding}}$) is important for antibody engineering. Numerous computational methods have been proposed based on different approaches including molecular mechanics and machine learning. However, the accuracy of each individual predictor is not enough for efficient antibody development. In this study, we develop a new prediction method by combining multiple predictors based on machine learning. Our method was tested on the SiPMAB database, evaluating the Pearson's correlation coefficient between predicted and experimental $\Delta\Delta G_{\mathrm{binding}}$. Our method achieved higher accuracy (R = 0.69) than previous molecular mechanics or machine-learning based methods (R = 0.59) and the previous method using the average of multiple predictors (R = 0.64). Feature importance analysis indicated that the improved accuracy was obtained by combining predictors with different importance, which have different protocols for calculating energies and for generating mutant and unbound state structures. This study demonstrates that machine learning is a powerful framework for combining different approaches to predict antibody affinity changes.
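
The evaluation metric in this abstract, Pearson's correlation coefficient R, and the simple averaging baseline it is compared against can be sketched in a few lines of plain Python. This sketch does not reproduce the paper's learned combination of predictors; it only shows the metric and the averaging baseline. The names `predictor_a` and `predictor_b` and all values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up affinity changes, not taken from any database.
experimental = [0.5, 1.2, -0.3, 2.0, 0.8]
predictor_a = [0.9, 1.0, 0.1, 1.5, 0.2]
predictor_b = [0.2, 1.6, -0.6, 1.4, 1.3]

# Averaging baseline: every predictor gets the same weight.
averaged = [(a + b) / 2 for a, b in zip(predictor_a, predictor_b)]

r_a = pearson_r(predictor_a, experimental)
r_b = pearson_r(predictor_b, experimental)
r_avg = pearson_r(averaged, experimental)
```

A learned combination differs from this baseline in that the weight given to each predictor is fitted on training data rather than fixed to be equal.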


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Seulkee Lee ◽  
Yeonghee Eun ◽  
Hyungjin Kim ◽  
Hoon-Suk Cha ◽  
Eun-Mi Koh ◽  
...  

Abstract We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. The baseline demographic and laboratory data of patients who visited Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early-TNF and non-early-TNF users. Machine learning models were formulated to predict the early-TNF users using the baseline data. Feature importance analysis was performed to delineate significant baseline characteristics. The numbers of early-TNF and non-early-TNF users were 90 and 505, respectively. The ANN model achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.783, which was superior to the logistic regression, support vector machine, random forest, and XGBoost models (AUC = 0.719, 0.699, 0.761, and 0.713, respectively) in predicting early-TNF users. Feature importance analysis revealed CRP and ESR as the top significant baseline characteristics for predicting early-TNF users. Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases.
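
The abstract does not say which feature importance technique was used. One common model-agnostic option is permutation importance: shuffle one feature column, re-score the model, and record how much the score drops. The sketch below shows that idea in plain Python under stated assumptions: `toy_predict` is a hypothetical rule standing in for a trained model, the data are made up, and accuracy is used as the score for simplicity.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: it looks only at feature 0.
def toy_predict(row):
    return 1 if row[0] > 5.0 else 0


X = [[1.0, 7.0], [2.0, 3.0], [8.0, 1.0], [9.0, 6.0], [3.0, 9.0], [7.0, 2.0]]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y)
```

Because the toy rule ignores feature 1, shuffling that column never changes a prediction, so its importance is zero, while shuffling feature 0 degrades accuracy.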


2020 ◽  
Vol 10 (3) ◽  
pp. 934 ◽  
Author(s):  
Eufemia Lella ◽  
Angela Lombardi ◽  
Nicola Amoroso ◽  
Domenico Diacono ◽  
Tommaso Maggipinto ◽  
...  

Signal processing and machine learning techniques are changing the clinical practice based on medical imaging from many perspectives. A major topic is related to (i) the development of computer aided diagnosis systems to provide clinicians with novel, non-invasive and low-cost support-tools, and (ii) to the development of new methodologies for the analysis of biomedical data for finding new disease biomarkers. Advancements have been recently achieved in the context of Alzheimer’s disease (AD) diagnosis through the use of diffusion weighted imaging (DWI) data. When combined with tractography algorithms, this imaging modality enables the reconstruction of the physical connections of the brain that can be subsequently investigated through a complex network-based approach. A graph metric particularly suited to describe the disruption of the brain connectivity due to AD is communicability. In this work, we develop a machine learning framework for the classification and feature importance analysis of AD based on communicability at the whole brain level. We fairly compare the performance of three state-of-the-art classification models, namely support vector machines, random forests and artificial neural networks, on the connectivity networks of a balanced cohort of healthy control subjects and AD patients from the ADNI database. Moreover, we clinically validate the information content of the communicability metric by performing a feature importance analysis. Both performance comparison and feature importance analysis provide evidence of the robustness of the method. The results obtained confirm that the whole brain structural communicability alterations due to AD are a valuable biomarker for the characterization and investigation of pathological conditions.
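
For an unweighted network with adjacency matrix A, the communicability between nodes p and q is commonly defined as the (p, q) entry of the matrix exponential of A, that is, the sum over k of (A^k)_pq / k!, which counts walks of every length between the two nodes and down-weights longer ones. The sketch below approximates this with a truncated power series in plain Python. It assumes the unweighted definition and a toy 3-node graph; the paper's exact variant for weighted brain networks may differ.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def communicability(adj, terms=30):
    """Approximate exp(adj) by the first `terms` terms of its power series."""
    n = len(adj)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]   # adj**0, the identity matrix
    for k in range(1, terms):
        power = mat_mul(power, adj)      # adj**k
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / math.factorial(k)
    return result


# Toy 3-node path graph 0 - 1 - 2 (not brain data).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
G = communicability(adj)
```

Nodes 0 and 2 share no edge, yet `G[0][2]` is positive because they are linked by walks through node 1; this sensitivity to indirect routes is what makes the metric suited to describing disrupted connectivity.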


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7840
Author(s):  
Yong Liu ◽  
Cristian R. Munteanu ◽  
Qiongxian Yan ◽  
Nieves Pedreira ◽  
Jinhe Kang ◽  
...  

Background In developing countries, maternal undernutrition is the major intrauterine environmental factor contributing to fetal development and adverse pregnancy outcomes. Maternal nutrition restriction (MNR) in gestation has proven to impact overall growth, bone development, and proliferation and metabolism of mesenchymal stem cells in offspring. However, an efficient method for elucidating fetal bone development performance through maternal bone metabolic biochemical markers remains elusive. Methods We used goats to elucidate the fetal bone development state from maternal serum bone metabolic proteins under malnutrition conditions in mid- and late-gestation stages. We used the experimental data to create 72 datasets by mixing different input features such as one-hot encoding of experimental conditions, metabolic original data, experimental-centered features and experimental condition probabilities. Seven machine learning methods were used to predict six fetal bone parameters (weight, length, and diameter of femur/humerus). Results The results indicated that MNR influences fetal bone development (femur and humerus) and fetal bone metabolic protein levels (C-terminal telopeptides of collagen I, CTx, in middle-gestation and N-terminal telopeptides of collagen I, NTx, in late-gestation), and maternal bone metabolites (low bone alkaline phosphatase, BALP, in middle-gestation and high BALP in late-gestation). The results show the importance of encoding the experimental conditions (ECs) by mixing this information with the serum metabolic data. The best classification models obtained for femur weight (Fw) and length (Fl), and humerus weight (Hw) are Support Vector Machine classifiers with a leave-one-out cross-validation accuracy of 1. The remaining accuracies are 0.98, 0.946 and 0.696 for the diameter of femur (Fd), and the diameter and length of humerus (Hd, Hl), respectively.
In the feature importance analysis, the moving averages mixed with ECs are generally more important for the majority of the models. The moving average of parathyroid hormone (PTH) within nutritional conditions (MA-PTH-experim) is important for the Fd, Hd and Hl prediction models, but its removal enhances the performance of the Fw, Fl and Hw models. Further, using one-feature models, it is possible to obtain even more accurate models compared with the feature importance analysis models. In conclusion, machine learning is an efficient method to confirm the important role of PTH and BALP mixed with nutritional conditions for fetal bone growth performance of goats. All the Python scripts including results and comments are available in an open repository at https://gitlab.com/muntisa/goat-bones-machine-learning.
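
Leave-one-out cross-validation, the evaluation scheme named in this abstract, holds out each sample once, trains on all the others, and reports the fraction of held-out samples predicted correctly. The sketch below shows the procedure in plain Python. It swaps in a 1-nearest-neighbour classifier for the Support Vector Machines used in the study so that it stays dependency-free, and the data are made up; it is not the code from the linked repository.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """Label of the training point closest to x (squared Euclidean distance)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the hold-out."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Made-up two-feature samples with binary class labels.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 0.8], [1.0, 1.1], [0.8, 1.0]]
y = [0, 0, 0, 1, 1, 1]
loo_acc = leave_one_out_accuracy(X, y)
```

The scheme uses every sample for testing exactly once, which is why it is popular for small experimental datasets, although an accuracy estimated from few samples should be read with caution.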


2021 ◽  
pp. 109352662110016
Author(s):  
John Booth ◽  
Ben Margetts ◽  
Will Bryant ◽  
Richard Issitt ◽  
Ciaran Hutchinson ◽  
...  

Introduction Sudden unexpected death in infancy (SUDI) represents the commonest presentation of postneonatal death. We explored whether machine learning could be used to derive data-driven insights for prediction of infant autopsy outcome. Methods A paediatric autopsy database containing >7,000 cases, with >300 variables, was analysed by examination stage, and autopsy outcome was classified as 'explained (medical cause of death identified)' or 'unexplained'. Decision tree, random forest, and gradient boosting models were iteratively trained and evaluated. Results Data from 3,100 infant and young child (<2 years) autopsies were included. A naïve decision tree using external examination data had a performance of 68% for predicting an explained death. Core data items were identified using model feature importance. The most effective model was XGBoost, with an overall predictive performance of 80%, demonstrating age at death, and cardiovascular and respiratory histological findings as the most important variables associated with determining medical cause of death. Conclusion This study demonstrates the feasibility of using machine learning to evaluate component importance of complex medical procedures (paediatric autopsy) and highlights the value of collecting routine clinical data according to defined standards. This approach can be applied to a range of clinical and operational healthcare scenarios.


PAMM ◽  
2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Brice Coffer ◽  
Michaela Kubacki ◽  
Yixin Wen ◽  
Ting Zhang ◽  
Carlos A. Barajas ◽  
...  
