What Can Machine Learning Approaches in Genomics Tell Us about the Molecular Basis of Amyotrophic Lateral Sclerosis?

Amyotrophic Lateral Sclerosis (ALS) is the most common late-onset motor neuron disorder, but the molecular mechanisms and pathways underlying this disease remain elusive. This review (1) systematically identifies machine learning studies aimed at understanding the genetic architecture of ALS, (2) outlines the main challenges these studies faced and compares the approaches used to address them, and (3) compares the experimental designs and results of those approaches and assesses their reproducibility, in terms of both biological findings and machine learning model performance. Most of the collected studies incorporated prior knowledge of ALS into their feature selection, and trained their models on genomic data combined with other types of mined knowledge, including functional associations, protein-protein interactions, disease- and tissue-specific information, epigenetic data, and known ALS phenotype-genotype associations. The review highlights the importance of incorporating gene-gene interactions and cis-regulatory elements into the experimental design of future ALS machine learning studies. Lastly, it suggests that future advances in genomics and machine learning will bring a better understanding of ALS genetic architecture and enable improved personalized approaches to this and other devastating, complex diseases.

Machine learning models for drug–target interactions: current knowledge and future directions

Drug Discovery Today ◽

10.1016/j.drudis.2020.03.003 ◽

2020 ◽

Vol 25 (4) ◽

pp. 748-756 ◽

Cited By ~ 7

Author(s):

Sofia D’Souza ◽

K.V. Prema ◽

Seetharaman Balaji

Keyword(s):

Machine Learning ◽

Drug Target ◽

Current Knowledge ◽

Learning Models ◽

Future Directions ◽

Machine Learning Models

Permutation-based identification of important biomarkers for complex diseases via machine learning models

Nature Communications ◽

10.1038/s41467-021-22756-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Xinlei Mi ◽

Baiming Zou ◽

Fei Zou ◽

Jianhua Hu

Keyword(s):

Machine Learning ◽

Human Disease ◽

Molecular Mechanisms ◽

The Cancer Genome Atlas ◽

Support Vector ◽

Individual Feature ◽

Learning Models ◽

Efficient Manner ◽

Feature Importance ◽

Machine Learning Models

Study of human disease remains challenging due to convoluted disease etiologies and complex molecular mechanisms at the genetic, genomic, and proteomic levels. Many machine learning-based methods have been developed and widely used to alleviate some of the analytic challenges in complex human disease studies. While these frameworks offer modeling flexibility and robustness, they lack transparency, and their sophisticated algorithms make it difficult to interpret the contribution of each individual feature. However, identifying important biomarkers is critical for helping researchers establish novel hypotheses regarding the prevention, diagnosis and treatment of complex human diseases. Herein, we propose a Permutation-based Feature Importance Test (PermFIT) for estimating and testing feature importance, and for assisting the interpretation of individual features in complex frameworks, including deep neural networks, random forests, and support vector machines. PermFIT (available at https://github.com/SkadiEye/deepTL) is implemented in a computationally efficient manner, without model refitting. We conduct extensive numerical studies under various scenarios, and show that PermFIT not only yields valid statistical inference, but also improves the prediction accuracy of machine learning models. Applied to The Cancer Genome Atlas kidney tumor data and the HITChip atlas data, PermFIT demonstrates its practical usage in identifying important biomarkers and boosting model prediction performance.
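The abstract above describes estimating feature importance by permutation, without refitting the model. A minimal sketch of that core idea is shown below, assuming a squared-error loss and an already-trained model passed in as a plain Python function. The names `mse`, `permutation_importance` and `trained_model` are hypothetical. This is not the PermFIT implementation from the deepTL repository, which, per the abstract, also performs a statistical test that this sketch omits.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(model: Callable[[Row], float], X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of an already-fitted model on (X, y)."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(
    model: Callable[[Row], float],
    X: Sequence[Row],
    y: Sequence[float],
    feature: int,
    n_repeats: int = 20,
    seed: int = 0,
) -> float:
    """Average increase in MSE when one feature column is randomly shuffled.

    The model is never refit; only its inputs are perturbed, which is what
    keeps permutation-based importance cheap to compute.
    """
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the link between this feature and y
        X_perm = [list(row[:feature]) + [v] + list(row[feature + 1:]) for row, v in zip(X, column)]
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats


# Hypothetical data: y depends only on feature 0; feature 1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(200)]
y = [3.0 * row[0] for row in X]


def trained_model(row: Row) -> float:
    """Stands in for a fitted DNN, random forest or SVM."""
    return 3.0 * row[0]


print(permutation_importance(trained_model, X, y, feature=0))  # large
print(permutation_importance(trained_model, X, y, feature=1))  # 0.0
```

Because `trained_model` ignores the second feature, shuffling that column leaves every prediction unchanged and its importance is exactly zero, while shuffling the informative column inflates the error.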

Mitochondria Dysfunction in Frontotemporal Dementia/Amyotrophic Lateral Sclerosis: Lessons From Drosophila Models

Frontiers in Neuroscience ◽

10.3389/fnins.2021.786076 ◽

2021 ◽

Vol 15 ◽

Author(s):

Sharifah Anoar ◽

Nathaniel S. Woodling ◽

Teresa Niccoli

Keyword(s):

Amyotrophic Lateral Sclerosis ◽

Frontotemporal Dementia ◽

Molecular Mechanisms ◽

Current Knowledge ◽

Mechanisms Of Toxicity ◽

Disease Spectrum ◽

Animal Disease Models ◽

Rapid Generation ◽

Induced Pluripotent ◽

Lateral Sclerosis

Frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS) are neurodegenerative disorders characterized by declining motor and cognitive functions. Even though these diseases present with distinct sets of symptoms, FTD and ALS are two extremes of the same disease spectrum, as they show considerable overlap in genetic, clinical and neuropathological features. Among these overlapping features, mitochondrial dysfunction is associated with both FTD and ALS. Recent studies have shown that cells derived from patients' induced pluripotent stem cells (iPSCs) display mitochondrial abnormalities, and similar abnormalities have been observed in a number of animal disease models. Drosophila models have been widely used to study FTD and ALS because of their rapid generation time and extensive set of genetic tools. A wide array of fly models has been developed to elucidate the molecular mechanisms of toxicity of mutations associated with FTD/ALS, and these models have often been instrumental in understanding the role of disease-associated mutations in mitochondrial biology. In this review, we discuss how mutations associated with FTD/ALS disrupt mitochondrial function, and how the use of Drosophila models has been pivotal to our current knowledge in this field.

Using machine learning to predict quantitative phenotypes from protein and nucleic acid sequences

10.1101/677328 ◽

2019 ◽

Author(s):

David B. Sauer ◽

Da-Neng Wang

Keyword(s):

Machine Learning ◽

Nucleic Acid ◽

Molecular Mechanisms ◽

Mean Squared Error ◽

Fluorescent Proteins ◽

Optimal Growth ◽

Multilayer Perceptrons ◽

Learning Models ◽

The Relationship ◽

Machine Learning Models

The link between sequence and phenotype is essential to understanding the molecular mechanisms of evolution and to designing proteins and genes with specific properties. However, the relationship between sequence and protein or organismal phenotype is difficult to describe, due to the complex relationships linking sequence, protein folding and activity, and organismal physiology. Here, we use machine learning models trained on individual families of proteins or nucleic acids to predict the originating species' optimal growth temperatures or other quantitative phenotypes. Trained multilayer perceptrons (MLPs) outperformed linear regressions in predicting the originating species' growth temperature from protein sequences, achieving a root mean squared error of 3.6 °C. Similar machine learning models were able to predict the binding affinity of mutant WW domain sequences, the brightness of fluorescent proteins, and the enzymatic activity of ribozymes. Notably, the trained models are specific to a protein or nucleic acid family and are therefore useful in the design of biopolymers with particular properties. This method provides a new tool for the in silico prediction of quantitative biophysical and organismal phenotypes directly from sequence.
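The MLP and linear-regression models described above both need sequences turned into fixed-length numeric vectors, and they are compared by root mean squared error. The sketch below shows one common way to do both, assuming a one-hot encoding of aligned sequences; the abstract does not state the authors' encoding, and `ALPHABET`, `one_hot` and `rmse` are hypothetical names.

```python
import math
from typing import List, Sequence

# Assumed alphabet: the 20 standard amino acids plus "-" for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
_INDEX = {symbol: i for i, symbol in enumerate(ALPHABET)}


def one_hot(aligned_seq: str) -> List[float]:
    """Flatten an aligned sequence into a one-hot feature vector.

    Each position contributes len(ALPHABET) features, exactly one of them 1.0,
    so all sequences in a family alignment map to vectors of equal length.
    """
    features: List[float] = []
    for symbol in aligned_seq.upper():
        block = [0.0] * len(ALPHABET)
        block[_INDEX[symbol]] = 1.0  # KeyError for symbols outside ALPHABET
        features.extend(block)
    return features


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean squared error, the metric quoted for growth temperature."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


x = one_hot("AC-")
print(len(x), sum(x))  # 63 3.0
# Hypothetical predicted vs. observed growth temperatures in °C:
print(rmse([37.0, 52.0], [40.0, 48.0]))  # sqrt((9 + 16) / 2)
```

With one-hot inputs, a linear regression can only assign an additive weight to each residue at each position, whereas an MLP can also model interactions between positions.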

Predicting Bone Metastasis Using Gene Expression-Based Machine Learning Models

Frontiers in Genetics ◽

10.3389/fgene.2021.771092 ◽

2021 ◽

Vol 12 ◽

Author(s):

Somayah Albaradei ◽

Mahmut Uludag ◽

Maha A. Thafar ◽

Takashi Gojobori ◽

Magbubah Essack ◽

...

Keyword(s):

Machine Learning ◽

Adverse Effects ◽

Predictive Value ◽

Molecular Mechanisms ◽

Malignant Tumors ◽

Life Quality ◽

Cord Compression ◽

Learning Models ◽

TCGA Dataset ◽

Machine Learning Models

Bone is the most common site of distant metastasis from malignant tumors, with the highest prevalence observed in breast and prostate cancers. Such bone metastases (BM) cause many painful skeletal-related events, such as severe bone pain, pathological fractures, spinal cord compression, and hypercalcemia, with adverse effects on quality of life. Many bone-targeting agents developed from the current understanding of the molecular mechanisms of BM onset mitigate these adverse effects. However, only a few studies have investigated potential predictors of a high risk of developing BM, even though such knowledge is critical for early interventions to prevent or delay BM. This work proposes a computational network-based pipeline that incorporates a machine learning/deep learning (ML/DL) component to predict BM development. Based on the proposed pipeline, we constructed several machine learning models. The deep neural network (DNN) model exhibited the highest prediction accuracy (AUC of 92.11%) using the top 34 featured genes ranked by betweenness centrality scores. We further used an entirely separate, “external” TCGA dataset to evaluate the robustness of this DNN model and achieved a sensitivity of 85%, specificity of 80%, positive predictive value of 78.10%, negative predictive value of 80%, and AUC of 85.78%. These results show that the model learned to focus on the featured genes, which gives it generic capabilities, that is, the ability to predict BM for samples from different primary sites. Furthermore, existing experimental evidence supports BM-related functionality for about 50% of the 34 hub genes, which suggests that these common genetic markers provide vital insight into BM drivers. These findings may prompt the transformation of such a method into an artificial intelligence (AI) diagnostic tool and direct us towards the mechanisms that underlie metastasis to bone.
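Two steps in the pipeline above can be made concrete: ranking genes in an interaction network by betweenness centrality, and computing sensitivity, specificity, PPV and NPV from a confusion matrix. The sketch below assumes an unweighted, undirected network stored as an adjacency dictionary and uses Brandes' algorithm. The gene names and confusion-matrix counts are hypothetical, and the authors' actual network construction is not reproduced here.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # undirected: each edge listed in both directions


def betweenness(graph: Graph) -> Dict[str, float]:
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    scores = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search counting shortest paths from `source`.
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in scores.items()}


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Highest-scoring nodes first; ties broken alphabetically."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical toy network: HUB bridges two small clusters.
network: Graph = {
    "HUB": ["G1", "G2", "G3"],
    "G1": ["HUB", "G2"],
    "G2": ["HUB", "G1"],
    "G3": ["HUB", "G4"],
    "G4": ["G3"],
}
print(top_k(betweenness(network), k=2))  # ['HUB', 'G3']
print(binary_metrics(tp=17, fp=4, tn=16, fn=3))
```

In the toy network, G3 outranks G1 and G2 despite having the same number of neighbors, because every shortest path to G4 must pass through it. Betweenness rewards such bridging positions rather than raw connectivity.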

Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.2.1.holloway.1 ◽

2020 ◽

Vol 2 (1) ◽

pp. 3-6

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

General System ◽

Learning Models ◽

Starting Point ◽

Machine Learning Models

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling to obtain multibox models. Here, we explore importing such models as the starting point for further automatic enhancement.

Cellular and molecular mechanisms of motor neuron death in amyotrophic lateral sclerosis

10.3389/978-2-88919-376-9 ◽

2015 ◽

Keyword(s):

Amyotrophic Lateral Sclerosis ◽

Motor Neuron ◽

Molecular Mechanisms ◽

Neuron Death ◽

Motor Neuron Death ◽

Lateral Sclerosis

Development of Machine Learning Models to Predict Student Performance in Computer Literacy Courses

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v13i1.16863 ◽

2018 ◽

Vol 13 (1) ◽

pp. 21

Author(s):

George Anderson ◽

Oduronke T. Eyitayo

Keyword(s):

Machine Learning ◽

Student Performance ◽

Computer Literacy ◽

Learning Models ◽

Machine Learning Models

Experimental Comparison of Machine Learning Models in Malware Packing Detection

2020 21st Asia-Pacific Network Operations and Management Symposium (APNOMS) ◽

10.23919/apnoms50412.2020.9237007 ◽

2020 ◽

Author(s):

Jong-Wouk Kim ◽

Juhong Namgung ◽

Yang-Sae Moon ◽

Mi-Jung Choi

Keyword(s):

Machine Learning ◽

Experimental Comparison ◽

Learning Models ◽

Machine Learning Models

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

Epigenetic targets are a significant focus of drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large body of structure-activity relationships that has not yet been exploited for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different designs, we built highly accurate predictive models for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions of up to 0.952 for the epigenetic target prediction task. Our results indicate that these models have considerable potential to identify small molecules with epigenetic activity. We have therefore implemented them as a freely accessible and easy-to-use web application.
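The study above trains models on molecular fingerprints. As a point of reference only, the sketch below shows a much simpler fingerprint-based approach: nearest-neighbour target prediction by Tanimoto similarity. It is a swapped-in baseline, not one of the authors' models. Fingerprints are assumed to be given as sets of on-bit indices, and the compounds, bits and target annotations are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set in a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_targets(
    query: Fingerprint,
    reference: List[Tuple[Fingerprint, FrozenSet[str]]],
    k: int = 1,
) -> Set[str]:
    """Union of the annotated targets of the k most similar reference compounds."""
    ranked = sorted(reference, key=lambda item: tanimoto(query, item[0]), reverse=True)
    predicted: Set[str] = set()
    for _, targets in ranked[:k]:
        predicted |= targets
    return predicted


# Hypothetical reference set: fingerprint bits -> known epigenetic targets.
reference = [
    (frozenset({1, 2, 3, 4}), frozenset({"HDAC1"})),
    (frozenset({10, 11, 12}), frozenset({"BRD4"})),
]
query = frozenset({1, 2, 3, 5})
print(tanimoto(query, reference[0][0]))  # 3 shared / 5 in union = 0.6
print(predict_targets(query, reference))  # {'HDAC1'}
```

A similarity baseline of this kind can serve as a sanity check: a trained classifier on the same fingerprints is expected to do at least as well on held-out compounds.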
