Machine learning algorithms for the identification of potential non-syndromic intellectual disability associated miRNAs

Stochastic Gradient Descent ◽

Temporal Gene Expression ◽

Spatio Temporal

Abstract Background: Intellectual disabilities (IDs) are a group of developmental disorders with high phenotypic and genotypic heterogeneity. The association of genetic elements with IDs has typically been established empirically; recently, however, machine learning (ML) has proved to be an excellent instrument for elucidating these associations. miRNAs are short non-coding molecules that participate in spatiotemporal gene regulation, making them relevant for understanding ID causality. Methods: In this study we used the BrainSpan spatio-temporal expression database to develop a series of machine learning predictors: SVM, RF, FF-ANN, and a Stochastic Gradient Descent classifier. These models were capable of recognizing gene expression profiles. The best classifier was used to label miRNAs associated with non-syndromic IDs (NS-IDs) using the BrainSpan expression profiles. Results: The best-performing model was an FF-ANN with an F1-score of 0.78, a weighted recall of 0.78 and a weighted precision of 0.78. We used this model to identify miRNAs with a high probability of being associated with NS-IDs based on their spatio-temporal gene expression profiles in the human brain. The labeled miRNAs that had annotations were associated with IDs and/or neurodevelopmental processes. Conclusions: The development of a machine learning framework that identified potential NS-ID miRNAs represents an interesting approach for compiling a list of candidate genes that could be subjected to further experimental validation. This study also reinforces the potential of machine learning frameworks for the discovery of biomarkers that could improve disease detection and management. Keywords: miRNA association; artificial intelligence; machine learning; intellectual disability; biomarker
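The abstract above reports weighted precision, weighted recall and an F1-score. As a minimal sketch of how such support-weighted scores can be computed, the following pure-Python function averages per-class metrics weighted by the number of true samples in each class. It assumes the F1-score is averaged the same way as the other two metrics, which the abstract does not state. The function name `weighted_scores` and the toy labels are hypothetical and do not come from the study.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return (precision, recall, f1), each averaged over classes and
    weighted by class support (number of true samples per class)."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy labels (hypothetical, not data from the study).
y_true = ["ID", "ID", "ID", "non-ID", "non-ID"]
y_pred = ["ID", "ID", "non-ID", "non-ID", "ID"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(round(precision, 2), round(recall, 2), round(f1, 2))
```

With this weighting, the weighted recall is always equal to the overall accuracy, which is one reason the three reported values can coincide.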

Analysis of gene expression profiles of lung cancer subtypes with machine learning algorithms

Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease ◽

10.1016/j.bbadis.2020.165822 ◽

2020 ◽

Vol 1866 (8) ◽

pp. 165822 ◽

Cited By ~ 2

Author(s):

Fei Yuan ◽

Lin Lu ◽

Quan Zou

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Lung Cancer ◽

Expression Profiles ◽

Learning Algorithms ◽

Cancer Subtypes

Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2019-216599 ◽

2020 ◽

Vol 79 (9) ◽

pp. 1234-1242 ◽

Cited By ~ 5

Author(s):

Iago Pinal-Fernandez ◽

Maria Casal-Dominguez ◽

Assia Derfoul ◽

Katherine Pak ◽

Frederick W Miller ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Learning Algorithm ◽

Expression Profiles ◽

Recursive Feature Elimination ◽

Machine Learning Algorithm ◽

Unique Gene ◽

Muscle Biopsies

Objectives: Myositis is a heterogeneous family of diseases that includes dermatomyositis (DM), antisynthetase syndrome (AS), immune-mediated necrotising myopathy (IMNM), inclusion body myositis (IBM), polymyositis and overlap myositis. Additional subtypes of myositis can be defined by the presence of myositis-specific autoantibodies (MSAs). The purpose of this study was to define unique gene expression profiles in muscle biopsies from patients with MSA-positive DM, AS and IMNM as well as IBM. Methods: RNA-seq was performed on muscle biopsies from 119 myositis patients with IBM or defined MSAs and 20 controls. Machine learning algorithms were trained on transcriptomic data and recursive feature elimination was used to determine which genes were most useful for classifying muscle biopsies into each type and MSA-defined subtype of myositis. Results: The support vector machine learning algorithm classified the muscle biopsies with >90% accuracy. Recursive feature elimination identified genes that are most useful to the machine learning algorithm and that are only overexpressed in one type of myositis. For example, CAMK1G (calcium/calmodulin-dependent protein kinase IG), EGR4 (early growth response protein 4) and CXCL8 (interleukin 8) are highly expressed in AS but not in DM or other types of myositis. Using the same computational approach, we also identified genes that are uniquely overexpressed in different MSA-defined subtypes. These included apolipoprotein A4 (APOA4), which is only expressed in anti-3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) myopathy, and MADCAM1 (mucosal vascular addressin cell adhesion molecule 1), which is only expressed in anti-Mi2-positive DM. Conclusions: Unique gene expression profiles in muscle biopsies from patients with MSA-defined subtypes of myositis and IBM suggest that different pathological mechanisms underlie muscle damage in each of these diseases.
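The recursive feature elimination mentioned in the abstract above can be illustrated with a small sketch: fit a linear classifier, drop the feature with the smallest absolute weight, and refit until the desired number of features remains. To stay dependency-free, a simple perceptron stands in for the support vector machine used in the study, so this is not the authors' implementation. The gene names, expression values and function names are all hypothetical.

```python
def train_perceptron(rows, labels, epochs=20):
    """Fit a perceptron on labels coded as +1/-1; return (weights, bias)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
    return weights, bias


def recursive_feature_elimination(rows, labels, names, keep):
    """Refit repeatedly, each time removing the feature whose weight has
    the smallest magnitude, until `keep` features remain."""
    active = list(range(len(names)))
    while len(active) > keep:
        subset = [[row[i] for i in active] for row in rows]
        weights, _ = train_perceptron(subset, labels)
        weakest = min(range(len(active)), key=lambda j: abs(weights[j]))
        del active[weakest]
    return [names[i] for i in active]


# Hypothetical standardized expression values: one row per biopsy.
names = ["GENE_A", "GENE_B", "GENE_C"]
rows = [
    [2.0, 0.1, -0.2],
    [1.5, -0.3, 0.1],
    [-1.8, 0.2, 0.1],
    [-2.1, -0.1, -0.3],
]
labels = [1, 1, -1, -1]  # +1 = subtype of interest, -1 = other

selected = recursive_feature_elimination(rows, labels, names, keep=1)
print(selected)
```

In this toy data only the first gene separates the two groups, so it is the feature that survives elimination.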

Response to: ‘Correspondence on ‘Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis’’ by Takanashi et al

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-219767 ◽

2021 ◽

pp. annrheumdis-2020-219767

Author(s):

Iago Pinal-Fernandez ◽

Maria Casal-Dominguez ◽

Jose Cesar Milisenda ◽

Andrew Lee Mammen

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Expression Profiles ◽

Learning Algorithms ◽

Unique Gene ◽

Different Types ◽

Muscle Biopsies

Investigating the gene expression profiles of cells in seven embryonic stages with machine learning algorithms

Genomics ◽

10.1016/j.ygeno.2020.02.004 ◽

2020 ◽

Vol 112 (3) ◽

pp. 2524-2534 ◽

Cited By ~ 5

Author(s):

Lei Chen ◽

XiaoYong Pan ◽

Wei Guo ◽

Zijun Gan ◽

Yu-Hang Zhang ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Expression Profiles ◽

Learning Algorithms ◽

Machine Learning Algorithms

Correspondence on ‘Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis’

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-219677 ◽

2020 ◽

pp. annrheumdis-2020-219677

Author(s):

Satoshi Takanashi

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Expression Profiles ◽

Learning Algorithms ◽

Unique Gene ◽

Different Types ◽

Muscle Biopsies

A Gene-Based Machine Learning Classifier Associated to the Colorectal Adenoma—Carcinoma Sequence

Biomedicines ◽

10.3390/biomedicines9121937 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1937

Author(s):

Antonio Lacalamita ◽

Emanuele Piccinno ◽

Viviana Scalavino ◽

Roberto Bellotti ◽

Gianluigi Giannelli ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Expression Profiles ◽

Selection Procedure ◽

Normal Mucosa ◽

Gene Expression Omnibus ◽

Microarray Gene Expression ◽

Potential Biomarker

Colorectal cancer (CRC) carcinogenesis is generally the result of the sequential mutation and deletion of various genes; this is known as the normal mucosa–adenoma–carcinoma sequence. The aim of this study was to develop a predictor-classifier for the “adenoma-carcinoma” sequence using microarray gene expression profiles of primary CRC, adenoma, and normal colon epithelial tissues. Four gene expression profiles from the Gene Expression Omnibus database, containing 465 samples (105 normal, 155 adenoma, and 205 CRC), were preprocessed to identify differentially expressed genes (DEGs) between adenoma tissue and primary CRC. The feature selection procedure, using the sequential Boruta algorithm and Stepwise Regression, determined 56 highly important genes. K-Means clustering showed that, using the selected 56 DEGs, the three groups were clearly separated. The classification was performed with machine learning algorithms such as Linear Model (LM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Artificial Neural Network (ANN). The best classification method in terms of accuracy (88.06 ± 0.70) and AUC (92.04 ± 0.47) was k-NN. To confirm the relevance of the predictive models, we applied the four models to a validation cohort; the k-NN model remained the best in terms of performance, with 91.11% accuracy. Among the 56 DEGs, we identified 17 genes with an ascending or descending trend through the normal mucosa–adenoma–carcinoma sequence. Moreover, using the survival information of the TCGA database, we selected six DEGs related to patient prognosis (SCARA5, PKIB, CWH43, TEX11, METTL7A, and VEGFA). The six-gene-based classifier described in the current study could be used as a potential biomarker for the early diagnosis of CRC.
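Since k-NN is reported above as the best-performing classifier, a minimal sketch of the technique may help: a sample is assigned the majority label among its k closest training samples. The Euclidean distance, the choice k = 3, the two-feature toy profiles and the function name `knn_predict` are assumptions made for illustration; the abstract does not specify the settings used in the study.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training samples closest
    to `query` (Euclidean distance)."""
    neighbours = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles for the three tissue groups.
train_rows = [
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],  # normal
    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0],  # adenoma
    [5.0, 5.1], [5.1, 4.9], [4.9, 5.0],  # CRC
]
train_labels = ["normal"] * 3 + ["adenoma"] * 3 + ["CRC"] * 3

prediction = knn_predict(train_rows, train_labels, [3.1, 3.0])
print(prediction)
```

In practice the features would be the selected DEGs rather than two toy values, and k would be tuned by cross-validation.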

Integrative machine learning analysis of multiple gene expression profiles in cervical cancer

PeerJ ◽

10.7717/peerj.5285 ◽

2018 ◽

Vol 6 ◽

pp. e5285 ◽

Cited By ~ 9

Author(s):

Mei Sze Tan ◽

Siow-Wee Chang ◽

Phaik Leng Cheah ◽

Hwa Jen Yap

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cervical Cancer ◽

Expression Profiles ◽

Hpv Infection ◽

Gene Set Enrichment Analysis ◽

Multiple Gene ◽

Cervical Cancers ◽

Learning Analysis

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study genes that are differentially expressed in the eventual development of cervical cancers following HPV infection. In this study, we proposed an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with, and may eventually aid in, the diagnosis or prognosis of cervical cancers. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of individual datasets; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 gene expressions were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene expression set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and AS1B (downregulated genes).