Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis

Objectives: Myositis is a heterogeneous family of diseases that includes dermatomyositis (DM), antisynthetase syndrome (AS), immune-mediated necrotising myopathy (IMNM), inclusion body myositis (IBM), polymyositis and overlap myositis. Additional subtypes of myositis can be defined by the presence of myositis-specific autoantibodies (MSAs). The purpose of this study was to define unique gene expression profiles in muscle biopsies from patients with MSA-positive DM, AS and IMNM as well as IBM. Methods: RNA-seq was performed on muscle biopsies from 119 myositis patients with IBM or defined MSAs and 20 controls. Machine learning algorithms were trained on transcriptomic data and recursive feature elimination was used to determine which genes were most useful for classifying muscle biopsies into each type and MSA-defined subtype of myositis. Results: The support vector machine learning algorithm classified the muscle biopsies with >90% accuracy. Recursive feature elimination identified genes that are most useful to the machine learning algorithm and that are only overexpressed in one type of myositis. For example, CAMK1G (calcium/calmodulin-dependent protein kinase IG), EGR4 (early growth response protein 4) and CXCL8 (interleukin 8) are highly expressed in AS but not in DM or other types of myositis. Using the same computational approach, we also identified genes that are uniquely overexpressed in different MSA-defined subtypes. These included apolipoprotein A4 (APOA4), which is only expressed in anti-3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) myopathy, and MADCAM1 (mucosal vascular addressin cell adhesion molecule 1), which is only expressed in anti-Mi2-positive DM. Conclusions: Unique gene expression profiles in muscle biopsies from patients with MSA-defined subtypes of myositis and IBM suggest that different pathological mechanisms underlie muscle damage in each of these diseases.
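
The abstract above pairs a support vector machine with recursive feature elimination. The sketch below illustrates that general idea only: it is not the authors' pipeline. It assumes a plain linear classifier trained with hinge loss by subgradient descent (a simplified stand-in for an SVM), synthetic data, and hypothetical feature names (`gene_A` to `gene_D`), where only `gene_A` differs between the two classes.

```python
import random

def train_linear_svm(X, y, epochs=50, lr=0.01, lam=0.01):
    """Fit a linear hinge-loss classifier by subgradient descent.

    Labels in y must be +1 or -1. Returns (weights, bias).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 shrinkage on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            # Hinge-loss update only for samples inside the margin.
            if margin < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def recursive_feature_elimination(X, y, names, n_keep):
    """Retrain repeatedly, dropping the feature with the smallest |weight|."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]

# Synthetic data (an assumption for illustration): gene_A is shifted by
# class, the other three features are pure noise.
rng = random.Random(0)
names = ["gene_A", "gene_B", "gene_C", "gene_D"]
X, y = [], []
for i in range(40):
    label = 1 if i % 2 == 0 else -1
    row = [label * 3.0 + rng.gauss(0, 0.5)]
    row += [rng.gauss(0, 0.5) for _ in range(3)]
    X.append(row)
    y.append(label)

selected = recursive_feature_elimination(X, y, names, n_keep=1)
print(selected)
```

With real expression data, features would normally be standardised first so that weight magnitudes are comparable across genes, and the elimination would be wrapped in cross-validation.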

Response to: ‘Correspondence on ‘Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis’’ by Takanashi et al

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-219767 ◽

2021 ◽

pp. annrheumdis-2020-219767

Author(s):

Iago Pinal-Fernandez ◽

Maria Casal-Dominguez ◽

Jose Cesar Milisenda ◽

Andrew Lee Mammen

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Expression Profiles ◽

Learning Algorithms ◽

Gene Expression Profiles ◽

Machine Learning Algorithms ◽

Unique Gene ◽

Different Types ◽

Muscle Biopsies

Correspondence on ‘Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis’

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-219677 ◽

2020 ◽

pp. annrheumdis-2020-219677

Author(s):

Satoshi Takanashi

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Expression Profiles ◽

Learning Algorithms ◽

Gene Expression Profiles ◽

Machine Learning Algorithms ◽

Unique Gene ◽

Different Types ◽

Muscle Biopsies

Analysis of gene expression profiles of lung cancer subtypes with machine learning algorithms

Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease ◽

10.1016/j.bbadis.2020.165822 ◽

2020 ◽

Vol 1866 (8) ◽

pp. 165822 ◽

Cited By ~ 2

Author(s):

Fei Yuan ◽

Lin Lu ◽

Quan Zou

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Lung Cancer ◽

Expression Profiles ◽

Learning Algorithms ◽

Gene Expression Profiles ◽

Machine Learning Algorithms ◽

Cancer Subtypes

Machine learning algorithms for the identification of potential non-syndromic intellectual disability associated miRNAs

10.21203/rs.3.rs-595856/v2 ◽

2021 ◽

Author(s):

Julián González Betancur ◽

José Guevara-Coto ◽

Adarli Romero

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Developmental Disorders ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Temporal Gene Expression ◽

Interesting Approach ◽

Spatio Temporal

Abstract Background: Intellectual disabilities (IDs) are a group of developmental disorders with high phenotypic and genotypic heterogeneity. Genetic elements have typically been associated with IDs empirically; recently, however, machine learning (ML) has proved to be an excellent instrument for elucidating these associations. miRNAs are short non-coding molecules that participate in spatiotemporal gene regulation, making them relevant for understanding ID causality. Methods: In this study we used the BrainSpan spatio-temporal expression database to develop a series of machine learning predictors: SVM, RF, FF-ANN, and Stochastic Gradient Descent Classifier. These models were capable of recognizing gene expression profiles. The best classifier was used to label miRNAs associated with non-syndromic IDs (NS-IDs) using the BrainSpan expression profiles. Results: The model with the best performance was an FF-ANN with an F1-score of 0.78, a weighted recall of 0.78 and a weighted precision of 0.78. We used this model to identify miRNAs with a high probability of being associated with NS-IDs using the spatio-temporal gene expression profile in the human brain. Labeled miRNAs that were annotated were associated with processes related to IDs and/or neurodevelopment. Conclusions: The development of a machine learning framework that identified potential NS-ID miRNAs represents an interesting approach for identifying a list of potential genes that could be subject to further experimental validation. This study also reinforces the potential of machine learning frameworks for discovering biomarkers that could improve disease detection and management.
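
The abstract above reports an F1-score together with weighted recall and weighted precision. As a minimal sketch of how support-weighted metrics are usually computed (each per-class score is weighted by that class's share of the true labels), consider the following. The labels and predictions are invented for illustration and are not taken from the study.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) over all true classes."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for c in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[c] / total  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1

# Hypothetical labels: "ID" = associated with ID, "other" = not associated.
y_true = ["ID", "ID", "ID", "ID", "other", "other"]
y_pred = ["ID", "ID", "ID", "other", "other", "ID"]

precision, recall, f1 = weighted_scores(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Support-weighted recall always equals overall accuracy, which here is 4 correct calls out of 6.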

Machine Learning Algorithms for the Identification of Potential Non-Syndromic Intellectual Disability Associated miRNAs

10.21203/rs.3.rs-595856/v1 ◽

2021 ◽

Author(s):

Julián González Betancur ◽

José A Guevara-Coto ◽

Adarli Romero

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Intellectual Disability ◽

Developmental Disorders ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Temporal Gene Expression ◽

Spatio Temporal

New Ensemble Machine Learning Method for Classification and Prediction on Gene Expression Data

Encyclopedia of Healthcare Information Systems ◽

10.4018/978-1-59904-889-5.ch122 ◽

2008 ◽

pp. 982-989 ◽

Cited By ~ 1

Author(s):

Ching Wei Wang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Experimental Results ◽

Machine Learning Algorithm ◽

Expression Data ◽

Ensemble Machine Learning

One of the most active areas of research in supervised machine learning has been the study of methods for constructing good ensembles of classifiers. The main discovery is that an ensemble classifier often performs much better than the single classifiers that make it up. Recent studies (Dettling, 2004; Tan & Gilbert, 2003) have confirmed the utility of ensemble machine learning algorithms for gene expression analysis. The motivation of this work is to investigate a suitable machine learning algorithm for classification and prediction on gene expression data. The research starts by analyzing the behavior and weaknesses of three popular ensemble machine learning methods (Bagging, Boosting, and Arcing), followed by the presentation of a new ensemble machine learning algorithm. The proposed method is evaluated against the existing ensemble machine learning algorithms over 12 gene expression datasets (Alon et al., 1999; Armstrong et al., 2002; Ash et al., 2000; Catherine et al., 2003; Dinesh et al., 2002; Gavin et al., 2002; Golub et al., 1999; Scott et al., 2002; van ’t Veer et al., 2002; Yeoh et al., 2002; Zembutsu et al., 2002). The experimental results show that the proposed algorithm greatly outperforms existing methods, achieving high accuracy in classification. The outline of this chapter is as follows: the ensemble machine learning approach and three popular ensembles (i.e., Bagging, Boosting, and Arcing) are introduced first in the Background section; second, the analyses of existing ensembles, details of the proposed algorithm, and experimental results are presented in the Method section, followed by discussions of future trends and the conclusion.
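
Of the three background ensembles named in the abstract above, Bagging is the simplest to illustrate: train each base classifier on a bootstrap resample of the training set and combine predictions by majority vote. The sketch below shows only that background technique, not the chapter's proposed algorithm. It assumes a one-feature threshold rule (a "decision stump") as the base learner and a toy dataset invented for illustration.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-feature threshold rule that minimises training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x < t else right) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x < t else right

def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Combine the base classifiers by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data (an assumption): class 0 for values below 10, class 1 otherwise.
xs = [float(v) for v in range(20)]
ys = [0 if v < 10 else 1 for v in xs]

ensemble = fit_bagging(xs, ys)
low_call = predict_bagging(ensemble, 0.5)
high_call = predict_bagging(ensemble, 19.5)
print(low_call, high_call)
```

Boosting and Arcing differ in that they reweight or resample the training cases adaptively, based on the errors of earlier classifiers, rather than resampling uniformly.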

Investigating the gene expression profiles of cells in seven embryonic stages with machine learning algorithms

Genomics ◽

10.1016/j.ygeno.2020.02.004 ◽

2020 ◽

Vol 112 (3) ◽

pp. 2524-2534 ◽

Cited By ~ 5

Author(s):

Lei Chen ◽

XiaoYong Pan ◽

Wei Guo ◽

Zijun Gan ◽

Yu-Hang Zhang ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Expression Profiles ◽

Learning Algorithms ◽

Gene Expression Profiles ◽

Machine Learning Algorithms

A Gene-Based Machine Learning Classifier Associated to the Colorectal Adenoma—Carcinoma Sequence

Biomedicines ◽

10.3390/biomedicines9121937 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1937

Author(s):

Antonio Lacalamita ◽

Emanuele Piccinno ◽

Viviana Scalavino ◽

Roberto Bellotti ◽

Gianluigi Giannelli ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Selection Procedure ◽

Normal Mucosa ◽

Gene Expression Omnibus ◽

Machine Learning Algorithms ◽

Microarray Gene Expression ◽

Potential Biomarker

Colorectal cancer (CRC) carcinogenesis is generally the result of the sequential mutation and deletion of various genes; this is known as the normal mucosa–adenoma–carcinoma sequence. The aim of this study was to develop a predictor-classifier during the “adenoma-carcinoma” sequence using microarray gene expression profiles of primary CRC, adenoma, and normal colon epithelial tissues. Four gene expression profiles from the Gene Expression Omnibus database, containing 465 samples (105 normal, 155 adenoma, and 205 CRC), were preprocessed to identify differentially expressed genes (DEGs) between adenoma tissue and primary CRC. The feature selection procedure, using the sequential Boruta algorithm and Stepwise Regression, determined 56 highly important genes. K-Means methods showed that, using the selected 56 DEGs, the three groups were clearly separate. The classification was performed with machine learning algorithms such as Linear Model (LM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Artificial Neural Network (ANN). The best classification method in terms of accuracy (88.06 ± 0.70) and AUC (92.04 ± 0.47) was k-NN. To confirm the relevance of the predictive models, we applied the four models on a validation cohort: the k-NN model remained the best model in terms of performance, with 91.11% accuracy. Among the 56 DEGs, we identified 17 genes with an ascending or descending trend through the normal mucosa–adenoma–carcinoma sequence. Moreover, using the survival information of the TCGA database, we selected six DEGs related to patient prognosis (SCARA5, PKIB, CWH43, TEX11, METTL7A, and VEGFA). The six-gene-based classifier described in the current study could be used as a potential biomarker for the early diagnosis of CRC.
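
The abstract above reports k-NN as its best-performing classifier. As a minimal sketch of the k-nearest-neighbours rule itself (Euclidean distance, majority vote among the k closest training samples), the example below uses two made-up expression features and three invented clusters labelled after the tissue groups. The values are assumptions for illustration, not data from the study, which used 56 selected genes.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-gene expression values for three tissue groups.
train_X = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),   # normal
    (3.0, 3.0), (3.1, 2.8), (2.9, 3.2),   # adenoma
    (5.0, 1.0), (5.2, 1.1), (4.9, 0.8),   # carcinoma
]
train_y = ["normal"] * 3 + ["adenoma"] * 3 + ["carcinoma"] * 3

calls = [
    knn_predict(train_X, train_y, (1.1, 1.0)),
    knn_predict(train_X, train_y, (3.0, 2.9)),
    knn_predict(train_X, train_y, (5.0, 0.9)),
]
print(calls)
```

In practice k is tuned by cross-validation, and features are scaled before distances are computed so that no single gene dominates.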

A survey on prediction of diabetes using classification algorithms

Journal of Achievements of Materials and Manufacturing Engineering ◽

10.5604/01.3001.0014.8490 ◽

2021 ◽

Vol 2 (104) ◽

pp. 77-84

Author(s):

A. Khanwalkar ◽

R. Soni

Keyword(s):

Machine Learning ◽

Data Collection ◽

Learning Algorithm ◽

Algorithm Design ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

Machine Learning Algorithm ◽

Collection Method ◽

Data Collection Method ◽

Diagnostic Center

Purpose: Diabetes is a chronic disease that accounts for a large proportion of the nation's healthcare expenses, because people with diabetes need continuous medical care. Several complications can occur if the disorder goes unrecognized and untreated. Diagnosis typically depends on a visit to a diagnostic center and a doctor's judgment. An essential real-world problem is therefore to detect the disorder at an early stage. This work is a survey that analyzes several parameters used in the diagnosis of this polygenic disorder. It indicates that classification algorithms applied to collected data, along with other machine learning algorithms, play an important role in automating the analysis of the polygenic disorder. Design/methodology/approach: This paper provides an extensive survey of different approaches that have been used for the analysis of medical data for the purpose of early detection of the polygenic disorder. It considers methods such as J48, CART, SVMs and KNN, formally surveys the relevant studies, and provides a conclusion at the end. Findings: The survey analyzed several parameters used in the diagnosis of the polygenic disorder. It indicates that classification algorithms applied to collected data, along with other machine learning algorithms, play an important role in automating the analysis of the polygenic disorder. Practical implications: This paper will help future researchers in the field of healthcare, specifically in the domain of diabetes, to understand the differences between classification algorithms. Originality/value: This paper will help in comparing machine learning algorithms by going through their results and selecting the appropriate approach based on requirements.

Integrative machine learning analysis of multiple gene expression profiles in cervical cancer

PeerJ ◽

10.7717/peerj.5285 ◽

2018 ◽

Vol 6 ◽

pp. e5285 ◽

Cited By ~ 9

Author(s):

Mei Sze Tan ◽

Siow-Wee Chang ◽

Phaik Leng Cheah ◽

Hwa Jen Yap

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cervical Cancer ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Hpv Infection ◽

Gene Set Enrichment Analysis ◽

Multiple Gene ◽

Cervical Cancers ◽

Learning Analysis

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study the genes that stand out as differentially expressed in the eventual development of cervical cancers following HPV infection. In this study, we proposed an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with, and may eventually aid in, the diagnosis or prognosis of cervical cancers. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of individual datasets; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 gene expressions were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene expression set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and AS1B (downregulated genes).
