Machine learning algorithms reveal unique gene expression profiles in muscle biopsies from patients with different types of myositis

2020 ◽  
Vol 79 (9) ◽  
pp. 1234-1242 ◽  
Author(s):  
Iago Pinal-Fernandez ◽  
Maria Casal-Dominguez ◽  
Assia Derfoul ◽  
Katherine Pak ◽  
Frederick W Miller ◽  
...  

Objectives: Myositis is a heterogeneous family of diseases that includes dermatomyositis (DM), antisynthetase syndrome (AS), immune-mediated necrotising myopathy (IMNM), inclusion body myositis (IBM), polymyositis and overlap myositis. Additional subtypes of myositis can be defined by the presence of myositis-specific autoantibodies (MSAs). The purpose of this study was to define unique gene expression profiles in muscle biopsies from patients with MSA-positive DM, AS and IMNM as well as IBM. Methods: RNA-seq was performed on muscle biopsies from 119 myositis patients with IBM or defined MSAs and 20 controls. Machine learning algorithms were trained on transcriptomic data and recursive feature elimination was used to determine which genes were most useful for classifying muscle biopsies into each type and MSA-defined subtype of myositis. Results: The support vector machine learning algorithm classified the muscle biopsies with >90% accuracy. Recursive feature elimination identified genes that are most useful to the machine learning algorithm and that are only overexpressed in one type of myositis. For example, CAMK1G (calcium/calmodulin-dependent protein kinase IG), EGR4 (early growth response protein 4) and CXCL8 (interleukin 8) are highly expressed in AS but not in DM or other types of myositis. Using the same computational approach, we also identified genes that are uniquely overexpressed in different MSA-defined subtypes. These included apolipoprotein A4 (APOA4), which is only expressed in anti-3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) myopathy, and MADCAM1 (mucosal vascular addressin cell adhesion molecule 1), which is only expressed in anti-Mi2-positive DM. Conclusions: Unique gene expression profiles in muscle biopsies from patients with MSA-defined subtypes of myositis and IBM suggest that different pathological mechanisms underlie muscle damage in each of these diseases.
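
The abstract above does not describe how recursive feature elimination was implemented, so the following is only a minimal sketch of the general technique, assuming a linear support vector machine trained by plain subgradient descent. The feature names, the toy expression values and the helper functions `train_linear_svm` and `recursive_feature_elimination` are all hypothetical and are not taken from the study.

```python
# Minimal sketch of recursive feature elimination (RFE) around a linear SVM.
# All data and feature names below are synthetic and hypothetical.

def train_linear_svm(X, y, epochs=50, lam=0.01, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent.

    X is a list of feature rows, y a list of labels in {-1, +1}.
    Returns the weight vector and the bias.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1  # inside the margin or misclassified
            for j in range(n_features):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b


def recursive_feature_elimination(X, y, names, n_keep):
    """Refit the SVM repeatedly, dropping the feature with the smallest |weight|."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Toy data: the first feature separates the classes, the other two do not.
names = ["gene_informative", "gene_noise_1", "gene_noise_2"]
X = [
    [2.0, 0.1, -0.1], [2.2, -0.1, 0.1], [1.8, 0.1, 0.1], [2.1, -0.1, -0.1],
    [-2.0, 0.1, -0.1], [-2.2, -0.1, 0.1], [-1.8, 0.1, 0.1], [-2.1, -0.1, -0.1],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

selected = recursive_feature_elimination(X, y, names, n_keep=1)
print(selected)
```

On real transcriptomic data the same loop would run over thousands of genes, typically removing features in batches and scoring each subset by cross-validation; those choices are left out of this sketch.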

2021 ◽  
Author(s):  
Julián González Betancur ◽  
José A Guevara-Coto ◽  
Adarli Romero

Abstract Background: Intellectual disabilities (IDs) are a group of developmental disorders with high phenotypic and genotypic heterogeneity. Genetic elements have typically been associated with IDs empirically; recently, however, machine learning (ML) has proved to be an excellent instrument for elucidating these associations. miRNAs are short non-coding molecules that participate in spatiotemporal gene regulation, making them relevant for understanding ID causality. Methods: In this study we used the BrainSpan spatio-temporal expression database to develop a series of machine learning predictors: SVM, RF, FF-ANN, and Stochastic Gradient Descent Classifier. These models were capable of recognizing gene expression profiles. The best classifier was used to label miRNAs associated with NS-IDs using the BrainSpan expression profiles. Results: The best-performing model was an FF-ANN with an F1-score of 0.78, a weighted recall of 0.78 and a weighted precision of 0.78. We used this model to identify miRNAs with a high probability of being associated with NS-IDs, based on their spatio-temporal gene expression profiles in the human brain. The labeled miRNAs that had annotations were associated with processes related to IDs and/or neurodevelopment. Conclusions: The development of a machine learning framework that identified potential NS-ID miRNAs represents an interesting approach for producing a list of candidate genes that could be subject to further experimental validation. This study also reinforces the potential of machine learning frameworks for discovering biomarkers that could improve disease detection and management. Keywords: miRNA association; artificial intelligence; machine learning; intellectual disability; biomarker
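
The abstract above reports a weighted recall and a weighted precision alongside the F1-score. As a rough illustration of what support-weighted metrics mean, the sketch below computes them from scratch; the function `weighted_scores` and the toy labels are hypothetical and do not reproduce the study's data or code.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Per-class precision, recall and F1, averaged with class support as weight."""
    support = Counter(y_true)
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = count / n  # fraction of true samples in this class
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    return totals


# Hypothetical labels: 4 "ID" samples and 6 "other" samples.
y_true = ["ID"] * 4 + ["other"] * 6
y_pred = ["ID", "ID", "ID", "other",
          "other", "other", "other", "other", "ID", "ID"]

scores = weighted_scores(y_true, y_pred)
print(scores)
```

With this weighting, the weighted recall always equals plain accuracy, which is worth keeping in mind when the three figures are compared.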


Author(s):  
Ching Wei Wang

One of the most active areas of research in supervised machine learning has been the study of methods for constructing good ensembles of classifiers. The main discovery is that an ensemble classifier often performs much better than the single classifiers that make it up. Recent studies (Dettling, 2004; Tan & Gilbert, 2003) have confirmed the utility of ensemble machine learning algorithms for gene expression analysis. The motivation of this work is to investigate a suitable machine learning algorithm for classification and prediction on gene expression data. The research starts by analyzing the behavior and weaknesses of three popular ensemble machine learning methods (Bagging, Boosting, and Arcing), followed by the presentation of a new ensemble machine learning algorithm. The proposed method is evaluated against the existing ensemble machine learning algorithms over 12 gene expression datasets (Alon et al., 1999; Armstrong et al., 2002; Ash et al., 2000; Catherine et al., 2003; Dinesh et al., 2002; Gavin et al., 2002; Golub et al., 1999; Scott et al., 2002; van ’t Veer et al., 2002; Yeoh et al., 2002; Zembutsu et al., 2002). The experimental results show that the proposed algorithm greatly outperforms existing methods, achieving high classification accuracy. The outline of this chapter is as follows: the ensemble machine learning approach and three popular ensembles (i.e., Bagging, Boosting, and Arcing) are introduced first in the Background section; the analyses of existing ensembles, details of the proposed algorithm, and experimental results are then presented in the Method section, followed by discussions of future trends and a conclusion.
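
Of the three ensembles named above, Bagging is the simplest to illustrate. The sketch below is a minimal example under stated assumptions, not the chapter's proposed algorithm: it trains one-dimensional decision stumps on bootstrap resamples and combines them by majority vote. The helpers `fit_stump`, `fit_bagging` and `predict_bagging` and the toy samples are hypothetical.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Pick the threshold t with fewest errors for the rule: predict 1 if x > t."""
    values = sorted({x for x, _ in samples})
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    # The infinite thresholds cover resamples that contain a single class.
    candidates = [float("-inf")] + midpoints + [float("inf")]

    def errors(t):
        return sum((x > t) != bool(label) for x, label in samples)

    return min(candidates, key=errors)


def fit_bagging(samples, n_models=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        thresholds.append(fit_stump(bootstrap))
    return thresholds


def predict_bagging(thresholds, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data: low values are class 0, high values class 1.
samples = [(float(v), 0) for v in range(1, 6)] + \
          [(float(v), 1) for v in range(11, 16)]

model = fit_bagging(samples)
print(predict_bagging(model, 3.0), predict_bagging(model, 13.0))
```

Boosting and Arcing differ in that they reweight or resample the training data adaptively, based on the errors of earlier classifiers, rather than drawing independent bootstrap resamples.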


Genomics ◽  
2020 ◽  
Vol 112 (3) ◽  
pp. 2524-2534 ◽  
Author(s):  
Lei Chen ◽  
XiaoYong Pan ◽  
Wei Guo ◽  
Zijun Gan ◽  
Yu-Hang Zhang ◽  
...  

Biomedicines ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 1937
Author(s):  
Antonio Lacalamita ◽  
Emanuele Piccinno ◽  
Viviana Scalavino ◽  
Roberto Bellotti ◽  
Gianluigi Giannelli ◽  
...  

Colorectal cancer (CRC) carcinogenesis is generally the result of the sequential mutation and deletion of various genes; this is known as the normal mucosa–adenoma–carcinoma sequence. The aim of this study was to develop a predictor-classifier during the “adenoma-carcinoma” sequence using microarray gene expression profiles of primary CRC, adenoma, and normal colon epithelial tissues. Four gene expression profiles from the Gene Expression Omnibus database, containing 465 samples (105 normal, 155 adenoma, and 205 CRC), were preprocessed to identify differentially expressed genes (DEGs) between adenoma tissue and primary CRC. The feature selection procedure, using the sequential Boruta algorithm and Stepwise Regression, determined 56 highly important genes. K-Means methods showed that, using the selected 56 DEGs, the three groups were clearly separate. The classification was performed with machine learning algorithms such as Linear Model (LM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Artificial Neural Network (ANN). The best classification method in terms of accuracy (88.06 ± 0.70) and AUC (92.04 ± 0.47) was k-NN. To confirm the relevance of the predictive models, we applied the four models on a validation cohort: the k-NN model remained the best model in terms of performance, with 91.11% accuracy. Among the 56 DEGs, we identified 17 genes with an ascending or descending trend through the normal mucosa–adenoma–carcinoma sequence. Moreover, using the survival information of the TCGA database, we selected six DEGs related to patient prognosis (SCARA5, PKIB, CWH43, TEX11, METTL7A, and VEGFA). The six-gene-based classifier described in the current study could be used as a potential biomarker for the early diagnosis of CRC.
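
The abstract above reports k-NN as the best-performing classifier but gives no implementation details. The following is a minimal sketch of k-nearest-neighbour classification with Euclidean distance and majority voting; the two-feature expression vectors, the value of `k` and the function `knn_predict` are hypothetical and do not come from the study.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical expression values for two selected genes per sample.
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.8), "normal"), ((0.9, 1.1), "normal"),
    ((5.0, 5.0), "adenoma"), ((5.2, 4.8), "adenoma"), ((4.9, 5.1), "adenoma"),
    ((9.0, 1.0), "CRC"), ((9.1, 1.2), "CRC"), ((8.8, 0.9), "CRC"),
]

print(knn_predict(train, (1.1, 0.9)))
print(knn_predict(train, (8.9, 1.3)))
```

In practice the features would be the selected DEGs after scaling, and `k` would be chosen by cross-validation rather than fixed in advance.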


Author(s):  
A. Khanwalkar ◽  
R. Soni

Purpose: Diabetes is a chronic disease that accounts for a large proportion of the nation's healthcare expenses, because people with diabetes need continuous medical care. Several complications occur if the disorder goes unrecognized and untreated. Identifying the condition currently depends on a visit to a diagnostic center and a doctor's attention. An essential real-world problem is therefore to detect the disorder at an early stage. This work is a survey that analyzes several parameters used in diagnosing the disorder. It shows that classification algorithms, along with other machine learning algorithms, play an important role in automating the analysis of the disorder. Design/methodology/approach: This paper provides an extensive survey of the different approaches that have been used to analyze medical data for the early detection of the disorder. It considers methods such as J48, CART, SVMs and KNN, formally surveys the relevant studies, and provides a conclusion at the end. Findings: The survey analyzes several parameters used in diagnosing the disorder. It shows that classification algorithms, along with other machine learning algorithms, play an important role in automating the analysis of the disorder. Practical implications: This paper will help future researchers in the field of healthcare, specifically in the domain of diabetes, to understand the differences between classification algorithms. Originality/value: This paper will help in comparing machine learning algorithms by going through their results and selecting the appropriate approach based on requirements.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5285 ◽  
Author(s):  
Mei Sze Tan ◽  
Siow-Wee Chang ◽  
Phaik Leng Cheah ◽  
Hwa Jen Yap

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study the genes that are differentially expressed in the final development of cervical cancers following HPV infection. In this study, we proposed an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with, and may eventually aid in, the diagnosis or prognosis of cervical cancers. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of individual datasets; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 gene expressions were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene expression set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and AS1B (downregulated genes).

