A Machine Learning Based Method to Identify Differentially Expressed Genes

Integrated Bioinformatics and Machine Learning Algorithms Analyses Highlight Related Pathways and Genes Associated with Alzheimer's Disease

Current Bioinformatics ◽

10.2174/1574893617666211220154326 ◽

2021 ◽

Vol 17 ◽

Author(s):

Hui Zhang ◽

Qidong Liu ◽

Xiaoru Sun ◽

Yaru Xu ◽

Yiling Fang ◽

...

Keyword(s):

Machine Learning ◽

Network Analysis ◽

Predictive Model ◽

Differentially Expressed Genes ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Differentially Expressed ◽

Diagnosis And Treatment ◽

PPI Network ◽

Key Genes

Background: The pathophysiology of Alzheimer's disease (AD) is still not fully understood. Objective: This study aimed to explore differentially expressed key genes in AD and to build a predictive model for diagnosis and treatment. Methods: Gene expression data from the entorhinal cortex of AD, asymptomatic AD, and control samples in the GEO database were analyzed to explore the relevant pathways and key genes in the progression of AD. Differentially expressed genes between AD and the other two groups within the modules were selected, and the biological mechanisms of AD were investigated through KEGG and PPI network analysis in Metascape. Genes with a high connectivity degree in the PPI network were then selected to build predictive models using different machine learning algorithms. Model performance was assessed with five-fold cross-validation to select the best-fitting model. Results: A total of 20 co-expression gene clusters were identified after the network was constructed. Module 1 (black) and module 2 (royal blue) were the most positively and negatively correlated with AD, respectively. In total, 565 genes in module 1 and 215 genes in module 2 overlapped with the two lists of differentially expressed genes. These genes were enriched in the G protein-coupled receptor signaling pathway, immune-related processes, and other pathways. Eleven genes were selected by LASSO logistic regression and were considered to play an important role in predicting AD samples. The model built by the support vector machine algorithm with these 11 genes showed the best performance. Conclusion: These results shed light on the diagnosis and treatment of AD.
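
The abstract selects 11 genes with LASSO logistic regression before fitting the final classifier. The sketch below shows how an L1 penalty performs that selection. It fits an L1-penalised logistic regression by proximal gradient descent (soft-thresholding) in plain Python. This is a generic textbook method and not the authors' pipeline, which the abstract describes only as "lasso logistic regression". The function names, learning rate, iteration count, penalty `lam`, and gene names are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrinks z toward 0 and sets it to 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalised logistic regression fitted by proximal gradient descent (ISTA).

    X: list of samples (each a list of gene expression values); y: 0/1 labels.
    The intercept is not penalised. Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = sigmoid(z) - yi              # gradient of the log-loss w.r.t. z
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b                       # plain gradient step for the intercept
        # gradient step followed by soft-thresholding for the penalised weights
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return b, w


def selected_features(w, names):
    """Genes whose coefficients survived the L1 penalty (non-zero weights)."""
    return [name for name, wj in zip(names, w) if wj != 0.0]
```

Genes whose coefficients are driven exactly to zero drop out of the model, and a larger `lam` keeps fewer genes. In practice `lam` would be tuned by cross-validation, and expression values would usually be standardised first.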

Non-invasive diagnostic tool for Parkinson’s disease by sebum RNA profile with machine learning

Scientific Reports ◽

10.1038/s41598-021-98423-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yuya Uehara ◽

Shin-Ichi Ueno ◽

Haruka Amano-Takeshige ◽

Shuji Suzuki ◽

Yoko Imamichi ◽

...

Keyword(s):

Machine Learning ◽

Parkinson's Disease ◽

Differentially Expressed Genes ◽

Expression Profiles ◽

Differential Expression Analysis ◽

Alpha Synuclein ◽

Differentially Expressed ◽

Healthy Controls ◽

Non-Invasive

Abstract Parkinson's disease (PD) is a progressive neurodegenerative disease presenting with motor and non-motor symptoms. These include skin disorders (seborrheic dermatitis, bullous pemphigoid, and rosacea), skin pathological changes (decreased nerve endings and alpha-synuclein deposition), and metabolic changes of sebum. Recently, a transcriptome method using RNA in skin surface lipids (SSL-RNAs), which can be obtained non-invasively with an oil-blotting film, was reported as a novel method for analyzing sebum. Here we report transcriptome analyses using SSL-RNAs in two cohorts (PD [n = 15, 50], controls [n = 15, 50]). We also evaluate the potential of these expression profiles, combined with machine learning, as diagnostic biomarkers for PD. Differential expression analysis between patients with PD and healthy controls identified more than 100 differentially expressed genes in the two cohorts. In each cohort, several genes related to oxidative phosphorylation were upregulated, and gene ontology analysis of the differentially expressed genes revealed functional processes associated with PD. Furthermore, machine learning using the expression information obtained from SSL-RNAs efficiently discriminated patients with PD from healthy controls, with an area under the receiver operating characteristic curve of 0.806. This non-invasive gene expression profile of SSL-RNAs may contribute to early PD diagnosis based on the neurodegeneration background.
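
The abstract summarises discrimination with an area under the ROC curve (AUC) of 0.806. A minimal way to compute an AUC from any classifier's scores uses the rank (Mann-Whitney) formulation. The AUC is the fraction of (patient, control) pairs in which the patient receives the higher score, with ties counted as one half. The function name and the toy scores used to check it are hypothetical and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation.

    scores: classifier outputs (higher = more likely PD); labels: 1 = patient, 0 = control.
    Returns the fraction of positive/negative pairs ranked correctly (ties count 0.5).
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop costs O(n_pos × n_neg), which is fine for cohorts of tens to hundreds of samples. A sort-based rank computation gives the same value and scales better.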

Weighted gene co-expression network analysis combined with machine learning to identify prognostic biomarkers for cervical squamous cell carcinoma.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.e17000 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. e17000-e17000

Author(s):

Yimin Li ◽

Mei Lan ◽

Xinhao Peng ◽

Zijian Zhang ◽

Jin Yi Lang

Keyword(s):

Machine Learning ◽

Cervical Cancer ◽

High Risk ◽

Network Analysis ◽

Differentially Expressed Genes ◽

Squamous Cell ◽

Differentially Expressed ◽

Prognostic Biomarkers ◽

Cox Regression Analysis ◽

Testing Set

e17000 Background: Cervical cancer is the fourth most frequently diagnosed malignancy in women worldwide. However, effective prognostic biomarkers for accurately identifying high-risk patients remain limited. Here, we provide a co-expression network and machine learning-based signature to predict the survival of patients with cervical cancer. Methods: Using expression profiles from The Cancer Genome Atlas datasets, we identified differentially expressed genes (DEGs) by differential expression analysis and the most significant module by Weighted Gene Co-expression Network Analysis. Candidate genes were obtained by combining both results. A prognostic classifier was then constructed by LASSO Cox regression analysis and validated in a testing set. Finally, survival receiver operating characteristic and Cox proportional hazards analyses were used to assess prognostic performance. Results: We identified 190 DEGs between cervical squamous cell cancer (CSCC) and normal samples in the purple module. We then built an 8-mRNA-based signature and determined an optimal cutoff value with a sensitivity of 0.889 and a specificity of 0.785. Patients were classified into high-risk and low-risk groups with significantly different overall survival (OS) (training set: p < 0.0001; testing set: p = 0.039). Furthermore, the prognostic classifier was an independent and powerful prognostic biomarker for OS (HR = 7.05, 95% CI: 2.52-19.71, p < 0.001). Conclusions: The prognostic classifier is a promising predictor for patients with CSCC. The novel co-expression network and machine learning-based strategy described in this study may have broad application in precision medicine.
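
The abstract reports an 8-mRNA LASSO Cox signature and an "optimal cutoff" with sensitivity 0.889 and specificity 0.785. It does not say how the cutoff was chosen. One common criterion is to maximise Youden's index, J = sensitivity + specificity - 1, and the sketch below assumes that criterion. It also makes two further assumptions:
- The outcome is a binary event label at a fixed time horizon, a simplification of time-dependent survival ROC.
- Each patient's risk score is the coefficient-weighted sum of gene expression, as is usual for Cox-based signatures.

All coefficients, scores, and function names are hypothetical.

```python
def risk_score(coefs, expression):
    """Linear predictor of a hypothetical Cox signature: sum of coef * expression."""
    return sum(c * x for c, x in zip(coefs, expression))


def youden_cutoff(scores, events):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    events: 1 = event observed by the chosen horizon, 0 = no event (assumed binary).
    Patients with score >= cutoff are called high-risk.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best = None
    for cut in sorted(set(scores)):                # every observed score is a candidate
        tp = sum(1 for s, e in zip(scores, events) if s >= cut and e == 1)
        tn = sum(1 for s, e in zip(scores, events) if s < cut and e == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:            # the lowest cutoff wins on ties
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]
```

In a real analysis the cutoff would be fixed on the training set and then applied unchanged to the testing set. This is what allows a separate validation p-value like the one reported.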

Identification of Differentially Expressed Genes between Original Breast Cancer and Xenograft Using Machine Learning Algorithms

Genes ◽

10.3390/genes9030155 ◽

2018 ◽

Vol 9 (3) ◽

pp. 155 ◽

Cited By ~ 36

Author(s):

Deling Wang ◽

Jia-Rui Li ◽

Yu-Hang Zhang ◽

Lei Chen ◽

Tao Huang ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Differentially Expressed Genes ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Differentially Expressed

MLDEG: A Machine Learning Approach to Identify Differentially Expressed Genes Using Network Property and Network Propagation

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3067613 ◽

2021 ◽

pp. 1-1

Author(s):

Ji Hwan Moon ◽

Sangseon Lee ◽

Minwoo Pak ◽

Benjamin Hur ◽

Sun Kim

Keyword(s):

Machine Learning ◽

Differentially Expressed Genes ◽

Differentially Expressed ◽

Learning Approach ◽

Network Property ◽

Machine Learning Approach ◽

Network Propagation
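
The MLDEG entry names network propagation as an input to its classifier. The paper's exact formulation is not given here, so the sketch below shows one widely used generic form: random walk with restart on an undirected gene network. Scores from seed genes diffuse to their neighbours, and at each step a fraction `restart` of the mass returns to the seeds. This is not presented as MLDEG's method. The gene names, the network, and the restart probability of 0.5 are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on an undirected graph.

    adj: dict mapping every node to a list of its neighbours (all neighbours must be keys).
    seeds: dict mapping seed nodes to non-negative initial scores (sum must be > 0).
    Iterates p <- restart * p0 + (1 - restart) * W p, where W spreads each node's
    score evenly over its neighbours, until the L1 change falls below tol.
    """
    nodes = list(adj)
    total = sum(seeds.values())
    p0 = {v: seeds.get(v, 0.0) / total for v in nodes}   # normalised restart vector
    p = dict(p0)
    for _ in range(max_iter):
        new = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            deg = len(adj[v])
            if deg == 0:
                continue                                 # isolated nodes pass on nothing
            share = (1.0 - restart) * p[v] / deg
            for u in adj[v]:
                new[u] += share
        diff = sum(abs(new[v] - p[v]) for v in nodes)
        p = new
        if diff < tol:
            break
    return p
```

On a three-gene path A-B-C seeded at A, the scores converge to 7/12, 1/3, and 1/12. Genes close to the seed keep more of the propagated signal, which is the property network-based methods exploit.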

Biomarker Screening And Prediction Model Construction of Esophageal Carcinoma Based On Bioinformatics

10.21203/rs.3.rs-915949/v1 ◽

2021 ◽

Author(s):

Yanzhou Zhang ◽

Qing Zhu ◽

Xiufeng Cao ◽

Bin Ni

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Differentially Expressed Genes ◽

Cross Validation ◽

Prediction Models ◽

Differentially Expressed ◽

Expression Data ◽

RNA-Seq ◽

Gene Modules ◽

Fold Cross Validation

Abstract Background and objective: Esophageal cancer (ESCA) ranks eleventh in incidence and eighth in mortality among malignant tumors worldwide. Owing to the lack of effective early diagnostic approaches, many patients miss the optimal treatment window and already have advanced disease at first diagnosis. Continuous advances in high-throughput sequencing technologies and analytical techniques have provided new concepts and approaches for studying biomarkers in esophageal cancer. Cancer development is a complex biological process involving multiple genes, interactions among multiple factors, and multiple stages. This process includes mutations in proto-oncogenes, changes in transcript expression profiles, and abnormalities in protein structure, function, or expression levels. Studying the molecular mechanisms of ESCA with high-throughput sequencing technology will lay a theoretical foundation for the early diagnosis and targeted therapy of ESCA. Materials and methods: In this study, we searched two commonly used public databases, UCSC XENA and GEO, and included one UCSC XENA RNA-seq dataset and two GEO datasets. Differential expression analysis was performed using limma in R. Weighted gene co-expression network analysis (WGCNA) was used to construct a topological network from the transcriptome expression profiles of 181 ESCA tissues and 181 normal control tissues. We constructed gene modules and searched for modules closely associated with ESCA. We then performed gene ontology (GO) and KEGG pathway enrichment analyses to explore the functions of the DEGs and of the differentially expressed hub genes in key modules. By combining the results of differential gene expression analysis with the WGCNA results (hub genes), we obtained 30 differentially expressed genes in the module closely associated with ESCA.
Next, we extracted the expression data of these genes from the normalized transcriptome expression data to construct an ESCA predictive model. Ten-fold cross-validation combined with machine learning algorithms was then used to construct prediction models for ESCA. Finally, we verified the four screened biomarkers used to build the predictive model with the GEO datasets. Results: Differential expression analysis was conducted with the limma package, and differentially expressed genes were defined as those with |log2FC| > 1 and adj.P.Val < 0.01. According to the limma results, a total of 15814 genes were up-regulated and 6176 genes were down-regulated in ESCA. WGCNA identified 7 gene modules, 2 of which were strongly correlated with ESCA (brown module: R2 = 0.87; lightcyan module: R2 = -0.75; both P < 0.001). The brown module was closely related to ESCA. Combining the WGCNA results with the differentially expressed genes revealed 4419 differentially expressed genes in the brown module that were closely related to ESCA.
The top 30 hub genes in the brown module were screened by kWithin, and all of them were differentially expressed. GO analysis of the differentially expressed genes from the brown module showed that these genes are located in cellular components such as the immunoglobulin complex, “chromosome, centromeric region”, condensed chromosome, “immunoglobulin complex, circulating”, and “condensed chromosome, centromeric region”. They participate in molecular functions such as antigen binding, immunoglobulin receptor binding, ATPase activity, cadherin binding, and DNA helicase activity. They are also involved in biological processes such as adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains, mitotic nuclear division, lymphocyte mediated immunity, nuclear division, and DNA replication. KEGG pathway analysis showed that the differentially expressed genes in the brown module are mainly enriched in pathways such as cell cycle, pathogenic Escherichia coli infection, DNA replication, IL-17 signaling pathway, and human T-cell leukemia virus 1 infection. These findings shed new light on the molecular mechanisms of ESCA development. Twelve ESCA prediction models were constructed from the expression data of the 30 genes in 362 subjects using 10-fold cross-validation combined with machine learning algorithms. All of them showed good prediction performance in the validation dataset, and the models from the gbm, BoostGLM, and C5.0 algorithms had higher accuracy than those from other algorithms. The transparent or semi-transparent models constructed by the JRip, PART, and Rpart algorithms had acceptable accuracy in the validation dataset, but their sensitivity was lower. Considering all aspects, two black-box models, gbm and BoostGLM, were selected as the final models.
This study successfully constructed ESCA prediction models with accuracies higher than 0.97. Finally, three of the four screened biomarkers were validated. Conclusions: In the current study, differential expression analysis and WGCNA of ESCA-related RNA-seq data from public databases were used to screen DEGs and genes closely associated with ESCA. GO and KEGG analyses further revealed the underlying mechanisms of ESCA. Normalized gene expression data were fed to several different machine learning techniques, and 10-fold cross-validation was used to construct highly accurate ESCA predictive models. Several of these models reached an accuracy higher than 0.96 in the validation group. Meanwhile, three biomarkers (G3BP1, CHEK1, and MOB1A) were screened and validated. In particular, G3BP1 may be a potential therapeutic target, as overall survival analysis has shown it to be an adverse prognostic factor. The current study lays the foundation for applying RNA-seq data to the early genetic diagnosis of ESCA and identifies a prognostic marker that might contribute to its treatment.
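
The Results define differentially expressed genes as |log2FC| > 1 and adj.P.Val < 0.01. In limma, `topTable` reports `adj.P.Val` using the Benjamini-Hochberg (BH) method by default. The sketch below therefore applies a BH adjustment to per-gene p-values and then both thresholds. It assumes the log2 fold changes and raw p-values have already been computed, so limma's moderated t-statistics are not reproduced. The function names, gene names, and values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0                                  # also caps adjusted values at 1
    for rank in range(m, 0, -1):                       # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvals, fc_cut=1.0, q_cut=0.01):
    """Split genes passing |log2FC| > fc_cut and BH-adjusted p < q_cut into up/down lists."""
    adj = benjamini_hochberg(pvals)
    up, down = [], []
    for gene, fc, q in zip(genes, log2fc, adj):
        if q < q_cut and abs(fc) > fc_cut:
            (up if fc > 0 else down).append(gene)
    return up, down
```

Applying the adjustment before thresholding matters when thousands of genes are tested at once, as in the 181 + 181 sample comparison described above. Otherwise a raw p < 0.01 cutoff would admit many false positives.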

Differentially expressed genes in mouse liver during development of fatty liver disease (FLD)

Zeitschrift für Gastroenterologie ◽

10.1055/s-2004-831736 ◽

2004 ◽

Vol 42 (08) ◽

Author(s):

M Villagrasa ◽

DM Klass ◽

KH Holzmann ◽

G Adler ◽

M Fuchs

Keyword(s):

Liver Disease ◽

Fatty Liver ◽

Differentially Expressed Genes ◽

Fatty Liver Disease ◽

Mouse Liver ◽

Differentially Expressed

Identification of Highly Differentially Expressed Genes in Primary Ovarian Cancer and Related Distant Metastasis Using RNA Sequencing

10.26226/morressier.599bdc79d462b80296ca16e8 ◽

2017 ◽

Author(s):

Hanna Sallinen

Keyword(s):

Ovarian Cancer ◽

RNA Sequencing ◽

Differentially Expressed Genes ◽

Distant Metastasis ◽

Differentially Expressed ◽

Primary Ovarian Cancer

Differentially expressed genes in peripheral blood of patients with dermatomyositis complicated by interstitial lung disease or malignant tumors

Chinese Journal of Dermatology ◽

10.35541/cjd.20190593 ◽

2019 ◽

Keyword(s):

Interstitial Lung Disease ◽

Lung Disease ◽

Differentially Expressed Genes ◽

Peripheral Blood ◽

Malignant Tumors ◽

Differentially Expressed

Cloning and expression analysis of differentially expressed genes in Chinese fir stems treated by different concentrations of exogenous IAA

Hereditas (Beijing) ◽

10.3724/sp.j.1005.2012.00472 ◽

2012 ◽

Vol 34 (4) ◽

pp. 472-484

Author(s):

Li-Wei YANG ◽

Ji-Sen SHI

Keyword(s):

Differentially Expressed Genes ◽

Expression Analysis ◽

Chinese Fir ◽

Differentially Expressed ◽

Cloning And Expression ◽

Exogenous IAA
