Stroke-associated pattern of gene expression previously identified by machine-learning is diagnostically robust in an independent patient population

Background: Breast cancer is a heterogeneous disease defined by molecular types and subtypes. Advances in genomic research have enabled the use of precision medicine in the clinical management of breast cancer. A critical unmet medical need is distinguishing triple negative breast cancer, the most aggressive and lethal form of breast cancer, from non-triple negative breast cancer. Here we propose a machine learning (ML) approach for classifying triple negative and non-triple negative breast cancer patients using gene expression data. Methods: We analyzed RNA-sequencing data from 110 triple negative and 992 non-triple negative breast cancer tumor samples from The Cancer Genome Atlas to select the features (genes) used in the development and validation of the classification models. We evaluated four classification models (Support Vector Machines, K-nearest neighbor, Naïve Bayes and Decision tree), using features selected at different threshold levels to train the models for classifying the two types of breast cancer. For performance evaluation and validation, the proposed methods were applied to independent gene expression datasets. Results: Among the four ML algorithms evaluated, the Support Vector Machine algorithm classified breast cancer into triple negative and non-triple negative breast cancer more accurately and with fewer misclassification errors than the other three algorithms. Conclusions: The prediction results show that ML algorithms are efficient and can be used for classification of breast cancer into triple negative and non-triple negative breast cancer types.
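As a rough illustration of how a classifier can be scored by its misclassification errors, the sketch below implements a minimal K-nearest neighbor classifier in plain Python. It is not the authors' pipeline: the two-gene expression values, the choice of k = 3, and the function names `knn_predict` and `misclassification_errors` are all assumptions made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label sample x by majority vote among its k nearest training samples."""
    # Pair each training sample's Euclidean distance to x with its label.
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def misclassification_errors(y_true, y_pred):
    """Count samples whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred))


# Made-up expression values for two hypothetical genes (not real TCGA data).
train_X = [[8.1, 1.2], [7.9, 0.9], [8.4, 1.5],
           [2.0, 6.8], [1.7, 7.1], [2.3, 6.5]]
train_y = ["TNBC", "TNBC", "TNBC", "non-TNBC", "non-TNBC", "non-TNBC"]

# Held-out samples standing in for an independent validation set.
test_X = [[7.5, 1.0], [2.1, 7.0]]
test_y = ["TNBC", "non-TNBC"]

pred = [knn_predict(train_X, train_y, x) for x in test_X]
errors = misclassification_errors(test_y, pred)
print(pred, errors)
```

The same error count could be computed for each of the four model families to compare them on a common held-out set.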

A New Machine Learning-Based Framework for Mapping Uncertainty Analysis in RNA-Seq Read Alignment and Gene Expression Estimation

Frontiers in Genetics ◽

10.3389/fgene.2018.00313 ◽

2018 ◽

Vol 9 ◽

Cited By ~ 6

Author(s):

Adam McDermaid ◽

Xin Chen ◽

Yiran Zhang ◽

Cankun Wang ◽

Shaopeng Gu ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Uncertainty Analysis ◽

RNA-Seq ◽

Read Alignment ◽

New Machine

A method of gene expression data transfer from cell lines to cancer patients for machine-learning prediction of drug efficiency

Cell Cycle ◽

10.1080/15384101.2017.1417706 ◽

2018 ◽

Vol 17 (4) ◽

pp. 486-491 ◽

Cited By ~ 22

Author(s):

Nicolas Borisov ◽

Victor Tkachev ◽

Maria Suntsova ◽

Olga Kovalchuk ◽

Alex Zhavoronkov ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cancer Patients ◽

Cell Lines ◽

Gene Expression Data ◽

Data Transfer ◽

Expression Data ◽

Drug Efficiency

Integrative machine learning analysis of multiple gene expression profiles in cervical cancer

PeerJ ◽

10.7717/peerj.5285 ◽

2018 ◽

Vol 6 ◽

pp. e5285 ◽

Cited By ~ 9

Author(s):

Mei Sze Tan ◽

Siow-Wee Chang ◽

Phaik Leng Cheah ◽

Hwa Jen Yap

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cervical Cancer ◽

Expression Profiles ◽

Gene Expression Profiles ◽

HPV Infection ◽

Gene Set Enrichment Analysis ◽

Multiple Gene ◽

Cervical Cancers ◽

Learning Analysis

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study genes that are differentially expressed in the eventual development of cervical cancers following HPV infection. In this study, we proposed an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with, and may eventually aid in, the diagnosis or prognosis of cervical cancers. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of individual datasets; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 gene expressions were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene expression set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and AS1B (downregulated genes).

Gene Expression Analysis for Early Lung Cancer Prediction Using Machine Learning Techniques: An Eco-Genomics Approach

IEEE Access ◽

10.1109/access.2018.2886604 ◽

2019 ◽

Vol 7 ◽

pp. 4232-4238 ◽

Cited By ~ 5

Author(s):

Jayadeep Pati

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Lung Cancer ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Machine Learning Techniques ◽

Cancer Prediction ◽

Early Lung Cancer ◽

Learning Techniques

Machine learning applied to whole-blood RNA-sequencing data uncovers distinct subsets of patients with systemic lupus erythematosus

10.1101/647719 ◽

2019 ◽

Author(s):

William A Figgett ◽

Katherine Monaghan ◽

Milica Ng ◽

Monther Alhamdoosh ◽

Eugene Maraskovsky ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Systemic Lupus Erythematosus ◽

Clinical Trials ◽

Lupus Erythematosus ◽

Whole Blood ◽

RNA-Seq ◽

Systemic Lupus ◽

Disease Heterogeneity ◽

Healthy Donors

Objective: Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disease that is difficult to treat. There is currently no optimal stratification of patients with SLE, and thus responses to available treatments are unpredictable. Here, we developed a new stratification scheme for patients with SLE, based on the whole-blood transcriptomes of patients with SLE. Methods: We applied machine learning approaches to RNA-sequencing (RNA-seq) datasets to stratify patients with SLE into four distinct clusters based on their gene expression profiles. A meta-analysis on two recently published whole-blood RNA-seq datasets was carried out and an additional similar dataset of 30 patients with SLE and 29 healthy donors was contributed in this research; 141 patients with SLE and 51 healthy donors were analysed in total. Results: Examination of SLE clusters, as opposed to unstratified SLE patients, revealed underappreciated differences in the pattern of expression of disease-related genes relative to clinical presentation. Moreover, gene signatures correlated to flare activity were successfully identified. Conclusion: Given that disease heterogeneity has confounded research studies and clinical trials, our approach addresses current unmet medical needs and provides a greater understanding of SLE heterogeneity in humans. Stratification of patients based on gene expression signatures may be a valuable strategy to harness disease heterogeneity and identify patient populations that may be at an increased risk of disease symptoms. Further, this approach can be used to understand the variability in responsiveness to therapeutics, thereby improving the design of clinical trials and advancing personalised therapy.

Predicting lineage-specific differences in open chromatin across dozens of mammalian genomes

10.1101/2020.12.04.410795 ◽

2020 ◽

Author(s):

Irene M. Kaplow ◽

Morgan E. Wirthlin ◽

Alyssa J. Lawler ◽

Ashley R. Brown ◽

Michael Kleyman ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Genome Sequence ◽

Evaluation Metrics ◽

Open Chromatin ◽

Learning Models ◽

Cell Type ◽

Mammalian Genomes ◽

Multiple Species ◽

Machine Learning Models

Many phenotypes have evolved through gene expression, meaning that differences between species are caused in part by differences in enhancers. Here, we demonstrate that we can accurately predict differences between species in open chromatin status at putative enhancers using machine learning models trained on genome sequence across species. We present a new set of criteria that we designed to explicitly demonstrate if models are useful for studying open chromatin regions whose orthologs are not open in every species. Our approach and evaluation metrics can be applied to any tissue or cell type with open chromatin data available from multiple species.

Peer Review #1 of "Integrative machine learning analysis of multiple gene expression profiles in cervical cancer (v0.1)"

10.7287/peerj.5285v0.1/reviews/1 ◽

2018 ◽

Author(s):

K Wong

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cervical Cancer ◽

Peer Review ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Multiple Gene ◽

Learning Analysis

Identifying Subgroups of Patients With Autism by Gene Expression Profiles Using Machine Learning Algorithms

Frontiers in Psychiatry ◽

10.3389/fpsyt.2021.637022 ◽

2021 ◽

Vol 12 ◽

Author(s):

Ping-I Lin ◽

Mohammad Ali Moni ◽

Susan Shur-Fen Gau ◽

Valsamma Eapen

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Microarray Data ◽

Social Communication ◽

Language Impairment ◽

Cholesterol Biosynthesis ◽

Learning Algorithms ◽

Fold Change ◽

Machine Learning Algorithms ◽

Communication Deficits

Objectives: The identification of subgroups of autism spectrum disorder (ASD) may partially remedy the problems of clinical heterogeneity to facilitate the improvement of clinical management. The current study aims to use machine learning algorithms to analyze microarray data to identify clusters with relatively homogeneous clinical features. Methods: The whole-genome gene expression microarray data were used to predict communication quotient (SCQ) scores against all probes to select differential expression regions (DERs). Gene set enrichment analysis was performed for DERs with a fold-change >2 to identify hub pathways that play a role in the severity of social communication deficits inherent to ASD. We then used two machine learning methods, random forest classification (RF) and support vector machine (SVM), to identify two clusters using DERs. Finally, we evaluated how accurately the clusters predicted language impairment. Results: A total of 191 DERs were initially identified, and 54 of them with a fold-change >2 were selected for the pathway analysis. Cholesterol biosynthesis and metabolism pathways appear to act as hubs that connect other trait-associated pathways to influence the severity of social communication deficits inherent to ASD. Both RF and SVM algorithms yielded a classification accuracy >90% when all 191 DERs were analyzed. The ASD subtypes defined by the presence of language impairment, a strong indicator for prognosis, can be predicted by transcriptomic profiles associated with social communication deficits and cholesterol biosynthesis and metabolism. Conclusion: The results suggest that both RF and SVM are acceptable options for machine learning algorithms to identify ASD subgroups characterized by clinical homogeneity related to prognosis.
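The fold-change >2 filter described in the Methods can be sketched as follows. This is only an illustrative example under stated assumptions: the abstract does not say how fold-change was computed, so the sketch assumes a ratio of group means on a linear scale and treats both up- and down-regulation as passing the threshold. The region names, expression values, and the functions `fold_change` and `select_by_fold_change` are hypothetical.

```python
from statistics import mean


def fold_change(group_a, group_b):
    """Ratio of mean expression in group_a to mean expression in group_b."""
    return mean(group_a) / mean(group_b)


def select_by_fold_change(regions, threshold=2.0):
    """Return names of regions whose fold-change exceeds the threshold.

    Assumption: a region counts if it changes by more than `threshold`
    in either direction, so we compare max(fc, 1 / fc) to the threshold.
    """
    selected = []
    for name, (group_a, group_b) in regions.items():
        fc = fold_change(group_a, group_b)
        if max(fc, 1 / fc) > threshold:
            selected.append(name)
    return selected


# Made-up expression values for three hypothetical regions.
regions = {
    "DER_1": ([10.0, 12.0], [4.0, 5.0]),  # mean 11.0 vs 4.5, ratio above 2
    "DER_2": ([6.0, 6.5], [5.0, 5.5]),    # mean 6.25 vs 5.25, ratio below 2
    "DER_3": ([2.0, 2.0], [9.0, 9.0]),    # mean 2.0 vs 9.0, inverse ratio 4.5
}

selected = select_by_fold_change(regions)
print(selected)
```

The retained regions would then be passed on to pathway analysis, while the classifiers in the study were run on the full set of DERs.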