Comparative Study of Disease Classification Using Multiple Machine Learning Models Based on Landmark and Non-Landmark Gene Expression Data

Abstract

Background: Interferon regulatory factor-8 (IRF8) and nuclear factor of activated T cells c1 (NFATc1) are two transcription factors that play important roles in osteoclast differentiation. Thanks to ChIP-seq technology, scientists can now estimate potential genome-wide target genes of IRF8 and NFATc1. However, finding target genes that are consistently up-regulated or down-regulated across different studies is hard because it requires analysis of a large number of high-throughput expression studies from a comparable context.

Method: We have developed a machine learning based method, called the Cohort-based TF target prediction system (cTAP), to overcome this problem. This method assumes that the pathway involving the transcription factors of interest is characterized by multiple “functional groups” of marker genes pertaining to the biological process concerned. It uses two notions, Gene-Present Sufficiently (GP) and Gene-Absent Insufficiently (GA), in addition to log2 fold changes of differentially expressed genes for the prediction. Target prediction is made by applying multiple machine-learning models, which learn the patterns of GP and GA from log2 fold changes and four types of Z scores from the normalized cohort’s gene expression data. The learned patterns are then associated with the putative transcription factor targets to identify genes that consistently exhibit Up/Down gene regulation patterns within the cohort. We applied this method to 11 publicly available GEO data sets related to osteoclastogenesis.

Result: Our experiment identified a small number of Up/Down IRF8 and NFATc1 target genes as relevant to osteoclast differentiation. The machine learning models using GP and GA produced NFATc1 and IRF8 target genes that differed from those obtained using log2 fold change alone. Our literature survey revealed that all predicted target genes have known roles in bone remodeling, specifically related to the immune system and osteoclast formation and functions, suggesting confidence and validity in our method.

Conclusion: cTAP was motivated by recognizing that biologists tend to use Z score values present in data sets for the analysis. However, using cTAP effectively presupposes assembling a sizable cohort of gene expression data sets within a comparable context. As public gene expression data repositories grow, cohort-based analysis methods like cTAP will become increasingly important.
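
The two quantities the abstract names as model inputs, log2 fold changes and Z scores, can be illustrated with a minimal sketch. It is not the cTAP implementation: the abstract does not define its four Z score types or the GP and GA notions, so the code shows only one generic per-gene Z score and a simple "same direction in every data set" check. All function names, the pseudocount, and the threshold are assumptions made for illustration.

```python
import math
import statistics


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean expression (treated over control).

    The pseudocount is an assumption added here to avoid log(0).
    """
    return math.log2(
        (statistics.fmean(treated) + pseudocount)
        / (statistics.fmean(control) + pseudocount)
    )


def z_scores(values):
    """Standardize one gene's expression values across the samples of a data set."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant gene carries no signal; report zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def consistent_direction(fold_changes, threshold=1.0):
    """Return "Up" or "Down" if every data set in the cohort agrees, else None.

    The threshold of 1.0 (a two-fold change) is a hypothetical choice.
    """
    if all(fc >= threshold for fc in fold_changes):
        return "Up"
    if all(fc <= -threshold for fc in fold_changes):
        return "Down"
    return None


# Toy cohort: one gene measured in two hypothetical data sets.
cohort_fold_changes = [
    log2_fold_change([7.0, 7.0], [1.0, 1.0]),
    log2_fold_change([15.0, 15.0], [3.0, 3.0]),
]
direction = consistent_direction(cohort_fold_changes)
```

A gene whose fold change flips sign between data sets yields `None` here, which mirrors the abstract's point that consistency across the cohort, rather than a single study, is what the method looks for.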

Leveraging TCGA gene expression data to build predictive models for cancer drug response

BMC Bioinformatics ◽

10.1186/s12859-020-03690-4 ◽

2020 ◽

Vol 21 (S14) ◽

Cited By ~ 3

Author(s):

Evan A. Clayton ◽

Toyya A. Pujol ◽

John F. McDonald ◽

Peng Qiu

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Predictive Models ◽

Drug Response ◽

Cancer Drug ◽

Expression Data ◽

Classification Methods ◽

Clustering And Classification ◽

Machine Learning Models

Abstract

Background: Machine learning has been utilized to predict cancer drug response from multi-omics data generated from sensitivities of cancer cell lines to different therapeutic compounds. Here, we build machine learning models using gene expression data from patients’ primary tumor tissues to predict whether a patient will respond positively or negatively to two chemotherapeutics: 5-Fluorouracil and Gemcitabine.

Results: We focused on 5-Fluorouracil and Gemcitabine because, based on our exclusion criteria, they provide the largest numbers of patients within TCGA. Normalized gene expression data were clustered and used as the input features for the study. We used matching clinical trial data to ascertain the response of these patients via multiple classification methods. Multiple clustering and classification methods were compared for prediction accuracy of drug response. Clara and random forest were found to be the best clustering and classification methods, respectively. The results show our models predict with up to 86% accuracy, despite the study’s limited sample size. We also found that the genes most informative for predicting drug response were enriched in well-known cancer signaling pathways, and we highlighted their potential significance in chemotherapy prognosis.

Conclusions: Primary tumor gene expression is a good predictor of cancer drug response. Investment in larger datasets containing both patient gene expression and drug response is needed to support future work on machine learning models. Ultimately, such predictive models may aid oncologists in making critical treatment decisions.
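
Comparing classification methods by prediction accuracy, as the abstract describes, can be sketched with a small cross-validation harness. This is an illustration under stated assumptions, not the study's pipeline: the two classifiers below (nearest centroid and 1-nearest-neighbor) are simple stand-ins rather than Clara or random forest, the fold assignment is a plain round-robin, and the data are invented toy values.

```python
import math


def nearest_centroid(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        d = math.dist(centroid, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def one_nearest_neighbor(train_x, train_y, x):
    """Predict the label of the single closest training sample."""
    i = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], x))
    return train_y[i]


def cv_accuracy(classifier, xs, ys, k=3):
    """Pooled accuracy over k folds; sample i is held out in fold i % k."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(classifier(tx, ty, xs[i]) == ys[i] for i in held_out)
    return correct / len(xs)


# Invented toy features (e.g. two cluster-level expression summaries per patient).
xs = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
ys = ["non-responder"] * 3 + ["responder"] * 3

results = {
    "nearest_centroid": cv_accuracy(nearest_centroid, xs, ys),
    "one_nearest_neighbor": cv_accuracy(one_nearest_neighbor, xs, ys),
}
```

Evaluating every method on the same held-out folds keeps the comparison fair; with small cohorts such as the one the abstract mentions, the resulting accuracy estimates can still vary considerably between splits.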

A comparative study of different machine learning methods on microarray gene expression data

BMC Genomics ◽

10.1186/1471-2164-9-s1-s13 ◽

2008 ◽

Vol 9 (Suppl 1) ◽

pp. S13 ◽

Cited By ~ 111

Author(s):

Mehdi Pirooznia ◽

Jack Y Yang ◽

Mary Qu Yang ◽

Youping Deng

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Comparative Study ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Learning Methods ◽

Microarray Gene Expression ◽

Machine Learning Methods ◽

Microarray Gene

Peer Review #3 of "A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data (v0.1)"

10.7287/peerj-cs.270v0.1/reviews/3 ◽

2020 ◽

Author(s):

MM Zegarra Pimenta

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Deep Learning ◽

Comparative Study ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Cancer Types ◽

Microarray Gene

Peer Review #1 of "A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data (v0.1)"

10.7287/peerj-cs.270v0.1/reviews/1 ◽

2020 ◽

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Deep Learning ◽

Comparative Study ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Cancer Types ◽

Microarray Gene

Peer Review #3 of "A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data (v0.2)"

10.7287/peerj-cs.270v0.2/reviews/3 ◽

2020 ◽

Author(s):

MM Zegarra Pimenta

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Deep Learning ◽

Comparative Study ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Cancer Types ◽

Microarray Gene

A Machine Learning-Based Comparative Study for the Classification of Septic Shock Using Microarray Gene Expression Data

2019 International Conference on Frontiers of Information Technology (FIT) ◽

10.1109/fit47737.2019.00022 ◽

2019 ◽

Author(s):

Arslan Ali ◽

Ayesha Hanif ◽

Arsalan Tahir ◽

Hafiz Umar Iftikhar ◽

Huma Shehwana ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Septic Shock ◽

Comparative Study ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene
