Classification models for Invasive Ductal Carcinoma Progression, based on gene expression data-trained supervised machine learning

Early detection of breast cancer and correct determination of its stage are important for prognosis and for providing appropriate personalized clinical treatment to breast cancer patients. However, despite considerable efforts and progress, there is a need to identify the specific genomic factors responsible for, or accompanying, the progression stages of Invasive Ductal Carcinoma (IDC), which can aid the determination of the correct cancer stage. We have developed two-class machine-learning classification models to differentiate the early and late stages of invasive ductal carcinoma. The prediction models are trained with RNA-seq gene expression profiles representing different IDC stages of 610 patients, obtained from The Cancer Genome Atlas (TCGA). Different supervised learning algorithms were trained and evaluated, with model learning enriched by different feature selection methods. We also developed a machine-learning classifier trained on the same datasets, with the training sets reduced to the data corresponding to IDC driver genes. Based on these two classifiers, we have developed a web server, Duct-BRCA-CSP, to distinguish early from late stages of IDC based on input RNA-seq gene expression profiles. Our analysis also enables deeper insights into the stage-dependent molecular events accompanying breast ductal carcinoma progression. The server is publicly available at http://bioinfo.icgeb.res.in/duct-BRCA-CSP.

Scientific Reports ◽

10.1038/s41598-020-60740-w ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Shikha Roy ◽

Rakesh Kumar ◽

Vaibhav Mittal ◽

Dinesh Gupta

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Invasive Ductal Carcinoma ◽

Gene Expression Data ◽

Ductal Carcinoma ◽

Supervised Machine Learning ◽

Expression Data ◽

Classification Models

Gene expression profiles of epithelial cells microscopically isolated from a breast-invasive ductal carcinoma and a nodal metastasis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.0408260101 ◽

2004 ◽

Vol 101 (52) ◽

pp. 18147-18152 ◽

Cited By ~ 74

Author(s):

I. Zucchi ◽

E. Mento ◽

V. A. Kuznetsov ◽

M. Scotti ◽

V. Valsecchi ◽

...

Keyword(s):

Gene Expression ◽

Epithelial Cells ◽

Invasive Ductal Carcinoma ◽

Ductal Carcinoma ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Nodal Metastasis ◽

Breast Invasive Ductal Carcinoma

Bulk and single-cell RNA-seq reveal dmrtb1 gene expression profiles during sex change in zig-zag eel (Mastacembelus armatus)

Aquaculture ◽

10.1016/j.aquaculture.2021.737194 ◽

2021 ◽

pp. 737194

Author(s):

Lingzhan Xue ◽

Dan Jia ◽

Luohao Xu ◽

Zhen Huang ◽

Haiping Fan ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sex Change ◽

RNA Seq ◽

Mastacembelus Armatus

Breast Cancer Type Classification Using Machine Learning

Journal of Personalized Medicine ◽

10.3390/jpm11020061 ◽

2021 ◽

Vol 11 (2) ◽

pp. 61

Author(s):

Jiande Wu ◽

Chindo Hicks

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Machine Learning ◽

Triple Negative Breast Cancer ◽

Triple Negative ◽

Genomic Research ◽

Support Vector ◽

Cancer Type ◽

Classification Models

Background: Breast cancer is a heterogeneous disease defined by molecular types and subtypes. Advances in genomic research have enabled the use of precision medicine in the clinical management of breast cancer. A critical unmet medical need is distinguishing triple negative breast cancer, the most aggressive and lethal form of breast cancer, from non-triple negative breast cancer. Here we propose the use of a machine learning (ML) approach for classification of triple negative and non-triple negative breast cancer patients using gene expression data. Methods: We analyzed RNA-seq data from 110 triple negative and 992 non-triple negative breast cancer tumor samples from The Cancer Genome Atlas to select the features (genes) used in the development and validation of the classification models. We evaluated four different classification models, namely Support Vector Machines, K-nearest neighbor, Naïve Bayes and Decision tree, using features selected at different threshold levels to train the models for classifying the two types of breast cancer. For performance evaluation and validation, the proposed methods were applied to independent gene expression datasets. Results: Among the four ML algorithms evaluated, the Support Vector Machine algorithm classified breast cancer into triple negative and non-triple negative types more accurately and with fewer misclassification errors than the other three algorithms. Conclusions: The prediction results show that ML algorithms are efficient and can be used to classify breast cancer into triple negative and non-triple negative types.
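The workflow described in this abstract (select genes at a threshold, then train a classifier on the selected genes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy expression values, the scoring rule (absolute difference of class means divided by the pooled standard deviation), the threshold of 1.0, and the function names `select_features` and `knn_predict` are all assumptions made for illustration. K-nearest neighbor is used here only because it is one of the four models the abstract names and is simple to write with the Python standard library.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def select_features(X, y, threshold):
    """Keep gene indices whose absolute class-mean difference, scaled by
    the standard deviation over all samples, is at least `threshold`."""
    selected = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos + neg) or 1.0  # avoid division by zero
        score = abs(mean(pos) - mean(neg)) / spread
        if score >= threshold:
            selected.append(j)
    return selected


def knn_predict(X_train, y_train, x, features, k=3):
    """Predict the label of `x` by majority vote among its k nearest
    training samples, using Euclidean distance on the selected genes."""
    distances = []
    for row, label in zip(X_train, y_train):
        d = math.sqrt(sum((row[j] - x[j]) ** 2 for j in features))
        distances.append((d, label))
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy expression matrix: rows are samples, columns are genes.
# Gene 0 separates the two classes; genes 1 and 2 are noise.
X_train = [
    [9.1, 2.0, 5.1],
    [8.7, 2.3, 4.8],
    [9.4, 1.9, 5.0],
    [2.2, 2.1, 5.2],
    [1.8, 2.2, 4.9],
    [2.5, 2.0, 5.0],
]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = triple negative, 0 = non-triple negative

features = select_features(X_train, y_train, threshold=1.0)
prediction = knn_predict(X_train, y_train, [8.9, 2.2, 5.0], features, k=3)
print(features, prediction)
```

Raising or lowering `threshold` changes how many genes reach the classifier, which is the knob the abstract refers to as "different threshold levels". A real analysis would score features on training data only and evaluate on held-out or independent datasets, as the abstract describes.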

Integrative machine learning analysis of multiple gene expression profiles in cervical cancer

PeerJ ◽

10.7717/peerj.5285 ◽

2018 ◽

Vol 6 ◽

pp. e5285 ◽

Cited By ~ 9

Author(s):

Mei Sze Tan ◽

Siow-Wee Chang ◽

Phaik Leng Cheah ◽

Hwa Jen Yap

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cervical Cancer ◽

Expression Profiles ◽

Gene Expression Profiles ◽

HPV Infection ◽

Gene Set Enrichment Analysis ◽

Multiple Gene ◽

Cervical Cancers ◽

Learning Analysis

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study the genes that stand out differentially in the eventual development of cervical cancers following HPV infection. In this study, we proposed an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with, and may eventually aid in, the diagnosis or prognosis of cervical cancers. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of individual datasets; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 gene expressions were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene expression set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and AS1B (downregulated genes).

Molecular classification of selective oestrogen receptor modulators on the basis of gene expression profiles of breast cancer cells expressing oestrogen receptor α

British Journal of Cancer ◽

10.1038/sj.bjc.6600477 ◽

2002 ◽

Vol 87 (4) ◽

pp. 449-456 ◽

Cited By ~ 16

Author(s):

A S Levenson ◽

I L Kliakhandler ◽

K M Svoboda ◽

K M Pease ◽

S A Kaiser ◽

...

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Oestrogen Receptor ◽

Breast Cancer Cells ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Molecular Classification ◽

Selective Oestrogen Receptor Modulators ◽

Receptor Modulators

A tensor decomposition-based integrated analysis applicable to multiple gene expression profiles without sample matching

10.21203/rs.3.rs-766884/v2 ◽

2021 ◽

Author(s):

Taguchi Y-h. ◽

Turki Turki

Keyword(s):

Gene Expression ◽

Learning Strategies ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Tensor Decomposition ◽

Integrated Analysis ◽

RNA Seq ◽

Prior Learning ◽

Multiple Gene ◽

Machine Learning Applications

The integrated analysis of multiple gene expression profiles measured in distinct studies is always problematic. In particular, the lack of sample matching and of common labeling between distinct studies prevents the integration of multiple studies in a fully data-driven and unsupervised manner. In this study, we propose a strategy enabling the integration of multiple gene expression profiles among multiple independent studies without either labeling or sample matching, using tensor decomposition-based unsupervised feature extraction. As an example, we applied this strategy to Alzheimer’s disease (AD)-related gene expression profiles that lack exact correspondence among samples, as well as to AD single-cell RNA-seq (scRNA-seq) data. We found that we could select biologically reasonable genes with integrated analysis. Overall, integrated gene expression profiles can function analogously to prior learning and/or transfer learning strategies in other machine learning applications. For scRNA-seq, the proposed approach was able to drastically reduce the required computational memory.

Peer Review #1 of "Integrative machine learning analysis of multiple gene expression profiles in cervical cancer (v0.1)"

10.7287/peerj.5285v0.1/reviews/1 ◽

2018 ◽

Author(s):

K Wong

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cervical Cancer ◽

Peer Review ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Multiple Gene ◽

Learning Analysis

Real-World Data Explores New Gene Expression Profiles in Breast Cancer

Oncology Times ◽

10.1097/01.cot.0000734340.12249.66 ◽

2021 ◽

Vol 43 (3) ◽

pp. 33-33

Author(s):

Sarah LaCorte

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Real World ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Real World Data ◽

World Data ◽

New Gene

The characteristics of mesenteric adipose tissue attached to different intestinal segments and their roles in immune regulation

AJP Gastrointestinal and Liver Physiology ◽

10.1152/ajpgi.00256.2021 ◽

2022 ◽

Author(s):

Haowei Zhang ◽

Yujin Ding ◽

Qin Zeng ◽

Dandan Wang ◽

Ganglei Liu ◽

...

Keyword(s):

Gene Expression ◽

Adipose Tissue ◽

Immune Regulation ◽

Expression Profiles ◽

Critical Role ◽

Gene Expression Profiles ◽

RNA Seq ◽

Mesenteric Adipose Tissue ◽

Cell Components ◽

Subsequent Effect

Background: Mesenteric adipose tissue (MAT) plays a critical role in the intestinal physiological ecosystem. The small and large intestines evidently have intrinsic and distinct characteristics. However, whether the mesenteries adjacent to the small and large intestines (SMAT and LMAT) differ has not been properly characterized. We studied the important facets of these differences, such as morphology, gene expression, cell components and immune regulation of MATs, to characterize the mesenteric differences. Methods: The SMAT and LMAT of mice were used to compare tissue morphology. Paired mesenteric samples were analyzed by RNA-seq to clarify gene expression profiles. MAT partial excision models were constructed to illustrate the immune regulation roles of MATs, and 16S-seq was applied to detect the subsequent effect on microbiota. Results: Our data show that different segments of the mesentery have different morphological structures. SMAT not only has smaller adipocytes but also contains more fat-associated lymphoid clusters than LMAT. The gene expression profile also differs between these two MATs in mice. B-cell markers were abundantly expressed in SMAT, while development-related genes were highly expressed in LMAT. Adipose-derived stem cells of LMAT exhibited higher adipogenic potential and lower proliferation rates than those of SMAT. In addition, SMAT and LMAT play different roles in immune regulation and subsequently affect microbiota components. Finally, our data clarified the described differences between SMAT and LMAT in humans. Conclusions: There were significant differences in cell morphology, gene expression profiles, cell components, biological characteristics, and immune and microbiota regulation roles between regional MATs.
