Development of an absolute assignment predictor for triple-negative breast cancer subtyping using machine learning approaches

Abstract: Triple-negative breast cancer (TNBC) heterogeneity represents one of the main impediments to precision medicine for this disease. Recent concordant transcriptomics studies have shown that TNBC can be split into at least three subtypes with potential therapeutic implications. Although a few studies have predicted TNBC subtype from transcriptomics data, subtyping was only partially sensitive and was limited by batch effects and dependence on a given dataset, which may hinder the switch to routine diagnostic testing. Therefore, we sought to build an absolute predictor (i.e. intra-patient diagnosis) based on a machine learning algorithm with a limited number of probes. To this end, we started by introducing binary comparisons between probes for each patient (indicators) and based the predictive analysis on this transformed data. Probe selection was first performed by combining filter and wrapper methods for variable selection using cross-validation. We then tested three prediction models (random forest, gradient boosting [GB] and extreme gradient boosting) using this optimal subset of indicators as inputs. Nested cross-validation allowed us to consistently choose the best model. Results showed that the 50 selected indicators highlighted biological characteristics associated with each TNBC subtype. The GB model based on this subset of indicators performed better than the other models.
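
The abstract does not spell out how the binary probe comparisons are computed. A minimal sketch of one plausible reading follows: for each patient, every pair of probes yields an indicator equal to 1 when the first probe is more highly expressed than the second. The function name `pairwise_indicators`, the probe names and the intensity values are all hypothetical and chosen only for illustration; the authors' actual transformation may differ.

```python
from itertools import combinations


def pairwise_indicators(expression):
    """Map one patient's probe intensities to within-patient binary indicators.

    Assumption: an indicator for the pair (a, b) is 1 if probe a is expressed
    more highly than probe b in this patient, and 0 otherwise.
    """
    probes = sorted(expression)
    return {
        (a, b): int(expression[a] > expression[b])
        for a, b in combinations(probes, 2)
    }


# Hypothetical probe intensities for a single patient (illustrative values).
patient = {"probe_A": 7.2, "probe_B": 5.1, "probe_C": 9.4}
indicators = pairwise_indicators(patient)

# A per-sample shift and rescale, used here as a crude stand-in for a batch
# effect. Any strictly increasing transform preserves the probe ordering.
shifted = {name: 2.0 * value + 3.0 for name, value in patient.items()}
shifted_indicators = pairwise_indicators(shifted)

print(indicators)
print(indicators == shifted_indicators)
```

Because each indicator depends only on the ordering of probes within one sample, it is unchanged by such per-sample transforms, which is consistent with the stated goal of an intra-patient predictor that does not depend on a reference dataset.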

Breast Cancer Type Classification Using Machine Learning

Journal of Personalized Medicine, 2021, Vol 11 (2), pp. 61. DOI: 10.3390/jpm11020061

Author(s): Jiande Wu, Chindo Hicks

Keyword(s): Breast Cancer, Gene Expression, Machine Learning, Triple Negative Breast Cancer, Triple Negative, Genomic Research, Support Vector, Cancer Type, Classification Models

Background: Breast cancer is a heterogeneous disease defined by molecular types and subtypes. Advances in genomic research have enabled the use of precision medicine in the clinical management of breast cancer. A critical unmet medical need is distinguishing triple negative breast cancer, the most aggressive and lethal form of breast cancer, from non-triple negative breast cancer. Here we propose the use of a machine learning (ML) approach for classification of triple negative breast cancer and non-triple negative breast cancer patients using gene expression data. Methods: We analyzed RNA-Sequence data from 110 triple negative and 992 non-triple negative breast cancer tumor samples from The Cancer Genome Atlas to select the features (genes) used in the development and validation of the classification models. We evaluated four different classification models (Support Vector Machines, K-nearest neighbor, Naïve Bayes and Decision tree), using features selected at different threshold levels to train the models for classifying the two types of breast cancer. For performance evaluation and validation, the proposed methods were applied to independent gene expression datasets. Results: Among the four ML algorithms evaluated, the Support Vector Machine algorithm classified breast cancer into triple negative and non-triple negative breast cancer more accurately and had fewer misclassification errors than the other three algorithms. Conclusions: The prediction results show that ML algorithms are efficient and can be used for classification of breast cancer into triple negative and non-triple negative breast cancer types.
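
The sample counts in the Methods (110 triple negative versus 992 non-triple negative tumors) are strongly imbalanced, which affects how accuracy and misclassification counts should be read. The short calculation below is a sketch, not part of the study: it assumes a hypothetical baseline that labels every tumor as non-triple negative and shows the accuracy and balanced accuracy such a baseline would reach on these counts.

```python
# Sample counts taken from the abstract above.
n_tnbc = 110
n_non_tnbc = 992
total = n_tnbc + n_non_tnbc

# Hypothetical baseline (an assumption for illustration): predict
# "non-triple negative" for every tumor.
baseline_accuracy = n_non_tnbc / total

# The baseline never detects a triple negative tumor, so sensitivity is 0,
# while every non-triple negative tumor is labeled correctly.
sensitivity = 0 / n_tnbc
specificity = n_non_tnbc / n_non_tnbc
balanced_accuracy = (sensitivity + specificity) / 2

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"balanced accuracy: {balanced_accuracy:.3f}")
```

A classifier evaluated on data with this class ratio is therefore more informative when per-class errors or balanced metrics are reported alongside overall accuracy.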

Machine Learning Helps Identify New Drug Mechanisms in Triple-Negative Breast Cancer

IEEE Transactions on NanoBioscience, 2018, Vol 17 (3), pp. 251-259. DOI: 10.1109/tnb.2018.2851997. Cited By ~ 2

Author(s): Arjun P. Athreya, Alan J. Gaglio, Junmei Cairns, Krishna R. Kalari, Richard M. Weinshilboum, ...

Keyword(s): Breast Cancer, Machine Learning, Triple Negative Breast Cancer, Triple Negative, New Drug

Abstract P2-06-25: A phenotypic screening and machine learning platform efficiently identifies triple negative breast cancer-selective and readily druggable targets

2019. DOI: 10.1158/1538-7445.sabcs18-p2-06-25

Author(s): P Gautam, A Jaiswal, T Aittokallio, A-A Hassan, K Wennerberg

Keyword(s): Breast Cancer, Machine Learning, Triple Negative Breast Cancer, Triple Negative, Phenotypic Screening, Learning Platform, Druggable Targets

Machine learning predicts rapid relapse of triple negative breast cancer

2019. DOI: 10.1101/613604

Author(s): Yiqing Zhang, William Nock, Meghan Wyse, Zachary Weber, Elizabeth Adams, ...

Keyword(s): Breast Cancer, Machine Learning, Triple Negative Breast Cancer, Triple Negative, Support Vector, Chi Square, Genomic Features, Immune Signatures, Metastatic Relapse, Significant Difference

Abstract. Purpose: Metastatic relapse of triple-negative breast cancer (TNBC) within 2 years of diagnosis is associated with particularly aggressive disease and a distinct clinical course relative to TNBCs that relapse beyond 2 years. We hypothesized that rapid relapse TNBCs (rrTNBC; metastatic relapse or death <2 years) reflect unique genomic features relative to late relapse (lrTNBC; >2 years). Patients and Methods: We identified 453 primary TNBCs from three publicly available datasets and characterized each as rrTNBC, lrTNBC, or ‘no relapse’ (nrTNBC: no relapse/death with at least 5 years follow-up). We compiled primary tumor clinical and multi-omic data, including transcriptome (n=453), copy number alterations (CNAs; n=317), and mutations in 171 cancer-related genes (n=317), then calculated published gene expression and immune signatures. Results: Patients with rrTNBC were higher stage at diagnosis (Chi-square p<0.0001) while lrTNBC were more likely to be non-basal PAM50 subtype (Chi-square p=0.03). Among 125 expression signatures, five immune signatures were significantly higher in nrTNBCs while lrTNBC were enriched for eight estrogen/luminal signatures (all FDR p<0.05). There was no significant difference in tumor mutation burden or percent genome altered across the groups. Among mutations, only TP53 mutations were significantly more frequent in rrTNBC compared to lrTNBC (Fisher exact FDR p=0.009). To develop an optimal classifier, we used 77 significant clinical and ‘omic features to evaluate six modeling approaches encompassing simple, machine learning, and artificial neural network (ANN) models. Support vector machine outperformed other models with average receiver-operator characteristic area under curve >0.75. Conclusions: We provide a new approach to define TNBCs based on timing of relapse. We identify distinct clinical and genomic features that can be incorporated into machine learning models to predict rapid relapse of TNBC.
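
The three relapse groups are defined by simple time thresholds, which can be sketched as a labeling function. This is an illustration under stated assumptions, not the authors' code: the function name `relapse_group` and its two arguments are hypothetical, and cases the abstract does not define (an event at exactly 2 years, or no event with less than 5 years of follow-up) are returned as `None` rather than guessed.

```python
from typing import Optional


def relapse_group(event_years: Optional[float],
                  followup_years: float) -> Optional[str]:
    """Assign a relapse group from the thresholds stated in the abstract.

    event_years: time from diagnosis to metastatic relapse or death,
                 or None if neither occurred (hypothetical encoding).
    followup_years: total follow-up time in years.
    """
    if event_years is not None:
        if event_years < 2:
            return "rrTNBC"   # rapid relapse: event before 2 years
        if event_years > 2:
            return "lrTNBC"   # late relapse: event after 2 years
        return None           # exactly 2 years is not defined in the abstract
    if followup_years >= 5:
        return "nrTNBC"       # no relapse/death with at least 5 years follow-up
    return None               # too little follow-up to assign a group


print(relapse_group(1.2, 1.2))    # event at 1.2 years
print(relapse_group(3.5, 6.0))    # event at 3.5 years
print(relapse_group(None, 7.0))   # no event, 7 years of follow-up
print(relapse_group(None, 3.0))   # no event, only 3 years of follow-up
```

Returning `None` for undefined cases keeps the sketch from silently assigning patients to a group the source does not describe.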

Molecular Classification Models for Triple Negative Breast Cancer Subtype Using Machine Learning

Journal of Personalized Medicine, 2021, Vol 11 (9), pp. 881. DOI: 10.3390/jpm11090881

Author(s): Rassanee Bissanum, Sitthichok Chaichulee, Rawikant Kamolphiwong, Raphatphorn Navakanitworakul, Kanyanatt Kanokwiroon

Keyword(s): Breast Cancer, Gene Expression, Machine Learning, Triple Negative Breast Cancer, Triple Negative, Gene Expression Pattern, Classification Models, Gene Set, Svm Algorithm

Triple negative breast cancer (TNBC) lacks well-defined molecular targets and is highly heterogeneous, making treatment challenging. Using gene expression analysis, TNBC has been classified into four different subtypes: basal-like immune-activated (BLIA), basal-like immune-suppressed (BLIS), mesenchymal (MES), and luminal androgen receptor (LAR). However, there is currently no standardized method for classifying TNBC subtypes. We attempted to define a gene signature for each subtype, and to develop a classification method based on machine learning (ML) for TNBC subtyping. In these experiments, gene expression microarray data for TNBC patients were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) unique to 198 known TNBC cases were identified and selected as a training gene set to train seven different classification models. We produced a training set consisting of 719 DEGs selected from uniquely expressed genes of all four subtypes. The highest average accuracy of classification of the BLIA, BLIS, MES, and LAR subtypes was achieved by the SVM algorithm (accuracy 95–98.8%; AUC 0.99–1.00). For model validation, we used 334 samples of unknown TNBC subtypes, of which 97 (29.04%), 73 (21.86%), 39 (11.68%) and 59 (17.66%) were predicted to be BLIA, BLIS, MES, and LAR, respectively. However, 66 TNBC samples (19.76%) could not be assigned to any subtype. These samples contained only three upregulated genes (EN1, PROM1, and CCL2). Each TNBC subtype had a unique gene expression pattern, which was confirmed by identification of DEGs and pathway analysis. These results indicated that our training gene set was suitable for development of classification models, and that the SVM algorithm could classify TNBC into four unique subtypes. Accurate and consistent classification of the TNBC subtypes is essential for personalized treatment and prognosis of TNBC.
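
The validation counts reported in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the dictionary name `predicted_counts` is chosen here for illustration.

```python
# Validation-set counts as reported in the abstract (334 samples in total).
predicted_counts = {
    "BLIA": 97,
    "BLIS": 73,
    "MES": 39,
    "LAR": 59,
    "unassigned": 66,
}

total = sum(predicted_counts.values())

# Share of the validation set in each category, in percent.
percentages = {
    label: round(100 * count / total, 2)
    for label, count in predicted_counts.items()
}

print(total)
print(percentages)
```

The five categories add up to the stated 334 samples, and the computed shares match the percentages given in the text.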
