SSA.ME Detection of cancer mutual exclusivity patterns by small subnetwork analysis

Mapping Intimacies

10.1101/034124

2015

Author(s):

Sergio Pulido-Tamayo

Bram Weytjens

Dries De Maeyer

Kathleen Marchal

Keyword(s):

Clonal Evolution

Complex Problem

Added Value

Breast Cancer Dataset

Mutual Exclusivity

Driver Genes

Cancer Dataset

Fitness Advantage

Mutational Frequency

Subnetwork Analysis

Because of its clonal evolution, a tumor rarely contains multiple genomic alterations in the same pathway, as disrupting the pathway through one gene is often sufficient to confer the complete fitness advantage. As a result, mutated genes display patterns of mutual exclusivity across tumors. The identification of such patterns has been exploited to detect cancer drivers. The complex problem of searching for mutual exclusivity across individuals has previously been tackled by filtering the input data upfront, analyzing only genes mutated in numerous samples. These stringent filtering criteria come at the expense of missing rarely mutated driver genes. To overcome this problem, we present SSA.ME, a network-based method to detect mutually exclusive genes across tumors that does not depend on stringent filtering. Analyzing the TCGA breast cancer dataset illustrates the added value of SSA.ME: despite not using mutational-frequency-based prefiltering, well-known recurrently mutated drivers could still be highly prioritized. In addition, we prioritized several rarely mutated genes that displayed mutual exclusivity and pathway connectivity with well-known drivers. We expect the proposed framework to be applicable to other complex biological problems because it can process large datasets in polynomial time and is intuitive to implement.
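The abstract does not give SSA.ME's scoring function. The sketch below only illustrates what "mutual exclusivity across tumors" means numerically. It uses a simple coverage-minus-overlap score: a set of genes scores well when it covers many samples and rarely hits the same sample twice. This score, the gene-to-samples dictionary layout and the toy gene names are assumptions made for illustration, not the authors' method. The exhaustive pair search is the combinatorial cost that frequency prefiltering is normally used to tame.

```python
from itertools import combinations

def covered_samples(gene_set, mutations):
    """Samples with a mutation in at least one gene of the set."""
    covered = set()
    for gene in gene_set:
        covered |= mutations.get(gene, set())
    return covered

def exclusivity_score(gene_set, mutations):
    """Coverage minus overlap (an illustrative score, not SSA.ME's).

    Coverage counts samples hit by the set; overlap counts the extra hits in
    samples mutated in more than one gene of the set. A perfectly mutually
    exclusive set has zero overlap, so its score equals its coverage.
    """
    coverage = len(covered_samples(gene_set, mutations))
    total_hits = sum(len(mutations.get(gene, set())) for gene in gene_set)
    overlap = total_hits - coverage
    return coverage - overlap

def best_pair(mutations):
    """Score every gene pair exhaustively; feasible only for small inputs."""
    return max(combinations(sorted(mutations), 2),
               key=lambda pair: exclusivity_score(pair, mutations))

# Hypothetical toy data: gene -> set of sample IDs carrying a mutation.
mutations = {
    "geneA": {"s1", "s2", "s3"},
    "geneB": {"s4"},              # rarely mutated, but exclusive with geneA
    "geneC": {"s1", "s2", "s4"},
}
for pair in combinations(sorted(mutations), 2):
    print(pair, exclusivity_score(pair, mutations))
print("best:", best_pair(mutations))
```

In this toy input, geneB is mutated in a single sample. A minimum-frequency filter of two samples would discard it, yet it forms the most exclusive pair with geneA. That is the kind of rarely mutated candidate the abstract argues prefiltering misses.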

SSA-ME Detection of cancer driver genes using mutual exclusivity by small subnetwork analysis

Scientific Reports

10.1038/srep36257

2016

Vol 6 (1)

Cited By ~ 10

Author(s):

Sergio Pulido-Tamayo

Bram Weytjens

Dries De Maeyer

Kathleen Marchal

Keyword(s):

Mutual Exclusivity

Driver Genes

Cancer Driver

Cancer Driver Genes

Subnetwork Analysis

The landscape of actionable genomic alterations in cell-free circulating tumor DNA from 21,807 advanced cancer patients

10.1101/233205

2017

Cited By ~ 2

Author(s):

Oliver A. Zill

Kimberly C. Banks

Stephen R. Fairclough

Stefanie A. Mortimer

James V. Vowles

...

Keyword(s):

Deep Sequencing

Clonal Evolution

Patient Treatment

Large Set

Sequencing Analysis

Cancer Genes

Sequencing Data

Mutual Exclusivity

Advanced Cancer Patients

Driver Genes

Abstract Cell-free DNA (cfDNA) sequencing provides a non-invasive method for obtaining actionable genomic information to guide personalized cancer treatment. However, the presence in circulation of multiple alterations related to treatment and tumor heterogeneity poses analytical challenges. We present the somatic mutation landscape of 70 cancer genes from cfDNA deep-sequencing analysis of 21,807 patients with treated, late-stage cancers across >50 cancer types. Patterns and prevalence of cfDNA alterations in major driver genes for non-small cell lung, breast, and colorectal cancer largely recapitulated those from tumor tissue sequencing compendia (TCGA and COSMIC), with the principal differences in alteration prevalence being due to patient treatment. This highly sensitive cfDNA sequencing assay revealed numerous subclonal tumor-derived alterations. These were expected as a result of clonal evolution, but they led to an apparent departure from the mutual exclusivity observed in treatment-naïve tumors. To facilitate interpretation of this added complexity, we developed methods to identify cfDNA copy-number driver alterations and cfDNA clonality. Upon applying these methods, robust mutual exclusivity was observed among predicted truncal driver cfDNA alterations, in effect distinguishing tumor-initiating alterations from secondary alterations. Treatment-associated resistance, including both novel alterations and parallel evolution, was common in the cfDNA cohort and was enriched in patients with targetable driver alterations. Together, these retrospective analyses of a large set of cfDNA deep-sequencing data reveal subclonal structures and emerging resistance in advanced solid tumors.
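The abstract says the authors developed a method to estimate cfDNA clonality but does not describe it. A hypothetical, much simpler heuristic conveys the idea. In cfDNA the tumor-derived fraction is unknown, so the highest somatic variant allele fraction (VAF) in a sample can serve as a crude proxy for it. Variants whose VAF falls far below that maximum are then treated as putatively subclonal. The `Variant` class, `label_clonality` and the 0.5 threshold are illustrative assumptions. The sketch also ignores the copy-number and germline effects a real method would have to handle.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    vaf: float  # variant allele fraction observed in cfDNA (0 to 1)

def label_clonality(variants, threshold=0.5):
    """Label each variant by its VAF relative to the sample's highest VAF.

    Hypothetical heuristic: the maximum VAF is used as a rough proxy for the
    tumor-derived fraction of cfDNA, and variants whose relative VAF is below
    `threshold` are called putatively subclonal. Copy number is ignored.
    """
    if not variants:
        return []
    max_vaf = max(v.vaf for v in variants)
    if max_vaf <= 0:
        raise ValueError("at least one variant must have VAF > 0")
    labels = []
    for v in variants:
        ratio = v.vaf / max_vaf
        labels.append((v.gene, "clonal" if ratio >= threshold else "subclonal"))
    return labels

# Hypothetical sample with made-up gene names and VAFs.
sample = [Variant("geneA", 0.20), Variant("geneB", 0.18), Variant("geneC", 0.02)]
print(label_clonality(sample))
```

Restricting a mutual-exclusivity check to variants labelled "clonal" mirrors, in a very rough way, the abstract's finding. Exclusivity reappeared once the analysis was limited to predicted truncal alterations.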

RobustClone: a robust PCA method for tumor clone and evolution inference from single-cell sequencing data

Bioinformatics

10.1093/bioinformatics/btaa172

2020

Vol 36 (11)

pp. 3299-3306

Author(s):

Ziwei Chen

Fuzhou Gong

Lin Wan

Liang Ma

Keyword(s):

Single Cell

Large Scale

Clonal Evolution

Low Rank

Supplementary Information

Breast Cancer Dataset

Sequencing Data

Cancer Dataset

Single Cell Sequencing

Model Free

Abstract Motivation: Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and reconstruct phylogenetic relationships of tumor cells/clones. However, SCS data are often error-prone, making their computational analysis challenging. Results: To infer the clonal evolution of tumors from error-prone SCS data, we developed an efficient computational framework, termed RobustClone. It recovers the true genotypes of subclones based on extended robust principal component analysis, a low-rank matrix decomposition method, and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method that can be applied to both single-cell single nucleotide variation (scSNV) and single-cell copy-number variation (scCNV) data. It is efficient and scales to large datasets. Systematic evaluations on simulated datasets showed that RobustClone outperforms state-of-the-art methods on large-scale data in both accuracy and efficiency. We further validated RobustClone on two scSNV and two scCNV datasets and showed that it could accurately recover the genotype matrix and infer the subclonal evolutionary tree under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution in the large-scale 10X Genomics scCNV breast cancer dataset. Availability and implementation: RobustClone software is available at https://github.com/ucasdp/RobustClone. Supplementary information: Supplementary data are available at Bioinformatics online.
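RobustClone is described as using an *extended* robust PCA to split a noisy cell-by-mutation matrix into a low-rank part (the clonal genotypes) and a sparse part (sequencing errors). The sketch below shows only standard robust PCA, that is, principal component pursuit solved with an ADMM-style augmented Lagrangian loop. It is not RobustClone's extended variant, and it does not reconstruct the evolutionary tree. The default `lam = 1/sqrt(max(m, n))` and the `mu` heuristic are common choices from the robust PCA literature, used here as assumptions.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit: M = L (low rank) + S (sparse)."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    norm_M = np.linalg.norm(M)
    if norm_M == 0:
        return np.zeros_like(M), np.zeros_like(M)
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) / norm_M < tol:
            break
    return L, S

# Toy binary genotypes: 3 hypothetical clones x 6 mutations, 40 cells.
rng = np.random.default_rng(0)
clones = np.array([[1, 1, 0, 0, 0, 0],
                   [1, 1, 1, 1, 0, 0],
                   [1, 1, 0, 0, 1, 1]], dtype=float)
cells = clones[rng.integers(0, 3, size=40)]
noisy = cells.copy()
flip = rng.random(noisy.shape) < 0.05   # ~5% simulated false positives/dropouts
noisy[flip] = 1 - noisy[flip]
L, S = robust_pca(noisy)
denoised = (L > 0.5).astype(int)        # round the low-rank part to 0/1 genotypes
print("entries changed by denoising:", int((denoised != noisy).sum()))
```

Rounding `L` is a simple way to read off corrected genotypes for binary data. Whether this toy recovers `cells` exactly depends on the noise pattern, so the sketch makes no claim about recovery accuracy.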

Mutational patterns and clonal evolution from diagnosis to relapse in pediatric acute lymphoblastic leukemia

Scientific Reports

10.1038/s41598-021-95109-0

2021

Vol 11 (1)

Author(s):

Shumaila Sayyab

Anders Lundmark

Malin Larsson

Markus Ringnér

Sara Nystedt

...

Keyword(s):

Acute Lymphoblastic Leukemia

Large Scale

Somatic Mutations

Lymphoblastic Leukemia

Clonal Evolution

Point Mutations

Driver Genes

Protein Coding

Pediatric Acute Lymphoblastic Leukemia

Evolutionary Trajectories

Abstract The mechanisms driving clonal heterogeneity and evolution in relapsed pediatric acute lymphoblastic leukemia (ALL) are not fully understood. We performed whole-genome sequencing of samples collected at diagnosis, relapse(s) and remission from 29 Nordic patients. Somatic point mutations and large-scale structural variants were called using individually matched remission samples as controls, and allelic expression of the mutations was assessed in ALL cells using RNA sequencing. We observed an increased burden of somatic mutations at relapse compared to diagnosis, and at second relapse compared to first relapse. In addition to 29 known ALL driver genes, nine of which carried recurrent protein-coding mutations in our sample set, we identified putative non-protein-coding mutations in regulatory regions of seven additional genes that have not previously been described in ALL. Cluster analysis of hundreds of somatic mutations per sample revealed three distinct evolutionary trajectories during ALL progression from diagnosis to relapse. These evolutionary trajectories provide insight into the mutational mechanisms leading to relapse in ALL and could offer biomarkers for improved risk prediction in individual patients.

Prediction of benign and malignant breast cancer using data mining techniques

Journal of Algorithms & Computational Technology

10.1177/1748301818756225

2018

Vol 12 (2)

pp. 119-126

Cited By ~ 43

Author(s):

Vikas Chaurasia

Saurabh Pal

BB Tiwari

Keyword(s):

Breast Cancer

Data Mining

Low Income

Prediction Models

Naïve Bayes

Low Income Countries

Breast Cancer Dataset

Cancer Dataset

Rbf Network

Breast cancer is the second most common cancer in women. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization, and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including within Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. This paper presents a report on breast cancer in which we took advantage of available technological advances to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the performance of the three prediction models for comparison purposes. The results (based on average accuracy on the Breast Cancer dataset) indicated that Naïve Bayes is the best predictor, with 97.36% accuracy on the holdout sample (better than any prediction accuracy reported in the literature). RBF Network came second with 96.77% accuracy and J48 third with 93.41% accuracy.
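The abstract compares classifiers by 10-fold cross-validation but does not show the procedure. The from-scratch sketch below illustrates it with a Naïve Bayes classifier. The data are split into 10 folds, each fold is held out once for testing while the model is trained on the other nine, and the fold accuracies are averaged. The Gaussian per-feature likelihoods, the variance floor and the toy data are assumptions for illustration. This is not the tool setup or the 683-case dataset used in the paper.

```python
import math
import random
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate log class priors and per-feature means/variances per class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The floor keeps constant features from producing zero variance.
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest (unnormalized) log posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

def k_fold_accuracy(X, y, k=10, seed=0):
    """Average accuracy over k folds; every case is tested exactly once."""
    if len(X) < k:
        raise ValueError("need at least k cases")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k

# Toy, well-separated data (NOT the dataset used in the abstract).
X = [[1 + i % 2, 2, 1 + i % 3] for i in range(20)] + \
    [[9 - i % 2, 8 + i % 2, 10 - i % 3] for i in range(20)]
y = ["benign"] * 20 + ["malignant"] * 20
print(k_fold_accuracy(X, y, k=10))
```

Cross-validation uses every case for testing exactly once, which is why it gives a less optimistic estimate than scoring a model on its own training data.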

Performance Analysis of Breast Cancer Classification Using Decision Tree Classifiers

International Journal of Current Pharmaceutical Research

10.22159/ijcpr.2017v9i2.17383

2017

Vol 9 (2)

pp. 19

Cited By ~ 6

Author(s):

P. Hamsagayathri

P. Sampath

Keyword(s):

Breast Cancer

Decision Tree

Ductal Carcinoma

Research Work

The United States

Breast Cancer Dataset

Decision Tree Classifier

Cancer Dataset

Term Survival

Tree Classifier

Breast cancer is one of the most dangerous cancers among women over 35 years of age worldwide. The breast is made up of lobules that secrete milk and thin ducts that carry milk from the lobules to the nipple. Breast cancer mostly occurs in either the lobules or the milk ducts. The most common type of breast cancer is ductal carcinoma, which starts in the ducts and spreads across the lobules and surrounding tissues. According to a medical survey, about 125.0 new cases of breast cancer per 100,000 women are diagnosed each year in the United States, and 21.5 per 100,000 women die of the disease. An estimated 246,660 new cases of breast cancer in women were expected for 2016. Early diagnosis of breast cancer is a key factor in the long-term survival of cancer patients. Classification plays an important role in breast cancer detection and is used by researchers to analyze and classify medical data. In this research work, a priority-based decision tree classifier algorithm has been implemented for the Wisconsin Breast Cancer dataset. This paper analyzes different decision tree classifier algorithms on the Wisconsin original, diagnostic and prognostic datasets using the WEKA software. The performance of the classifiers is evaluated against parameters such as accuracy, Kappa statistic, entropy, RMSE, TP rate, FP rate, precision, recall, F-measure, ROC, specificity and sensitivity.
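Several of the evaluation parameters listed in the abstract come directly from a 2x2 confusion matrix. The sketch below computes accuracy, TP rate (recall/sensitivity), FP rate, precision, specificity, F-measure and Cohen's Kappa statistic. It treats "malignant" as the positive class, an assumption for illustration. The counts in the usage line are hypothetical, not results from the paper. Entropy, RMSE and ROC need predicted probabilities rather than counts and are not covered here.

```python
def binary_metrics(tp, fp, fn, tn):
    """Common classifier metrics from a 2x2 confusion matrix.

    Assumes the positive class is 'malignant' and the negative 'benign'.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    total = tp + fp + fn + tn
    accuracy = ratio(tp + tn, total)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity or TP rate
    specificity = ratio(tn, tn + fp)
    fp_rate = ratio(fp, fp + tn)         # equals 1 - specificity
    f_measure = ratio(2 * precision * recall, precision + recall)
    # Cohen's kappa: agreement beyond what the marginal totals predict by chance.
    expected = (ratio(tp + fp, total) * ratio(tp + fn, total)
                + ratio(fn + tn, total) * ratio(fp + tn, total))
    kappa = ratio(accuracy - expected, 1 - expected)
    return {
        "accuracy": accuracy, "precision": precision, "recall": recall,
        "specificity": specificity, "fp_rate": fp_rate,
        "f_measure": f_measure, "kappa": kappa,
    }

# Hypothetical counts for 100 test cases.
print(binary_metrics(tp=40, fp=10, fn=5, tn=45))
```

With these counts, accuracy is 0.85 but chance agreement is 0.5, so kappa is about 0.70. Kappa is reported alongside accuracy because it discounts agreement that the class proportions alone would produce.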

Breast Cancer Prediction using Machine Learning

International Journal of Recent Technology and Engineering - 2

10.35940/ijrte.d8292.118419

2019

Vol 8 (4)

pp. 4879-4881

Keyword(s):

Breast Cancer

Random Forest

Data Science

Breast Cancer Dataset

Random Forest Algorithm

Medical Field

Cancer Dataset

Cancer Prediction

Time Consumption

Simulated Environment

Breast cancer is one of the most dreadful diseases and a potential cause of death in women. Every year, the death rate due to breast cancer increases drastically. Classification and other data mining techniques offer an effective way to classify data. This is especially useful in the medical field, where diagnosis and analysis are performed with these techniques. The Wisconsin Breast Cancer dataset is used to compare SVM, Logistic Regression, Naïve Bayes and Random Forest. The main objective is to determine the efficiency of the algorithms by evaluating how correctly they classify the data, in terms of accuracy and time consumption. Based on the experimental results, the Random Forest algorithm shows the highest accuracy (99.76%) with the lowest error rate. The ANACONDA Data Science Platform is used to run all the experiments in a simulated environment.

Classifications of Breast Cancer Diagnosis using Machine Learning

International Journal of Computers

10.46300/9108.2020.14.13

2020

Vol 14

Keyword(s):

Breast Cancer

Machine Learning

Random Forest

Breast Cancer Diagnosis

Performance Comparison

Support Vector

Breast Cancer Dataset

K Nearest Neighbors

Cancer Dataset

Machine Learning Classification

Breast Cancer (BC) is among the most common diseases and leading causes of death in women throughout the world. Recently, classification and data analysis tools have been widely used in the medical field for diagnosis, prognosis and decision making, to help lower the risk of people dying or suffering from disease. Advanced machine learning methods have given hope to patients by helping doctors detect potentially fatal diseases such as breast cancer early, while providing accurate outcomes. However, the results depend heavily on the feature selection and classification techniques used to build a strong machine learning model. In this paper, a performance comparison is conducted on the Wisconsin Breast Cancer dataset to identify the most effective predictors. Four classifiers are compared: Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest. The main goal is to apply the best machine learning classification methods to predict whether a breast cancer is benign or malignant, evaluated in terms of accuracy, F-measure, precision and recall. Experimental results show that Random Forest achieves the highest accuracy, 99.26%, on this dataset and feature set, while SVM and KNN achieve 97.78% and 97.04% accuracy, respectively. MLP shows the lowest accuracy, 94.07%. All experiments are conducted using RStudio as the data mining platform.

Comparison of Imputation Methods on Retrospective Breast Cancer Data in Tanzania: A Case Study of Muhimbili and Ocean Road Hospitals

10.21203/rs.3.rs-820770/v1

2021

Author(s):

Rahibu A. Abassi

Amina S. Msengwa

Rocky R. J. Akarro

Keyword(s):

Breast Cancer

Logistic Regression

Missing Data

Binary Logistic Regression

Breast Cancer Dataset

Cancer Dataset

Imputation Methods

Multiple Imputations

Predictive Mean Matching

Mean Square Errors

Abstract Background: Clinical data are at risk of having missing or incomplete values for several reasons, including patients' failure to attend clinical measurements, wrong interpretations of measurements, and defects of measurement recorders. Missing data can significantly affect the analysis. Results may be doubtful because of the bias caused by omitting missing observations during statistical analysis, especially if the dataset is small. The objective of this study is to compare several imputation methods in terms of their efficiency in filling in missing data, so as to increase prediction and classification accuracy on a breast cancer dataset. Methods: Five imputation methods, namely series mean, k-nearest neighbour, hot deck, predictive mean matching, and multiple imputation, were applied to replace the missing values in a real breast cancer dataset. The efficiency of the imputation methods was compared using the root mean square error and mean absolute error to obtain a suitable complete dataset. Binary logistic regression and linear discriminant classifiers were applied to the imputed dataset to compare their efficacy in classification and discrimination. Results: The evaluation of imputation methods revealed that predictive mean matching performed better than the other imputation methods. In addition, binary logistic regression and linear discriminant analysis yielded very similar overall classification rates, sensitivity and specificity. Conclusion: Predictive mean matching imputation showed higher accuracy in estimating and replacing missing or incomplete data values in the real breast cancer dataset under study, making it an effective method for handling missing data in this scenario.
We recommend replacing missing data using predictive mean matching. It is a plausible approach to multiple imputation for numerical variables, and it improves estimation and prediction accuracy over complete-case analysis, especially when the percentage of missing data is not very small.
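The abstract names predictive mean matching (PMM) and compares methods by RMSE and MAE but does not describe PMM itself. The simplified single-predictor sketch below works in three steps. It fits a regression on the complete cases and finds the `k` observed cases whose predicted values are closest to the missing case's prediction. It then copies the observed value of one randomly chosen donor. Because every imputed value is an actually observed value, imputations stay plausible. The single numeric predictor, the `None` encoding of missing values and the toy data are assumptions. Multiple-imputation software typically also draws the regression coefficients from their sampling distribution and repeats the process several times, which this sketch omits.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance among observed cases")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope

def pmm_impute(x, y, k=3, seed=0):
    """Fill None entries of y by predictive mean matching on predictor x.

    Each missing case borrows the observed y of a random donor among the k
    observed cases whose *predicted* values are closest to its own.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    intercept, slope = fit_line([x[i] for i in observed], [y[i] for i in observed])
    predicted = [intercept + slope * xi for xi in x]
    imputed = list(y)
    for i, value in enumerate(y):
        if value is None:
            donors = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
            imputed[i] = y[rng.choice(donors)]
    return imputed

def rmse(true_values, estimates):
    """Root mean square error."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(true_values, estimates))
                     / len(true_values))

def mae(true_values, estimates):
    """Mean absolute error."""
    return sum(abs(t - e) for t, e in zip(true_values, estimates)) / len(true_values)

# Toy data: two values of y are missing; x is fully observed.
x = [1, 2, 3, 4, 5, 6]
y = [10, 20, None, 40, 50, None]
filled = pmm_impute(x, y, k=1)
print(filled)                       # [10, 20, 20, 40, 50, 50]
hidden_truth = [30, 60]             # hypothetical true values of the masked entries
estimates = [filled[2], filled[5]]
print(rmse(hidden_truth, estimates), mae(hidden_truth, estimates))  # 10.0 10.0
```

Comparing imputation methods as in the abstract amounts to masking values whose truth is known, imputing them with each method and ranking the methods by RMSE and MAE.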

Subclonal diversity arises early even in small colorectal tumours and contributes to differential growth fates

Gut

10.1136/gutjnl-2016-312232

2016

Vol 66 (12)

pp. 2132-2140

Cited By ~ 23

Author(s):

Chelsie K Sievers

Luli S Zou

Perry J Pickhardt

Kristina A Matkowskyj

Dawn M Albrecht

...

Keyword(s):

Computer Modelling

Clonal Evolution

Colorectal Polyps

Colon Polyps

Growth Behaviour

Differential Growth

Polyp Size

Driver Genes

Targeted Next Generation Sequencing

Pathogenic Mutations

Objective and design: The goal of the study was to determine whether the mutational profile of early colorectal polyps correlated with growth behaviour. The growth of small polyps (6–9 mm) first identified during routine screening of patients was monitored over time by interval imaging with CT colonography. Mutations in these lesions with known growth rates were identified by targeted next-generation sequencing. The timing of mutational events was estimated using computer modelling and statistical inference that considered several parameters, including allele frequency and fitness. Results: The mutational landscape of small polyps varies both within individual polyps and across the group as a whole, but no single alteration was correlated with growth behaviour. Polyps carried 0–3 pathogenic mutations, the most frequent being in APC, KRAS/NRAS, BRAF, FBXW7 and TP53. In polyps with two or more pathogenic mutations, allele frequencies were often variable, indicating the presence of multiple populations within a single tumour. Based on computer modelling, detectable mutations occurred at a mean polyp size of 30±35 crypts, well before the tumour reaches a clinically detectable size. Conclusions: These data indicate that small colon polyps can carry multiple pathogenic mutations in crucial driver genes that arise early in the existence of a tumour. Understanding the molecular pathway of tumourigenesis and clonal evolution in polyps at risk of progressing to invasive cancer will allow us to better predict which polyps are more likely to progress to adenocarcinoma and which patients are at greater risk of developing advanced disease.