Identifying cancer pathway dysregulations using differential causal effects

2021 ◽  
Author(s):  
Kim Philipp Jablonski ◽  
Martin Franz-Xaver Pirkl ◽  
Domagoj Cevid ◽  
Peter Buehlmann ◽  
Niko Beerenwinkel

Signaling pathways control cellular behavior. Dysregulated pathways, for example due to mutations that cause genes and proteins to be expressed abnormally, can lead to diseases such as cancer. We introduce a novel computational approach, called Differential Causal Effects (dce), which compares normal to cancerous cells using the statistical framework of causality. The method detects individual edges in a signaling pathway that are dysregulated in cancer cells, while accounting for confounding. Hence, artificial signals from, for example, batch effects have less influence on the result, and dce has a higher chance of detecting the biological signals. We show that dce outperforms competing methods on synthetic data sets and on CRISPR knockout screens. In an exploratory analysis of breast cancer data from TCGA, we recover known and discover new genes involved in breast cancer progression.

2020 ◽  
Author(s):  
Michael Allen ◽  
Andrew Salmon

Background: Open science is a movement seeking to make scientific research accessible to all, including publication of code and data. Publishing patient-level data may, however, compromise the confidentiality of that data if there is any significant risk that data may later be associated with individuals. Use of synthetic data offers the potential to release data that may be used to evaluate methods or perform preliminary research without risk to patient confidentiality.
Methods: We have tested five synthetic data methods: (1) a technique based on Principal Component Analysis (PCA), which samples data from distributions derived from the transformed data; (2) Synthetic Minority Oversampling Technique (SMOTE), which is based on interpolation between near neighbours; (3) Generative Adversarial Network (GAN), an artificial neural network approach with competing networks: a discriminator network trained to distinguish between synthetic and real data, and a generator network trained to produce data that can fool the discriminator network; (4) CT-GAN, a refinement of GANs specifically for the production of structured tabular synthetic data; (5) Variational Auto Encoders (VAE), a method of encoding data in a reduced number of dimensions and sampling from distributions based on the encoded dimensions. Two data sets are used to evaluate the methods: (1) the Wisconsin Breast Cancer data set, a histology data set where all features are continuous variables; (2) a stroke thrombolysis pathway data set, describing characteristics for patients where a decision is made whether to treat with clot-busting medication, in which features are mostly categorical, binary, or integers. Methods are evaluated in three ways: (1) the ability of synthetic data to train a logistic regression classification model; (2) a comparison of means and standard deviations between original and synthetic data; (3) a comparison of covariance between features in the original and synthetic data.
Results: Using the Wisconsin Breast Cancer data set, the original data gave 98% accuracy in a logistic regression classification model. Synthetic data sets gave between 93% and 99% accuracy. Performance (best to worst) was SMOTE > PCA > GAN > CT-GAN = VAE. All methods reproduced the original data means and standard deviations with high accuracy (R-squared > 0.96 for all methods and data classes). CT-GAN and VAE suffered a significant loss of covariance between features in the synthetic data sets. Using the Stroke Pathway data set, the original data gave 82% accuracy in a logistic regression classification model. Synthetic data sets gave between 66% and 82% accuracy. Performance (best to worst) was SMOTE > PCA > CT-GAN > GAN > VAE. CT-GAN and VAE suffered loss of covariance between features in the synthetic data sets, though less pronounced than with the Wisconsin Breast Cancer data set.
Conclusions: The pilot work described here shows, as proof of concept, that synthetic data may be produced which is of sufficient quality to publish with open methodology, to allow people to better understand and test methodology. The quality of the synthetic data also gives promise of data sets that may be used for screening of ideas, or for research projects (perhaps especially in an education setting). More work is required to further refine and test methods across a broader range of patient-level data sets.
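
As an illustration of the SMOTE idea named in the abstract above (interpolation between near neighbours), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `smote_like_samples`, the toy points, and the choice of `k` are all assumptions made for this example, and it ignores class labels, which a full SMOTE workflow would handle per class.

```python
import math
import random


def smote_like_samples(points, n_new, k=2, seed=0):
    """Hypothetical helper: build synthetic points by interpolating between
    a randomly chosen real point and one of its k nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # Rank all other points by Euclidean distance to the base point.
        others = [p for p in points if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random fraction in [0, 1) places the new point on the segment
        # between the base point and the chosen neighbour.
        t = rng.random()
        synthetic.append(
            tuple(b + t * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up feature vectors (not data from the study).
real = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
synthetic = smote_like_samples(real, n_new=5)
```

Because every synthetic point is a convex combination of two real points, it always stays inside the range spanned by the real data, which is one reason interpolation-based methods tend to preserve means and covariances reasonably well.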


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Na Li ◽  
Belle W. X. Lim ◽  
Ella R. Thompson ◽  
Simone McInerny ◽  
Magnus Zethoven ◽  
...  

Breast cancer (BC) has a significant heritable component but the genetic contribution remains unresolved in the majority of high-risk BC families. This study aims to investigate the monogenic causes underlying the familial aggregation of BC beyond BRCA1 and BRCA2, including the identification of new predisposing genes. A total of 11,511 non-BRCA familial BC cases and population-matched cancer-free female controls in the BEACCON study were investigated in two sequencing phases: 1303 candidate genes in up to 3892 cases and controls, followed by validation of 145 shortlisted genes in an additional 7619 subjects. The coding regions and exon–intron boundaries of all candidate genes and 14 previously proposed BC genes were sequenced using custom designed sequencing panels. Pedigree and pathology data were analysed to identify genotype-specific associations. The contribution of ATM, PALB2 and CHEK2 to BC predisposition was confirmed, but not RAD50 and NBN. An overall excess of loss-of-function (LoF) (OR 1.27, p = 9.05 × 10^−9) and missense (OR 1.27, p = 3.96 × 10^−73) variants was observed in the cases for the 145 candidate genes. Leading candidates harbored LoF variants with observed ORs of 2–4 and individually accounted for no more than 0.79% of the cases. New genes proposed by this study include NTHL1, WRN, PARP2, CTH and CDK9. The new candidate BC predisposition genes identified in BEACCON indicate that much of the remaining genetic causes of high-risk BC families are due to genes in which pathogenic variants are both very rare and convey only low to moderate risk.


2019 ◽  
Author(s):  
Runpu Chen ◽  
Steve Goodison ◽  
Yijun Sun

The interpretation of accumulating genomic data with respect to tumor evolution and cancer progression requires integrated models. We developed a computational approach that enables the construction of disease progression models using static sample data. Application to breast cancer data revealed a linear, branching evolutionary model with two distinct trajectories for malignant progression. Here, we used the progression model as a foundation to investigate the relationships between matched primary and metastasis breast tumor samples. Mapping paired data onto the model confirmed that molecular breast cancer subtypes can shift during progression, and supported directional tumor evolution through luminal subtypes to increasingly malignant states. Cancer progression modeling through the analysis of available static samples represents a promising breakthrough. Further refinement of a roadmap of breast cancer progression will facilitate the development of improved cancer diagnostics, prognostics and targeted therapeutics.


2021 ◽  
pp. 389-403
Author(s):  
S. Venkata Achuta Rao ◽  
Pamarthi Rama Koteswara Rao

2013 ◽  
Vol 31 (15_suppl) ◽  
pp. 11021-11021
Author(s):  
Tatyana A. Grushko ◽  
Maria J. Gomez-Vega ◽  
Aleix Prat ◽  
Jeffrey Mueller ◽  
Mariann Coyle ◽  
...  

11021 Background: The PTK6 gene on chromosome 20q13 encodes an intracellular non-receptor tyrosine kinase. Studies in vivo and in vitro revealed a role for PTK6 in cell proliferation and survival, particularly in HER2+ breast cancer cells, suggesting that PTK6 may associate with the HER2 pathway and confer resistance to HER2-targeted therapy. PTK6 protein is frequently overexpressed in breast cancer; however, the mechanism(s) underlying PTK6 overexpression and its role in cancer remain unclear. To address this problem, we analyzed the frequency of PTK6 gene copy number variation (CNV) and expression in association with breast cancer subtypes. Methods: Retrospective paraffin samples of invasive tumor and normal epithelium, and matching DCIS and metastases, were mounted on TMA. PTK6 CNV was determined using a PTK6:CEP20 FISH assay. Tumor subtypes were defined using the five-marker IHC classifier. The correlation between PTK6 CNV and mRNA expression, and the association of both with the intrinsic PAM50 tumor subtype, were studied using the TCGA database (547 cases) and seven publicly available breast cancer data sets (1005 cases). Data were normalized, gene median centered and standardized for the purpose of the study. Results: By FISH, 20% of 41 invasive tumors carried PTK6 CNV: amplification (10%) and gene polysomy (10%). The proportion of PTK6 amplified cases differed by subtype, with the largest proportion in HER2-enriched (17%) and LumB (14%). Strikingly, amplified invasive cases also showed amplification in matching DCIS and metastases. Analysis of the public datasets confirmed the frequent PTK6 amplification in breast cancer. Both low and high levels of amplification were detected, with the largest proportion in HER2+ tumors (HER2-enriched and LumB; p=2.05e-26). None of the basal-like tumors showed high levels of PTK6 amplification. A high correlation between PTK6 gene copies and mRNA expression was observed (p=1.13e-08).
Conclusions: PTK6 gene is amplified early in breast cancer progression, particularly in HER2+ tumors. Further studies on PTK6 biology may help clinicians to understand its potential role in HER2 resistance. Supported by BREAST CANCER SPORE, NCI K12CA139160 and CTSA-ITM CS UL1 RR024999.


Author(s):  
Ojekudo, Nathaniel Akpofure ◽  
Akpan, Nsikan Paul

Count data regression models exhibit different strengths and weaknesses in solving problems. The study considers six count models, namely the Poisson Regression Model (PRM), Negative Binomial Regression Model (NBRM), Zero Inflated Poisson (ZIP), Zero Inflated Negative Binomial (ZINB), Zero Truncated Poisson (ZTP) and Zero Truncated Negative Binomial (ZTNB), and an additional model called hurdle_T. These models are used to analyze two health data sets. The data on male breast cancer reveal that male breast cancer cuts across all age brackets or categories but is more prevalent between the ages of 50 and 60. The PRM yields a better result than the NBRM in the case of the cancer data, as shown by the information criteria. The analysis of the second data set, on doctors' visits, reveals that ZINB yields a better result than the other five models, followed by NBRM, then ZTNB, before their Poisson counterparts. The hurdle_T model shows the propensity of each coefficient as reflected by the positive count in the Tobit (Binary) model. The study also shows that at 65 years and above, gender has a significant effect on doctors' visits. In particular, females attract more doctors' visits than males in this age range. Government policies should provide more funds in the health sector to accommodate cancer cases in terms of the provision of awareness, studies/research and infrastructural development. Males should be encouraged to visit clinics, especially in their late forties and above, for breast cancer related checkups. At age 65 and above, doctors' visits to patients are frequent, especially to females. Government policy in the health sector should accommodate a favourable adjustment in the budget to take care of doctors' visits.


2021 ◽  
Vol 10 (1) ◽  
pp. 60
Author(s):  
Mahsa Dehghani Soufi ◽  
Reza Ferdousi

Introduction: Growing evidence has shown that some overweight factors could be implicated in tumorigenesis, higher recurrence and mortality. However, the association of various overweight factors and breast cancer has not been extensively explored. The goal of this research was to explore and evaluate the association of various overweight/obesity factors and breast cancer, based on an obesity breast cancer data set.
Material and Methods: Several studies show a significantly stronger association between overweight and higher breast cancer incidence, but the role of some overweight factors such as BMI, insulin resistance, Homeostasis Model Assessment (HOMA), leptin, adiponectin, glucose and MCP.1 is still debatable. Therefore, in this work several clinical and biochemical overweight factors, including age, Body Mass Index (BMI), glucose, insulin, Homeostatic Model Assessment (HOMA), leptin, adiponectin, resistin and monocyte chemoattractant protein-1 (MCP-1), were analyzed. Data mining algorithms including k-means, Apriori and hierarchical clustering (HCM) were applied using Orange version 3.22, an open source data mining tool.
Results: The Apriori algorithm generated a list of frequent item sets and some strong rules from the data set, and found that insulin, HOMA and leptin are items that were often seen together in BC patients, which leads to cancer progression. The k-means algorithm divided the samples into three clusters, and its results showed that the pair <Adiponectin, MCP.1> has the highest effect on the separation of clusters. In addition, hierarchical clustering with average linkage and without pruning was carried out, classifying BC patients into 1–32 clusters in order to identify BC patients with similar characteristics.
Conclusion: These findings suggest that the algorithms employed in this study can be helpful to our aim.
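
To make the hierarchical clustering step mentioned in the abstract above more concrete, here is a minimal sketch of agglomerative clustering with average linkage in plain Python. The study itself used the Orange tool, so this is only an illustrative stand-in: the function name `average_linkage`, the two-dimensional toy samples, and the stopping rule (merge until `n_clusters` remain) are assumptions for this example.

```python
import math
from itertools import combinations


def average_linkage(points, n_clusters):
    """Hypothetical helper: agglomerative clustering that repeatedly merges
    the two clusters with the smallest average pairwise distance."""
    # Start with every sample in its own cluster (clusters hold indices).
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Average linkage: mean Euclidean distance over all cross pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations() yields i < j, so deleting index j after the merge
        # never shifts the position of cluster i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy, made-up two-feature samples (not data from the study).
samples = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.3, 4.9), (10.0, 0.0)]
clusters = average_linkage(samples, n_clusters=3)
```

Calling the function with different values of `n_clusters` corresponds to cutting the same merge hierarchy at different levels, which is how a range of cluster counts can be explored without pruning the tree.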


2008 ◽  
Vol 123 (6) ◽  
pp. 1327-1338 ◽  
Author(s):  
Daniela Cimino ◽  
Luca Fuso ◽  
Christian Sfiligoi ◽  
Nicoletta Biglia ◽  
Riccardo Ponzone ◽  
...  
