Machine Learning Algorithms and Whole Exome Sequencing Data from Breast Cancer Patients in the UK Biobank Predict Survival

Abstract The authors have requested that this preprint be withdrawn due to erroneous posting.

2021 ◽  
Author(s):  
Bum-Sup Jang ◽  
In Ah Kim

Aim: We tested whether a machine-learning algorithm could find biomarkers predicting overall survival in breast cancer patients using blood-based whole-exome sequencing data. Materials & methods: Whole-exome sequencing data derived from 1181 female breast cancer patients within the UK Biobank were collected. We identified feature genes (n = 50) related to total mutation burden using a long short-term memory model. Then, we developed an XGBoost survival model with the selected feature genes. Results: The XGBoost survival model performed acceptably, with a concordance index of 0.75 and a scaled Brier score of 0.146 in terms of overall survival prediction. The high-mutation group exhibited inferior overall survival compared with the low-mutation group in patients ≥56 years (log-rank test, p = 0.042). Conclusion: We showed that machine-learning algorithms can be used to predict overall survival in breast cancer patients from blood-based whole-exome sequencing data.
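The concordance index reported above measures how often a model ranks two patients' risks in the same order as their observed survival. The following is a minimal sketch of Harrell's concordance index in plain Python, not the authors' implementation; the function name, the toy inputs, and the simplified handling of tied survival times (they are skipped) are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C for right-censored data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # 'a' is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data only, unrelated to the UK Biobank cohort.
c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
print(round(c, 2))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported 0.75.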


2020 ◽  
Author(s):  
Bum-Sup Jang ◽  
In Ah Kim

Abstract Background: Using machine learning algorithms, we aimed to identify a mutated gene set from blood-derived whole-exome sequencing (WES) data that is associated with overall survival in breast cancer patients. Methods: WES data from 1,181 female breast cancer patients within the UK Biobank cohort were collected. The number of mutations for each gene was summed and defined as the blood-based mutation burden per patient. Using a long short-term memory (LSTM) machine learning algorithm and XGBoost, a gradient-boosted tree algorithm, we developed a model to predict patient overall survival. Results: In the UK Biobank breast cancer cohort, the most altered genes in blood samples were related to the TP53 pathway. In the LSTM model, a minimum of 50 genes was found to predict high vs. low mutation burden. In the XGBoost survival model, the gene set predicted overall survival with a concordance index of 0.75 and a scaled Brier score of 0.146 on the held-out testing set (20%, N=236). In older patients (≥ 56 years), the high-mutation group based on this gene set showed inferior overall survival compared with the low-mutation group (log-rank test, P=0.042). Conclusion: The machine learning algorithms revealed a gene signature in the UK Biobank breast cancer cohort. Mutational burden observed in blood was associated with overall survival in relatively old patients. This gene signature should be verified in a prospective setting.
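The burden definition in the Methods (per-gene mutation counts summed per patient) can be sketched as below. The abstract does not state how the high and low groups were separated, so the median cutoff here is an assumption, as are the function names, patient identifiers, and toy counts.

```python
from statistics import median


def mutation_burden(gene_counts):
    """Sum per-gene mutation counts into one burden value for a patient."""
    return sum(gene_counts.values())


def split_by_median(burdens):
    """Label each patient 'high' or 'low' relative to the cohort median.

    Assumption: the median is used as the cutoff; the source does not
    specify the threshold that was actually applied.
    """
    cutoff = median(burdens.values())
    return {pid: ("high" if b > cutoff else "low") for pid, b in burdens.items()}


# Toy per-patient counts (hypothetical values, not cohort data).
patients = {
    "p1": {"TP53": 2, "BRCA1": 1},
    "p2": {"TP53": 0},
    "p3": {"ATM": 5, "TP53": 1},
}
burdens = {pid: mutation_burden(counts) for pid, counts in patients.items()}
groups = split_by_median(burdens)
print(burdens, groups)
```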


2019 ◽  
Author(s):  
Robert Maier ◽  
Ali Akbari ◽  
Xinzhu Wei ◽  
Nick Patterson ◽  
Rasmus Nielsen ◽  
...  

Abstract A recent study reported that a 32-base-pair deletion in the CCR5 gene (CCR5-∆32) is deleterious in the homozygous state in humans. Evidence for this came from a survival analysis in the UK Biobank cohort, and from deviations from Hardy-Weinberg equilibrium at a polymorphism tagging the deletion (rs62625034). Here, we carry out a joint analysis of whole-genome genotyping data and whole-exome sequencing data from the UK Biobank, which reveals that technical artifacts are a more plausible cause for deviations from Hardy-Weinberg equilibrium at this polymorphism. Specifically, we find that individuals homozygous for the deletion in the sequencing data are underrepresented in the genotyping data due to an elevated rate of missing data at rs62625034, possibly because the probe for this SNP overlaps with the ∆32 deletion. Another variant which has a higher concordance with the deletion in the sequencing data shows no associations with mortality. A phenome-wide scan for effects of variants tagging this deletion shows an overall inflation of association p-values, but identifies only one trait at p < 5×10⁻⁸, and no mediators for an effect on mortality. These analyses show that the original reports of a recessive deleterious effect of CCR5-∆32 are affected by a technical artifact, and that a closer investigation of the same data provides no positive evidence for an effect on lifespan.
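A deviation from Hardy-Weinberg equilibrium, as discussed above, is commonly checked by comparing observed genotype counts with those expected from the allele frequency. The sketch below uses the standard one-degree-of-freedom chi-square test; it is an illustration under that assumption, not the procedure used in the study, and the function name and counts are hypothetical. It also shows why missing homozygotes matter: removing them from the observed counts changes the statistic.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic site.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("site is monomorphic; test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first set matches HWE exactly, the second does not.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(40, 20, 40))
```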


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Jong Seop Kim ◽  
Hyoungseok Jeon ◽  
Hyeran Lee ◽  
Jung Min Ko ◽  
Yonghwan Kim ◽  
...  

Abstract An 11-year-old Korean boy presented with short stature, hip dysplasia, radial head dislocation, carpal coalition, genu valgum, and fixed patellar dislocation and was clinically diagnosed with Steel syndrome. Scrutinizing the trio whole-exome sequencing data revealed novel compound heterozygous mutations of COL27A1 (c.[4229_4233dup]; [3718_5436del], p.[Gly1412Argfs*157];[Gly1240_Lys1812del]) in the proband, which were inherited from heterozygous parents. The maternal mutation was a large deletion encompassing exons 38–60, which was challenging to detect.


2021 ◽  
Vol 5 (1) ◽  
pp. 49-53
Author(s):  
Steven Lehrer ◽  
Peter H. Rheinstein

Background: Cognitive problems are common in breast cancer patients. The apolipoprotein E4 (APOE4) gene, a risk factor for Alzheimer’s disease (AD), may be associated with cancer-related cognitive decline. Objective: To further evaluate the effects of the APOE4 allele, we studied a cohort of patients from the UK Biobank (UKB) who had breast cancer; some also had AD. Methods: Our analysis included all subjects with invasive breast cancer. Single nucleotide polymorphism (SNP) data for rs429358 and rs7412 were used to determine APOE genotypes. Cognitive function as numeric memory was assessed with an online test (UKB data field 20240). Results: We analyzed data from 2,876 women with breast cancer. Of the breast cancer subjects, 585 (20%) carried the APOE4 allele. Numeric memory scores were significantly lower in APOE4 carriers and APOE4 homozygotes than in non-carriers (p = 0.046). Thirty-four breast cancer subjects (1.1%) had AD. There was no significant difference in survival among genotypes ɛ3/ɛ3, ɛ3/ɛ4, and ɛ4/ɛ4. Conclusion: UKB data suggest that cognitive problems in women with breast cancer are, for the most part, mild, compared with other sequelae of the disease. AD, the worst cognitive problem, is relatively rare (1.1%) and, when it occurs, APOE genotype has little impact on survival.
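The Methods derive APOE genotypes from the two SNPs rs429358 and rs7412. A minimal sketch of that mapping is given below, assuming the conventional haplotype definitions (ε2 = T at rs429358 and T at rs7412, ε3 = T and C, ε4 = C and C) and unphased genotype calls; the function name and the "e2/e3/e4" labels are illustrative choices, and the double heterozygote is called ε2/ε4 by convention. This is not the authors' code.

```python
def apoe_genotype(rs429358, rs7412):
    """Infer the APOE genotype from unphased calls at two SNPs.

    rs429358, rs7412 : two-letter genotype strings such as "CT" or "TT".
    Each C at rs429358 is counted as one e4 allele and each T at rs7412
    as one e2 allele; remaining alleles are e3.
    """
    g1, g2 = rs429358.upper(), rs7412.upper()
    if len(g1) != 2 or len(g2) != 2 or set(g1 + g2) - {"C", "T"}:
        raise ValueError("genotypes must be two letters drawn from C and T")
    e4 = g1.count("C")
    e2 = g2.count("T")
    e3 = 2 - e4 - e2
    if e3 < 0:
        # Would require the rare e1 haplotype (C at rs429358, T at rs7412)
        raise ValueError("genotype combination not explained by e2/e3/e4")
    return "/".join(["e2"] * e2 + ["e3"] * e3 + ["e4"] * e4)


for calls in [("TT", "CC"), ("CT", "CC"), ("CC", "CC"), ("CT", "CT")]:
    print(calls, apoe_genotype(*calls))
```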


2017 ◽  
Vol 33 (15) ◽  
pp. 2402-2404 ◽  
Author(s):  
Alessandro Romanel ◽  
Tuo Zhang ◽  
Olivier Elemento ◽  
Francesca Demichelis
