Genetic risk score for ovarian cancer based on chromosomal-scale length variation

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Christopher Toh ◽  
James P. Brody

Abstract Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher-risk women can be treated or screened for ovarian cancer, which should reduce ovarian cancer death rates. However, current ovarian cancer genetic risk scores do not perform well. We developed a genetic risk score based on variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. We synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, representing the length of each chromosome in their germ line DNA. We used a gradient boosting machine to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x–450x) compared to the lowest 20%. The genetic risk score we developed had an area under the receiver operating characteristic curve of 0.88 (95% confidence interval 0.86–0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.
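The area under the receiver operating characteristic curve quoted above can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The following is a minimal sketch of that calculation, not the authors' code; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: sequence of 0/1 values (1 = case, 0 = control).
    scores: classifier outputs, higher meaning "more likely a case".
    Returns the fraction of case/control pairs in which the case scores
    higher, counting ties as one half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy, made-up values (hypothetical, for illustration only).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of cases from controls.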

2020 ◽  
Author(s):  
Chris Toh ◽  
James P Brody

Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher-risk women can be treated or screened for ovarian cancer, which should reduce overall death rates due to ovarian cancer. However, current ovarian cancer genetic risk scores, based on SNPs, do not perform well. We developed a genetic risk score based on structural variation, quantified by variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. From this dataset, we synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, representing the length of each chromosome in their germ line DNA. We used a gradient boosting machine, a machine learning algorithm, to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer in this dataset. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x-450x) compared to the lowest 20%. The genetic risk score we developed had an area under the receiver operating characteristic curve of 0.88 (estimated 95% confidence interval 0.86-0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.
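The comparison of the highest 20% with the lowest 20% can be illustrated with a small calculation. The abstract does not state whether "160x risk" is a relative risk or an odds ratio, so the sketch below assumes the simplest reading, a ratio of case fractions between the two quintiles. The function name `quintile_risk_ratio` and the toy data are hypothetical.

```python
def quintile_risk_ratio(labels, scores):
    """Case fraction in the top 20% of scores divided by that in the bottom 20%.

    Assumes a plain ratio of case fractions (a relative risk); the source
    does not specify the exact statistic behind its "160x" figure.
    """
    ranked = sorted(zip(scores, labels))  # ascending by score
    n = len(ranked) // 5                  # size of one quintile
    if n == 0:
        raise ValueError("need at least five individuals")
    bottom_rate = sum(y for _, y in ranked[:n]) / n
    top_rate = sum(y for _, y in ranked[-n:]) / n
    if bottom_rate == 0:
        raise ZeroDivisionError("no cases in the lowest quintile")
    return top_rate / bottom_rate


# Toy, made-up cohort of 20 people with scores 1..20 (hypothetical).
toy_scores = list(range(1, 21))
toy_labels = [1, 0, 0, 0,  0, 0, 1, 0,  0, 1, 0, 0,  1, 0, 1, 0,  1, 1, 0, 1]
print(quintile_risk_ratio(toy_labels, toy_scores))
```

In the toy cohort the lowest four scores contain one case and the highest four contain three, so the ratio is 3.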


2020 ◽  
Author(s):  
Chris Toh ◽  
James Brody

Abstract Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher-risk women can be treated or screened for ovarian cancer, which should reduce overall death rates due to ovarian cancer. However, current ovarian cancer genetic risk scores, based on SNPs, do not perform well. We developed a genetic risk score based on structural variation, quantified by variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. From this dataset, we synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, representing the length of each chromosome in their germ line DNA. We used a gradient boosting machine, a machine learning algorithm, to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer in this dataset. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x-450x) compared to the lowest 20%. The genetic risk score we developed had an area under the receiver operating characteristic curve of 0.88 (estimated 95% confidence interval 0.86-0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.


2021 ◽  
Author(s):  
Christopher Toh ◽  
James P. Brody

Abstract Studies indicate that schizophrenia has a genetic component; however, it cannot be isolated to a single gene. We aimed to determine how well one could predict that a person will develop schizophrenia based on their germ line genetics. We compared 1129 people from the UK Biobank dataset who had a diagnosis of schizophrenia to an equal number of age-matched people drawn from the general UK Biobank population. For each person, we constructed a profile consisting of numbers, each characterizing the length of a segment of a chromosome. We tested several machine learning algorithms to determine which was most effective in predicting schizophrenia and whether prediction improves when the chromosomes are broken into smaller chunks. We found that the stacked ensemble performed best, with an area under the receiver operating characteristic curve (AUC) of 0.545 (95% CI 0.539-0.550). We noted an increase in the AUC when the chromosomes were broken into smaller chunks for analysis. Using SHAP values, we identified the X chromosome as the most important contributor to the predictive model. We conclude that germ line chromosomal-scale length variation data could provide an effective genetic risk score for schizophrenia which performs better than chance.


2018 ◽  
Author(s):  
Chris Toh ◽  
James P. Brody

Abstract Inherited factors are thought to be responsible for a substantial fraction of many different forms of cancer. However, individual cancer risk cannot currently be well quantified by analyzing germ line DNA. Most analyses of germ line DNA focus on the additive effects of single nucleotide polymorphisms (SNPs). Here we show that chromosomal-scale length variation of germ line DNA can be used to predict whether a person will develop cancer. In two independent datasets, The Cancer Genome Atlas (TCGA) project and the UK Biobank, we could classify whether or not a patient had a certain cancer based solely on chromosomal-scale length variation. In the TCGA data, we found that all 32 different types of cancer could be predicted better than chance using chromosomal-scale length variation data. We found a model that could predict ovarian cancer in women with an area under the receiver operating characteristic curve (AUC) of 0.89. In the UK Biobank data, we could predict breast cancer in women with an AUC of 0.83. This method could be used to develop genetic risk scores for other conditions known to have a substantial genetic component, and it complements genetic risk scores derived from SNPs.


2021 ◽  
Author(s):  
Christopher Toh ◽  
James P. Brody

Abstract Introduction. Schizophrenia is a neurological disorder that often manifests itself as a combination of psychotic symptoms such as delusions, hallucinations, and disorganized cognitive functions. Several lines of evidence indicate that schizophrenia has a genetic component; however, it cannot be isolated to a single gene. We set out to determine how well one could predict that a person will develop schizophrenia based on their germ line DNA. Methods. We compared 1129 people from the UK Biobank dataset who had a diagnosis of schizophrenia to an equal number of age-matched people drawn from the general UK Biobank population. For each person, we constructed a profile consisting of a sequence of numbers. Each number characterized the length of a segment of one of their chromosomes. We tested several machine learning algorithms using the h2o.ai framework to determine which was most effective in predicting schizophrenia. We also tested whether there was any improvement in prediction by breaking the chromosomes into smaller chunks. We used SHAP values to better understand features important to the predictive model. Results. We found that the stacked ensemble, a combination of four different machine learning algorithms, performed best with an area under the receiver operating characteristic curve (AUC) of 0.583 (95% CI 0.581-0.586). We noted an increase in the AUC by breaking the chromosomes into smaller chunks for analysis. Using SHAP values, we identified the X chromosome as the most important contributor to the predictive model. Conclusion. We conclude that germ line chromosomal-scale length variation data can provide an effective genetic risk score for schizophrenia. Length variations of several regions of the X chromosome are the greatest contributing factor.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christopher Toh ◽  
James P. Brody

Abstract Studies indicate that schizophrenia has a genetic component; however, it cannot be isolated to a single gene. We aimed to determine how well one could predict that a person will develop schizophrenia based on their germ line genetics. We compared 1129 people from the UK Biobank dataset who had a diagnosis of schizophrenia to an equal number of age-matched people drawn from the general UK Biobank population. For each person, we constructed a profile consisting of numbers, each characterizing the length of a segment of a chromosome. We tested several machine learning algorithms to determine which was most effective in predicting schizophrenia and whether prediction improves when the chromosomes are broken into smaller chunks. We found that the stacked ensemble performed best, with an area under the receiver operating characteristic curve (AUC) of 0.545 (95% CI 0.539–0.550). We noted an increase in the AUC when the chromosomes were broken into smaller chunks for analysis. Using SHAP values, we identified the X chromosome as the most important contributor to the predictive model. We conclude that germ line chromosomal-scale length variation data could provide an effective genetic risk score for schizophrenia which performs better than chance.


2020 ◽  
Author(s):  
Chris Toh ◽  
James Brody

Abstract Introduction. The course of COVID-19 varies from asymptomatic to severe in patients. The basis for this range in symptoms is unknown. One possibility is that genetic variation is partly responsible for the highly variable response. We evaluated how well a genetic risk score based on chromosomal-scale length variation and machine learning classification algorithms could predict severity of response to SARS-CoV-2 infection. Methods. We compared 981 patients from the UK Biobank dataset who had a severe reaction to SARS-CoV-2 infection before 27 April 2020 to a similar number of age-matched patients drawn from the general UK Biobank population. For each patient, we built a profile of 88 numbers characterizing the chromosomal-scale length variability of their germ line DNA. Each number represented one quarter of one of the 22 autosomes. We used the machine learning algorithm XGBoost to build a classifier that could predict whether a person would have a severe reaction to COVID-19 based only on their 88-number classification. Results. We found that the XGBoost classifier could differentiate between the two classes at a significant level, as measured both against a randomized control and against the expected value of a random guessing algorithm (AUC=0.5). However, we found that the AUC of the classifier was only 0.51, too low for a clinically useful test. Conclusion. Genetics play a role in the severity of COVID-19, but we cannot yet develop a useful genetic test to predict severity.
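The 88-number profile follows from splitting each of the 22 autosomes into four quarters (22 × 4 = 88). The abstract does not say how each number is derived, so the sketch below makes two labeled assumptions: that each chromosome comes with a list of per-probe measurements ordered by genomic position, and that a quarter is summarized by the mean of its probes. The function name `quarter_profile` and the input layout are hypothetical.

```python
from statistics import mean


def quarter_profile(probe_values_by_chromosome):
    """Build a profile with four numbers per chromosome.

    probe_values_by_chromosome: dict mapping an autosome number (1..22) to a
    list of per-probe values ordered by position (assumed input format).
    Each chromosome is cut into four quarters by probe count and each quarter
    is summarized by its mean (an assumption, not the authors' stated method).
    """
    profile = []
    for chrom in sorted(probe_values_by_chromosome):
        values = probe_values_by_chromosome[chrom]
        n = len(values)
        if n < 4:
            raise ValueError(f"chromosome {chrom} has fewer than 4 probes")
        for q in range(4):
            start, stop = q * n // 4, (q + 1) * n // 4
            profile.append(mean(values[start:stop]))
    return profile


# Toy, made-up probe values for 22 autosomes (hypothetical).
toy = {chrom: [0.01 * chrom + 0.001 * i for i in range(8)] for chrom in range(1, 23)}
profile = quarter_profile(toy)
print(len(profile))
```

Each patient's list of 88 values would then be one row of the feature matrix handed to the classifier.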


2020 ◽  
pp. jrheum.200002
Author(s):  
Daniela Dominguez ◽  
Sylvia Kamphuis ◽  
Joseph Beyene ◽  
Joan Wither ◽  
John B. Harley ◽  
...  

Objective. Specific risk alleles for childhood-onset SLE (cSLE) versus adult-onset SLE (aSLE) patients have not been identified. The aims of this study were to determine whether: 1) there is an association between non-HLA-related genetic risk score (GRS) and age of SLE diagnosis; and 2) there is an association between HLA-related genetic risk score and age of SLE diagnosis. Methods. Genomic DNA was obtained from 2,001 multi-ethnic patients and genotyped using the Immunochip. Following quality control, genetic risk counting (GRCS), weighted (GRWS), standardized counting (GRSCS) and standardized weighted (GRSWS) scores were calculated based on independent SNPs from validated SLE loci. Scores were analyzed in a regression model and adjusted by sex and ancestral population. Results. The analysed cohort consisted of 1,540 patients: 1,351 females and 189 males (675 cSLE and 865 aSLE). For all non-HLA genetic risk scores, there were significant negative associations with age of SLE diagnosis (the higher the GRS, the lower the age of diagnosis): p=0.011 and r²=0.175 for GRWS, p=0.008 and r²=0.178 for GRSCS, p=0.002 and r²=0.176 for GRSWS. All HLA genetic risk scores showed significant positive associations with age of diagnosis (higher genetic scores correlated with higher age of diagnosis): p=0.049 and r²=0.176 for GRCS, p=0.022 and r²=0.176 for GRWS, p=0.022 and r²=0.176 for GRSCS, p=0.011 and r²=0.177 for GRSWS. Conclusion. Our data suggested that there is a linear relationship between genetic risk and age of SLE diagnosis and that HLA and non-HLA genetic risk scores are associated with age of diagnosis in opposite directions.
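The difference between a counting score and a weighted score can be shown with a short sketch. The abstract names the scores but gives no formulas, so the definitions below follow a common convention and are assumptions: the counting score sums the risk alleles carried (0, 1 or 2 per SNP), and the weighted score multiplies each count by the natural log of that SNP's odds ratio. The standardized variants are not attempted, and the function names and toy values are hypothetical.

```python
import math


def counting_score(allele_counts):
    """Unweighted score: total number of risk alleles carried (0, 1 or 2 per SNP)."""
    return sum(allele_counts)


def weighted_score(allele_counts, odds_ratios):
    """Weighted score: each risk-allele count times log(odds ratio).

    The log-odds-ratio weighting is an assumed convention; the source does
    not state which weights were used.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(c * math.log(o) for c, o in zip(allele_counts, odds_ratios))


# Toy, made-up genotype and odds ratios for three SNPs (hypothetical).
counts = [2, 0, 1]
odds_ratios = [1.5, 1.2, 2.0]
print(counting_score(counts))
print(weighted_score(counts, odds_ratios))
```

Under these assumptions the weighted score lets a SNP with a large effect contribute more than one with a small effect, whereas the counting score treats every risk allele equally.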


2020 ◽  
Vol 14 (1) ◽  
Author(s):  
Christopher Toh ◽  
James P. Brody

Abstract Introduction. The course of COVID-19 varies from asymptomatic to severe in patients. The basis for this range in symptoms is unknown. One possibility is that genetic variation is partly responsible for the highly variable response. We evaluated how well a genetic risk score based on chromosomal-scale length variation and machine learning classification algorithms could predict severity of response to SARS-CoV-2 infection. Methods. We compared 981 patients from the UK Biobank dataset who had a severe reaction to SARS-CoV-2 infection before 27 April 2020 to a similar number of age-matched patients drawn from the general UK Biobank population. For each patient, we built a profile of 88 numbers characterizing the chromosomal-scale length variability of their germ line DNA. Each number represented one quarter of one of the 22 autosomes. We used the machine learning algorithm XGBoost to build a classifier that could predict whether a person would have a severe reaction to COVID-19 based only on their 88-number classification. Results. We found that the XGBoost classifier could differentiate between the two classes at a significant level (p = 2 × 10⁻¹¹) as measured against a randomized control and (p = 3 × 10⁻¹⁴) as measured against the expected value of a random guessing algorithm (AUC = 0.5). However, we found that the AUC of the classifier was only 0.51, too low for a clinically useful test. Conclusion. Genetics play a role in the severity of COVID-19, but we cannot yet develop a useful genetic test to predict severity.
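A result can be statistically significant even when the AUC is close to 0.5, because significance depends on sample size as well as effect size. One common way to build a randomized control is a label-permutation test, sketched below. Whether the authors used exactly this procedure is not stated in the abstract, so treat it as an assumption; the function names and toy data are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    pairs = [(c > k) + 0.5 * (c == k) for c in cases for k in controls]
    return sum(pairs) / len(pairs)


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """One-sided p-value for the observed AUC against shuffled labels.

    Shuffling the labels breaks any link between label and score, giving a
    null distribution of AUC values (assumed form of "randomized control").
    """
    rng = random.Random(seed)
    observed = roc_auc(labels, scores)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if roc_auc(shuffled, scores) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


# Toy, made-up data: 8 cases that all outscore 8 controls (hypothetical).
toy_labels = [1] * 8 + [0] * 8
toy_scores = [0.9, 0.8, 0.85, 0.7, 0.75, 0.6, 0.95, 0.65,
              0.4, 0.3, 0.5, 0.2, 0.1, 0.35, 0.45, 0.55]
print(permutation_p_value(toy_labels, toy_scores))
```

With only 1000 permutations the smallest reportable p-value is about 0.001, so values as small as those quoted above would require far more permutations or a parametric test.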

