Exome-Wide Pan-Cancer Analysis of Germline Variants in 8,719 Individuals Finds Little Evidence of Rare Variant Associations

2021 ◽  
pp. 1-10
Author(s):  
Zoe Guan ◽  
Ronglai Shen ◽  
Colin B. Begg

Background: Many cancer types show considerable heritability, and extensive research has been done to identify germline susceptibility variants. Linkage studies have discovered many rare high-risk variants, and genome-wide association studies (GWAS) have discovered many common low-risk variants. However, it is believed that a considerable proportion of the heritability of cancer remains unexplained by known susceptibility variants. The “rare variant hypothesis” proposes that much of the missing heritability lies in rare variants that cannot reliably be detected by linkage analysis or GWAS. Until recently, high sequencing costs have precluded extensive surveys of rare variants, but technological advances have now made it possible to analyze rare variants on a much greater scale. Objectives: In this study, we investigated associations between rare variants and 14 cancer types. Methods: We ran association tests using whole-exome sequencing data from The Cancer Genome Atlas (TCGA) and validated the findings using data from the Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG). Results: We identified four significant associations in TCGA, only one of which was replicated in PCAWG (BRCA1 and ovarian cancer). Conclusions: Our results provide little evidence in favor of the rare variant hypothesis. Much larger sample sizes may be needed to detect undiscovered rare cancer variants.

2021 ◽  
Vol 12 ◽  
Author(s):  
Shiyue Tao ◽  
Xiangyu Ye ◽  
Lulu Pan ◽  
Minghan Fu ◽  
Peng Huang ◽  
...  

A pan-cancer strategy, an integrative analysis of different cancer types, can be used to explain oncogenesis and identify biomarkers with greater statistical power and robustness. Fine-mapping defines the causal loci, whereas genome-wide association studies (GWASs) typically identify thousands of cancer-related loci and do not necessarily include a fine-mapping component. In this study, we develop a novel strategy to identify the causal loci using a pan-cancer and fine-mapping assumption, constructing the CAusal Pan-cancER gene (CAPER) score and validating its performance using internal and external validation on 1,287 individuals and 985 cell lines. Summary statistics of 15 cancer types were used to define 54 causal loci in 15 potential genes. Using The Cancer Genome Atlas (TCGA) training set, we constructed the CAPER score and divided cancer patients into two groups. Using the three validation sets, we found that 19 cancer-related variables differed significantly between the two CAPER score groups and that 81 drugs had significantly different drug sensitivity between the two CAPER score groups. We hope that our strategies for selecting causal genes and for constructing the CAPER score will provide valuable clues for guiding the management of different types of cancers.


2019 ◽  
Author(s):  
Rodrigo R.R. Duarte ◽  
Matthew L. Bendall ◽  
Miguel de Mulder ◽  
Christopher E. Ormsby ◽  
Greta A. Beckerle ◽  
...  

Schizophrenia genome-wide association studies highlight the substantial contribution of risk attributed to the non-coding genome where human endogenous retroviruses (HERVs) are encoded. These ancient viral elements have previously been overlooked in genetic and transcriptomic studies due to their poor annotation and repetitive nature. Using a new, comprehensive HERV annotation, we found that the fraction of the genome where HERVs are located (the ‘retrogenome’) is enriched for schizophrenia risk variants, and that there are 148 disparate HERVs involved in susceptibility. Analysis of RNA-sequencing data from the dorsolateral prefrontal cortex of 259 schizophrenia cases and 279 controls from the CommonMind Consortium showed that HERVs are actively expressed in the brain (n = 3,979), regulated in cis by common genetic variants (n = 1,759), and differentially expressed in patients (n = 81). Convergent analyses implicate LTR25_6q21 and ERVLE_8q24.3h as HERVs of etiological relevance to schizophrenia, which are co-regulated with genes involved in neuronal and mitochondrial function, respectively. Our findings provide a strong rationale for exploring the retrogenome and the expression of these locus-specific HERVs as novel risk factors for schizophrenia and potential diagnostic biomarkers and treatment targets.


2019 ◽  
Author(s):  
Sara R. Rashkin ◽  
Rebecca E. Graff ◽  
Linda Kachuri ◽  
Khanh K. Thai ◽  
Stacey E. Alexeeff ◽  
...  

Deciphering the shared genetic basis of distinct cancers has the potential to elucidate carcinogenic mechanisms and inform broadly applicable risk assessment efforts. However, no studies have investigated pan-cancer pleiotropy within single, well-defined populations unselected for phenotype. We undertook novel genome-wide association studies (GWAS) and comprehensive evaluations of heritability and pleiotropy across 18 cancer types in two large, population-based cohorts: the UK Biobank (413,870 European ancestry individuals; 48,961 cancer cases) and the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging cohorts (66,526 European ancestry individuals; 16,001 cancer cases). The GWAS detected 21 novel genome-wide significant risk variants. In addition, numerous cancer sites exhibited clear heritability. Investigations of pleiotropy identified 12 cancer pairs exhibiting either positive or negative genetic correlations and 43 pleiotropic loci. We identified 158 pleiotropic variants, many of which were enriched for regulatory elements and influenced cross-tissue gene expression. Our findings demonstrate widespread pleiotropy and offer further insight into the complex genetic architecture of cross-cancer susceptibility.


2020 ◽  
Author(s):  
Juliet Luft ◽  
Robert S. Young ◽  
Alison M. Meynert ◽  
Martin S. Taylor

Background: The loss of genetic diversity in segments over a genome (loss-of-heterozygosity, LOH) is a common occurrence in many types of cancer. By analysing patterns of preferential allelic retention during LOH in approximately 10,000 cancer samples from The Cancer Genome Atlas (TCGA), we sought to systematically identify genetic polymorphisms currently segregating in the human population that are preferentially selected for, or against, during cancer development. Results: Experimental batch effects and cross-sample contamination were found to be substantial confounders in this widely used and well studied dataset. To mitigate these we developed a generally applicable classifier (GenomeArtiFinder) to quantify contamination and other abnormalities. We provide these results as a resource to aid further analysis of TCGA whole exome sequencing data. In total, 1,678 pairs of samples (14.7%) were found to be contaminated or affected by systematic experimental error. After filtering, our analysis of LOH revealed an overall trend for biased retention of cancer-associated risk alleles previously identified by genome wide association studies. Analysis of predicted damaging germline variants identified highly significant oncogenic selection for recessive tumour suppressor alleles. These are enriched for biological pathways involved in genome maintenance and stability. Conclusions: Our results identified predicted damaging germline variants in genes responsible for the repair of DNA strand breaks and homologous repair as the most common targets of allele biased LOH. This suggests a ratchet-like process where heterozygous germline mutations in these genes reduce the efficacy of DNA double-strand break repair, increasing the likelihood of a second hit at the locus removing the wild-type allele and triggering an oncogenic mutator phenotype.


2015 ◽  
Vol 112 (4) ◽  
pp. 1019-1024 ◽  
Author(s):  
Yi-Juan Hu ◽  
Yun Li ◽  
Paul L. Auer ◽  
Dan-Yu Lin

In the large cohorts that have been used for genome-wide association studies (GWAS), it is prohibitively expensive to sequence all cohort members. A cost-effective strategy is to sequence subjects with extreme values of quantitative traits or those with specific diseases. By imputing the sequencing data from the GWAS data for the cohort members who are not selected for sequencing, one can dramatically increase the number of subjects with information on rare variants. However, ignoring the uncertainties of imputed rare variants in downstream association analysis will inflate the type I error when sequenced subjects are not a random subset of the GWAS subjects. In this article, we provide a valid and efficient approach to combining observed and imputed data on rare variants. We consider commonly used gene-level association tests, all of which are constructed from the score statistic for assessing the effects of individual variants on the trait of interest. We show that the score statistic based on the observed genotypes for sequenced subjects and the imputed genotypes for nonsequenced subjects is unbiased. We derive a robust variance estimator that reflects the true variability of the score statistic regardless of the sampling scheme and imputation quality, such that the corresponding association tests always have correct type I error. We demonstrate through extensive simulation studies that the proposed tests are substantially more powerful than the use of accurately imputed variants only and the use of sequencing data alone. We provide an application to the Women’s Health Initiative. The relevant software is freely available.
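As a rough illustration of the score statistic this abstract refers to, the sketch below computes the per-variant score for a quantitative trait with no covariates, U = Σ (y_i − ȳ) g_i, where g_i is the observed genotype for a sequenced subject or the imputed dosage for a nonsequenced one. It then sums per-variant scores into an unweighted gene-level burden score. The function names and toy numbers are hypothetical, and the robust variance estimator described in the abstract is not reproduced here; this is a minimal sketch, not the authors' software.

```python
def score_statistic(trait, genotype_or_dosage):
    """Per-variant score U = sum_i (y_i - ybar) * g_i (linear model, no covariates).

    g_i may be an observed genotype (0, 1, 2) or an imputed dosage in [0, 2].
    """
    ybar = sum(trait) / len(trait)
    return sum((y - ybar) * g for y, g in zip(trait, genotype_or_dosage))


def gene_level_burden_score(trait, variant_columns):
    """Unweighted burden-type score: sum of per-variant scores within a gene."""
    return sum(score_statistic(trait, column) for column in variant_columns)


# Hypothetical toy data: three subjects; the last genotype is observed,
# the middle one is an imputed dosage.
trait = [1.0, 2.0, 3.0]
variant_a = [0, 0.5, 2]
variant_b = [1, 1, 1]  # no variation, so it contributes nothing

u_a = score_statistic(trait, variant_a)
u_gene = gene_level_burden_score(trait, [variant_a, variant_b])
```

A valid test would still need a variance for U that accounts for imputation uncertainty and the sampling scheme, which is the part the abstract attributes to the proposed estimator.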


2017 ◽  
Vol 11 ◽  
pp. 117793221773509 ◽  
Author(s):  
Baishali Bandyopadhyay ◽  
Veda Chanda ◽  
Yupeng Wang

Thousands of genome-wide association studies (GWAS) have been conducted to identify the genetic variants associated with complex disorders. However, only a small proportion of phenotypic variance can be explained by the reported variants. Moreover, many GWAS have failed to identify genetic variants associated with disorders displaying hereditary features. The “missing heritability” problem can be partly explained by rare variants. We simulated a causality scenario in which gestational age, a quantitative trait that can distinguish preterm (<37 weeks) and term births, was significantly correlated with the rare variant aggregations at 1000 single-nucleotide polymorphism loci. These 1000 simulated causal rare variants were embedded into randomly selected subsets of 9642 promoter regions from the 1000 Genomes Project genotypic data according to different proportions of causal rare variants within the embedded promoters. Through analysis of the correlations between rare variant aggregations and gestational ages, we found that the embedded promoters as a whole showed weaker genetic association when the proportion of causal rare variants decreased, and no individual embedded promoters showed genetic association when the proportion of causal rare variants was smaller than 0.4. Our analyses indicate that association signals can be greatly diluted when causal rare variants are dispersedly and sparsely distributed in the genome, accounting for an important source of missing heritability.
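The idea of correlating a rare variant aggregation with a quantitative trait can be sketched as follows. This assumes the aggregation is a simple per-individual sum of minor-allele counts across a region and that the association is measured by Pearson correlation; the abstract does not specify either choice, and the genotypes and gestational ages below are made-up toy values.

```python
from statistics import mean


def burden_scores(genotypes):
    """Sum minor-allele counts across a region's rare variants, per individual."""
    return [sum(row) for row in genotypes]


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has no variation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy data: rows are individuals, columns are rare variants
# in one promoter region (0/1 minor-allele counts).
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
gestational_weeks = [40.0, 39.0, 37.0, 40.5, 35.0]

scores = burden_scores(genotypes)
r = pearson_r(scores, gestational_weeks)
```

In this toy setup every variant is causal, so the correlation is strongly negative. Adding non-causal variants to the sum would add noise to the score, which is one way to picture the dilution effect the abstract describes.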


2019 ◽  
Author(s):  
Mart Kals ◽  
Tiit Nikopensius ◽  
Kristi Läll ◽  
Kalle Pärn ◽  
Timo Tõnis Sikka ◽  
...  

Genotype imputation has become a standard procedure prior to genome-wide association studies (GWASs). For common and low-frequency variants, genotype imputation can be performed sufficiently accurately with publicly available and ethnically heterogeneous reference datasets like the 1000 Genomes Project (1000G) and Haplotype Reference Consortium panels. However, the imputation of rare variants has been shown to be significantly more accurate when an ethnically matched reference panel is used. Moreover, greater genetic similarity between the reference panel and target samples facilitates the detection of rare (or even population-specific) causal variants. Nevertheless, the genome-wide downstream consequences and differences of using ethnically mixed and matched reference panels have not yet been comprehensively explored. We determined and quantified these differences by performing several comparative evaluations of the discovery-driven analysis scenarios. A variant-wise GWAS was performed on seven complex diseases and body mass index by using genome-wide genotype data of ∼37,000 Estonians imputed with ethnically mixed 1000G and ethnically matched imputation reference panels. Although several previously reported common (minor allele frequency; MAF > 5%) variant associations were replicated in both resulting imputed datasets, no major differences were observed among the genome-wide significant findings or in the fine-mapping effort. In the analysis of rare (MAF < 1%) coding variants, 46 significantly associated genes were identified in the ethnically matched imputed data as compared to four genes in the 1000G panel based imputed data. All resulting genes were consequently studied in the UK Biobank data. These associations provide a solid example of how rare variants can be efficiently analysed to discover novel, potentially functional genetic variants in relevant phenotypes. Furthermore, our work serves as proof of a cost-efficient study design, demonstrating that the usage of ethnically matched imputation reference panels can enable substantially improved imputation of rare variants, facilitating novel high-confidence findings in rare variant GWAS scans.
Author summary: Over the last decade, genome-wide association studies (GWASs) have been widely used for detecting genetic biomarkers in a wide range of traits. Typically, GWASs are carried out using chip-based genotyping data, which are then combined with a more densely genotyped reference panel to infer untyped genetic variants in chip-typed individuals. The latter method is called genotype imputation and its accuracy depends on multiple factors. Publicly available and ethnically heterogeneous imputation reference panels (IRPs) such as the 1000 Genomes Project (1000G) are sufficiently accurate for imputation of common and low-frequency variants, but custom ethnically matched IRPs outperform these in the case of rare variants. In this work, we systematically compare downstream association analysis effects on eight complex traits in ∼37,000 Estonians imputed with ethnically mixed and ethnically matched IRPs. We do not observe major differences in the single variant analysis, where both imputed datasets replicate previously reported significant loci. But in the gene-based analysis of rare protein-coding variants we show that an ethnically matched panel clearly outperforms 1000G panel based imputation, providing a 10-fold increase in significant gene-trait associations. Our study demonstrates empirically that imputed data based on an ethnically matched panel is very promising for rare variant analysis – it captures more population-specific variants and makes it possible to efficiently identify novel findings.


Blood ◽  
2016 ◽  
Vol 127 (23) ◽  
pp. 2814-2823 ◽  
Author(s):  
Claire Lentaigne ◽  
Kathleen Freson ◽  
Michael A. Laffan ◽  
Ernest Turro ◽  
Willem H. Ouwehand

Variations in platelet number, volume, and function are largely genetically controlled, and many loci associated with platelet traits have been identified by genome-wide association studies (GWASs). The genome also contains a large number of rare variants, of which a tiny fraction underlies the inherited diseases of humans. Research over the last 3 decades has led to the discovery of 51 genes harboring variants responsible for inherited platelet disorders (IPDs). However, the majority of patients with an IPD still do not receive a molecular diagnosis. Alongside the scientific interest, molecular or genetic diagnosis is important for patients. There is increasing recognition that a number of IPDs are associated with severe pathologies, including an increased risk of malignancy, and a definitive diagnosis can inform prognosis and care. In this review, we give an overview of these disorders grouped according to their effect on platelet biology and their clinical characteristics. We also discuss the challenge of identifying candidate genes and causal variants therein, how IPDs have been historically diagnosed, and how this is changing with the introduction of high-throughput sequencing. Finally, we describe how integration of large genomic, epigenomic, and phenotypic datasets, including whole genome sequencing data, GWASs, epigenomic profiling, protein–protein interaction networks, and standardized clinical phenotype coding, will drive the discovery of novel mechanisms of disease in the near future to improve patient diagnosis and management.


2021 ◽  
Author(s):  
Tanya Ramdal Techlo ◽  
Mona Ameri Chalmer ◽  
Peter Loof Møller ◽  
Lisette Johanna Antonia Kogelman ◽  
Isa Amalie Olofsson ◽  
...  

Migraine has a heritability of up to 65%. Genome-wide association studies (GWAS) on migraine have identified 123 risk loci, explaining only 10.6% of migraine heritability. Thus, there is a considerable genetic component not identified with GWAS. Further, the causality of the identified risk loci remains inconclusive. Rare variants contribute to the risk of migraine, but GWAS are often underpowered to detect them. Whole genome sequencing is reliable for analyzing rare variants but is not frequently used at large scale. We assessed whether rare variants in the migraine risk loci are associated with migraine. We used a large cohort of whole genome sequenced migraine patients (1,040 individuals from 155 families). The findings were replicated in an independent case-control cohort (2,027 migraine patients, 1,650 controls). We found rare variants (minor allele frequency < 0.1%) associated with migraine in a Polycomb Response Element in the ASTN2 locus. The association was independent of the GWAS lead risk variant in the locus. The findings place rare variants as risk factors for migraine. We propose a biological mechanism by which epigenetic regulation by Polycomb Response Elements plays a crucial role in migraine etiology.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Toshima Z. Parris

The human nuclear receptor (NR) superfamily comprises 48 ligand-dependent transcription factors that play regulatory roles in physiology and pathophysiology. In cancer, NRs have long served as predictors of disease stratification, treatment response, and clinical outcome. The Cancer Genome Atlas (TCGA) Pan-Cancer project provides a wealth of genetic data for a large number of human cancer types. Here, we examined NR transcriptional activity in 8,526 patient samples from 33 TCGA ‘Pan-Cancer’ diseases and 11 ‘Pan-Cancer’ organ systems using RNA sequencing data. The web-based Kaplan-Meier (KM) plotter tool was then used to evaluate the prognostic potential of NR gene expression in 21/33 cancer types. Although most NRs were significantly underexpressed in cancer, NR expression (moderate to high expression levels) was predominantly restricted (46%) to specific tissues, particularly cancers representing gynecologic, urologic, and gastrointestinal ‘Pan-Cancer’ organ systems. Intriguingly, recurrent positive pairwise correlations among Class IV NRs emerged in most cancers. NR expression was also revealed to have a profound effect on patient overall survival rates, with ≥5 prognostic NRs identified per cancer type. Taken together, these findings highlighted the complexity of NR transcriptional networks in cancer and identified novel therapeutic targets for specific cancer types.

