CoCoRV: a rare variant analysis framework using publicly available genotype summary counts to prioritize germline disease-predisposition genes

2021 ◽  
Author(s):  
Wenan Chen ◽  
Shuoguo Wang ◽  
Saima Sultana Tithi ◽  
David Ellison ◽  
Gang Wu

Sequencing cases without matched healthy controls hinders prioritization of germline disease-predisposition genes. To circumvent this problem, genotype summary counts from public data sets can serve as controls. However, systematic inflation and false positives can arise if confounding factors are not addressed. We propose a new framework, consistent summary counts based rare variant burden test (CoCoRV), to address these challenges. CoCoRV has consistent variant quality control and filtering, ethnicity-stratified rare variant association test, accurate estimation of inflation factors, powerful FDR control, and can detect rare variants in high linkage disequilibrium. When we applied CoCoRV to pediatric cancer cohorts, the top genes identified were cancer-predisposition genes. We also applied CoCoRV to identify disease-predisposition genes in adult brain tumors and amyotrophic lateral sclerosis. Given that potential confounding factors were well controlled after applying the framework, CoCoRV provides a cost-effective solution to prioritizing disease-risk genes enriched with rare pathogenic variants.
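
As a rough illustration of what an ethnicity-stratified burden test on summary counts can look like, the sketch below pools per-ethnicity 2x2 carrier tables with a Cochran-Mantel-Haenszel chi-square statistic. The abstract does not specify CoCoRV's actual test statistic, so this is a standard stratified test swapped in for illustration; the function name `cmh_test` and all counts are hypothetical.

```python
from math import erfc, sqrt


def cmh_test(strata):
    """Cochran-Mantel-Haenszel chi-square test (no continuity correction).

    Each stratum (e.g. one ethnicity) is a tuple:
      (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        cases, carriers = a + b, a + c
        observed += a
        expected += cases * carriers / n
        variance += cases * (c + d) * carriers * (b + d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical per-ethnicity counts for one gene: cases vs. public summary counts.
example_strata = [
    (12, 488, 30, 9970),  # stratum 1
    (3, 197, 10, 4990),   # stratum 2
]
chi2, p_value = cmh_test(example_strata)
```

Stratifying keeps a difference in ancestry composition between cases and public controls from being mistaken for a disease association, which is the confounder the abstract highlights.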

2020 ◽  
Author(s):  
Ricky Lali ◽  
Michael Chong ◽  
Arghavan Omidi ◽  
Pedrum Mohammadi-Shemirani ◽  
Ann Le ◽  
...  

Rare variants are collectively numerous and may underlie a considerable proportion of complex disease risk. However, identifying genuine rare variant associations is challenging due to small effect sizes, presence of technical artefacts, and heterogeneity in population structure. We hypothesized that rare variant burden over a large number of genes can be combined into a predictive rare variant genetic risk score (RVGRS). We propose a novel method (RV-EXCALIBER) that leverages summary-level data from a large public exome sequencing database (gnomAD) as controls and robustly calibrates rare variant burden to account for the aforementioned biases. An RVGRS was found to strongly associate with coronary artery disease (CAD) in European and South Asian populations. The calibrated RVGRS captures the aggregate effect of rare variants through a polygenic model of inheritance, identifies 1.5% of the population with substantial risk of early CAD, and confers risk even when adjusting for known Mendelian CAD genes, clinical risk factors, and common variant gene scores.


2021 ◽  
Author(s):  
Iain S. Forrest ◽  
Kumardeep Chaudhary ◽  
Ha My T. Vy ◽  
Shantanu Bafna ◽  
Daniel M. Jordan ◽  
...  

A major goal of genomic medicine is to quantify the disease risk of genetic variants. Here, we report the penetrance of 37,772 clinically relevant variants (including those reported in ClinVar and of loss-of-function consequence) for 197 diseases in an analysis of exome sequence data for 72,434 individuals over five ancestries and six decades of age from two large-scale population-based biobanks (BioMe Biobank and UK Biobank). With a high-quality set of 5,359 clinically impactful variants, we evaluate disease prevalence in carriers and non-carriers to interrogate major determinants and implications of penetrance. First, we associate biomarker levels with penetrance of variants in known disease-predisposition genes and illustrate their clear biological link to disease. We then systematically uncover large numbers of ClinVar pathogenic variants that confer low risk of disease, even among those reviewed by experts, while delineating stark differences in variant penetrance by molecular consequence. Furthermore, we ascertain numerous variants present in non-European ancestries and reveal how increasing carrier age modifies penetrance estimates. Lastly, we examine substantial heterogeneity of penetrance among variants in known disease-predisposition genes for conditions such as familial hypercholesterolemia and breast cancer. These data indicate that existing categorical systems for variant classification do not adequately capture disease risk and warrant consideration of a more quantitative system based on population-based penetrance to evaluate clinical impact.
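
To make the comparison of disease prevalence in carriers and non-carriers concrete, here is a minimal sketch that treats penetrance as the fraction of carriers with the disease and reports the excess risk over non-carriers. The function names and the counts are hypothetical, and the study's actual estimation procedure may differ.

```python
def penetrance(affected_carriers, total_carriers):
    """Fraction of variant carriers who have the disease."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    return affected_carriers / total_carriers


def risk_difference(affected_carriers, total_carriers,
                    affected_noncarriers, total_noncarriers):
    """Disease prevalence in carriers minus prevalence in non-carriers."""
    carrier_risk = penetrance(affected_carriers, total_carriers)
    # The same ratio applied to non-carriers is simply their prevalence.
    noncarrier_risk = penetrance(affected_noncarriers, total_noncarriers)
    return carrier_risk - noncarrier_risk


# Hypothetical counts: 3 of 20 carriers affected, 500 of 10,000 non-carriers.
excess_risk = risk_difference(3, 20, 500, 10_000)
```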


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ricky Lali ◽  
Michael Chong ◽  
Arghavan Omidi ◽  
Pedrum Mohammadi-Shemirani ◽  
Ann Le ◽  
...  

Rare variants are collectively numerous and may underlie a considerable proportion of complex disease risk. However, identifying genuine rare variant associations is challenging due to small effect sizes, presence of technical artefacts, and heterogeneity in population structure. We hypothesize that rare variant burden over a large number of genes can be combined into a predictive rare variant genetic risk score (RVGRS). We propose a method (RV-EXCALIBER) that leverages summary-level data from a large public exome sequencing database (gnomAD) as controls and robustly calibrates rare variant burden to account for the aforementioned biases. A calibrated RVGRS strongly associates with coronary artery disease (CAD) in European and South Asian populations by capturing the aggregate effect of rare variants through a polygenic model of inheritance. The RVGRS identifies 1.5% of the population with substantial risk of early CAD and confers risk even when adjusting for known Mendelian CAD genes, clinical risk factors, and a common variant genetic risk score.


2021 ◽  
pp. 1-10
Author(s):  
Zoe Guan ◽  
Ronglai Shen ◽  
Colin B. Begg

Background: Many cancer types show considerable heritability, and extensive research has been done to identify germline susceptibility variants. Linkage studies have discovered many rare high-risk variants, and genome-wide association studies (GWAS) have discovered many common low-risk variants. However, it is believed that a considerable proportion of the heritability of cancer remains unexplained by known susceptibility variants. The “rare variant hypothesis” proposes that much of the missing heritability lies in rare variants that cannot reliably be detected by linkage analysis or GWAS. Until recently, high sequencing costs have precluded extensive surveys of rare variants, but technological advances have now made it possible to analyze rare variants on a much greater scale. Objectives: In this study, we investigated associations between rare variants and 14 cancer types. Methods: We ran association tests using whole-exome sequencing data from The Cancer Genome Atlas (TCGA) and validated the findings using data from the Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG). Results: We identified four significant associations in TCGA, only one of which was replicated in PCAWG (BRCA1 and ovarian cancer). Conclusions: Our results provide little evidence in favor of the rare variant hypothesis. Much larger sample sizes may be needed to detect undiscovered rare cancer variants.


Author(s):  
Miss Payal W. Paratpure

Tracking the location of public buses requires a GPS device to be installed, and many bus operators in developing countries do not have such a solution in place to provide an accurate estimated time of arrival (ETA). Without ETA information, it is very difficult for the general public to plan their journeys effectively. This paper discusses the implementation of an innovative IoT solution that tracks the real-time location of buses without requiring the deployment of a GPS device. It tracks the journey of a bus with a Bluetooth Low Energy (BLE) proximity beacon, an Estimote location beacon deployed on the bus. BLE detection devices (Raspberry Pi 4) are installed at selected bus stops along the route to detect the arrival of buses. Once a bus is detected, its location is submitted to a cloud server to compute the bus ETAs. A field trial is currently being conducted in Johor, Malaysia, together with a local bus operator on a single route. Our test results showed that the detection of BLE beacons is highly accurate and that it is feasible to track the location of buses cost-effectively without a GPS device.
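
The server-side ETA step can be sketched as follows, assuming the cloud server knows the ordered list of stops and a typical travel time for each segment between consecutive stops. The abstract does not describe the actual ETA model, so the function `estimate_etas`, the stop names, and the segment times are all hypothetical.

```python
from datetime import datetime, timedelta


def estimate_etas(stops, segment_minutes, detected_stop, detected_at):
    """Project arrival times at all stops downstream of the detecting stop.

    stops           -- ordered stop names along the route
    segment_minutes -- segment_minutes[i] is the assumed travel time
                       from stops[i] to stops[i + 1]
    detected_stop   -- stop whose BLE detector just saw the bus beacon
    detected_at     -- datetime of that detection
    """
    if len(segment_minutes) != len(stops) - 1:
        raise ValueError("need one travel time per segment")
    start = stops.index(detected_stop)
    etas = {}
    eta = detected_at
    for i in range(start, len(stops) - 1):
        eta += timedelta(minutes=segment_minutes[i])
        etas[stops[i + 1]] = eta
    return etas


# Hypothetical route: the detector at stop "B" reports the bus at 08:00.
route = ["A", "B", "C", "D"]
etas = estimate_etas(route, [5, 7, 4], "B", datetime(2021, 1, 1, 8, 0))
```

Each new detection further along the route replaces the previous projection, so errors in the assumed segment times do not accumulate past the next detecting stop.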


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Nicolas Maillard ◽  
Veronique Fremeaux Bacchi ◽  
Paula Vieira-Martins ◽  
Perrine Jullien ◽  
Eric Alamartine ◽  
...  

Background and Aims: IgA nephropathy is the most frequent primary glomerulonephritis, leading to end-stage renal disease (ESRD) in about 30% of cases within 20 years after diagnosis. Complement activation through the alternative and lectin pathways has been described as influencing the pathogenesis of the disease. We hypothesized that rare variants of alternative pathway regulatory genes could be overrepresented, could play a role in initiating the disease, and could worsen the prognosis of IgA nephropathy. Method: Patients with biopsy-proven IgA nephropathy, markers of severity (progression to ESRD and/or proteinuria >0.5 g/day), and an available DNA sample were included. All coding sequences of the CFH, CFI, MCP, C3, Factor B, THBD and CFHR5 genes were analyzed by next-generation sequencing. We defined a variant as rare when its minor allele frequency (MAF) was below 0.1% in the general population. Frequencies were compared with those of a French volunteer cohort (n=80) and a large European cohort (n=503). Results: We screened 128 patients with IgA nephropathy, with the following characteristics at diagnosis: median age 42.4 years, median proteinuria 1.4 g/day, hypertension 66%, median eGFR 48.7 mL/min/1.73 m². The median follow-up was 99 months, and 58% of patients progressed to ESRD. We identified rare variants (MAF <0.1%) in 10.2% of patients (n=13), including 1 patient with two rare variants. The functional consequences of 12 of the 14 variants are unknown. Two variants in CFH are located in functional domains and are pathogenic.
Patients with IgA nephropathy had a higher rate of rare variants in CFH (9/128; 7%) than normal controls (9/503; 1.8%) (p=0.004). Pathogenic variants with MAF <0.1% in CFH were found in 2 patients with IgA nephropathy (2/128; 1.5%) versus 1 European control (1/503). In total, 11% (14/128), 3.8% (5/128) and 0.8% (1/128) of the 128 patients were homozygous for the at-risk haplotype MCP ggaac, CFH tgtgt, or both, respectively (versus 6.2% (5/80), 3.8% (3/80) and 0% in the controls). Six patients (6/128) carried the pathogenic THBD variant p.Ala43Thr, versus 5 in a control population of 508 (p=0.01). No difference in hypertension, proteinuria, eGFR, Oxford classification, or vascular score at diagnosis was observed between patients without any rare variant and patients with at least one rare variant. Progression to ESRD did not differ between groups. Conclusion: In this cohort of Caucasian IgA nephropathy patients, rare variants of CFH and THBD were significantly overrepresented compared with French and European control cohorts. Rare variants of alternative pathway regulatory genes were not associated with particular severity or prognosis.
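
A carrier-frequency comparison such as 9/128 patients versus 9/503 controls can be checked with a two-sided Fisher's exact test, sketched below. The abstract does not say which test produced its p-values, so the choice of Fisher's test is an assumption, and `fisher_exact_two_sided` is a hypothetical helper written with the standard library only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (patients, controls); columns are carriers and
    non-carriers. The p-value sums the hypergeometric probabilities of
    all tables that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 carriers fall in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guard against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Rare CFH variant carriers from the abstract: 9 of 128 patients, 9 of 503 controls.
p_cfh = fisher_exact_two_sided(9, 128 - 9, 9, 503 - 9)
```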


2015 ◽  
Vol 8 (11) ◽  
pp. 4817-4830 ◽  
Author(s):  
X. Xi ◽  
V. Natraj ◽  
R. L. Shia ◽  
M. Luo ◽  
Q. Zhang ◽  
...  

Abstract. The Geostationary Fourier Transform Spectrometer (GeoFTS) is designed to measure high-resolution spectra of reflected sunlight in three near-infrared bands centered around 0.76, 1.6, and 2.3 μm and to deliver simultaneous retrievals of column-averaged dry air mole fractions of CO2, CH4, CO, and H2O (denoted XCO2, XCH4, XCO, and XH2O, respectively) at different times of day over North America. In this study, we perform radiative transfer simulations over both clear-sky and all-sky scenes expected to be observed by GeoFTS and estimate the prospective performance of retrievals based on results from Bayesian error analysis and characterization. We find that, for simulated clear-sky retrievals, the average retrieval biases and single-measurement precisions are < 0.2 % for XCO2, XCH4, and XH2O, and < 2 % for XCO, when the a priori values have a bias of 3 % and an uncertainty of 3 %. In addition, an increase in the amount of aerosols and ice clouds leads to a notable increase in the retrieval biases and slight worsening of the retrieval precisions. Furthermore, retrieval precision is a strong function of signal-to-noise ratio and spectral resolution. This simulation study can help guide decisions on the design of the GeoFTS observing system, which can result in cost-effective measurement strategies while achieving satisfactory levels of retrieval precisions and biases. The simultaneous retrievals at different times of day will be important for more accurate estimation of carbon sources and sinks on fine spatiotemporal scales and for studies related to the atmospheric component of the water cycle.


Author(s):  
S. Rubinacci ◽  
D.M. Ribeiro ◽  
R. Hofmeister ◽  
O. Delaneau

Low-coverage whole genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined as current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. It achieves imputation of a full genome for less than $1, outperforming existing methods by orders of magnitude, with an increased accuracy of more than 20% at rare variants. We also show that 1x coverage enables effective association studies and is better suited than dense SNP arrays to assess the impact of rare variations. Overall, this study demonstrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.


2020 ◽  
Author(s):  
Roni Rasnic ◽  
Nathan Linial ◽  
Michal Linial

It is estimated that up to 10% of cancer incidents are attributed to inherited genetic alterations. Despite extensive research, there are still gaps in our understanding of genetic predisposition to cancer. It was theorized that ultra-rare variants partially account for the missing heritable component. We harness the UK BioBank dataset of ∼500,000 individuals, 14% of whom were diagnosed with cancer, to detect ultra-rare, possibly high-penetrance cancer predisposition variants. We report on 115 cancer-exclusive ultra-rare variations (CUVs) and nominate 26 variants with additional independent evidence as cancer predisposition variants. We conclude that population cohorts are a valuable source for expanding the collection of novel cancer predisposition genes.


Author(s):  
Jami Jackson ◽  
Alison Motsinger-Reif

Rapid progress in genotyping technologies, including the scaling up of assay technologies to genome-wide levels and next generation sequencing, has motivated a burst in methods development and application to detect genotype-phenotype associations in a wide array of diseases and other phenotypes. In this chapter, the authors review the study design and genotyping options that are used in association mapping, along with the appropriate methods to perform mapping within these study designs. The authors discuss both candidate gene and genome-wide studies, focused on DNA level variation. Quality control, genotyping technologies, and single-SNP and multiple-SNP analyses have facilitated the successes in identifying numerous loci that influence disease risk. However, the variants identified have generally explained only a small fraction of the heritable component of disease risk. The authors discuss emerging trends and future directions in rare variant analysis for detecting variants that predict traits with more complex etiologies.

