A Bayesian method using sparse data to estimate penetrance of disease-associated genetic variants

Purpose: A major challenge in genomic medicine is how to best predict risk of disease from rare variants discovered in Mendelian disease genes but with limited phenotypic data. We have recently used Bayesian methods to show that in vitro functional measurements and computational pathogenicity classification of variants in the cardiac gene SCN5A correlate with rare arrhythmia penetrance. We hypothesized that similar predictors could be used to impute variant-specific penetrance prior probabilities. Methods: From a review of 756 publications, we developed a pattern mixture algorithm, based on a Bayesian Beta-Binomial model, to generate SCN5A variant-specific penetrance priors for the heart arrhythmia Brugada syndrome (BrS). Results: The resulting priors correlate with mean BrS penetrance posteriors (cross-validated R² = 0.41). SCN5A variant function and structural context provide the most information predictive of BrS penetrance. The resulting priors are interpretable as equivalent to the observation of affected and unaffected carriers. Conclusions: Bayesian estimates of penetrance can efficiently integrate variant-specific data (e.g. functional, structural, and sequence) to accurately estimate disease risk attributable to individual variants. We suggest this formulation of penetrance is quantitative, probabilistic, and more precise than, but consistent with, discrete pathogenicity classification approaches.
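
The reading of a Beta prior as pseudo-observations of affected and unaffected carriers can be illustrated with a minimal Python sketch. The class name `BetaPenetrance`, its fields, and the example counts are hypothetical and chosen only for illustration. The code shows the standard conjugate Beta-Binomial update, not the authors' pattern mixture algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPenetrance:
    """Beta distribution over a variant's penetrance (hypothetical helper)."""

    alpha: float  # prior weight, read as pseudo-count of affected carriers
    beta: float   # prior weight, read as pseudo-count of unaffected carriers

    def update(self, affected: int, unaffected: int) -> "BetaPenetrance":
        # Conjugate update: observed carrier counts add to the pseudo-counts.
        return BetaPenetrance(self.alpha + affected, self.beta + unaffected)

    @property
    def mean(self) -> float:
        # Expected penetrance under the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


# Illustrative numbers only: a prior worth 1 affected and 3 unaffected carriers,
# followed by the observation of 2 affected and 2 unaffected carriers.
prior = BetaPenetrance(alpha=1.0, beta=3.0)
posterior = prior.update(affected=2, unaffected=2)
print(prior.mean, posterior.mean)
```

With few observed carriers the posterior mean stays close to the prior mean; as carrier counts grow, the observed data dominate.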


Arrhythmia Variant Associations and Reclassifications in the eMERGE-III Sequencing Study

Circulation ◽

10.1161/circulationaha.121.055562 ◽

2021 ◽

Author(s):

Andrew M Glazer ◽

Giovanni E. Davogustto ◽

Christian M. Shaffer ◽

Carlos G Vanoye ◽

Reshma R. Desai ◽

...

Keyword(s):

Genetic Testing ◽

Disease Risk ◽

Large Population ◽

Hek293 Cells ◽

Disease Genes ◽

Mendelian Disease ◽

Functional Evaluation ◽

Variants Of Uncertain Significance ◽

Uncertain Significance

Background: Sequencing Mendelian arrhythmia genes in individuals without an indication for arrhythmia genetic testing can identify carriers of pathogenic or likely pathogenic (P/LP) variants. However, the extent to which these variants are associated with clinically meaningful phenotypes before or after return of variant results (RoR) is unclear. In addition, the majority of discovered variants are currently classified as Variants of Uncertain Significance (VUS), limiting clinical actionability. Methods: The eMERGE-III study is a multi-center prospective cohort which included 21,846 participants without prior indication for cardiac genetic testing. Participants were sequenced for 109 Mendelian disease genes, including 10 linked to arrhythmia syndromes. Variant carriers were assessed with Electronic Health Record (EHR)-derived phenotypes and follow-up clinical examination. Selected VUS (n=50) were characterized in vitro with automated electrophysiology experiments in HEK293 cells. Results: As previously reported, 3.0% of participants had pathogenic or likely pathogenic (P/LP) variants in the 109 genes. Herein, we report 120 participants (0.6%) with P/LP arrhythmia variants. Compared to non-carriers, arrhythmia P/LP carriers had a significantly higher burden of arrhythmia phenotypes in their EHRs. Fifty-four participants had variant results returned. Nineteen of these 54 participants had inherited arrhythmia syndrome diagnoses (primarily long QT syndrome), and 12/19 of these diagnoses were made only after variant results were returned (0.05%). After in vitro functional evaluation of 50 variants of uncertain significance (VUS), we reclassified 11 variants: 3 to likely benign and 8 to P/LP. Conclusions: Genome sequencing in a large population without indication for arrhythmia genetic testing identified phenotype-positive carriers of variants in congenital arrhythmia syndrome disease genes. As large numbers of people are sequenced, the disease risk from rare variants in arrhythmia genes can be assessed by integrating genomic screening, EHR phenotypes, and in vitro functional studies.


Arrhythmia variant associations and reclassifications in the eMERGE-III sequencing study

10.1101/2021.03.30.21254549 ◽

2021 ◽

Author(s):

Andrew M Glazer ◽

Giovanni Davogustto ◽

Christian M Shaffer ◽

Carlos G Vanoye ◽

Reshma R Desai ◽

...

Keyword(s):

Rare Variants ◽

Disease Risk ◽

Disease Genes ◽

Functional Evaluation ◽

Functional Studies ◽

Large Numbers ◽

Inherited Arrhythmia ◽

Genomic Screening ◽

Uncertain Significance

In 21,846 eMERGE-III participants, sequencing 10 arrhythmia syndrome disease genes identified 123 individuals with pathogenic or likely pathogenic (P/LP) variants. Compared to non-carriers, P/LP carriers had a significantly higher burden of arrhythmia phenotypes in their electronic health records (EHRs). Fifty-one participants had variant results returned. Eighteen of these 51 participants had inherited arrhythmia syndrome diagnoses (primarily long QT syndrome), and 11/18 of these diagnoses were made only after variant results were returned. After in vitro functional evaluation of 50 variants of uncertain significance (VUS), we reclassified 11 variants: 3 to likely benign and 8 to P/LP. As large numbers of people are sequenced, the disease risk from rare variants in arrhythmia genes can be assessed by integrating genomic screening, EHR phenotypes, and in vitro functional studies.


Leveraging Health Systems Data to Characterize a Large Effect Variant Conferring Risk for Liver Disease in Puerto Ricans

10.1101/2021.03.31.21254662 ◽

2021 ◽

Author(s):

Gillian Belbin ◽

Stephanie Rutledge ◽

Tetyana Dodatko ◽

Sinead Cullina ◽

Michael C Turchin ◽

...

Keyword(s):

Liver Disease ◽

Puerto Rican ◽

Health Systems ◽

Disease Risk ◽

Genomic Medicine ◽

Genomic Data ◽

Dependent Manner ◽

Increased Risk ◽

Population Scale

Broad-scale adoption of genomic data in health systems offers opportunities for extending methods for the discovery of variation linked to underlying genomic disease risk. We applied a population-scale linkage mapping approach in a large multi-ethnic biobank to a spectrum of disease outcomes derived from Electronic Health Records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784-12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population-scale can facilitate novel strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine.


Exome sequencing in families with severe mental illness identifies novel and rare variants in genes implicated in Mendelian neuropsychiatric syndromes

10.1101/310821 ◽

2018 ◽

Cited By ~ 1

Author(s):

Suhas Ganesh ◽

Ahmed P Husayn ◽

Ravi Kumar Nadella ◽

Ravi Prabhakar More ◽

Manasa Sheshadri ◽

...

Keyword(s):

Bipolar Disorder ◽

Exome Sequencing ◽

Rare Variants ◽

Association Studies ◽

Mental Illnesses ◽

Disease Genes ◽

Mendelian Disease ◽

Genome Wide Association Studies ◽

Complex Disorders ◽

Family Based

Introduction: Severe Mental Illnesses (SMI), such as bipolar disorder and schizophrenia, are highly heritable and have a complex pattern of inheritance. Genome-wide association studies detect a part of the heritability, which can be attributed to common genetic variation. Examination of rare variants with Next Generation Sequencing (NGS) may add to the understanding of the genetic architecture of SMIs. Methods: We analyzed 32 ill subjects (with diagnosis of bipolar disorder, n=26; schizophrenia, n=4; schizoaffective disorder, n=1; schizophrenia-like psychosis, n=1) from 8 multiplex families, and 33 healthy individuals, by whole exome sequencing. Prioritized variants were selected by a 4-step filtering process, which included deleteriousness by 5 in silico algorithms, sharing within families, absence in the controls, and rarity in the South Asian sample of the Exome Aggregation Consortium. Results: We identified a total of 42 unique rare, non-synonymous deleterious variants in this study, with an average of 5 variants per family. None of the variants were shared across families, indicating a ‘private’ mutational profile. Twenty (47.6%) of the variant-harboring genes identified in this sample have been previously reported to contribute to the risk of neuropsychiatric syndromes. These include genes which are related to neurodevelopmental processes, or have been implicated in different monogenic syndromes with a severe neurodevelopmental phenotype. Conclusion: NGS approaches in family-based studies are useful to identify novel and rare variants in genes for complex disorders like SMI. The study further validates the phenotypic burden of rare variants in Mendelian disease genes, indicating pleiotropic effects in the etiology of severe mental illnesses.


Assessing the role of polygenic background on the penetrance of monogenic forms in Parkinson's disease.

10.1101/2021.06.06.21253270 ◽

2021 ◽

Author(s):

Emadeldin Hassanin ◽

Patrick May ◽

Rana Aldisi ◽

Peter Krawitz ◽

Carlo Maj ◽

...

Keyword(s):

Parkinson's Disease ◽

Family History ◽

Rare Variants ◽

Disease Risk ◽

Risk Groups ◽

Disease Genes ◽

Incomplete Penetrance ◽

Carrier Status ◽

Common Variants

Background: Several rare and common variants are associated with Parkinson's disease. However, penetrance remains incomplete among carriers of rare variants associated with Parkinson's disease. To address this issue, we investigated whether a polygenic risk score (PRS) calculated from significant GWAS SNPs affects the penetrance of Parkinson's disease among carriers of rare monogenic variants in known Parkinson's disease genes and those with a family history. Methods: We calculated the PRS based on common variants and selected the carriers of rare monogenic variants by using the exome data from UK Biobank. Individuals were divided into three risk categories based on PRS: low (<10%), intermediate (10%-90%), and high (>90%) risk groups. We then compared how PRS affects Parkinson's disease risk among carriers of rare monogenic variants and those with a family history. Results: We observed a two-fold higher odds ratio for carriers of a monogenic variant who had a high PRS (OR 4.07, 95% CI 1.72-8.08) compared to carriers with a low PRS (OR 1.91, 95% CI 0.31-6.05). Similarly, carriers with a first-degree family history and with >90% PRS have an even higher risk of developing PD (OR 23.53, 95% CI 5.39-71.54) compared to those with <90% PRS (OR 9.54, 95% CI 3.32-21.65). Conclusions: Our results show that PRS, carrier status, and family history contribute independently and additively to the Parkinson's disease risk.
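
The three-way PRS stratification described in the Methods can be sketched in a few lines of Python. The function name `prs_risk_groups` and the toy scores are hypothetical, and the choice of `statistics.quantiles` with its default method is an assumption; the abstract does not state how the authors computed percentile cut points.

```python
from statistics import quantiles


def prs_risk_groups(scores):
    """Label each score 'low', 'intermediate', or 'high' by cohort percentile.

    Assumed cut points: below the 10th percentile is low, above the 90th
    percentile is high, and everything in between is intermediate.
    """
    deciles = quantiles(scores, n=10)  # nine cut points: 10th, 20th, ..., 90th
    low_cut, high_cut = deciles[0], deciles[-1]
    labels = []
    for score in scores:
        if score < low_cut:
            labels.append("low")
        elif score > high_cut:
            labels.append("high")
        else:
            labels.append("intermediate")
    return labels


# Toy cohort of 100 evenly spaced scores (illustrative only).
toy_scores = list(range(1, 101))
toy_labels = prs_risk_groups(toy_scores)
print({group: toy_labels.count(group) for group in ("low", "intermediate", "high")})
```

Odds ratios such as those in the Results would then be estimated within each group, a step this sketch does not attempt.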


High Frequency Actionable Pathogenic Exome Mutations in an Average-Risk Cohort

10.1101/151225 ◽

2017 ◽

Cited By ~ 1

Author(s):

Shannon Rego ◽

Orit Dagan-Rosenfeld ◽

Wenyu Zhou ◽

M. Reza Sailani ◽

Patricia Limcaoco ◽

...

Keyword(s):

General Population ◽

Exome Sequencing ◽

Health Management ◽

Disease Risk ◽

Mendelian Inheritance ◽

Disease Genes ◽

Mendelian Disease ◽

Average Risk ◽

Pathogenic Variants ◽

Actionable Findings

Whole exome sequencing (WES) is increasingly utilized in both clinical and non-clinical settings, but little is known about the utility of WES in healthy individuals. To determine the frequency of both medically actionable and non-actionable but medically relevant exome findings in the general population, we assessed the exomes of 70 participants who have been extensively characterized over the past several years as part of a longitudinal integrated multi-omics profiling study at Stanford University. We assessed exomes for rare likely pathogenic and pathogenic variants in genes associated with Mendelian disease in the Online Mendelian Inheritance in Man (OMIM) database. We used American College of Medical Genetics (ACMG) guidelines for the classification of rare sequence variants, and additionally we assessed pharmacogenetic variants. Twelve out of 70 (17%) participants had medically actionable findings in Mendelian disease genes, including 6 (9%) with mutations in genes not currently included in the ACMG’s list of 59 actionable genes. This number is higher than that reported in previous studies and suggests added benefit from utilizing expanded gene lists and manual curation to assess actionable findings. A total of 60 participants (89%) had non-actionable findings identified, including 57 who were found to be mutation carriers for recessive diseases and 21 who have increased Alzheimer’s disease risk due to heterozygous or homozygous APOE e4 alleles (18 participants had both). These results suggest that exome sequencing may have considerably more utility for health management in the general population than previously thought.


Quantitative disease risk scores from EHR with applications to clinical risk stratification and genetic studies

npj Digital Medicine ◽

10.1038/s41746-021-00488-3 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Danqing Xu ◽

Chen Wang ◽

Atlas Khan ◽

Ning Shang ◽

Zihuai He ◽

...

Keyword(s):

Risk Stratification ◽

Disease Risk ◽

Association Studies ◽

Large Datasets ◽

Risk Scores ◽

Sequencing Data ◽

Case Definitions ◽

Phenotypic Data ◽

Clinical Risk ◽

Phenotypic Features

Labeling clinical data from electronic health records (EHR) in health systems requires extensive knowledge from human experts and painstaking review by clinicians. Furthermore, existing phenotyping algorithms are not uniformly applied across large datasets and can suffer from inconsistencies in case definitions across different algorithms. We describe here quantitative disease risk scores based on almost unsupervised methods that require minimal input from clinicians, can be applied to large datasets, and alleviate some of the main weaknesses of existing phenotyping algorithms. We show applications to phenotypic data on approximately 100,000 individuals in eMERGE, and focus on several complex diseases, including Chronic Kidney Disease, Coronary Artery Disease, Type 2 Diabetes, Heart Failure, and a few others. We demonstrate that relative to existing approaches, the proposed methods have higher prediction accuracy, can better identify phenotypic features relevant to the disease under consideration, can perform better at clinical risk stratification, and can identify undiagnosed cases based on phenotypic features available in the EHR. Using genetic data from the eMERGE-seq panel that includes sequencing data for 109 genes on 21,363 individuals from multiple ethnicities, we also show how the new quantitative disease risk scores help improve the power of genetic association studies relative to the standard use of disease phenotypes. The results demonstrate the effectiveness of quantitative disease risk scores derived from rich phenotypic EHR databases to provide a more meaningful characterization of clinical risk for diseases of interest beyond the prevalent binary (case-control) classification.


Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Nature ◽

10.1038/s41586-021-03205-y ◽

2021 ◽

Vol 590 (7845) ◽

pp. 290-299 ◽

Cited By ~ 22

Author(s):

Daniel Taliun ◽


Daniel N. Harris ◽

Michael D. Kessler ◽

Jedidiah Carlson ◽

...

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Phenotypic Data ◽

Treatment And Prevention ◽

Genome Wide ◽

Diverse Backgrounds ◽

Unmapped Reads

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.


Biallelic variants in COPB1 cause a novel, severe intellectual disability syndrome with cataracts and variable microcephaly

Genome Medicine ◽

10.1186/s13073-021-00850-w ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

William L. Macken ◽

Annie Godwin ◽

Gabrielle Wheway ◽

Karen Stals ◽

Liliya Nazlamova ◽

...

Keyword(s):

Genome Sequencing ◽

Donor Site ◽

Xenopus Tropicalis ◽

Splice Donor Site ◽

Homologous Region ◽

Disease Genes ◽

Coat Proteins ◽

Splice Donor ◽

Severe Intellectual Disability

Background: Coat protein complex 1 (COPI) is integral in the sorting and retrograde trafficking of proteins and lipids from the Golgi apparatus to the endoplasmic reticulum (ER). In recent years, coat proteins have been implicated in human diseases known collectively as “coatopathies”. Methods: Whole exome or genome sequencing of two families with a neuro-developmental syndrome, variable microcephaly and cataracts revealed biallelic variants in COPB1, which encodes the beta-subunit of COPI (β-COP). To investigate Family 1’s splice donor site variant, we undertook patient blood RNA studies and CRISPR/Cas9 modelling of this variant in a homologous region of the Xenopus tropicalis genome. To investigate Family 2’s missense variant, we studied cellular phenotypes of human retinal epithelium and embryonic kidney cell lines transfected with a COPB1 expression vector into which we had introduced Family 2’s mutation. Results: We present a new recessive coatopathy typified by severe developmental delay and cataracts and variable microcephaly. A homozygous splice donor site variant in Family 1 results in two aberrant transcripts, one of which causes skipping of exon 8 in COPB1 pre-mRNA, and a 36 amino acid in-frame deletion, resulting in the loss of a motif at a small interaction interface between β-COP and β’-COP. Xenopus tropicalis animals with a homologous mutation, introduced by CRISPR/Cas9 genome editing, recapitulate features of the human syndrome including microcephaly and cataracts. In vitro modelling of the COPB1 c.1651T>G p.Phe551Val variant in Family 2 identifies defective Golgi to ER recycling of this mutant β-COP, with the mutant protein being retarded in the Golgi. Conclusions: This adds to the growing body of evidence that COPI subunits are essential in brain development and human health and underlines the utility of exome and genome sequencing coupled with Xenopus tropicalis CRISPR/Cas modelling for the identification and characterisation of novel rare disease genes.


“Guilt by association” is not competitive with genetic association for identifying autism risk genes

Scientific Reports ◽

10.1038/s41598-021-95321-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Margot Gunning ◽

Paul Pavlidis

Keyword(s):

Machine Learning ◽

Genetic Association ◽

Gene Networks ◽

Rare Variants ◽

Association Studies ◽

Genetic Disorders ◽

Autism Spectrum ◽

Biological Data ◽

Disease Genes ◽

Risk Genes

Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.
