A Bayesian method using sparse data to estimate penetrance of disease-associated genetic variants
Abstract

Purpose: A major challenge in genomic medicine is how to best predict risk of disease from rare variants discovered in Mendelian disease genes but with limited phenotypic data. We have recently used Bayesian methods to show that in vitro functional measurements and computational pathogenicity classification of variants in the cardiac gene SCN5A correlate with rare arrhythmia penetrance. We hypothesized that similar predictors could be used to impute variant-specific penetrance prior probabilities.

Methods: From a review of 756 publications, we developed a pattern mixture algorithm, based on a Bayesian Beta-Binomial model, to generate SCN5A variant-specific penetrance priors for the heart arrhythmia Brugada syndrome (BrS).

Results: The resulting priors correlate with mean BrS penetrance posteriors (cross-validated R² = 0.41). SCN5A variant function and structural context provide the most information predictive of BrS penetrance. The resulting priors are interpretable as equivalent to the observation of affected and unaffected carriers.

Conclusions: Bayesian estimates of penetrance can efficiently integrate variant-specific data (e.g., functional, structural, and sequence) to accurately estimate disease risk attributable to individual variants. We suggest this formulation of penetrance is quantitative, probabilistic, and more precise than, but consistent with, discrete pathogenicity classification approaches.
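The reading of a prior as "equivalent to the observation of affected and unaffected carriers" follows from the conjugacy of the Beta-Binomial model: a Beta(α, β) prior behaves like α previously observed affected carriers and β unaffected ones. The sketch below illustrates only this generic conjugate update. It is not the pattern mixture algorithm described in the abstract, and the function name, prior values, and carrier counts are hypothetical, chosen purely for illustration.

```python
def beta_binomial_update(alpha_prior, beta_prior, affected, unaffected):
    """Conjugate Beta-Binomial update for a penetrance estimate.

    alpha_prior, beta_prior: Beta prior parameters, read as pseudo-counts
        of affected and unaffected carriers (hypothetical values here).
    affected, unaffected: observed carrier counts for the variant.
    Returns the posterior parameters and the posterior mean penetrance.
    """
    alpha_post = alpha_prior + affected
    beta_post = beta_prior + unaffected
    posterior_mean = alpha_post / (alpha_post + beta_post)
    return alpha_post, beta_post, posterior_mean


# Hypothetical variant: the prior is worth 2 affected and 6 unaffected
# carriers; 3 affected and 1 unaffected carriers are then observed.
alpha_post, beta_post, mean_penetrance = beta_binomial_update(2.0, 6.0, 3, 1)
print(f"posterior Beta({alpha_post}, {beta_post}), mean = {mean_penetrance:.3f}")
```

With few observed carriers the posterior mean stays close to the prior mean α / (α + β), and as carrier counts grow the observed proportion dominates, which is why an informative variant-specific prior matters most for sparse data.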