The impact of rare variation on gene expression across tissues

2016 ◽  
Author(s):  
Xin Li ◽  
Yungil Kim ◽  
Emily K. Tsang ◽  
Joe R. Davis ◽  
Farhan N. Damani ◽  
...  

Abstract Rare genetic variants are abundant in humans, yet their functional effects are often unknown and challenging to predict. The Genotype-Tissue Expression (GTEx) project provides a unique opportunity to identify the functional impact of rare variants through combined analyses of whole genomes and multi-tissue RNA-sequencing data. Here, we identify gene expression outliers, or individuals with extreme expression levels, across 44 human tissues, and characterize the contribution of rare variation to these large changes in expression. We find that 58% of underexpression and 28% of overexpression outliers have underlying rare variants, compared with 9% of non-outliers. Large expression effects are enriched for proximal loss-of-function, splicing, and structural variants, particularly variants near the TSS and at evolutionarily conserved sites. Known disease genes have expression outliers, underscoring that rare variants can contribute to genetic disease risk. To prioritize functional rare regulatory variants, we develop RIVER, a Bayesian approach that integrates RNA and whole genome sequencing data from the same individual. RIVER predicts functional variants significantly better than models using genomic annotations alone, and is an extensible tool for personal genome interpretation. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues with potential health consequences, and provide an integrative method for interpreting rare variants in individual genomes.

Nature ◽  
2017 ◽  
Vol 550 (7675) ◽  
pp. 239-243 ◽  
Author(s):  
Xin Li ◽  
Yungil Kim ◽  
Emily K. Tsang ◽  
Joe R. Davis ◽  
...  

Abstract Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk [1,2,3,4]. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants [1,5]. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles [1,6,7], but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues [8,9,10,11], but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release [12]. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
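
The notion of an expression outlier used above (an individual with extreme expression of a particular gene across tissues) can be illustrated with a small sketch. This is not the authors' pipeline: the per-tissue Z-score, the median summary across tissues, the threshold of 2.0, and the function names are all assumptions chosen for illustration.

```python
from statistics import mean, median, pstdev


def zscores(values):
    """Standardize a list of expression values (population standard deviation)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # No variation in this tissue: nobody can be extreme.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def multi_tissue_outliers(expr_by_tissue, threshold=2.0):
    """Flag individuals with extreme expression of ONE gene across tissues.

    expr_by_tissue maps tissue name -> {individual id -> expression value}.
    The threshold is an illustrative assumption, not a value from the abstract.
    Returns {individual id -> "under" or "over"}.
    """
    z_by_individual = {}
    for expr in expr_by_tissue.values():
        ids = sorted(expr)
        for ind, z in zip(ids, zscores([expr[i] for i in ids])):
            z_by_individual.setdefault(ind, []).append(z)

    outliers = {}
    for ind, zs in z_by_individual.items():
        m = median(zs)  # summarize the individual's Z-scores across tissues
        if abs(m) >= threshold:
            outliers[ind] = "under" if m < 0 else "over"
    return outliers
```

Summarizing with the median rather than the mean means an individual is flagged only when the gene is extreme in most of the tissues sampled for that person, not because of a single aberrant tissue.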


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Chun-Yu Wei ◽  
Jenn-Hwai Yang ◽  
Erh-Chan Yeh ◽  
Ming-Fang Tsai ◽  
Hsiao-Jung Kao ◽  
...  

Abstract Personalized medical care focuses on prediction of disease risk and response to medications. To build the risk models, access to both large-scale genomic resources and human genetic studies is required. The Taiwan Biobank (TWB) has generated high-coverage, whole-genome sequencing data from 1492 individuals and genome-wide SNP data from 103,106 individuals of Han Chinese ancestry using custom SNP arrays. Principal components analysis of the genotyping data showed that the full range of Han Chinese genetic variation was found in the cohort. The arrays also include thousands of known functional variants, allowing for simultaneous ascertainment of Mendelian disease-causing mutations and variants that affect drug metabolism. We found that 21.2% of the population are mutation carriers of autosomal recessive diseases, 3.1% have mutations in cancer-predisposing genes, and 87.3% carry variants that affect drug response. We highlight how TWB data provide insight into both population history and disease burden, while showing how widespread genetic testing can be used to improve clinical care.


2021 ◽  
Vol 11 (2) ◽  
pp. 131
Author(s):  
Laura B. Scheinfeldt ◽  
Andrew Brangan ◽  
Dara M. Kusic ◽  
Sudhir Kumar ◽  
Neda Gharani

Pharmacogenomics holds the promise of personalized drug efficacy optimization and drug toxicity minimization. Much of the research conducted to date, however, suffers from an ascertainment bias towards European participants. Here, we leverage publicly available, whole genome sequencing data collected from global populations, evolutionary characteristics, and annotated protein features to construct a new in silico machine learning pharmacogenetic identification method called XGB-PGX. When applied to pharmacogenetic data, XGB-PGX outperformed all existing prediction methods and identified over 2000 new pharmacogenetic variants. While there are modest pharmacogenetic allele frequency distribution differences across global population samples, the most striking distinction is between the relatively rare putatively neutral pharmacogene variants and the relatively common established and newly predicted functional pharmacogenetic variants. Our findings therefore support a focus on individual patient pharmacogenetic testing rather than on clinical presumptions about patient race, ethnicity, or ancestral geographic residence. We further encourage more attention be given to the impact of common variation on drug response and propose a new ‘common treatment, common variant’ perspective for pharmacogenetic prediction that is distinct from the types of variation that underlie complex and Mendelian disease. XGB-PGX has identified many new pharmacovariants that are present across all global communities; however, communities that have been underrepresented in genomic research are likely to benefit the most from XGB-PGX’s in silico predictions.


2018 ◽  
Author(s):  
Zachary A. Szpiech ◽  
Angel C.Y. Mak ◽  
Marquitta J. White ◽  
Donglei Hu ◽  
Celeste Eng ◽  
...  

Abstract Runs of homozygosity (ROH) are important genomic features that manifest when an individual inherits two haplotypes that are identical-by-descent. Their length distributions are informative about population history, and their genomic locations are useful for mapping recessive loci contributing to both Mendelian and complex disease risk. We have previously shown that ROH, and especially long ROH that are likely the result of recent parental relatedness, are enriched for homozygous deleterious coding variation in a worldwide sample of outbred individuals. However, the distribution of ROH in admixed populations and their relationship to deleterious homozygous genotypes is understudied. Here we analyze whole genome sequencing data from 1,441 individuals from self-identified African American, Puerto Rican, and Mexican American populations. These populations are three-way admixed between European, African, and Native American ancestries and provide an opportunity to study the distribution of deleterious alleles partitioned by local ancestry and ROH. We recapitulate previous findings that long ROH are enriched for deleterious variation genome-wide. We then partition by local ancestry and show that deleterious homozygotes arise at a higher rate when ROH overlap African ancestry segments than when they overlap European or Native American ancestry segments of the genome. These results suggest that, while ROH on any haplotype background are associated with an inflation of deleterious homozygous variation, African haplotype backgrounds may play a particularly important role in the genetic architecture of complex diseases for admixed individuals, highlighting the need for further study of these populations.
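
As a rough illustration of what a run of homozygosity looks like in genotype data, the sketch below scans one individual's ordered genotype calls on a chromosome and reports stretches of consecutive homozygous sites. It is a deliberate simplification and not the method used in the study: dedicated ROH callers typically use window-based or likelihood-based models that tolerate genotyping error, and the function name and the `min_snps` and `min_length_bp` parameters are hypothetical.

```python
def runs_of_homozygosity(genotypes, positions, min_snps=3, min_length_bp=0):
    """Return (start_bp, end_bp) for each run of consecutive homozygous sites.

    genotypes: list of (allele_a, allele_b) tuples, ordered along a chromosome.
    positions: base-pair coordinate of each site, in the same order.
    min_snps / min_length_bp: illustrative filters (assumed, not from the abstract).
    """
    runs = []
    start = None
    for i, (a, b) in enumerate(genotypes):
        if a == b:
            if start is None:
                start = i  # a new homozygous stretch begins here
        elif start is not None:
            runs.append((start, i - 1))  # a heterozygous site ends the stretch
            start = None
    if start is not None:
        runs.append((start, len(genotypes) - 1))  # stretch reaches the last site

    return [
        (positions[s], positions[e])
        for s, e in runs
        if e - s + 1 >= min_snps and positions[e] - positions[s] >= min_length_bp
    ]
```

The lengths of the reported intervals give the kind of length distribution the abstract refers to; long intervals are the ones attributed there to recent parental relatedness.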


2020 ◽  
Author(s):  
Andrew C. Bishop ◽  
Kimberly D. Spradling-Reeves ◽  
Robert E. Shade ◽  
Kenneth J. Lange ◽  
Shifra Birnbaum ◽  
...  

Abstract Background: Poor nutrition during development programs kidney function. No studies on postnatal consequences of decreased perinatal nutrition exist in nonhuman primates (NHP) for translation to human renal disease. Our baboon model of moderate maternal nutrient restriction (MNR) produces intrauterine growth restricted (IUGR) offspring and programs renal fetal phenotype. We hypothesized that the IUGR phenotype persists postnatally, influencing responses to a high-fat, high-carbohydrate, high-salt (HFCS) diet. Methods: Pregnant baboons ate chow (Control; CON) or 70% of control intake (MNR) from 0.16 gestation through lactation. MNR offspring were IUGR at birth. At weaning, all offspring (CON and IUGR females and males, n=3/group) ate chow. At ~4.5 years of age, blood, urine, and kidney biopsies were collected before and after a 7-week HFCS diet challenge. Kidney function, unbiased kidney gene expression, and untargeted urine metabolomics were evaluated. Results: IUGR female and male kidney transcriptome and urine metabolome differed from CON at 3.5 years, prior to HFCS. After the challenge, we observed sex-specific and fetal exposure-specific responses in urine creatinine, urine metabolites, and renal signaling pathways. Conclusions: We previously showed mTOR signaling dysregulation in IUGR fetal kidneys. Before HFCS, gene expression analysis indicated that dysregulation persists postnatally in IUGR females. IUGR male offspring response to HFCS showed uncoordinated signaling pathway responses suggestive of proximal tubule injury. To our knowledge, this is the first study comparing CON and IUGR postnatal juvenile NHP and the impact of fetal and postnatal life caloric mismatch. Perinatal history needs to be taken into account when assessing renal disease risk.


Author(s):  
Gregory McInnes ◽  
Andrew G. Sharo ◽  
Megan L. Koleske ◽  
Julie E. H. Brown ◽  
Matthew Norstad ◽  
...  

Genome sequencing is enabling precision medicine—tailoring treatment to the unique constellation of variants in an individual’s genome. The impact of recurrent pathogenic variants is often understood, leaving a long tail of rare genetic variants that are uncharacterized. The problem of uncharacterized rare variation is especially acute when it occurs in genes of known clinical importance with functionally consequent frequent variants and associated mechanisms. Variants of unknown significance (VUS) in these genes are discovered at a rate that outpaces current ability to classify them using databases of previous cases, experimental evaluation, and computational predictors. Clinicians are thus left without guidance about the significance of variants that may have actionable consequences. Computational prediction of the impact of rare genetic variation is increasingly becoming an important capability. In this paper, we review the technical and ethical challenges of interpreting the function of rare variants in two settings: inborn errors of metabolism in newborns, and pharmacogenomics. We propose a framework for a genomic learning healthcare system with an initial focus on early-onset treatable disease in newborns and actionable pharmacogenomics. We argue that (1) a genomic learning healthcare system must allow for continuous collection and assessment of rare variants, (2) emerging machine learning methods will enable algorithms to predict the clinical impact of rare variants on protein function, and (3) ethical considerations must inform the construction and deployment of all rare-variation triage strategies, particularly with respect to health disparities arising from unbalanced ancestry representation.


2020 ◽  
Author(s):  
Ammar Zaghlool ◽  
Adnan Niazi ◽  
Åsa K. Björklund ◽  
Jakub Orzechowski Westholm ◽  
Adam Ameur ◽  
...  

Abstract Transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization and its influence on our understanding of gene function, and interpretation of gene expression signatures in cells. Here, we performed a comprehensive analysis of cytosolic and nuclear transcriptomes in human fetal and adult brain samples. We show significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. Transcripts displaying differential subcellular localization belong to particular functional categories and display tissue-specific localization patterns. We also show that transcripts encoding the nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared to the rest of protein-coding genes. Further investigation of the use of the cytosolic or the nuclear transcriptome for differential gene expression analysis indicates important differences in results depending on the cellular compartment. These differences were manifested at the level of transcript types and the number of differentially expressed genes. Our data provide a resource of RNA subcellular localization in the human brain and highlight differences in using the cytosolic or the nuclear transcriptomes for differential expression analysis.


Author(s):  
S. Rubinacci ◽  
D.M. Ribeiro ◽  
R. Hofmeister ◽  
O. Delaneau

Abstract Low-coverage whole genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined as current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. It achieves imputation of a full genome for less than $1, outperforming existing methods by orders of magnitude, with an increased accuracy of more than 20% at rare variants. We also show that 1x coverage enables effective association studies and is better suited than dense SNP arrays to assess the impact of rare variations. Overall, this study demonstrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.


2020 ◽  
Vol 127 (9) ◽  
Author(s):  
Jennifer VanOudenhove ◽  
Tara N. Yankee ◽  
Andrea Wilderman ◽  
Justin Cotney

Rationale: There is growing evidence that common variants and rare sequence alterations in regulatory sequences can result in birth defects or predisposition to disease. Congenital heart defects are the most common birth defect and have a clear genetic component, yet only a third of cases can be attributed to structural variation in the genome or a mutation in a gene. The remaining unknown cases could be caused by alterations in regulatory sequences. Objective: Identify regulatory sequences and gene expression networks that are active during organogenesis of the human heart. Determine whether these sites and networks are enriched for disease-relevant genes and associated genetic variation. Methods and Results: We characterized ChromHMM (chromatin state) and gene expression dynamics during human heart organogenesis. We profiled 7 histone modifications in embryonic hearts from each of 9 distinct Carnegie stages (13–14, 16–21, and 23), annotated chromatin states, and compared these maps to over 100 human tissues and cell types. We also generated RNA-sequencing data, performed differential expression, and constructed weighted gene coexpression networks. We identified 177,412 heart enhancers; 12,395 had not been previously annotated as strong enhancers. We identified 92% of all functionally validated heart-positive enhancers (n=281; 7.5× enrichment; P < 2.2×10^−16). Integration of these data demonstrated novel heart enhancers are enriched near genes expressed more strongly in cardiac tissue and are enriched for variants associated with ECG measures and atrial fibrillation. Our gene expression network analysis identified gene modules strongly enriched for heart-related functions, regulatory control by heart-specific enhancers, and putative disease genes. Conclusions: Well-connected hub genes with heart-specific expression targeted by embryonic heart-specific enhancers are likely disease candidates. Our functional annotations will allow for better interpretation of whole genome sequencing data in the large number of patients affected by congenital heart defects.

