Ultra-rare variants drive substantial cis-heritability of human gene expression

Abstract: The vast majority of human mutations have minor allele frequencies (MAF) under 1%, with the plurality observed only once (i.e., “singletons”). While Mendelian diseases are predominantly caused by rare alleles, their cumulative contribution to complex phenotypes remains largely unknown. We develop and rigorously validate an approach to jointly estimate the contribution of all alleles, including singletons, to phenotypic variation. We apply our approach to transcriptional regulation, an intermediate between genetic variation and complex disease. Using whole genome DNA and lymphoblastoid cell line RNA sequencing data from 360 European individuals, we conservatively estimate that singletons contribute ~25% of cis-heritability across genes (dwarfing the contributions of other frequencies). Strikingly, the majority (~76%) of singleton heritability derives from ultra-rare variants absent from thousands of additional samples. We develop a novel inference procedure to demonstrate that our results are consistent with rampant purifying selection shaping the regulatory architecture of most human genes.
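
To make the frequency classes in the abstract concrete, the sketch below counts minor alleles in a small genotype matrix and labels each variant as a singleton, rare (MAF < 1%), or common. It is only an illustration of the definitions: the function names, the 0/1/2 dosage encoding, and the toy data are assumptions, and it is not the authors' heritability estimator.

```python
def minor_allele_counts(genotypes):
    """Return the minor allele count of each variant.

    genotypes: list of variants; each variant is a list of per-individual
    alternate-allele dosages (0, 1, or 2) for diploid samples (assumed encoding).
    """
    counts = []
    for variant in genotypes:
        n_chrom = 2 * len(variant)      # two chromosomes per individual
        alt = sum(variant)              # copies of the alternate allele
        counts.append(min(alt, n_chrom - alt))
    return counts


def frequency_class(mac, n_chrom):
    """Label a variant by its minor allele count (mac) among n_chrom chromosomes."""
    if mac == 0:
        return "monomorphic"
    if mac == 1:
        return "singleton"              # observed exactly once in the sample
    maf = mac / n_chrom
    return "rare (MAF < 1%)" if maf < 0.01 else "common (MAF >= 1%)"


# Toy data: 4 variants typed in 4 individuals (hypothetical values).
genotypes = [
    [0, 0, 1, 0],   # one alternate copy
    [1, 2, 1, 0],   # four alternate copies out of eight
    [0, 0, 0, 0],   # no variation in this sample
    [2, 2, 2, 1],   # the reference allele is the minor allele here
]
macs = minor_allele_counts(genotypes)
labels = [frequency_class(m, 2 * len(genotypes[0])) for m in macs]
```

With 360 diploid individuals there are 720 chromosomes, so a singleton has a sample frequency of 1/720, roughly 0.14%, well below the 1% threshold.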

An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People

Science ◽

10.1126/science.1217876 ◽

2012 ◽

Vol 337 (6090) ◽

pp. 100-104 ◽

Cited By ~ 488

Author(s):

Matthew R. Nelson ◽

Daniel Wegmann ◽

Margaret G. Ehm ◽

Darren Kessner ◽

Pamela St. Jean ◽

...

Keyword(s):

Population Growth ◽

Drug Targets ◽

Complex Disease ◽

Target Genes ◽

Rare Variants ◽

Disease Risk ◽

Growth Parameters ◽

Purifying Selection ◽

Human Populations ◽

Functional Variants

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.

Faculty Opinions recommendation of Family-based association test using both common and rare variants and accounting for directions of effects for sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718882382.793500875 ◽

2014 ◽

Author(s):

Melanie Bahlo

Keyword(s):

Rare Variants ◽

Association Test ◽

Sequencing Data ◽

Family Based

Human-lineage-specific genomic elements are associated with neurodegenerative disease and APOE transcript usage

Nature Communications ◽

10.1038/s41467-021-22262-5 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Zhongbo Chen ◽

David Zhang ◽

Regina H. Reynolds ◽

Emil K. Gustavsson ◽

...

Keyword(s):

Neurological Diseases ◽

Purifying Selection ◽

Whole Genome Sequencing Data ◽

Human Lineage ◽

Sequencing Data ◽

Protein Coding ◽

Potential Association ◽

High Depth ◽

Specific Sequences ◽

Human Specific

Abstract: Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer’s disease with more severe tau and amyloid pathological burden. Thus, we demonstrate potential association of human-lineage-specific sequences in brain development and neurological disease.

Evaluating the contribution of rare variants to type 2 diabetes and related traits using pedigrees

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1705859115 ◽

2017 ◽

Vol 115 (2) ◽

pp. 379-384 ◽

Cited By ~ 17

Author(s):

Goo Jun ◽

Alisa Manning ◽

Marcio Almeida ◽

Matthew Zawistowski ◽

Andrew R. Wood ◽

...

Keyword(s):

Type 2 Diabetes ◽

Mexican American ◽

Complex Disease ◽

Rare Variants ◽

Whole Genome Analysis ◽

Mexican American Families ◽

Extended Pedigrees ◽

High Prevalence ◽

Study Designs

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.

Exome-Wide Pan-Cancer Analysis of Germline Variants in 8,719 Individuals Finds Little Evidence of Rare Variant Associations

Human Heredity ◽

10.1159/000519355 ◽

2021 ◽

pp. 1-10

Author(s):

Zoe Guan ◽

Ronglai Shen ◽

Colin B. Begg

Keyword(s):

Rare Variant ◽

Rare Variants ◽

Association Studies ◽

The Cancer Genome Atlas ◽

Considerable Proportion ◽

Genome Wide Association Studies ◽

Sequencing Data ◽

Risk Variants ◽

Cancer Types ◽

Pan Cancer

Background: Many cancer types show considerable heritability, and extensive research has been done to identify germline susceptibility variants. Linkage studies have discovered many rare high-risk variants, and genome-wide association studies (GWAS) have discovered many common low-risk variants. However, it is believed that a considerable proportion of the heritability of cancer remains unexplained by known susceptibility variants. The “rare variant hypothesis” proposes that much of the missing heritability lies in rare variants that cannot reliably be detected by linkage analysis or GWAS. Until recently, high sequencing costs have precluded extensive surveys of rare variants, but technological advances have now made it possible to analyze rare variants on a much greater scale. Objectives: In this study, we investigated associations between rare variants and 14 cancer types. Methods: We ran association tests using whole-exome sequencing data from The Cancer Genome Atlas (TCGA) and validated the findings using data from the Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG). Results: We identified four significant associations in TCGA, only one of which was replicated in PCAWG (BRCA1 and ovarian cancer). Conclusions: Our results provide little evidence in favor of the rare variant hypothesis. Much larger sample sizes may be needed to detect undiscovered rare cancer variants.

Ancestry-dependent Enrichment of Deleterious Homozygotes in Runs of Homozygosity

10.1101/382721 ◽

2018 ◽

Author(s):

Zachary A. Szpiech ◽

Angel C.Y. Mak ◽

Marquitta J. White ◽

Donglei Hu ◽

Celeste Eng ◽

...

Keyword(s):

Native American ◽

Mexican American ◽

Complex Disease ◽

Disease Risk ◽

African Ancestry ◽

Population History ◽

Whole Genome Sequencing Data ◽

Runs Of Homozygosity ◽

Sequencing Data ◽

Local Ancestry

Abstract: Runs of homozygosity (ROH) are important genomic features that manifest when an individual inherits two haplotypes that are identical-by-descent. Their length distributions are informative about population history, and their genomic locations are useful for mapping recessive loci contributing to both Mendelian and complex disease risk. We have previously shown that ROH, and especially long ROH that are likely the result of recent parental relatedness, are enriched for homozygous deleterious coding variation in a worldwide sample of outbred individuals. However, the distribution of ROH in admixed populations and their relationship to deleterious homozygous genotypes is understudied. Here we analyze whole genome sequencing data from 1,441 individuals from self-identified African American, Puerto Rican, and Mexican American populations. These populations are three-way admixed between European, African, and Native American ancestries and provide an opportunity to study the distribution of deleterious alleles partitioned by local ancestry and ROH. We recapitulate previous findings that long ROH are enriched for deleterious variation genome-wide. We then partition by local ancestry and show that deleterious homozygotes arise at a higher rate when ROH overlap African ancestry segments than when they overlap European or Native American ancestry segments of the genome. These results suggest that, while ROH on any haplotype background are associated with an inflation of deleterious homozygous variation, African haplotype backgrounds may play a particularly important role in the genetic architecture of complex diseases for admixed individuals, highlighting the need for further study of these populations.

Divergent and convergent evolution of housekeeping genes in human–pig lineage

PeerJ ◽

10.7717/peerj.4840 ◽

2018 ◽

Vol 6 ◽

pp. e4840 ◽

Cited By ~ 4

Author(s):

Kai Wei ◽

Tingting Zhang ◽

Lei Ma

Keyword(s):

Active Sites ◽

Evolutionary Dynamics ◽

Purifying Selection ◽

Housekeeping Genes ◽

Neutral Evolution ◽

Structure Evolution ◽

Tissue Cell ◽

Sequencing Data ◽

Cellular Functions ◽

Species Specific

Housekeeping genes are ubiquitously expressed and maintain basic cellular functions across tissue/cell type conditions. The present study aimed to develop a set of pig housekeeping genes and compare the structure, evolution and function of housekeeping genes in the human–pig lineage. By using RNA sequencing data, we identified 3,136 pig housekeeping genes. Compared with human housekeeping genes, we found that pig housekeeping genes were longer and subject to slightly weaker purifying selection pressure and faster neutral evolution. Common housekeeping genes, shared by the two species, experience stronger purifying selection than species-specific genes. However, pig- and human-specific housekeeping genes have similar functions. Some species-specific housekeeping genes have evolved independently to form similar protein active sites or structure, such as the classical catalytic serine–histidine–aspartate triad, implying that they have converged for maintaining the basic cellular function, which allows them to adapt to the environment. Human and pig housekeeping genes have varied structures and gene lists, but they have converged to maintain basic cellular functions essential for the existence of a cell, regardless of its specific role in the species. The results of our study shed light on the evolutionary dynamics of housekeeping genes.

Efficient phasing and imputation of low-coverage sequencing data using large reference panels

10.1101/2020.04.14.040329 ◽

2020 ◽

Cited By ~ 2

Author(s):

S. Rubinacci ◽

D.M. Ribeiro ◽

R. Hofmeister ◽

O. Delaneau

Keyword(s):

Paradigm Shift ◽

Rare Variants ◽

Association Studies ◽

Cost Effective ◽

Human Populations ◽

Sequencing Data ◽

Snp Arrays ◽

Genomic Studies ◽

Low Coverage ◽

The Impact

Abstract: Low-coverage whole genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined as current imputation methods are computationally expensive and unable to leverage large reference panels. Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. It achieves imputation of a full genome for less than $1, outperforming existing methods by orders of magnitude, with an increased accuracy of more than 20% at rare variants. We also show that 1x coverage enables effective association studies and is better suited than dense SNP arrays to assess the impact of rare variations. Overall, this study demonstrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.

Molecular population genetics of sequence length diversity in the Adh region of Drosophila pseudoobscura

Genetics Research ◽

10.1017/s0016672302005955 ◽

2002 ◽

Vol 80 (3) ◽

pp. 163-175 ◽

Cited By ~ 48

Author(s):

STEPHEN W. SCHAEFFER

Keyword(s):

Negative Selection ◽

Genome Rearrangement ◽

Rare Variants ◽

Natural Populations ◽

Population Expansion ◽

Purifying Selection ◽

Sequence Length ◽

Nucleotide Sequence Analysis ◽

Drosophila Pseudoobscura ◽

Indel Variation

Positive and negative selection on indel variation may explain the correlation between intron length and recombination levels in natural populations of Drosophila. A nucleotide sequence analysis of the 3·5 kilobase sequence of the alcohol dehydrogenase (Adh) region from 139 Drosophila pseudoobscura strains and one D. miranda strain was used to determine whether positive or negative selection acts on indel variation in a gene that experiences high levels of recombination. A total of 30 deletion and 36 insertion polymorphisms were segregating within D. pseudoobscura populations and no indels were fixed between D. pseudoobscura and its two sibling species D. miranda and D. persimilis. The ratio of Tajima's D to its theoretical minimum value (Dmin) was proposed as a metric to assess the heterogeneity in D among D. pseudoobscura loci when the number of segregating sites differs among loci. The magnitude of the D/Dmin ratio was found to increase as the rate of population expansion increases, allowing one to assess which loci have an excess of rare variants due to population expansion versus purifying selection. D. pseudoobscura populations appear to have had modest increases in size accounting for some of the observed excess of rare variants. The D/Dmin ratio rejected a neutral model for deletion polymorphisms. Linkage disequilibrium among pairs of indels was greater than between pairs of segregating nucleotides. These results suggest that purifying selection removes deletion variation from intron sequences, but not insertion polymorphisms. Genome rearrangement and size-dependent intron evolution are proposed as mechanisms that limit runaway intron expansion.
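
The D/Dmin ratio described above can be sketched as follows. The code uses the standard Tajima (1989) formula for D and assumes that Dmin is the value D takes when every segregating site is a singleton, which is the usual reading of "theoretical minimum"; the abstract does not spell out the definition, so treat that as an assumption. Function names and the example allele counts are hypothetical.

```python
import math


def _tajima_constants(n):
    """Constants from Tajima (1989) for a sample of n sequences."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return a1, e1, e2


def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts (1 <= k <= n - 1) in n sequences."""
    s = len(allele_counts)              # number of segregating sites
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1, e1, e2 = _tajima_constants(n)
    # Mean pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def d_min(s, n):
    """Assumed Dmin: D when all s segregating sites are singletons."""
    return tajimas_d([1] * s, n)


def d_over_dmin(allele_counts, n):
    """Ratio D/Dmin; it equals 1 when every site is a singleton."""
    return tajimas_d(allele_counts, n) / d_min(len(allele_counts), n)


# Hypothetical locus: 6 segregating sites in 139 sequences, mostly rare alleles.
ratio = d_over_dmin([1, 1, 2, 1, 3, 40], 139)
```

Because Dmin is negative, a ratio near 1 indicates a strong excess of rare variants, while a negative ratio indicates an excess of intermediate-frequency variants. Dividing by Dmin puts loci with different numbers of segregating sites on a common scale.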

Regions of Lower Crossing Over Harbor More Rare Variants in African Populations of Drosophila melanogaster

Genetics ◽

10.1093/genetics/158.2.657 ◽

2001 ◽

Vol 158 (2) ◽

pp. 657-665 ◽

Cited By ~ 15

Author(s):

Peter Andolfatto ◽

Molly Przeworski

Keyword(s):

Frequency Spectrum ◽

Nucleotide Diversity ◽

Rare Variants ◽

Purifying Selection ◽

Crossing Over ◽

Background Selection ◽

Deleterious Mutations ◽

African Populations ◽

Rapid Fixation ◽

Parameter Values

Abstract: A correlation between diversity levels and rates of recombination is predicted both by models of positive selection, such as hitchhiking associated with the rapid fixation of advantageous mutations, and by models of purifying selection against strongly deleterious mutations (commonly referred to as “background selection”). With parameter values appropriate for Drosophila populations, only the first class of models predicts a marked skew in the frequency spectrum of linked neutral variants, relative to a neutral model. Here, we consider 29 loci scattered throughout the Drosophila melanogaster genome. We show that, in African populations, a summary of the frequency spectrum of polymorphic mutations is positively correlated with the meiotic rate of crossing over. This pattern is demonstrated to be unlikely under a model of background selection. Models of weakly deleterious selection are not expected to produce both the observed correlation and the extent to which nucleotide diversity is reduced in regions of low (but nonzero) recombination. Thus, of existing models, hitchhiking due to the recurrent fixation of advantageous variants is the most plausible explanation for the data.
