Decreased recent adaptation at human mendelian disease genes as a possible consequence of interference between advantageous and deleterious variants

Advances in genome sequencing have dramatically improved our understanding of the genetic basis of human diseases, and thousands of human genes have been associated with different diseases. Despite our expanding knowledge of gene-disease associations, and despite the medical importance of disease genes, their recent evolution has not been thoroughly studied across diverse human populations. In particular, recent genomic adaptation at disease genes has not been characterized as well as purifying selection and long-term adaptation. Understanding the relationship between disease and adaptation at the gene level in the human genome is hampered by the fact that we do not know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during the last ~50,000 years of recent human evolution. Here, we compare the rate of strong recent adaptation in the form of selective sweeps between mendelian, non-infectious disease genes and non-disease genes across 26 distinct human populations from the 1000 Genomes Project. We find that mendelian disease genes have experienced far fewer selective sweeps than non-disease genes, especially in Africa. This sweep deficit at mendelian disease genes is less visible in East Asia or Europe. Investigating the possible causes of the sweep deficit at disease genes further, we find that this deficit is very strong at disease genes with both low recombination rates and high numbers of associated disease variants, but is almost nonexistent at disease genes with higher recombination rates or lower numbers of associated disease variants. Because segregating recessive deleterious variants have the ability to interfere with adaptive ones, these observations strongly suggest that adaptation has been slowed down by the presence of interfering recessive deleterious variants at disease genes. This is further supported by population simulations that show that interference at disease genes is expected to be lower in East Asia and Europe. These results clarify the evolutionary relationship between disease genes and recent genomic adaptation, and suggest that disease genes suffer not only from a higher load of segregating deleterious mutations, but also from a transient inability to adapt as much and/or as fast as the rest of the genome.

Decreased adaptation at human disease genes as a possible consequence of interference between advantageous and deleterious variants

10.1101/2021.03.31.437959 ◽

2021 ◽

Author(s):

Chenlu Di ◽

Diego Salazar Tortosa ◽

M Elise Lauterbur ◽

David Enard

Keyword(s):

Human Evolution ◽

Evolutionary Relationship ◽

Disease Genes ◽

Human Populations ◽

Selective Sweeps ◽

Recombination Rates ◽

Human Genes ◽

Gene Level ◽

Disease Associations ◽

Genomic Adaptation

Advances in genome sequencing have dramatically improved our understanding of the genetic basis of human diseases, and thousands of human genes have been associated with different diseases. Despite our expanding knowledge of gene-disease associations, and despite the medical importance of disease genes, their evolution has not been thoroughly studied across diverse human populations. In particular, recent genomic adaptation at disease genes has not been well characterized, even though multiple evolutionary processes are expected to connect disease and adaptation at the gene level. Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we do not even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution. Here, we compare the rate of strong recent adaptation in the form of selective sweeps between disease genes and non-disease genes across 26 distinct human populations from the 1000 Genomes Project. We find that disease genes have experienced far fewer selective sweeps than non-disease genes during recent human evolution. This sweep deficit at disease genes is particularly visible in Africa, and less visible in East Asia or Europe, likely because more intense genetic drift in the latter populations creates more spurious selective sweep signals. Investigating the possible causes of the sweep deficit at disease genes further, we find that this deficit is very strong at disease genes with both low recombination rates and high numbers of associated disease variants, but is nonexistent at disease genes with higher recombination rates or lower numbers of associated disease variants. Because recessive deleterious variants have the ability to interfere with adaptive ones, these observations strongly suggest that adaptation has been slowed down by the presence of interfering recessive deleterious variants at disease genes. These results clarify the evolutionary relationship between disease genes and recent genomic adaptation, and suggest that disease genes suffer not only from a higher load of segregating deleterious mutations, but also from an inability to adapt as much and/or as fast as the rest of the genome.

The population genetics of human disease: the case of recessive, lethal mutations

10.1101/091579 ◽

2016 ◽

Author(s):

Carlos Eduardo G. Amorim ◽

Ziyue Gao ◽

Zachary Baker ◽

José Francisco Diesel ◽

Yuval B. Simons ◽

...

Keyword(s):

Balancing Selection ◽

Finite Size ◽

Purifying Selection ◽

Ascertainment Bias ◽

European Ancestry ◽

Human Populations ◽

Mendelian Disease ◽

Recessive Lethal ◽

Disease Mutations ◽

Order Of Magnitude

Do the frequencies of disease mutations in human populations reflect a simple balance between mutation and purifying selection? What other factors shape the prevalence of disease mutations? To begin to answer these questions, we focused on one of the simplest cases: recessive mutations that alone cause lethal diseases or complete sterility. To this end, we generated a hand-curated set of 417 Mendelian mutations in 32 genes, reported to cause a recessive, lethal Mendelian disease. We then considered analytic models of mutation-selection balance in infinite and finite populations of constant sizes and simulations of purifying selection in a more realistic demographic setting, and tested how well these models fit allele frequencies estimated from 33,370 individuals of European ancestry. In doing so, we distinguished between CpG transitions, which occur at a substantially elevated rate, and three other mutation types. The observed frequency for CpG transitions is slightly higher than expectation but close, whereas the frequencies observed for the three other mutation types are an order of magnitude higher than expected. This discrepancy is even larger when subtle fitness effects in heterozygotes or lethal compound heterozygotes are taken into account. In principle, higher than expected frequencies of disease mutations could be due to widespread errors in reporting causal variants, compensation by other mutations, or balancing selection. It is unclear why these factors would have a greater impact on variants with lower mutation rates, however. We argue instead that the unexpectedly high frequency of disease mutations and the relationship to the mutation rate likely reflect an ascertainment bias: of all the mutations that cause recessive lethal diseases, those that by chance have reached higher frequencies are more likely to have been identified and thus to have been included in this study. Beyond the specific application, this study highlights the parameters likely to be important in shaping the frequencies of Mendelian disease alleles.

Author Summary: What determines the frequencies of disease mutations in human populations? To begin to answer this question, we focus on one of the simplest cases: mutations that cause completely recessive, lethal Mendelian diseases. We first review theory about what to expect from mutation and selection in a population of finite size and further generate predictions based on simulations using a realistic demographic scenario of human evolution. For a highly mutable type of mutations, such as transitions at CpG sites, we find that the predictions are close to the observed frequencies of recessive lethal disease mutations. For less mutable types, however, predictions substantially under-estimate the observed frequency. We discuss possible explanations for the discrepancy and point to a complication that, to our knowledge, is not widely appreciated: that there exists ascertainment bias in disease mutation discovery. Specifically, we suggest that alleles that have been identified to date are likely the ones that by chance have reached higher frequencies and are thus more likely to have been mapped. More generally, our study highlights the factors that influence the frequencies of Mendelian disease alleles.
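The infinite-population mutation-selection balance referred to in this abstract has a simple closed form for a fully recessive lethal allele. The sketch below is an illustration only, not the authors' code. It assumes random mating, no fitness effect in heterozygotes, and a hypothetical per-generation mutation rate `u`; under those assumptions the equilibrium allele frequency is the square root of `u`, and iterating the one-generation recursion converges to that value. The function names and the example rates are assumptions chosen for illustration, and finite population size and demography, which the study also considers, are not modelled.

```python
import math


def expected_equilibrium(u):
    """Closed-form equilibrium frequency of a fully recessive lethal allele
    in an infinite, randomly mating population: q = sqrt(u)."""
    return math.sqrt(u)


def iterate_recessive_lethal(u, q0=0.0, generations=200_000):
    """Iterate the deterministic recursion for a recessive lethal allele.

    Each generation, homozygotes for the lethal allele are removed
    (q -> q / (1 + q)), then new mutations arise on the remaining
    normal alleles at rate u.
    """
    q = q0
    for _ in range(generations):
        q_after_selection = q / (1.0 + q)
        q = q_after_selection + u * (1.0 - q_after_selection)
    return q


# Hypothetical mutation rates, chosen only to illustrate the scaling.
for u in (1e-8, 1e-7):
    print(f"u={u:.0e}  closed form={expected_equilibrium(u):.6e}  "
          f"iterated={iterate_recessive_lethal(u):.6e}")
```

Because the equilibrium scales with the square root of the mutation rate, a tenfold higher rate raises the expected frequency only about 3.2-fold in this simplified model.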

Estimating the Selective Effect of Heterozygous Protein Truncating Variants from Human Exome Data

10.1101/075523 ◽

2016 ◽

Cited By ~ 1

Author(s):

Christopher A. Cassa ◽

Donate Weghorn ◽

Daniel J. Balick ◽

Daniel M. Jordan ◽

David Nusinow ◽

...

Keyword(s):

Age Of Onset ◽

Sequence Data ◽

Human Tumor ◽

Purifying Selection ◽

Wide Distribution ◽

Disease Genes ◽

Mendelian Disease ◽

Large Set ◽

Critical Function ◽

Genome Wide

The dispensability of individual genes for viability has interested generations of geneticists. For some genes it is essential to maintain two functional chromosomal copies, while other genes may tolerate the loss of one or both copies. Exome sequence data from 60,706 individuals provide sufficient observations of rare protein truncating variants (PTVs) to make genome-wide estimates of selection against heterozygous loss of gene function. The cumulative frequency of rare deleterious PTVs is primarily determined by the balance between incoming mutations and purifying selection rather than genetic drift. This enables the estimation of the genome-wide distribution of selection coefficients for heterozygous PTVs and corresponding Bayesian estimates for individual genes. The strength of selection can help discriminate the severity, age of onset, and mode of inheritance in Mendelian exome sequencing cases. We find that genes under the strongest selection are enriched in embryonic lethal mouse knockouts, putatively cell-essential genes inferred from human tumor cells, Mendelian disease genes, and regulators of transcription. Using an essentiality screen, we find a large set of genes under strong selection that are likely to have critical function but that have not yet been studied extensively.
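In a deterministic approximation, the mutation-selection balance described in this abstract implies that the cumulative PTV allele frequency in a gene is roughly U / s_het, where U is the per-gene PTV mutation rate per generation and s_het is the selection coefficient against heterozygotes. The sketch below simply inverts that relationship to obtain a naive point estimate. It is not the Bayesian procedure the abstract describes; the function name, its arguments, and the example inputs are all hypothetical.

```python
def naive_s_het(ptv_allele_count, n_individuals, mutation_rate):
    """Naive point estimate of heterozygous selection, s_het ~ U / x.

    ptv_allele_count: observed PTV alleles summed over the gene (hypothetical input)
    n_individuals:    number of diploid individuals sequenced
    mutation_rate:    assumed per-gene PTV mutation rate U per generation
    """
    if n_individuals <= 0 or mutation_rate <= 0:
        raise ValueError("n_individuals and mutation_rate must be positive")
    chromosomes = 2 * n_individuals
    cumulative_frequency = ptv_allele_count / chromosomes
    if cumulative_frequency == 0:
        # No PTVs observed: the estimate is unbounded, so cap it at 1.
        return 1.0
    return min(1.0, mutation_rate / cumulative_frequency)


# Made-up example: 12 PTV alleles among 60,706 individuals, U = 1e-5.
estimate = naive_s_het(12, 60_706, 1e-5)
print(f"naive s_het estimate: {estimate:.3f}")
```

A point estimate like this ignores sampling noise in the allele count, which is one reason a full probabilistic treatment is preferable when counts are small.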

Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1512501113 ◽

2015 ◽

Vol 113 (1) ◽

pp. 152-157 ◽

Cited By ~ 162

Author(s):

Clare D. Marsden ◽

Diego Ortega-Del Vecchyo ◽

Dennis P. O’Brien ◽

Jeremy F. Taylor ◽

Oscar Ramirez ◽

...

Keyword(s):

Genetic Variation ◽

Amino Acid ◽

Artificial Selection ◽

Selective Breeding ◽

Large Population ◽

Disease Genes ◽

Mendelian Disease ◽

Selective Sweeps ◽

Gray Wolves ◽

Genome Wide

Population bottlenecks, inbreeding, and artificial selection can all, in principle, influence levels of deleterious genetic variation. However, the relative importance of each of these effects on genome-wide patterns of deleterious variation remains controversial. Domestic and wild canids offer a powerful system to address the role of these factors in influencing deleterious variation because their history is dominated by known bottlenecks and intense artificial selection. Here, we assess genome-wide patterns of deleterious variation in 90 whole-genome sequences from breed dogs, village dogs, and gray wolves. We find that the ratio of amino acid changing heterozygosity to silent heterozygosity is higher in dogs than in wolves and, on average, dogs have 2–3% higher genetic load than gray wolves. Multiple lines of evidence indicate this pattern is driven by less efficient natural selection due to bottlenecks associated with domestication and breed formation, rather than recent inbreeding. Further, we find regions of the genome implicated in selective sweeps are enriched for amino acid changing variants and Mendelian disease genes. To our knowledge, these results provide the first quantitative estimates of the increased burden of deleterious variants directly associated with domestication and have important implications for selective breeding programs and the conservation of rare and endangered species. Specifically, they highlight the costs associated with selective breeding and question the practice favoring the breeding of individuals that best fit breed standards. Our results also suggest that maintaining a large population size, rather than just avoiding inbreeding, is a critical factor for preventing the accumulation of deleterious variants.

Genomic Insights into the Formation of Human Populations in East Asia

Nature ◽

10.1038/s41586-021-03336-2 ◽

2021 ◽

Author(s):

Chuan-Chao Wang ◽

Hui-Yuan Yeh ◽

Alexander N. Popov ◽

Hu-Qin Zhang ◽

Hirofumi Matsumura ◽

...

Keyword(s):

East Asia ◽

Human Populations

Genome-wide inferring gene–phenotype relationship by walking on the heterogeneous network

Bioinformatics ◽

10.1093/bioinformatics/btq108 ◽

2010 ◽

Vol 26 (9) ◽

pp. 1219-1224 ◽

Cited By ~ 238

Author(s):

Yongjin Li ◽

Jagdish C. Patra

Keyword(s):

Heterogeneous Network ◽

Gene Network ◽

Genetic Diseases ◽

Supplementary Information ◽

Disease Genes ◽

Phenotypic Data ◽

Disease Associations ◽

Improved Performance ◽

Leave One Out ◽

Phenotype Network

Motivation: Clinical diseases are characterized by distinct phenotypes. To identify disease genes is to elucidate the gene–phenotype relationships. Mutations in functionally related genes may result in similar phenotypes. It is therefore reasonable to predict disease-causing genes by integrating phenotypic data and genomic data. Some genetic diseases are genetically or phenotypically similar and may share common pathogenetic mechanisms. Identifying the relationships between diseases will facilitate a better understanding of their pathogenetic mechanisms. Results: In this article, we constructed a heterogeneous network by connecting the gene network and phenotype network using the phenotype–gene relationship information from the OMIM database. We extended the random walk with restart algorithm to the heterogeneous network. The algorithm prioritizes the genes and phenotypes simultaneously. We used leave-one-out cross-validation to evaluate its ability to find gene–phenotype relationships. The results showed improved performance over previous work. We also used the algorithm to disclose hidden disease associations that cannot be found by the gene network or phenotype network alone. We identified 18 hidden disease associations, most of which were supported by literature evidence. Availability: The MATLAB code of the program is available at http://www3.ntu.edu.sg/home/aspatra/research/Yongjin_BI2010.zip Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
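The random walk with restart idea in this abstract can be sketched on a toy network. The code below is a simplified illustration, not the authors' MATLAB implementation: it treats genes and phenotypes as nodes of a single combined graph and uses one restart probability, whereas the published method handles the gene layer, the phenotype layer, and the links between them with its own weighting. The node names, edges, function names, and the `restart` value are all hypothetical.

```python
def build_adjacency(edges):
    """Build a symmetric, unweighted adjacency map from a list of node pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, {})[b] = 1.0
        adjacency.setdefault(b, {})[a] = 1.0
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until p stops changing.

    W is the column-normalised adjacency matrix, and p0 puts equal
    probability on the seed nodes. Every node is assumed to have at
    least one neighbour, so probability mass is conserved.
    """
    nodes = sorted(adjacency)
    out_weight = {j: sum(adjacency[j].values()) for j in nodes}
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for j in nodes:
            for i, w in adjacency[j].items():
                nxt[i] += (1.0 - restart) * p[j] * w / out_weight[j]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy network: three genes (g1..g3) and two phenotypes (d1, d2).
edges = [("g1", "g2"), ("g2", "g3"),   # gene-gene links
         ("d1", "d2"),                 # phenotype-phenotype link
         ("g1", "d1"), ("g3", "d2")]   # gene-phenotype links
scores = random_walk_with_restart(build_adjacency(edges), seeds={"d1"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.4f}")
```

Ranking the gene nodes by their steady-state score gives a prioritized candidate list for the seed phenotype; in this toy graph the gene directly linked to `d1` ranks above the more distant genes.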

Effects of Selection at Linked Sites on Patterns of Genetic Variability

Annual Review of Ecology Evolution and Systematics ◽

10.1146/annurev-ecolsys-010621-044528 ◽

2021 ◽

Vol 52 (1) ◽

pp. 177-197

Author(s):

Brian Charlesworth ◽

Jeffrey D. Jensen

Keyword(s):

Genetic Variability ◽

Population Genetic ◽

Selective Sweeps ◽

Recombination Rates ◽

Frequency Distributions ◽

A Genome ◽

Demographic Processes ◽

Dna Sequence Variants ◽

Genomic Regions ◽

Functional Components

Patterns of variation and evolution at a given site in a genome can be strongly influenced by the effects of selection at genetically linked sites. In particular, the recombination rates of genomic regions correlate with their amount of within-population genetic variability, the degree to which the frequency distributions of DNA sequence variants differ from their neutral expectations, and the levels of adaptation of their functional components. We review the major population genetic processes that are thought to lead to these patterns, focusing on their effects on patterns of variability: selective sweeps, background selection, associative overdominance, and Hill–Robertson interference among deleterious mutations. We emphasize the difficulties in distinguishing among the footprints of these processes and disentangling them from the effects of purely demographic factors such as population size changes. We also discuss how interactions between selective and demographic processes can significantly affect patterns of variability within genomes.

Use of isolated inbred human populations for identification of disease genes

Trends in Genetics ◽

10.1016/s0168-9525(98)01556-x ◽

1998 ◽

Vol 14 (10) ◽

pp. 391-396 ◽

Cited By ~ 80

Author(s):

Val C Sheffield ◽

Edwin M Stone ◽

Rivka Carmi

Keyword(s):

Disease Genes ◽

Human Populations

Transfer RNA genes experience exceptionally elevated mutation rates

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1801240115 ◽

2018 ◽

Vol 115 (36) ◽

pp. 8996-9001 ◽

Cited By ~ 12

Author(s):

Bryan P. Thornlow ◽

Josh Hough ◽

Jacquelyn M. Roger ◽

Henry Gong ◽

Todd M. Lowe ◽

...

Keyword(s):

Mutation Rate ◽

Trna Gene ◽

Gene Evolution ◽

Purifying Selection ◽

Mutation Rates ◽

Model Organisms ◽

Biological Synthesis ◽

Human Populations ◽

Trna Genes ◽

Simple Method

Transfer RNAs (tRNAs) are a central component for the biological synthesis of proteins, and they are among the most highly conserved and frequently transcribed genes in all living things. Despite their clear significance for fundamental cellular processes, the forces governing tRNA evolution are poorly understood. We present evidence that transcription-associated mutagenesis and strong purifying selection are key determinants of patterns of sequence variation within and surrounding tRNA genes in humans and diverse model organisms. Remarkably, the mutation rate at broadly expressed cytosolic tRNA loci is likely between 7 and 10 times greater than the nuclear genome average. Furthermore, evolutionary analyses provide strong evidence that tRNA genes, but not their flanking sequences, experience strong purifying selection acting against this elevated mutation rate. We also find a strong correlation between tRNA expression levels and the mutation rates in their immediate flanking regions, suggesting a simple method for estimating individual tRNA gene activity. Collectively, this study illuminates the extreme competing forces in tRNA gene evolution and indicates that mutations at tRNA loci contribute disproportionately to mutational load and have unexplored fitness consequences in human populations.

MOSES: A New Approach to Integrate Interactome Topology and Functional Features for Disease Gene Prediction

Genes ◽

10.3390/genes12111713 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1713

Author(s):

Manuela Petti ◽

Lorenzo Farina ◽

Federico Francone ◽

Stefano Lucidi ◽

Amalia Macali ◽

...

Keyword(s):

Network Topology ◽

Disease Gene ◽

Gene Prediction ◽

Knowledge Bases ◽

Biological Knowledge ◽

Disease Genes ◽

Human Interactome ◽

Disease Gene Prediction ◽

Disease Associations ◽

Functional Features

Disease gene prediction is, to date, one of the main computational challenges of precision medicine. It is still uncertain whether disease genes have unique functional properties that distinguish them from non-disease genes or, from a network perspective, whether they are located randomly in the interactome or show specific patterns in the network topology. In this study, we propose a new method for disease gene prediction based on the use of biological knowledge bases (gene-disease associations, gene functional annotations, etc.) and interactome network topology. The proposed algorithm, called MOSES, is based on the definition of two somewhat opposing sets of genes, both disease-specific from different perspectives: warm seeds (i.e., disease genes obtained from databases) and cold seeds (genes far from the disease genes on the interactome and not involved in their biological functions). The application of MOSES to a set of 40 diseases showed that the suggested putative disease genes are significantly enriched in their reference disease. Reassuringly, known and predicted disease genes together tend to form a connected network module on the human interactome, mitigating the scattered distribution of disease genes, which is probably due to both the paucity of disease-gene associations and the incompleteness of the interactome.
