Crossovers are associated with mutation and biased gene conversion at recombination hotspots

2015 ◽  
Vol 112 (7) ◽  
pp. 2109-2114 ◽  
Author(s):  
Barbara Arbeithuber ◽  
Andrea J. Betancourt ◽  
Thomas Ebner ◽  
Irene Tiemann-Boege

Meiosis is a potentially important source of germline mutations, as sites of meiotic recombination experience recurrent double-strand breaks (DSBs). However, evidence for a local mutagenic effect of recombination from population sequence data has been equivocal, likely because mutation is only one of several forces shaping sequence variation. By sequencing large numbers of single crossover molecules obtained from human sperm for two recombination hotspots, we find direct evidence that recombination is mutagenic: crossovers carry more de novo mutations than nonrecombinant DNA molecules analyzed for the same donors and hotspots. The observed mutations were primarily C:G to T:A transitions, which occurred at a higher frequency at CpG than at non-CpG sites. This enrichment of CpG mutations at hotspots may predominate in methylated regions that undergo frequent single-stranded DNA processing as part of DSB repair. In addition, our data set provides evidence that GC alleles are preferentially transmitted during crossing over, opposing mutation, and shows that GC-biased gene conversion (gBGC) predominates over mutation in the sequence evolution of hotspots. These findings are consistent with the idea that gBGC could be an adaptation to counteract the mutational load of recombination.
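
As a minimal illustration of the mutation classes in this abstract, the sketch below labels a single-base substitution by its strand-collapsed class (so C>T and G>A both count as C:G to T:A) and flags whether the mutated base sits in a CpG dinucleotide. The function names, 0-based coordinates and toy sequence are hypothetical; this is not the authors' pipeline, which must also separate true de novo mutations from sequencing and amplification errors.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snv(ref_seq: str, pos: int, alt: str) -> tuple[str, bool]:
    """Return (strand-collapsed class, is_cpg) for a substitution at 0-based pos."""
    seq = ref_seq.upper()
    ref, alt = seq[pos], alt.upper()
    if ref == alt:
        raise ValueError("alt base equals reference base")
    # Report every class from the pyrimidine (C or T) strand, so G>A becomes C>T.
    if ref in "AG":
        ref_p, alt_p = COMPLEMENT[ref], COMPLEMENT[alt]
    else:
        ref_p, alt_p = ref, alt
    label = f"{ref_p}:{COMPLEMENT[ref_p]}>{alt_p}:{COMPLEMENT[alt_p]}"
    # A C is in a CpG if the next base is G; a G is in a CpG if the previous base is C.
    is_cpg = (ref == "C" and pos + 1 < len(seq) and seq[pos + 1] == "G") or (
        ref == "G" and pos > 0 and seq[pos - 1] == "C"
    )
    return label, is_cpg


def tally(ref_seq: str, mutations: list[tuple[int, str]]) -> Counter:
    """Count mutations by (class, CpG context)."""
    return Counter(classify_snv(ref_seq, pos, alt) for pos, alt in mutations)


# Toy example: positions 1 and 2 form a CpG; position 5 is a C outside a CpG.
counts = tally("ACGTTCAG", [(1, "T"), (2, "A"), (5, "T"), (3, "C")])
# counts[("C:G>T:A", True)] == 2 and counts[("C:G>T:A", False)] == 1
```

Comparing such tallies between crossover and nonrecombinant molecules from the same donors is the kind of contrast the abstract reports.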

Science ◽  
2019 ◽  
Vol 363 (6425) ◽  
pp. eaau1043 ◽  
Author(s):  
Bjarni V. Halldorsson ◽  
Gunnar Palsson ◽  
Olafur A. Stefansson ◽  
Hakon Jonsson ◽  
Marteinn T. Hardarson ◽  
...  

Genetic diversity arises from recombination and de novo mutation (DNM). Using a combination of microarray genotype and whole-genome sequence data on parent-child pairs, we identified 4,531,535 crossover recombinations and 200,435 DNMs. The resulting genetic map has a resolution of 682 base pairs. Crossovers exhibit a mutagenic effect, with overrepresentation of DNMs within 1 kilobase of crossovers in males and females. In females, a higher mutation rate is observed up to 40 kilobases from crossovers, particularly for complex crossovers, which increase with maternal age. We identified 35 loci associated with the recombination rate or the location of crossovers, demonstrating extensive genetic control of meiotic recombination, and our results highlight genes linked to the formation of the synaptonemal complex as determinants of crossovers.
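
The 1-kb proximity comparison described above can be sketched as follows, assuming each crossover has been reduced to a single position (for example the midpoint of its resolved interval) and that DNMs and crossovers lie on the same chromosome. The expected share of nearby DNMs is taken as the fraction of sequence within the window, which assumes uniform mutability. All names and toy coordinates are hypothetical; the published analysis also has to handle sex-specific maps and variable crossover resolution.

```python
from bisect import bisect_left


def near_crossover(dnm_positions, crossover_positions, window=1_000):
    """Flag each DNM lying within `window` bp of the nearest crossover."""
    xo = sorted(crossover_positions)
    flags = []
    for p in dnm_positions:
        i = bisect_left(xo, p)
        # The nearest crossover is either xo[i] (first >= p) or xo[i - 1].
        dists = []
        if i < len(xo):
            dists.append(xo[i] - p)
        if i > 0:
            dists.append(p - xo[i - 1])
        flags.append(bool(dists) and min(dists) <= window)
    return flags


def covered_fraction(crossover_positions, chrom_length, window=1_000):
    """Fraction of the chromosome within `window` bp of a crossover (merged intervals)."""
    covered, end = 0, -1
    for x in sorted(crossover_positions):
        lo, hi = max(0, x - window), min(chrom_length, x + window)
        if hi <= end:
            continue  # interval already fully covered
        covered += hi - max(lo, end)
        end = hi
    return covered / chrom_length


xo = [5_000, 5_500, 20_000]
dnms = [4_200, 10_000, 21_000, 99_999]
flags = near_crossover(dnms, xo)           # [True, False, True, False]
observed = sum(flags) / len(flags)         # 0.5
expected = covered_fraction(xo, 100_000)   # 4,500 bp / 100,000 bp = 0.045
```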


2018 ◽  
Author(s):  
Toni I. Gossmann ◽  
Mathias Bockwoldt ◽  
Lilith Diringer ◽  
Friedrich Schwarz ◽  
Vic-Fabienne Schumann

It is well established that GC content varies across the genome in many species and that GC-biased gene conversion, a process associated with meiotic recombination, is likely to contribute to this heterogeneity. Bird genomes provide an extraordinary system for studying the impact of GC-biased gene conversion, owing to their specific genomic features. They are characterised by highly conserved karyotypes with substantial heterogeneity in chromosome size: up to a dozen large macrochromosomes and many smaller microchromosomes are common to all bird species. This heterogeneity in chromosome morphology is also reflected in other genomic features, such as smaller chromosomes being gene-denser, more compact and more GC-rich than their macrochromosomal counterparts, illustrating that the intensity of GC-biased gene conversion varies across the genome. Here we study whether it is possible to infer heterogeneity in GC-biased gene conversion rates across the genome using a recently published method that accounts for GC-biased gene conversion when estimating branch lengths in a phylogenetic context. To infer the strength of GC-biased gene conversion, we contrast branch length estimates across the genome obtained with and without accounting for non-stationary GC composition. Using simulations, we show that this approach works well when GC fixation bias is strong, and we note that the number of substitutions along a branch is consistently overestimated when GC-biased gene conversion is not accounted for. We use this predictable feature to infer the strength of GC dynamics across the great tit genome by applying our new test statistic to 4-fold degenerate sites from three bird species (great tit, zebra finch and chicken), which are among the best-annotated bird genomes to date. With a simple one-dimensional binning, we fail to capture the signal of fixation bias observed in our simulations.
However, using a multidimensional binning strategy, we find evidence for heterogeneity in the strength of fixation bias, including AT fixation bias. This highlights the difficulties of combining sequence data from different regions of the genome.


Author(s):  
Daniel L. Hartl

Chapter 4 focuses on forward and reverse mutation, gene duplication and functional divergence, gene conversion, and equilibrium heterozygosity. It includes an introduction to the coalescent as well as to the Wright–Fisher and Moran models of random genetic drift, measures of nucleotide polymorphism and diversity, and how these may be estimated from sequence data. Biased gene conversion is discussed with regard to its effects on the homogeneity of nucleotide sequences across the genome. Several distinct types of effective population number are compared and contrasted, including the inbreeding, variance, and coalescent effective numbers. Various models of migration are also examined, including one-way migration, the island model, and stepping-stone models.
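
For the diversity measures mentioned in this chapter summary, a minimal sketch of two standard estimators follows: nucleotide diversity π (mean pairwise differences per site) and Watterson's θ (the number of segregating sites S scaled by a_n = Σ 1/i for i = 1 … n−1). It assumes equal-length, gap-free aligned sequences and does not reproduce the chapter's own notation or examples; the function names are illustrative.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nucleotide diversity (pi): mean pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / length


def watterson_theta(seqs: list[str]) -> float:
    """Watterson's theta per site: S / (a_n * L), with a_n = sum of 1/i for i < n."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n / length


sample = ["AAAA", "AAAT", "AATT", "AATT"]
pi = nucleotide_diversity(sample)   # 7 differences / 6 pairs / 4 sites = 7/24
theta_w = watterson_theta(sample)   # S = 2, a_4 = 11/6, so 2 / (11/6) / 4 = 3/11
```

Under the standard neutral model both statistics estimate the same quantity, so their difference is the basis of tests such as Tajima's D.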


Genetics ◽  
1997 ◽  
Vol 146 (1) ◽  
pp. 89-99 ◽  
Author(s):  
Esther Betrán ◽  
Julio Rozas ◽  
Arcadio Navarro ◽  
Antonio Barbadilla

DNA sequence variation studies report the transfer of small segments of DNA among different sequences caused by gene conversion events. Here, we provide an algorithm to detect gene conversion tracts and a statistical model to estimate the number and the length distribution of conversion tracts from population DNA sequence data. Two length distributions are defined in the model: (1) that of the observed tract lengths and (2) that of the true tract lengths. If the latter follows a geometric distribution, the relationship between the two distributions depends on two basic parameters: ψ, which measures the probability of detecting a converted site, and φ, the parameter of the geometric distribution, from which the average true tract length, 1/(1 − φ), can be estimated. Expressions are provided for estimating φ by the method of moments and by maximum likelihood. The robustness of the model is examined by computer simulation. The present methods have been applied to the published rp49 sequences of Drosophila subobscura. The maximum likelihood estimate of φ for this data set is 0.9918, which corresponds to an average conversion tract length of 122 bp. Only a small percentage of extant conversion events is detected.
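
The relation between φ and the mean tract length can be checked numerically. The sketch below assumes that true tract lengths are fully observed (ψ = 1), which the published model explicitly does not assume. In that special case the geometric log-likelihood n·log(1 − φ) + (Σl − n)·log φ is maximised at φ̂ = 1 − n/Σl, so the moment and maximum-likelihood estimates coincide. Function names are illustrative.

```python
import random


def mean_tract_length(phi: float) -> float:
    """Mean of a geometric length on {1, 2, ...} with P(L = l) = (1 - phi) * phi**(l - 1)."""
    return 1.0 / (1.0 - phi)


def estimate_phi(tract_lengths: list[int]) -> float:
    """Moment estimate (equal to the ML estimate when tracts are fully observed)."""
    return 1.0 - len(tract_lengths) / sum(tract_lengths)


def simulate_tracts(phi: float, n: int, seed: int = 1) -> list[int]:
    """Draw n geometric tract lengths: extend the tract with probability phi per site."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n):
        length = 1
        while rng.random() < phi:
            length += 1
        lengths.append(length)
    return lengths


print(round(mean_tract_length(0.9918)))                 # 122, matching the reported 122 bp
phi_hat = estimate_phi(simulate_tracts(0.9, 20_000))    # close to 0.9
```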


2021 ◽  
Vol 118 (52) ◽  
pp. e2115140118
Author(s):  
Matthew Halvorsen ◽  
Laura Gould ◽  
Xiaohan Wang ◽  
Gariel Grant ◽  
Raquel Moya ◽  
...  

Sudden unexplained death in childhood (SUDC) is an understudied problem. Whole-exome sequence data from 124 “trios” (decedent child, living parents) were used to test for an excess of de novo mutations (DNMs) in genes involved in cardiac arrhythmias, epilepsy, and other disorders. Among decedents, nonsynonymous DNMs were enriched in genes associated with cardiac and seizure disorders relative to controls (odds ratio = 9.76, P = 2.15 × 10⁻⁴). We also found evidence for overtransmission of loss-of-function (LoF) or previously reported pathogenic variants in these same genes from heterozygous carrier parents (11 of 14 transmitted, P = 0.03). We identified a total of 11 SUDC proband genotypes (7 de novo, 1 transmitted parental mosaic, 2 transmitted parental heterozygous, and 1 compound heterozygous) as pathogenic and likely contributory to death, a genetic finding in 8.9% of our cohort. Two genes, RYR2 and CACNA1C, had recurrent missense DNMs. Both RYR2 mutations are pathogenic (P = 1.7 × 10⁻⁷) and have previously been studied in mouse models. Both CACNA1C mutations lie within a 104-nt exon (P = 1.0 × 10⁻⁷) and result in slowed L-type calcium channel inactivation and lower current density. In total, six pathogenic DNMs can alter calcium-related regulation of cardiomyocyte and neuronal excitability at a submembrane junction, suggesting a pathway conferring susceptibility to sudden death. There was a trend toward excess LoF mutations in LoF-intolerant genes, where ≥1 nonhealthy sample in denovo-db has a similar variant (odds ratio = 6.73, P = 0.02); additional uncharacterized genetic causes of sudden death in children might be discovered with larger cohorts.
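
The abstract does not state which test produced P = 0.03 for 11 of 14 transmitted variants. Assuming a one-sided exact binomial test against Mendelian 50% transmission, the sketch below gives 470/16384 ≈ 0.029, consistent with the reported value; it also reproduces the 8.9% diagnostic yield from 11 of 124 trios. This is an arithmetic illustration under that assumption, not the authors' analysis.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Over-transmission: 11 of 14 heterozygous-parent variants transmitted,
# tested against the Mendelian expectation p = 0.5 (assumed test).
p_transmission = binomial_upper_tail(11, 14)   # 470 / 16384, about 0.029

# Diagnostic yield: 11 pathogenic genotypes among 124 trios.
yield_percent = 100 * 11 / 124                 # about 8.87, reported as 8.9%
```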


2015 ◽  
Author(s):  
Jinmyung Choi ◽  
Parisa Shooshtari ◽  
Kaitlin E Samocha ◽  
Mark J Daly ◽  
Chris Cotsapas

Using robust, integrated analysis of multiple genomic datasets, we show that genes depleted for non-synonymous de novo mutations form a subnetwork of 72 members under strong selective constraint. We further show this subnetwork is preferentially expressed in the early development of the human hippocampus and is enriched for genes mutated in neurological, but not other, Mendelian disorders. We thus conclude that carefully orchestrated developmental processes are under strong constraint in early brain development, and perturbations caused by mutation have adverse outcomes subject to strong purifying selection. Our findings demonstrate that selective forces can act on groups of genes involved in the same process, supporting the notion that adaptation can act coordinately on multiple genes. Our approach provides a statistically robust, interpretable way to identify the tissues and developmental times where groups of disease genes are active. Our findings highlight the importance of considering the interactions between genes when analyzing genome-wide sequence data.


2010 ◽  
Vol 7 (50) ◽  
pp. 1257-1274 ◽  
Author(s):  
Katia Koelle ◽  
Priya Khatri ◽  
Meredith Kamradt ◽  
Thomas B. Kepler

Understanding the epidemiological and evolutionary dynamics of rapidly evolving pathogens is one of the most challenging problems facing disease ecologists today. To date, many mathematical and individual-based models have provided key insights into the factors that may regulate these dynamics. However, in many of these models, abstractions have been made to the simulated sequences that limit an effective interface with empirical data. This is especially the case for rapidly evolving viruses in which de novo mutations result in antigenically novel variants. With this focus, we present a simple two-tiered ‘phylodynamic’ model whose purpose is to simulate, along with case data, sequence data that will allow for a more quantitative interface with observed sequence data. The model differs from previous approaches in that it separates the simulation of the epidemiological dynamics (tier 1) from the molecular evolution of the virus's dominant antigenic protein (tier 2). This separation of phenotypic dynamics from genetic dynamics results in a modular model that is computationally simpler and allows sequences to be simulated with specifications such as sequence length, nucleotide composition and molecular constraints. To illustrate its use, we apply the model to influenza A (H3N2) dynamics in humans, influenza B dynamics in humans and influenza A (H3N8) dynamics in equine hosts. In all three of these illustrative examples, we show that the model can simulate sequences that are quantitatively similar in pattern to those empirically observed. Future work should focus on statistical estimation of model parameters for these examples as well as the possibility of applying this model, or variants thereof, to other host–virus systems.
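
The two-tier separation can be illustrated with a toy sketch: tier 1 is a deterministic discrete-time SIR model that produces incidence, and tier 2 evolves a single "trunk" sequence and samples copies of it in proportion to incidence. Everything here (parameter names, the SIR form, uniform per-site substitution, single-lineage sampling) is a simplifying assumption for illustration, not the published model, which tracks antigenic phenotypes and molecular constraints.

```python
import random


def tier1_sir(beta: float, gamma: float, n: int, i0: float, steps: int) -> list[float]:
    """Tier 1: deterministic discrete-time SIR model; returns new infections per step."""
    s, i = n - i0, i0
    incidence = []
    for _ in range(steps):
        new_inf = beta * s * i / n
        new_rec = gamma * i
        s -= new_inf
        i += new_inf - new_rec
        incidence.append(new_inf)
    return incidence


def tier2_sequences(incidence: list[float], length: int, mu: float,
                    sample_frac: float, seed: int = 0) -> list[tuple[int, str]]:
    """Tier 2: evolve one trunk sequence; sample copies in proportion to incidence."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGT") for _ in range(length)]
    samples = []
    for t, new_cases in enumerate(incidence):
        # Each site mutates to a different base with probability mu per step.
        for k in range(length):
            if rng.random() < mu:
                seq[k] = rng.choice([b for b in "ACGT" if b != seq[k]])
        # Sample round(sample_frac * new_cases) sequences at time t.
        for _ in range(round(sample_frac * new_cases)):
            samples.append((t, "".join(seq)))
    return samples


cases = tier1_sir(beta=0.5, gamma=0.25, n=10_000, i0=10, steps=60)
seqs = tier2_sequences(cases, length=100, mu=0.01, sample_frac=0.01, seed=1)
```

Because tier 2 only consumes the incidence series, it can be rerun with different sequence settings (length, mutation rate, seed) without recomputing the epidemic, which is the kind of modularity the abstract describes.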


Author(s):  
Gus Waneka ◽  
Yumary M Vasquez ◽  
Gordon M Bennett ◽  
Daniel B Sloan

Compared to free-living bacteria, endosymbionts of sap-feeding insects have tiny and rapidly evolving genomes. Increased genetic drift, high mutation rates, and relaxed selection associated with host control of key cellular functions all likely contribute to genome decay. Phylogenetic comparisons have revealed massive variation in endosymbiont evolutionary rate, but such methods make it difficult to partition the effects of mutation vs. selection. For example, the ancestor of Auchenorrhynchan insects contained two obligate endosymbionts, Sulcia and a betaproteobacterium (BetaSymb; called Nasuia in leafhoppers) that exhibit divergent rates of sequence evolution and different propensities for loss and replacement in the ensuing ~300 Ma. Here, we use the auchenorrhynchan leafhopper Macrosteles sp. nr. severini, which retains both of the ancestral endosymbionts, to test the hypothesis that differences in evolutionary rate are driven by differential mutagenesis. We used a high-fidelity technique known as duplex sequencing to measure and compare low-frequency variants in each endosymbiont. Our direct detection of de novo mutations reveals that the rapidly evolving endosymbiont (Nasuia) has a much higher frequency of single-nucleotide variants than the more stable endosymbiont (Sulcia) and a mutation spectrum that is potentially even more AT-biased than implied by the 83.1% AT content of its genome. We show that indels are common in both endosymbionts but differ substantially in length and distribution around repetitive regions. Our results suggest that differences in long-term rates of sequence evolution in Sulcia vs. BetaSymb, and perhaps the contrasting degrees of stability of their relationships with the host, are driven by differences in mutagenesis.
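
The core idea of duplex sequencing, as generally described, is that a variant counts only if it is supported by the consensus of reads from both complementary strands of the same original molecule, so errors confined to one strand are discarded. A minimal per-position sketch follows. The 0.7 agreement threshold, the function names, and the assumption that bottom-strand reads are already converted into top-strand orientation are illustrative choices, not details taken from this study.

```python
from collections import Counter
from typing import Optional


def strand_consensus(bases: str, min_frac: float = 0.7) -> Optional[str]:
    """Single-strand consensus: the majority base if it reaches min_frac of reads."""
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_frac else None


def duplex_call(top: str, bottom: str, min_frac: float = 0.7) -> Optional[str]:
    """Duplex consensus for one molecule: keep a base only if both strands agree."""
    a = strand_consensus(top, min_frac)
    b = strand_consensus(bottom, min_frac)
    return a if a is not None and a == b else None


def variant_frequency(families: list[tuple[str, str]], ref_base: str) -> float:
    """Fraction of duplex-called molecules carrying a non-reference base."""
    calls = [duplex_call(top, bottom) for top, bottom in families]
    calls = [c for c in calls if c is not None]
    return sum(c != ref_base for c in calls) / len(calls) if calls else 0.0


# Each tuple holds the bases seen at one position in the top- and bottom-strand
# reads of one tagged molecule (bottom reads already in top-strand orientation).
families = [("AAAA", "AAAA"), ("GGGG", "GGGA"), ("GGGG", "AAAA"), ("AAAG", "AAAA")]
freq = variant_frequency(families, ref_base="A")
# The third molecule disagrees between strands and is discarded,
# leaving 1 variant (G) among 3 duplex calls.
```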


2014 ◽  
Author(s):  
Sylvain Glemin ◽  
Peter F Arndt ◽  
Philipp W Messer ◽  
Dmitri Petrov ◽  
Nicolas Galtier ◽  
...  

Many lines of evidence indicate that GC-biased gene conversion (gBGC) has a major impact on the evolution of mammalian genomes. However, until now, this process had not been properly quantified. In principle, the strength of gBGC can be measured from the analysis of derived allele frequency (DAF) spectra. However, this approach is sensitive to a number of confounding factors. In particular, we show by simulations that the inference is pervasively affected by polymorphism polarization errors, especially at hypermutable sites, and by spatial heterogeneity in gBGC strength. Here we propose a new method to quantify gBGC from DAF spectra that incorporates polarization errors and takes spatial heterogeneity into account. This method is very general in that it does not require any prior knowledge about the source of polarization errors, and it also provides information about mutation patterns. We apply this approach to human polymorphism data from the 1000 Genomes Project. We show that the strength of gBGC does not differ between hypermutable CpG sites and non-CpG sites, suggesting that in humans gBGC is not caused by the base-excision repair machinery. We further find that the impact of gBGC is concentrated primarily within recombination hotspots: genome-wide, the strength of gBGC is in the nearly neutral range, but 2% of the human genome is subject to strong gBGC, with population-scaled gBGC coefficients above 5. Given that the location of recombination hotspots evolves very rapidly, our analysis predicts that in the long term, a large fraction of the genome is affected by short episodes of strong gBGC.
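
To see why DAF spectra carry information about gBGC, the sketch below computes the expected unfolded spectrum in a sample of n chromosomes for weak-to-strong (W→S) mutations, using the standard diffusion density for a constant fixation bias with population-scaled coefficient B: f(x) ∝ (1 − e^(−B(1−x))) / (x(1 − x)(1 − e^(−B))). Strong-to-weak mutations correspond to −B. This is a textbook-style illustration only: it ignores the polarization errors and spatial heterogeneity that the method above is designed to handle, and the scaling of B (for example 4N_e·b) varies between papers. Function names are illustrative.

```python
import math


def daf_density(x: float, B: float) -> float:
    """Unnormalised density of derived-allele frequency x under scaled fixation bias B.

    B > 0 models W->S mutations, B < 0 models S->W, and B = 0 gives the neutral 1/x.
    """
    if abs(B) < 1e-12:
        return 1.0 / x
    # expm1 keeps the numerator accurate when B * (1 - x) is small.
    return -math.expm1(-B * (1.0 - x)) / (x * (1.0 - x) * -math.expm1(-B))


def expected_sfs(n: int, B: float, grid: int = 10_000) -> list[float]:
    """Expected unfolded SFS for i = 1..n-1 derived copies in a sample of n (per unit theta).

    Integrates density(x) * Binomial(i; n, x) over (0, 1) with the midpoint rule.
    """
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(grid):
            x = (k + 0.5) / grid
            total += daf_density(x, B) * math.comb(n, i) * x**i * (1.0 - x) ** (n - i)
        sfs.append(total / grid)
    return sfs


neutral = expected_sfs(10, 0.0)   # approximately [1, 1/2, 1/3, ..., 1/9]
biased = expected_sfs(10, 5.0)    # W->S spectrum with B = 5
# The share of the highest-frequency class rises under gBGC.
print(neutral[-1] / sum(neutral), biased[-1] / sum(biased))
```

With B = 0 the spectrum reduces to the neutral 1/i shape; with B > 0 the W→S spectrum shifts toward high derived-allele frequencies, which is the signal a DAF-based estimator exploits.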

