Big data reveals fewer recombination hotspots than expected in human genome

Mapping Intimacies ◽

10.1101/809020 ◽

2019 ◽

Author(s):

Ziqian Hao ◽

Haipeng Li

Keyword(s):

Artificial Intelligence ◽

Sample Size ◽

Recombination Rate ◽

Genetic Maps ◽

Recombination Activity ◽

Recombination Rates ◽

Accurate Identification ◽

Data Set ◽

Recombination Hotspots ◽

Precise Estimation

Abstract Recombination is a major force that shapes genetic diversity. The inference accuracy of recombination rate is important and can be improved by increasing sample size. However, it has never been investigated whether sample size affects the distribution of inferred recombination activity along the genome, and the inference of recombination hotspots. In this study, we applied an artificial intelligence approach to estimate recombination rates in the UK10K human genomic data set with 7,562 genomes and in the OMNI CEU data set with 170 genomes. We found that the fluctuation of local recombination rate along the UK10K genomes is much smaller than that along the CEU genomes, and recombination activity in the UK10K genomes is also much less concentrated. The same phenomena were also observed when comparing UK10K with its two subsets with 200 and 400 genomes. In all cases, analyses of a larger number of genomes result in a more precise estimation of recombination rate and a less concentrated recombination activity with fewer recombination hotspots identified. Generally, UK10K recombination hotspots are about 2.93-14.25 times fewer than those identified in previous studies. By comparing the recombination hotspots of UK10K and its subsets, we found that the false inference of population-specific recombination hotspots could be as high as 75.86% if the number of sampled genomes is not super large. The results suggest that the uncertainty of estimated recombination rate is substantial when sample size is not super large, and more attention should be paid to accurate identification of recombination hotspots, especially population-specific recombination hotspots.

Author summary We applied FastEPRR, an artificial intelligence method, to estimate recombination rates in the UK10K data set with 7,562 genomes and established the most accurate human genetic map. By comparing with other human genetic maps, we found that analyses of a larger number of genomes result in a more precise estimation of recombination rate and a less concentrated recombination activity with fewer recombination hotspots identified. The false inference of population-specific recombination hotspots could be substantial if the number of sampled genomes is not super large.

Modeling Linkage Disequilibrium and Identifying Recombination Hotspots Using Single-Nucleotide Polymorphism Data

Genetics ◽

10.1093/genetics/165.4.2213 ◽

2003 ◽

Vol 165 (4) ◽

pp. 2213-2233 ◽

Cited By ~ 41

Author(s):

Na Li ◽

Matthew Stephens

Keyword(s):

Linkage Disequilibrium ◽

Recombination Rate ◽

Population Sample ◽

Simulated Data ◽

Region Of Interest ◽

Population Data ◽

Recombination Rates ◽

Single Nucleotide ◽

Recombination Hotspots ◽

Genomic Regions

Abstract We introduce a new statistical model for patterns of linkage disequilibrium (LD) among multiple SNPs in a population sample. The model overcomes limitations of existing approaches to understanding, summarizing, and interpreting LD by (i) relating patterns of LD directly to the underlying recombination process; (ii) considering all loci simultaneously, rather than pairwise; (iii) avoiding the assumption that LD necessarily has a “block-like” structure; and (iv) being computationally tractable for huge genomic regions (up to complete chromosomes). We examine in detail one natural application of the model: estimation of underlying recombination rates from population data. Using simulation, we show that in the case where recombination is assumed constant across the region of interest, recombination rate estimates based on our model are competitive with the very best of current available methods. More importantly, we demonstrate, on real and simulated data, the potential of the model to help identify and quantify fine-scale variation in recombination rate from population data. We also outline how the model could be useful in other contexts, such as in the development of more efficient haplotype-based methods for LD mapping.

Effect of Heterogeneity in Recombination Rate on Variation in Realised Relationship

10.1101/341776 ◽

2018 ◽

Author(s):

Ian M.S. White ◽

William G. Hill

Keyword(s):

Recombination Rate ◽

Genetic Information ◽

Fine Scale ◽

Recombination Rates ◽

Recombination Hotspots ◽

Identical By Descent ◽

Small Influence ◽

Scale Levels ◽

Pedigree Relationship ◽

Actual Relationship

Abstract Individuals of specified pedigree relationship vary in the proportion of the genome they share identical by descent, i.e. in their realised or actual relationship. Basing predictions of the variance in realised relationship solely on the proportion of the map length shared implicitly assumes that both recombination rate and genetic information are uniformly distributed along the genome, ignoring the possible existence of recombination hotspots, and failing to distinguish between coding and non-coding sequences. In this paper we quantify the effects of heterogeneity in recombination rate at broad and fine scale levels on the variation in realised relationship. A chromosome with variable recombination rate usually shows more variance in realised relationship than does one having the same map length with constant recombination rate, especially if recombination rates are higher towards chromosome ends. Reductions in variance can also be found, and the overall pattern of change is quite complex. In general, local (fine-scale) variation in recombination rate, e.g. hotspots, has a small influence on the variance in realised relationship. Differences in rates across longer regions and between chromosome ends can increase or decrease the variance in realised relationship, depending on the genomic architecture.

A glance at recombination hotspots in the domestic cat

10.1101/028043 ◽

2015 ◽

Author(s):

Hasan Alhaddad ◽

Chi Zhang ◽

Bruce Rannala ◽

Leslie A Lyons

Keyword(s):

Recombination Rate ◽

Gc Content ◽

Domestic Cat ◽

Recombination Rates ◽

Recombination Hotspots ◽

Regional Population ◽

Variable Population ◽

Line Elements ◽

Population Recombination ◽

Region Size

Recombination has essential roles in increasing genetic variability within a population and in ensuring successful meiotic events. The objective of this study is to (i) infer the population-scaled recombination rate (ρ), and (ii) identify and characterize localities of increased recombination rate for the domestic cat, Felis silvestris catus. SNPs (n = 701) were genotyped in twenty-two cats of Eastern random bred origin. The SNPs covered ten different chromosomal regions (A1, A2, B3, C2, D1, D2, D4, E2, F2, X) with an average region size of 850 Kb and an average SNP density of 70 SNPs/region. The Bayesian method in the program inferRho was used to infer regional population recombination rates and hotspot localities. The regions exhibited variable population recombination rates, and four decisive recombination hotspots were identified in the cat chromosome A2, D1, and E2 regions. No correlation was detected between the GC content and the locality of recombination hotspots. The hotspots enclosed L2 LINE elements and MIR and tRNA-Lys SINE elements, in agreement with hotspots found in other mammals.

Genetic differentiation and intrinsic genomic features explain variation in recombination hotspots among cocoa tree populations

10.1101/482299 ◽

2018 ◽

Cited By ~ 1

Author(s):

Enrique J. Schwarzkopf ◽

Juan C. Motamayor ◽

Omar E. Cornejo

Keyword(s):

Genetic Differentiation ◽

Recombination Rate ◽

Plant Domestication ◽

Population Divergence ◽

Sequence Motifs ◽

Recombination Rates ◽

Genomic Features ◽

Recombination Hotspots ◽

Genomic Locations ◽

Dna Sequence Motifs

Abstract Our study investigates the possible drivers of recombination hotspots in Theobroma cacao using ten genetically differentiated populations. By comparing recombination patterns between multiple populations, we obtain a novel view of recombination at the population-divergence timescale. For each population, a fine-scale recombination map was generated using the coalescent with a standard method based on linkage disequilibrium (LD). These maps revealed higher recombination rates in a domesticated population and a population that has undergone a recent bottleneck. We inferred hotspots of recombination for each population and found that the genomic locations of hotspots correlate with genetic differentiation between populations (FST). We used randomization approaches to generate appropriate null models to understand the association between hotspots of recombination and both DNA sequence motifs and genomic features. We found that hotspot regions contained fewer known retroelement sequences than expected and were overrepresented near transcription start and termination sites. Our findings indicate that recombination hotspots are evolving in a way that is consistent with genetic differentiation but are also preferentially driven to near coding regions. We illustrate that, consistent with predictions in plant domestication, the recombination rate of the domesticated population is orders of magnitude higher than that of other populations. More importantly, we find two fixed mutations in the domesticated population’s FIGL1 protein. FIGL1 has been shown to increase recombination rates in Arabidopsis by several orders of magnitude, suggesting a possible mechanism for the observed increased recombination rate in the domesticated population.

Recombining without hotspots: A comprehensive evolutionary portrait of recombination in two closely related species of Drosophila

10.1101/016972 ◽

2015 ◽

Author(s):

Caiti Smukowski Heil ◽

Chris Ellison ◽

Matthew Dubin ◽

Mohamed Noor

Keyword(s):

Recombination Rate ◽

Multiple Scales ◽

Sequence Data ◽

Scale Effects ◽

Recombination Rates ◽

Recombination Hotspots ◽

Genome Wide ◽

Genomic Landscape ◽

Species Specific

Meiotic recombination rate varies across the genome within and between individuals, populations, and species in virtually all taxa studied. In almost every species, this variation takes the form of discrete recombination hotspots, determined in Metazoans by a protein called PRDM9. Hotspots and their determinants have a profound effect on the genomic landscape, and share certain features that extend across the tree of life. Drosophila, in contrast, are anomalous in their absence of hotspots, PRDM9, and other species-specific differences in the determination of recombination. To better understand the evolution of meiosis and general patterns of recombination across diverse taxa, we present what may be the most comprehensive portrait of recombination to date, combining contemporary recombination estimates from each of two sister species along with historic estimates of recombination using linkage-disequilibrium-based approaches derived from sequence data from both species. Using Drosophila pseudoobscura and Drosophila miranda as a model system, we compare recombination rate between species at multiple scales, and we replicate the pattern seen in human-chimpanzee that recombination rate is conserved at broad scales and more divergent at finer scales. We also find evidence of a species-wide recombination modifier, resulting in both a present and historic genome wide elevation of recombination rates in D. miranda, and identify broad scale effects on recombination from the presence of an inter-species inversion. Finally, we reveal an unprecedented view of the distribution of recombination in D. pseudoobscura, illustrating patterns of linked selection and where recombination is taking place. Overall, by combining these estimation approaches, we highlight key similarities and differences in recombination between Drosophila and other organisms.

Snake Recombination Landscapes Are Concentrated in Functional Regions despite PRDM9

Molecular Biology and Evolution ◽

10.1093/molbev/msaa003 ◽

2020 ◽

Vol 37 (5) ◽

pp. 1272-1294 ◽

Cited By ~ 2

Author(s):

Drew R Schield ◽

Giulia I M Pasquesi ◽

Blair W Perry ◽

Richard H Adams ◽

Zachary L Nikolakis ◽

...

Keyword(s):

Recombination Rate ◽

Gc Content ◽

Substantial Variation ◽

Recombination Rates ◽

Recombination Hotspots ◽

Functional Regions ◽

Intergenic Regions ◽

Positive Correlations ◽

Adaptive Role

Abstract Meiotic recombination in vertebrates is concentrated in hotspots throughout the genome. The location and stability of hotspots have been linked to the presence or absence of PRDM9, leading to two primary models for hotspot evolution derived from mammals and birds. Species with PRDM9-directed recombination have rapid turnover of hotspots concentrated in intergenic regions (i.e., mammals), whereas hotspots in species lacking PRDM9 are concentrated in functional regions and have greater stability over time (i.e., birds). Snakes possess PRDM9, yet virtually nothing is known about snake recombination. Here, we examine the recombination landscape and test hypotheses about the roles of PRDM9 in rattlesnakes. We find substantial variation in recombination rate within and among snake chromosomes, and positive correlations between recombination rate and gene density, GC content, and genetic diversity. Like mammals, snakes appear to have a functional and active PRDM9, but rather than being directed away from genes, snake hotspots are concentrated in promoters and functional regions—a pattern previously associated only with species that lack a functional PRDM9. Snakes therefore provide a unique example of recombination landscapes in which PRDM9 is functional, yet recombination hotspots are associated with functional genic regions—a combination of features that defy existing paradigms for recombination landscapes in vertebrates. Our findings also provide evidence that high recombination rates are a shared feature of vertebrate microchromosomes. Our results challenge previous assumptions about the adaptive role of PRDM9 and highlight the diversity of recombination landscape features among vertebrate lineages.

Comparison of fine-scale recombination maps in fungal plant pathogens reveals dynamic recombination landscapes and intragenic hotspots

10.1101/158907 ◽

2017 ◽

Cited By ~ 4

Author(s):

Eva H. Stukenbrock ◽

Julien Y. Dutheil

Keyword(s):

Recombination Rate ◽

Plant Pathogens ◽

Population Genomics ◽

Gene Evolution ◽

Strong Impact ◽

Recombination Rates ◽

Zymoseptoria Tritici ◽

Sequence Composition ◽

Recombination Hotspots ◽

Fungal Plant Pathogens

Abstract Meiotic recombination is an important driver of evolution. Variability in the intensity of recombination across chromosomes can affect sequence composition, nucleotide variation and rates of adaptation. In many organisms recombination events are concentrated within short segments termed recombination hotspots. The variation in recombination rate and positions of recombination hotspots can be studied using population genomics data and statistical methods. In this study, we conducted population genomics analyses to address the evolution of recombination in two closely related fungal plant pathogens: the prominent wheat pathogen Zymoseptoria tritici and a sister species infecting wild grasses, Zymoseptoria ardabiliae. We specifically addressed whether recombination landscapes, including hotspot positions, are conserved in the two recently diverged species and if recombination contributes to rapid evolution of pathogenicity traits. We conducted a detailed simulation analysis to assess the performance of methods of recombination rate estimation based on patterns of linkage disequilibrium, in particular in the context of high nucleotide diversity. Our analyses reveal overall high recombination rates, a lack of suppressed recombination in centromeres and significantly lower recombination rates on chromosomes that are known to be accessory. The comparison of the recombination landscapes of the two species reveals a strong correlation of recombination rate at the megabase scale, but little correlation at smaller scales. The recombination landscapes in both pathogen species are dominated by frequent recombination hotspots across the genome including coding regions, suggesting a strong impact of recombination on gene evolution. A significant but small fraction of these hotspots co-localize between the two species, suggesting that hotspot dynamics contribute to the overall pattern of fast evolving recombination in these species.

PSVI-8 Meta-regression Analysis to Determine the Relationship Between Growing Pig Body Weight and Variation

Journal of Animal Science ◽

10.1093/jas/skab054.357 ◽

2021 ◽

Vol 99 (Supplement_1) ◽

pp. 218-219

Author(s):

Andres Fernando T Russi ◽

Mike D Tokach ◽

Jason C Woodworth ◽

Joel M DeRouchey ◽

Robert D Goodband ◽

...

Keyword(s):

Body Weight ◽

Regression Analysis ◽

Sample Size ◽

Polynomial Regression ◽

Data Sets ◽

Regression Equations ◽

Prediction Equations ◽

Data Set ◽

Rate Of Increase ◽

The Relationship

Abstract The swine industry has been constantly evolving to select animals with improved performance traits and to minimize variation in body weight (BW) in order to meet packer specifications. Therefore, understanding variation presents an opportunity for producers to find strategies that could help reduce, manage, or deal with variation of pigs in a barn. A systematic review and meta-analysis was conducted by collecting data from multiple studies and available data sets in order to develop prediction equations for coefficient of variation (CV) and standard deviation (SD) as a function of BW. Information regarding BW variation from 16 papers was recorded to provide approximately 204 data points. Together, these data included 117,268 individually weighed pigs with a sample size that ranged from 104 to 4,108 pigs. A random-effects model with study used as a random effect was developed. Observations were weighted using sample size as an estimate for precision on the analysis, where larger data sets accounted for increased accuracy in the model. Regression equations were developed using the nlme package of R to determine the relationship between BW and its variation. Polynomial regression analysis was conducted separately for each variation measurement. When CV was reported in the data set, SD was calculated and vice versa. The resulting prediction equations were: CV (%) = 20.04 - 0.135 × (BW) + 0.00043 × (BW)², R² = 0.79; SD = 0.41 + 0.150 × (BW) - 0.00041 × (BW)², R² = 0.95. These equations suggest that there is evidence for a decreasing quadratic relationship between mean CV of a population and BW of pigs whereby the rate of decrease is smaller as mean pig BW increases from birth to market. Conversely, the rate of increase of SD of a population of pigs is smaller as mean pig BW increases from birth to market.
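
As a quick illustration of the two prediction equations quoted in the abstract above, the following minimal Python sketch evaluates them at a few body weights. The function names and the example weights are hypothetical, and the abstract does not state the unit of BW, so the inputs are assumed here to be in the same unit the authors used (presumably kilograms). The coefficients are copied from the abstract; values far outside the birth-to-market range would be extrapolation.

```python
def predicted_cv(bw):
    """CV (%) of body weight as a quadratic function of mean BW (coefficients from the abstract)."""
    return 20.04 - 0.135 * bw + 0.00043 * bw ** 2


def predicted_sd(bw):
    """SD of body weight as a quadratic function of mean BW (coefficients from the abstract)."""
    return 0.41 + 0.150 * bw - 0.00041 * bw ** 2


if __name__ == "__main__":
    # Hypothetical example weights; the unit is an assumption (not given in the abstract).
    for bw in (25, 75, 125):
        print(f"BW={bw}: CV={predicted_cv(bw):.2f}%, SD={predicted_sd(bw):.2f}")
```

Over these example weights the predicted CV falls while the predicted SD rises, which matches the direction of the relationships described in the abstract.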

Heterogeneity in Rates of Recombination Across the Mouse Genome

Genetics ◽

10.1093/genetics/142.2.537 ◽

1996 ◽

Vol 142 (2) ◽

pp. 537-548 ◽

Cited By ~ 2

Author(s):

Michael W Nachman ◽

Gary A Churchill

Keyword(s):

Linkage Map ◽

Genetic Map ◽

Large Scale ◽

Physical Map ◽

Genetic Maps ◽

Recombination Rates ◽

Physical Maps ◽

Genomic Patterns ◽

Mary Lyon ◽

Microsatellite Linkage

Abstract If loci are randomly distributed on a physical map, the density of markers on a genetic map will be inversely proportional to recombination rate. We have used this idea, first proposed by Mary Lyon, to estimate recombination rates from the Drosophila melanogaster linkage map. These results were compared with results of two other studies that estimated regional recombination rates in D. melanogaster using both physical and genetic maps. The three methods were largely concordant in identifying large-scale genomic patterns of recombination. The marker density method was then applied to the Mus musculus microsatellite linkage map. The distribution of microsatellites provided evidence for heterogeneity in recombination rates. Centromeric regions for several mouse chromosomes had significantly greater numbers of markers than expected, suggesting that recombination rates were lower in these regions. In contrast, most telomeric regions contained significantly fewer markers than expected. This indicates that recombination rates are elevated at the telomeres of many mouse chromosomes and is consistent with a comparison of the genetic and cytogenetic maps in these regions. The density of markers on a genetic map may provide a generally useful way to estimate regional recombination rates in species for which genetic, but not physical, maps are available.
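
The marker density idea described in the abstract above can be sketched roughly as follows. This is not the authors' procedure or their significance test; it is a simplified illustration that assumes equal-width windows along the genetic map and reports, for each window, the ratio of the expected marker count (under a uniform spread) to the observed count. The function name, the windowing scheme, and the example positions are all hypothetical.

```python
def relative_recombination_rates(marker_positions_cm, map_length_cm, n_windows):
    """Rough relative recombination rate per genetic-map window.

    Assumes markers are randomly placed on the physical map, so a window
    with more markers per cM than average implies a lower recombination
    rate. Returns expected/observed marker counts per window, or None
    where a window holds no markers (the ratio is undefined there).
    """
    window_cm = map_length_cm / n_windows
    counts = [0] * n_windows
    for pos in marker_positions_cm:
        # Clamp so a marker at the very end of the map falls in the last window.
        idx = min(int(pos / window_cm), n_windows - 1)
        counts[idx] += 1
    expected = len(marker_positions_cm) / n_windows
    return [expected / c if c > 0 else None for c in counts]


if __name__ == "__main__":
    # Hypothetical marker positions (cM) on a 30 cM map: dense near the start.
    positions = [1, 2, 3, 4, 5, 6, 15, 25]
    print(relative_recombination_rates(positions, 30.0, 3))
```

In this toy input the first window is marker-dense and so receives a ratio below 1 (lower inferred recombination), while the sparse windows receive ratios above 1, mirroring the centromere versus telomere contrast reported for mouse chromosomes.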

Optimal breeding-value prediction using a Sparse Selection Index

Genetics ◽

10.1093/genetics/iyab030 ◽

2021 ◽

Author(s):

Marco Lopez-Cruz ◽

Gustavo de los Campos

Keyword(s):

Sample Size ◽

Dna Sequences ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Regularization Parameter ◽

Selection Index ◽

Prediction Method ◽

Training Data ◽

Breeding Value ◽

Data Set

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.