Big data reveals fewer recombination hotspots than expected in human genome
Abstract
Recombination is a major force that shapes genetic diversity. Accurate inference of recombination rate is important, and accuracy can be improved by increasing sample size. However, it has never been investigated whether sample size affects the distribution of inferred recombination activity along the genome and the inference of recombination hotspots. In this study, we applied an artificial intelligence approach to estimate recombination rates in the UK10K human genomic data set with 7,562 genomes and in the OMNI CEU data set with 170 genomes. We found that the fluctuation of local recombination rate along the UK10K genomes is much smaller than that along the CEU genomes, and that recombination activity in the UK10K genomes is also much less concentrated. The same phenomena were observed when comparing UK10K with two of its subsets, containing 200 and 400 genomes. In all cases, analysis of a larger number of genomes results in a more precise estimate of recombination rate and in less concentrated recombination activity, with fewer recombination hotspots identified. Overall, about 2.93 to 14.25 times fewer recombination hotspots were identified in UK10K than in previous studies. By comparing the recombination hotspots of UK10K and its subsets, we found that the rate of false inference of population-specific recombination hotspots could be as high as 75.86% if the number of sampled genomes is not very large. The results suggest that the uncertainty of the estimated recombination rate is substantial when the sample size is not very large, and that more attention should be paid to the accurate identification of recombination hotspots, especially population-specific recombination hotspots.
Author summary
We applied FastEPRR, an artificial intelligence method, to estimate recombination rates in the UK10K data set with 7,562 genomes and established the most accurate human genetic map.
By comparing it with other human genetic maps, we found that analysis of a larger number of genomes results in a more precise estimate of recombination rate and in less concentrated recombination activity, with fewer recombination hotspots identified. The rate of false inference of population-specific recombination hotspots could be substantial if the number of sampled genomes is not very large.
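To make the notion of "concentrated recombination activity" concrete, the following is a minimal sketch of one common way to quantify it: the share of total recombination that falls within the hottest fraction of the physical sequence. It is an illustration only and is not the FastEPRR procedure or the analysis used in this study. The function name `recombination_share`, the window representation, and the toy values are all hypothetical.

```python
def recombination_share(rates, lengths, top_fraction):
    """Share of total recombination in the hottest `top_fraction` of sequence.

    rates   : per-window recombination rates (any consistent unit, e.g. cM/Mb)
    lengths : per-window physical lengths (same order as `rates`)
    """
    # Rank windows from highest to lowest recombination rate.
    windows = sorted(zip(rates, lengths), key=lambda w: w[0], reverse=True)
    total_length = sum(lengths)
    # Genetic length contributed by a window is rate * physical length.
    total_recombination = sum(rate * length for rate, length in windows)

    remaining = top_fraction * total_length  # physical length still to cover
    captured = 0.0
    for rate, length in windows:
        if remaining <= 0:
            break
        used = min(length, remaining)  # take only part of the last window
        captured += rate * used
        remaining -= used
    return captured / total_recombination


# Hypothetical toy maps: four windows of equal length in each.
flat_map = recombination_share([1, 1, 1, 1], [10, 10, 10, 10], 0.25)
peaked_map = recombination_share([7, 1, 1, 1], [10, 10, 10, 10], 0.25)
```

Under this measure, a map whose rate is uniform along the sequence places a share equal to `top_fraction` in the hottest windows, whereas a map dominated by a few hot windows places a much larger share there. A smaller share for the same `top_fraction` therefore corresponds to less concentrated recombination activity.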