A method for genome-wide genealogy estimation for thousands of samples

2019 ◽  
Author(s):  
Leo Speidel ◽  
Marie Forest ◽  
Sinan Shi ◽  
Simon R. Myers

Abstract Knowledge of genome-wide genealogies for thousands of individuals would simplify most evolutionary analyses for humans and other species, but has remained computationally infeasible. We developed a method, Relate, scaling to > 10,000 sequences while simultaneously estimating branch lengths, mutational ages, and variable historical population sizes, as well as allowing for data errors. Application to 1000 Genomes Project haplotypes produces joint genealogical histories for 26 human populations. Highly diverged lineages are present in all groups, but most frequent in Africa. Outside Africa, these mainly reflect ancient introgression from groups related to Neanderthals and Denisovans, while African signals instead reflect unknown events, unique to that continent. Our approach allows more powerful inferences of natural selection than previously possible. We identify multiple novel regions under strong positive selection, and multi-allelic traits including hair colour, BMI, and blood pressure, showing strong evidence of directional selection, varying among human groups.

2018 ◽  
Author(s):  
Shivakumara Manu ◽  
Kshitish K Acharya ◽  
Saravanamuthu Thiyagarajan

Abstract Background Meiotic recombination plays an important role in evolution by shuffling different alleles along the chromosomes, thus generating the genetic diversity across generations that is vital for adaptation. The plasticity of recombination rates and the presence of recombination hotspots along the genome have attracted much attention over two decades due to their contribution to the evolution of the genome. Yet the variation in the genome-wide recombination landscape and the differences in the location and strength of hotspots across worldwide human populations remain little explored. Results We make use of the untapped linkage disequilibrium (LD) based genetic maps from the 1000 Genomes Project (1KGP) to perform in-depth analyses of fine-scale variation in the autosomal recombination rates across 20 human populations to uncover the global recombination landscape. We have generated a detailed map of the human recombination landscape comprising a comprehensive set of 88,841 putative hotspots and 80,129 coldspots with their respective strengths across populations, about two-thirds of which were previously unknown. We have validated and assessed the number of historical putative hotspots derived from the patterns of LD that are currently active in the contemporary populations using a recently published high-resolution pedigree-based genetic map, constructed and refined using 3.38 million crossovers from various populations. For the first time, we provide statistics regarding the conserved, shared, and unique hotspots across all the populations studied. Conclusions Our analysis yields clusters of continental groups, reflecting their shared ancestry and genetic similarities in the recombination rates that are linked to the migratory and evolutionary histories of the populations. We provide the genomic locations and strengths of hotspots and coldspots across all the populations studied, which are a valuable set of resources arising out of our analyses of 1KGP data. The findings are of great importance for further research on human hotspots as we approach the dusk of retiring HapMap-based resources.


2014 ◽  
Vol 6 (4) ◽  
pp. 846-860 ◽  
Author(s):  
Gabriel Santpere ◽  
Fleur Darre ◽  
Soledad Blanco ◽  
Antonio Alcami ◽  
Pablo Villoslada ◽  
...  

2019 ◽  
Vol 37 (1) ◽  
pp. 2-10 ◽  
Author(s):  
Luke Anderson-Trocmé ◽  
Rick Farouni ◽  
Mathieu Bourgey ◽  
Yoichiro Kamatani ◽  
Koichiro Higasa ◽  
...  

Abstract Recent reports have identified differences in the mutational spectra across human populations. Although some of these reports have been replicated in other cohorts, most have been reported only in the 1000 Genomes Project (1kGP) data. While investigating an intriguing putative population stratification within the Japanese population, we identified a previously unreported batch effect leading to spurious mutation calls in the 1kGP data and to the apparent population stratification. Because the 1kGP data are used extensively, we find that the batch effects also lead to incorrect imputation by leading imputation servers and a small number of suspicious GWAS associations. Lower quality data from the early phases of the 1kGP thus continue to contaminate modern studies in hidden ways. It may be time to retire or upgrade such legacy sequencing data.


2017 ◽  
Author(s):  
Louise V. Wain ◽  
Ahmad Vaez ◽  
Rick Jansen ◽  
Roby Joehanes ◽  
Peter J. van der Most ◽  
...  

Abstract Elevated blood pressure is a major risk factor for cardiovascular disease and has a substantial genetic contribution. Genetic variation influencing blood pressure has the potential to identify new pharmacological targets for the treatment of hypertension. To discover additional novel blood pressure loci, we used 1000 Genomes Project-based imputation in 150,134 European ancestry individuals and sought significant evidence for independent replication in a further 228,245 individuals. We report 6 new signals of association in or near HSPB7, TNXB, LRP12, LOC283335, SEPT9 and AKT2, and provide new replication evidence for a further 2 signals in EBF2 and NFKBIA. Combining large whole-blood gene expression resources totaling 12,607 individuals, we investigated all novel and previously reported signals and identified 48 genes with evidence for involvement in BP regulation that are significant in multiple resources. Three novel kidney-specific signals were also detected. These robustly implicated genes may provide new leads for therapeutic innovation.


2019 ◽  
Author(s):  
Luke Anderson-Trocmé ◽  
Rick Farouni ◽  
Mathieu Bourgey ◽  
Yoichiro Kamatani ◽  
Koichiro Higasa ◽  
...  

Abstract Recent reports have identified differences in the mutational spectra across human populations. While some of these reports have been replicated in other cohorts, most have been reported only in the 1000 Genomes Project (1kGP) data. While investigating an intriguing putative population stratification within the Japanese population, we identified a previously unreported batch effect leading to spurious mutation calls in the 1kGP data and to the apparent population stratification. Because the 1kGP data is used extensively, we find that the batch effects also lead to incorrect imputation by leading imputation servers and a small number of suspicious GWAS associations. Lower-quality data from the early phases of the 1kGP thus continues to contaminate modern studies in hidden ways. It may be time to retire or upgrade such legacy sequencing data.


2021 ◽  
Author(s):  
Ashley J Mulford ◽  
Claudia Wing ◽  
M Eileen Dolan ◽  
Heather E Wheeler

Abstract Most cancer chemotherapeutic agents are ineffective in a subset of patients; thus, it is important to consider the role of genetic variation in drug response. Lymphoblastoid cell lines (LCLs) in 1000 Genomes Project populations of diverse ancestries are a useful model for determining how genetic factors impact variation in cytotoxicity. In our study, LCLs from three 1000 Genomes Project populations of diverse ancestries were previously treated with increasing concentrations of eight chemotherapeutic drugs and cell growth inhibition was measured at each dose with half-maximal inhibitory concentration (IC50) or area under the dose–response curve (AUC) as our phenotype for each drug. We conducted both genome-wide (GWAS) and transcriptome-wide association studies (TWAS) within and across ancestral populations. We identified four unique loci in GWAS and three genes in TWAS significantly associated with chemotherapy-induced cytotoxicity within and across ancestral populations. For etoposide, increased STARD5 predicted expression associated with decreased etoposide IC50 (p = 8.5 × 10⁻⁸). Functional studies in A549, a lung cancer cell line, revealed that knockdown of STARD5 expression resulted in decreased sensitivity to etoposide following exposure for 72 (p = 0.033) and 96 hours (p = 0.0001). By identifying loci and genes associated with cytotoxicity across ancestral populations, we strive to understand the genetic factors impacting the effectiveness of chemotherapy drugs and to contribute to the development of future cancer treatment.


2020 ◽  
Vol 12 (6) ◽  
pp. 779-794 ◽  
Author(s):  
W Scott Watkins ◽  
Julie E Feusier ◽  
Jainy Thomas ◽  
Clement Goubert ◽  
Swapon Mallick ◽  
...  

Abstract Ongoing retrotransposition of Alu, LINE-1, and SINE–VNTR–Alu elements generates diversity and variation among human populations. Previous analyses investigating the population genetics of mobile element insertions (MEIs) have been limited by population ascertainment bias or by relatively small numbers of populations and low sequencing coverage. Here, we use 296 individuals representing 142 global populations from the Simons Genome Diversity Project (SGDP) to discover and characterize MEI diversity from deeply sequenced whole-genome data. We report 5,742 MEIs not originally reported by the 1000 Genomes Project and show that high sampling diversity leads to a 4- to 7-fold increase in MEI discovery rates over the original 1000 Genomes Project data. As a result of negative selection, nonreference polymorphic MEIs are underrepresented within genes, and MEIs within genes are often found in the transcriptional orientation opposite that of the gene. Globally, 80% of Alu subfamilies predate the expansion of modern humans from Africa. Polymorphic MEIs show heterozygosity gradients that decrease from Africa to Eurasia to the Americas, and the number of MEIs found uniquely in a single individual is also distributed in this general pattern. The maximum fraction of MEI diversity partitioned among the seven major SGDP population groups (FST) is 7.4%, similar to, but slightly lower than, previous estimates and likely attributable to the diverse sampling strategy of the SGDP. Finally, we utilize these MEIs to extrapolate the primary Native American shared ancestry component back to Asia and provide new evidence from genome-wide identical-by-descent genetic markers that adds support for a southeastern Siberian origin for most Native Americans.


2013 ◽  
Vol 35 (3) ◽  
pp. 387-393 ◽  
Author(s):  
Ji Eun Lim ◽  
Young-Ah Shin ◽  
Kyung-Won Hong ◽  
Hyun-Seok Jin ◽  
In Song Koh ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fadilla Wahyudi ◽  
Farhang Aghakhanian ◽  
Sadequr Rahman ◽  
Yik-Ying Teo ◽  
Michał Szpak ◽  
...  

Abstract Background In population genomics, polymorphisms that are highly differentiated between geographically separated populations are often suggestive of Darwinian positive selection. Genomic scans have highlighted several such regions in African and non-African populations, but only a handful of these have functional data that clearly associates candidate variations driving the selection process. Fine-Mapping of Adaptive Variation (FineMAV) was developed to address this in a high-throughput manner using population-based whole-genome sequences generated by the 1000 Genomes Project. It pinpoints positively selected genetic variants in sequencing data by prioritizing high-frequency, population-specific, and functional derived alleles. Results We developed stand-alone software that implements the FineMAV statistic. To graphically visualise the FineMAV scores, it outputs the statistics as bigWig files, a common file format supported by many genome browsers. It is available with both command-line and graphical user interfaces. The software was tested by replicating the FineMAV scores obtained using 1000 Genomes Project African, European, East and South Asian populations and subsequently applied to whole-genome sequencing datasets from Singapore and China to highlight population-specific variants that can be subsequently modelled. The software tool is publicly available at https://github.com/fadilla-wahyudi/finemav. Conclusions The software tool described here determines genome-wide FineMAV scores, using low- or high-coverage whole-genome sequencing datasets, that can be used to prioritize a list of population-specific, highly differentiated candidate variants for in vitro or in vivo functional screens. The tool displays these scores on human genome browsers for easy visualisation, annotation and comparison between different genomic regions in worldwide human populations.

