1000 Genomes Project Finds Substantial Genetic Variation Among Populations

Abstract Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready variants remains challenging. Here we introduce an open-source cohort variant-calling method using the highly-accurate caller DeepVariant and scalable merging tool GLnexus. We optimized callset quality based on benchmark samples and Mendelian consistency across many sample sizes and sequencing specifications, resulting in substantial quality improvements and cost savings over existing best practices. We further evaluated our pipeline in the 1000 Genomes Project (1KGP) samples, showing superior quality metrics and imputation performance. We publicly release the 1KGP callset to foster development of broad studies of genetic variation.
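
The Mendelian-consistency metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes biallelic, diploid, autosomal genotypes encoded as pairs of 0/1 alleles. The function names and example trios are hypothetical and are not taken from the DeepVariant or GLnexus code.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the mother and one from the father."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def violation_rate(trios):
    """Fraction of (child, mother, father) genotype triples that violate
    Mendelian inheritance."""
    violations = sum(not is_mendelian_consistent(*t) for t in trios)
    return violations / len(trios)


# Hypothetical genotypes at four sites: (child, mother, father).
example_trios = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: one allele from each parent
    ((1, 1), (0, 1), (0, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # violation: mother cannot transmit allele 1
    ((0, 0), (1, 1), (0, 1)),  # violation: mother cannot transmit allele 0
]
print(violation_rate(example_trios))  # 0.5
```

A lower violation rate across many trios would suggest a cleaner callset, although real pipelines also have to handle missing genotypes, sex chromosomes and multiallelic sites, which this sketch ignores.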


Haplotype-aware graph indexes

Bioinformatics ◽ 10.1093/bioinformatics/btz575 ◽ 2019 ◽ Cited By ~ 6

Author(s): Jouni Sirén ◽ Erik Garrison ◽ Adam M Novak ◽ Benedict Paten ◽ Richard Durbin

Keyword(s): Genetic Variation ◽ Precision Medicine ◽ Chromosome 17 ◽ Supplementary Information ◽ Whole Genome ◽ Supplementary Data ◽ 1000 Genomes Project ◽ 1000 Genomes ◽ Burrows Wheeler Transform ◽ Haplotype Information

Abstract Motivation The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes. Results We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Availability and implementation Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2. Supplementary information Supplementary data are available at Bioinformatics online.
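
To make the motivation concrete, here is a toy sketch of why haplotype information matters. It models a variation graph as a chain of sites, each offering alternative alleles, and compares the k-mers spelled by every path with those spelled by a small set of known haplotypes. The data and function names are hypothetical. This is not the GBWT or the vg graph-simplification algorithm, only an illustration of the counting argument.

```python
from itertools import product


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def all_path_sequences(sites):
    """Spell every path through a chain of sites, where each site is a
    tuple of alternative alleles (one per parallel node)."""
    return {"".join(choice) for choice in product(*sites)}


# Hypothetical toy graph: three biallelic SNPs separated by shared bases.
sites = [("A",), ("C", "T"), ("G",), ("A", "G"), ("T",), ("C", "G"), ("A",)]
# Hypothetical observed haplotypes, each of which is one path in the graph.
haplotypes = {"ACGATCA", "ATGGTGA"}

paths = all_path_sequences(sites)
k = 5
graph_kmers = set().union(*(kmers(p, k) for p in paths))
haplotype_kmers = set().union(*(kmers(h, k) for h in haplotypes))

print(len(paths), len(haplotypes))  # 8 paths, 2 haplotypes
print(len(graph_kmers), len(haplotype_kmers))
```

Even in this tiny graph, the k-mers supported by the haplotypes are a strict subset of those spelled by arbitrary paths, and the gap grows quickly as the number of variant sites increases.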


Haplotype-aware graph indexes

10.1101/559583 ◽ 2019 ◽ Cited By ~ 7

Author(s): Jouni Sirén ◽ Erik Garrison ◽ Adam M. Novak ◽ Benedict Paten ◽ Richard Durbin

Keyword(s): Genetic Variation ◽ Chromosome 17 ◽ Supplementary Information ◽ Whole Genome ◽ Supplementary Data ◽ 1000 Genomes Project ◽ 1000 Genomes ◽ Link Type ◽ Supplementary Material ◽ Haplotype Information

Abstract Motivation The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes. Results We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform (GBWT). We demonstrate the scalability of the new implementation by building a whole-genome index of the 5,008 haplotypes of the 1000 Genomes Project, and an index of all 108,070 TOPMed Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Availability Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt, and https://github.com/jltsiren/[email protected] Supplementary information Supplementary data are available.


The analysis of APOL1 genetic variation and haplotype diversity provided by 1000 Genomes project

BMC Nephrology ◽ 10.1186/s12882-017-0675-6 ◽ 2017 ◽ Vol 18 (1) ◽ Cited By ~ 2

Author(s): Ting Peng ◽ Li Wang ◽ Guisen Li

Keyword(s): Genetic Variation ◽ Haplotype Diversity ◽ 1000 Genomes Project ◽ 1000 Genomes


1000 Genomes Project reveals human variation

Nature ◽ 10.1038/news.2010.567 ◽ 2010 ◽ Cited By ~ 3

Author(s): Alla Katsnelson

Keyword(s): Human Variation ◽ 1000 Genomes Project ◽ 1000 Genomes


The Epidemiology and Genetics of Hyperuricemia and Gout across Major Racial Groups: A Literature Review and Population Genetics Secondary Database Analysis

Journal of Personalized Medicine ◽ 10.3390/jpm11030231 ◽ 2021 ◽ Vol 11 (3) ◽ pp. 231

Author(s): Faven Butler ◽ Ali Alghubayshi ◽ Youssef Roman

Keyword(s): Literature Review ◽ Risk Allele ◽ Statistical Significance ◽ Elevated Serum ◽ The United States ◽ Allele Frequencies ◽ Racial Groups ◽ 1000 Genomes Project ◽ 1000 Genomes ◽ Risk Alleles

Gout is an inflammatory condition caused by elevated serum urate (SU), a condition known as hyperuricemia (HU). Genetic variations, including single nucleotide polymorphisms (SNPs), can alter the function of urate transporters, leading to differential HU and gout prevalence across different populations. In the United States (U.S.), gout prevalence differentially affects certain racial groups. The objective of this proposed analysis is to compare the frequency of urate-related genetic risk alleles between Europeans (EUR) and the following major racial groups: Africans in Southwest U.S. (ASW), Han-Chinese (CHS), Japanese (JPT), and Mexican (MXL) from the 1000 Genomes Project. The Ensembl genome browser of the 1000 Genomes Project was used to conduct cross-population allele frequency comparisons of 11 SNPs across 11 genes that are physiologically involved in, and significantly associated with, SU levels and gout risk. Gene/SNP pairs included: ABCG2 (rs2231142), SLC2A9 (rs734553), SLC17A1 (rs1183201), SLC16A9 (rs1171614), GCKR (rs1260326), SLC22A11 (rs2078267), SLC22A12 (rs505802), INHBC (rs3741414), RREB1 (rs675209), PDZK1 (rs12129861), and NRXN2 (rs478607). Allele frequencies were compared to EUR using the Chi-Square or Fisher’s Exact test, as appropriate. Bonferroni correction for multiple comparisons was used, with p < 0.0045 for statistical significance. Risk alleles were defined as the allele that is associated with baseline or higher HU and gout risks. The cumulative HU or gout risk allele index of the 11 SNPs was estimated for each population. The prevalence of HU and gout in U.S. and non-U.S. populations was evaluated using published epidemiological data and literature review. Compared with EUR, the SNP frequencies of 7/11 in ASW, 9/11 in MXL, 9/11 in JPT, and 11/11 in CHS were significantly different. HU or gout risk allele indices were 5, 6, 9, and 11 in ASW, MXL, CHS, and JPT, respectively. Out of the 11 SNPs, the percentage of risk alleles in CHS and JPT was 100%. Compared to non-U.S. populations, the prevalence of HU and gout appears to be higher in Western countries. Compared with EUR, CHS and JPT populations had the highest HU or gout risk allele frequencies, followed by MXL and ASW. These results suggest that individuals of Asian descent are at higher HU and gout risk, which may partly explain the nearly three-fold higher gout prevalence among Asians versus Caucasians in ambulatory care settings. Furthermore, gout remains a disease of developed countries, with a marked global rise.
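
The significance threshold quoted in this abstract follows from dividing the conventional 0.05 level by the 11 SNPs tested. The sketch below reproduces that arithmetic and shows one way a 2x2 chi-square comparison of allele counts could be computed with the standard library alone. The allele counts are hypothetical and illustrative only, the function names are not from the study, and the study also used Fisher's exact test where appropriate, which this sketch does not cover.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]].
    Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square survival function is
    # erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


threshold = bonferroni_threshold(0.05, 11)
print(round(threshold, 4))  # 0.0045

# Hypothetical allele counts (risk allele, other allele) in two populations;
# these numbers are illustrative only, not taken from the study.
stat, p = chi_square_2x2(60, 140, 30, 170)
print(stat, p, p < threshold)
```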


Genome-Wide Analysis of Wild-Type Epstein–Barr Virus Genomes Derived from Healthy Individuals of the 1000 Genomes Project

Genome Biology and Evolution ◽ 10.1093/gbe/evu054 ◽ 2014 ◽ Vol 6 (4) ◽ pp. 846-860 ◽ Cited By ~ 44

Author(s): Gabriel Santpere ◽ Fleur Darre ◽ Soledad Blanco ◽ Antonio Alcami ◽ Pablo Villoslada ◽ ...

Keyword(s): Epstein Barr Virus ◽ Healthy Individuals ◽ Wild Type ◽ 1000 Genomes Project ◽ Genome Wide Analysis ◽ Barr Virus ◽ 1000 Genomes ◽ Genome Wide ◽ Epstein Barr ◽ Virus Genomes


Utilizing the Jaccard index to reveal population stratification in sequencing data: a simulation study and an application to the 1000 Genomes Project

Bioinformatics ◽ 10.1093/bioinformatics/btv752 ◽ 2015 ◽ Vol 32 (9) ◽ pp. 1366-1372 ◽ Cited By ~ 23

Author(s): Dmitry Prokopenko ◽ Julian Hecker ◽ Edwin K. Silverman ◽ Marcello Pagano ◽ Markus M. Nöthen ◽ ...

Keyword(s): Simulation Study ◽ Population Stratification ◽ Jaccard Index ◽ Sequencing Data ◽ 1000 Genomes Project ◽ 1000 Genomes


Evaluation of serverless computing for scalable execution of a joint variant calling workflow

PLoS ONE ◽ 10.1371/journal.pone.0254363 ◽ 2021 ◽ Vol 16 (7) ◽ pp. e0254363

Author(s): Aji John ◽ Kathleen Muenzen ◽ Kristiina Ausmees

Keyword(s): Genetic Information ◽ Best Practice ◽ Workflow Management ◽ Variant Calling ◽ Phase Iii ◽ 1000 Genomes Project ◽ 1000 Genomes ◽ Genomics Research ◽ The Cost ◽ Analysis Of Performance

Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been widely evaluated for standardized genomic workflows. In this study, we define and execute a best-practice joint variant calling workflow using the SWEEP workflow management system. We present an analysis of performance and scalability, and discuss the utility of the serverless paradigm for executing workflows in the field of genomics research. The GATK best-practice short germline joint variant calling pipeline was implemented as a SWEEP workflow comprising 18 tasks. The workflow was executed on Illumina paired-end read samples from the European and African super populations of the 1000 Genomes project phase III. Cost and runtime increased linearly with increasing sample size, although runtime was driven primarily by a single task for larger problem sizes. Execution took a minimum of around 3 hours for 2 samples, up to nearly 13 hours for 62 samples, with costs ranging from $2 to $70.
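
As a rough reading of the linear scaling reported in this abstract, the sketch below draws a straight line through the two quoted endpoints. It assumes the approximate figures (about 3 hours and $2 for 2 samples, about 13 hours and $70 for 62 samples) pair up as stated and that linearity holds in between. The helper name is hypothetical, and the intermediate values are interpolations, not measurements from the study.

```python
def linear_through(p0, p1):
    """Return f(x) for the straight line through points p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)


# Approximate endpoints quoted in the abstract (2 and 62 samples).
runtime_hours = linear_through((2, 3.0), (62, 13.0))
cost_dollars = linear_through((2, 2.0), (62, 70.0))

for n in (2, 32, 62):
    print(n, round(runtime_hours(n), 1), round(cost_dollars(n), 1))
```

Under these assumptions each additional sample adds roughly ten minutes of runtime and a little over one dollar of cost.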


The projection of a test genome onto a reference population and applications to humans and archaic hominins

10.1101/008805 ◽ 2014

Author(s): Melinda A Yang ◽ Kelley Harris ◽ Montgomery Slatkin

Keyword(s): Numerical Analysis ◽ Population Size ◽ Demographic History ◽ Reference Population ◽ Average Weight ◽ 1000 Genomes Project ◽ 1000 Genomes ◽ Test Genome ◽ History Of ◽ Past Population

We introduce a method for comparing a test genome with numerous genomes from a reference population. Sites in the test genome are given a weight w that depends on the allele frequency x in the reference population. The projection of the test genome onto the reference population is the average weight for each x, w(x). The weight is assigned in such a way that if the test genome is a random sample from the reference population, w(x)=1. Using analytic theory, numerical analysis, and simulations, we show how the projection depends on the time of population splitting, the history of admixture and changes in past population size. The projection is sensitive to small amounts of past admixture, the direction of admixture and admixture from a population not sampled (a ghost population). We compute the projection of several human and two archaic genomes onto three reference populations from the 1000 Genomes project: Europeans (CEU), Han Chinese (CHB) and Yoruba (YRI). We also discuss the consistency of our analysis with previously published results for European and Yoruba demographic history. Including higher amounts of admixture between Europeans and Yoruba soon after their separation and low amounts of admixture more recently can resolve discrepancies between the projections and demographic inferences from some previous studies.
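
The property w(x)=1 for a random sample can be illustrated with a small simulation. The abstract does not spell out the weight, so the sketch below assumes one simple choice that satisfies the stated property: at each site, the fraction of derived alleles carried by the diploid test genome divided by the reference frequency x. This weighting, the function names and the simulated data are assumptions for illustration and may differ from the authors' definition.

```python
import random


def projection(sites):
    """Average weight w(x) per reference-frequency value x.
    `sites` is an iterable of (x, g): x is the derived-allele frequency in
    the reference population, g the number of derived alleles (0, 1 or 2)
    carried by the diploid test genome.
    Assumed weight per site: (g / 2) / x."""
    totals, counts = {}, {}
    for x, g in sites:
        totals[x] = totals.get(x, 0.0) + (g / 2) / x
        counts[x] = counts.get(x, 0) + 1
    return {x: totals[x] / counts[x] for x in sorted(totals)}


def simulate_random_sample(frequencies, sites_per_bin, rng):
    """Simulate a test genome drawn from the reference population itself:
    each of its two alleles is derived with probability x."""
    for x in frequencies:
        for _ in range(sites_per_bin):
            g = (rng.random() < x) + (rng.random() < x)
            yield x, g


rng = random.Random(0)
w = projection(simulate_random_sample([0.1, 0.3, 0.5, 0.9], 20000, rng))
for x, value in w.items():
    print(x, round(value, 2))  # each value should be close to 1
```

A test genome from a diverged or admixed population would carry derived alleles at rates that differ from x, so under this assumed weighting its projection would depart from 1 in a frequency-dependent way.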
