scholarly journals Imputation Performance in Latin American Populations: Improving Rare Variants Representation With the Inclusion of Native American Genomes

2022 ◽  
Vol 12 ◽  
Author(s):  
Andrés Jiménez-Kaufmann ◽  
Amanda Y. Chong ◽  
Adrián Cortés ◽  
Consuelo D. Quinto-Cortés ◽  
Selene L. Fernandez-Valverde ◽  
...  

Current Genome-Wide Association Studies (GWAS) rely on genotype imputation to increase statistical power, improve fine-mapping of association signals, and facilitate meta-analyses. Due to the complex demographic history of Latin America and the lack of balanced representation of Native American genomes in current imputation panels, the discovery of locally relevant disease variants is likely to be missed, limiting the scope and impact of biomedical research in these populations. Therefore, the necessity of better diversity representation in genomic databases is a scientific imperative. Here, we expand the 1,000 Genomes reference panel (1KGP) with 134 Native American genomes (1KGP + NAT) to assess imputation performance in Latin American individuals of mixed ancestry. Our panel increased the number of SNPs above the GWAS quality threshold, thus improving statistical power for association studies in the region. It also increased imputation accuracy, particularly in low-frequency variants segregating in Native American ancestry tracts. The improvement is subtle but consistent across countries and proportional to the number of genomes added from local source populations. To project the potential improvement with a higher number of reference genomes, we performed simulations and found that at least 3,000 Native American genomes are needed to equal the imputation performance of variants in European ancestry tracts. This reflects the concerning imbalance of diversity in current references and highlights the contribution of our work to reducing it while complementing efforts to improve global equity in genomic research.

2018 ◽  
Author(s):  
Candelaria Vergara ◽  
Margaret M. Parker ◽  
Liliana Franco ◽  
Michael H. Cho ◽  
Ana V. Valencia-Duarte ◽  
...  

ABSTRACTGenotype imputation is used to estimate unobserved genotypes from genome-wide maker data, to increase genome coverage and power for genome-wide association studies. Imputation has been most successful for European ancestry populations in which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African-Ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We aimed to compare the performance of these reference panels when imputing variation in 3,747 African Americans (AA) from 2 cohorts (HCV and COPDGene) genotyped using the Illumina Omni family of microarrays. The haplotypes of 2,504 individuals (from 1000G), 883 (from CAAPA) and 32,611 (from HRC) were used as reference. We compared the performance of these panels based on number of variants, imputation quality, imputation accuracy and coverage. In both cohorts, 1000G imputed 1.5–1.6x more variants compared to CAAPA and 1.2x more variants than HRC. Similar findings were observed for variants with higher imputation quality (R2>0.5) and for rare, low frequency, and common variants. When merging the results of the three panels the total number of imputed variants was 62M-63M with 20M overlapping variants imputed by all three panels, and a range of 5 to 15M unique variants imputed exclusively with one of the three panels. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. The 1000G, HRC and CAAPA participants of African ancestry provided high performance and accuracy for imputation of African American admixed individuals, increasing the total number of variants with high quality available for subsequent analyses. These three panels are complementary and would benefit from the development of an integrated African reference panel, including data from multiple sources and populations.


2020 ◽  
Author(s):  
Gamze Gürsoy ◽  
Eduardo Chielle ◽  
Charlotte M. Brannon ◽  
Michail Maniatakos ◽  
Mark Gerstein

AbstractGenotype imputation is the statistical inference of unknown genotypes using known population haplotype structures observed in large genomic datasets, such as HapMap and 1000 genomes project. Genotype imputation can help further our understanding of the relationships between genotypes and traits, and is extremely useful for analyses such as genome-wide association studies and expression quantitative loci inference. Increasing the number of genotyped genomes will increase the statistical power for inferring genotype-phenotype relationships, but the amount of data required and the compute-intense nature of the genotype imputation problem overwhelms servers. Hence, many institutions are moving towards outsourcing cloud services to scale up research in a cost effective manner. This raises privacy concerns, which we propose to address via homomorphic encryption. Homomorphic encryption is a type of encryption that allows data analysis on cipher texts, and would thereby avoid the decryption of private genotypes in the cloud. Here we develop an efficient, privacy-preserving genotype imputation algorithm, p-Impute, using homomorphic encryption. Our results showed that the performance of p-Impute is equivalent to the state-of-the-art plaintext solutions, achieving up to 99% micro area under curve score, and requiring a scalable amount of memory and computational time.


2019 ◽  
Author(s):  
Corbin Quick ◽  
Pramod Anugu ◽  
Solomon Musani ◽  
Scott T. Weiss ◽  
Esteban G. Burchard ◽  
...  

ABSTRACTA key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to capture the full spectrum of genetic variation, but remains prohibitively expensive for large samples. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture variation across a wider set of variants. However, imputation coverage and accuracy depend crucially on the reference panel size and genetic distance from the target population.Here, we consider a strategy in which a subset of study participants is sequenced and the rest array-genotyped and imputed using a reference panel that comprises the sequenced study participants and individuals from an external reference panel. We systematically assess how imputation quality and statistical power for association depend on the number of individuals sequenced and included in the reference panel for two admixed populations (African and Latino Americans) and two European population isolates (Sardinians and Finns). We develop a framework to identify powerful and cost-effective GWAS designs in these populations given current sequencing and array genotyping costs. For populations that are well-represented in current reference panels, we find that array genotyping alone is cost-effective and well-powered to detect both common- and rare-variant associations. For poorly represented populations, we find that sequencing a subset of study participants to improve imputation is often more cost-effective than array genotyping alone, and can substantially increase genomic coverage and power.


2018 ◽  
Vol 19 (1) ◽  
pp. 73-96 ◽  
Author(s):  
Sayantan Das ◽  
Gonçalo R. Abecasis ◽  
Brian L. Browning

Genotype imputation has become a standard tool in genome-wide association studies because it enables researchers to inexpensively approximate whole-genome sequence data from genome-wide single-nucleotide polymorphism array data. Genotype imputation increases statistical power, facilitates fine mapping of causal variants, and plays a key role in meta-analyses of genome-wide association studies. Only variants that were previously observed in a reference panel of sequenced individuals can be imputed. However, the rapid increase in the number of deeply sequenced individuals will soon make it possible to assemble enormous reference panels that greatly increase the number of imputable variants. In this review, we present an overview of genotype imputation and describe the computational techniques that make it possible to impute genotypes from reference panels with millions of individuals.


2021 ◽  
Author(s):  
Samuel Pattillo Smith ◽  
Sahar Shahamatdar ◽  
Wei Cheng ◽  
Selena Zhang ◽  
Joseph Paik ◽  
...  

AbstractSince 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from European ancestry individuals apply heterogeneously in non-European ancestry individuals. Here, we argue that enrichment analyses which aggregate SNP-level association statistics at multiple genomic scales—to genes and pathways—have been overlooked and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the insights generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven self-identified human ancestries in the UK Biobank and the Biobank Japan, as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African-American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. By testing for statistical associations at multiple genomic scales, enrichment analyses also illustrate the importance of reconciling contrasting results from association tests, heritability estimation, and prediction models in order to make personalized medicine a reality for all.


Nature ◽  
2021 ◽  
Vol 590 (7845) ◽  
pp. 290-299 ◽  
Author(s):  
Daniel Taliun ◽  
◽  
Daniel N. Harris ◽  
Michael D. Kessler ◽  
Jedidiah Carlson ◽  
...  

AbstractThe Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)1. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.


Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 772
Author(s):  
João Botelho ◽  
Vanessa Machado ◽  
José João Mendes ◽  
Paulo Mascarenhas

The latest evidence revealed a possible association between periodontitis and Parkinson’s disease (PD). We explored the causal relationship of this bidirectional association through two-sample Mendelian randomization (MR) in European ancestry populations. To this end, we used openly accessible data of genome-wide association studies (GWAS) on periodontitis and PD. As instrumental variables for periodontitis, seventeen single-nucleotide polymorphisms (SNPs) from a GWAS of periodontitis (1817 periodontitis cases vs. 2215 controls) and eight non-overlapping SNPs of periodontitis from an additional GWAS for validation purposes. Instrumental variables to explore for the reverse causation included forty-five SNPs from a GWAS of PD (20,184 cases and 397,324 controls). Multiple approaches of MR were carried-out. There was no evidence of genetic liability of periodontitis being associated with a higher risk of PD (B = −0.0003, Standard Error [SE] 0.0003, p = 0.26). The eight independent SNPs (B = −0.0000, SE 0.0001, p = 0.99) validated this outcome. We also found no association of genetically primed PD towards periodontitis (B = −0.0001, SE 0.0001, p = 0.19). These MR study findings do not support a bidirectional causal genetic liability between periodontitis and PD. Further GWAS studies are needed to confirm the consistency of these results.


2021 ◽  
Author(s):  
Robin N Beaumont ◽  
Isabelle K Mayne ◽  
Rachel M Freathy ◽  
Caroline F Wright

Abstract Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected both when the DD gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting the closest gene to the SNP not necessarily being the functionally relevant gene. This is the first application of this approach to birth weight, which has helped identify GWAS loci likely to have direct fetal effects on birth weight, which could not previously be classified as fetal or maternal owing to insufficient statistical power.


Sign in / Sign up

Export Citation Format

Share Document