I’am hiQ – A Novel Accuracy Index for Imputed Genotypes

Author(s):  
Albert Rosenberger ◽  
Viola Tozzi ◽  
Rayjean J Hung ◽  
David C Christiani ◽  
Neil E Caporaso ◽  
...  

Abstract Background: Imputation of untyped markers is a standard tool in genome-wide association studies to close the gap between directly genotyped and other known DNA variants. However, high accuracy of the imputed genotypes is fundamental. Several accuracy measures have been proposed and some are implemented in imputation software, but unfortunately they differ across platforms. In the present paper we introduce I’am hiQ, an independent pair of accuracy measures that can be applied to dosage files, the output of all imputation software. I’am (imputation accuracy measure) quantifies the average amount of individual-specific versus population-specific genotype information in a linear manner. hiQ (heterogeneity in quantities of dosages) addresses the inter-individual heterogeneity between dosages of a marker across the sample at hand. Results: Applying both measures to a large case-control sample of the International Lung Cancer Consortium (ILCCO), comprising 27,065 individuals, we found meaningful thresholds for I’am and hiQ suitable to classify markers of poor accuracy. We demonstrate how Manhattan-like plots and moving averages of I’am and hiQ can be useful to identify regions enriched with less accurately imputed markers, whereas these regions would be missed when applying the accuracy measure info (implemented in IMPUTE2). Conclusion: We recommend using I’am hiQ in addition to other accuracy scores for variant filtering before stepping into the analysis of imputed GWAS data.
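The moving-average idea mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the per-marker scores, the window size and the threshold are hypothetical, and the code does not compute I’am or hiQ themselves, whose formulas are not given in the abstract. It takes any per-marker accuracy score, ordered by genomic position, and flags windows whose mean score is low.

```python
import numpy as np

def moving_average(scores, window):
    """Mean of every full run of `window` consecutive markers (markers sorted by position)."""
    scores = np.asarray(scores, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(scores, kernel, mode="valid")

def flag_low_accuracy_windows(scores, window, threshold):
    """Return the start index of each window whose mean score is below `threshold`."""
    means = moving_average(scores, window)
    return np.flatnonzero(means < threshold)

# Hypothetical accuracy scores for seven consecutive markers.
scores = [0.9, 0.95, 0.4, 0.3, 0.35, 0.92, 0.97]
low = flag_low_accuracy_windows(scores, window=3, threshold=0.5)
print(low.tolist())  # -> [2]
```

Here only the window starting at the third marker (index 2) has a mean below the assumed threshold of 0.5, so a single poorly imputed marker would not be flagged, but a run of them would.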

Author(s):  
Tiit Nikopensius ◽  
Priit Niibo ◽  
Toomas Haller ◽  
Triin Jagomägi ◽  
Ülle Voog-Oras ◽  
...  

Abstract Background: Juvenile idiopathic arthritis (JIA) is the most common chronic rheumatic condition of childhood. Genetic association studies have revealed several JIA susceptibility loci, with the strongest effect size observed in the human leukocyte antigen (HLA) region. Genome-wide association studies have augmented the number of JIA-associated loci, particularly for non-HLA genes. The aim of this study was to identify new associations at non-HLA loci predisposing to the risk of JIA development in Estonian patients. Methods: We performed genome-wide association analyses in an entire JIA case–control sample (All-JIA) and in a case–control sample for oligoarticular JIA, the most prevalent JIA subtype. The entire cohort was genotyped using the Illumina HumanOmniExpress BeadChip arrays. After imputation, 16,583,468 variants were analyzed in 263 cases and 6956 controls. Results: We demonstrated nominal evidence of association for 12 novel non-HLA loci not previously implicated in JIA predisposition. We replicated known JIA associations in the CLEC16A and VTCN1 regions in the oligoarticular JIA sample. The strongest associations in the All-JIA analysis were identified at PRKG1 (P = 2.54 × 10⁻⁶), LTBP1 (P = 9.45 × 10⁻⁶), and ELMO1 (P = 1.05 × 10⁻⁵). In the oligoarticular JIA analysis, the strongest associations were identified at NFIA (P = 5.05 × 10⁻⁶), LTBP1 (P = 9.95 × 10⁻⁶), MX1 (P = 1.65 × 10⁻⁵), and CD200R1 (P = 2.59 × 10⁻⁵). Conclusion: This study increases the number of known JIA risk loci and provides additional evidence for the existence of overlapping genetic risk loci between JIA and other autoimmune diseases, particularly rheumatoid arthritis. The reported loci are involved in molecular pathways of immunological relevance and likely represent genomic regions that confer susceptibility to JIA in Estonian patients.
Key Points
• Juvenile idiopathic arthritis (JIA) is the most common childhood rheumatic disease with heterogeneous presentation and genetic predisposition.
• The present genome-wide association study of Estonian JIA patients is the first of its kind in Northern and Northeastern Europe.
• The results of the present study increase knowledge about JIA risk loci by replicating some previously described associations, which adds weight to their relevance, and by describing novel loci.
• The study provides additional evidence for the existence of overlapping genetic risk loci between JIA and other autoimmune diseases, particularly rheumatoid arthritis.


2012 ◽  
Vol 32 (suppl_1) ◽  
Author(s):  
Themistocles L Assimes ◽  
Benjamin Goldstein ◽  

Genome-wide association studies (GWAS) to date have identified 30 CAD susceptibility loci, but the ability to use this information to improve risk prediction remains limited. A meta-analysis of the GWAS and Cardio Metabochip data produced by the CARDIoGRAM+C4D consortium, representing 63,253 cases and 126,820 controls, has identified 1885 SNPs passing a False Discovery Rate (FDR) threshold of 0.5%. We hypothesized that an expanded multi-locus genetic risk score (GRS) incorporating genotype information at all loci below an FDR of 0.5% would perform better than a GRS restricted to 42 loci reaching genome-wide significance, and tested this hypothesis in subjects of European ancestry participating in the Atherosclerosis Risk in Communities (ARIC) study. Models testing the GRS were either minimally adjusted (age and sex) or fully adjusted for traditional risk factors (TRFs). The Figure shows the hazard ratio (HR) and 95% CI for incident events comparing each quintile of GRS to the middle quintile. The GRS including genotype information at all loci with an FDR of 0.5% noticeably improves risk prediction over the GRS restricted to genome-wide significant loci in both the minimally and fully adjusted models, based on several metrics including i) the HR per GRS quintile, ii) the HR per SD of the GRS, iii) the logistic regression pseudo R2, and iv) the c statistic. The HR per GRS quintile and per SD of GRS were all lower in the fully adjusted models than in the respective minimally adjusted models, but the reduction of the HR was more striking for the models that tested the more expansive GRS. These findings suggest that a larger proportion of novel GWAS CAD loci are mediating their effects through TRFs. While these findings demonstrate some progress in risk prediction using GWAS loci, both the limited and the expanded GRS continue to explain a relatively small proportion of the overall variance compared to TRFs. Thus, the clinical utility of a CAD GRS remains to be determined.
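A multi-locus genetic risk score and its quintile grouping can be sketched as below. The abstract does not say how the GRS was weighted, so the per-SNP weights (taken here as log odds ratios), the toy dosage matrix and the function names are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages for each individual.

    dosages: (n_individuals, n_snps) risk-allele counts or dosages in [0, 2]
    weights: (n_snps,) per-SNP weights (assumed here to be log odds ratios)
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights

def quintile_labels(scores):
    """Assign each score to a quintile 1..5 using the empirical 20/40/60/80% cut points."""
    scores = np.asarray(scores, dtype=float)
    cuts = np.quantile(scores, [0.2, 0.4, 0.6, 0.8])
    return np.searchsorted(cuts, scores, side="right") + 1

# Hypothetical data: five individuals, two SNPs.
dosages = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
weights = [0.1, 0.2]
grs = genetic_risk_score(dosages, weights)
labels = quintile_labels(grs)
print(labels.tolist())  # -> [1, 2, 3, 4, 5]
```

Comparing each quintile against the middle one (label 3), as the Figure described in the abstract does, would then be a matter of using these labels as a categorical covariate in a survival model.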


2020 ◽  
Vol 3 (1) ◽  
pp. 265-288
Author(s):  
Ning Sun ◽  
Hongyu Zhao

Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.


2021 ◽  
Author(s):  
Frank David Vogt ◽  
Gautam Shirsekar ◽  
Detlef Weigel

We present a new software package, vcf2gwas, to perform reproducible genome-wide association studies (GWAS). vcf2gwas is a Python API for bcftools, PLINK and GEMMA. Before running the analysis, a traditional GWAS workflow requires the user to edit and format the genotype information from the commonly used Variant Call Format (VCF) file as well as the phenotype information. Post-processing steps involve summarizing and visualizing the analysis results. This workflow requires the user to work on the command line, edit text manually and know one or more programming or scripting languages, which can be time-consuming, especially when analyzing multiple phenotypes. Our package provides a convenient pipeline performing all of these steps, reducing the GWAS workflow to a single command-line input without the need to edit or format the VCF file beforehand or to install any additional software. In addition, features such as reducing the dimensionality of the phenotypic space and performing analyses on the reduced dimensions, or comparing the significant variants from the results to specific genes or regions of interest, are implemented. By integrating different tools to perform GWAS under one workflow, the package ensures reproducible GWAS while significantly reducing user effort.


2018 ◽  
Author(s):  
Candelaria Vergara ◽  
Margaret M. Parker ◽  
Liliana Franco ◽  
Michael H. Cho ◽  
Ana V. Valencia-Duarte ◽  
...  

Abstract Genotype imputation is used to estimate unobserved genotypes from genome-wide marker data, to increase genome coverage and power for genome-wide association studies. Imputation has been most successful for European ancestry populations, for which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African-Ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We aimed to compare the performance of these reference panels when imputing variation in 3,747 African Americans (AA) from 2 cohorts (HCV and COPDGene) genotyped using the Illumina Omni family of microarrays. The haplotypes of 2,504 individuals (from 1000G), 883 (from CAAPA) and 32,611 (from HRC) were used as reference. We compared the performance of these panels based on number of variants, imputation quality, imputation accuracy and coverage. In both cohorts, 1000G imputed 1.5–1.6x more variants than CAAPA and 1.2x more variants than HRC. Similar findings were observed for variants with higher imputation quality (R2 > 0.5) and for rare, low-frequency, and common variants. When merging the results of the three panels, the total number of imputed variants was 62M–63M, with 20M overlapping variants imputed by all three panels and a range of 5M to 15M unique variants imputed exclusively with one of the three panels. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. The 1000G, HRC and CAAPA participants of African ancestry provided high performance and accuracy for imputation of African American admixed individuals, increasing the total number of high-quality variants available for subsequent analyses.
These three panels are complementary and would benefit from the development of an integrated African reference panel, including data from multiple sources and populations.
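Filtering on an imputation quality threshold such as R2 > 0.5 can be sketched as follows. The abstract does not state how R2 was computed, so the variance-ratio estimate used here (observed dosage variance divided by the variance expected for a variant with the same allele frequency under Hardy-Weinberg proportions) is an assumption, as are the function names and the toy data.

```python
import numpy as np

def dosage_r2(dosages):
    """Variance-ratio quality estimate for one variant from its allele dosages (0..2)."""
    d = np.asarray(dosages, dtype=float)
    p = d.mean() / 2.0                 # estimated allele frequency
    expected = 2.0 * p * (1.0 - p)     # dosage variance expected under Hardy-Weinberg
    if expected == 0.0:
        return float("nan")            # monomorphic variant: the estimate is undefined
    return float(d.var() / expected)

def quality_mask(dosage_matrix, threshold=0.5):
    """Boolean mask over variants (columns) whose estimated R2 exceeds `threshold`."""
    m = np.asarray(dosage_matrix, dtype=float)
    r2 = np.array([dosage_r2(m[:, j]) for j in range(m.shape[1])])
    return r2 > threshold

# Hypothetical data: four individuals, two variants.
# The first variant has confident calls, the second has uninformative dosages.
matrix = np.array([[0.0, 1.0],
                   [1.0, 1.0],
                   [2.0, 1.0],
                   [1.0, 1.0]])
keep = quality_mask(matrix)
print(keep.tolist())  # -> [True, False]
```

A variant whose dosages are identical for every individual carries no individual-level information and receives an estimate of 0, so it is dropped by the filter; monomorphic variants yield NaN and are dropped as well.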


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 1889 ◽  
Author(s):  
Sally R. Ellingson ◽  
David W. Fardo

This paper provides details on the steps necessary to assess and control data quality in genome-wide association studies (GWAS) using genotype information on a large number of genetic markers for a large number of individuals. Because study designs and genotyping platforms vary between sites and projects, and because genotyping errors can occur, it is important to ensure high-quality data. Scripts and directions are provided to assist others in this process.


2021 ◽  
Author(s):  
Zhi Ming Xu ◽  
Sina Rüeger ◽  
Michaela Zwyer ◽  
Daniela Brites ◽  
Hellen Hiza ◽  
...  

Abstract Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures and hence the full extent of common genetic variation. Here, we propose to sequence the full genome of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on SNPs, such that the remaining array-genotyped cohort can be imputed more accurately. Using a Tanzania-based cohort as a proof of concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on SNPs to the base H3Africa array.

