Practical Calling Approach for Exome Array-Based Genome-Wide Association Studies in Korean Population

2015, Vol 2015, pp. 1-6
Author(s): Tae-Joon Park, Lyong Heo, Sanghoon Moon, Young Jin Kim, Ji Hee Oh, ...

Exome-based genotyping arrays are cost-effective and have recently been used as alternatives to whole-exome sequencing. However, the automated clustering algorithm for exome arrays calls genotypes inaccurately for rare and low-frequency variants. To address this shortcoming, we present a practical approach for accurate genotype calling using the Illumina Infinium HumanExome BeadChip, together with comparison results and a statistical summary of our genotype data sets. Our data set comprises 14,647 Korean samples. To overcome the limitation of automated clustering, we performed manual genotype clustering for the targeted identification of 46,076 variants identified using GenomeStudio software. To evaluate the effect of applying custom cluster files, we tested them on 804 independent Korean samples genotyped on the same platform. Our study is the first to suggest practical guidelines for exome chip quality control in Asian populations and provides valuable insight into association studies using exome chips.
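As a simplified illustration of why cluster positions matter for genotype calling, the Python sketch below calls a genotype by assigning a sample's normalized theta value, (2/π)·arctan(Y/X) of the two allele intensities, to the nearest of three cluster centroids, with a no-call when the sample is too far from all of them. The centroid values, the no-call cutoff and the function names are hypothetical; GenomeStudio's actual GenCall scoring (which also uses total intensity and cluster spread) is not reproduced here.

```python
import math
from typing import Dict

# Hypothetical per-SNP cluster centroids on the theta scale, standing in for the
# positions a custom (manually curated) cluster file would store.
CENTROIDS: Dict[str, float] = {"AA": 0.03, "AB": 0.50, "BB": 0.97}
MAX_DISTANCE = 0.10  # hypothetical no-call cutoff on the theta scale


def theta(x: float, y: float) -> float:
    """Normalized allele angle: (2 / pi) * atan(Y / X), in [0, 1] for X, Y >= 0.

    atan2 is used so that X == 0 does not raise a division error.
    """
    return 2.0 / math.pi * math.atan2(y, x)


def call_genotype(x: float, y: float,
                  centroids: Dict[str, float] = CENTROIDS,
                  max_distance: float = MAX_DISTANCE) -> str:
    """Return the genotype of the nearest centroid, or 'NC' (no call) if too far."""
    t = theta(x, y)
    genotype, centre = min(centroids.items(), key=lambda kv: abs(t - kv[1]))
    return genotype if abs(t - centre) <= max_distance else "NC"


if __name__ == "__main__":
    for x, y in [(1.0, 0.05), (1.0, 1.0), (0.05, 1.0), (1.0, 0.4142)]:
        print(f"X={x}, Y={y}, theta={theta(x, y):.3f}, call={call_genotype(x, y)}")
```

Moving a centroid changes which samples fall inside the cutoff, which is the kind of effect that editing cluster positions has on the resulting calls.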

2017
Author(s): Sina Rüeger, Aaron McDaid, Zoltán Kutalik

As most of the heritability of complex traits is attributed to common and low-frequency genetic variants, imputing them by combining genotyping chips with large sequenced reference panels is the most cost-effective approach to discovering the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is cumbersome, as it requires reimputing the genetic data, rerunning the association scan and meta-analysing the results. A much more efficient method is to impute the summary statistics directly, termed summary statistics imputation. Its performance relative to genotype imputation, and its practical utility, have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, while genotype imputation achieves a 2- to 5-fold lower root-mean-square error, summary statistics imputation better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01 and 0.05, summary statistics imputation increased statistical power by 15, 10 and 3%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Author summary
Genome-wide association studies (GWASs) quantify the effect of genetic variants on traits such as height. Such estimates are called association summary statistics and are typically shared publicly through publication. In a GWAS, ~500,000 SNVs are typically genotyped for each individual and then combined with sequenced reference panels to infer untyped SNVs in each individual's genome. This process of genotype imputation is resource-intensive and can therefore be a limitation when combining many GWASs. An alternative approach is to bypass individual-level data and impute summary statistics directly. In our work we compare the performance of summary statistics imputation with that of genotype imputation. Although we observe a 2- to 5-fold lower RMSE for genotype imputation than for summary statistics imputation, summary statistics imputation better distinguishes true associations from null results. Furthermore, we demonstrate the potential of summary statistics imputation by presenting 34 novel height-associated loci, 19 of which were confirmed in UK Biobank. Our study demonstrates that, given current reference panels, summary statistics imputation is a very efficient and cost-effective way to identify common or low-frequency trait-associated loci.
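Summary statistics imputation is commonly formulated as a conditional expectation under a multivariate normal model of z-scores: for an untyped variant u and typed variants t, the imputed score is z_u = Σ_ut Σ_tt⁻¹ z_t, where Σ is the LD (correlation) matrix taken from a reference panel, and the expected imputation quality is r² = Σ_ut Σ_tt⁻¹ Σ_tu. The Python sketch below implements that textbook formula with an optional ridge term added to Σ_tt. The LD values, z-scores, ridge parameter and function names are hypothetical, and the estimator used in the study may differ in its details (for example in how the LD matrix is regularized or how the imputed z-score is rescaled).

```python
from typing import List, Sequence, Tuple


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; try a positive ridge")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def impute_z(ld_tt: Sequence[Sequence[float]], ld_tu: Sequence[float],
             z_t: Sequence[float], ridge: float = 0.0) -> Tuple[float, float]:
    """Impute the z-score of an untyped variant u from typed variants t.

    ld_tt: LD (correlation) matrix among typed variants (Sigma_tt).
    ld_tu: LD of each typed variant with u (Sigma_tu).
    Returns (z_u, r2) with z_u = Sigma_ut Sigma_tt^-1 z_t and
    r2 = Sigma_ut Sigma_tt^-1 Sigma_tu (expected imputation quality).
    """
    n = len(z_t)
    reg = [[ld_tt[i][j] + (ridge if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    w = solve(reg, ld_tu)  # w = Sigma_tt^-1 Sigma_tu (Sigma_tt is symmetric)
    z_u = sum(wi * zi for wi, zi in zip(w, z_t))
    r2 = sum(wi * ci for wi, ci in zip(w, ld_tu))
    return z_u, r2


if __name__ == "__main__":
    # Toy example: two typed SNPs (r = 0.5) and one untyped SNP.
    z_u, r2 = impute_z([[1.0, 0.5], [0.5, 1.0]], [0.8, 0.3], [4.0, 1.0])
    print(round(z_u, 3), round(r2, 3))  # -> 3.333 0.653
```

The ridge term trades a small bias for numerical stability when reference-panel LD matrices are near-singular; with ridge = 0 the sketch reduces to the plain conditional expectation.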


2021
Author(s): Abhishek Nag, Lawrence Middleton, Ryan S Dhindsa, Dimitrios Vitsios, Eleanor M Wigmore, ...

Genome-wide association studies have established the contribution of common and low-frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N = 412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p < 1 × 10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, and between SYT7 and creatinine, among others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, 40% of the gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that uses the gene-biomarker association statistics from the collapsing analysis to identify genes with similar biomarker fingerprints and thereby expand our understanding of gene networks.
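The abstract does not spell out its collapsing models. As a simplified, hypothetical illustration of the general idea (aggregate qualifying rare variants in a gene into a single carrier indicator per participant, then test that indicator against a biomarker), the Python sketch below filters variants by minor allele frequency and consequence, marks carriers, and compares biomarker values with a Welch t statistic. The thresholds, field names and choice of test are assumptions; a real analysis would typically use covariate-adjusted regression (for example adjusting for ancestry), which this omits.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence, Set


def qualifying_variants(variants: Sequence[Dict], max_maf: float,
                        consequences: Set[str]) -> List[int]:
    """Indices of variants passing a hypothetical collapsing-model filter:
    minor allele frequency <= max_maf and an allowed functional consequence."""
    return [i for i, v in enumerate(variants)
            if v["maf"] <= max_maf and v["consequence"] in consequences]


def carrier_status(genotypes: Sequence[Sequence[int]],
                   qualifying: Sequence[int]) -> List[bool]:
    """Collapse each sample's genotypes to one flag: True if the sample carries
    at least one alternate allele at any qualifying variant in the gene."""
    return [any(sample[i] > 0 for i in qualifying) for sample in genotypes]


def welch_t(values: Sequence[float], carriers: Sequence[bool]) -> float:
    """Welch t statistic for the biomarker difference, carriers minus non-carriers."""
    a = [v for v, c in zip(values, carriers) if c]
    b = [v for v, c in zip(values, carriers) if not c]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two carriers and two non-carriers")
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


if __name__ == "__main__":
    variants = [
        {"maf": 0.001, "consequence": "missense"},
        {"maf": 0.20, "consequence": "missense"},      # too common: excluded
        {"maf": 0.0005, "consequence": "synonymous"},  # wrong class: excluded
    ]
    genotypes = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
    biomarker = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]
    q = qualifying_variants(variants, max_maf=0.01, consequences={"missense"})
    carriers = carrier_status(genotypes, q)
    print(q, carriers, round(welch_t(biomarker, carriers), 3))
    # -> [0] [True, True, True, False, False, False] 4.899
```

Running several filters (different MAF cut-offs or consequence sets) over the same gene is one simple way to probe different genetic architectures with this kind of test.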


2020, pp. HEP36
Author(s): Pierre Nahon, Manon Allaire, Jean-Charles Nault, Valérie Paradis

Hepatocellular carcinoma (HCC) arising in individuals with non-alcoholic fatty liver disease (NAFLD) has distinct clinical and biological characteristics that remain to be elucidated. Its occurrence in noncirrhotic patients raises issues regarding surveillance strategies, which cannot be considered cost-effective given the high prevalence of obesity and metabolic syndrome; it also points to specific oncogenic processes that could be targeted in the setting of primary or secondary prevention. In this context, genetic heterogeneity modulating HCC risk, as well as specific biological pathways, have been identified through genome-wide association studies, the development of animal models and in-depth analyses of human samples at the pathological and genomic levels. These advances must be confirmed and pursued to pave the way for personalized management of NAFLD-related HCC.


2019, Vol 29 (4), pp. 689-702
Author(s): Thibaud S Boutin, David G Charteris, Aman Chandra, Susan Campbell, Caroline Hayward, ...

Retinal detachment (RD) is a serious and common condition, but genetic studies to date have been hampered by the small size of the assembled cohorts. In the UK Biobank data set, where RD was ascertained by self-report or hospital records, the genetic correlations of RD with high myopia and with cataract operation were 0.46 (SE = 0.08) and 0.44 (SE = 0.07), respectively. These correlations are consistent with known epidemiological associations. Through meta-analysis of genome-wide association studies using UK Biobank RD cases (N = 3,977) and two cohorts, each comprising ~1,000 clinically ascertained rhegmatogenous RD patients, we uncovered 11 genome-wide significant association signals. These are near or within ZC3H11B, BMP3, COL22A1, DLG5, PLCE1, EFEMP2, TYR, FAT3, TRIM29, COL2A1 and LOXL1. Replication in the 23andMe data set, where RD is self-reported by participants, firmly establishes six RD risk loci: FAT3, COL22A1, TYR, BMP3, ZC3H11B and PLCE1. Based on the genetic associations with eye traits described to date, the first two specifically impact the risk of RD, whereas the last four point to shared aetiologies with macular conditions, myopia and glaucoma. Fine-mapping prioritized the lead common missense variant (TYR S192Y) as the causal variant at the TYR locus and a small set of credible causal variants at the FAT3 locus. The larger study size presented here, enabled by resources linked to health records or self-report, provides novel insights into RD aetiology and underlying pathological pathways.
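The abstract does not say which meta-analysis model was used. A common choice for combining per-cohort GWAS effect estimates is the fixed-effect inverse-variance-weighted (IVW) model, sketched below in Python under that assumption: each cohort's effect is weighted by 1/SE², and the pooled z-score is converted to a two-sided p value. The 5 × 10⁻⁸ cut-off is the conventional genome-wide significance level rather than a value stated in the abstract, and the effect sizes in the usage example are hypothetical.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def ivw_meta(betas: Sequence[float],
             ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis of per-cohort effects.

    Returns (pooled beta, pooled standard error, z score, two-sided p value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided standard normal tail probability
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical log-odds effects from three cohorts for one variant.
    beta, se, z, p = ivw_meta([0.12, 0.09, 0.15], [0.03, 0.04, 0.05])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e} "
          f"genome-wide={p < GENOME_WIDE_ALPHA}")
```

A fixed-effect model assumes one true effect shared by all cohorts; when cohorts differ in ascertainment (self-report versus clinical diagnosis, as here), a heterogeneity check or random-effects model is a reasonable addition.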


TH Open, 2020, Vol 04 (04), pp. e322-e331
Author(s): Eric Manderstedt, Christina Lind-Halldén, Stefan Lethagen, Christer Halldén

Genome-wide association studies (GWASs) have identified genes that affect plasma von Willebrand factor (VWF) levels. ABO showed a strong effect, whereas smaller effects were seen for VWF, STXBP5, STAB2, SCARA5, STX2, TC2N and CLEC4M. This study screened comprehensively for both common and rare variants in these eight genes by resequencing their coding sequences in 104 Swedish von Willebrand disease (VWD) patients. The common variants previously associated with VWF level were all overrepresented in the VWD patients compared with three control populations. The strongest effect was detected for blood group O, encoded by the ABO gene (71 vs. 38% of genotypes). The other seven VWF level-associated alleles were also enriched in the VWD population compared with the control populations, but the differences were small and not significant. Sequencing detected a total of 146 variants in the eight genes; excluding the 70 variants in VWF left 76 variants. Of these 76 variants, 54 had allele frequencies > 0.5% and have therefore been investigated for association with VWF level in previous GWASs. The remaining 22 variants, with frequencies < 0.5%, are less likely to have been evaluated previously. PolyPhen-2 classified 3 of the 22 variants as probably or possibly damaging (two in STAB2 and one in STX2); the others were either synonymous or benign. No accumulation of low-frequency (0.05–0.5%) or rare (<0.05%) variants was detected in the VWD population compared with the gnomAD (Genome Aggregation Database) population. Thus, rare variants in these genes do not contribute to the low VWF levels observed in VWD patients.
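The abstract partitions variants by allele frequency: > 0.5% (the range assessed by earlier GWASs), 0.05–0.5% (low frequency) and < 0.05% (rare). A small Python sketch of that binning follows. The label names and the handling of a frequency falling exactly on a cut-off are assumptions, since the abstract does not specify them.

```python
from collections import Counter
from typing import Iterable


def frequency_class(allele_freq: float) -> str:
    """Bin a variant using the abstract's cut-offs (0.5% and 0.05%).

    Assumption: a frequency exactly at 0.5% or 0.05% counts as low frequency.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if allele_freq > 0.005:    # > 0.5%: range covered by previous GWASs
        return "gwas_range"
    if allele_freq >= 0.0005:  # 0.05-0.5%: low frequency
        return "low_frequency"
    return "rare"              # < 0.05%


def tally(freqs: Iterable[float]) -> Counter:
    """Count variants per frequency class."""
    return Counter(frequency_class(f) for f in freqs)


if __name__ == "__main__":
    # Hypothetical allele frequencies for four sequenced variants.
    print(tally([0.12, 0.006, 0.004, 0.0002]))
```

Tallying the same bins in patients and in a reference population such as gnomAD is the starting point for the kind of accumulation comparison the abstract reports.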


2019, Vol 19 (1)
Author(s): Yousef Rahimi, Mohammad Reza Bihamta, Alireza Taleei, Hadi Alipour, Pär K. Ingvarsson

Background: Identification of loci for agronomic traits and characterization of their genetic architecture are crucial in marker-assisted selection (MAS). Genome-wide association studies (GWASs) have increasingly been used as potent tools for identifying marker-trait associations (MTAs). Introducing new adaptive alleles into diverse genetic backgrounds may help to improve the grain yield of old or newly developed wheat varieties and so help balance supply and demand throughout the world. Landraces collected from different climate zones can be an invaluable resource for such adaptive alleles.
Results: GWAS was performed on a collection of 298 Iranian bread wheat varieties and landraces to explore the genetic basis of agronomic traits during the 2016–2018 cropping seasons under normal (well-watered) and stressed (rain-fed) conditions. A high-quality genotyping-by-sequencing (GBS) dataset was obtained using either the original single nucleotide polymorphisms (SNPs; 10,938 SNPs) or additional imputation (46,862 SNPs) based on the W7984 reference genome. The results confirm that the B genome carries the highest number of significant marker pairs in both varieties (49,880, 27.37%) and landraces (55,086, 28.99%). The strongest linkage disequilibrium (LD) between pairs of markers was observed on chromosome 2D (0.296). LD decay was lower in the D genome than in the A and B genomes. Association mapping under the two tested environments yielded a total of 313 and 394 significant (−log10 P > 3) MTAs for the original and imputed SNP data sets, respectively. Gene ontology results showed that 27% and 27.5% of MTAs in the original SNP set were located in protein-coding regions for the well-watered and rain-fed conditions, respectively, whereas for the imputed data set 22.6% and 16.6% of MTAs were located in protein-coding genes for the well-watered and rain-fed conditions, respectively.
Conclusions: Our findings suggest that Iranian bread wheat landraces harbor valuable alleles that are adaptive under drought stress conditions. MTAs located within coding genes can be used in genome-based breeding of new wheat varieties. Although imputation of missing data increased the number of MTAs, the fraction of MTAs located in coding genes decreased across the different sub-genomes.
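The abstract does not state which LD estimator was used. A common genotype-based measure is the squared Pearson correlation r² between two markers' genotype codes, sketched below in Python together with the −log10 P > 3 filter the abstract uses to call significant MTAs (equivalent to P < 0.001). The 0/1/2 genotype coding, the example genotypes and the function names are assumptions for illustration.

```python
from math import log10
from typing import List, Sequence


def ld_r2(geno_a: Sequence[float], geno_b: Sequence[float]) -> float:
    """Squared Pearson correlation between two markers' genotype codes."""
    n = len(geno_a)
    if n != len(geno_b) or n < 2:
        raise ValueError("need two equal-length genotype vectors (n >= 2)")
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic marker: r^2 is undefined")
    return cov * cov / (var_a * var_b)


def significant_mtas(p_values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Indices of marker-trait associations with -log10(P) above the threshold."""
    return [i for i, p in enumerate(p_values) if -log10(p) > threshold]


if __name__ == "__main__":
    # Homozygous (inbred) lines coded 0 / 2; values are hypothetical.
    marker_1 = [0, 0, 2, 2, 2, 0]
    marker_2 = [0, 0, 2, 2, 0, 0]
    print(round(ld_r2(marker_1, marker_2), 3))
    print(significant_mtas([5e-4, 2e-3, 9e-4, 1e-6]))  # -> [0, 2, 3]
```

Averaging such pairwise r² values by chromosome, or plotting them against physical distance, is one way to summarize LD strength and decay across sub-genomes.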


2017
Author(s): Rosa B. Thorolfsdottir, Gardar Sveinbjornsson, Patrick Sulem, Stefan Jonsson, Gisli H. Halldorsson, ...

We performed a meta-analysis of genome-wide association studies on atrial fibrillation (AF) among 14,710 cases and 373,897 controls from Iceland and 14,792 cases and 393,863 controls from the UK Biobank, focusing on low-frequency coding and splice mutations, with follow-up in samples from Norway and the US. We observed associations with two missense mutations (OR = 1.19 for both) and one splice-donor mutation (OR = 1.52) in RPL3L, which encodes a ribosomal protein primarily expressed in skeletal muscle and heart. Analysis of 167 RNA samples from the right atrium revealed that the splice-donor mutation in RPL3L results in exon skipping. AF is the first disease associated with RPL3L, and RPL3L is the first ribosomal gene implicated in AF. This finding is consistent with tissue specialization of ribosomal function. We also found an association with a missense variant in MYZAP (OR = 1.37), which encodes a component of the intercalated discs of cardiomyocytes, the organelle harbouring most of the mutated proteins involved in arrhythmogenic right ventricular cardiomyopathy. Both discoveries emphasize the close relationship between the mechanical and electrical function of the heart.
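The effect sizes above are reported as odds ratios (ORs). The study's estimates come from its own association models and meta-analysis across Iceland and the UK Biobank, which are not reproduced here. Purely to show what an OR measures, the Python sketch below computes an allelic OR from a hypothetical 2×2 table of allele counts, with a Woolf (log-scale) confidence interval; the counts and function name are illustrative assumptions.

```python
from math import exp, log, sqrt
from typing import Tuple


def allelic_odds_ratio(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Allelic odds ratio from a 2x2 table of allele counts, with a Woolf
    (log-scale) confidence interval; z = 1.96 gives an approximate 95% CI."""
    if min(case_alt, case_ref, ctrl_alt, ctrl_ref) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log = sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: alternate vs reference alleles in cases and controls.
    print(allelic_odds_ratio(case_alt=120, case_ref=9880, ctrl_alt=100, ctrl_ref=9900))
```

An OR of 1.19 therefore means the odds of carrying the alternate allele are about 19% higher in cases than in controls; for low-frequency variants the 1/count terms dominate the standard error, which is why very large samples are needed to detect such effects.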

