Genotype imputation and reference panel: a systematic evaluation on haplotype size and diversity

2019 ◽  
Vol 21 (5) ◽  
pp. 1806-1817 ◽  
Author(s):  
Wei-Yang Bai ◽  
Xiao-Wei Zhu ◽  
Pei-Kuan Cong ◽  
Xue-Jun Zhang ◽  
J Brent Richards ◽  
...  

Abstract Here, 622 imputations were conducted with 394 customized reference panels for Han Chinese and European populations. Besides confirming that imputation accuracy could always benefit from increased panel size when the reference panel was population specific, the results yielded two new findings. First, when the haplotype size of the reference panel was fixed, the imputation accuracy of common and low-frequency variants (minor allele frequency (MAF) > 0.5%) decreased as the population diversity of the reference panel increased, but for rare variants (MAF < 0.5%), a small fraction of diversity in the panel could improve imputation accuracy. Second, when the haplotype size of the reference panel was increased with extra population-diverse samples, the imputation accuracy of common variants (MAF > 5%) for the European population could always benefit from the expanding sample size. However, for the Han Chinese population, the accuracy of all imputed variants was highest when the reference panel contained a fraction of extra diverse samples (8–21%). In addition, we evaluated imputation performance with existing reference panels, such as the Haplotype Reference Consortium (HRC), 1000 Genomes Project Phase 3 and the China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE). For the European population, the HRC panel showed the best performance in our analysis. For the Han Chinese population, we proposed an optimum reference panel constituent ratio for researchers who would like to customize their own sequenced reference panel, but a high-quality, large-scale Chinese reference panel is still needed. Our findings could be generalized to other populations with conservative genomes; a tool is provided to investigate other populations of interest (https://github.com/Abyss-bai/reference-panel-reconstruction).
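The abstract above reports accuracy separately for rare, low-frequency and common variants. As a minimal sketch of that stratification, the Python below assigns a variant to a frequency class from its alternate-allele frequency. The function names are hypothetical, the cutoffs (0.5% and 5%) are the ones quoted in the abstract, and the handling of values exactly at a cutoff is an assumption, since the abstract does not say which class they belong to.

```python
def minor_allele_frequency(alt_allele_frequency: float) -> float:
    """Return the MAF: the frequency of the less common of two alleles."""
    if not 0.0 <= alt_allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return min(alt_allele_frequency, 1.0 - alt_allele_frequency)


def maf_bin(maf: float) -> str:
    """Classify a variant using the cutoffs quoted in the abstract.

    Assumption: values exactly equal to 0.5% or 5% fall in the
    low-frequency class; the abstract leaves the boundaries open.
    """
    if maf < 0.005:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


if __name__ == "__main__":
    # Illustrative frequencies only, not data from the study.
    for freq in (0.998, 0.02, 0.7):
        maf = minor_allele_frequency(freq)
        print(f"alt freq {freq}: MAF {maf:.4f} -> {maf_bin(maf)}")
```

Accuracy metrics would then be summarised per class rather than over all imputed variants at once.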

2019 ◽  
Author(s):  
Wei-Yang Bai ◽  
Xiao-Wei Zhu ◽  
Pei-Kuan Cong ◽  
Xue-Jun Zhang ◽  
J Brent Richards ◽  
...  

Abstract Here, 622 imputations were conducted with 394 customized reference panels for Han Chinese and European populations. Besides confirming that imputation accuracy could always benefit from increased panel size when the reference panel was population-specific, the results yielded two new findings. First, when the haplotype size of the reference panel was fixed, the imputation accuracy of common and low-frequency variants (MAF > 0.5%) decreased as the population diversity of the reference panel increased, but for rare variants (MAF < 0.5%), a fraction of diversity (<20%) in the panel could improve imputation accuracy. Second, when the haplotype size of the reference panel was increased with extra population-diverse samples, the imputation accuracy of common variants (MAF > 5%) for the European population could always benefit from the expanding sample size. For the Han Chinese population, however, the accuracy of all imputed variants was highest when the reference panel contained a fraction of extra diverse samples (15–21%). In addition, we evaluated existing reference panels such as the HRC, 1000G Phase 3 and CONVERGE. For the European population, HRC was the best reference panel. For the Han Chinese population, we proposed an optimum constituent ratio for researchers who would like to customize their own sequenced reference panel, but a high-quality, large-scale Chinese reference panel is still needed. Our findings could be generalized to other populations with conservative genomes; a tool is provided to investigate other populations of interest (https://github.com/Abyss-bai/reference-panel-reconstruction).

Highlights (Key points)

A total of 394 reference panels were designed and customized by three strategies, and large-scale genotype imputations were performed with these panels for systematic evaluation in Han Chinese and European populations.

For the Han Chinese population, when the haplotype size of the reference panel was increased with extra samples (the most common case), the accuracy of imputed variants was highest when the reference panel contained a fraction of extra diverse samples (15–21%).

Imputation accuracy showed different trends in the Han Chinese and European populations. In a sense, the European genome may itself be more diverse than the Han Chinese genome.

Existing reference panels were not the best choice for Chinese imputation; a high-quality, large-scale Chinese reference panel is still needed.


Author(s):  
Hong-miao Tao ◽  
Bei Shao ◽  
Guo-zhong Chen

Background: The angiotensin-1 converting enzyme (ACE) gene is known to have two polymorphic alleles, insertion and deletion (I/D). People with the DD genotype have been shown to be at greater risk of cerebral infarction, but only in some studies. Identification of cerebral infarction susceptibility genes and quantification of the associated risks have been hampered by conflicting results from underpowered case-control studies. This meta-analysis was conducted to examine specifically the genetics of cerebral infarction in the Han Chinese population. Methods: Genetic association studies published from January 1, 1990 to December 30, 2007 were collected from the MEDLINE, EMBASE, CBM and CNKI databases. Data were extracted using standardised forms, and pooled odds ratios (ORs) with 95% confidence intervals (CIs) were calculated. Results: Twenty-nine original case-control studies of the Han Chinese population, comprising 3654 patients with cerebral infarction and 3058 controls, were included in the meta-analysis. Using the random effects model, the pooled OR for the ACE DD genotype versus ID + II was 1.91 (95% CI 1.56 to 2.34, P < 0.00001). Conclusions: These data suggest that the ACE DD genotype may be a risk factor for cerebral infarction in the Han Chinese population. A large-scale case-control study is needed to clarify the functional effect of the ACE I/D polymorphism in the pathogenesis of cerebral infarction in the Han Chinese population.
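The pooled odds ratio reported above comes from a random effects model. The abstract does not say which estimator was used, so the sketch below assumes the common DerSimonian-Laird method; the function names and the 2×2 counts in the usage example are hypothetical and are not data from the 29 included studies.

```python
import math


def log_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log OR and its variance from one 2x2 table.

    a: cases with DD, b: cases with ID or II,
    c: controls with DD, d: controls with ID or II.
    """
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def pool_random_effects(tables):
    """DerSimonian-Laird pooled OR with a 95% CI (assumed estimator)."""
    effects = [log_odds_ratio(*t) for t in tables]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se),
            tau2)


if __name__ == "__main__":
    # Hypothetical counts for three studies, for illustration only.
    example = [(30, 70, 18, 82), (45, 105, 25, 125), (22, 58, 20, 80)]
    pooled_or, low, high, tau2 = pool_random_effects(example)
    print(f"pooled OR {pooled_or:.2f} (95% CI {low:.2f} to {high:.2f}), tau^2 {tau2:.3f}")
```

When the studies agree closely, tau² shrinks to zero and the result coincides with the fixed-effect estimate; larger heterogeneity widens the confidence interval.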


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Hongtao Tang ◽  
Zhenzhen Cheng ◽  
Wenlong Ma ◽  
Youwen Liu ◽  
Zhaofang Tong ◽  
...  

2018 ◽  
Vol 137 (6-7) ◽  
pp. 431-436 ◽  
Author(s):  
Yuan Lin ◽  
Lu Liu ◽  
Sen Yang ◽  
Yun Li ◽  
Dongxin Lin ◽  
...  

2015 ◽  
Vol 207 (6) ◽  
pp. 490-494 ◽  
Author(s):  
Zhiqiang Li ◽  
Yuqian Xiang ◽  
Jianhua Chen ◽  
Qiaoli Li ◽  
Jiawei Shen ◽  
...  

Background A large schizophrenia genome-wide association study (GWAS) and a subsequent extensive replication study of individuals of European ancestry identified eight new loci with genome-wide significance and suggested that the MIR137-mediated pathway plays a role in the predisposition for schizophrenia. Aims To validate the above findings in a Han Chinese population. Method We analysed the single nucleotide polymorphisms (SNPs) in the newly identified schizophrenia candidate loci and predicted MIR137 target genes based on our published Han Chinese populations (BIOX) GWAS data. We then analysed 18 SNPs from the candidate regions in an independent cohort that consisted of 3585 patients with schizophrenia and 5496 controls of Han Chinese ancestry. Results We replicated the associations of five markers (P < 0.05), including three that were located in the predicted MIR137 target genes. Two loci (ITIH3/4: rs2239547, P = 1.17 × 10⁻¹⁰ and CALN1: rs2944829, P = 9.97 × 10⁻⁹) exhibited genome-wide significance in the Han Chinese population. Conclusions The ITIH3/4 locus has been reported to be of genome-wide significance in the European population. The successful replication of this finding in a different ethnic group provides stronger evidence for the association between schizophrenia and ITIH3/4. We detected the first genome-wide significant association of schizophrenia with CALN1, which is a predicted target of MIR137, and thus provide new evidence for the associations between MIR137 targets and schizophrenia.


2019 ◽  
Vol 48 (D1) ◽  
pp. D971-D976 ◽  
Author(s):  
Yang Gao ◽  
Chao Zhang ◽  
Liyun Yuan ◽  
YunChao Ling ◽  
Xiaoji Wang ◽  
...  

Abstract As the largest ethnic group in the world, the Han Chinese population is nonetheless underrepresented in global efforts to catalogue the genomic variability of natural populations. Here, we developed the PGG.Han, a population genome database to serve as the central repository for the genomic data of the Han Chinese Genome Initiative (Phase I). In its current version, the PGG.Han archives whole-genome sequences or high-density genome-wide single-nucleotide variants (SNVs) of 114 783 Han Chinese individuals (a.k.a. the Han100K), representing geographical sub-populations covering 33 of the 34 administrative divisions of China, as well as Singapore. The PGG.Han provides: (i) an interactive interface for visualization of the fine-scale genetic structure of the Han Chinese population; (ii) genome-wide allele frequencies of hierarchical sub-populations; (iii) ancestry inference for individual samples and controlling population stratification based on nested ancestry informative markers (AIMs) panels; (iv) population-structure-aware shared control data for genotype-phenotype association studies (e.g. GWASs) and (v) a Han-Chinese-specific reference panel for genotype imputation. Computational tools are implemented into the PGG.Han, and an online user-friendly interface is provided for data analysis and results visualization. The PGG.Han database is freely accessible via http://www.pgghan.org or https://www.hanchinesegenomes.org.


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Juan Xia ◽  
Chunyue Guo ◽  
Kuo Liu ◽  
Yunyi Xie ◽  
Han Cao ◽  
...  

Abstract Background There is a well-documented empirical relationship between lipoprotein (a) [Lp(a)] and cardiovascular disease (CVD); however, causal evidence, especially from the Chinese population, is lacking. Therefore, this study aims to estimate the causal association between variants in genes affecting Lp(a) concentrations and CVD in people of Han Chinese ethnicity. Methods Two-sample Mendelian randomization analysis was used to assess the causal effect of Lp(a) concentrations on the risk of CVD. Summary statistics for Lp(a) variants were obtained from 1256 individuals in the Cohort Study on Chronic Disease of Communities Natural Population in Beijing, Tianjin and Hebei. Data on associations between single-nucleotide polymorphisms (SNPs) and CVD were obtained from recently published genome-wide association studies. Results Thirteen SNPs associated with Lp(a) levels in the Han Chinese population were used as instrumental variables. Genetically elevated Lp(a) was inversely associated with the risk of atrial fibrillation [odds ratio (OR), 0.94; 95% confidence interval (95% CI), 0.901–0.987; P = 0.012], the risk of arrhythmia (OR, 0.96; 95% CI, 0.941–0.990; P = 0.005), the left ventricular mass index (OR, 0.97; 95% CI, 0.949–1.000; P = 0.048), and the left ventricular internal dimension in diastole (OR, 0.97; 95% CI, 0.950–0.997; P = 0.028) according to the inverse-variance weighted method. No significant association was observed for congestive heart failure (OR, 0.99; 95% CI, 0.950–1.038; P = 0.766), ischemic stroke (OR, 1.01; 95% CI, 0.981–1.046; P = 0.422), and left ventricular internal dimension in systole (OR, 0.98; 95% CI, 0.960–1.009; P = 0.214).
Conclusions This study provided evidence that genetically elevated Lp(a) was inversely associated with atrial fibrillation, arrhythmia, the left ventricular mass index and the left ventricular internal dimension in diastole, but not with congestive heart failure, ischemic stroke, and the left ventricular internal dimension in systole in the Han Chinese population. Further research is needed to identify the mechanism underlying these results and determine whether genetically elevated Lp(a) increases the risk of coronary heart disease or other CVD subtypes.
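The estimates above were obtained with the inverse-variance weighted (IVW) method. A minimal sketch of the standard fixed-effect IVW estimator for two-sample Mendelian randomization follows; it may differ in detail from the software the authors used, the function name is hypothetical, and the per-SNP values in the usage example are invented for illustration and are not the study's thirteen instruments.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exposure: SNP effects on the exposure (here, Lp(a) level)
    beta_outcome:  SNP effects on the outcome (log odds for a binary trait)
    se_outcome:    standard errors of the SNP-outcome effects
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have one entry per SNP")

    # Equivalent to a weighted regression through the origin of
    # beta_outcome on beta_exposure with weights 1 / se_outcome^2.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    beta = numerator / denominator
    standard_error = math.sqrt(1 / denominator)
    return beta, standard_error


if __name__ == "__main__":
    # Invented summary statistics for three SNPs, for illustration only.
    bx = [0.10, 0.20, 0.30]
    by = [-0.004, -0.011, -0.016]
    se = [0.010, 0.010, 0.020]
    beta, se_beta = ivw_estimate(bx, by, se)
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - 1.96 * se_beta), math.exp(beta + 1.96 * se_beta))
    print(f"OR per unit of exposure: {odds_ratio:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
```

Exponentiating the estimate gives an odds ratio per unit increase in the genetically predicted exposure, which is the scale on which the abstract reports its results.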


2020 ◽  
Vol 23 (8) ◽  
pp. 1050-1056
Author(s):  
Tianyun Zhao ◽  
Chi Ma ◽  
Wei Wang ◽  
Bin Zhao ◽  
Baopin Xie ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yanmei Ruan ◽  
Jinwei Zhang ◽  
Shiqi Mai ◽  
Wenfeng Zeng ◽  
Lili Huang ◽  
...  

Abstract Genetic factors and gene-environment interaction may play an important role in the development of noise-induced hearing loss (NIHL). A case–control study was conducted with 191 cases and 191 controls. Cases were noise-exposed workers with binaural high-frequency hearing thresholds greater than 25 dB(A). Workers with hearing thresholds ≤ 25 dB(A) in any binaural frequency band were selected as controls, matched on factors such as age, noise exposure time, and operating position. Blood samples from both groups underwent DNA extraction and SNP sequencing of the CASP3 and CASP7 genes using the polymerase chain reaction ligase detection reaction method. Conditional logistic regression was used to analyze the genetic variation associated with susceptibility to NIHL. Two CASP7 SNPs, rs2227310 and rs4353229, were associated with the risk of NIHL. Compared with the GG genotype, the CC genotype of rs2227310 reduced the risk of NIHL. Compared with the CC genotype, the TT genotype of rs4353229 reduced the risk of NIHL. Workers carrying the rs2227310 GG and rs4353229 CC genotypes had an increased risk of NIHL compared with workers without any high-risk genotype. There were additive and multiplicative interactions between CASP7 rs2227310 and CNE, and the same interactions between CASP7 rs4353229 and CNE. The interaction between the CASP7 gene and CNE significantly increased the risk of NIHL. The CASP7 rs2227310 GG and CASP7 rs4353229 CC genotypes were associated with an increased risk of NIHL in the Han Chinese population and have the potential to act as biomarkers for noise-exposed workers.

