Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is a process of estimating missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and thus can be widely used in large-scale genome-wide association studies (GWASs) using relatively inexpensive and low-density SNP arrays. However, most animals except humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for the purpose of genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.
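The idea of estimating missing genotypes from a reference panel can be illustrated with a minimal sketch. It is not the database's pipeline, nor either of the two imputation tools it offers: the function name `impute_from_panel`, the toy panel, and the closest-haplotype copying rule are all assumptions made for illustration.

```python
from typing import List, Optional

Haplotype = List[Optional[int]]  # 0/1 alleles; None marks a missing genotype


def impute_from_panel(target: Haplotype, panel: List[List[int]]) -> List[int]:
    """Fill missing alleles by copying from the closest reference haplotype.

    Closeness is the number of mismatches at sites typed in the target.
    Ties are broken by panel order.
    """
    if not panel:
        raise ValueError("reference panel is empty")

    def mismatches(ref: List[int]) -> int:
        return sum(1 for t, r in zip(target, ref) if t is not None and t != r)

    best = min(panel, key=mismatches)
    # Keep typed alleles as observed; copy the reference allele where missing.
    return [r if t is None else t for t, r in zip(target, best)]


# Hypothetical reference panel: three haplotypes over six SNP sites.
panel = [
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
]
# Hypothetical low-density array result: sites 1 and 4 were not typed.
target = [1, None, 0, 1, None, 0]

imputed = impute_from_panel(target, panel)
print(imputed)  # [1, 0, 0, 1, 1, 0]
```

Production tools model recombination and uncertainty rather than copying a single haplotype, so this only conveys the basic intuition.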


Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations

10.1101/2021.02.03.429542 ◽

2021 ◽

Author(s):

Zhi Ming Xu ◽

Sina Rüeger ◽

Michaela Zwyer ◽

Daniela Brites ◽

Hellen Hiza ◽

...

Keyword(s):

Association Studies ◽

Imputation Accuracy ◽

Genotype Imputation ◽

Small Subset ◽

Study Cohort ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Selection Of

Abstract Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genome of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on SNPs, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on SNPs to the base H3Africa array.


Exploration of CYP21A2 and CYP17A1 polymorphisms and preeclampsia risk among Chinese Han population: a large-scale case-control study based on 5021 subjects

Human Genomics ◽

10.1186/s40246-020-00286-0 ◽

2020 ◽

Vol 14 (1) ◽

Author(s):

Bo Hou ◽

Xuewen Jia ◽

Ziwen Deng ◽

Xin Liu ◽

Huitang Liu ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Chinese Han ◽

Single Nucleotide ◽

Allelic Frequencies ◽

Genome Wide ◽

And Control ◽

The Relationship

Abstract Background Several genome-wide association studies have identified single-nucleotide polymorphisms (SNPs), such as rs4409766, rs1004467, and rs3824755 in CYP17A1 and rs2021783 in CYP21A2, as new hypertension susceptibility genetic variants in the Chinese population. This study aimed to investigate the relationship between preeclampsia (PE) and these SNPs in Chinese Han women. Methods Overall, 5021 unrelated pregnant women were recruited, including 2002 patients with PE and 3019 normal healthy controls. The real-time PCR (TaqMan) method was applied to genotype these four polymorphisms. Results A statistically significant difference in the allelic frequencies was observed in CYP21A2 rs2021783 between cases and controls (χ2 = 7.201, Pc = 0.028 by allele), and the T allele was associated with the occurrence and development of PE (OR = 1.151, 95% CI 1.039–1.275). We also found a significant association between rs2021783 and the development of early-onset PE (Pc = 0.008 by genotype, Pc = 0.004 by allele). For rs1004467 and rs3824755, the distribution of allelic frequencies differed markedly between mild PE and control groups (χ2 = 6.843, Pc = 0.036; χ2 = 6.869, Pc = 0.036), and patients with the TT genotype of rs1004467 were less likely to develop mild PE than were those carrying the CT or CC genotype (χ2 = 7.002, Pc = 0.032, OR = 1.306, 95% CI 1.071–1.593). The GG genotype of rs3824755 appeared to have a protective effect on the occurrence of mild PE (OR = 0.766, 95% CI 0.629–0.934). Conclusions CYP21A2 rs2021783 appears to be closely related to PE susceptibility, and CYP17A1 rs1004467 and rs3824755 seem to be closely associated with mild PE in Han women.
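Odds ratios with 95% confidence intervals, as reported above, are commonly derived from a 2x2 table of allele counts. The sketch below shows one standard way to do this (the Woolf, or Wald log-odds, interval). The abstract does not say which interval method the authors used, and the counts in the usage lines are hypothetical rather than the study's data.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: risk-allele count in cases,    b: other-allele count in cases
    c: risk-allele count in controls, d: other-allele count in controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
or_, lower, upper = odds_ratio_ci(30, 70, 20, 80)
print(f"OR = {or_:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

An interval that excludes 1 corresponds to an association that is significant at roughly the 5% level before any multiple-testing correction.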


Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009628 ◽

2022 ◽

Vol 18 (1) ◽

pp. e1009628

Author(s):

Zhi Ming Xu ◽

Sina Rüeger ◽

Michaela Zwyer ◽

Daniela Brites ◽

Hellen Hiza ◽

...

Keyword(s):

Association Studies ◽

Imputation Accuracy ◽

Genotype Imputation ◽

Small Subset ◽

Study Cohort ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Selection Of

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.


Plant-ImputeDB: an integrated multiple plant reference panel database for genotype imputation

Nucleic Acids Research ◽

10.1093/nar/gkaa953 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1480-D1488

Author(s):

Yingjie Gao ◽

Zhiquan Yang ◽

Wenqian Yang ◽

Yanbo Yang ◽

Jing Gong ◽

...

Keyword(s):

Plant Species ◽

Genetic Research ◽

Genotype Imputation ◽

Reference Panel ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Single Nucleotide ◽

Online Tool ◽

Whole Genome Resequencing ◽

Missing Genotypes

Abstract Genotype imputation is a process that estimates missing genotypes in terms of the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs), boost the power to identify genetic association and promote the combination of genetic studies. However, there has been a lack of high-quality reference panels for most plants, which greatly hinders the application of genotype imputation. Here, we developed Plant-ImputeDB (http://gong_lab.hzau.edu.cn/Plant_imputeDB/), a comprehensive database with reference panels of 12 plant species for online genotype imputation, SNP and block search and free download. By integrating genotype data and whole-genome resequencing data of plants from various studies and databases, the current Plant-ImputeDB provides high-quality reference panels of 12 plant species, including ∼69.9 million SNPs from 34 244 samples. It also provides an easy-to-use online tool with the option of two popular tools specifically designed for genotype imputation. In addition, Plant-ImputeDB accepts submissions of different types of genomic variations, and provides free and open access to all publicly available data in support of related research worldwide. In general, Plant-ImputeDB may serve as an important resource for plant genotype imputation and greatly facilitate plant genetic research.


Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2016.0569 ◽

2016 ◽

Vol 283 (1835) ◽

pp. 20160569 ◽

Cited By ~ 52

Author(s):

M. E. Goddard ◽

K. E. Kemper ◽

I. M. MacLeod ◽

A. J. Chamberlain ◽

B. J. Hayes

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Quantitative Traits ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Crop Breeding ◽

Single Nucleotide ◽

Genome Wide ◽

Phenotype Identification

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.


Impact of Pre and Post Variant Filtration Strategies on Imputation

10.21203/rs.3.rs-128366/v1 ◽

2020 ◽

Author(s):

Celine Charon ◽

Rodrigue Allodji ◽

Vincent Meyer ◽

Jean-François Deleuze

Keyword(s):

Quality Control ◽

Rare Variants ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Direct Effects ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Genome Wide ◽

Conservative Post

Abstract Quality control methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1,031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1,089 NCBI recorded individuals for additional validation. Without variant pre-filtration based on quality control (QC), we observed no impairment in the imputation of SNPs that failed QC whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04-1E-03) and rare variants (1E-03-5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) <0.001 2.5 fold with or without QC pre-filtration and halved the number of very rare variants (5E-04). As a result, to maintain confidence and enough SNVs, we propose here a 2-step post-filtration approach to increase the number of very rare and rare variants compared to conservative post-filtration methods.
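The effect of raising a post-filtration quality threshold can be sketched as a simple filter. The example below only compares a lenient cut-off (0.3) with a conservative one (0.8), the two values named in the abstract. It does not implement the authors' 2-step approach, whose details are not given here, and the variant records and the helper `retained` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical variants: (identifier, allele frequency, imputation quality score).
Variant = Tuple[str, float, float]

variants: List[Variant] = [
    ("v1", 0.0006, 0.35),
    ("v2", 0.0020, 0.85),
    ("v3", 0.0400, 0.95),
    ("v4", 0.0008, 0.82),
    ("v5", 0.0030, 0.50),
]


def retained(records: List[Variant], min_quality: float) -> List[Variant]:
    """Keep variants whose imputation quality score meets the threshold."""
    return [v for v in records if v[2] >= min_quality]


lenient = retained(variants, 0.3)
conservative = retained(variants, 0.8)
print(len(lenient), len(conservative))  # 5 3
```

In this toy data the stricter threshold drops two of the low-frequency variants, which mirrors the trade-off between confidence and the number of rare variants kept.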


Genome-Wide Association Studies for the Concentration of Albumin in Colostrum and Serum in Chinese Holstein

Animals ◽

10.3390/ani10122211 ◽

2020 ◽

Vol 10 (12) ◽

pp. 2211

Author(s):

Shan Lin ◽

Zihui Wan ◽

Junnan Zhang ◽

Lingna Xu ◽

Bo Han ◽

...

Keyword(s):

Association Studies ◽

Significant Snps ◽

Albumin Concentration ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Chinese Holstein ◽

Genome Wide ◽

Chinese Holstein Cows ◽

Newborn Calves

Albumin can be of particular benefit in fighting infections for newborn calves due to its anti-inflammatory and anti-oxidative stress properties. To identify the candidate genes related to the concentration of albumin in colostrum and serum, we collected the colostrum and blood samples from 572 Chinese Holstein cows within 24 h after calving and measured the concentration of albumin in the colostrum and serum using the ELISA methods. The cows were genotyped with GeneSeek 150 K chips (containing 140,668 single nucleotide polymorphisms; SNPs). After quality control, we performed GWASs via GCTA software with 91,620 SNPs and 563 cows. Consequently, 9 and 7 genome-wide significant SNPs (false discovery rate (FDR) at 1%) were identified. Correspondingly, 42 and 206 functional genes that contained or were close to (±1 Mbp) the significant SNPs were acquired. Integrating the biological process of these genes and the reported QTLs for immune and inflammation traits in cattle, 3 and 12 genes were identified as candidates for the concentration of colostrum and serum albumin, respectively; these are RUNX1, CBR1, OTULIN, CDK6, SHARPIN, CYC1, EXOSC4, PARP10, NRBP2, GFUS, PYCR3, EEF1D, GSDMD, PYCR2 and CXCL12. Our findings provide important information for revealing the genetic mechanism behind albumin concentration and for molecular breeding of disease-resistance traits in dairy cattle.
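Controlling the false discovery rate at 1% across many SNP tests is often done with the Benjamini-Hochberg step-up procedure. The abstract does not name the FDR method the authors applied, so the sketch below is an assumption about one common choice, with hypothetical p-values and a hypothetical function name.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], fdr: float = 0.01) -> List[bool]:
    """Return one flag per p-value: True if the test is rejected at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= k / m * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every test ranked at or below the cutoff.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical per-SNP p-values.
pvals = [0.0001, 0.004, 0.03, 0.0005, 0.2]
flags = benjamini_hochberg(pvals, fdr=0.01)
print(flags)  # [True, True, False, True, False]
```

With tens of thousands of SNPs the same logic applies unchanged; only the length of the p-value list grows.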


Secure large-scale genome-wide association studies using homomorphic encryption

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1918257117 ◽

2020 ◽

Vol 117 (21) ◽

pp. 11608-11613 ◽

Cited By ~ 1

Author(s):

Marcelo Blatt ◽

Alexander Gusev ◽

Yuriy Polyakov ◽

Shafi Goldwasser

Keyword(s):

Large Scale ◽

Homomorphic Encryption ◽

Association Studies ◽

Genome Wide Association ◽

Single Server ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

User Interactions ◽

Individual Level ◽

Genome Wide

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
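The two runtime figures quoted above are consistent with near-linear scaling across nodes, which a short calculation makes visible. This is only an arithmetic check under an assumed ideal speed-up, not a model of the authors' system. The variable names are illustrative.

```python
# Figures taken from the abstract.
single_node_hours = 5.6
nodes = 31

# Assumption: work divides evenly across nodes with no communication overhead.
ideal_minutes = single_node_hours * 60 / nodes
print(round(ideal_minutes, 1))  # 10.8
```

The ideal figure of about 10.8 minutes sits just under the reported 11 minutes.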
