Measuring epigenetics as the mediator of gene/environment interactions in DOHaD

Analysis of DNA methylation data in epigenome-wide association studies presents many bioinformatic and statistical challenges. Not least of these is the non-independence of individual DNA methylation marks from each other, from genotype and from technical sources of variation. In this review we discuss DNA methylation data from the Infinium450K array and processing methodologies to reduce technical variation. We describe recent approaches that harness the concordance of neighbouring DNA methylation values to improve power in association studies. We also describe how the non-independence of genotype and DNA methylation has been used to infer causality (in the case of Mendelian randomization approaches); to suggest a mediating effect of DNA methylation in linking intergenic single nucleotide polymorphisms, identified in genome-wide association studies, to phenotype; and to uncover the widespread influence of gene and environment interactions on methylation levels.

Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2016.0569 ◽

2016 ◽

Vol 283 (1835) ◽

pp. 20160569 ◽

Cited By ~ 52

Author(s):

M. E. Goddard ◽

K. E. Kemper ◽

I. M. MacLeod ◽

A. J. Chamberlain ◽

B. J. Hayes

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Quantitative Traits ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Crop Breeding ◽

Single Nucleotide ◽

Genome Wide ◽

Phenotype Identification

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
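The idea of fitting all SNP effects simultaneously under a prior can be illustrated with a small sketch. The abstract does not say which prior is used; the code below assumes a Gaussian prior, which reduces to ridge regression, and solves it with Gauss-Seidel sweeps. The function names, the `shrinkage` parameter and the toy data are illustrative assumptions, not the authors' method.

```python
def fit_snp_effects(genotypes, phenotypes, shrinkage=1.0, sweeps=200):
    """Jointly estimate all SNP effects by ridge regression.

    genotypes: list of rows, one per individual, holding allele counts (0/1/2).
    shrinkage: ridge penalty; plays the role of an assumed Gaussian prior.
    Returns (effects, intercept, column_means).
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre each SNP column so the intercept is simply the phenotype mean.
    col_means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    cols = [[row[j] - col_means[j] for row in genotypes] for j in range(m)]
    intercept = sum(phenotypes) / n
    resid = [y - intercept for y in phenotypes]
    effects = [0.0] * m
    for _ in range(sweeps):
        for j in range(m):
            col = cols[j]
            xx = sum(v * v for v in col)
            # Correlation of SNP j with the residual, adding back its own fit.
            xr = sum(c * r for c, r in zip(col, resid)) + xx * effects[j]
            new_effect = xr / (xx + shrinkage)
            delta = new_effect - effects[j]
            # Update the residual so later SNPs see the current joint fit.
            resid = [r - c * delta for r, c in zip(resid, col)]
            effects[j] = new_effect
    return effects, intercept, col_means


def predict_phenotype(genotype_row, effects, intercept, col_means):
    """Predict a phenotype from allele counts and the fitted effects."""
    return intercept + sum(
        (g - mu) * b for g, mu, b in zip(genotype_row, col_means, effects)
    )


# Toy data: phenotype = 1 + 2 * SNP1 - 1 * SNP2, with no noise.
toy_genotypes = [[0, 0], [1, 0], [2, 0], [0, 1], [1, 2], [2, 1]]
toy_phenotypes = [1 + 2 * g1 - g2 for g1, g2 in toy_genotypes]
toy_effects, toy_intercept, toy_means = fit_snp_effects(
    toy_genotypes, toy_phenotypes, shrinkage=1e-9
)
```

A larger `shrinkage` value pulls every estimated effect towards zero, which is what keeps the fit stable when there are far more SNPs than individuals.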

Impact of Pre and Post Variant Filtration Strategies on Imputation

10.21203/rs.3.rs-128366/v1 ◽

2020 ◽

Author(s):

Celine Charon ◽

Rodrigue Allodji ◽

Vincent Meyer ◽

Jean-François Deleuze

Keyword(s):

Quality Control ◽

Rare Variants ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Direct Effects ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Genome Wide ◽

Conservative Post

Quality control methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1,031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1,089 NCBI recorded individuals for additional validation. Without variant pre-filtration based on quality control (QC), we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04-1E-03) and rare variants (1E-03-5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) <0.001 by 2.5-fold with or without QC pre-filtration and halved the number of very rare variants (5E-04). As a result, to maintain confidence and enough SNVs, we propose here a 2-step post-filtration approach to increase the number of very rare and rare variants compared to conservative post-filtration methods.

Genome-Wide Association Studies for the Concentration of Albumin in Colostrum and Serum in Chinese Holstein

Animals ◽

10.3390/ani10122211 ◽

2020 ◽

Vol 10 (12) ◽

pp. 2211

Author(s):

Shan Lin ◽

Zihui Wan ◽

Junnan Zhang ◽

Lingna Xu ◽

Bo Han ◽

...

Keyword(s):

Association Studies ◽

Significant Snps ◽

Albumin Concentration ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Chinese Holstein ◽

Genome Wide ◽

Chinese Holstein Cows ◽

Newborn Calves

Albumin can be of particular benefit to newborn calves in fighting infections because of its anti-inflammatory and anti-oxidative stress properties. To identify candidate genes related to the concentration of albumin in colostrum and serum, we collected colostrum and blood samples from 572 Chinese Holstein cows within 24 h after calving and measured the concentration of albumin in the colostrum and serum by ELISA. The cows were genotyped with GeneSeek 150 K chips (containing 140,668 single nucleotide polymorphisms; SNPs). After quality control, we performed GWASs with GCTA software on 91,620 SNPs and 563 cows. Consequently, 9 and 7 genome-wide significant SNPs (false discovery rate (FDR) at 1%) were identified. Correspondingly, 42 and 206 functional genes that contained or lay within ±1 Mbp of the significant SNPs were obtained. Integrating the biological processes of these genes with the reported QTLs for immune and inflammation traits in cattle, 3 and 12 genes were identified as candidates for the concentration of colostrum and serum albumin, respectively; these are RUNX1, CBR1, OTULIN, CDK6, SHARPIN, CYC1, EXOSC4, PARP10, NRBP2, GFUS, PYCR3, EEF1D, GSDMD, PYCR2 and CXCL12. Our findings provide important information for revealing the genetic mechanism behind albumin concentration and for molecular breeding of disease-resistance traits in dairy cattle.
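Declaring SNPs significant at an FDR of 1% can be sketched as follows. The abstract does not name the FDR procedure; the widely used Benjamini-Hochberg step-up rule is assumed here, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return the indices of tests declared significant at the given FDR.

    Assumes the Benjamini-Hochberg step-up rule: find the largest rank k
    with p_(k) <= k * fdr / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical per-SNP p-values from an association scan.
example_hits = benjamini_hochberg([1e-6, 0.5, 0.001, 0.004, 0.9], fdr=0.01)
```

Because the rule is step-up, a SNP whose p-value misses its own rank threshold can still be reported if a larger p-value further down the ranking passes its threshold.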

Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Nucleic Acids Research ◽

10.1093/nar/gkz854 ◽

2019 ◽

Vol 48 (D1) ◽

pp. D659-D667 ◽

Cited By ~ 2

Author(s):

Wenqian Yang ◽

Yanbo Yang ◽

Cecheng Zhao ◽

Kun Yang ◽

Dongyang Wang ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Single Nucleotide ◽

Genome Wide ◽

Whole Genome Resequencing ◽

Missing Genotypes

Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is the process of estimating missing genotypes from the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and can thus be widely used in large-scale genome-wide association studies (GWASs) based on relatively inexpensive, low-density SNP arrays. However, most animals other than humans lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool consisting of two popular imputation tools was designed for genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.

An independent validation study of three single nucleotide polymorphisms at the sex hormone-binding globulin locus for testosterone levels identified by genome-wide association studies

Human Reproduction Open ◽

10.1093/hropen/hox002 ◽

2017 ◽

Vol 2017 (1) ◽

Author(s):

Youichi Sato ◽

Atsushi Tajima ◽

Motoki Katsurayama ◽

Shiari Nozawa ◽

Miki Yoshiike ◽

...

Keyword(s):

Validation Study ◽

Association Studies ◽

Genome Wide Association ◽

Sex Hormone Binding Globulin ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Hormone Binding ◽

Single Nucleotide ◽

Independent Validation ◽

Genome Wide

The missing story behind Genome Wide Association Studies: single nucleotide polymorphisms in gene deserts have a story to tell

Frontiers in Genetics ◽

10.3389/fgene.2014.00039 ◽

2014 ◽

Vol 5 ◽

Cited By ~ 24

Author(s):

William Schierding ◽

Wayne S. Cutfield ◽

Justin M. O'Sullivan

Keyword(s):

Single Nucleotide Polymorphisms ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide

Regulatory Variants and Disease: The E-Cadherin −160C/A SNP as an Example

Molecular Biology International ◽

10.1155/2014/967565 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 17

Author(s):

Gongcheng Li ◽

Tiejun Pan ◽

Dan Guo ◽

Long-Cheng Li

Keyword(s):

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Protein Coding ◽

Regulate Gene Expression ◽

Regulatory Variants ◽

Genome Wide ◽

Regulatory Snps ◽

E Cadherin

Single nucleotide polymorphisms (SNPs) occurring in noncoding sequences have largely been ignored in genome-wide association studies (GWAS). Yet mounting evidence suggests that many noncoding SNPs, especially those in the vicinity of protein-coding genes, play important roles in shaping chromatin structure and regulating gene expression and, as such, are implicated in a wide variety of diseases. One such regulatory SNP (rSNP) is the E-cadherin (CDH1) promoter −160C/A SNP (rs16260), which is known to affect E-cadherin promoter transcription by displacing transcription factor binding and has been extensively scrutinized for its association with several diseases, especially malignancies. Findings from studying this SNP highlight the clinical relevance of rSNPs and justify their inclusion in future GWAS to identify novel disease-causing SNPs.

Analysis of zebrafish periderm enhancers facilitates identification of a regulatory variant near human KRT8/18

10.1101/2020.01.27.921320 ◽

2020 ◽

Author(s):

Huan Liu ◽

Kaylia Duncan ◽

Annika Helverson ◽

Priyanka Kumari ◽

Camille Mumm ◽

...

Keyword(s):

Association Studies ◽

Mesenchymal Cell ◽

Cell Types ◽

Support Vector ◽

Oral Epithelium ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Regulatory Variant

Genome-wide association studies for non-syndromic orofacial cleft (OFC) have identified single nucleotide polymorphisms (SNPs) at loci where the presumed risk-relevant gene is expressed in oral periderm. The functional subsets of such SNPs are difficult to predict because the sequence underpinnings of periderm enhancers are unknown. We applied ATAC-seq to models of human palate periderm, including zebrafish periderm, mouse embryonic palate epithelia, and a human oral epithelium cell line, and to complementary mesenchymal cell types. We identified sets of enhancers specific to the epithelial cells and trained gapped-kmer support-vector-machine classifiers on these sets. We used the classifiers to predict the effect of 14 OFC-associated SNPs at 12q13 near KRT18. All the classifiers picked the same SNP as having the strongest effect, but the significance was highest with the classifier trained on zebrafish periderm. Reporter and deletion analyses support this SNP as lying within a periderm enhancer regulating KRT18/KRT8 expression.

Uncovering complementary sets of variants for the prediction of quantitative phenotypes

10.1101/2020.12.11.419952 ◽

2020 ◽

Author(s):

Serhan Yilmaz ◽

Mohamad Fakhouri ◽

Mehmet Koyuturk ◽

A. Ercument Cicek ◽

Oznur Tastan

Keyword(s):

Association Studies ◽

Simple Algorithm ◽

Fine Tuning ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Selection Methods ◽

Single Nucleotide ◽

Phenotype Prediction ◽

Genome Wide ◽

Complementary Subset

Recent genome-wide association studies (GWAS) show that mutations in single genetic loci, frequently called single nucleotide polymorphisms (SNPs), are not sufficient on their own to explain the phenotypic heritability of complex, quantitative phenotypes. Instead, many methods attempt to deal with this issue by considering a set of loci that together can characterize the phenotype. While the state-of-the-art methods succeed in selecting subsets of SNPs that achieve high phenotype prediction rates, they are either slow or have hyper-parameters that require further fine tuning through cross-validation or similar techniques, which makes them inconvenient to use. In this work, we propose a fast and simple algorithm named Macarons to select a small, complementary subset of SNPs by avoiding redundant pairs of SNPs that are likely to be in linkage disequilibrium (LD). Our method features two interpretable parameters that control the time/performance trade-off without requiring any hyper-parameter optimization procedures. In our experiments, we benchmark the performance of the SNP selection methods on the 17 flowering time phenotypes of Arabidopsis thaliana. Our results consistently show that Macarons has similar or better phenotype prediction performance while being faster and having a simpler premise than other SNP selection methods.
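The general idea of selecting SNPs while skipping redundant, LD-linked pairs can be sketched with a simple greedy rule. This is not the Macarons algorithm, whose details the abstract does not give; the squared-correlation threshold `max_r2`, the per-SNP `scores` and all function names are assumptions made for illustration.

```python
def squared_correlation(a, b):
    """Squared Pearson correlation between two genotype columns (an LD proxy)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def select_complementary_snps(snp_columns, scores, k, max_r2=0.8):
    """Greedily pick up to k high-scoring SNPs, skipping redundant ones.

    A candidate is skipped when its squared correlation with any SNP
    already chosen exceeds max_r2.
    """
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    chosen = []
    for j in ranked:
        if len(chosen) == k:
            break
        if all(
            squared_correlation(snp_columns[j], snp_columns[c]) <= max_r2
            for c in chosen
        ):
            chosen.append(j)
    return chosen


# Toy example: SNP 1 duplicates SNP 0 (perfect LD); SNP 2 is independent.
toy_columns = [[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], [0, 0, 0, 1, 2, 1]]
toy_selection = select_complementary_snps(toy_columns, [0.9, 0.8, 0.5], k=2)
```

In the toy example the second-ranked SNP is dropped because it adds nothing beyond the first, so the lower-scoring but independent SNP is chosen instead.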

Polygenic risk score based on weight gain trajectories is predictive of childhood obesity

10.1101/606277 ◽

2019 ◽

Author(s):

Sarah J. C. Craig ◽

Ana M. Kenney ◽

Junli Lin ◽

Ian M. Paul ◽

Leann L. Birch ◽

...

Keyword(s):

Weight Gain ◽

Childhood Obesity ◽

Risk Score ◽

Genetic Variants ◽

Association Studies ◽

Polygenic Risk Score ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide

Obesity is highly heritable, yet only a small fraction of its heritability has been attributed to specific genetic variants. These variants are traditionally ascertained from genome-wide association studies (GWAS), which utilize samples with tens or hundreds of thousands of individuals for whom a single summary measurement (e.g., BMI) is collected. An alternative approach is to focus on a smaller, more deeply characterized sample in conjunction with advanced statistical models that leverage detailed phenotypes. Here we use novel functional data analysis (FDA) techniques to capitalize on longitudinal growth information and construct a polygenic risk score (PRS) for obesity in children followed from birth to three years of age. This score, composed of 24 single nucleotide polymorphisms (SNPs), is significantly higher in children with (vs. without) rapid infant weight gain, a predictor of obesity later in life. Using two independent cohorts, we show that genetic variants identified in early childhood are also informative in older children and in adults, consistent with early childhood obesity being predictive of obesity later in life. In contrast, PRSs based on SNPs identified by adult obesity GWAS are not predictive of weight gain in our cohort of children. Our research provides an example of a successful application of FDA to GWAS. We demonstrate that a deep, statistically sophisticated characterization of a longitudinal phenotype can provide increased statistical power to studies with relatively small sample sizes. This study shows how FDA approaches can be used as an alternative to the traditional GWAS.

Author Summary

Finding genetic variants that confer an increased risk of developing a particular disease has long been a focus of modern genetics. Genome-wide association studies (GWAS) have catalogued single nucleotide polymorphisms (SNPs) associated with a variety of complex diseases in humans, including obesity, but by and large have done so using increasingly large samples of tens or even hundreds of thousands of individuals, whose phenotypes are thus often only superficially characterized. This, in turn, may hide the intricacies of the genetic influence on disease. GWAS findings are also usually study-population dependent. We found that genetic risk scores based on SNPs from large adult obesity studies are not predictive of the propensity to gain weight in very young children. However, using a small cohort of a few hundred children deeply characterized with growth trajectories between birth and two years, and leveraging such trajectories through novel functional data analysis (FDA) techniques, we were able to produce a strong childhood obesity genetic risk score.
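A polygenic risk score is commonly computed as a weighted sum of risk-allele counts across the selected SNPs. The abstract does not state how the 24 SNPs are weighted, so the sketch below assumes per-SNP weights supplied by the caller; the function name and the example values are hypothetical.

```python
def polygenic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP) for one person.

    weights: assumed per-SNP effect sizes; pass all ones for an
    unweighted count of risk alleles.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have equal length")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical three-SNP example.
example_score = polygenic_risk_score([2, 1, 0], [0.5, -0.2, 0.1])
```

Scores computed this way are only comparable between individuals genotyped at the same SNPs and scored with the same weights.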
