Common alleles of CMT2 and NRPE1 are major determinants of de novo DNA methylation variation in Arabidopsis thaliana

Abstract DNA cytosine methylation is an epigenetic mark associated with silencing of transposable elements (TEs) and heterochromatin formation. In plants, it occurs in three sequence contexts: CG, CHG, and CHH (where H is A, T, or C). The last of these does not allow direct inheritance of methylation during DNA replication due to lack of symmetry, and methylation must therefore be re-established every cell generation. Genome-wide association studies (GWAS) have previously shown that CMT2 and NRPE1 are major determinants of genome-wide patterns of TE CHH-methylation. Here we instead focus on CHH-methylation of individual TEs and TE-families, allowing us to identify the pathways involved in CHH-methylation simply from natural variation and confirm the associations by comparing them with mutant phenotypes. Methylation at TEs targeted by the RNA-directed DNA methylation (RdDM) pathway is unaffected by CMT2 variation, but is strongly affected by variation at NRPE1, which is largely responsible for the longitudinal cline in this phenotype. In contrast, CMT2-targeted TEs are affected by both loci, which jointly explain 7.3% of the phenotypic variation (13.2% of total genetic effects). There is no longitudinal pattern for this phenotype, however, because the geographic patterns appear to compensate for each other in a pattern suggestive of stabilizing selection.

Author Summary DNA methylation is a major component of transposon silencing, and essential for genomic integrity. Recent studies revealed large-scale geographic variation as well as the existence of major trans-acting polymorphisms that partly explained this variation. In this study, we re-analyze previously published data (The 1001 Epigenomes), focusing on de novo DNA methylation patterns of individual TEs and TE families rather than on genome-wide averages (as was done in previous studies). GWAS of these patterns reveals the underlying regulatory networks and allows us to comprehensively characterize trans-regulation of de novo DNA methylation and its role in the striking geographic pattern for this phenotype.
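The two percentages quoted for the CMT2-targeted TEs imply a rough genetic share of the phenotypic variance. The following is a minimal sketch of that arithmetic, assuming that "total genetic effects" means the fraction of phenotypic variance attributable to genetic effects; the abstract does not define the term, and the function name is purely illustrative.

```python
def implied_genetic_fraction(share_of_phenotypic: float,
                             share_of_genetic: float) -> float:
    """Fraction of phenotypic variance that is genetic, inferred from the
    same effect expressed as a share of phenotypic and of genetic variance.

    Assumption (not stated in the abstract): both shares describe the same
    variance component, so genetic_fraction = phenotypic_share / genetic_share.
    """
    if not 0.0 < share_of_genetic <= 1.0:
        raise ValueError("share_of_genetic must be in (0, 1]")
    if not 0.0 <= share_of_phenotypic <= share_of_genetic:
        raise ValueError("share_of_phenotypic must be in [0, share_of_genetic]")
    return share_of_phenotypic / share_of_genetic


# Figures from the abstract: 7.3% of phenotypic variation, 13.2% of genetic effects.
genetic_fraction = implied_genetic_fraction(0.073, 0.132)
print(f"Implied genetic fraction: {genetic_fraction:.3f}")  # prints 0.553
```

Under this reading, a little over half of the phenotypic variation would be genetic; the figure is only as reliable as the assumption about how the two percentages were defined.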


Genome-wide association studies identify 137 genetic loci for DNA methylation biomarkers of aging

Genome Biology ◽

10.1186/s13059-021-02398-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Daniel L. McCartney ◽

Josine L. Min ◽

Rebecca C. Richmond ◽

Ake T. Lu ◽

Maria K. Sobczyk ◽

...

Keyword(s):

Dna Methylation ◽

Genome Wide Association Study ◽

Association Studies ◽

Genome Wide Association ◽

Biological Aging ◽

Genome Wide Association Studies ◽

Genetic Loci ◽

Biomarkers Of Aging ◽

Genome Wide ◽

Shared Genetic

Abstract Background: Biological aging estimators derived from DNA methylation data are heritable and correlate with morbidity and mortality. Consequently, identification of genetic and environmental contributors to the variation in these measures in populations has become a major goal in the field. Results: Leveraging DNA methylation and SNP data from more than 40,000 individuals, we identify 137 genome-wide significant loci, of which 113 are novel, from genome-wide association study (GWAS) meta-analyses of four epigenetic clocks and epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels, respectively. We find evidence for shared genetic loci associated with the Horvath clock and expression of transcripts encoding genes linked to lipid metabolism and immune function. Notably, these loci are independent of those reported to regulate DNA methylation levels at constituent clock CpGs. A polygenic score for GrimAge acceleration showed strong associations with adiposity-related traits, educational attainment, parental longevity, and C-reactive protein levels. Conclusion: This study illuminates the genetic architecture underlying epigenetic aging and its shared genetic contributions with lifestyle factors and longevity.


GWASpro: a high-performance genome-wide association analysis server

Bioinformatics ◽

10.1093/bioinformatics/bty989 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2512-2514 ◽

Cited By ~ 4

Author(s):

Bongsong Kim ◽

Xinbin Dai ◽

Wenchao Zhang ◽

Zhaohong Zhuang ◽

Darlene L Sanchez ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Linear Mixed Model ◽

Association Studies ◽

Learning Curves ◽

Experimental Designs ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Summary: We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information: Supplementary data are available at Bioinformatics online.


A description of large-scale metabolomics studies: increasing value by combining metabolomics with genome-wide SNP genotyping and transcriptional profiling

Journal of Endocrinology ◽

10.1530/joe-12-0144 ◽

2012 ◽

Vol 215 (1) ◽

pp. 17-28 ◽

Cited By ~ 18

Author(s):

Georg Homuth ◽

Alexander Teumer ◽

Uwe Völker ◽

Matthias Nauck

Keyword(s):

Blood Cells ◽

Large Scale ◽

Genetic Factors ◽

Association Studies ◽

Transcriptional Profiling ◽

Genome Wide Association Studies ◽

Protein Levels ◽

Future Developments ◽

Genome Wide ◽

Metabolome Data

The metabolome, defined as the reflection of metabolic dynamics derived from parameters measured primarily in easily accessible body fluids such as serum, plasma, and urine, can be considered as the omics data pool that is closest to the phenotype because it integrates genetic influences as well as nongenetic factors. Metabolic traits can be related to genetic polymorphisms in genome-wide association studies, enabling the identification of underlying genetic factors, as well as to specific phenotypes, resulting in the identification of metabolome signatures primarily caused by nongenetic factors. Similarly, correlation of metabolome data with transcriptional and/or proteome profiles of blood cells also produces valuable data, by revealing associations between metabolic changes and mRNA and protein levels. In recent years, progress in correlating genetic variation and metabolome profiles has been most impressive. This review will therefore try to summarize the most important of these studies and give an outlook on future developments.


Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 ◽

2018 ◽

Cited By ~ 6

Author(s):

Doug Speed ◽

David J Balding

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Confounding Bias ◽

Conserved Regions ◽

Genome Wide ◽

Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.


Genotypic and Phenotypic Characterization of Lettuce Bacterial Pathogen Xanthomonas hortorum pv. vitians Populations Collected in Quebec, Canada

Agronomy ◽

10.3390/agronomy11122386 ◽

2021 ◽

Vol 11 (12) ◽

pp. 2386

Author(s):

Pierre-Olivier Hébert ◽

Martin Laforest ◽

Dong Xu ◽

Marie Ciotola ◽

Mélanie Cadieux ◽

...

Keyword(s):

Leaf Spot ◽

Genome Wide Association Study ◽

De Novo ◽

Association Studies ◽

Genome Wide Association ◽

Phenotypic Characterization ◽

Genome Wide Association Studies ◽

Bacterial Leaf Spot ◽

Eastern Canada ◽

Genome Wide

Bacterial leaf spot of lettuce, caused by Xanthomonas hortorum pv. vitians, is an economically important disease worldwide. For instance, it caused around 4 million CAD in losses in only a few months during the winter of 1992 in Florida. Because only one pesticide is registered to control this disease in Canada, the development of lettuce cultivars tolerant to bacterial leaf spot remains the most promising approach to reduce the incidence and severity of the disease in lettuce fields. The lack of information about the genetic diversity of the pathogen, however, impairs breeding programs, especially when disease resistance is tested on newly developed lettuce germplasm lines. To evaluate the diversity of X. hortorum pv. vitians, a multilocus sequence analysis was performed on 694 isolates collected in Eastern Canada through the summers of 2014 to 2017 and two isolates in 1996 and 2007. All isolates tested were clustered into five phylogroups. Six pathotypes were identified following pathogenicity tests conducted in greenhouses, but when phylogroups were compared with pathotypes, no correlation could be drawn. However, in vitro production of xanthan and xanthomonadins was investigated, and isolates with higher production of xanthomonadins generally caused less severe symptoms on the tolerant cultivar Little Gem. Whole-genome sequencing was undertaken for 95 isolates belonging to the pathotypes identified, and de novo assembly made with reads unmapped to the reference strain’s genome sequence resulted in 694 contigs ranging from 128 to 120,795 bp. Variant calling was performed prior to genome-wide association studies computed with single-nucleotide polymorphisms (SNPs), copy-number variants and gaps. Polymorphisms with significant p-values were only found on the cultivar Little Gem. Our results allowed molecular identification of isolates likely to cause bacterial leaf spot of lettuce, using two SNPs identified through genome-wide association study.


RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID

PLoS Genetics ◽

10.1371/journal.pgen.1009315 ◽

2021 ◽

Vol 17 (1) ◽

pp. e1009315

Author(s):

Ardalan Naseri ◽

Junjie Shi ◽

Xihong Lin ◽

Shaojie Zhang ◽

Degui Zhi

Keyword(s):

Large Scale ◽

Association Studies ◽

Scale Up ◽

Data Driven ◽

Genome Wide Association Studies ◽

Inference Method ◽

Genome Wide ◽

Familial Relationship ◽

Kinship Coefficients ◽

Data Driven Approach

Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to call IBD segments first, then estimates ϕ and π0 from the detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admix events, and varying marker densities, and achieves higher accuracy compared to KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
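The relation between detected IBD segments and the two quantities named above (ϕ and π0) can be illustrated with the textbook estimator ϕ = k1/4 + k2/2 and π0 = 1 − k1 − k2, where k1 and k2 are the genome fractions shared IBD1 and IBD2. This is a minimal sketch and not RAFFI's algorithm: it omits the phasing- and genotyping-quality adjustments the abstract describes, and the `IBDSegment` type, the function name, and the genome length used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class IBDSegment:
    """A shared segment between two individuals (hypothetical record type)."""
    length_cm: float  # genetic length of the segment in centimorgans
    state: int        # 1 = one haplotype shared (IBD1), 2 = both shared (IBD2)


def kinship_from_segments(segments: Iterable[IBDSegment],
                          genome_length_cm: float) -> Tuple[float, float]:
    """Return (phi, pi0) from IBD segments using the standard identities
    phi = k1/4 + k2/2 and pi0 = 1 - k1 - k2."""
    if genome_length_cm <= 0:
        raise ValueError("genome_length_cm must be positive")
    ibd1 = ibd2 = 0.0
    for seg in segments:
        if seg.state == 1:
            ibd1 += seg.length_cm
        elif seg.state == 2:
            ibd2 += seg.length_cm
        else:
            raise ValueError(f"unknown IBD state: {seg.state}")
    k1 = ibd1 / genome_length_cm
    k2 = ibd2 / genome_length_cm
    return k1 / 4 + k2 / 2, 1.0 - k1 - k2


# Expected sharing for full siblings: half the genome IBD1, a quarter IBD2.
# The genome length of 3400 cM is an illustrative assumption.
sib_segments = [IBDSegment(1700.0, 1), IBDSegment(850.0, 2)]
phi, pi0 = kinship_from_segments(sib_segments, genome_length_cm=3400.0)
print(phi, pi0)  # 0.25 0.25
```

For a parent and offspring, who share one haplotype along the whole genome, the same function gives ϕ = 0.25 and π0 = 0, which is why π0 is needed alongside ϕ to separate the two relationship types.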


Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease

Nature Communications ◽

10.1038/s41467-019-12228-z ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 14

Author(s):

Tianxiao Huan ◽

Roby Joehanes ◽

Ci Song ◽

Fen Peng ◽

Yichen Guo ◽

...

Keyword(s):

Cardiovascular Disease ◽

Dna Methylation ◽

Whole Blood ◽

Association Studies ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Transcriptional Regulatory ◽

Disease Associations ◽

Independent Replication ◽

Functional Mechanisms

Abstract Identifying methylation quantitative trait loci (meQTLs) and integrating them with disease-associated variants from genome-wide association studies (GWAS) may illuminate functional mechanisms underlying genetic variant-disease associations. Here, we perform GWAS of >415 thousand CpG methylation sites in whole blood from 4170 individuals and map 4.7 million cis- and 630 thousand trans-meQTL variants targeting >120 thousand CpGs. Independent replication is performed in 1347 participants from two studies. By linking cis-meQTL variants with GWAS results for cardiovascular disease (CVD) traits, we identify 92 putatively causal CpGs for CVD traits by Mendelian randomization analysis. Further integrating gene expression data reveals evidence of cis CpG-transcript pairs causally linked to CVD. In addition, we identify 22 trans-meQTL hotspots each targeting more than 30 CpGs and find that trans-meQTL hotspots appear to act in cis on expression of nearby transcriptional regulatory genes. Our findings provide a powerful meQTL resource and shed light on DNA methylation involvement in human diseases.


Genetics of juvenile rheumatic diseases

10.1093/med/9780199642489.003.0043_update_002 ◽

2015 ◽

Author(s):

Anne Hinks ◽

Wendy Thomson

Keyword(s):

Risk Factors ◽

Rheumatic Diseases ◽

Large Scale ◽

Association Studies ◽

Genetic Diseases ◽

Response To Treatment ◽

Genome Wide Association Studies ◽

Established Risk Factor ◽

Genome Wide ◽

Juvenile Rheumatic Diseases

Juvenile rheumatic diseases are heterogeneous, complex genetic diseases; to date only juvenile idiopathic arthritis (JIA) has been extensively studied in terms of identifying genetic risk factors. The MHC region is a well-established risk factor but in the last few years candidate gene and large-scale genome-wide association studies have been utilized in the search for non-HLA risk factors. There are now 17 JIA susceptibility loci which reach the genome-wide significance threshold for association and a further 7 regions with evidence for association in more than one study. In addition, some subtype-specific associations are emerging. These risk loci now need to be investigated further using fine-mapping strategies and then appropriate functional studies to show how the variant alters the gene function. This knowledge will not only lead to a better understanding of disease pathogenesis for juvenile rheumatic diseases but may also aid in the classification of these heterogeneous diseases. It may identify new pathways for potential therapeutic targets and help in the prediction of disease outcome and response to treatment.


Understanding the genetic determinants of the brain with MOSTest

Nature Communications ◽

10.1038/s41467-020-17368-1 ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 3

Author(s):

Dennis van der Meer ◽

Oleksandr Frei ◽

Tobias Kaufmann ◽

Alexey A. Shadrin ◽

Anna Devor ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Computational Design ◽

Brain Regions ◽

Brain Morphology ◽

Genome Wide Association Studies ◽

Small Individual ◽

Significance Threshold ◽

Regional Brain ◽

Genome Wide

Abstract Regional brain morphology has a complex genetic architecture, consisting of many common polymorphisms with small individual effects. This has proven challenging for genome-wide association studies (GWAS). Due to the distributed nature of genetic signal across brain regions, multivariate analysis of regional measures may enhance discovery of genetic variants. Current multivariate approaches to GWAS are ill-suited for complex, large-scale data of this kind. Here, we introduce the Multivariate Omnibus Statistical Test (MOSTest), with an efficient computational design enabling rapid and reliable inference, and apply it to 171 regional brain morphology measures from 26,502 UK Biobank participants. At the conventional genome-wide significance threshold of α = 5 × 10⁻⁸, MOSTest identifies 347 genomic loci associated with regional brain morphology, more than any previous study, improving upon the discovery of established GWAS approaches more than threefold. Our findings implicate more than 5% of all protein-coding genes and provide evidence for gene sets involved in neuron development and differentiation.
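The conventional threshold quoted above is commonly motivated as a Bonferroni correction of a 0.05 family-wise error rate across roughly one million independent common-variant tests. The sketch below shows that arithmetic; the one-million figure is the usual rule-of-thumb assumption rather than a value taken from the abstract, and the function names are illustrative.

```python
import math


def bonferroni_threshold(family_wise_alpha: float, n_independent_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return family_wise_alpha / n_independent_tests


def is_genome_wide_significant(p_value: float, threshold: float) -> bool:
    """True when a p-value falls strictly below the per-test threshold."""
    return p_value < threshold


# Assumption: about one million effectively independent tests genome-wide.
alpha_gw = bonferroni_threshold(0.05, 1_000_000)
assert math.isclose(alpha_gw, 5e-8)

print(is_genome_wide_significant(3e-9, alpha_gw))  # True
print(is_genome_wide_significant(1e-6, alpha_gw))  # False
```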


A Review of the Hereditary Component of Triple Negative Breast Cancer: High- and Moderate-Penetrance Breast Cancer Genes, Low-Penetrance Loci, and the Role of Nontraditional Genetic Elements

Journal of Oncology ◽

10.1155/2019/4382606 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10 ◽

Cited By ~ 11

Author(s):

Darrell L. Ellsworth ◽

Clesson E. Turner ◽

Rachel E. Ellsworth

Keyword(s):

Breast Cancer ◽

Triple Negative Breast Cancer ◽

Large Scale ◽

Triple Negative ◽

Association Studies ◽

African Ancestry ◽

Genome Wide Association Studies ◽

Genetic Elements ◽

Genome Wide ◽

Increased Risk

Triple negative breast cancer (TNBC), representing 10-15% of breast tumors diagnosed each year, is a clinically defined subtype of breast cancer associated with poor prognosis. The higher incidence of TNBC in certain populations such as young women and/or women of African ancestry and a unique pathological phenotype shared between TNBC and BRCA1-deficient tumors suggest that TNBC may be inherited through germline mutations. In this article, we describe genes and genetic elements, beyond BRCA1 and BRCA2, which have been associated with increased risk of TNBC. Multigene panel testing has identified high- and moderate-penetrance cancer predisposition genes associated with increased risk for TNBC. Development of large-scale genome-wide SNP assays coupled with genome-wide association studies (GWAS) has led to the discovery of low-penetrance TNBC-associated loci. Next-generation sequencing has identified variants in noncoding RNAs, viral integration sites, and genes in underexplored regions of the human genome that may contribute to the genetic underpinnings of TNBC. Advances in our understanding of the genetics of TNBC are driving improvements in risk assessment and patient management.
