Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Abstract Birth weight is an important factor in newborn and infant survival, and both low and high birth weights are associated with adverse later life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with either maternal or fetal effects on birth weight. Knowledge of the underlying causal genes and pathways is crucial to understand how these loci influence birth weight, and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme upper or lower ends of the normal distribution, and genes implicated in those syndromes may provide valuable information to help prioritise candidate genes at GWAS loci. We examined the proximity of genes implicated in developmental disorders to birth weight GWAS loci at which a fetal effect is either likely or cannot be ruled out. We used simulations to test whether those genes fall disproportionately close to the GWAS loci. We found that birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected by chance. This is the case both when the developmental disorder gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes that cause monogenic developmental disorders with dominant modes of inheritance. We found several examples of SNPs located in the intron of one gene that mark plausible effects via different nearby genes implicated in monogenic short stature, highlighting the closest gene to the SNP not necessarily being the functionally relevant gene. This is the first application of this approach to birth weight loci, which has helped identify GWAS loci likely to have direct fetal effects on birth weight which could not previously be classified as fetal or maternal due to insufficient statistical power.

Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics ◽

10.1093/hmg/ddab060 ◽

2021 ◽

Author(s):

Robin N Beaumont ◽

Isabelle K Mayne ◽

Rachel M Freathy ◽

Caroline F Wright

Keyword(s):

Birth Weight ◽

Statistical Power ◽

Developmental Disorders ◽

Association Studies ◽

Later Life ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Common Genetic Variants ◽

Causal Genes

Abstract Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected both when the DD gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting the closest gene to the SNP not necessarily being the functionally relevant gene. This is the first application of this approach to birth weight, which has helped identify GWAS loci likely to have direct fetal effects on birth weight, which could not previously be classified as fetal or maternal owing to insufficient statistical power.
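
The simulation test described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each gene is reduced to a single position on one linear coordinate, that null SNP positions are drawn uniformly at random, and that the statistic is the number of SNPs with a gene of interest within the 258 kb window. The function names `count_snps_near_genes` and `enrichment_p_value` are hypothetical.

```python
import bisect
import random

WINDOW_BP = 258_000  # window size quoted in the abstract


def count_snps_near_genes(snp_positions, gene_positions, window=WINDOW_BP):
    """Count SNPs that have at least one gene within `window` base pairs."""
    genes = sorted(gene_positions)
    hits = 0
    for pos in snp_positions:
        # First gene at or beyond the left edge of the window around the SNP.
        i = bisect.bisect_left(genes, pos - window)
        if i < len(genes) and genes[i] <= pos + window:
            hits += 1
    return hits


def enrichment_p_value(snp_positions, gene_positions, genome_length,
                       n_sim=1000, seed=0):
    """Empirical one-sided p-value for SNPs lying near genes more often than chance.

    Assumption: the null places the same number of SNPs uniformly at random
    on a single coordinate of length `genome_length`.
    """
    rng = random.Random(seed)
    observed = count_snps_near_genes(snp_positions, gene_positions)
    as_extreme = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(genome_length) for _ in snp_positions]
        if count_snps_near_genes(simulated, gene_positions) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (as_extreme + 1) / (n_sim + 1)
```

A realistic analysis would work per chromosome and would probably match simulated SNPs on properties such as local gene density; the uniform null is used here only to keep the sketch short.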

Gene-Based Nonparametric Testing of Interactions Using Distance Correlation Coefficient in Case-Control Association Studies

Genes ◽

10.3390/genes9120608 ◽

2018 ◽

Vol 9 (12) ◽

pp. 608

Author(s):

Yingjie Guo ◽

Chenxi Wu ◽

Maozu Guo ◽

Xiaoyan Liu ◽

Alon Keinan

Keyword(s):

Correlation Coefficient ◽

Statistical Power ◽

Association Studies ◽

Gene Interaction ◽

P Value ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Real World Data ◽

Distance Correlation ◽

The Difference

Among the various statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies (GWAS), gene-based methods have recently grown in popularity because they confer advantages in both statistical power and biological interpretability. However, most of these methods make strong assumptions about the form of the relationship between traits and single-nucleotide polymorphisms, which result in limited statistical power. In this paper, we propose a gene-based method based on the distance correlation coefficient called gene-based gene-gene interaction via distance correlation coefficient (GBDcor). The distance correlation (dCor) is a measurement of the dependency between two random vectors with arbitrary, and not necessarily equal, dimensions. We used the difference in dCor in case and control datasets as an indicator of gene–gene interaction, which was based on the assumption that the joint distribution of two genes in case subjects and in control subjects should not be significantly different if the two genes do not interact. We designed a permutation-based statistical test to evaluate the difference between dCor in cases and controls for a pair of genes, and we provided the p-value for the statistic to represent the significance of the interaction between the two genes. In experiments with both simulated and real-world data, our method outperformed previous approaches in detecting interactions accurately.
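
As a rough illustration of the idea in this abstract, the sketch below computes the sample distance correlation between two genes and uses a label-permutation test on the case/control difference. It is not the GBDcor implementation: the absolute difference as test statistic, the permutation scheme, and the names `distance_correlation` and `interaction_p_value` are assumptions made for the example. Each gene is a list with one genotype tuple per subject.

```python
import math
import random


def _centered_distances(rows):
    """Double-centered matrix of pairwise Euclidean distances."""
    n = len(rows)
    d = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two sets of vectors (same subjects)."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)

    def mean_prod(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2, dvar_x, dvar_y = mean_prod(a, b), mean_prod(a, a), mean_prod(b, b)
    if dvar_x <= 0 or dvar_y <= 0:
        return 0.0  # one side has no variation
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def interaction_p_value(gene1, gene2, is_case, n_perm=200, seed=0):
    """Permutation p-value for |dCor(cases) - dCor(controls)| (assumed statistic)."""
    def stat(labels):
        case = [i for i, c in enumerate(labels) if c]
        ctrl = [i for i, c in enumerate(labels) if not c]
        d_case = distance_correlation([gene1[i] for i in case],
                                      [gene2[i] for i in case])
        d_ctrl = distance_correlation([gene1[i] for i in ctrl],
                                      [gene2[i] for i in ctrl])
        return abs(d_case - d_ctrl)

    rng = random.Random(seed)
    observed = stat(is_case)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between phenotype and genotype pairs
        if stat(labels) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

The two genes may have different numbers of SNPs, since only within-gene distances between subjects are compared.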

Genetics of early growth traits

Human Molecular Genetics ◽

10.1093/hmg/ddaa149 ◽

2020 ◽

Vol 29 (R1) ◽

pp. R66-R72

Author(s):

Diana L Cousminer ◽

Rachel M Freathy

Keyword(s):

Birth Weight ◽

Health Outcomes ◽

Fetal Growth ◽

Association Studies ◽

Later Life ◽

Early Growth ◽

Mr Studies ◽

European Ancestry ◽

Genome Wide Association Studies ◽

Genetic Contributions

Abstract In recent years, genome-wide association studies have shed light on the genetics of early growth and its links with later-life health outcomes. Large-scale datasets and meta-analyses, combined with recently developed analytical methods, have enabled dissection of the maternal and fetal genetic contributions to variation in birth weight. Additionally, longitudinal approaches have shown differences between the genetic contributions to infant, childhood and adult adiposity. In contrast, studies of adult height loci have shown strong associations with early body length and childhood height. Early growth-associated loci provide useful tools for causal analyses: Mendelian randomization (MR) studies have provided evidence that early BMI and height are causally related to a number of adult health outcomes. We advise caution in the design and interpretation of MR studies of birth weight investigating effects of fetal growth on later-life cardiometabolic disease because birth weight is only a crude indicator of fetal growth, and the choice of genetic instrument (maternal or fetal) will greatly influence the interpretation of the results. Most genetic studies of early growth have to date centered on European-ancestry participants and outcomes measured at a single time-point, so key priorities for future studies of early growth genetics are aggregation of large samples of diverse ancestries and longitudinal studies of growth trajectories.

Powerful Tukey's One Degree-of-Freedom Test for Detecting Gene-Gene and Gene-Environment Interactions

Cancer Informatics ◽

10.4137/cin.s17305 ◽

2015 ◽

Vol 14s2 ◽

pp. CIN.S17305 ◽

Cited By ~ 1

Author(s):

Yaping Wang ◽

Donghui Li ◽

Peng Wei

Keyword(s):

Statistical Power ◽

Association Studies ◽

Score Test ◽

Principal Component ◽

Case Control ◽

Degree Of Freedom ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Missing Heritability ◽

Gene Environment

Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of GWAS-identified genetic loci only explains a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to the yet-to-be-identified gene-gene (G × G) and gene-environment (G × E) interactions. In spite of the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey's one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey's 1-df interaction test. Using extensive simulations we show that the proposed approach, which combines the two parsimonious models, namely, the PCA and Tukey's 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn's disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case-control study of pancreatic cancer.

Association Analysis of Candidate Variants in Admixed Brazilian Patients With Genetic Generalized Epilepsies

Frontiers in Genetics ◽

10.3389/fgene.2021.672304 ◽

2021 ◽

Vol 12 ◽

Author(s):

Felipe S. Kaibara ◽

Tânia K. de Araujo ◽

Patricia A. O. R. A. Araujo ◽

Marina K. M. Alvim ◽

Clarissa L. Yasuda ◽

...

Keyword(s):

Native Americans ◽

Statistical Power ◽

Association Studies ◽

Snp Array ◽

Absence Epilepsy ◽

Candidate Snps ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Clonic Seizures ◽

Candidate Regions

Genetic generalized epilepsies (GGEs) include well-established epilepsy syndromes with generalized onset seizures: childhood absence epilepsy, juvenile myoclonic epilepsy (JME), juvenile absence epilepsy (JAE), myoclonic absence epilepsy, epilepsy with eyelid myoclonia (Jeavons syndrome), generalized tonic–clonic seizures, and generalized tonic–clonic seizures alone. Genome-wide association studies (GWASs) and exome sequencing have identified 48 single-nucleotide polymorphisms (SNPs) associated with GGE. However, these studies were mainly based on non-admixed, European, and Asian populations. Thus, it remains unclear whether these results apply to patients of other origins. This study aims to evaluate whether these previous results could be replicated in a cohort of admixed Brazilian patients with GGE. We obtained SNP-array data from 87 patients with GGE, compared with 340 controls from the BIPMed public dataset. We could directly access genotypes of 17 candidate SNPs, available in the SNP array, and the remaining 31 SNPs were imputed using the BEAGLE v5.1 software. We performed an association test by logistic regression analysis, including the first five principal components as covariates. Furthermore, to expand the analysis of the candidate regions, we also interrogated 14,047 SNPs that flank the candidate SNPs (1 Mb). The statistical power was evaluated in terms of odds ratio and minor allele frequency (MAF) by the genpwr package. Differences in SNP frequencies between Brazilians and Europeans, sub-Saharan Africans, and Native Americans were evaluated by a two-proportion Z-test. We identified nine flanking SNPs, located on eight candidate regions, which presented association signals that passed the Bonferroni correction (rs12726617; rs9428842; rs1915992; rs1464634; rs6459526; rs2510087; rs9551042; rs9888879; and rs8133217; p-values < 3.55e-06). In addition, the two-proportion Z-test indicates that the lack of association of the remaining candidate SNPs could be due to different genomic backgrounds observed in admixed Brazilians. This is the first time that candidate SNPs for GGE have been analyzed in an admixed Brazilian population, and we could successfully replicate the association signals in eight candidate regions. In addition, our results provide new insights on how we can account for population structure to improve risk stratification estimation in admixed individuals.
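
For readers unfamiliar with the two-proportion Z-test mentioned in this abstract, the following is a minimal textbook sketch using the pooled standard error and a two-sided normal p-value. The function name and the example counts are hypothetical and are not taken from the study; when comparing allele frequencies, the counts would typically be allele counts (two per individual).

```python
import math


def two_proportion_z_test(count_a, n_a, count_b, n_b):
    """Two-sided Z-test for equality of two proportions (pooled variance)."""
    p_a, p_b = count_a / n_a, count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical allele counts: 60 of 100 alleles in one group, 40 of 100 in another.
z, p = two_proportion_z_test(60, 100, 40, 100)
```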

Lipid associated polygenic enrichment in Alzheimer’s disease

10.1101/383844 ◽

2018 ◽

Author(s):

Iris J. Broce ◽

Chin Hong Tan ◽

Chun Chieh Fan ◽

Aree Witoelar ◽

Natalie Wen ◽

...

Keyword(s):

Alzheimer's Disease ◽

Genetic Variants ◽

Plasma Lipids ◽

Association Studies ◽

Density Lipoprotein ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genetic Pleiotropy ◽

Common Genetic Variants

ABSTRACT Cardiovascular (CV) and lifestyle associated risk factors (RFs) are increasingly recognized as important for Alzheimer’s disease (AD) pathogenesis. Beyond the ε4 allele of apolipoprotein E (APOE), comparatively little is known about whether CV associated genes also increase risk for AD (genetic pleiotropy). Using large genome-wide association studies (GWASs) (total n > 500,000 cases and controls) and validated tools to quantify genetic pleiotropy, we systematically identified single nucleotide polymorphisms (SNPs) jointly associated with AD and one or more CV RFs, namely body mass index (BMI), type 2 diabetes (T2D), coronary artery disease (CAD), waist hip ratio (WHR), total cholesterol (TC), low-density (LDL) and high-density lipoprotein (HDL). In fold enrichment plots, we observed robust genetic enrichment in AD as a function of plasma lipids (TC, LDL, and HDL); we found minimal AD genetic enrichment conditional on BMI, T2D, CAD, and WHR. Beyond APOE, at conjunction FDR < 0.05 we identified 57 SNPs on 19 different chromosomes that were jointly associated with AD and CV outcomes including APOA4, ABCA1, ABCG5, LIPG, and MTCH2/SPI1. We found that common genetic variants influencing AD are associated with multiple CV RFs, at times with a different directionality of effect. Expression of these AD/CV pleiotropic genes was enriched for lipid metabolism processes, over-represented within astrocytes and vascular structures, highly co-expressed, and differentially altered within AD brains. Beyond APOE, we show that the polygenic component of AD is enriched for lipid associated RFs. Rather than a single causal link between genetic loci, RF and the outcome, we found that common genetic variants influencing AD are associated with multiple CV RFs. Our collective findings suggest that a network of genes involved in lipid biology also influences Alzheimer’s risk.

CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS

10.1101/2021.11.05.467462 ◽

2021 ◽

Author(s):

Hector Roux de Bezieux ◽

Leandro Lima ◽

Fanny Perraudeau ◽

Arnaud Mary ◽

Sandrine Dudoit ◽

...

Keyword(s):

Statistical Power ◽

Association Studies ◽

Bacterial Species ◽

De Bruijn Graph ◽

Testable Hypothesis ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

A Genome ◽

De Bruijn ◽

Connected Subgraphs

Genome-wide association studies (GWAS), aiming to find genetic variants associated with a trait, have widely been used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single nucleotide polymorphisms to mobile genetic elements. Since many bacterial species include genes that are not shared among all strains, this approach avoids the reliance on a common reference genome. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects when trying to detect its association to a phenotype through k-mer based GWAS. Here we propose to overcome this by testing covariates built from closed connected subgraphs of the de Bruijn graph (DBG) defined over genomic k-mers. These covariates are able to capture polymorphic genes as a single entity, improving k-mer based GWAS in terms of power and interpretability. As the number of subgraphs is exponential in the number of nodes in the DBG, a method naively testing all possible subgraphs would result in very low statistical power due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypothesis has successfully been used to address both problems in similar contexts. We leverage this concept to test all closed connected subgraphs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. We illustrate this on both real and simulated datasets and also demonstrate how considering subgraphs leads to a more powerful and interpretable method. Our method integrates with existing visual tools to facilitate interpretation. We also provide an implementation of our method, as well as code to reproduce all results at https://github.com/HectorRDB/Caldera_Recomb.
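
To make the k-mer and de Bruijn graph vocabulary of this abstract concrete, here is a small sketch that is unrelated to the CALDERA code base. It builds an uncompacted, node-centric de Bruijn graph over the k-mers of a few toy genomes and records which genomes contain each k-mer. Reverse complements are ignored, and the names `kmers` and `build_de_bruijn` are hypothetical.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Set of all substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_de_bruijn(genomes, k):
    """Return (presence, edges) for a node-centric de Bruijn graph.

    presence maps each k-mer to the set of genome names containing it;
    edges maps each k-mer to the k-mers that overlap it by k-1 characters.
    """
    presence = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            presence[kmer].add(name)
    edges = {
        kmer: sorted(kmer[1:] + base for base in "ACGT"
                     if kmer[1:] + base in presence)
        for kmer in presence
    }
    return dict(presence), edges


# Two toy strains that differ at one position after the shared k-mer ACG.
presence, edges = build_de_bruijn({"s1": "ACGTA", "s2": "ACGAA"}, k=3)
```

In this toy example the node `ACG` is shared by both strains and branches into `CGA` and `CGT`, the kind of local variation that a connected subgraph can group into a single covariate.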

Identifying Thyroid Carcinoma-Related Genes by Integrating GWAS and eQTL Data

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.645275 ◽

2021 ◽

Vol 9 ◽

Author(s):

Fei Shen ◽

Xiaoxiong Gan ◽

Ruiying Zhong ◽

Jianhua Feng ◽

Zhen Chen ◽

...

Keyword(s):

Thyroid Cancer ◽

Thyroid Carcinoma ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Cancer Pathogenesis ◽

Genome Wide ◽

Causal Genes ◽

Eqtl Data

Thyroid carcinoma (TC) is the most common endocrine malignancy. The incidence rate of thyroid cancer has increased rapidly in recent years. The occurrence and development of thyroid cancers are closely linked to extensive genetic and epigenetic changes. Therefore, it is essential to explore the mechanism of thyroid cancer pathogenesis. Genome-wide association studies (GWAS) have been widely used in various diseases. Researchers have found that multiple single nucleotide polymorphisms (SNPs) are significantly related to TC. However, the biological mechanism of these SNPs is still unknown. In this paper, we used one GWAS dataset and two eQTL datasets, and integrated GWAS with expression quantitative trait loci (eQTL) in both thyroid and blood to explore the mechanism of mutations and causal genes of thyroid cancer. Finally, we found that rs1912998 regulates the expression of IGFALS (P = 1.70E-06) and HAGH (P = 5.08E-07) in thyroid, which is significantly related to thyroid cancer. In addition, KEGG analysis shows that these genes participate in multiple thyroid cancer-related pathways.

PCA-Based Multiple-Trait GWAS Analysis: A Powerful Model for Exploring Pleiotropy

Animals ◽

10.3390/ani8120239 ◽

2018 ◽

Vol 8 (12) ◽

pp. 239 ◽

Cited By ~ 4

Author(s):

Wengang Zhang ◽

Xue Gao ◽

Xinping Shi ◽

Bo Zhu ◽

Zezhao Wang ◽

...

Keyword(s):

Quantitative Trait ◽

Statistical Power ◽

Muscle Development ◽

Association Studies ◽

Simulated Data ◽

Principal Component ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Multiple Trait ◽

Gwas Analysis

Principal component analysis (PCA) is a potential approach that can be applied in multiple-trait genome-wide association studies (GWAS) to explore pleiotropy, as well as increase the power of quantitative trait loci (QTL) detection. In this study, the relationship of test single nucleotide polymorphisms (SNPs) was determined between single-trait GWAS and PCA-based GWAS. We found that the estimated pleiotropic quantitative trait nucleotides (QTNs) $\hat{\beta}^{*}$ were in most cases larger than the single-trait model estimations ($\hat{\beta}_1$ and $\hat{\beta}_2$). Analysis using the simulated data showed that PCA-based multiple-trait GWAS has improved statistical power for detecting QTL compared to single-trait GWAS. For the minor allele frequency (MAF), when the MAF of QTNs was greater than 0.2, the PCA-based model had a significant advantage in detecting the pleiotropic QTNs, but when its MAF was reduced from 0.2 to 0, the advantage began to disappear. In addition, as the linkage disequilibrium (LD) of the pleiotropic QTNs decreased, its detection ability declined in the co-localization effect model. Furthermore, on the real data of 1141 Simmental cattle, we applied the PCA model to the multiple-trait GWAS analysis and identified a QTL that was consistent with a candidate gene, MCHR2, which was associated with presoma muscle development in cattle. In summary, PCA-based multiple-trait GWAS is an efficient model for exploring pleiotropic QTNs in quantitative traits.
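
One way to see how a PCA-based effect estimate relates to single-trait estimates is the two-trait case sketched below. It is an illustration under stated assumptions, not the model used in the paper. For two standardized traits, the principal axes of their correlation matrix are the scaled sum and difference of the traits, so the ordinary least-squares slope of the sum component on genotype equals the sum of the two single-trait slopes divided by the square root of two. All function names and the example data are hypothetical.

```python
import math
import statistics


def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def ols_slope(x, y):
    """Slope of the simple linear regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def pc_scores_two_traits(trait1, trait2):
    """Standardized traits plus their sum and difference component scores.

    For two standardized traits the eigenvectors of the correlation matrix
    are (1, 1)/sqrt(2) and (1, -1)/sqrt(2); the sum component is the first
    PC when the traits are positively correlated.
    """
    z1, z2 = standardize(trait1), standardize(trait2)
    s = 1 / math.sqrt(2)
    pc_sum = [s * (a + b) for a, b in zip(z1, z2)]
    pc_diff = [s * (a - b) for a, b in zip(z1, z2)]
    return z1, z2, pc_sum, pc_diff


# Hypothetical genotypes (0/1/2 allele counts) and two correlated traits.
genotypes = [0, 1, 2, 0, 1, 2]
z1, z2, pc_sum, pc_diff = pc_scores_two_traits(
    [1.0, 2.1, 2.9, 0.8, 2.0, 3.2], [0.5, 1.4, 2.6, 0.7, 1.6, 2.4])
beta_pc = ols_slope(genotypes, pc_sum)
```

When both single-trait slopes have the same sign and similar size, the slope on the sum component is larger than either one, which matches the direction of the result reported in the abstract.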

Modeling diseases in multiple mouse strains for precision medicine studies

Physiological Genomics ◽

10.1152/physiolgenomics.00123.2016 ◽

2017 ◽

Vol 49 (3) ◽

pp. 177-179 ◽

Cited By ~ 3

Author(s):

Andrés D. Klein

Keyword(s):

Statistical Power ◽

Association Studies ◽

Phenotypic Variability ◽

Inbred Strains ◽

Mouse Strains ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Number Of Patients

The genetic basis of the phenotypic variability observed in patients can be studied in mice by generating disease models through genetic or chemical interventions in many genetic backgrounds, where the clinical phenotypes can be assessed and used for genome-wide association studies (GWAS). This is particularly relevant for rare disorders, where patients sharing identical mutations can present with a wide variety of symptoms, but there are not enough patients to ensure the statistical power of GWAS. Inbred strains are homozygous at each locus, and their single nucleotide polymorphism catalogs are known and freely available, which facilitates the bioinformatics and reduces the cost of the study, since it is not necessary to genotype every mouse. This kind of approach can be applied to pharmacogenomics studies as well.
