PCA-Based Multiple-Trait GWAS Analysis: A Powerful Model for Exploring Pleiotropy

Principal component analysis (PCA) is a potential approach that can be applied in multiple-trait genome-wide association studies (GWAS) to explore pleiotropy, as well as increase the power of quantitative trait loci (QTL) detection. In this study, the relationship between single-trait GWAS and PCA-based GWAS was determined for the tested single nucleotide polymorphisms (SNPs). We found that the estimated effects of pleiotropic quantitative trait nucleotides (QTNs), $\hat{\beta}^{*}$, were in most cases larger than the single-trait model estimates ($\hat{\beta}_1$ and $\hat{\beta}_2$). Analysis using the simulated data showed that PCA-based multiple-trait GWAS has improved statistical power for detecting QTL compared to single-trait GWAS. When the minor allele frequency (MAF) of the QTNs was greater than 0.2, the PCA-based model had a significant advantage in detecting the pleiotropic QTNs, but as the MAF was reduced from 0.2 toward 0, the advantage began to disappear. In addition, as the linkage disequilibrium (LD) of the pleiotropic QTNs decreased, the model's detection ability declined in the co-localization effect model. Furthermore, on the real data of 1141 Simmental cattle, we applied the PCA model to the multiple-trait GWAS analysis and identified a QTL that was consistent with a candidate gene, MCHR2, which was associated with presoma muscle development in cattle. In summary, PCA-based multiple-trait GWAS is an efficient model for exploring pleiotropic QTNs in quantitative traits.
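
The link between single-trait and PCA-based effect estimates can be illustrated with a minimal sketch for two traits. This is not the authors' implementation: the simulated data, the effect size of 0.3, and all function names are assumptions made for illustration. It relies on the fact that, for two standardized traits, the principal components are proportional to their sum and their difference.

```python
import math
import random


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ols_slope(x, y):
    """Slope of the simple linear regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def two_trait_pcs(trait1, trait2):
    """PC scores of two standardized traits.

    The eigenvectors of a 2x2 correlation matrix are (1, 1)/sqrt(2) and
    (1, -1)/sqrt(2), so no eigen-solver is needed. The "sum" component is
    the leading PC when the traits are positively correlated.
    """
    z1, z2 = standardize(trait1), standardize(trait2)
    root2 = math.sqrt(2)
    pc_sum = [(a + b) / root2 for a, b in zip(z1, z2)]
    pc_diff = [(a - b) / root2 for a, b in zip(z1, z2)]
    return z1, z2, pc_sum, pc_diff


# Hypothetical simulated data: one SNP (MAF 0.3) affecting both traits.
rng = random.Random(1)
genotypes = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(500)]
trait1 = [0.3 * g + rng.gauss(0, 1) for g in genotypes]
trait2 = [0.3 * g + rng.gauss(0, 1) for g in genotypes]

z1, z2, pc_sum, pc_diff = two_trait_pcs(trait1, trait2)
beta1 = ols_slope(genotypes, z1)          # single-trait estimate, trait 1
beta2 = ols_slope(genotypes, z2)          # single-trait estimate, trait 2
beta_star = ols_slope(genotypes, pc_sum)  # estimate on the "sum" PC
print(beta1, beta2, beta_star)
```

Because least-squares slopes are linear in the response, the slope on the "sum" component equals $(\hat{\beta}_1 + \hat{\beta}_2)/\sqrt{2}$ in this two-trait setting, so effects acting in the same direction on both traits reinforce each other. Whether this matches the exact relationship derived in the paper is not stated in the abstract.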


Powerful Tukey's One Degree-of-Freedom Test for Detecting Gene-Gene and Gene-Environment Interactions

Cancer Informatics ◽ 10.4137/cin.s17305 ◽ 2015 ◽ Vol 14s2 ◽ pp. CIN.S17305 ◽ Cited By ~ 1

Author(s): Yaping Wang ◽ Donghui Li ◽ Peng Wei

Keyword(s): Statistical Power ◽ Association Studies ◽ Score Test ◽ Principal Component ◽ Case Control ◽ Degree Of Freedom ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Missing Heritability ◽ Gene Environment

Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of GWAS-identified genetic loci explains only a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to the yet-to-be-identified gene-gene (G × G) and gene-environment (G × E) interactions. In spite of the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey's one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey's 1-df interaction test. Using extensive simulations, we show that the proposed approach, which combines two parsimonious models, namely PCA and Tukey's 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn's disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case-control study of pancreatic cancer.


Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics ◽ 10.1093/hmg/ddab060 ◽ 2021

Author(s): Robin N Beaumont ◽ Isabelle K Mayne ◽ Rachel M Freathy ◽ Caroline F Wright

Keyword(s): Birth Weight ◽ Statistical Power ◽ Developmental Disorders ◽ Association Studies ◽ Later Life ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Genome Wide ◽ Common Genetic Variants ◽ Causal Genes

Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found that birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected, both when the DD gene is the nearest gene to the birth weight SNP and when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting that the closest gene to the SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight, and it has helped identify GWAS loci likely to have direct fetal effects on birth weight that could not previously be classified as fetal or maternal owing to insufficient statistical power.
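
The nearest-gene version of the simulation-based proximity test described above can be sketched as follows. The abstract does not specify the simulation scheme, so drawing random gene sets of the same size as the DD gene set is an assumption, as are the single-chromosome coordinates, the toy data, and every name in the code.

```python
import bisect
import random


def nearest_gene_index(position, sorted_positions):
    """Index of the gene whose position is closest to `position`."""
    i = bisect.bisect_left(sorted_positions, position)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_positions)]
    return min(candidates, key=lambda j: abs(sorted_positions[j] - position))


def nearest_gene_enrichment(snp_positions, gene_positions, dd_genes,
                            n_sim=1000, seed=0):
    """Count SNPs whose nearest gene is a DD gene and compare the count
    with random gene sets of the same size (empirical one-sided p-value)."""
    names = sorted(gene_positions, key=gene_positions.get)
    positions = [gene_positions[name] for name in names]
    nearest = [names[nearest_gene_index(p, positions)] for p in snp_positions]
    observed = sum(gene in dd_genes for gene in nearest)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        random_set = set(rng.sample(names, len(dd_genes)))
        if sum(gene in random_set for gene in nearest) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical toy data on one chromosome (positions in base pairs).
toy_genes = {"A": 100, "B": 500, "C": 900, "D": 1500}
toy_snps = [120, 480, 1400]
observed, p_value = nearest_gene_enrichment(toy_snps, toy_genes, {"A", "B"})
print(observed, p_value)
```

A window-based variant (all genes within 258 kb of the SNP) would replace the nearest-gene lookup with a range query; a realistic analysis would also need to handle multiple chromosomes and gene extents rather than single positions.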


Gene-Based Nonparametric Testing of Interactions Using Distance Correlation Coefficient in Case-Control Association Studies

Genes ◽ 10.3390/genes9120608 ◽ 2018 ◽ Vol 9 (12) ◽ pp. 608

Author(s): Yingjie Guo ◽ Chenxi Wu ◽ Maozu Guo ◽ Xiaoyan Liu ◽ Alon Keinan

Keyword(s): Correlation Coefficient ◽ Statistical Power ◽ Association Studies ◽ Gene Interaction ◽ P Value ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Real World Data ◽ Distance Correlation ◽ The Difference

Among the various statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies (GWAS), gene-based methods have recently grown in popularity because they confer advantages in both statistical power and biological interpretability. However, most of these methods make strong assumptions about the form of the relationship between traits and single-nucleotide polymorphisms, which result in limited statistical power. In this paper, we propose a gene-based method based on the distance correlation coefficient called gene-based gene-gene interaction via distance correlation coefficient (GBDcor). The distance correlation (dCor) is a measurement of the dependency between two random vectors with arbitrary, and not necessarily equal, dimensions. We used the difference in dCor in case and control datasets as an indicator of gene–gene interaction, which was based on the assumption that the joint distribution of two genes in case subjects and in control subjects should not be significantly different if the two genes do not interact. We designed a permutation-based statistical test to evaluate the difference between dCor in cases and controls for a pair of genes, and we provided the p-value for the statistic to represent the significance of the interaction between the two genes. In experiments with both simulated and real-world data, our method outperformed previous approaches in detecting interactions accurately.
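
The idea of comparing dCor between cases and controls can be sketched in plain Python. The sample distance correlation below follows the standard double-centering definition; the absolute difference used as the test statistic, the label-permutation scheme, and all names and data are assumptions for illustration and may differ from GBDcor's actual procedure.

```python
import math
import random


def _centered_distances(rows):
    """Double-centered Euclidean distance matrix of the sample rows."""
    n = len(rows)
    d = [[math.dist(a, b) for b in rows] for a in rows]
    row_mean = [sum(r) / n for r in d]  # equals column means (symmetry)
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def dcor(x, y):
    """Sample distance correlation between two sets of vectors.

    x and y hold one vector per subject; their dimensions may differ.
    """
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def dcor_difference_test(gene1, gene2, is_case, n_perm=100, seed=0):
    """Permutation p-value for |dCor(cases) - dCor(controls)|."""
    def statistic(labels):
        case = [i for i, c in enumerate(labels) if c]
        ctrl = [i for i, c in enumerate(labels) if not c]
        d_case = dcor([gene1[i] for i in case], [gene2[i] for i in case])
        d_ctrl = dcor([gene1[i] for i in ctrl], [gene2[i] for i in ctrl])
        return abs(d_case - d_ctrl)

    observed = statistic(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between status and genotypes
        if statistic(labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical genotypes: gene 1 has 2 SNPs, gene 2 has 3 SNPs, 40 subjects.
rng = random.Random(7)
gene1 = [tuple(rng.randint(0, 2) for _ in range(2)) for _ in range(40)]
gene2 = [tuple(rng.randint(0, 2) for _ in range(3)) for _ in range(40)]
is_case = [i < 20 for i in range(40)]
observed, p_value = dcor_difference_test(gene1, gene2, is_case)
print(observed, p_value)
```

This pure-Python version takes quadratic time and memory per dCor evaluation, which is acceptable for a sketch but would need a vectorized implementation for genome-wide scans over many gene pairs.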


Association Analysis of Candidate Variants in Admixed Brazilian Patients With Genetic Generalized Epilepsies

Frontiers in Genetics ◽ 10.3389/fgene.2021.672304 ◽ 2021 ◽ Vol 12

Author(s): Felipe S. Kaibara ◽ Tânia K. de Araujo ◽ Patricia A. O. R. A. Araujo ◽ Marina K. M. Alvim ◽ Clarissa L. Yasuda ◽ ...

Keyword(s): Native Americans ◽ Statistical Power ◽ Association Studies ◽ Snp Array ◽ Absence Epilepsy ◽ Candidate Snps ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Clonic Seizures ◽ Candidate Regions

Genetic generalized epilepsies (GGEs) include well-established epilepsy syndromes with generalized onset seizures: childhood absence epilepsy, juvenile myoclonic epilepsy (JME), juvenile absence epilepsy (JAE), myoclonic absence epilepsy, epilepsy with eyelid myoclonia (Jeavons syndrome), generalized tonic–clonic seizures, and generalized tonic–clonic seizures alone. Genome-wide association studies (GWASs) and exome sequencing have identified 48 single-nucleotide polymorphisms (SNPs) associated with GGE. However, these studies were mainly based on non-admixed, European, and Asian populations. Thus, it remains unclear whether these results apply to patients of other origins. This study aims to evaluate whether these previous results could be replicated in a cohort of admixed Brazilian patients with GGE. We obtained SNP-array data from 87 patients with GGE, compared with 340 controls from the BIPMed public dataset. We could directly access genotypes of 17 candidate SNPs, available in the SNP array, and the remaining 31 SNPs were imputed using the BEAGLE v5.1 software. We performed an association test by logistic regression analysis, including the first five principal components as covariates. Furthermore, to expand the analysis of the candidate regions, we also interrogated 14,047 SNPs that flank the candidate SNPs (1 Mb). The statistical power was evaluated in terms of odds ratio and minor allele frequency (MAF) by the genpwr package. Differences in SNP frequencies between Brazilian and Europeans, sub-Saharan African, and Native Americans were evaluated by a two-proportion Z-test. We identified nine flanking SNPs, located on eight candidate regions, which presented association signals that passed the Bonferroni correction (rs12726617; rs9428842; rs1915992; rs1464634; rs6459526; rs2510087; rs9551042; rs9888879; and rs8133217; p-values <3.55e–06). 
In addition, the two-proportion Z-test indicates that the lack of association of the remaining candidate SNPs could be due to different genomic backgrounds observed in admixed Brazilians. This is the first time that candidate SNPs for GGE are analyzed in an admixed Brazilian population, and we could successfully replicate the association signals in eight candidate regions. In addition, our results provide new insights on how we can account for population structure to improve risk stratification estimation in admixed individuals.
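
Two of the calculations mentioned in this abstract, the two-proportion Z-test and the Bonferroni threshold, can be sketched with the standard library alone. The allele counts in the example are hypothetical, and dividing 0.05 by the 14,047 flanking SNPs is an assumption about how the reported threshold was obtained; the abstract does not state the exact denominator.

```python
import math


def two_proportion_z_test(count1, n1, count2, n2):
    """Two-sided pooled z-test of H0: the two proportions are equal.

    Returns the z statistic and its two-sided normal p-value.
    """
    p1, p2 = count1 / n1, count2 / n2
    pooled = (count1 + count2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|)
    return z, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical allele counts for one SNP in two populations
# (carriers of the minor allele out of sampled chromosomes).
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p:.4g}")

# Assumed denominator: the 14,047 flanking SNPs named in the abstract.
threshold = bonferroni_threshold(0.05, 14047)
print(f"Bonferroni threshold = {threshold:.3g}")
```

Under that assumption the threshold evaluates to about 3.56e-06, close to the 3.55e-06 quoted in the abstract; the small gap suggests the authors may have truncated the value or used a slightly different number of tests.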


CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS

10.1101/2021.11.05.467462 ◽ 2021

Author(s): Hector Roux de Bezieux ◽ Leandro Lima ◽ Fanny Perraudeau ◽ Arnaud Mary ◽ Sandrine Dudoit ◽ ...

Keyword(s): Statistical Power ◽ Association Studies ◽ Bacterial Species ◽ De Bruijn Graph ◽ Testable Hypothesis ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ A Genome ◽ De Bruijn ◽ Connected Subgraphs

Genome wide association studies (GWAS), aiming to find genetic variants associated with a trait, have widely been used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single nucleotide polymorphisms to mobile genetic elements. Since many bacterial species include genes that are not shared among all strains, this approach avoids the reliance on a common reference genome. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects when trying to detect its association to a phenotype through k-mer based GWAS. Here we propose to overcome this by testing covariates built from closed connected subgraphs of the De Bruijn graph defined over genomic k-mers. These covariates are able to capture polymorphic genes as a single entity, improving k-mer based GWAS in terms of power and interpretability. As the number of subgraphs is exponential in the number of nodes in the DBG, a method naively testing all possible subgraphs would result in very low statistical power due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypothesis has successfully been used to address both problems in similar contexts. We leverage this concept to test all closed connected subgraphs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. We illustrate this on both real and simulated datasets and also demonstrate how considering subgraphs leads to a more powerful and interpretable method. Our method integrates with existing visual tools to facilitate interpretation. 
We also provide an implementation of our method, as well as code to reproduce all results at https://github.com/HectorRDB/Caldera_Recomb.


EpiGEN: an epistasis simulation pipeline

Bioinformatics ◽ 10.1093/bioinformatics/btaa245 ◽ 2020 ◽ Vol 36 (19) ◽ pp. 4957-4959

Author(s): David B Blumenthal ◽ Lorenzo Viola ◽ Markus List ◽ Jan Baumbach ◽ Paolo Tieri ◽ ...

Keyword(s): Arbitrary Order ◽ Association Studies ◽ Simulated Data ◽ Genome Wide Association ◽ Supplementary Information ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Supplementary Data ◽ Single Nucleotide ◽ Genome Wide

Summary: Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited: they do not account for linkage disequilibrium (LD), support only limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes, or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes.

Availability and implementation: EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen.

Supplementary information: Supplementary data are available at Bioinformatics online.


Modeling diseases in multiple mouse strains for precision medicine studies

Physiological Genomics ◽ 10.1152/physiolgenomics.00123.2016 ◽ 2017 ◽ Vol 49 (3) ◽ pp. 177-179 ◽ Cited By ~ 3

Author(s): Andrés D. Klein

Keyword(s): Statistical Power ◽ Association Studies ◽ Phenotypic Variability ◽ Inbred Strains ◽ Mouse Strains ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Single Nucleotide ◽ Genome Wide ◽ Number Of Patients

The genetic basis of the phenotypic variability observed in patients can be studied in mice by generating disease models through genetic or chemical interventions in many genetic backgrounds, where the clinical phenotypes can be assessed and used for genome-wide association studies (GWAS). This is particularly relevant for rare disorders, where patients sharing identical mutations can present with a wide variety of symptoms but there are not enough patients to ensure the statistical power of GWAS. Inbred strains are homozygous at each locus, and their single nucleotide polymorphism catalogs are known and freely available, which facilitates the bioinformatics and reduces the cost of the study, since it is not necessary to genotype every mouse. This kind of approach can be applied to pharmacogenomics studies as well.


Statistical power in genome-wide association studies and quantitative trait locus mapping

Heredity ◽ 10.1038/s41437-019-0205-3 ◽ 2019 ◽ Vol 123 (3) ◽ pp. 287-306 ◽ Cited By ~ 7

Author(s): Meiyue Wang ◽ Shizhong Xu

Keyword(s): Quantitative Trait Locus ◽ Quantitative Trait Locus Mapping ◽ Quantitative Trait ◽ Statistical Power ◽ Association Studies ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Genome Wide ◽ Trait Locus ◽ Locus Mapping


Molecular Characterization of Global Finger Millet (Eleusine coracana, L. Gaertn) germplasm Reaction to Striga in Kenya

Asian Journal of Biochemistry, Genetics and Molecular Biology ◽ 10.9734/ajbgmb/2018/v1i2493 ◽ 2018 ◽ pp. 1-14

Author(s): Sirengo Peter Nyongesa ◽ Wamalwa Dennis Simiyu ◽ Oduor Chrispus ◽ Odeny Damaris Achieng ◽ Dangasuk Otto George

Keyword(s): Finger Millet ◽ Association Studies ◽ Block Design ◽ Principal Component ◽ Genotyping By Sequencing ◽ Eleusine Coracana ◽ Striga Hermonthica ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Striga Resistance

Finger millet (Eleusine coracana, L. Gaertn) is an important food crop in Africa and Asia. The parasitic weed Striga hermonthica (Del.) Benth limits finger millet production by reducing yield in the agro-ecologies where it occurs. The damage caused by Striga to cereal crops is more severe under drought and low soil fertility. This study aims to determine the genetic basis for reaction to Striga hermonthica among selected finger millet germplasm through genotyping by sequencing (GBS). One hundred finger millet genotypes were evaluated for reaction to Striga hermonthica infestation under field conditions at Alupe and Kibos in Western Kenya. The experiment was laid out in a randomized complete block design (RCBD) consisting of a 10 × 10 square (triple lattice) under Striga (inoculated) and no-Striga conditions, and plant growth was monitored to maturity after 110 days. All genotypes were genotyped by GBS, and the data were analyzed using the non-reference-based Universal Network Enabled Analysis Kit (UNEAK) pipeline. Genome-wide association studies (GWAS) were done to establish the association of detected single nucleotide polymorphisms (SNPs) with Striga reaction based on field results. In the molecular analysis, GWAS using 117,542 SNPs from the raw GBS data revealed that markers TP 85424 and TP 88244 were associated with Striga resistance in the 95 genotypes. Principal component analysis revealed that the first and third component axes accounted for 2.5% and 8% of the total variance, respectively, and the genotypes were distributed according to their reaction to the Striga weed. Genetic diversity analysis grouped the 95 accessions into three major clusters containing 32 (A), 56 (B), and 7 (C) genotypes. All finger millet genotypes that showed high resistance to Striga in the field were from cluster B, while the most susceptible genotypes were from clusters A and C.
The results revealed genetic variation for Striga resistance in cultivated finger millet genotypes and hence the possibility of marker-assisted breeding for resistance to Striga.


Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

10.1101/2020.07.02.184028 ◽ 2020

Author(s): Robin N. Beaumont ◽ Isabelle K. Mayne ◽ Rachel M. Freathy ◽ Caroline F. Wright

Keyword(s): Birth Weight ◽ Statistical Power ◽ Developmental Disorders ◽ Association Studies ◽ Later Life ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Fetal Effect ◽ Common Genetic Variants ◽ Causal Genes

Birth weight is an important factor in newborn and infant survival, and both low and high birth weights are associated with adverse later life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with either maternal or fetal effects on birth weight. Knowledge of the underlying causal genes and pathways is crucial to understand how these loci influence birth weight, and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme upper or lower ends of the normal distribution, and genes implicated in those syndromes may provide valuable information to help prioritise candidate genes at GWAS loci. We examined the proximity of genes implicated in developmental disorders to birth weight GWAS loci at which a fetal effect is either likely or cannot be ruled out. We used simulations to test whether those genes fall disproportionately close to the GWAS loci. We found that birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected by chance. This is the case both when the developmental disorder gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes that cause monogenic developmental disorders with dominant modes of inheritance. We found several examples of SNPs located in the intron of one gene that mark plausible effects via different nearby genes implicated in monogenic short stature, highlighting that the closest gene to the SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight loci, and it has helped identify GWAS loci likely to have direct fetal effects on birth weight that could not previously be classified as fetal or maternal due to insufficient statistical power.
