CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS

Genome-wide association studies (GWAS), which aim to find genetic variants associated with a trait, have been widely used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single nucleotide polymorphisms to mobile genetic elements. Since many bacterial species include genes that are not shared among all strains, this approach avoids the reliance on a common reference genome. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects when trying to detect its association with a phenotype through k-mer based GWAS. Here we propose to overcome this by testing covariates built from closed connected subgraphs of the de Bruijn graph (DBG) defined over genomic k-mers. These covariates are able to capture polymorphic genes as a single entity, improving k-mer based GWAS in terms of power and interpretability. As the number of subgraphs is exponential in the number of nodes in the DBG, a method naively testing all possible subgraphs would result in very low statistical power due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has successfully been used to address both problems in similar contexts. We leverage this concept to test all closed connected subgraphs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. We illustrate this on both real and simulated datasets and also demonstrate how considering subgraphs leads to a more powerful and interpretable method. Our method integrates with existing visual tools to facilitate interpretation. We also provide an implementation of our method, as well as code to reproduce all results, at https://github.com/HectorRDB/Caldera_Recomb.
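
The testability idea mentioned in the abstract can be illustrated with a small sketch. It assumes a one-sided Fisher's exact test as a stand-in, since the abstract does not name the test used, and the function names `min_attainable_pvalue` and `is_testable` are hypothetical. A presence/absence pattern whose smallest attainable p-value already exceeds the corrected threshold can never be significant, so it can be skipped and left out of the multiple testing correction.

```python
from math import comb


def min_attainable_pvalue(x: int, n_cases: int, n: int) -> float:
    """Smallest one-sided Fisher p-value for a pattern present in x of n genomes.

    Assumption: enrichment in cases is tested with Fisher's exact test. This is
    an illustrative stand-in, not necessarily the test used by the method.
    """
    if not 0 <= x <= n or not 0 <= n_cases <= n:
        raise ValueError("need 0 <= x <= n and 0 <= n_cases <= n")
    # Most extreme table: as many carriers as possible are cases.
    a = min(x, n_cases)
    # Hypergeometric probability of that single most extreme table.
    return comb(n_cases, a) * comb(n - n_cases, x - a) / comb(n, x)


def is_testable(x: int, n_cases: int, n: int, threshold: float) -> bool:
    """A pattern is testable if its best-case p-value can reach the threshold."""
    return min_attainable_pvalue(x, n_cases, n) <= threshold
```

For example, with 10 genomes of which 5 are cases, a pattern present in only 2 genomes has a minimum p-value of 10/45 (about 0.22), so it cannot be significant at 0.05 however its carriers are distributed.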

Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics ◽

10.1093/hmg/ddab060 ◽

2021 ◽

Author(s):

Robin N Beaumont ◽

Isabelle K Mayne ◽

Rachel M Freathy ◽

Caroline F Wright

Keyword(s):

Birth Weight ◽

Statistical Power ◽

Developmental Disorders ◽

Association Studies ◽

Later Life ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Common Genetic Variants ◽

Causal Genes

Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found that birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected, both when the DD gene is the nearest gene to the birth weight SNP and when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting that the gene closest to a SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight; it has helped identify GWAS loci likely to have direct fetal effects on birth weight that could not previously be classified as fetal or maternal owing to insufficient statistical power.

Gene-Based Nonparametric Testing of Interactions Using Distance Correlation Coefficient in Case-Control Association Studies

Genes ◽

10.3390/genes9120608 ◽

2018 ◽

Vol 9 (12) ◽

pp. 608

Author(s):

Yingjie Guo ◽

Chenxi Wu ◽

Maozu Guo ◽

Xiaoyan Liu ◽

Alon Keinan

Keyword(s):

Correlation Coefficient ◽

Statistical Power ◽

Association Studies ◽

Gene Interaction ◽

P Value ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Real World Data ◽

Distance Correlation ◽

The Difference

Among the various statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies (GWAS), gene-based methods have recently grown in popularity because they confer advantages in both statistical power and biological interpretability. However, most of these methods make strong assumptions about the form of the relationship between traits and single-nucleotide polymorphisms, which result in limited statistical power. In this paper, we propose a gene-based method based on the distance correlation coefficient called gene-based gene-gene interaction via distance correlation coefficient (GBDcor). The distance correlation (dCor) is a measurement of the dependency between two random vectors with arbitrary, and not necessarily equal, dimensions. We used the difference in dCor in case and control datasets as an indicator of gene–gene interaction, which was based on the assumption that the joint distribution of two genes in case subjects and in control subjects should not be significantly different if the two genes do not interact. We designed a permutation-based statistical test to evaluate the difference between dCor in cases and controls for a pair of genes, and we provided the p-value for the statistic to represent the significance of the interaction between the two genes. In experiments with both simulated and real-world data, our method outperformed previous approaches in detecting interactions accurately.
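
As a rough illustration of the statistic described above, the following sketch computes the sample distance correlation between two sets of paired vectors and the case-control difference used as the interaction indicator. It is a minimal pure-Python version written for clarity rather than speed; the function names are hypothetical, the absolute difference is an assumed choice of sign convention, and the permutation test from the abstract is not reproduced here.

```python
from math import dist, sqrt


def _centered_distances(points):
    """Pairwise Euclidean distances, double-centered by row, column and grand mean."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def distance_correlation(xs, ys):
    """Sample dCor between paired observations xs[i], ys[i] (tuples of any dimension)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 observations")
    n = len(xs)
    a, b = _centered_distances(xs), _centered_distances(ys)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    denom = sqrt(dvar_x * dvar_y)
    # Guard against a constant sample and tiny negative rounding errors.
    return sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def dcor_difference(case_g1, case_g2, ctrl_g1, ctrl_g2):
    """Absolute difference in dCor between cases and controls for one gene pair."""
    return abs(distance_correlation(case_g1, case_g2)
               - distance_correlation(ctrl_g1, ctrl_g2))
```

Here each observation would be one subject's genotype vector for a gene, so the two genes may contain different numbers of SNPs.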

Association of CNVs with methylation variation

npj Genomic Medicine ◽

10.1038/s41525-020-00145-w ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Xinghua Shi ◽

Saranya Radhakrishnan ◽

Jia Wen ◽

Jin Yun Chen ◽

Junjie Chen ◽

...

Keyword(s):

Association Studies ◽

Copy Number Variants ◽

Cpg Methylation ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Cellular Phenotype ◽

Genome Wide ◽

A Genome ◽

Physical Interactions ◽

Trait Locus

Germline copy number variants (CNVs) and single-nucleotide polymorphisms (SNPs) form the basis of inter-individual genetic variation. Although the phenotypic effects of SNPs have been extensively investigated, the effects of CNVs are less well understood. To better characterize mechanisms by which CNVs affect cellular phenotype, we tested their association with variable CpG methylation in a genome-wide manner. Using paired CNV and methylation data from the 1000 Genomes and HapMap projects, we identified genome-wide associations by methylation quantitative trait locus (mQTL) analysis. We found individual CNVs associated with methylation of multiple CpGs and vice versa. CNV-associated methylation changes were correlated with gene expression. CNV-mQTLs were enriched for regulatory regions and transcription factor-binding sites (TFBSs), and were involved in long-range physical interactions with associated CpGs. Some CNV-mQTLs were associated with methylation of imprinted genes. Several CNV-mQTLs and/or associated genes were among those previously reported by genome-wide association studies (GWASs). We demonstrate that germline CNVs in the genome are associated with CpG methylation. Our findings suggest that structural variation together with methylation may affect cellular phenotype.

Powerful Tukey's One Degree-of-Freedom Test for Detecting Gene-Gene and Gene-Environment Interactions

Cancer Informatics ◽

10.4137/cin.s17305 ◽

2015 ◽

Vol 14s2 ◽

pp. CIN.S17305 ◽

Cited By ~ 1

Author(s):

Yaping Wang ◽

Donghui Li ◽

Peng Wei

Keyword(s):

Statistical Power ◽

Association Studies ◽

Score Test ◽

Principal Component ◽

Case Control ◽

Degree Of Freedom ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Missing Heritability ◽

Gene Environment

Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of GWAS-identified genetic loci explains only a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to yet-to-be-identified gene-gene (G × G) and gene-environment (G × E) interactions. Despite the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey's one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey's 1-df interaction test. Using extensive simulations we show that the proposed approach, which combines two parsimonious models, namely PCA and Tukey's 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn's disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case-control study of pancreatic cancer.

Association Analysis of Candidate Variants in Admixed Brazilian Patients With Genetic Generalized Epilepsies

Frontiers in Genetics ◽

10.3389/fgene.2021.672304 ◽

2021 ◽

Vol 12 ◽

Author(s):

Felipe S. Kaibara ◽

Tânia K. de Araujo ◽

Patricia A. O. R. A. Araujo ◽

Marina K. M. Alvim ◽

Clarissa L. Yasuda ◽

...

Keyword(s):

Native Americans ◽

Statistical Power ◽

Association Studies ◽

Snp Array ◽

Absence Epilepsy ◽

Candidate Snps ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Clonic Seizures ◽

Candidate Regions

Genetic generalized epilepsies (GGEs) include well-established epilepsy syndromes with generalized onset seizures: childhood absence epilepsy, juvenile myoclonic epilepsy (JME), juvenile absence epilepsy (JAE), myoclonic absence epilepsy, epilepsy with eyelid myoclonia (Jeavons syndrome), generalized tonic–clonic seizures, and generalized tonic–clonic seizures alone. Genome-wide association studies (GWASs) and exome sequencing have identified 48 single-nucleotide polymorphisms (SNPs) associated with GGE. However, these studies were mainly based on non-admixed, European, and Asian populations. Thus, it remains unclear whether these results apply to patients of other origins. This study aims to evaluate whether these previous results could be replicated in a cohort of admixed Brazilian patients with GGE. We obtained SNP-array data from 87 patients with GGE and compared them with 340 controls from the BIPMed public dataset. We could directly access genotypes of 17 candidate SNPs available in the SNP array, and the remaining 31 SNPs were imputed using the BEAGLE v5.1 software. We performed an association test by logistic regression analysis, including the first five principal components as covariates. Furthermore, to expand the analysis of the candidate regions, we also interrogated 14,047 SNPs that flank the candidate SNPs (1 Mb). The statistical power was evaluated in terms of odds ratio and minor allele frequency (MAF) using the genpwr package. Differences in SNP frequencies between Brazilians and Europeans, sub-Saharan Africans, and Native Americans were evaluated by a two-proportion Z-test. We identified nine flanking SNPs, located in eight candidate regions, which presented association signals that passed the Bonferroni correction (rs12726617; rs9428842; rs1915992; rs1464634; rs6459526; rs2510087; rs9551042; rs9888879; and rs8133217; p-values <3.55e-06). In addition, the two-proportion Z-test indicates that the lack of association of the remaining candidate SNPs could be due to the different genomic backgrounds observed in admixed Brazilians. This is the first time that candidate SNPs for GGE have been analyzed in an admixed Brazilian population, and we could successfully replicate the association signals in eight candidate regions. In addition, our results provide new insights into how we can account for population structure to improve risk stratification in admixed individuals.
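
Two of the calculations mentioned in this abstract are easy to sketch. The reported cutoff is consistent with a Bonferroni correction of a 0.05 family-wise level over the 14,047 flanking SNPs, although the abstract does not state the level explicitly, so that is an assumption here. The z-test below is the standard pooled two-proportion test; the function names are hypothetical.

```python
from math import erfc, sqrt


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def two_proportion_z_test(k1: int, n1: int, k2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    k1 and k2 are allele counts (successes) out of n1 and n2 observations.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return z, erfc(abs(z) / sqrt(2))


# Assumed alpha of 0.05 over 14,047 tests gives roughly 3.56e-06.
threshold = bonferroni_threshold(0.05, 14047)
```

Under that assumption the threshold is 0.05 / 14,047, which lies between 3.55e-06 and 3.56e-06 and so agrees with the cutoff quoted in the abstract.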

PCA-Based Multiple-Trait GWAS Analysis: A Powerful Model for Exploring Pleiotropy

Animals ◽

10.3390/ani8120239 ◽

2018 ◽

Vol 8 (12) ◽

pp. 239 ◽

Cited By ~ 4

Author(s):

Wengang Zhang ◽

Xue Gao ◽

Xinping Shi ◽

Bo Zhu ◽

Zezhao Wang ◽

...

Keyword(s):

Quantitative Trait ◽

Statistical Power ◽

Muscle Development ◽

Association Studies ◽

Simulated Data ◽

Principal Component ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Multiple Trait ◽

Gwas Analysis

Principal component analysis (PCA) is a potential approach that can be applied in multiple-trait genome-wide association studies (GWAS) to explore pleiotropy, as well as increase the power of quantitative trait loci (QTL) detection. In this study, the relationship of test single nucleotide polymorphisms (SNPs) was determined between single-trait GWAS and PCA-based GWAS. We found that the estimated pleiotropic quantitative trait nucleotides (QTNs) $\hat{\beta}^{*}$ were in most cases larger than the single-trait model estimations ($\hat{\beta}_{1}$ and $\hat{\beta}_{2}$). Analysis using the simulated data showed that PCA-based multiple-trait GWAS has improved statistical power for detecting QTL compared to single-trait GWAS. For the minor allele frequency (MAF), when the MAF of QTNs was greater than 0.2, the PCA-based model had a significant advantage in detecting the pleiotropic QTNs, but when its MAF was reduced from 0.2 to 0, the advantage began to disappear. In addition, as the linkage disequilibrium (LD) of the pleiotropic QTNs decreased, its detection ability declined in the co-localization effect model. Furthermore, on the real data of 1141 Simmental cattle, we applied the PCA model to the multiple-trait GWAS analysis and identified a QTL that was consistent with a candidate gene, MCHR2, which was associated with presoma muscle development in cattle. In summary, PCA-based multiple-trait GWAS is an efficient model for exploring pleiotropic QTNs in quantitative traits.

Subtype-specific gout susceptibility loci and enrichment of selection pressure on ABCG2 and ALDH2 identified by subtype genome-wide meta-analyses of clinically defined gout patients

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2019-216644 ◽

2020 ◽

Vol 79 (5) ◽

pp. 657-665 ◽

Cited By ~ 1

Author(s):

Akiyoshi Nakayama ◽

Masahiro Nakatochi ◽

Yusuke Kawamura ◽

Ken Yamamoto ◽

Hirofumi Nakaoka ◽

...

Keyword(s):

Selection Pressure ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Susceptibility Loci ◽

Normal Type ◽

Genome Wide ◽

A Genome ◽

Meta Analyses ◽

Pressure Analysis

Objectives: Genome-wide meta-analyses of clinically defined gout were performed to identify subtype-specific susceptibility loci. Evaluation using selection pressure analysis with these loci was also conducted to investigate genetic risks characteristic of the Japanese population over the last 2000–3000 years. Methods: Two genome-wide association studies (GWASs) of 3053 clinically defined gout cases and 4554 controls from Japanese males were performed using the Japonica Array and Illumina Array platforms. About 7.2 million single-nucleotide polymorphisms were meta-analysed after imputation. Patients were then divided into four clinical subtypes (the renal underexcretion type, renal overload type, combined type and normal type), and meta-analyses were conducted in the same manner. Selection pressure analyses using singleton density score were also performed on each subtype. Results: In addition to the eight loci we reported previously, two novel loci, PIBF1 and ACSM2B, were identified at a genome-wide significance level (p<5.0×10⁻⁸) from a GWAS meta-analysis of all gout patients, and two other novel intergenic loci, CD2-PTGFRN and SLC28A3-NTRK2, from normal type gout patients. Subtype-dependent patterns of Manhattan plots were observed with subtype GWASs of gout patients, indicating that these subtype-specific loci suggest differences in pathophysiology across patients’ gout subtypes. Selection pressure analysis revealed significant enrichment of selection pressure on ABCG2 in addition to ALDH2 loci for all subtypes except for normal type gout. Conclusions: Our findings on subtype GWAS meta-analyses and selection pressure analysis of gout will assist elucidation of the subtype-dependent molecular targets and evolutionary involvement among genotype, phenotype and subtype-specific tailor-made medicine/prevention of gout and hyperuricaemia.

Modeling diseases in multiple mouse strains for precision medicine studies

Physiological Genomics ◽

10.1152/physiolgenomics.00123.2016 ◽

2017 ◽

Vol 49 (3) ◽

pp. 177-179 ◽

Cited By ~ 3

Author(s):

Andrés D. Klein

Keyword(s):

Statistical Power ◽

Association Studies ◽

Phenotypic Variability ◽

Inbred Strains ◽

Mouse Strains ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Number Of Patients

The genetic basis of the phenotypic variability observed in patients can be studied in mice by generating disease models through genetic or chemical interventions in many genetic backgrounds, where the clinical phenotypes can be assessed and used for genome-wide association studies (GWAS). This is particularly relevant for rare disorders, where patients sharing identical mutations can present with a wide variety of symptoms but there are too few patients to ensure the statistical power of a GWAS. Inbred strains are homozygous at each locus, and their single nucleotide polymorphism catalogs are known and freely available, which facilitates the bioinformatics and reduces the cost of the study, since it is not necessary to genotype every mouse. This kind of approach can be applied to pharmacogenomics studies as well.

A Genome-Wide Association Study of Novel Genetic Variants Associated With Anthropometric Traits in Koreans

Frontiers in Genetics ◽

10.3389/fgene.2021.669215 ◽

2021 ◽

Vol 12 ◽

Author(s):

Hye-Won Cho ◽

Hyun-Seok Jin ◽

Yong-Bin Eom

Keyword(s):

Genetic Variants ◽

Genetic Factors ◽

Genome Wide Association Study ◽

Association Studies ◽

Fat Distribution ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

A Genome

Most previous genome-wide association studies (GWAS) have identified genetic variants associated with anthropometric traits. However, most of the evidence was reported in European populations. Anthropometric traits such as height and body fat distribution are significantly affected by gender and genetic factors. Here we performed GWAS involving 64,193 Koreans to identify the genetic factors associated with anthropometric phenotypes including height, weight, body mass index, waist circumference, hip circumference, and waist-to-hip ratio. We found nine novel single-nucleotide polymorphisms (SNPs) and 59 independent genetic signals in genomic regions that were reported previously. Of the 19 SNPs reported previously, eight genetic variants at RP11-513I15.6 and one genetic variant at the RP11-977G19.10 region and six Asian-specific genetic variants were newly found. We compared our findings with those of previous studies in other populations. Five overlapping genetic regions (PAN2, ANKRD52, RNF41, HGMA1, and C6orf106) had been reported previously, but none of the SNPs were independently identified in the current study. Seven of the nine novel loci associated with height in women revealed a statistically significant skeletal expression of quantitative trait loci. Our study provides additional insight into the genetic effects of anthropometric phenotypes in East Asians.

Genome-Wide Association Studies of Somatic Cell Count in the Assaf Breed

Animals ◽

10.3390/ani11061531 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1531

Author(s):

Yasemin Öner ◽

Malena Serrano ◽

Pilar Sarto ◽

Laura Pilar Iguácel ◽

María Piquer-Sabanza ◽

...

Keyword(s):

Somatic Cell ◽

Genome Wide Association Study ◽

Association Studies ◽

Significant Snps ◽

Genome Wide Association ◽

System Response ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

A Genome

A genome-wide association study (GWAS) was performed to identify new single nucleotide polymorphisms (SNPs) and genes associated with mastitis resistance in Assaf sheep by using the Illumina Ovine Infinium® HD SNP BeadChip (680K). In total, 6173 records from 1894 multiparous Assaf ewes with at least three test day records and aged between 2 and 7 years were used to estimate a corrected phenotype for somatic cell score (SCS). Then, 192 ewes were selected from the top (n = 96) and bottom (n = 96) tails of the corrected SCS phenotype distribution to be used in a GWAS. Although no significant SNPs were found at the genome level, four SNPs (rs419096188, rs415580501, rs410336647, and rs424642424) were significant at the chromosome level (FDR 10%) in two different regions of OAR19. The SNP rs419096188 was located in intron 1 of the NUP210 gene and close to the HDAC11 gene (61 kb away), while the other three SNPs were in complete linkage and located 171 kb from the ARPP21 gene. These three genes are related to the immune system response. These results were validated for two SNPs (rs419096188 and rs424642424) in the total population (n = 1894) by Kompetitive Allele-Specific PCR (KASP) genotyping. Furthermore, rs419096188 was also associated with lactose content.
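
The abstract reports chromosome-level significance at an FDR of 10% without naming the procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name and the p-values in the example are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues, q):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Assumption: the Benjamini-Hochberg step-up procedure, which may differ
    from the FDR method actually used in the study.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is under its line.
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for the SNPs of one chromosome.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example, q=0.05)
```

Applying the correction per chromosome rather than genome-wide reduces the number of tests `m`, which is why SNPs can pass at the chromosome level while none pass at the genome level.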
