Cross-Species association statistics for genome-wide studies of host and parasite polymorphism data

2019 ◽  
Author(s):  
Hanna Märkle ◽  
Aurélien Tellier ◽  
Sona John

Abstract Uncovering the genes governing host-parasite coevolution is important for disease management in agriculture and human medicine. The increasing availability of host and parasite full-genome data makes it possible to perform cross-species genome-wide association studies based on sampling genomic data from infected hosts and their associated parasite strains. We aim to understand the statistical power of such approaches. We develop two indices, the cross species association (CSA) and the cross species prevalence (CSP), the latter additionally incorporating genomic data from uninfected hosts. For both indices, we derive genome-wide significance thresholds by computing their expected distribution over unlinked neutral loci, i.e. those not involved in determining the outcome of the interaction. Using a population genetics model and an epidemiological coevolutionary model, we demonstrate that the statistical power of these indices to pinpoint the interacting loci in full-genome data varies over time. This is due to the underlying GxG interactions and the coevolutionary dynamics. Under trench-warfare dynamics, CSA and CSP are very accurate in identifying the loci under coevolution, while under arms-race dynamics the power is limited, especially under a gene-for-gene interaction. Furthermore, we reveal that the combination of both indices across time samples can be used to estimate the asymmetry of the underlying infection matrix. Our results provide novel insights into the power and biological interpretation of cross-species association studies using samples from natural populations or controlled experiments.

2021 ◽  
Author(s):  
Robin N Beaumont ◽  
Isabelle K Mayne ◽  
Rachel M Freathy ◽  
Caroline F Wright

Abstract Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci, using simulations to test whether they fall disproportionately close to the GWAS loci. We found that birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected, both when the DD gene is the nearest gene to the birth weight SNP and when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting that the gene closest to the SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight. It has helped identify GWAS loci likely to have direct fetal effects on birth weight that could not previously be classified as fetal or maternal owing to insufficient statistical power.
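The abstract does not describe the simulation design, so the following is only a minimal sketch of one way a proximity enrichment test of this kind could be set up. It assumes positions on a single linear coordinate, counts SNPs that lie within a window of any DD gene, and builds a null distribution by repeatedly drawing random gene sets of the same size from all genes. The function names, the null model, and the toy data are hypothetical and are not taken from the paper.

```python
import random


def count_snps_near_genes(snp_positions, gene_positions, window):
    """Count SNPs lying within `window` bases of at least one gene."""
    return sum(
        any(abs(snp - gene) <= window for gene in gene_positions)
        for snp in snp_positions
    )


def enrichment_p_value(snp_positions, all_gene_positions, dd_gene_indices,
                       window, n_sim=1000, seed=0):
    """Empirical one-sided p-value for DD genes lying close to GWAS SNPs.

    Null model (an assumption of this sketch): the DD gene set is a random
    subset of all genes with the same size as the observed set.
    """
    rng = random.Random(seed)
    dd_positions = [all_gene_positions[i] for i in dd_gene_indices]
    observed = count_snps_near_genes(snp_positions, dd_positions, window)

    at_least_as_extreme = 0
    for _ in range(n_sim):
        null_set = rng.sample(all_gene_positions, len(dd_positions))
        if count_snps_near_genes(snp_positions, null_set, window) >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Toy data: two SNPs sit next to the two hypothetical DD genes.
    snps = [1_000, 50_000]
    genes = [1_100, 50_200, 200_000, 400_000, 600_000, 800_000]
    observed, p = enrichment_p_value(snps, genes, [0, 1], window=500)
    print(observed, p)
```

A real analysis would work per chromosome and would likely need a null model that accounts for gene density and gene length, which this sketch ignores.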


2019 ◽  
Vol 116 (4) ◽  
pp. 1195-1200 ◽  
Author(s):  
Daniel J. Wilson

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
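As a rough illustration of the statistic named in this abstract, the sketch below computes a weighted harmonic mean of a set of p-values, i.e. the sum of the weights divided by the sum of each weight over its p-value. The function name `harmonic_mean_p` and the default of equal weights are choices made for this example. The sketch covers only the raw statistic; the significance thresholds and calibration that the paper derives are not reproduced here.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p-values: sum(w) / sum(w / p).

    With no weights given, every test gets the same weight 1/L,
    where L is the number of p-values.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


if __name__ == "__main__":
    # One small p-value dominates the harmonic mean of the group.
    print(harmonic_mean_p([0.01, 1.0]))
    # Identical p-values are returned unchanged.
    print(harmonic_mean_p([0.5, 0.5]))
```

Because the harmonic mean is dominated by its smallest terms, a group containing one strong signal yields a small combined value even when the other tests are unremarkable, which is the property the abstract relies on when combining individually nonsignificant hypotheses.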


2010 ◽  
Vol 49 (06) ◽  
pp. 625-631
Author(s):  
H. Schäfer ◽  
B. H. Greene

Summary Background: Genome-wide association studies (GWAS) have been used successfully to identify genetic loci associated with complex diseases and phenotypes. Often this association takes the form of several significant signals (such as small p-values) in a univariate analysis at various markers within a single genetic region. Once confirmed, these associations raise the question of whether a single marker tags the association signal of another, functionally relevant variant or whether it tags a functionally relevant haplotype. To address this question, methods for family data based on logistic regression, adaptations of the transmission/disequilibrium test (TDT), or weighted haplotype likelihood (WHL) methods have been proposed in the literature. Objectives: To examine how parameters such as sample size, inheritance model, and linkage disequilibrium (LD) in the region affect the ability of a selection of methods to detect an independent effect from an additional locus. Methods: All methods tested were applied to simulated genetic data of trios comprising a single affected offspring and two parents. Results: While regression-based methods have advantages such as model flexibility, potentially increasing power, the WHL method was more robust against increasing LD in the scenarios analyzed. Conclusions: Simulation results suggest that the regression and WHL methods have greater statistical power than the adaptation of the TDT analyzed here to detect genetic effects at an additional locus while controlling for confounding at another locus.


BMC Biology ◽  
2014 ◽  
Vol 12 (1) ◽  
Author(s):  
Meng Li ◽  
Xiaolei Liu ◽  
Peter Bradbury ◽  
Jianming Yu ◽  
Yuan-Ming Zhang ◽  
...  

Science ◽  
2018 ◽  
Vol 360 (6395) ◽  
pp. eaap8757 ◽  
Author(s):  
Verneri Anttila ◽  
Brendan Bulik-Sullivan ◽  
Hilary K. Finucane ◽  
Raymond K. Walters ◽  
...  

Disorders of the brain can exhibit considerable epidemiological comorbidity and often share symptoms, provoking debate about their etiologic overlap. We quantified the genetic sharing of 25 brain disorders from genome-wide association studies of 265,218 patients and 784,643 control participants and assessed their relationship to 17 phenotypes from 1,191,588 individuals. Psychiatric disorders share common variant risk, whereas neurological disorders appear more distinct from one another and from the psychiatric disorders. We also identified significant sharing between disorders and a number of brain phenotypes, including cognitive measures. Further, we conducted simulations to explore how statistical power, diagnostic misclassification, and phenotypic heterogeneity affect genetic correlations. These results highlight the importance of common genetic variation as a risk factor for brain disorders and the value of heritability-based methods in understanding their etiology.


2021 ◽  
Author(s):  
Gaëlle Munsch ◽  
Louisa Goumidi ◽  
Astrid van Hylckama Vlieg ◽  
Manal Ibrahim-Kosta ◽  
Maria Bruzelius ◽  
...  

In time-to-event studies, it is common to collect information about events that occurred before inclusion in a prospective cohort. In an ambispective design, when the risk factors studied are independent of time, including both pre- and post-inclusion events in the analyses increases the statistical power but may lead to a selection bias. To avoid such a bias, we propose a survival analysis weighted by the inverse of the survival probability at the time of data collection about the events. This method is applied to the study of the association of ABO blood groups with the risk of venous thromboembolism (VT) recurrence in the MARTHA and MEGA cohorts, the former relying on an ambispective design and the latter on a standard prospective one. In the combined sample totalling 2,752 patients including 993 recurrences, compared with the O1 group, A1 has an increased risk (Hazard Ratio (HR) of 1.18, p = 4.2 × 10⁻³), homogeneously in MARTHA and in MEGA. The same trend (HR = 1.19, p = 0.06) was observed for the less frequent A2 group. In conclusion, this work clarified the association of ABO blood groups with the risk of VT recurrence. In addition, the methodology proposed here to analyse time-independent risk factors of events in an ambispective design has an immediate field of application in the context of genome-wide association studies.

