Improving statistical power in severe malaria genetic association studies by augmenting phenotypic precision

Severe falciparum malaria has substantially affected human evolution. Genetic association studies of patients with clinically defined severe malaria and matched population controls have helped characterise human genetic susceptibility to severe malaria, but phenotypic imprecision compromises discovered associations. In areas of high malaria transmission the diagnosis of severe malaria in young children and, in particular, the distinction from bacterial sepsis, is imprecise. We developed a probabilistic diagnostic model of severe malaria using platelet and white count data. Under this model we re-analysed clinical and genetic data from 2,220 Kenyan children with clinically defined severe malaria and 3,940 population controls, adjusting for phenotype mis-labelling. Our model, validated by the distribution of sickle trait, estimated that approximately one third of cases did not have severe malaria. We propose a data-tilting approach for case-control studies with phenotype mis-labelling and show that this reduces false discovery rates and improves statistical power in genome-wide association studies.
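The effect of phenotype mis-labelling on a case-control association can be illustrated with a small calculation. The sketch below is not the authors' diagnostic model or their data-tilting estimator. It only shows how an allelic odds ratio is pulled towards 1 when a fraction of labelled cases do not have the disease. The function name, the allele frequencies, and the assumption that mis-labelled cases carry the allele at the control frequency are all hypothetical.

```python
def observed_odds_ratio(p_case, p_control, mislabel_frac):
    """Allelic odds ratio observed when a fraction of labelled cases are
    mis-labelled (assumption: they carry the allele at the control frequency)."""
    # Allele frequency among labelled cases is a mixture of true cases
    # and mis-labelled individuals.
    p_obs = (1.0 - mislabel_frac) * p_case + mislabel_frac * p_control
    odds_cases = p_obs / (1.0 - p_obs)
    odds_controls = p_control / (1.0 - p_control)
    return odds_cases / odds_controls


# Hypothetical protective allele: 2% in true cases, 8% in controls.
true_or = observed_odds_ratio(0.02, 0.08, 0.0)
# Same allele when one third of labelled cases are mis-labelled.
diluted_or = observed_odds_ratio(0.02, 0.08, 1.0 / 3.0)
print(f"true OR = {true_or:.3f}, observed OR = {diluted_or:.3f}")
```

Under these assumed numbers the observed odds ratio is markedly closer to 1 than the true one, which is one way to see why adjusting for mis-labelling can recover statistical power.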

2021, DOI: 10.1101/2021.04.16.440107

Author(s): James A Watson, Carolyne M Ndila, Sophie Uyoga, Alex W Macharia, Gideon Nyutu, ...

Keyword(s): Severe Malaria, Genetic Association, Statistical Power, Association Studies, Genetic Association Studies, Diagnostic Model, Genome Wide Association Studies, Case Control Studies, False Discovery Rates, Population Controls

Fine scale human genetic structure in three regions of Cameroon reveals episodic diversifying selection

Scientific Reports, 2021, Vol 11 (1), DOI: 10.1038/s41598-020-79124-1

Author(s): Kevin K. Esoh, Tobias O. Apinjoh, Steven G. Nyanjom, Ambroise Wonkam, Emile R. Chimusa, ...

Keyword(s): Genetic Structure, Ethnic Groups, Genetic Association, Association Studies, Genetic Association Studies, Genomic Region, Sub Saharan Africa, Genome Wide Association Studies, Fine Scale, Sub Saharan

Inferences from genetic association studies rely largely on the definition and description of the underlying populations that highlight their genetic similarities and differences. The clustering of human populations into subgroups (population structure) can significantly confound disease associations. This study investigated the fine-scale genetic structure within Cameroon that may underlie disparities observed with Cameroonian ethnicities in malaria genome-wide association studies in sub-Saharan Africa. Genotype data of 1073 individuals from three regions and three ethnic groups in Cameroon were analyzed using measures of genetic proximity to ascertain fine-scale genetic structure. Model-based clustering revealed distinct ancestral proportions among the Bantu, Semi-Bantu and Foulbe ethnic groups, while haplotype-based coancestry estimation revealed possible longstanding and ongoing sympatric differentiation among individuals of the Foulbe ethnic group, and their Bantu and Semi-Bantu counterparts. A genome scan found strong selection signatures in the HLA gene region, confirming longstanding knowledge of natural selection on this genomic region in African populations following immense disease pressure. Signatures of selection were also observed in the HBB gene cluster, a genomic region known to be under strong balancing selection in sub-Saharan Africa due to its co-evolution with malaria. This study further supports the role of evolution in shaping genomes of Cameroonian populations and reveals fine-scale hierarchical structure among and within Cameroonian ethnicities that may impact genetic association studies in the country.

Sample Size and Statistical Power Calculation in Genetic Association Studies

Genomics & Informatics, 2012, Vol 10 (2), pp. 117, DOI: 10.5808/gi.2012.10.2.117, Cited By ~ 227

Author(s): Eun Pyo Hong, Ji Wan Park

Keyword(s): Sample Size, Genetic Association, Statistical Power, Association Studies, Genetic Association Studies, Power Calculation, Statistical Power Calculation

Efficient estimation of disease odds ratios for follow-up genetic association studies

Statistical Methods in Medical Research, 2017, Vol 28 (7), pp. 1927-1941, DOI: 10.1177/0962280217741771

Author(s): Jiyuan Hu, Wei Zhang, Xinmin Li, Dongdong Pan, Qizhai Li

Keyword(s): Genetic Association, Association Studies, Genetic Association Studies, Efficient Estimation, Genome Wide Association, Genome Wide Association Studies, Odds Ratios, Genome Wide, Follow Up Studies

In the past decade, genome-wide association studies have identified thousands of susceptibility variants associated with complex human diseases and traits. Conducting follow-up genetic association studies has become a standard approach to validate the findings of genome-wide association studies. One problem of high interest in genetic association studies is to accurately estimate the strength of the association, which is often quantified by odds ratios in case-control studies. However, estimating the association from follow-up studies alone is inefficient, since this approach ignores information from the genome-wide association studies. In this article, an estimator called GFcom, which integrates information from genome-wide association studies and follow-up studies, is proposed. The estimator includes both the point estimate and the corresponding confidence interval. GFcom is more efficient than competing estimators in terms of mean squared error (MSE) and the length of confidence intervals. The superiority of GFcom is particularly evident when the genome-wide association study suffers from severe selection bias. Comprehensive simulation studies and applications to three real follow-up studies demonstrate the performance of the proposed estimator. An R package, “GFcom”, implementing our method is publicly available at https://github.com/JiyuanHu/GFcom .

Power Estimation for Gene-Longevity Association Analysis Using Concordant Twins

Genetics Research International, 2014, Vol 2014, pp. 1-8, DOI: 10.1155/2014/154204

Author(s): Qihua Tan, Jing Hua Zhao, Torben Kruse, Kaare Christensen

Keyword(s): Association Study, Genetic Association, Association Analysis, Statistical Power, Association Studies, Genetic Association Studies, Small Sample, Identical Twins, Human Longevity, Sample Sizes

Statistical power is one of the major concerns in genetic association studies. Related individuals such as twins are valuable samples for genetic studies because of their genetic relatedness. Phenotype similarity in twin pairs provides evidence of genetic control over the phenotype variation in a population. Genetic association studies of human longevity, a complex trait under the control of both genetic and environmental factors, have been hampered by the small sample sizes of long-lived subjects, which limit statistical power. Twin pairs concordant for longevity have an increased probability of carrying beneficial genes and are thus useful samples for gene-longevity association analysis. We conducted a computer simulation to estimate the power of association studies using longevity-concordant twin pairs. We observed remarkable power increases when using singletons from longevity-concordant twin pairs as cases in comparison with sporadic proband cases. Achieving similar power required double the sample size for fraternal twins compared with identical twins concordant for longevity, suggesting that longevity-concordant identical twins are more efficient samples than fraternal twins. We also observed an approximately 2- to 3-fold increase in the sample size needed for a longevity cutoff at age 90 as compared with a cutoff at age 95. Overall, our results showed the high value of twins in genetic association studies of human longevity.

M-regression, false discovery rates and outlier detection with application to genetic association studies

Computational Statistics & Data Analysis, 2014, Vol 78, pp. 33-42, DOI: 10.1016/j.csda.2014.03.019, Cited By ~ 13

Author(s): V.M. Lourenço, A.M. Pires

Keyword(s): Outlier Detection, Genetic Association, Association Studies, Genetic Association Studies, False Discovery Rates, False Discovery, Discovery Rates

Systemic Sclerosis is a Complex Disease Associated Mainly with Immune Regulatory and Inflammatory Genes

The Open Rheumatology Journal, 2014, Vol 8 (1), pp. 29-42, DOI: 10.2174/1874312901408010029, Cited By ~ 20

Author(s): Jingxiao Jin, Chou Chou, Maria Lima, Danielle Zhou, Xiaodong Zhou

Keyword(s): Systemic Sclerosis, Genetic Association, DNA Cleavage, Complex Disease, Association Studies, Genetic Association Studies, Genome Wide Association Studies, Functional Changes, Circulating Autoantibodies, Genetic Contributions

Systemic sclerosis (SSc) is a fibrotic and autoimmune disease characterized clinically by skin and internal organ fibrosis and vascular damage, and serologically by the presence of circulating autoantibodies. Although its etiopathogenesis is not yet well understood, the results of numerous genetic association studies support genetic contributions as an important factor in SSc. In this paper, the major genes of SSc are reviewed. The most recent genome-wide association studies (GWAS) are taken into account along with robust candidate gene studies. The literature search was performed on genetic association studies of SSc in PubMed between January 2000 and March 2014; eligible studies generally had over 600 total participants with replication. A few genetic association studies with related functional changes in SSc patients were also included. A total of forty-seven genes or specific genetic regions were reported to be associated with SSc, although some are controversial. These genes include HLA genes, STAT4, CD247, TBX21, PTPN22, TNFSF4, IL23R, IL2RA, IL-21, SCHIP1/IL12A, CD226, BANK1, C8orf13-BLK, PLD4, TLR-2, NLRP1, ATG5, IRF5, IRF8, TNFAIP3, IRAK1, NFKB1, TNIP1, FAS, MIF, HGF, OPN, IL-6, CXCL8, CCR6, CTGF, ITGAM, CAV1, MECP2, SOX5, JAZF1, DNASE1L3, XRCC1, XRCC4, PXK, CSK, GRB10, NOTCH4, RHOB, KIAA0319, PSD3 and PSORS1C1. These genes encode proteins mainly involved in immune regulation and inflammation, and some of them function in transcription, kinase activity, DNA cleavage and repair. The discovery of various SSc-associated genes is important for understanding the genetics of SSc and the potential pathogenic mechanisms that contribute to the development of this disease.

The Role of SNP Interactions when Determining Independence of Novel Signals in Genetic Association Studies—An Application to ARG1 and Bronchodilator Response

Journal of Personalized Medicine, 2021, Vol 11 (2), pp. 145, DOI: 10.3390/jpm11020145

Author(s): Ryan Walsh, Kirsten Voorhies, Merry-Lynn McDonald, Michael McGeachie, Joanne E. Sordillo, ...

Keyword(s): Genetic Association, Association Studies, Critical Role, Genetic Association Studies, Genome Wide Association Studies, The Novel, Simulation Studies, Nucleotide Polymorphisms, Bronchodilator Response

Genome-wide association studies (GWAS) play a critical role in identifying many loci for common diseases and traits. There has been a rapid increase in the number of GWAS over the past decade. As additional GWAS are conducted, it is often unclear whether a novel signal associated with the trait of interest is independent of single nucleotide polymorphisms (SNPs) in the same region that have previously been associated with that trait. The general approach to determining whether the novel association is independent of previous signals is to examine the association of the novel SNP with the trait of interest conditional on the previously identified SNP and/or to calculate linkage disequilibrium (LD) between the two SNPs. However, the role of epistasis and SNP by SNP interactions is rarely considered. Through simulation studies, we examined the role of SNP by SNP interactions when determining the independence of two genetic association signals. We have created an R package on GitHub called gxgRC to generate these simulation studies based on user input. In genetic association studies of asthma, we considered the role of SNP by SNP interactions when determining independence of signals for SNPs in the ARG1 gene and bronchodilator response.
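The LD calculation mentioned in this abstract can be sketched with the standard r² measure for two biallelic SNPs. This is a generic textbook formula, not code from the gxgRC package. The function name and the example frequencies are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Hypothetical frequencies: the two alleles always travel together (r^2 near 1)...
print(ld_r_squared(p_ab=0.30, p_a=0.30, p_b=0.30))
# ...versus two SNPs whose alleles combine independently (r^2 near 0).
print(ld_r_squared(p_ab=0.09, p_a=0.30, p_b=0.30))
```

A low r² between a novel SNP and a known one is usually read as evidence of independent signals. The abstract's point is that this check, like conditional analysis, does not account for SNP by SNP interaction.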

A Compendium of Age-Related PheWAS and GWAS Traits for Human Genetic Association Studies, Their Networks and Genetic Correlations

Frontiers in Genetics, 2021, Vol 12, DOI: 10.3389/fgene.2021.680560

Author(s): Seung-Soo Kim, Adam D. Hudgins, Brenda Gonzalez, Sofiya Milman, Nir Barzilai, ...

Keyword(s): Genetic Association, Heart Diseases, Association Studies, Genetic Correlations, Genetic Association Studies, Brain Diseases, Genome Wide Association Studies, Medical Subject Headings, Age Related, Rich Data

The rich data from the genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) offer an unprecedented opportunity to identify the biological underpinnings of age-related disease (ARD) risk and multimorbidity. Surprisingly, however, a comprehensive list of ARDs remains unavailable due to the lack of a clear definition and selection criteria. We developed a method to identify ARDs and to provide a compendium of ARDs for genetic association studies. Querying 1,358 electronic medical record-derived traits, we first defined ARDs and age-related traits (ARTs) based on their prevalence profiles, requiring a unimodal distribution that shows an increasing prevalence after the age of 40 years, and which reaches a maximum peak at 60 years of age or later. As a result, we identified a list of 463 ARDs and ARTs in the GWAS and PheWAS catalogs. We next translated the ARDs and ARTs to their respective 276 Medical Subject Headings diseases and 45 anatomy terms. The most abundant disease categories are neoplasms (48 terms), cardiovascular diseases (44 terms), and nervous system diseases (27 terms). Employing data from a human symptoms-disease network, we found 6 symptom-shared disease groups, representing cancers, heart diseases, brain diseases, joint diseases, eye diseases, and mixed diseases. Lastly, by overlaying our ARD and ART list with genetic correlation data from the UK Biobank, we found 54 phenotypes in 2 clusters with high genetic correlations. Our compendium of ARD and ART is a highly useful resource, with broad applicability for studies of the genetics of aging, ARD, and multimorbidity.

Fractal Characterizations of MAX Statistical Distribution in Genetic Association Studies

Advances in Complex Systems, 2009, Vol 12 (04n05), pp. 513-531, DOI: 10.1142/s0219525909002349, Cited By ~ 1

Author(s): Wentian Li, Yaning Yang

Keyword(s): Genetic Association, Degrees Of Freedom, Association Studies, Disease Model, Null Distribution, Genetic Association Studies, Case Control, Case Control Studies, Chi Square, Two Parameters

Two noninteger parameters are defined for MAX statistics, which are maxima of d simpler test statistics. The first parameter, $d_{\mathrm{MAX}}$, is the fractional number of tests, representing the equivalent number of independent tests in MAX. If the d tests are dependent, $d_{\mathrm{MAX}} < d$. The second parameter is the fractional degrees of freedom k of the chi-square distribution $\chi^2_k$ that fits the MAX null distribution. These two parameters, $d_{\mathrm{MAX}}$ and k, can be independently defined, and k can be noninteger even if $d_{\mathrm{MAX}}$ is an integer. We illustrate these two parameters using the examples of MAX2 and MAX3 statistics in genetic case-control studies. We speculate that k is related to the amount of ambiguity of the model inferred by the test. In case-control genetic association, tests with low k (e.g. k = 1) are able to provide definitive information about the disease model, whereas tests with high k (e.g. k = 2) are completely uncertain about the disease model. Similar to Heisenberg's uncertainty principle, the ability to infer the disease model and the ability to detect significant association may not be simultaneously optimized, and k seems to measure the level of their balance.
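The idea of an equivalent number of independent tests can be illustrated with a short sketch. It assumes each component test is a 1-degree-of-freedom chi-square statistic and uses the relation p_MAX = 1 - (1 - p_single)^d, which holds exactly only for d independent tests. Solving that relation for d is one plausible way to obtain a fractional number of tests. The abstract does not give the paper's exact definition, so the functions below are hypothetical illustrations rather than the authors' procedure.

```python
import math


def chi2_1df_sf(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def max_pvalue_independent(x, d):
    """P-value of MAX = x when MAX is the largest of d independent 1-df tests."""
    return 1.0 - (1.0 - chi2_1df_sf(x)) ** d


def effective_num_tests(p_single, p_max):
    """Fractional number of independent tests implied by a single-test
    p-value and the (e.g. simulated) p-value of the MAX statistic."""
    return math.log1p(-p_max) / math.log1p(-p_single)


x = 3.841459  # approximate 5% critical value of chi-square with 1 df
p1 = chi2_1df_sf(x)
p3 = max_pvalue_independent(x, 3)
print(p1, p3, effective_num_tests(p1, p3))
```

With independent tests the function recovers the number of tests exactly. If the p-value of the MAX statistic comes from simulation with correlated tests, the same formula would return a noninteger value below d.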
