Power Analysis of C-TDT for Small Sample Size Genome-Wide Association Studies by the Joint Use of Case-Parent Trios and Pairs

2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Farid Rajabli ◽  
Gul Inan ◽  
Ozlem Ilk

In family-based genetic association studies, genotype information may be missing for one of the parents. The resulting study then consists of both case-parent trios and case-parent pairs. One approach to this problem is the permutation-based combined transmission disequilibrium test (C-TDT) statistic. However, it is still unknown how powerful this test statistic is with small sample sizes. In this paper, a simulation study is carried out to estimate the power and false positive rate of this test across different sample sizes for a family-based genome-wide association study. A statistical power of over 80% and a reasonable false positive rate estimate can be achieved even with a combination of 50 trios and 30 pairs when 2% of the SNPs are assumed to be associated. Moreover, even smaller samples provide high power when smaller percentages of SNPs are associated with the disease.
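As a rough illustration of how power and false positive rate can be estimated by simulation, the sketch below uses the classic McNemar-type TDT for complete trios. It is not the permutation-based C-TDT evaluated in the paper and does not handle case-parent pairs. The function names, the transmission probabilities and the replicate count are all assumptions made for the example.

```python
import math
import random


def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """Classic TDT chi-square (1 df) from heterozygous-parent transmission counts."""
    total = transmitted + untransmitted
    if total == 0:
        return 0.0
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def estimate_rejection_rate(n_het_parents: int, transmit_prob: float,
                            alpha: float = 0.05, n_reps: int = 2000,
                            seed: int = 0) -> float:
    """Monte Carlo estimate of how often the TDT rejects at level alpha.

    transmit_prob = 0.5 corresponds to the null (gives the false positive rate);
    transmit_prob > 0.5 is a hypothetical alternative (gives the power).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        # Each heterozygous parent transmits the test allele with transmit_prob.
        b = sum(rng.random() < transmit_prob for _ in range(n_het_parents))
        c = n_het_parents - b
        if chi2_1df_pvalue(tdt_statistic(b, c)) < alpha:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    print("false positive rate:", estimate_rejection_rate(100, 0.5))
    print("power at tau = 0.7:", estimate_rejection_rate(100, 0.7))
```

Fixing the seed keeps the estimates reproducible; a genome-wide setting would additionally need a multiple-testing correction, which this sketch leaves out.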

2018 ◽  
Author(s):  
Cox Lwaka Tamba ◽  
Yuan-Ming Zhang

Abstract. Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can get tens of millions of SNPs that need to be tested for association with a trait of interest. This poses a great computational challenge, and there is a need for fast algorithms in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN estimation and a low false positive rate. Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM. Conclusions: FASTmrMLM is a fast and reliable algorithm in multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate.
Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can get tens of millions of markers that need to be tested for association with a trait of interest. Due to the computational challenge faced, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to quicken the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix forms for each marker scanning, are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.


2019 ◽  
Vol 35 (17) ◽  
pp. 3046-3054 ◽  
Author(s):  
Anastasia Gurinovich ◽  
Harold Bae ◽  
John J Farrell ◽  
Stacy L Andersen ◽  
Stefano Monti ◽  
...  

Abstract. Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects’ ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype. Availability and implementation: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 ◽  
Author(s):  
Meiyue Wang ◽  
Zhuoqing Fang ◽  
Boyoung Yoo ◽  
Gill Bejerano ◽  
Gary Peltz

The ability to use genome-wide association studies (GWAS) for genetic discovery depends upon our ability to distinguish true causative from false positive association signals. Population structure (PS) has been shown to cause false positive signals in GWAS. PS correction is routinely used for analysis of human GWAS results, and it has been assumed that it also should be utilized for murine GWAS using inbred strains. Nevertheless, there are fundamental differences between murine and human GWAS, and the impact of PS on murine GWAS results has not been carefully investigated. To assess the impact of PS on murine GWAS, we examined 8223 datasets that characterized biomedical responses in panels of inbred mouse strains. Rather than treat PS as a confounding variable, we examined it as a response variable. Surprisingly, we found that PS had a minimal impact on datasets measuring responses in ≤20 strains, and had little impact on most datasets characterizing 21–40 inbred strains. Moreover, we show that true positive association signals arising from haplotype blocks, SNPs or indels, which were experimentally demonstrated to be causative for trait differences, would be rejected if PS correction were applied to them. Our results indicate that, because of the special conditions created by GWAS (the use of inbred strains, small sample sizes), PS assessment results should be carefully evaluated in conjunction with other criteria when murine GWAS results are evaluated.


Circulation ◽  
2016 ◽  
Vol 133 (suppl_1) ◽  
Author(s):  
James S Floyd ◽  
Colleen Sitlani ◽  
Christy L Avery ◽  
Eric A Whitsel ◽  
Leslie Lange ◽  
...  

Introduction: Sulfonylureas are a commonly used class of diabetes medication that can prolong the QT interval, which is a leading cause of drug withdrawals from the market given the possible risk of life-threatening arrhythmias. Previously, we conducted a meta-analysis of genome-wide association studies of sulfonylurea-genetic interactions on QT interval among 9 European-ancestry (EA) cohorts using cross-sectional data, with null results. To improve our power to identify novel drug-gene interactions, we have included repeated measures of medication use and QT interval and expanded our study to include several additional cohorts, including African-American (AA) and Hispanic-ancestry (HA) cohorts with a high prevalence of sulfonylurea use. To identify potentially differential effects on cardiac depolarization and repolarization, we have also added two phenotypes, the JT and QRS intervals, which together comprise the QT interval. Hypothesis: The use of repeated measures and expansion of our meta-analysis to include diverse ancestry populations will allow us to identify novel pharmacogenomic interactions for sulfonylureas on the ECG phenotypes QT, JT, and QRS. Methods: Cohorts with unrelated individuals used generalized estimating equations to estimate interactions; cohorts with related individuals used mixed-effects models clustered on family. For each ECG phenotype (QT, JT, QRS), we conducted ancestry-specific (EA, AA, HA) inverse-variance-weighted meta-analyses using standard errors based on the t-distribution to correct for small sample inflation in the test statistic. Ancestry-specific summary estimates were combined using MANTRA, an analytic method that accounts for differences in local linkage disequilibrium between ethnic groups. Results: Our study included 65,997 participants from 21 cohorts, including 4,020 (6%) sulfonylurea users, a substantial increase from the 26,986 participants and 846 sulfonylurea users in the previous meta-analysis. Preliminary ancestry-specific meta-analyses have identified genome-wide significant associations (P < 5×10⁻⁸) for each ECG phenotype, and analyses with MANTRA are in progress. Conclusions: In the setting of the largest collection of pharmacogenomic studies to date, we used repeated measurements and leveraged diverse ancestry populations to identify new pharmacogenomic loci for ECG traits associated with cardiovascular risk.
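The inverse-variance-weighted pooling named in the Methods can be sketched as a plain fixed-effect combination of per-cohort estimates. This is only a minimal illustration: it does not implement the t-distribution-based standard errors or the MANTRA step described in the abstract, and the function name and input values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(estimates: Sequence[float],
                          standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted pooled estimate and its standard error."""
    if len(estimates) != len(standard_errors) or len(estimates) == 0:
        raise ValueError("need one positive standard error per estimate")
    # Each cohort is weighted by the reciprocal of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical interaction estimates from three cohorts (not data from the study).
    beta, se = inverse_variance_meta([1.2, 0.4, 0.9], [0.5, 0.3, 0.8])
    print(f"pooled estimate = {beta:.3f}, standard error = {se:.3f}")
```

More precise cohorts (smaller standard errors) dominate the pooled estimate, which is the reason inflated small-sample standard errors matter for the combined test statistic.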


2013 ◽  
Vol 37 (4) ◽  
pp. 383-392 ◽  
Author(s):  
Karla J. Lindquist ◽  
Eric Jorgenson ◽  
Thomas J. Hoffmann ◽  
John S. Witte

2010 ◽  
Vol 25 (5) ◽  
pp. 307-309 ◽  
Author(s):  
J. Lasky-Su ◽  
C. Lange

Abstract. The etiology of suicide is complex in nature with both environmental and genetic causes that are extremely diverse. This extensive heterogeneity weakens the relationship between genotype and phenotype and as a result, we face many challenges when studying the genetic etiology of suicide. We are now in the midst of a genetics revolution, where genotyping costs are decreasing and genotyping speed is increasing at a fast rate, allowing genetic association studies to genotype thousands to millions of SNPs that cover the entire human genome. As such, genome-wide association studies (GWAS) are now the norm. In this article we address several statistical challenges that occur when studying the genetic etiology of suicidality in the age of the genetics revolution. These challenges include: (1) the large number of statistical tests; (2) complex phenotypes that are difficult to quantify; and (3) modest genetic effect sizes. We address these statistical issues in the context of family-based study designs. Specifically, we discuss several statistical extensions of family-based association tests (FBATs) that work to alleviate these challenges. As our intention is to describe how statistical methodology may work to identify disease variants for suicidality, we avoid the mathematical details of the methodologies presented.
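To make challenge (1) concrete, the sketch below computes the per-test significance threshold under a Bonferroni correction. This is the generic baseline adjustment, not one of the FBAT extensions the article discusses; the function names and the example number of tests are assumptions.

```python
from typing import List, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_after_bonferroni(p_values: Sequence[float],
                                 alpha: float = 0.05) -> List[bool]:
    """Flag each p-value that passes the Bonferroni-adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # With one million tests (an assumed count), a 0.05 family-wise level
    # requires each individual test to reach roughly 5e-8.
    print(bonferroni_threshold(0.05, 1_000_000))
```

The threshold shrinks in proportion to the number of SNPs tested, which is why modest genetic effect sizes become hard to detect at genome-wide scale.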

