Taking population stratification into account by local permutations in rare-variant association studies on small samples

Author(s):  
J. Mullaert ◽  
M. Bouaziz ◽  
Y. Seeleuthner ◽  
B. Bigio ◽  
J-L. Casanova ◽  
...  

Abstract Many methods for rare variant association studies require permutations to assess the significance of tests. Standard permutations assume that all individuals are exchangeable and do not take population stratification (PS), a known confounding factor in genetic studies, into account. We propose a novel strategy, LocPerm, in which individuals are permuted only with their closest ancestry-based neighbors. We performed a simulation study, focusing on small samples, to evaluate and compare LocPerm with standard permutations and classical adjustment on first principal components. Under the null hypothesis, LocPerm was the only method providing an acceptable type I error, regardless of sample size and level of stratification. The power of LocPerm was similar to that of standard permutation in the absence of PS, and remained stable in different PS scenarios. We conclude that LocPerm is a method of choice for taking PS and/or small sample size into account in rare variant association studies.
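The key idea lends itself to a compact illustration. Below is a minimal sketch of a local permutation test, assuming that "closest ancestry-based neighbors" can be approximated by small clusters in principal-component space and that the test statistic is a simple burden-score difference; the published LocPerm procedure may define neighborhoods and statistics differently.

```python
# A minimal sketch of a local permutation test (assumed neighborhood
# definition: small k-means clusters in ancestry PC space; assumed test
# statistic: burden-score difference between cases and controls).
import numpy as np
from scipy.cluster.vq import kmeans2

def local_permutation_pvalue(burden, pheno, pcs, n_clusters=10, n_perm=999, rng=None):
    """burden: (n,) rare-variant burden scores; pheno: (n,) 0/1 labels;
    pcs: (n, k) ancestry principal components."""
    rng = np.random.default_rng(rng)
    # Group each individual with its closest ancestry-based neighbors.
    _, labels = kmeans2(pcs, n_clusters, minit='++')
    obs = burden[pheno == 1].mean() - burden[pheno == 0].mean()
    hits = 0
    for _ in range(n_perm):
        perm = pheno.copy()
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            perm[idx] = rng.permutation(perm[idx])  # shuffle within the cluster only
        stat = burden[perm == 1].mean() - burden[perm == 0].mean()
        hits += abs(stat) >= abs(obs)
    return (hits + 1) / (n_perm + 1)
```

Because labels are shuffled only within ancestry clusters, every permuted dataset preserves the case/control composition of each subpopulation, which is what protects the type I error under stratification.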

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Vahid Ebrahimi ◽  
Zahra Bagheri ◽  
Zahra Shayan ◽  
Peyman Jafari

Assessing differential item functioning (DIF) using the ordinal logistic regression (OLR) model depends heavily on the asymptotic sampling distribution of the maximum likelihood (ML) estimators. The ML estimation method, which is often used to estimate the parameters of the OLR model for DIF detection, may be substantially biased with small samples. This study proposes a new application of the elastic net regularized OLR model, a special type of machine learning method, for assessing DIF between two groups with small samples. Accordingly, a simulation study was conducted to compare the power and type I error rates of the regularized and nonregularized OLR models in detecting DIF under various conditions, including moderate and severe magnitudes of DIF (DIF = 0.4 and 0.8), sample size (N), sample size ratio (R), scale length (I), and weighting parameter (w). The simulation results revealed that for I = 5 and regardless of R, the elastic net regularized OLR model with w = 0.1, as compared with the nonregularized OLR model, increased the power of detecting moderate uniform DIF (DIF = 0.4) by approximately 35% and 21% for N = 100 and 150, respectively. Moreover, for I = 10 and severe uniform DIF (DIF = 0.8), the average power of the elastic net regularized OLR model with 0.03 ≤ w ≤ 0.06, as compared with the nonregularized OLR model, increased by approximately 29.3% and 11.2% for N = 100 and 150, respectively. In these cases, the type I error rates of the regularized and nonregularized OLR models were below or close to the nominal level of 0.05. In general, this simulation study showed that the elastic net regularized OLR model outperformed the nonregularized OLR model, especially in groups with extremely small sample sizes. Furthermore, the present research provides a guideline and some recommendations for researchers who conduct DIF studies with small sample sizes.
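For readers who want to experiment with the approach, the sketch below fits an elastic-net-penalized proportional-odds (cumulative logit) model by direct likelihood optimization, using the standard beta = u - v split so the L1 term stays smooth under L-BFGS-B. The function name fit_penalized_olr and the penalty parametrization lam * (w * L1 + (1 - w)/2 * L2) are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of an elastic-net-penalized proportional-odds model.
# Assumption: penalty = lam * (w * ||beta||_1 + (1 - w)/2 * ||beta||_2^2),
# with the L1 part handled exactly via beta = u - v, u, v >= 0.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_penalized_olr(X, y, lam=0.1, w=0.1):
    """X: (n, p) covariates; y: integer ordinal responses coded 0..J-1.
    Returns (ordered cutpoints alpha, slope vector beta)."""
    n, p = X.shape
    J = int(y.max()) + 1
    def unpack(theta):
        a0, d = theta[0], theta[1:J - 1]
        alpha = np.concatenate([[a0], a0 + np.cumsum(np.exp(d))])  # ordered cutpoints
        u, v = theta[J - 1:J - 1 + p], theta[J - 1 + p:]
        return alpha, u - v, u, v
    def objective(theta):
        alpha, beta, u, v = unpack(theta)
        eta = X @ beta
        # P(Y <= j) = sigmoid(alpha_j - eta); category probs by differencing.
        cum = expit(alpha[None, :] - eta[:, None])
        cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
        probs = np.clip(cum[np.arange(n), y + 1] - cum[np.arange(n), y], 1e-12, None)
        penalty = lam * (w * (u + v).sum() + 0.5 * (1 - w) * (beta ** 2).sum())
        return -np.log(probs).mean() + penalty
    theta0 = np.zeros(J - 1 + 2 * p)
    bounds = [(None, None)] * (J - 1) + [(0, None)] * (2 * p)  # u, v >= 0
    res = minimize(objective, theta0, method="L-BFGS-B", bounds=bounds)
    alpha, beta, _, _ = unpack(res.x)
    return alpha, beta
```

In a DIF analysis, X would contain the group indicator and the matching variable (e.g., the rest score); a group coefficient that survives the shrinkage flags uniform DIF.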


2021 ◽  
Author(s):  
Jimmy Mullaert ◽  
Matthieu Bouaziz ◽  
Yoann Seeleuthner ◽  
Benedetta Bigio ◽  
Jean‐Laurent Casanova ◽  
...  

2019 ◽  
Vol 101 ◽  
Author(s):  
Lifeng Liu ◽  
Pengfei Wang ◽  
Jingbo Meng ◽  
Lili Chen ◽  
Wensheng Zhu ◽  
...  

Abstract In recent years, there has been increasing interest in detecting disease-related rare variants in sequencing studies. Numerous studies have shown that common variants can explain only a small proportion of the phenotypic variance for complex diseases. More and more evidence suggests that some of this missing heritability can be explained by rare variants. Considering the importance of rare variants, researchers have proposed a considerable number of methods for identifying the rare variants associated with complex diseases. Extensive research has been carried out on testing the association between rare variants and dichotomous, continuous or ordinal traits. So far, however, there has been little discussion of the case in which both genotypes and phenotypes are ordinal variables. This paper introduces a method based on the γ-statistic, called OV-RV, for examining disease-related rare variants when both genotypes and phenotypes are ordinal. At present, little is known about the asymptotic distribution of the γ-statistic when conducting association analyses for rare variants. One advantage of OV-RV is that it provides a robust estimate of the distribution of the γ-statistic by employing the permutation approach proposed by Fisher. We also perform extensive simulations to investigate the numerical performance of OV-RV under various model settings. The simulation results reveal that OV-RV is valid and efficient; namely, it controls the type I error at approximately the pre-specified significance level and achieves greater power at the same significance level. We also apply OV-RV to rare variant association studies of diastolic blood pressure.
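The two building blocks named above, the γ-statistic (Goodman–Kruskal's gamma) and Fisher-style permutation, are simple to sketch. The following is a minimal illustration of those ingredients only, not the full OV-RV procedure (which also handles sets of rare variants):

```python
# Goodman-Kruskal's gamma for two ordinal variables, with its null
# distribution estimated by permuting the phenotype (Fisher's approach).
import numpy as np

def gk_gamma(x, y):
    """gamma = (concordant - discordant) / (concordant + discordant)."""
    dx = np.sign(np.subtract.outer(x, x))
    dy = np.sign(np.subtract.outer(y, y))
    s = dx * dy
    concordant = (s > 0).sum() / 2
    discordant = (s < 0).sum() / 2
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total

def gamma_permutation_pvalue(genotype, phenotype, n_perm=9999, rng=None):
    rng = np.random.default_rng(rng)
    obs = gk_gamma(genotype, phenotype)
    perm = np.array([gk_gamma(genotype, rng.permutation(phenotype))
                     for _ in range(n_perm)])
    return (1 + np.sum(np.abs(perm) >= abs(obs))) / (n_perm + 1)
```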


PEDIATRICS ◽  
1989 ◽  
Vol 83 (3) ◽  
pp. A72-A72
Author(s):  
Student

The believer in the law of small numbers practices science as follows:
1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power.
2. He has undue confidence in early trends (e.g., the data of the first few subjects) and in the stability of observed patterns (e.g., the number and identity of significant results). He overestimates significance.
3. In evaluating replications, his or others', he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals.
4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal "explanation" for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.


2019 ◽  
Vol 21 (3) ◽  
pp. 753-761 ◽  
Author(s):  
Regina Brinster ◽  
Dominique Scherer ◽  
Justo Lorenzo Bermejo

Abstract Population stratification is usually corrected relying on principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. The need to genotype only a small number of genetic variants that show large differences in allele frequency among subpopulations—so-called ancestry-informative markers (AIMs)—instead of the whole genome for stratification adjustment could represent an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure (In-AIMs), the combination of PCA and F-statistics, PCA-correlated measurement and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify type I error rate and statistical power in different case–control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models translated into increasing type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical and the first two robust PCs achieved the lowest (although inflated) type I error, followed at some distance by the first eight In-AIMs.
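As one concrete point of reference, Rosenberg's informativeness for assignment (In) for a biallelic marker reduces to an entropy difference: the entropy of the average allele frequency minus the average within-subpopulation entropy. The sketch below computes it from per-subpopulation allele frequencies and ranks candidate AIMs; whether this exact variant matches the selection pipeline used in the paper is an assumption.

```python
# A minimal sketch of ranking candidate AIMs by Rosenberg's In
# (informativeness for assignment), biallelic case.
import numpy as np

def informativeness_for_assignment(freqs):
    """freqs: (K,) allele frequencies of one biallelic SNP in K subpopulations.
    Returns In = H(mean frequency) - mean within-population H."""
    p = np.asarray(freqs, dtype=float)
    def plogp(q):  # q*log(q) + (1-q)*log(1-q), i.e. negative binary entropy
        q = np.clip(q, 1e-12, 1 - 1e-12)
        return q * np.log(q) + (1 - q) * np.log(1 - q)
    return -plogp(p.mean()) + plogp(p).mean()

def rank_aims(freq_matrix):
    """freq_matrix: (n_markers, K). Returns (indices sorted best-first, scores)."""
    scores = np.array([informativeness_for_assignment(row) for row in freq_matrix])
    return np.argsort(scores)[::-1], scores
```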


2019 ◽  
Author(s):  
Zilin Li ◽  
Xihao Li ◽  
Yaowu Liu ◽  
Jincheng Shen ◽  
Han Chen ◽  
...  

Abstract Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used instead. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown a priori, are difficult to specify in practice and are subject to limitations given that genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally efficient and dynamic scan statistic method (Scan the Genome, SCANG) for analyzing WGS data that flexibly detects the sizes and locations of rare-variant association regions without the need to pre-specify a fixed window size. The proposed method controls the genome-wise type I error rate and accounts for linkage disequilibrium among genetic variants. It allows the detected rare-variant association region sizes to vary across the genome. Through extensive simulation studies covering a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling the genome-wise type I error rate. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
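A toy version of the "many sizes, all locations" search conveys the flavor of a dynamic scan. The sketch below slides windows of several sizes over position-ordered variants and scores each with a naive burden z-score; SCANG itself uses SKAT/burden-type p-values with analytic genome-wise error-rate control, so this illustrates only the search structure, not the method.

```python
# A toy dynamic-window scan: try several window sizes at all locations and
# keep the best-scoring window. The burden z-score is a deliberately naive
# stand-in for SCANG's region-level tests.
import numpy as np

def scan_windows(G, y, window_sizes=(10, 20, 40), step=5):
    """G: (n, m) rare-variant genotypes ordered by position; y: (n,) phenotype.
    Returns (|z|, start index, window size) of the best window."""
    y = y - y.mean()
    best = (-np.inf, None, None)
    for size in window_sizes:
        for start in range(0, G.shape[1] - size + 1, step):
            burden = G[:, start:start + size].sum(axis=1)  # naive burden score
            sd = burden.std()
            if sd == 0:
                continue
            # sqrt(n) * correlation(burden, y): ~N(0,1) under the null
            z = (burden * y).sum() / (sd * y.std() * np.sqrt(len(y)))
            if abs(z) > best[0]:
                best = (abs(z), start, size)
    return best
```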


2020 ◽  
Vol 57 (2) ◽  
pp. 237-251
Author(s):  
Achilleas Anastasiou ◽  
Alex Karagrigoriou ◽  
Anastasios Katsileros

Summary The normal distribution is considered one of the most important distributions, with numerous applications in various fields, including the agricultural sciences. The purpose of this study is to evaluate the most popular normality tests, comparing their performance in terms of size (type I error) and power against a large spectrum of alternative distributions, using simulations for various sample sizes and significance levels as well as empirical data from agricultural experiments. The simulation results show that the power of all normality tests is low for small sample sizes, but as the sample size increases, the power increases as well. The results also show that the Shapiro–Wilk test is powerful over a wide range of alternative distributions and sample sizes, especially for asymmetric distributions. Moreover, the D'Agostino–Pearson omnibus test is powerful against symmetric alternative distributions for small sample sizes, as is the kurtosis test for moderate and large sample sizes.
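A comparison of this kind is easy to reproduce with scipy's built-in implementations (shapiro for Shapiro–Wilk, normaltest for the D'Agostino–Pearson omnibus test, kurtosistest for the kurtosis test). A minimal sketch of the size/power simulation, with the exponential as one asymmetric alternative:

```python
# Estimate empirical size (under a normal null) and power (under an
# exponential alternative) of three normality tests by Monte Carlo.
import numpy as np
from scipy import stats

def rejection_rate(sampler, test, n, alpha=0.05, n_sim=2000, seed=0):
    rng = np.random.default_rng(seed)
    rejections = sum(test(sampler(rng, n)).pvalue < alpha for _ in range(n_sim))
    return rejections / n_sim

n = 20
normal = lambda rng, n: rng.normal(size=n)       # size: sampling under H0
skewed = lambda rng, n: rng.exponential(size=n)  # power: asymmetric alternative
for name, test in [("Shapiro-Wilk", stats.shapiro),
                   ("D'Agostino-Pearson", stats.normaltest),
                   ("Kurtosis", stats.kurtosistest)]:
    print(name,
          "size:", rejection_rate(normal, test, n),
          "power vs exponential:", rejection_rate(skewed, test, n))
```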


2012 ◽  
Author(s):  
Nor Haniza Sarmin ◽  
Md Hanafiah Md Zin ◽  
Rasidah Hussin

A transformation of the mean was performed using a bias-correction estimator to produce a statistic for testing hypotheses about the mean of skewed distributions. The resulting statistic involves a modification of the variable. A simulation study of the Type I error probability on skewed distributions (exponential, chi-square and Weibull) shows that the t3 statistic is suitable for left-tailed tests with small sample sizes (n = 5). Key words: Mean; statistic; skewed distribution; bias correction estimator; Type I Error
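The t3 statistic itself is not fully specified above, so the sketch that follows substitutes a classical statistic built on the same idea, Johnson's (1978) skewness-corrected one-sample t, and estimates its left-tailed Type I error probability at n = 5 under an exponential null. Treat the choice of statistic as an assumption for illustration only.

```python
# Simulate the left-tailed Type I error of Johnson's (1978) modified t,
# a skewness/bias correction to the one-sample t statistic, at n = 5
# under an exponential null distribution.
import numpy as np
from scipy import stats

def johnson_t(x, mu0):
    n = len(x)
    s2 = x.var(ddof=1)
    m3 = ((x - x.mean()) ** 3).sum() / n  # sample third central moment
    return (x.mean() - mu0 + m3 / (6 * s2 * n)) / np.sqrt(s2 / n)

rng = np.random.default_rng(0)
n, mu0, alpha, n_sim = 5, 1.0, 0.05, 100_000
crit = stats.t.ppf(alpha, df=n - 1)  # left-tailed critical value
rejections = sum(johnson_t(rng.exponential(mu0, size=n), mu0) < crit
                 for _ in range(n_sim))
print("estimated Type I error:", rejections / n_sim)
```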


2019 ◽  
Vol 3 (Supplement_1) ◽  
Author(s):  
Keisuke Ejima ◽  
Andrew Brown ◽  
Daniel Smith ◽  
Ufuk Beyaztas ◽  
David Allison

Abstract
Objectives: Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved in many respects, we focused on the type I error rates and power of commonly used statistical analyses for testing mean differences between two groups, using small (n ≤ 5) to moderate sample sizes.
Methods: We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether the type I error rate could be affected by the choice of statistical test, we adjusted the empirical distributions of weights to enforce the null hypothesis (i.e., no mean difference) in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from the control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a 'plasmode' dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes.
Results: Type I error rates were unreasonably higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and the permutation test, especially with small sample sizes, for Case 1, whereas inflation was observed only for the permutation test for Case 2. Deflation was noted for the bootstrap test with small samples. Increasing the sample size mitigated inflation and deflation, except for the Wilcoxon test in Case 1, because heterogeneity of the weight distributions between groups violated the assumptions required for testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, the bootstrap test was underpowered with small samples as a tradeoff for maintaining type I error rates.
Conclusions: With small samples (n ≤ 5), the bootstrap test avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for the other tests, the sample size should be increased. The Wilcoxon test should be avoided because of the heterogeneity of weight distributions between mutant and control mice.
Funding Sources: This study was supported in part by NIH and a Japan Society for the Promotion of Science (JSPS) KAKENHI grant.
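The Case 1 plasmode construction described above is straightforward to mimic. A minimal sketch follows, with the permutation and bootstrap tests omitted for brevity and only three of the five tests included; the recentering step and resampling scheme are simplified relative to the study's full protocol.

```python
# Case 1 plasmode sketch: recenter both groups to a common mean (so the
# null of equal means holds), resample small plasmode datasets, and tally
# the empirical Type I error of each test.
import numpy as np
from scipy import stats

def plasmode_type1(control, mutant, n=5, alpha=0.05, n_sim=2000, seed=0):
    rng = np.random.default_rng(seed)
    grand = np.concatenate([control, mutant]).mean()
    c0 = control - control.mean() + grand  # center both weight distributions
    m0 = mutant - mutant.mean() + grand    # on the same mean (Case 1)
    tests = {
        "Student t": lambda a, b: stats.ttest_ind(a, b, equal_var=True).pvalue,
        "Welch t":   lambda a, b: stats.ttest_ind(a, b, equal_var=False).pvalue,
        "Wilcoxon":  lambda a, b: stats.ranksums(a, b).pvalue,
    }
    hits = {k: 0 for k in tests}
    for _ in range(n_sim):
        a = rng.choice(c0, size=n, replace=True)  # resample a plasmode dataset
        b = rng.choice(m0, size=n, replace=True)
        for k, t in tests.items():
            hits[k] += t(a, b) < alpha
    return {k: v / n_sim for k, v in hits.items()}
```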

