Subsampling Technique to Estimate Variance Component for UK-Biobank Traits

2021 ◽  
Vol 12 ◽  
Author(s):  
Ting Xu ◽  
Guo-An Qi ◽  
Jun Zhu ◽  
Hai-Ming Xu ◽  
Guo-Bo Chen

Estimating heritability is a long-standing question in statistical genetics. Because of its clear mathematical properties, the modified Haseman–Elston regression serves as a bridge that connects and extends various parallel heritability estimation methods. As sample sizes grow, estimating heritability from biobank-scale data becomes computationally challenging, and calculating the genetic relationship matrix is a particular bottleneck. Working within the Haseman–Elston framework, we explicitly analyzed the mathematical structure of the key term tr(K^T K), the trace of a high-order term of the genetic relationship matrix K that enters the estimation procedure. We proposed two estimators that estimate tr(K^T K) with greatly reduced sampling variance compared with the existing method at the same computational complexity. We applied this method to 81 traits in the UK Biobank data and compared chromosome-wise partitioned heritability with whole-genome heritability, which also serves as an approach for testing polygenicity.
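
For orientation, tr(K^T K) equals the sum of squared entries of K, and it can be approximated without forming K^T K by averaging ||Kz||^2 over random sign vectors z. The sketch below shows that generic Hutchinson-style randomization in plain Python. It is not either of the two estimators proposed in the paper, and the function names and the toy matrix are illustrative assumptions.

```python
import random


def exact_trace_ktk(K):
    """tr(K^T K) is the sum of squared entries of K (squared Frobenius norm)."""
    return sum(v * v for row in K for v in row)


def randomized_trace_ktk(K, n_probes, rng):
    """Average ||K z||^2 over random +/-1 vectors z; unbiased for tr(K^T K)."""
    n_cols = len(K[0])
    total = 0.0
    for _ in range(n_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(n_cols)]
        Kz = [sum(k_ij * z_j for k_ij, z_j in zip(row, z)) for row in K]
        total += sum(v * v for v in Kz)
    return total / n_probes


if __name__ == "__main__":
    # Hypothetical 3 x 3 relationship matrix, for illustration only.
    K = [[1.0, 0.1, -0.2], [0.1, 1.0, 0.05], [-0.2, 0.05, 1.0]]
    print(exact_trace_ktk(K))
    print(randomized_trace_ktk(K, 200, random.Random(0)))
```

Each probe costs one matrix-vector product, so the accuracy of such an estimator is governed by its sampling variance across probes, which is the quantity the abstract says the proposed estimators reduce.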

2009 ◽  
Vol 41 (1) ◽  
Author(s):  
Alison M Kelly ◽  
Brian R Cullis ◽  
Arthur R Gilmour ◽  
John A Eccleston ◽  
Robin Thompson

2014 ◽  
Author(s):  
Tristan Hayeck ◽  
Noah Zaitlen ◽  
Po-Ru Loh ◽  
Bjarni Vilhjalmsson ◽  
Samuela Pollack ◽  
...  

We introduce a Liability Threshold Mixed Linear Model (LTMLM) association statistic for ascertained case-control studies that increases power vs. existing mixed model methods, with a well-controlled false-positive rate. Recent work has shown that existing mixed model methods suffer a loss in power under case-control ascertainment, but no solution has been proposed. Here, we solve this problem using a chi-square score statistic computed from posterior mean liabilities (PML) under the liability threshold model. Each individual’s PML is conditional not only on that individual’s case-control status, but also on every individual’s case-control status and on the genetic relationship matrix obtained from the data. The PML are estimated using a multivariate Gibbs sampler, with the liability-scale phenotypic covariance matrix based on the genetic relationship matrix (GRM) and a heritability parameter estimated via Haseman-Elston regression on case-control phenotypes followed by transformation to liability scale. In simulations of unrelated individuals, the LTMLM statistic was correctly calibrated and achieved higher power than existing mixed model methods in all scenarios tested, with the magnitude of the improvement depending on sample size and severity of case-control ascertainment. In a WTCCC2 multiple sclerosis data set with >10,000 samples, LTMLM was correctly calibrated and attained a 4.1% improvement (P=0.007) in chi-square statistics (vs. existing mixed model methods) at 75 known associated SNPs, consistent with simulations. Larger increases in power are expected at larger sample sizes. In conclusion, an increase in power over existing mixed model methods is available for ascertained case-control studies of diseases with low prevalence.
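
As a rough illustration of what a posterior mean liability is, the sketch below computes it for a single individual from that individual's own case-control status only, using the mean of a truncated standard normal. This is a simplifying assumption: the method described above conditions on every individual's status and on the genetic relationship matrix through a multivariate Gibbs sampler, which this sketch does not attempt. The function name and the prevalence value in the usage lines are hypothetical.

```python
from statistics import NormalDist


def posterior_mean_liability(is_case, prevalence):
    """Mean of a standard normal liability given one person's status alone."""
    std_normal = NormalDist()
    # Cases are the individuals whose liability exceeds this threshold.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    density = std_normal.pdf(threshold)
    if is_case:
        return density / prevalence          # E[L | L > threshold]
    return -density / (1.0 - prevalence)     # E[L | L <= threshold]


if __name__ == "__main__":
    # Assumed prevalence of 1%, chosen only for illustration.
    print(posterior_mean_liability(True, 0.01))
    print(posterior_mean_liability(False, 0.01))
```

At low prevalence a case carries a large positive mean liability and a control a small negative one, and the two values average to zero when weighted by prevalence.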


1999 ◽  
Vol 56 (7) ◽  
pp. 1234-1240
Author(s):  
W R Gould ◽  
L A Stefanski ◽  
K H Pollock

All catch-effort estimation methods implicitly assume that catch and effort are known quantities, whereas in many cases they have been estimated and are subject to error. We evaluate the application of a simulation-based estimation procedure for measurement error models (J.R. Cook and L.A. Stefanski. 1994. J. Am. Stat. Assoc. 89: 1314-1328) in catch-effort studies. The technique involves a simulation component and an extrapolation step, hence the name SIMEX estimation. We describe SIMEX estimation in general terms and illustrate its use with applications to real and simulated catch and effort data. In some cases, correcting for measurement error with SIMEX estimation resulted in population size and catchability coefficient estimates that were substantially less than naive estimates, which ignored measurement errors. In a simulation of the procedure, we compared estimators from SIMEX with "naive" estimators that ignore measurement errors in catch and effort to determine the ability of SIMEX to produce bias-corrected estimates. The SIMEX estimators were less biased than the naive estimators but in some cases were also more variable. Despite the bias reduction, the SIMEX estimator had a larger mean squared error than the naive estimator for one of two artificial populations studied. However, our results suggest the SIMEX estimator may outperform the naive estimator in terms of bias and precision for larger populations.
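
The simulation and extrapolation steps can be sketched on a toy problem. The example below assumes a simple linear regression whose predictor is observed with additive normal error of known standard deviation, not a catch-effort model. It refits the slope after adding extra noise at levels lambda = 1 and 2, then extrapolates a quadratic in lambda back to lambda = -1. All function names, the noise levels, and the quadratic extrapolant are illustrative choices rather than details taken from the paper.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simex_slope(w, y, sigma_u, rng, n_sim=20):
    """SIMEX with lambda = 0, 1, 2 and quadratic extrapolation to lambda = -1."""
    slopes = [ols_slope(w, y)]  # lambda = 0 is the naive fit
    for lam in (1.0, 2.0):
        # Simulation step: add noise with variance lam * sigma_u**2.
        extra_sd = (lam ** 0.5) * sigma_u
        fits = []
        for _ in range(n_sim):
            w_noisy = [v + rng.gauss(0.0, extra_sd) for v in w]
            fits.append(ols_slope(w_noisy, y))
        slopes.append(sum(fits) / n_sim)
    f0, f1, f2 = slopes
    # Extrapolation step: quadratic through (0, f0), (1, f1), (2, f2) at -1.
    return 3.0 * f0 - 3.0 * f1 + f2


def simulate(n, true_slope, sigma_u, rng):
    """Toy data: true predictor x, error-prone measurement w, response y."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [v + rng.gauss(0.0, sigma_u) for v in x]
    y = [true_slope * v + rng.gauss(0.0, 0.5) for v in x]
    return w, y


if __name__ == "__main__":
    rng = random.Random(0)
    w, y = simulate(5000, 2.0, 0.5, rng)
    print("naive:", ols_slope(w, y))
    print("SIMEX:", simex_slope(w, y, 0.5, rng))
```

In this toy setting the naive slope is attenuated toward zero and the extrapolated value moves back toward the true slope, at the price of extra variability from the extrapolation, which mirrors the bias and variance trade-off reported above.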


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Olivier Delaneau ◽  
Jean-François Zagury ◽  
Matthew R. Robinson ◽  
Jonathan L. Marchini ◽  
Emmanouil T. Dermitzakis

The number of human genomes being genotyped or sequenced increases exponentially, and efficient haplotype estimation methods able to handle this amount of data are now required. Here we present a method, SHAPEIT4, which substantially improves upon other methods to process large genotype and high coverage sequencing datasets. It notably exhibits sub-linear running times with sample size, provides highly accurate haplotypes and allows integrating external phasing information such as large reference panels of haplotypes, collections of pre-phased variants and long sequencing reads. We provide SHAPEIT4 in an open source format and demonstrate its performance in terms of accuracy and running times on two gold standard datasets: the UK Biobank data and the Genome In A Bottle.


2020 ◽  
Author(s):  
Raed Alzghool

This chapter considers estimation of autoregressive conditional heteroscedasticity (ARCH) and generalized autoregressive conditional heteroscedasticity (GARCH) models using quasi-likelihood (QL) and asymptotic quasi-likelihood (AQL) approaches. QL and AQL methods for estimating the unknown parameters of ARCH and GARCH models are developed. The QL method requires no distributional assumptions about the ARCH and GARCH processes, but it does assume that the first two moments of the process are known. The AQL estimation procedure is suggested when the conditional variance of the process is unknown; it replaces the variance and covariance in QL with kernel estimates. Simulation results, numerical examples, and applications of the methods to daily exchange rate series and weekly crude oil price changes are presented.


2020 ◽  
Author(s):  
Zhaotong Lin ◽  
Souvik Seal ◽  
Saonli Basu

SNP heritability of a trait is measured by the proportion of total variance explained by the additive effects of genome-wide single nucleotide polymorphisms (SNPs). Linear mixed models are routinely used to estimate SNP heritability for many complex traits. The basic concept behind this approach is to model the genetic contribution as a random effect, where the variance of this genetic contribution determines the heritability of the trait. This linear mixed model approach requires estimation of ‘relatedness’ among individuals in the sample, which is usually captured by estimating a genetic relationship matrix (GRM). Heritability is estimated by the restricted maximum likelihood (REML) or method of moments (MOM) approaches, and this estimation relies heavily on the GRM computed from the genetic data on individuals. The presence of population substructure in the data could significantly impact the GRM estimation and may introduce bias in heritability estimation. The common practice for accounting for such population substructure is to adjust for the top few principal components of the GRM as covariates in the linear mixed model. Here we propose an alternative way of estimating heritability in multi-ethnic studies. Our proposed approach is a MOM estimator derived from the Haseman-Elston regression and gives an asymptotically unbiased estimate of heritability in the presence of population stratification. It introduces adjustments for the population stratification in a second-order estimating equation and allows the total phenotypic variance to vary by ethnicity. We study the performance of different MOM and REML approaches in the presence of population stratification through extensive simulation studies. We estimate the heritability of height, weight and other anthropometric traits in the UK Biobank cohort to investigate the impact of subtle population substructure on SNP heritability estimation.
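
As background for the MOM approach, the plain Haseman-Elston regression in its cross-product form regresses the product of two individuals' standardized phenotypes on their GRM entry over all distinct pairs, and reads the slope as the heritability estimate. The sketch below implements only that unadjusted regression. It omits the stratification adjustments and ethnicity-specific variances that the proposed estimator adds, it assumes the phenotype vector is already standardized, and the function name and toy inputs are hypothetical.

```python
def haseman_elston_slope(K, y):
    """OLS slope of y_i * y_j on K[i][j] over all pairs i < j.

    Assumes y is already standardized (mean 0, variance 1).
    """
    relatedness = []
    products = []
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            relatedness.append(K[i][j])
            products.append(y[i] * y[j])
    m = len(relatedness)
    mean_k = sum(relatedness) / m
    mean_p = sum(products) / m
    sxy = sum((k - mean_k) * (p - mean_p)
              for k, p in zip(relatedness, products))
    sxx = sum((k - mean_k) ** 2 for k in relatedness)
    return sxy / sxx


if __name__ == "__main__":
    # Toy 3-person GRM and standardized phenotype, for illustration only.
    K = [[1.0, 0.2, -0.1], [0.2, 1.0, 0.3], [-0.1, 0.3, 1.0]]
    y = [1.0, -1.0, 0.0]
    print(haseman_elston_slope(K, y))
```

With only three individuals the slope is meaningless as a heritability estimate; the example shows the mechanics, and the pairwise loop makes visible why the cost grows quadratically with sample size.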

