Using Step-Wise Linear Regression to Detect “Functional” Sequence Variants: Application to Simulated Data

2001 ◽  
Vol 21 (S1) ◽  
pp. S353-S357
Author(s):  
Braxton D. Mitchell ◽  
Wen-Chi Hsueh ◽  
Jennifer L. Schneider ◽  
John Blangero
2016 ◽  
Vol 29 (6) ◽  
pp. 1977-1998 ◽  
Author(s):  
Alexis Hannart

Abstract The present paper introduces and illustrates methodological developments intended for so-called optimal fingerprinting methods, which are frequently used in detection and attribution studies. These methods used to involve three independent steps: preliminary reduction of the dimension of the data, estimation of the covariance associated with internal climate variability, and, finally, linear regression inference with associated uncertainty assessment. It is argued that such a compartmentalized treatment presents several issues; an integrated method is thus introduced to address them. The suggested approach is based on a single-piece statistical model that represents both linear regression and control runs. The unknown covariance is treated as a nuisance parameter that is eliminated by integration. This allows for the introduction of regularization assumptions. Point estimates and confidence intervals follow from the integrated likelihood. Further, it is shown that preliminary dimension reduction is not required for implementability and that computational issues associated with using the raw, high-dimensional, spatiotemporal data can be resolved quite easily. Results on simulated data show improved performance compared to existing methods with respect to both estimation error and accuracy of confidence intervals, and also highlight the need for further improvements regarding the latter. The method is illustrated on twentieth-century precipitation and surface temperature, suggesting a potentially high informational benefit of using the raw, nondimension-reduced data in detection and attribution (D&A), provided model error is appropriately built into the inference.


PLoS ONE ◽  
2016 ◽  
Vol 11 (4) ◽  
pp. e0153815 ◽  
Author(s):  
Xiaoyun Yin ◽  
Shuchao Pang ◽  
Jian Huang ◽  
Yinghua Cui ◽  
Bo Yan

2014 ◽  
Vol 3 (1) ◽  
pp. 8
Author(s):  
DWI LARAS RIYANTINI ◽  
MADE SUSILAWATI ◽  
KARTIKA SARI

Multicollinearity is a problem that often occurs in multiple linear regression. When the independent variables are multicollinear, the resulting regression model is far from accurate. Latent root regression is an alternative for dealing with multicollinearity in multiple linear regression. In latent root regression, multicollinearity is overcome by reducing the original variables to new variables through principal component analysis. The parameters are estimated with a modified least squares method. In this study, the data are eleven groups of simulated data with varying numbers of independent variables. Based on the VIF and correlation values, latent root regression is capable of handling multicollinearity completely. Moreover, the regression model obtained by latent root regression has an R² value of 0.99, which indicates that the independent variables explain the variability of the response variables accurately.
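The VIF diagnostic mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' latent root regression; it only shows how a variance inflation factor might be computed, by regressing each predictor on the others and taking 1 / (1 - R²). The function names `solve` and `vif` and the simulated data are hypothetical, chosen for illustration only.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def vif(columns):
    """Return the variance inflation factor of each predictor.

    `columns` is a list of equal-length lists, one list per predictor.
    """
    n = len(columns[0])
    result = []
    for j, y in enumerate(columns):
        # Regress predictor j on an intercept plus all other predictors.
        design = [[1.0] * n] + [c for k, c in enumerate(columns) if k != j]
        gram = [[sum(a * b for a, b in zip(u, v)) for v in design] for u in design]
        rhs = [sum(a * b for a, b in zip(u, y)) for u in design]
        beta = solve(gram, rhs)  # ordinary least squares via the normal equations
        fitted = [sum(beta[k] * design[k][i] for k in range(len(design)))
                  for i in range(n)]
        mean_y = sum(y) / n
        ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
        ss_tot = sum((yi - mean_y) ** 2 for yi in y)
        result.append(ss_tot / ss_res)  # equals 1 / (1 - R^2)
    return result


# Simulated data (assumption): x3 is nearly a copy of x1, x2 is independent.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [random.gauss(0, 1) for _ in range(200)]
x3 = [a + 0.05 * random.gauss(0, 1) for a in x1]
vifs = vif([x1, x2, x3])
print(vifs)
```

In this sketch the nearly collinear pair (x1, x3) should receive large VIF values while the independent predictor x2 stays close to 1. A common rule of thumb treats values above 10 as a sign of problematic multicollinearity, though the threshold is a convention rather than a strict test.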


2006 ◽  
Vol 7 (1) ◽  
Author(s):  
Mahboob A Chowdhury ◽  
Helena Kuivaniemi ◽  
Roberto Romero ◽  
Samuel Edwin ◽  
Tinnakorn Chaiworapongsa ◽  
...  

2014 ◽  
Vol 2014 ◽  
pp. 1-7
Author(s):  
Xuedong Chen ◽  
Qianying Zeng ◽  
Qiankun Song

An extension of some standard likelihood and variable selection criteria based on procedures of linear regression models under the skew-normal distribution or the skew-t distribution is developed. This novel class of models provides a useful generalization of symmetrical linear regression models, since the random term distributions cover both symmetric as well as asymmetric and heavy-tailed distributions. A generalized expectation-maximization algorithm is developed for computing the l1-penalized estimator. Efficacy of the proposed methodology and algorithm is demonstrated by simulated data.


2018 ◽  
Vol 87 (3) ◽  
Author(s):  
Tommy Löfstedt ◽  
Vincent Guillemot ◽  
Vincent Frouin ◽  
Edouard Duchesnay ◽  
Fouad Hadj-Selem

2018 ◽  
Vol 20 (1) ◽  
pp. 96-119 ◽  
Author(s):  
Xavier Bry ◽  
Catherine Trottier ◽  
Frédéric Mortier ◽  
Guillaume Cornu

We address component-based regularization of a multivariate generalized linear model (GLM). A vector of random responses [Formula: see text] is assumed to depend, through a GLM, on a set [Formula: see text] of explanatory variables, as well as on a set [Formula: see text] of additional covariates. [Formula: see text] is partitioned into [Formula: see text] conceptually homogenous variable groups [Formula: see text], viewed as explanatory themes. Variables in each [Formula: see text] are assumed many and redundant. Thus, generalized linear regression demands dimension reduction and regularization with respect to each [Formula: see text]. By contrast, variables in [Formula: see text] are assumed few and selected so as to demand no regularization. Regularization is performed searching each [Formula: see text] for an appropriate number of orthogonal components that both contribute to model [Formula: see text] and capture relevant structural information in [Formula: see text]. To estimate a single-theme model, we first propose an enhanced version of Supervised Component Generalized Linear Regression (SCGLR), based on a flexible measure of structural relevance of components, and able to deal with mixed-type explanatory variables. Then, to estimate the multiple-theme model, we develop an algorithm encapsulating this enhanced SCGLR: THEME-SCGLR. The method is tested on simulated data and then applied to rainforest data in order to model the abundance of tree species.


2005 ◽  
Vol 62 (7) ◽  
pp. 1256-1269 ◽  
Author(s):  
Bernard A. Megrey ◽  
Yong-Woo Lee ◽  
S. Allen Macklin

Abstract Many of the factors affecting recruitment in marine populations are still poorly understood, complicating the prediction of strong year classes. Despite numerous attempts, the complexity of the problem often seems beyond the capabilities of traditional statistical analysis paradigms. This study examines the utility of four statistical procedures to identify relationships between recruitment and the environment. Because we can never really know the parameters or underlying relationships of actual data, we chose to use simulated data with known properties and different levels of measurement error to test and compare the methods, especially their ability to forecast future recruitment states. Methods examined include traditional linear regression, non-linear regression, Generalized Additive Models (GAM), and Artificial Neural Networks (ANN). Each is compared according to its ability to recover known patterns and parameters from simulated data, as well as to accurately forecast future recruitment states. We also apply the methods to published Norwegian spring-spawning herring (Clupea harengus L.) spawner–recruit–environment data. Results were not consistently conclusive, but in general, flexible non-parametric methods such as GAMs and ANNs performed better than parametric approaches in both parameter estimation and forecasting. Even under controlled data simulation procedures, we saw evidence of spurious correlations. Models fit to the Norwegian spring-spawning herring data show the importance of sea temperature and spawning biomass. The North Atlantic Oscillation (NAO) did not appear to be an influential factor affecting herring recruitment.


2020 ◽  
Author(s):  
Jie Yuan ◽  
Ben Lai ◽  
Itsik Pe’er

Abstract Transcriptome-Wide Association Studies discover SNP effects mediated by gene expression through a two-stage process: a typically small reference panel is used to infer SNP-expression effects, and then these are applied to discover associations between imputed expression and phenotypes. We investigate whether the accuracy of SNP-expression and expression-phenotype associations can be increased by performing inference on both the reference panel and independent GWAS cohorts simultaneously. We develop EMBER (Estimation of Mediated Binary Effects in Regression) to re-estimate these effects using a liability threshold model with an adjustment to variance components accounting for imputed expression from GWAS data. In simulated data with only gene-mediated effects, EMBER more than doubles the performance of SNP-expression linear regression, increasing mean r² from 0.3 to 0.65 with a gene-mediated variance explained of 0.01. EMBER also improves estimation accuracy when the fraction of cis-SNP variance mediated by genes is as low as 30%. We apply EMBER to genotype and gene expression data in schizophrenia by combining 512 samples from the CommonMind Consortium and 56,081 samples from the Psychiatric Genomic Consortium. We evaluate performance of EMBER in 36 genes suggested by TWAS by concordance of inferred effects with effects reported independently for frontal cortex expression. Applying the EMBER framework to a baseline linear regression model increases performance in 26 out of 36 genes (sign test p-value 0.0020) with an increase in mean r² from 0.200 to 0.235.

