Stochastic imputation for integrated transcriptome association analysis of a longitudinally measured trait

2019 ◽  
Vol 29 (4) ◽  
pp. 1167-1180 ◽  
Author(s):  
Evan L Ray ◽  
Jing Qian ◽  
Regina Brecha ◽  
Muredach P Reilly ◽  
Andrea S Foulkes

The mechanistic pathways linking genetic polymorphisms and complex disease traits remain largely uncharacterized. At the same time, expansive new transcriptome data resources offer an unprecedented opportunity to unravel the mechanistic underpinnings of complex disease associations. Two-stage strategies that condition on a single, penalized-regression imputation of the transcriptome have been described for association analysis of cross-sectional traits. In this manuscript, we propose an alternative two-stage approach based on stochastic regression imputation that additionally incorporates error in the predictive model. Application of a bootstrap procedure offers flexibility when a closed-form predictive distribution is not available. The two-stage strategy is also generalized to longitudinally measured traits, using a linear mixed effects modeling framework and a composite test statistic to evaluate whether the genetic component of gene-level expression modifies the biomarker trajectory over time. Simulation studies are performed to evaluate relative performance with respect to type-1 error rates, coverage, estimation error, and power under a range of conditions. A case study is presented to investigate the association between whole blood expression of each of five inflammasome genes and inflammatory response over time after endotoxin challenge.
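The stochastic-imputation idea is easy to prototype. Below is a minimal sketch, not the authors' implementation: stage one fits a penalized regression of expression on SNP dosages in a reference panel, and stage two adds bootstrapped stage-one residuals to the point predictions before fitting the outcome model, so that prediction error propagates into the association estimate. All data, dimensions, and effect sizes are simulated placeholders.

```python
# A minimal sketch of two-stage stochastic imputation via residual bootstrap.
# Hypothetical data: G (n x p SNP dosages) and expr in a reference panel,
# G_out / y_out in the outcome cohort. Illustrative only.
import numpy as np
from sklearn.linear_model import ElasticNetCV, LinearRegression

rng = np.random.default_rng(0)
n_ref, n_out, p = 300, 500, 50
G = rng.binomial(2, 0.3, (n_ref, p)).astype(float)
expr = G[:, :5].sum(axis=1) + rng.normal(size=n_ref)
G_out = rng.binomial(2, 0.3, (n_out, p)).astype(float)
y_out = 0.5 * G_out[:, :5].sum(axis=1) + rng.normal(size=n_out)

# Stage 1: penalized regression of expression on cis-SNPs.
stage1 = ElasticNetCV(cv=5).fit(G, expr)
resid = expr - stage1.predict(G)

# Stage 2: stochastic imputation -- add bootstrapped residuals to the point
# prediction so stage-1 error enters the outcome model, then pool estimates.
B = 200
betas = []
for _ in range(B):
    x_imp = stage1.predict(G_out) + rng.choice(resid, size=n_out, replace=True)
    betas.append(LinearRegression().fit(x_imp[:, None], y_out).coef_[0])
betas = np.asarray(betas)
print(f"pooled effect: {betas.mean():.3f}  (between-draw sd {betas.std():.3f})")
```

The spread of the bootstrap estimates reflects the stage-one uncertainty that a single-imputation analysis would ignore.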

Genetics ◽  
2002 ◽  
Vol 161 (2) ◽  
pp. 905-914 ◽  
Author(s):  
Hakkyo Lee ◽  
Jack C M Dekkers ◽  
M Soller ◽  
Massoud Malek ◽  
Rohan L Fernando ◽  
...  

Controlling the false discovery rate (FDR) has been proposed as an alternative to controlling the genomewise error rate (GWER) for detecting quantitative trait loci (QTL) in genome scans. The objective here was to implement FDR in the context of regression interval mapping for multiple traits. Data on five traits from an F2 swine breed cross were used. FDR was implemented using tests at every 1 cM (FDR1) and using tests with the highest test statistic for each marker interval (FDRm). For the latter, a method was developed to predict comparison-wise error rates. At low error rates, FDR1 behaved erratically; FDRm was more stable but gave similar significance thresholds and numbers of QTL detected. At the same error rate, methods to control FDR gave less stringent significance thresholds and more QTL detected than methods to control GWER. Although testing across traits had limited impact on FDR, single-trait testing was recommended because there is no theoretical reason to pool tests across traits for FDR. FDR based on FDRm was recommended for QTL detection in interval mapping because it provides significance tests that are meaningful, yet not overly stringent, such that a more complete picture of QTL is revealed.
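For readers unfamiliar with the two criteria, the toy comparison below (simulated p-values, not the swine data) contrasts Benjamini-Hochberg FDR control with a Bonferroni-style genomewise threshold at the same nominal level; FDR control typically rejects more tests.

```python
# Benjamini-Hochberg FDR versus a Bonferroni-style GWER threshold on
# simulated comparison-wise p-values; illustrative only.
import numpy as np

rng = np.random.default_rng(1)
# 1000 tests: 950 nulls, 50 with a real signal.
p_null = rng.uniform(size=950)
p_alt = rng.beta(0.2, 5.0, size=50)        # skewed toward small p-values
pvals = np.sort(np.concatenate([p_null, p_alt]))

m, q = pvals.size, 0.05
# BH step-up rule: reject the k smallest p-values, where k is the largest
# index with p_(k) <= k*q/m.
below = pvals <= np.arange(1, m + 1) * q / m
k = int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
print("BH rejections:", k)
print("GWER (Bonferroni) rejections:", int((pvals <= q / m).sum()))
```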


2021 ◽  
Author(s):  
Konstantinos Slavakis ◽  
Gaurav Shetty ◽  
Loris Cannelli ◽  
Gesualdo Scutari ◽  
Ukash Nakarmi ◽  
...  

This paper introduces a non-parametric kernel-based modeling framework for imputation by regression on data that are assumed to lie close to an unknown-to-the-user smooth manifold in a Euclidean space. The proposed framework, coined kernel regression imputation in manifolds (KRIM), needs no training data to operate. Aiming at computationally efficient solutions, KRIM utilizes a small number of "landmark" data points to extract geometric information from the measured data via parsimonious affine combinations ("linear patches"), which mimic the concept of tangent spaces to smooth manifolds and take place in functional approximation spaces, namely reproducing kernel Hilbert spaces (RKHSs). Multiple complex RKHSs are combined in a data-driven way to surmount the obstacle of pinpointing the "optimal" parameters of a single kernel through cross-validation. The extracted geometric information is incorporated into the design via a novel bi-linear data-approximation model, and the imputation-by-regression task takes the form of an inverse problem which is solved by an iterative algorithm with guaranteed convergence to a stationary point of the non-convex loss function. To showcase the modular character and wide applicability of KRIM, this paper highlights its application to dynamic magnetic resonance imaging (dMRI), where reconstruction of high-resolution images from severely under-sampled dMRI data is desired. Extensive numerical tests on synthetic and real dMRI data demonstrate the superior performance of KRIM over state-of-the-art approaches under several metrics and with a small computational footprint.
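A heavily simplified stand-in for the landmark idea, not the KRIM algorithm itself: kernel ridge regression restricted to a few landmark points (a Nystrom-style approximation) imputes the missing samples of a smooth signal. The kernel choice, landmark count, and regularization below are arbitrary placeholders; the real method combines multiple RKHSs and a bi-linear model.

```python
# Landmark-based kernel regression imputation of a signal assumed to lie
# near a smooth manifold; a sketch, not the authors' KRIM pipeline.
import numpy as np

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 400)[:, None]
signal = np.sin(6 * np.pi * t[:, 0])
mask = rng.uniform(size=t.shape[0]) < 0.3                   # 30% observed
L = t[rng.choice(np.flatnonzero(mask), 15, replace=False)]  # landmarks

gamma, lam = 50.0, 1e-3
K_oL = rbf(t[mask], L, gamma)                  # observed x landmarks
K_LL = rbf(L, L, gamma)
# Regularized least squares in the landmark subspace.
alpha = np.linalg.solve(K_oL.T @ K_oL + lam * K_LL, K_oL.T @ signal[mask])
imputed = rbf(t, L, gamma) @ alpha
print("RMSE on missing entries:",
      float(np.sqrt(np.mean((imputed[~mask] - signal[~mask]) ** 2))))
```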


2020 ◽  
Vol 16 (12) ◽  
pp. e1008473
Author(s):  
Pamela N. Luna ◽  
Jonathan M. Mansbach ◽  
Chad A. Shaw

Changes in the composition of the microbiome over time are associated with myriad human illnesses. Unfortunately, the lack of analytic techniques has hindered researchers’ ability to quantify the association between longitudinal microbial composition and time-to-event outcomes. Prior methodological work developed the joint model for longitudinal and time-to-event data to incorporate time-dependent biomarker covariates into the hazard regression approach to disease outcomes. The original implementation of this joint modeling approach employed a linear mixed effects model to represent the time-dependent covariates. However, when the distribution of the time-dependent covariate is non-Gaussian, as is the case with microbial abundances, researchers require different statistical methodology. We present a joint modeling framework that uses a negative binomial mixed effects model to describe longitudinal taxon abundances. We incorporate these modeled microbial abundances into a hazard function with a parameterization that not only accounts for the proportional nature of microbiome data, but also generates biologically interpretable results. Herein we demonstrate the performance improvements of our approach over existing alternatives via simulation and on a previously published longitudinal dataset studying the microbiome during pregnancy. The results demonstrate that our joint modeling framework for longitudinal microbiome count data provides a powerful methodology to uncover associations between changes in microbial abundances over time and the onset of disease. This method offers researchers a deeper understanding of the associations between longitudinal microbial composition changes and disease outcomes, which could lead to new diagnostic biomarkers or inform clinical interventions to help prevent or treat disease.
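As a rough schematic of the two submodels (not the authors' joint estimation, which fits both simultaneously and includes subject-level random effects), the sketch below fits a negative binomial model for one taxon's counts with sequencing depth as exposure, then feeds the fitted log relative abundance into a time-varying Cox model. All data are simulated and all column names are hypothetical.

```python
# Two-step stand-in for the joint model: NB counts, then time-varying Cox.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(3)
n, visits = 80, 4
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), visits),
    "t": np.tile(np.arange(visits, dtype=float), n),
})
df["depth"] = rng.integers(5_000, 20_000, len(df))   # total reads per sample
x = np.repeat(rng.normal(size=n), visits)            # subject-level covariate
eta = -7.0 + 0.4 * df["t"] + 0.8 * x                 # true log relative abundance
mu = df["depth"] * np.exp(eta)
df["count"] = rng.poisson(mu * rng.gamma(5.0, 1 / 5.0, len(df)))  # NB counts

# Step 1: negative binomial regression with read depth as exposure.
X = sm.add_constant(pd.DataFrame({"t": df["t"], "x": x}))
nb = sm.NegativeBinomial(df["count"], X, exposure=df["depth"]).fit(disp=0)
df["fit_lra"] = np.asarray(X) @ nb.params.values[:-1]   # drop dispersion alpha

# Step 2: hazard driven by the fitted log relative abundance.
df["start"], df["stop"] = df["t"], df["t"] + 1.0
hazard = 0.08 * np.exp(df["fit_lra"] - df["fit_lra"].mean())
df["event"] = (rng.uniform(size=len(df)) < hazard).astype(int)
cum = df.groupby("id")["event"].cumsum()
df = df[cum - df["event"] == 0]                       # censor after first event

ctv = CoxTimeVaryingFitter()
ctv.fit(df[["id", "start", "stop", "event", "fit_lra"]],
        id_col="id", start_col="start", stop_col="stop", event_col="event")
print(ctv.summary[["coef", "p"]])
```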


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S293-S293
Author(s):  
Jonathan Altamirano ◽  
Grace Tam ◽  
Marcela Lopez ◽  
India Robinson ◽  
Leanne Chun ◽  
...  

Background: While pediatric cases of COVID-19 are at low risk for adverse events, schoolchildren should be considered for surveillance because they can become infected at school and serve as sources of household or community transmission. Our team assessed the feasibility of young children self-collecting SARS-CoV-2 samples for surveillance testing in an educational setting.

Methods: Students at a K-8 school were tested weekly for SARS-CoV-2 from September 2020 to June 2021; error rates were collected from September 2020 to January 2021. Clinical staff provided all students with instructions for anterior nares specimen self-collection and then observed them to ensure proper technique. Instructions included holding the sterile swab without touching the tip, inserting the swab into one nostril until feeling resistance, rubbing the swab in four circles, and repeating the process in the other nostril. An independent observer timed random sample self-collections from April to June 2021.

Results: 2,590 samples were collected from 209 students during the period when error-rate data were collected. Errors occurred in 3.3% of all student encounters (n=87). Error rates over time are shown in Figure 1, with the highest rate on the first day of testing (20/197, 10.2%) and the lowest in January 2021 (1/202, 0.5%). 2,574 visits for sample self-collection occurred during the period when independent timing data were collected (April to June 2021); of those visits, 7.5% (n=193) were timed, with an average duration of 70 seconds.

[Figure 1: Swab error rates over time.]

Conclusion: Pediatric self-collected lower nasal swabs are a viable and easily tolerated specimen collection method for SARS-CoV-2 surveillance in school settings, as evidenced by the low error rate and short duration of sample self-collection during testing. School administrators should expect errors to drop quickly after implementing testing.

Disclosures: All authors: no reported disclosures.
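The per-session arithmetic is simple to reproduce; the sketch below recomputes the quoted proportions and adds Wilson intervals. Note the 3.3% overall figure is quoted per encounter, so its denominator may differ slightly from the 2,590 samples used here.

```python
# Error-rate arithmetic from the abstract, with Wilson 95% intervals.
from statsmodels.stats.proportion import proportion_confint

sessions = [("overall", 87, 2590), ("first day", 20, 197), ("Jan 2021", 1, 202)]
for label, errs, n in sessions:
    lo, hi = proportion_confint(errs, n, method="wilson")
    print(f"{label}: {errs}/{n} = {100 * errs / n:.1f}% "
          f"(95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")
```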


2013 ◽  
Vol 2013 ◽  
pp. 1-9
Author(s):  
Xiuli Wang

We consider testing the parametric component and constructing a restricted estimator of the nonparametric component in additive partially linear errors-in-variables (EV) models under an additional restriction. We propose a profile Lagrange multiplier test statistic based on the modified profile least-squares method, together with a two-stage restricted estimator of the nonparametric component. We derive two important results. First, without requiring undersmoothing of the nonparametric components, the proposed test statistic is shown to be asymptotically standard chi-square under the null hypothesis and noncentral chi-square under the alternative. These results match those derived by Wei and Wang (2012) for their adjusted test statistic, but our method needs no adjustment and is easier to implement, especially when the covariance of the measurement error is unknown. Second, the proposed two-stage restricted estimator of the nonparametric component is asymptotically normal and has an oracle property in the sense that, even though the other component is unknown, the estimator performs as well as if it were known. Simulation studies are carried out to illustrate finite-sample performance. The asymptotic distribution of the restricted corrected-profile least-squares estimator, which was not considered by Wei and Wang (2012), is also investigated.
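For orientation, here is a generic Lagrange multiplier test in a plain linear model (not the profile version for EV models studied here): fit under the restriction, regress the restricted residuals on the full design, and compare n times the auxiliary R-squared to a chi-square reference.

```python
# Generic LM (score) test of a zero restriction in a linear model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

# H0: beta_2 = 0 -> restricted fit uses only the first two columns.
b_r, *_ = np.linalg.lstsq(X[:, :2], y, rcond=None)
u = y - X[:, :2] @ b_r

# Auxiliary regression of restricted residuals on the *full* design.
g, *_ = np.linalg.lstsq(X, u, rcond=None)
r2 = 1 - ((u - X @ g) ** 2).sum() / ((u - u.mean()) ** 2).sum()
lm = n * r2                                   # ~ chi2(1) under H0
print(f"LM = {lm:.2f}, p = {stats.chi2.sf(lm, df=1):.3f}")
```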


Author(s):  
A. H. ZAPATA ◽  
M. R. V. CHAUDRON

This paper is the result of two related studies on the estimation of IT projects at a large Dutch multinational company. The first is a study of the accuracy of different dimensions of IT project estimating: schedule, budget, and effort. [Note: This paper is an extension of the paper published by the authors as "An analysis of accuracy and learning in software project estimating" [28].] This study is based on a dataset of 171 projects collected at the IT department of the company. We analyzed the estimation error of budget, effort, and schedule, and whether there is any learning (improvement) effect over time. Building on the results of the first study, we then investigated what causes the observed estimation error (inaccuracy). The results of the first study show that there is no relation between the accuracy of budget, schedule, and effort in the company analyzed; they also show that the inaccuracy (effectiveness and efficiency of the estimates) does not change over time. In the second study we found that the sources of this inaccuracy are: (IT estimation) process complexity, misuse of estimates, technical complexity, requirements redefinition, and business domain instability. The paper reflects and provides recommendations on how to improve learning from historical estimates and how to manage these diverse sources of inaccuracy, both inside this particular company and in other organizations.
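The accuracy metrics such studies typically report are straightforward to compute. Below is a hedged sketch, on simulated data with hypothetical column names rather than the authors' 171-project dataset, of relative error and MMRE per dimension plus a Spearman trend test for a learning effect.

```python
# Relative estimation error, MMRE, and a learning-over-time trend test.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(5)
dims = ["budget", "effort", "schedule"]
proj = pd.DataFrame({"start_year": rng.integers(2001, 2009, 171)})
for dim in dims:
    proj[f"{dim}_est"] = rng.lognormal(10, 0.6, 171)
    proj[f"{dim}_act"] = proj[f"{dim}_est"] * rng.lognormal(0.1, 0.4, 171)

for dim in dims:
    rel_err = (proj[f"{dim}_act"] - proj[f"{dim}_est"]) / proj[f"{dim}_act"]
    mmre = rel_err.abs().mean()            # mean magnitude of relative error
    rho, p = stats.spearmanr(proj["start_year"], rel_err.abs())
    print(f"{dim:8s} MMRE={mmre:.2f}  learning trend rho={rho:+.2f} (p={p:.2f})")
```

A near-zero trend in absolute relative error over start year corresponds to the "no learning effect" finding reported above.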


Biostatistics ◽  
2019 ◽  
Author(s):  
Jingchunzi Shi ◽  
Michael Boehnke ◽  
Seunggeun Lee

Trans-ethnic meta-analysis is a powerful tool for detecting novel loci in genetic association studies. However, in the presence of heterogeneity among different populations, existing gene-/region-based rare-variant meta-analysis methods may be unsatisfactory because they do not consider genetic similarity or dissimilarity among populations. In response, we propose a score test under a modified random effects model for gene-/region-based rare-variant associations. We adapt the kernel regression framework to construct the model and incorporate genetic similarities across populations into modeling the heterogeneity structure of the genetic effect coefficients. We use a resampling-based copula method to approximate the asymptotic distribution of the test statistic, enabling efficient estimation of p-values. Simulation studies show that our proposed method controls type I error rates and increases power over existing approaches in the presence of heterogeneity. We illustrate our method by analyzing T2D-GENES consortium exome sequence data to explore rare-variant associations with several traits.
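A variance-component score statistic in the SKAT style gives the flavor of such a test. In the sketch below, which is not the paper's method, the kernel damps cross-population effect sharing as a crude, hypothetical stand-in for the heterogeneity structure described above, and a permutation reference replaces the resampling-based copula approximation.

```python
# SKAT-style score statistic with a population-similarity-weighted kernel.
import numpy as np

rng = np.random.default_rng(6)
n, m = 400, 20
pop = rng.integers(0, 3, n)                       # three populations
G = rng.binomial(2, 0.02, (n, m)).astype(float)   # rare-variant genotypes
y = 0.6 * G[:, 0] + rng.normal(size=n)

r = y - y.mean()                                  # residuals under a null intercept model
S = (pop[:, None] == pop[None, :]).astype(float)  # 1 if same population
K = (G @ G.T) * (0.5 + 0.5 * S)                   # damp cross-population sharing
Q = r @ K @ r                                     # variance-component score statistic

# Permutation reference distribution (the paper uses a copula approximation).
Q_null = np.empty(500)
for b in range(500):
    rp = rng.permutation(r)
    Q_null[b] = rp @ K @ rp
print("approx p-value:", float((Q_null >= Q).mean()))
```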


1980 ◽  
Vol 5 (2) ◽  
pp. 129-156 ◽  
Author(s):  
George B. Macready ◽  
C. Mitchell Dayton

A variety of latent class models presented during the last 10 years are restricted forms of a more general class of probability models. Each of these models involves an a priori dependency structure among a set of dichotomously scored tasks that defines latent class response patterns across the tasks. In turn, the probabilities of these latent class patterns, along with a set of "omission" and "intrusion" error rates for each task, are the parameters defining models within this general class. One problem in using these models is that the defining parameters for a specific model may not be identifiable. To deal with this problem, researchers have considered curtailing the form of the model of interest by placing restrictions on the defining parameters. The purpose of this paper is to describe a two-stage conditional estimation procedure that yields reasonable estimates of specific models even when they are nonidentifiable. The procedure involves two stages: (a) establishing initial parameter estimates and (b) computing step-wise maximum likelihood solutions for the latent class probabilities and classification errors, iterating until parameter estimates are stable across successive iterations.
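The two-stage logic maps naturally onto an EM-style loop. The sketch below is a simplified stand-in rather than the authors' procedure: crude pass-rate starting values play the role of stage (a), and closed-form updates of the class probability and the omission/intrusion rates are iterated until stable, as in stage (b), for two latent classes and simulated dichotomous tasks.

```python
# EM-style estimation of a two-class model with omission/intrusion errors.
import numpy as np

rng = np.random.default_rng(7)
n, J = 1000, 5
z = rng.uniform(size=n) < 0.6                       # true masters
omission, intrusion = 0.15, 0.10
X = np.where(z[:, None], rng.uniform(size=(n, J)) > omission,
                         rng.uniform(size=(n, J)) < intrusion).astype(float)

# Stage (a): crude starting values.
pi, om, intr = 0.5, np.full(J, 0.2), np.full(J, 0.2)
for _ in range(200):
    # E-step: posterior probability of mastery for each respondent.
    l1 = ((1 - om) ** X * om ** (1 - X)).prod(1) * pi
    l0 = (intr ** X * (1 - intr) ** (1 - X)).prod(1) * (1 - pi)
    w = l1 / (l1 + l0)
    # Stage (b): closed-form updates, iterated to stability.
    pi = w.mean()
    om = (w[:, None] * (1 - X)).sum(0) / w.sum()          # omission rates
    intr = ((1 - w)[:, None] * X).sum(0) / (1 - w).sum()  # intrusion rates
print(f"pi~{pi:.2f}, mean omission~{om.mean():.2f}, mean intrusion~{intr.mean():.2f}")
```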


2018 ◽  
Vol 28 (8) ◽  
pp. 2418-2438
Author(s):  
Xi Shen ◽  
Chang-Xing Ma ◽  
Kam C Yuen ◽  
Guo-Liang Tian

Bilateral correlated data are often encountered in medical research, such as ophthalmologic (or otolaryngologic) studies, in which each unit contributes information from paired organs and the measurements from such paired organs are generally highly correlated. Various statistical methods have been developed to handle intra-class correlation in bilateral correlated data analysis. In practice, it is important to adjust for confounders in statistical inference, since ignoring either the intra-class correlation or the confounding effect may lead to biased results. In this article, we propose three approaches for testing a common risk difference for stratified bilateral correlated data under the assumption of equal correlation. Five confidence intervals for the common difference of two proportions are derived. The performance of the proposed tests and confidence interval estimators is evaluated by Monte Carlo simulation. The simulation results show that the score test statistic outperforms the other statistics in the sense that it maintains robust type I error rates with high power. The score confidence interval induced from the score test statistic performs satisfactorily in terms of coverage probability with reasonable interval width. A real data set from an otolaryngologic study illustrates the proposed methodologies.
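To see why the intra-class correlation matters here, the Monte Carlo sketch below (not the paper's score test) simulates paired organs with exchangeable correlation under the null and shows that a naive two-proportion z-test treating fellow organs as independent inflates the type I error rate.

```python
# Type I error inflation when intra-class correlation is ignored.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
p, rho, n_sub, reps, alpha = 0.3, 0.5, 100, 2000, 0.05

def correlated_pairs(p, rho, size):
    # Each organ copies a shared subject-level draw w.p. sqrt(rho),
    # which induces correlation ~rho between fellow organs.
    shared = rng.uniform(size=size) < p
    copy = rng.uniform(size=(2, size)) < np.sqrt(rho)
    fresh = rng.uniform(size=(2, size)) < p
    return np.where(copy, shared, fresh)

n_eyes = 2 * n_sub
rejections = 0
for _ in range(reps):
    x1 = correlated_pairs(p, rho, n_sub).sum()    # group 1; H0: equal p
    x2 = correlated_pairs(p, rho, n_sub).sum()    # group 2
    pooled = (x1 + x2) / (2 * n_eyes)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n_eyes)
    z = (x1 - x2) / n_eyes / se                   # naive: organs independent
    rejections += abs(z) > stats.norm.ppf(1 - alpha / 2)
print("empirical type I error of the naive test:", rejections / reps)
```

The empirical rejection rate lands well above the nominal 5%, which is why methods that model the intra-class correlation, like those proposed above, are needed.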

