Combined statistical modeling enables accurate mining of circadian transcription

2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Andrea Rubio-Ponce ◽  
Iván Ballesteros ◽  
Juan A Quintana ◽  
Guiomar Solanas ◽  
Salvador A Benitah ◽  
...  

Abstract Circadian-regulated genes are essential for tissue homeostasis and organismal function and are therefore common targets of scrutiny. Detecting rhythmic genes with current analytical tools requires exhaustive sampling, which is costly, raises ethical concerns, and is infeasible in certain mammalian systems. Several non-parametric methods, such as JTK_cycle and MetaCycle, are commonly used to analyze short-term (24 h) circadian data. However, their performance varies greatly depending on biological and technical factors. Here, we present CircaN, an ad-hoc implementation of a non-linear mixed model for identifying circadian genes in all types of omics data. Based on the variable but complementary results obtained across several biological and in silico datasets, we propose combining CircaN with non-parametric models to dramatically increase the number of circadian genes detected without affecting accuracy. We also introduce an R package that makes this approach available to the community.
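The sketch below shows what a rhythm-detection fit estimates. It fits the classic linear cosinor model (mesor, amplitude and acrophase of a 24 h cosine) by least squares. It is an illustration under simple assumptions (a known 24 h period and Gaussian noise). It is not CircaN's non-linear mixed model and not the R package's interface. The function name `cosinor_fit` is hypothetical.

```python
import numpy as np

def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of y ~ mesor + beta*cos(w t) + gamma*sin(w t).

    Returns (mesor, amplitude, acrophase_hours). Classic linear cosinor,
    shown only as an illustration; not CircaN's non-linear mixed model.
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 2.0 * np.pi / period
    # Design matrix: intercept, cosine and sine terms at the assumed period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(X, y, rcond=None)
    # A*cos(w t - phi) = A cos(phi) cos(w t) + A sin(phi) sin(w t)
    amplitude = np.hypot(beta, gamma)
    # Peak occurs where w t = phi, i.e. t = phi / w (wrapped into one period)
    acrophase = (np.arctan2(gamma, beta) / w) % period
    return mesor, amplitude, acrophase
```

For example, take six samples every 4 h from y = 10 + 2 cos(2π(t - 6)/24). The fit recovers a mesor of 10, an amplitude of 2 and an acrophase of 6 h. With noisy data and few time points, these estimates become unstable. That is the sampling burden the abstract describes.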

Author(s):  
Yang Hai ◽  
Yalu Wen

Abstract Motivation: Accurate disease risk prediction is essential for precision medicine. Existing models assume that diseases are caused either by groups of predictors with small-to-moderate effects or by a few isolated predictors with large effects. Their performance can therefore be sensitive to the underlying disease mechanisms, which are usually unknown in advance. Results: We developed a Bayesian linear mixed model (BLMM) in which genetic effects were modelled using a hybrid of sparse regression and a linear mixed model with multiple random effects. The parameters of BLMM were inferred through a computationally efficient variational Bayes algorithm. The proposed method can approximate the shape of the true effect-size distribution, captures the predictive effects of both common and rare variants, and is robust across various disease models. Through extensive simulations and an application to a whole-genome sequencing dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative, we demonstrate that BLMM has better prediction performance than existing methods and can detect predictive variables and/or genetic regions. Availability: The R package is available at https://github.com/yhai943/BLMM. Supplementary information: Supplementary data are available at Bioinformatics online.


Forests ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 254 ◽  
Author(s):  
Omar Cabrera ◽  
Andreas Fries ◽  
Patrick Hildebrandt ◽  
Sven Günter ◽  
Reinhard Mosandl

Research Highlights: This study determined that the treatment “release from competitors” causes different diametric growth responses in selected timber species, and that the initial size of the tree (diametric class) is important. The growth habit and phenological traits (defoliation) of the species must also be considered, as they may influence growth after release. Background and Objectives: The objective of the study was to analyze the diametric growth of nine timber species after their release in order to answer the following questions: (i) Can the diametric growth of the selected timber species be increased by release? (ii) Does release cause different responses among the tree species? (iii) Are other factors important, such as the initial diameter at breast height (DBH) or the general climate conditions? Materials and Methods: Four hundred and eighty-eight trees belonging to nine timber species were selected and monitored over a three-year period. Release was applied to 197 trees, whereas 251 trees served as controls to evaluate the response of diametric growth. To determine the response of the trees, a linear mixed model (GLMM, R package: lme4) was used, which was adjusted by a one-way ANOVA test. Results: All species showed a similar annual cycle of diametric increase, which is due to the per-humid climate in the area. Precipitation is secondary for diametric growth because sufficient rainfall occurs throughout the year; variations in temperature are more important. However, the species responded differently to release, because the initial DBH and growth habit are more important factors. The species could therefore be classified into three groups: positive, negative and no response to release. Conclusions: Species that prefer open sites responded positively to release, while shade-tolerant species and species with pronounced phenological traits responded negatively.
The initial DBH was also an important factor for diametric increase: trees of class I (20 cm to 30 cm DBH) responded positively to the treatment, whereas for bigger or older individuals, the differences decreased or became negative.


2019 ◽  
Vol 36 (6) ◽  
pp. 1785-1794 ◽
Author(s):  
Jun Li ◽  
Qing Lu ◽  
Yalu Wen

Abstract Motivation: Using human genome discoveries and other established factors to build an accurate risk prediction model is an essential step toward precision medicine. While multi-layer high-dimensional omics data provide unprecedented resources for prediction studies, the corresponding analytical methods are much less developed. Results: We present a multi-kernel penalized linear mixed model with adaptive lasso (MKpLMM), a predictive modeling framework for multi-omics data analysis that extends the standard linear mixed models widely used in genomic risk prediction. MKpLMM captures not only the predictive effects of each layer of omics data but also their interactions, via multiple kernel functions. It adopts a data-driven approach to select predictive regions as well as predictive layers of omics data, and achieves robust selection performance. Through extensive simulation studies, analyses of PET-imaging outcomes from the Alzheimer’s Disease Neuroimaging Initiative study, and analyses of 64 drug responses, we demonstrate that MKpLMM consistently outperforms competing methods in phenotype prediction. Availability and implementation: The R package is available at https://github.com/YaluWen/OmicPred. Supplementary information: Supplementary data are available at Bioinformatics online.
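The sketch below shows how kernels can carry both layer-specific effects and cross-layer interactions. It makes several assumptions:

- Gaussian kernels with hand-chosen bandwidths.
- Fixed variance weights `tau1`, `tau2` and `tau12`.
- A known noise variance.

The element-wise (Hadamard) product of two kernels is one standard way to encode interactions between two inputs. MKpLMM itself estimates its variance components and selects regions and layers with an adaptive lasso. This NumPy sketch does neither, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, rho):
    """Gaussian kernel exp(-||x_i - x_j||^2 / rho) between all rows of X."""
    sq = np.sum(X**2, axis=1)
    # Pairwise squared distances, clipped at 0 to absorb rounding error
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / rho)

def combined_kernel(K1, K2, tau1, tau2, tau12):
    """Covariance of the total random effect: two layer kernels plus an interaction kernel."""
    # K1 * K2 is the element-wise (Hadamard) product, itself a valid kernel
    return tau1 * K1 + tau2 * K2 + tau12 * (K1 * K2)

def blup_fitted(K, y, sigma2):
    """Fitted values mean(y) + K (K + sigma2 I)^{-1} (y - mean(y)).

    This is the BLUP of g ~ N(0, K) when the intercept is fixed at the sample
    mean (a simplification; a full LMM would estimate it by GLS).
    """
    n = len(y)
    resid = y - y.mean()
    alpha = np.linalg.solve(K + sigma2 * np.eye(n), resid)
    return y.mean() + K @ alpha
```

As `sigma2` grows relative to the kernel scale, the fitted values shrink toward the mean. This is the same shrinkage a linear mixed model applies to its random effects.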


2021 ◽  
Author(s):  
Jihong Zhang ◽  
Terry Ackerman ◽  
Yurou Wang

Fitting item response theory (IRT) models as generalized linear mixed models (GLMMs) with a logistic link has become more popular in large-scale assessment. The reason is that GLMMs allow complicated multilevel structures (e.g., students nested in classrooms, which are nested in schools) to be combined with IRT measurement models. However, the accuracy of item parameter estimates from these two approaches has not been well examined. This study aimed to compare the estimates of the GLMM-based 2PL model (using the PLmixed R package) with those of the traditional IRT model (using flexMIRT software) under different sample sizes (N = 500, 1000, 5000) and test lengths (J = 15, 21). The simulation results showed that for both the GLMM-based and the traditional method, item threshold estimates had lower bias than item discrimination estimates. The simulation study also showed that GLMM estimates via PLmixed were less accurate than traditional IRT estimates via flexMIRT for items with high discrimination.
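The two 2PL parameterizations can be related with a short sketch. A logistic GLMM expresses each item through a slope-intercept linear predictor a·θ + d. Traditional IRT output often uses discrimination a and difficulty b instead, with b = -d/a. The code below assumes a standard-normal ability distribution and uses hypothetical function names. It is not the PLmixed or flexMIRT interface.

```python
import numpy as np

def p_correct_ab(theta, a, b):
    """2PL probability in the IRT form: logit P = a * (theta - b)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_correct_ad(theta, a, d):
    """2PL probability in slope-intercept form: logit P = a * theta + d."""
    return 1.0 / (1.0 + np.exp(-(a * theta + d)))

def intercept_to_difficulty(a, d):
    """Convert a slope-intercept threshold d to an IRT difficulty b = -d / a."""
    return -d / a

def simulate_2pl(n_persons, a, b, rng):
    """Simulate a persons-by-items 0/1 response matrix with theta ~ N(0, 1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    theta = rng.standard_normal(n_persons)
    # Broadcast persons (rows) against items (columns)
    p = p_correct_ab(theta[:, None], a[None, :], b[None, :])
    return (rng.random(p.shape) < p).astype(int), theta
```

Items with large a have steep response curves. This is the regime in which the abstract reports the GLMM estimates losing accuracy.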


2021 ◽  
Vol 31 (5) ◽  
Author(s):  
Thomas Maullin-Sapey ◽  
Thomas E. Nichols

Abstract The analysis of longitudinal, heterogeneous or unbalanced clustered data is of primary importance to a wide range of applications. The linear mixed model (LMM) is a popular and flexible extension of the linear model specifically designed for such purposes. Historically, a large proportion of material published on the LMM concerns the application of popular numerical optimization algorithms, such as Newton–Raphson, Fisher Scoring and expectation maximization to single-factor LMMs (i.e. LMMs that only contain one “factor” by which observations are grouped). However, in recent years, the focus of the LMM literature has moved towards the development of estimation and inference methods for more complex, multi-factored designs. In this paper, we present and derive new expressions for the extension of an algorithm classically used for single-factor LMM parameter estimation, Fisher Scoring, to multiple, crossed-factor designs. Through simulation and real data examples, we compare five variants of the Fisher Scoring algorithm with one another, as well as against a baseline established by the R package lme4, and find evidence of correctness and strong computational efficiency for four of the five proposed approaches. Additionally, we provide a new method for LMM Satterthwaite degrees of freedom estimation based on analytical results, which does not require iterative gradient estimation. Via simulation, we find that this approach produces estimates with both lower bias and lower variance than the existing methods.
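The sketch below illustrates Fisher Scoring for variance components. It applies the textbook update theta <- theta + I(theta)^-1 s(theta) to a linear mixed model with two crossed random factors. The ML score is s_k = -1/2 tr(V^-1 dV_k) + 1/2 r' V^-1 dV_k V^-1 r. The expected information is I_kl = 1/2 tr(V^-1 dV_k V^-1 dV_l). The fixed effects are profiled out by GLS at each step.

This is a naive version built on dense n-by-n matrices, so it only suits small examples. The paper's contribution is more efficient expressions for crossed designs, which this sketch does not reproduce. The function names are hypothetical.

```python
import numpy as np

def score_and_info(y, X, dV, theta):
    """GLS fixed effects, ML score and expected (Fisher) information.

    dV[k] is dV/dtheta_k; V is linear in theta, so V = sum_k theta_k dV[k].
    """
    V = sum(t * D for t, D in zip(theta, dV))
    Vinv = np.linalg.inv(V)
    # GLS estimate of the fixed effects at the current variances
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    Vr = Vinv @ (y - X @ beta)
    A = [Vinv @ D for D in dV]  # V^{-1} dV_k
    score = np.array([-0.5 * np.trace(Ak) + 0.5 * Vr @ D @ Vr
                      for Ak, D in zip(A, dV)])
    # I_kl = 0.5 * tr(V^{-1} dV_k V^{-1} dV_l), computed as a sum of element-wise products
    info = np.array([[0.5 * np.sum(Ak * Al.T) for Al in A] for Ak in A])
    return beta, score, info

def fisher_scoring_lmm(y, X, Zs, n_iter=100, tol=1e-10):
    """ML variances for y = X beta + sum_k Z_k u_k + e, u_k ~ N(0, s_k I), e ~ N(0, s_0 I).

    Returns (beta, theta) with theta = [s_1, ..., s_K, s_0].
    """
    n = len(y)
    dV = [Z @ Z.T for Z in Zs] + [np.eye(n)]
    theta = np.full(len(dV), np.var(y) / len(dV))  # crude starting values
    for _ in range(n_iter):
        _, score, info = score_and_info(y, X, dV, theta)
        step = np.linalg.solve(info, score)
        # Step halving keeps every variance strictly positive
        while np.any(theta + step <= 0):
            step = 0.5 * step
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    beta, _, _ = score_and_info(y, X, dV, theta)
    return beta, theta
```

On a small crossed design, these estimates could be checked against an ML fit in lme4 of the form `lmer(y ~ 1 + (1 | A) + (1 | B), REML = FALSE)`. They would be expected to agree up to convergence tolerance.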


2015 ◽  
Author(s):  
Pierre de Villemereuil ◽  
Holger Schielzeth ◽  
Shinichi Nakagawa ◽  
Michael B. Morrissey

Abstract Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioural traits, have inherently non-normal distributions. The generalised linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for non-normal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it will often be desirable to estimate quantitative genetic parameters on the scale upon which traits are expressed. The parameters of a fitted GLMM, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of these quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. The expressions require integration of quantities determined by the link function over distributions of latent values. In general, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGglmm. We show that known formulae for quantities such as heritability of traits with Binomial and Poisson distributions are special cases of our expressions. Additionally, we show how a fitted GLMM can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation, and apply our approach to data from a pedigreed vertebrate population.
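The sketch below works through one latent-to-observed-scale integral, for a Poisson GLMM with a log link. It computes four observed-scale quantities by Gauss-Hermite quadrature over the latent normal distribution:

- the mean,
- the phenotypic variance,
- the additive genetic variance,
- the heritability.

For this model the integrals also have a closed form, which makes a convenient check. The observed mean is exp(mu + V_P/2), and the phenotypic variance is mean + mean^2 (exp(V_P) - 1). The sketch assumes `var_p_latent` is the total latent variance, including the additive genetic variance. It is written with NumPy and hypothetical names, and it is not the QGglmm interface.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def latent_to_observed_poisson_log(mu, var_a, var_p_latent, n_nodes=64):
    """Observed-scale mean, phenotypic variance, additive variance and h2
    for a Poisson GLMM with log link, via Gauss-Hermite quadrature.

    mu: latent intercept; var_a: latent additive genetic variance;
    var_p_latent: total latent variance (assumed to include var_a).
    """
    z, w = hermegauss(n_nodes)          # nodes/weights for weight exp(-z^2/2)
    w = w / np.sqrt(2.0 * np.pi)        # weights now give expectations over N(0, 1)
    lat = mu + np.sqrt(var_p_latent) * z
    lam = np.exp(lat)                   # inverse link: expected count given latent value
    mean_obs = np.sum(w * lam)
    # Var(y) = E[Var(y | l)] + Var(E[y | l]) = E[lambda] + Var(lambda) for Poisson
    var_p_obs = mean_obs + np.sum(w * lam**2) - mean_obs**2
    # psi = E[d lambda / d l]; for the log link the derivative equals lambda
    psi = np.sum(w * lam)
    var_a_obs = psi**2 * var_a
    return mean_obs, var_p_obs, var_a_obs, var_a_obs / var_p_obs
```

For links without a closed form, the same quadrature applies unchanged. Only `lam`, the conditional variance and the derivative `psi` change.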


2020 ◽  
Author(s):  
James L. Peugh ◽  
Sarah J. Beal ◽  
Meghan E. McGrady ◽  
Michael D. Toland ◽  
Constance Mara

2020 ◽  
Vol 641 ◽  
pp. 159-175 ◽
Author(s):  
J Runnebaum ◽  
KR Tanaka ◽  
L Guan ◽  
J Cao ◽  
L O’Brien ◽  
...  

Bycatch remains a global problem in managing sustainable fisheries. A critical aspect of management is understanding the timing and spatial extent of bycatch. Fisheries management often relies on observed bycatch data, which are not always available due to a lack of reporting or observer coverage. Alternatively, analyzing the overlap in suitable habitat for the target and non-target species can provide a spatial management tool for understanding where bycatch interactions are likely to occur. Potential bycatch hotspots based on suitable habitat were predicted for cusk Brosme brosme incidentally caught in the Gulf of Maine American lobster Homarus americanus fishery. Data from multiple fisheries-independent surveys were combined in a delta-generalized linear mixed model to generate spatially explicit density estimates for use in an independent habitat suitability index. The habitat suitability indices for American lobster and cusk were then compared to predict potential bycatch hotspot locations. Suitable habitat for American lobster increased between 1980 and 2013, while suitable habitat for cusk decreased throughout most of the Gulf of Maine, except for Georges Basin and the Great South Channel. The proportion of overlap in suitable habitat varied interannually but decreased slightly in the spring and remained relatively stable in the fall over the time series. As Gulf of Maine temperatures continue to increase, interactions between American lobster and cusk are predicted to decline as cusk habitat continues to contract. This framework can contribute to fisheries managers’ understanding of changes in habitat overlap as climate conditions continue to change and alter where bycatch interactions could occur.
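The delta-model step and the overlap comparison can be sketched as follows. The sketch assumes a logit link for the occurrence component and a log link for the positive-catch component (for example, a gamma model). Both linear predictors are assumed to already include the fitted fixed and random effects. The suitability threshold of 0.5 and the intersection-over-union definition of overlap are hypothetical choices for illustration. The abstract does not specify how the habitat suitability index or the overlap proportion was computed.

```python
import numpy as np

def delta_density(eta_presence, eta_positive):
    """Delta model: E[density] = Pr(density > 0) * E[density | density > 0]."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(eta_presence, dtype=float)))  # logit link
    mu_pos = np.exp(np.asarray(eta_positive, dtype=float))             # log link
    return p * mu_pos

def overlap_proportion(hsi_a, hsi_b, threshold=0.5):
    """Hypothetical overlap: cells suitable for both species / cells suitable for either."""
    suit_a = np.asarray(hsi_a) >= threshold
    suit_b = np.asarray(hsi_b) >= threshold
    either = np.logical_or(suit_a, suit_b).sum()
    both = np.logical_and(suit_a, suit_b).sum()
    return both / either if either else 0.0
```

Tracking this proportion year by year, separately for spring and fall, would give a series of the kind the abstract summarizes.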

