More than one way: exploring the capabilities of different estimation approaches to joint models for longitudinal and time-to-event outcomes

Abstract The development of physical functioning after a caesura in an aged population is still widely unexplored. Analysing this topic requires modelling the longitudinal trajectories of physical functioning while simultaneously taking terminal events (deaths) into account. Analysing the two outcomes separately yields biased estimates, since it neglects the inherent connection between them. Thus, this type of data-generating process is best modelled jointly. To facilitate this, several software applications have been made available. They differ in model formulation and estimation technique (likelihood-based, Bayesian inference, statistical boosting), so a comparison of the different approaches is necessary to identify their capabilities and limitations. Therefore, we compared the performance of the R packages JM, joineRML, JMbayes and JMboost with respect to estimation accuracy, variable selection properties and prediction precision. With these findings we then illustrate the topic of physical functioning after a caesura with data from the German Ageing Survey (DEAS). The results suggest that for smaller data sets and theory-driven modelling, likelihood-based methods (expectation maximisation; JM, joineRML) or Bayesian inference (JMbayes) are preferable, whereas statistical boosting (JMboost) is a better choice for high-dimensional data and data exploration settings.


Improvements for research data repositories: The case of text spam

Journal of Information Science ◽

10.1177/0165551521998636 ◽

2021 ◽

pp. 016555152199863

Author(s):

Ismael Vázquez ◽

María Novo-Lourés ◽

Reyes Pavón ◽

Rosalía Laza ◽

José Ramón Méndez ◽

...

Keyword(s):

Web Application ◽

Research Data ◽

Data Sets ◽

Data Repositories ◽

Software Applications ◽

Public Data ◽

Protection Mechanisms ◽

Experimental Protocols ◽

Learning Research ◽

Processing Steps

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure that the results can be reproduced and compared with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following much-needed functionalities for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques using dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities and (4) providing protection mechanisms for licensing issues and user rights. To demonstrate these functionalities, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at the URL https://rdata.4spam.group to facilitate understanding of this study.


Bayesian Inference of Species Trees using Diffusion Models

Systematic Biology ◽

10.1093/sysbio/syaa051 ◽

2020 ◽

Vol 70 (1) ◽

pp. 145-161 ◽

Cited By ~ 1

Author(s):

Marnus Stoltz ◽

Boris Baeumer ◽

Remco Bouckaert ◽

Colin Fox ◽

Gordon Hiscott ◽

...

Keyword(s):

Bayesian Inference ◽

Numerical Algorithms ◽

Diffusion Models ◽

Model Parameters ◽

Data Sets ◽

Species Trees ◽

Computationally Efficient ◽

Data Set ◽

SNP Data ◽

Binary Markers

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]


A super-Earth and a mini-Neptune around Kepler-59

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stz3369 ◽

2019 ◽

Vol 491 (4) ◽

pp. 5238-5247 ◽

Cited By ~ 1

Author(s):

X Saad-Olivera ◽

C F Martinez ◽

A Costa de Souza ◽

F Roig ◽

D Nesvorný

Keyword(s):

Stability Analysis ◽

Bayesian Inference ◽

Spectroscopic Analysis ◽

The Other ◽

Inversion Method ◽

Outer Planet ◽

Data Sets ◽

Dynamical Study ◽

Data Set ◽

Orbital Parameters

ABSTRACT We characterize the radii and masses of the star and planets in the Kepler-59 system, as well as their orbital parameters. The stellar parameters are determined through a standard spectroscopic analysis, resulting in a mass of $1.359\pm 0.155\, \mathrm{M}_\odot$ and a radius of $1.367\pm 0.078\, \mathrm{R}_\odot$. The obtained planetary radii are $1.5\pm 0.1\, R_\oplus$ for the inner and $2.2\pm 0.1\, R_\oplus$ for the outer planet. The orbital parameters and the planetary masses are determined by the inversion of Transit Timing Variations (TTV) signals. We consider two different data sets: one provided by Holczer et al. (2016), with TTVs only for Kepler-59c, and the other provided by Rowe et al. (2015), with TTVs for both planets. The inversion method applies an algorithm of Bayesian inference (MultiNest) combined with an efficient N-body integrator (Swift). For each data set, we found two possible solutions, both having the same probability according to their corresponding Bayesian evidences. All four solutions appear to be indistinguishable within their 2-σ uncertainties. However, statistical analyses show that the solutions from the Rowe et al. (2015) data set provide a better characterization. The first solution infers masses of $5.3_{-2.1}^{+4.0}~M_{\mathrm{\oplus }}$ and $4.6_{-2.0}^{+3.6}~M_{\mathrm{\oplus }}$ for the inner and outer planet, respectively, while the second solution gives masses of $3.0^{+0.8}_{-0.8}~M_{\mathrm{\oplus }}$ and $2.6^{+0.9}_{-0.8}~M_{\mathrm{\oplus }}$. These values point to a system with an inner super-Earth and an outer mini-Neptune. A dynamical study shows that the planets have almost co-planar orbits with small eccentricities (e < 0.1), close to the 3:2 mean motion resonance. A stability analysis indicates that this configuration is stable over million-year timescales.


Estimation of Emissions at Signalized Intersections Using an Improved MOVES Model with GPS Data

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph16193647 ◽

2019 ◽

Vol 16 (19) ◽

pp. 3647 ◽

Cited By ~ 7

Author(s):

Ciyun Lin ◽

Xiangyu Zhou ◽

Dayong Wu ◽

Bowen Gong

Keyword(s):

Real Time ◽

Motor Vehicle ◽

Urban Air Pollution ◽

Scientific Basis ◽

Estimation Accuracy ◽

Transport Sector ◽

GPS Data ◽

Data Sets ◽

Environmental Decision Making ◽

Pollution Emissions

Emissions from the transport sector are responsible for a large proportion of urban air pollution. Scientific and efficient measurement of traffic pollution emissions has become a vital concern for decision makers in environmental protection. In China and other countries, many high-technology companies, such as Baidu and DiDi, hold large amounts of real-time GPS traffic data, but such data have not been fully exploited, especially for estimating vehicle fuel consumption and emissions. In this paper, the traditional MOVES (Motor Vehicle Emission Simulator) model has been improved by adding real-time GPS data and tested at a representative signalized intersection in Changchun, China. The results showed that adding the GPS data sets to the MOVES model can effectively improve the estimation accuracy of traffic emissions and provide a strong scientific basis for environmental decision-making, planning and management.


A Statistical Framework for Haplotype Block Inference

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972000500151x ◽

2005 ◽

Vol 03 (05) ◽

pp. 1021-1038

Author(s):

Ao Yuan ◽

Guanjie Chen ◽

Charles Rotimi ◽

George E. Bonney

Keyword(s):

Bayesian Inference ◽

Model Selection ◽

Haplotype Block ◽

Block Structure ◽

Real Data ◽

Data Sets ◽

Block Partitioning ◽

Haplotype Blocks ◽

Statistical Framework ◽

Statistical Model Selection

The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that well-characterized haplotype blocks will make it relatively easier to quickly map all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, the optimal haplotype block partitioning is formulated as a problem of statistical model selection; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference and hypothesis testing can be performed; and prior knowledge, if present, can be incorporated to perform a Bayesian inference. The algorithm is linear in the number of loci, rather than NP-hard as for many such algorithms. We illustrate the applications of our method to both simulated and real data sets.


Simultaneous Bayesian inference on a finite mixture of mixed-effects Tobit joint models for longitudinal data with multiple features

Statistics and Its Interface ◽

10.4310/sii.2017.v10.n4.a3 ◽

2017 ◽

Vol 10 (4) ◽

pp. 557-573

Author(s):

Yangxin Huang ◽

Jiaqing Chen ◽

Ping Yin ◽

Huahai Qiu

Keyword(s):

Bayesian Inference ◽

Longitudinal Data ◽

Mixed Effects ◽

Finite Mixture ◽

Joint Models ◽

Multiple Features


Eucalypt phylogeny — molecules and morphology

Australian Systematic Botany ◽

10.1071/sb9950483 ◽

1995 ◽

Vol 8 (4) ◽

pp. 483 ◽

Cited By ~ 32

Author(s):

PY Ladiges ◽

F Udovicic ◽

AN Drinnan

Keyword(s):

Parallel Evolution ◽

5S rDNA ◽

Taxonomic Revision ◽

Morphological Characters ◽

Morphological Data ◽

Separate Analysis ◽

Data Sets ◽

Data Set ◽

Seed Characters ◽

Different Levels

Molecular (5S rDNA spacer and chloroplast DNA RFLPs) and morphological data sets are informative at different levels of the eucalypt clade. They allow separate analysis of major subclades, the results of which, when combined, give a single phylogenetic tree for Angophora Cav. and Eucalyptus L'Hér. For taxonomic revision, the tree supports the recognition of bloodwood eucalypts as monophyletic, but shows that informal subgenus Corymbia Pryor & Johnson is paraphyletic. The tree supports recognition of three major clades within the non-bloodwood eucalypts ('eudesmids', 'symphyomyrts' and 'monocalypts') and suggests relationships for taxa within each of these. Ovule and seed characters proved to be most informative in the morphological data set. The phylogenetic hypothesis suggests interpretations for homoplasious morphological characters, including parallel evolution of sepaline and petaline opercula (and associated stemonophore) and types of conflorescence.


Implementing Relevance Feedback in Ligand-Based Virtual Screening Using Bayesian Inference Network

CrossRef Listing of Deleted DOIs ◽

10.1177/1087057111416658 ◽

2011 ◽

Vol 16 (9) ◽

pp. 1081-1088 ◽

Cited By ~ 13

Author(s):

Ammar Abdo ◽

Naomie Salim ◽

Ali Ahmed

Keyword(s):

Bayesian Inference ◽

Virtual Screening ◽

Bayesian Network ◽

Relevance Feedback ◽

Data Sets ◽

Reference Structure ◽

Feedback Information ◽

Screening Experiments ◽

Inference Network ◽

Report Data

Recently, the use of the Bayesian network as an alternative to existing tools for similarity-based virtual screening has received noticeable attention from researchers in the chemoinformatics field. The main aim of the Bayesian network model is to improve the retrieval effectiveness of similarity-based virtual screening. To this end, different models of the Bayesian network have been developed. In our previous work, the retrieval performance of the Bayesian network was observed to improve significantly when multiple reference structures or fragment weightings were used. In this article, the authors enhance the Bayesian inference network (BIN) using relevance feedback information. In this approach, a few high-ranking structures of unknown activity were filtered from the outputs of BIN, based on a single active reference structure, to form a set of active reference structures. This set of active reference structures was used in two distinct techniques for carrying out such BIN searching: reweighting the fragments in the reference structures and group fusion. Simulated virtual screening experiments with three MDL Drug Data Report data sets showed that the proposed techniques provide simple ways of enhancing the cost-effectiveness of ligand-based virtual screening searches, especially for higher-diversity data sets.


A Bayesian Hyperparameter Inference for Radon-Transformed Image Reconstruction

International Journal of Biomedical Imaging ◽

10.1155/2011/870252 ◽

2011 ◽

Vol 2011 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Hayaru Shouno ◽

Madomi Yamasaki ◽

Masato Okada

Keyword(s):

Bayesian Inference ◽

Image Reconstruction ◽

Prior Information ◽

Additive White Gaussian Noise ◽

Estimation Accuracy ◽

Reconstruction Method ◽

Model Framework ◽

Poissonian Noise ◽

Observation Noise

We develop a Bayesian hyperparameter inference method for image reconstruction from the Radon transform, which often appears in computed tomography. Hyperparameters are often introduced in Bayesian inference to control the relative strength of the prior information and the fidelity to the observation. Since the quality of the reconstructed image is controlled by the estimation accuracy of these hyperparameters, we apply Bayesian inference to the filtered back-projection (FBP) reconstruction method with hyperparameter inference and demonstrate that the estimated hyperparameters can adapt automatically to the noise level in the observation. In computer simulations, we first show that our algorithm works well in the model framework environment, that is, when the observation noise is additive white Gaussian noise. We then show that our algorithm also works well in a more realistic environment, that is, when the observation noise is Poissonian. Finally, we demonstrate an application to real chest CT image reconstruction under Gaussian and Poissonian observation noise.
