Reproducible Research in R: A tutorial on how to do the same thing more than once

Mapping Intimacies ◽

10.31234/osf.io/fwxs4 ◽

2021 ◽

Author(s):

Aaron Peikert ◽

Caspar J. Van Lissa ◽

Andreas Markus Brandmaier

Keyword(s):

Degrees Of Freedom ◽

Scientific Progress ◽

Computer Code ◽

Scientific Productivity ◽

R Package ◽

Reproducible Research ◽

Final Report ◽

Wide Range ◽

Independent Person ◽

Primary Instrument

Reproducibility has long been considered integral to the scientific method. Something is called reproducible when an independent person obtains the same results from the same data. Until recently, detailed descriptions of methods and analyses were the primary instrument for ensuring scientific reproducibility. Technological advancements now enable scientists to achieve a more comprehensive standard; one in which any individual can be granted access to a digital research repository, and reproduce the analyses from the raw data to the final report including all relevant statistical analyses with a single command. This method has far-reaching implications for scientific archiving, reproducibility and replication, scientific productivity, and the credibility and reliability of scientific findings. One obstacle preventing the widespread adoption of this method is that the underlying technological advancements are complicated to use. This paper introduces `repro`, an R-package, which guides researchers in the installation and use of the tools required for making a research project reproducible. Finally, we suggest the use of the proposed tools for the preregistration of study plans as reproducible computer code (preregistration as code; PAC). Since computer code represents the planned analyses exactly as they will be executed, it is more precise than natural language descriptions of those analyses, which merely complement the PAC as a more readable summary. PAC circumvents the shortcomings of ambiguous preregistrations that may give researchers undesired degrees of freedom. Hence, reproducibility made convenient with automation has a wide range of applications to accelerate scientific progress.

Reproducible Research in R: A Tutorial on How to Do the Same Thing More Than Once

Psych ◽

10.3390/psych3040053 ◽

2021 ◽

Vol 3 (4) ◽

pp. 836-867

Author(s):

Aaron Peikert ◽

Caspar J. van Lissa ◽

Andreas M. Brandmaier

Keyword(s):

Software Engineering ◽

Building Block ◽

Community Building ◽

Computer Code ◽

R Package ◽

Research Process ◽

Reproducible Research ◽

Computational Results ◽

Research Projects ◽

Engineering Community

Computational reproducibility is the ability to obtain identical results from the same data with the same computer code. It is a building block for transparent and cumulative science because it enables the originator and other researchers, on other computers and later in time, to reproduce and thus understand how results came about, while avoiding a variety of errors that may lead to erroneous reporting of statistical and computational results. In this tutorial, we demonstrate how the R package repro supports researchers in creating fully computationally reproducible research projects with tools from the software engineering community. Building upon this notion of fully automated reproducibility, we present several applications including the preregistration of research plans with code (Preregistration as Code, PAC). PAC eschews all ambiguity of traditional preregistration and offers several more advantages. Making technical advancements that serve reproducibility more widely accessible for researchers holds the potential to innovate the research process and to help it become more productive, credible, and reliable.
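The definition above (identical results from the same data with the same code) can be illustrated with a minimal sketch. It is written in Python rather than R and does not use the repro package. The function names `analyze` and `fingerprint`, the toy data, and the seed are hypothetical choices made only for illustration.

```python
import hashlib
import json
import random


def analyze(data, seed=2021):
    """Toy analysis (hypothetical): sample mean plus a bootstrap mean."""
    rng = random.Random(seed)  # a fixed seed makes the resampling deterministic
    n = len(data)
    boot_means = []
    for _ in range(200):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(sample) / n)
    return {
        "mean": sum(data) / n,
        "boot_mean": sum(boot_means) / len(boot_means),
    }


def fingerprint(result):
    """Hash a result so that two runs can be compared exactly."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3]  # made-up numbers
first = fingerprint(analyze(data))
second = fingerprint(analyze(data))
print(first == second)  # True: same data, same code, same seed
```

Matching fingerprints on one machine do not by themselves show that another computer, or a later software version, would give the same result; that wider guarantee is what the abstract's "fully computationally reproducible" projects aim for.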

Fisher Scoring for crossed factor linear mixed models

Statistics and Computing ◽

10.1007/s11222-021-10026-6 ◽

2021 ◽

Vol 31 (5) ◽

Author(s):

Thomas Maullin-Sapey ◽

Thomas E. Nichols

Keyword(s):

Degrees Of Freedom ◽

Mixed Model ◽

Linear Mixed Model ◽

Real Data ◽

R Package ◽

Single Factor ◽

Gradient Estimation ◽

Fisher Scoring ◽

Wide Range ◽

Inference Methods

Abstract The analysis of longitudinal, heterogeneous or unbalanced clustered data is of primary importance to a wide range of applications. The linear mixed model (LMM) is a popular and flexible extension of the linear model specifically designed for such purposes. Historically, a large proportion of material published on the LMM concerns the application of popular numerical optimization algorithms, such as Newton–Raphson, Fisher Scoring and expectation maximization to single-factor LMMs (i.e. LMMs that only contain one “factor” by which observations are grouped). However, in recent years, the focus of the LMM literature has moved towards the development of estimation and inference methods for more complex, multi-factored designs. In this paper, we present and derive new expressions for the extension of an algorithm classically used for single-factor LMM parameter estimation, Fisher Scoring, to multiple, crossed-factor designs. Through simulation and real data examples, we compare five variants of the Fisher Scoring algorithm with one another, as well as against a baseline established by the R package lme4, and find evidence of correctness and strong computational efficiency for four of the five proposed approaches. Additionally, we provide a new method for LMM Satterthwaite degrees of freedom estimation based on analytical results, which does not require iterative gradient estimation. Via simulation, we find that this approach produces estimates with both lower bias and lower variance than the existing methods.

Understanding Conformational Entropy in Small Molecules

10.26434/chemrxiv.12671027 ◽

2020 ◽

Author(s):

Lucian Chan ◽

Garrett Morris ◽

Geoffrey Hutchison

Keyword(s):

Small Molecules ◽

Degrees Of Freedom ◽

Absolute Error ◽

Low Energy ◽

Standard Entropy ◽

Free Energies ◽

Conformational Entropy ◽

Wide Range ◽

High Degree ◽

Empirical Corrections

The calculation of the entropy of flexible molecules can be challenging, since the number of possible conformers grows exponentially with molecule size and many low-energy conformers may be thermally accessible. Different methods have been proposed to approximate the contribution of conformational entropy to the molecular standard entropy, including performing thermochemistry calculations with all possible stable conformations, and developing empirical corrections from experimental data. We have performed conformer sampling on over 120,000 small molecules generating some 12 million conformers, to develop models to predict conformational entropy across a wide range of molecules. Using insight into the nature of conformational disorder, our cross-validated physically-motivated statistical model can outperform common machine learning and deep learning methods, with a mean absolute error ≈4.8 J/(mol·K), or under 0.4 kcal/mol at 300 K. Beyond predicting molecular entropies and free energies, the model implies a high degree of correlation between torsions in most molecules, often assumed to be independent. While individual dihedral rotations may have low energetic barriers, the shape and chemical functionality of most molecules necessarily correlate their torsional degrees of freedom, and hence restrict the number of low-energy conformations immensely. Our simple models capture these correlations, and advance our understanding of small molecule conformational entropy.
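The two error figures quoted in the abstract are related through the entropic contribution to the free energy, T·ΔS. The short check below is a sketch of that unit conversion only, not part of the authors' model; the function name is hypothetical, and the conversion factor assumes the thermochemical calorie (1 kcal = 4184 J).

```python
JOULES_PER_KCAL = 4184.0  # assumption: thermochemical calorie


def entropy_error_to_free_energy(ds_j_per_mol_k, temperature_k):
    """Return T * dS in kcal/mol for an entropy error given in J/(mol K)."""
    return ds_j_per_mol_k * temperature_k / JOULES_PER_KCAL


# Values taken from the abstract: 4.8 J/(mol K) at 300 K.
mae_kcal = entropy_error_to_free_energy(4.8, 300.0)
print(f"{mae_kcal:.3f} kcal/mol")
```

The result is about 0.34 kcal/mol, consistent with the abstract's "under 0.4 kcal/mol at 300 K".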

Soil Chemical Pollution and Aggressive Pathologies

Revista de Chimie ◽

10.37358/rc.18.8.6515 ◽

2018 ◽

Vol 69 (8) ◽

pp. 2278-2282

Author(s):

Stelian Ioan Morariu ◽

Letitia Doina Duceac ◽

Alina Costina Luca ◽

Florina Popescu ◽

Liliana Pavel ◽

...

Keyword(s):

Health Care Professionals ◽

Scientific Progress ◽

Volcanic Eruptions ◽

Chemical Pollution ◽

Optimal Parameters ◽

Chemical Pollutants ◽

Wide Range ◽

Health Physician ◽

Solid Liquid

Maintaining the soil in optimal parameters is vital for mankind, given its essential role in providing the alimentary base, as well as its extremely slow formation and regeneration (hundreds or thousands of years). The direct and indirect pollution of the soil and especially its chemical pollution represent a corollary of other types of pollution, given that it is produced by solid, liquid and gaseous residues. It may be involved in a wide range of diseases (respiratory, cardiovascular, digestive, renal, haematological, osteoarticular, neurological) of allergic, infectious, degenerative or neoplastic nature, from infancy to old age. Although there are natural causes of soil pollution (e.g. volcanic eruptions), most pollutants come from human activities, which are the most incriminated in its pollution, degradation and erosion at an accelerated pace. The growing concern of all nations for the adoption of measures to limit the chemical pollution of the soil is partially found so far in viable and effective solutions intended to combat soil contamination and degradation and ensure its restoration. Chemical industrialization leads to technical and scientific progress, but at the same time it can develop related pathologies, which means that the role of the occupational health physician is essential in ensuring prophylaxis and the early detection of occupational diseases. Besides that, the role of the pediatrician is equally precious for the detection of specific diseases caused by chemical pollutants in children, because they will develop into adults with pathological stigma. The chemical pollution of the soil is a major challenge for ecologists, given that it is an important risk factor for many types of afflictions. It requires maximum attention from civil society, health care professionals and government institutions. The specialist in occupational medicine, as well as the pediatrician, bear an essential responsibility in both prevention and treatment.

Quantum information probes of charge fractionalization in large-N gauge theories

Journal of High Energy Physics ◽

10.1007/jhep05(2021)149 ◽

2021 ◽

Vol 2021 (5) ◽

Author(s):

Brandon S. DiNunno ◽

Niko Jokela ◽

Juan F. Pedraza ◽

Arttu Pönni

Keyword(s):

Degrees Of Freedom ◽

Gauge Theories ◽

Entanglement Entropy ◽

Coarse Grained ◽

Strongly Coupled ◽

Information Theoretic ◽

Large N ◽

Wide Range ◽

Charge Fractionalization ◽

Electric Flux

Abstract We study in detail various information theoretic quantities with the intent of distinguishing between different charged sectors in fractionalized states of large-N gauge theories. For concreteness, we focus on a simple holographic (2 + 1)-dimensional strongly coupled electron fluid whose charged states organize themselves into fractionalized and coherent patterns at sufficiently low temperatures. However, we expect that our results are quite generic and applicable to a wide range of systems, including non-holographic. The probes we consider include the entanglement entropy, mutual information, entanglement of purification and the butterfly velocity. The latter turns out to be particularly useful, given the universal connection between momentum and charge diffusion in the vicinity of a black hole horizon. The RT surfaces used to compute the above quantities, though, are largely insensitive to the electric flux in the bulk. To address this deficiency, we propose a generalized entanglement functional that is motivated through the Iyer-Wald formalism, applied to a gravity theory coupled to a U(1) gauge field. We argue that this functional gives rise to a coarse grained measure of entanglement in the boundary theory which is obtained by tracing over (part) of the fractionalized and cohesive charge degrees of freedom. Based on the above, we construct a candidate for an entropic c-function that accounts for the existence of bulk charges. We explore some of its general properties and their significance, and discuss how it can be used to efficiently account for charged degrees of freedom across different energy scales.

BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract Motivation: We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results: We have developed and describe here an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level, and to display the results as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability: The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module Supplementary information: Supplementary data are available at Bioinformatics online.

Analysis of Dynamic Response of a Two Degrees of Freedom (2-DOF) Ball Bearing Nonlinear Model

Applied Sciences ◽

10.3390/app11020787 ◽

2021 ◽

Vol 11 (2) ◽

pp. 787

Author(s):

Bartłomiej Ambrożkiewicz ◽

Grzegorz Litak ◽

Anthimos Georgiadis ◽

Nicolas Meier ◽

Alexander Gassner

Keyword(s):

Mathematical Model ◽

Degrees Of Freedom ◽

Ball Bearing ◽

Hertzian Contact ◽

Two Degrees Of Freedom ◽

Wide Range ◽

Shape Errors ◽

Nonlinear Features ◽

Fourier Transform Phase ◽

Hertzian Contact Theory

Often the input values used in mathematical models for rolling bearings are in a wide range, i.e., very small values of deformation and damping are confronted with big values of stiffness in the governing equations, which leads to miscalculations. This paper presents a two degrees of freedom (2-DOF) dimensionless mathematical model for ball bearings describing a procedure, which helps to scale the problem and reveal the relationships between dimensionless terms and their influence on the system’s response. The derived mathematical model considers nonlinear features as stiffness, damping, and radial internal clearance referring to the Hertzian contact theory. Further, important features are also taken into account including an external load, the eccentricity of the shaft-bearing system, and shape errors on the raceway investigating variable dynamics of the ball bearing. Analysis of obtained responses with Fast Fourier Transform, phase plots, orbit plots, and recurrences provide a rich source of information about the dynamics of the system and it helped to find the transition between the periodic and chaotic response and how it affects the topology of RPs and recurrence quantificators.
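The abstract does not state its equations, so the following is only a sketch of the textbook Hertzian point-contact law commonly used for ball bearings, F = k·δ^(3/2), with no force when the ball is out of contact. The function name, the simplified way the clearance is subtracted, and all numerical values are assumptions for illustration. The example also shows the scale disparity the abstract mentions: a micrometre-scale deformation paired with a very large stiffness coefficient.

```python
def hertz_contact_force(deflection_m, stiffness, clearance_m=0.0):
    """Hertzian point-contact restoring force, F = k * delta**1.5.

    delta is the deflection beyond the clearance (a simplifying assumption);
    when delta is not positive the ball has lost contact and the force is zero.
    """
    delta = deflection_m - clearance_m
    if delta <= 0.0:
        return 0.0
    return stiffness * delta ** 1.5


# Hypothetical magnitudes: 1 micrometre deflection, k = 1e10 N/m^1.5.
force_n = hertz_contact_force(1e-6, 1e10)
print(f"{force_n:.2f} N")

# With a clearance larger than the deflection there is no contact force.
print(hertz_contact_force(1e-6, 1e10, clearance_m=2e-6))
```

Quantities spanning roughly sixteen orders of magnitude combine here into a force of order 10 N, which is the kind of poorly scaled arithmetic that a dimensionless formulation is meant to avoid.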

MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽

10.1186/s12859-021-04288-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yance Feng ◽

Lei M. Li

Keyword(s):

Biological Significance ◽

Housekeeping Genes ◽

R Package ◽

Data Sets ◽

Statistical Regression ◽

Rna Seq ◽

Least Trimmed Squares ◽

Standard Data ◽

Wide Range ◽

Multiple References

Abstract Background: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common for a very large collection of samples, especially under a wide range of conditions, is questionable. Results: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The goodness of normalization emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code under license GPL-3 is available on the GitHub platform: github.com/hippo-yf/MUREN and on the conda platform: anaconda.org/hippo-yf/r-muren. Conclusions: MUREN performs the RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations are used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.

Estimation of the Thermodynamic Properties of Branched Hydrocarbons

Journal of Energy Resources Technology ◽

10.1115/1.1286123 ◽

2000 ◽

Vol 122 (3) ◽

pp. 147-152 ◽

Cited By ~ 4

Author(s):

Hui He ◽

Mohamad Metghalchi ◽

James C. Keck

Keyword(s):

Thermodynamic Properties ◽

Degrees Of Freedom ◽

Statistical Thermodynamic ◽

Motion Modes ◽

Branched Alkanes ◽

Branched Hydrocarbons ◽

Wide Range ◽

On Line ◽

Kinetic Calculations ◽

Good Agreement

A simple model has been developed to estimate the sensible thermodynamic properties such as Gibbs free energy, enthalpy, heat capacity, and entropy of hydrocarbons over a wide range of temperatures with special attention to the branched molecules. The model is based on statistical thermodynamic expressions incorporating translational, rotational and vibrational motions of the atoms. A method to determine the number of degrees of freedom for different motion modes (bending and torsion) has been established. Branched rotational groups, such as CH3 and OH, have been considered. A modification of the characteristic temperatures for different motion modes has been made which improves the agreement with the exact values for simple cases. The properties of branched alkanes up to 2,3,4-trimethylpentane have been calculated and the results are in good agreement with the experimental data. A relatively small number of parameters are needed in this model to estimate the sensible thermodynamic properties of a wide range of species. The model may also be used to estimate the properties of molecules and their isomers, which have not been measured, and is simple enough to be easily programmed as a subroutine for on-line kinetic calculations. [S0195-0738(00)00902-X]

Numerical investigation of the propagation of shock waves in rigid porous materials: development of the computer code and comparison with experimental results

Journal of Fluid Mechanics ◽

10.1017/s0022112096007872 ◽

1996 ◽

Vol 324 ◽

pp. 163-179 ◽

Cited By ~ 21

Author(s):

A. Levy ◽

G. Ben-Dor ◽

S. Sorek

Keyword(s):

Porous Materials ◽

Initial Conditions ◽

Computer Code ◽

Experimental Results ◽

Numerical Code ◽

Materials Development ◽

Numerical Predictions ◽

Wide Range ◽

Dimensional Version ◽

The One

The governing equations of the flow field which is obtained when a thermoelastic rigid porous medium is struck head-on by a shock wave are developed using the multiphase approach. The one-dimensional version of these equations is solved numerically using a TVD-based numerical code. The numerical predictions are compared to experimental results and good to excellent agreements are obtained for different porous materials and a wide range of initial conditions.
