FIPS: An R Package for Biomathematical Modelling of Human Fatigue Related Impairment

In many workplace contexts, accurate predictions of a human’s fatigue state can drastically improve system safety. Biomathematical models of fatigue (BMMs) are a family of dynamic phenomenological models that predict the neurobehavioural outcomes of fatigue (e.g., sleepiness, performance impairment) based on sleep/wake history (Dawson, Darwent, & Roach, 2017). However, to date there are no open source implementations of BMMs, which presents a significant barrier to their broad-scale adoption by researchers and industry practitioners. FIPS is an open source R package (R Core Team, 2020) to facilitate BMM research and simulation. FIPS has implementations of several published biomathematical models and includes functions for easily manipulating sleep history data into the required data structures. FIPS also includes default plot and summary methods to aid model interpretation. Model objects follow tidy data conventions (Wickham, 2014), enabling FIPS to be integrated into existing research workflows of R users.

tidyMicro: a pipeline for microbiome data analysis and visualization using the tidyverse in R

BMC Bioinformatics, 2021, Vol 22 (1). DOI: 10.1186/s12859-021-03967-2

Author(s): Charlie M. Carpenter, Daniel N. Frank, Kayla Williamson, Jaron Arbet, Brandie D. Wagner, ...

Keyword(s): Microbial Communities, Open Source, Data Structures, Negative Binomial, Rocky Mountain, R Package, Microbiome Analysis, External Data, Data Tables, Microbiome Data

Background: The drive to understand how microbial communities interact with their environments has inspired innovations across many fields. The data generated from sequence-based analyses of microbial communities typically are of high dimensionality and can involve multiple data tables consisting of taxonomic or functional gene/pathway counts. Merging multiple high dimensional tables with study-related metadata can be challenging. Existing microbiome pipelines available in R have created their own data structures to manage this problem. However, these data structures may be unfamiliar to analysts new to microbiome data or R and do not allow for deviations from internal workflows. Existing analysis tools also focus primarily on community-level analyses and exploratory visualizations, as opposed to analyses of individual taxa. Results: We developed the R package “tidyMicro” to serve as a more complete microbiome analysis pipeline. This open source software provides all of the essential tools available in other popular packages (e.g., management of sequence count tables, standard exploratory visualizations, and diversity inference tools) supplemented with multiple options for regression modelling (e.g., negative binomial, beta binomial, and/or rank based testing) and novel visualizations to improve interpretability (e.g., Rocky Mountain plots, longitudinal ordination plots). This comprehensive pipeline for microbiome analysis also maintains data structures familiar to R users to improve analysts’ control over workflow. A complete vignette is provided to aid new users in analysis workflow. Conclusions: tidyMicro provides a reliable alternative to popular microbiome analysis packages in R. We provide standard tools as well as novel extensions on standard analyses to improve the interpretability of results while maintaining object malleability to encourage open source collaboration. The simple examples and full workflow from the package are reproducible and applicable to external data sets.

EpiPen: An R Package to Investigate Two-Locus Epistatic Models

Twin Research and Human Genetics, 2014, Vol 17 (4). DOI: 10.1017/thg.2014.25. Cited By ~ 2

Author(s): Raymond K. Walters, Charles Laurin, Gitta H. Lubke

Keyword(s): Power Analysis, R Package, Simulation Studies, Nucleotide Polymorphisms, Single Nucleotide, Epistatic Interactions, Model Interpretation, Genome Wide, Using Data, Power Analyses

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.

mixl: An open-source R package for estimating complex choice models on large datasets

Journal of Choice Modelling, 2021, Vol 39, pp. 100284. DOI: 10.1016/j.jocm.2021.100284

Author(s): Joseph Molloy, Felix Becker, Basil Schmid, Kay W. Axhausen

Keyword(s): Open Source, R Package, Choice Models, Large Datasets

An open-source R-package and web application for high-quality probabilistic predictions in hydrology

2021. DOI: 10.5194/egusphere-egu21-8549

Author(s): Jason Hunter, Mark Thyer, Dmitri Kavetski, David McInerney

Keyword(s): Open Source, Web Application, R Package, Error Model, Objective Functions, High Quality, Wide Range, Probabilistic Error

Probabilistic predictions provide crucial information regarding the uncertainty of hydrological predictions, which are a key input for risk-based decision-making. However, they are often excluded from hydrological modelling applications because suitable probabilistic error models can be challenging both to construct and to interpret, and the quality of results is often reliant on the objective function used to calibrate the hydrological model. We present an open-source R-package and an online web application that achieve the following two aims. Firstly, these resources are easy to use and accessible, so that users need not have specialised knowledge in probabilistic modelling to apply them. Secondly, the probabilistic error model that we describe provides high-quality probabilistic predictions for a wide range of commonly used hydrological objective functions, which it is only able to do by including an innovation that resolves a long-standing issue relating to model assumptions that previously prevented this broad application. We demonstrate our methods by comparing our new probabilistic error model with an existing reference error model in an empirical case study that uses 54 perennial Australian catchments, the hydrological model GR4J, 8 common objective functions and 4 performance metrics (reliability, precision, volumetric bias and errors in the flow duration curve). The existing reference error model introduces additional flow dependencies into the residual error structure when it is used with most of the study objective functions, which in turn leads to poor-quality probabilistic predictions. In contrast, the new probabilistic error model achieves high-quality probabilistic predictions for all objective functions used in this case study. The new probabilistic error model and the open-source software and web application aim to facilitate the adoption of probabilistic predictions in the hydrological modelling community, and to improve the quality of predictions and of the decisions that are made using those predictions. In particular, our methods can be used to achieve high-quality probabilistic predictions from hydrological models that are calibrated with a wide range of common objective functions.

Development of a New Bead Movement-Based Computational Framework Shows that Bacterial Amyloid Curli Reduces Bead Mobility in Biofilms

Journal of Bacteriology, 2020, Vol 202 (18). DOI: 10.1128/jb.00253-20

Author(s): K. Malhotra, T. Hunter, B. Henry, Y. Ishmail, P. Gaddameedi, ...

Keyword(s): Mathematical Models, Open Source, Material Properties, Laser Scanning, Cell Movement, Laser Scanning Confocal Microscopy, Critical Parameters, Gastrointestinal Microbiota, Content Type, Scanning Confocal Microscopy

ABSTRACT: Biofilms exist in complex environments, including the intestinal tract, as a part of the gastrointestinal microbiota. The interaction of planktonic bacteria with biofilms can be influenced by material properties of the biofilm. During previous confocal studies, we observed that amyloid curli-containing Salmonella enterica serotype Typhimurium and Escherichia coli biofilms appeared rigid. In these studies, Enterococcus faecalis, which lacks curli-like protein, showed more fluid movement. To better characterize the material properties of the biofilms, a four-dimensional (4D) model was designed to track the movement of 1-μm glyoxylate beads in 10- to 20-μm-thick biofilms over approximately 20 min using laser-scanning confocal microscopy. Software was developed to analyze the bead trajectories, the amount of time they could be followed (trajectory life span), the velocity of movement, the surface area covered (bounding boxes), and cellular density around each bead. Bead movement was found to be predominantly Brownian motion. Curli-containing biofilms had very little bead movement throughout the low- and high-density regions of the biofilm compared to E. faecalis and isogenic curli mutants. Curli-containing biofilms tended to have more stable bead interactions (longer trajectory life spans) than biofilms lacking curli. In biofilms lacking curli, neither the velocity of bead movement nor the bounding box volume was strictly dependent on cell density, suggesting that other material properties of the biofilms were influencing the movement of the beads and flexibility of the material. Taken together, these studies present a 4D method to analyze bead movement over time in a 3D biofilm and suggest curli confers rigidity to the extracellular matrix of biofilms.

IMPORTANCE: Mathematical models are necessary to understand how the material composition of biofilms can influence their physical properties. Here, we developed a 4D computational toolchain for the analysis of bead trajectories, which laid the groundwork for establishing critical parameters for mathematical models of particle movement in biofilms. Using this open-source trajectory analyzer, we determined that the presence of bacterial amyloid curli changes the material properties of a biofilm, making the biofilm matrix rigid. This software is a powerful tool to analyze treatment- and environment-induced changes in biofilm structure and cell movement in biofilms. The open-source analyzer is fully adaptable and extendable in a modular fashion using VRL-Studio to further enhance and extend its functions.

An open source database for the synthesis of soil radiocarbon data: ISRaD version 1.0

2019. DOI: 10.5194/essd-2019-55. Cited By ~ 2

Author(s): Corey R. Lawrence, Jeffery Beem-Miller, Alison M. Hoyt, Grey Monroe, Carlos A. Sierra, ...

Keyword(s): Soil Carbon, Open Source, Soil Surface, Original Data, R Package, Direct Access, Temporal Scales, Spatial And Temporal Scales, Starting Point, Radiocarbon Data

Abstract. Radiocarbon is a critical constraint on our estimates of the timescales of soil carbon cycling that can aid in identifying mechanisms of carbon stabilization and destabilization, and improve forecasts of soil carbon response to management or environmental change. Despite the wealth of soil radiocarbon data that has been reported over the past 75 years, the ability to apply these data to global scale questions is limited by our capacity to synthesize and compare measurements generated using a variety of methods. Here we describe the International Soil Radiocarbon Database (ISRaD, soilradiocarbon.org), an open-source archive of soils data that include data from bulk soils, or whole-soils; distinct soil carbon pools isolated in the laboratory by a variety of soil fractionation methods; samples of soil gas or water collected interstitially from within an intact soil profile; CO2 gas isolated from laboratory soil incubations; and fluxes collected in situ from a soil surface. The core of ISRaD is a relational database structured around individual datasets (entries) and organized hierarchically to report soil radiocarbon data, measured at different physical and temporal scales, as well as other soil or environmental properties that may also be measured at one or more levels of the hierarchy and that may assist with interpretation and context. Anyone may contribute their own data to the database by entering it into the ISRaD template and subjecting it to quality assurance protocols. ISRaD can be accessed through: (1) a web-based interface, (2) an R package (ISRaD), or (3) direct access through the GitHub repository, which hosts both code and data. The design of ISRaD allows for participants to become directly involved in the management, design, and application of ISRaD data. The synthesized dataset is available in two forms: the original data as reported by the authors of the datasets; and an enhanced dataset that includes ancillary geospatial data calculated within the ISRaD framework. ISRaD also provides data management tools in the ISRaD-R package that provide a starting point for data analysis. This community-based dataset and platform covers soil radiocarbon and a wide array of additional soils data; data are easy to contribute, and the community is invited to add tools and ideas for improvement. As a whole, ISRaD provides resources that can aid our evaluation of soil dynamics and improve our understanding of controls on soil carbon dynamics across a range of spatial and temporal scales. The ISRaD v1.0 dataset (Lawrence et al., 2019) is archived and freely available at https://doi.org/10.5281/zenodo.2613911.

fullsibQTL: an R package for QTL mapping in biparental populations of outcrossing species

2020. DOI: 10.1101/2020.12.04.412262

Author(s): Rodrigo Gazaffi, Rodrigo R. Amadeu, Marcelo Mollinari, João R. B. F. Rosa, Cristiane H. Taniguti, ...

Keyword(s): QTL Mapping, Open Source, QTL Analysis, Source Code, R Package, Genetic Maps, Linkage Phase, Position Effects, Genetic Features, Outcrossing Species

Abstract: Accurate QTL mapping in outcrossing species requires software programs which consider genetic features of these populations, such as markers with different segregation patterns and different levels of information. Although the mapping procedures available to date allow inferring QTL position and effects, they are mostly not based on multilocus genetic maps. Having a QTL analysis based on such maps is crucial since they allow informative markers to propagate their information to less informative intervals of the map. We developed fullsibQTL, a novel and freely available R package to perform composite interval QTL mapping considering outcrossing populations and markers with different segregation patterns. It allows estimation of QTL position, effects, segregation patterns, and linkage phase with flanking markers. Additionally, several statistical and graphical tools are implemented for straightforward analysis and interpretation. fullsibQTL is an open source R package with C and R source code (GPLv3). It is multiplatform and can be installed from https://github.com/augusto-garcia/fullsibQTL.

PITR: A New Open Source R Package for PIT Telemetry Data

Fisheries, 2018, Vol 43 (1), pp. 5-5. DOI: 10.1002/fsh.10027. Cited By ~ 1

Author(s): Joel M. S. Harding, Douglas C. Braun, Nicholas J. Burnett, Annika Putt

Keyword(s): Open Source, R Package, Telemetry Data, PIT Telemetry

Flexible modelling of spatial variation in agricultural field trials with the R package INLA

Theoretical and Applied Genetics, 2019, Vol 132 (12), pp. 3277-3293. DOI: 10.1007/s00122-019-03424-y. Cited By ~ 6

Author(s): Maria Lie Selle, Ingelin Steinsland, John M. Hickey, Gregor Gorjanc

Keyword(s): Spatial Variation, Open Source, Field Trial, Spatial Models, R Package, Field Trials, Wheat Breeding, Genetic Effects, Agricultural Field, Combining Data

Key message: Established spatial models improve the analysis of agricultural field trials with or without genomic data and can be fitted with the open-source R package INLA.

Abstract: The objective of this paper was to fit different established spatial models for analysing agricultural field trials using the open-source R package INLA. Spatial variation is common in field trials, and accounting for it increases the accuracy of estimated genetic effects. However, this is still hindered by the lack of available software implementations. We compare some established spatial models and show possibilities for flexible modelling with respect to field trial design and joint modelling over multiple years and locations. We use a Bayesian framework and, for statistical inference, the integrated nested Laplace approximations (INLA) implemented in the R package INLA. The spatial models we use are the well-known independent row and column effects, separable first-order autoregressive ($$\mathrm{AR1} \otimes \mathrm{AR1}$$) models and a Gaussian random field (Matérn) model that is approximated via the stochastic partial differential equation approach. The Matérn model can accommodate flexible field trial designs and yields interpretable parameters. We test the models in a simulation study imitating a wheat breeding programme with different levels of spatial variation, with and without genome-wide markers, and with data combined over two locations, modelling spatial and genetic effects jointly. The results show comparable predictive performance for both the $$\mathrm{AR1} \otimes \mathrm{AR1}$$ and the Matérn models. We also present an example of fitting the models to real wheat breeding data and simulated tree breeding data with the Nelder wheel design to show the flexibility of the Matérn model and the R package INLA.

SIGN: similarity identification in gene expression

Bioinformatics, 2019, Vol 35 (22), pp. 4830-4833. DOI: 10.1093/bioinformatics/btz485. Cited By ~ 1

Author(s): Seyed Ali Madani Tonekaboni, Venkata Satya Kumar Manem, Nehme El-Hachem, Benjamin Haibe-Kains

Keyword(s): Breast Cancer, Gene Expression, Open Source, Supervised Learning, Cancer Patients, Biomedical Research, Expression Patterns, R Package, Gene Expression Patterns, Learning Schemes

Motivation: High-throughput molecular profiles of human cells have been used in predictive computational approaches for stratification of healthy and malignant phenotypes and identification of their biological states. In this regard, pathway activities have been used as biological features in unsupervised and supervised learning schemes. Results: We developed SIGN (Similarity Identification in Gene expressioN), a flexible open-source R package facilitating the use of pathway activities and their expression patterns to identify similarities between biological samples. We defined a new measure, the transcriptional similarity coefficient, which captures similarity of gene expression patterns in biological pathways between the samples, instead of quantifying overall activity. To demonstrate the utility of SIGN in biomedical research, we establish that SIGN discriminates subtypes of breast tumors and patients with good or poor overall survival. SIGN outperforms the best models in the DREAM challenge in predicting survival of breast cancer patients using the data from the Molecular Taxonomy of Breast Cancer International Consortium. In summary, SIGN can be used by the biomedical research community as a new tool for interrogating pathway activity and gene expression patterns in unsupervised and supervised learning schemes to improve prognostic risk estimation for cancer patients. Availability and implementation: An open-source R package is available (https://cran.r-project.org/web/packages/SIGN/).
