CorDiffViz: an R package for visualizing multi-omics differential correlation networks

Abstract Background Differential correlation networks are increasingly used to delineate changes in interactions among biomolecules. They characterize differences between omics networks under two different conditions, and can be used to delineate mechanisms of disease initiation and progression. Results We present a new R package, CorDiffViz, that facilitates the estimation and visualization of differential correlation networks using multiple correlation measures and inference methods. The software is implemented in , and , and is available at https://github.com/sqyu/CorDiffViz. Visualization has been tested for the Chrome and Firefox web browsers. A demo is available at https://diffcornet.github.io/CorDiffViz/demo.html. Conclusions Our software offers considerable flexibility by allowing the user to interact with the visualization and choose from different estimation methods and visualizations. It also allows the user to easily toggle between correlation networks for samples under one condition and differential correlations between samples under two conditions. Moreover, the software facilitates integrative analysis of cross-correlation networks between two omics data sets.
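
To make the idea of a differential correlation concrete, the following minimal Python sketch computes the Pearson correlation of one pair of features under two conditions and takes the difference. It is an illustration only: the function names `pearson` and `differential_correlation` are hypothetical, it is not the CorDiffViz API, and the package itself supports several correlation measures and inference methods that are not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differential_correlation(x1, y1, x2, y2):
    """Correlation under condition 2 minus correlation under condition 1.

    (x1, y1) are the two features measured in condition 1,
    (x2, y2) the same two features measured in condition 2.
    """
    return pearson(x2, y2) - pearson(x1, y1)


if __name__ == "__main__":
    # Toy data: the pair is positively correlated in condition 1
    # and negatively correlated in condition 2.
    cond1 = ([1, 2, 3, 4], [2, 4, 6, 8])
    cond2 = ([1, 2, 3, 4], [8, 6, 4, 2])
    print(differential_correlation(*cond1, *cond2))
```

Repeating this for every pair of features gives the edge weights of a differential network; deciding which differences are significant requires an inference step that this sketch leaves out.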


A descriptive marker gene approach to single-cell pseudotime inference

10.1101/060442 ◽

2016 ◽

Cited By ~ 5

Author(s):

Kieran R Campbell ◽

Christopher Yau

Keyword(s):

Single Cell ◽

Marker Gene ◽

Cell Types ◽

R Package ◽

Estimation Methods ◽

Marker Genes ◽

Peak Time ◽

Transient Behaviour ◽

Link Type ◽

Cell Gene Expression

Abstract Pseudotime estimation from single-cell gene expression allows the recovery of temporal information from otherwise static profiles of individual cells. This pseudotemporal information can be used to characterise transient events in temporally evolving biological systems. Conventional algorithms typically emphasise an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. Here we introduce an orthogonal approach termed “Ouija” that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time, the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. In the following we introduce our model and demonstrate that in many instances a small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify “metastable” states - discrete cell types along the continuous trajectories - that recapitulate known cell types. Ouija therefore provides a powerful complementary approach to existing whole-transcriptome-based pseudotime estimation methods. An open source implementation is available at http://www.github.com/kieranrcampbell/ouija as an R package and at http://www.github.com/kieranrcampbell/ouijaflow as a Python/TensorFlow package.


AlleleShift: an R package to predict and visualize population-level changes in allele frequencies in response to climate change

PeerJ ◽

10.7717/peerj.11534 ◽

2021 ◽

Vol 9 ◽

pp. e11534

Author(s):

Roeland Kindt

Keyword(s):

Climate Change ◽

Environmental Gradients ◽

Additive Model ◽

Population Level ◽

R Package ◽

Allele Frequencies ◽

Data Sets ◽

Climate Data ◽

Link Type ◽

And Migration

Background At any particular location, frequencies of alleles that are associated with adaptive traits are expected to change in future climates through local adaptation and migration, including assisted migration (human-implemented when climate change is more rapid than natural migration rates). Making the assumption that the baseline frequencies of alleles across environmental gradients can act as a predictor of patterns in changed climates (typically future but possibly paleo-climates), a methodology is provided by AlleleShift of predicting changes in allele frequencies at the population level. Methods The prediction procedure involves a first calibration and prediction step through redundancy analysis (RDA), and a second calibration and prediction step through a generalized additive model (GAM) with a binomial family. As such, the procedure is fundamentally different to an alternative approach recently proposed to predict changes in allele frequencies from canonical correspondence analysis (CCA). The RDA step is based on the Euclidean distance that is also the typical distance used in Analysis of Molecular Variance (AMOVA). Because the RDA step or CCA approach sometimes predicts negative allele frequencies, the GAM step ensures that allele frequencies are in the range of 0 to 1. Results AlleleShift provides data sets with predicted frequencies and several visualization methods to depict the predicted shifts in allele frequencies from baseline to changed climates. These visualizations include ‘dot plot’ graphics (function shift.dot.ggplot), pie diagrams (shift.pie.ggplot), moon diagrams (shift.moon.ggplot), ‘waffle’ diagrams (shift.waffle.ggplot) and smoothed surface diagrams of allele frequencies of baseline or future patterns in geographical space (shift.surf.ggplot). 
As these visualizations were generated through the ggplot2 package, methods of generating animations for a climate change time series are straightforward, as shown in the documentation of AlleleShift and in the supplemental videos. Availability AlleleShift is available as an open-source R package from https://cran.r-project.org/package=AlleleShift and https://github.com/RoelandKindt/AlleleShift. Genetic input data is expected to be in the adegenet::genpop format, which can be generated from the adegenet::genind format. Climate data is available from various resources such as WorldClim and Envirem.
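
The reason a binomial model keeps predictions between 0 and 1 can be shown in a few lines. The Python sketch below is not AlleleShift code. It assumes the binomial model uses a logit link (the usual default for a binomial family) and only demonstrates that any value on the linear-predictor scale, including a negative one of the kind a linear first step can produce, maps back to a valid frequency. The function name `inverse_logit` is chosen here for illustration.

```python
import math


def inverse_logit(eta):
    """Map a linear-predictor value to the open interval (0, 1).

    Written in a numerically stable form so that large negative
    inputs do not overflow math.exp.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Values on the linear-predictor scale may be any real number;
    # the back-transformed frequencies always stay between 0 and 1.
    for eta in (-5.0, -0.2, 0.0, 0.7, 5.0):
        print(eta, inverse_logit(eta))
```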


rbioacc: an R-package to analyse toxicokinetic data

10.1101/2021.09.08.459421 ◽

2021 ◽

Author(s):

Aude Ratier ◽

Virgile Baudrot ◽

Miléna Kaag ◽

Aurélie Siberchicot ◽

Christelle Lopes ◽

...

Keyword(s):

Test Data ◽

Compartment Model ◽

R Package ◽

Data Set ◽

Link Type ◽

Full Compliance ◽

Regulatory Guidelines ◽

On Line ◽

Fit For Purpose ◽

Inference Methods

Summary The R package rbioacc is dedicated to the analysis of experimental data collected from bioaccumulation tests. It provides ready-to-use functions to visualise a data set and to estimate bioaccumulation metrics to be further used in support of environmental risk assessment, in full compliance with regulatory requirements. Such metrics are classically requested by standardised regulatory guidelines on which national agencies base their evaluation of applications for marketing authorisation of chemical active substances. Package rbioacc can be used to get estimates of toxicokinetic (TK) parameters (uptake and elimination rates) and bioaccumulation metrics (e.g., BCF, BSAF, BMF) by fitting a one-compartment TK model on exposure-depuration test data. The bioaccumulation metrics estimates as well as the parameters and the predictions of the internal concentrations are given with the quantification of their uncertainty. This paper illustrates some classical uses of rbioacc with internal concentrations collected over time, possibly at several exposure concentrations, analysed with a generic TK one-compartment model. These examples can be followed step-by-step to analyse any new data set, as long as the data set format is respected. Statement of need Package rbioacc (Baudrot et al. 2021) has been tested using R (version 4.1.0 and later) on Linux and Windows machines. Regarding the particular case of TK models, package rbioacc was compared with published results considering other TK implementations under different software platforms. Giving results very similar to those of the other implementations, package rbioacc was thus confirmed as fit-for-purpose in fitting TK models on bioaccumulation test data. All functions in package rbioacc can be used without a deep knowledge of their underlying probabilistic model or inference methods. Rather, they were designed to behave as well as possible, without requiring the user to provide values for some obscure parameters. 
Nevertheless, models implemented in rbioacc can also be used as a first step to create new models for more specific situations. Note that package rbioacc benefits from a web interface, MOSAICbioacc, from which the same analyses can be reproduced directly on-line without needing to invest in R programming. MOSAICbioacc is freely available on the MOSAIC platform at https://mosaic.univ-lyon1.fr/ (Charles et al. 2021) or directly at https://mosaic.univ-lyon1.fr/bioacc (Ratier et al. 2021). Availability Package rbioacc is available as an R package (with R >= 4.1.0); it can be directly downloaded from CRAN https://CRAN.R-project.org/package=rbioacc, where package dependencies and system requirements are also documented.
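
As a rough illustration of what a one-compartment TK model describes, the Python sketch below evaluates the textbook closed-form solution for the simplest case: one exposure route with a constant exposure concentration during the accumulation phase, followed by depuration with a single elimination rate. This is an assumption made for the example and is not rbioacc code, which fits a generic model by statistical inference and reports uncertainty. The names `internal_concentration`, `kinetic_bcf`, `ku`, `ke`, `c_exp` and `tc` are chosen here for illustration.

```python
import math


def internal_concentration(t, ku, ke, c_exp, tc):
    """Internal concentration at time t for a simple one-compartment model.

    Accumulation phase (0 <= t <= tc): dC/dt = ku * c_exp - ke * C
    Depuration phase   (t > tc):       dC/dt = -ke * C

    ku    : uptake rate
    ke    : elimination rate (must be > 0)
    c_exp : constant exposure concentration during accumulation
    tc    : duration of the accumulation phase
    """
    plateau = ku * c_exp / ke  # steady-state internal concentration
    if t <= tc:
        return plateau * (1.0 - math.exp(-ke * t))
    c_at_tc = plateau * (1.0 - math.exp(-ke * tc))
    return c_at_tc * math.exp(-ke * (t - tc))


def kinetic_bcf(ku, ke):
    """Kinetic bioconcentration factor for this simple model: ku / ke."""
    return ku / ke


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    ku, ke, c_exp, tc = 2.0, 0.5, 10.0, 7.0
    for day in (0, 2, 7, 10, 14):
        print(day, round(internal_concentration(day, ku, ke, c_exp, tc), 3))
    print("kinetic BCF:", kinetic_bcf(ku, ke))
```

In this simplified setting the internal concentration rises towards the plateau `ku * c_exp / ke` while the organism is exposed and then decays exponentially once exposure stops.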


AB0210 ACREULAR: AN R PACKAGE FOR THE CALCULATION AND VISUALISATION OF ACR/EULAR RELATED RHEUMATOID ARTHRITIS MEASURES

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.2326 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1405.1-1406

Author(s):

F. Morton ◽

J. Nijjar ◽

C. Goodyear ◽

D. Porter

Keyword(s):

Rheumatoid Arthritis ◽

Functional Status ◽

Rheumatic Diseases ◽

Web Application ◽

R Package ◽

Diagnostic Classification ◽

Microsoft Excel ◽

Link Type ◽

Large Joint ◽

Programming Skills

Background: The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) individually and collaboratively have produced/recommended diagnostic classification, response and functional status criteria for a range of different rheumatic diseases. While there are a number of different resources available for performing these calculations individually, currently there are no tools available that we are aware of to easily calculate these values for whole patient cohorts. Objectives: To develop a new software tool, which will enable both data analysts and also researchers and clinicians without programming skills to calculate ACR/EULAR related measures for a number of different rheumatic diseases. Methods: Criteria that had been developed by ACR and/or EULAR that had been approved for the diagnostic classification, measurement of treatment response and functional status in patients with rheumatoid arthritis were identified. Methods were created using the R programming language to allow the calculation of these criteria, which were incorporated into an R package. Additionally, an R/Shiny web application was developed to enable the calculations to be performed via a web browser using data presented as CSV or Microsoft Excel files. Results: acreular is a freely available, open source R package (downloadable from https://github.com/fragla/acreular) that facilitates the calculation of ACR/EULAR related RA measures for whole patient cohorts. Measures, such as the ACR/EULAR (2010) RA classification criteria, can be determined using precalculated values for each component (small/large joint counts, duration in days, normal/abnormal acute-phase reactants, negative/low/high serology classification) or by providing “raw” data (small/large joint counts, onset/assessment dates, ESR/CRP and CCP/RF laboratory values). Other measures, including EULAR response and ACR20/50/70 response, can also be calculated by providing the required information. 
The accompanying web application is included as part of the R package but is also externally hosted at https://fragla.shinyapps.io/shiny-acreular. This enables researchers and clinicians without any programming skills to easily calculate these measures by uploading either a Microsoft Excel or CSV file containing their data. Furthermore, the web application allows the incorporation of additional study covariates, enabling the automatic calculation of multigroup comparative statistics and the visualisation of the data through a number of different plots, both of which can be downloaded. Figure 1. The Data tab following the upload of data. Criteria are calculated by selecting the appropriate checkbox. Figure 2. A density plot of DAS28 scores grouped by ACR/EULAR 2010 RA classification. Statistical analysis has been performed and shows a significant difference in DAS28 score between the two groups. Conclusion: The acreular R package facilitates the easy calculation of ACR/EULAR RA related disease measures for whole patient cohorts. Calculations can be performed either from within R or by using the accompanying web application, which also enables the graphical visualisation of data and the calculation of comparative statistics. We plan to further develop the package by adding additional RA related criteria and by adding ACR/EULAR related measures for other rheumatic disorders. Disclosure of Interests: Fraser Morton: None declared, Jagtar Nijjar Shareholder of: GlaxoSmithKline plc, Consultant of: Janssen Pharmaceuticals UK, Employee of: GlaxoSmithKline plc, Paid instructor for: Janssen Pharmaceuticals UK, Speakers bureau: Janssen Pharmaceuticals UK, AbbVie, Carl Goodyear: None declared, Duncan Porter: None declared


SambaR: An R package for fast, easy and reproducible population‐genetic analyses of biallelic SNP data sets

Molecular Ecology Resources ◽

10.1111/1755-0998.13339 ◽

2021 ◽

Author(s):

Menno J. Jong ◽

Joost F. Jong ◽

A. Rus Hoelzel ◽

Axel Janke

Keyword(s):

Population Genetic ◽

R Package ◽

Data Sets ◽

Genetic Analyses ◽

Snp Data ◽

Population Genetic Analyses


An alternative distribution to Lindley and Power Lindley distributions with characterizations, different estimation methods and data applications

Mathematica Slovaca ◽

10.1515/ms-2017-0406 ◽

2020 ◽

Vol 70 (4) ◽

pp. 953-978

Author(s):

Mustafa Ç. Korkmaz ◽

G. G. Hamedani

Keyword(s):

Hazard Function ◽

Mixture Distribution ◽

Real Data ◽

Quantile Function ◽

Estimation Methods ◽

Data Sets ◽

Unknown Parameters ◽

Lorenz Curves ◽

Proposed Model ◽

New Distribution

Abstract This paper proposes a new extended Lindley distribution, based on a mixture distribution structure, which has more flexible density and hazard rate shapes than the Lindley and Power Lindley distributions and is intended for modelling real data phenomena. Some of its distributional properties, such as the shapes, moments, quantile function, Bonferroni and Lorenz curves, mean deviations and order statistics, have been obtained. Characterizations based on two truncated moments, conditional expectation as well as in terms of the hazard function are presented. Different estimation procedures have been employed to estimate the unknown parameters and their performances are compared via Monte Carlo simulations. The flexibility and importance of the proposed model are illustrated by two real data sets.


MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽

10.1186/s12859-021-04288-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yance Feng ◽

Lei M. Li

Keyword(s):

Biological Significance ◽

Housekeeping Genes ◽

R Package ◽

Data Sets ◽

Statistical Regression ◽

Rna Seq ◽

Least Trimmed Squares ◽

Standard Data ◽

Wide Range ◽

Multiple References

Abstract Background Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common for a very large collection of samples, especially under a wide range of conditions, is questionable. Results We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The goodness of normalization emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code under license GPL-3 is available on the github platform: github.com/hippo-yf/MUREN and on the conda platform: anaconda.org/hippo-yf/r-muren. Conclusions MUREN performs the RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations are used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
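
The robustness that least trimmed squares (LTS) brings can be illustrated with its simplest case, a location estimate. The Python sketch below is a deliberate simplification and not MUREN's implementation, which uses LTS regression. It assumes that the quantity of interest is a single shift, for example the typical log-ratio between a sample and a reference, and finds the value that minimises the sum of the `h` smallest squared residuals. For one-dimensional data this is the mean of the contiguous block of `h` sorted values with the smallest spread. The function name `lts_location` and the default `h` are choices made for this example.

```python
def lts_location(values, h=None):
    """Least trimmed squares estimate of location.

    Minimises the sum of the h smallest squared residuals. In one
    dimension the optimum is the mean of the contiguous window of h
    sorted values with the smallest sum of squared deviations.
    """
    xs = sorted(values)
    n = len(xs)
    if h is None:
        h = n // 2 + 1  # keep just over half of the observations
    best_mean = None
    best_ss = float("inf")
    for start in range(n - h + 1):
        window = xs[start:start + h]
        mean = sum(window) / h
        ss = sum((x - mean) ** 2 for x in window)
        if ss < best_ss:
            best_mean, best_ss = mean, ss
    return best_mean


if __name__ == "__main__":
    # Hypothetical log-ratios: most genes are unchanged (near 0),
    # while two genes are strongly up-regulated.
    log_ratios = [0.1, 0.0, -0.1, 0.05, -0.05, 5.0, 6.0]
    print("plain mean:", sum(log_ratios) / len(log_ratios))
    print("LTS location:", lts_location(log_ratios))
```

The plain mean is pulled towards the two large values, whereas the trimmed estimate stays close to the bulk of the data, which is the behaviour one would want when only a subset of genes behaves like housekeeping genes.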


A Survey on Causal Inference

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3444944 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-46

Author(s):

Liuyi Yao ◽

Zhixuan Chu ◽

Sheng Li ◽

Yaliang Li ◽

Jing Gao ◽

...

Keyword(s):

Machine Learning ◽

Causal Inference ◽

Observational Data ◽

Causal Effect ◽

Research Direction ◽

Estimation Methods ◽

Potential Outcome ◽

Outcome Framework ◽

Benchmark Datasets ◽

Inference Methods

Causal inference has been a critical research topic across many domains, such as statistics, computer science, education, public policy, and economics, for decades. Nowadays, estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and low budget requirement, compared with randomized controlled trials. Alongside the rapidly developing machine learning area, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including the applications in advertising, recommendation, medicine, and so on. Moreover, the commonly used benchmark datasets as well as the open-source codes are also summarized, which helps researchers and practitioners explore, evaluate and apply the causal inference methods.


kataegis: an R package for identification and visualization of the genomic localized hypermutation regions using high-throughput sequencing

BMC Genomics ◽

10.1186/s12864-021-07696-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xue Lin ◽

Yingying Hua ◽

Shuanglin Gu ◽

Li Lv ◽

Xingyu Li ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Somatic Mutations ◽

R Package ◽

Frequency Of Occurrence ◽

Link Type ◽

Genomic Landscape ◽

One Step ◽

Flanking Regions

Abstract Background Genomic localized hypermutation regions have been found in cancers and have been reported to be related to cancer prognosis. This genomic localized hypermutation is quite different from the usual somatic mutations in the frequency of occurrence and genomic density. It is like a “violent storm” of mutations, which is just what the Greek word “kataegis” means. Results There is a need for a lightweight and simple-to-use toolkit to identify and visualize the localized hypermutation regions in the genome. Thus we developed the R package “kataegis” to meet this need. The package uses only three steps to identify the genomic hypermutation regions, i.e., i) read in the variation files in standard formats; ii) calculate the inter-mutational distances; iii) identify the hypermutation regions with appropriate parameters, and finally one step to visualize the nucleotide contents and spectra of both the foci and flanking regions, and the genomic landscape of these regions. Conclusions The kataegis package is available on Bioconductor/Github (https://github.com/flosalbizziae/kataegis), which provides a lightweight and simple-to-use toolkit for quickly identifying and visualizing the genomic hypermutation regions.
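
Steps ii) and iii) above can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the kataegis package's algorithm or API. It treats mutation positions on a single chromosome as plain integers and flags a region when at least `min_mutations` consecutive mutations are each no more than `max_dist` base pairs from the previous one. The function names and the default thresholds (1000 bp, 6 mutations) are assumptions chosen for the example.

```python
def intermutation_distances(positions):
    """Distances between consecutive mutations on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def find_clusters(positions, max_dist=1000, min_mutations=6):
    """Return (start, end) positions of candidate hypermutation regions.

    A region is a maximal run of mutations in which every consecutive
    pair is at most max_dist apart and which contains at least
    min_mutations mutations. Both thresholds are illustrative.
    """
    ordered = sorted(positions)
    clusters = []
    run_start = 0  # index of the first mutation in the current run
    for i in range(1, len(ordered) + 1):
        run_ends = i == len(ordered) or ordered[i] - ordered[i - 1] > max_dist
        if run_ends:
            if i - run_start >= min_mutations:
                clusters.append((ordered[run_start], ordered[i - 1]))
            run_start = i
    return clusters


if __name__ == "__main__":
    # Hypothetical positions: six tightly spaced mutations followed
    # by two isolated ones.
    positions = [100, 200, 350, 500, 620, 800, 50000, 120000]
    print(intermutation_distances(positions))
    print(find_clusters(positions))
```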


Moulting growth of the Australian giant crab, Pseudocarcinus gigas

Marine and Freshwater Research ◽

10.1071/mf00074 ◽

2002 ◽

Vol 53 (5) ◽

pp. 869 ◽

Cited By ~ 5

Author(s):

Richard McGarvey ◽

Andrew H. Levings ◽

Janet M. Matthews

Keyword(s):

South Australia ◽

Growth Increment ◽

Likelihood Method ◽

Estimation Methods ◽

Most Probable Number ◽

Minimum Length ◽

Data Sets ◽

Probable Number ◽

Commercial Harvest ◽

Female Data

The growth of Australian giant crabs, Pseudocarcinus gigas, has not been previously studied. A tagging program was undertaken in four Australian states where the species is subject to commercial exploitation. Fishers reported a recapture sample of 1372 females and 383 males from commercial harvest, of which 190 females and 160 males had moulted at least once. Broad-scale modes of growth increment were readily identified and interpreted as 0, 1 and 2 moults during time at large. Single-moult increments were normally distributed for six of seven data sets. Moult increments were constant with length for males and declined slowly for three of four female data sets. Seasonality of moulting in South Australia was inferred from monthly proportions captured with newly moulted shells. Female moulting peaked strongly in winter (June and July). Males moult in summer (November and December). Intermoult period estimates for P. gigas varied from 3 to 4 years for juvenile males and females (80–120 mm carapace length, CL), with rapid lengthening in time between moulting events to approximately seven years for females and four and a half years for males at legal minimum length of 150 mm CL. New moulting growth estimation methods include a generalization of the anniversary method for estimating intermoult period that uses (rather than rejects) most capture–recapture data and a multiple likelihood method for assigning recaptures to their most probable number of moults during time at large.
