tensorBF: an R package for Bayesian tensor factorization

Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient data sets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art preprocessing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and Implementation: FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE . Supplementary Information: Supplementary Files 1 and 2 provide detailed descriptions of the pipeline and the data preparation process, while Supplementary File 3 presents basic use cases of the package. Contact: [email protected]

Download Full-text

Stable Tensor Principal Component Pursuit: Error Bounds and Efficient Algorithms

Sensors ◽

10.3390/s19235335 ◽

2019 ◽

Vol 19 (23) ◽

pp. 5335 ◽

Cited By ~ 1

Author(s):

Wei Fang ◽

Dongxu Wei ◽

Ran Zhang

Keyword(s):

Error Bounds ◽

Rapid Development ◽

Principal Component ◽

Real Data ◽

Superior Performance ◽

Sensor Technology ◽

Data Sets ◽

Tensor Factorization ◽

Principal Component Pursuit ◽

Tensor Data

The rapid development of sensor technology gives rise to the emergence of huge amounts of tensor (i.e., multi-dimensional array) data. For various reasons such as sensor failures and communication loss, the tensor data may be corrupted by not only small noises but also gross corruptions. This paper studies the Stable Tensor Principal Component Pursuit (STPCP) which aims to recover a tensor from its corrupted observations. Specifically, we propose a STPCP model based on the recently proposed tubal nuclear norm (TNN) which has shown superior performance in comparison with other tensor nuclear norms. Theoretically, we rigorously prove that under tensor incoherence conditions, the underlying tensor and the sparse corruption tensor can be stably recovered. Algorithmically, we first develop an ADMM algorithm and then accelerate it by designing a new algorithm based on orthogonal tensor factorization. The superiority and efficiency of the proposed algorithms is demonstrated through experiments on both synthetic and real data sets.

Download Full-text

FORESEE: a tool for the systematic comparison of translational drug response modeling pipelines

10.7287/peerj.preprints.27256 ◽

2019 ◽

Author(s):

Lisa-Katrin Turnhoff ◽

Ali Hadizadeh Esfahani ◽

Maryam Montazeri ◽

Nina Kusch ◽

Andreas Schuppert

Keyword(s):

Drug Response ◽

Drug Efficacy ◽

Response Prediction ◽

R Package ◽

Supplementary Information ◽

Supplementary File ◽

Data Sets ◽

Training Algorithms ◽

Model Training

Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient data sets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art preprocessing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and Implementation: FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE . Supplementary Information: Supplementary Files 1 and 2 provide detailed descriptions of the pipeline and the data preparation process, while Supplementary File 3 presents basic use cases of the package. Contact: [email protected]

Download Full-text

FORESEE: a tool for the systematic comparison of translational drug response modeling pipelines

10.7287/peerj.preprints.27256v2 ◽

2019 ◽

Author(s):

Lisa-Katrin Turnhoff ◽

Ali Hadizadeh Esfahani ◽

Maryam Montazeri ◽

Nina Kusch ◽

Andreas Schuppert

Keyword(s):

Drug Response ◽

Drug Efficacy ◽

Response Prediction ◽

R Package ◽

Supplementary Information ◽

Supplementary File ◽

Data Sets ◽

Training Algorithms ◽

Model Training

Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient data sets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art preprocessing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and Implementation: FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE . Supplementary Information: Supplementary Files 1 and 2 provide detailed descriptions of the pipeline and the data preparation process, while Supplementary File 3 presents basic use cases of the package. Contact: [email protected]

Download Full-text

SambaR: An R package for fast, easy and reproducible population‐genetic analyses of biallelic SNP data sets

Molecular Ecology Resources ◽

10.1111/1755-0998.13339 ◽

2021 ◽

Author(s):

Menno J. Jong ◽

Joost F. Jong ◽

A. Rus Hoelzel ◽

Axel Janke

Keyword(s):

Population Genetic ◽

R Package ◽

Data Sets ◽

Genetic Analyses ◽

Snp Data ◽

Population Genetic Analyses

Download Full-text

MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽

10.1186/s12859-021-04288-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yance Feng ◽

Lei M. Li

Keyword(s):

Biological Significance ◽

Housekeeping Genes ◽

R Package ◽

Data Sets ◽

Statistical Regression ◽

Rna Seq ◽

Least Trimmed Squares ◽

Standard Data ◽

Wide Range ◽

Multiple References

Abstract Background Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common for a very large collection of samples, especially under a wide range of conditions, is questionable. Results We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The goodness of normalization emphasizes on preserving possible asymmetric differentiation, whose biological significance is exemplified by a single cell data of cell cycle. MUREN is implemented as an R package. The code under license GPL-3 is available on the github platform: github.com/hippo-yf/MUREN and on the conda platform: anaconda.org/hippo-yf/r-muren. Conclusions MUREN performs the RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations are used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.

Download Full-text

WORCS: A workflow for open reproducible code in science

Data Science ◽

10.3233/ds-210031 ◽

2021 ◽

pp. 1-21

Author(s):

Caspar J. Van Lissa ◽

Andreas M. Brandmaier ◽

Loek Brinkman ◽

Anna-Lena Lamprecht ◽

Aaron Peikert ◽

...

Keyword(s):

Best Practices ◽

Source Code ◽

R Package ◽

Open Science ◽

Research Projects ◽

Tabular Data ◽

Step Procedure ◽

Starting Point ◽

Conducting Research ◽

And Training

Adopting open science principles can be challenging, requiring conceptual education and training in the use of new tools. This paper introduces the Workflow for Open Reproducible Code in Science (WORCS): A step-by-step procedure that researchers can follow to make a research project open and reproducible. This workflow intends to lower the threshold for adoption of open science principles. It is based on established best practices, and can be used either in parallel to, or in absence of, top-down requirements by journals, institutions, and funding bodies. To facilitate widespread adoption, the WORCS principles have been implemented in the R package worcs, which offers an RStudio project template and utility functions for specific workflow steps. This paper introduces the conceptual workflow, discusses how it meets different standards for open science, and addresses the functionality provided by the R implementation, worcs. This paper is primarily targeted towards scholars conducting research projects in R, conducting research that involves academic prose, analysis code, and tabular data. However, the workflow is flexible enough to accommodate other scenarios, and offers a starting point for customized solutions. The source code for the R package and manuscript, and a list of examplesof WORCS projects, are available at https://github.com/cjvanlissa/worcs.

Download Full-text

PS2-54: Best Practices: Improving Quality and Reliability in Research Data Sets

Clinical Medicine & Research ◽

10.3121/cmr.2013.1176.ps2-54 ◽

2013 ◽

Vol 11 (3) ◽

pp. 157-157

Author(s):

L. McFarland ◽

J. Richter ◽

C. Bredfeldt

Keyword(s):

Best Practices ◽

Research Data ◽

Data Sets ◽

Quality And Reliability

Download Full-text

Evaluation of business models of urban IoT-applications for a medium sized city

Central and Eastern European eDem and eGov Days ◽

10.24989/ocg.338.31 ◽

2020 ◽

Vol 338 ◽

pp. 393-403

Author(s):

Ferdinand Fischer ◽

Birgit Schenk

Keyword(s):

Best Practices ◽

Business Models ◽

Smart Cities ◽

Research Question ◽

The Public ◽

Iot Applications ◽

Evaluation Scheme ◽

Number Of Factors ◽

German City ◽

The Internet Of Things

Digitalization of the public sector is being driven by a number of factors. In particular, the concept of "Smart Cities" has become an important driver of this development. This relies heavily on an intelligent infrastructure including the Internet of Things (IoT). But does it make sense for small and medium-sized municipalities to develop this? Is it justified to invest in IoT? (How) can a mediumsized city benefit from it? This paper presents the application of an evaluation scheme for business models of urban IoT applications to answer these questions. The research question focuses on how best practices of urban IoT applications in general and in particular can be evaluated. In order to establish a concrete practical reference we evaluated ten chosen IoT applications for the German city of Herrenberg.

Download Full-text

SpiderSeqR: an R package for crawling the web of high-throughput multi-omic data repositories for data-sets and annotation

10.1101/2020.04.13.039420 ◽

2020 ◽

Author(s):

Anna M. Sozanska ◽

Charles Fletcher ◽

Dóra Bihary ◽

Shamith A. Samarajiwa

Keyword(s):

High Throughput ◽

R Package ◽

Data Reuse ◽

Massively Parallel ◽

Data Sets ◽

Similar Data ◽

Data Generation ◽

Data Repositories ◽

Public Data ◽

Omic Data

AbstractMore than three decades ago, the microarray revolution brought about high-throughput data generation capability to biology and medicine. Subsequently, the emergence of massively parallel sequencing technologies led to many big-data initiatives such as the human genome project and the encyclopedia of DNA elements (ENCODE) project. These, in combination with cheaper, faster massively parallel DNA sequencing capabilities, have democratised multi-omic (genomic, transcriptomic, translatomic and epigenomic) data generation leading to a data deluge in bio-medicine. While some of these data-sets are trapped in inaccessible silos, the vast majority of these data-sets are stored in public data resources and controlled access data repositories, enabling their wider use (or misuse). Currently, most peer reviewed publications require the deposition of the data-set associated with a study under consideration in one of these public data repositories. However, clunky and difficult to use interfaces, subpar or incomplete annotation prevent discovering, searching and filtering of these multi-omic data and hinder their re-purposing in other use cases. In addition, the proliferation of multitude of different data repositories, with partially redundant storage of similar data are yet another obstacle to their continued usefulness. Similarly, interfaces where annotation is spread across multiple web pages, use of accession identifiers with ambiguous and multiple interpretations and lack of good curation make these data-sets difficult to use. We have produced SpiderSeqR, an R package, whose main features include the integration between NCBI GEO and SRA databases, enabling an integrated unified search of SRA and GEO data-sets and associated annotations, conversion between database accessions, as well as convenient filtering of results and saving past queries for future use. All of the above features aim to promote data reuse to facilitate making new discoveries and maximising the potential of existing data-sets.Availabilityhttps://github.com/ss-lab-cancerunit/SpiderSeqR

Download Full-text