Reproducible Bioinformatics Project: A community for reproducible bioinformatics analysis pipelines

2017 ◽  
Author(s):  
Neha Kulkarni ◽  
Luca Alessandrì ◽  
Riccardo Panero ◽  
Maddalena Arigoni ◽  
Martina Olivero ◽  
...  

Abstract Background Reproducibility of research is a key element of modern science and is mandatory for any industrial application. It is the ability to replicate an experiment independently of location and operator. A study can therefore be considered reproducible only if all the data used are available and the computational analysis workflow is clearly described. However, to reproduce a complex bioinformatics analysis today, the raw data and a list of the tools used in the workflow may not be enough to guarantee the reproducibility of the results. Indeed, different releases of the same tools and/or of the system libraries those tools rely on can lead to subtle reproducibility issues. Results To address this challenge, we established the Reproducible Bioinformatics Project (RBP), a non-profit, open-source project whose aim is to provide a schema and an infrastructure, based on Docker images and R packages, for producing reproducible results in bioinformatics. One or more Docker images are defined for a workflow (typically one for each task), while the workflow implementation is handled by R functions embedded in a package available in a GitHub repository. A bioinformatician participating in the project must therefore first integrate her/his workflow modules into Docker image(s), using an Ubuntu Docker image developed ad hoc by RBP to make this task easier. Second, the workflow must be implemented in R according to an R skeleton function made available by RBP, to guarantee homogeneity and reusability among different RBP functions. Moreover, she/he has to provide an R vignette explaining the package functionality, together with an example dataset that users can run to gain confidence in the workflow. Conclusions The Reproducible Bioinformatics Project provides a general schema and an infrastructure to distribute robust and reproducible workflows. It thus guarantees that end users can consistently repeat any analysis regardless of the UNIX-like architecture used.
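
The idea of tying each workflow task to a fixed Docker image can be illustrated with a minimal Python sketch. It is not the RBP implementation (which the abstract describes as R functions); the image name, tag, paths, and script below are hypothetical and chosen only for illustration. The sketch just builds a `docker run` command and refuses an unpinned tag, since a floating tag could pull different tool or library releases over time.

```python
import shlex


def build_docker_command(image, tag, host_dir, script, args=()):
    """Build a `docker run` command for one workflow step.

    The tag is mandatory so that the same tool and library versions are
    used on every machine; "latest" is rejected because it can change.
    """
    if not tag or tag == "latest":
        raise ValueError("pin an explicit image tag for reproducibility")
    return [
        "docker", "run", "--rm",
        # Mount the host data folder at a fixed path inside the container.
        "-v", f"{host_dir}:/data",
        f"{image}:{tag}",
        script, *args,
    ]


# Hypothetical image name, tag, and script, used only for illustration.
cmd = build_docker_command(
    image="example/rnaseq-align",
    tag="1.0.2",
    host_dir="/home/user/run1",
    script="/bin/align.sh",
    args=["--threads", "4"],
)
print(shlex.join(cmd))
```

The command list could be passed to `subprocess.run` on a machine with Docker installed; recording the exact image tag next to the results is what makes a later rerun comparable.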

2006 ◽  
Vol 5 (2) ◽  
pp. 68-71 ◽  
Author(s):  
Simone Kauffeld
Keyword(s):  
Ad Hoc ◽  

Summary. The FEO, which was developed in cooperation with practitioners from industry, is used to assess organizational climate. It comprises 82 items and maps onto 12 scales. One strength of the FEO compared with survey instruments developed ad hoc is the comparison data provided for profit and non-profit organizations. The theoretical grounding, user-friendliness, and usefulness of the individual-level evaluation are discussed critically. Consensual, convergent, discriminant, and criterion-related validation is still outstanding.


2019 ◽  
Vol 35 (21) ◽  
pp. 4356-4363 ◽  
Author(s):  
Gaëlle Lefort ◽  
Laurence Liaubet ◽  
Cécile Canlet ◽  
Patrick Tardivel ◽  
Marie-Christine Père ◽  
...  

Abstract Motivation In metabolomics, the detection of new biomarkers from Nuclear Magnetic Resonance (NMR) spectra is a promising approach. However, this analysis remains difficult due to the lack of a whole workflow that handles spectra pre-processing, automatic identification and quantification of metabolites and statistical analyses, in a reproducible way. Results We present ASICS, an R package that contains a complete workflow to analyse spectra from NMR experiments. It contains an automatic approach to identify and quantify metabolites in a complex mixture spectrum and uses the results of the quantification in untargeted and targeted statistical analyses. ASICS was shown to improve the precision of quantification in comparison to existing methods on two independent datasets. In addition, ASICS successfully recovered most metabolites that were found important to explain a two level condition describing the samples by a manual and expert analysis based on bucketing. It also found new relevant metabolites involved in metabolic pathways related to risk factors associated with the condition. Availability and implementation ASICS is distributed as an R package, available on Bioconductor. Supplementary information Supplementary data are available at Bioinformatics online.


Biostatistics ◽  
2018 ◽  
Vol 21 (3) ◽  
pp. 432-448 ◽  
Author(s):  
William J Artman ◽  
Inbal Nahum-Shani ◽  
Tianshuang Wu ◽  
James R Mckay ◽  
Ashkan Ertefaie

Summary Sequential, multiple assignment, randomized trial (SMART) designs have become increasingly popular in the field of precision medicine by providing a means for comparing more than two sequences of treatments tailored to the individual patient, i.e., dynamic treatment regime (DTR). The construction of evidence-based DTRs promises a replacement to ad hoc one-size-fits-all decisions pervasive in patient care. However, there are substantial statistical challenges in sizing SMART designs due to the correlation structure between the DTRs embedded in the design (EDTR). Since a primary goal of SMARTs is the construction of an optimal EDTR, investigators are interested in sizing SMARTs based on the ability to screen out EDTRs inferior to the optimal EDTR by a given amount which cannot be done using existing methods. In this article, we fill this gap by developing a rigorous power analysis framework that leverages the multiple comparisons with the best methodology. Our method employs Monte Carlo simulation to compute the number of individuals to enroll in an arbitrary SMART. We evaluate our method through extensive simulation studies. We illustrate our method by retrospectively computing the power in the Extending Treatment Effectiveness of Naltrexone (EXTEND) trial. An R package implementing our methodology is available to download from the Comprehensive R Archive Network.
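
To give a feel for simulation-based sizing, the following Python sketch estimates by Monte Carlo how often the truly best arm ends up with the highest sample mean, and searches for the smallest per-arm sample size that reaches a target probability. This is a deliberately simplified stand-in, not the authors' method: it treats the arms as independent (ignoring the correlation between EDTRs that the abstract highlights), does not implement multiple comparisons with the best, and all means, standard deviations, and grid values are assumed for illustration.

```python
import random
import statistics


def selection_power(true_means, sd, n_per_arm, n_sims=1000, seed=0):
    """Estimate how often the truly best arm has the top sample mean."""
    rng = random.Random(seed)
    best = max(range(len(true_means)), key=lambda k: true_means[k])
    hits = 0
    for _ in range(n_sims):
        # Simulate one trial: n_per_arm normal outcomes for every arm.
        sample_means = [
            statistics.fmean(rng.gauss(mu, sd) for _ in range(n_per_arm))
            for mu in true_means
        ]
        winner = max(range(len(sample_means)), key=lambda k: sample_means[k])
        hits += winner == best
    return hits / n_sims


def smallest_n(true_means, sd, target=0.8, n_grid=range(10, 101, 10), n_sims=1000):
    """Return the first per-arm size on the grid whose estimate meets the target."""
    for n in n_grid:
        if selection_power(true_means, sd, n, n_sims=n_sims) >= target:
            return n
    return None  # target not reached anywhere on the grid


# Assumed effect sizes for three arms; the last one is the best.
print(smallest_n([0.0, 0.2, 0.5], sd=1.0))
```

Because the estimate is itself random, a real sizing exercise would use many more simulated trials and a data-generating model that reflects the trial design.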


2021 ◽  
Author(s):  
Qingqing Chen ◽  
Ate Poorthuis

Identifying meaningful locations, such as home or work, from human mobility data has become an increasingly common prerequisite for geographic research. Although location-based services (LBS) and other mobile technology have rapidly grown in recent years, it can be challenging to infer meaningful places from such data, which, compared to conventional datasets, can be devoid of context. Existing approaches are often developed ad hoc and can lack transparency and reproducibility. To address this, we introduce an R software package for inferring home locations from LBS data. The package implements pre-existing algorithms and provides building blocks to make writing algorithmic ‘recipes’ more convenient. We evaluate this approach by analyzing a de-identified LBS dataset from Singapore, aiming to balance ethics and privacy with the research goal of identifying meaningful locations. We show that ensemble approaches, which combine multiple algorithms, can be especially valuable in this regard, as the resulting patterns of inferred home locations closely correlate with the distribution of the residential population. We hope this package, and others like it, will contribute to an increase in the use and sharing of comparable algorithms, research code and data. This will increase transparency and reproducibility in mobility analyses and further the ongoing discourse around ethical big data research.
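
The notion of composable 'recipes' and an ensemble over them can be sketched in a few lines of Python. This is not the package's API (the package is written in R and its function names are not given here); the three recipes, the night-time window of 22:00 to 06:00, and the sample observations are all assumptions made for illustration. Each observation is a `(location, hour)` pair for a single user.

```python
from collections import Counter, defaultdict


def most_frequent(points):
    """Recipe 1: the location with the most observations overall."""
    counts = Counter(loc for loc, _hour in points)
    return counts.most_common(1)[0][0] if counts else None


def most_frequent_at_night(points, start=22, end=6):
    """Recipe 2: the most observed location during assumed night hours."""
    night = [(loc, h) for loc, h in points if h >= start or h < end]
    return most_frequent(night)


def most_distinct_hours(points):
    """Recipe 3: the location seen in the largest number of distinct hours."""
    hours = defaultdict(set)
    for loc, h in points:
        hours[loc].add(h)
    return max(hours, key=lambda loc: len(hours[loc])) if hours else None


def ensemble_home(points, recipes):
    """Combine recipes by majority vote, ignoring recipes with no answer."""
    votes = Counter(recipe(points) for recipe in recipes)
    votes.pop(None, None)
    return votes.most_common(1)[0][0] if votes else None


# Made-up observations for one user: many daytime points at the office.
points = [
    ("office", 9), ("office", 9), ("office", 10), ("office", 10),
    ("office", 11), ("office", 14), ("office", 15),
    ("flat", 23), ("flat", 0), ("flat", 2),
    ("flat", 7), ("flat", 20), ("flat", 21),
]
recipes = [most_frequent, most_frequent_at_night, most_distinct_hours]
home = ensemble_home(points, recipes)
print(home)
```

In this toy data the raw-frequency recipe is misled by repeated daytime observations, while the other two recipes agree, which is the kind of disagreement a vote across recipes is meant to absorb.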


2018 ◽  
Author(s):  
Jianfeng Li ◽  
Bowen Cui ◽  
Yuting Dai ◽  
Ling Bai ◽  
Jinyan Huang

The number of bioinformatics resources, such as tools/scripts and databases, is growing exponentially. This makes it challenging for users to access, manage, and integrate these resources. To address this need, we propose a comprehensive R package, BioInstaller, which includes R functions, a Shiny application, and HTTP representational state transfer (REST) application programming interfaces (APIs). We also established a community-based configuration pool to collect, access and share bioinformatics resources. The source code of BioInstaller is freely available at our lab website, http://bioinfo.rjh.com.cn/labs/jhuang/tools/bioinstaller, and on GitHub at https://github.com/JhuangLab/BioInstaller. A Docker image can also be downloaded from DockerHub (https://hub.docker.com/r/bioinstaller).


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Fanyan Meng ◽  
Ningna Du ◽  
Daoming Xu ◽  
Li Kuai ◽  
Lanying Liu ◽  
...  

Ankylosing spondylitis (AS) is an autoimmune disease that mainly affects the spinal joints, sacroiliac joints, and adjacent soft tissues. We conducted a bioinformatics analysis to explore the molecular mechanisms related to AS pathogenesis and to uncover novel potential molecular targets for the treatment of AS. The GSE25101 profiles, containing gene expression data extracted from the blood of 16 AS patients and 16 matched controls, were acquired from the Gene Expression Omnibus (GEO) database. Background correction and standardization were carried out using the transcript per million (TPM) method. Comparing AS patients with the control group using the limma R package, we identified 199 upregulated and 121 downregulated differentially expressed genes (DEGs). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) biological process enrichment analyses revealed that the upregulated DEGs were mainly associated with the spliceosome, ribosome, RNA catabolic process, electron transport chain, etc., whereas the downregulated DEGs primarily participated in T cell-associated pathways and processes. Analysis of the protein-protein interaction (PPI) network indicated that the hub genes, comprising MRPL13, MRPL22, LSM3, COX7A2, COX7C, EP300, PTPRC, and CD4, could be treatment targets in AS. Our data provide new clues to the features of AS and point to more promising treatment targets for AS.


Cells ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 622 ◽  
Author(s):  
Marianna Talia ◽  
Ernestina De Francesco ◽  
Damiano Rigiracciolo ◽  
Maria Muoio ◽  
Lucia Muglia ◽  
...  

The G protein-coupled estrogen receptor (GPER, formerly known as GPR30) is a seven-transmembrane receptor that mediates estrogen signals in both normal and malignant cells. In particular, GPER has been implicated in the activation of diverse signaling pathways leading to the transcriptional and biological responses that characterize the progression of breast cancer (BC). In this context, a correlation between GPER expression and worse clinical-pathological features of BC has been suggested, although controversial data have also been reported. To better assess the biological significance of GPER in aggressive estrogen receptor (ER)-negative BC, we performed a bioinformatics analysis using the information provided by The Invasive Breast Cancer Cohort of The Cancer Genome Atlas (TCGA) project and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets. Gene expression correlation and statistical analyses were carried out with RStudio base functions and the tidyverse package. Pathway enrichment analysis was performed with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways on the Database for Annotation, Visualization and Integrated Discovery (DAVID) website, whereas gene set enrichment analysis (GSEA) was performed with the R package phenoTest. Survival analysis was performed with the R package survivALL. Analyzing the expression data of more than 2500 primary BCs, we found that GPER levels are associated with pro-migratory and metastatic genes belonging to the cell adhesion molecules (CAMs), extracellular matrix (ECM)-receptor interaction, and focal adhesion (FA) signaling pathways. Then, evaluating the disease-free interval (DFI) in ER-negative BC patients, we found that subjects expressing high GPER levels had a shorter DFI than those with low GPER levels. Overall, our results may pave the way to further dissecting the network triggered by GPER in breast malignancies lacking ER, toward a better assessment of its prognostic significance and of its role in mediating the aggressive features of this BC subtype.


Author(s):  
Marne C Hagemeijer ◽  
Annelotte M Vonk ◽  
Nikhil T Awatade ◽  
Iris A L Silva ◽  
Christian Tischer ◽  
...  

Abstract Motivation The forskolin-induced swelling (FIS) assay has become the preferred assay to predict the efficacy of approved and investigational CFTR-modulating drugs for individuals with cystic fibrosis (CF). Currently, no standardized quantification method for FIS data exists, which hampers inter-laboratory reproducibility. Results We developed a complete open-source workflow for standardized high-content analysis of CFTR function measurements in intestinal organoids using raw microscopy images as input. The workflow includes tools for (i) file and metadata handling; (ii) image quantification and (iii) statistical analysis. Our workflow reproduced results generated by published proprietary analysis protocols and enables standardized CFTR function measurements in CF organoids. Availability All workflow components are open-source and freely available: the htmrenamer R package for file handling https://github.com/hmbotelho/htmrenamer; CellProfiler and ImageJ analysis scripts/pipelines https://github.com/hmbotelho/FIS_image_analysis; the Organoid Analyst application for statistical analysis https://github.com/hmbotelho/organoid_analyst; detailed usage instructions and a demonstration dataset https://github.com/hmbotelho/FIS_analysis. Distributed under GPL v3.0. Supplementary information Supplementary information and a stepwise guide for software installation and data analysis for training purposes are available at Bioinformatics online.


2021 ◽  
Vol 56 (6) ◽  
pp. 337-340
Author(s):  
Giovanni Dosi

Abstract This article discusses the medical/therapeutical responses to the COVID-19 pandemic and their political economy context. First, the very quick development of several vaccines highlights the richness of the basic knowledge waiting for therapeutical exploitation. Such knowledge has largely originated in public or non-profit institutions. Second, symmetrically, there is longer-term evidence that the private sector (essentially big pharma) has decreased its investment in basic research in general and has long been uninterested in vaccines in particular. Only when flooded with an enormous amount of public money did it become eager to undertake applied research, production scale-up and testing. Third, the political economy of the underlying public-private relationship reveals a profound dysfunctionality, with the public being unable to determine the rates and direction of innovation but at the same time confined to the role of payer of first and last resort, with dire consequences for both advanced and, even more so, developing countries. Fourth, on normative grounds, measures like ad hoc patent waivers are certainly welcome, but they will not address the fundamental challenge, which involves a deep reform of the intellectual property rights regimes and their international protection.

