Validating the Knowledge Bank Approach for Personalized Prediction of Survival in Acute Myeloid Leukemia: a Reproducibility Study

Mapping Intimacies ◽

10.21203/rs.3.rs-881649/v1 ◽

2021 ◽

Author(s):

Yujun Xu ◽

Ulrich Mansmann

Keyword(s):

Myeloid Leukemia ◽

Source Code ◽

Original Data ◽

Open Science ◽

Reproducible Research ◽

Future Application ◽

Reproducibility Study ◽

Predictive Algorithms ◽

Prediction Of Survival ◽

Algorithmic Techniques

Abstract Reproducibility is not only essential for the integrity of scientific research, but is also a prerequisite of model validation and refinement for future application of (predictive) algorithms. However, reproducible research is becoming increasingly challenging, particularly in high-dimensional genomic data analyses with complex statistical or algorithmic techniques. Given that there are no mandatory requirements in most biomedical and statistical journals to provide the original data, analytical source code, or other relevant materials for publication, accessibility to these supplements naturally suggests a greater credibility of published work. In this study, we performed a reproducibility assessment of the notable paper by Gerstung et al. published in Nature Genetics (2017) by rerunning the analysis using their original code and data, which are publicly accessible. Despite a perfect open science setting, it was challenging to reproduce the entire research project; reasons included coding errors, suboptimal code legibility, incomplete documentation, intensive computations, and an R computing environment that could no longer be re-established. We learn that availability of code and data does not guarantee transparency and reproducibility of a study; in contrast, the source code is still liable to error and obsolescence, essentially due to methodological complexity, lack of editorial reproducibility checking at submission, and updates of software and operating environment. Building on the experience gained, we propose practical criteria for the conduct and reporting of reproducibility studies for future researchers.

Download Full-text

pISA-tree - a data management framework for life science research projects using a standardised directory tree

10.1101/2021.11.18.468977 ◽

2021 ◽

Author(s):

Marko Petek ◽

Maja Zagorscak ◽

Andrej Blejec ◽

Ziva Ramsak ◽

Anna Coll ◽

...

Keyword(s):

Data Management ◽

Life Science ◽

Source Code ◽

Science Research ◽

Open Science ◽

Reproducible Research ◽

Public Repository ◽

Management Framework ◽

Fair Principles ◽

R Packages

We have developed pISA-tree, a straightforward and flexible data management solution for organisation of life science project-associated research data and metadata. It enables on-the-fly creation of enriched directory tree structure (project/Investigation/Study/Assay) via a series of sequential batch files in a standardised manner based on the ISA metadata framework. The system supports reproducible research and is in accordance with the Open Science initiative and FAIR principles. Compared with similar frameworks, it does not require any systems administration and maintenance as it can be run on a personal computer or network drive. It is complemented with two R packages, pisar and seekr, where the former facilitates integration of the pISA-tree datasets into bioinformatic pipelines and the latter enables synchronisation with the FAIRDOMHub public repository using the SEEK API. Source code and detailed documentation of pISA-tree and its supporting R packages are available from https://github.com/NIB-SI/pISA-tree.

Download Full-text

WORCS: A workflow for open reproducible code in science

Data Science ◽

10.3233/ds-210031 ◽

2021 ◽

pp. 1-21

Author(s):

Caspar J. Van Lissa ◽

Andreas M. Brandmaier ◽

Loek Brinkman ◽

Anna-Lena Lamprecht ◽

Aaron Peikert ◽

...

Keyword(s):

Best Practices ◽

Source Code ◽

R Package ◽

Open Science ◽

Research Projects ◽

Tabular Data ◽

Step Procedure ◽

Starting Point ◽

Conducting Research ◽

And Training

Adopting open science principles can be challenging, requiring conceptual education and training in the use of new tools. This paper introduces the Workflow for Open Reproducible Code in Science (WORCS): A step-by-step procedure that researchers can follow to make a research project open and reproducible. This workflow intends to lower the threshold for adoption of open science principles. It is based on established best practices, and can be used either in parallel to, or in absence of, top-down requirements by journals, institutions, and funding bodies. To facilitate widespread adoption, the WORCS principles have been implemented in the R package worcs, which offers an RStudio project template and utility functions for specific workflow steps. This paper introduces the conceptual workflow, discusses how it meets different standards for open science, and addresses the functionality provided by the R implementation, worcs. This paper is primarily targeted towards scholars conducting research projects in R, conducting research that involves academic prose, analysis code, and tabular data. However, the workflow is flexible enough to accommodate other scenarios, and offers a starting point for customized solutions. The source code for the R package and manuscript, and a list of examplesof WORCS projects, are available at https://github.com/cjvanlissa/worcs.

Download Full-text

German Reproducibility Network - a new platform for Open Science in Germany

10.5194/egusphere-egu21-14724 ◽

2021 ◽

Author(s):

Bernadette Fritzsch ◽

Daniel Nüst

Keyword(s):

Local Level ◽

Scientific Work ◽

Research Process ◽

Open Science ◽

Further Education ◽

Early Career ◽

Reproducible Research ◽

Small Scale ◽

Research Software

Open Science has established itself as a movement across all scientific disciplines in recent years. It supports good practices in science and research that lead to more robust, comprehensible, and reusable results. The aim is to improve the transparency and quality of scientific results so that more trust is achieved, both in the sciences themselves and in society. Transparency requires that uncertainties and assumptions are made explicit and disclosed openly.&#160; Currently, the Open Science movement is largely driven by grassroots initiatives and small scale projects. We discuss some examples that have taken on different facets of the topic:<ul><li>The software developed and used in the research process is playing an increasingly important role. The Research Software Engineers (RSE) communities have therefore organized themselves in national and international initiatives to increase the quality of research software.</li> <li>Evaluating reproducibility of scientific articles as part of peer review requires proper creditation and incentives for both authors and specialised reviewers to spend extra efforts to facilitate workflow execution. The Reproducible AGILE initiative has established a reproducibility review at a major community conference in GIScience.</li> <li>Technological advances for more reproducible scholarly communication beyond PDFs, such as containerisation, exist, but are often inaccessible to domain experts who are not programmers. Targeting geoscience and geography, the project Opening Reproducible Research (o2r) develops infrastructure to support publication of research compendia, which capture data, software (incl. execution environment), text, and interactive figures and maps.</li> </ul>At the core of scientific work lie replicability and reproducibility. Even if different scientific communities use these terms differently, the recognition that these aspects need more attention is commonly shared and individual communities can learn a lot from each other. Networking is therefore of great importance. The newly founded initiative German Reproducibility Network (GRN) wants to be a platform for such networking and targets all of the above initiatives. GRN is embedded in a growing network of similar initiatives, e.g. in the UK, Switzerland and Australia. Its goals include&#160;<ul><li>Support of local open science groups</li> <li>Connecting local or topic-centered initiatives for the exchange of experiences</li> <li>Attracting facilities for the goals of Open Science&#160;</li> <li>Cultivate contacts to funding organizations, publishers and other actors in the scientific landscape</li> </ul>In particular, the GRN aims to promote the dissemination of best practices through various formats of further education, in order to sensitize particularly early career researchers to the topic. By providing a platform for networking, local and domain-specific groups should be able to learn from one another, strengthen one another, and shape policies at a local level.We present the GRN in order to address the existing local initiatives and to win them for membership in the GRN or sibling networks in other countries.

Download Full-text

My experiment with open science: Why the benefits of sharing go beyond source code

Nature Precedings ◽

10.1038/npre.2010.4602.1 ◽

2010 ◽

Author(s):

Carl Boettiger

Keyword(s):

Source Code ◽

Open Science

Download Full-text

Open and Reproducible Research on Open Science Framework

Current Protocols Essential Laboratory Techniques ◽

10.1002/cpet.32 ◽

2019 ◽

Vol 18 (1) ◽

Cited By ~ 8

Author(s):

Ian Sullivan ◽

Alexander DeHaven ◽

David Mellor

Keyword(s):

Open Science ◽

Reproducible Research ◽

Science Framework

Download Full-text

ITK: enabling reproducible research and open science

Frontiers in Neuroinformatics ◽

10.3389/fninf.2014.00013 ◽

2014 ◽

Vol 8 ◽

Cited By ~ 33

Author(s):

Matthew McCormick ◽

Xiaoxiao Liu ◽

Julien Jomier ◽

Charles Marion ◽

Luis Ibanez

Keyword(s):

Open Science ◽

Reproducible Research

Download Full-text

A Cloud-based Science Gateway for Enabling Science as a Service to Facilitate Open Science and Reproducible Research

10.5194/egusphere-egu2020-10761 ◽

2020 ◽

Author(s):

Mohan Ramamurthy

Keyword(s):

Cloud Computing ◽

Information Science ◽

Virtual Machines ◽

Scientific Discovery ◽

Open Data ◽

Open Science ◽

Reproducible Research ◽

Data Infrastructure ◽

Science Gateway ◽

Access To Data

The geoscience disciplines are either gathering or generating data in ever-increasing volumes. To ensure that the science community and society reap the utmost benefits in research and societal applications from such rich and diverse data resources, there is a growing interest in broad-scale, open data sharing to foster myriad scientific endeavors. However, open access to data is not sufficient; research outputs must be reusable and reproducible to accelerate scientific discovery and catalyze innovation.As part of its mission, Unidata, a geoscience cyberinfrastructure facility, has been developing and deploying data infrastructure and data-proximate scientific workflows and analysis tools using cloud computing technologies for accessing, analyzing, and visualizing geoscience data.Specifically, Unidata has developed techniques that combine robust access to well-documented datasets with easy-to-use tools, using workflow technologies. In addition to fostering the adoption of technologies like pre-configured virtual machines through Docker containers and Jupyter notebooks, other computational and analytic methods are enabled via &#8220;Software as a Service&#8221; and &#8220;Data as a Service&#8221; techniques with the deployment of the Cloud IDV, AWIPS Servers, and the THREDDS Data Server in the cloud. The collective impact of these services and tools is to enable scientists to use the Unidata Science Gateway capabilities to not only conduct their research but also share and collaborate with other researchers and advance the intertwined goals of Reproducibility of Science and Open Science, and in the process, truly enabling &#8220;Science as a Service&#8221;.Unidata has implemented the aforementioned services on the Unidata Science Gateway ((http://science-gateway.unidata.ucar.edu), which is hosted on the Jetstream cloud, a cloud-computing facility that is funded by the U. S. National Science Foundation. The aim is to give geoscientists an ecosystem that includes data, tools, models, workflows, and workspaces for collaboration and sharing of resources.In this presentation, we will discuss our work to date in developing the Unidata Science Gateway and the hosted services therein, as well as our future directions toward increasing expectations from funders and scientific communities that they will be Open and FAIR (Findable, Accessible, Interoperable, Reusable). In particular, we will discuss how Unidata is advancing data and software transparency, open science, and reproducible research. We will share our experiences in how the geoscience and information science communities are using the data, tools and services provided through the Unidata Science Gateway to advance research and education in the geosciences.

Download Full-text

The Generalized Data Model for Clinical Research

10.1101/194597 ◽

2017 ◽

Author(s):

Mark D. Danese ◽

Marc Halperin ◽

Jennifer Duryea ◽

Ryan Duryea

Keyword(s):

Data Model ◽

Semantic Representation ◽

Original Data ◽

Transformation Process ◽

Data Models ◽

Reproducible Research ◽

Practice Research ◽

Clinical Practice Research Datalink ◽

Data Provenance ◽

Subsequent Transformation

1.Abstract1.1 BackgroundMost healthcare data sources store information within their own unique schemas, making reliable and reproducible research challenging. Consequently, researchers have adopted various data models to improve the efficiency of research. Transforming and loading data into these models is a labor-intensive process that can alter the semantics of the original data. Therefore, we created a data model with a hierarchical structure that simplifies the transformation process and minimizes data alteration.1.2 MethodsThere were two design goals in constructing the tables and table relationships for the Generalized Data Model (GDM). The first was to focus on clinical codes in their original vocabularies to retain the original semantic representation of the data. The second was to retain hierarchical information present in the original data while retaining provenance. The model was tested by transforming synthetic Medicare data; Surveillance, Epidemiology, and End Results data linked to Medicare claims; and electronic health records from the Clinical Practice Research Datalink. We also tested a subsequent transformation from the GDM into the Sentinel data model.1.3 ResultsThe resulting data model contains 19 tables, with the Clinical Codes, Contexts, and Collections tables serving as the core of the model, and containing most of the clinical, provenance, and hierarchical information. In addition, a Mapping table allows users to apply an arbitrarily complex set of relationships among vocabulary elements to facilitate automated analyses.1.4 ConclusionsThe GDM offers researchers a simpler process for transforming data, clear data provenance, and a path for users to transform their data into other data models. The GDM is designed to retain hierarchical relationships among data elements as well as the original semantic representation of the data, ensuring consistency in protocol implementation as part of a complete data pipeline for researchers.

Download Full-text

Publishing open, reproducible research with undergraduates

10.31234/osf.io/f7kuy ◽

2019 ◽

Author(s):

Julia Feld Strand ◽

Violet Aurora Brown

Keyword(s):

Undergraduate Students ◽

Liberal Arts ◽

Undergraduate Research ◽

Publishing Research ◽

Open Science ◽

Reproducible Research ◽

Science Practices ◽

Rapid Pace ◽

Science Collaboration ◽

Science Framework

In response to growing concern in psychology and other sciences about low rates of replicability of published findings (Open Science Collaboration, 2015), there has been a movement toward conducting open and transparent research (see Chambers, 2017). This has led to changes in statistical reporting guidelines in journals (Appelbaum et al., 2018), new professional societies (e.g, Society for the Improvement of Psychological Science), frameworks for posting materials, data, code, and manuscripts (e.g., Open Science Framework, PsyArXiv), initiatives for sharing data and collaborating (e.g., Psych Science Accelerator, Study Swap), and educational resources for teaching through replication (e.g., Collaborative Replications and Education Project). This “credibility revolution” (Vazire, 2018) provides many opportunities for researchers. However, given the recency of the changes and the rapid pace of advancements (see Houtkoop et al., 2018), it may be overwhelming for faculty to know whether and how to begin incorporating open science practices into research with undergraduates.In this paper, we will not attempt to catalogue the entirety of the open science movement (see recommended resources below for more information), but will instead highlight why adopting open science practices may be particularly beneficial to conducting and publishing research with undergraduates. The first author is a faculty member at Carleton College (a small, undergraduate-only liberal arts college) and the second is a former undergraduate research assistant (URA) and lab manager in Dr. Strand’s lab, now pursuing a PhD at Washington University in St. Louis. We argue that open science practices have tremendous benefits for undergraduate students, both in creating publishable results and in preparing students to be critical consumers of science.

Download Full-text

Comparative Panel File: Household Panel Surveys from Seven Countries. Manual for CPF v.1.0 CPF

10.31219/osf.io/7zngy ◽

2020 ◽

Author(s):

Konrad Turek ◽

Matthijs Kalmijn ◽

Thomas Leopold

Keyword(s):

Source Code ◽

The United States ◽

Open Science ◽

Panel Surveys ◽

Social Science Community ◽

Level Data ◽

The Social ◽

Science Community ◽

Technical Details ◽

Household Panel Data

The Comparative Panel File (CPF) harmonises the world's largest and longest-running household panel surveys from seven countries: Australia (HILDA), Germany (SOEP), Great Britain (BHPS and UKHLS), South Korea (KLIPS), Russia (RLMS), Switzerland (SHP), and the United States (PSID). The project aims to support the social science community in the analysis of comparative life course data. The CPF is not a data product but an open-source code that integrates individual and household panel data from all seven surveys into a harmonised three-level data structure. In this manual, we present the design and content of the CPF, explain the logic of the project, workflow and technical details. We also describe the CPF's open-science platform. More at: www.cpfdata.com

Download Full-text