Frictionless Data: Making Research Data Quality Visible

2018, Vol 12 (2), pp. 274-285
Author(s): Dan Fowler, Jo Barratt, Paul Walsh

There is significant friction in the acquisition, sharing, and reuse of research data. It is estimated that eighty percent of data analysis is spent on the cleaning and mapping of data (Dasu and Johnson, 2003). This friction prevents researchers who are not well versed in data preparation techniques from reusing the ever-increasing amount of data available in research data repositories. Frictionless Data is an ongoing project at Open Knowledge International focused on removing this friction. We are doing this by developing a set of tools, specifications, and best practices for describing, publishing, and validating data. The heart of this project is the "Data Package", a containerization format for data based on existing practices for publishing open source software. This paper reports on current progress toward that goal.
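For readers unfamiliar with the format: a Data Package is, at its core, a datapackage.json descriptor that travels alongside the data files and declares their structure. The minimal sketch below follows the published Frictionless Data specification; the package name, file path, and field definitions are illustrative, not taken from the paper.

```python
# Minimal sketch of a Data Package descriptor (datapackage.json) per the
# Frictionless Data specification. All names and fields are hypothetical.
import json

descriptor = {
    "name": "example-measurements",           # hypothetical package name
    "title": "Example measurement data",
    "licenses": [{"name": "CC0-1.0"}],
    "resources": [
        {
            "name": "measurements",
            "path": "data/measurements.csv",  # hypothetical data file
            "format": "csv",
            "schema": {                       # Table Schema: declared types enable validation
                "fields": [
                    {"name": "site_id", "type": "string"},
                    {"name": "measured_at", "type": "date"},
                    {"name": "value", "type": "number"},
                ]
            },
        }
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```

Tooling built around the specification (for example, the frictionless Python framework) can then check the CSV against the declared Table Schema, which is exactly the kind of friction-reducing validation the project targets.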

Solid Earth, 2011, Vol 2 (1), pp. 53-63
Author(s): S. Tavani, P. Arbues, M. Snidero, N. Carrera, J. A. Muñoz

Abstract. In this work we present the Open Plot Project, open-source software for structural data analysis that includes a 3-D environment. The software provides many classical functionalities of structural data analysis tools, such as stereoplots, contouring, tensorial regression, scatterplots, histograms, and transect analysis. In addition, efficient filtering tools allow the selection of data according to their attributes, including spatial distribution and orientation. This first alpha release represents a stand-alone toolkit for structural data analysis. The 3-D environment with digitising tools allows the integration of structural data with information extracted from georeferenced images to produce structurally validated dip domains. This, coupled with many import/export facilities, allows easy incorporation of structural analyses into workflows for 3-D geological modelling. Accordingly, the Open Plot Project is also a candidate structural add-on for 3-D geological modelling software. The software (for both Windows and Linux), the User Manual, a set of example movies (complementary to the User Manual), and the source code are provided as a Supplement. We intend the publication of the source code to lay the foundation for free, public software that, we hope, the structural geology community will use, modify, and extend. The creation of additional public controls/tools is strongly encouraged.
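To make the stereoplot functionality concrete, the sketch below shows the standard math such tools implement: converting a plane's strike and dip to its pole and projecting that pole onto a lower-hemisphere equal-area (Schmidt) net. This is a generic illustration of the technique, not code from the Open Plot Project.

```python
# Illustrative stereoplot math: project the pole of a plane (strike/dip,
# right-hand rule) onto a lower-hemisphere equal-area net of unit radius.
import math

def pole_to_plane(strike_deg, dip_deg):
    """Trend/plunge of the pole to a plane (right-hand-rule strike)."""
    trend = (strike_deg - 90.0) % 360.0   # pole points opposite the dip direction
    plunge = 90.0 - dip_deg
    return trend, plunge

def schmidt_xy(trend_deg, plunge_deg):
    """Equal-area projection: line (trend, plunge) -> (x, y) on a unit net."""
    t = math.radians(trend_deg)
    p = math.radians(plunge_deg)
    r = math.sqrt(2.0) * math.sin((math.pi / 2.0 - p) / 2.0)
    return r * math.sin(t), r * math.cos(t)   # x east, y north

# Example: a bedding plane striking 030 and dipping 60 to the SE
trend, plunge = pole_to_plane(30.0, 60.0)
print(schmidt_xy(trend, plunge))
```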


Ravnetrykk, 2020
Author(s): Philipp Conzett

Research data repositories play a crucial role in the FAIR (Findable, Accessible, Interoperable, Reusable) ecosystem of digital objects. DataverseNO is a national, generic repository for open research data, primarily from researchers affiliated with Norwegian research organizations. The repository runs on the open-source software Dataverse. This article presents the organization and operation of DataverseNO and investigates how the repository contributes to the increased FAIRness of small and medium-sized research data. Sections 1 to 3 present background information about the FAIR Data Principles (section 1), how FAIR may be turned into reality (section 2), and what these principles and recommendations imply for data from the so-called long tail of research, i.e. small and medium-sized datasets that are often heterogeneous in nature and hard to standardize (section 3). Section 4 gives an overview of the key organizational features of DataverseNO, followed by an evaluation of how well DataverseNO and the repository application Dataverse as such support the FAIR Data Principles (section 5). Section 6 discusses how sustainable and trustworthy the repository is. The article is rounded off in section 7 with a brief summary and a look into the future of the repository.
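As a concrete illustration of machine-actionable findability and access, the sketch below queries a Dataverse installation through its native Search API. The endpoint shape follows the documented Dataverse API; the base URL and query term are assumptions made for illustration, not taken from the article.

```python
# Sketch of machine-actionable access ("A" in FAIR) via the Dataverse
# native Search API; query term and result handling are illustrative.
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://dataverse.no"  # assumed DataverseNO installation URL

def search_datasets(query, per_page=5):
    """Query the Dataverse Search API for published datasets."""
    url = f"{BASE}/api/search?q={quote(query)}&type=dataset&per_page={per_page}"
    with urlopen(url) as resp:
        return json.load(resp)

results = search_datasets("language")
for item in results["data"]["items"]:
    # Every dataset carries a global persistent identifier (a DOI): "F" in FAIR.
    print(item["global_id"], "-", item["name"])
```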


2021
Author(s): Fabian Kovacs, Max Thonagel, Marion Ludwig, Alexander Albrecht, Manuel Hegner, ...

BACKGROUND: Big data in healthcare must be exploited to achieve a substantial increase in efficiency and competitiveness. The analysis of patient-related data in particular holds huge potential to improve decision-making processes. However, most analytical approaches used today are highly time- and resource-consuming.

OBJECTIVE: Conquery is an open-source software tool that provides advanced but intuitive data analysis without the need for specialized statistical training. It aims to simplify big data analysis for novice database users in the medical sector.

METHODS: Conquery is a document-oriented, distributed time-series database and analysis platform. Its main application is the analysis of per-person medical records by non-technical medical professionals. Complex analyses are composed in the Conquery frontend by dragging tree nodes into the query editor. Queries are evaluated in a column-oriented fashion by a bespoke distributed query engine for medical records. We present a custom compression scheme that uses both online-calculated and precomputed metadata and data statistics to achieve low response times.

RESULTS: Conquery allows for easy navigation through the hierarchy and enables complex study-cohort construction while reducing the demand on time and resources. The UI of Conquery and a query output are exemplified by the construction of a relevant clinical cohort.

CONCLUSIONS: Conquery is an efficient and intuitive open-source tool for performant and secure data analysis, and aims to support decision-making processes in the healthcare sector.
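The abstract does not spell out Conquery's storage format, but the general technique of using precomputed per-block statistics to skip work during a column-oriented scan can be sketched as follows; all names and data here are hypothetical.

```python
# Generic illustration of column-oriented block skipping with precomputed
# statistics ("zone maps"); a sketch of the technique, not Conquery's code.
from dataclasses import dataclass

@dataclass
class Block:
    values: list        # one column's values for a run of rows
    min_val: int        # precomputed when the block is written
    max_val: int

def make_block(values):
    return Block(values, min(values), max(values))

def scan_greater_than(blocks, threshold):
    """Return matching values, skipping blocks whose max rules them out."""
    hits = []
    for block in blocks:
        if block.max_val <= threshold:
            continue                      # whole block skipped without decoding
        hits.extend(v for v in block.values if v > threshold)
    return hits

column = [make_block([3, 7, 2]), make_block([15, 11, 19]), make_block([1, 4, 2])]
print(scan_greater_than(column, 10))      # only the middle block is scanned
```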


GigaScience, 2019, Vol 8 (9)
Author(s): Peter Georgeson, Anna Syme, Clare Sloggett, Jessica Chung, Harriet Dashnow, ...

Abstract

Background: Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. For beginners in particular, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices, and even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results.

Findings: We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization.

Conclusions: Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. It also benefits users, because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.
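The exact contents of a generated project vary by language, but a heavily condensed Python sketch of the kind of scaffolding Bionitio sets up (argument parsing, logging, defined exit codes around a prototypical sequence-analysis task) might look like the following; it is illustrative, not the actual template.

```python
#!/usr/bin/env python
# Condensed sketch in the spirit of a Bionitio-scaffolded tool; the real
# templates are larger and language-specific (see the bionitio repository).
import argparse
import logging
import sys

EXIT_FILE_ERROR = 1   # defined exit status values, as the abstract describes

def parse_args():
    parser = argparse.ArgumentParser(description="Prototypical FASTA stats tool")
    parser.add_argument("--minlen", type=int, default=0,
                        help="ignore sequences shorter than this")
    parser.add_argument("fasta", help="input FASTA file")
    return parser.parse_args()

def main():
    args = parse_args()
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    logging.info("program started")
    try:
        with open(args.fasta) as handle:
            n_seqs = sum(1 for line in handle if line.startswith(">"))
    except OSError as exc:
        logging.error("cannot read %s: %s", args.fasta, exc)
        sys.exit(EXIT_FILE_ERROR)          # explicit, documented failure mode
    print(f"sequences: {n_seqs}")

if __name__ == "__main__":
    main()
```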


Author(s): Mateusz Kuzak, Jen Harrow, Rafael C. Jimenez, Paula Andrea Martinez, Fotis E. Psomopoulos, ...

SoftwareX, 2016, Vol 5, pp. 121-126
Author(s): Tobias Weber, Robert Georgii, Peter Böni

Metabolites, 2020, Vol 10 (1), pp. 28
Author(s): Álvaro Fernández-Ochoa, Rosa Quirantes-Piné, Isabel Borrás-Linares, María de la Luz Cádiz-Gurrea, Marta E. Alarcón Riquelme, ...

Pre-processing of LC-MS data is a critical step in untargeted metabolomics studies in order to achieve correct biological interpretations. Several tools have been developed for pre-processing, and these can be classified as either commercial or open-source software. This case report compares two specific methodologies, Agilent Profinder vs. an R pipeline, for a metabolomic study with a large number of samples. Specifically, 369 plasma samples were analyzed by HPLC-ESI-QTOF-MS. The collected data were pre-processed by both methodologies and then evaluated on several parameters (number of peaks, degree of missingness, quality of the peaks, degree of misalignment, and robustness in multivariate models). The vendor software was characterized by ease of use, a friendly interface, and good-quality graphs. The open-source methodology could more effectively correct the signal drift due to between- and within-batch effects. In addition, the evaluated statistical methods achieved better classification results with higher parsimony for the open-source methodology, indicating higher data quality. Although both methodologies have strengths and weaknesses, the open-source methodology seems more appropriate for studies with a large number of samples, mainly due to its greater capacity and versatility, which allows different packages, functions, and methods to be combined in a single environment.
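As a minimal illustration of the between-batch correction that the open-source route enables, the sketch below rescales each feature so that its per-batch median matches the overall median. This is a deliberately simple stand-in (assuming numpy) for the QC-sample-based drift correction used in real pipelines, not the paper's R code.

```python
# Toy between-batch correction: align each feature's per-batch median with
# its overall median. Illustrative only; real pipelines use QC-based methods.
import numpy as np

def median_batch_correct(X, batches):
    """X: (samples, features) intensity matrix; batches: per-sample labels."""
    X = X.astype(float).copy()
    overall = np.median(X, axis=0)
    for b in np.unique(batches):
        idx = batches == b
        batch_median = np.median(X[idx], axis=0)
        scale = np.where(batch_median > 0, overall / batch_median, 1.0)
        X[idx] *= scale                    # align this batch's feature medians
    return X

rng = np.random.default_rng(0)
X = rng.lognormal(mean=2.0, sigma=0.3, size=(12, 5))
X[6:] *= 1.8                               # simulated between-batch intensity shift
batches = np.array([0] * 6 + [1] * 6)
corrected = median_batch_correct(X, batches)
print(np.median(corrected[:6], axis=0) - np.median(corrected[6:], axis=0))  # ~0
```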


2019, Vol 6 (1)
Author(s): Konstantinos Nasiotis, Martin Cousineau, François Tadel, Adrien Peyrache, Richard M. Leahy, ...

Abstract. Methods for electrophysiology in neuroscience have evolved tremendously over recent years, with a growing emphasis on dense-array signal recordings. This increased complexity, and the augmented wealth of recorded data, have not been accompanied by comparable efforts to streamline and facilitate access to processing methods, which themselves continue to grow in sophistication. Moreover, unsuccessful attempts to reproduce peer-reviewed publications indicate a problem of transparency in science. This growing problem could be tackled by unrestricted access to methods that promote research transparency and data sharing, ensuring the reproducibility of published results. Here, we provide free, extensive, open-source software that offers data-analysis, data-management, and multi-modality integration solutions for invasive neurophysiology. Users can perform their entire analysis through a user-friendly environment without the need for programming skills, in a tractable (logged) way. This work contributes to open science, analysis standardization, transparency, and reproducibility in invasive neurophysiology.
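The "tractable (logged)" property can be illustrated with a generic pattern: record every processing step together with its parameters so that the full analysis history can be inspected or replayed. The sketch below uses hypothetical names and is not code from the published toolbox.

```python
# Generic sketch of a logged analysis step, in the spirit of the traceable
# pipelines the abstract describes; every call is recorded with its parameters.
import functools
import json
import time

HISTORY = []   # in a real tool this would persist alongside the dataset

def logged(step):
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        HISTORY.append({
            "step": step.__name__,
            "params": kwargs,
            "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        })
        return step(*args, **kwargs)
    return wrapper

@logged
def bandpass_filter(data, low_hz=300.0, high_hz=3000.0):
    return data   # placeholder: the actual signal processing is omitted

bandpass_filter([0.1, 0.2], low_hz=500.0, high_hz=5000.0)
print(json.dumps(HISTORY, indent=2))   # replayable record of the analysis
```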


2015, Vol 14 (3), pp. 1557-1565
Author(s): Thilo Muth, Alexander Behne, Robert Heyer, Fabian Kohrs, Dirk Benndorf, ...
