A process-driven platform to manage datasets for research

Author(s):  
Gordon McAllister

Abstract Objectives Accumulate, manage and control shared access to research data; transform and maintain transformation state information about research data; analyse and investigate data in related sets using open and bespoke tools; publish extracted data to a secure safe haven environment. Approach The Research Data Management Platform (RDMP) is a set of data structures and processes, sharing a core Catalogue, to manage electronic health records, genomic data and imaging data throughout their lifecycle, from identification and acquisition to safe disposal or archival and retention in secured Safe Havens (SH). The architecture of the RDMP consists of the Catalogue and five internal processes: Data Load, Catalogue Management, Data Quality, Data Summary, and Data Extraction. These are designed to enforce rigorous information governance standards relevant to the processing and anonymisation of personal identifiable data. The Catalogue serves as the single ‘source of truth’ about the datasets, which all RDMP processes consult. This facilitates repeatable, reliable and auditable operations on the data. The novelty of the RDMP is that it dynamically and seamlessly captures and preserves data transformation processes along with the primary research data, to promote reuse and curation of continuously accruing research data repositories in a secure SH environment. Thus, the RDMP brings a transparency and reproducibility that archival of static data objects does not, to the benefit of research programmes. Results The RDMP has been in production use since 1 July 2014. There are 107 datasets configured in the Catalogue, with up to 67 dataset extractions for each of 48 research projects. It has provided data for 32 high-impact journal papers published in the last year. Turnaround times have improved: research project data provision has fallen from six months to two weeks; data loading from two days to a few hours; and research query response from days to within a day, thanks to the improved and standardised metadata catalogue. Conclusion The RDMP is a key component in automating the regular release of datasets and rationalising dataset changes over time to ensure reliable delivery of extracts to research projects. The tools and processes comprising the RDMP not only fulfil the RDM requirements of researchers, but also support seamless collaboration on data cleaning, data transformation, data summarisation and data quality assessment activities by different research groups.
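The abstract does not reproduce the Catalogue's schema, but its 'single source of truth' idea can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the RDMP implementation: every class, field and method name is invented, and it only shows how transformation state and anonymisation rules might live in one registry that every process consults.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogueEntry:
    """Hypothetical 'source of truth' record for one dataset."""
    name: str
    description: str
    identifiable_columns: list[str] = field(default_factory=list)
    transformations: list[str] = field(default_factory=list)  # applied, in order

class Catalogue:
    """Toy central registry consulted by load, quality and extraction steps."""
    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        self._entries[entry.name] = entry

    def record_transformation(self, name: str, step: str) -> None:
        # Preserving transformation state alongside the data is the core idea.
        self._entries[name].transformations.append(step)

    def extractable_columns(self, name: str, columns: list[str]) -> list[str]:
        # Extractions exclude identifiable columns before release to a Safe Haven.
        banned = set(self._entries[name].identifiable_columns)
        return [c for c in columns if c not in banned]

catalogue = Catalogue()
catalogue.register(CatalogueEntry("prescribing", "Community prescribing records",
                                  identifiable_columns=["patient_identifier"]))
catalogue.record_transformation("prescribing", "standardise drug codes")
print(catalogue.extractable_columns("prescribing",
                                    ["patient_identifier", "drug", "date"]))
```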

2010, Vol 01 (01), pp. 50-67
Author(s):  
K. Rahbar ◽  
L. Stegger ◽  
M. Schäfers ◽  
M. Dugas ◽  
S. Herzberg

Summary Objective: Data for clinical documentation and medical research are usually managed in separate systems. We developed, implemented and assessed a documentation system for myocardial scintigraphy (SPECT/CT data) in order to integrate clinical and research documentation. This paper presents the concept, implementation and evaluation of this single-source system, including methods to improve data quality by plausibility checks. Methods: We analyzed the documentation process for myocardial scintigraphy, especially for collecting medical history, symptoms and medication as well as stress and rest injection protocols. Corresponding electronic forms were implemented in our hospital information system (HIS), including plausibility checks to support correctness and completeness of data entry. Research data can be extracted from routine data by dedicated HIS reports. Results: A single-source system based on HIS electronic documentation merges clinical and scientific documentation and thus avoids duplicate documentation. Within nine months, 495 patients were documented with our system by 8 physicians and 6 radiographers (466 medical history protocols, 466 stress and 414 rest injection protocols). The documentation consists of 295 attributes, three quarters of which are conditional items. Data quality improved substantially compared with the previous paper-based documentation. Conclusion: A single-source system to collect routine and research data for myocardial scintigraphy is feasible in a real-world setting and can generate high-quality data through online plausibility checks.
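Abstracts like this one rarely show what an online plausibility check looks like in practice. The fragment below is a hedged illustration only: the field names, dose limits and conditional rule are invented, not taken from the authors' HIS forms.

```python
# Illustrative plausibility checks for an injection-protocol form; all
# field names and limits are hypothetical, not the authors' HIS rules.
def check_injection_protocol(form: dict) -> list[str]:
    errors = []
    # Completeness: required items must be filled in.
    for item in ("patient_id", "tracer_dose_mbq", "injection_time"):
        if not form.get(item):
            errors.append(f"missing required item: {item}")
    # Correctness: values must lie in a plausible range.
    dose = form.get("tracer_dose_mbq")
    if dose is not None and not 200 <= dose <= 900:
        errors.append(f"tracer dose {dose} MBq outside plausible range")
    # Conditional item: a stress protocol requires a stress modality.
    if form.get("protocol") == "stress" and not form.get("stress_modality"):
        errors.append("stress protocol documented without stress modality")
    return errors

print(check_injection_protocol({"patient_id": "P-17", "protocol": "stress"}))
```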


2021, Vol 10 (3)
Author(s):  
Cynthia Hudson Vitale ◽  
Jake R. Carlson ◽  
Hannah Hadley ◽  
Lisa Johnston

Research data curation is a set of scientific communication processes and activities that support the ethical reuse of research data and uphold research integrity. Data curators act as key collaborators with researchers, enriching the scholarly value and potential impact of their data by preparing it to be shared with others and preserved for the long term. This special issue focuses on practical data curation workflows and tools that have been developed and implemented within data repositories, scholarly societies, research projects, and academic institutions.


2018, Vol 42 (2), pp. 1-16
Author(s):  
Cristina Ribeiro ◽  
João Rocha da Silva ◽  
João Aguiar Castro ◽  
Ricardo Carvalho Amorim ◽  
João Correia Lopes ◽  
...  

Research datasets include all kinds of objects, from web pages to sensor data, and originate in every domain. Concerns with data generated in large projects and well-funded research areas are centered on their exploration and analysis. For data in the long tail, the main issues are still how to make data visible, satisfactorily described, preserved, and searchable. Our work aims to promote data publication in research institutions, considering that researchers are the core stakeholders and need straightforward workflows, and that multi-disciplinary tools can be designed and adapted to specific areas with reasonable effort. For small groups with interesting datasets but not much time or funding for data curation, we have to focus on engaging researchers in the process of preparing data for publication, while providing them with measurable outputs. In larger groups, solutions have to be customized to satisfy the requirements of more specific research contexts. We describe our experience at the University of Porto in two lines of enquiry. For the work with long-tail groups, we propose general-purpose tools for data description and the interface to multi-disciplinary data repositories. For areas with larger projects and more specific requirements, namely wind infrastructure, sensor data from concrete structures and marine data, we define specialized workflows. In both cases, we present a preliminary evaluation of results and an estimate of the kind of effort required to keep the proposed infrastructures running. The tools available to researchers can be decisive for their commitment. We focus on data preparation, namely on dataset organization and metadata creation. For groups in the long tail, we propose Dendro, an open-source research data management platform, and explore automatic metadata creation with LabTablet, an electronic laboratory notebook. For groups demanding a domain-specific approach, our analysis has resulted in the development of models and applications to organize the data and support some of their use cases. Overall, we have adopted ontologies for metadata modeling, keeping in sight metadata dissemination as Linked Open Data.
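As a concrete, hedged illustration of the kind of machine-readable description this workflow targets, the snippet below publishes a few Dublin Core statements about a fictitious sensor dataset as Linked Open Data using the rdflib library. The URI, title and values are invented; the snippet is not taken from Dendro or LabTablet.

```python
# Describing a dataset with DCAT/Dublin Core terms, serialised as Turtle
# for Linked Open Data; all identifiers and values here are fabricated.
# Requires: pip install rdflib
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
dataset = URIRef("https://example.org/datasets/concrete-strain-2017")
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Strain sensor readings, bridge deck, 2017")))
g.add((dataset, DCTERMS.creator, Literal("Structural Monitoring Group")))
g.add((dataset, DCTERMS.subject, Literal("structural health monitoring")))

print(g.serialize(format="turtle"))
```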


2021, Vol 12 (04), pp. 826-835
Author(s):  
Lorenz A. Kapsner ◽  
Jonathan M. Mang ◽  
Sebastian Mate ◽  
Susanne A. Seuchter ◽  
Abishaa Vengadeswaran ◽  
...  

Abstract Background Many research initiatives aim at using data from electronic health records (EHRs) in observational studies. Participating sites of the German Medical Informatics Initiative (MII) established data integration centers to integrate EHR data within research data repositories to support local and federated analyses. To address concerns regarding possible data quality (DQ) issues of hospital routine data compared with data specifically collected for scientific purposes, we have previously presented a data quality assessment (DQA) tool providing a standardized approach to assess DQ of the research data repositories at the MIRACUM consortium's partner sites. Objectives Major limitations of the former approach included manual interpretation of the results and hard coding of analyses, making their expansion to new data elements and databases time-consuming and error-prone. We here present an enhanced version of the DQA tool that links it to common data element definitions stored in a metadata repository (MDR), adopting the harmonized DQA framework from Kahn et al. and applying it within the MIRACUM consortium. Methods Data quality checks were aligned to a harmonized DQA terminology. Database-specific information was systematically identified and represented in an MDR. Furthermore, a structured representation of logical relations between data elements was developed to model plausibility statements in the MDR. Results The MIRACUM DQA tool was linked to data element definitions stored in a consortium-wide MDR. Additional databases used within MIRACUM were linked to the DQ checks by extending the respective data elements in the MDR with the required information. The evaluation of DQ checks was automated. An adaptable software implementation is provided with the R package DQAstats. Conclusion The enhancements of the DQA tool facilitate the future integration of new data elements and make the tool scalable to other databases and data models. It has been provided to all ten MIRACUM partners and was successfully deployed and integrated into their respective data integration center infrastructure.
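To make the MDR-driven idea tangible, here is a loose sketch, in Python rather than the authors' R package DQAstats, of data-element definitions and one plausibility statement stored as data and evaluated generically. The element names, rule format and checks are all invented for illustration.

```python
# Toy MDR: data-element definitions plus a plausibility statement kept as
# data, so checks need not be hard-coded per database (names invented).
import pandas as pd

MDR = {
    "birth_date": {"not_null": True},
    "admission":  {"not_null": True},
}
PLAUSIBILITY = [
    ("admission on or after birth", lambda df: df["admission"] >= df["birth_date"]),
]

def run_checks(df: pd.DataFrame) -> None:
    for element, rules in MDR.items():        # completeness checks
        if rules.get("not_null"):
            print(f"{element}: {int(df[element].isna().sum())} missing")
    for label, holds in PLAUSIBILITY:         # plausibility checks
        print(f"{label}: {int((~holds(df)).sum())} violations")

run_checks(pd.DataFrame({
    "birth_date": pd.to_datetime(["1950-01-01", "2001-06-01"]),
    "admission":  pd.to_datetime(["2020-03-02", "1999-12-31"]),
}))
```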


2018, Vol 12 (2), pp. 274-285
Author(s):  
Dan Fowler ◽  
Jo Barratt ◽  
Paul Walsh

There is significant friction in the acquisition, sharing, and reuse of research data. It is estimated that eighty percent of data analysis is invested in the cleaning and mapping of data (Dasu and Johnson, 2003). This friction prevents researchers who are not well versed in data preparation techniques from reusing an ever-increasing amount of data available within research data repositories. Frictionless Data is an ongoing project at Open Knowledge International focused on removing this friction. We are doing this by developing a set of tools, specifications, and best practices for describing, publishing, and validating data. The heart of this project is the “Data Package”, a containerization format for data based on existing practices for publishing open source software. This paper will report on current progress toward that goal.
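As a sketch of the idea, the script below writes a minimal datapackage.json descriptor of the general shape the Data Package specification defines (a name plus resources with paths and table schemas). The dataset and field names are made up for illustration; only the overall structure follows the published spec.

```python
# Emit a minimal Data Package descriptor; the resource and its fields
# are invented, but the overall shape follows the specification.
import json

descriptor = {
    "name": "field-survey-2017",
    "resources": [{
        "name": "observations",
        "path": "data/observations.csv",
        "schema": {
            "fields": [
                {"name": "site_id", "type": "string"},
                {"name": "observed_at", "type": "datetime"},
                {"name": "measurement", "type": "number"},
            ]
        },
    }],
}

with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```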


Author(s):  
João Aguiar Castro ◽  
Ricardo Carvalho Amorim ◽  
Rúbia Gattelli ◽  
Yulia Karimova ◽  
João Rocha da Silva ◽  
...  

Research data are the cornerstone of science, and their current fast rate of production is disquieting researchers. Adequate research data management strongly depends on accurate metadata records that capture the production context of the datasets, thus enabling data interpretation and reuse. This chapter reports on the authors' experience in the development of metadata models, formalized as ontologies, for several research domains, involving members of small research teams in the overall process. This process is instantiated with four case studies: vehicle simulation, hydrogen production, biological oceanography, and social sciences. The authors also present a data description workflow that includes a research data management platform, named Dendro, where researchers can prepare their datasets for subsequent deposit in external data repositories.
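For a flavour of what "metadata models formalized as ontologies" can look like, the fragment below declares one class and one property of a hypothetical hydrogen-production vocabulary with rdflib. The names are invented; the authors' actual ontologies are those developed for the Dendro platform.

```python
# Declaring a tiny domain vocabulary as OWL; all names are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

HYDRO = Namespace("https://example.org/ontology/hydrogen#")
g = Graph()
g.bind("hydro", HYDRO)

g.add((HYDRO.Experiment, RDF.type, OWL.Class))
g.add((HYDRO.catalystType, RDF.type, OWL.DatatypeProperty))
g.add((HYDRO.catalystType, RDFS.domain, HYDRO.Experiment))
g.add((HYDRO.catalystType, RDFS.comment,
       Literal("Catalyst used in a hydrogen production experiment")))

print(g.serialize(format="turtle"))
```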


2019, Vol 1 (4), pp. 350-367
Author(s):  
Danielle Descoteaux ◽  
Chiara Farinelli ◽  
Marina Soares e Silva ◽  
Anita de Waard

Over the past five years, Elsevier has focused on implementing FAIR and best practices in data management, from data preservation through reuse. In this paper we describe a series of efforts undertaken in this time to support proper data management practices. In particular, we discuss our journal data policies and their implementation; the current status and future goals of the research data management platform Mendeley Data; and the creation of clear, persistent links between published papers and the individual datasets stored in external data repositories, through our partnership with Scholix. Early analysis of our data policies implementation confirms significant disparities at the subject level regarding data sharing practices, with the most uptake within the Physical Sciences disciplines. Future directions at Elsevier include implementing better discoverability of linked data within an article and incorporating research data usage metrics.
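For readers unfamiliar with Scholix, an article-to-dataset link is exchanged as a small "link information package". The record below is only a hedged approximation of that shape with fabricated identifiers and deliberately partial field coverage; consult the Scholix metadata schema for the authoritative field list.

```python
# Approximate shape of a Scholix link information package; the DOIs and
# provider name are fabricated for illustration.
scholix_link = {
    "RelationshipType": {"Name": "References"},
    "Source": {
        "Identifier": {"ID": "10.1016/j.example.2019.01.001", "IDScheme": "doi"},
        "Type": "publication",
    },
    "Target": {
        "Identifier": {"ID": "10.17632/examp1e.1", "IDScheme": "doi"},
        "Type": "dataset",
    },
    "LinkProvider": [{"Name": "Example Publisher"}],
}
print(scholix_link["Target"]["Identifier"]["ID"])
```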


Author(s):  
Robert Andrews ◽  
Moe Wynn ◽  
Kirsten Vallmuur ◽  
Arthur ter Hofstede ◽  
Emma Bosley ◽  
...  

While noting the importance of data quality, existing process mining methodologies (i) do not provide details on how to assess the quality of event data, (ii) do not consider how the identification of data quality issues can be exploited in the planning, data extraction and log building phases of a process mining analysis, and (iii) do not highlight potential impacts of poor-quality data on different types of process analyses. As our key contribution, we develop a process-centric, data quality-driven approach to preparing for a process mining analysis which can be applied to any existing process mining methodology. Our approach, adapted from elements of the well-known CRISP-DM data mining methodology, includes conceptual data modeling, quality assessment at both attribute and event level, and trial discovery and conformance checking to develop an understanding of system processes and data properties that informs data extraction. We illustrate our approach in a case study involving the Queensland Ambulance Service (QAS) and Retrieval Services Queensland (RSQ). We describe the detailed preparation for a process mining analysis of retrieval and transport processes (ground and aero-medical) for road-trauma patients in Queensland. Sample datasets obtained from QAS and RSQ are utilised to show how quality metrics, data models and exploratory process mining analyses can be used to (i) identify data quality issues, (ii) anticipate and explain certain observable features in process mining analyses, (iii) distinguish between systemic and occasional quality issues, and (iv) reason about the mechanisms by which identified quality issues may have arisen in the event log. We contend that this knowledge can be used to guide the data extraction and pre-processing stages of a process mining case study to properly align the data with the case study research questions.
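As a small, hedged illustration of attribute- and event-level checks of the kind such an approach applies before discovery, the snippet below uses generic case/activity/timestamp columns rather than the QAS/RSQ data model.

```python
# Attribute-level completeness and event-level ordering checks on a toy
# event log; column names are generic, not the case-study schema.
import pandas as pd

log = pd.DataFrame({
    "case_id":   ["c1", "c1", "c2", "c2"],
    "activity":  ["dispatch", "arrive", "dispatch", None],
    "timestamp": pd.to_datetime(["2021-01-01 10:00", "2021-01-01 09:50",
                                 "2021-01-02 08:00", "2021-01-02 08:20"]),
})

# Attribute level: share of missing values per column.
print(log.isna().mean().rename("share_missing"))

# Event level: timestamps that go backwards within a case would distort
# any process model discovered from this log.
backwards = log.groupby("case_id")["timestamp"].diff().lt(pd.Timedelta(0)).sum()
print(f"out-of-order events: {backwards}")
```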


2017, Vol 4 (1), pp. 25-31
Author(s):  
Diana Effendi

The Information Product approach (IP approach) is an information management approach that treats information as a product; it can be used to manage information products and to analyse data quality. An IP-Map helps an organization manage, in an organized way, how data are collected, stored, maintained, and used. The management of academic data at X University has not yet used the IP approach: the university has paid little attention to the quality of its information, concentrating instead on the application systems that automate data management in its academic processes. The IP-Map developed in this paper can serve as a basis for analysing the quality of data and information; with it, X University can identify which parts of the process need improvement in data and information quality management. Index terms: IP approach, IP-Map, information quality, data quality.
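IP-Maps are diagrams rather than code, but the bookkeeping behind one can be hinted at programmatically. The toy below, with invented block names and a much-simplified notation, flags stages of an academic information product that lack an explicit quality check:

```python
# Much-simplified IP-Map bookkeeping; the real notation has richer block
# types (source, processing, quality check, storage, consumer boundaries).
blocks = {
    "enrolment_form": {"kind": "source",  "feeds": ["registry_db"]},
    "registry_db":    {"kind": "storage", "feeds": ["grade_report"]},
    "grade_report":   {"kind": "product", "feeds": []},
}
quality_checked = {"registry_db"}  # blocks that already have a check

# Candidate improvement points: non-product blocks with no quality check.
print([name for name, b in blocks.items()
       if b["kind"] != "product" and name not in quality_checked])
```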


2021, Vol 15 (1)
Author(s):  
Elsa Kobeissi ◽  
Marilyne Menassa ◽  
Krystel Moussally ◽  
Ernestina Repetto ◽  
Ismail Soboh ◽  
...  

Abstract Background Antibiotic resistance (ABR) is a major global threat. Armed and protracted conflicts act as multipliers of infection and ABR, thus leading to increased healthcare and societal costs. We aimed to understand and describe the socioeconomic burden of ABR in conflict-affected settings and refugee-hosting countries by conducting a systematic scoping review. Methods A systematic search of PubMed, Medline (Ovid), Embase, Web of Science, SCOPUS and Open Grey databases was conducted to identify all relevant human studies published between January 1990 and August 2019. An updated search was also conducted in April 2020 using Medline/Ovid. Independent screenings of titles/abstracts followed by full texts were performed using pre-defined criteria. The Newcastle-Ottawa Scale was used to assess study quality. Data extraction and analysis were based on the PICOS framework and followed the PRISMA-ScR guideline. Results The search yielded 8 studies (7 publications), most of which were single-country, mono-center and retrospective studies. The studies were conducted in Lebanon (n = 3), Iraq (n = 2), Jordan (n = 1), Palestine (n = 1) and Yemen (n = 1). Most of the studies did not have a primary aim to assess the socioeconomic impact of ABR and were small studies with limited statistical power that could not demonstrate significant associations. The included studies lacked sufficient information for the accurate evaluation of the cost incurred by antibiotic-resistant infections in conflict-affected countries. Conclusion This review highlights the scarcity of research on the socioeconomic burden of ABR on general populations in conflict-affected settings and on refugees and migrants in host countries, and lists recommendations for consideration in future studies. Further studies are needed to understand the cost of ABR in these settings and to develop and implement adaptable policies.

