The New Data Quality Task Group (DQTG): ensuring high quality data today and in the future

Metabolomics ◽  
2014 ◽  
Vol 10 (4) ◽  
pp. 539-540 ◽  
Author(s):  
Daniel W. Bearden ◽  
Richard D. Beger ◽  
David Broadhurst ◽  
Warwick Dunn ◽  
Arthur Edison ◽  
...  


10.2196/18366 ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. e18366
Author(s):  
Maryam Zolnoori ◽  
Mark D Williams ◽  
William B Leasure ◽  
Kurt B Angstman ◽  
Che Ngufor

Background Patient-centered registries are essential in population-based clinical care for patient identification and monitoring of outcomes. Although registry data may be used in real time for patient care, the same data may further be used for secondary analysis to assess disease burden, evaluation of disease management and health care services, and research. The design of a registry has major implications for the ability to effectively use these clinical data in research. Objective This study aims to develop a systematic framework to address the data and methodological issues involved in analyzing data in clinically designed patient-centered registries. Methods The systematic framework was composed of 3 major components: visualizing the multifaceted and heterogeneous patient-centered registries using a data flow diagram, assessing and managing data quality issues, and identifying patient cohorts for addressing specific research questions. Results Using a clinical registry designed as a part of a collaborative care program for adults with depression at Mayo Clinic, we were able to demonstrate the impact of the proposed framework on data integrity. By following the data cleaning and refining procedures of the framework, we were able to generate high-quality data that were available for research questions about the coordination and management of depression in a primary care setting. We describe the steps involved in converting clinically collected data into a viable research data set using registry cohorts of depressed adults to assess the impact on high-cost service use. Conclusions The systematic framework discussed in this study sheds light on the existing inconsistency and data quality issues in patient-centered registries. This study provided a step-by-step procedure for addressing these challenges and for generating high-quality data for both quality improvement and research that may enhance care and outcomes for patients. International Registered Report Identifier (IRRID) DERR1-10.2196/18366
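As a rough illustration of how the cleaning and cohort-identification components of such a framework might look in practice, the following Python/pandas sketch applies simple validity rules to a hypothetical registry export. Column names such as patient_id, enrollment_date, phq9_score, and age are illustrative assumptions, not the registry's actual schema.

# Hypothetical sketch of the cleaning and cohort-identification steps; not the authors' code.
import pandas as pd

def clean_registry(df: pd.DataFrame) -> pd.DataFrame:
    """Apply basic quality rules: drop duplicates, invalid dates, out-of-range scores."""
    df = df.drop_duplicates(subset=["patient_id", "enrollment_date"])
    df["enrollment_date"] = pd.to_datetime(df["enrollment_date"], errors="coerce")
    df = df.dropna(subset=["patient_id", "enrollment_date"])
    # PHQ-9 scores are only valid in the range 0-27.
    return df[df["phq9_score"].between(0, 27)]

def research_cohort(df: pd.DataFrame, min_score: int = 10) -> pd.DataFrame:
    """Select a research cohort of adults with at least moderate depression at enrollment."""
    return df[(df["age"] >= 18) & (df["phq9_score"] >= min_score)]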


2015 ◽  
Vol 21 (3) ◽  
pp. 358-374 ◽  
Author(s):  
Mustafa Aljumaili ◽  
Karina Wandt ◽  
Ramin Karim ◽  
Phillip Tretten

Purpose – The purpose of this paper is to explore the main ontologies related to eMaintenance solutions and to study their application areas. The advantages of using these ontologies to improve and control data quality are investigated. Design/methodology/approach – A literature study was conducted to explore eMaintenance ontologies in different areas. These ontologies are mainly related to content structure and communication interfaces. The ontologies are then linked to each step of the data production process in maintenance. Findings – The findings suggest that eMaintenance ontologies can help to produce high-quality data in maintenance. The suggested maintenance data production process may help to control data quality, and using these ontologies in every step of the process may provide management tools for assuring high-quality data. Research limitations/implications – Further research could broaden the investigation to identify more eMaintenance ontologies. Moreover, studying these ontologies in more technical detail may increase the understandability and use of these standards. Practical implications – Applying eMaintenance ontologies requires additional cost and time from companies, and the lack or ineffective use of eMaintenance tools in many enterprises limits the adoption of these ontologies. Originality/value – Investigating eMaintenance ontologies and connecting them to maintenance data production is important for controlling and managing data quality in maintenance.


2018 ◽  
Vol 10 (11) ◽  
pp. 1739 ◽  
Author(s):  
Xianxian Guo ◽  
Le Wang ◽  
Jinyan Tian ◽  
Dameng Yin ◽  
Chen Shi ◽  
...  

Accurate measurement of the field leaf area index (LAI) is crucial for assessing forest growth and health status. Three-dimensional (3-D) structural information of trees from terrestrial laser scanning (TLS) suffers information loss to varying extents because of occlusion by canopy parts. Data with higher loss, regarded as poor-quality data, heavily hamper the estimation accuracy of LAI. Multi-location scanning, which has proved effective in reducing occlusion effects in other forests, is hard to carry out in mangrove forests because of the difficulty of moving between mangrove trees. As a result, the quality of point cloud data (PCD) varies among plots in mangrove forests. To improve the retrieval accuracy of mangrove LAI, it is essential to select only high-quality data. Several previous studies have evaluated regions of occlusion by considering laser pulse trajectories. However, such models are highly susceptible to the indeterminate profile of the complete vegetation object and are computationally intensive. Therefore, this study developed a new index (the vegetation horizontal occlusion index, VHOI) that combines unmanned aerial vehicle (UAV) imagery and TLS data to quantify TLS data quality. VHOI approaches 0.0 as data quality increases. To test the new index, VHOI values of 102 plots with a radius of 5 m were calculated from TLS data and UAV imagery. The results showed that VHOI had a strong linear relationship with the estimation accuracy of LAI (R2 = 0.72, RMSE = 0.137). In addition, as TLS data were selected with VHOI below decreasing thresholds (1.0, 0.9, …, 0.1), the number of remaining plots decreased while the agreement between TLS-derived LAI and field-measured LAI improved. When the VHOI threshold is 0.3, the optimal trade-off between the number of plots and LAI measurement accuracy is reached (R2 = 0.67). In summary, VHOI can be used as an index for selecting high-quality data for accurately measuring mangrove LAI, and the suggested threshold is 0.3.
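The threshold-selection step described above can be sketched in a few lines of Python; this is an illustration of the idea rather than the authors' code, and the variable names (vhoi, lai_tls, lai_field) are assumptions.

# Keep only plots whose VHOI falls below a chosen cutoff, then check how well
# TLS-derived LAI agrees with field-measured LAI on the retained plots.
import numpy as np

def r_squared(y_obs, y_pred):
    ss_res = np.sum((np.asarray(y_obs) - np.asarray(y_pred)) ** 2)
    ss_tot = np.sum((np.asarray(y_obs) - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

def select_plots(vhoi, lai_tls, lai_field, threshold=0.3):
    """Return the retained-plot count and LAI agreement for one VHOI threshold."""
    vhoi, lai_tls, lai_field = map(np.asarray, (vhoi, lai_tls, lai_field))
    keep = vhoi < threshold
    return int(keep.sum()), r_squared(lai_field[keep], lai_tls[keep])

# Sweeping thresholds 1.0, 0.9, ..., 0.1 reproduces the trade-off described above:
# for t in np.arange(1.0, 0.0, -0.1):
#     print(round(t, 1), select_plots(vhoi, lai_tls, lai_field, t))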


2017 ◽  
Vol 6 (2) ◽  
pp. 505-521 ◽  
Author(s):  
Luděk Vecsey ◽  
Jaroslava Plomerová ◽  
Petr Jedlička ◽  
Helena Munzarová ◽  
Vladislav Babuška ◽  
...  

Abstract. This paper focuses on major issues related to the data reliability and network performance of 20 broadband (BB) stations of the Czech (CZ) MOBNET (MOBile NETwork) seismic pool within the AlpArray seismic experiments. Currently used high-resolution seismological applications require high-quality data recorded for a sufficiently long time interval at seismological observatories and during the entire time of operation of the temporary stations. In this paper we present new hardware and software tools we have been developing during the last two decades while analysing data from several international passive experiments. The new tools help to assure the high-quality standard of broadband seismic data and eliminate potential errors before supplying data to seismological centres. Special attention is paid to crucial issues like the detection of sensor misorientation, timing problems, interchange of record components and/or their polarity reversal, sensor mass centring, or anomalous channel amplitudes due to, for example, imperfect gain. Thorough data quality control should represent an integral constituent of seismic data recording, preprocessing, and archiving, especially for data from temporary stations in passive seismic experiments. Large international seismic experiments require enormous efforts from scientists from different countries and institutions to gather hundreds of stations to be deployed in the field during a limited time period. In this paper, we demonstrate the beneficial effects of the procedures we have developed for acquiring a reliable large set of high-quality data from each group participating in field experiments. The presented tools can be applied manually or automatically on data from any seismic network.
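One of the checks mentioned above, flagging anomalous channel amplitudes caused by imperfect gain, can be illustrated with a minimal Python sketch. This is not the authors' toolchain; it simply compares each channel's RMS amplitude with the station median under an assumed deviation factor.

import numpy as np

def flag_anomalous_channels(channels, factor=5.0):
    """channels maps channel codes (e.g. 'HHZ', 'HHN', 'HHE') to numpy sample arrays.
    Returns the codes whose RMS amplitude deviates from the median by more than `factor`."""
    rms = {code: float(np.sqrt(np.mean(data.astype(float) ** 2)))
           for code, data in channels.items()}
    median_rms = float(np.median(list(rms.values())))
    return [code for code, value in rms.items()
            if value > factor * median_rms or value < median_rms / factor]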


2021 ◽  
pp. 193896552110254
Author(s):  
Lu Lu ◽  
Nathan Neale ◽  
Nathaniel D. Line ◽  
Mark Bonn

As the use of Amazon’s Mechanical Turk (MTurk) has increased among social science researchers, so, too, has research into the merits and drawbacks of the platform. However, while many endeavors have sought to address issues such as generalizability, the attentiveness of workers, and the quality of the associated data, there has been relatively less effort concentrated on integrating the various strategies that can be used to generate high-quality data using MTurk samples. Accordingly, the purpose of this research is twofold. First, existing studies are integrated into a set of strategies/best practices that can be used to maximize MTurk data quality. Second, focusing on task setup, selected platform-level strategies that have received relatively less attention in previous research are empirically tested to further enhance the contribution of the proposed best practices for MTurk usage.
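As one concrete example of the platform-level strategies surveyed above, the sketch below screens out responses that fail an attention check or were completed implausibly fast. It is a hedged illustration rather than the authors' procedure, and the column names (attention_check, duration_sec) are assumptions.

import pandas as pd

def screen_responses(df: pd.DataFrame,
                     expected_answer: str = "strongly agree",
                     min_duration_sec: float = 120.0) -> pd.DataFrame:
    """Keep only MTurk responses that pass the attention check and a minimum-duration rule."""
    passed_check = df["attention_check"].str.strip().str.lower() == expected_answer
    plausible_time = df["duration_sec"] >= min_duration_sec
    return df[passed_check & plausible_time]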


Sensors ◽  
2019 ◽  
Vol 19 (9) ◽  
pp. 1978 ◽  
Author(s):  
Argyro Mavrogiorgou ◽  
Athanasios Kiourtis ◽  
Konstantinos Perakis ◽  
Stamatios Pitsios ◽  
Dimosthenis Kyriazis

It is an undeniable fact that Internet of Things (IoT) technologies have become a milestone advancement in the digital healthcare domain, since the number of IoT medical devices has grown exponentially, and it is now anticipated that by 2020 there will be over 161 million of them connected worldwide. Therefore, in an era of continuous growth, IoT healthcare faces various challenges, such as the collection, the quality estimation, as well as the interpretation and the harmonization of the data that derive from the existing huge amounts of heterogeneous IoT medical devices. Even though various approaches have been developed so far for solving each one of these challenges, none of them proposes a holistic approach for successfully achieving data interoperability between high-quality data that derive from heterogeneous devices. For that reason, this manuscript presents a mechanism that addresses the intersection of these challenges. Through this mechanism, the datasets of the different devices are first collected and then cleaned. The cleaning results are subsequently used to capture the overall data quality level of each dataset, in combination with measurements of the availability and reliability of the device that produced it. Consequently, only the high-quality data are kept and translated into a common format for further use. The proposed mechanism is evaluated through a specific scenario, producing reliable results and achieving data interoperability with 100% accuracy and data quality with more than 90% accuracy.
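A simplified sketch of the filtering step described above is given below. It is not the paper's implementation; the combination rule (a simple average) and the threshold are assumptions used only to show how per-dataset quality, device availability, and device reliability could be combined to decide which datasets to keep.

from dataclasses import dataclass

@dataclass
class DeviceDataset:
    device_id: str
    quality: float       # share of records that passed cleaning, 0..1
    availability: float  # share of time the device was reachable, 0..1
    reliability: float   # share of transmissions received intact, 0..1

def keep_high_quality(datasets, threshold=0.9):
    """Return datasets whose combined score (simple average here) meets the threshold."""
    scored = [(d, (d.quality + d.availability + d.reliability) / 3.0) for d in datasets]
    return [d for d, score in scored if score >= threshold]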


Forests ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 99
Author(s):  
Marieke Sandker ◽  
Oswaldo Carrillo ◽  
Chivin Leng ◽  
Donna Lee ◽  
Rémi d’Annunzio ◽  
...  

This article discusses the importance of high-quality deforestation area estimates for reliable and credible REDD+ monitoring and reporting. It discusses how countries can make use of global spatial tree cover change assessments, and how considerable additional effort is required to translate these into national deforestation estimates. The article illustrates the relevance of countries' continued efforts to improve data quality for REDD+ monitoring by looking at Mexico, Cambodia, and Ghana. The experience in these countries shows differences between deforestation areas assessed directly from maps and improved sample-based deforestation area estimates, highlighting significant changes in both the magnitude and the trend of assessed deforestation between the two methods. Forests play an important role in achieving the goals of the Paris Agreement, and therefore the ability of countries to accurately measure greenhouse gases from forests is critical. Continued efforts by countries are needed to produce credible and reliable data. Supporting countries to continually increase the quality of deforestation area estimates will also support more efficient allocation of finance that rewards REDD+ results-based payments.
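To make the contrast between map-based areas and sample-based estimates concrete, the sketch below implements a standard stratified area estimator of the kind referred to above. It is shown only as an illustration of how mapped areas and reference samples combine; the numbers are invented and the countries' actual methods may differ.

def stratified_area(map_area_ha, sample_counts):
    """
    map_area_ha: {stratum: mapped area in hectares}
    sample_counts: {stratum: {reference_class: number of sample units}}
    Returns the estimated area (ha) of each reference class.
    """
    total = sum(map_area_ha.values())
    weights = {s: a / total for s, a in map_area_ha.items()}
    classes = {c for counts in sample_counts.values() for c in counts}
    estimates = {}
    for c in classes:
        proportion = sum(
            weights[s] * sample_counts[s].get(c, 0) / sum(sample_counts[s].values())
            for s in sample_counts
        )
        estimates[c] = proportion * total
    return estimates

# Example with made-up figures: a 90,000 ha "stable" stratum and a 10,000 ha "deforestation"
# stratum; the sample-based deforestation estimate (12,500 ha) differs from the mapped area.
# print(stratified_area({"stable": 90000, "deforestation": 10000},
#                       {"stable": {"stable": 95, "deforestation": 5},
#                        "deforestation": {"stable": 20, "deforestation": 80}}))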


High-quality data are the precondition for analyzing and using big data and for guaranteeing the value of the data. At present, comprehensive analysis and research of quality standards and quality assessment methods for big data are lacking. First, this paper summarizes reviews of data quality research. Second, it analyzes the data characteristics of the big data environment, presents the quality challenges faced by big data, and defines a hierarchical data quality framework from the perspective of data users. This framework comprises big data quality dimensions, quality characteristics, and quality indexes. Finally, based on this framework, the paper constructs a dynamic assessment process for data quality. This process has good expansibility and adaptability and can address the problems of big data quality assessment. Several studies have shown that maintaining the quality of data is often recognized as problematic, but at the same time it is considered essential to effective decision-making in asset management. Big data sources are very broad and data structures are complex, so the data obtained may have quality problems such as data errors, missing data, inconsistencies, and noise. The purpose of data cleaning (data scrubbing) is to detect and remove errors and inconsistencies from data in order to improve their quality. Data cleaning can be divided into four patterns based on implementation methods and scope: manual execution, writing of special-purpose programs, data cleaning independent of specific application fields, and solving the problem of a particular type of application field. Of these four approaches, the third has great practical value and can be applied effectively.
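The hierarchical framework described above (quality indexes rolling up into characteristics and dimensions, then into one score) can be sketched minimally in Python. The dimensions, index names, and weights in the example are assumptions, not the framework's actual specification.

def assess_quality(index_scores, weights):
    """
    index_scores: {dimension: {index_name: score in 0..1}}
    weights: {dimension: weight}, with weights summing to 1
    Returns (overall_score, per_dimension_scores).
    """
    per_dimension = {
        dim: sum(scores.values()) / len(scores) for dim, scores in index_scores.items()
    }
    overall = sum(weights[dim] * score for dim, score in per_dimension.items())
    return overall, per_dimension

# Example with assumed dimensions and weights:
# assess_quality({"completeness": {"missing_rate": 0.95},
#                 "accuracy": {"error_rate": 0.90, "outlier_rate": 0.85},
#                 "timeliness": {"freshness": 0.80}},
#                {"completeness": 0.4, "accuracy": 0.4, "timeliness": 0.2})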


Author(s):  
Mary Kay Gugerty ◽  
Dean Karlan

Without high-quality data, even the best-designed monitoring and evaluation systems will collapse. Chapter 7 introduces some of the basics of collecting high-quality data and discusses how to address challenges that frequently arise. High-quality data must be clearly defined and have an indicator that validly and reliably measures the intended concept. The chapter then explains how to avoid common biases and measurement errors such as anchoring, social desirability bias, the experimenter demand effect, unclear wording, long recall periods, and translation context. It then guides organizations on how to find indicators, test data collection instruments, manage surveys, and train staff appropriately for data collection and entry.

