A Review of Failure Handling Mechanisms for Data Quality Measures

2012 ◽  
Author(s):  
Nurul A. Emran ◽  
Noraswaliza Abdullah ◽  
Nuzaimah Mustafa

2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has proven effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training datasets, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, few practical ways to measure data quality are available today, especially for large-scale, high-dimensional data such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping, with statistical benefits on large-scale, high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale, high-dimensional datasets.
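The idea of combining random projections with bootstrapping to estimate separability and variability can be sketched as follows. This is an illustrative implementation, not the authors' exact measures: the `quality_measures` function, its parameters, and the centroid-based definitions of separability and in-class variability are assumptions for the sketch.

```python
import numpy as np

def quality_measures(X, y, proj_dim=8, n_boot=20, seed=0):
    """Estimate class separability and in-class variability of (X, y)
    on random low-dimensional projections, averaged over bootstrap
    resamples for stability (an illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sep_vals, var_vals = [], []
    for _ in range(n_boot):
        # Random Gaussian projection; 1/sqrt(proj_dim) scaling keeps
        # expected squared norms approximately preserved.
        R = rng.normal(size=(d, proj_dim)) / np.sqrt(proj_dim)
        idx = rng.integers(0, n, size=n)      # bootstrap resample
        Z, labels = X[idx] @ R, y[idx]
        centroids = {c: Z[labels == c].mean(axis=0)
                     for c in np.unique(labels)}
        cs = list(centroids.values())
        # Separability: mean pairwise distance between class centroids.
        sep = np.mean([np.linalg.norm(a - b)
                       for i, a in enumerate(cs) for b in cs[i + 1:]])
        # In-class variability: mean distance of points to their
        # own class centroid.
        var = np.mean([np.linalg.norm(Z[labels == c] - centroids[c],
                                      axis=1).mean()
                       for c in centroids])
        sep_vals.append(sep)
        var_vals.append(var)
    return float(np.mean(sep_vals)), float(np.mean(var_vals))
```

Because each projection is cheap (a single matrix multiply into a few dimensions), the per-resample cost stays low even when the original dimensionality is large, which is the statistical benefit the abstract alludes to.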


2021 ◽  
Vol 560 ◽  
pp. 51-67
Author(s):  
Marilyn Bello ◽  
Gonzalo Nápoles ◽  
Koen Vanhoof ◽  
Rafael Bello

2017 ◽  
Vol 77 ◽  
pp. 31-39 ◽  
Author(s):  
Maarit K. Leinonen ◽  
Joonas Miettinen ◽  
Sanna Heikkinen ◽  
Janne Pitkäniemi ◽  
Nea Malila

2018 ◽  
Vol 60 (1) ◽  
pp. 32-49 ◽  
Author(s):  
Mingnan Liu ◽  
Laura Wronski

This study examines the use of trap questions as indicators of data quality in online surveys. Trap questions are intended to identify respondents who are not paying close attention to the survey, and who are therefore likely providing sub-optimal responses not only to the trap question itself but to the other questions in the survey. We conducted three experiments using an online non-probability panel. In the first experiment, we examine whether responses to surveys with one trap question differ from responses to surveys with two trap questions. In the second, we examine responses to surveys with trap questions of varying difficulty. In the third, we test the level of difficulty, the placement of the trap question, and other forms of attention checks. In all studies, we correlate responses to the trap question(s) with other data quality checks, most of which were derived from the literature on satisficing. We also compare responses to several substantive questions by response to the trap questions, which tells us whether participants who failed the trap questions gave consistently different answers from those who passed. We find that the rate of passing/failing various trap questions varies widely, from 27% to 87% among the types we tested. We also find evidence that some types of trap questions are more strongly correlated than others with other data quality measures.
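The analysis described above, correlating trap-question pass/fail flags with other quality indicators, can be sketched along these lines. All function names, indicators, and data here are hypothetical illustrations, not material from the study:

```python
import numpy as np

def trap_pass_rate(passed):
    """Share of respondents who passed the trap question."""
    return float(np.mean(passed))

def straightlining(grid_responses):
    """A common satisficing indicator: 1.0 if a respondent gave the
    identical answer to every item in a rating grid, else 0.0."""
    return np.array([float(len(set(row)) == 1) for row in grid_responses])

def point_biserial(passed, quality_score):
    """Pearson correlation between a binary pass flag and a
    continuous data-quality score (point-biserial correlation)."""
    p = np.asarray(passed, dtype=float)
    q = np.asarray(quality_score, dtype=float)
    return float(np.corrcoef(p, q)[0, 1])
```

Comparing such correlations across trap-question variants (difficulty, placement) is one way to quantify which variant best tracks overall response quality.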


2018 ◽  
pp. 1-10
Author(s):  
Rory J. Lettvin ◽  
Alpna Wayal ◽  
Amy McNutt ◽  
Robert S. Miller ◽  
Robert Hauser

Purpose: A joint data quality initiative between Cancer Treatment Centers of America and CancerLinQ®, the ASCO big-data health technology platform, was undertaken to document and codify the steps taken to evaluate, stratify, and determine the potential effect of data elements used for electronic clinical quality measures as captured within structured fields in electronic health records. Methods: The process involved identifying the clinical concepts required in measure population criteria and mapping them to the corresponding components of the CancerLinQ data model. A quantitative assessment of mappings between electronic clinical quality measure clinical concepts and attributes from the CancerLinQ clinical database was performed. In parallel, a qualitative analysis of high-impact data elements from the Cancer Treatment Centers of America clinical measures was made using local expert consensus. Results: An impact assessment was derived using a count of the data elements across measures and the specific population criteria affected. Conclusion: A list of putative high-impact data elements can provide guidance for clinicians to facilitate the capture of specific data elements related to quality metrics in an electronic environment.


Author(s):  
Juliusz L. Kulikowski

The state of the art in data quality assessment and maintenance in modern information systems is presented in this paper. A short historical overview of the development of this problem is given. Particular attention is paid to the development of the idea of multi-aspect data quality assessment. The problem of extending single-datum quality assessment to higher-level data structures is considered. Methods of ordering multi-aspect data quality measures for comparison are analyzed, and a solution based on the concept of semi-ordering in a linear vector (Kantorovitsh) space is proposed. Remarks on organizational and technological tools for data quality maintenance in organizations are given. Expected future trends in the development of data quality assessment and maintenance methods are suggested.
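The semi-ordering idea can be illustrated with a small sketch (an assumed reading, not the paper's formalism): each data item's quality is a vector of aspect scores, and one item dominates another only when it is at least as good in every aspect and strictly better in at least one; otherwise the two are incomparable, so the relation is a partial order rather than a total one.

```python
def dominates(q1, q2):
    """Componentwise semi-order on quality vectors (higher is better).
    Returns True only if q1 is >= q2 in every aspect and > in some."""
    return (all(a >= b for a, b in zip(q1, q2))
            and any(a > b for a, b in zip(q1, q2)))

# Illustrative aspect vectors: (accuracy, completeness, timeliness).
a = (0.90, 0.80, 0.70)
b = (0.60, 0.80, 0.50)
c = (0.95, 0.40, 0.90)
# a dominates b, but a and c trade off aspects and are incomparable.
```

The incomparable pairs are exactly why a semi-ordering is needed: no single scalar ranking can decide between `a` and `c` without first weighting the aspects.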

