Deep Hash-based Relevance-aware Data Quality Assessment for Image Dark Data

2021 ◽  
Vol 2 (2) ◽  
pp. 1-26
Author(s):  
Yu Liu ◽  
Yangtao Wang ◽  
Lianli Gao ◽  
Chan Guo ◽  
Yanzhao Xie ◽  
...  

Data mining constantly faces, but can hardly solve, the problem that a dataset may contain little information relevant to a given requirement. Faced with multiple unknown datasets, allocating data mining resources so as to acquire more of the desired data requires a data quality assessment framework based on the relevance between each dataset and the requirement. Such a framework helps the user judge the potential benefit in advance and thus direct resources to the most promising candidates. However, unstructured data (e.g., image data) often exists in a dark-data state, which makes it difficult for the user to understand this relevance from the dataset's content in real time. Even if all data carry label descriptions, efficiently measuring the relevance between data items under semantic propagation remains an urgent problem. Motivated by this, we propose a Deep Hash-based Relevance-aware Data Quality Assessment framework, which contains off-line learning and relevance-mining parts as well as an on-line assessing part. In the off-line part, we first design a Graph Convolution Network (GCN)-AutoEncoder hash (GAH) algorithm to recognize the data (i.e., lighten the dark data), then construct a graph under a restricted Hamming distance, and finally design a Cluster PageRank (CPR) algorithm that computes an importance score for each node (image), yielding a relevance representation based on semantic propagation. In the on-line part, we first retrieve the importance score by hash code and then quickly read off the assessment conclusion from the importance list. On the one hand, introducing the GCN and co-occurrence probability into GAH improves the perception ability for dark data. On the other hand, CPR exploits hash collisions to shrink the graph and the iteration matrix, which greatly decreases the consumption of space and computing resources.
We conduct extensive experiments on both single-label and multi-label datasets to assess the relevance between data and requirements, and to test resource allocation. Experimental results show that our framework acquires the most desired data under the same mining budget. Moreover, test results on the Tencent1M dataset demonstrate that the framework completes the assessment stably across different given requirements.
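The off-line pipeline described above (a Hamming-restricted graph over hash codes, collision-based node merging, and PageRank-style importance scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the radius and damping parameters, and the use of collision counts as a personalization vector are all assumptions made for illustration.

```python
# Illustrative sketch (not the paper's code): build a graph whose nodes are
# distinct binary hash codes, connect codes within a restricted Hamming
# radius, collapse hash collisions into node weights (the idea behind CPR's
# reduced graph), and run a PageRank-style power iteration.
from collections import Counter
from itertools import combinations

def hamming(a: int, b: int) -> int:
    # Hamming distance between two hash codes stored as ints.
    return bin(a ^ b).count("1")

def importance_scores(codes, radius=2, damping=0.85, iters=50):
    # Collapse identical hash codes into a single node, weighted by
    # collision count -- this is what shrinks the graph and the matrix.
    counts = Counter(codes)
    nodes = list(counts)
    nbrs = {c: [] for c in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) <= radius:  # restricted Hamming distance
            nbrs[a].append(b)
            nbrs[b].append(a)
    # Collision counts act as the personalization/teleport vector.
    total = sum(counts.values())
    pers = {c: counts[c] / total for c in nodes}
    rank = dict(pers)
    for _ in range(iters):
        nxt = {c: (1 - damping) * pers[c] for c in nodes}
        for c in nodes:
            if nbrs[c]:
                share = damping * rank[c] / len(nbrs[c])
                for n in nbrs[c]:
                    nxt[n] += share
            else:
                # Dangling node: redistribute its mass by personalization.
                for n in nodes:
                    nxt[n] += damping * rank[c] * pers[n]
        rank = nxt
    return rank
```

In the on-line phase, assessing a new image then reduces to hashing it and looking its score up in the precomputed `rank` dictionary, which is what makes the assessment fast.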

2018 ◽  
Vol 7 (4) ◽  
pp. e000353 ◽  
Author(s):  
Luke A Turcotte ◽  
Jake Tran ◽  
Joshua Moralejo ◽  
Nancy Curtin-Telegdi ◽  
Leslie Eckel ◽  
...  

Background Health information systems with applications in patient care planning and decision support depend on high-quality data. A postacute care hospital in Ontario, Canada, conducted data quality assessment and focus group interviews to guide the development of a cross-disciplinary training programme to reimplement the Resident Assessment Instrument–Minimum Data Set (RAI-MDS) 2.0 comprehensive health assessment into the hospital’s clinical workflows. Methods A hospital-level data quality assessment framework based on time series comparisons against an aggregate of Ontario postacute care hospitals was used to identify areas of concern. Focus groups were used to evaluate assessment practices and the use of health information in care planning and clinical decision support. The data quality assessment and focus groups were repeated to evaluate the effectiveness of the training programme. Results Initial data quality assessment and focus groups indicated that knowledge, practice and cultural barriers prevented both the collection and use of high-quality clinical data. Following the implementation of the training, there was an improvement in both data quality and the culture surrounding the RAI-MDS 2.0 assessment. Conclusions It is important for facilities to evaluate the quality of their health information to ensure that it is suitable for decision-making purposes. This study demonstrates the use of a data quality assessment framework that can be applied for quality improvement planning.


Author(s):  
Syed Mustafa Ali ◽  
Farah Naureen ◽  
Arif Noor ◽  
Maged Kamel N. Boulos ◽  
Javariya Aamir ◽  
...  

Background Increasingly, healthcare organizations are using technology for the efficient management of data. The aim of this study was to compare the data quality of digital records with that of the corresponding paper-based records by using a data quality assessment framework. Methodology We conducted a desk review of paper-based and digital records over the study duration from April 2016 to July 2016 at six enrolled TB clinics. We input all data fields of the patient treatment (TB01) card into a spreadsheet-based template to undertake a field-to-field comparison of the fields shared between the TB01 card and the digital data. Findings A total of 117 TB01 cards were prepared at the six enrolled sites, of which just 50% (n=59 of 117) were digitized. There were 1,239 comparable data fields, of which 65% (n=803) matched correctly between paper-based and digital records; the remaining 35% (n=436) had anomalies, either in the paper-based records or in the digital records. On average, digital patient records had 1.9 data quality issues per record, compared with 2.1 issues per record for paper-based records. Based on the analysis of valid data quality issues, there were more data quality issues in paper-based records (n=123) than in digital records (n=110). Conclusion There were fewer data quality issues in digital records than in the corresponding paper-based records. Greater use of mobile data capture and continued use of the data quality assessment framework can deliver more meaningful information for decision making.
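The field-to-field comparison the study describes can be sketched as a simple shared-field match count. This is a hypothetical illustration, not the study's instrument: the function name, return keys, and the record field names in the usage below are all invented.

```python
# Hypothetical sketch of a field-to-field comparison between a paper-based
# record and its digital counterpart: count matched fields and anomalies
# over the fields the two records share.
def compare_records(paper: dict, digital: dict) -> dict:
    shared = set(paper) & set(digital)          # only compare shared fields
    matched = sum(1 for f in shared if paper[f] == digital[f])
    return {
        "fields": len(shared),                  # comparable data fields
        "matched": matched,                     # correctly matched fields
        "anomalies": len(shared) - matched,     # mismatches in either record
    }
```

In practice, values would need normalization (whitespace, casing, date formats) before comparison so that formatting differences are not counted as anomalies.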

