Quality Assurance Tool Suite for Error Detection in Digital Repositories

Author(s):  
Roman Graf ◽  
Ross King
Author(s):  
Jung-Won Lee ◽  
Byoungju Choi

Today, businesses must respond with flexibility and speed to ever-changing customer demands and market opportunities. Service-Oriented Architecture (SOA) is the best methodology for developing new services and integrating them with adaptability: the ability to respond to changing and new requirements. In this paper, we propose a framework for ensuring data quality between composite services that solves semantic data transformation problems during service composition and, at the same time, detects data errors during service execution. We also minimize human intervention by learning data constraints as a basis for data transformation and error detection. We developed an SOA-based data quality assurance service that makes it possible to improve the quality of services and to manage data effectively for a variety of SOA-based applications. In an empirical study, we applied the service to detect data errors between CRM and ERP services and showed that the data error rate could be reduced by more than 30%. We also showed that the automation rate for setting detection rules exceeds 41% when data constraints are learned from multiple registered services in the business field.
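The idea of learning data constraints from registered services and then using them for error detection can be sketched as follows. The range-only constraints and the function names are illustrative assumptions, not the paper's actual rule-learning method.

```python
# Sketch: learn per-field range constraints from sample records exchanged
# by registered services, then flag violating fields at execution time.
# Range-only constraints are an illustrative simplification of the paper's
# learned data constraints.

def learn_constraints(records):
    """Learn a (min, max) range constraint for each field from samples."""
    constraints = {}
    for record in records:
        for field, value in record.items():
            lo, hi = constraints.get(field, (value, value))
            constraints[field] = (min(lo, value), max(hi, value))
    return constraints

def detect_errors(record, constraints):
    """Return the fields of a record that violate the learned constraints."""
    return [field for field, value in record.items()
            if field in constraints
            and not (constraints[field][0] <= value <= constraints[field][1])]

# Constraints learned from data seen between, e.g., CRM and ERP services:
samples = [{"quantity": 1, "unit_price": 9.90},
           {"quantity": 50, "unit_price": 120.00}]
rules = learn_constraints(samples)
print(detect_errors({"quantity": -3, "unit_price": 45.00}, rules))  # ['quantity']
```

In a real deployment the learned rules would be richer (type, format, and semantic constraints), but the split between a learning phase at composition time and a checking phase at execution time is the structure the abstract describes.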


2020 ◽  
Author(s):  
Victor Gabriel Leandro Alves ◽  
Mahmoud Ahmed ◽  
Eric Aliotta ◽  
Wookjin Choi ◽  
Jeffrey Vincent Siebers

Author(s):  
Sanja Seljan ◽  
Nikolina Škof Erdelja ◽  
Vlasta Kučiš ◽  
Ivan Dunđer ◽  
Mirjana Pejić Bach

Increased use of computer-assisted translation (CAT) technology in business settings, with growing task volumes, collaborative work, and short deadlines, gives rise to errors and to the need for quality assurance (QA). The research has three operational aims: 1) to establish a methodological framework for QA analysis, 2) to comparatively evaluate four QA tools, and 3) to justify the introduction of QA into the CAT process. The research includes the building of a translation memory, terminology extraction, and the creation of a terminology base. Error categorization is conducted using the Multidimensional Quality Metrics (MQM) framework. The error level is calculated from detected, false, and undetected errors. Weights are assigned to errors (minor, major, or critical), penalties are calculated, and a quality estimate for the translation memory is given. Results show that the process is prone to errors due to differences in error detection, harmonization, and error counting. Data analysis of detected errors leads to further data-driven decisions related to the quality of output results and improved efficiency of the translation business process.
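The severity-weighting and penalty step described above can be sketched as a simple score computation. The weight values and the per-word normalization below are illustrative assumptions, not the values used in the study.

```python
# Sketch of an MQM-style quality estimate: each detected error carries a
# severity weight; penalties are summed and normalized by the text's word
# count. Weights and normalization are illustrative assumptions.

SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def quality_estimate(errors, word_count):
    """Return a 0-100 quality score from (category, severity) error pairs."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)
    return max(0.0, 100.0 * (1 - penalty / word_count))

errors = [("terminology", "minor"),
          ("accuracy", "major"),
          ("fluency", "minor")]
print(quality_estimate(errors, word_count=500))  # 7 penalty points over 500 words
```

Because different QA tools detect, categorize, and count errors differently, the same translation memory can receive different scores from each tool, which is exactly the harmonization problem the study reports.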


2017 ◽  
Vol 44 (4) ◽  
pp. 1212-1223 ◽  
Author(s):  
Michelle Passarge ◽  
Michael K. Fix ◽  
Peter Manser ◽  
Marco F. M. Stampanoni ◽  
Jeffrey V. Siebers

2010 ◽  
Vol 49 (8) ◽  
pp. 1615-1633 ◽  
Author(s):  
Imke Durre ◽  
Matthew J. Menne ◽  
Byron E. Gleason ◽  
Tamara G. Houston ◽  
Russell S. Vose

Abstract This paper describes a comprehensive set of fully automated quality assurance (QA) procedures for observations of daily surface temperature, precipitation, snowfall, and snow depth. The QA procedures are being applied operationally to the Global Historical Climatology Network (GHCN)-Daily dataset. Since these data are used for analyzing and monitoring variations in extremes, the QA system is designed to detect as many errors as possible while maintaining a low probability of falsely identifying true meteorological events as erroneous. The system consists of 19 carefully evaluated tests that detect duplicate data, climatological outliers, and various inconsistencies (internal, temporal, and spatial). Manual review of random samples of the values flagged as errors is used to set the threshold for each procedure such that its false-positive rate, or fraction of valid values identified as errors, is minimized. In addition, the tests are arranged in a deliberate sequence in which the performance of the later checks is enhanced by the error detection capabilities of the earlier tests. Based on an assessment of each individual check and a final evaluation for each element, the system identifies 3.6 million (0.24%) of the more than 1.5 billion maximum/minimum temperature, precipitation, snowfall, and snow depth values in GHCN-Daily as errors, has a false-positive rate of 1%–2%, and is effective at detecting both the grossest errors and more subtle inconsistencies among elements.
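One check of the kind described, a climatological-outlier test, might look like the following sketch. The z-score threshold and the use of a plain standard deviation are illustrative assumptions, not the operational GHCN-Daily parameters, which the paper tunes via manual review of flagged samples.

```python
import statistics

# Sketch of one QA check: flag observations that deviate strongly from a
# reference climatology for the same station and calendar day. The z-score
# threshold is an illustrative assumption; in GHCN-Daily each threshold is
# tuned so the false-positive rate stays low.

def flag_climatological_outliers(reference, observations, z_threshold=6.0):
    """Return indices of observations far outside the reference climatology."""
    mean = statistics.fmean(reference)
    sd = statistics.pstdev(reference)
    if sd == 0:
        return []
    return [i for i, value in enumerate(observations)
            if abs(value - mean) / sd > z_threshold]

# Daily maximum temperatures (degC) for one calendar day across years:
climatology = [14.0, 15.5, 13.2, 16.1, 14.8, 15.0, 13.9, 16.4, 14.2, 15.3]
print(flag_climatological_outliers(climatology, [15.0, 14.1, 61.0]))  # [2]
```

Sequencing matters in the way the abstract notes: running duplicate and gross-error checks first keeps obviously bad values from inflating the mean and standard deviation that subtler checks like this one depend on.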


1985 ◽  
Vol 24 (04) ◽  
pp. 192-196 ◽  
Author(s):  
S. M. Finkelstein ◽  
J. R. Budd ◽  
Lisa B. Ewing ◽  
L. Catherine ◽  
W. J. Warwick ◽  
...  

Abstract The objective of data quality assurance procedures in clinical studies is to reduce the number of data errors that appear in the data record to a level that is acceptable and compatible with the ultimate use of the recorded information. A semi-automatic procedure has been developed to detect and correct data entry errors in a study of the feasibility and efficacy of home health monitoring for patients with cystic fibrosis. Daily self-measurements are recorded in a diary, mailed to the study coordinating center weekly, and entered into the study's INSIGHT clinical database. A statistical error detection test has been combined with manual error correction to provide a satisfactory, reasonable-cost procedure for such a program. Approximately 76% of the errors from a test diary entry period were detected and corrected by this method. Those errors not detected were within an acceptable range so as not to impact the clinical decisions derived from these data. A completely manual method detected SS% of all errors, but the review and correction process was four times more costly, based on the time needed to conduct each procedure.
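A statistical test of the kind combined with manual correction above might be sketched as follows; the per-patient history window and the k-sigma threshold are illustrative assumptions, not the study's actual detection test.

```python
import statistics

def flag_for_review(history, new_value, k=3.0):
    """Flag a new self-measurement for manual review if it departs from the
    patient's recent history by more than k standard deviations.
    The threshold k and the use of recent history are illustrative assumptions.
    """
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)
    return sd > 0 and abs(new_value - mean) / sd > k

# A week of a patient's daily peak-flow readings (L/min), then a plausible
# data entry error (42 typed instead of 420) queued for manual correction:
history = [410, 425, 415, 430, 418, 422, 419]
print(flag_for_review(history, 42))   # flagged for review
print(flag_for_review(history, 421))  # within normal day-to-day variation
```

The division of labor matches the semi-automatic design: the statistical test does the cheap screening over every entry, and human effort is spent only on the small set of flagged values.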

