A Framework to Assess Data Quality for Reliability Variables

2008 ◽  
pp. 137-147 ◽  
Author(s):  
M. Hodkiewicz ◽  
P. Kelly ◽  
J. Sikorska ◽  
L. Gouws

2016 ◽
Vol 49 (10) ◽  
pp. 3969-3979 ◽  
Author(s):  
Christopher Hall ◽  
Andrea Hamilton

2008 ◽  
Vol 13 (5) ◽  
pp. 378-389 ◽  
Author(s):  
Xiaohua Douglas Zhang ◽  
Amy S. Espeseth ◽  
Eric N. Johnson ◽  
Jayne Chin ◽  
Adam Gates ◽  
...  

RNA interference (RNAi) not only plays an important role in drug discovery but can also be developed directly into drugs. RNAi high-throughput screening (HTS) biotechnology allows us to conduct genome-wide RNAi research. A central challenge in genome-wide RNAi research is to integrate both experimental and computational approaches to obtain high-quality RNAi HTS assays. Based on our daily practice in RNAi HTS experiments, we propose the implementation of 3 experimental and analytic processes to improve the quality of data from RNAi HTS biotechnology: (1) select effective biological controls; (2) adopt appropriate plate designs to display and/or adjust for systematic errors of measurement; and (3) use effective analytic metrics to assess data quality. Applications in 5 real RNAi HTS experiments demonstrate the effectiveness of integrating these processes to improve data quality. Given their effectiveness in improving data quality in RNAi HTS experiments, the methods and guidelines contained in the 3 experimental and analytic processes are likely to have broad utility in genome-wide RNAi research. (Journal of Biomolecular Screening 2008:378-389)
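The third process, effective analytic metrics computed from the biological controls, can be illustrated with two widely used plate-level quality measures, the Z'-factor and the strictly standardized mean difference (SSMD). The sketch below is illustrative only, not taken from the paper; the control readouts are simulated, and the thresholds mentioned in the comments are conventional rules of thumb.

```python
import numpy as np

def z_prime_factor(pos, neg):
    """Z'-factor from positive/negative control wells; values above ~0.5
    are commonly read as good separation between the controls."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    return 1.0 - 3.0 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())

def ssmd(pos, neg):
    """Strictly standardized mean difference (SSMD) between two control groups."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    return (pos.mean() - neg.mean()) / np.sqrt(pos.var(ddof=1) + neg.var(ddof=1))

# Per-plate QC on hypothetical control-well readouts (simulated numbers).
rng = np.random.default_rng(0)
positive_controls = rng.normal(100.0, 8.0, size=16)   # e.g. strong-effect siRNA wells
negative_controls = rng.normal(20.0, 6.0, size=16)    # e.g. non-targeting siRNA wells
print(f"Z' = {z_prime_factor(positive_controls, negative_controls):.2f}, "
      f"SSMD = {ssmd(positive_controls, negative_controls):.2f}")
```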


Author(s):  
G. Shankaranarayanan ◽  
Adir Even

Maintaining data at a high quality is critical to organizational success. Firms, aware of the consequences of poor data quality, have adopted methodologies and policies for measuring, monitoring, and improving it (Redman, 1996; Eckerson, 2002). Today's quality measurements are typically driven by physical characteristics of the data (e.g., item counts, time tags, or failure rates) and assume an objective quality standard, disregarding the context in which the data is used. The alternative is to derive quality metrics from data content and evaluate them within specific usage contexts. The former approach is termed structure-based (or structural), and the latter content-based (Ballou and Pazer, 2003). In this chapter we propose a novel framework to assess data quality within specific usage contexts and link it to data utility, a measure of the value contribution associated with data within specific usage contexts. Our utility-driven framework addresses the limitations of structural measurements and offers alternative measurements for evaluating completeness, validity, accuracy, and currency, as well as a single measure that aggregates these data quality dimensions.
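As a rough illustration of content-based measurement, the sketch below scores a hypothetical record on completeness, validity, and currency and combines the scores into a single utility-weighted aggregate. The record type, reference data, decay rule, and weights are all assumptions for illustration, not the chapter's actual metrics; accuracy is omitted because it requires a ground-truth reference.

```python
from __future__ import annotations
from dataclasses import dataclass
from datetime import date

@dataclass
class CustomerRecord:                      # hypothetical record used for illustration
    email: str | None
    country: str | None
    last_updated: date | None

VALID_COUNTRIES = {"US", "DE", "AU"}       # illustrative reference set
TODAY = date(2024, 1, 1)

def completeness(r: CustomerRecord) -> float:
    fields = [r.email, r.country, r.last_updated]
    return sum(f is not None for f in fields) / len(fields)

def validity(r: CustomerRecord) -> float:
    return 1.0 if r.country in VALID_COUNTRIES else 0.0

def currency(r: CustomerRecord) -> float:
    if r.last_updated is None:
        return 0.0
    age_days = (TODAY - r.last_updated).days
    return max(0.0, 1.0 - age_days / 365.0)   # assumed linear decay over one year

def aggregate_quality(r: CustomerRecord, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted aggregate of the dimension scores; the weights stand in for
    the usage-context utility the chapter derives."""
    scores = (completeness(r), validity(r), currency(r))
    return sum(w * s for w, s in zip(weights, scores))

record = CustomerRecord(email="a@b.com", country="US", last_updated=date(2023, 10, 1))
print(round(aggregate_quality(record), 2))
```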


Author(s):  
F. Albrecht ◽  
T. Blaschke ◽  
S. Lang ◽  
H. M. Abdulmutalib ◽  
G. Szabó ◽  
...  

The availability and accessibility of remote sensing (RS) data, cloud processing platforms, and the information products and services built on them have increased the size and diversity of the RS user community. This development also generates a need for validation approaches to assess data quality. Validation approaches employ quality criteria in their assessment. Data quality (DQ) dimensions as the basis for quality criteria have been investigated in depth in both the database area and the remote sensing domain. Several standards exist within the RS domain, but a general classification, established for databases, has been adapted only recently. To make research opportunities easier to identify, a better understanding is required of how quality criteria are employed across the RS lifecycle. This research therefore investigates how quality criteria support the decisions that guide the RS lifecycle and how they relate to the measured DQ dimensions. An overview of the relevant standards in the RS domain follows, matched to the RS lifecycle. Finally, the research needs are identified that would enable a complete understanding of the interrelationships between the RS lifecycle, the data sources, and the DQ dimensions, an understanding that would be very valuable for designing validation approaches in RS.
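One way to make the lifecycle/criteria relationship concrete is to record, per lifecycle stage, which decisions the quality criteria support and which DQ dimensions feed them. The stage names, decisions, and dimension assignments below are purely hypothetical placeholders, not the mapping derived in the paper.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class LifecycleStage:
    name: str
    decisions: list[str]            # decisions the quality criteria support
    dq_dimensions: list[str]        # measured DQ dimensions informing those decisions

# Hypothetical, simplified RS lifecycle; all entries are illustrative only.
RS_LIFECYCLE = [
    LifecycleStage("acquisition", ["select sensor/scene"], ["positional accuracy", "completeness"]),
    LifecycleStage("processing", ["choose correction method"], ["radiometric accuracy", "consistency"]),
    LifecycleStage("product generation", ["accept or reject product"], ["thematic accuracy", "currency"]),
    LifecycleStage("use", ["judge fitness for use"], ["completeness", "currency"]),
]

for stage in RS_LIFECYCLE:
    print(f"{stage.name}: {', '.join(stage.dq_dimensions)}")
```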


2013 ◽  
Vol 69 (7) ◽  
pp. 1215-1222 ◽  
Author(s):  
K. Diederichs ◽  
P. A. Karplus

In macromolecular X-ray crystallography, typical data sets have substantial multiplicity. This can be used to calculate the consistency of repeated measurements and thereby assess data quality. Recently, the properties of a correlation coefficient, CC1/2, which can be used for this purpose were characterized, and it was shown that CC1/2 has superior properties compared with 'merging' R values. A derived quantity, CC*, links data and model quality. Using experimental data sets, the behaviour of CC1/2 and the more conventional indicators were compared in two situations of practical importance: merging data sets from different crystals and selectively rejecting weak observations or (merged) unique reflections from a data set. In these situations, controlled 'paired-refinement' tests show that even though discarding the weaker data leads to improvements in the merging R values, the refined models based on these data are of lower quality. These results show the folly of such data-filtering practices aimed at improving the merging R values. Interestingly, in all of these tests CC1/2 is the one data-quality indicator whose behaviour accurately reflects which of the alternative data-handling strategies results in the best-quality refined model. Its properties in the presence of systematic error are documented and discussed.
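A minimal sketch of how CC1/2 is typically computed from multiply-measured reflections: randomly split each unique reflection's observations into two half data sets, average each half, and correlate the half-set means; CC* is then estimated as sqrt(2*CC1/2 / (1 + CC1/2)). The toy intensities below are simulated and the implementation is deliberately simplified relative to production data-processing software.

```python
import numpy as np

def cc_half(measurements, rng=None):
    """CC1/2 from a list of 1-D arrays, one array per unique reflection,
    each holding that reflection's repeated intensity measurements."""
    rng = np.random.default_rng(rng)
    half1, half2 = [], []
    for obs in measurements:
        obs = rng.permutation(np.asarray(obs, float))
        if len(obs) < 2:
            continue                       # need at least two observations to split
        mid = len(obs) // 2
        half1.append(obs[:mid].mean())
        half2.append(obs[mid:].mean())
    return np.corrcoef(half1, half2)[0, 1]

def cc_star(cc12):
    """CC* = sqrt(2*CC1/2 / (1 + CC1/2)), an estimate of how well the merged
    data correlate with the (unmeasurable) true signal."""
    return np.sqrt(2.0 * cc12 / (1.0 + cc12))

# Toy example: 200 unique reflections, each measured 4 times with noise.
rng = np.random.default_rng(1)
true_I = rng.gamma(2.0, 50.0, size=200)
data = [I + rng.normal(0.0, 25.0, size=4) for I in true_I]
cc12 = cc_half(data, rng=1)
print(f"CC1/2 = {cc12:.3f}, CC* = {cc_star(cc12):.3f}")
```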


2010 ◽  
pp. 777-792
Author(s):  
Angélica Caro ◽  
Coral Calero ◽  
Mario Piattini

Web portals are Internet-based applications that provide a large amount of data. The data consumers who use the data provided by these applications need to assess its quality. Given the relevance of data quality (DQ) on the Web, together with the fact that DQ needs to be assessed within the context in which data are generated, data quality models specific to this context are necessary. In this chapter, we introduce a model for data quality in Web portals (PDQM). PDQM has been built upon three key aspects: (1) a set of Web data quality attributes identified in the literature in this area, (2) the data quality expectations of data consumers on the Internet, and (3) the functionalities that a Web portal may offer its users.
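A toy sketch of how PDQM's three inputs might be combined: a set of Web DQ attributes, consumer-expectation weights, and an attribute-to-functionality applicability map together yield, per portal functionality, a ranked list of relevant attributes. All attribute names, weights, and mappings below are invented for illustration and are not the model's actual content.

```python
# Hypothetical illustration of combining PDQM's three building blocks.
DQ_ATTRIBUTES = ["accuracy", "timeliness", "completeness", "accessibility"]
PORTAL_FUNCTIONALITIES = ["search", "personalization", "collaboration"]

# Consumer-expectation weights per attribute (0-1), assumed values.
EXPECTATION = {"accuracy": 0.9, "timeliness": 0.7, "completeness": 0.8, "accessibility": 0.6}

# Whether an attribute applies to a functionality (assumed here, not from the literature review).
APPLIES = {
    ("accuracy", "search"): True, ("timeliness", "search"): True,
    ("completeness", "search"): True, ("accessibility", "search"): True,
    ("accuracy", "personalization"): True, ("timeliness", "personalization"): False,
    ("completeness", "personalization"): True, ("accessibility", "personalization"): True,
    ("accuracy", "collaboration"): False, ("timeliness", "collaboration"): True,
    ("completeness", "collaboration"): False, ("accessibility", "collaboration"): True,
}

for func in PORTAL_FUNCTIONALITIES:
    relevant = [(a, EXPECTATION[a]) for a in DQ_ATTRIBUTES if APPLIES[(a, func)]]
    print(func, "->", sorted(relevant, key=lambda x: -x[1]))
```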


Author(s):  
Mª Ángeles Moraga ◽  
Angélica Caro

Web portals are emerging Internet-based applications that enable access to different sources (providers). Through portals, organizations develop their businesses in an increasingly competitive environment. A decisive factor for this competitiveness, and for earning users' loyalty, is portal quality. In addition, we live in an information society, and the ability to rapidly define and assess the data quality of Web portals for decision making provides a potential strategic advantage. With this in mind, our work focuses on the quality of Web portals. In this article we present a part of it: a portal quality model and the first phases in the development of a data quality model for Web portals.


2019 ◽  
Vol 31 (7) ◽  
pp. 1-7 ◽  
Author(s):  
Andreas Perren ◽  
Bernard Cerutti ◽  
Mark Kaufmann ◽  
Hans Ulrich Rothen ◽  
...  

Abstract
Background: There is no gold standard to assess data quality in large medical registries. Data auditing may be impeded by data protection regulations.
Objective: To explore the applicability and usefulness of funnel plots as a novel tool for data quality control in critical care registries.
Method: The Swiss ICU-Registry from all 77 certified adult Swiss ICUs (2014 and 2015) was subjected to quality assessment (completeness/accuracy). For the analysis of accuracy, a list of logical rules and cross-checks was developed. The type and number of errors (true coding errors or implausible data) were calculated for each ICU, along with noticeable error rates (>mean + 3 SD in the variable's summary measure, or outside the 99.8% control limits of the respective funnel plot).
Results: We investigated 164 415 patient records with 31 items each (37 items for trauma diagnoses). Data completeness was excellent; trauma was the only incomplete item, in 1495 of 9871 records (0.1%, 0.0%–0.6% [median, IQR]). In 15 572 patient records (9.5%), we found 3121 coding errors and 31 265 implausible situations, the latter primarily due to non-specific information on patients' provenance/diagnosis or supposed incoherence between diagnosis and treatments. Overall, the error rate was 7.6% (5.9%–11%; median, IQR).
Conclusions: The Swiss ICU-Registry is almost complete, and its data quality appears adequate. We propose funnel plots as a suitable, easy-to-implement instrument to assist in the quality assurance of such a registry. Based on our analysis, specific feedback to ICUs with special-cause variation is possible and may prompt those ICUs to improve the quality of their data.
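A sketch of the kind of funnel plot proposed here: each ICU's error rate is plotted against its number of records, with approximate 95% and 99.8% binomial control limits drawn around the pooled rate, so points above the outer limit flag special-cause variation. The per-ICU data are simulated, and the normal-approximation limits are one common choice, not necessarily the exact method used for the registry.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Simulated example: per-ICU record counts and error counts (not registry data).
rng = np.random.default_rng(42)
n_records = rng.integers(300, 6000, size=77)
overall_rate = 0.076                               # pooled error rate (illustrative)
errors = rng.binomial(n_records, overall_rate * rng.normal(1.0, 0.15, size=77).clip(0.5, 1.8))
rates = errors / n_records

# Funnel (control) limits: normal approximation to the binomial around the pooled rate.
n_grid = np.linspace(n_records.min(), n_records.max(), 200)
for ci, style in [(0.95, "k--"), (0.998, "k-")]:
    z = norm.ppf(0.5 + ci / 2.0)
    se = np.sqrt(overall_rate * (1.0 - overall_rate) / n_grid)
    plt.plot(n_grid, overall_rate + z * se, style, linewidth=0.8)
    plt.plot(n_grid, np.clip(overall_rate - z * se, 0.0, None), style, linewidth=0.8)

plt.scatter(n_records, rates, s=12)
plt.axhline(overall_rate, color="k", linewidth=0.8)
plt.xlabel("records per ICU")
plt.ylabel("error rate")
plt.title("Funnel plot of per-ICU error rates (simulated data)")
plt.show()
```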


1988 ◽  
Vol 22 (2) ◽  
pp. 157-161 ◽  
Author(s):  
Andy S. Stergachis

Large automated databases are the source of information for many record linkage studies, including postmarketing drug surveillance. Despite this reliance on prerecorded data, there have been few attempts to assess data quality and validity. This article presents some of the basic data quality and validity issues in applying record linkage methods to postmarketing surveillance. Studies based on prerecorded data, as in most record linkage studies, have all the inherent problems of the data from which they are derived. Sources of threats to the validity of record linkage studies include the completeness of data, the ability to accurately identify and follow the records of individuals through time and place, and the validity of data. This article also describes techniques for evaluating data quality and validity. Postmarketing surveillance could benefit from more attention to identifying and solving the problems associated with record linkage studies.
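The three threat categories discussed here (completeness of data, the ability to identify and follow individuals across records, and validity) translate naturally into simple automated checks. The sketch below runs such checks on two tiny, made-up extracts; all field names and rules are hypothetical and stand in for the checks a real surveillance study would define.

```python
import pandas as pd

# Hypothetical dispensing and diagnosis extracts; field names are illustrative only.
dispensings = pd.DataFrame({
    "patient_id": ["p1", "p2", "p2", None, "p4"],
    "drug_code":  ["A01", "B02", "B02", "C03", None],
    "dispense_date": pd.to_datetime(
        ["2020-01-05", "2020-02-01", "2020-02-20", "2020-03-01", "2020-03-10"]),
})
diagnoses = pd.DataFrame({
    "patient_id": ["p1", "p2", "p5"],
    "dx_code": ["K21", "E11", "I10"],
})

# 1. Completeness: share of non-missing values per field.
print(dispensings.notna().mean())

# 2. Follow-up: can dispensing records be linked to a patient in the diagnosis file?
linked = dispensings["patient_id"].isin(diagnoses["patient_id"])
print(f"linkable dispensings: {linked.mean():.0%}")

# 3. Validity: flag records outside the surveillance window (an assumed plausibility rule).
window = (dispensings["dispense_date"] >= "2020-01-01") & (dispensings["dispense_date"] <= "2020-12-31")
print(f"records within study window: {window.mean():.0%}")
```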

