Data Quality (Poor Quality Data: The Fly in the Data Analytics Ointment)

Data quality (DQ) is the degree to which a given dataset meets a user’s requirements. In the primary healthcare setting, poor quality data can lead to poor patient care, negatively affect the validity and reproducibility of research results and limit the value that such data may have for public health surveillance. To extract reliable and useful information from a large quantity of data and to make more effective and informed decisions, data should be as clean and free of errors as possible. Moreover, because DQ is defined within the context of different user requirements that often change, DQ should be considered to be an emergent construct. As such, we cannot expect that a sufficient level of DQ will last forever. Therefore, the quality of clinical data should be constantly assessed and reassessed in an iterative fashion to ensure that appropriate levels of quality are sustained in an acceptable and transparent manner. This document is based on our hands-on experiences dealing with DQ improvement for the Canadian Primary Care Sentinel Surveillance Network database. The DQ dimensions that are discussed here are accuracy and precision, completeness and comprehensiveness, consistency, timeliness, uniqueness, data cleaning and coherence.
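
Two of the dimensions named above, completeness and uniqueness, can be scored with very little code. The following is a minimal sketch, not the method used for the Canadian Primary Care Sentinel Surveillance Network database: the record layout, the field names (`patient_id`, `birth_year`, `systolic_bp`) and both scoring functions are assumptions made for illustration.

```python
from typing import Any, Mapping, Sequence


def completeness(records: Sequence[Mapping[str, Any]], field: str) -> float:
    """Fraction of records whose `field` is present and neither None nor empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records: Sequence[Mapping[str, Any]], key_fields: Sequence[str]) -> float:
    """Fraction of records that are distinct with respect to `key_fields`."""
    if not records:
        return 0.0
    keys = {tuple(r.get(f) for f in key_fields) for r in records}
    return len(keys) / len(records)


# Hypothetical records; the second and third rows are duplicates.
records = [
    {"patient_id": 1, "birth_year": 1950, "systolic_bp": 128},
    {"patient_id": 2, "birth_year": None, "systolic_bp": 141},
    {"patient_id": 2, "birth_year": None, "systolic_bp": 141},
    {"patient_id": 3, "birth_year": 1987, "systolic_bp": None},
]

print(completeness(records, "birth_year"))   # 0.5
print(uniqueness(records, ["patient_id"]))   # 0.75
```

Because user requirements change, scores like these would need to be recomputed on a schedule rather than once, in line with the iterative reassessment the abstract calls for.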

Multi-Party Privacy-Preserving Logistic Regression with Poor Quality Data Filtering for IoT Contributors

Electronics ◽

10.3390/electronics10172049 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2049

Author(s):

Kennedy Edemacu ◽

Jong Wook Kim

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Data Quality ◽

Logistic Regression Model ◽

Homomorphic Encryption ◽

Poor Quality ◽

Privacy Preserving ◽

Quality Data ◽

Data Filtering ◽

Poor Quality Data

The internet of things (IoT) now generates data in many application domains. Logistic regression, a standard machine learning algorithm with a wide range of applications, is often built on such data. Building a powerful and effective logistic regression model, however, requires large amounts of data, so collaboration between multiple IoT participants has often been the go-to approach. Privacy concerns and poor data quality are two challenges that threaten the success of such a setting. Several studies have proposed methods to address the privacy concern but, to the best of our knowledge, little attention has been paid to poor data quality in multi-party logistic regression. In this study, we therefore propose a multi-party privacy-preserving logistic regression framework with poor quality data filtering for IoT data contributors to address both problems. Specifically, we propose a new metric, gradient similarity, for the distributed setting, which we use to filter out parameters from data contributors with poor quality data. To solve the privacy challenge, we employ homomorphic encryption. Theoretical analysis and experimental evaluations using real-world datasets demonstrate that our proposed framework is privacy-preserving and robust against poor quality data.
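
The abstract does not define the gradient similarity metric, so the following is only a rough sketch of the general idea of filtering contributors by how well their gradients agree. It assumes cosine similarity against a coordinate-wise median gradient and works on plaintext values; the function names, the threshold and the example gradients are hypothetical, and the homomorphic encryption layer is omitted entirely.

```python
import math
import statistics
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_gradients(gradients: Dict[str, List[float]],
                     threshold: float = 0.0) -> Dict[str, List[float]]:
    """Keep contributors whose gradient points roughly the same way as the
    coordinate-wise median gradient (an assumed reference, not the paper's)."""
    dim = len(next(iter(gradients.values())))
    reference = [statistics.median(g[i] for g in gradients.values())
                 for i in range(dim)]
    return {name: g for name, g in gradients.items()
            if cosine_similarity(g, reference) > threshold}


# Hypothetical gradients: contributor "c" disagrees with the other two.
gradients = {"a": [1.0, 2.0], "b": [0.9, 2.1], "c": [-1.0, -2.0]}
kept = filter_gradients(gradients)
print(sorted(kept))  # ['a', 'b']
```

A median reference is used here because a single bad contributor can pull a mean reference toward itself; other reference choices are equally plausible.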

R factors in Rietveld analysis: How good is good enough?

Powder Diffraction ◽

10.1154/1.2179804 ◽

2006 ◽

Vol 21 (1) ◽

pp. 67-70 ◽

Cited By ~ 508

Author(s):

Brian H. Toby

Keyword(s):

Rietveld Analysis ◽

Poor Quality ◽

Quality Data ◽

High Quality ◽

High Quality Data ◽

Poor Quality Data ◽

Error Index ◽

R Factors ◽

Very High

Important Rietveld error indices are defined and discussed. It is shown that while smaller error index values indicate a better fit of a model to the data, wrong models with poor quality data may exhibit smaller error index values than some superb models with very high quality data.
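
To make the effect concrete, here is a small sketch using the standard definitions of the weighted profile R factor, the expected R factor and the reduced chi-squared, with weights taken as 1/y_obs on the assumption of pure counting statistics. The five-point profile, the fixed 2% misfit and the single refined parameter are invented for illustration and are not taken from the article.

```python
import math
from typing import List, Tuple


def rietveld_indices(y_obs: List[float], y_calc: List[float],
                     n_params: int) -> Tuple[float, float, float]:
    """Return (R_wp, R_exp, chi^2) assuming sigma_i^2 = y_obs_i (all y_obs > 0)."""
    weights = [1.0 / y for y in y_obs]
    misfit = sum(w * (yo - yc) ** 2 for w, yo, yc in zip(weights, y_obs, y_calc))
    total = sum(w * yo ** 2 for w, yo in zip(weights, y_obs))
    r_wp = math.sqrt(misfit / total)
    r_exp = math.sqrt((len(y_obs) - n_params) / total)
    chi2 = (r_wp / r_exp) ** 2
    return r_wp, r_exp, chi2


# Hypothetical peak shape, scaled to mimic short and long counting times.
profile = [1.0, 2.0, 5.0, 2.0, 1.0]


def simulate(scale: float) -> Tuple[float, float, float]:
    """Same 2% model misfit at every point, different total counts."""
    y_obs = [scale * p for p in profile]
    y_calc = [0.98 * y for y in y_obs]
    return rietveld_indices(y_obs, y_calc, n_params=1)


low_counts = simulate(100.0)
high_counts = simulate(10000.0)
print("low counts :", low_counts)
print("high counts:", high_counts)
```

In this toy case the relative misfit, and therefore R_wp, is the same for both datasets, yet chi-squared is a hundred times larger for the high-count data because R_exp shrinks as counting statistics improve. An identical model thus looks worse by chi-squared on the better dataset, which is one reason an error index cannot be judged without reference to data quality.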

TRAC's Report Claiming “Surprising Judge-to-Judge Variation” Fails to Compare Similar Cases, Relies on Poor Quality Data, Uses an Unreliable Method of Identifying Case Type, Uses Incorrect Methods of Reporting Sentence Length, and Contains Numerous Errors

Federal Sentencing Reporter ◽

10.1525/fsr.2012.25.1.20 ◽

2012 ◽

Vol 25 (1) ◽

pp. 20-30

Keyword(s):

Poor Quality ◽

Sentence Length ◽

Quality Data ◽

Poor Quality Data ◽

Case Type ◽

Unreliable Method

The creation and use of big administrative data

Data in Society ◽

10.1332/policypress/9781447348214.003.0003 ◽

2019 ◽

pp. 23-34

Author(s):

Harvey Goldstein ◽

Ruth Gilbert

Keyword(s):

Administrative Data ◽

Poor Quality ◽

Quality Data ◽

Data Repositories ◽

Public And Private ◽

The Public ◽

Public Benefit ◽

Public Benefits ◽

Poor Quality Data ◽

Few Data

This chapter addresses data linkage, which is key to using big administrative datasets to improve efficient and equitable services and policies. These benefits need to be weighed against potential harms, which have mainly focussed on privacy. In this chapter we argue for the public and researchers to be alert also to other kinds of harms. These include misuses of big administrative data through poor quality data, misleading analyses, misinterpretation or misuse of findings, and restrictions limiting what questions can be asked and by whom, resulting in research not achieved and advances not made for the public benefit. Ensuring that big administrative data are validly used for public benefit requires increased transparency about who has access and whose access is denied, how data are processed, linked and analysed, and how analyses or algorithms are used in public and private services. Public benefits and especially trust require replicable analyses by many researchers, not just a few data controllers. Wider use of big data will be helped by establishing a number of safe data repositories, fully accessible to researchers and their tools, and independent of the current monopolies on data processing, linkage, enhancement and uses of data.

Poor quality data are major obstacle to improving road safety, says World Bank

BMJ ◽

10.1136/bmj.324.7346.1116/a ◽

2002 ◽

Vol 324 (7346) ◽

pp. 1116a-1116 ◽

Cited By ~ 3

Keyword(s):

World Bank ◽

Road Safety ◽

Poor Quality ◽

Major Obstacle ◽

Quality Data ◽

Poor Quality Data

Lichen elements as pollution indicators: evaluation of methods for large monitoring programmes

The Lichenologist ◽

10.1017/s0024282917000299 ◽

2017 ◽

Vol 49 (4) ◽

pp. 415-424 ◽

Cited By ~ 5

Author(s):

Susan WILL-WOLF ◽

Sarah JOVAN ◽

Michael C. AMACHER

Keyword(s):

Poor Quality ◽

Community Context ◽

Quality Data ◽

Permanent Plots ◽

Target Species ◽

Element Analysis ◽

Lichen Community ◽

Poor Quality Data ◽

Specialist Field ◽

Flavoparmelia Caperata

Lichen element content is a reliable indicator for relative air pollution load in research and monitoring programmes requiring both efficiency and representation of many sites. We tested the value of costly rigorous field and handling protocols for sample element analysis using five lichen species. No relaxation of rigour was supported; four relaxed protocols generated data significantly different from rigorous protocols for many of the 20 validated elements. Minimally restrictive site selection criteria gave quality data from 86% of 81 permanent plots in northern Midwest USA; more restrictive criteria would likely reduce indicator reliability. Use of trained non-specialist field collectors was supported when target species choice considers the lichen community context. Evernia mesomorpha, Flavoparmelia caperata and Physcia aipolia/stellaris were successful target species. Non-specialists were less successful at distinguishing Parmelia sulcata and Punctelia rudecta from lookalikes, leading to few samples and some poor quality data.

Can gas sand have a large Poisson’s ratio?

Geophysics ◽

10.1190/1.2821820 ◽

2008 ◽

Vol 73 (2) ◽

pp. E51-E57 ◽

Cited By ~ 3

Author(s):

Jack P. Dvorkin

Keyword(s):

Elastic Anisotropy ◽

Laboratory Data ◽

Poor Quality ◽

Poisson's Ratio ◽

Quality Data ◽

Poor Quality Data ◽

Well Data ◽

Seismic Scale

Laboratory data supported by granular-medium and inclusion theories indicate that Poisson’s ratio in gas-saturated sand lies within a range of 0–0.25, with typical values of approximately 0.15. However, some well log measurements, especially in slow gas formations, persistently produce a Poisson’s ratio as large as 0.3. If this measurement is not caused by poor-quality data, three in situ situations — patchy saturation, subresolution thin layering, and elastic anisotropy — provide a plausible explanation. In the patchy saturation situation, the well data must be corrected to produce realistic synthetic seismic traces. In the second and third cases, the effect observed in a well is likely to persist at the seismic scale.
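
For orientation, Poisson's ratio is commonly obtained from well logs through the standard isotropic-elasticity relation between the compressional and shear velocities. The sketch below applies that relation; the velocity values are hypothetical and the function name is ours, and the article's corrections for patchy saturation, thin layering or anisotropy are not modelled.

```python
import math


def poissons_ratio(vp: float, vs: float) -> float:
    """Poisson's ratio of an isotropic elastic medium from P- and S-wave velocity.

    nu = (Vp^2 - 2 Vs^2) / (2 (Vp^2 - Vs^2)); requires vp > vs > 0.
    """
    if not (vp > vs > 0.0):
        raise ValueError("expected vp > vs > 0")
    vp2, vs2 = vp * vp, vs * vs
    return (vp2 - 2.0 * vs2) / (2.0 * (vp2 - vs2))


# Hypothetical velocities in km/s, chosen only to span the range discussed.
print(poissons_ratio(1.5, 1.0))             # Vp/Vs = 1.5
print(poissons_ratio(math.sqrt(3.0), 1.0))  # Vp/Vs = sqrt(3)
print(poissons_ratio(math.sqrt(3.5), 1.0))  # Vp/Vs = sqrt(3.5)
```

Under this relation a Vp/Vs ratio of 1.5 corresponds to a Poisson's ratio of 0.1, a ratio of √3 to the upper laboratory bound of 0.25, and a ratio of about 1.87 (√3.5) to the anomalous log value of 0.3.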

Comparative refinement of correct and incorrect structural models of tetrabutylammonium tetrabutylborate – pitfalls arising from poor-quality data

Acta Crystallographica Section A Foundations of Crystallography ◽

10.1107/s0108767310013814 ◽

2010 ◽

Vol 66 (4) ◽

pp. 441-445 ◽

Cited By ~ 2

Author(s):

Vladimir Stilinović ◽

Branko Kaitner

Keyword(s):

Structural Models ◽

Poor Quality ◽

Quality Data ◽

Poor Quality Data
