The creation and use of big administrative data

Data in Society ◽

10.1332/policypress/9781447348214.003.0003 ◽

2019 ◽

pp. 23-34

Author(s):

Harvey Goldstein ◽

Ruth Gilbert

Keyword(s):

Administrative Data ◽

Poor Quality ◽

Quality Data ◽

Data Repositories ◽

Public And Private ◽

The Public ◽

Public Benefit ◽

Public Benefits ◽

Poor Quality Data ◽

Few Data

his chapter addresses data linkage which is key to using big administrative datasets to improve efficient and equitable services and policies. These benefits need to weigh against potential harms, which have mainly focussed on privacy. In this chapter we argue for the public and researchers to be alert also to other kinds of harms. These include misuses of big administrative data through poor quality data, misleading analyses, misinterpretation or misuse of findings, and restrictions limiting what questions can be asked and by whom, resulting in research not achieved and advances not made for the public benefit. Ensuring that big administrative data are validly used for public benefit requires increased transparency about who has access and whose access is denied, how data are processed, linked and analysed, and how analyses or algorithms are used in public and private services. Public benefits and especially trust require replicable analyses by many researchers not just a few data controllers. Wider use of big data will be helped by establishing a number of safe data repositories, fully accessible to researchers and their tools, and independent of the current monopolies on data processing, linkage, enhancement and uses of data.

Download Full-text

R factors in Rietveld analysis: How good is good enough?

Powder Diffraction ◽

10.1154/1.2179804 ◽

2006 ◽

Vol 21 (1) ◽

pp. 67-70 ◽

Cited By ~ 508

Author(s):

Brian H. Toby

Keyword(s):

Rietveld Analysis ◽

Poor Quality ◽

Quality Data ◽

High Quality ◽

High Quality Data ◽

Poor Quality Data ◽

Error Index ◽

R Factors ◽

Very High

The definitions for important Rietveld error indices are defined and discussed. It is shown that while smaller error index values indicate a better fit of a model to the data, wrong models with poor quality data may exhibit smaller values error index values than some superb models with very high quality data.

Download Full-text

TRAC's Report Claiming “Surprising Judge-to-Judge Variation” Fails to Compare Similar Cases, Relies on Poor Quality Data, Uses an Unreliable Method of Identifying Case Type, Uses Incorrect Methods of Reporting Sentence Length, and Contains Numerous Errors

Federal Sentencing Reporter ◽

10.1525/fsr.2012.25.1.20 ◽

2012 ◽

Vol 25 (1) ◽

pp. 20-30

Keyword(s):

Poor Quality ◽

Sentence Length ◽

Quality Data ◽

Poor Quality Data ◽

Case Type ◽

Unreliable Method

Download Full-text

Poor quality data are major obstacle to improving road safety, says World Bank

BMJ ◽

10.1136/bmj.324.7346.1116/a ◽

2002 ◽

Vol 324 (7346) ◽

pp. 1116a-1116 ◽

Cited By ~ 3

Keyword(s):

World Bank ◽

Road Safety ◽

Poor Quality ◽

Major Obstacle ◽

Quality Data ◽

Poor Quality Data

Download Full-text

Lichen elements as pollution indicators: evaluation of methods for large monitoring programmes

The Lichenologist ◽

10.1017/s0024282917000299 ◽

2017 ◽

Vol 49 (4) ◽

pp. 415-424 ◽

Cited By ~ 5

Author(s):

Susan WILL-WOLF ◽

Sarah JOVAN ◽

Michael C. AMACHER

Keyword(s):

Poor Quality ◽

Community Context ◽

Quality Data ◽

Permanent Plots ◽

Target Species ◽

Element Analysis ◽

Lichen Community ◽

Poor Quality Data ◽

Specialist Field ◽

Flavoparmelia Caperata

AbstractLichen element content is a reliable indicator for relative air pollution load in research and monitoring programmes requiring both efficiency and representation of many sites. We tested the value of costly rigorous field and handling protocols for sample element analysis using five lichen species. No relaxation of rigour was supported; four relaxed protocols generated data significantly different from rigorous protocols for many of the 20 validated elements. Minimally restrictive site selection criteria gave quality data from 86% of 81 permanent plots in northern Midwest USA; more restrictive criteria would likely reduce indicator reliability. Use of trained non-specialist field collectors was supported when target species choice considers the lichen community context. Evernia mesomorpha, Flavoparmelia caperata and Physcia aipolia/stellaris were successful target species. Non-specialists were less successful at distinguishing Parmelia sulcata and Punctelia rudecta from lookalikes, leading to few samples and some poor quality data.

Download Full-text

Can gas sand have a large Poisson’s ratio?

Geophysics ◽

10.1190/1.2821820 ◽

2008 ◽

Vol 73 (2) ◽

pp. E51-E57 ◽

Cited By ~ 3

Author(s):

Jack P. Dvorkin

Keyword(s):

Poisson’S Ratio ◽

Elastic Anisotropy ◽

Laboratory Data ◽

Poor Quality ◽

Poisson's Ratio ◽

Quality Data ◽

Poor Quality Data ◽

Well Data ◽

Seismic Scale

Laboratory data supported by granular-medium and inclusion theories indicate that Poisson’s ratio in gas-saturated sand lies within a range of 0–0.25, with typical values of approximately 0.15. However, some well log measurements, especially in slow gas formations, persistently produce a Poisson’s ratio as large as 0.3. If this measurement is not caused by poor-quality data, three in situ situations — patchy saturation, subresolution thin layering, and elastic anisotropy — provide a plausible explanation. In the patchy saturation situation, the well data must be corrected to produce realistic synthetic seismic traces. In the second and third cases, the effect observed in a well is likely to persist at the seismic scale.

Download Full-text

Comparative refinement of correct and incorrect structural models of tetrabutylammonium tetrabutylborate – pitfalls arising from poor-quality data

Acta Crystallographica Section A Foundations of Crystallography ◽

10.1107/s0108767310013814 ◽

2010 ◽

Vol 66 (4) ◽

pp. 441-445 ◽

Cited By ~ 2

Author(s):

Vladimir Stilinović ◽

Branko Kaitner

Keyword(s):

Structural Models ◽

Poor Quality ◽

Quality Data ◽

Poor Quality Data

Download Full-text

Creating Informative Data Warehouses: Exploring Data and Information Quality through Data Mining

10.28945/2584 ◽

2002 ◽

Author(s):

Herna L. Viktor ◽

Wayne Motha

Keyword(s):

Data Mining ◽

Information Quality ◽

Data Warehousing ◽

Poor Quality ◽

Quality Data ◽

Data Warehouses ◽

Quality Of Data ◽

Poor Quality Data ◽

Daunting Task

Increasingly, large organizations are engaging in data warehousing projects in order to achieve a competitive advantage through the exploration of the information as contained therein. It is therefore paramount to ensure that the data warehouse includes high quality data. However, practitioners agree that the improvement of the quality of data in an organization is a daunting task. This is especially evident in data warehousing projects, which are often initiated “after the fact”. The slightest suspicion of poor quality data often hinders managers from reaching decisions, when they waste hours in discussions to determine what portion of the data should be trusted. Augmenting data warehousing with data mining methods offers a mechanism to explore these vast repositories, enabling decision makers to assess the quality of their data and to unlock a wealth of new knowledge. These methods can be effectively used with inconsistent, noisy and incomplete data that are commonplace in data warehouses.

Download Full-text

A mobile crowd sensing framework for suspect investigation: An objectivity analysis and de-identification approach

Computer Science and Information Systems ◽

10.2298/csis190427039e ◽

2020 ◽

Vol 17 (1) ◽

pp. 253-269

Author(s):

Alaoui El ◽

Fazziki El ◽

Fatima Ennaji ◽

Mohamed Sadgal

Keyword(s):

Smart Cities ◽

Poor Quality ◽

Quality Data ◽

Crowd Sensing ◽

Private Data ◽

Poor Quality Data ◽

Identification Approach ◽

Mobile Crowd ◽

Share Information

The ubiquity of mobile devices and their advanced features have increased the use of crowdsourcing in many areas, such as the mobility in the smart cities. With the advent of high-quality sensors on smartphones, online communities can easily collect and share information. These information are of great importance for the institutions, which must analyze the facts by facilitating the data collecting on crimes and criminals, for example. This paper proposes an approach to develop a crowdsensing framework allowing a wider collaboration between the citizens and the authorities. In addition, this framework takes advantage of an objectivity analysis to ensure the participants? credibility and the information reliability, as law enforcement is often affected by unreliable and poor quality data. In addition, the proposed framework ensures the protection of users' private data through a de-identification process. Experimental results show that the proposed framework is an interesting tool to improve the quality of crowdsensing information in a government context.

Download Full-text

Data Cleaning in Cloud Platform

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a3088.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 2535-2539

Keyword(s):

Data Cleaning ◽

Data Association ◽

Poor Quality ◽

Quality Data ◽

Quality Of Data ◽

Useful Knowledge ◽

Big Data Applications ◽

Poor Quality Data ◽

Free Data ◽

The Given

: Data is very valuable and it is generated in large volumes. The Use of high-quality data for making quality decisions has become a huge task which helps people to make better decisions, analysis, predictions. We are surrounded by data with errors, Data cleaning is a delayed, complicated task and considered costly. Data polishing is important since it is necessary to remove errors from the data before transferring to the data warehouse since poor quality data is eliminated to get the desired results. The Error-free data will produce precise and accurate results when queried. Hence consistent and proper data is required for the decision making. The characteristics of data polishing is data repairing and data association. Identifying the homogeneous object and linking it to the most associated object is defined as Association. The process of making the database reliable by repairing and finding the faults is defined as repairing. In the case of big data applications, we do not use all the existing data, we use only subsets of appropriate data. Association is the process of converting extensive amounts of raw data to subsets of appropriate data that are useful. Once we get the appropriate data, the available data is analyzed and it leads to knowledge [14]. Multiple approaches are used to associate the given data and to achieve meaningful and useful knowledge to fix or repair [12]. Maintaining polished quality of data is referred to as data polishing. Usually the objectives of data polishing are not properly defined. This paper will discuss the goals of data cleaning and different approaches for data cleaning platforms

Download Full-text

Challenges Associated with Cross-Jurisdictional Analyses using Administrative Health Data and Primary Care Electronic Medical Records in Canada

International Journal for Population Data Science ◽

10.23889/ijpds.v3i3.437 ◽

2018 ◽

Vol 3 (3) ◽

Author(s):

Alan Katz ◽

Jennifer Enns ◽

Sabrina T Wong ◽

Tyler Williamson ◽

Alexander Singer ◽

...

Keyword(s):

British Columbia ◽

Social Services ◽

Administrative Data ◽

Data Access ◽

Health Data ◽

Quality Data ◽

Data Governance ◽

Quality Health Care ◽

Data Repositories ◽

Administrative Health Data

Over the last 30 years, public investments in Canada and many other countries have created clinical and administrative health data repositories to support research on health and social services, population health and health policy. However, there is limited capacity to share and use data across jurisdictional boundaries, in part because of inefficient and cumbersome procedures to access these data and gain approval for their use in research. A lack of harmonization among variables and indicators makes it difficult to compare research among jurisdictions. These challenges affect the quality, scope, and impact of work that could be done. The purpose of this paper is to compare and contrast the data access procedures in three Canadian jurisdictions (Manitoba, Alberta and British Columbia), and to describe how we addressed the challenges presented by differences in data governance and architecture in a Canadian cross-jurisdictional research study. We characterize common stages in gaining access to administrative data among jurisdictions, including obtaining ethics approval, applying for data access from data custodians, and ensuring the extracted data is released to accredited individuals in secure data environments. We identify advantages of Manitoba’s flexible ‘stewardship’ model over the more restrictive ‘custodianship’ model in British Columbia, and highlight the importance of communication between analysts in each jurisdiction to compensate for differences in coding variables and poor quality data. Researchers and system planners must have access to and be able to make effective use of administrative health data to ensure that Canadians continue to have access to high-quality health care and benefit from effective health policies. The considerable benefits of collaborative population-based research that spans jurisdictional borders have been recognized by the Canadian Institutes for Health Research in their recent call for the creation of a National Data Platform to resolve many of the issues in harmonization and validation of administrative data elements.

Download Full-text