Investigation and reporting of Data Quality within and between linked SAIL datasets

Author(s):  
Sarah Rees ◽  
Arfon Rees

ABSTRACT
Objectives: The SAIL databank brings together a range of datasets gathered primarily for administrative rather than research purposes. These datasets contain information about different aspects of an individual's contact with services, which when combined form a detailed health record for individuals living (or deceased) in Wales. Understanding the quality of data in SAIL supports the research process by providing a level of assurance about the robustness of the data, identifying and describing potential sources of bias due to invalid, incomplete, inconsistent or inaccurate data, and thereby helping to increase the accuracy of research using these data. Designing processes to investigate and report on data quality within and between multiple datasets can be a time-consuming task; it requires a high degree of effort to ensure the output is genuinely meaningful and useful to SAIL users, and may require a range of different approaches.
Approach: Data quality tests for each dataset were written, considering a range of data quality dimensions including validity, consistency, accuracy and completeness. Tests were designed to capture not just the quality of data within each dataset, but also the consistency of data items between datasets. SQL scripts were written to test each of these aspects; to minimise repetition, automated processes were implemented where appropriate. Batch automation was used to call SQL stored procedures, which use metadata to generate dynamic SQL. The metadata (created as part of the data quality process) describes each dataset and the measurement parameters used to assess each field within it. However, automation on its own is insufficient: data quality process outputs require scrutiny and oversight to ensure they actually capture what they set out to. SAIL users were consulted on the development of the data quality reports to ensure usability and appropriateness for supporting data utilisation in research.
Results: The data quality reporting process benefits the SAIL databank by providing additional information to support the research process, and in some cases may act as a diagnostic tool, detecting problems with data which can then be rectified.
Conclusion: The development of data quality processes in SAIL is ongoing, and changes or developments in each dataset lead to new requirements for data quality measurement and reporting. A vital component of the process is the production of output that is genuinely meaningful and useful.
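As a rough illustration of the metadata-driven approach the abstract describes, the sketch below generates data quality SQL from a table of field-level rules. The table names, rule fields, and predicates are hypothetical assumptions, not SAIL's actual metadata schema:

```python
# Minimal sketch of metadata-driven data quality checks, loosely in the
# spirit of the dynamic-SQL approach described above. The metadata
# structure, dataset names, and predicates are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class FieldRule:
    table: str       # dataset table to test
    column: str      # field under test
    dimension: str   # e.g. "completeness", "validity"
    predicate: str   # SQL predicate that a *valid* value must satisfy

def build_check_sql(rule: FieldRule) -> str:
    """Generate a SQL statement counting rows that violate the rule."""
    return (
        f"SELECT '{rule.table}' AS dataset, '{rule.column}' AS field, "
        f"'{rule.dimension}' AS dimension, COUNT(*) AS violations "
        f"FROM {rule.table} WHERE NOT ({rule.predicate})"
    )

# Hypothetical rules; in a real system these rows would live in the
# metadata tables that the stored procedures read.
rules = [
    FieldRule("gp_events", "event_date", "validity",
              "event_date BETWEEN '1900-01-01' AND CURRENT_DATE"),
    FieldRule("gp_events", "patient_id", "completeness",
              "patient_id IS NOT NULL"),
]

for r in rules:
    print(build_check_sql(r))
```

Generating the checks from metadata rather than hand-writing one script per field is what keeps the repetition down as new datasets arrive; the human scrutiny the abstract insists on then applies to the rule definitions and the outputs, not to hundreds of near-identical scripts.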

2017 ◽  
Vol 4 (1) ◽  
pp. 25-31 ◽  
Author(s):  
Diana Effendi

The Information Product Approach (IP Approach) is an information management approach that treats information as a product; it can be used to manage information products and to analyse data quality. An IP-Map can be used by organizations to facilitate the management of knowledge in collecting, storing, maintaining, and using data in an organized manner. The data management process for academic activities at X University has not yet used the IP Approach, and the university has not given attention to managing the quality of its information. Until now, X University has concerned itself only with the system applications used to automate data management in its academic processes. The IP-Map constructed in this paper can be used as a basis for analysing the quality of data and information. With the IP-Map, X University is expected to identify which parts of the process need improvement in data and information quality management.
Index terms: IP Approach, IP-Map, information quality, data quality.
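For readers unfamiliar with the notation, the sketch below models an IP-Map as a small directed graph and traces which blocks feed an information product. The node types follow the generic IP-Map constructs (source, process, storage, consumer); the academic-records example blocks are hypothetical:

```python
# Illustrative sketch of an IP-Map as a directed graph. The example
# blocks for an academic-records information product are invented.
ip_map = {
    "nodes": {
        "DS1":  ("data source", "Student enrolment form"),
        "P1":   ("process",     "Validate and key in enrolment data"),
        "STO1": ("storage",     "Academic records database"),
        "P2":   ("process",     "Compile semester grade report"),
        "CB1":  ("consumer",    "Faculty administrator"),
    },
    "edges": [("DS1", "P1"), ("P1", "STO1"), ("STO1", "P2"), ("P2", "CB1")],
}

def upstream_of(node, edges):
    """Trace which blocks feed a given block -- a first step in locating
    where a quality problem in the information product could originate."""
    direct = {src for src, dst in edges if dst == node}
    for parent in list(direct):
        direct |= upstream_of(parent, edges)
    return direct

# Everything upstream of the consumer: the whole production chain.
print(upstream_of("CB1", ip_map["edges"]))
```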


2021 ◽  
pp. 004912412199553
Author(s):  
Jan-Lucas Schanze

An increasing age of respondents and cognitive impairment are usual suspects for increasing difficulties in survey interviews and decreasing data quality. This is why survey researchers tend to label residents in retirement and nursing homes as hard to interview and exclude them from most social surveys. In this article, I examine to what extent this label is justified and whether the quality of data collected among residents in institutions for the elderly really differs from data collected within private households. For this purpose, I analyze response behavior and quality indicators in three waves of the Survey of Health, Ageing and Retirement in Europe (SHARE). To control for confounding variables, I use propensity score matching to identify respondents in private households who share similar characteristics with institutionalized residents. My results confirm that most indicators of response behavior and data quality are worse in institutions than in private households. However, when controlling for sociodemographic and health-related variables, the differences become very small. These results suggest that health, rather than the housing situation itself, is what matters for data quality.
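A minimal sketch of the propensity score matching step, assuming a logistic model of institutionalization on sociodemographic and health covariates and simple 1:1 nearest-neighbour matching; the variable names and synthetic data are illustrative, not SHARE's actual variables:

```python
# Sketch: estimate each respondent's probability of being
# institutionalized, then pair each institutionalized respondent with
# the private-household respondent closest on that score.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "institution": rng.integers(0, 2, 200),   # 1 = institutionalized
    "age": rng.normal(80, 8, 200),
    "health_score": rng.normal(0, 1, 200),
})

X = df[["age", "health_score"]]
df["pscore"] = LogisticRegression().fit(X, df["institution"]).predict_proba(X)[:, 1]

treated = df[df["institution"] == 1]
controls = df[df["institution"] == 0]

# 1:1 nearest-neighbour matching on the propensity score (with
# replacement, no caliper -- the simplest variant).
matches = {
    i: (controls["pscore"] - row.pscore).abs().idxmin()
    for i, row in treated.iterrows()
}
print(f"matched {len(matches)} institutionalized respondents")
```

Comparing data quality indicators within the matched pairs, rather than between the raw groups, is what isolates the housing situation from the age and health differences that travel with it.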


1915 ◽  
Author(s):  
Laura Erhard ◽  
Brett McBride ◽  
Adam Safir

As part of the implementation of its strategic plan, the U.S. Bureau of Labor Statistics (BLS) has increasingly studied the issue of using alternative data to improve both the quality of its data and the process by which those data are collected. The plan includes the goal of integrating alternative data into BLS programs. This article describes the framework used by the BLS Consumer Expenditure Surveys (CE) program and the potential these data hold for complementing data collected in traditional formats. It also addresses some of the challenges BLS faces when using alternative data and the complementary role that alternative data play in improving the quality of data currently collected. Alternative data can substitute for what is presently being collected from respondents and provide additional information to supplement the variables the CE program produces or to adjust the CE program’s processing and weighting procedures.


Author(s):  
Anna Ferrante ◽  
James Boyd ◽  
Sean Randall ◽  
Adrian Brown ◽  
James Semmens

ABSTRACT
Objectives: Record linkage is a powerful technique which transforms discrete episode data into longitudinal person-based records. These records enable the construction and analysis of complex pathways of health and disease progression, and of service use. Achieving high linkage quality is essential for ensuring the quality and integrity of research based on linked data. The methods used to assess linkage quality will depend on the volume and characteristics of the datasets involved, the processes used for linkage, and the additional information available for quality assessment. This paper proposes and evaluates two methods to routinely assess linkage quality.
Approach: Linkage units currently use a range of methods to measure, monitor and improve linkage quality; however, no common approach or standards exist. There is an urgent need to develop "best practices" in evaluating, reporting and benchmarking linkage quality. In assessing linkage quality, of primary interest is knowing the number of true matches and non-matches identified as links and non-links; any misclassification of matches within these groups introduces linkage errors. We present efforts to develop sharable methods to measure linkage quality in Australia. These include a sampling-based method to estimate both precision (accuracy) and recall (sensitivity) following record linkage, and a benchmarking method: a transparent and transportable methodology for benchmarking the quality of linkages across different operational environments.
Results: The sampling-based method achieved estimates of linkage quality that were very close to the actual linkage quality metrics, and presents a feasible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The benchmarking method provides a systematic approach to estimating linkage quality with a set of open, shareable datasets and a set of well-defined, established performance metrics, offering an opportunity to benchmark the linkage quality of different record linkage operations. Both methods also have the potential to assess the inter-rater reliability of clerical reviews.
Conclusions: Both methods produce reliable estimates of linkage quality, enabling the exchange of information within and between linkage communities. It is important that researchers can assess risk in studies using record linkage techniques. Understanding the impact of linkage quality on research outputs highlights the need for standard methods to routinely measure linkage quality. These two methods provide a good start to the quality process, but it is important to identify standards and good practices in all parts of the linkage process (pre-processing, standardising activities, linkage, grouping and extracting).
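A sketch of how a sampling-based estimator of precision and recall might work, under the simplifying assumption that clerically reviewed random samples of declared links and non-links serve as ground truth. This is an illustration of the general idea, not the authors' exact estimator:

```python
# Estimate precision and recall of a linkage run from two reviewed
# random samples: one of declared links, one of declared non-links.
def estimate_linkage_quality(n_links, n_nonlinks,
                             sample_links, sample_nonlinks):
    """
    n_links / n_nonlinks -- totals declared by the linkage run
    sample_links         -- bools: reviewed links, True = true match
    sample_nonlinks      -- bools: reviewed non-links, True = true match
                            (i.e. a missed link)
    """
    # Precision: share of declared links that are true matches.
    precision = sum(sample_links) / len(sample_links)
    # Scale the sampled rates up to estimate totals.
    est_true_in_links = precision * n_links
    miss_rate = sum(sample_nonlinks) / len(sample_nonlinks)
    est_missed = miss_rate * n_nonlinks
    # Recall: estimated true matches found over all estimated true matches.
    recall = est_true_in_links / (est_true_in_links + est_missed)
    return precision, recall

p, r = estimate_linkage_quality(
    n_links=100_000, n_nonlinks=900_000,
    sample_links=[True] * 970 + [False] * 30,     # 97% of sampled links correct
    sample_nonlinks=[True] * 2 + [False] * 998,   # 0.2% of non-links are misses
)
print(f"precision ~ {p:.3f}, recall ~ {r:.3f}")
```

Because true misses are rare among declared non-links, the non-link sample must be large (or stratified towards borderline pairs) for the recall estimate to be stable; that sampling design is where most of the practical difficulty lies.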


2016 ◽  
Vol 48 (1) ◽  
pp. 17-28 ◽  
Author(s):  
Tadeusz Pastusiak

Abstract Research on the ice cover of waterways, rivers, lakes, seas and oceans by satellite remote sensing methods began at the end of the twentieth century. There are many data sources in diverse file formats, but no comparative assessment of their usefulness had yet been carried out. In the research process, a synthetic indicator of the quality of data sources was developed, combining map resolution, publication frequency, time delay and functionality for the user. It reflects the usefulness of the maps well and allows them to be compared. Qualitative differences in map content have relatively little impact on the overall assessment of the data sources, and map resolution is generally acceptable. Timeliness has the greatest impact on the quality of map content for planning a vessel's current voyage in ice. The highest quality among all the sources studied was achieved by the regional maps in GIF format issued by NWS/NOAA, the general maps of the Arctic Ocean in NetCDF format issued by OSI SAF, and the general maps of the Arctic Ocean in GRIB-2 format issued by NCEP/NOAA. Among them are maps containing information on the quality of the presented parameter. The leaders among maps containing all three basic characteristics of ice cover (ice concentration, ice thickness and ice floe size) are the vector maps in GML format, the new standard of electronic vector maps for the navigation of ships in ice. Publishing ice cover maps in the S-411 electronic map standard for navigation of vessels in ice, adopted by the International Hydrographic Organization, is advisable where commercial navigation is planned on lagoons, rivers and canals. Wide availability and exchange of information on the state of ice cover on rivers, lakes, estuaries and bays used exclusively for water sports, ice sports and ice fishing is possible using handheld mobile phones, smartphones and tablets.
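To make the idea of a synthetic indicator concrete, the sketch below combines the four criteria named in the abstract into a single weighted score. The 0-1 scales, the scores, and the equal weights are assumptions for demonstration, not the paper's actual scoring scheme:

```python
# Toy synthetic quality indicator for ice-map data sources, combining
# resolution, publication frequency, time delay and user functionality.
# All scores and weights below are invented for illustration.
sources = {
    # name: (resolution, publication, delay, functionality), each 0..1
    "NWS/NOAA regional GIF":   (0.8, 0.9, 0.9, 0.7),
    "OSI SAF Arctic NetCDF":   (0.6, 0.9, 0.8, 0.9),
    "NCEP/NOAA Arctic GRIB-2": (0.6, 0.9, 0.8, 0.9),
}
weights = (0.25, 0.25, 0.25, 0.25)

def quality_index(scores, weights):
    return sum(s * w for s, w in zip(scores, weights))

for name, scores in sorted(sources.items(),
                           key=lambda kv: quality_index(kv[1], weights),
                           reverse=True):
    print(f"{name}: {quality_index(scores, weights):.2f}")
```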


2008 ◽  
Vol 13 (5) ◽  
pp. 378-389 ◽  
Author(s):  
Xiaohua Douglas Zhang ◽  
Amy S. Espeseth ◽  
Eric N. Johnson ◽  
Jayne Chin ◽  
Adam Gates ◽  
...  

RNA interference (RNAi) not only plays an important role in drug discovery but can also be developed directly into drugs. RNAi high-throughput screening (HTS) biotechnology allows us to conduct genome-wide RNAi research. A central challenge in genome-wide RNAi research is to integrate experimental and computational approaches to obtain high-quality RNAi HTS assays. Based on our daily practice in RNAi HTS experiments, we propose the implementation of 3 experimental and analytic processes to improve the quality of data from RNAi HTS biotechnology: (1) select effective biological controls; (2) adopt appropriate plate designs to display and/or adjust for systematic errors of measurement; and (3) use effective analytic metrics to assess data quality. Applications in 5 real RNAi HTS experiments demonstrate the effectiveness of integrating these processes to improve data quality. Given this effectiveness, the methods and guidelines contained in the 3 experimental and analytic processes are likely to have broad utility in genome-wide RNAi research. (Journal of Biomolecular Screening 2008:378-389)
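Two control-based metrics widely used to assess HTS assay quality are the Z'-factor and SSMD; the sketch below computes both from control wells as one plausible instance of process (3). Whether these are the paper's exact metric choices is not claimed here:

```python
# Control-based quality metrics for an HTS plate, computed from
# positive- and negative-control wells (synthetic data for illustration).
import numpy as np

def z_prime(pos, neg):
    """Z'-factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    return 1 - 3 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())

def ssmd(pos, neg):
    """SSMD: (mean_pos - mean_neg) / sqrt(var_pos + var_neg)."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    return (pos.mean() - neg.mean()) / np.sqrt(pos.var(ddof=1) + neg.var(ddof=1))

rng = np.random.default_rng(1)
pos_ctrl = rng.normal(100, 8, 32)   # e.g. 32 positive-control wells
neg_ctrl = rng.normal(20, 6, 32)    # e.g. 32 negative-control wells
print(f"Z' = {z_prime(pos_ctrl, neg_ctrl):.2f}, "
      f"SSMD = {ssmd(pos_ctrl, neg_ctrl):.1f}")
```

Both metrics reward a large separation between control means relative to their spread, which is exactly what a plate with well-chosen biological controls and no systematic positional errors should exhibit.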


Tunas Agraria ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 168-174
Author(s):  
Maslusatun Mawadah

The South Jakarta Administrative City Land Office is one of the offices targeted to achieve complete land administration for its city in 2020. The current condition of land parcel data demands an update, namely improving data quality from classes KW1 through KW6 toward valid KW1. The purpose of this study is to determine the condition of land data quality in South Jakarta, the implementation of data quality improvement, and the problems and solutions encountered in that implementation. The research method used is qualitative with a descriptive approach. The results show that after the improvement was implemented, the proportion of KW1 data increased from 86.45% to 87.01%. The roles of man, material, machine, and method have been fulfilled, but the implementation of data quality improvement does not accord with the 2019 Complete City Guidelines in terms of territorial boundary inventory. Obstacles also remain in improving the quality of land parcel data: the absence of the buku tanah (land book), surat ukur (survey document), and gambar ukur (survey drawing) at the land office; regional subdivision while sub-district boundaries are not yet certain; and land parcels that have been split off in the mapping without the office administrator being aware of it.


2021 ◽  
Vol 23 (06) ◽  
pp. 1011-1018
Author(s):  
Aishrith P Rao ◽  
Raghavendra J C ◽  
Dr. Sowmyarani C N ◽  
Dr. Padmashree T ◽  
...  

With the advancement of technology and the large volumes of data produced, processed, and stored, it is becoming increasingly important to maintain data quality in a cost-effective and productive manner. The most important aspects of Big Data (BD) are storage, processing, privacy, and analytics. The Big Data community has identified quality as a critical aspect of its maturity; nonetheless, it is an approach that should be adopted early in the lifecycle and gradually extended to the other primary processes. Companies rely heavily on, and derive profits from, the huge amounts of data they collect; when data quality deteriorates, the ramifications are uncertain and may lead to completely undesirable conclusions. In the context of BD, determining data quality is difficult, but it is essential to uphold data quality before proceeding with any analytics. In this paper, we investigate data quality during the data gathering, preprocessing, data repository, and evaluation/analysis stages of BD processing. Solutions are also suggested based on the elaboration and review of the problems raised.
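As one concrete reading of quality control at the gathering and preprocessing stages the abstract lists, the sketch below screens incoming records against completeness and validity rules before they reach the repository. The record fields and rules are hypothetical:

```python
# Rule-based quality gate for incoming records: each record is checked
# against completeness/validity predicates before entering the repository.
RULES = {
    "user_id":   lambda v: v is not None and str(v).strip() != "",
    "timestamp": lambda v: isinstance(v, (int, float)) and v > 0,
    "amount":    lambda v: isinstance(v, (int, float)) and 0 <= v < 1e9,
}

def screen(record: dict):
    """Return (is_clean, list of failed rules) for one incoming record."""
    failures = [field for field, ok in RULES.items()
                if not ok(record.get(field))]
    return (not failures, failures)

batch = [
    {"user_id": "u1", "timestamp": 1.7e9, "amount": 42.0},
    {"user_id": "",   "timestamp": -5,    "amount": 10.0},
]
for rec in batch:
    print(screen(rec))   # (True, []) then (False, ['user_id', 'timestamp'])
```

Applying such gates early, as the abstract argues, is far cheaper than discovering quality problems downstream in analytics, where their causes are much harder to trace.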


2016 ◽  
Vol 12 (3) ◽  
pp. 111-133 ◽  
Author(s):  
Ahmad Assaf ◽  
Aline Senart ◽  
Raphaël Troncy

Ensuring data quality in Linked Open Data is a complex process, as the data consist of structured information supported by models, ontologies and vocabularies, and contain queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have identified potential quality issues, but focus only on objective quality indicators that can be measured regardless of the underlying use case. Secondly, the authors present an extensible quality measurement tool that helps data owners rate the quality of their datasets on the one hand, and data consumers choose their data sources from a ranked set on the other. The authors evaluate this tool by measuring the quality of the LOD cloud. The results demonstrate that the general state of the datasets needs attention, as they mostly have low completeness, provenance, licensing and comprehensibility quality scores.
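A minimal sketch of the ranking idea: score each dataset on a few objective indicators, combine the scores with weights, and rank for consumers. The indicator names, scores, and weights are illustrative, not the authors' actual framework:

```python
# Toy objective-indicator scoring and ranking of Linked Data sources.
# All datasets, scores and weights below are invented for illustration.
datasets = {
    # name: {indicator: score in [0, 1]}
    "dataset-a": {"completeness": 0.7, "provenance": 0.4, "licensing": 0.9},
    "dataset-b": {"completeness": 0.3, "provenance": 0.2, "licensing": 0.1},
}
weights = {"completeness": 0.4, "provenance": 0.3, "licensing": 0.3}

def overall(scores):
    """Weighted aggregate of a dataset's indicator scores."""
    return sum(weights[k] * scores[k] for k in weights)

ranked = sorted(datasets, key=lambda d: overall(datasets[d]), reverse=True)
for name in ranked:
    print(f"{name}: {overall(datasets[name]):.2f}")
```

Restricting the framework to indicators computable this way, with no knowledge of the use case, is what lets a single ranked list serve arbitrary data consumers.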


Information ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 175 ◽  
Author(s):  
Tibor Koltay

This paper focuses on the characteristics of research data quality and aims to cover the most important issues related to it, giving particular attention to its attributes and to data governance. The corporate world's considerable interest in data quality is evident in several thoughts and issues reported in business-related publications, even if there are apparent differences between the values and approaches to data in corporate and in academic (research) environments. The paper also takes into consideration that addressing data quality would be unimaginable without considering big data.

