Investigation and reporting of Data Quality within and between linked SAIL datasets

Author(s):  
Sarah Rees ◽  
Arfon Rees

ABSTRACT
Objectives: The SAIL databank brings together a range of datasets gathered primarily for administrative rather than research purposes. These datasets contain information about different aspects of an individual's contact with services, which when combined form a detailed health record for individuals living (or deceased) in Wales. Understanding the quality of data in SAIL supports the research process by providing a level of assurance about the robustness of the data, identifying and describing potential sources of bias due to invalid, incomplete, inconsistent or inaccurate data, and thereby helping to increase the accuracy of research using these data. Designing processes to investigate and report on data quality within and between multiple datasets can be a time-consuming task; it requires a high degree of effort to ensure the output is genuinely meaningful and useful to SAIL users, and may require a range of different approaches.
Approach: Data quality tests for each dataset were written, considering a range of data quality dimensions including validity, consistency, accuracy and completeness. Tests were designed to capture not just the quality of data within each dataset, but also the consistency of data items between datasets. SQL scripts were written to test each of these aspects; to minimise repetition, automated processes were implemented where appropriate. Batch automation was used to call SQL stored procedures, which use metadata to generate dynamic SQL. The metadata (created as part of the data quality process) describes each dataset and the measurement parameters used to assess each field within it. However, automation on its own is insufficient: data quality process outputs require scrutiny and oversight to ensure they actually capture what they set out to. SAIL users were consulted on the development of the data quality reports to ensure usability and appropriateness for supporting data utilisation in research.
Results: The data quality reporting process benefits the SAIL databank by providing additional information to support the research process, and in some cases may act as a diagnostic tool, detecting problems with data which can then be rectified.
Conclusion: The development of data quality processes in SAIL is ongoing, and changes or developments in each dataset lead to new requirements for data quality measurement and reporting. A vital component of the process is the production of output that is genuinely meaningful and useful.
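As a rough illustration of the metadata-driven approach the abstract describes, the sketch below generates data quality SQL from a table of field-level rules. The table names, rule fields, and predicates are hypothetical assumptions, not SAIL's actual metadata schema:

```python
# Minimal sketch of metadata-driven data quality checks, loosely in the
# spirit of the dynamic-SQL approach described above. The metadata
# structure, dataset names, and predicates are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class FieldRule:
    table: str       # dataset table to test
    column: str      # field under test
    dimension: str   # e.g. "completeness", "validity"
    predicate: str   # SQL predicate that a *valid* value must satisfy

def build_check_sql(rule: FieldRule) -> str:
    """Generate a SQL statement counting rows that violate the rule."""
    return (
        f"SELECT '{rule.table}' AS dataset, '{rule.column}' AS field, "
        f"'{rule.dimension}' AS dimension, COUNT(*) AS violations "
        f"FROM {rule.table} WHERE NOT ({rule.predicate})"
    )

# Hypothetical rules; in a real system these rows would live in the
# metadata tables that the stored procedures read.
rules = [
    FieldRule("gp_events", "event_date", "validity",
              "event_date BETWEEN '1900-01-01' AND CURRENT_DATE"),
    FieldRule("gp_events", "patient_id", "completeness",
              "patient_id IS NOT NULL"),
]

for r in rules:
    print(build_check_sql(r))
```

Generating the checks from metadata rather than hand-writing one script per field is what keeps the repetition down as new datasets arrive; the human scrutiny the abstract insists on then applies to the rule definitions and the outputs, not to hundreds of near-identical scripts.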

2017 ◽  
Vol 4 (1) ◽  
pp. 25-31 ◽  
Author(s):  
Diana Effendi

The Information Product Approach (IP Approach) is an information management approach that treats information as a product; it can be used to manage information products and to analyse data quality. An IP-Map can be used by organizations to facilitate the management of knowledge in collecting, storing, maintaining, and using data in an organized manner. The data management process for academic activities at X University has not yet used the IP Approach, and the university has not given attention to managing the quality of its information. Until now, X University has concerned itself only with the system applications used to automate data management in its academic processes. The IP-Map constructed in this paper can be used as a basis for analysing the quality of data and information. With the IP-Map, X University is expected to identify which parts of the process need improvement in data and information quality management.
Index terms: IP Approach, IP-Map, information quality, data quality.
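For readers unfamiliar with the notation, the sketch below models an IP-Map as a small directed graph and traces which blocks feed an information product. The node types follow the generic IP-Map constructs (source, process, storage, consumer); the academic-records example blocks are hypothetical:

```python
# Illustrative sketch of an IP-Map as a directed graph. The example
# blocks for an academic-records information product are invented.
ip_map = {
    "nodes": {
        "DS1":  ("data source", "Student enrolment form"),
        "P1":   ("process",     "Validate and key in enrolment data"),
        "STO1": ("storage",     "Academic records database"),
        "P2":   ("process",     "Compile semester grade report"),
        "CB1":  ("consumer",    "Faculty administrator"),
    },
    "edges": [("DS1", "P1"), ("P1", "STO1"), ("STO1", "P2"), ("P2", "CB1")],
}

def upstream_of(node, edges):
    """Trace which blocks feed a given block -- a first step in locating
    where a quality problem in the information product could originate."""
    direct = {src for src, dst in edges if dst == node}
    for parent in list(direct):
        direct |= upstream_of(parent, edges)
    return direct

# Everything upstream of the consumer: the whole production chain.
print(upstream_of("CB1", ip_map["edges"]))
```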


2021 ◽  
pp. 004912412199553
Author(s):  
Jan-Lucas Schanze

An increasing age of respondents and cognitive impairment are usual suspects for increasing difficulties in survey interviews and decreasing data quality. This is why survey researchers tend to label residents in retirement and nursing homes as hard to interview and exclude them from most social surveys. In this article, I examine to what extent this label is justified and whether the quality of data collected among residents in institutions for the elderly really differs from data collected within private households. For this purpose, I analyze response behavior and quality indicators in three waves of the Survey of Health, Ageing and Retirement in Europe (SHARE). To control for confounding variables, I use propensity score matching to identify respondents in private households who share similar characteristics with institutionalized residents. My results confirm that most indicators of response behavior and data quality are worse in institutions than in private households. However, when controlling for sociodemographic and health-related variables, the differences become very small. These results suggest that health, rather than the housing situation itself, is what matters for data quality.
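A minimal sketch of the propensity score matching step, assuming a logistic model of institutionalization on sociodemographic and health covariates and simple 1:1 nearest-neighbour matching; the variable names and synthetic data are illustrative, not SHARE's actual variables:

```python
# Sketch: estimate each respondent's probability of being
# institutionalized, then pair each institutionalized respondent with
# the private-household respondent closest on that score.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "institution": rng.integers(0, 2, 200),   # 1 = institutionalized
    "age": rng.normal(80, 8, 200),
    "health_score": rng.normal(0, 1, 200),
})

X = df[["age", "health_score"]]
df["pscore"] = LogisticRegression().fit(X, df["institution"]).predict_proba(X)[:, 1]

treated = df[df["institution"] == 1]
controls = df[df["institution"] == 0]

# 1:1 nearest-neighbour matching on the propensity score (with
# replacement, no caliper -- the simplest variant).
matches = {
    i: (controls["pscore"] - row.pscore).abs().idxmin()
    for i, row in treated.iterrows()
}
print(f"matched {len(matches)} institutionalized respondents")
```

Comparing data quality indicators within the matched pairs, rather than between the raw groups, is what isolates the housing situation from the age and health differences that travel with it.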


1915 ◽  
Author(s):  
Laura Erhard ◽  
Brett McBride ◽  
Adam Safir

As part of the implementation of its strategic plan, the U.S. Bureau of Labor Statistics (BLS) has increasingly studied the issue of using alternative data to improve both the quality of its data and the process by which those data are collected. The plan includes the goal of integrating alternative data into BLS programs. This article describes the framework used by the BLS Consumer Expenditure Surveys (CE) program and the potential these data hold for complementing data collected in traditional formats. It also addresses some of the challenges BLS faces when using alternative data and the complementary role that alternative data play in improving the quality of data currently collected. Alternative data can substitute for what is presently being collected from respondents and provide additional information to supplement the variables the CE program produces or to adjust the CE program’s processing and weighting procedures.


Author(s):  
Anna Ferrante ◽  
James Boyd ◽  
Sean Randall ◽  
Adrian Brown ◽  
James Semmens

ABSTRACT
Objectives: Record linkage is a powerful technique which transforms discrete episode data into longitudinal person-based records. These records enable the construction and analysis of complex pathways of health and disease progression, and of service use. Achieving high linkage quality is essential for ensuring the quality and integrity of research based on linked data. The methods used to assess linkage quality will depend on the volume and characteristics of the datasets involved, the processes used for linkage, and the additional information available for quality assessment. This paper proposes and evaluates two methods to routinely assess linkage quality.
Approach: Linkage units currently use a range of methods to measure, monitor and improve linkage quality; however, no common approach or standards exist. There is an urgent need to develop "best practices" in evaluating, reporting and benchmarking linkage quality. In assessing linkage quality, of primary interest is knowing the number of true matches and non-matches identified as links and non-links; any misclassification of matches within these groups introduces linkage errors. We present efforts to develop sharable methods to measure linkage quality in Australia. These include a sampling-based method to estimate both precision (accuracy) and recall (sensitivity) following record linkage, and a benchmarking method: a transparent and transportable methodology for benchmarking the quality of linkages across different operational environments.
Results: The sampling-based method achieved estimates of linkage quality that were very close to the actual linkage quality metrics, and presents a feasible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The benchmarking method provides a systematic approach to estimating linkage quality with a set of open, shareable datasets and a set of well-defined, established performance metrics, offering an opportunity to benchmark the linkage quality of different record linkage operations. Both methods also have the potential to assess the inter-rater reliability of clerical reviews.
Conclusions: Both methods produce reliable estimates of linkage quality, enabling the exchange of information within and between linkage communities. It is important that researchers can assess risk in studies using record linkage techniques. Understanding the impact of linkage quality on research outputs highlights the need for standard methods to routinely measure linkage quality. These two methods provide a good start to the quality process, but it is important to identify standards and good practices in all parts of the linkage process (pre-processing, standardising activities, linkage, grouping and extracting).
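A sketch of how a sampling-based estimator of precision and recall might work, under the simplifying assumption that clerically reviewed random samples of declared links and non-links serve as ground truth. This is an illustration of the general idea, not the authors' exact estimator:

```python
# Estimate precision and recall of a linkage run from two reviewed
# random samples: one of declared links, one of declared non-links.
def estimate_linkage_quality(n_links, n_nonlinks,
                             sample_links, sample_nonlinks):
    """
    n_links / n_nonlinks -- totals declared by the linkage run
    sample_links         -- bools: reviewed links, True = true match
    sample_nonlinks      -- bools: reviewed non-links, True = true match
                            (i.e. a missed link)
    """
    # Precision: share of declared links that are true matches.
    precision = sum(sample_links) / len(sample_links)
    # Scale the sampled rates up to estimate totals.
    est_true_in_links = precision * n_links
    miss_rate = sum(sample_nonlinks) / len(sample_nonlinks)
    est_missed = miss_rate * n_nonlinks
    # Recall: estimated true matches found over all estimated true matches.
    recall = est_true_in_links / (est_true_in_links + est_missed)
    return precision, recall

p, r = estimate_linkage_quality(
    n_links=100_000, n_nonlinks=900_000,
    sample_links=[True] * 970 + [False] * 30,     # 97% of sampled links correct
    sample_nonlinks=[True] * 2 + [False] * 998,   # 0.2% of non-links are misses
)
print(f"precision ~ {p:.3f}, recall ~ {r:.3f}")
```

Because true misses are rare among declared non-links, the non-link sample must be large (or stratified towards borderline pairs) for the recall estimate to be stable; that sampling design is where most of the practical difficulty lies.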


2016 ◽  
Vol 48 (1) ◽  
pp. 17-28 ◽  
Author(s):  
Tadeusz Pastusiak

Abstract Research on the ice cover of waterways, rivers, lakes, seas and oceans by satellite remote sensing methods began at the end of the twentieth century. There are many data sources in diverse file formats, but no comparative assessment of their usefulness had yet been carried out. In the research process, a synthetic indicator of the quality of data sources was developed, combining map resolution, publication frequency, time delay and functionality for the user. It reflects the usefulness of the maps well and allows them to be compared. Qualitative differences in map content have relatively little impact on the overall assessment of the data sources, and map resolution is generally acceptable. Timeliness has the greatest impact on the quality of map content for planning a vessel's current voyage in ice. The highest quality among all the sources studied was achieved by the regional maps in GIF format issued by NWS/NOAA, the general maps of the Arctic Ocean in NetCDF format issued by OSI SAF, and the general maps of the Arctic Ocean in GRIB-2 format issued by NCEP/NOAA. Among them are maps containing information on the quality of the presented parameter. The leaders among maps containing all three basic characteristics of ice cover (ice concentration, ice thickness and ice floe size) are the vector maps in GML format, the new standard of electronic vector maps for the navigation of ships in ice. Publishing ice cover maps in the S-411 electronic map standard for navigation of vessels in ice, adopted by the International Hydrographic Organization, is advisable where commercial navigation is planned on lagoons, rivers and canals. Wide availability and exchange of information on the state of ice cover on rivers, lakes, estuaries and bays used exclusively for water sports, ice sports and ice fishing is possible using handheld mobile phones, smartphones and tablets.
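To make the idea of a synthetic indicator concrete, the sketch below combines the four criteria named in the abstract into a single weighted score. The 0-1 scales, the scores, and the equal weights are assumptions for demonstration, not the paper's actual scoring scheme:

```python
# Toy synthetic quality indicator for ice-map data sources, combining
# resolution, publication frequency, time delay and user functionality.
# All scores and weights below are invented for illustration.
sources = {
    # name: (resolution, publication, delay, functionality), each 0..1
    "NWS/NOAA regional GIF":   (0.8, 0.9, 0.9, 0.7),
    "OSI SAF Arctic NetCDF":   (0.6, 0.9, 0.8, 0.9),
    "NCEP/NOAA Arctic GRIB-2": (0.6, 0.9, 0.8, 0.9),
}
weights = (0.25, 0.25, 0.25, 0.25)

def quality_index(scores, weights):
    return sum(s * w for s, w in zip(scores, weights))

for name, scores in sorted(sources.items(),
                           key=lambda kv: quality_index(kv[1], weights),
                           reverse=True):
    print(f"{name}: {quality_index(scores, weights):.2f}")
```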


2008 ◽  
Vol 13 (5) ◽  
pp. 378-389 ◽  
Author(s):  
Xiaohua Douglas Zhang ◽  
Amy S. Espeseth ◽  
Eric N. Johnson ◽  
Jayne Chin ◽  
Adam Gates ◽  
...  

RNA interference (RNAi) not only plays an important role in drug discovery but can also be developed directly into drugs. RNAi high-throughput screening (HTS) biotechnology allows us to conduct genome-wide RNAi research. A central challenge in genome-wide RNAi research is to integrate experimental and computational approaches to obtain high-quality RNAi HTS assays. Based on our daily practice in RNAi HTS experiments, we propose the implementation of 3 experimental and analytic processes to improve the quality of data from RNAi HTS biotechnology: (1) select effective biological controls; (2) adopt appropriate plate designs to display and/or adjust for systematic errors of measurement; and (3) use effective analytic metrics to assess data quality. Applications in 5 real RNAi HTS experiments demonstrate the effectiveness of integrating these processes to improve data quality. Given this effectiveness, the methods and guidelines contained in the 3 experimental and analytic processes are likely to have broad utility in genome-wide RNAi research. (Journal of Biomolecular Screening 2008:378-389)
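Two control-based metrics widely used to assess HTS assay quality are the Z'-factor and SSMD; the sketch below computes both from control wells as one plausible instance of process (3). Whether these are the paper's exact metric choices is not claimed here:

```python
# Control-based quality metrics for an HTS plate, computed from
# positive- and negative-control wells (synthetic data for illustration).
import numpy as np

def z_prime(pos, neg):
    """Z'-factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    return 1 - 3 * (pos.std(ddof=1) + neg.std(ddof=1)) / abs(pos.mean() - neg.mean())

def ssmd(pos, neg):
    """SSMD: (mean_pos - mean_neg) / sqrt(var_pos + var_neg)."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    return (pos.mean() - neg.mean()) / np.sqrt(pos.var(ddof=1) + neg.var(ddof=1))

rng = np.random.default_rng(1)
pos_ctrl = rng.normal(100, 8, 32)   # e.g. 32 positive-control wells
neg_ctrl = rng.normal(20, 6, 32)    # e.g. 32 negative-control wells
print(f"Z' = {z_prime(pos_ctrl, neg_ctrl):.2f}, "
      f"SSMD = {ssmd(pos_ctrl, neg_ctrl):.1f}")
```

Both metrics reward a large separation between control means relative to their spread, which is exactly what a plate with well-chosen biological controls and no systematic positional errors should exhibit.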


Tunas Agraria ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 168-174
Author(s):  
Maslusatun Mawadah

The South Jakarta Administrative City Land Office is one of the offices targeted to achieve complete land administration for its city in 2020. The current condition of land parcel data demands an update, namely improving data quality from classes KW1 through KW6 toward valid KW1. The purpose of this study is to determine the condition of land data quality in South Jakarta, the implementation of data quality improvement, and the problems and solutions encountered in that implementation. The research method used is qualitative with a descriptive approach. The results show that after the improvement was implemented, the proportion of KW1 data increased from 86.45% to 87.01%. The roles of man, material, machine, and method have been fulfilled, but the implementation of data quality improvement does not accord with the 2019 Complete City Guidelines in terms of territorial boundary inventory. Obstacles also remain in improving the quality of land parcel data: the absence of the buku tanah (land book), surat ukur (survey document), and gambar ukur (survey drawing) at the land office; regional subdivision while sub-district boundaries are not yet certain; and land parcels that have been split off in the mapping without the office administrator being aware of it.


2021 ◽  
Vol 23 (06) ◽  
pp. 1011-1018
Author(s):  
Aishrith P Rao ◽  
Raghavendra J C ◽  
Dr. Sowmyarani C N ◽  
Dr. Padmashree T ◽  
...  

With the advancement of technology and the large volumes of data produced, processed, and stored, it is becoming increasingly important to maintain data quality in a cost-effective and productive manner. The most important aspects of Big Data (BD) are storage, processing, privacy, and analytics. The Big Data community has identified quality as a critical aspect of its maturity; nonetheless, it is an approach that should be adopted early in the lifecycle and gradually extended to the other primary processes. Companies rely heavily on, and derive profits from, the huge amounts of data they collect; when data quality deteriorates, the ramifications are uncertain and may lead to completely undesirable conclusions. In the context of BD, determining data quality is difficult, but it is essential to uphold data quality before proceeding with any analytics. In this paper, we investigate data quality during the data gathering, preprocessing, data repository, and evaluation/analysis stages of BD processing. Solutions are also suggested based on the elaboration and review of the problems raised.
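As one concrete reading of quality control at the gathering and preprocessing stages the abstract lists, the sketch below screens incoming records against completeness and validity rules before they reach the repository. The record fields and rules are hypothetical:

```python
# Rule-based quality gate for incoming records: each record is checked
# against completeness/validity predicates before entering the repository.
RULES = {
    "user_id":   lambda v: v is not None and str(v).strip() != "",
    "timestamp": lambda v: isinstance(v, (int, float)) and v > 0,
    "amount":    lambda v: isinstance(v, (int, float)) and 0 <= v < 1e9,
}

def screen(record: dict):
    """Return (is_clean, list of failed rules) for one incoming record."""
    failures = [field for field, ok in RULES.items()
                if not ok(record.get(field))]
    return (not failures, failures)

batch = [
    {"user_id": "u1", "timestamp": 1.7e9, "amount": 42.0},
    {"user_id": "",   "timestamp": -5,    "amount": 10.0},
]
for rec in batch:
    print(screen(rec))   # (True, []) then (False, ['user_id', 'timestamp'])
```

Applying such gates early, as the abstract argues, is far cheaper than discovering quality problems downstream in analytics, where their causes are much harder to trace.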


2016 ◽  
Vol 12 (3) ◽  
pp. 111-133 ◽  
Author(s):  
Ahmad Assaf ◽  
Aline Senart ◽  
Raphaël Troncy

Ensuring data quality in Linked Open Data is a complex process, as the data consist of structured information supported by models, ontologies and vocabularies, and contain queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have identified potential quality issues, but focus only on objective quality indicators that can be measured regardless of the underlying use case. Secondly, the authors present an extensible quality measurement tool that helps data owners rate the quality of their datasets on the one hand, and data consumers choose their data sources from a ranked set on the other. The authors evaluate this tool by measuring the quality of the LOD cloud. The results demonstrate that the general state of the datasets needs attention, as they mostly have low completeness, provenance, licensing and comprehensibility quality scores.
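A minimal sketch of the ranking idea: score each dataset on a few objective indicators, combine the scores with weights, and rank for consumers. The indicator names, scores, and weights are illustrative, not the authors' actual framework:

```python
# Toy objective-indicator scoring and ranking of Linked Data sources.
# All datasets, scores and weights below are invented for illustration.
datasets = {
    # name: {indicator: score in [0, 1]}
    "dataset-a": {"completeness": 0.7, "provenance": 0.4, "licensing": 0.9},
    "dataset-b": {"completeness": 0.3, "provenance": 0.2, "licensing": 0.1},
}
weights = {"completeness": 0.4, "provenance": 0.3, "licensing": 0.3}

def overall(scores):
    """Weighted aggregate of a dataset's indicator scores."""
    return sum(weights[k] * scores[k] for k in weights)

ranked = sorted(datasets, key=lambda d: overall(datasets[d]), reverse=True)
for name in ranked:
    print(f"{name}: {overall(datasets[name]):.2f}")
```

Restricting the framework to indicators computable this way, with no knowledge of the use case, is what lets a single ranked list serve arbitrary data consumers.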


Information ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 175 ◽  
Author(s):  
Tibor Koltay

This paper focuses on the characteristics of research data quality and aims to cover the most important issues related to it, giving particular attention to its attributes and to data governance. The corporate world's considerable interest in data quality is evident in several thoughts and issues reported in business-related publications, even if there are apparent differences between the values and approaches to data in corporate and in academic (research) environments. The paper also takes into consideration that addressing data quality would be unimaginable without considering big data.

