A Case Study on Data Quality, Privacy, and Evaluating the Outcome of Entity Resolution Processes

Author(s):  
Pei Wang ◽  
Daniel Pullen ◽  
Fan Liu ◽  
William C. Decker ◽  
Ningning Wu ◽  
...  

This paper presents ongoing research conducted through collaboration between the University of Arkansas at Little Rock and the Arkansas Department of Education to develop an entity resolution and identity management system. The process follows a multi-phase approach consisting of data-quality analysis, selection of entity-identity attributes for entity resolution, definition of a rule set using the open-source entity-resolution system OYSTER, and use of an entropy-based approach to identify potential false positives and false negatives. The research is the first known of its kind to evaluate privacy-enhancing, entity-resolution rule sets in a state education agency.
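The abstract does not detail how the entropy approach works. A common way to use entropy for flagging potential false positives is to measure the spread of a quasi-identifying attribute (here, date of birth, chosen purely for illustration) within each resolved cluster: a cluster mixing several distinct values may contain records that do not belong together. A minimal sketch under these assumptions:

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy (bits) of the value distribution within a cluster."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def flag_clusters(records, threshold=0.5):
    """Flag clusters whose date-of-birth entropy exceeds the threshold
    as potential false positives (over-merged identities)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["cluster_id"]].append(rec["dob"])
    return [cid for cid, dobs in clusters.items()
            if len(dobs) > 1 and entropy(dobs) > threshold]

# Example: cluster "c2" mixes two different birth dates and is flagged.
records = [
    {"cluster_id": "c1", "dob": "2001-03-14"},
    {"cluster_id": "c1", "dob": "2001-03-14"},
    {"cluster_id": "c2", "dob": "1999-07-02"},
    {"cluster_id": "c2", "dob": "2000-01-30"},
]
print(flag_clusters(records))  # ['c2']
```

The attribute name, field layout, and threshold are hypothetical; the reverse check (one attribute value scattered across many clusters) would analogously suggest false negatives.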

Author(s):  
William Decker ◽  
Fan Liu ◽  
John Talburt ◽  
Pei Wang ◽  
Ningning Wu

This chapter presents ongoing research conducted through collaboration between the University of Arkansas at Little Rock and the Arkansas Department of Education to develop an entity resolution and identity management system. The process follows a multi-phase approach consisting of data-quality analysis, selection of entity-identity attributes for entity resolution, development of a truth set, and implementation and benchmarking of an entity-resolution rule set using the open-source entity-resolution system OYSTER. The research is the first known of its kind to evaluate privacy-enhancing, entity-resolution rule sets in a state education agency.
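The chapter's benchmarking step is not spelled out in the abstract. One standard way to benchmark an entity-resolution rule set against a truth set is pairwise precision and recall: compare the record pairs the system links against the pairs the truth set links. A minimal sketch, with hypothetical record ids and cluster labels:

```python
from itertools import combinations

def linked_pairs(assignment):
    """All unordered record pairs placed in the same cluster,
    given a mapping of record id -> cluster id."""
    clusters = {}
    for rec, cid in assignment.items():
        clusters.setdefault(cid, []).append(rec)
    pairs = set()
    for members in clusters.values():
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs

def benchmark(system, truth):
    """Pairwise precision and recall of a rule set's output vs. a truth set."""
    sys_pairs, true_pairs = linked_pairs(system), linked_pairs(truth)
    tp = len(sys_pairs & true_pairs)
    precision = tp / len(sys_pairs) if sys_pairs else 1.0
    recall = tp / len(true_pairs) if true_pairs else 1.0
    return precision, recall

truth = {"r1": "A", "r2": "A", "r3": "B"}
system = {"r1": "A", "r2": "A", "r3": "A"}  # over-merged: r3 linked wrongly
print(benchmark(system, truth))  # precision 1/3, recall 1.0
```

This is a generic illustration of truth-set benchmarking, not the specific metrics used in the study.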


Laws ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 38
Author(s):  
Michael Rozalski ◽  
Mitchell L. Yell ◽  
Jacob Warner

In 1975, the Education for All Handicapped Children Act (renamed the Individuals with Disabilities Education Act in 1990) established the essential obligation of special education law, which is to develop a student’s individualized special education program that enables them to receive a free appropriate public education (FAPE). FAPE was defined in the federal law as special education and related services that: (a) are provided at public expense, (b) meet the standards of the state education agency, (c) include preschool, elementary, or secondary education, and (d) are provided in conformity with a student’s individualized education program (IEP). Thus, the IEP is the blueprint of an individual student’s FAPE. The importance of FAPE has been shown in the number of disputes that have arisen over the issue. In fact, 85% to 90% of all special education litigation involves disagreements over the FAPE that students receive. FAPE issues boil down to the process and content of a student’s IEP. In this article, we differentiate procedural (process) and substantive (content) violations and provide specific guidance on how to avoid both process and content errors when drafting and implementing students’ IEPs.


2008 ◽  
pp. 3067-3084
Author(s):  
John Talburt ◽  
Richard Wang ◽  
Kimberly Hess ◽  
Emily Kuo

This chapter introduces abstract algebra as a means of understanding and creating data quality metrics for entity resolution, the process in which records determined to represent the same real-world entity are successively located and merged. Entity resolution is a particular form of data mining that is foundational to a number of applications in both industry and government. Examples include commercial customer recognition systems and information sharing on “persons of interest” across federal intelligence agencies. Despite the importance of these applications, most of the data quality literature focuses on measuring the intrinsic quality of individual records rather than the quality of record grouping or integration. In this chapter, the authors describe current research into the creation and validation of quality metrics for entity resolution, primarily in the context of customer recognition systems. The approach is based on an algebraic view of the system as creating a partition of a set of entity records based on the indicative information for the entities in question. In this view, the relative quality of entity identification between two systems can be measured in terms of the similarity between the partitions they produce. The authors discuss the difficulty of applying statistical cluster analysis to this problem when the datasets are large and propose an alternative index suitable for these situations. They also report some preliminary experimental results and outline areas and approaches for further research.
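The abstract does not name the proposed index, so the sketch below shows one simple overlap-based partition-similarity measure of the kind described (an assumption, not necessarily the index from the chapter): with |A| and |B| the number of clusters in each partition and |V| the number of nonempty pairwise intersections, the ratio sqrt(|A|·|B|)/|V| equals 1 exactly when the two partitions agree and decreases as they diverge, and it is cheap to compute for large datasets:

```python
import math

def overlap_index(part_a, part_b):
    """Similarity between two partitions of the same record set,
    each given as a list of clusters (sets of record ids).
    Returns sqrt(|A|*|B|)/|V|, where |V| counts nonempty
    pairwise intersections; 1.0 means identical partitions."""
    v = sum(1 for a in part_a for b in part_b if a & b)
    return math.sqrt(len(part_a) * len(part_b)) / v

identical = [{1, 2}, {3}]
split = [{1}, {2}, {3}]
print(overlap_index(identical, identical))  # 1.0
print(overlap_index(identical, split))      # sqrt(6)/3 ≈ 0.816
```

Unlike pairwise cluster-comparison statistics, this index needs only cluster counts and intersection counts, which is what makes it attractive when pairwise enumeration over large datasets is infeasible.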


Author(s):  
B. C. Scheffers ◽  
E. C. C. Wildeboer Schut ◽  
J. A. C. Meekes ◽  
H. L. H. Cox

Algorithms ◽  
2020 ◽  
Vol 13 (5) ◽  
pp. 107 ◽  
Author(s):  
Otmane Azeroual ◽  
Włodzimierz Lewoniewski

The quality assurance of publication data in collaborative knowledge bases and in current research information systems (CRIS) is becoming increasingly relevant as freely available spatial information is used in different application scenarios. When integrating this data into a CRIS, it is necessary to be able to recognize and assess its quality. Only then is it possible to compile from the available data a result that fulfills its purpose for the user, namely to deliver reliable data and information. This paper discusses the quality problems of source metadata in Wikipedia and CRIS. Based on real data from over 40 million Wikipedia articles in various languages, we performed a preliminary quality analysis of the metadata of scientific publications using a data quality tool. To date, no data quality measurements have been programmed in Python to assess the quality of metadata from scientific publications in Wikipedia and CRIS. With this in mind, we implemented the methods and algorithms as code, presented in this paper as pseudocode, to measure quality along objective data-quality dimensions such as completeness, correctness, consistency, and timeliness. The measurements were packaged as a macro service so that users can apply the program code to their scientific-publication metadata, allowing management to rely on high-quality data when making decisions.
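Of the dimensions listed, completeness is the most straightforward to operationalize: the fraction of required metadata fields that are present and non-empty in a record. A minimal Python sketch (the field names are hypothetical, not the paper's actual schema):

```python
# Hypothetical required fields for a publication metadata record.
REQUIRED = ("title", "authors", "year", "doi")

def completeness(record):
    """Fraction of required metadata fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED if record.get(field))
    return filled / len(REQUIRED)

pub = {"title": "On Data Quality", "authors": ["A. Author"],
       "year": 2020, "doi": ""}  # missing DOI
print(completeness(pub))  # 0.75
```

Correctness, consistency, and timeliness would need reference data, cross-field rules, and timestamps respectively, so they cannot be sketched this compactly.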

