improve data quality
Recently Published Documents


TOTAL DOCUMENTS

76
(FIVE YEARS 25)

H-INDEX

12
(FIVE YEARS 3)

Author(s):  
Meike Klettke ◽  
Adrian Lutsch ◽  
Uta Störl

Data engineering is an integral part of any data science and ML process. It consists of several subtasks that are performed to improve data quality and to transform data into a target format suitable for analysis. The quality and correctness of the data engineering steps are therefore important to ensure the quality of the overall process. In machine learning processes, requirements such as fairness and explainability are essential. The answers to these must also be provided by the data engineering subtasks. In this article, we will show how these can be achieved by logging, monitoring and controlling the data changes in order to evaluate their correctness. However, since data preprocessing algorithms are part of any machine learning pipeline, they must obviously also guarantee that they do not produce data biases. In this article we will briefly introduce three classes of methods for measuring data changes in data engineering and present which research questions still remain unanswered in this area.


2021 ◽  
Vol 79 (10) ◽  
pp. 940-947
Author(s):  
Anne-Marie Allard ◽  
Marc Grenier ◽  
Michael Sirois ◽  
Casper Wassink

Eddy current testing (ECT) has been used for quite a while now and has been proven a reliable surface inspection technique for conductive materials. In the last 15 to 20 years, this technique has evolved toward the use of eddy current arrays (ECAs), and many applications can now benefit from this configuration to improve data quality, inspection speed, and ease of deployment, and considerably reduce operator dependency. The physics principle behind ECT and ECA is the same and was addressed in a previous issue of Materials Evaluation (Wassink et al. 2021). In this paper, we will discuss the main differences between ECT and ECA as well as how the arrangement of coils in an array can allow for optimized detection capabilities on different materials or types of defects. Common applications where ECA has demonstrated its strength will also be discussed.


Rangelands ◽  
2021 ◽  
Author(s):  
Sarah E. McCord ◽  
Justin L. Welty ◽  
Jennifer Courtwright ◽  
Catherine Dillon ◽  
Alex Traynor ◽  
...  

Author(s):  
Nick Berrow ◽  
Ario de Marco ◽  
Mario Lebendiker ◽  
Maria Garcia-Alai ◽  
Stefan H. Knauer ◽  
...  

2021 ◽  
Vol 8 (2) ◽  
pp. 205316802110169
Author(s):  
William O’Brochta ◽  
Sunita Parikh

What can researchers do to address anomalous survey and experimental responses on Amazon Mechanical Turk (MTurk)? Much of the anomalous response problem has been traced to India, and several survey and technological techniques have been developed to detect foreign workers accessing US-specific surveys. We survey Indian MTurkers and find that 26% pass survey questions used to detect foreign workers, and 3% claim to be located in the United States. We show that restricting respondents to Master Workers and removing the US location requirement encourages Indian MTurkers to correctly self-report their location, helping to reduce anomalous responses among US respondents and to improve data quality. Based on these results, we outline key considerations for researchers seeking to maximize data quality while keeping costs low.


2021 ◽  
Vol 20 ◽  
pp. 160940692110661
Author(s):  
James Rowlands

Interviewee Transcript Review (ITR), a form of Respondent Validation, is a way to share and check interview transcripts with research participants. To date, the literature has considered how these practices affect data quality, focused on the ability of a participant to correct, add or remove data. Less considered is the extent to which ITR might enable sensitive research. Reporting on research examining the experiences and perspectives of different stakeholders involved in Domestic Homicide Reviews, 40 participants who took part in semi-structured interviews were offered the opportunity to review their transcripts. This paper contributes to the understanding of the use of ITR, demonstrating how it can be used to increase participant confidence to provide assurance about, and indeed active involvement in, the steps being taken to preserve their anonymity.

