Methods to Detect Low Quality Data and Its Implication for Psychological Research

2017 ◽  
Author(s):  
Erin Michelle Buchanan ◽  
John E. Scofield

Web-based data collection methods such as Amazon's Mechanical Turk (AMT) are an appealing option to recruit participants quickly and cheaply for psychological research. While concerns regarding data quality have emerged with AMT, several studies have shown that data collected via AMT are as reliable as data from traditional college samples and are often more diverse and representative of noncollege populations. The development of methods to screen for low quality data, however, has been less explored. Omitting participants based on simple screening methods used in isolation, such as response times or attention checks, may not be adequate, because such checks cannot delineate between high- and low-effort participants. Additionally, problematic survey responses may arise from survey automation techniques such as survey bots or automated form fillers. The current project developed low quality data detection methods while overcoming previous screening limitations. Multiple checks were employed, such as page response times, distribution of survey responses, the number of utilized choices from a given range of scale options, click counts, and manipulation checks. This method was tested on a survey completed by an easily available plug-in survey bot and compared with data collected from human participants providing both high-effort and randomized, low-effort answers. Identified cases can then be used as part of sensitivity analyses to warrant exclusion from further analyses. This algorithm can be a promising tool to identify low quality or automated data via AMT or other online data collection platforms.
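
A minimal sketch of how checks of this kind might be combined into exclusion flags. This is an illustration, not the authors' published algorithm; the column names (page_rt_seconds, clicks, manipulation_check_passed, Likert items) and all cutoffs are assumptions.

```python
# Hedged sketch of multi-check screening for low quality survey data.
# Assumes a pandas DataFrame with hypothetical columns: page_rt_seconds,
# clicks, manipulation_check_passed, and Likert items scored 1-7.
import pandas as pd

def flag_low_quality(df: pd.DataFrame, likert_cols: list[str]) -> pd.DataFrame:
    responses = df[likert_cols]
    flags = pd.DataFrame(index=df.index)
    # 1. Implausibly fast page response times (threshold is an assumption).
    flags["too_fast"] = df["page_rt_seconds"] < 2 * len(likert_cols)
    # 2. Flat response distributions: near-zero variance across items.
    flags["low_variance"] = responses.var(axis=1) < 0.25
    # 3. Very few distinct scale points used across the available range.
    flags["few_options"] = responses.nunique(axis=1) <= 2
    # 4. Click counts far below the number of items (e.g., automated form fillers).
    flags["few_clicks"] = df["clicks"] < 0.5 * len(likert_cols)
    # 5. Failed manipulation/attention check.
    flags["failed_check"] = ~df["manipulation_check_passed"]
    # Flag a case if it trips two or more checks (the cutoff is an assumption).
    flags["flagged"] = flags.sum(axis=1) >= 2
    return flags
```

Flagged cases would then feed into sensitivity analyses, as the abstract describes, rather than being dropped automatically.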

2021 ◽  
Author(s):  
Victoria Leong ◽  
Kausar Raheel ◽  
Sim Jia Yi ◽  
Kriti Kacker ◽  
Vasilis M. Karlaftis ◽  
...  

Background. The global COVID-19 pandemic has triggered a fundamental reexamination of how human psychological research can be conducted both safely and robustly in a new era of digital working and physical distancing. Online web-based testing has risen to the fore as a promising solution for rapid mass collection of cognitive data without requiring human contact. However, a long-standing debate exists over the data quality and validity of web-based studies. Here, we examine the opportunities and challenges afforded by the societal shift toward web-based testing, highlight an urgent need to establish a standard data quality assurance framework for online studies, and develop and validate a new supervised online testing methodology, remote guided testing (RGT). Methods. A total of 85 healthy young adults were tested on 10 cognitive tasks assessing executive functioning (flexibility, memory, and inhibition) and learning. Tasks were administered either face-to-face in the laboratory (N=41) or online using remote guided testing (N=44), delivered using identical web-based platforms (CANTAB, Inquisit, and i-ABC). Data quality was assessed using detailed trial-level measures (missed trials, outlying and excluded responses, response times) as well as overall task performance measures. Results. Across all measures of data quality and performance, RGT data were statistically equivalent to data collected in person in the lab. Moreover, RGT participants outperformed the lab group on measured verbal intelligence, which could reflect test environment differences, including possible effects of mask-wearing on communication. Conclusions. These data suggest that the RGT methodology could help to ameliorate concerns regarding online data quality and, particularly for studies involving high-risk or rare cohorts, offer an alternative for collecting high-quality human cognitive data without requiring in-person physical attendance.
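
A claim of statistical equivalence is typically supported by an equivalence test rather than a nonsignificant difference. The sketch below uses two one-sided t tests (TOST) on a simulated data quality measure; the specific procedure, the equivalence bound, and the simulated response times are assumptions for illustration, not the authors' analysis.

```python
# Illustrative TOST equivalence test for a data quality measure
# (e.g., mean response time) in lab vs. remote guided testing groups.
import numpy as np
from scipy import stats

def tost_equivalence(lab, rgt, bound):
    """Return p-values for the two one-sided tests against +/- bound."""
    lab, rgt = np.asarray(lab), np.asarray(rgt)
    diff = lab.mean() - rgt.mean()
    se = np.sqrt(lab.var(ddof=1) / len(lab) + rgt.var(ddof=1) / len(rgt))
    dof = len(lab) + len(rgt) - 2  # simple pooled-df approximation
    p_lower = 1 - stats.t.cdf((diff + bound) / se, dof)  # H0: diff <= -bound
    p_upper = stats.t.cdf((diff - bound) / se, dof)      # H0: diff >= +bound
    return p_lower, p_upper  # equivalence concluded if both fall below alpha

# Example with simulated response times (seconds) for N=41 lab, N=44 RGT.
rng = np.random.default_rng(0)
p1, p2 = tost_equivalence(rng.normal(0.8, 0.2, 41),
                          rng.normal(0.8, 0.2, 44), bound=0.1)
print(p1, p2)
```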


2021 ◽  
Vol 111 (12) ◽  
pp. 2167-2175
Author(s):  
Stephen J. Blumberg ◽  
Jennifer D. Parker ◽  
Brian C. Moyer

High-quality data are accurate, relevant, and timely. Large national health surveys have always balanced the implementation of these quality dimensions to meet the needs of diverse users. The COVID-19 pandemic shifted these balances, with both disrupted survey operations and a critical need for relevant and timely health data for decision-making. The National Health Interview Survey (NHIS) responded to these challenges with several operational changes to continue production in 2020. However, data files from the 2020 NHIS were not expected to be publicly available until fall 2021. To fill the gap, the National Center for Health Statistics (NCHS) turned to 2 online data collection platforms—the Census Bureau’s Household Pulse Survey (HPS) and the NCHS Research and Development Survey (RANDS)—to collect COVID-19-related data more quickly. This article describes the adaptations of NHIS and the use of HPS and RANDS during the pandemic in the context of the recently released Framework for Data Quality from the Federal Committee on Statistical Methodology. (Am J Public Health. 2021;111(12):2167–2175. https://doi.org/10.2105/AJPH.2021.306516)


2017 ◽  
Vol 23 (4) ◽  
pp. 266-270 ◽  
Author(s):  
Malena Jones

This article details the use of an online survey tool to obtain information from nurse faculty, including the data collection process, the survey responses, and the advantages of and barriers to online data collection. The survey response rate indicates that online data collection is a valuable tool for nurse researchers.


2020 ◽  
Author(s):  
Brian Bauer ◽  
Kristy L. Larsen ◽  
Nicole Caulfield ◽  
Domynic Elder ◽  
Sara Jordan ◽  
...  

Our ability to make scientific progress is dependent upon our interpretation of data. Thus, analyzing only those data that are an honest representation of a sample is imperative for drawing accurate conclusions that allow for robust, generalizable, and replicable scientific findings. Unfortunately, a consistent line of evidence indicates the presence of inattentive/careless responders who provide low-quality data in surveys, especially on popular online crowdsourcing platforms such as Amazon’s Mechanical Turk (MTurk). Yet, the majority of psychological studies using surveys only conduct outlier detection analyses to remove problematic data. Without carefully examining the possibility of low-quality data in a sample, researchers risk promoting inaccurate conclusions that interfere with scientific progress. Given that knowledge about data screening methods and optimal online data collection procedures is scattered across disparate disciplines, the dearth of psychological studies using more rigorous methodologies to prevent and detect low-quality data is likely due to inconvenience, not maleficence. Thus, this review provides up-to-date recommendations for best practices in collecting online data and data screening methods. In addition, this article includes resources for worked examples for each screening method, a collection of recommended measures, and a preregistration template for implementing these recommendations.
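
Two screening indices that reviews of careless responding commonly recommend are the longstring index (the longest run of identical consecutive answers) and intra-individual response variability (IRV). The sketch below is a generic illustration under assumed column names and placeholder cutoffs, not the article's specific recommendations.

```python
# Sketch of two common careless-responding screens: longstring and IRV.
import numpy as np
import pandas as pd

def longstring(row: pd.Series) -> int:
    """Length of the longest run of identical consecutive responses."""
    values = row.to_numpy()
    longest = current = 1
    for prev, curr in zip(values[:-1], values[1:]):
        current = current + 1 if curr == prev else 1
        longest = max(longest, current)
    return longest

def screen(df: pd.DataFrame, likert_cols: list[str]) -> pd.DataFrame:
    items = df[likert_cols]
    out = pd.DataFrame(index=df.index)
    out["longstring"] = items.apply(longstring, axis=1)
    out["irv"] = items.std(axis=1, ddof=0)  # low IRV suggests straightlining
    # Cutoffs below are placeholders; real cutoffs should be preregistered.
    out["flag"] = (out["longstring"] >= len(likert_cols) // 2) | (out["irv"] < 0.3)
    return out
```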


2019 ◽  
Vol 2 (2) ◽  
pp. 107-114 ◽  
Author(s):  
Kai Sassenberg ◽  
Lara Ditrich

The debate about false positives in psychological research has led to a demand for higher statistical power. To meet this demand, researchers need to collect data from larger samples—which is important to increase replicability, but can be costly in both time and money (i.e., remuneration of participants). Given that researchers might need to compensate for these higher costs, we hypothesized that larger sample sizes might have been accompanied by more frequent use of less costly research methods (i.e., online data collection and self-report measures). To test this idea, we analyzed social psychology studies published in 2009, 2011, 2016, and 2018. Indeed, research reported in 2016 and 2018 (vs. 2009 and 2011) had larger sample sizes and relied more on online data collection and self-report measures. Thus, over these years, research improved in its statistical power, but also changed with regard to the methods applied. Implications for social psychology as a discipline are discussed.
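
The link between higher power and larger samples can be made concrete with a standard a priori power analysis. The effect size below (d = 0.3) is an assumption chosen for illustration, not a figure from the article.

```python
# Required per-group n for a two-sample t test at alpha = .05, assuming d = 0.3.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for power in (0.80, 0.90, 0.95):
    n = analysis.solve_power(effect_size=0.3, alpha=0.05, power=power)
    print(f"power={power:.2f}: n per group = {n:.0f}")
# For d = 0.3, moving from 80% to 95% power raises the required n per group
# from roughly 175 to roughly 290, which illustrates the cost pressure.
```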


2011 ◽  
Author(s):  
Carson Sandy ◽  
Samuel Gosling ◽  
Jeff Potter ◽  
Oliver John

2021 ◽  
Author(s):  
Aaron J Moss ◽  
Cheskie Rosenzweig ◽  
Shalom Noach Jaffe ◽  
Richa Gautam ◽  
Jonathan Robinson ◽  
...  

Online data collection has become indispensable to the social sciences, polling, marketing, and corporate research. However, in recent years, online data collection has been inundated with low quality data. Low quality data threaten the validity of online research and, at times, invalidate entire studies. It is often assumed that random, inconsistent, and fraudulent data in online surveys come from ‘bots.’ But little is known about whether bad data are caused by bots or by ill-intentioned or inattentive humans. We examined this issue on Mechanical Turk (MTurk), a popular online data collection platform. In the summer of 2018, researchers noticed a sharp increase in the number of data quality problems on MTurk, problems that were commonly attributed to bots. Despite this assumption, few studies have directly examined whether problematic data on MTurk come from bots or inattentive humans, even though identifying the source of bad data has important implications for creating the right solutions. Using CloudResearch’s data quality tools to identify problematic participants in 2018 and 2020, we provide evidence that many of the data quality problems on MTurk can be tied to fraudulent users from outside of the U.S. who pose as American workers. Hence, our evidence strongly suggests that the source of low quality data is real humans, not bots. We additionally present evidence that these fraudulent users are behind data quality problems on other platforms.
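
A generic illustration of the kind of check that can surface fraudulent users posing as U.S. workers: flagging submissions that share an IP address or geolocate outside the U.S. The column names (ip, country, worker_id) are hypothetical, and this is not CloudResearch's actual detection pipeline.

```python
# Sketch of flagging suspicious MTurk submissions by IP reuse and geolocation.
import pandas as pd

def flag_suspicious(submissions: pd.DataFrame) -> pd.DataFrame:
    flags = pd.DataFrame(index=submissions.index)
    # Multiple worker IDs sharing one IP can indicate a single fraudulent user.
    ip_counts = submissions.groupby("ip")["worker_id"].transform("nunique")
    flags["shared_ip"] = ip_counts > 1
    # Submissions geolocated outside the US despite a US-only qualification.
    flags["non_us"] = submissions["country"].str.upper() != "US"
    flags["suspicious"] = flags["shared_ip"] | flags["non_us"]
    return flags
```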


2020 ◽  
Vol 41 (1) ◽  
pp. 30-36
Author(s):  
Steven V. Rouse

Abstract. Previous research has supported the use of Amazon’s Mechanical Turk (MTurk) for online data collection in individual differences research. Although MTurk Masters have reached an elite status because of strong approval ratings on previous tasks (and therefore gain higher payment for their work), no research has empirically examined whether researchers actually obtain higher quality data when they require that their MTurk Workers have Master status. In two online survey studies (one using a personality test and one using a cognitive abilities test), the psychometric reliability of MTurk data was compared between a sample that required the Master qualification and a sample that placed no status-level qualification requirement. In both studies, the Master samples failed to outperform the standard samples.
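
A common index for comparing psychometric reliability across samples is Cronbach's alpha. The sketch below computes alpha for two item-response matrices; treating alpha as the reliability index and the simulated data are assumptions for illustration, not the study's exact analysis.

```python
# Sketch: compare internal-consistency reliability (Cronbach's alpha)
# between a Master sample and a standard sample (simulated data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: participants x items matrix of scale scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
# Simulated 10-item scale: item noise plus a common factor per participant.
master = rng.normal(size=(100, 10)) + rng.normal(size=(100, 1))
standard = rng.normal(size=(100, 10)) + rng.normal(size=(100, 1))
print(cronbach_alpha(master), cronbach_alpha(standard))
```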

