Review of Best Practice Recommendations for Ensuring High Quality Data with Amazon’s Mechanical Turk

2020 ◽  
Author(s):  
Brian Bauer ◽  
Kristy L. Larsen ◽  
Nicole Caulfield ◽  
Domynic Elder ◽  
Sara Jordan ◽  
...  

Our ability to make scientific progress is dependent upon our interpretation of data. Thus, analyzing only those data that are an honest representation of a sample is imperative for drawing accurate conclusions that allow for robust, generalizable, and replicable scientific findings. Unfortunately, a consistent line of evidence indicates the presence of inattentive/careless responders who provide low-quality data in surveys, especially on popular online crowdsourcing platforms such as Amazon’s Mechanical Turk (MTurk). Yet, the majority of psychological studies using surveys conduct only outlier detection analyses to remove problematic data. Without carefully examining the possibility of low-quality data in a sample, researchers risk promoting inaccurate conclusions that interfere with scientific progress. Given that knowledge about data screening methods and optimal online data collection procedures is scattered across disparate disciplines, the dearth of psychological studies using more rigorous methodologies to prevent and detect low-quality data is likely due to inconvenience, not maleficence. Thus, this review provides up-to-date recommendations for best practices in online data collection and data screening. In addition, this article includes worked examples for each screening method, a collection of recommended measures, and a preregistration template for implementing these recommendations.
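To make the flavor of these screening methods concrete, the following minimal sketch (in Python, assuming a pandas DataFrame of Likert-type responses) flags respondents using two screens commonly discussed in this literature: a longstring index and a completion-time floor. The column names and cutoffs are illustrative assumptions, not the review's specific recommendations.

```python
# Minimal sketch of two common careless-responding screens: a longstring index
# (longest run of identical consecutive responses) and a completion-time floor.
# Column names and thresholds below are illustrative only.
import pandas as pd

def longstring(row) -> int:
    """Length of the longest run of identical consecutive responses in a row."""
    values = list(row)
    longest = run = 1
    for prev, curr in zip(values, values[1:]):
        run = run + 1 if curr == prev else 1
        longest = max(longest, run)
    return longest

def flag_careless(df: pd.DataFrame, item_cols, time_col="duration_sec",
                  max_longstring=10, min_seconds=120) -> pd.Series:
    """Return a boolean Series flagging rows that fail either screen."""
    too_repetitive = df[item_cols].apply(longstring, axis=1) > max_longstring
    too_fast = df[time_col] < min_seconds
    return too_repetitive | too_fast

# Example usage with hypothetical survey data:
# df["careless_flag"] = flag_careless(df, item_cols=[f"q{i}" for i in range(1, 21)])
```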

2020 ◽  
Vol 41 (1) ◽  
pp. 30-36
Author(s):  
Steven V. Rouse

Abstract. Previous research has supported the use of Amazon’s Mechanical Turk (MTurk) for online data collection in individual differences research. Although MTurk Masters have reached an elite status because of strong approval ratings on previous tasks (and therefore gain higher payment for their work), no research has empirically examined whether researchers actually obtain higher-quality data when they require that their MTurk Workers have Master status. In two online survey studies (one using a personality test and one using a cognitive abilities test), the psychometric reliability of MTurk data was compared between a sample that required the Master qualification type and a sample that placed no status-level qualification requirement. In both studies, the Master samples failed to outperform the standard samples.
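As an illustration of the reliability comparison described here, the sketch below computes Cronbach's alpha separately for a Masters sample and a standard sample. The DataFrames and item columns are hypothetical; the original studies' scoring procedures may differ.

```python
# Sketch of a psychometric reliability comparison: Cronbach's alpha computed
# separately for a Masters sample and a standard sample (hypothetical data).
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a respondents-by-items DataFrame."""
    items = items.dropna()
    k = items.shape[1]
    sum_item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# item_cols = [f"item{i}" for i in range(1, 11)]        # placeholder item names
# alpha_masters = cronbach_alpha(masters_df[item_cols])  # Masters-qualified sample
# alpha_standard = cronbach_alpha(standard_df[item_cols])  # no status requirement
```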


2017 ◽  
Vol 30 (1) ◽  
pp. 111-122 ◽  
Author(s):  
Steve Buchheit ◽  
Marcus M. Doxey ◽  
Troy Pollard ◽  
Shane R. Stinson

Abstract. Multiple social science researchers claim that online data collection, mainly via Amazon's Mechanical Turk (MTurk), has revolutionized the behavioral sciences (Gureckis et al. 2016; Litman, Robinson, and Abberbock 2017). While MTurk-based research has grown exponentially in recent years (Chandler and Shapiro 2016), reasonable concerns have been raised about online research participants' ability to proxy for traditional research participants (Chandler, Mueller, and Paolacci 2014). This paper reviews recent MTurk research and provides further guidance for recruiting samples of MTurk participants from populations of interest to behavioral accounting researchers. First, we provide guidance on the logistics of using MTurk and discuss the potential benefits offered by TurkPrime, a third-party service provider. Second, we discuss ways to overcome challenges related to targeted participant recruiting in an online environment. Finally, we offer suggestions for disclosures that authors may provide about their efforts to attract participants and analyze responses.
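For the targeted-recruiting logistics discussed here, a hedged sketch of setting qualification requirements through the MTurk API via boto3 follows. The system qualification type IDs and the commented create_hit call are assumptions that should be verified against the current AWS MTurk documentation before use; TurkPrime/CloudResearch offers similar controls without direct API calls.

```python
# Hedged sketch of platform-level targeting with the MTurk API via boto3.
# The qualification type IDs below are AWS system qualifications as commonly
# documented (IDs assumed; verify before use).
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

qualification_requirements = [
    {   # Restrict to US-based Workers (Worker_Locale system qualification; ID assumed).
        "QualificationTypeId": "00000000000000000071",
        "Comparator": "EqualTo",
        "LocaleValues": [{"Country": "US"}],
    },
    {   # Require a 95%+ approval rate (system qualification; ID assumed).
        "QualificationTypeId": "000000000000000000L0",
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],
    },
]

# response = mturk.create_hit(
#     Title="Short decision-making survey",
#     Description="A 10-minute academic survey.",
#     Reward="1.50",
#     MaxAssignments=100,
#     AssignmentDurationInSeconds=3600,
#     LifetimeInSeconds=86400,
#     Question=external_question_xml,  # link to a hosted survey, e.g., Qualtrics
#     QualificationRequirements=qualification_requirements,
# )
```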


2017 ◽  
Author(s):  
Erin Michelle Buchanan ◽  
John E. Scofield

Web-based data collection methods such as Amazon's Mechanical Turk (AMT) are an appealing option to recruit participants quickly and cheaply for psychological research. While concerns regarding data quality have emerged with AMT, several studies have demonstrated that data collected via AMT are as reliable as traditional college samples and are often more diverse and representative of noncollege populations. The development of methods to screen for low-quality data, however, has been less explored. Omitting participants based on simple screening methods used in isolation, such as response time or attention checks, may not be adequate, as these methods alone cannot delineate between high- and low-effort participants. Additionally, problematic survey responses may arise from survey automation techniques such as survey bots or automated form fillers. The current project developed low-quality data detection methods while overcoming previous screening limitations. Multiple checks were employed, such as page response times, the distribution of survey responses, the number of utilized choices from a given range of scale options, click counts, and manipulation checks. This method was tested on a survey completed with an easily available plug-in survey bot, as well as compared to data collected from human participants providing both high-effort and randomized (low-effort) answers. Identified cases can then be used as part of sensitivity analyses to warrant exclusion from further analyses. This algorithm can be a promising tool to identify low-quality or automated data via AMT or other online data collection platforms.
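A minimal sketch of the multi-check screening idea described here, applied to a hypothetical pandas DataFrame, follows. The column names, cutoffs, and manipulation-check answer key are placeholders rather than the authors' exact algorithm.

```python
# Sketch of multi-check screening for low-quality or automated responses.
# Column names, cutoffs, and the check answer are illustrative placeholders.
import pandas as pd

def screen_responses(df: pd.DataFrame, item_cols, page_time_cols,
                     click_cols, check_col, check_answer,
                     min_page_seconds=2.0, min_sd=0.25,
                     min_unique_options=3, min_clicks=1) -> pd.DataFrame:
    flags = pd.DataFrame(index=df.index)
    # 1. Implausibly fast page times (mean seconds per page below a floor).
    flags["fast_pages"] = df[page_time_cols].mean(axis=1) < min_page_seconds
    # 2. Near-zero variability across items suggests straight-lining.
    flags["low_variance"] = df[item_cols].std(axis=1) < min_sd
    # 3. Very few distinct scale points used across the whole survey.
    flags["few_options"] = df[item_cols].nunique(axis=1) < min_unique_options
    # 4. Too few recorded clicks, consistent with automated form filling.
    flags["low_clicks"] = df[click_cols].sum(axis=1) < min_clicks
    # 5. Failed manipulation/attention check.
    flags["failed_check"] = df[check_col] != check_answer
    flags["n_flags"] = flags.sum(axis=1)
    return flags

# flags = screen_responses(df, item_cols, page_time_cols, click_cols,
#                          check_col="attn1", check_answer=4)
# Cases with multiple flags can then be examined in sensitivity analyses.
```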


2020 ◽  
Vol 8 (4) ◽  
pp. 614-629 ◽  
Author(s):  
Ryan Kennedy ◽  
Scott Clifford ◽  
Tyler Burleigh ◽  
Philip D. Waggoner ◽  
Ryan Jewell ◽  
...  

Abstract. Amazon's Mechanical Turk is widely used for data collection; however, data quality may be declining due to the use of virtual private servers to fraudulently gain access to studies. Unfortunately, we know little about the scale and consequence of this fraud, and tools for social scientists to detect and prevent it are underdeveloped. We first analyze 38 studies and show that this fraud is not new but has increased recently. We then show that these fraudulent respondents provide particularly low-quality data and can weaken treatment effects. Finally, we provide two solutions: an easy-to-use application for identifying fraud in existing datasets and a method for blocking fraudulent respondents in Qualtrics surveys.
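The sketch below illustrates the general screening logic behind such fraud checks: flag respondents whose IP metadata indicates a hosting provider or proxy rather than a residential connection. The lookup_ip function is a placeholder for whatever IP intelligence service a researcher uses; this is not the authors' application.

```python
# Generic sketch of VPS/proxy screening on respondent IP addresses.
# `lookup_ip` is a placeholder for an IP intelligence lookup of your choice.
from typing import Callable, Dict

import pandas as pd

def flag_vps(df: pd.DataFrame, ip_col: str,
             lookup_ip: Callable[[str], Dict]) -> pd.Series:
    """Return True for rows whose IP lookup suggests a VPS or proxy."""
    def is_suspect(ip: str) -> bool:
        info = lookup_ip(ip)  # e.g., {"is_hosting": True, "country": "US"}
        return bool(info.get("is_hosting") or info.get("is_proxy"))
    return df[ip_col].map(is_suspect)

# df["vps_flag"] = flag_vps(df, ip_col="IPAddress", lookup_ip=my_lookup)
# Flagged cases can be excluded or examined in sensitivity analyses.
```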


2021 ◽  
Article 193896552110254
Author(s):  
Lu Lu ◽  
Nathan Neale ◽  
Nathaniel D. Line ◽  
Mark Bonn

As the use of Amazon’s Mechanical Turk (MTurk) has increased among social science researchers, so, too, has research into the merits and drawbacks of the platform. However, while many endeavors have sought to address issues such as generalizability, the attentiveness of workers, and the quality of the associated data, there has been relatively less effort concentrated on integrating the various strategies that can be used to generate high-quality data using MTurk samples. Accordingly, the purpose of this research is twofold. First, existing studies are integrated into a set of strategies/best practices that can be used to maximize MTurk data quality. Second, focusing on task setup, selected platform-level strategies that have received relatively less attention in previous research are empirically tested to further enhance the contribution of the proposed best practices for MTurk usage.


2018 ◽  
Vol 13 (2) ◽  
pp. 149-154 ◽  
Author(s):  
Michael D. Buhrmester ◽  
Sanaz Talaifar ◽  
Samuel D. Gosling

Over the past 2 decades, many social scientists have expanded their data-collection capabilities by using various online research tools. In the 2011 article “Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data?” in Perspectives on Psychological Science, Buhrmester, Kwang, and Gosling introduced researchers to what was then considered to be a promising but nascent research platform. Since then, thousands of social scientists from seemingly every field have conducted research using the platform. Here, we reflect on the impact of Mechanical Turk on the social sciences and our article’s role in its rise, provide the newest data-driven recommendations to help researchers effectively use the platform, and highlight other online research platforms worth consideration.


2019 ◽  
pp. 75-112
Author(s):  
James N. Stanford

This is the first of two chapters (Chapters 4 and 5) that present the results of the online data collection project using Amazon’s Mechanical Turk system. The project provides a broad-scale “bird’s eye” view of New England dialect features across large distances. This chapter examines the results from 626 speakers who audio-recorded themselves reading 12 sentences two times each. The recordings were analyzed acoustically and then modeled statistically and graphically. The results are presented in the form of maps and statistical analyses, with the goal of providing a large-scale geographic overview of modern-day patterns of New England dialect features.


2017 ◽  
Author(s):  
Nathan Seltzer

Sociologists increasingly rely on third-party internet panel platforms to acquire respondents and administer questionnaires. Yet, researchers have demonstrated that even samples sourced from well-respected and widely adopted internet platforms such as Amazon’s Mechanical Turk often fail to screen out respondents who do not meet the selection criteria requested by researchers. Here, I argue that researchers should proactively verify that third-party survey data are accurately sampled before considering them for analysis. I propose using survey “attention checks” as a methodological solution for researchers to determine whether data vendors have provided low-quality data. In this short research note, I illustrate the approach by analyzing data from a consequential political opinion poll administered on behalf of an academic polling center by a third-party internet panel vendor for a special election in 2017. By assessing valid/invalid response choices of two overlapping geographic variables, I identify irregularities in the dataset suggesting that the sample included respondents who were not within the researchers’ intended sampling frame. Attention checks provide a straightforward, inexpensive tool to improve the validity of research produced with internet-drawn samples.
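The following sketch illustrates the cross-validation idea behind this kind of attention check: compare two overlapping geographic responses (here, self-reported state and ZIP code) against a reference crosswalk of valid pairings. The column names and crosswalk are placeholders, not the note's actual variables.

```python
# Sketch of a geographic consistency check between two overlapping responses.
# The crosswalk and column names are placeholders; use a public ZIP-to-state
# crosswalk appropriate to your sampling frame.
import pandas as pd

def flag_geo_mismatch(df: pd.DataFrame, zip_to_state: dict,
                      zip_col="zip", state_col="state") -> pd.Series:
    """True when the reported state does not match the state implied by the ZIP."""
    implied_state = df[zip_col].astype(str).str.zfill(5).map(zip_to_state)
    return implied_state.notna() & (implied_state != df[state_col])

# zip_to_state = {"35004": "AL", "99501": "AK"}  # built from a public crosswalk
# df["geo_mismatch"] = flag_geo_mismatch(df, zip_to_state)
# Mismatched cases suggest respondents outside the intended sampling frame.
```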

