A Primer for Conducting Survey Research using MTurk

Author(s):  
Silvana Chambers ◽  
Kim Nimon ◽  
Paula Anthony-McMann

This paper presents best practices for conducting survey research using Amazon Mechanical Turk (MTurk). Readers will learn the benefits, limitations, and trade-offs of using MTurk as compared to other recruitment services, including SurveyMonkey and Qualtrics. A synthesis of survey design guidelines, along with a sample survey, is presented to help researchers collect the best-quality data. Techniques, including SPSS and R syntax, are provided that demonstrate how users can clean the resulting data and identify valid responses for which workers could be paid.
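
The primer's SPSS and R syntax is not reproduced in this listing. As a rough illustration of the kind of screening step it describes, the R sketch below drops duplicate workers, flags failed attention checks and implausibly fast completions, and lists the responses that would qualify for payment; the file name and column names (WorkerId, attention_check, duration_sec) are assumptions for the example, not the authors' variables.

    # Illustrative R sketch (not the authors' syntax): screen an MTurk export and
    # flag the responses that qualify for payment. File/column names are assumed.
    library(dplyr)

    raw <- read.csv("mturk_survey_export.csv", stringsAsFactors = FALSE)

    screened <- raw %>%
      distinct(WorkerId, .keep_all = TRUE) %>%   # keep one submission per worker
      mutate(
        failed_check = attention_check != 3,     # 3 = the instructed response on the attention item
        too_fast     = duration_sec < 120        # under two minutes is treated as careless here
      ) %>%
      mutate(valid = !failed_check & !too_fast)

    # Worker IDs to approve (and pay) in the MTurk requester interface
    valid_workers <- screened$WorkerId[screened$valid]
    write.csv(screened, "mturk_survey_screened.csv", row.names = FALSE)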


Crowdsourcing ◽  
2019 ◽  
pp. 410-439 ◽  
Author(s):  
Silvana Chambers ◽  
Kim Nimon

This chapter presents an introduction to crowdsourcing for survey participant recruitment. It also discusses best practices and ethical considerations for conducting survey research using Amazon Mechanical Turk (MTurk). Readers will learn the benefits, limitations, and trade-offs of using MTurk as compared to other recruitment services, including SurveyMonkey and Qualtrics. A synthesis of survey design guidelines, along with a sample survey, is presented to help researchers collect the best-quality data. Techniques, including SPSS and R syntax, are provided that demonstrate how users can clean the resulting data and identify valid responses for which workers could be paid. An overview and syntax for conducting longitudinal studies are provided as well.
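
The chapter's longitudinal syntax itself is not included in this listing; the core idea, matching responses across waves on the MTurk WorkerId so that only workers who completed both time points are analyzed, can be sketched in base R as follows (file and column names are assumed, not the chapter's syntax):

    # Minimal R sketch (assumed file/column names): merge two survey waves on
    # WorkerId to build a matched longitudinal file.
    wave1 <- read.csv("wave1_screened.csv", stringsAsFactors = FALSE)
    wave2 <- read.csv("wave2_screened.csv", stringsAsFactors = FALSE)

    # Inner join keeps only workers present in both waves; suffixes mark the wave
    matched <- merge(wave1, wave2, by = "WorkerId", suffixes = c("_t1", "_t2"))

    cat(nrow(matched), "workers completed both waves\n")
    cat(round(100 * nrow(matched) / nrow(wave1), 1), "% retention from wave 1\n")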


2021 ◽  
pp. 193896552110254
Author(s):  
Lu Lu ◽  
Nathan Neale ◽  
Nathaniel D. Line ◽  
Mark Bonn

As the use of Amazon’s Mechanical Turk (MTurk) has increased among social science researchers, so, too, has research into the merits and drawbacks of the platform. However, while many endeavors have sought to address issues such as generalizability, the attentiveness of workers, and the quality of the associated data, relatively little effort has been concentrated on integrating the various strategies that can be used to generate high-quality data from MTurk samples. Accordingly, the purpose of this research is twofold. First, existing studies are integrated into a set of strategies/best practices that can be used to maximize MTurk data quality. Second, focusing on task setup, selected platform-level strategies that have received relatively little attention in previous research are empirically tested to further strengthen the proposed best practices for MTurk usage.


2019 ◽  
Vol 51 (5) ◽  
pp. 2022-2038 ◽  
Author(s):  
Jesse Chandler ◽  
Cheskie Rosenzweig ◽  
Aaron J. Moss ◽  
Jonathan Robinson ◽  
Leib Litman

Amazon Mechanical Turk (MTurk) is widely used by behavioral scientists to recruit research participants. MTurk offers advantages over traditional student subject pools, but it also has important limitations. In particular, the MTurk population is small and potentially overused, and some groups of interest to behavioral scientists are underrepresented and difficult to recruit. Here we examined whether online research panels can avoid these limitations. Specifically, we compared sample composition, data quality (measured by effect sizes, internal reliability, and attention checks), and the non-naivete of participants recruited from MTurk and Prime Panels—an aggregate of online research panels. Prime Panels participants were more diverse in age, family composition, religiosity, education, and political attitudes. Prime Panels participants also reported less exposure to classic protocols and produced larger effect sizes, but only after screening out several participants who failed a screening task. We conclude that online research panels offer a unique opportunity for research, yet one with some important trade-offs.
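
The data-quality comparison above rests on standard quantities, internal reliability within each sample and the size of classic experimental effects, which can be computed along these lines (an illustrative R sketch with hypothetical column names, not the authors' analysis code):

    # Illustrative R sketch (hypothetical columns: source, condition, outcome, item1:item5).
    # Compares internal reliability and a standardized mean difference across samples.
    library(psych)   # alpha() for Cronbach's alpha

    dat   <- read.csv("mturk_vs_panel.csv", stringsAsFactors = FALSE)
    items <- c("item1", "item2", "item3", "item4", "item5")

    # Cronbach's alpha within each recruitment source
    alpha_mturk <- psych::alpha(dat[dat$source == "mturk", items])$total$raw_alpha
    alpha_panel <- psych::alpha(dat[dat$source == "panel", items])$total$raw_alpha

    # Pooled-SD Cohen's d for a two-condition manipulation, computed per source
    cohens_d <- function(x, y) {
      sp <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
                   (length(x) + length(y) - 2))
      (mean(x) - mean(y)) / sp
    }
    d_in <- function(d) with(d, cohens_d(outcome[condition == "treatment"],
                                         outcome[condition == "control"]))
    c(mturk = d_in(subset(dat, source == "mturk")),
      panel = d_in(subset(dat, source == "panel")))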


2021 ◽  
Vol 11 ◽  
Author(s):  
Philip Lindner ◽  
Jonas Ramnerö ◽  
Ekaterina Ivanova ◽  
Per Carlbring

Introduction: Online gambling, popular among both problem and recreational gamblers, entails both heightened addiction risks and unique opportunities for prevention and intervention. There is a need to bridge the growing literature on learning and extinction mechanisms of gambling behavior with account-tracking studies using real-life gambling data. In this study, we describe the development and validation of the Frescati Online Research Casino (FORC): a simulated online casino where games, visual themes, outcome sizes, probabilities, and other variables of interest can be experimentally manipulated to conduct behavioral analytic studies and evaluate the efficacy of responsible gambling tools.

Methods: FORC features an initial survey for self-reporting of gambling and gambling problems, along with several games resembling regular real-life casino games, designed to allow Pavlovian and instrumental learning. FORC was developed with maximum flexibility in mind, allowing detailed experiment specification by setting parameters using an online interface, including the display of messages. To allow convenient and rapid data collection from diverse samples, FORC is independently hosted yet integrated with the popular crowdsourcing platform Amazon Mechanical Turk through a reimbursement key mechanism. To validate the survey data quality and game mechanics of FORC, n = 101 participants were recruited, who answered a questionnaire on gambling habits and problems and then played both slot machine and card-draw type games. Questionnaire and trial-by-trial behavioral data were analyzed using standard psychometric tests and outcome distribution modeling.

Results: The expected associations among variables in the introductory questionnaire were found, along with good psychometric properties, suggestive of good-quality data. Only 6% of participants provided seemingly poor behavioral data. Game mechanics worked as intended: gambling outcomes showed the expected pattern of random sampling with replacement and were normally distributed around the set percentages, while balances developed according to the set return-to-player rate.

Conclusions: FORC appears to be a valid paradigm for simulating online gambling and for collecting survey and behavioral data, offering a valuable compromise between stringent experimental paradigms with lower external validity and real-world gambling account-tracking data with lower internal validity.
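
The validation logic described above, outcomes drawn by random sampling with replacement so that balances track a configured return-to-player (RTP) rate, can be illustrated with a toy R simulation (the win probability and payout below are made-up parameters, not FORC's settings):

    # Toy R simulation (made-up parameters, not FORC code): sample slot outcomes
    # with replacement and check that the realized RTP approaches the configured one.
    set.seed(1)
    bet        <- 1
    win_prob   <- 0.30                    # probability that a spin pays out
    payout     <- 3                       # multiplier returned on a winning spin
    rtp_target <- win_prob * payout       # configured RTP = 0.90

    spins    <- sample(c(0, payout), size = 10000, replace = TRUE,
                       prob = c(1 - win_prob, win_prob))
    balance  <- cumsum(spins * bet - bet) # running balance after each spin
    rtp_real <- sum(spins * bet) / (length(spins) * bet)

    cat("target RTP:", rtp_target, " realized RTP:", round(rtp_real, 3), "\n")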


2018 ◽  
Author(s):  
Wenhua Lu ◽  
Alexandra Guttentag ◽  
Brian Elbel ◽  
Kamila Kiszko ◽  
Courtney Abrams ◽  
...  

BACKGROUND: The decisions that individuals make about the food and beverage products they purchase and consume directly influence their energy intake and dietary quality and may lead to excess weight gain and obesity. However, gathering and interpreting data on food and beverage purchase patterns can be difficult. Leveraging novel sources of data on food and beverage purchase behavior can provide us with a more objective understanding of food consumption behaviors.

OBJECTIVE: Food and beverage purchase receipts often include time-stamped location information, which, when associated with product purchase details, can provide a useful behavioral measurement tool. The purpose of this study was to assess the feasibility, reliability, and validity of processing data from fast-food restaurant receipts using crowdsourcing via Amazon Mechanical Turk (MTurk).

METHODS: Between 2013 and 2014, receipts (N=12,165) from consumer purchases were collected at 60 different locations of five fast-food restaurant chains in New Jersey and New York City, USA (ie, Burger King, KFC, McDonald’s, Subway, and Wendy’s). Data containing the restaurant name, location, receipt ID, food items purchased, price, and other information were manually entered into an MS Access database and checked for accuracy by a second reviewer; this was considered the gold standard. To assess the feasibility of coding receipt data via MTurk, a prototype set of receipts (N=196) was selected. For each receipt, 5 turkers were asked to (1) identify the receipt identifier and the name of the restaurant and (2) indicate whether a beverage was listed in the receipt; if yes, they were to categorize the beverage as cold (eg, soda or energy drink) or hot (eg, coffee or tea). Interturker agreement for specific questions (eg, restaurant name and beverage inclusion) and agreement between turker consensus responses and the gold standard values in the manually entered dataset were calculated.

RESULTS: Among the 196 receipts completed by turkers, the interturker agreement was 100% (196/196) for restaurant names (eg, Burger King, McDonald’s, and Subway), 98.5% (193/196) for beverage inclusion (ie, hot, cold, or none), 92.3% (181/196) for types of hot beverage (eg, hot coffee or hot tea), and 87.2% (171/196) for types of cold beverage (eg, Coke or bottled water). When compared with the gold standard data, the agreement level was 100% (196/196) for restaurant name, 99.5% (195/196) for beverage inclusion, and 99.5% (195/196) for beverage types.

CONCLUSIONS: Our findings indicated high interrater agreement for questions across difficulty levels (eg, single- vs binary- vs multiple-choice items). Compared with traditional methods for coding receipt data, MTurk can produce excellent-quality data in a lower-cost, more time-efficient manner.
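
The agreement figures above follow from two straightforward calculations, whether all five turkers coded a receipt identically and whether the majority-vote consensus matches the manually entered gold standard, which a short R sketch can make concrete (the long-format files and column names are hypothetical):

    # Illustrative R sketch (hypothetical files/columns): per-receipt agreement among
    # five turkers and accuracy of the majority-vote consensus against the gold standard.
    codes <- read.csv("turker_codes.csv", stringsAsFactors = FALSE)   # receipt_id, turker_id, restaurant
    gold  <- read.csv("gold_standard.csv", stringsAsFactors = FALSE)  # receipt_id, restaurant

    # Full interturker agreement: all five codes for a receipt are identical
    agree_all <- tapply(codes$restaurant, codes$receipt_id,
                        function(x) length(unique(x)) == 1)
    cat("interturker agreement:", round(100 * mean(agree_all), 1), "%\n")

    # Majority-vote consensus per receipt, compared with the gold standard
    consensus <- tapply(codes$restaurant, codes$receipt_id,
                        function(x) names(sort(table(x), decreasing = TRUE))[1])
    gold_vec  <- gold$restaurant[match(names(consensus), gold$receipt_id)]
    cat("consensus vs. gold standard:", round(100 * mean(consensus == gold_vec), 1), "%\n")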


2021 ◽  
Author(s):  
David Hauser ◽  
Aaron J Moss ◽  
Cheskie Rosenzweig ◽  
Shalom Noach Jaffe ◽  
Jonathan Robinson ◽  
...  

Maintaining data quality on Amazon Mechanical Turk (MTurk) has always been a concern for researchers. CloudResearch, a third-party website that interfaces with MTurk, assessed ~100,000 MTurkers and categorized them into those that provide high-quality (~65,000; Approved) and low-quality (~35,000; Blocked) data. Here, we examined the predictive validity of CloudResearch’s vetting. Participants (N = 900) from the Approved and Blocked groups, along with a Standard MTurk sample, completed an array of data quality measures. Approved participants had better reading comprehension, reliability, honesty, and attentiveness scores, were less likely to cheat and satisfice, and replicated classic experimental effects more reliably than Blocked participants, who performed at chance on multiple outcomes. Data quality of the Standard sample was generally in between the Approved and Blocked groups. We discuss the implications of using the Approved group for scientific studies conducted on Mechanical Turk.

