The shape of and solutions to the MTurk quality crisis

AbstractAmazon's Mechanical Turk is widely used for data collection; however, data quality may be declining due to the use of virtual private servers to fraudulently gain access to studies. Unfortunately, we know little about the scale and consequence of this fraud, and tools for social scientists to detect and prevent this fraud are underdeveloped. We first analyze 38 studies and show that this fraud is not new, but has increased recently. We then show that these fraudulent respondents provide particularly low-quality data and can weaken treatment effects. Finally, we provide two solutions: an easy-to-use application for identifying fraud in the existing datasets and a method for blocking fraudulent respondents in Qualtrics surveys.

Download Full-text

Improving Data Quality Using Amazon Mechanical Turk Through Platform Setup

Cornell Hospitality Quarterly ◽

10.1177/19389655211025475 ◽

2021 ◽

pp. 193896552110254

Author(s):

Lu Lu ◽

Nathan Neale ◽

Nathaniel D. Line ◽

Mark Bonn

Keyword(s):

Best Practices ◽

Data Quality ◽

Mechanical Turk ◽

Amazon Mechanical Turk ◽

Quality Data ◽

High Quality ◽

High Quality Data ◽

Amazon's Mechanical Turk ◽

Associated Data

As the use of Amazon’s Mechanical Turk (MTurk) has increased among social science researchers, so, too, has research into the merits and drawbacks of the platform. However, while many endeavors have sought to address issues such as generalizability, the attentiveness of workers, and the quality of the associated data, there has been relatively less effort concentrated on integrating the various strategies that can be used to generate high-quality data using MTurk samples. Accordingly, the purpose of this research is twofold. First, existing studies are integrated into a set of strategies/best practices that can be used to maximize MTurk data quality. Second, focusing on task setup, selected platform-level strategies that have received relatively less attention in previous research are empirically tested to further enhance the contribution of the proposed best practices for MTurk usage.

Download Full-text

An Evaluation of Amazon’s Mechanical Turk, Its Rapid Rise, and Its Effective Use

Perspectives on Psychological Science ◽

10.1177/1745691617706516 ◽

2018 ◽

Vol 13 (2) ◽

pp. 149-154 ◽

Cited By ~ 144

Author(s):

Michael D. Buhrmester ◽

Sanaz Talaifar ◽

Samuel D. Gosling

Keyword(s):

Mechanical Turk ◽

Quality Data ◽

Online Research ◽

Social Scientists ◽

High Quality Data ◽

The Social ◽

Research Platform ◽

Amazon's Mechanical Turk ◽

Effective Use ◽

The Impact

Over the past 2 decades, many social scientists have expanded their data-collection capabilities by using various online research tools. In the 2011 article “Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data?” in Perspectives on Psychological Science, Buhrmester, Kwang, and Gosling introduced researchers to what was then considered to be a promising but nascent research platform. Since then, thousands of social scientists from seemingly every field have conducted research using the platform. Here, we reflect on the impact of Mechanical Turk on the social sciences and our article’s role in its rise, provide the newest data-driven recommendations to help researchers effectively use the platform, and highlight other online research platforms worth consideration.

Download Full-text

Abstract WP389: Understanding the Clinicians’ Experiences in Collecting Stroke Rehabilitation Intensity Data Within Ontario

Stroke ◽

10.1161/str.47.suppl_1.wp389 ◽

2016 ◽

Vol 47 (suppl_1) ◽

Author(s):

Elizabeth Linkewich ◽

Janine Theben ◽

Amy Maebrae-Waller ◽

Shelley Huffman ◽

Jenn Fearn ◽

...

Keyword(s):

Data Collection ◽

Data Quality ◽

Thematic Analysis ◽

Best Practice ◽

Practice Change ◽

Quality Data ◽

Electronic Survey ◽

Workload Measurement ◽

Survey Results ◽

Patients Experience

Background and Issues: The collection and reporting of Rehabilitation Intensity (RI) in a national rehabilitation database was mandated on April 1, 2015 for all stroke patients within Ontario, to support evaluation of stroke best practice implementation. RI includes minutes of direct task-specific therapy patients experience per day. This requires a shift in thinking from capturing the clinician’s time spent in therapy to the patient perspective. To ensure that high quality data is collected, it was important to understand clinicians’ experiences in collecting RI data. Purpose: To identify enablers and barriers to RI data collection in order to inform the development of resources to support clinicians. Methods: A 12-item electronic survey was developed by an Ontario Stroke Network (OSN) task group to evaluate the clinician experience of RI data collection (including: demographics, barriers, enablers, education, resources, and practice change). The survey was distributed via SurveyMonkey® and sent to clinicians from 48 hospitals, 3 weeks post implementation of RI data collection. Analyses involved descriptive statistics and thematic analysis. Results: Three hundred and twenty-one clinicians from 47 hospitals responded to the survey. Survey results suggest RI data collection is feasible; seventy-one percent of clinicians report it takes 10 minutes or less to enter RI data. Thematic analysis identified: 5 common challenges with most frequently reported relating to data quality, 30% (N=358) and 6 common enablers with most frequently reported relating to ease of collecting RI data through workload measurement systems, 50% (N=46). Suggestions for educational resources included tools for identifying what is included in RI and the provision of education (e.g. webinars). Conclusions: RI data collection is feasible for clinicians. Education and resources developed should support key challenges and enablers identified by clinicians - to enhance data quality and the consistency of RI collection. As RI data fields are available through a national rehabilitation database, this work sets the foundation for other provinces interested in the systematic collection and reporting of RI data.

Download Full-text

The Number of Choice Tasks and Survey Satisficing in Conjoint Experiments

Political Analysis ◽

10.1017/pan.2017.40 ◽

2018 ◽

Vol 26 (1) ◽

pp. 112-119 ◽

Cited By ~ 24

Author(s):

Kirk Bansak ◽

Jens Hainmueller ◽

Daniel J. Hopkins ◽

Teppei Yamamoto

Keyword(s):

Decision Making ◽

Survey Sampling ◽

Mechanical Turk ◽

Social Scientists ◽

Response Quality ◽

Amazon's Mechanical Turk ◽

Survey Designs ◽

Choice Tasks

In recent years, political and social scientists have made increasing use of conjoint survey designs to study decision-making. Here, we study a consequential question which researchers confront when implementing conjoint designs: How many choice tasks can respondents perform before survey satisficing degrades response quality? To answer the question, we run a set of experiments where respondents are asked to complete as many as 30 conjoint tasks. Experiments conducted through Amazon’s Mechanical Turk and Survey Sampling International demonstrate the surprising robustness of conjoint designs, as there are detectable but quite limited increases in survey satisficing as the number of tasks increases. Our evidence suggests that in similar study contexts researchers can assign dozens of tasks without substantial declines in response quality.

Download Full-text

Is Mechanical Turk the Answer to Our Sampling Woes?

Industrial and Organizational Psychology ◽

10.1017/iop.2015.130 ◽

2016 ◽

Vol 9 (1) ◽

pp. 162-167 ◽

Cited By ~ 12

Author(s):

Melissa G. Keith ◽

Peter D. Harms

Keyword(s):

New Technologies ◽

Human Subjects ◽

Mechanical Turk ◽

Working Population ◽

Sample Characteristics ◽

Catch Up ◽

Amazon's Mechanical Turk ◽

Gain Access ◽

New Jobs ◽

Internet Panels

Although we share Bergman and Jean's (2016) concerns about the representativeness of samples in the organizational sciences, we are mindful of the ever changing nature of the job market. New jobs are created from technological innovation while others become obsolete and disappear or are functionally transformed. These shifts in employment patterns produce both opportunities and challenges for organizational researchers addressing the problem of the representativeness in our working population samples. On one hand, it is understood that whatever we do, we will always be playing catch-up with the market. On the other hand, it is possible that we can leverage new technologies in order to react to such changes more quickly. As an example, in Bergman and Jean's commentary, they suggested making use of crowdsourcing websites or Internet panels in order to gain access to undersampled populations. Although we agree there is an opportunity to conduct much research of interest to organizational scholars in these settings, we also would point out that these types of samples come with their own sampling challenges. To illustrate these challenges, we examine sampling issues for Amazon's Mechanical Turk (MTurk), which is currently the most used portal for psychologists and organizational scholars collecting human subjects data online. Specifically, we examine whether MTurk workers are “workers” as defined by Bergman and Jean, whether MTurk samples are WEIRD (Western, educated, industrialized, rich, and democratic; Henrich, Heine, & Norenzayan, 2010), and how researchers may creatively utilize the sample characteristics.

Download Full-text

Obtaining quality data using behavioral measures of impulsivity in gambling research with Amazon’s Mechanical Turk

Journal of Behavioral Addictions ◽

10.1556/2006.7.2018.117 ◽

2018 ◽

Vol 7 (4) ◽

pp. 1122-1131 ◽

Cited By ~ 5

Author(s):

Magdalen G. Schluter ◽

Hyoun S. Kim ◽

David C. Hodgins

Keyword(s):

Mechanical Turk ◽

Quality Data ◽

Behavioral Measures ◽

Amazon's Mechanical Turk

Download Full-text

Review of Best Practice Recommendations for Ensuring High Quality Data with Amazon’s Mechanical Turk

10.31234/osf.io/m78sf ◽

2020 ◽

Author(s):

Brian Bauer ◽

Kristy L. Larsen ◽

Nicole Caulfield ◽

Domynic Elder ◽

Sara Jordan ◽

...

Keyword(s):

Screening Method ◽

Scientific Progress ◽

Worked Examples ◽

Mechanical Turk ◽

Quality Data ◽

Screening Methods ◽

Online Data ◽

Amazon's Mechanical Turk ◽

Data Screening ◽

Psychological Studies

Our ability to make scientific progress is dependent upon our interpretation of data. Thus, analyzing only those data that are an honest representation of a sample is imperative for drawing accurate conclusions that allow for robust, generalizable, and replicable scientific findings. Unfortunately, a consistent line of evidence indicates the presence of inattentive/careless responders who provide low-quality data in surveys, especially on popular online crowdsourcing platforms such as Amazon’s Mechanical Turk (MTurk). Yet, the majority of psychological studies using surveys only conduct outlier detection analyses to remove problematic data. Without carefully examining the possibility of low-quality data in a sample, researchers risk promoting inaccurate conclusions that interfere with scientific progress. Given that knowledge about data screening methods and optimal online data collection procedures are scattered across disparate disciplines, the dearth of psychological studies using more rigorous methodologies to prevent and detect low-quality data is likely due to inconvenience, not maleficence. Thus, this review provides up-to-date recommendations for best practices in collecting online data and data screening methods. In addition, this article includes resources for worked examples for each screening method, a collection of recommended measures, and a preregistration template for implementing these recommendations.

Download Full-text

Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality data?

Methodological issues and strategies in clinical research (4th ed.). ◽

10.1037/14805-009 ◽

2015 ◽

pp. 133-139 ◽

Cited By ~ 32

Author(s):

Michael Buhrmester ◽

Tracy Kwang ◽

Samuel D. Gosling

Keyword(s):

Mechanical Turk ◽

Quality Data ◽

High Quality ◽

High Quality Data ◽

Amazon's Mechanical Turk

Download Full-text

Are Interviews Costing £0.08 a Waste of Money? Reviewing Google Surveys for ‘Wisdom of the Crowd’ Projects

International Journal of Market Research ◽

10.2501/ijmr-2017-015 ◽

2017 ◽

Vol 59 (2) ◽

pp. 199-220

Author(s):

G.W. Roughton ◽

Iain Mackay

Keyword(s):

Data Collection ◽

Data Quality ◽

Survey Data ◽

Response Rates ◽

Quality Data ◽

Collection Costs

This paper investigates whether a ‘wisdom of the crowd’ approach might offer an alternative to recent political polls that have raised questions about survey data quality. Data collection costs have become so low that, as well as the question of data quality, concerns have also been raised about low response rates, professional respondents and respondent interaction. There are also uncertainties about self-selecting ‘samples’. This paper looks at more than 100 such surveys and reports that, in five out of the six cases discussed, £0.08p interviews delivered results in line with known outcomes. The results discussed in the paper show that such interviews are not a waste of money.

Download Full-text

EIGER detector: application in macromolecular crystallography

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798316012304 ◽

2016 ◽

Vol 72 (9) ◽

pp. 1036-1048 ◽

Cited By ~ 71

Author(s):

Arnau Casanas ◽

Rangana Warshamanage ◽

Aaron D. Finke ◽

Ezequiel Panepucci ◽

Vincent Olieric ◽

...

Keyword(s):

Data Collection ◽

Data Quality ◽

Data Acquisition ◽

Dead Time ◽

Single Photon ◽

Photon Counting ◽

Quality Data ◽

Macromolecular Crystallography ◽

Single Photon Counting ◽

Optimal Rotation

The development of single-photon-counting detectors, such as the PILATUS, has been a major recent breakthrough in macromolecular crystallography, enabling noise-free detection and novel data-acquisition modes. The new EIGER detector features a pixel size of 75 × 75 µm, frame rates of up to 3000 Hz and a dead time as low as 3.8 µs. An EIGER 1M and EIGER 16M were tested on Swiss Light Source beamlines X10SA and X06SA for their application in macromolecular crystallography. The combination of fast frame rates and a very short dead time allows high-quality data acquisition in a shorter time. The ultrafine φ-slicing data-collection method is introduced and validated and its application in finding the optimal rotation angle, a suitable rotation speed and a sufficient X-ray dose are presented. An improvement of the data quality up to slicing at one tenth of the mosaicity has been observed, which is much finer than expected based on previous findings. The influence of key data-collection parameters on data quality is discussed.

Download Full-text