Improving Data Quality Using Amazon Mechanical Turk Through Platform Setup

2021 ◽  
pp. 193896552110254
Author(s):  
Lu Lu ◽  
Nathan Neale ◽  
Nathaniel D. Line ◽  
Mark Bonn

As the use of Amazon’s Mechanical Turk (MTurk) has increased among social science researchers, so, too, has research into the merits and drawbacks of the platform. However, while many endeavors have sought to address issues such as generalizability, the attentiveness of workers, and the quality of the associated data, there has been relatively less effort concentrated on integrating the various strategies that can be used to generate high-quality data using MTurk samples. Accordingly, the purpose of this research is twofold. First, existing studies are integrated into a set of strategies/best practices that can be used to maximize MTurk data quality. Second, focusing on task setup, selected platform-level strategies that have received relatively less attention in previous research are empirically tested to further enhance the contribution of the proposed best practices for MTurk usage.

Author(s):  
Amber Chauncey Strain ◽  
Lucille M. Booker

One of the major challenges of ANLP research is the constant balancing act between the need for large samples and the excessive time and monetary resources necessary for acquiring those samples. Amazon’s Mechanical Turk (MTurk) is a web-based data collection tool that has become a premier resource for researchers who are interested in optimizing their sample sizes and minimizing costs. Due to its supportive infrastructure, diverse participant pool, quality of data, and time and cost efficiency, MTurk seems particularly suitable for ANLP researchers who are interested in gathering large, high-quality corpora in relatively short time frames. In this chapter, the authors first provide a broad description of the MTurk interface. Next, they describe the steps for acquiring IRB approval of MTurk experiments, designing experiments using the MTurk dashboard, and managing data. Finally, the chapter concludes by discussing the potential benefits and limitations of using MTurk for ANLP experimentation.


Sensors ◽  
2019 ◽  
Vol 19 (9) ◽  
pp. 1978 ◽  
Author(s):  
Argyro Mavrogiorgou ◽  
Athanasios Kiourtis ◽  
Konstantinos Perakis ◽  
Stamatios Pitsios ◽  
Dimosthenis Kyriazis

It is an undeniable fact that Internet of Things (IoT) technologies have become a milestone advancement in the digital healthcare domain: the number of IoT medical devices has grown exponentially, and it is now anticipated that by 2020 there will be over 161 million of them connected worldwide. In this era of continuous growth, IoT healthcare faces various challenges, such as the collection, quality estimation, interpretation, and harmonization of the data that derive from the existing huge amounts of heterogeneous IoT medical devices. Even though various approaches have been developed so far for solving each of these challenges, none of them proposes a holistic approach for successfully achieving data interoperability between high-quality data that derive from heterogeneous devices. For that reason, this manuscript proposes a mechanism for effectively addressing the intersection of these challenges. Through this mechanism, the datasets of the different devices are first collected and then cleaned. Subsequently, the cleaning results are used to estimate the overall data quality of each dataset, in combination with measurements of the availability and reliability of the device that produced it. Consequently, only the high-quality data is kept and translated into a common format, ready for further use. The proposed mechanism is evaluated through a specific scenario, producing reliable results, achieving data interoperability with 100% accuracy and data quality of more than 90% accuracy.
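The quality-gating step the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the metric names, weights, and threshold are all hypothetical, standing in for whatever cleaning results and device availability/reliability measurements the mechanism actually combines.

```python
# Hypothetical sketch: score each device's dataset by combining a cleaning
# result (completeness after cleaning) with the device's availability and
# reliability, then keep only datasets above a quality threshold.
# All field names, weights, and thresholds are illustrative.

def completeness(records):
    """Fraction of records with no missing fields after cleaning."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if all(v is not None for v in r.values()))
    return ok / len(records)

def quality_score(records, availability, reliability,
                  w_clean=0.5, w_avail=0.25, w_rel=0.25):
    """Weighted combination of cleaning, availability, and reliability."""
    return (w_clean * completeness(records)
            + w_avail * availability
            + w_rel * reliability)

def select_high_quality(datasets, threshold=0.9):
    """Keep only datasets whose combined score meets the threshold."""
    return {dev: data for dev, (data, avail, rel) in datasets.items()
            if quality_score(data, avail, rel) >= threshold}

datasets = {
    "pulse_oximeter": ([{"hr": 72, "spo2": 98}, {"hr": 70, "spo2": 97}], 0.99, 0.95),
    "legacy_sensor": ([{"hr": None, "spo2": 91}], 0.60, 0.50),
}
kept = select_high_quality(datasets)
print(sorted(kept))  # ['pulse_oximeter']
```

The surviving datasets would then be translated into a common format, which is where the interoperability step of the mechanism takes over.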


2021 ◽  
Author(s):  
David Hauser ◽  
Aaron J Moss ◽  
Cheskie Rosenzweig ◽  
Shalom Noach Jaffe ◽  
Jonathan Robinson ◽  
...  

Maintaining data quality on Amazon Mechanical Turk (MTurk) has always been a concern for researchers. CloudResearch, a third-party website that interfaces with MTurk, assessed ~100,000 MTurkers and categorized them into those that provide high- (~65,000, Approved) and low- (~35,000, Blocked) quality data. Here, we examined the predictive validity of CloudResearch’s vetting. Participants (N = 900) from the Approved and Blocked groups, along with a Standard MTurk sample, completed an array of data quality measures. Approved participants had better reading comprehension, reliability, honesty, and attentiveness scores, were less likely to cheat and satisfice, and replicated classic experimental effects more reliably than Blocked participants, who performed at chance on multiple outcomes. Data quality of the Standard sample was generally in between the Approved and Blocked groups. We discuss the implications of using the Approved group for scientific studies conducted on Mechanical Turk.


Forests ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 99
Author(s):  
Marieke Sandker ◽  
Oswaldo Carrillo ◽  
Chivin Leng ◽  
Donna Lee ◽  
Rémi d’Annunzio ◽  
...  

This article discusses the importance of quality deforestation area estimates for reliable and credible REDD+ monitoring and reporting. It discusses how countries can make use of global spatial tree cover change assessments, but also how considerable additional effort is required to translate these into national deforestation estimates. The article illustrates the relevance of countries’ continued efforts on improving data quality for REDD+ monitoring by looking at Mexico, Cambodia, and Ghana. The experience in these countries shows differences between deforestation areas assessed directly from maps and improved sample-based deforestation area estimates, highlighting significant changes in both the magnitude and the trend of deforestation assessed by the two methods. Forests play an important role in achieving the goals of the Paris Agreement, and therefore the ability of countries to accurately measure greenhouse gases from forests is critical. Continued efforts by countries are needed to produce credible and reliable data. Supporting countries to continually increase the quality of deforestation area estimates will also support more efficient allocation of finance that rewards REDD+ results-based payments.


Author(s):  
Silvana Chambers ◽  
Kim Nimon ◽  
Paula Anthony-McMann

This paper presents best practices for conducting survey research using Amazon Mechanical Turk (MTurk). Readers will learn the benefits, limitations, and trade-offs of using MTurk as compared to other recruitment services, including SurveyMonkey and Qualtrics. A synthesis of survey design guidelines along with a sample survey are presented to help researchers collect the best quality data. Techniques, including SPSS and R syntax, are provided that demonstrate how users can clean resulting data and identify valid responses for which workers could be paid.
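The cleaning-and-validation workflow the paper provides in SPSS and R syntax can be sketched in plain Python as well. The column names and thresholds below are illustrative, not the authors': the sketch drops duplicate worker IDs, responses that fail an attention check, and implausibly fast completions, then lists workers eligible for payment.

```python
# Hypothetical MTurk survey-cleaning sketch (field names and the
# minimum-duration threshold are illustrative): keep only the first
# submission per worker, require a passed attention check, and reject
# completion times too short to reflect genuine effort.

def clean_responses(rows, min_seconds=60):
    seen = set()
    valid = []
    for r in rows:
        if r["worker_id"] in seen:
            continue                      # duplicate submission
        seen.add(r["worker_id"])
        if r["attention_check"] != "pass":
            continue                      # failed attention check
        if r["duration_s"] < min_seconds:
            continue                      # implausibly fast completion
        valid.append(r)
    return valid

rows = [
    {"worker_id": "A1", "attention_check": "pass", "duration_s": 310},
    {"worker_id": "A1", "attention_check": "pass", "duration_s": 295},  # duplicate
    {"worker_id": "B2", "attention_check": "fail", "duration_s": 280},
    {"worker_id": "C3", "attention_check": "pass", "duration_s": 20},   # speeder
]
payable = [r["worker_id"] for r in clean_responses(rows)]
print(payable)  # ['A1']
```

The same three filters map directly onto the kind of SPSS and R syntax the paper supplies for identifying valid responses for which workers should be paid.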


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4486 ◽  
Author(s):  
Mohan Li ◽  
Yanbin Sun ◽  
Yu Jiang ◽  
Zhihong Tian

In sensor-based systems, the data of an object is often provided by multiple sources. Since the data quality of these sources might differ, when querying the observations it is necessary to carefully select the sources to make sure that high-quality data is accessed. A solution is to perform a quality evaluation in the cloud and select a set of high-quality, low-cost data sources (i.e., sensors or small sensor networks) that can answer queries. This paper studies the min-cost quality-aware query problem, which aims to find high-quality results from multiple sources at minimized cost. A measurement of query-result quality is provided, and two methods for answering min-cost quality-aware queries are proposed. How to obtain a reasonable parameter setting is also discussed. Experiments on real-life data verify that the proposed techniques are efficient and effective.
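The abstract does not spell out the paper's two query-answering methods, so as a generic illustration of the min-cost quality-aware idea, the greedy sketch below picks sources in order of best quality per unit cost until a required quality level is met. All source names, scores, costs, and the additive quality model are hypothetical assumptions, not the paper's formulation.

```python
# Illustrative greedy selection for a min-cost quality-aware query:
# choose sources with the lowest cost per unit of quality until the
# accumulated quality meets the requirement. Quality is treated as
# additive here purely for demonstration.

def min_cost_sources(sources, required_quality):
    """sources: list of (name, quality, cost) tuples.
    Returns chosen source names, or None if the requirement is unmeetable."""
    chosen, total_q = [], 0.0
    # Sort by cost per quality unit (cheapest quality first).
    for name, quality, cost in sorted(sources, key=lambda s: s[2] / s[1]):
        if total_q >= required_quality:
            break
        chosen.append(name)
        total_q += quality
    return chosen if total_q >= required_quality else None

sources = [("s1", 0.6, 2.0), ("s2", 0.5, 1.0), ("s3", 0.3, 3.0)]
print(min_cost_sources(sources, 1.0))  # ['s2', 's1']
```

A greedy ratio heuristic like this is not guaranteed optimal; the paper's own methods and parameter-setting discussion address the problem more carefully.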


2019 ◽  
Vol 53 (1) ◽  
pp. 46-50
Author(s):  
Carolyn Logan ◽  
Pablo Parás ◽  
Michael Robbins ◽  
Elizabeth J. Zechmeister

Data quality in survey research remains a paramount concern for those studying mass political behavior. Because surveys are conducted in increasingly diverse contexts around the world, ensuring that best practices are followed becomes ever more important to the field of political science. Bringing together insights from surveys conducted in more than 80 countries worldwide, this article highlights common challenges faced in survey research and outlines steps that researchers can take to improve the quality of survey data. Importantly, the article demonstrates that with the investment of the necessary time and resources, it is possible to carry out high-quality survey research even in challenging environments in which survey research is not well established.
