Bots or inattentive humans? Identifying sources of low-quality data in online platforms

2021
Authors: Aaron J Moss, Cheskie Rosenzweig, Shalom Noach Jaffe, Richa Gautam, Jonathan Robinson, ...

Online data collection has become indispensable to the social sciences, polling, marketing, and corporate research. However, in recent years online data collection has been inundated with low-quality data. Low-quality data threaten the validity of online research and, at times, invalidate entire studies. It is often assumed that random, inconsistent, and fraudulent data in online surveys come from ‘bots.’ But little is known about whether bad data are caused by bots or by ill-intentioned or inattentive humans. We examined this issue on Mechanical Turk (MTurk), a popular online data collection platform. In the summer of 2018, researchers noticed a sharp increase in data quality problems on MTurk, problems that were commonly attributed to bots. Despite this assumption, few studies have directly examined whether problematic data on MTurk come from bots or from inattentive humans, even though identifying the source of bad data has important implications for designing the right solutions. Using CloudResearch’s data quality tools to identify problematic participants in 2018 and 2020, we provide evidence that many of the data quality problems on MTurk can be tied to fraudulent users from outside the U.S. who pose as American workers. Hence, our evidence strongly suggests that the source of low-quality data is real humans, not bots. We additionally present evidence that these fraudulent users are behind data quality problems on other platforms.

2021, Vol 111 (12), pp. 2167-2175
Authors: Stephen J. Blumberg, Jennifer D. Parker, Brian C. Moyer

High-quality data are accurate, relevant, and timely. Large national health surveys have always balanced the implementation of these quality dimensions to meet the needs of diverse users. The COVID-19 pandemic shifted these balances, with both disrupted survey operations and a critical need for relevant and timely health data for decision-making. The National Health Interview Survey (NHIS) responded to these challenges with several operational changes to continue production in 2020. However, data files from the 2020 NHIS were not expected to be publicly available until fall 2021. To fill the gap, the National Center for Health Statistics (NCHS) turned to 2 online data collection platforms—the Census Bureau’s Household Pulse Survey (HPS) and the NCHS Research and Development Survey (RANDS)—to collect COVID-19‒related data more quickly. This article describes the adaptations of NHIS and the use of HPS and RANDS during the pandemic in the context of the recently released Framework for Data Quality from the Federal Committee on Statistical Methodology. (Am J Public Health. 2021;111(12):2167–2175. https://doi.org/10.2105/AJPH.2021.306516 )


2020, Vol 12 (2), pp. 29-34
Authors: K.A. Gurov, E.V. Savolova, V.Y. Yarmolovych, D.M. Ezerovych

To monitor and optimize the operation of merchant vessels, remote access to shipboard systems is being established through so-called "ship-shore" networks. This requires new, complex digital electrical metering devices, and choosing the optimal device from those currently on the market is not an easy task. This article provides an overview of the typical analyzers of electrical quantities used to build online data collection and transmission systems. It presents a comparative review of three widely used devices from different manufacturers (Phoenix Contact, Germany; ABB, Switzerland; DEIF, Denmark). The reviewed data are compiled into a table, and examples of the use of these devices on specific vessels are given.


2010, Vol 23 (2), pp. 221-265
Author: Philippe Fontaine

Argument: For more than thirty years after World War II, the unconventional economist Kenneth E. Boulding (1910–1993) was a fervent advocate of the integration of the social sciences. Building on common general principles from various fields, notably economics, political science, and sociology, Boulding claimed that an integrated social science in which mental images were recognized as the main determinant of human behavior would allow for a better understanding of society. Boulding's approach culminated in the social triangle, a view of society as composed of three main social organizers (exchange, threat, and love) combined in varying proportions. According to this view, the problems of American society were caused by an unbalanced combination of these three organizers. The goal of integrated social scientific knowledge was therefore to help policy makers achieve the “right” proportions of exchange, threat, and love that would lead to social stabilization. Though he was hopeful that cross-disciplinary exchanges would overcome the shortcomings of overly narrow specialization, Boulding found that rather than being the locus of a peaceful and mutually beneficial exchange, disciplinary boundaries were often the occasion of conflict and miscommunication.


2021
Authors: Victoria Leong, Kausar Raheel, Sim Jia Yi, Kriti Kacker, Vasilis M. Karlaftis, ...

Background. The global COVID-19 pandemic has triggered a fundamental reexamination of how human psychological research can be conducted both safely and robustly in a new era of digital working and physical distancing. Online web-based testing has risen to the fore as a promising solution for rapid mass collection of cognitive data without requiring human contact. However, a long-standing debate exists over the data quality and validity of web-based studies. Here, we examine the opportunities and challenges afforded by the societal shift toward web-based testing, highlight an urgent need to establish a standard data quality assurance framework for online studies, and develop and validate a new supervised online testing methodology, remote guided testing (RGT). Methods. A total of 85 healthy young adults were tested on 10 cognitive tasks assessing executive functioning (flexibility, memory, and inhibition) and learning. Tasks were administered either face-to-face in the laboratory (N=41) or online using remote guided testing (N=44), delivered using identical web-based platforms (CANTAB, Inquisit, and i-ABC). Data quality was assessed using detailed trial-level measures (missed trials, outlying and excluded responses, response times), as well as overall task performance measures. Results. Across all measures of data quality and performance, RGT data were statistically equivalent to data collected in person in the lab. Moreover, RGT participants outperformed the lab group on measured verbal intelligence, which could reflect differences in test environment, including possible effects of mask-wearing on communication. Conclusions. These data suggest that the RGT methodology could help to ameliorate concerns regarding online data quality and, particularly for studies involving high-risk or rare cohorts, offer an alternative for collecting high-quality human cognitive data without requiring in-person physical attendance.