label aggregation
Recently Published Documents

TOTAL DOCUMENTS: 34 (five years: 20)
H-INDEX: 5 (five years: 2)

2022 · Vol 16 (2) · pp. 1-18
Author(s): Hanlu Wu, Tengfei Ma, Lingfei Wu, Fangli Xu, Shouling Ji

Crowdsourcing has attracted much attention for its convenience in collecting labels from non-expert workers instead of experts. However, because labels from non-experts are highly noisy, a label aggregation model that infers the true label from noisy crowdsourced labels is required. In this article, we propose a novel framework based on graph neural networks for aggregating crowd labels. We construct a heterogeneous graph between workers and tasks and derive a new graph neural network to learn the representations of nodes and the true labels. In addition, we exploit the unknown latent interactions among nodes of the same type (workers or tasks) by adding a homogeneous attention layer to the graph neural network. Experimental results on 13 real-world datasets show superior performance over state-of-the-art models.
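A minimal sketch of the worker-task graph idea (not the authors' architecture; all function and variable names are illustrative): build a bipartite structure from the crowd labels, pass task-level averages back to workers as crude reliability weights, and take a weighted vote.

```python
# Illustrative sketch only: represent crowd labels as a worker-task
# bipartite graph and do one round of mean-aggregation "message passing"
# before a weighted vote. Not the paper's model.
import numpy as np

def aggregate_on_bipartite_graph(labels, n_workers, n_tasks, n_classes):
    """labels: list of (worker_id, task_id, label) triples."""
    # One-hot label tensor per (worker, task) edge, plus an edge mask.
    edge_feat = np.zeros((n_workers, n_tasks, n_classes))
    mask = np.zeros((n_workers, n_tasks), dtype=bool)
    for w, t, y in labels:
        edge_feat[w, t, y] = 1.0
        mask[w, t] = True

    # Task "message": mean of the one-hot labels the task received.
    counts = mask.sum(axis=0, keepdims=True).clip(min=1)   # 1 x T
    task_repr = edge_feat.sum(axis=0) / counts.T            # T x C

    # Worker "message": how strongly each worker agrees with the task
    # means, used as a simple reliability weight.
    agree = np.zeros(n_workers)
    given = mask.sum(axis=1).clip(min=1)
    for w, t, y in labels:
        agree[w] += task_repr[t, y]
    weight = agree / given

    # Final aggregated label per task: reliability-weighted vote.
    score = np.zeros((n_tasks, n_classes))
    for w, t, y in labels:
        score[t, y] += weight[w]
    return score.argmax(axis=1)
```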


2022 · Vol 73 · pp. 209-229
Author(s): Chong Liu, Yu-Xiang Wang

Large-scale labeled datasets are the indispensable fuel that ignites the AI revolution as we see it today. Most such datasets are constructed using crowdsourcing services such as Amazon Mechanical Turk, which provide noisy labels from non-experts at a fair price. The sheer size of such datasets means it is only feasible to collect a few labels per data point. We formulate the problem of test-time label aggregation as a statistical estimation problem of inferring the expected voting score. By imitating workers with supervised learners and using them in a doubly robust estimation framework, we prove that the variance of the estimate can be substantially reduced, even if the learner is a poor approximation. Synthetic and real-world experiments show that by combining the doubly robust approach with adaptive worker/item selection rules, we often need a much lower labeling cost to achieve nearly the same accuracy as in the ideal world where all workers label all data points.
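A hedged sketch of the doubly robust idea for a single item, assuming the learner's per-worker predictions and the query probabilities are known; the notation and the uniform-propensity example are assumptions of this sketch, not the paper's implementation.

```python
# Doubly robust estimate of the expected voting score for one item:
# combine a learner's imitation of every worker with an inverse-propensity
# correction on the workers that were actually queried.
import numpy as np

def doubly_robust_vote_score(pred, observed, propensity):
    """
    pred[w]:       learner's predicted vote (e.g., P(label = 1)) imitating
                   worker w on this item.
    observed[w]:   the worker's actual vote, or np.nan if not queried.
    propensity[w]: probability that worker w was queried for this item.
    Returns an estimate of the mean vote over all workers.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(observed, dtype=float)
    prop = np.asarray(propensity, dtype=float)

    queried = ~np.isnan(obs)
    correction = np.zeros_like(pred)
    # Only queried workers contribute the correction (y_w - f_w) / pi_w.
    correction[queried] = (obs[queried] - pred[queried]) / prop[queried]
    return float(np.mean(pred + correction))

# Example: 5 workers, 2 of them queried, uniform query probability 0.4.
score = doubly_robust_vote_score(
    pred=[0.8, 0.6, 0.7, 0.9, 0.5],
    observed=[1.0, np.nan, 0.0, np.nan, np.nan],
    propensity=[0.4, 0.4, 0.4, 0.4, 0.4],
)
```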


Author(s): Meric Altug Gemalmaz, Ming Yin

Collecting large-scale human-annotated datasets via crowdsourcing to train and improve automated models is a prominent human-in-the-loop approach to integrating human and machine intelligence. However, along with their unique intelligence, humans also bring their biases and subjective beliefs, which may degrade the quality of the annotated data and reduce the effectiveness of human-in-the-loop systems. One of the most common cognitive biases that humans are subject to is confirmation bias: people's tendency to favor information that confirms their existing beliefs and values. In this paper, we present an algorithmic approach to infer the correct answers of tasks by aggregating annotations from multiple crowd workers while taking workers' varying levels of confirmation bias into consideration. Evaluations on real-world crowd annotations show that the proposed bias-aware label aggregation algorithm outperforms baseline methods in accurately inferring the ground-truth labels of different tasks when crowd workers indeed exhibit some degree of confirmation bias. Through simulations on synthetic data, we further identify the conditions under which the proposed algorithm has the largest advantage over baseline methods.
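The paper's bias model is not reproduced here; the sketch below only illustrates the general shape of bias-aware aggregation, alternating between a weighted vote and worker weights that are discounted when a worker's answers merely echo an assumed prior belief. All names and the discounting rule are assumptions of this sketch.

```python
# Illustrative sketch only (not the paper's algorithm): aggregate binary
# crowd labels while discounting workers whose answers track an assumed
# "prior belief" signal more than the emerging consensus does.
import numpy as np

def bias_aware_aggregate(labels, prior_belief, n_workers, n_tasks, n_iter=10):
    """
    labels:       (worker, task, label) triples with label in {0, 1}.
    prior_belief: length-n_tasks array of the belief-consistent answer
                  for each task (an assumption of this sketch).
    """
    weight = np.ones(n_workers)
    estimate = np.zeros(n_tasks)
    for _ in range(n_iter):
        # Weighted vote for each task's label.
        num = np.zeros(n_tasks)
        den = np.zeros(n_tasks)
        for w, t, y in labels:
            num[t] += weight[w] * y
            den[t] += weight[w]
        estimate = (num / np.maximum(den, 1e-9) > 0.5).astype(float)

        # Re-estimate workers: accuracy against the consensus, discounted
        # by how often they echo the prior belief when it disagrees with
        # the consensus (a crude stand-in for confirmation bias).
        correct = np.zeros(n_workers)
        echo = np.zeros(n_workers)
        count = np.zeros(n_workers)
        for w, t, y in labels:
            count[w] += 1
            correct[w] += float(y == estimate[t])
            if prior_belief[t] != estimate[t]:
                echo[w] += float(y == prior_belief[t])
        count = np.maximum(count, 1)
        weight = np.clip(correct / count - echo / count, 0.05, 1.0)
    return estimate
```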


Author(s): Jiacheng Liu, Feilong Tang, Long Chen, Yanmin Zhu

Author(s): Chi Hong, Amirmasoud Ghiassi, Yichi Zhou, Robert Birke, Lydia Y. Chen

Mathematics · 2021 · Vol 9 (8) · 875
Author(s): Jesus Cerquides, Mehmet Oğuz Mülâyim, Jerónimo Hernández-González, Amudha Ravi Shankar, Jose Luis Fernandez-Marquez

Over the last decade, hundreds of thousands of volunteers have contributed to science by collecting or analyzing data. This public participation in science, also known as citizen science, has contributed to significant discoveries and led to publications in major scientific journals. However, little attention has been paid to data quality issues. In this work we argue that determining the accuracy of data obtained by crowdsourcing is a fundamental question, and we point out that, for many real-life scenarios, mathematical tools and processes for evaluating data quality are missing. We propose a probabilistic methodology for evaluating the accuracy of labeling data obtained by crowdsourcing in citizen science. The methodology builds on an abstract probabilistic graphical model formalism, which is shown to generalize several existing label aggregation models. We show how to make practical use of the methodology through a comparison of data obtained from different citizen science communities analyzing the 2019 Albania earthquake.
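One classic label aggregation model of the kind such a formalism would generalize is Dawid-Skene; a textbook EM sketch (not the paper's methodology) is shown below, with illustrative variable names.

```python
# Dawid-Skene label aggregation via EM: estimate per-worker confusion
# matrices and per-task label posteriors from crowd labels. Textbook
# sketch, not the paper's probabilistic graphical model formalism.
import numpy as np

def dawid_skene(labels, n_workers, n_tasks, n_classes, n_iter=20):
    """labels: (worker, task, label) triples; returns per-task posteriors."""
    # Initialize task posteriors with a smoothed majority vote.
    post = np.full((n_tasks, n_classes), 1e-2)
    for w, t, y in labels:
        post[t, y] += 1.0
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: per-worker confusion matrices and overall class priors.
        conf = np.full((n_workers, n_classes, n_classes), 1e-2)
        for w, t, y in labels:
            conf[w, :, y] += post[t]
        conf /= conf.sum(axis=2, keepdims=True)
        prior = post.mean(axis=0)

        # E-step: recompute task posteriors given the confusion matrices.
        log_post = np.tile(np.log(prior), (n_tasks, 1))
        for w, t, y in labels:
            log_post[t] += np.log(conf[w, :, y])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return post
```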


2021 · pp. 176-185
Author(s): Jiyi Li, Lucas Ryo Endo, Hisashi Kashima