Aggregating Crowd Wisdoms with Label-aware Autoencoders

Author(s):  
Li'ang Yin ◽  
Jianhua Han ◽  
Weinan Zhang ◽  
Yong Yu

Aggregating crowd wisdoms takes multiple labels from various sources and infers true labels for objects. Recent research has made progress by learning source credibility from data, and roughly falls into three kinds of modeling frameworks: weighted majority voting, trust propagation, and generative models. In this paper, we propose a novel framework named Label-Aware Autoencoders (LAA) to aggregate crowd wisdoms. LAA integrates a classifier and a reconstructor into a unified model to infer labels in an unsupervised manner. By analogy with classical autoencoders, we can regard the classifier as an encoder, the reconstructor as a decoder, and the inferred labels as latent features. To the best of our knowledge, this is the first attempt to combine label aggregation with autoencoders. We adopt networks to implement the classifier and the reconstructor, which have the potential to automatically learn underlying patterns of source credibility. To further improve inference accuracy, we introduce object ambiguity and latent aspects into LAA. Experiments on three real-world datasets show that the proposed models achieve an impressive improvement in inference accuracy over state-of-the-art models.

2022 ◽  
Vol 16 (2) ◽  
pp. 1-18
Author(s):  
Hanlu Wu ◽  
Tengfei Ma ◽  
Lingfei Wu ◽  
Fangli Xu ◽  
Shouling Ji

Crowdsourcing has attracted much attention for the convenience of collecting labels from non-expert workers instead of experts. However, due to the high level of noise from non-experts, a label aggregation model that infers the true label from noisy crowdsourced labels is required. In this article, we propose a novel framework based on graph neural networks for aggregating crowd labels. We construct a heterogeneous graph between workers and tasks and derive a new graph neural network to learn the representations of nodes and the true labels. In addition, we exploit the unknown latent interaction between nodes of the same type (workers or tasks) by adding a homogeneous attention layer to the graph neural network. Experimental results on 13 real-world datasets show superior performance over state-of-the-art models.


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258410
Author(s):  
Xintao Ma ◽  
Liyan Dong ◽  
Yuequn Wang ◽  
Yongli Li ◽  
Hao Zhang

To alleviate the data sparsity and cold start problems of collaborative filtering in recommendation systems, researchers usually leverage side information to improve recommendation performance. Using a knowledge graph treats the side information as part of the graph structure and provides explanations for recommendation results. In this paper, we propose an enhanced multi-task neighborhood interaction (MNI) model for recommendation on knowledge graphs. MNI explores not only the user-item interaction but also the neighbor-neighbor interactions, capturing a more sophisticated local structure. In addition, the entities and relations are semantically embedded. With the cross&compress unit, items in the recommendation system and entities in the knowledge graph can share latent features, and thus high-order interactions can be investigated. Through extensive experiments on real-world datasets, we demonstrate that MNI outperforms some of the state-of-the-art baselines for both CTR prediction and top-N recommendation.


Author(s):  
Kaixuan Chen ◽  
Lina Yao ◽  
Dalin Zhang ◽  
Xiaojun Chang ◽  
Guodong Long ◽  
...  

Semi-supervised learning is crucial for alleviating labelling burdens in people-centric sensing. However, human-generated data inherently suffer from distribution shift in semi-supervised learning due to the diverse biological conditions and behavior patterns of humans. To address this problem, we propose a generic distributionally robust model for semi-supervised learning on distributionally shifted data. Considering both the discrepancy and the consistency between the labeled data and the unlabeled data, we learn latent features that reduce person-specific discrepancy and preserve task-specific consistency. We evaluate our model on a variety of people-centric recognition tasks on real-world datasets, including intention recognition, activity recognition, muscular movement recognition, and gesture recognition. The experimental results demonstrate that the proposed model outperforms the state-of-the-art methods.


Author(s):  
Yasushi Kawase ◽  
Yuko Kuroki ◽  
Atsushi Miyauchi

Aggregating responses from crowd workers is a fundamental task in the process of crowdsourcing. In cases where a few experts are overwhelmed by a large number of non-experts, most answer aggregation algorithms, such as majority voting, fail to identify the correct answers. Therefore, it is crucial to extract reliable experts from the crowd workers. In this study, we introduce the notion of "expert core", which is a set of workers that is very unlikely to contain a non-expert. We design an efficient graph-mining-based algorithm that exactly computes the expert core. For the answer aggregation task, we propose two types of algorithms. The first incorporates the expert core into existing answer aggregation algorithms such as majority voting, whereas the second utilizes information about worker reliability provided by the expert core extraction algorithm. We then give a theoretical justification for the first type of algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our proposed answer aggregation algorithms outperform state-of-the-art algorithms.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
João Lobo ◽  
Rui Henriques ◽  
Sara C. Madeira

Background: Three-way data have gained popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of providing the ground truth (triclustering solution) as output.
Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters.
Conclusions: Triclustering evaluation using G-Tric makes it possible to combine both intrinsic and extrinsic metrics to compare solutions, yielding more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1407
Author(s):  
Peng Wang ◽  
Jing Zhou ◽  
Yuzhang Liu ◽  
Xingchen Zhou

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is used to map the head entity and tail entity to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
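
As a rough illustration of what a translation-based score function looks like, the sketch below implements the generic TransE-style score, the negative distance between head + relation and tail. This is an assumption for illustration only: the abstract does not give TransET's exact score function, and the function name and toy two-dimensional embeddings are hypothetical.

```python
import math

def translation_score(head, relation, tail):
    """Generic TransE-style score (an assumption, not TransET's exact form).

    Returns the negative L2 distance between (head + relation) and tail,
    so a higher score means a more plausible triple.
    """
    squared = sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    return -math.sqrt(squared)

# Hypothetical toy embeddings in two dimensions.
head = [1.0, 0.0]
relation = [0.0, 1.0]
true_tail = [1.0, 1.0]       # head + relation lands exactly on this tail
corrupted_tail = [3.0, 1.0]  # a corrupted triple should score lower

plausible = translation_score(head, relation, true_tail)
implausible = translation_score(head, relation, corrupted_tail)
```

In link prediction, candidate tails would be ranked by this score; in triple classification, a threshold on the score would separate true from false triples.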


Author(s):  
Sascha Meyen ◽  
Dorothee M. B. Sigg ◽  
Ulrike von Luxburg ◽  
Volker H. Franz

Background: It has repeatedly been reported that, when making decisions under uncertainty, groups outperform individuals. Real groups are often replaced by simulated groups: instead of performing an actual group discussion, individual responses are aggregated by a numerical computation. While studies have typically used unweighted majority voting (MV) for this aggregation, the theoretically optimal method is confidence weighted majority voting (CWMV), provided that independent and accurate confidence ratings from the individual group members are available. To determine which simulations (MV vs. CWMV) better reflect real group processes, we applied formal cognitive modeling and compared simulated group responses to real group responses.
Results: Simulated group decisions based on CWMV matched the accuracy of real group decisions, while simulated group decisions based on MV showed lower accuracy. CWMV predicted well the confidence that groups put into their group decisions. However, real groups weighted individual votes somewhat more equally than CWMV suggests. Additionally, real groups tended to put lower confidence into their decisions than CWMV simulations did.
Conclusion: Our results highlight the importance of taking individual confidences into account when simulating group decisions: we found that real groups can aggregate individual confidences in a way that matches, to some extent, the statistical aggregation given by CWMV. This implies that research using simulated group decisions should use CWMV instead of MV as the benchmark against which real groups are compared.
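
The difference between MV and CWMV can be sketched in a few lines. The code below assumes binary votes coded as +1 / -1 and uses the standard log-odds weighting for independent voters, in which each vote is weighted by log(c / (1 - c)) for confidence c. The function names and the example numbers are hypothetical, and the study's exact modeling may differ.

```python
import math

def sign(x):
    """Return +1, -1, or 0 (tie)."""
    return (x > 0) - (x < 0)

def majority_vote(votes):
    """Unweighted MV over binary votes coded as +1 / -1."""
    return sign(sum(votes))

def confidence_weighted_vote(votes, confidences):
    """CWMV sketch: weight each vote by the log-odds of its confidence.

    Confidences are probabilities of being correct; they are clamped
    away from 0 and 1 so the logarithm stays finite.
    """
    total = 0.0
    for vote, conf in zip(votes, confidences):
        conf = min(max(conf, 1e-6), 1.0 - 1e-6)
        total += vote * math.log(conf / (1.0 - conf))
    return sign(total)

# Hypothetical group: one confident member against two unsure members.
votes = [+1, -1, -1]
confidences = [0.95, 0.55, 0.60]

mv_decision = majority_vote(votes)
cwmv_decision = confidence_weighted_vote(votes, confidences)
```

In this example MV follows the two unsure members, while CWMV follows the single confident member, because that member's log-odds weight exceeds the combined weight of the other two.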


Author(s):  
Masoumeh Zareapoor ◽  
Jie Yang

Image-to-image translation aims to learn a mapping of images from a source domain to a target domain. However, three main challenges are associated with this problem and need to be dealt with: the lack of paired datasets, multimodality, and diversity. Despite their strong performance in many computer vision tasks, convolutional neural networks (CNNs) fail to detect the hierarchy of spatial relationships between different parts of an object and thus do not form the ideal representative model we look for. This article presents a new variation of generative models that aims to remedy this problem. We use a trainable transformer, which explicitly allows the spatial manipulation of data during training. This differentiable module can be added to the convolutional layers of the generative model, and it allows the generated distributions to be freely altered for image-to-image translation. To reap the benefits of the proposed module in the generative model, our architecture incorporates a new loss function to facilitate effective end-to-end generative learning for image-to-image translation. The proposed model is evaluated through comprehensive experiments on image synthesis and image-to-image translation, along with comparisons with several state-of-the-art algorithms.


2021 ◽  
Vol 15 (5) ◽  
pp. 1-32
Author(s):  
Quang-huy Duong ◽  
Heri Ramampiaro ◽  
Kjetil Nørvåg ◽  
Thu-lan Dam

Dense subregion (subgraph & subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection and can perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both a theoretical and a practical solution for estimating multiple dense subtensors in tensor data and giving a higher lower bound on the density. In particular, we guarantee and prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
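
For context, the greedy 2-approximation mentioned above can be sketched for the plain graph case as a peeling procedure: repeatedly remove a minimum-degree vertex and keep the densest intermediate subgraph. The sketch assumes an undirected simple graph and density defined as edges per vertex. It is not the authors' multi-subtensor method, and the function name and example graph are hypothetical.

```python
def greedy_densest_subgraph(edges):
    """Greedy peeling sketch for the densest-subgraph problem.

    Density is taken as |E| / |V| (an assumption). Returns the vertex
    set of the densest subgraph seen during peeling and its density.
    """
    # Build an adjacency map, ignoring self-loops and duplicate edges.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    num_edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    best_nodes, best_density = set(adj), 0.0

    while adj:
        density = num_edges / len(adj)
        if density > best_density:
            best_nodes, best_density = set(adj), density
        # Peel off a vertex of minimum degree in the current subgraph.
        u = min(adj, key=lambda node: len(adj[node]))
        neighbors = adj.pop(u)
        for v in neighbors:
            adj[v].discard(u)
        num_edges -= len(neighbors)

    return best_nodes, best_density

# Hypothetical example: a 4-clique on {0, 1, 2, 3} with a tail 3-4-5.
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
dense_nodes, dense_value = greedy_densest_subgraph(example_edges)
```

This simple version rescans all vertices to find the minimum degree at each step; a bucket or heap structure would make the peeling faster on large graphs. Note that it returns a single subgraph, which is the limitation the abstract describes.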


Logistics ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 8
Author(s):  
Hicham Lamzaouek ◽  
Hicham Drissi ◽  
Naima El Haoud

The bullwhip effect is a pervasive phenomenon in supply chains, causing excessive inventory, delivery delays, deterioration of customer service, and high costs. Some researchers have studied this phenomenon from a financial perspective by shedding light on the cash flow bullwhip (CFB). The objective of this article is to present the state of the art of research on CFB. Our ambition is not to compile an exhaustive list, but to synthesize the main contributions so as to identify other interesting research perspectives. In this regard, certain lines of research remain insufficiently explored, such as the role that supply chain digitization could play in controlling CFB, the impact of CFB on the profitability of companies, or the impact of omnichannel commerce on CFB.

