noisy labels
Recently Published Documents


TOTAL DOCUMENTS: 212 (five years: 166)

H-INDEX: 13 (five years: 8)

2022, Vol 40 (3), pp. 1-28
Author(s): Yadong Zhu, Xiliang Wang, Qing Li, Tianjun Yao, Shangsong Liang

Mobile advertising has become one of the fastest-growing industries in the world. The influx of capital attracts a growing number of fraudsters seeking to defraud advertisers. Fraudsters can leverage many techniques, among which bot install fraud is the most difficult to detect, because bots emulate normal users with sophisticated behavioral patterns that evade detection rules defined by human experts. We previously proposed BotSpot for bot install fraud detection. However, BotSpot has several drawbacks: the sparsity of the devices' neighbors, weak interaction among leaf nodes, and noisy labels. In this work, we propose BotSpot++ to address these drawbacks: (1) for the sparsity of the devices' neighbors, we construct a super device node to enrich the graph structure and information flow, using domain knowledge and a clustering algorithm; (2) for the weak interaction, we incorporate a self-attention mechanism to enhance the interaction of the various leaf nodes; and (3) for the noisy labels, we apply a label smoothing mechanism to alleviate their effect. Comprehensive experimental results show that BotSpot++ yields the best performance compared with six state-of-the-art baselines. Furthermore, we deploy our model on the advertising platform of Mobvista, a leading global mobile advertising company. The online experiments also demonstrate the effectiveness of our proposed method.
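
The abstract names label smoothing only at the level of mechanism; as a rough illustration of how smoothed labels temper the influence of mislabeled installs, a standard binary label smoothing loss might look like the following sketch (the smoothing factor `eps` and the cross-entropy formulation are assumptions, not the authors' exact design).

```python
import torch
import torch.nn.functional as F

def smoothed_bce_loss(logits: torch.Tensor, labels: torch.Tensor,
                      eps: float = 0.1) -> torch.Tensor:
    """Binary cross-entropy with label smoothing.

    Hard labels {0, 1} are softened to {eps, 1 - eps}, so a mislabeled
    install contributes a bounded gradient instead of pulling the model
    fully toward the wrong class.
    """
    soft_labels = labels * (1.0 - eps) + (1.0 - labels) * eps
    return F.binary_cross_entropy_with_logits(logits, soft_labels)

# Usage: logits from a fraud-detection head, noisy 0/1 install labels.
logits = torch.randn(8)
labels = torch.randint(0, 2, (8,)).float()
loss = smoothed_bce_loss(logits, labels, eps=0.1)
```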


IEEE Access, 2022, pp. 1-1
Author(s): Xin Wu, Qing Liu, Jiarui Qin, Yong Yu

2021
Author(s): Lin Li, Fuchuan Tong, Qingyang Hong

A typical speaker recognition system involves two modules: a feature extractor front-end and a speaker identity back-end. Although deep neural networks have achieved superior performance for the front-end, their success depends on the availability of large-scale, correctly labeled datasets. Label noise, however, is unavoidable in speaker recognition datasets, and it affects both the front-end and the back-end, degrading recognition performance. In this paper, we first conduct comprehensive experiments to improve the understanding of the effects of label noise on both the front-end and the back-end. Then, we propose a simple yet effective training paradigm and loss correction method to handle label noise for the front-end. We combine our proposed method with the recently proposed Bayesian estimation of PLDA for noisy labels, and the whole system shows strong robustness to label noise. Furthermore, we show two practical applications of the improved system: one corrects noisy labels based on an utterance's chunk-level predictions, and the other algorithmically filters out high-confidence noisy samples within a dataset. By applying the second application to the NIST SRE04-10 dataset and verifying the filtered utterances by human validation, we find that approximately 1% of the SRE04-10 dataset consists of label errors.
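
The filtering criterion behind the second application is not spelled out in the abstract; one plausible reading is a confidence-based filter over front-end posteriors, sketched below (the posterior matrix, the threshold `tau`, and the function name are illustrative assumptions, not the authors' exact procedure).

```python
import numpy as np

def filter_suspect_labels(posteriors: np.ndarray, labels: np.ndarray,
                          tau: float = 0.9) -> np.ndarray:
    """Flag utterances whose model posterior concentrates on a speaker
    other than the assigned label.

    posteriors: (n_utts, n_speakers) softmax outputs of the front-end.
    labels:     (n_utts,) assigned speaker indices (possibly noisy).
    Returns indices of high-confidence suspected label errors, which
    could then be passed to human validation.
    """
    predicted = posteriors.argmax(axis=1)
    confidence = posteriors.max(axis=1)
    suspect = (predicted != labels) & (confidence >= tau)
    return np.flatnonzero(suspect)
```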


Author(s): Juan-Luis García-Mendoza, Luis Villaseñor-Pineda, Felipe Orihuela-Espina, Lázaro Bustio-Martínez

Distant Supervision is an approach that allows automatic labeling of instances and has been used in Relation Extraction. The main challenge of this task is handling instances with noisy labels (e.g., when two entities in a sentence are automatically labeled with an invalid relation). Approaches reported in the literature address this problem by employing noise-tolerant classifiers. However, introducing a noise-reduction stage before the classification step increases macro precision. This paper proposes an Adversarial Autoencoder-based approach for obtaining a new representation that allows noise reduction in Distant Supervision. The representation obtained with Adversarial Autoencoders minimizes the intra-cluster distance compared with pre-trained embeddings and classic Autoencoders. Experiments demonstrate that, on the noise-reduced datasets, macro precision values similar to those on the original dataset are obtained with fewer instances and the same classifier. For example, on one of the noise-reduced datasets, macro precision improved by approximately 2.32% while using 77% of the original instances. This supports the validity of using Adversarial Autoencoders to obtain representations well suited for noise reduction. Moreover, the proposed approach maintains the macro precision values of the original dataset while reducing the total number of instances needed for classification.
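
The abstract describes the noise-reduction stage only as minimizing intra-cluster distance; one way such a representation could be used for filtering is to keep, per relation, the instances closest to their cluster centroid, as in this sketch (the encoder output, the per-relation clustering, and the keep ratio, set here to the 77% reported above, are assumptions, not the paper's exact algorithm).

```python
import numpy as np

def reduce_noise(embeddings: np.ndarray, relation_ids: np.ndarray,
                 keep_ratio: float = 0.77) -> np.ndarray:
    """Per relation, keep the instances closest to the cluster centroid.

    embeddings:   (n, d) instance representations, e.g. from the
                  adversarial autoencoder (any encoder works here).
    relation_ids: (n,) distantly supervised relation labels.
    Returns a boolean mask over instances; instances far from their
    relation centroid are treated as likely noise and dropped.
    """
    keep = np.zeros(len(relation_ids), dtype=bool)
    for rel in np.unique(relation_ids):
        idx = np.flatnonzero(relation_ids == rel)
        centroid = embeddings[idx].mean(axis=0)
        dist = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        n_keep = max(1, int(len(idx) * keep_ratio))
        keep[idx[np.argsort(dist)[:n_keep]]] = True
    return keep
```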


2021
Author(s): Sayantan Majumdar, Ramesh Nair, Amit Kapadia, Jesus Martinez Manso, Cameron Bronstein, ...


