Multi-Task Self-Supervised Learning for Disfluency Detection

Shaolei Wang; Wangxiang Che; Qi Liu; Pengda Qin; Ting Liu; William Yang Wang

doi:10.1609/aaai.v34i05.6456

Multi-Task Self-Supervised Learning for Disfluency Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6456 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9193-9200

Author(s):

Shaolei Wang ◽

Wangxiang Che ◽

Qi Liu ◽

Pengda Qin ◽

Ting Liu ◽

...

Keyword(s):

Supervised Learning ◽

Large Scale ◽

Experimental Results ◽

Training Data ◽

Competitive Performance ◽

Test Set ◽

Full Dataset ◽

Sentence Classification ◽

Trained Network

Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks-i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) tagging task to detect the added noisy words. (ii) sentence classification to distinguish original sentences from grammatically-incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to the previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.

Download Full-text

Improving Semi-Supervised Learning for Audio Classification with FixMatch

Electronics ◽

10.3390/electronics10151807 ◽

2021 ◽

Vol 10 (15) ◽

pp. 1807

Author(s):

Sascha Grollmisch ◽

Estefanía Cano

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Transfer Learning ◽

Data Transfer ◽

State Of The Art ◽

Training Data ◽

Audio Classification ◽

Image Domain ◽

Full Dataset ◽

Audio Data

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data. This is vastly unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.

Download Full-text

Combining Self-supervised Learning and Active Learning for Disfluency Detection

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3487290 ◽

2022 ◽

Vol 21 (3) ◽

pp. 1-25

Author(s):

Shaolei Wang ◽

Zhongyuan Wang ◽

Wanxiang Che ◽

Sendong Zhao ◽

Ting Liu

Keyword(s):

Neural Network ◽

Active Learning ◽

Supervised Learning ◽

Large Scale ◽

Training Data ◽

Fine Tuning ◽

Training Dataset ◽

Performance Gap ◽

Annotation Costs ◽

Trained Neural Network

Spoken language is fundamentally different from the written language in that it contains frequent disfluencies or parts of an utterance that are corrected by the speaker. Disfluency detection (removing these disfluencies) is desirable to clean the input for use in downstream NLP tasks. Most existing approaches to disfluency detection heavily rely on human-annotated data, which is scarce and expensive to obtain in practice. To tackle the training data bottleneck, in this work, we investigate methods for combining self-supervised learning and active learning for disfluency detection. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled data and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words and (ii) sentence classification to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly pre-train a neural network. The pre-trained neural network is then fine-tuned using human-annotated disfluency detection training data. The self-supervised learning method can capture task-special knowledge for disfluency detection and achieve better performance when fine-tuning on a small annotated dataset compared to other supervised methods. However, limited in that the pseudo training data are generated based on simple heuristics and cannot fully cover all the disfluency patterns, there is still a performance gap compared to the supervised models trained on the full training dataset. We further explore how to bridge the performance gap by integrating active learning during the fine-tuning process. Active learning strives to reduce annotation costs by choosing the most critical examples to label and can address the weakness of self-supervised learning with a small annotated dataset. We show that by combining self-supervised learning with active learning, our model is able to match state-of-the-art performance with just about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.

Download Full-text

Data set entity recognition based on distant supervision

The Electronic Library ◽

10.1108/el-10-2020-0301 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Pengcheng Li ◽

Qikai Liu ◽

Qikai Cheng ◽

Wei Lu

Keyword(s):

Supervised Learning ◽

Large Scale ◽

Data Augmentation ◽

Scientific Literature ◽

Neural Model ◽

Training Data ◽

Entity Recognition ◽

Data Set ◽

Content Type ◽

Augmentation Techniques

Purpose This paper aims to identify data set entities in scientific literature. To address poor recognition caused by a lack of training corpora in existing studies, a distant supervised learning-based approach is proposed to identify data set entities automatically from large-scale scientific literature in an open domain. Design/methodology/approach Firstly, the authors use a dictionary combined with a bootstrapping strategy to create a labelled corpus to apply supervised learning. Secondly, a bidirectional encoder representation from transformers (BERT)-based neural model was applied to identify data set entities in the scientific literature automatically. Finally, two data augmentation techniques, entity replacement and entity masking, were introduced to enhance the model generalisability and improve the recognition of data set entities. Findings In the absence of training data, the proposed method can effectively identify data set entities in large-scale scientific papers. The BERT-based vectorised representation and data augmentation techniques enable significant improvements in the generality and robustness of named entity recognition models, especially in long-tailed data set entity recognition. Originality/value This paper provides a practical research method for automatically recognising data set entities in scientific literature. To the best of the authors’ knowledge, this is the first attempt to apply distant learning to the study of data set entity recognition. The authors introduce a robust vectorised representation and two data augmentation strategies (entity replacement and entity masking) to address the problem inherent in distant supervised learning methods, which the existing research has mostly ignored. The experimental results demonstrate that our approach effectively improves the recognition of data set entities, especially long-tailed data set entities.

Download Full-text

Learning Meta Model for Zero- and Few-Shot Face Anti-Spoofing

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6866 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11916-11923 ◽

Cited By ~ 4

Author(s):

Yunxiao Qin ◽

Chenxu Zhao ◽

Xiangyu Zhu ◽

Zezheng Wang ◽

Zitong Yu ◽

...

Keyword(s):

Face Recognition ◽

Supervised Learning ◽

Large Scale ◽

Training Data ◽

Learning Problem ◽

Meta Model ◽

Meta Learning ◽

Recognition Systems

Face anti-spoofing is crucial to the security of face recognition systems. Most previous methods formulate face anti-spoofing as a supervised learning problem to detect various predefined presentation attacks, which need large scale training data to cover as many attacks as possible. However, the trained model is easy to overfit several common attacks and is still vulnerable to unseen attacks. To overcome this challenge, the detector should: 1) learn discriminative features that can generalize to unseen spoofing types from predefined presentation attacks; 2) quickly adapt to new spoofing types by learning from both the predefined attacks and a few examples of the new spoofing types. Therefore, we define face anti-spoofing as a zero- and few-shot learning problem. In this paper, we propose a novel Adaptive Inner-update Meta Face Anti-Spoofing (AIM-FAS) method to tackle this problem through meta-learning. Specifically, AIM-FAS trains a meta-learner focusing on the task of detecting unseen spoofing types by learning from predefined living and spoofing faces and a few examples of new attacks. To assess the proposed approach, we propose several benchmarks for zero- and few-shot FAS. Experiments show its superior performances on the presented benchmarks to existing methods in existing zero-shot FAS protocols.

Download Full-text

Hierarchical Semantic Loss and Confidence Estimator for Visual-Semantic Embedding-Based Zero-Shot Learning

Applied Sciences ◽

10.3390/app9153133 ◽

2019 ◽

Vol 9 (15) ◽

pp. 3133

Author(s):

Sanghyun Seo ◽

Juntae Kim

Keyword(s):

Vector Space ◽

Supervised Learning ◽

Confidence Score ◽

Experimental Results ◽

Training Data ◽

Negative Sample ◽

Visual Data ◽

Class Label ◽

Semantic Data ◽

Triplet Loss

Traditional supervised learning is dependent on the label of the training data, so there is a limitation that the class label which is not included in the training data cannot be recognized properly. Therefore, zero-shot learning, which can recognize unseen-classes that are not used in training, is gaining research interest. One approach to zero-shot learning is to embed visual data such as images and rich semantic data related to text labels of visual data into a common vector space to perform zero-shot cross-modal retrieval on newly input unseen-class data. This paper proposes a hierarchical semantic loss and confidence estimator to more efficiently perform zero-shot learning on visual data. Hierarchical semantic loss improves learning efficiency by using hierarchical knowledge in selecting a negative sample of triplet loss, and the confidence estimator estimates the confidence score to determine whether it is seen-class or unseen-class. These methodologies improve the performance of zero-shot learning by adjusting distances from a semantic vector to visual vector when performing zero-shot cross-modal retrieval. Experimental results show that the proposed method can improve the performance of zero-shot learning in terms of hit@k accuracy.

Download Full-text

Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT

BMC Bioinformatics ◽

10.1186/s12859-021-04504-x ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Aparna Elangovan ◽

Yuan Li ◽

Douglas E. V. Pires ◽

Melissa J. Davis ◽

Karin Verspoor

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Large Scale ◽

Training Data ◽

Biological Knowledge ◽

Post Translational Modification ◽

High Confidence ◽

Test Set ◽

Pubmed Database ◽

Confidence Calibration

Abstract Motivation Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature by using distantly supervised training data using deep learning to aid human curation. Method We use the IntAct PPI database to create a distant supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models—dubbed PPI-BioBERT-x10—to improve confidence calibration. We extend the use of ensemble average confidence approach with confidence variation to counteract the effects of class imbalance to extract high confidence predictions. Results and conclusion The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro 41.3 (P =5 8.1, R = 32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million (546507 unique PTM-PPI triplets) PTM-PPI predictions, and filter $$\approx 5700$$ ≈ 5700 (4584 unique) high confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.

Download Full-text

Adaptive Graph Guided Embedding for Multi-label Annotation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/388 ◽

2018 ◽

Cited By ~ 8

Author(s):

Lichen Wang ◽

Zhengming Ding ◽

Yun Fu

Keyword(s):

Large Scale ◽

State Of The Art ◽

Unlabeled Data ◽

Label Propagation ◽

Experimental Results ◽

Training Data ◽

Learning Performance ◽

Intrinsic Structure ◽

Latent Space ◽

Art Methods

Multi-label annotation is challenging since a large amount of well-labeled training data are required to achieve promising performance. However, providing such data is expensive while unlabeled data are widely available. To this end, we propose a novel Adaptive Graph Guided Embedding (AG2E) approach for multi-label annotation in a semi-supervised fashion, which utilizes limited labeled data associating with large-scale unlabeled data to facilitate learning performance. Specifically, a multi-label propagation scheme and an effective embedding are jointly learned to seek a latent space where unlabeled instances tend to be well assigned multiple labels. Furthermore, a locality structure regularizer is designed to preserve the intrinsic structure and enhance the multi-label annotation. We evaluate our model in both conventional multi-label learning and zero-shot learning scenario. Experimental results demonstrate that our approach outperforms other compared state-of-the-art methods.

Download Full-text

Large-Scale Protein-Protein Post-Translational Modification Extraction With Distant Supervision And Confidence Calibrated BioBERT

10.21203/rs.3.rs-898489/v1 ◽

2021 ◽

Author(s):

Aparna Elangovan ◽

Yuan Li ◽

Douglas E.V. Pires ◽

Melissa J. Davis ◽

Karin Verspoor

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Large Scale ◽

Training Data ◽

Biological Knowledge ◽

Post Translational Modification ◽

High Confidence ◽

Test Set ◽

Pubmed Database ◽

Confidence Calibration

Abstract Motivation: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature by using distantly supervised training data using deep learning to aid human curation. We further assessed model generalisation in a real-world scenario, evaluating its performance on a randomly sampled subset of predictions from 18 million PubMed abstracts. Method: We use the IntAct PPI database to create a distant supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models – dubbed PPI-BioBERT-x10 – to improve confidence calibration. We extend the use of ensemble average confidence approach with confidence variation to counteract the effects of class imbalance to extract high confidence predictions. Results and conclusion: The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro 41.3 (P=58.1, R=32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million (546507 unique PTM-PPI triplets) PTM-PPI predictions, and filter ≈5,700 (4584 unique) high confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.

Download Full-text

A robust speaker recognition based on Data augmentation

10.21203/rs.3.rs-16425/v1 ◽

2020 ◽

Author(s):

Chenqi Li ◽

Haoyuan Lu ◽

Wei Wang

Keyword(s):

Performance Improvement ◽

Speaker Recognition ◽

Large Scale ◽

Data Augmentation ◽

Environmental Noise ◽

Experimental Results ◽

Training Data ◽

Robust Speaker Recognition

Abstract It is known that large-scale training data can get the better effect of recognition. However, it is difficult to collect a lot of labeled training data for speaker recognition. At the same time, the performance of speaker recognition is greatly influenced by environmental noise. In this paper, we use data augmentation by adding noise to get much training data and improve the robustness of speaker recognition. The experimental results demonstrate that data augmentation have the better performance improvement on Chinese-863 database.

Download Full-text

Hierarchical Classification of Urban ALS Data by Using Geometry and Intensity Information

Sensors ◽

10.3390/s19204583 ◽

2019 ◽

Vol 19 (20) ◽

pp. 4583 ◽

Cited By ~ 1

Author(s):

Xiaoqiang Liu ◽

Yanming Chen ◽

Shuyi Li ◽

Liang Cheng ◽

Manchun Li

Keyword(s):

Supervised Learning ◽

Laser Scanning ◽

Large Scale ◽

Three Dimensional ◽

Hierarchical Classification ◽

Training Data ◽

Classification Model ◽

Learning Method ◽

Intensity Information

Airborne laser scanning (ALS) can acquire both geometry and intensity information of geo-objects, which is important in mapping a large-scale three-dimensional (3D) urban environment. However, the intensity information recorded by ALS will be changed due to the flight height and atmospheric attenuation, which decreases the robustness of the trained supervised classifier. This paper proposes a hierarchical classification method by separately using geometry and intensity information of urban ALS data. The method uses supervised learning for stable geometry information and unsupervised learning for fluctuating intensity information. The experiment results show that the proposed method can utilize the intensity information effectively, based on three aspects, as below. (1) The proposed method improves the accuracy of classification result by using intensity. (2) When the ALS data to be classified are acquired under the same conditions as the training data, the performance of the proposed method is as good as the supervised learning method. (3) When the ALS data to be classified are acquired under different conditions from the training data, the performance of the proposed method is better than the supervised learning method. Therefore, the classification model derived from the proposed method can be transferred to other ALS data whose intensity is inconsistent with the training data. Furthermore, the proposed method can contribute to the hierarchical use of some other ALS information, such as multi-spectral information.

Download Full-text