scholarly journals A robust speaker recognition based on Data augmentation

2020 ◽  
Author(s):  
Chenqi Li ◽  
Haoyuan Lu ◽  
Wei Wang

Abstract It is known that large-scale training data can get the better effect of recognition. However, it is difficult to collect a lot of labeled training data for speaker recognition. At the same time, the performance of speaker recognition is greatly influenced by environmental noise. In this paper, we use data augmentation by adding noise to get much training data and improve the robustness of speaker recognition. The experimental results demonstrate that data augmentation have the better performance improvement on Chinese-863 database.

Symmetry ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 845
Author(s):  
Dongheun Han ◽  
Chulwoo Lee ◽  
Hyeongyeop Kang

The neural-network-based human activity recognition (HAR) technique is being increasingly used for activity recognition in virtual reality (VR) users. The major issue of a such technique is the collection large-scale training datasets which are key for deriving a robust recognition model. However, collecting large-scale data is a costly and time-consuming process. Furthermore, increasing the number of activities to be classified will require a much larger number of training datasets. Since training the model with a sparse dataset can only provide limited features to recognition models, it can cause problems such as overfitting and suboptimal results. In this paper, we present a data augmentation technique named gravity control-based augmentation (GCDA) to alleviate the sparse data problem by generating new training data based on the existing data. The benefits of the symmetrical structure of the data are that it increased the number of data while preserving the properties of the data. The core concept of GCDA is two-fold: (1) decomposing the acceleration data obtained from the inertial measurement unit (IMU) into zero-gravity acceleration and gravitational acceleration, and augmenting them separately, and (2) exploiting gravity as a directional feature and controlling it to augment training datasets. Through the comparative evaluations, we validated that the application of GCDA to training datasets showed a larger improvement in classification accuracy (96.39%) compared to the typical data augmentation methods (92.29%) applied and those that did not apply the augmentation method (85.21%).


2020 ◽  
Vol 34 (05) ◽  
pp. 9193-9200
Author(s):  
Shaolei Wang ◽  
Wangxiang Che ◽  
Qi Liu ◽  
Pengda Qin ◽  
Ting Liu ◽  
...  

Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasks-i.e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) tagging task to detect the added noisy words. (ii) sentence classification to distinguish original sentences from grammatically-incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to the previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.


2019 ◽  
Vol 9 (8) ◽  
pp. 1550 ◽  
Author(s):  
Aihong Shen ◽  
Huasheng Wang ◽  
Junjie Wang ◽  
Hongchen Tan ◽  
Xiuping Liu ◽  
...  

Person re-identification (re-ID) is a fundamental problem in the field of computer vision. The performance of deep learning-based person re-ID models suffers from a lack of training data. In this work, we introduce a novel image-specific data augmentation method on the feature map level to enforce feature diversity in the network. Furthermore, an attention assignment mechanism is proposed to enforce that the person re-ID classifier focuses on nearly all important regions of the input person image. To achieve this, a three-stage framework is proposed. First, a baseline classification network is trained for person re-ID. Second, an attention assignment network is proposed based on the baseline network, in which the attention module learns to suppress the response of the current detected regions and re-assign attentions to other important locations. By this means, multiple important regions for classification are highlighted by the attention map. Finally, the attention map is integrated in the attention-aware adversarial network (AAA-Net), which generates high-performance classification results with an adversarial training strategy. We evaluate the proposed method on two large-scale benchmark datasets, including Market1501 and DukeMTMC-reID. Experimental results show that our algorithm performs favorably against the state-of-the-art methods.


2021 ◽  
Vol 8 (2) ◽  
pp. 273-287
Author(s):  
Xuewei Bian ◽  
Chaoqun Wang ◽  
Weize Quan ◽  
Juntao Ye ◽  
Xiaopeng Zhang ◽  
...  

AbstractRecent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic so cannot properly measure the performance of different methods.


2018 ◽  
Author(s):  
Germán Abrevaya ◽  
Aleksandr Aravkin ◽  
Guillermo Cecchi ◽  
Irina Rish ◽  
Pablo Polosecki ◽  
...  

AbstractMany real-world data sets, especially in biology, are produced by highly multivariate and nonlinear complex dynamical systems. In this paper, we focus on brain imaging data, including both calcium imaging and functional MRI data. Standard vector-autoregressive models are limited by their linearity assumptions, while nonlinear general-purpose, large-scale temporal models, such as LSTM networks, typically require large amounts of training data, not always readily available in biological applications; furthermore, such models have limited interpretability. We introduce here a novel approach for learning a nonlinear differential equation model aimed at capturing brain dynamics. Specifically, we propose a variable-projection optimization approach to estimate the parameters of the multivariate (coupled) van der Pol oscillator, and demonstrate that such a model can accurately represent nonlinear dynamics of the brain data. Furthermore, in order to improve the predictive accuracy when forecasting future brain-activity time series, we use this analytical model as an unlimited source of simulated data for pretraining LSTM; such model-specific data augmentation approach consistently improves LSTM performance on both calcium and fMRI imaging data.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Pengcheng Li ◽  
Qikai Liu ◽  
Qikai Cheng ◽  
Wei Lu

Purpose This paper aims to identify data set entities in scientific literature. To address poor recognition caused by a lack of training corpora in existing studies, a distant supervised learning-based approach is proposed to identify data set entities automatically from large-scale scientific literature in an open domain. Design/methodology/approach Firstly, the authors use a dictionary combined with a bootstrapping strategy to create a labelled corpus to apply supervised learning. Secondly, a bidirectional encoder representation from transformers (BERT)-based neural model was applied to identify data set entities in the scientific literature automatically. Finally, two data augmentation techniques, entity replacement and entity masking, were introduced to enhance the model generalisability and improve the recognition of data set entities. Findings In the absence of training data, the proposed method can effectively identify data set entities in large-scale scientific papers. The BERT-based vectorised representation and data augmentation techniques enable significant improvements in the generality and robustness of named entity recognition models, especially in long-tailed data set entity recognition. Originality/value This paper provides a practical research method for automatically recognising data set entities in scientific literature. To the best of the authors’ knowledge, this is the first attempt to apply distant learning to the study of data set entity recognition. The authors introduce a robust vectorised representation and two data augmentation strategies (entity replacement and entity masking) to address the problem inherent in distant supervised learning methods, which the existing research has mostly ignored. The experimental results demonstrate that our approach effectively improves the recognition of data set entities, especially long-tailed data set entities.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1562 ◽  
Author(s):  
Xiaoming Lv ◽  
Fajie Duan ◽  
Jia-jia Jiang ◽  
Xiao Fu ◽  
Lin Gan

Metallic surface defect detection is an essential and necessary process to control the qualities of industrial products. However, due to the limited data scale and defect categories, existing defect datasets are generally unavailable for the deployment of the detection model. To address this problem, we contribute a new dataset called GC10-DET for large-scale metallic surface defect detection. The GC10-DET dataset has great challenges on defect categories, image number, and data scale. Besides, traditional detection approaches are poor in both efficiency and accuracy for the complex real-world environment. Thus, we also propose a novel end-to-end defect detection network (EDDN) based on the Single Shot MultiBox Detector. The EDDN model can deal with defects with different scales. Furthermore, a hard negative mining method is designed to alleviate the problem of data imbalance, while some data augmentation methods are adopted to enrich the training data for the expensive data collection problem. Finally, the extensive experiments on two datasets demonstrate that the proposed method is robust and can meet accuracy requirements for metallic defect detection.


2020 ◽  

Abstract The authors have requested that this preprint be withdrawn due to erroneous posting.


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3780 ◽  
Author(s):  
Mustansar Fiaz ◽  
Arif Mahmood ◽  
Ki Yeol Baek ◽  
Sehar Shahzad Farooq ◽  
Soon Ki Jung

CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still a challenging task. In the current study, we introduce input noise as regularization in the training data to improve generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker which exhibits improved generalization compared to the current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate the overfitting problem. We propose feature fusion from noisy and clean input channels which improves the target localization. Channel attention integrated with our framework helps finding more useful target features resulting in further performance improvement. Our proposed IRCA-Siam enhances the discrimination of the tracker/background and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets including OTB2013, OTB2015, TC128, UAV123, VOT2016 and VOT2017 demonstrate superior performance of the proposed IRCA-Siam tracker compared to the 30 existing state-of-the-art trackers.


2020 ◽  
Vol 2020 (14) ◽  
pp. 248-1-248-7
Author(s):  
Lan Fu ◽  
Hongkai Yu ◽  
Megna Shah ◽  
Jeff Simmons ◽  
Song Wang

Accurately and rapidly detecting the locations of the cores of large-scale dendrites from 2D sectioned microscopic images helps quantify the microstructure of material components. This provides a critical link between the processing and properties of the material. Such a tool could be a critical part of a quality control procedure for manufacturing these components. In this paper, we propose to use Faster R-CNN, a convolutional neural network (CNN) model that considers both the detection accuracy and computational efficiency, to detect the dendrite cores with complex shapes. However, training CNN models usually requires a large number of images annotated with ground-truth locations of dendrite cores, which are usually obtained by highly laborintensive manual annotations. In this paper, we leverage the crystallographic symmetry of dendrite cores for data augmentation – the cross sections of dendrite cores show, not perfect, but near four-fold rotation symmetry and we can rotate the image around the center of dendrite cores by specified angles to construct new training data without additional manual annotations. We conduct a series of experiments and the results show the effectiveness of the Faster R-CNN method with the proposed data augmentation strategy. Particularly, we find that we can reduce the number of the manually annotated training images by 75% while still maintaining the same detection accuracy of dendrite cores.


Sign in / Sign up

Export Citation Format

Share Document