A Fast Histogram-Based Postprocessor That Improves Posterior Probability Estimates

1999 ◽  
Vol 11 (5) ◽  
pp. 1235-1248 ◽  
Author(s):  
Wei Wei ◽  
Todd K. Leen ◽  
Etienne Barnard

Although the outputs of neural network classifiers are often considered to be estimates of posterior class probabilities, the literature that assesses the calibration accuracy of these estimates illustrates that practical networks often fall far short of being ideal estimators. The theorems used to justify treating network outputs as good posterior estimates are based on several assumptions: that the network is sufficiently complex to model the posterior distribution accurately, that there are sufficient training data to specify the network, and that the optimization routine is capable of finding the global minimum of the cost function. Any or all of these assumptions may be violated in practice. This article does three things. First, we apply a simple, previously used histogram technique to assess graphically the accuracy of posterior estimates with respect to individual classes. Second, we introduce a simple and fast remapping procedure that transforms network outputs to provide better estimates of posteriors. Third, we use the remapping in a real-world telephone speech recognition system. The remapping results in a 10% reduction in both word-level error rates (from 4.53% to 4.06%) and sentence-level error rates (from 16.38% to 14.69%) on one corpus, and a 29% reduction in sentence-level error (from 6.3% to 4.5%) on another. The remapping required negligible additional overhead (in terms of both parameters and calculations). McNemar's test shows that these levels of improvement are statistically significant.
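
To make the remapping concrete, below is a minimal sketch in the spirit of the histogram technique described: bin the network's raw outputs for one class, estimate the empirical fraction of positives per bin on held-out data, and replace each new output by its bin's empirical posterior. The bin count and the fallback for empty bins are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fit_histogram_remap(scores, labels, n_bins=20):
    """Estimate an empirical posterior for each score bin on held-out data.

    scores: raw network outputs for one class, shape (N,), values in [0, 1]
    labels: numpy array, 1 where the example belongs to that class, else 0
    Returns bin edges and the per-bin fraction of positives.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
    remap = np.empty(n_bins)
    for b in range(n_bins):
        mask = bin_idx == b
        # Assumed fallback: use the bin centre when a bin receives no data.
        remap[b] = labels[mask].mean() if mask.any() else (edges[b] + edges[b + 1]) / 2
    return edges, remap

def apply_histogram_remap(scores, edges, remap):
    """Replace each raw output with the empirical posterior of its bin."""
    bin_idx = np.clip(np.digitize(scores, edges) - 1, 0, len(remap) - 1)
    return remap[bin_idx]
```

The remapped score is a table lookup per output, consistent with the negligible overhead the paper reports.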

Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2671 ◽
Author(s):  
Yu Zhang ◽  
Junan Yang ◽  
Xiaoshuai Li ◽  
Hui Liu ◽  
Kun Shao

Recent studies have shown that natural language processing (NLP) models are vulnerable to adversarial examples: inputs maliciously crafted by adding small, human-imperceptible perturbations to benign text that lead the target model to false predictions. Compared to character- and sentence-level textual adversarial attacks, word-level attacks can generate higher-quality adversarial examples, especially in a black-box setting. However, existing attack methods usually require a huge number of queries to successfully deceive the target model, which is costly in a real adversarial scenario and makes such attacks difficult to mount in practice. We therefore propose a novel attack method whose main idea is to fully utilize the adversarial examples generated by a local surrogate model, transferring part of the attack to the local model and completing it ahead of time, thereby reducing the cost of querying the target model. Extensive experiments on three public benchmarks show that our attack method not only improves the success rate but also reduces the cost, outperforming the baselines by a significant margin.
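
A hedged sketch of the query-saving idea: run the whole word-substitution search against a local surrogate model, then spend target-model queries only to verify the result. The `synonyms` table, the greedy search order, and the query budget are illustrative assumptions; the paper's actual search procedure is not reproduced here.

```python
def transfer_attack(sentence, local_model, target_model, synonyms, max_queries=50):
    """local_model / target_model: callables mapping text -> predicted label.
    synonyms: dict mapping a word to a list of candidate substitutes.
    """
    words = sentence.split()
    orig_label = target_model(sentence)  # one query to fix the target's label
    queries = 1
    # Phase 1: do the substitution search entirely against the local model,
    # so the search itself costs zero target queries.
    candidate = words[:]
    for i, w in enumerate(words):
        for sub in synonyms.get(w, []):
            trial = candidate[:]
            trial[i] = sub
            if local_model(" ".join(trial)) != orig_label:
                candidate[i] = sub
                break
    # Phase 2: spend remaining target queries only to verify the transfer.
    adv = " ".join(candidate)
    if queries < max_queries and target_model(adv) != orig_label:
        return adv, queries + 1  # successful transfer
    return None, queries + 1
```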


Author(s):  
PREMKUMAR NATARAJAN ◽  
ZHIDONG LU ◽  
RICHARD SCHWARTZ ◽  
ISSAM BAZZI ◽  
JOHN MAKHOUL

This paper presents a script-independent methodology for optical character recognition (OCR) based on the use of hidden Markov models (HMMs). The feature extraction, training, and recognition components of the system are all designed to be script independent. The training and recognition components were taken without modification from a continuous speech recognition system; the only component that is specific to OCR is feature extraction. To port the system to a new language, all that is needed is text-image training data from the new language, along with ground truth that gives the identity of the sequences of characters along each line of each text image, without specifying the locations of the characters on the image. The parameters of the character HMMs are estimated automatically from the training data, without the need for laborious handwritten rules. The system does not require presegmentation of the data at either the word level or the character level; thus, it is able to handle languages with connected characters in a straightforward manner. The script independence of the system is demonstrated on three languages with different types of script: Arabic, English, and Chinese. The robustness of the system is further demonstrated by testing it on fax data. An unsupervised adaptation method is then described that improves performance under degraded conditions.
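
Because feature extraction is the only OCR-specific component, a sketch of the general idea follows: slide a narrow window left-to-right across the text-line image and emit one feature frame per position, exactly as a speech recognizer consumes acoustic frames. The specific features used here (ink mass, vertical centroid, vertical spread) are illustrative assumptions, not the paper's feature set.

```python
import numpy as np

def line_features(line_img, frame_width=3, overlap=1):
    """Turn a text-line image into a left-to-right sequence of feature frames.

    line_img: 2D array (rows x cols), where ink corresponds to high values.
    Returns an array of shape (n_frames, 3).
    """
    step = frame_width - overlap
    frames = []
    for x in range(0, line_img.shape[1] - frame_width + 1, step):
        col = line_img[:, x:x + frame_width].astype(float).mean(axis=1)
        total = col.sum() + 1e-9
        rows = np.arange(len(col))
        centroid = (rows * col).sum() / total            # vertical centre of ink
        spread = np.sqrt(((rows - centroid) ** 2 * col).sum() / total)
        frames.append([total, centroid, spread])
    return np.array(frames)
```

The resulting frame stream can then be fed to the same HMM training and recognition machinery used for speech, with character HMMs concatenated according to each line's transcript.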


2021 ◽  
Vol 11 (21) ◽  
pp. 9938 ◽
Author(s):  
Kun Shao ◽  
Yu Zhang ◽  
Junan Yang ◽  
Hui Liu

Deep learning models are vulnerable to backdoor attacks: in existing research, the success rate of textual backdoor attacks based on data poisoning reaches as high as 100%. To strengthen natural language processing models against backdoor attacks, we propose a textual backdoor defense method based on poisoned-sample recognition. Our method consists of two steps. First, we add a controlled noise layer after the model's embedding layer and train a preliminary model in which the backdoor is only weakly embedded, if at all, which reduces the effectiveness of poisoned samples; we then use this model to make an initial pass over the training set and narrow the search range for poisoned samples. Second, we use all the training data to train an infected model in which the backdoor is embedded, and use it to reclassify the samples selected in the first step, finally identifying the poisoned samples. Detailed experiments show that our defense method can effectively defend against a variety of backdoor attacks (character-level, word-level, and sentence-level) and that it outperforms the baseline method. For a BERT model trained on the IMDB dataset, our method can even reduce the success rate of word-level backdoor attacks to 0%.
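
A minimal PyTorch sketch of the first step's controlled noise layer: Gaussian noise injected after the embedding layer during training makes it harder for the model to memorize a narrow trigger pattern. The noise scale and the train-time-only placement are assumptions; the paper's exact noise design is not reproduced.

```python
import torch
import torch.nn as nn

class NoisyEmbedding(nn.Module):
    """Embedding layer followed by controlled Gaussian noise at train time."""

    def __init__(self, vocab_size, dim, sigma=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.sigma = sigma  # assumed noise scale; a tunable hyperparameter

    def forward(self, token_ids):
        e = self.embed(token_ids)
        if self.training:
            # Perturb embeddings so a narrow backdoor trigger is drowned out.
            e = e + self.sigma * torch.randn_like(e)
        return e
```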


2021 ◽  
Vol 15 (3) ◽  
pp. 1-29 ◽
Author(s):  
Chen Lin ◽  
Zhichao Ouyang ◽  
Xiaoli Wang ◽  
Hui Li ◽  
Zhenhua Huang

Online text streams such as Twitter are the major information source for users who are following ongoing events. Realtime event summarization aims to generate and update coherent and concise summaries that describe the state of a given event. Due to the enormous volume of continuously arriving text, realtime event summarization has become the de facto tool for facilitating information acquisition. However, there is a challenging yet unexplored issue in current text summarization techniques: how to preserve the integrity, i.e., the accuracy and consistency, of summaries during the update process. The issue is critical because online text streams are dynamic, and conflicting information can spread during the event period. For example, conflicting numbers of deaths and injuries might be reported after an earthquake; such misleading information should not appear in the earthquake summary at any timestamp. In this article, we present a novel realtime event summarization framework called IAEA (Integrity-Aware Extractive-Abstractive realtime event summarization). Our key idea is to integrate an inconsistency-detection module into a unified extractive-abstractive framework. In each update, important new tweets are first extracted in an extractive module, and the extraction is refined by explicitly detecting inconsistencies between new tweets and previous summaries. The extractive module captures sentence-level attention, which an abstractive module later uses to obtain word-level attention; the word-level attention is in turn leveraged to rephrase words. We conduct comprehensive experiments on real-world datasets. To reduce the effort required to build sufficient training data, we also provide automatic labeling steps whose effectiveness has been empirically verified. Through experiments, we demonstrate that IAEA generates better summaries with more consistent information than state-of-the-art approaches.
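
One simple way to realize the inconsistency-detection step, sketched below with an off-the-shelf NLI model standing in for IAEA's jointly trained module: drop candidate tweets that the model judges to contradict the running summary. The model choice and threshold are assumptions.

```python
from transformers import pipeline

# An off-the-shelf NLI classifier stands in for the paper's detector.
nli = pipeline("text-classification", model="roberta-large-mnli")

def filter_consistent(new_tweets, current_summary, threshold=0.8):
    """Drop candidate tweets that contradict the running summary."""
    kept = []
    for tweet in new_tweets:
        out = nli({"text": current_summary, "text_pair": tweet})[0]
        if not (out["label"] == "CONTRADICTION" and out["score"] > threshold):
            kept.append(tweet)
    return kept
```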


Author(s):  
Liangchen Wei ◽  
Zhi-Hong Deng

Cross-language learning allows one to use training data from one language to build models for another. Many traditional approaches require word-level alignments from parallel corpora; in this paper, we define a general bilingual training objective that requires only a sentence-level parallel corpus. We propose a variational autoencoding approach to training bilingual word embeddings. The variational model introduces a continuous latent variable to explicitly model the underlying semantics of parallel sentence pairs and to guide their generation. Our model constrains the bilingual word embeddings to represent words in exactly the same continuous vector space. Empirical results on cross-lingual document classification show that our method is effective.
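
A toy PyTorch sketch of the variational idea: a single latent variable is inferred from a sentence pair and asked to reconstruct the words on both sides, with one embedding table shared across the two languages so that all words live in the same vector space. The layer sizes and the bag-of-words decoder are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BilingualVAE(nn.Module):
    def __init__(self, vocab_size, dim=128, z_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # shared across languages
        self.to_mu = nn.Linear(dim, z_dim)
        self.to_logvar = nn.Linear(dim, z_dim)
        self.decode = nn.Linear(z_dim, vocab_size)   # bag-of-words decoder

    def forward(self, src_ids, tgt_ids):
        # Encode the pair as the mean of its (shared-space) word embeddings.
        h = torch.cat([self.embed(src_ids), self.embed(tgt_ids)], dim=1).mean(1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        logits = self.decode(z)  # scored against the words on both sides
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return logits, kl
```

The reconstruction loss would score `logits` against the words appearing on either side of the pair; training code is omitted.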


Author(s):  
Yincheng Jin ◽  
Yang Gao ◽  
Yanjun Zhu ◽  
Wei Wang ◽  
Jiyang Li ◽  
...  

We propose SonicASL, a real-time gesture recognition system that can recognize sign language gestures on the fly, leveraging front-facing microphones and speakers added to commodity earphones worn by someone facing the person making the gestures. In a user study (N=8), we evaluate the recognition performance of various sign language gestures at both the word and sentence levels. Given 42 frequently used individual words and 30 meaningful sentences, SonicASL achieves accuracies of 93.8% and 90.6% for word-level and sentence-level recognition, respectively. The proposed system is tested in two real-world scenarios: indoor (apartment, office, and corridor) and outdoor (sidewalk) environments with pedestrians walking nearby. The results show that our system provides an effective gesture recognition tool with high reliability against environmental factors such as ambient noise and nearby pedestrians.


Author(s):  
Junliang Guo ◽  
Xu Tan ◽  
Di He ◽  
Tao Qin ◽  
Linli Xu ◽  
...  

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve a significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the decoder's inputs is important and largely impacts model accuracy. In this paper, we propose two methods to enhance the decoder inputs and thereby improve NAT models. The first directly leverages a phrase table generated by conventional SMT approaches to translate source tokens into target tokens, which are then fed into the decoder as inputs. The second transforms source-side word embeddings into target-side word embeddings through sentence-level alignment and word-level adversarial learning, and then feeds the transformed word embeddings into the decoder as inputs. Experimental results show that our method outperforms the NAT baseline (Gu et al. 2017) by 5.11 BLEU points on the WMT14 English-German task and 4.72 BLEU points on the WMT16 English-Romanian task.
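
A minimal sketch of the first method: map decoder inputs through an SMT phrase table, greedily preferring longer source matches. The table format and the greedy longest-match policy are assumptions.

```python
def phrase_table_inputs(src_tokens, phrase_table, unk="<unk>"):
    """Translate source tokens into target-side tokens via a phrase table,
    producing the input sequence fed to the NAT decoder."""
    out, i = [], 0
    while i < len(src_tokens):
        # Try the longest source phrase starting at i that the table knows.
        for j in range(len(src_tokens), i, -1):
            phrase = " ".join(src_tokens[i:j])
            if phrase in phrase_table:
                out.extend(phrase_table[phrase].split())
                i = j
                break
        else:
            out.append(unk)  # no entry: emit a placeholder token
            i += 1
    return out
```

For example, with `phrase_table = {"guten morgen": "good morning", "welt": "world"}`, the source `["guten", "morgen", "welt"]` maps to `["good", "morning", "world"]`.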


Author(s):  
Sejal Bhalla ◽  
Mayank Goel ◽  
Rushil Khurana

The proliferation of sensors powered by state-of-the-art machine learning techniques can now infer context, recognize activities, and enable interactions. A key component required to build these automated sensing systems is labeled training data. However, the cost of collecting and labeling new data impedes our ability to deploy new sensors for recognizing human activities. We tackle this challenge using domain adaptation, i.e., using existing labeled data from a different domain to aid the training of a machine learning model for a new sensor. In this paper, we use off-the-shelf smartwatch IMU datasets to train an activity recognition system for a mmWave radar sensor with minimally labeled data. We demonstrate that despite the lack of extensive datasets for mmWave radar, our domain adaptation approach can build an activity recognition system that classifies 10 activities with an accuracy of 70% using only 15 seconds of labeled Doppler data. We also present results for a range of available labeled data (10-30 seconds) and show that our approach outperforms the baseline in every single scenario. We take our approach a step further and show that multiple IMU datasets can be combined to act as a single source for our domain adaptation approach. Lastly, we discuss the limitations of our work and how it can shape future research directions.
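
The abstract does not spell out the adaptation mechanics, so the sketch below uses a standard adversarial recipe (DANN-style gradient reversal) purely as a generic illustration: a shared feature extractor is trained so the activity classifier succeeds while a domain classifier (IMU vs. radar) fails, aligning the two feature distributions. Input and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class DANN(nn.Module):
    def __init__(self, in_dim=128, feat_dim=64, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.classify = nn.Linear(feat_dim, n_classes)  # activity head
        self.domain = nn.Linear(feat_dim, 2)            # IMU-vs-radar head

    def forward(self, x, lam=1.0):
        f = self.features(x)
        # The domain head sees reversed gradients, pushing `features` toward
        # representations the domain classifier cannot tell apart.
        return self.classify(f), self.domain(GradReverse.apply(f, lam))
```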


2017 ◽  
Vol 26 (2) ◽  
pp. 371-385 ◽  
Author(s):  
H.S. Nagendraswamy ◽  
B.M. Chethana Kumara

Recognition of signs made by deaf people, to produce an equivalent textual description that lets hearing people communicate with them, is an essential and challenging task for the pattern recognition and image processing research community. Many researchers have attempted to standardize sign language recognition and to propose recognition systems. To the best of our knowledge, based on our literature survey, most reported work has concentrated at the fingerspelling or word level, and less work has been reported at the sentence level. As sign languages are very abstract, fingerspelling- or word-level interpretation of signs is a tedious and cumbersome task. Although existing research in sign language recognition is active and extensive, accurate recognition and interpretation of signs at the sentence level remains a challenge. In this paper, we address this problem by proposing an approach that exploits a texture description technique and the symbolic data analysis concept to characterize and effectively represent a sign, taking into account the intra-class variations due to different signers or the same signer at different instances in time. To study the efficacy of the proposed approach, extensive experiments were carried out on a considerably large database of Indian sign language that we created. The experimental results demonstrate that the proposed method achieves good recognition performance in terms of F-measure.
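
A hedged sketch of pairing a texture descriptor with an interval-valued (symbolic) representation: compute a local binary pattern (LBP) histogram per sample of a sign and keep each bin as a [min, max] interval over signers and repetitions, so intra-class variation is absorbed into the interval. The paper's exact texture descriptor and symbolic scheme are not reproduced here.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def symbolic_sign_descriptor(frames, p=8, r=1.0):
    """Represent a sign as interval-valued LBP histograms.

    frames: list of 2D grayscale images of the same sign (different signers
    or repetitions). Returns an array of shape (n_bins, 2) holding the
    [min, max] interval of each histogram bin across the samples.
    """
    hists = []
    for img in frames:
        lbp = local_binary_pattern(img, p, r, method="uniform")
        h, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
        hists.append(h)
    hists = np.array(hists)
    return np.stack([hists.min(axis=0), hists.max(axis=0)], axis=1)
```

Matching a test sign against such a descriptor then reduces to checking how many of its histogram bins fall inside the stored intervals, a common similarity measure in symbolic data analysis.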


2021 ◽  
Vol 11 (15) ◽  
pp. 6975 ◽
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. This partly eliminates the defects of RNNs (LSTM, GRU), namely gradient vanishing and insufficient performance, and yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
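
A compact PyTorch sketch of the described pipeline, with a small CNN standing in for ResNet50 and two dilated 1D convolutions standing in for the full TCN; all channel sizes are assumptions. The output is a per-frame character distribution suitable for `nn.CTCLoss` (which expects the time dimension first, so transpose before computing the loss).

```python
import torch
import torch.nn as nn

class LipReader(nn.Module):
    def __init__(self, n_chars):
        super().__init__()
        # 3D-conv front end over (time, height, width).
        self.front = nn.Conv3d(1, 32, (5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3))
        # Per-frame spatial encoder (ResNet50 in the paper).
        self.spatial = nn.Sequential(
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        # Temporal convolutional network: dilated 1D convs over time.
        self.tcn = nn.Sequential(
            nn.Conv1d(64, 128, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(128, 128, 3, padding=2, dilation=2), nn.ReLU())
        self.head = nn.Linear(128, n_chars + 1)  # +1 for the CTC blank

    def forward(self, video):                 # video: (B, 1, T, H, W)
        x = self.front(video)                 # (B, 32, T, H', W')
        b, c, t = x.shape[:3]
        x = self.spatial(x.transpose(1, 2).reshape(b * t, c, *x.shape[3:]))
        x = x.view(b, t, -1).transpose(1, 2)  # (B, 64, T)
        x = self.tcn(x).transpose(1, 2)       # (B, T, 128)
        return self.head(x).log_softmax(-1)   # per-frame char log-probs
```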

