Character n-Gram Embeddings to Improve RNN Language Models

Author(s):  
Sho Takase ◽  
Jun Suzuki ◽  
Masaaki Nagata

This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams based on research in the field of word embedding construction (Wieting et al. 2016). Our proposed method constructs word embeddings from character n-gram embeddings and combines them with ordinary word embeddings. We demonstrate that the proposed method achieves the best perplexities on the language modeling datasets: Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct experiments on application tasks: machine translation and headline generation. The experimental results indicate that our proposed method also positively affects these tasks.
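The composition step is easy to picture. The minimal sketch below (an illustration, not the authors' code) builds a word vector by summing character n-gram embeddings, in the spirit of Wieting et al. (2016), and adds it to an ordinary word embedding; the embedding tables, dimensionality, and lazy initialisation are placeholders for learned parameters.

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(0)
char_ngram_emb = {}   # hypothetical lookup table of character n-gram vectors
word_emb = {}         # hypothetical lookup table of word vectors

def lookup(table, key):
    # Lazily initialise a random embedding for unseen keys (illustration only;
    # in a trained model these would be learned parameters).
    if key not in table:
        table[key] = rng.normal(scale=0.1, size=DIM)
    return table[key]

def char_ngrams(word, n_min=2, n_max=3):
    padded = f"<{word}>"   # boundary markers so prefixes/suffixes stay distinct
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def word_vector(word):
    # Sum the character n-gram embeddings and add the ordinary word embedding;
    # the resulting vector is what the RNN language model would consume.
    subword = sum(lookup(char_ngram_emb, g) for g in char_ngrams(word))
    return subword + lookup(word_emb, word)

print(word_vector("language").shape)   # (8,)
```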

Author(s):  
Umrinderpal Singh

A language model guides the decoding process by selecting the most likely word from the several options available in the information base or phrase table. Such a model can be generated with an n-gram approach, and various model types and smoothing procedures exist for estimating it, such as unigram, bigram, and trigram models, interpolation, and backoff. We conducted experiments with different language models in which phrases, rather than words, serve as the smallest unit. The experiments show that a phrase-based language model yields more accurate results than a simple word-based model. We also experimented with a machine translation system that uses a phrase-based language model instead of a word-based one, and the system showed a substantial improvement.
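To make the word-versus-phrase contrast concrete, here is a minimal sketch (a toy illustration, not the system described above) of the same add-one smoothed bigram estimator applied once with words and once with phrases as the smallest unit; the phrase segmentation is hand-written for the toy corpus.

```python
from collections import Counter

def bigram_lm(sentences, tokenize):
    """Add-one smoothed bigram model over whatever units `tokenize` returns."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        units = ["<s>"] + tokenize(s) + ["</s>"]
        unigrams.update(units)
        bigrams.update(zip(units, units[1:]))
    vocab = len(unigrams)
    # Smoothed conditional probability P(v | u).
    return lambda u, v: (bigrams[(u, v)] + 1) / (unigrams[u] + vocab)

corpus = ["new york is big", "new york is far"]

word_lm = bigram_lm(corpus, str.split)                     # words as the unit
# Hypothetical phrase segmentation: treat "new york" as a single unit.
phrase_lm = bigram_lm(corpus, lambda s: s.replace("new york", "new_york").split())

print(word_lm("new", "york"), phrase_lm("new_york", "is"))
```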


2017 ◽  
Vol 108 (1) ◽  
pp. 271-282 ◽  
Author(s):  
Peyman Passban ◽  
Qun Liu ◽  
Andy Way

Treating morphologically complex words (MCWs) as atomic units in translation would not yield a desirable result. Such words are complicated constituents with meaningful subunits. A complex word in a morphologically rich language (MRL) could be associated with a number of words, or even a full sentence, in a simpler language, which means the surface form of complex words should be accompanied by auxiliary morphological information in order to provide a precise translation and a better alignment. In this paper we follow this idea and propose two different methods to convey such information to statistical machine translation (SMT) models. In the first model we enrich factored SMT engines by introducing a new morphological factor which relies on subword-aware word embeddings. In the second model we focus on the language-modeling component. We explore a subword-level neural language model (NLM) to capture sequence-, word- and subword-level dependencies. Our NLM is able to approximate better scores for conditional word probabilities, so the decoder generates more fluent translations. We studied two languages, Farsi and German, in our experiments and observed significant improvements for both of them.
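As a rough illustration of the first model's idea (a sketch under assumptions, not the authors' implementation), a subword-aware vector for a morphologically complex word can be composed from embeddings of its subunits and then attached to the surface form as an additional factor; the segmenter, the averaging, and the embedding table used here are placeholders.

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(1)
subword_emb = {}   # hypothetical lookup table of subword (morpheme) vectors

def emb(unit):
    # Lazily initialised stand-in for learned subword embeddings.
    if unit not in subword_emb:
        subword_emb[unit] = rng.normal(scale=0.1, size=DIM)
    return subword_emb[unit]

def subword_factor(word, segment):
    # `segment` is a hypothetical morphological segmenter (in practice something
    # like Morfessor or BPE); average the subunit embeddings into one factor.
    return np.mean([emb(p) for p in segment(word)], axis=0)

# Toy German example with a hand-written segmentation.
segmenter = lambda w: {"Haustür": ["Haus", "Tür"]}.get(w, [w])
factor = subword_factor("Haustür", segmenter)
print(factor.shape)   # (8,) -- attached to the surface form as an extra factor
```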


2016 ◽  
Vol 4 ◽  
pp. 329-342 ◽  
Author(s):  
Joris Pelemans ◽  
Noam Shazeer ◽  
Ciprian Chelba

We present Sparse Non-negative Matrix (SNM) estimation, a novel probability estimation technique for language modeling that can efficiently incorporate arbitrary features. We evaluate SNM language models on two corpora: the One Billion Word Benchmark and a subset of the LDC English Gigaword corpus. Results show that SNM language models trained with n-gram features are a close match for the well-established Kneser-Ney models. The addition of skip-gram features yields a model that is in the same league as the state-of-the-art recurrent neural network language models, as well as complementary: combining the two modeling techniques yields the best known result on the One Billion Word Benchmark. On the Gigaword corpus further improvements are observed using features that cross sentence boundaries. The computational advantages of SNM estimation over both maximum entropy and neural network estimation are probably its main strength, promising an approach that has large flexibility in combining arbitrary features and yet scales gracefully to large amounts of data.
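The flavour of the feature set is easy to sketch. The code below (a toy illustration with made-up weights, not the SNM estimator itself) extracts n-gram and skip-gram context features and scores the next word with a non-negative linear combination of per-feature weights, normalised over a small vocabulary.

```python
import numpy as np

def context_features(history, max_n=3, max_skip=2):
    """Extract n-gram and skip-gram features from the word history."""
    feats = []
    for n in range(2, max_n + 1):                 # n-gram features: last n-1 words
        if len(history) >= n - 1:
            feats.append("ngram:" + " ".join(history[-(n - 1):]))
    for skip in range(1, max_skip + 1):           # skip-gram features: word + gap
        if len(history) > skip:
            feats.append(f"skip{skip}:" + history[-1 - skip])
    return feats

vocab = ["the", "cat", "sat", "on", "mat"]
rng = np.random.default_rng(2)
M = {}   # hypothetical learned non-negative feature-to-word weight matrix

def prob(history, word):
    # Non-negative linear combination of feature weights, normalised over the vocabulary.
    scores = np.zeros(len(vocab))
    for f in context_features(history):
        if f not in M:
            M[f] = rng.random(len(vocab))         # stand-in non-negative weights
        scores += M[f]
    return scores[vocab.index(word)] / scores.sum()

print(prob(["the", "cat"], "sat"))
```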


2013 ◽  
Vol 21 (2) ◽  
pp. 201-226 ◽  
Author(s):  
DEYI XIONG ◽  
MIN ZHANG

The language model is one of the most important knowledge sources for statistical machine translation. In this article, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We introduce algorithms to integrate the two proposed models into two kinds of state-of-the-art phrase-based decoders. Our experimental results on Chinese/Spanish/Vietnamese-to-English show that both models are able to significantly improve translation quality in terms of BLEU and METEOR over a competitive baseline.
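Both extensions have simple cores. The sketch below (toy counts, not the decoder integration described in the article) scores a sentence right-to-left with an unsmoothed backward bigram model and computes a pointwise mutual information trigger score for a long-distance word pair.

```python
import math
from collections import Counter

corpus = [s.split() for s in ["the bank approved the loan",
                              "the river bank was muddy"]]

unigrams = Counter(w for s in corpus for w in s)
total = sum(unigrams.values())
backward_bigrams = Counter(b for s in corpus for b in zip(s[::-1], s[::-1][1:]))
pairs = Counter((a, b) for s in corpus for i, a in enumerate(s) for b in s[i + 1:])

def backward_logprob(sentence):
    # Score the sentence right-to-left: each word is conditioned on the word
    # that follows it (backward bigram model, no smoothing for brevity).
    rev = sentence[::-1]
    return sum(math.log(backward_bigrams[b] / unigrams[b[0]])
               for b in zip(rev, rev[1:]) if backward_bigrams[b])

def pmi(trigger, target):
    # Mutual information trigger: how strongly seeing `trigger` earlier in the
    # sentence raises the chance of seeing `target` later on.
    joint = pairs[(trigger, target)] / sum(pairs.values())
    return math.log(joint / ((unigrams[trigger] / total) * (unigrams[target] / total)))

print(backward_logprob("the bank approved the loan".split()), pmi("bank", "loan"))
```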


2014 ◽  
Vol 102 (1) ◽  
pp. 81-92 ◽  
Author(s):  
Paul Baltescu ◽  
Phil Blunsom ◽  
Hieu Hoang

This paper presents an open source implementation of a neural language model for machine translation. Neural language models deal with the problem of data sparsity by learning distributed representations for words in a continuous vector space. The language modelling probabilities are estimated by projecting a word's context in the same space as the word representations and by assigning probabilities proportional to the distance between the words and the context's projection. Neural language models are notoriously slow to train and test. Our framework is designed with scalability in mind and provides two optional techniques for reducing the computational cost: the so-called class decomposition trick and a training algorithm based on noise contrastive estimation. Our models may be extended to incorporate direct n-gram features to learn weights for every n-gram in the training data. Our framework comes with wrappers for the cdec and Moses translation toolkits, allowing our language models to be incorporated as normalized features in their decoders (inside the beam search).
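For intuition about the class-decomposition trick, the sketch below (random toy parameters, not the toolkit's implementation) factors the next-word probability into a class softmax followed by a softmax within that class, so neither softmax ranges over the full vocabulary; the noise contrastive estimation training path is not shown.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
word_class = {w: i % 2 for i, w in enumerate(vocab)}     # hypothetical clustering
classes = {c: [w for w in vocab if word_class[w] == c] for c in (0, 1)}

DIM = 8
class_W = rng.normal(size=(2, DIM))                      # class scoring weights
word_W = {w: rng.normal(size=DIM) for w in vocab}        # within-class scoring weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def prob(word, context_vec):
    # P(w | h) = P(class(w) | h) * P(w | class(w), h): two small softmaxes
    # instead of one softmax over the whole vocabulary.
    c = word_class[word]
    p_class = softmax(class_W @ context_vec)[c]
    members = classes[c]
    p_word = softmax(np.stack([word_W[w] for w in members]) @ context_vec)
    return p_class * p_word[members.index(word)]

h = rng.normal(size=DIM)      # stand-in for the projected context vector
print(prob("cat", h))
```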


2020 ◽  
Vol 34 (05) ◽  
pp. 8766-8774 ◽  
Author(s):  
Timo Schick ◽  
Hinrich Schütze

Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Using BERT, a recently proposed such architecture, as our example, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method that was designed to explicitly learn embeddings for rare words, to deep language models. In order to make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., it does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT substantially improves its understanding of rare words.
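To give a rough sense of the mimicking idea (a loose sketch under assumptions; it covers neither the trained Attentive Mimicking model nor one-token approximation), the code below estimates an embedding for a rare word by attending over a handful of context encodings and mixing the result with a surface-form estimate; every vector here is a random stand-in.

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(4)

def form_embedding(word):
    # Stand-in surface-form estimate (e.g. built from character n-gram vectors);
    # deterministic per word so repeated calls agree.
    seed = sum(ord(ch) for ch in word)
    return np.random.default_rng(seed).normal(scale=0.1, size=DIM)

def mimic_embedding(word, context_vecs):
    # Attend over the few available context representations and mix the result
    # with the surface-form estimate.
    contexts = np.stack(context_vecs)                # (k, DIM) context encodings
    query = form_embedding(word)
    attn = np.exp(contexts @ query)                  # attention over the k contexts
    attn /= attn.sum()
    context_estimate = attn @ contexts
    return 0.5 * query + 0.5 * context_estimate

# A rare word observed in only three contexts (stand-in encodings).
contexts = [rng.normal(size=DIM) for _ in range(3)]
new_vec = mimic_embedding("kumquat", contexts)
print(new_vec.shape)   # (8,) -- could then replace the word's entry in the embedding matrix
```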

