An empirical study of statistical language models: n-gram language models vs. neural network language models

Author(s): Freha Mezzoudj, Abdelkader Benyettou

2016, Vol. 4, pp. 329-342
Author(s): Joris Pelemans, Noam Shazeer, Ciprian Chelba

We present Sparse Non-negative Matrix (SNM) estimation, a novel probability estimation technique for language modeling that can efficiently incorporate arbitrary features. We evaluate SNM language models on two corpora: the One Billion Word Benchmark and a subset of the LDC English Gigaword corpus. Results show that SNM language models trained with n-gram features are a close match for the well-established Kneser-Ney models. The addition of skip-gram features yields a model that is in the same league as the state-of-the-art recurrent neural network language models, as well as complementary: combining the two modeling techniques yields the best known result on the One Billion Word Benchmark. On the Gigaword corpus further improvements are observed using features that cross sentence boundaries. The computational advantages of SNM estimation over both maximum entropy and neural network estimation are probably its main strength, promising an approach that has large flexibility in combining arbitrary features and yet scales gracefully to large amounts of data.
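As a rough illustration of the kind of sparse, arbitrary features such a model consumes, the following Python sketch builds n-gram and skip-gram context features for a prediction point. The function name, feature naming scheme, and window sizes are illustrative assumptions, not the configuration used in the paper.

# Illustrative sketch only: sparse n-gram and skip-gram context features of the
# kind an SNM-style language model could assign weights to. Names are hypothetical.
from collections import Counter

def context_features(history, max_order=3, max_skip=2):
    """Sparse features for predicting the word that follows `history` (a token list)."""
    feats = Counter()
    # n-gram context features: the last `order` words, for several orders
    for order in range(1, max_order + 1):
        if len(history) >= order:
            feats[("ngram", order, tuple(history[-order:]))] += 1
    # skip-gram context features: a single context word, skipping `skip` words
    # between it and the prediction point
    for skip in range(1, max_skip + 1):
        idx = len(history) - 1 - skip
        if idx >= 0:
            feats[("skip", skip, history[idx])] += 1
    return feats

if __name__ == "__main__":
    for feat, count in context_features(["the", "quick", "brown", "fox"]).items():
        print(feat, count)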


2018, Vol. 28 (09), pp. 1850007
Author(s): Francisco Zamora-Martinez, Maria Jose Castro-Bleda

Neural Network Language Models (NNLMs) are a successful approach to Natural Language Processing tasks such as Machine Translation. In this work we introduce a Statistical Machine Translation (SMT) system that fully integrates NNLMs in the decoding stage, breaking with the traditional approach based on N-best list rescoring. The neural models (both language models (LMs) and translation models) are fully coupled in the decoding stage, allowing them to influence translation quality more strongly. Computational issues were solved with a novel idea based on memoization and smoothing of the softmax normalization constants, avoiding their explicit computation at the price of a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and N-gram-based systems, and showing that the integrated approach seems more promising for N-gram-based systems, even with NNLMs of less than full quality.
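The computational issue mentioned here is the softmax normalization constant, which normally requires a sum over the whole output vocabulary for every decoding context. A minimal sketch of the caching (memoization) idea follows; `MemoizedSoftmaxLM`, `score_fn`, and `default_log_z` are assumed names for illustration, not the authors' API.

# Illustrative sketch only (hypothetical names, not the authors' system): cache the
# softmax log-normalization constant per context, and fall back to a smoothed
# default for unseen contexts to avoid summing over the whole vocabulary.
import math

class MemoizedSoftmaxLM:
    def __init__(self, score_fn, vocab, default_log_z=None):
        self.score_fn = score_fn            # score_fn(context, word) -> unnormalized log score
        self.vocab = vocab
        self.cache = {}                     # context -> cached log normalization constant
        self.default_log_z = default_log_z  # smoothed constant for contexts not yet seen

    def log_z(self, context, exact=True):
        if context in self.cache:
            return self.cache[context]
        if not exact and self.default_log_z is not None:
            return self.default_log_z       # accept approximation, skip the expensive sum
        z = math.log(sum(math.exp(self.score_fn(context, w)) for w in self.vocab))
        self.cache[context] = z
        return z

    def log_prob(self, context, word, exact=True):
        return self.score_fn(context, word) - self.log_z(context, exact=exact)

if __name__ == "__main__":
    def toy_score(context, word):           # stands in for an NNLM scoring function
        return float(len(word)) - 0.1 * len(context)
    lm = MemoizedSoftmaxLM(toy_score, vocab=["a", "bb", "ccc"], default_log_z=1.5)
    print(lm.log_prob(("the",), "bb"))                   # exact normalization, cached afterwards
    print(lm.log_prob(("unseen",), "bb", exact=False))   # approximate, uses the smoothed constant

The exact/approximate switch in this sketch mirrors the trade-off between LM quality and computational cost described in the abstract.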


2019, Vol. E102.D (3), pp. 598-608
Author(s): Michael Hentschel, Marc Delcroix, Atsunori Ogawa, Tomoharu Iwata, Tomohiro Nakatani
