Learning Simpler Language Models with the Differential State Framework

2017, Vol. 29 (12), pp. 3327-3352
Author(s):  
Alexander G. Ororbia II ◽  
Tomas Mikolov ◽  
David Reitter

Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue are often complex and costly to train. The differential state framework (DSF) is a simple and high-performing design that unifies previously introduced gated neural models. DSF models maintain longer-term memory by learning to interpolate between a fast-changing data-driven representation and a slowly changing, implicitly stable state. Within the DSF, a new architecture is presented: the delta-RNN. This model requires hardly any more parameters than a classical, simple recurrent network. In language modeling at the word and character levels, the delta-RNN outperforms popular complex architectures, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU), and, when regularized, performs comparably to several state-of-the-art baselines. At the subword level, the delta-RNN's performance is comparable to that of complex gated architectures.
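
The interpolation at the heart of the DSF can be illustrated with a minimal Python sketch, assuming a single shared input projection for both the proposal and the gate; the class and parameter names are illustrative, not the authors' notation or published code.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SimpleDeltaRNNCell:
    """Minimal sketch of the delta-RNN idea: each hidden unit mixes a
    fast, data-driven proposal with its slowly changing previous state
    through a learned per-unit gate."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        self.W_x = rng.uniform(-s, s, (hidden_size, input_size))
        self.W_h = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b = np.zeros(hidden_size)
        self.b_gate = np.zeros(hidden_size)  # per-unit interpolation bias

    def step(self, x_t, h_prev):
        # Fast, data-driven proposal from the current input and the previous state.
        z_t = np.tanh(self.W_x @ x_t + self.W_h @ h_prev + self.b)
        # Gate r decides, per unit, how much of the slowly changing old
        # state to keep versus how much of the new proposal to admit.
        r = sigmoid(self.W_x @ x_t + self.b_gate)
        return (1.0 - r) * z_t + r * h_prev

Because the proposal reuses the machinery of a simple recurrent network and the gate here adds only a bias vector, the cell needs hardly any parameters beyond those of a classical RNN, which is the point of the design.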

Author(s):  
Tal Linzen ◽  
Emmanuel Dupoux ◽  
Yoav Goldberg

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture’s grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
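
As a rough illustration of the strongly supervised number-prediction objective described above, the following PyTorch sketch classifies the grammatical number of an upcoming verb from the LSTM's state over the sentence prefix. The hyperparameters and class names are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class NumberPredictionModel(nn.Module):
    """Read the words preceding a verb with an LSTM and predict whether
    that verb should be singular or plural."""

    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)  # {singular, plural}

    def forward(self, token_ids):
        # token_ids: (batch, prefix_length), the sentence up to but not
        # including the verb whose number is being predicted.
        embedded = self.embed(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        return self.classifier(h_n[-1])  # logits over the two number classes

The language-modeling setting discussed in the abstract differs in that the network receives no explicit number label and must prefer the correctly inflected verb purely through its next-word probabilities.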


Author(s):  
Ahmad Ashril Rizal ◽  
Siti Soraya

The absence of natural resources such as oil and gas, forest products, or large-scale manufacturing on the island of Lombok has made tourism the leading sector in regional development. The tourism sector's contribution shows an increasing trend from year to year, and the positive impact of tourist spending is distributed across many parts of the economy. However, local governments generally prepare regional tourism only around local events, even though local events are not the only driver of tourist visits. Preparation by local government and tourism stakeholders is therefore essential for stabilizing tourist arrivals. This study examines the prediction of tourist arrivals using a Recurrent Neural Network with Long Short-Term Memory (RNN LSTM). An LSTM holds information outside the normal flow of the recurrent network in gated cells: the cells decide what to store and when to allow reading, writing, and erasure through gates that open and close. The gates pass information according to its strength, filtered by the gates' own weights, which, like the input and hidden-unit weights, are adjusted during the recurrent network's learning process. A tourist-arrival prediction model built with an RNN LSTM using multiple time steps achieved an RMSE of 6888.37 on the training data and 14684.33 on the test data.
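
A minimal Python sketch of the multi-time-step setup described above follows; the window length, network size, and preprocessing are assumptions, since the study's code is not reproduced here.

import numpy as np
import torch
import torch.nn as nn

def make_windows(series, n_steps):
    """Turn a 1-D series of monthly arrival counts into supervised
    (input window, next value) pairs, i.e., the multi-time-step inputs."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    X = torch.tensor(np.array(X), dtype=torch.float32).unsqueeze(-1)
    y = torch.tensor(np.array(y), dtype=torch.float32)
    return X, y

class ArrivalForecaster(nn.Module):
    """Illustrative LSTM regressor for tourist-arrival forecasting."""

    def __init__(self, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):  # x: (batch, n_steps, 1)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1]).squeeze(-1)

def rmse(pred, target):
    # The evaluation metric reported in the study (RMSE).
    return torch.sqrt(torch.mean((pred - target) ** 2))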


2020, Vol. 34 (04), pp. 4989-4996
Author(s):  
Ekaterina Lobacheva ◽  
Nadezhda Chirkova ◽  
Alexander Markovich ◽  
Dmitry Vetrov

One of the most popular approaches for neural network compression is sparsification: learning sparse weight matrices. In structured sparsification, weights are set to zero by groups corresponding to structural units, e.g., neurons. We further develop the structured sparsification approach for gated recurrent neural networks, e.g., the Long Short-Term Memory (LSTM). Specifically, in addition to sparsifying individual weights and neurons, we propose sparsifying the preactivations of gates. This makes some gates constant and simplifies the LSTM structure. We test our approach on text classification and language modeling tasks. Our method improves the neuron-wise compression of the model in most of the tasks. We also observe that the resulting structure of gate sparsity depends on the task, and we connect the learned structures to the specifics of the particular tasks.
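
As an illustration of the idea only: the paper's own sparsification machinery is not reproduced here, but a group-Lasso-style stand-in conveys how driving all of a gate preactivation's incoming weights to zero leaves that gate at a constant value determined by its bias. The function below groups, per preactivation unit, all input-to-hidden and hidden-to-hidden weights feeding it; the regularization coefficient in the usage note is an arbitrary assumption.

import torch
import torch.nn as nn

def gate_preactivation_group_penalty(lstm: nn.LSTM) -> torch.Tensor:
    """Sum of per-row L2 norms over the concatenated LSTM weight
    matrices.  Each row holds every weight feeding one gate
    preactivation (PyTorch stacks the input, forget, cell, and output
    gates along dim 0), so zeroing a whole row makes that gate constant."""
    w = torch.cat([lstm.weight_ih_l0, lstm.weight_hh_l0], dim=1)
    return w.norm(dim=1).sum()

# Usage sketch: add the penalty to the task loss during training, e.g.
#   total_loss = task_loss + 1e-4 * gate_preactivation_group_penalty(lstm)
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
print(gate_preactivation_group_penalty(lstm))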

