G2Basy: A framework to improve the RNN language model and ease overfitting problem

Recurrent neural networks are efficient ways of training language models, and various RNN networks have been proposed to improve performance. However, with the increase of network scales, the overfitting problem becomes more urgent. In this paper, we propose a framework—G2Basy—to speed up the training process and ease the overfitting problem. Instead of using predefined hyperparameters, we devise a gradient increasing and decreasing technique that changes the parameters training batch size and input dropout simultaneously by a user-defined step size. Together with a pretrained word embedding initialization procedure and the introduction of different optimizers at different learning rates, our framework speeds up the training process dramatically and improves performance compared with a benchmark model of the same scale. For the word embedding initialization, we propose the concept of “artificial features” to describe the characteristics of the obtained word embeddings. We experiment on two of the most often used corpora—the Penn Treebank and WikiText-2 datasets—and both outperform the benchmark results and show potential towards further improvement. Furthermore, our framework shows better results with the larger and more complicated WikiText-2 corpus than with the Penn Treebank. Compared with other state-of-the-art results, we achieve comparable results with network scales hundreds of times smaller and within fewer training epochs.

Download Full-text

MenuNER: Domain-Adapted BERT Based NER Approach for a Domain with Limited Dataset and Its Application to Food Menu Domain

Applied Sciences ◽

10.3390/app11136007 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6007

Author(s):

Muzamil Hussain Syed ◽

Sun-Tae Chung

Keyword(s):

Domain Adaptation ◽

Language Model ◽

Named Entity Recognition ◽

Word Embedding ◽

Fine Tuning ◽

Entity Recognition ◽

Language Models ◽

Feature Vectors ◽

Named Entity ◽

Domain Specific

Entity-based information extraction is one of the main applications of Natural Language Processing (NLP). Recently, deep transfer-learning utilizing contextualized word embedding from pre-trained language models has shown remarkable results for many NLP tasks, including Named-entity recognition (NER). BERT (Bidirectional Encoder Representations from Transformers) is gaining prominent attention among various contextualized word embedding models as a state-of-the-art pre-trained language model. It is quite expensive to train a BERT model from scratch for a new application domain since it needs a huge dataset and enormous computing time. In this paper, we focus on menu entity extraction from online user reviews for the restaurant and propose a simple but effective approach for NER task on a new domain where a large dataset is rarely available or difficult to prepare, such as food menu domain, based on domain adaptation technique for word embedding and fine-tuning the popular NER task network model ‘Bi-LSTM+CRF’ with extended feature vectors. The proposed NER approach (named as ‘MenuNER’) consists of two step-processes: (1) Domain adaptation for target domain; further pre-training of the off-the-shelf BERT language model (BERT-base) in semi-supervised fashion on a domain-specific dataset, and (2) Supervised fine-tuning the popular Bi-LSTM+CRF network for downstream task with extended feature vectors obtained by concatenating word embedding from the domain-adapted pre-trained BERT model from the first step, character embedding and POS tag feature information. Experimental results on handcrafted food menu corpus from customers’ review dataset show that our proposed approach for domain-specific NER task, that is: food menu named-entity recognition, performs significantly better than the one based on the baseline off-the-shelf BERT-base model. The proposed approach achieves 92.5% F1 score on the YELP dataset for the MenuNER task.

Download Full-text

Character n-Gram Embeddings to Improve RNN Language Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015074 ◽

2019 ◽

Vol 33 ◽

pp. 5074-5082 ◽

Cited By ~ 2

Author(s):

Sho Takase ◽

Jun Suzuki ◽

Masaaki Nagata

Keyword(s):

Neural Network ◽

Machine Translation ◽

Recurrent Neural Network ◽

Language Model ◽

Language Modeling ◽

Word Embedding ◽

Experimental Results ◽

Language Models ◽

Word Embeddings ◽

N Gram

This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams based on research in the field of word embedding construction (Wieting et al. 2016). Our proposed method constructs word embeddings from character ngram embeddings and combines them with ordinary word embeddings. We demonstrate that the proposed method achieves the best perplexities on the language modeling datasets: Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct experiments on application tasks: machine translation and headline generation. The experimental results indicate that our proposed method also positively affects these tasks

Download Full-text

Text Genre Detection Using Doc2Vec Word-embedding Language Model

Language and Information ◽

10.29403/li.23.2.2 ◽

2019 ◽

Vol 23 (2) ◽

pp. 23-43

Author(s):

Dongsung Kim

Keyword(s):

Language Model ◽

Word Embedding ◽

Text Genre

Download Full-text

Incorporating LDA With Word Embedding for Web Service Clustering

International Journal of Web Services Research ◽

10.4018/ijwsr.2018100102 ◽

2018 ◽

Vol 15 (4) ◽

pp. 29-44 ◽

Cited By ~ 4

Author(s):

Yi Zhao ◽

Chong Wang ◽

Jian Wang ◽

Keqing He

Keyword(s):

Web Service ◽

Service Discovery ◽

Word Embedding ◽

The Internet ◽

Word Embeddings ◽

Training Process ◽

Web Service Discovery ◽

Processing Data ◽

Clustering Approach ◽

Service Clustering

With the rapid growth of web services on the internet, web service discovery has become a hot topic in services computing. Faced with the heterogeneous and unstructured service descriptions, many service clustering approaches have been proposed to promote web service discovery, and many other approaches leveraged auxiliary features to enhance the classical LDA model to achieve better clustering performance. However, these extended LDA approaches still have limitations in processing data sparsity and noise words. This article proposes a novel web service clustering approach by incorporating LDA with word embedding, which leverages relevant words obtained based on word embedding to improve the performance of web service clustering. Especially, the semantically relevant words of service keywords by Word2vec were used to train the word embeddings and then incorporated into the LDA training process. Finally, experiments conducted on a real-world dataset published on ProgrammableWeb show that the authors' proposed approach can achieve better clustering performance than several classical approaches.

Download Full-text

INTEGRATION OF n-GRAM LANGUAGE MODELS IN MULTIPLE CLASSIFIER SYSTEMS FOR OFFLINE HANDWRITTEN TEXT LINE RECOGNITION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001408006855 ◽

2008 ◽

Vol 22 (07) ◽

pp. 1301-1321 ◽

Cited By ~ 2

Author(s):

ROMAN BERTOLAMI ◽

HORST BUNKE

Keyword(s):

Language Model ◽

Language Models ◽

Combination Method ◽

Text Line ◽

Multiple Classifier Systems ◽

Classifier Systems ◽

Handwritten Text ◽

Handwritten Text Recognition ◽

Multiple Classifier ◽

N Gram

Current multiple classifier systems for unconstrained handwritten text recognition do not provide a straightforward way to utilize language model information. In this paper, we describe a generic method to integrate a statistical n-gram language model into the combination of multiple offline handwritten text line recognizers. The proposed method first builds a word transition network and then rescores this network with an n-gram language model. Experimental evaluation conducted on a large dataset of offline handwritten text lines shows that the proposed approach improves the recognition accuracy over a reference system as well as over the original combination method that does not include a language model.

Download Full-text

Automated Source Code Generation and Auto-Completion Using Deep Learning: Comparing and Discussing Current Language Model-Related Approaches

AI ◽

10.3390/ai2010001 ◽

2021 ◽

Vol 2 (1) ◽

pp. 1-16

Author(s):

Juan Cruz-Benito ◽

Sanjay Vishwakarma ◽

Francisco Martin-Fernandez ◽

Ismael Faro

Keyword(s):

Deep Learning ◽

Learning Community ◽

Programming Languages ◽

Language Processing ◽

Code Generation ◽

Language Model ◽

Language Models ◽

Stochastic Gradient Descent ◽

Network Architectures ◽

Learning Architectures

In recent years, the use of deep learning in language models has gained much attention. Some research projects claim that they can generate text that can be interpreted as human writing, enabling new possibilities in many application areas. Among the different areas related to language processing, one of the most notable in applying this type of modeling is programming languages. For years, the machine learning community has been researching this software engineering area, pursuing goals like applying different approaches to auto-complete, generate, fix, or evaluate code programmed by humans. Considering the increasing popularity of the deep learning-enabled language models approach, we found a lack of empirical papers that compare different deep learning architectures to create and use language models based on programming code. This paper compares different neural network architectures like Average Stochastic Gradient Descent (ASGD) Weight-Dropped LSTMs (AWD-LSTMs), AWD-Quasi-Recurrent Neural Networks (QRNNs), and Transformer while using transfer learning and different forms of tokenization to see how they behave in building language models using a Python dataset for code generation and filling mask tasks. Considering the results, we discuss each approach’s different strengths and weaknesses and what gaps we found to evaluate the language models or to apply them in a real programming context.

Download Full-text

HATIMU AISYAH KARYA ZURINAH HASSAN MENERUSI PERSPEKTIF ELAINE SHOWALTER MODEL BAHASA

International Journal of Creative Future and Heritage (TENIAT) ◽

10.47252/teniat.v8i2.296 ◽

2020 ◽

Vol 8 (2) ◽

pp. 54-62

Author(s):

NUR ZALIKHA MAT RADZI ◽

NASIRIN ABDILLAH ◽

DAENG HALIZA DAENG JAMAL

Keyword(s):

Language Model ◽

Southeast Asian ◽

Language Models ◽

Symbolic Language ◽

Female Characters ◽

Literary Works ◽

The Past ◽

Historical Practices

Hatimu Aisyah karya Sasterawan Negara ke-13 iaitu - Zurinah Hassan, yang juga penerima Anugerah Hadiah Penulis Asia Tenggara (SEA Write Award) pada tahun 2004. Rentetan kejayaan beliau, telah menjadi tumpuan para pengkaji untuk meneliti aspek mengenai pengarangan wanita. Hatimu Aisyah merupakan novel pertama dihasilkan oleh Zurinah Hassan yang menekankan mengenai amalan adat resam zaman terdahulu sehingga ditelan arus pemodenan zaman. Novel Hatimu Aisyah mengetengahkan gambaran wanita yang mengutamakan adat dalam konteks perjalanan hidup bermasyarakat. Kajian terhadap karya Zurinah Hassan ini, bersandarkan kepada Model Bahasa Gagasan Elaine Showalter dari perspektif ginokritik untuk melihat watak-watak wanita. Antara Perbincangan dalam kajian ini adalah berfokuskan kepada simbolik bahasa dan bahasa sebagai ekspresi kesedaran wanita. Hasil dapatan keseluruhan kajian menunjukkan bahawa Zurinah Hassan menggunakan bahasa yang bersesuaian dengan gagasan bahasa daripada Elaine Showalter tetapi agak kurang menyerlah. Hal ini disebabkan keterbatasan penggunaan bahasa selaras dengan sosiobudaya masyarakat Melayu. Penemuan kajian ini dalam model bahasa wanita dapat dilihat menerusi simbolik bahasa dan bahasa sebagai ekspresi kesedaran wanita. Hasil manfaat dan kepentingan diperolehi masa hadapan dapat dilihat bahawa golongan wanita menzahirkan protes dan kritikan menerusi corak penulisan karya mereka meskipun masih dalam keadaan terkawal. Hatimu Aisyah the 13th National literary works, namely-Zurinah Hassan, who is also the recipient of the Southeast Asian Writer award (SEA Write Award) in 2004. His success string has been the focus of researchers to examine the aspects of women's writings. Hatimu Aisyah is the first novel to be produced by Zurinah Hassan that emphasizes on the historical practices of the past, having swallowed the current modernization of the day. The Hatimu Aisyah Novel highlights the portrayal of women who are customcentric in the context of the communities life. Studies on Zurinah Hassan's work are based on the language Model of Elaine Showalter from the perspective of Ginokritik to see the female characters. Among the discussions in this study are focused on symbolic language and language as a expression of women's awareness. The overall findings of the study showed that Zurinah Hassan used a language that fits the language idea of Elaine Showalter but was somewhat less striking. This is due to the limitations of usage in line with the Malay social. The findings of this study in female language models can be seen through the symbolic language and language in the expression of women's awareness. The results of the benefits and interests gained future can be seen that women are in their protest and criticism through their work writing patterns despite being controlled.

Download Full-text

Astrid

Proceedings of the VLDB Endowment ◽

10.14778/3436905.3436907 ◽

2020 ◽

Vol 14 (4) ◽

pp. 471-484

Author(s):

Suraj Shetiya ◽

Saravanan Thirumuruganathan ◽

Nick Koudas ◽

Gautam Das

Keyword(s):

Deep Learning ◽

Objective Function ◽

Pattern Matching ◽

Language Processing ◽

Language Model ◽

Language Models ◽

Selectivity Estimation ◽

Statistical Correlations ◽

Benchmark Datasets ◽

Traditional Approaches

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep learning based approaches leverage techniques from natural language processing such as embeddings to encode the strings and use it to train a model. While this is an improvement over traditional approaches, there is a large scope for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is query-type (prefix, substring, and suffix) and selectivity aware. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100 respectively. Our approach would ensure that the embedding for 'ab' is closer to 'abc' than 'abd'. Second, we describe how neural language models could be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it could be used for estimating selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.

Download Full-text

Chord-aware automatic music transcription based on hierarchical Bayesian integration of acoustic and language models

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2018.17 ◽

2018 ◽

Vol 7 ◽

Author(s):

Yuta Ojima ◽

Eita Nakamura ◽

Katsutoshi Itoyama ◽

Kazuyoshi Yoshii

Keyword(s):

Latent Variables ◽

Language Model ◽

Language Models ◽

Sequential Dependency ◽

Acoustic Model ◽

Hierarchical Bayesian ◽

Generative Process ◽

Music Transcription ◽

Automatic Music Transcription ◽

Music Audio

This paper describes automatic music transcription with chord estimation for music audio signals. We focus on the fact that concurrent structures of musical notes such as chords form the basis of harmony and are considered for music composition. Since chords and musical notes are deeply linked with each other, we propose joint pitch and chord estimation based on a Bayesian hierarchical model that consists of an acoustic model representing the generative process of a spectrogram and a language model representing the generative process of a piano roll. The acoustic model is formulated as a variant of non-negative matrix factorization that has binary variables indicating a piano roll. The language model is formulated as a hidden Markov model that has chord labels as the latent variables and emits a piano roll. The sequential dependency of a piano roll can be represented in the language model. Both models are integrated through a piano roll in a hierarchical Bayesian manner. All the latent variables and parameters are estimated using Gibbs sampling. The experimental results showed the great potential of the proposed method for unified music transcription and grammar induction.

Download Full-text

Active Learning to Speed-Up the Training Process for Dialogue Act Labelling

Human Language Technology Challenges for Computer Science and Linguistics - Lecture Notes in Computer Science ◽

10.1007/978-3-319-14120-6_21 ◽

2014 ◽

pp. 253-263

Author(s):

Fabrizio Ghigi ◽

Carlos-D. Martínez-Hinarejos ◽

José-Miguel Benedí

Keyword(s):

Active Learning ◽

Training Process ◽

Speed Up

Download Full-text