statistical language modeling Latest Research Papers

A Refutation of Finite-State Language Models through Zipf’s Law for Factual Knowledge

We present a hypothetical argument against finite-state processes in statistical language modeling that is based on semantics rather than syntax. In this theoretical model, we suppose that the semantic properties of texts in a natural language could be approximately captured by a recently introduced concept of a perigraphic process. Perigraphic processes are a class of stochastic processes that satisfy a Zipf-law accumulation of a subset of factual knowledge, which is time-independent, compressed, and effectively inferrable from the process. We show that the classes of finite-state processes and of perigraphic processes are disjoint, and we present a new simple example of perigraphic processes over a finite alphabet called Oracle processes. The disjointness result makes use of the Hilberg condition, i.e., the almost sure power-law growth of algorithmic mutual information. Using a strongly consistent estimator of the number of hidden states, we show that finite-state processes do not satisfy the Hilberg condition whereas Oracle processes satisfy the Hilberg condition via the data-processing inequality. We discuss the relevance of these mathematical results for theoretical and computational linguistics.
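
The Hilberg condition named in this abstract can be written out roughly as follows. This is a sketch built only from the abstract's wording ("almost sure power-law growth of algorithmic mutual information"). The symbols $K$ (prefix-free Kolmogorov complexity), $J$, the block notation $X_{1:n}$, and the exponent $\beta$ are notational assumptions, and the paper's exact formulation may differ.

```latex
% Algorithmic mutual information between two strings u and v,
% with K denoting prefix-free Kolmogorov complexity (assumed notation).
\[
  J(u; v) = K(u) + K(v) - K(u, v)
\]
% Hilberg-type condition, rough form: the mutual information between two
% adjacent blocks of length n grows at least like a power of n.
\[
  J(X_{1:n}; X_{n+1:2n}) \gtrsim n^{\beta}
  \quad \text{almost surely, for some } \beta \in (0, 1)
\]
```

According to the abstract, finite-state processes fail a condition of this kind while Oracle processes satisfy it, which is what separates the two classes.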

ESLMT: a new clustering method for biomedical document retrieval

Biomedical Engineering / Biomedizinische Technik ◽ 10.1515/bmt-2018-0068 ◽ 2019 ◽ Vol 64 (6) ◽ pp. 729-741 ◽ Cited By ~ 1

Author(s): MohammadReza Keyvanpour ◽ Fatemeh Serpush

Keyword(s): Document Retrieval ◽ Information Organization ◽ Data Set ◽ Statistical Language Modeling ◽ Retrieval Systems ◽ Efficient Performance ◽ Similarity Query ◽ Information Retrieval Systems ◽ Process Studies ◽ The Mean

MEDLINE is a rapidly growing database; to use this resource, practitioners and biomedical researchers must carry out tedious and time-consuming tasks such as discovering, searching, reading, and evaluating biomedical documents. Moreover, labeling a group of biomedical documents is expensive and requires a complicated operation. In addition, compound words, polysemy, and synonymy can affect searches in MEDLINE. Therefore, designing an efficient way of sharing knowledge and organizing information is essential so that information retrieval systems can provide ideal outcomes. For this purpose, different strategies are used in the retrieval of biomedical documents (RBD). However, the RBD process still returns a number of results unrelated to the user's query. Studies have shown that well-defined clusters in a retrieval system perform more efficiently than document-based retrieval. Accordingly, the present study proposes Expanding Statistical Language Modeling and Thesaurus (ESLMT) for clustering and retrieving biomedical documents. The results showed that Clustering with ESLM Similarity and Thesaurus (CESLMST) scores higher than the other compared methods on all criteria used in this study. The results also indicated that mean average precision (MAP) improved with the Clusters’ Retrieval Derived from ESLM Similarity-Query (CRDESLMS-QET) method compared with previous methods on the Text REtrieval Conference (TREC) data set.
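
For readers unfamiliar with the mean average precision (MAP) metric mentioned in this abstract, the sketch below shows one common way to compute it. It is not taken from the paper: the function names and the toy document IDs are hypothetical, and it assumes binary relevance judgments, with unretrieved relevant documents counting against the score.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list (binary relevance)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this cut-off, recorded only at relevant hits.
            precision_sum += hits / rank
    # Dividing by all relevant documents penalizes those never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of the per-query average precision over all queries."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two queries with made-up document IDs.
toy_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
toy_map = mean_average_precision(toy_runs)
```

In the toy data, the first query finds relevant documents at ranks 1 and 3 and the second at rank 2, so a ranking that moves relevant documents earlier raises the score.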

Progress in Neural Network Based Statistical Language Modeling

Deep Learning: Concepts and Architectures - Studies in Computational Intelligence ◽ 10.1007/978-3-030-31756-0_11 ◽ 2019 ◽ pp. 321-339

Author(s): Anup Shrikant Kunte ◽ Vahida Z. Attar

Keyword(s): Neural Network ◽ Language Modeling ◽ Statistical Language Modeling

Topic Modelling in Bangla Language: An LDA Approach to Optimize Topics and News Classification

Computer and Information Science ◽ 10.5539/cis.v11n4p77 ◽ 2018 ◽ Vol 11 (4) ◽ pp. 77 ◽ Cited By ~ 2

Author(s): Malek Mouhoub ◽ Mustakim Al Helal

Keyword(s): Topic Modeling ◽ Text Categorization ◽ English Language ◽ Latent Dirichlet Allocation ◽ Similarity Measures ◽ Document Collections ◽ Statistical Language Modeling ◽ Document Models ◽ Wide Range ◽ News Corpus

Topic modeling is a powerful technique for unsupervised analysis of large document collections. Topic models have a wide range of applications, including tag recommendation, text categorization, keyword extraction, and similarity search, in text mining, information retrieval, and statistical language modeling. Research on topic modeling is gaining popularity day by day. Various efficient topic modeling techniques are available for English, as it is one of the most spoken languages in the world, but not for other languages. Bangla is the seventh most spoken native language in the world by population, and it needs automation in different aspects. This paper deals with finding the core topics of a Bangla news corpus and classifying news with similarity measures. The document models are built using LDA (Latent Dirichlet Allocation) with bigrams.
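
The abstract does not describe how bigrams enter the LDA document models. As a rough illustration only, the sketch below builds a bag-of-terms representation that adds adjacent-word bigrams to the unigram counts, which is one common way to prepare input for a topic model. The function names, the whitespace tokenizer, and the underscore joiner are all assumptions, not details from the paper.

```python
from collections import Counter
from typing import List


def bigram_tokens(tokens: List[str]) -> List[str]:
    """Join each pair of adjacent tokens into a single bigram term."""
    return [f"{left}_{right}" for left, right in zip(tokens, tokens[1:])]


def bag_of_terms(text: str) -> Counter:
    """Count unigrams and bigrams in a whitespace-tokenized document.

    Assumption: simple whitespace splitting; a real Bangla pipeline would
    likely need language-specific tokenization and stop-word removal.
    """
    tokens = text.lower().split()
    return Counter(tokens + bigram_tokens(tokens))


# Hypothetical example document.
example_counts = bag_of_terms("New York new york")
```

Counts of this kind, one `Counter` per news article, could then be converted to a document-term matrix and passed to an LDA implementation of the reader's choice.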

11 Statistical Language Modeling

Speech Processing ◽ 10.1201/9781482276237-74 ◽ 2018 ◽ pp. 530-534

Keyword(s): Language Modeling ◽ Statistical Language Modeling

CTC Network with Statistical Language Modeling for Action Sequence Recognition in Videos

Proceedings of the on Thematic Workshops of ACM Multimedia 2017 - Thematic Workshops '17 ◽ 10.1145/3126686.3126755 ◽ 2017 ◽ Cited By ~ 1

Author(s): Mengxi Lin ◽ Nakamasa Inoue ◽ Koichi Shinoda

Keyword(s): Language Modeling ◽ Action Sequence ◽ Statistical Language Modeling ◽ Sequence Recognition

A survey on the application of recurrent neural networks to statistical language modeling

Computer Speech & Language ◽ 10.1016/j.csl.2014.09.005 ◽ 2015 ◽ Vol 30 (1) ◽ pp. 61-98 ◽ Cited By ~ 68

Author(s): Wim De Mulder ◽ Steven Bethard ◽ Marie-Francine Moens

Keyword(s): Neural Networks ◽ Recurrent Neural Networks ◽ Language Modeling ◽ Statistical Language Modeling

Multiple Adjunction in Feature-Based Tree-Adjoining Grammar

Computational Linguistics ◽ 10.1162/coli_a_00217 ◽ 2015 ◽ Vol 41 (1) ◽ pp. 41-70 ◽ Cited By ~ 1

Author(s): Claire Gardent ◽ Shashi Narayan

Keyword(s): Recognition Algorithm ◽ Semantic Interpretation ◽ Linear Ordering ◽ Syntactic Analysis ◽ Statistical Language Modeling ◽ Tree Adjoining Grammar ◽ Feature Based ◽ Semantic Dependencies ◽ Parsing Algorithm ◽ The One

In parsing with Tree Adjoining Grammar (TAG), independent derivations have been shown by Schabes and Shieber (1994) to be essential for correctly supporting syntactic analysis, semantic interpretation, and statistical language modeling. However, the parsing algorithm they propose is not directly applicable to Feature-Based TAGs (FB-TAG). We provide a recognition algorithm for FB-TAG that supports both dependent and independent derivations. The resulting algorithm combines the benefits of independent derivations with those of Feature-Based grammars. In particular, we show that it accounts for a range of interactions between dependent vs. independent derivation on the one hand, and syntactic constraints, linear ordering, and scopal vs. nonscopal semantic dependencies on the other hand.

Sentence Validation by Statistical Language Modeling and Semantic Relations

International Journal of Computer Applications Technology and Research ◽ 10.7753/ijcatr0312.1012 ◽ 2014 ◽ Vol 3 (12) ◽ pp. 812-814

Author(s): Lakshay Arya

Keyword(s): Language Modeling ◽ Semantic Relations ◽ Statistical Language Modeling

One billion word benchmark for measuring progress in statistical language modeling

10.21437/interspeech.2014-564 ◽ 2014 ◽ Cited By ~ 12

Author(s): Ciprian Chelba ◽ Tomas Mikolov ◽ Mike Schuster ◽ Qi Ge ◽ Thorsten Brants ◽ ...

Keyword(s): Language Modeling ◽ Statistical Language Modeling

statistical language modeling
Recently Published Documents

A Refutation of Finite-State Language Models through Zipf’s Law for Factual Knowledge

ESLMT: a new clustering method for biomedical document retrieval

Progress in Neural Network Based Statistical Language Modeling

Topic Modelling in Bangla Language: An LDA Approach to Optimize Topics and News Classification

11 Statistical Language Modeling

CTC Network with Statistical Language Modeling for Action Sequence Recognition in Videos

A survey on the application of recurrent neural networks to statistical language modeling

Multiple Adjunction in Feature-Based Tree-Adjoining Grammar

Sentence Validation by Statistical Language Modeling and Semantic Relations

One billion word benchmark for measuring progress in statistical language modeling
