statistical language models
Recently Published Documents


TOTAL DOCUMENTS: 89 (FIVE YEARS: 10)
H-INDEX: 14 (FIVE YEARS: 0)

2021, Vol. 3, pp. 4
Author(s): Tai-Danae Bradley, Yiannis Vlassopoulos

This work originates from the observation that today's state-of-the-art statistical language models are impressive not only for their performance, but also---and quite crucially---because they are built entirely from correlations in unstructured text data. The latter observation prompts a fundamental question that lies at the heart of this paper: What mathematical structure exists in unstructured text data? We put forth enriched category theory as a natural answer. We show that sequences of symbols from a finite alphabet, such as those found in a corpus of text, form a category enriched over probabilities. We then address a second fundamental question: How can this information be stored and modeled in a way that preserves the categorical structure? We answer this by constructing a functor from our enriched category of text to a particular enriched category of reduced density operators. The latter leverages the Loewner order on positive semidefinite operators, which can further be interpreted as a toy example of entailment.
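
A minimal sketch of the kind of enriched structure described, with notation assumed here rather than taken from the paper: take the monoidal preorder $([0,1], \le, \times)$ and, for texts $s, t$ over a fixed finite alphabet, write $s \sqsubseteq t$ when $t$ extends $s$. One candidate enrichment assigns

$$
\mathcal{L}(s,t) =
\begin{cases}
\pi(t \mid s) & \text{if } s \sqsubseteq t,\\
0 & \text{otherwise,}
\end{cases}
\qquad
\mathcal{L}(t,u)\,\mathcal{L}(s,t) \le \mathcal{L}(s,u),
\qquad
1 \le \mathcal{L}(s,s).
$$

The last two inequalities are the enriched composition and identity laws; for nested extensions $s \sqsubseteq t \sqsubseteq u$ the chain rule $\pi(u \mid s) = \pi(u \mid t)\,\pi(t \mid s)$ makes composition hold with equality, which is one sense in which conditional probabilities of continuations equip text with the structure of a category enriched over probabilities.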


Author(s): Sarra Hasni

The task of geolocating textual data shared on social networks such as Twitter is attracting growing attention. Because such data feed advanced geographic information systems used for multipurpose spatial analysis, extending the paradigm of geolocated data has become an emerging trend. Unlike the statistical language models widely adopted in prior work, the authors propose a new approach that applies embedding models to the geolocation of both tweets and users. They strengthen the geolocation strategy with sequential modelling based on recurrent neural networks, which weighs the importance of words in tweets against their contextual information. They evaluate the power of this strategy to determine the locations of unstructured texts that reflect users' unrestricted writing styles. In particular, the authors demonstrate that semantic properties and word forms can be effective for geolocating texts without specifying local words or per-region topic descriptions.
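
A minimal PyTorch sketch of the general strategy described, an embedding layer feeding a recurrent network whose final state is classified into a region; the module, hyperparameters, and toy batch below are illustrative assumptions, not the authors' implementation.

# Illustrative sketch: embeddings + GRU + linear classifier over candidate regions.
import torch
import torch.nn as nn

class TweetGeolocator(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_regions=50):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_regions)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded tweet tokens
        embedded = self.embedding(token_ids)
        _, last_hidden = self.rnn(embedded)              # (1, batch, hidden_dim)
        return self.classifier(last_hidden.squeeze(0))   # (batch, num_regions) logits

# Toy usage with two already-tokenised, padded tweets of length 20.
model = TweetGeolocator(vocab_size=10_000)
batch = torch.randint(1, 10_000, (2, 20))
predicted_region = model(batch).argmax(dim=-1)

The recurrent layer is what lets the model weigh each word against its context rather than treating the tweet as a bag of location-indicative terms.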


Author(s): Md. Riazur Rahman, Md. Tarek Habib, Md. Sadekur Rahman, Gazi Zahirul Islam, Md. Abbas Ali Khan

N-gram-based language models are popular and extensively used statistical methods for solving various natural language processing problems, including grammar checking. Smoothing is one of the most effective techniques used in building a language model to deal with the data sparsity problem, and Kneser-Ney is one of the most prominent and successful smoothing techniques for language modelling. In our previous work, we presented a Witten-Bell smoothing based language modelling technique for checking the grammatical correctness of Bangla sentences, which showed promising results and outperformed previous methods. In this work, we propose an improved method that uses a Kneser-Ney smoothed n-gram language model for grammar checking, and we perform a comparative performance analysis of the Kneser-Ney and Witten-Bell smoothing techniques for the same purpose. We also provide an improved technique for calculating the optimum threshold, which further enhances the results. Our experimental results show that Kneser-Ney outperforms Witten-Bell as a smoothing technique when used with n-gram LMs for checking the grammatical correctness of Bangla sentences.
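
A minimal sketch of the comparison using NLTK's interpolated smoothers; the toy English corpus, the trigram order, and the perplexity-threshold decision rule are illustrative assumptions rather than the authors' Bangla setup.

# Illustrative sketch: train trigram LMs with Kneser-Ney and Witten-Bell smoothing
# and flag a sentence as ungrammatical when its perplexity exceeds a threshold.
from nltk.lm import KneserNeyInterpolated, WittenBellInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import ngrams

ORDER = 3
corpus = [["this", "is", "a", "sentence"],
          ["this", "is", "another", "sentence"]]   # toy tokenised training corpus

def train(smoother_cls):
    train_data, vocab = padded_everygram_pipeline(ORDER, corpus)
    lm = smoother_cls(ORDER)
    lm.fit(train_data, vocab)
    return lm

kn_lm = train(KneserNeyInterpolated)
wb_lm = train(WittenBellInterpolated)

def is_grammatical(lm, tokens, threshold=50.0):
    padded = list(pad_both_ends(tokens, n=ORDER))
    test_ngrams = list(ngrams(padded, ORDER))
    return lm.perplexity(test_ngrams) <= threshold    # assumed decision rule

sentence = ["this", "is", "a", "sentence"]
print(is_grammatical(kn_lm, sentence), is_grammatical(wb_lm, sentence))

In this framing, the optimum threshold mentioned above would be tuned on held-out grammatical and ungrammatical sentences rather than fixed by hand.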


IEEE Access, 2020, Vol. 8, pp. 146263-146283
Author(s): Willian Antonio dos Santos, Joao Ribeiro Bezerra, Luis Fabricio Wanderley Goes, Flavia Magalhaes Freitas Ferreira

2019, Vol. 24 (1), pp. 98-130
Author(s): Martijn Bentum, Louis ten Bosch, Antal van den Bosch, Mirjam Ernestus

Previous research has demonstrated that language use can vary depending on the context of situation. The present paper extends this finding by comparing word predictability differences between 14 speech registers, ranging from highly informal conversations to read-aloud books. We trained 14 statistical language models to compute register-specific word predictability and trained a register classifier on the perplexity score vector of the language models. The classifier distinguishes perfectly between samples from all speech registers, and this result generalizes to unseen materials. We show that differences in vocabulary and sentence length cannot explain the speech register classifier’s performance. The combined results show that speech registers differ in word predictability.
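
A minimal scikit-learn sketch of the perplexity-vector idea; the random placeholder matrix, the log transform, and the logistic-regression classifier are assumptions standing in for the 14 register-specific language models and the authors' actual classifier.

# Illustrative sketch: each sample is represented by its perplexity under each of
# the 14 register-specific language models, and a classifier maps that vector to
# the sample's register.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_registers = 1400, 14

# Placeholder features: in the real setup, X[i, j] is the perplexity of sample i
# under the language model trained on register j, and y[i] is its true register.
X = rng.gamma(shape=2.0, scale=50.0, size=(n_samples, n_registers))
y = rng.integers(0, n_registers, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(np.log(X_train), y_train)
print("held-out accuracy:", clf.score(np.log(X_test), y_test))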

