Construction of spoken language model including fillers using filler prediction model

Author(s):  
Kengo Ohta ◽  
Masatoshi Tsuchiya ◽  
Seiichi Nakagawa


2014 ◽  
Vol 36 (1) ◽  
pp. 35-43 ◽  
Author(s):  
Kevin J. Miller

This article reflects on the author’s experience supervising a public school program for students who are deaf or hard-of-hearing, specifically addressing national, regional, and local trends affecting it. These trends included teacher efficacy, changes in educational service delivery, advances in technology, the selection of the listening and spoken language model, the needs of university teacher education programs, and telepractice. Furthermore, the author describes how the program responded to these trends, which ultimately resulted in positive educational outcomes for the students being served.


2019 ◽  
Vol 9 (18) ◽  
pp. 3648
Author(s):  
Casper S. Shikali ◽  
Zhou Sijie ◽  
Liu Qihe ◽  
Refuoe Mokhosi

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, a low-resource yet widely spoken language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the problem of word representation for agglutinative and syllable-based languages. Inspired by the way Swahili is taught in beginner classes, we encoded the syllables of words, rather than characters, character n-grams or morphemes, and generated quality word embeddings using a convolutional neural network. The quality of WEFSE is demonstrated by state-of-the-art results with a syllable-aware language model on both a small dataset (perplexity 31.229) and a medium dataset (perplexity 45.859), outperforming character-aware language models. We further evaluated the word embeddings on a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. The main contributions of the study are therefore a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.
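
The following sketch (not the authors' code; the layer sizes, syllable inventory and ids are illustrative assumptions) shows how a word vector can be composed from syllable embeddings with a convolutional network, in the spirit of WEFSE:

```python
# Minimal sketch, assuming a pre-built syllable-to-id lookup: compose a
# word vector from syllable embeddings with a 1-D CNN. All names and
# dimensions are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class SyllableWordEncoder(nn.Module):
    def __init__(self, n_syllables, syl_dim=50, word_dim=200, kernel=3):
        super().__init__()
        self.syl_emb = nn.Embedding(n_syllables, syl_dim, padding_idx=0)
        # Convolve over the syllable sequence, then max-pool over time
        self.conv = nn.Conv1d(syl_dim, word_dim, kernel_size=kernel, padding=1)

    def forward(self, syl_ids):                    # (batch, max_syllables)
        x = self.syl_emb(syl_ids).transpose(1, 2)  # (batch, syl_dim, time)
        h = torch.relu(self.conv(x))               # (batch, word_dim, time)
        return h.max(dim=2).values                 # (batch, word_dim)

# Toy usage: "matunda" -> syllables ma-tu-nda, mapped to hypothetical ids
syllable_ids = torch.tensor([[5, 9, 23]])
encoder = SyllableWordEncoder(n_syllables=300)
word_vec = encoder(syllable_ids)                   # one 200-dim word vector
```

The word vectors produced this way can then feed a downstream language model, which is how the paper's syllable-aware perplexity results are obtained.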


Author(s):  
Aditya Siddhant ◽  
Anuj Goyal ◽  
Angeliki Metallinou

User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer knowledge from these unlabeled utterances to improve model performance on Spoken Language Understanding (SLU) tasks. We use Embeddings from Language Models (ELMo) to take advantage of unlabeled data by learning contextualized word representations. Additionally, we propose ELMo-Light (ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our findings suggest that unsupervised pre-training on a large corpus of unlabeled utterances leads to significantly better SLU performance than training from scratch, and that it can even outperform conventional supervised transfer. We also show that the gains from unsupervised transfer techniques can be further improved by supervised transfer. The improvements are more pronounced in low-resource settings: using only 1,000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain data.
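
As a loose illustration of this kind of unsupervised transfer (a sketch under assumptions, not the paper's ELMoL implementation; all module names and sizes are invented), one can pre-train a small bidirectional LSTM language model on unlabeled utterances and then reuse its encoder as contextual features for an SLU classifier:

```python
# Hypothetical sketch: pre-train a bidirectional LSTM LM on unlabeled
# utterances, then reuse its encoder for intent classification. The
# pre-training loop and vocabulary handling are omitted.
import torch
import torch.nn as nn

class LMEncoder(nn.Module):
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.lm_head = nn.Linear(2 * dim, vocab)   # used only in pre-training

    def forward(self, tokens):                     # (batch, seq)
        states, _ = self.lstm(self.emb(tokens))    # (batch, seq, 2*dim)
        return states

class IntentClassifier(nn.Module):
    def __init__(self, encoder, n_intents, dim=128):
        super().__init__()
        self.encoder = encoder                     # pre-trained, may be frozen
        self.head = nn.Linear(2 * dim, n_intents)

    def forward(self, tokens):
        states = self.encoder(tokens)
        return self.head(states.mean(dim=1))       # pool over time

encoder = LMEncoder(vocab=10_000)
# ... language-model pre-training on unlabeled utterances would go here ...
clf = IntentClassifier(encoder, n_intents=7)
logits = clf(torch.randint(0, 10_000, (2, 12)))    # (batch=2, n_intents=7)
```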


2014 ◽  
Vol 369 (1651) ◽  
pp. 20130295 ◽  
Author(s):  
Susan Goldin-Meadow

The goal of this paper is to widen the lens on language to include the manual modality. We look first at hearing children who are acquiring language from a spoken language model and find that even before they use speech to communicate, they use gesture. Moreover, those gestures precede, and predict, the acquisition of structures in speech. We look next at deaf children whose hearing losses prevent them from using the oral modality, and whose hearing parents have not presented them with a language model in the manual modality. These children fall back on the manual modality to communicate and use gestures, which take on many of the forms and functions of natural language. These homemade gesture systems constitute the first step in the emergence of manual sign systems that are shared within deaf communities and are full-fledged languages. We end by widening the lens on sign language to include gesture and find that signers not only gesture, but they also use gesture in learning contexts just as speakers do. These findings suggest that what is key in gesture's ability to predict learning is its ability to add a second representational format to communication, rather than a second modality. Gesture can thus be language, assuming linguistic forms and functions, when other vehicles are not available; but when speech or sign is possible, gesture works along with language, providing an additional representational format that can promote learning.


2020 ◽  
Vol 20 (04) ◽  
pp. 2050029
Author(s):  
Aparna Brahme ◽  
Umesh Bhadade

In this paper, we describe our work on spoken language identification using Visual Speech Recognition (VSR) and analyze the effect of the visual speech units used to transcribe visual speech on language recognition. We propose a new approach of word recognition followed by a word N-gram language model (WRWLM), which uses high-level syntactic features and a word-bigram language model for language discrimination. In contrast to the traditional visemic approach, we also propose a holistic approach that uses the signature of a whole word, referred to as a "Visual Word", as the visual speech unit for transcribing visual speech. The results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in the speaker-dependent case, and a WRR of 58% and an LRR of 77% in the speaker-independent case, on an English and Marathi digit classification task. The proposed approach is also evaluated on continuous speech input. The results show that a spoken language identification rate of 50% is achievable even when the WRR of the visual speech recognizer is below 10%, using only 1 s of speech. The approach also improves language discrimination by about 5% compared to traditional visemic approaches.
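
To make the WRWLM decision rule concrete, here is a minimal sketch (toy data, not the authors' system): recognized word sequences are scored under per-language word-bigram models with add-alpha smoothing, and the language with the highest log-probability wins.

```python
# Sketch of bigram-LM language discrimination under assumed toy corpora.
import math
from collections import defaultdict

def train_bigram(sentences, alpha=1.0):
    """Add-alpha smoothed word-bigram counts."""
    bigrams, unigrams, vocab = defaultdict(int), defaultdict(int), set()
    for words in sentences:
        seq = ["<s>"] + words + ["</s>"]
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams, len(vocab), alpha

def log_prob(words, model):
    bigrams, unigrams, v, alpha = model
    seq = ["<s>"] + words + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * v))
               for a, b in zip(seq, seq[1:]))

# Toy corpora of recognized digit words (illustrative only)
english = train_bigram([["one", "two"], ["two", "three"]])
marathi = train_bigram([["ek", "don"], ["don", "teen"]])

hypothesis = ["two", "three"]                      # output of the VSR stage
lang = max([("English", english), ("Marathi", marathi)],
           key=lambda m: log_prob(hypothesis, m[1]))[0]
print(lang)                                        # -> English
```

Because the language model absorbs much of the decision, this kind of scoring can still separate languages even when the underlying word recognizer is noisy, which is consistent with the paper's continuous-speech result.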


2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

Sentence completion systems are actively studied by many researchers because they reduce cognitive effort and improve the user experience. A review of the literature reveals that most work in this area targets English, with limited effort spent on other languages, especially vernacular ones. This work aims to develop a state-of-the-art sentence completion system for Punjabi, the 10th most spoken language in the world. The presented system is the outcome of experiments on various neural network language model combinations. A new Sentence Search Algorithm (SSA) and a patching system were developed to search for, complete and rank candidate completions of a sub-string and return syntactically rich sentence(s). Both quantitative and qualitative evaluation metrics were used to evaluate the system. The results are promising, and the best-performing model completes a given sub-string with high acceptability. This best-performing model is used in the developed user interface.
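
As an illustration of how such a search-and-rank step can work (a hypothetical sketch, not the published SSA; the toy LM table and all names are invented), a beam search over a next-word distribution can complete a sub-string and rank candidates by log-probability:

```python
# Hypothetical beam-search completion loop. `next_word_logprobs` stands in
# for any trained neural language model; here it is a toy lookup table.
import heapq

def next_word_logprobs(prefix):
    # Assumed LM interface: maps a prefix to {word: log-prob}
    table = {
        ("the",): {"model": -0.4, "system": -1.0},
        ("the", "model"): {"works": -0.3, "</s>": -0.9},
        ("the", "model", "works"): {"</s>": -0.1},
    }
    return table.get(tuple(prefix), {"</s>": 0.0})

def complete(prefix, beam=3, max_len=6):
    beams = [(0.0, list(prefix))]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            for w, lp in next_word_logprobs(words).items():
                if w == "</s>":
                    finished.append((score + lp, words))
                else:
                    candidates.append((score + lp, words + [w]))
        beams = heapq.nlargest(beam, candidates)   # keep the top hypotheses
        if not beams:
            break
    return max(finished)[1] if finished else []

print(complete(["the"]))   # -> ['the', 'model', 'works']
```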

