Improvement of Lecture Speech Recognition by Using Unsupervised Adaptation

Author(s):  
Tetsuo Kosaka ◽  
Takashi Kusama ◽  
Masaharu Kato ◽  
Masaki Kohda

The aim of this work is to improve the recognition performance of spontaneous speech. To achieve this, the authors of this chapter propose new unsupervised adaptation approaches for spontaneous speech and evaluate them using both diagonal-covariance and full-covariance hidden Markov models. In the adaptation procedure, language model (LM) adaptation and acoustic model (AM) adaptation are applied iteratively, and several combination schemes are tested to find the most effective one. In the LM adaptation, a word trigram model and a part-of-speech (POS) trigram model are combined to build a more task-specific LM. In addition, the authors propose an unsupervised speaker adaptation technique based on weighting the adaptation data, where the weights depend on POS class. The large-scale spontaneous speech database “Corpus of Spontaneous Japanese (CSJ)”, which serves as the common evaluation database for spontaneous Japanese, was used for the recognition experiments. The results show that the proposed methods provide a significant advantage on this task.
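
The abstract does not give the formula used to combine the word trigram and POS trigram models; a common choice for such a combination is linear interpolation. The sketch below illustrates that idea only; the interpolation weight `lam` and the `word_lm`/`pos_lm` interfaces are hypothetical, not the paper's implementation.

```python
# Hedged sketch: combining a word trigram LM with a POS trigram LM by linear
# interpolation. The combination method actually used in the paper is not
# stated in the abstract; the interfaces below are assumed for illustration.

def interpolated_trigram_prob(w, history, pos_of, word_lm, pos_lm, lam=0.7):
    """P(w | h) = lam * P_word(w | w1, w2) + (1 - lam) * P_pos(pos(w) | pos(w1), pos(w2))."""
    w1, w2 = history                                  # two preceding words
    p_word = word_lm.prob(w, (w1, w2))                # word trigram probability
    p_pos = pos_lm.prob(pos_of[w], (pos_of[w1], pos_of[w2]))  # POS trigram probability
    return lam * p_word + (1.0 - lam) * p_pos
```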

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gezheng Xu ◽  
Wenge Rong ◽  
Yanmeng Wang ◽  
Yuanxin Ouyang ◽  
Zhang Xiong

Background: Biomedical question answering (QA) is a domain-specific sub-task of natural language processing that aims to answer a question in the biomedical field based on one or more related passages, providing people with accurate healthcare-related information. Recently, approaches based on neural networks and large-scale pre-trained language models have substantially improved its performance. However, given the lexical characteristics of biomedical corpora and the small scale of the available datasets, there is still considerable room for improvement on biomedical QA tasks.

Results: Motivated by the importance of syntactic and lexical features in the biomedical corpus, we propose a new framework that extracts external features, such as part-of-speech tags and named-entity labels, and fuses them with the original text representation encoded by the pre-trained language model to enhance biomedical question answering performance. Our model achieves an overall improvement on all three metrics of the BioASQ 6b, 7b, and 8b factoid question answering tasks.

Conclusions: The experiments on the BioASQ question answering datasets demonstrate the effectiveness of our external feature-enriched framework and show that external lexical and syntactic features can improve a pre-trained language model's performance on biomedical domain question answering.
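
As a rough illustration of the feature-fusion idea, the sketch below concatenates POS and NER tag embeddings with the token representations produced by a pre-trained language model and projects them back to the model dimension. The tag-set sizes, embedding dimensions, and the concatenation-plus-projection scheme are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal PyTorch sketch of fusing external POS and NER tag embeddings with
# token representations from a pre-trained language model. Layer sizes and the
# simple concatenation + projection fusion are assumptions for illustration.

import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, hidden=768, n_pos=50, n_ner=20, tag_dim=32):
        super().__init__()
        self.pos_emb = nn.Embedding(n_pos, tag_dim)   # POS tag embeddings
        self.ner_emb = nn.Embedding(n_ner, tag_dim)   # NER tag embeddings
        self.proj = nn.Linear(hidden + 2 * tag_dim, hidden)

    def forward(self, plm_hidden, pos_ids, ner_ids):
        # plm_hidden: (batch, seq_len, hidden) output of the pre-trained LM
        fused = torch.cat(
            [plm_hidden, self.pos_emb(pos_ids), self.ner_emb(ner_ids)], dim=-1)
        return torch.tanh(self.proj(fused))           # fused token representation
```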


Author(s):  
Atro Voutilainen

This article outlines the methods recently used to design part-of-speech taggers: computer programs that assign contextually appropriate grammatical descriptors to words in texts. It begins by describing the general architecture and task setting, gives an overview of the history of tagging, and then describes the central approaches: taggers based on handwritten local rules, taggers based on n-grams automatically derived from text corpora, taggers based on hidden Markov models, taggers using automatically generated symbolic language models derived with machine-learning methods, taggers based on handwritten global rules, and hybrid taggers, which combine the advantages of handwritten and automatically generated taggers. The article focuses on handwritten tagging rules. Well-tagged training corpora are a valuable resource for testing and improving the language model, and the text corpus reminds the grammarian of oversights made while designing a rule.
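
As a toy illustration of the handwritten local rules the article focuses on, the sketch below encodes a single contextual disambiguation rule: discard a verb reading when the word directly follows an unambiguous determiner. The rule and the data format are invented for exposition and are not taken from the article.

```python
# Illustrative sketch of one handwritten local disambiguation rule: if an
# ambiguous word follows a determiner, remove its verb reading. The rule and
# the data representation are assumptions made for this example.

def apply_det_rule(sentence):
    """sentence: list of (word, set_of_candidate_tags) pairs."""
    for i in range(1, len(sentence)):
        word, tags = sentence[i]
        prev_tags = sentence[i - 1][1]
        if prev_tags == {"DET"} and "VERB" in tags and len(tags) > 1:
            sentence[i] = (word, tags - {"VERB"})   # discard the verb reading
    return sentence

# Example: after "the", the word "record" keeps only its NOUN reading
print(apply_det_rule([("the", {"DET"}), ("record", {"NOUN", "VERB"})]))
```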


2015 ◽  
Vol 3 (4) ◽  
pp. 33-47
Author(s):  
Son Trinh ◽  
Kiem Hoang

In this paper, improving the naturalness of HMM-based speech synthesis for the Vietnamese language is described. With this synthesis method, trajectories of speech parameters are generated from the trained hidden Markov models, and a final speech waveform is synthesized from those parameters. The main objective of the development is to achieve maximum naturalness in the output speech through three key points. First, the system uses a high-quality recorded Vietnamese speech database appropriate for training, especially for the statistical parametric approach. Second, prosodic information such as tone, POS (part of speech), and features based on the characteristics of the Vietnamese language is added to ensure the quality of the synthetic speech. Third, the system uses STRAIGHT, which has demonstrated its ability to produce high-quality voice manipulation and has been successfully incorporated into HMM-based speech synthesis. The collected results show that the speech produced by our system achieves the best result when compared with other Vietnamese TTS systems trained on the same speech data.
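
In HMM-based synthesis, parameter trajectories are commonly generated with maximum-likelihood parameter generation (MLPG), which smooths the per-frame means under delta-feature constraints. The sketch below is a simplified single-dimension illustration with a first-order delta window and diagonal variances; it is not the system's actual implementation.

```python
# Hedged sketch of MLPG for one feature dimension: solve
# W^T S^-1 W c = W^T S^-1 mu for the static trajectory c, where W stacks a
# static window (identity) and a first-order delta window. Simplified for
# illustration; real systems use multi-dimensional features and richer windows.

import numpy as np

def mlpg_1d(mu_static, mu_delta, var_static, var_delta):
    T = len(mu_static)
    W_static = np.eye(T)
    W_delta = np.eye(T) - np.eye(T, k=-1)       # delta_t = c_t - c_{t-1}
    W = np.vstack([W_static, W_delta])          # (2T, T) window matrix
    mu = np.concatenate([mu_static, mu_delta])  # stacked means
    prec = np.diag(1.0 / np.concatenate([var_static, var_delta]))
    A = W.T @ prec @ W
    b = W.T @ prec @ mu
    return np.linalg.solve(A, b)                # smooth static trajectory

# Toy example: a step in the static means, smoothed by the delta constraints
print(mlpg_1d(np.array([0., 0., 1., 1.]), np.zeros(4),
              np.ones(4) * 0.1, np.ones(4) * 0.01))
```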


2021 ◽  
Vol 11 (6) ◽  
pp. 2866
Author(s):  
Damheo Lee ◽  
Donghyun Kim ◽  
Seung Yun ◽  
Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variation of English words as pronounced by Korean speakers must be considered, so we sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and then applied language model (LM) adaptation to counter the modeling bias toward Korean caused by the imbalanced training data. In these experiments, the training data were AI Hub (1033 h) in Korean and LibriSpeech (960 h) in English. Compared with the baseline, the proposed method achieved an error reduction rate (ERR) of up to 11.6% with phonetic variant modeling and of 17.3% when semantically similar sentences were applied to LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% over the baseline. The proposed method thus appears to be very effective for CS speech recognition.
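
The abstract does not say how semantic similarity to the target domain is measured; one common approach is to rank candidate sentences by cosine similarity between sentence embeddings and a target-domain centroid. The sketch below assumes pre-computed embeddings and illustrates that idea only, not the paper's actual selection procedure.

```python
# Hedged sketch: select code-switching sentences most similar to a target
# domain for LM adaptation by cosine similarity of sentence embeddings.
# The embedding source and the top-k cutoff are assumptions for illustration.

import numpy as np

def select_similar_sentences(cand_embs, cand_texts, domain_emb, top_k=10000):
    """cand_embs: (N, d) embeddings of candidate CS sentences,
    domain_emb: (d,) centroid embedding of target-domain text."""
    sims = cand_embs @ domain_emb / (
        np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(domain_emb) + 1e-9)
    order = np.argsort(-sims)[:top_k]          # highest similarity first
    return [cand_texts[i] for i in order]      # sentences kept for LM adaptation
```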


Author(s):  
Yu Zhou ◽  
Yanxiang Tong ◽  
Taolue Chen ◽  
Jin Han

Bug localization is one of the most expensive and time-consuming activities in software maintenance and evolution. To alleviate the workload of developers, numerous methods have been proposed to automate this process and narrow down the scope of reviewing buggy files. In this paper, we present a novel buggy source-file localization approach that uses information from both the bug reports and the source files. We leverage the part-of-speech features of bug reports and the invocation relationships among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique discriminates between Top 1 and Top N recommendations for a given bug report and consists of two modules: one maximizes the accuracy of the first recommended file, and the other aims to improve the accuracy of the fixed-defect file list. We evaluate our approach on six large-scale open source projects, i.e., AspectJ, Eclipse, SWT, ZXing, Birt, and Tomcat. Empirical results show that, compared with previous work, our approach improves the overall prediction performance in all of these cases. In particular, in terms of Top 1 recommendation accuracy, it achieves an improvement from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.
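
The abstract describes combining bug-report text features with the invocation relationships among source files, without giving the scoring model. The sketch below shows one simple way such signals could be combined: a linear mix of a file's textual similarity score and the scores of the files it invokes. The weight `alpha` and the combination itself are assumptions for illustration, not the paper's model.

```python
# Illustrative sketch of score-based ranking for bug localization: a textual
# similarity score between the bug report and each file is combined with a
# score propagated over the file invocation graph. Assumed for illustration.

def rank_files(text_scores, call_graph, alpha=0.8):
    """text_scores: {file: similarity to bug report},
    call_graph: {file: [files it invokes]}."""
    combined = {}
    for f, s in text_scores.items():
        neighbours = call_graph.get(f, [])
        neigh = sum(text_scores.get(n, 0.0) for n in neighbours)
        neigh /= max(len(neighbours), 1)       # average score of invoked files
        combined[f] = alpha * s + (1 - alpha) * neigh
    return sorted(combined, key=combined.get, reverse=True)

# Example usage on a toy project
print(rank_files({"A.java": 0.9, "B.java": 0.4, "C.java": 0.2},
                 {"B.java": ["A.java"], "C.java": ["B.java"]}))
```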


2020 ◽  
pp. 104-130
Author(s):  
Marianne Mithun

Much of linguistic typology is inherently categorical. In large-scale typological surveys, grammatical constructions, distinctions, and even variables are typically classified as present, absent, or embodying one of a set of specified options. This work is valuable for a multitude of purposes, and in many cases such categorization is sufficient. In others, we can advance our understanding further if we take a more nuanced approach, considering the extent to which a particular construction, distinction, or variable is installed in the grammar. An important tool for this approach is the examination of unscripted speech in context, complete with prosody. This point is illustrated here with Mohawk, an Iroquoian language indigenous to the North American Northeast. As will be seen, the two types of construction which might be identified as relative clauses are emergent, one less integrated into the grammar than the other. Examination of spontaneous speech indicates that the earliest stages of development are prosodic, as speakers shape their messages according to their communicative purposes at each moment.


Author(s):  
Wanling Song ◽  
Anna L. Duncan ◽  
Mark S.P. Sansom

G protein-coupled receptors (GPCRs) play key roles in cellular signalling. GPCRs are suggested to form dimers and higher order oligomers in response to activation. However, we do not fully understand GPCR activation at larger scales and in an in vivo context. We have characterised oligomeric configurations of the adenosine 2a receptor (A2aR) by combining large-scale molecular dynamics simulations with Markov state models. Receptor activation results in enhanced oligomerisation, more diverse oligomer populations, and a more connected oligomerisation network. The active state conformation of the A2aR shifts protein-protein association interfaces to those involving intracellular loop ICL3 and transmembrane helix TM6. Binding of PIP2 to A2aR stabilises protein-protein interactions via PIP2-mediated association interfaces. These results indicate that A2aR oligomerisation is responsive to the local membrane lipid environment. This in turn suggests a modulatory effect on A2aR whereby a given oligomerisation profile favours the dynamic formation of specific supra-molecular signalling complexes.
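
As a generic illustration of the Markov-state-model component mentioned above, the sketch below estimates a transition probability matrix from discretised simulation trajectories by counting transitions at a fixed lag time. It is a textbook-style illustration, not the analysis pipeline used in the study.

```python
# Minimal sketch of the Markov-state-model step: build a row-stochastic
# transition matrix from discretised trajectories (sequences of state indices)
# by counting transitions at a fixed lag time. Generic illustration only.

import numpy as np

def estimate_transition_matrix(dtrajs, n_states, lag=1):
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:                       # traj: sequence of state indices
        for t in range(len(traj) - lag):
            counts[traj[t], traj[t + lag]] += 1
    counts += 1e-8                            # avoid division by zero
    return counts / counts.sum(axis=1, keepdims=True)

# Toy usage: two short discretised trajectories over 3 states
print(estimate_transition_matrix([[0, 1, 1, 2], [2, 2, 1, 0]], n_states=3))
```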

