Improvement of Lecture Speech Recognition by Using Unsupervised Adaptation

Author(s):  
Tetsuo Kosaka ◽  
Takashi Kusama ◽  
Masaharu Kato ◽  
Masaki Kohda

The aim of this work is to improve the recognition performance of spontaneous speech. To achieve this, the authors of this chapter propose new unsupervised adaptation approaches for spontaneous speech and evaluate them using both diagonal-covariance and full-covariance hidden Markov models. In the adaptation procedure, language model (LM) adaptation and acoustic model (AM) adaptation are applied iteratively, and several combination schemes are tested to find the most effective one. In the LM adaptation, a word trigram model and a part-of-speech (POS) trigram model are combined to build a more task-specific LM. In addition, the authors propose an unsupervised speaker adaptation technique based on weighting the adaptation data, where the weights depend on POS class. The large-scale spontaneous speech database “Corpus of Spontaneous Japanese (CSJ)”, which serves as the common evaluation database for spontaneous Japanese, was used for the recognition experiments. The results show that the proposed methods provide a significant advantage on this task.
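
The abstract does not give the formula used to combine the word trigram and POS trigram models; a common choice for such a combination is linear interpolation. The sketch below illustrates that idea only; the interpolation weight `lam` and the `word_lm`/`pos_lm` interfaces are hypothetical, not the paper's implementation.

```python
# Hedged sketch: combining a word trigram LM with a POS trigram LM by linear
# interpolation. The combination method actually used in the paper is not
# stated in the abstract; the interfaces below are assumed for illustration.

def interpolated_trigram_prob(w, history, pos_of, word_lm, pos_lm, lam=0.7):
    """P(w | h) = lam * P_word(w | w1, w2) + (1 - lam) * P_pos(pos(w) | pos(w1), pos(w2))."""
    w1, w2 = history                                  # two preceding words
    p_word = word_lm.prob(w, (w1, w2))                # word trigram probability
    p_pos = pos_lm.prob(pos_of[w], (pos_of[w1], pos_of[w2]))  # POS trigram probability
    return lam * p_word + (1.0 - lam) * p_pos
```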

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gezheng Xu ◽  
Wenge Rong ◽  
Yanmeng Wang ◽  
Yuanxin Ouyang ◽  
Zhang Xiong

Background: Biomedical question answering (QA) is a domain-specific sub-task of natural language processing that aims to answer a question in the biomedical field based on one or more related passages, providing people with accurate healthcare-related information. Recently, approaches based on neural networks and large-scale pre-trained language models have substantially improved its performance. However, given the lexical characteristics of biomedical corpora and the small scale of the available datasets, there is still considerable room for improvement on biomedical QA tasks.

Results: Motivated by the importance of syntactic and lexical features in the biomedical corpus, we propose a new framework that extracts external features, such as part-of-speech tags and named-entity labels, and fuses them with the original text representation encoded by the pre-trained language model to enhance biomedical question answering performance. Our model achieves an overall improvement on all three metrics of the BioASQ 6b, 7b, and 8b factoid question answering tasks.

Conclusions: The experiments on the BioASQ question answering datasets demonstrate the effectiveness of our external feature-enriched framework and show that external lexical and syntactic features can improve a pre-trained language model's performance on biomedical domain question answering.
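
As a rough illustration of the feature-fusion idea, the sketch below concatenates POS and NER tag embeddings with the token representations produced by a pre-trained language model and projects them back to the model dimension. The tag-set sizes, embedding dimensions, and the concatenation-plus-projection scheme are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal PyTorch sketch of fusing external POS and NER tag embeddings with
# token representations from a pre-trained language model. Layer sizes and the
# simple concatenation + projection fusion are assumptions for illustration.

import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, hidden=768, n_pos=50, n_ner=20, tag_dim=32):
        super().__init__()
        self.pos_emb = nn.Embedding(n_pos, tag_dim)   # POS tag embeddings
        self.ner_emb = nn.Embedding(n_ner, tag_dim)   # NER tag embeddings
        self.proj = nn.Linear(hidden + 2 * tag_dim, hidden)

    def forward(self, plm_hidden, pos_ids, ner_ids):
        # plm_hidden: (batch, seq_len, hidden) output of the pre-trained LM
        fused = torch.cat(
            [plm_hidden, self.pos_emb(pos_ids), self.ner_emb(ner_ids)], dim=-1)
        return torch.tanh(self.proj(fused))           # fused token representation
```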


Author(s):  
Atro Voutilainen

This article outlines the methods recently used to design part-of-speech taggers: computer programs that assign contextually appropriate grammatical descriptors to words in texts. It begins by describing the general architecture and task setting, gives an overview of the history of tagging, and then describes the central approaches: taggers based on handwritten local rules, taggers based on n-grams automatically derived from text corpora, taggers based on hidden Markov models, taggers using automatically generated symbolic language models derived with machine-learning methods, taggers based on handwritten global rules, and hybrid taggers, which combine the advantages of handwritten and automatically generated taggers. The article focuses on handwritten tagging rules. Well-tagged training corpora are a valuable resource for testing and improving the language model, and the text corpus reminds the grammarian of oversights made while designing a rule.
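
As a toy illustration of the handwritten local rules the article focuses on, the sketch below encodes a single contextual disambiguation rule: discard a verb reading when the word directly follows an unambiguous determiner. The rule and the data format are invented for exposition and are not taken from the article.

```python
# Illustrative sketch of one handwritten local disambiguation rule: if an
# ambiguous word follows a determiner, remove its verb reading. The rule and
# the data representation are assumptions made for this example.

def apply_det_rule(sentence):
    """sentence: list of (word, set_of_candidate_tags) pairs."""
    for i in range(1, len(sentence)):
        word, tags = sentence[i]
        prev_tags = sentence[i - 1][1]
        if prev_tags == {"DET"} and "VERB" in tags and len(tags) > 1:
            sentence[i] = (word, tags - {"VERB"})   # discard the verb reading
    return sentence

# Example: after "the", the word "record" keeps only its NOUN reading
print(apply_det_rule([("the", {"DET"}), ("record", {"NOUN", "VERB"})]))
```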


2015 ◽  
Vol 3 (4) ◽  
pp. 33-47
Author(s):  
Son Trinh ◽  
Kiem Hoang

In this paper, improving the naturalness of HMM-based speech synthesis for the Vietnamese language is described. With this synthesis method, trajectories of speech parameters are generated from the trained hidden Markov models, and a final speech waveform is synthesized from those parameters. The main objective of the development is to achieve maximum naturalness in the output speech through three key points. First, the system uses a high-quality recorded Vietnamese speech database appropriate for training, especially for the statistical parametric approach. Second, prosodic information such as tone, POS (part of speech), and features based on the characteristics of the Vietnamese language is added to ensure the quality of the synthetic speech. Third, the system uses STRAIGHT, which has demonstrated its ability to produce high-quality voice manipulation and has been successfully incorporated into HMM-based speech synthesis. The collected results show that the speech produced by our system achieves the best result when compared with other Vietnamese TTS systems trained on the same speech data.
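
In HMM-based synthesis, parameter trajectories are commonly generated with maximum-likelihood parameter generation (MLPG), which smooths the per-frame means under delta-feature constraints. The sketch below is a simplified single-dimension illustration with a first-order delta window and diagonal variances; it is not the system's actual implementation.

```python
# Hedged sketch of MLPG for one feature dimension: solve
# W^T S^-1 W c = W^T S^-1 mu for the static trajectory c, where W stacks a
# static window (identity) and a first-order delta window. Simplified for
# illustration; real systems use multi-dimensional features and richer windows.

import numpy as np

def mlpg_1d(mu_static, mu_delta, var_static, var_delta):
    T = len(mu_static)
    W_static = np.eye(T)
    W_delta = np.eye(T) - np.eye(T, k=-1)       # delta_t = c_t - c_{t-1}
    W = np.vstack([W_static, W_delta])          # (2T, T) window matrix
    mu = np.concatenate([mu_static, mu_delta])  # stacked means
    prec = np.diag(1.0 / np.concatenate([var_static, var_delta]))
    A = W.T @ prec @ W
    b = W.T @ prec @ mu
    return np.linalg.solve(A, b)                # smooth static trajectory

# Toy example: a step in the static means, smoothed by the delta constraints
print(mlpg_1d(np.array([0., 0., 1., 1.]), np.zeros(4),
              np.ones(4) * 0.1, np.ones(4) * 0.01))
```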


2021 ◽  
Vol 11 (6) ◽  
pp. 2866
Author(s):  
Damheo Lee ◽  
Donghyun Kim ◽  
Seung Yun ◽  
Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variation of English words as pronounced by Korean speakers must be considered, so we sought a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and then applied language model (LM) adaptation to counter the modeling bias toward Korean caused by the imbalanced training data. In these experiments, the training data were AI Hub (1033 h) in Korean and LibriSpeech (960 h) in English. Compared with the baseline, the proposed method achieved an error reduction rate (ERR) of up to 11.6% with phonetic variant modeling and of 17.3% when semantically similar sentences were applied to LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% over the baseline. The proposed method thus appears to be very effective for CS speech recognition.
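
The abstract does not say how semantic similarity to the target domain is measured; one common approach is to rank candidate sentences by cosine similarity between sentence embeddings and a target-domain centroid. The sketch below assumes pre-computed embeddings and illustrates that idea only, not the paper's actual selection procedure.

```python
# Hedged sketch: select code-switching sentences most similar to a target
# domain for LM adaptation by cosine similarity of sentence embeddings.
# The embedding source and the top-k cutoff are assumptions for illustration.

import numpy as np

def select_similar_sentences(cand_embs, cand_texts, domain_emb, top_k=10000):
    """cand_embs: (N, d) embeddings of candidate CS sentences,
    domain_emb: (d,) centroid embedding of target-domain text."""
    sims = cand_embs @ domain_emb / (
        np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(domain_emb) + 1e-9)
    order = np.argsort(-sims)[:top_k]          # highest similarity first
    return [cand_texts[i] for i in order]      # sentences kept for LM adaptation
```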


Author(s):  
Yu Zhou ◽  
Yanxiang Tong ◽  
Taolue Chen ◽  
Jin Han

Bug localization is one of the most expensive and time-consuming activities in software maintenance and evolution. To alleviate the workload of developers, numerous methods have been proposed to automate this process and narrow down the scope of reviewing buggy files. In this paper, we present a novel buggy source-file localization approach that uses information from both the bug reports and the source files. We leverage the part-of-speech features of bug reports and the invocation relationships among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique discriminates between Top 1 and Top N recommendations for a given bug report and consists of two modules: one maximizes the accuracy of the first recommended file, and the other aims to improve the accuracy of the fixed-defect file list. We evaluate our approach on six large-scale open source projects, i.e., AspectJ, Eclipse, SWT, ZXing, Birt, and Tomcat. Empirical results show that, compared with previous work, our approach improves the overall prediction performance in all of these cases. In particular, in terms of Top 1 recommendation accuracy, it achieves an improvement from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.
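
The abstract describes combining bug-report text features with the invocation relationships among source files, without giving the scoring model. The sketch below shows one simple way such signals could be combined: a linear mix of a file's textual similarity score and the scores of the files it invokes. The weight `alpha` and the combination itself are assumptions for illustration, not the paper's model.

```python
# Illustrative sketch of score-based ranking for bug localization: a textual
# similarity score between the bug report and each file is combined with a
# score propagated over the file invocation graph. Assumed for illustration.

def rank_files(text_scores, call_graph, alpha=0.8):
    """text_scores: {file: similarity to bug report},
    call_graph: {file: [files it invokes]}."""
    combined = {}
    for f, s in text_scores.items():
        neighbours = call_graph.get(f, [])
        neigh = sum(text_scores.get(n, 0.0) for n in neighbours)
        neigh /= max(len(neighbours), 1)       # average score of invoked files
        combined[f] = alpha * s + (1 - alpha) * neigh
    return sorted(combined, key=combined.get, reverse=True)

# Example usage on a toy project
print(rank_files({"A.java": 0.9, "B.java": 0.4, "C.java": 0.2},
                 {"B.java": ["A.java"], "C.java": ["B.java"]}))
```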


2020 ◽  
pp. 104-130
Author(s):  
Marianne Mithun

Much of linguistic typology is inherently categorical. In large-scale typological surveys, grammatical constructions, distinctions, and even variables are typically classified as present, absent, or embodying one of a set of specified options. This work is valuable for a multitude of purposes, and in many cases such categorization is sufficient. In others, we can advance our understanding further if we take a more nuanced approach, considering the extent to which a particular construction, distinction, or variable is installed in the grammar. An important tool for this approach is the examination of unscripted speech in context, complete with prosody. This point is illustrated here with Mohawk, an Iroquoian language indigenous to the North American Northeast. As will be seen, the two types of construction which might be identified as relative clauses are emergent, one less integrated into the grammar than the other. Examination of spontaneous speech indicates that the earliest stages of development are prosodic, as speakers shape their messages according to their communicative purposes at each moment.


Author(s):  
Wanling Song ◽  
Anna L. Duncan ◽  
Mark S.P. Sansom

G protein-coupled receptors (GPCRs) play key roles in cellular signalling. GPCRs are suggested to form dimers and higher order oligomers in response to activation. However, we do not fully understand GPCR activation at larger scales and in an in vivo context. We have characterised oligomeric configurations of the adenosine 2a receptor (A2aR) by combining large-scale molecular dynamics simulations with Markov state models. Receptor activation results in enhanced oligomerisation, more diverse oligomer populations, and a more connected oligomerisation network. The active state conformation of the A2aR shifts protein-protein association interfaces to those involving intracellular loop ICL3 and transmembrane helix TM6. Binding of PIP2 to A2aR stabilises protein-protein interactions via PIP2-mediated association interfaces. These results indicate that A2aR oligomerisation is responsive to the local membrane lipid environment. This in turn suggests a modulatory effect on A2aR whereby a given oligomerisation profile favours the dynamic formation of specific supra-molecular signalling complexes.
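
As a generic illustration of the Markov-state-model component mentioned above, the sketch below estimates a transition probability matrix from discretised simulation trajectories by counting transitions at a fixed lag time. It is a textbook-style illustration, not the analysis pipeline used in the study.

```python
# Minimal sketch of the Markov-state-model step: build a row-stochastic
# transition matrix from discretised trajectories (sequences of state indices)
# by counting transitions at a fixed lag time. Generic illustration only.

import numpy as np

def estimate_transition_matrix(dtrajs, n_states, lag=1):
    counts = np.zeros((n_states, n_states))
    for traj in dtrajs:                       # traj: sequence of state indices
        for t in range(len(traj) - lag):
            counts[traj[t], traj[t + lag]] += 1
    counts += 1e-8                            # avoid division by zero
    return counts / counts.sum(axis=1, keepdims=True)

# Toy usage: two short discretised trajectories over 3 states
print(estimate_transition_matrix([[0, 1, 1, 2], [2, 2, 1, 0]], n_states=3))
```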

