Hybrid methodological approach to context-dependent speech recognition

Although the importance of contextual information in speech recognition has been acknowledged for a long time now, it has remained clearly underutilized even in state-of-the-art speech recognition systems. This article introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human–machine interaction. To the extent that it is hybrid, the approach integrates aspects of both statistical and representational paradigms. We extend the standard statistical pattern-matching approach with a cognitively inspired and analytically tractable model with explanatory power. This methodological extension allows for accounting for contextual information which is otherwise unavailable in speech recognition systems, and using it to improve post-processing of recognition hypotheses. The article introduces an algorithm for evaluation of recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.

Download Full-text

Telugu Speech Recognition on LSF and DNN Techniques

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d5257.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 7160-7162

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Deep Level ◽

Human Interaction ◽

Major Area ◽

Feature Classification ◽

Human Machine Interaction ◽

Easy Task ◽

Recognition Systems ◽

Machine Interaction

This fast world is running with machine and human interaction. This kind of interaction is not an easy task. For proper interaction between human and machine speech recognition is major area where the machine should understand the speech properly to perform the tasks. So ASR have been developed which improvised the HMIS (“Human Machine Interaction systems”) technology in to the deep level. This research focuses on speech recognition over “Telugu language”, which is used in Telugu HMI systems. This paper uses LSF (linear spectral frequencies) technique for feature extraction and DNN for feature classification which finally produced the effective results. Many other recognition systems also used these techniques but for Telugu language this are the most suitable techniques.

Download Full-text

The Benefits of Contextual Information for Speech Recognition Systems

2018 10th Computer Science and Electronic Engineering (CEEC) ◽

10.1109/ceec.2018.8674204 ◽

2018 ◽

Cited By ~ 1

Author(s):

Martin W. Kinch ◽

Wim J.C. Melis ◽

Simeon Keates

Keyword(s):

Speech Recognition ◽

Contextual Information ◽

Recognition Systems

Download Full-text

Usability and Acceptance of the Embodied Conversational Agent Anne by People with Dementia and their Caregivers: an exploratory study in home environment settings (Preprint)

10.2196/preprints.25891 ◽

2020 ◽

Author(s):

Vera Stara ◽

Benjamin Vera ◽

Daniel Bolliger ◽

Lorena Rossi ◽

Elisa Felici ◽

...

Keyword(s):

Speech Recognition ◽

Information And Communication Technologies ◽

Home Environment ◽

Scientific Community ◽

Community Dwelling ◽

Conversational Agents ◽

Human Machine Interaction ◽

Technical Problems ◽

Research And Innovation ◽

People With Dementia

BACKGROUND Information and Communication Technologies are seen as tools able to support cognitive functions, monitor health and movements, provide reminders to maintain residual memory abilities and promote social support, especially among patients with dementia. Among these technologies, Embodied Conversational Agents (ECAs) are seen as screen-based entities, designed to stimulate human face-to-face conversation skills and thus allow for natural human-machine interaction. Unfortunately, the efficacy of ECA in supporting people affected by dementia and their caregivers is not yet well studied. Therefore, research in this area is essential for the entire scientific community. OBJECTIVE This study aims to evaluate the usability and acceptance of the ECA Anne by seniors affected with dementia. The study is also designed to assess the ability of target users to utilize the system independently and receive valuable information from it. METHODS A 4-week trial was conducted involving 20 older adults with dementia and 14 family caregivers in home environment settings in Italy. This study used a mixed method approach, balancing quantitative and qualitative instruments to gather data from users. Telemetry data was also collected. RESULTS It emerges that older users were particularly engaged in providing significant responses and participated in system improvements. Some of them clearly discussed how technical problems related to speech recognition negatively impacted on the intention to use, adaptiveness, usefulness and trust. Moreover, the usability of the system achieved an encouraging score and the half of the sample recognized a role in the ECA. This study confirms that the quality of automatic speech recognition and synthesis is still a technical issue and has room for improvement, whereas touchscreen modality is almost stable and positively used by patients with dementia. CONCLUSIONS This specific field of research is novel and poorly discussed in the scientific community. This could be due to its newness, yet there is an urgent need to strengthen data, research and innovation to accelerate the implementation of ECA as a future way to offer non-pharmacological support to community-dwelling persons with dementia.

Download Full-text

User-awareness and adaptation in conversational agents

Facta universitatis - series Electronics and Energetics ◽

10.2298/fuee1403375d ◽

2014 ◽

Vol 27 (3) ◽

pp. 375-387 ◽

Cited By ~ 1

Author(s):

Vlado Delic ◽

Milan Gnjatovic ◽

Niksa Jakovljevic ◽

Branislav Popovic ◽

Ivan Jokic ◽

...

Keyword(s):

Speaker Recognition ◽

Emotional State ◽

Research Question ◽

Emotional States ◽

Conversational Agent ◽

Conversational Agents ◽

Human Machine Interaction ◽

Interaction Domain ◽

Dialogue Management ◽

Machine Interaction

This paper considers the research question of developing user-aware and adaptive conversational agents. The conversational agent is a system which is user-aware to the extent that it recognizes the user identity and his/her emotional states that are relevant in a given interaction domain. The conversational agent is user-adaptive to the extent that it dynamically adapts its dialogue behavior according to the user and his/her emotional state. The paper summarizes some aspects of our previous work and presents work-in-progress in the field of speech-based human-machine interaction. It focuses particularly on the development of speech recognition modules in cooperation with both modules for emotion recognition and speaker recognition, as well as the dialogue management module. Finally, it proposes an architecture of a conversational agent that integrates those modules and improves each of them based on some kind of synergies among themselves.

Download Full-text

Exploring E2E speech recognition systems for new languages

10.21437/iberspeech.2018-22 ◽

2018 ◽

Cited By ~ 1

Author(s):

Conrad Bernath ◽

Aitor Alvarez ◽

Haritz Arzelus ◽

Carlos David Martínez

Keyword(s):

Speech Recognition ◽

Recognition Systems

Download Full-text

Improving Transformer-Based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation

10.21437/interspeech.2019-2112 ◽

2019 ◽

Cited By ~ 2

Author(s):

Sheng Li ◽

Dabre Raj ◽

Xugang Lu ◽

Peng Shen ◽

Tatsuya Kawahara ◽

...

Keyword(s):

Speech Recognition ◽

Recognition Systems

Download Full-text

SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems

10.21437/interspeech.2020-2787 ◽

2020 ◽

Author(s):

Huili Chen ◽

Bita Darvish ◽

Farinaz Koushanfar

Keyword(s):

Speech Recognition ◽

Ip Protection ◽

Recognition Systems

Download Full-text

Human-robot-interaction using cloud-based speech recognition systems

Procedia CIRP ◽

10.1016/j.procir.2020.05.214 ◽

2021 ◽

Vol 97 ◽

pp. 130-135

Author(s):

Christian Deuerlein ◽

Moritz Langer ◽

Julian Seßner ◽

Peter Heß ◽

Jörg Franke

Keyword(s):

Speech Recognition ◽

Human Robot Interaction ◽

Robot Interaction ◽

Recognition Systems

Download Full-text

Development of Speech Recognition Systems in Emergency Call Centers

Symmetry ◽

10.3390/sym13040634 ◽

2021 ◽

Vol 13 (4) ◽

pp. 634

Author(s):

Alakbar Valizada ◽

Natavan Akhundova ◽

Samir Rustamov

Keyword(s):

Speech Recognition ◽

Markov Model ◽

Hidden Markov ◽

Call Centers ◽

The Other ◽

Language Models ◽

Emergency Call ◽

Acoustic Model ◽

Different Types ◽

Recognition Systems

In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition for spoken dialogues in emergency call centers were investigated and comparatively analyzed. Because of the fact that dialogue speech in call centers has specific context and noisy, emotional environments, available speech recognition systems show poor performance. Therefore, in order to accurately recognize dialogue speeches, the main modules of speech recognition systems—language models and acoustic training methodologies—as well as symmetric data labeling approaches have been investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were defined based on extrinsic and intrinsic methods. Lastly, our suggested data labeling approaches with spelling correction are compared with common labeling methods resulting in outperforming the other methods with a notable percentage. Based on the results of the experiments, we determined that DNN/HMM for an acoustic model, trigram with Kneser–Ney discounting for a language model and using spelling correction before training data for a labeling method are effective configurations for dialogue speech recognition in emergency call centers. It should be noted that this research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which encapsulates call agents’ speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Even though the speech taken from the emergency call center is in the Azerbaijani language, which belongs to the Turkic group of languages, our approaches are not tightly connected to specific language features. Hence, it is anticipated that suggested approaches can be applied to the other languages of the same group.

Download Full-text

HPM: A Hybrid Model for User’s Behavior Prediction Based on N-Gram Parsing and Access Logs

Scientific Programming ◽

10.1155/2020/8897244 ◽

2020 ◽

Vol 2020 ◽

pp. 1-18

Author(s):

Sonia Setia ◽

Verma Jyoti ◽

Neelam Duhan

Keyword(s):

Web Mining ◽

Contextual Information ◽

Hybrid Approach ◽

Web Pages ◽

Continuous Growth ◽

Novel Approach ◽

Content Mining ◽

Long Access ◽

N Gram ◽

The Individual

The continuous growth of the World Wide Web has led to the problem of long access delays. To reduce this delay, prefetching techniques have been used to predict the users’ browsing behavior to fetch the web pages before the user explicitly demands that web page. To make near accurate predictions for users’ search behavior is a complex task faced by researchers for many years. For this, various web mining techniques have been used. However, it is observed that either of the methods has its own set of drawbacks. In this paper, a novel approach has been proposed to make a hybrid prediction model that integrates usage mining and content mining techniques to tackle the individual challenges of both these approaches. The proposed method uses N-gram parsing along with the click count of the queries to capture more contextual information as an effort to improve the prediction of web pages. Evaluation of the proposed hybrid approach has been done by using AOL search logs, which shows a 26% increase in precision of prediction and a 10% increase in hit ratio on average as compared to other mining techniques.

Download Full-text