Discriminative Learning for Speech Recognition: Theory and Practice

2008 ◽  
Vol 4 (1) ◽  
pp. 1-112 ◽  
Author(s):  
Xiaodong He ◽  
Li Deng
Author(s):  
Jun Rokui

This paper presents MCE/GPD, a discriminative learning method that applies generalized probabilistic descent (GPD) to minimum classification error (MCE) training and is known to be highly effective. MCE/GPD is particularly well suited to speech recognition, since it achieves excellent recognition performance and can handle variable-length vectors. However, its complicated algorithms make it computationally expensive, which renders it impractical in many settings. In this paper, we propose a learning method that accelerates training by using a hierarchical model, and we evaluate its performance with a hierarchical neural network.
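
To make the approach concrete, below is a minimal sketch of one MCE/GPD training step, assuming linear discriminant functions and the max-competitor form of the misclassification measure; the learning rate and smoothing constant are illustrative, and the hierarchical speed-up proposed in the paper is not reproduced here.

```python
# Minimal MCE/GPD sketch (assumptions: linear discriminants g_j(x) = W[j] @ x,
# max-competitor misclassification measure, sigmoid-smoothed 0-1 loss).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mce_gpd_step(W, x, y, lr=0.1, gamma=1.0):
    """One probabilistic-descent update on a single labeled sample (x, y)."""
    scores = W @ x
    # Misclassification measure d: best competing score minus the true-class score.
    comp = int(np.argmax(np.delete(scores, y)))
    comp = comp if comp < y else comp + 1  # map back into the full score vector
    d = scores[comp] - scores[y]
    # Smoothed error count and its gradient with respect to d.
    loss = sigmoid(gamma * d)
    dl_dd = gamma * loss * (1.0 - loss)
    # GPD update: raise the true-class score, lower the competitor's.
    W[y] += lr * dl_dd * x
    W[comp] -= lr * dl_dd * x
    return loss
```

Looping this update over a labeled training set drives the smoothed error count down sample by sample; the cost of exactly this loop is what a hierarchical model is meant to reduce.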


Author(s):  
KALPANA JOSHI ◽  
NILIMA KOLHARE ◽  
V.M. PANDHARIPANDE

While many Automatic Speech Recognition (ASR) applications employ powerful computers to handle the complex recognition algorithms, there is a clear demand for effective solutions on embedded platforms. The Digital Signal Processor (DSP) is one of the most commonly used hardware platforms, offering good development flexibility and a relatively short application development cycle. DSP techniques have been at the heart of progress in speech processing during the last 25 years. Simultaneously, speech processing has been an important catalyst for the development of DSP theory and practice. Today, DSP methods are used in speech analysis, synthesis, coding, recognition, and enhancement, as well as in voice modification, speaker recognition, and language identification. Speech recognition is generally a computationally intensive task that involves many digital signal processing algorithms. In real-time applications operating in real environments, it is often necessary to use embedded, resource-limited hardware. Less memory, a lower clock frequency, and tighter space and cost budgets than a common PC architecture (x86) must be balanced by more efficient computation.
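
For context, the sketch below illustrates the kind of front-end DSP workload such a recognizer must budget for: pre-emphasis, framing, windowing, and a magnitude spectrum. It is written in Python for readability (an embedded port would typically be fixed-point C on the DSP), and the sampling rate and frame parameters are illustrative assumptions, not values from the paper.

```python
# Illustrative speech front-end: the per-frame arithmetic that dominates
# the recognizer's DSP load on resource-limited hardware.
import numpy as np

def frontend(signal, fs=8000, frame_ms=25, hop_ms=10, preemph=0.97):
    # Pre-emphasis: a first-order high-pass FIR, one multiply-add per sample.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = emphasized[start:start + frame_len] * window
        # FFT magnitude: the O(N log N) kernel that DSP chips accelerate.
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)
```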


1999 ◽  
Vol 08 (01) ◽  
pp. 43-52 ◽  
Author(s):  
ALEXANDRINA ROGOZAN

In recent years a number of techniques have been proposed to improve the accuracy and robustness of automatic speech recognition in noisy environments. Among these, supplementing the acoustic information with visual data, mostly extracted from the speaker's lip shapes, has proved successful. We have already demonstrated the effectiveness of integrating visual data at two different levels during speech decoding, according to both direct and separate identification strategies (DI+SI). This paper outlines methods for reinforcing visible speech recognition within the separate-identification framework. First, we define visual-specific units using a self-organizing mapping technique. Second, we complete the stochastic learning of these units with a discriminative neural-network-based technique for speech recognition purposes. Finally, we show on a connected-letter speech recognition task that these methods improve the performance of the DI+SI-based system under varying noise-level conditions.
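
To make the first step concrete, here is a minimal sketch of training a self-organizing map over lip-shape feature vectors. The grid size, learning-rate and neighborhood schedules, and input dimensionality are illustrative assumptions, not the paper's settings, and the subsequent discriminative neural-network refinement is not shown.

```python
# Self-organizing map sketch: each trained prototype acts as a candidate
# visual-specific unit (assumed inputs: rows of lip-shape feature vectors).
import numpy as np

def train_som(data, grid=(6, 6), epochs=20, lr0=0.5, sigma0=2.0):
    rng = np.random.default_rng(0)
    dim = data.shape[1]
    weights = rng.normal(size=(grid[0], grid[1], dim))
    coords = np.stack(np.meshgrid(np.arange(grid[0]), np.arange(grid[1]),
                                  indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            # Decay the learning rate and neighborhood width over time.
            lr = lr0 * (1 - t / n_steps)
            sigma = sigma0 * (1 - t / n_steps) + 1e-3
            # Best-matching unit: the prototype closest to this sample.
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # Pull the BMU and its grid neighbors toward the sample.
            g = np.exp(-np.sum((coords - bmu) ** 2, axis=-1) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            t += 1
    return weights
```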


2021 ◽  
Vol 3 ◽  
Author(s):  
Roozbeh Sadeghian ◽  
J. David Schaffer ◽  
Stephen A. Zahorian

Automatic Speech Recognition (ASR) is widely used in many applications and tools. Smartphones, video games, and cars are a few examples where people use ASR routinely, often daily. A less common but potentially very important arena for ASR is the health domain, where for some people the impact on life could be enormous. The goal of this work is to develop an easy-to-use, non-invasive, inexpensive, speech-based diagnostic test for dementia that can easily be administered in a clinician's office or even at home. While considerable work along these lines has been published, and its pace has increased dramatically in recent years, it is primarily of theoretical value and not yet practical to apply. A large gap exists between current scientific understanding and the creation of a diagnostic test for dementia. The aim of this paper is to bridge this gap between theory and practice by engineering a practical test. Experimental evidence suggests that strong discrimination between subjects with a diagnosis of probable Alzheimer's and matched normal controls can be achieved with a combination of acoustic features from speech, linguistic features extracted from a transcription of the speech, and the results of a mini mental state exam. A fully automatic speech recognition system tuned for the speech-to-text aspect of this application, including automatic punctuation, is also described.
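
One plausible reading of the described feature combination, sketched below with placeholder data: per-subject acoustic features, linguistic features from the ASR transcript, and the mini mental state exam score are concatenated and fed to a standard classifier. The feature dimensions, the logistic-regression choice, and the random data are assumptions for illustration only; the paper's actual features and classifier are not specified here.

```python
# Hedged sketch of combining acoustic, linguistic, and exam features for
# probable-Alzheimer's vs. control discrimination (placeholder data only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_features(acoustic, linguistic, mmse):
    """Concatenate per-subject acoustic features, linguistic features from
    the ASR transcript, and the mini mental state exam score."""
    return np.hstack([acoustic, linguistic, mmse.reshape(-1, 1)])

# Placeholder cohort: 40 subjects, 20 acoustic and 10 linguistic features each.
rng = np.random.default_rng(0)
X = build_features(rng.normal(size=(40, 20)), rng.normal(size=(40, 10)),
                   rng.integers(10, 30, size=40).astype(float))
y = rng.integers(0, 2, size=40)  # 1 = probable Alzheimer's, 0 = matched control

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# Chance-level on random placeholder data; real features would replace X, y.
print(cross_val_score(clf, X, y, cv=5).mean())
```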

