Phonetic Features enhancement for Bangla automatic speech recognition

Author(s):  
Sharif M Rasel Kabir ◽  
Foyzul Hassan ◽  
Foysal Ahamed ◽  
Khondokar Mamun ◽  
Mohammad Nurul Huda ◽  
...  
2016 ◽  
Author(s):  
Cai Wingfield ◽  
Li Su ◽  
Xunying Liu ◽  
Chao Zhang ◽  
Phil Woodland ◽  
...  

Abstract
There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental 'machine states', generated as the ASR analysis progresses over time, to the incremental 'brain states', measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain.

Author Summary
The ability to understand spoken language is a defining human capacity. But despite decades of research, there is still no well-specified account of how sound entering the ear is neurally interpreted as a sequence of meaningful words. At the same time, modern computer-based Automatic Speech Recognition (ASR) systems are capable of near-human levels of performance, especially where word-identification is concerned. In this research we aim to bridge the gap between human and machine solutions to speech recognition. We use a novel combination of neuroimaging and statistical methods to relate human and machine internal states that are dynamically generated as spoken words are heard by human listeners and analysed by ASR systems. We find that the stable regularities discovered by the ASR process, linking speech input to phonetic labels, can be significantly related to the regularities extracted in the human brain. Both systems may have in common a representation of these regularities in terms of articulatory phonetic features, consistent with an analysis process which recovers the articulatory gestures that generated the speech. These results suggest a possible partnership between human- and machine-based research, which may both deliver a better understanding of how the human brain provides such a robust solution to speech understanding and generate insights that enhance the performance of future ASR systems.
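The abstract does not spell out the multivariate comparison in detail; the following is a minimal sketch, assuming the machine-to-brain comparison is framed as a representational similarity analysis in which dissimilarity structures over ASR machine states and EMEG brain states are computed for the same word inputs and then rank-correlated. All variable names, data shapes, and the random toy data are illustrative assumptions, not the authors' code or results.

```python
# Hedged sketch of an RSA-style machine-state vs. brain-state comparison.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(states: np.ndarray) -> np.ndarray:
    """Condense an (n_items x n_features) state matrix into a
    representational dissimilarity vector (1 - Pearson correlation)."""
    return pdist(states, metric="correlation")

# Toy data: 50 word tokens; ASR phonetic-model activations vs. EMEG patterns.
rng = np.random.default_rng(0)
machine_states = rng.normal(size=(50, 40))   # e.g. phone-model posteriors per word
brain_states = rng.normal(size=(50, 300))    # e.g. EMEG source estimates per word

# Correlate the two dissimilarity structures; a reliable positive rank
# correlation would indicate shared representational geometry.
rho, p = spearmanr(rdm(machine_states), rdm(brain_states))
print(f"machine-brain RDM correlation: rho={rho:.3f}, p={p:.3g}")
```

In practice such a comparison would be repeated over incremental time windows and cortical locations, with permutation-based significance testing, rather than on a single static snapshot as in this toy example.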


Dementia ◽  
2018 ◽  
Vol 19 (4) ◽  
pp. 1173-1188 ◽  
Author(s):  
Traci Walker ◽  
Heidi Christensen ◽  
Bahman Mirheidari ◽  
Thomas Swainston ◽  
Casey Rutten ◽  
...  

Previous work on interactions in the memory clinic has shown that conversation analysis can be used to differentiate neurodegenerative dementia from functional memory disorder. Based on this work, a screening system was developed that uses a computerised ‘talking head’ (intelligent virtual agent) and a combination of automatic speech recognition and conversation analysis-informed programming. This system can reliably differentiate patients with functional memory disorder from those with neurodegenerative dementia by analysing the way they respond to questions from either a human doctor or the intelligent virtual agent. However, much of this computerised analysis has relied on simplistic, nonlinguistic phonetic features such as the length of pauses between talk by the two parties. To gain confidence in automation of the stratification procedure, this paper investigates whether the patients’ responses to questions asked by the intelligent virtual agent are qualitatively similar to those given in response to a doctor. All the participants in this study have a clear functional memory disorder or neurodegenerative dementia diagnosis. Analyses of patients’ responses to the intelligent virtual agent showed similar, diagnostically relevant sequential features to those found in responses to doctors’ questions. However, since the intelligent virtual agent’s questions are invariant, its use results in more consistent responses across people – regardless of diagnosis – which facilitates automatic speech recognition and makes it easier for a machine to learn patterns. Our analysis also shows why doctors do not always ask the same question in the exact same way to different patients. This sensitivity and adaptation to nuances of conversation may be interactionally helpful; for instance, altering a question may make it easier for patients to understand. While we demonstrate that some of what is said in such interactions is bound to be constructed collaboratively between doctor and patient, doctors could consider ensuring that certain, particularly important and/or relevant questions are asked in as invariant a form as possible to be better able to identify diagnostically relevant differences in patients’ responses.
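The pause-based measure mentioned above can be made concrete with a short sketch. This is not the authors' implementation; it simply illustrates, under assumed speaker labels ("agent", "patient") and timestamped turn boundaries from diarised ASR output, how the length of pauses between the two parties' talk could be extracted.

```python
# Illustrative sketch of a simple non-linguistic timing feature:
# the pause between the end of an agent question and the start of
# the patient's response, from timestamped, speaker-labelled turns.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Turn:
    speaker: str   # "agent" or "patient" (assumed labels)
    start: float   # seconds
    end: float     # seconds

def response_pauses(turns: list[Turn]) -> list[float]:
    """Pause lengths between each agent turn and the following patient turn."""
    pauses = []
    for prev, nxt in zip(turns, turns[1:]):
        if prev.speaker == "agent" and nxt.speaker == "patient":
            pauses.append(max(0.0, nxt.start - prev.end))
    return pauses

# Toy transcript: two agent questions, each followed by a delayed patient answer.
turns = [Turn("agent", 0.0, 3.2), Turn("patient", 5.1, 9.8),
         Turn("agent", 10.0, 12.5), Turn("patient", 12.9, 18.0)]
pauses = response_pauses(turns)
print(f"pauses: {pauses}, mean: {mean(pauses):.2f}s")
```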


Author(s):  
Peter A. Heeman ◽  
Rebecca Lunsford ◽  
Andy McMillin ◽  
J. Scott Yaruss

Author(s):  
Manoj Kumar ◽  
Daniel Bone ◽  
Kelly McWilliams ◽  
Shanna Williams ◽  
Thomas D. Lyon ◽  
...  
