Auditory-like filterbank: An optimal speech processor for efficient human speech communication

Sadhana ◽  
2011 ◽  
Vol 36 (5) ◽  
pp. 699-712
Author(s):  
Prasanta Kumar Ghosh ◽  
Louis M Goldstein ◽  
Shrikanth S Narayanan
2021 ◽  
Author(s):  
Steven Van Kuyk

Throughout the last century, models of human speech communication have been proposed by linguists, psychologists, and engineers. Advancements have been made, but a theory of human speech communication that is both comprehensive and quantitative is yet to emerge. This thesis hypothesises that a branch of mathematics known as information theory holds the answer to a more complete theory. Information theory has made fundamental contributions to wireless communications, computer science, statistical inference, cryptography, thermodynamics, and biology. There is no reason that information theory cannot be applied to human speech communication, but thus far, a relatively small effort has been made to do so. The goal of this research was to develop a quantitative model of speech communication that is consistent with our knowledge of linguistics and that is accurate enough to predict the intelligibility of speech signals. Specifically, this thesis focuses on the following research questions: 1) How does the acoustic information rate of speech compare to the lexical information rate of speech? 2) How can information theory be used to predict the intelligibility of speech-based communication systems? 3) How well do competing models of speech communication predict intelligibility? To answer the first research question, novel approaches for estimating the information rate of speech communication are proposed. Unlike existing approaches, the methods proposed in this thesis rely on having a chorus of speech signals where each signal in the chorus contains the same linguistic message but is spoken by a different talker. The advantage of this approach is that variability inherent in the production of speech can be accounted for. The approach gives an estimate of about 180 b/s. This is three times larger than estimates based on lexical models, but it is an order of magnitude smaller than previous estimates that rely on acoustic signals. To answer the second research question, a novel instrumental intelligibility metric called speech intelligibility in bits (SIIB) and a variant called SIIBGauss are proposed. SIIB is an estimate of the amount of information shared between a talker and a listener in bits per second. Unlike existing intelligibility metrics that are based on information theory, SIIB accounts for talker variability and statistical dependencies between time-frequency units. Finally, to answer the third research question, a comprehensive evaluation of intrusive intelligibility metrics is provided. The results show that SIIB and SIIBGauss have state-of-the-art performance, that intelligibility metrics tend to perform poorly on data sets that were not used during their development, and that there is an advantage to reducing statistical dependencies between input features.
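As a rough, illustrative sketch of the idea described in this abstract, rather than the published SIIB algorithm, the Python snippet below estimates a shared-information rate in bits per second between a clean and a degraded speech signal. The feature choice (log-magnitude STFT), the Karhunen-Loeve transform used to reduce dependencies between frequency channels, the per-channel Gaussian mutual-information formula, and the function name `gaussian_mi_rate` are all simplifying assumptions introduced here; in particular, the sketch omits the modelling of talker variability that the abstract says SIIB accounts for.

```python
# Illustrative sketch only (not the published SIIB implementation): estimate a
# shared-information rate between clean and degraded speech by
#   1) extracting log-magnitude time-frequency features,
#   2) decorrelating them with a Karhunen-Loeve transform (KLT), and
#   3) summing per-channel Gaussian mutual information, I = -0.5 * log2(1 - rho^2).
import numpy as np
from scipy.signal import stft

def gaussian_mi_rate(clean, degraded, fs, frame_len=0.025):
    """Crude Gaussian estimate of the information rate (bits/s) shared between
    a clean speech signal and a degraded version of it, both sampled at fs Hz."""
    nperseg = int(frame_len * fs)
    _, _, X = stft(clean, fs=fs, nperseg=nperseg)
    _, _, Y = stft(degraded, fs=fs, nperseg=nperseg)
    n = min(X.shape[1], Y.shape[1])                # align frame counts
    X = np.log(np.abs(X[:, :n]) + 1e-12)           # log-magnitude features
    Y = np.log(np.abs(Y[:, :n]) + 1e-12)

    # KLT: decorrelate the clean features and apply the same transform
    # to the degraded features, reducing dependencies between channels.
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    _, V = np.linalg.eigh(np.cov(Xc))
    U, W = V.T @ Xc, V.T @ Yc

    # Sum Gaussian mutual information over the decorrelated channels.
    bits_per_frame = 0.0
    for u, w in zip(U, W):
        rho = np.corrcoef(u, w)[0, 1]
        if not np.isfinite(rho):                   # skip degenerate channels
            continue
        rho2 = min(rho ** 2, 1.0 - 1e-12)          # guard against |rho| -> 1
        bits_per_frame += -0.5 * np.log2(1.0 - rho2)

    frames_per_second = n / (len(clean) / fs)      # approximate frame rate
    return bits_per_frame * frames_per_second

# Example usage (hypothetical signals): rate = gaussian_mi_rate(clean, noisy, fs=16000)
```

The KLT step addresses statistical dependencies between time-frequency units in a crude way; a production-noise model for talker variability, which the abstract identifies as a distinguishing feature of SIIB, is deliberately left out of this sketch.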


Author(s):  
Sadaoki Furui

Research in automatic speech and speaker recognition has now spanned five decades. This paper surveys the major themes and advances made in the past fifty years of research so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. Although many techniques have been developed, many challenges have yet to be overcome before we can achieve the ultimate goal of creating machines that can communicate naturally with people. Such a machine needs to be able to deliver a satisfactory performance under a broad range of operating conditions. A much greater understanding of the human speech process is required before automatic speech and speaker recognition systems can approach human performance.


2012 ◽  
Vol 490-495 ◽  
pp. 756-760
Author(s):  
Qing Guo Xie ◽  
Yan Hui Cao

With the development of electronic media and growing human demand, speech communication has become increasingly diversified and important. Face-to-face Speech Communication (FSC) is the most basic form, and the flourishing of Electronic Speech Communication (ESC) indicates a rejuvenation of the oral tradition. This paper demonstrates the similarities and differences between FSC and ESC through literature review and observation. FSC is highly dependent on the communication environment, while ESC depends on other non-verbal symbols that shape the communicator's image. In terms of communication content, both are marked by the characteristics of oral language but differ significantly in information coding and language regulation. From the perspective of emotional interaction, FSC pays more attention to face-work, while ESC focuses on confrontational performance to create moments of excitement.


Author(s):  
Evgenij F. Tarasov

The article questions whether human speech communication (SC) involves a transfer of information. The functioning of information in speech communication is examined within two frameworks: the informational approach and the systemic-activity approach. The informational approach adequately explains only the direct transfer of information, while the systemic-activity approach is relevant to the sign-mediated speech communication typical of human interaction. The more heuristic thesis is that perceiving the chain of linguistic sign bodies produced in intersubjective space only initiates the recipient's construction of the perceived message content. The completeness of the constructed content depends entirely on the recipient, who shares an optimal common consciousness with the speaker. The purpose of speech messages is not the construction of content by the recipient as such, but the development of the message's personal meaning. In human speech communication, communicants do not transmit information; rather, they use verbal sign bodies to actualize images of consciousness that have developed within a single ethnic culture and are therefore common to them. The incentive for communicants to develop a common consciousness is their participation in the joint activities that ensure their earthly existence.


2002 ◽  
pp. 1-10
Author(s):  
John Holmes ◽  
Wendy Holmes
