Auditory-like filterbank: An optimal speech processor for efficient human speech communication

Sadhana ◽  
2011 ◽  
Vol 36 (5) ◽  
pp. 699-712
Author(s):  
Prasanta Kumar Ghosh ◽  
Louis M Goldstein ◽  
Shrikanth S Narayanan
2021 ◽  
Author(s):  
Steven Van Kuyk

Throughout the last century, models of human speech communication have been proposed by linguists, psychologists, and engineers. Advancements have been made, but a theory of human speech communication that is both comprehensive and quantitative is yet to emerge. This thesis hypothesises that a branch of mathematics known as information theory holds the answer to a more complete theory. Information theory has made fundamental contributions to wireless communications, computer science, statistical inference, cryptography, thermodynamics, and biology. There is no reason that information theory cannot be applied to human speech communication, but thus far, a relatively small effort has been made to do so. The goal of this research was to develop a quantitative model of speech communication that is consistent with our knowledge of linguistics and that is accurate enough to predict the intelligibility of speech signals. Specifically, this thesis focuses on the following research questions: 1) How does the acoustic information rate of speech compare to the lexical information rate of speech? 2) How can information theory be used to predict the intelligibility of speech-based communication systems? 3) How well do competing models of speech communication predict intelligibility? To answer the first research question, novel approaches for estimating the information rate of speech communication are proposed. Unlike existing approaches, the methods proposed in this thesis rely on having a chorus of speech signals where each signal in the chorus contains the same linguistic message but is spoken by a different talker. The advantage of this approach is that variability inherent in the production of speech can be accounted for. The approach gives an estimate of about 180 b/s. This is three times larger than estimates based on lexical models, but it is an order of magnitude smaller than previous estimates that rely on acoustic signals. To answer the second research question, a novel instrumental intelligibility metric called speech intelligibility in bits (SIIB) and a variant called SIIBGauss are proposed. SIIB is an estimate of the amount of information shared between a talker and a listener in bits per second. Unlike existing intelligibility metrics that are based on information theory, SIIB accounts for talker variability and statistical dependencies between time-frequency units. Finally, to answer the third research question, a comprehensive evaluation of intrusive intelligibility metrics is provided. The results show that SIIB and SIIBGauss have state-of-the-art performance, that intelligibility metrics tend to perform poorly on data sets that were not used during their development, and that there is an advantage to reducing statistical dependencies between input features.
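As a rough, illustrative sketch of the idea described in this abstract, rather than the published SIIB algorithm, the Python snippet below estimates a shared-information rate in bits per second between a clean and a degraded speech signal. The feature choice (log-magnitude STFT), the Karhunen-Loeve transform used to reduce dependencies between frequency channels, the per-channel Gaussian mutual-information formula, and the function name `gaussian_mi_rate` are all simplifying assumptions introduced here; in particular, the sketch omits the modelling of talker variability that the abstract says SIIB accounts for.

```python
# Illustrative sketch only (not the published SIIB implementation): estimate a
# shared-information rate between clean and degraded speech by
#   1) extracting log-magnitude time-frequency features,
#   2) decorrelating them with a Karhunen-Loeve transform (KLT), and
#   3) summing per-channel Gaussian mutual information, I = -0.5 * log2(1 - rho^2).
import numpy as np
from scipy.signal import stft

def gaussian_mi_rate(clean, degraded, fs, frame_len=0.025):
    """Crude Gaussian estimate of the information rate (bits/s) shared between
    a clean speech signal and a degraded version of it, both sampled at fs Hz."""
    nperseg = int(frame_len * fs)
    _, _, X = stft(clean, fs=fs, nperseg=nperseg)
    _, _, Y = stft(degraded, fs=fs, nperseg=nperseg)
    n = min(X.shape[1], Y.shape[1])                # align frame counts
    X = np.log(np.abs(X[:, :n]) + 1e-12)           # log-magnitude features
    Y = np.log(np.abs(Y[:, :n]) + 1e-12)

    # KLT: decorrelate the clean features and apply the same transform
    # to the degraded features, reducing dependencies between channels.
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    _, V = np.linalg.eigh(np.cov(Xc))
    U, W = V.T @ Xc, V.T @ Yc

    # Sum Gaussian mutual information over the decorrelated channels.
    bits_per_frame = 0.0
    for u, w in zip(U, W):
        rho = np.corrcoef(u, w)[0, 1]
        if not np.isfinite(rho):                   # skip degenerate channels
            continue
        rho2 = min(rho ** 2, 1.0 - 1e-12)          # guard against |rho| -> 1
        bits_per_frame += -0.5 * np.log2(1.0 - rho2)

    frames_per_second = n / (len(clean) / fs)      # approximate frame rate
    return bits_per_frame * frames_per_second

# Example usage (hypothetical signals): rate = gaussian_mi_rate(clean, noisy, fs=16000)
```

The KLT step addresses statistical dependencies between time-frequency units in a crude way; a production-noise model for talker variability, which the abstract identifies as a distinguishing feature of SIIB, is deliberately left out of this sketch.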


Author(s):  
Sadaoki Furui

Research in automatic speech and speaker recognition has now spanned five decades. This paper surveys the major themes and advances made in the past fifty years of research so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. Although many techniques have been developed, many challenges have yet to be overcome before we can achieve the ultimate goal of creating machines that can communicate naturally with people. Such a machine needs to be able to deliver a satisfactory performance under a broad range of operating conditions. A much greater understanding of the human speech process is required before automatic speech and speaker recognition systems can approach human performance.


2012 ◽  
Vol 490-495 ◽  
pp. 756-760
Author(s):  
Qing Guo Xie ◽  
Yan Hui Cao

With the development of electronic media and growing human demand, speech communication has become increasingly diversified and important. Face-to-face Speech Communication (FSC) is the most basic form, and the flourishing of Electronic Speech Communication (ESC) indicates a rejuvenation of the oral tradition. This paper demonstrates the similarities and differences between FSC and ESC through literature review and observation. FSC is highly dependent on the communication environment, while ESC depends on other non-verbal symbols that shape the communicator's image. In terms of communication content, both are marked by the characteristics of oral language but differ significantly in information coding and language regulation. From the perspective of emotional interaction, FSC pays more attention to face-work, while ESC focuses on confrontational performance to create moments of excitement.


Author(s):  
Evgenij F. Tarasov

The article questions whether human speech communication (SC) involves a transfer of information. The functioning of information in speech communication is examined within two frameworks: the informational approach and the systemic-activity approach. The informational approach adequately explains only the direct transfer of information, while the systemic-activity approach is relevant to the sign-mediated speech communication typical of human interaction. The more heuristic thesis is that perceiving the chain of linguistic sign bodies produced in intersubjective space only initiates the recipient's construction of the perceived message content. The completeness of the constructed content depends entirely on the recipient, who shares an optimal common consciousness with the speaker. The purpose of speech messages is not the construction of content by the recipient as such, but the development of the message's personal meaning. In human speech communication, communicants do not transmit information; rather, they use verbal sign bodies to actualize images of consciousness that have developed within a single ethnic culture and are therefore common to them. The incentive for communicants to develop a common consciousness is their participation in the joint activities that ensure their earthly existence.


2002 ◽  
pp. 1-10
Author(s):  
John Holmes ◽  
Wendy Holmes
