On the information rate of speech communication

Author(s):  
Steven Van Kuyk ◽  
W. Bastiaan Kleijn ◽  
Richard C. Hendriks
2021 ◽  
Author(s):  
Steven Van Kuyk

<p>Throughout the last century, models of human speech communication have been proposed by linguists, psychologists, and engineers. Advancements have been made, but a theory of human speech communication that is both comprehensive and quantitative is yet to emerge. This thesis hypothesises that a branch of mathematics known as information theory holds the answer to a more complete theory. Information theory has made fundamental contributions to wireless communications, computer science, statistical inference, cryptography, thermodynamics, and biology. There is no reason that information theory cannot be applied to human speech communication, but thus far, a relatively small effort has been made to do so. The goal of this research was to develop a quantitative model of speech communication that is consistent with our knowledge of linguistics and that is accurate enough to predict the intelligibility of speech signals. Specifically, this thesis focuses on the following research questions: 1) How does the acoustic information rate of speech compare to the lexical information rate of speech? 2) How can information theory be used to predict the intelligibility of speech-based communication systems? 3) How well do competing models of speech communication predict intelligibility? To answer the first research question, novel approaches for estimating the information rate of speech communication are proposed. Unlike existing approaches, the methods proposed in this thesis rely on having a chorus of speech signals where each signal in the chorus contains the same linguistic message, but is spoken by a different talker. The advantage of this approach is that variability inherent in the production of speech can be accounted for. The approach gives an estimate of about 180 b/s. This is three times larger than estimates based on lexical models, but it is an order of magnitude smaller than previous estimates that rely on acoustic signals.
To answer the second research question, a novel instrumental intelligibility metric called speech intelligibility in bits (SIIB) and a variant called SIIBGauss are proposed. SIIB is an estimate of the amount of information shared between a talker and a listener in bits per second. Unlike existing intelligibility metrics that are based on information theory, SIIB accounts for talker variability and statistical dependencies between time-frequency units. Finally, to answer the third research question, a comprehensive evaluation of intrusive intelligibility metrics is provided. The results show that SIIB and SIIBGauss have state-of-the-art performance, that intelligibility metrics tend to perform poorly on data sets that were not used during their development, and that reducing statistical dependencies between input features is advantageous.</p>


Author(s):  
D. Van Dyck

An (electron) microscope can be considered as a communication channel that transfers structural information between an object and an observer. In electron microscopy this information is carried by electrons. According to Shannon's theory, the maximal information rate (or capacity) of a communication channel is given by C = B log2(1 + S/N) bits/s, where B is the bandwidth, and S and N are the average signal power and noise power, respectively, at the output. We will now apply this to study the information transfer in an electron microscope. For simplicity we will assume the object and the image to be one-dimensional (the results can straightforwardly be generalized). An imaging device can be characterized by its transfer function, which describes the magnitude with which a spatial frequency g is transferred through the device; n is the noise. Usually, the resolution ρ of the instrument is defined from the cut-off 1/ρ beyond which no spatial information is transferred.
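
The capacity formula quoted above can be evaluated numerically. The following is a minimal sketch, not part of the original abstract: the function name `shannon_capacity` and the input values are hypothetical and chosen only to illustrate C = B log2(1 + S/N).

```python
import math


def shannon_capacity(bandwidth: float, signal_power: float, noise_power: float) -> float:
    """Return the Shannon capacity C = B * log2(1 + S/N).

    The result is in bits per second when the bandwidth is given in hertz.
    Signal and noise power must share the same (arbitrary) unit.
    """
    if bandwidth < 0 or signal_power < 0:
        raise ValueError("bandwidth and signal power must be non-negative")
    if noise_power <= 0:
        raise ValueError("noise power must be positive")
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


# Hypothetical illustration values (not taken from the abstract):
# B = 1000 Hz and S/N = 15, so log2(1 + 15) = 4 bits per channel use.
capacity = shannon_capacity(bandwidth=1000.0, signal_power=15.0, noise_power=1.0)
print(capacity)  # 4000.0
```

Because the capacity grows only logarithmically with S/N but linearly with B, doubling the bandwidth in this sketch doubles the result, whereas doubling the signal power adds less than one bit per channel use.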


1995 ◽  
Vol 38 (5) ◽  
pp. 1014-1024 ◽  
Author(s):  
Robert L. Whitehead ◽  
Nicholas Schiavetti ◽  
Brenda H. Whitehead ◽  
Dale Evan Metz

The purpose of this investigation was twofold: (a) to determine if there are changes in specific temporal characteristics of speech that occur during simultaneous communication, and (b) to determine if known temporal rules of spoken English are disrupted during simultaneous communication. Ten speakers uttered sentences consisting of a carrier phrase and experimental CVC words under conditions of: (a) speech, (b) speech combined with signed English, and (c) speech combined with signed English for every word except the CVC word that was fingerspelled. The temporal features investigated included: (a) sentence duration, (b) experimental CVC word duration, (c) vowel duration in experimental CVC words, (d) pause duration before and after experimental CVC words, and (e) consonantal effects on vowel duration. Results indicated that for all durational measures, the speech/sign/fingerspelling condition was longest, followed by the speech/sign condition, with the speech condition being shortest. It was also found that for all three speaking conditions, vowels were longer in duration when preceding voiced consonants than vowels preceding their voiceless cognates, and that a low vowel was longer in duration than a high vowel. These findings indicate that speakers consistently reduced their rate of speech when using simultaneous communication, but did not violate these specific temporal rules of English important for consonant and vowel perception.


1965 ◽  
Author(s):  
Carl E. Williams ◽  
Michael H. L. Hecker ◽  
Karl D. Kryter

Author(s):  
Mohd Javed ◽  
Khaleel Ahmad ◽  
Ahmad Talha Siddiqui

WiMAX is an innovation on and upgrade of the 802.16 standards issued by IEEE. It has numerous remarkable qualities, such as a high data rate, quality of service, scalability, security, and portability, that put it head and shoulders above existing technologies such as broadband cable, DSL, and wireless systems. As with its competitors, however, security remains a necessary concern. Because the wireless medium is accessible to all, attackers can easily get into the system, leaving the client vulnerable. Many modern authentication and encryption methods have been built into WiMAX; nevertheless, it remains exposed to various threats. In this paper, we propose Elliptic Curve Cryptography based on Cellular Automata (EC3A) for encrypting and decrypting messages to improve WiMAX security.

