Speech Communication from an Information Theoretical Perspective

2021
Author(s): Steven Van Kuyk

Throughout the last century, models of human speech communication have been proposed by linguists, psychologists, and engineers. Advancements have been made, but a theory of human speech communication that is both comprehensive and quantitative has yet to emerge. This thesis hypothesises that a branch of mathematics known as information theory holds the answer to a more complete theory. Information theory has made fundamental contributions to wireless communications, computer science, statistical inference, cryptography, thermodynamics, and biology. There is no reason why information theory cannot be applied to human speech communication, but thus far relatively little effort has been made to do so. The goal of this research was to develop a quantitative model of speech communication that is consistent with our knowledge of linguistics and accurate enough to predict the intelligibility of speech signals. Specifically, this thesis addresses the following research questions: (1) How does the acoustic information rate of speech compare to its lexical information rate? (2) How can information theory be used to predict the intelligibility of speech-based communication systems? (3) How well do competing models of speech communication predict intelligibility? To answer the first research question, novel approaches for estimating the information rate of speech communication are proposed. Unlike existing approaches, these methods rely on a chorus of speech signals in which every signal carries the same linguistic message but is spoken by a different talker. The advantage of this approach is that it accounts for the variability inherent in speech production. The approach yields an estimate of about 180 b/s, which is three times larger than estimates based on lexical models but an order of magnitude smaller than previous estimates based on acoustic signals.
To answer the second research question, a novel instrumental intelligibility metric called speech intelligibility in bits (SIIB) and a variant called SIIBGauss are proposed. SIIB estimates the amount of information, in bits per second, shared between a talker and a listener. Unlike existing intelligibility metrics based on information theory, SIIB accounts for talker variability and for statistical dependencies between time-frequency units. Finally, to answer the third research question, a comprehensive evaluation of intrusive intelligibility metrics is provided. The results show that SIIB and SIIBGauss achieve state-of-the-art performance, that intelligibility metrics tend to perform poorly on data sets that were not used during their development, and that reducing statistical dependencies between input features is advantageous.
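To make "information shared between a talker and a listener" concrete, the Python sketch below computes mutual information under the assumption that a talker feature and a listener feature are jointly Gaussian, in which case I = -0.5 * log2(1 - rho^2) bits, with rho their correlation. It illustrates the underlying quantity only and is not the SIIB or SIIBGauss algorithm; the function names, the single scalar feature, and the frame-rate conversion to b/s are hypothetical.

```python
import numpy as np

def gaussian_mi_bits(x, y):
    """Mutual information in bits between two scalar feature sequences,
    ASSUMING they are jointly Gaussian: I = -0.5 * log2(1 - rho**2)."""
    rho = np.corrcoef(x, y)[0, 1]
    return float(-0.5 * np.log2(1.0 - rho ** 2))

def naive_rate_bits_per_second(x, y, frames_per_second):
    """Hypothetical conversion to b/s that treats every frame as an
    independent observation (it ignores dependencies between frames)."""
    return gaussian_mi_bits(x, y) * frames_per_second

rng = np.random.default_rng(0)
talker = rng.standard_normal(10_000)                   # stand-in talker feature
listener = talker + 0.5 * rng.standard_normal(10_000)  # noisy received copy, SNR = 4
# For this model the exact value is 0.5 * log2(1 + SNR), about 1.16 bits.
print(gaussian_mi_bits(talker, listener))
print(naive_rate_bits_per_second(talker, listener, frames_per_second=100))
```

The naive rate treats frames as independent, which is exactly the simplification the abstract says SIIB avoids by accounting for statistical dependencies between time-frequency units.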


1965
Author(s): Carl E. Williams, Michael H. L. Hecker, Karl D. Kryter

Author(s): Isiaka Ajewale Alimi

Digital hearing aids address the problems of noise and poor speech intelligibility associated with analogue hearing aids. One of the main functions of the digital signal processor (DSP) in a digital hearing aid is noise reduction, which can be achieved with speech enhancement algorithms that in turn improve system performance and flexibility. However, studies have shown that the quality of experience (QoE) with some current hearing aids falls short of expectations in noisy environments because of interfering sounds, background noise, and reverberation. It has also been suggested that the noise-reduction features of the DSP can be improved further. We recently proposed an adaptive spectral subtraction algorithm to enhance the performance of communication systems and to address the musical noise generated by the conventional spectral subtraction algorithm. The effectiveness of that algorithm has been confirmed by various objective and subjective evaluations. In this study, an adaptive spectral subtraction algorithm is implemented with a noise-estimation algorithm for highly non-stationary noisy environments, which, because of its effectiveness, is used instead of the voice activity detection (VAD) employed in our previous work. In addition, a signal-to-residual spectrum ratio (SR) is implemented to control amplification distortion and thereby improve speech intelligibility. The results show that the proposed scheme performs comparatively better and can easily be employed in digital hearing aid systems to improve speech quality and intelligibility.
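For readers unfamiliar with spectral subtraction, the sketch below shows the conventional power-spectral-subtraction idea that adaptive variants build on: subtract a noise power estimate from each noisy frame, keep the noisy phase, and clamp the result to a spectral floor to limit musical noise. It is not the authors' adaptive algorithm and does not implement the SR control; the over-subtraction factor alpha, the floor beta, the frame sizes, and the noise estimate (a plain average over a noise-only recording rather than a tracker for non-stationary noise) are all assumptions.

```python
import numpy as np

N_FFT, HOP = 256, 128
# Square-root periodic Hann used at analysis and synthesis: its square sums to
# one at 50% overlap, so unmodified frames would reconstruct exactly.
WIN = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT))

def frames(x):
    """Slice x into overlapping frames of length N_FFT with hop HOP."""
    n = 1 + (len(x) - N_FFT) // HOP
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n)[:, None]
    return x[idx]

def noise_psd_from(noise_only):
    """Assumed noise estimate: average periodogram of a noise-only signal."""
    spec = np.fft.rfft(frames(noise_only) * WIN, axis=1)
    return np.mean(np.abs(spec) ** 2, axis=0)

def spectral_subtract(noisy, noise_psd, alpha=2.0, beta=0.01):
    """Power spectral subtraction with over-subtraction alpha and floor beta."""
    spec = np.fft.rfft(frames(noisy) * WIN, axis=1)
    power = np.abs(spec) ** 2
    # Subtract the scaled noise estimate, but never go below a fraction of it.
    clean_power = np.maximum(power - alpha * noise_psd, beta * noise_psd)
    # Reuse the noisy phase and return to the time domain.
    est = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
    out_frames = np.fft.irfft(est, n=N_FFT, axis=1) * WIN
    out = np.zeros(len(noisy))
    for i, f in enumerate(out_frames):
        out[i * HOP : i * HOP + N_FFT] += f  # overlap-add
    return out

def snr_db(ref, est):
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((est - ref) ** 2))

rng = np.random.default_rng(1)
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 1000 * t)           # 1 kHz test tone
noisy = clean + 0.3 * rng.standard_normal(fs)  # white noise, sigma = 0.3
psd = noise_psd_from(0.3 * rng.standard_normal(fs))
enhanced = spectral_subtract(noisy, psd)
core = slice(N_FFT, fs - 2 * N_FFT)            # skip edges not fully overlap-added
print(snr_db(clean[core], noisy[core]), snr_db(clean[core], enhanced[core]))
```

The floor beta trades residual noise against musical noise: with beta near zero, isolated surviving bins become audible "tones", which is the artefact the adaptive algorithm described above targets.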


Sensors, 2021, Vol 21 (1), pp. 231
Author(s): Weiheng Jiang, Xiaogang Wu, Yimou Wang, Bolin Chen, Wenjiang Feng, ...

Blind modulation classification is an important step in implementing cognitive radio networks. The multiple-input multiple-output (MIMO) technique is widely used in military and civil communication systems. Because of the lack of prior information about channel parameters and the overlapping of signals in MIMO systems, traditional likelihood-based and feature-based approaches cannot be applied directly in these scenarios. Hence, to solve the problem of blind modulation classification in MIMO systems, this paper uses a time–frequency analysis method based on the windowed short-time Fourier transform to analyze the time–frequency characteristics of time-domain modulated signals. The extracted time–frequency characteristics are then converted into red–green–blue (RGB) spectrogram images, and a convolutional neural network based on transfer learning classifies the modulation types from these images. Finally, a decision fusion module fuses the classification results of all receiving antennas. Through simulations, we analyzed the classification performance at different signal-to-noise ratios (SNRs). The results indicate that, for the single-input single-output (SISO) network, the proposed scheme achieves average classification accuracies of 92.37% and 99.12% at SNRs of −4 and 10 dB, respectively. For the MIMO network, it achieves average classification accuracies of 80.42% and 87.92% at −4 and 10 dB, respectively. The proposed method greatly improves the accuracy of modulation classification in MIMO networks.
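The front end of the pipeline described above can be sketched in Python: a windowed STFT turns one antenna's complex baseband samples into a dB spectrogram, the spectrogram is scaled into an RGB image, and per-antenna class probabilities are fused into one decision. The window length, hop, the simple colour ramp, and the probability-averaging fusion rule are assumptions, since the abstract does not specify them; the transfer-learning CNN is omitted.

```python
import numpy as np

def stft_db(x, n_fft=64, hop=16):
    """Windowed STFT magnitude in dB of a complex baseband signal.
    Rows are frequency bins (fftshifted so DC is centred), columns are frames."""
    win = np.hanning(n_fft)
    n = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n)[:, None]
    spec = np.fft.fftshift(np.fft.fft(x[idx] * win, axis=1), axes=1)
    return 20 * np.log10(np.abs(spec).T + 1e-12)

def to_rgb(s_db):
    """Map a dB spectrogram to an H x W x 3 uint8 image using a hypothetical
    blue-to-red ramp (a stand-in for whatever colormap the paper used)."""
    v = (s_db - s_db.min()) / (s_db.max() - s_db.min() + 1e-12)
    rgb = np.stack([v, 1.0 - np.abs(2.0 * v - 1.0), 1.0 - v], axis=-1)
    return (255 * rgb).astype(np.uint8)

def fuse_decisions(prob_per_antenna):
    """Assumed soft fusion rule: average the class-probability vectors of all
    receive antennas (one row each), then pick the most likely class."""
    return int(np.argmax(np.mean(prob_per_antenna, axis=0)))

rng = np.random.default_rng(0)
n = np.arange(1024)
noise = 0.1 * (rng.standard_normal(1024) + 1j * rng.standard_normal(1024))
x = np.exp(2j * np.pi * 0.1 * n) + noise  # complex tone at 0.1 cycles/sample
img = to_rgb(stft_db(x))
print(img.shape)

probs = np.array([[0.6, 0.3, 0.1],   # antenna 1 classifier output
                  [0.2, 0.5, 0.3],   # antenna 2
                  [0.5, 0.4, 0.1]])  # antenna 3
print(fuse_decisions(probs))
```

Averaging probabilities (soft fusion) lets a confident antenna outweigh an uncertain one; majority voting over per-antenna hard decisions is a common alternative.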


Network, 2021, Vol 1 (2), pp. 50-74
Author(s): Divyanshu Pandey, Adithya Venugopal, Harry Leib

Most modern communication systems, such as those intended for deployment in IoT applications or 5G and beyond networks, use multiple domains for transmission and reception at the physical layer. Depending on the application, these domains can include space, time, frequency, users, code sequences, and transmission media, to name a few. As such, the design of future communication systems must be cognizant of the opportunities and challenges in exploiting the multi-domain nature of the signals and systems involved in information transmission. Focusing on the physical layer, this paper presents a novel mathematical framework that uses tensors to represent, design, and analyze multi-domain systems. Tensors allow various domains to be integrated into the transceiver design, and tools from multilinear algebra can be used to develop signal processing techniques that operate across all domains simultaneously. In particular, we present tensor partial response signaling (TPRS), which allows controlled interference to be introduced within the elements of a domain as well as across domains. We develop the TPRS system using the tensor contracted convolution to generate a multi-domain signal with the desired spectral and cross-spectral properties across domains. In addition, by studying the information-theoretic properties of the multi-domain tensor channel, we present the trade-off between different domains that can be harnessed with this framework. Numerical examples for capacity and mean square error are presented to highlight the domain trade-off revealed by the tensor formulation. Furthermore, an application of the tensor framework to MIMO Generalized Frequency Division Multiplexing (GFDM) is presented.
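As a rough illustration of how a tensor formulation represents a multi-domain channel, the sketch below models a two-domain channel (for example, antennas by subcarriers) as an order-4 tensor acting on the input through an Einstein product (tensor contraction), and computes an equal-power capacity by unfolding the tensor into an ordinary matrix. This is not the paper's TPRS scheme or its tensor contracted convolution; the dimensions, the equal-power Gaussian input, and the function names are hypothetical.

```python
import numpy as np

def einstein_product(H, X):
    """Tensor channel output Y[a, b] = sum_{c, d} H[a, b, c, d] * X[c, d]."""
    return np.einsum("abcd,cd->ab", H, X)

def unfold(H):
    """Reshape an order-4 tensor (I1, I2, J1, J2) into an (I1*I2, J1*J2) matrix.
    With row-major ordering, einstein_product(H, X).ravel() == unfold(H) @ X.ravel()."""
    i1, i2, j1, j2 = H.shape
    return H.reshape(i1 * i2, j1 * j2)

def capacity_equal_power(H, snr):
    """Capacity in bits per channel use of Y = H *X + N with unit-variance noise,
    ASSUMING i.i.d. complex Gaussian input with total SNR split equally."""
    Hm = unfold(H)
    gram = np.eye(Hm.shape[0]) + (snr / Hm.shape[1]) * (Hm @ Hm.conj().T)
    _, logdet = np.linalg.slogdet(gram)  # gram is Hermitian positive definite
    return float(logdet / np.log(2))

# Hypothetical sizes: (2 receive antennas x 3 subcarriers) out,
# (2 transmit antennas x 3 subcarriers) in.
rng = np.random.default_rng(0)
shape = (2, 3, 2, 3)
H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
X = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
Y = einstein_product(H, X)
print(Y.shape, capacity_equal_power(H, snr=10.0))
```

Unfolding shows that, for capacity purposes, the tensor channel is equivalent to a larger matrix channel; the tensor view keeps the per-domain indices explicit, which is what lets a framework reason about trade-offs between domains.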


Algorithms, 2021, Vol 14 (6), pp. 165
Author(s): Shiyu Guo, Mengna Shi, Yanqi Zhou, Jiayin Yu, Erfu Wang

Because speech is a main means of information transmission, ensuring the security of speech communication is particularly important. In wireless speech communication, multipath channels make transmission more complex, so separating or extracting the source signals from the convolutionally mixed signals is a crucial step in recovering the source information. In this paper, chaotic masking is used to secure the transmission of speech signals, and a fast fixed-point independent vector analysis algorithm is used to solve the convolutional blind source separation problem. First, chaotic masking is applied before the speech signal is sent, and the convolutional mixing of multiple signals is simulated with impulse response filters. Then, the observed signals are transformed to the frequency domain by the short-time Fourier transform, and instantaneous blind source separation is performed using the fast fixed-point independent vector analysis algorithm. By preserving the higher-order statistical correlation between frequencies, the algorithm solves the permutation ambiguity problem of independent component analysis. Simulation experiments show that the algorithm efficiently performs blind extraction of convolutionally mixed signals and that the recovered speech signals are of better quality. The method provides a solution for the secure transmission and effective separation of speech signals in multipath transmission channels.
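The chaotic-masking step can be illustrated with additive masking: the transmitter adds a strong chaotic signal to the speech, and a receiver that shares the generator's parameters regenerates the same sequence and subtracts it. The logistic map, the use of its parameters (x0, r) as a key, and the gain are assumptions for illustration; the abstract does not say which chaotic system is used, and the convolutive mixing and IVA separation stages are not shown.

```python
import numpy as np

def logistic_sequence(n, x0, r):
    """Chaotic carrier from the logistic map x[k+1] = r * x[k] * (1 - x[k]).
    (x0, r) play the role of a shared secret key (hypothetical choice)."""
    seq = np.empty(n)
    x = x0
    for k in range(n):
        x = r * x * (1.0 - x)
        seq[k] = x
    return 2.0 * seq - 1.0  # map from (0, 1) to (-1, 1)

def mask(speech, key, gain=5.0):
    """Additive chaotic masking: bury the speech under a stronger chaotic signal."""
    return speech + gain * logistic_sequence(len(speech), *key)

def unmask(received, key, gain=5.0):
    """Receiver regenerates the chaotic sequence from the key and subtracts it."""
    return received - gain * logistic_sequence(len(received), *key)

rng = np.random.default_rng(0)
speech = 0.1 * rng.standard_normal(4000)  # stand-in for a speech segment
key = (0.3141, 3.99)                      # hypothetical key values
sent = mask(speech, key)
recovered = unmask(sent, key)
wrong = unmask(sent, (0.3142, 3.99))      # tiny key error diverges quickly
print(np.max(np.abs(recovered - speech)), np.max(np.abs(wrong - speech)))
```

Sensitivity to initial conditions is what makes the key useful: a receiver with a slightly wrong x0 produces a sequence that quickly decorrelates from the true one and leaves the speech buried.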


2005, Vol 47 (4), pp. 411-423
Author(s): Hasan Palaz, Yücel Bicil, Alper Kanak, Mehmet Uğur Doğan
