Processing of speech signals using a microphone array for intelligent robots

Author(s):  
J Hu ◽  
C C Cheng ◽  
W H Liu

For intelligent robots to interact with people, an efficient human-robot communication interface (e.g. voice command) is very important. However, recognizing voice commands or speech represents only part of speech communication; the physics of speech signals carries other information, such as the direction of the speaker. In addition, a basic element of processing the speech signal is recognition at the acoustic level, and the performance of recognition depends greatly on the quality of the received signal: in a noisy environment, the success rate can be very poor. As a result, prior to speech recognition, it is important to process the speech signals so as to extract the desired content while rejecting other components, such as background noise. This paper presents a speech purification system for robots that improves the signal-to-noise ratio of the received speech, together with an algorithm based on a multidirection calibration beamformer.
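The purification stage described above is essentially spatial filtering. As an illustration only (the paper's multidirection calibration beamformer is not specified in this abstract), the following Python sketch shows a generic frequency-domain delay-and-sum beamformer that aligns the microphone channels toward an assumed speaker direction and averages them to raise the signal-to-noise ratio; the function name, argument names, and sign convention for the steering delays are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(frames, mic_positions, doa_deg, fs, c=343.0):
    """Generic delay-and-sum beamformer for a linear microphone array.

    frames:        (num_mics, num_samples) time-domain channel signals
    mic_positions: (num_mics,) microphone coordinates along the array axis in metres
    doa_deg:       assumed direction of arrival in degrees (endfire = 0, broadside = 90)
    fs:            sampling rate in Hz
    c:             speed of sound in m/s
    """
    num_mics, num_samples = frames.shape
    # Relative arrival delays of a plane wave from doa_deg (reference: mic 0).
    delays = (mic_positions - mic_positions[0]) * np.cos(np.deg2rad(doa_deg)) / c
    # Compensate each channel's delay with a linear phase shift in the frequency domain.
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = np.fft.irfft(spectra * steering, n=num_samples, axis=1)
    # Averaging the aligned channels reinforces the target direction and averages down noise.
    return aligned.mean(axis=0)
```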

Author(s):  
Gregor Rozinaj

In this chapter we describe a proposal for a metropolitan information system (MIS) that provides various kinds of information to the inhabitants of a city as well as to visitors. The main principle is based on accessing data from the Internet and providing a user-friendly interface to these data through various types of intelligent kiosks. The emphasis is on multimodal human-computer communication in both directions, using image, audio/speech, and text modes. We propose several versions of the intelligent kiosks and various types of communication with the MIS. The first version is placed in public places and offers a three-dimensional human head shown on a large display that gives information about the city, institutions, weather, and so on. It is a system with an integrated microphone array, camera, and touch screen as inputs and two displays and loudspeakers as outputs. A speech-recognized request for information is transformed into an answer using a database or the Internet and then presented to the customer both visually and acoustically with the help of a robust multilingual speech synthesizer and a powerful graphics engine. The second, more flexible version, although with limited functionality, is the concept of a mobile phone used as a multimedia terminal for access to different information. The last possibility is to use a regular phone (fixed or mobile) to access the MIS via an intelligent speech communication interface. The type of communication depends on the version of the terminal. The stationary terminals are assumed to have mainly a fixed IP connection to the MIS, but wireless access can be used as well. The second version of terminal uses WiFi technology to connect to the MIS. The last solution, the general phone, can access the MIS through either the fixed telecommunication network or GSM.


2012 ◽  
Vol 2012 ◽  
pp. 1-11
Author(s):  
Kazunobu Kondo ◽  
Yusuke Mizuno ◽  
Takanori Nishino ◽  
Kazuya Takeda

Small agglomerative microphone array systems have been proposed for use with speech communication and recognition systems. Blind source separation methods based on frequency-domain independent component analysis have shown significant separation performance, and the microphone arrays are small enough to be portable. However, the computational complexity involved is very high because the conventional signal collection and processing method uses 60 microphones. In this paper, we propose a band selection method based on magnitude squared coherence. Frequency bands are selected based on the spatial and geometric characteristics of the microphone array device, which are strongly related to its dodecahedral shape, and the selected bands are nonuniformly spaced. The estimated reduction in computational complexity is 90%, with a 68% reduction in the number of frequency bands. The separation performance achieved in our experimental evaluation was 7.45 dB (signal-to-noise ratio) and 2.30 dB (cepstral distortion). These results show improved performance compared with the use of uniformly spaced frequency bands.
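As a rough illustration of magnitude-squared-coherence-based band selection (not the authors' exact criterion, which additionally exploits the dodecahedral array geometry), the sketch below ranks frequency bins by the average MSC over microphone pairs and keeps only a fraction of them; the choice of keeping the least coherent bins, the keep_ratio value, and the function name are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import coherence

def select_bands_by_msc(mic_signals, fs, n_fft=1024, keep_ratio=0.32):
    """Rank frequency bins by average magnitude squared coherence (MSC)
    between microphone pairs and keep a fraction of them.

    mic_signals: (num_mics, num_samples) array of time-domain channel signals
    """
    num_mics = mic_signals.shape[0]
    msc_sum, count = None, 0
    for i in range(num_mics):
        for j in range(i + 1, num_mics):
            f, cxy = coherence(mic_signals[i], mic_signals[j], fs=fs, nperseg=n_fft)
            msc_sum = cxy if msc_sum is None else msc_sum + cxy
            count += 1
    msc_mean = msc_sum / count
    # Assumption: keep the bins with the lowest average MSC, on the reasoning
    # that highly coherent bins add little extra information to the separation stage.
    n_keep = int(len(f) * keep_ratio)
    selected = np.sort(np.argsort(msc_mean)[:n_keep])
    return f[selected], selected
```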


Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 165
Author(s):  
Shiyu Guo ◽  
Mengna Shi ◽  
Yanqi Zhou ◽  
Jiayin Yu ◽  
Erfu Wang

As speech is a main medium of information transmission, it is particularly important to ensure the security of speech communication. In wireless speech communication over complex multipath channels, separating or extracting the source signal from the convolutive mixture is a crucial step in recovering the source information. In this paper, chaotic masking technology is used to guarantee the transmission safety of speech signals, and a fast fixed-point independent vector analysis algorithm is used to solve the convolutive blind source separation problem. First, chaotic masking is applied before the speech signal is sent, and the convolutive mixing of multiple signals is simulated by impulse response filters. Then, the observed signal is transformed to the frequency domain by the short-time Fourier transform, and instantaneous blind source separation is performed in each frequency bin using the fast fixed-point independent vector analysis algorithm. The algorithm preserves the higher-order statistical dependence between frequencies, which solves the permutation ambiguity problem of independent component analysis. Simulation experiments show that this algorithm can efficiently perform blind extraction of convolutively mixed signals and that the quality of the recovered speech signals is good. It provides a solution for the secure transmission and effective separation of speech signals in multipath transmission channels.
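The abstract does not state which chaotic system is used for masking, so the sketch below uses a logistic map purely as an illustrative assumption: the transmitter adds a low-level chaotic carrier keyed by (x0, r), and a receiver holding the same key regenerates and subtracts it before the separation stage. The gain value and function names are assumptions.

```python
import numpy as np

def logistic_map_sequence(length, x0=0.4, r=3.99):
    """Chaotic sequence from the logistic map x[n+1] = r * x[n] * (1 - x[n])."""
    x = np.empty(length)
    x[0] = x0
    for n in range(length - 1):
        x[n + 1] = r * x[n] * (1.0 - x[n])
    return x - 0.5  # shift to be roughly zero-mean

def chaotic_mask(speech, x0=0.4, r=3.99, gain=0.1):
    """Transmitter side: add a low-level chaotic carrier to the speech."""
    return speech + gain * logistic_map_sequence(len(speech), x0, r)

def chaotic_unmask(received, x0=0.4, r=3.99, gain=0.1):
    """Receiver side: regenerate the carrier from the shared key (x0, r) and remove it."""
    return received - gain * logistic_map_sequence(len(received), x0, r)
```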


2021 ◽  
pp. 1-15
Author(s):  
Poovarasan Selvaraj ◽  
E. Chandra

The most challenging task for recent Speech Enhancement (SE) systems is to remove non-stationary noise and additive white Gaussian noise in real-time applications. Several suggested SE techniques have not succeeded in eliminating noise from speech signals in real-time scenarios because of their high resource utilization. A Sliding Window Empirical Mode Decomposition including a Variant of Variational Mode Decomposition and Hurst (SWEMD-VVMDH) technique was therefore developed to reduce this difficulty in real-time applications, but it is a statistical framework whose computations take a long time. Hence, in this article the SWEMD-VVMDH technique is extended with a Deep Neural Network (DNN) that efficiently learns the speech signals decomposed via SWEMD-VVMDH to achieve SE. First, the noisy speech signals are decomposed into Intrinsic Mode Functions (IMFs) by the SWEMD Hurst (SWEMDH) technique. Then, Time-Delay Estimation (TDE)-based VVMD is performed on the IMFs to select the most relevant IMFs according to the Hurst exponent and to attenuate the low- and high-frequency noise components in the speech signal. For each signal frame, the target features are chosen and fed to the DNN, which learns these features to estimate the Ideal Ratio Mask (IRM) in a supervised manner. The capabilities of the DNN are enhanced across categories of background noise and across Signal-to-Noise Ratios (SNRs) of the speech signals. The noise-category dimension and the SNR dimension are chosen for training and testing multiple DNNs, since these dimensions are often taken into account in SE systems. Further, the IRM in each frequency channel for all noisy signal samples is concatenated to reconstruct the noiseless speech signal. Finally, the experimental results exhibit considerable improvement in SE under different categories of noise.
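For reference, the supervised target the DNN estimates is the standard Ideal Ratio Mask; the sketch below shows the usual IRM definition and how an estimated mask is applied to the noisy STFT. The decomposition front end and the network itself are omitted, and the epsilon guard is an assumption.

```python
import numpy as np

def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-10):
    """IRM(t, f) = sqrt(|S|^2 / (|S|^2 + |N|^2)), computed from clean and noise STFTs.
    During training, the DNN is taught to predict this mask from noisy features."""
    s2 = np.abs(clean_spec) ** 2
    n2 = np.abs(noise_spec) ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

def apply_mask(noisy_spec, mask):
    """At inference, the estimated mask scales the noisy STFT; an inverse STFT
    of the result reconstructs the enhanced waveform."""
    return mask * noisy_spec
```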


Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3469
Author(s):  
Chien-Chang Huang ◽  
Chien-Hao Liu

In this research, we propose a miniaturized two-element sensor array inspired by Ormia ochracea for sound direction-finding applications. In contrast to the conventional approach of using mechanical coupling structures to enlarge the intensity differences, we exploit an electrical coupling network circuit composed of lumped elements to enhance the phase differences and extract the optimized output power for a good signal-to-noise ratio. The separation distance between the two sensors could be reduced from 0.5 wavelength to 0.1 wavelength (3.43 mm at the operating frequency of 10 kHz) for determining the angle of arrival. The main advantages of the proposed device include low power losses, flexible design, and wide operating bandwidth. A prototype was designed, fabricated, and experimentally examined within an anechoic sound chamber. It was demonstrated that the proposed device had a phase enhancement of 110° at an incident angle of 90° and a normalized power level of −2.16 dB at both output ports. The received power levels of our device were 3 dB higher than those of a transformer-type direction-finding system. In addition, our proposed device could operate in the frequency range from 8 kHz to 12 kHz with a tunable capacitor. The research results are expected to be beneficial for compact sonar or radar systems.
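To indicate where the enhanced phase difference is used, the following sketch estimates the angle of arrival of a narrowband tone from the inter-sensor phase difference under a simple plane-wave model; it does not model the electrical coupling network that is the paper's contribution, and the endfire reference axis and function interface are assumptions.

```python
import numpy as np

def two_sensor_doa(x1, x2, fs, d, f_tone, c=343.0):
    """Estimate the angle of arrival of a narrowband tone from the phase
    difference between two closely spaced sensors.

    x1, x2:  time-domain signals from the two sensors
    fs:      sampling rate in Hz
    d:       sensor spacing in metres (e.g. 0.1 wavelength = 3.43 mm at 10 kHz)
    f_tone:  tone frequency in Hz
    """
    n = len(x1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = np.argmin(np.abs(freqs - f_tone))      # frequency bin closest to the tone
    X1 = np.fft.rfft(x1)[k]
    X2 = np.fft.rfft(x2)[k]
    dphi = np.angle(X2 * np.conj(X1))          # inter-sensor phase difference
    # Plane-wave model with an endfire reference axis: dphi = 2*pi*f*d*cos(theta)/c
    arg = np.clip(dphi * c / (2.0 * np.pi * f_tone * d), -1.0, 1.0)
    return np.degrees(np.arccos(arg))
```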


2018 ◽  
Vol 7 (2.17) ◽  
pp. 79
Author(s):  
Jyoshna Girika ◽  
Md Zia Ur Rahman

Removal of the noise components of speech signals in mobile applications is an important step toward delivering high-resolution signals to the user. During communication, speech signals are corrupted by numerous non-stationary noises. The Least Mean Square (LMS) technique is a fundamental adaptive technique used widely in numerous applications as a result of its simplicity and robustness. In the LMS technique, an important parameter is the step size. It is well known that the convergence of the LMS technique is rapid if the step size is large, but the steady-state mean square error (MSE) then increases. Conversely, for a small step size the steady-state MSE is small, but the convergence rate is slow. Thus, the step size provides a trade-off between the convergence rate and the steady-state MSE of the LMS technique. Making the step size variable rather than fixed improves the behaviour of the LMS technique: large step-size values are chosen during the early convergence phase, and small step-size values are used when the system is close to its steady state, which results in Normalized LMS (NLMS) algorithms. In this approach the step size is not constant and changes along with the error signal at each instant. Low computational complexity of the adaptive filter is highly attractive in speech enhancement applications. This reduction is usually obtained by clipping either the input data or the estimation error. Algorithms that rely on clipping the error or the data are the Sign Regressor (SR) algorithms. We combine these sign versions with various adaptive noise cancellers. The SR Weight NLMS (SRWNLMS), SR Error NLMS (SRENLMS), and SR Unbiased LMS (SRUBLMS) algorithms are individually introduced. These adaptive noise cancellers are compared with respect to Signal-to-Noise Ratio Improvement (SNRI).
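As a minimal sketch of one sign variant in an adaptive noise canceller (not a reproduction of the SRWNLMS, SRENLMS, or SRUBLMS algorithms named above), the code below applies a sign-regressor NLMS update, replacing the regressor vector with its sign to reduce the multiplication count; the normalization by the regressor energy and the parameter values are assumptions.

```python
import numpy as np

def sign_regressor_nlms(primary, noise_ref, order=16, mu=0.5, eps=1e-6):
    """Adaptive noise canceller sketch.

    primary:   speech corrupted by noise (desired-plus-noise input)
    noise_ref: reference input correlated with the noise only
    Returns the error signal, i.e. the enhanced speech estimate.
    """
    w = np.zeros(order)
    enhanced = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = noise_ref[n - order:n][::-1]      # regressor (tap) vector
        y = np.dot(w, x)                      # adaptive noise estimate
        e = primary[n] - y                    # enhanced speech sample
        enhanced[n] = e
        # Sign-regressor NLMS update: the data vector is replaced by its sign,
        # which removes most multiplications from the weight update.
        w += (mu / (eps + np.dot(x, x))) * e * np.sign(x)
    return enhanced
```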

