Combining speech energy and edge information for fast and efficient voice activity detection in noisy environments

Author(s):  
Xiaokun Li ◽  
Yunbin Deng
Author(s):  
Charaf Eddine Chelloug ◽  
Atef Farrouki

In speech compression systems, Voice Activity Detection (VAD) is frequently used to distinguish active voice from other noisy sounds. This paper presents a robust VAD approach for non-stationary noisy environments. The proposed algorithm uses an adaptive thresholding technique to maintain a desired False Acceptance (FA) rate. Iterative hypothesis tests on the signal energy accept or discard successive audio frames as active voice. Based on the stationarity property of speech, a smoothing method then produces the final VAD decisions. The main contribution of the proposed algorithm is its ability to adjust the energy threshold automatically according to a local noise estimator. The approach is evaluated by comparison with G.729-B on the NOIZEUS database. The VAD architecture is implemented on a microcontroller-based (MCU) system, and several tests were conducted with real-time acquisition through the input/output ports of the MCU system.
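The adaptive-threshold idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the multiplicative threshold rule, the exponential noise update, and the hangover-style smoothing are all assumptions chosen for illustration, and the parameter values are arbitrary.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def adaptive_energy_vad(
    energies: Sequence[float],
    init_frames: int = 5,   # assumption: the leading frames contain noise only
    scale: float = 3.0,     # assumption: threshold = scale * noise estimate
    alpha: float = 0.9,     # assumption: smoothing factor of the noise estimate
    hangover: int = 2,      # assumption: frames a speech decision is held
) -> List[bool]:
    """Per-frame speech decisions from an energy threshold that tracks the noise."""
    lead = energies[:init_frames]
    if not lead:
        return []
    noise = sum(lead) / len(lead)  # initial local noise estimate
    decisions: List[bool] = []
    hold = 0
    for e in energies:
        if e > scale * noise:
            # Energy exceeds the current threshold: accept the frame as speech.
            decisions.append(True)
            hold = hangover
        else:
            # Frame judged as noise: let the noise estimate (and so the
            # threshold) follow it, which handles slowly varying noise.
            noise = alpha * noise + (1.0 - alpha) * e
            # Smoothing: keep the speech label for a few frames after speech.
            decisions.append(hold > 0)
            hold = max(hold - 1, 0)
    return decisions
```

In this sketch, a larger `scale` lowers the false acceptance rate at the cost of more missed speech; how the paper ties the threshold to a target FA rate is not stated in the abstract.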


Author(s):  
Yasunari Obuchi

This paper proposes a new voice activity detection (VAD) algorithm based on statistical noise suppression and framewise speech/non-speech classification. Although many VAD algorithms that are robust in noisy environments have been developed, the most successful ones are related to statistical noise suppression in some way. Accordingly, we formulate our VAD algorithm as a combination of noise suppression and subsequent framewise classification. The noise suppression part is improved by introducing the idea that any unreliable frequency component should be removed, so that the decision is made from the remaining signal. This augmentation can be realized using a few additional parameters embedded in the gain-estimation process. The framewise classification part can be either model-less or model-based. A model-less classifier has the advantage that it can be applied to any situation, even if no training data are available. In contrast, a model-based classifier (e.g., a neural network-based classifier) requires training data but tends to be more accurate. The accuracy of the proposed algorithm is evaluated using the CENSREC-1-C public framework and confirmed to be superior to that of many existing algorithms.
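A rough sketch of the two-stage structure described above (noise suppression, then a model-less framewise decision) might look as follows. The abstract does not specify the gain estimator, so a plain power-domain spectral-subtraction gain is swapped in here; `gain_floor`, `threshold`, and both function names are hypothetical.

```python
from typing import Sequence


def reliable_energy(
    power: Sequence[float],        # per-bin power of the current frame
    noise_power: Sequence[float],  # per-bin noise power estimate (same length)
    gain_floor: float = 0.3,       # assumption: bins with a lower gain are unreliable
) -> float:
    """Energy left after suppression, counting only reliable frequency bins."""
    kept = 0.0
    for p, n in zip(power, noise_power):
        # Spectral-subtraction gain on power, clipped at zero (swapped-in choice).
        gain = max(0.0, 1.0 - n / p) if p > 0 else 0.0
        if gain < gain_floor:
            continue  # unreliable component: removed before the decision
        kept += gain * p
    return kept


def is_speech(
    power: Sequence[float],
    noise_power: Sequence[float],
    threshold: float = 1.0,        # assumption: fixed model-less decision threshold
) -> bool:
    """Model-less framewise classifier: threshold the remaining energy."""
    return reliable_energy(power, noise_power) > threshold
```

A model-based variant would replace `is_speech` with a trained classifier fed by the suppressed spectrum, which requires training data as the abstract notes.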


2020 ◽  
Vol 10 (15) ◽  
pp. 5026
Author(s):  
Seon Man Kim

This paper proposes a technique for improving statistical-model-based voice activity detection (VAD) in noisy environments to be applied in an auditory hearing aid. The proposed method is implemented for a uniform polyphase discrete Fourier transform filter bank satisfying an auditory device time latency of 8 ms. The proposed VAD technique provides an online unified framework to overcome the frequent false rejection of the statistical-model-based likelihood-ratio test (LRT) in noisy environments. The method is based on the observation that the sparseness of speech and background noise causes high false-rejection error rates in statistical LRT-based VAD; the false-rejection rate increases as the sparseness increases. We demonstrate that the false-rejection error rate can be reduced by incorporating likelihood-ratio order statistics into a conventional LRT VAD. We confirm experimentally that the proposed method achieves a relative reduction of 15.8% in the average detection error rate compared to a conventional VAD, with only minimal change in the false acceptance probability, for three different noise conditions whose signal-to-noise ratios range from 0 to 20 dB.
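To illustrate why sparseness hurts a conventional LRT, the sketch below computes per-bin log-likelihood ratios under the usual Gaussian statistical model and compares two decision rules. The first averages over all bins, as in a conventional LRT VAD. The second, `top_fraction_decision`, is a hypothetical order-statistic variant that averages only the largest ratios; it is not the paper's rule, which the abstract does not specify. The maximum-likelihood a priori SNR estimate and all parameter values are assumptions.

```python
import math
from typing import List, Sequence


def log_likelihood_ratios(
    power: Sequence[float],        # per-bin power of the current frame
    noise_power: Sequence[float],  # per-bin noise power, assumed > 0
) -> List[float]:
    """Per-bin log-likelihood ratio of speech-present vs. speech-absent."""
    llrs = []
    for p, n in zip(power, noise_power):
        gamma = p / n               # a posteriori SNR
        xi = max(gamma - 1.0, 0.0)  # assumption: ML a priori SNR estimate
        llrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return llrs


def lrt_decision(llrs: Sequence[float], eta: float = 1.0) -> bool:
    """Conventional rule: mean log-likelihood ratio over all bins vs. eta."""
    return sum(llrs) / len(llrs) > eta


def top_fraction_decision(
    llrs: Sequence[float],
    fraction: float = 0.25,  # hypothetical: share of largest ratios to keep
    eta: float = 1.0,
) -> bool:
    """Hypothetical order-statistic rule: mean of the largest ratios vs. eta."""
    k = max(1, int(len(llrs) * fraction))
    top = sorted(llrs, reverse=True)[:k]
    return sum(top) / k > eta
```

When speech occupies only one bin out of eight, the all-bin average is diluted by the noise-only bins and can fall below `eta`, while the mean of the top ratios still exceeds it. Both rules reject a frame whose bins all match the noise estimate.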

