Sub-band Speech Analysis Using Gammatone Filter Banks and Optimal Pitch Extraction Methods for Each Sub-band Using the Average Magnitude Difference Function (AMDF) for LPC Speech Coders in Noisy Environments

2010 ◽  
Vol 1 (2) ◽  
pp. 13-24
Author(s):  
Suma S.A. ◽  
Dr. K. S. Gurumurthy

1974 ◽  
Vol 22 (5) ◽  
pp. 353-362 ◽  
Author(s):  
M. Ross ◽  
H. Shaffer ◽  
A. Cohen ◽  
R. Freudberg ◽  
H. Manley

Information ◽  
2019 ◽  
Vol 10 (1) ◽  
pp. 24 ◽  
Author(s):  
Zhao Han ◽  
Xiaoli Wang

Period detection for weak characteristic signals is important in fields such as speech signal processing and mechanical engineering. The average magnitude difference function (AMDF) is widely used to extract the period of a periodic signal because of its low computational complexity and high accuracy. However, its detection accuracy degrades when the background noise is strong. To improve on it, this paper proposes a new period-detection method for single-period signals based on the morphological self-complementary Top-Hat (STH) transform and the AMDF. First, the signal is de-noised by the morphological self-complementary Top-Hat transform. Second, the average magnitude difference function of the noise-reduced sequence is calculated and its falling trend is suppressed. Finally, an adaptive threshold is computed and used to extract the peaks at positions corresponding to the period of the signal. Experimental results show that period extraction with the AMDF after Top-Hat filtering is more accurate than applying the AMDF directly. In summary, the proposed method detects periodic signals with weak characteristics reliably and stably.
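The denoise-then-AMDF pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the morphological filter below (average of a 1-D opening and closing) is a simplified stand-in for the self-complementary Top-Hat stage, and the test signal, structuring-element width, and lag range are illustrative assumptions.

```python
import numpy as np

def amdf(x, max_lag):
    """AMDF: D(k) = mean(|x[n] - x[n+k]|) for lags k = 1..max_lag."""
    n = len(x)
    return np.array([np.abs(x[:n - k] - x[k:]).mean() for k in range(1, max_lag + 1)])

def morph_smooth(x, w):
    """Simplified morphological denoising: average of a 1-D opening and
    closing with a flat structuring element of (odd) width w.  This is a
    stand-in for the paper's self-complementary Top-Hat stage."""
    def erode(v):
        vp = np.pad(v, w // 2, mode='edge')
        return np.lib.stride_tricks.sliding_window_view(vp, w).min(axis=1)
    def dilate(v):
        vp = np.pad(v, w // 2, mode='edge')
        return np.lib.stride_tricks.sliding_window_view(vp, w).max(axis=1)
    opening = dilate(erode(x))   # suppresses positive noise spikes
    closing = erode(dilate(x))   # suppresses negative noise spikes
    return 0.5 * (opening + closing)

# Illustrative signal: a 50-sample period buried in noise
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * np.arange(1000) / 50) + 0.3 * rng.standard_normal(1000)

d = amdf(morph_smooth(x, 5), max_lag=80)
period = np.argmin(d[20:]) + 21      # skip tiny lags; d[k-1] is lag k
```

The AMDF dips toward zero at lags equal to the period, so the detected period is the lag of the deepest valley in the searched range.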


2014 ◽  
Vol 490-491 ◽  
pp. 1287-1292 ◽  
Author(s):  
Jian Da Wu ◽  
Pang Yi Liu ◽  
Guan Long Hong

This study presents a driver identification system that uses voice analysis as part of a vehicle security system. The proposed system has three stages: speech pre-processing, feature extraction from the sound signals, and classification of the driver's voice. Initially, a database of sound signals for several drivers was established. The volume and zero-crossing rate (ZCR) of the sound are used to detect the voice end-points and thereby reduce computation. The autocorrelation function (ACF) and average magnitude difference function (AMDF) are then applied to extract the voice pitch features. Finally, these features are used to identify the drivers with a general regression neural network (GRNN). Experimental results show that the voice identification system achieves a good recognition rate with relatively few pitch feature vectors.
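A minimal sketch of the ZCR and pitch-feature stages described above (the GRNN classifier is omitted). The sampling rate, frame length, and pitch search range below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def zcr(frame):
    """Zero-crossing rate: fraction of successive samples that change sign."""
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

def pitch_acf(frame, fs, fmin=60, fmax=400):
    """Pitch from the autocorrelation peak inside a plausible lag range."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]  # lags 0..N-1
    return fs / (lo + np.argmax(r[lo:hi]))

def pitch_amdf(frame, fs, fmin=60, fmax=400):
    """Pitch from the AMDF valley inside the same lag range."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    d = np.array([np.abs(frame[:-k] - frame[k:]).mean() for k in range(1, hi + 1)])
    return fs / (lo + np.argmin(d[lo - 1:hi]))      # d[k-1] is lag k

fs = 8000
t = np.arange(0, 0.03, 1 / fs)            # one 30 ms frame
frame = np.sin(2 * np.pi * 120 * t)       # synthetic 120 Hz "voiced" frame
```

On this synthetic frame both estimators return roughly 120 Hz, and the low ZCR is the kind of cue used for end-point detection (voiced speech has few zero crossings compared with silence or fricative noise).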


2013 ◽  
Vol 325-326 ◽  
pp. 1649-1652
Author(s):  
Wei Wei Shi ◽  
Wei Hua Xiong ◽  
Yun Yun Chu ◽  
Yu Liu

Speech endpoint detection plays an important role in speech signal processing. This paper introduces a speech endpoint detection method based on empirical mode decomposition (EMD) for accurately locating the speech endpoints. The method decomposes the speech signal into a set of intrinsic mode functions (IMFs). The IMF that contains most of the noise is filtered out, and the remaining IMFs are recombined into a new speech signal. The speech endpoints are then detected precisely with the average magnitude difference function. Simulation experiments show that the proposed method effectively eliminates the impact of noise and accurately detects the speech signal endpoints.
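Leaving the EMD denoising stage aside (a full sifting implementation is beyond a short sketch), the final AMDF-based endpoint decision might look like the following. The frame length, lag range, threshold, and test signal are illustrative assumptions; the paper's exact decision rule may differ.

```python
import numpy as np

def amdf_valley_ratio(frame, min_lag=16, max_lag=80):
    """Ratio of the deepest AMDF valley to the AMDF maximum.
    Periodic (voiced) frames give a ratio near 0; noise stays near 1."""
    d = np.array([np.abs(frame[:-k] - frame[k:]).mean() for k in range(1, max_lag + 1)])
    return d[min_lag:].min() / (d.max() + 1e-12)

def detect_endpoints(x, frame_len=200, thresh=0.35):
    """Flag frames whose AMDF valley is deep enough, and return the sample
    indices bounding the first..last such frame (None if nothing is found)."""
    voiced = [i for i in range(0, len(x) - frame_len + 1, frame_len)
              if amdf_valley_ratio(x[i:i + frame_len]) < thresh]
    return (voiced[0], voiced[-1] + frame_len) if voiced else None

# Illustrative signal: noise, then a 50-sample-period tone, then noise
rng = np.random.default_rng(1)
x = np.concatenate([0.1 * rng.standard_normal(1000),
                    np.sin(2 * np.pi * np.arange(2000) / 50),
                    0.1 * rng.standard_normal(1000)])
start, end = detect_endpoints(x)
```

The AMDF of a periodic frame has a near-zero valley at the period lag, while for broadband noise it is nearly flat, which is what makes the valley-to-maximum ratio a usable endpoint cue.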


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1888
Author(s):  
Juraj Kacur ◽  
Boris Puterka ◽  
Jarmila Pavlovicova ◽  
Milos Oravec

Many speech emotion recognition systems have been designed using different features and classification methods, yet there is still a lack of knowledge and reasoning about the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect accuracy, and to what extent. This study extends the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification regions, including lengths and overlaps), frequency ranges, frequency scales, processing of whole speech (spectrograms), vocal tract (filter banks, linear prediction coefficient (LPC) modeling) and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, and cepstral features. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross-validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings in the 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0–8 kHz frequency range. Spectrograms carrying vocal tract and excitation information also score well. It was found that even basic processing such as pre-emphasis, segmentation, and magnitude modifications can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
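As an illustration of one common psychoacoustic filter bank of the kind referred to above, a triangular mel-spaced bank covering 0–8 kHz can be built as follows. The filter count, FFT size, and sampling rate are illustrative assumptions; the paper's exact bank design may differ.

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, fs=16000, fmin=0.0, fmax=8000.0):
    """Triangular filter bank with center frequencies equally spaced on the
    mel scale, returned as an (n_filters, n_fft//2 + 1) weight matrix."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)      # Hz -> mel
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)   # mel -> Hz
    # n_filters + 2 edge frequencies, equally spaced in mel
    pts = imel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)      # FFT bin indices
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:  # rising slope of the triangle
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:  # falling slope of the triangle
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

fb = mel_filterbank()
```

Multiplying a frame's power spectrum (the first `n_fft//2 + 1` FFT bins) by `fb.T` yields the band energies from which features such as log filter-bank energies or MFCCs are derived.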

