Sub-band Speech Analysis Using Gammatone Filter Banks and Optimal Pitch Extraction Methods for Each Sub-band Using the Average Magnitude Difference Function (AMDF) for LPC Speech Coders in Noisy Environments

2010 ◽  
Vol 1 (2) ◽  
pp. 13-24
Author(s):  
Suma S.A. ◽  
Dr. K. S. Gurumurthy

1974 ◽  
Vol 22 (5) ◽  
pp. 353-362 ◽  
Author(s):  
M. Ross ◽  
H. Shaffer ◽  
A. Cohen ◽  
R. Freudberg ◽  
H. Manley

Information ◽  
2019 ◽  
Vol 10 (1) ◽  
pp. 24 ◽  
Author(s):  
Zhao Han ◽  
Xiaoli Wang

Period detection for weak characteristic signals is important in fields such as speech signal processing and mechanical engineering. The average magnitude difference function (AMDF) is widely used to extract the period of a periodic signal because of its low computational complexity and high accuracy. However, its detection accuracy degrades when the background noise is strong. To improve on it, this paper proposes a new period-detection method for single-period signals based on the morphological self-complementary Top-Hat (STH) transform and the AMDF. First, the signal is de-noised by the morphological self-complementary Top-Hat transform. Second, the average magnitude difference function of the noise-reduced sequence is calculated and its falling trend is suppressed. Finally, an adaptive threshold is computed and used to extract the peaks at positions corresponding to the period of the signal. Experimental results show that period extraction with the AMDF after Top-Hat filtering is more accurate than applying the AMDF directly. In summary, the proposed method detects periodic signals with weak characteristics reliably and stably.
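The denoise-then-AMDF pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the morphological filter below (average of a 1-D opening and closing) is a simplified stand-in for the self-complementary Top-Hat stage, and the test signal, structuring-element width, and lag range are illustrative assumptions.

```python
import numpy as np

def amdf(x, max_lag):
    """AMDF: D(k) = mean(|x[n] - x[n+k]|) for lags k = 1..max_lag."""
    n = len(x)
    return np.array([np.abs(x[:n - k] - x[k:]).mean() for k in range(1, max_lag + 1)])

def morph_smooth(x, w):
    """Simplified morphological denoising: average of a 1-D opening and
    closing with a flat structuring element of (odd) width w.  This is a
    stand-in for the paper's self-complementary Top-Hat stage."""
    def erode(v):
        vp = np.pad(v, w // 2, mode='edge')
        return np.lib.stride_tricks.sliding_window_view(vp, w).min(axis=1)
    def dilate(v):
        vp = np.pad(v, w // 2, mode='edge')
        return np.lib.stride_tricks.sliding_window_view(vp, w).max(axis=1)
    opening = dilate(erode(x))   # suppresses positive noise spikes
    closing = erode(dilate(x))   # suppresses negative noise spikes
    return 0.5 * (opening + closing)

# Illustrative signal: a 50-sample period buried in noise
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * np.arange(1000) / 50) + 0.3 * rng.standard_normal(1000)

d = amdf(morph_smooth(x, 5), max_lag=80)
period = np.argmin(d[20:]) + 21      # skip tiny lags; d[k-1] is lag k
```

The AMDF dips toward zero at lags equal to the period, so the detected period is the lag of the deepest valley in the searched range.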


2014 ◽  
Vol 490-491 ◽  
pp. 1287-1292 ◽  
Author(s):  
Jian Da Wu ◽  
Pang Yi Liu ◽  
Guan Long Hong

This study presents a driver identification system that uses voice analysis as part of a vehicle security system. The proposed system has three stages: speech pre-processing, feature extraction from the sound signals, and classification of the driver's voice. Initially, a database of sound signals for several drivers was established. The volume and zero-crossing rate (ZCR) of the sound are used to detect the voice end-points and thereby reduce computation. The autocorrelation function (ACF) and average magnitude difference function (AMDF) are then applied to extract the voice pitch features. Finally, these features are used to identify the drivers with a general regression neural network (GRNN). Experimental results show that the voice identification system achieves a good recognition rate with relatively few pitch feature vectors.
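A minimal sketch of the ZCR and pitch-feature stages described above (the GRNN classifier is omitted). The sampling rate, frame length, and pitch search range below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def zcr(frame):
    """Zero-crossing rate: fraction of successive samples that change sign."""
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

def pitch_acf(frame, fs, fmin=60, fmax=400):
    """Pitch from the autocorrelation peak inside a plausible lag range."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]  # lags 0..N-1
    return fs / (lo + np.argmax(r[lo:hi]))

def pitch_amdf(frame, fs, fmin=60, fmax=400):
    """Pitch from the AMDF valley inside the same lag range."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    d = np.array([np.abs(frame[:-k] - frame[k:]).mean() for k in range(1, hi + 1)])
    return fs / (lo + np.argmin(d[lo - 1:hi]))      # d[k-1] is lag k

fs = 8000
t = np.arange(0, 0.03, 1 / fs)            # one 30 ms frame
frame = np.sin(2 * np.pi * 120 * t)       # synthetic 120 Hz "voiced" frame
```

On this synthetic frame both estimators return roughly 120 Hz, and the low ZCR is the kind of cue used for end-point detection (voiced speech has few zero crossings compared with silence or fricative noise).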


2013 ◽  
Vol 325-326 ◽  
pp. 1649-1652
Author(s):  
Wei Wei Shi ◽  
Wei Hua Xiong ◽  
Yun Yun Chu ◽  
Yu Liu

Speech endpoint detection plays an important role in speech signal processing. This paper introduces a speech endpoint detection method based on empirical mode decomposition (EMD) for accurately locating the speech endpoints. The method decomposes the speech signal into a set of intrinsic mode functions (IMFs). The IMF that contains most of the noise is filtered out, and the remaining IMFs are recombined into a new speech signal. The speech endpoints are then detected precisely with the average magnitude difference function. Simulation experiments show that the proposed method effectively eliminates the impact of noise and accurately detects the speech signal endpoints.
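Leaving the EMD denoising stage aside (a full sifting implementation is beyond a short sketch), the final AMDF-based endpoint decision might look like the following. The frame length, lag range, threshold, and test signal are illustrative assumptions; the paper's exact decision rule may differ.

```python
import numpy as np

def amdf_valley_ratio(frame, min_lag=16, max_lag=80):
    """Ratio of the deepest AMDF valley to the AMDF maximum.
    Periodic (voiced) frames give a ratio near 0; noise stays near 1."""
    d = np.array([np.abs(frame[:-k] - frame[k:]).mean() for k in range(1, max_lag + 1)])
    return d[min_lag:].min() / (d.max() + 1e-12)

def detect_endpoints(x, frame_len=200, thresh=0.35):
    """Flag frames whose AMDF valley is deep enough, and return the sample
    indices bounding the first..last such frame (None if nothing is found)."""
    voiced = [i for i in range(0, len(x) - frame_len + 1, frame_len)
              if amdf_valley_ratio(x[i:i + frame_len]) < thresh]
    return (voiced[0], voiced[-1] + frame_len) if voiced else None

# Illustrative signal: noise, then a 50-sample-period tone, then noise
rng = np.random.default_rng(1)
x = np.concatenate([0.1 * rng.standard_normal(1000),
                    np.sin(2 * np.pi * np.arange(2000) / 50),
                    0.1 * rng.standard_normal(1000)])
start, end = detect_endpoints(x)
```

The AMDF of a periodic frame has a near-zero valley at the period lag, while for broadband noise it is nearly flat, which is what makes the valley-to-maximum ratio a usable endpoint cue.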


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1888
Author(s):  
Juraj Kacur ◽  
Boris Puterka ◽  
Jarmila Pavlovicova ◽  
Milos Oravec

Many speech emotion recognition systems have been designed using different features and classification methods, yet there is still a lack of knowledge and reasoning about the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect accuracy, and to what extent. This study extends the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification regions, including lengths and overlaps), frequency ranges, frequency scales, processing of whole speech (spectrograms), vocal tract (filter banks, linear prediction coefficient (LPC) modeling) and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, and cepstral features. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross-validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings in the 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0–8 kHz frequency range. Spectrograms carrying vocal tract and excitation information also score well. It was found that even basic processing such as pre-emphasis, segmentation, and magnitude modifications can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
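As an illustration of one common psychoacoustic filter bank of the kind referred to above, a triangular mel-spaced bank covering 0–8 kHz can be built as follows. The filter count, FFT size, and sampling rate are illustrative assumptions; the paper's exact bank design may differ.

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, fs=16000, fmin=0.0, fmax=8000.0):
    """Triangular filter bank with center frequencies equally spaced on the
    mel scale, returned as an (n_filters, n_fft//2 + 1) weight matrix."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)      # Hz -> mel
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)   # mel -> Hz
    # n_filters + 2 edge frequencies, equally spaced in mel
    pts = imel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)      # FFT bin indices
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:  # rising slope of the triangle
            fb[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:  # falling slope of the triangle
            fb[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

fb = mel_filterbank()
```

Multiplying a frame's power spectrum (the first `n_fft//2 + 1` FFT bins) by `fb.T` yields the band energies from which features such as log filter-bank energies or MFCCs are derived.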

