Blocking artifacts in speech/audio: Dynamic auditory model-based characterization and optimal time–frequency smoothing

2009
Vol 89 (4)
pp. 523-531
Author(s):  
Chandra Sekhar Seelamantula
Thippur V. Sreenivas
Author(s):  
M. van der Schaar
E. Delory
J. van der Weide
C. Kamminga
J.C. Goold
...  

We tried to find discriminating features for sperm whale clicks in order to distinguish between clicks from different whales, or to enable unique identification. We examined two different methods for obtaining suitable features. First, a model based on the Gabor function was used to describe the dominant frequencies in a click, and the model parameters were then used as classification features. The Gabor function model was selected because it has been used to model dolphin sonar pulses with great precision. It also has the property of optimal time–frequency resolution, so a good fit can indicate optimal use of the sonar by sperm whales. Second, the clicks were expressed in a wavelet packet table, from which a local discriminant basis was subsequently constructed. A wavelet packet table has the advantage of offering a highly redundant set of coefficients, which allows signals to be represented in many different ways. From this redundant description, a representation that emphasizes the differences between classes can be selected. This local discriminant basis is more flexible than the Gabor function, which can make it more suitable for classification, but it is also more complex. Class vectors were created with both models, and each click was classified by its distance to these vectors. We show that the Gabor function could not model the sperm whale clicks very well because of the variability in click characteristics. The best performance was reached when three consecutive clicks were averaged to smooth out this variability; around 70% of the clicks were then classified correctly in both the training and validation sets. The wavelet packet table adapted better to the changing characteristics and gave better classification: with the same 3-click moving average, around 95% of the clicks in the training sets and 78% in the validation sets were classified correctly.
These numbers dropped by only a few per cent when single clicks, rather than a moving average, were classified. This indicates that, although the features may vary too much to allow unique identification of individual whales on a click-by-click basis, the wavelet approach may be capable of distinguishing among a small group of whales.
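The two feature pipelines and the class-vector classifier described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian-enveloped-cosine form of the Gabor function, the Haar filters, the squared-difference (l2) discriminant measure used to select the local discriminant basis, and Euclidean distance to the class vectors are all assumptions made for this sketch, since the abstract does not specify them. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]  # (level, index) of a wavelet packet node

def gabor(t: float, amp: float, t0: float, sigma: float, f0: float, phase: float = 0.0) -> float:
    """Gaussian-enveloped cosine; one common Gabor parameterization (assumed)."""
    envelope = math.exp(-((t - t0) ** 2) / (2.0 * sigma ** 2))
    return amp * envelope * math.cos(2.0 * math.pi * f0 * (t - t0) + phase)

def haar_packet_table(x: Sequence[float], max_level: int) -> Dict[Node, List[float]]:
    """Full (redundant) Haar wavelet packet table; len(x) must be divisible by 2**max_level."""
    table: Dict[Node, List[float]] = {(0, 0): list(x)}
    for j in range(max_level):
        for k in range(2 ** j):
            c = table[(j, k)]
            pairs = [(c[2 * i], c[2 * i + 1]) for i in range(len(c) // 2)]
            table[(j + 1, 2 * k)] = [(a + b) / math.sqrt(2.0) for a, b in pairs]      # lowpass
            table[(j + 1, 2 * k + 1)] = [(a - b) / math.sqrt(2.0) for a, b in pairs]  # highpass
    return table

def energy_map(signals: Sequence[Sequence[float]], max_level: int) -> Dict[Node, List[float]]:
    """Per-coefficient energy of one class, normalized by the class's total energy."""
    total = sum(v * v for s in signals for v in s)
    acc: Dict[Node, List[float]] = {}
    for s in signals:
        for node, coeffs in haar_packet_table(s, max_level).items():
            slot = acc.setdefault(node, [0.0] * len(coeffs))
            for i, v in enumerate(coeffs):
                slot[i] += v * v / total
    return acc

def local_discriminant_basis(class_a: Sequence[Sequence[float]],
                             class_b: Sequence[Sequence[float]], max_level: int) -> List[Node]:
    """Bottom-up best-basis search maximizing an additive l2 discriminant (assumed measure)."""
    ea, eb = energy_map(class_a, max_level), energy_map(class_b, max_level)
    power = {n: sum((p - q) ** 2 for p, q in zip(ea[n], eb[n])) for n in ea}
    best = {(max_level, k): ([(max_level, k)], power[(max_level, k)])
            for k in range(2 ** max_level)}
    for j in range(max_level - 1, -1, -1):
        for k in range(2 ** j):
            left, right = best[(j + 1, 2 * k)], best[(j + 1, 2 * k + 1)]
            children = (left[0] + right[0], left[1] + right[1])
            # Keep the parent node unless its two best sub-bases discriminate better.
            best[(j, k)] = ([(j, k)], power[(j, k)]) if power[(j, k)] >= children[1] else children
    return best[(0, 0)][0]

def moving_average(features: Sequence[Sequence[float]], width: int = 3) -> List[List[float]]:
    """Average each run of `width` consecutive click feature vectors."""
    return [[sum(col) / width for col in zip(*features[i:i + width])]
            for i in range(len(features) - width + 1)]

def class_vectors(training: Dict[str, Sequence[Sequence[float]]]) -> Dict[str, List[float]]:
    """One mean feature vector per whale (class)."""
    return {lab: [sum(col) / len(v) for col in zip(*v)] for lab, v in training.items()}

def classify(x: Sequence[float], centers: Dict[str, Sequence[float]]) -> str:
    """Nearest class vector in Euclidean distance (the metric is an assumption)."""
    return min(centers, key=lambda lab: math.dist(x, centers[lab]))
```

In use, each click would be reduced to a feature vector (fitted Gabor parameters, or the coefficients of the selected basis nodes), `class_vectors` would be built from labelled training clicks, and `moving_average` would be applied before `classify` to mimic the 3-click averaging.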


2012
Vol 452-453
pp. 782-788
Author(s):  
Jin Feng Wang
Li Jie Feng
Zhao Hui Li

For coal mines that are seriously affected by flooding, this paper analyzes the factors that affect coal mine flooding emergency capability and establishes an emergency capability evaluation model based on GA-WNN, a wavelet neural network (WNN) whose parameters are optimized with a genetic algorithm (GA). The model combines the global optimization ability of the genetic algorithm with the time-frequency localization of the wavelet neural network. This combination can make up for many shortcomings of a conventional neural network (for example, the network structure must be specified manually, and training easily gets trapped in local minima). Therefore, the coal mine flooding emergency capability evaluation model based on the genetic algorithm and wavelet neural network has higher reliability and computational capability, and is beneficial to pre-control management for coal mine flooding rescue.
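A GA-WNN of the kind described above can be sketched as follows. Because the abstract gives no details, everything structural here is an assumption: a single hidden layer with a Morlet-type wavelet activation, a real-valued chromosome holding all network parameters, training-set mean squared error as the fitness, and elitism, tournament selection, uniform crossover and Gaussian mutation as the genetic operators. In this sketch the GA searches the parameters directly; whether the paper follows it with gradient training is not stated. All names and hyperparameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input factor scores, target score)

def morlet(t: float) -> float:
    """Morlet-type mother wavelet, a common WNN activation (assumed choice)."""
    return math.cos(1.75 * t) * math.exp(-t * t / 2.0)

def wnn_predict(params: Sequence[float], x: Sequence[float], n_hidden: int) -> float:
    """Single-hidden-layer WNN: y = sum_j v_j * psi((w_j . x - b_j) / a_j)."""
    n_in = len(x)
    per_unit = n_in + 3  # input weights w_j, shift b_j, log-scale s_j, output weight v_j
    y = 0.0
    for j in range(n_hidden):
        p = params[j * per_unit:(j + 1) * per_unit]
        w, b, s, v = p[:n_in], p[n_in], p[n_in + 1], p[n_in + 2]
        z = sum(wi * xi for wi, xi in zip(w, x))
        y += v * morlet((z - b) / math.exp(s))  # a_j = exp(s_j) stays positive
    return y

def mse(params: Sequence[float], data: Sequence[Sample], n_hidden: int) -> float:
    return sum((wnn_predict(params, x, n_hidden) - t) ** 2 for x, t in data) / len(data)

def ga_train(data: Sequence[Sample], n_in: int, n_hidden: int = 4, pop_size: int = 40,
             generations: int = 60, mut_rate: float = 0.1, mut_sigma: float = 0.2,
             seed: int = 0) -> List[float]:
    """Evolve WNN parameter vectors; fitness is the training MSE (lower is better)."""
    rng = random.Random(seed)
    dim = n_hidden * (n_in + 3)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((mse(p, data, n_hidden), p) for p in pop), key=lambda fp: fp[0])
        next_pop = [scored[0][1], scored[1][1]]  # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            pa = min(rng.sample(scored, 3), key=lambda fp: fp[0])[1]  # tournament of 3
            pb = min(rng.sample(scored, 3), key=lambda fp: fp[0])[1]
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(pa, pb)]  # uniform crossover
            child = [g + rng.gauss(0.0, mut_sigma) if rng.random() < mut_rate else g
                     for g in child]  # Gaussian mutation
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=lambda p: mse(p, data, n_hidden))
```

Elitism guarantees that the best training error never increases from one generation to the next. For an emergency capability evaluation, each input vector would presumably hold the scores of the evaluation factors and the target a capability grade; that data layout is an assumption.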


2004
Vol 01 (04)
pp. 345-356
Author(s):  
HYUNG-MIN PARK
JONG-HWAN LEE
TAESU KIM
UN-MIN BAE
BYUNG TAEK KIM
...  

An auditory model has been developed for an intelligent speech information acquisition system in real-world noisy environments. The developed mathematical model of the human auditory pathway consists of three components: nonlinear feature extraction from the cochlea to the auditory cortex, binaural processing at the superior olivary complex, and top-down attention from the higher brain to the cochlea. The feature extraction is based on information-theoretic sparse coding throughout the auditory pathway. In addition, time-frequency masking is incorporated as a model of lateral inhibition in both the time and frequency domains. The binaural processing is modeled as blind signal separation and adaptive noise canceling based on independent component analysis with hundreds of time delays, for noisy, reverberant signals. Top-down (TD) attention arises from the familiarity and/or importance of the sensory information, i.e. the sound, and a simple but efficient TD attention model has been developed based on the error backpropagation algorithm. The binaural processing and top-down attention are also combined for speech signals in heavy noise. This auditory model requires extensive computation, and special hardware has been developed for real-time applications. Experimental results demonstrate much better recognition performance in real-world noisy environments.
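The adaptive noise canceling structure mentioned above (a primary input containing speech plus noise, and a reference input correlated with the noise) can be illustrated with the classic least-mean-squares (LMS) canceller. Note the swap: the paper bases its noise canceling on independent component analysis, which this sketch does not implement; LMS is used here only because it shows the same primary/reference structure in a few lines. The function name, tap count and step size are hypothetical.

```python
from typing import List, Sequence

def lms_noise_canceller(primary: Sequence[float], reference: Sequence[float],
                        taps: int = 8, mu: float = 0.01) -> List[float]:
    """Widrow-style LMS adaptive noise canceller (not the paper's ICA-based method).

    primary   : speech plus noise picked up by the main microphone
    reference : noise-only signal correlated with the noise in `primary`
    Returns e[n] = primary[n] - y[n], i.e. the cleaned-speech estimate.
    """
    w = [0.0] * taps    # adaptive FIR weights
    buf = [0.0] * taps  # last `taps` reference samples, newest first
    out: List[float] = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))       # estimate of the noise in `primary`
        e = d - y                                         # cleaned output sample
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]  # LMS weight update
        out.append(e)
    return out
```

The step size trades convergence speed against stability; for a reference signal of power P, keeping `mu` well below 2/(taps * P) is a common rule of thumb.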


2013
Vol 765-767
pp. 2862-2865
Author(s):  
Jin Lun Chen

The auditory filter-bank is the key component of an auditory model, and its implementation involves a large amount of computation. The time an auditory filter-bank takes to do its work has a significant effect on the real-time implementation of auditory model-based audio signal processing systems. In this paper, we first give a brief introduction to the auditory filter-bank and then discuss its DSP-based implementation and optimization in detail.
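To make the filter-bank's computational load concrete, here is a sketch of a gammatone filter-bank with center frequencies spaced uniformly on the Glasberg and Moore ERB-rate scale. The gammatone filter (order 4, bandwidth factor 1.019) is a common auditory filter choice but an assumption here; the abstract does not say which filter the paper implements or how it is optimized on the DSP. Function names, the 80 Hz lower edge, and the peak normalization are hypothetical.

```python
import math
from typing import List, Sequence

def erb_rate(f_hz: float) -> float:
    """Glasberg and Moore ERB-rate (ERB-number) scale."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def inverse_erb_rate(e: float) -> float:
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def center_frequencies(f_lo: float, f_hi: float, n: int) -> List[float]:
    """n channel center frequencies spaced uniformly on the ERB-rate scale (n >= 2)."""
    lo, hi = erb_rate(f_lo), erb_rate(f_hi)
    return [inverse_erb_rate(lo + k * (hi - lo) / (n - 1)) for k in range(n)]

def gammatone_ir(fc: float, fs: float, length: int, order: int = 4,
                 b: float = 1.019) -> List[float]:
    """Sampled t^(order-1) * exp(-2 pi b ERB(fc) t) * cos(2 pi fc t), peak-normalized to 1."""
    erb = 24.7 * (0.00437 * fc + 1.0)
    ir = [(i / fs) ** (order - 1) * math.exp(-2.0 * math.pi * b * erb * i / fs)
          * math.cos(2.0 * math.pi * fc * i / fs) for i in range(length)]
    peak = max(abs(v) for v in ir) or 1.0
    return [v / peak for v in ir]

def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR convolution, truncated to len(x) output samples."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]

def filterbank(x: Sequence[float], fs: float, n_channels: int = 16,
               length: int = 256) -> List[List[float]]:
    # Precompute each channel's impulse response once, outside the per-sample loop.
    irs = [gammatone_ir(fc, fs, length) for fc in center_frequencies(80.0, 0.45 * fs, n_channels)]
    return [fir_filter(x, h) for h in irs]
```

Direct FIR convolution costs `length` multiply-accumulates per sample per channel, which is the kind of load a DSP implementation has to reduce; recursive (IIR) gammatone approximations are one common way to do that.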

