Convolutional recurrent neural networks with multi-sized convolution filters for sound-event recognition

Sound-event recognition often uses time-frequency analysis to produce an image-like spectrogram that provides a rich visual representation of the original signal in time and frequency. Convolutional Neural Networks (CNNs), which can learn discriminative spectrogram patterns, are well suited to sound-event recognition. However, relatively little work has been done on making CNNs fully exploit the important temporal information. In this paper, we propose MCRNN, a Convolutional Recurrent Neural Network (CRNN) architecture for sound-event recognition; the letter "M" in the name "MCRNN" denotes the multi-sized convolution filters. Richer features are extracted by using several different convolution filter sizes at the last convolution layer. In addition, cochleagram images are used as the network input instead of the traditional spectrogram image of a sound signal. Experiments on the RWCP dataset show that the proposed method achieves a recognition rate of 98.4% in clean conditions and robustly outperforms existing methods: the recognition rate increases by 0.9%, 1.9% and 10.3% at signal-to-noise ratios (SNR) of 20 dB, 10 dB and 0 dB, respectively.
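
The idea of applying several filter sizes at one layer and passing the stacked result to a recurrent layer can be illustrated with a small sketch. This is not the authors' implementation: the kernel sizes (1, 3, 5), the fixed averaging kernels standing in for learned weights, the toy 4 x 6 input, and all function names are assumptions made for illustration only.

```python
# Sketch only: kernel sizes and averaging kernels are illustrative assumptions,
# not the filter sizes or learned weights used in the paper.

def conv2d_same(image, k):
    """Filter a 2-D list with a k x k averaging kernel (k odd), zero-padded
    so the output keeps the input's shape."""
    rows, cols = len(image), len(image[0])
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-pad, pad + 1):
                for dc in range(-pad, pad + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
            out[r][c] = total / (k * k)
    return out


def multi_size_features(image, kernel_sizes=(1, 3, 5)):
    """Apply one filter per kernel size and stack the maps along a channel axis."""
    return [conv2d_same(image, k) for k in kernel_sizes]


def to_sequence(feature_maps):
    """Reshape (channels, freq, time) into one feature vector per time step,
    the form a recurrent layer would consume."""
    n_time = len(feature_maps[0][0])
    return [
        [fmap[f][t] for fmap in feature_maps for f in range(len(fmap))]
        for t in range(n_time)
    ]


# Toy time-frequency image: 4 frequency bins x 6 time steps, one active cell.
toy = [[float(f == 2 and t == 3) for t in range(6)] for f in range(4)]
maps = multi_size_features(toy)   # 3 channels, each 4 x 6
seq = to_sequence(maps)           # 6 time steps, 12 features each
```

Because every filter uses "same" padding, the maps share one shape and can be stacked directly; the larger kernels spread the single active cell over a wider time-frequency neighbourhood, which is the sense in which different filter sizes capture patterns at different scales.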

End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input

2018 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2018.8489470 ◽

2018 ◽

Cited By ~ 6

Author(s):

Emre Cakir ◽

Tuomas Virtanen

Keyword(s):

Neural Networks ◽

Event Detection ◽

Recurrent Neural Networks ◽

Time Frequency ◽

Frequency Representation ◽

Sound Event ◽

Sound Event Detection ◽

End To End

Sound Event Detection by Consistency Training and Pseudo-Labeling With Feature-Pyramid Convolutional Recurrent Neural Networks

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414350 ◽

2021 ◽

Author(s):

Chih-Yuan Koh ◽

You-Siang Chen ◽

Yi-Wen Liu ◽

Mingsian R. Bai

Keyword(s):

Neural Networks ◽

Event Detection ◽

Recurrent Neural Networks ◽

Sound Event ◽

Feature Pyramid ◽

Sound Event Detection

Recurrent Neural Networks for Narrowband Signal Detection in the Time-Frequency Domain

Symposium - International Astronomical Union ◽

10.1017/s0074180900193751 ◽

2004 ◽

Vol 213 ◽

pp. 483-486

Author(s):

David Brodrick ◽

Douglas Taylor ◽

Joachim Diederich

Keyword(s):

Neural Network ◽

Neural Networks ◽

Signal Detection ◽

Frequency Domain ◽

Recurrent Neural Networks ◽

Radio Frequency Interference ◽

Recurrent Networks ◽

Time Frequency ◽

Narrowband Signal ◽

Radio Signals

A recurrent neural network was trained to detect the time-frequency domain signature of narrowband radio signals against a background of astronomical noise. The objective was to investigate the use of recurrent networks for signal detection in the Search for Extra-Terrestrial Intelligence, though the problem is closely analogous to the detection of some classes of Radio Frequency Interference in radio astronomy.
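
To make "time-frequency domain signature" concrete, the sketch below builds a small spectrogram of a synthetic narrowband tone in noise and shows that the tone occupies the same frequency bin in every frame. It covers only the input representation, not the recurrent detector from the paper; the frame length, tone bin, noise level, random seed, and function names are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len):
    """Split the signal into non-overlapping frames and take the DFT of each."""
    return [
        dft_magnitudes(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]


FRAME = 32       # samples per frame (assumed)
TONE_BIN = 5     # the tone completes 5 cycles per frame (assumed)
NOISE_STD = 0.5  # Gaussian noise level (assumed)

random.seed(0)
signal = [
    math.sin(2 * math.pi * TONE_BIN * i / FRAME) + random.gauss(0.0, NOISE_STD)
    for i in range(4 * FRAME)
]

spec = spectrogram(signal, FRAME)  # 4 frames x 17 frequency bins
# Strongest frequency bin in each frame.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

A sequence of frames like `spec`, one spectrum per time step, is the kind of input a recurrent network could be trained on; a persistent peak in one bin across frames is the pattern that distinguishes a narrowband signal from broadband noise.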

Enhancement of Speech Recognition System by neural network approaches of Clustering

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v6i1.4456 ◽

2013 ◽

Vol 6 (1) ◽

pp. 266-271

Author(s):

Anurag Upadhyay ◽

Chitranjanjit Kaur

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Recurrent Neural Networks ◽

Alternative Energy ◽

Recognition Rate ◽

Speech Sound ◽

Recognition System ◽

Training Methods ◽

Indian Languages ◽

Phone Recognition

This paper addresses the problem of speech recognition to identify various modes of speech data. Speaker sounds are the acoustic sounds of speech. Statistical models of speech have been widely used for speech recognition with neural networks. In this paper we propose and try to justify a new model in which speech coarticulation, the effect of phonetic context on speech sound, is modeled explicitly under a statistical framework. We study speech phone recognition with recurrent neural networks and SOUL neural networks. A general framework for recurrent neural networks and considerations for network training are discussed in detail. The SOUL NN clusters the large vocabulary, compressing huge speech data sets. This project also covers different Indian languages uttered by different speakers in different modes such as aggressive, happy, sad, and angry. Many alternative energy measures and training methods are proposed and implemented. A speaker-independent phone recognition rate of 82% with a 25% frame error rate has been achieved on the neural database. Neural speech recognition experiments on the NTIMIT database result in a phone recognition rate of 68% correct. The research results in this thesis are competitive with the best results reported in the literature.