Audio Segmentation and Classification Approach Based on Adaptive CNN in Broadcast Domain

Author(s): Sun Jingzhou, Wang Yongbin, Chen Xiaosen
2019, Vol 30 (2), pp. 44-66

Audio segmentation and classification are the basis of audio processing in the broadcasting industry. This article proposes a Dual-CNN (Dual-Convolutional Neural Network) method that can pre-train a CNN on unlabeled audio data, addressing the scarcity of labeled data. The method uses auto-encoders, each consisting of an encoder and a decoder, hence the name “Dual.” First, the raw audio samples and the STFT (Short-Time Fourier Transform) spectrograms derived from them pass through separate CNNs. The extracted features are then fused. Finally, the merged features are fed to a fully connected network, and a Softmax layer produces the classification results. As a segmentation-by-classification approach, our solution also introduces a novel smoothing method (SEG-smoothing) to improve segmentation results. A series of experiments verifies that the proposed approach outperforms alternative solutions for both segmentation and classification.
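The data flow described above (two input representations, separate feature extractors, feature fusion, a fully connected layer, then Softmax) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the two CNN branches are replaced by hypothetical hand-crafted stand-ins (`waveform_branch` computes RMS energy and zero-crossing rate; `spectrogram_branch` averages the STFT magnitude over time), fusion is assumed to be simple concatenation, the weights would normally be learned, and the auto-encoder pre-training is not shown. All function names and the frame length of 8 and hop of 4 are illustrative assumptions.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Magnitude STFT: Hann-windowed frames, naive DFT, non-negative bins only."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # bins 0 .. frame_len/2
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def waveform_branch(signal):
    """Hypothetical stand-in for the waveform CNN: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return [rms, crossings / (len(signal) - 1)]


def spectrogram_branch(frames):
    """Hypothetical stand-in for the spectrogram CNN: time-averaged magnitude per bin."""
    return [sum(f[k] for f in frames) / len(frames) for k in range(len(frames[0]))]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(signal, weights, biases, frame_len=8, hop=4):
    """Fuse both branches by concatenation, then apply one dense layer and Softmax."""
    fused = waveform_branch(signal) + spectrogram_branch(
        stft_magnitude(signal, frame_len, hop))
    for row in weights:
        if len(row) != len(fused):
            raise ValueError(f"expected {len(fused)} weights per class, got {len(row)}")
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

With `frame_len=8`, the fused vector has 2 + 5 = 7 entries, so `weights` needs one row of 7 values per class. In the actual method each branch would be a trained CNN whose output width replaces these hand-crafted feature counts.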
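In a segmentation-by-classification approach, a classifier labels short frames and the segment boundaries fall where the smoothed label sequence changes. The abstract does not describe how SEG-smoothing works, so the Python sketch below substitutes a generic majority-vote (mode) filter; it is plainly not the paper's SEG-smoothing. The function names `mode_filter` and `to_segments`, the window size `k=5`, and the frame hop `hop_s` are hypothetical choices for illustration.

```python
from collections import Counter


def mode_filter(labels, k=5):
    """Generic stand-in smoother (not SEG-smoothing): replace each frame label with
    the most frequent label in a centred window of k frames (truncated at the
    edges); ties keep the frame's own label."""
    half = k // 2
    smoothed = []
    for i, own in enumerate(labels):
        counts = Counter(labels[max(0, i - half): i + half + 1])
        top = max(counts.values())
        smoothed.append(own if counts[own] == top else counts.most_common(1)[0][0])
    return smoothed


def to_segments(labels, hop_s):
    """Collapse runs of identical frame labels into (start_s, end_s, label) tuples."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * hop_s, i * hop_s, labels[start]))
            start = i
    return segments


# Per-frame classifier output with one spurious "music" frame inside speech.
frames = ["speech"] * 6 + ["music"] + ["speech"] * 3 + ["music"] * 5
print(to_segments(mode_filter(frames), hop_s=0.5))
# [(0.0, 5.0, 'speech'), (5.0, 7.5, 'music')]
```

Smoothing before collapsing runs is what keeps a single misclassified frame from splitting a segment in two; without `mode_filter`, the same input would yield four segments instead of two.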

