Singing voice detection in pop songs using co-training algorithm

Singing voice detection or vocal detection is a classification task that determines whether a given audio segment contains singing voices. This task plays a very important role in vocal-related music information retrieval tasks, such as singer identification. Although humans can easily distinguish between singing and nonsinging parts, it is still very difficult for machines to do so. Most existing methods focus on audio feature engineering with classifiers, which rely on the experience of the algorithm designer. In recent years, deep learning has been widely used in computer hearing. To extract essential features that reflect the audio content and characterize the vocal context in the time domain, this study adopted a long-term recurrent convolutional network (LRCN) to realize vocal detection. The convolutional layer in LRCN functions in feature extraction, and the long short-term memory (LSTM) layer can learn the time sequence relationship. The preprocessing of singing voices and accompaniment separation and the postprocessing of time-domain smoothing were combined to form a complete system. Experiments on five public datasets investigated the impacts of the different features for the fusion, frame size, and block size on LRCN temporal relationship learning, and the effects of preprocessing and postprocessing on performance, and the results confirm that the proposed singing voice detection algorithm reached the state-of-the-art level on public datasets.

Download Full-text

Singing voice detection for karaoke application

Visual Communications and Image Processing 2005 ◽

10.1117/12.631645 ◽

2005 ◽

Cited By ~ 4

Author(s):

Arun Shenoy ◽

Yuansheng Wu ◽

Ye Wang

Keyword(s):

Singing Voice ◽

Voice Detection

Download Full-text

Singing voice detection in music tracks using direct voice vibrato detection

2009 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2009.4959926 ◽

2009 ◽

Cited By ~ 29

Author(s):

L. Regnier ◽

G. Peeters

Keyword(s):

Singing Voice ◽

Voice Detection

Download Full-text

Abnormal Voice Detection Algorithm Based on Semi-Supervised Co-Training Algorithm

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.461.117 ◽

2012 ◽

Vol 461 ◽

pp. 117-122 ◽

Cited By ~ 1

Author(s):

Ya Hui Zhao ◽

Hong Li Wang ◽

Rong Yi Cui

Keyword(s):

Learning Strategy ◽

Detection Algorithm ◽

Training Algorithm ◽

Mixed Programming ◽

Training Samples ◽

Rich Information ◽

Detection Software ◽

Voice Detection ◽

The Rich ◽

Validation Set

The AR-Tri-training algorithm is proposed for applying to the abnormal voice detection, and voice detection software is designed by mixed programming used Matlab and VC in this paper. Firstly, training samples are collected and the features of each sample are extracted including centroid, spectral entropy, wavelet and MFCC. Secondly, the assistant learning strategy is proposed, AR-Tri-training algorithm is designed by combining the rich information strategy. Finally, Classifiers are trained by using AR-Tri-training algorithm, and the integrated classifier is applied to voice detection. As can be drawn from the experimental results, AR-Tri-training not only removes mislabeled examples in training process, but also takes full advantage of the unlabeled examples and wrong-learning examples on validation set

Download Full-text