Singing voice detection in pop songs using co-training algorithm

Author(s):  
Swe Zin Kalayar Khine ◽  
Tin Lay Nwe ◽  
Haizhou Li
Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1458
Author(s):  
Xulong Zhang ◽  
Yi Yu ◽  
Yongwei Gao ◽  
Xi Chen ◽  
Wei Li

Singing voice detection or vocal detection is a classification task that determines whether a given audio segment contains singing voices. This task plays a very important role in vocal-related music information retrieval tasks, such as singer identification. Although humans can easily distinguish between singing and nonsinging parts, it is still very difficult for machines to do so. Most existing methods focus on audio feature engineering with classifiers, which rely on the experience of the algorithm designer. In recent years, deep learning has been widely used in computer hearing. To extract essential features that reflect the audio content and characterize the vocal context in the time domain, this study adopted a long-term recurrent convolutional network (LRCN) to realize vocal detection. The convolutional layer in LRCN functions in feature extraction, and the long short-term memory (LSTM) layer can learn the time sequence relationship. The preprocessing of singing voices and accompaniment separation and the postprocessing of time-domain smoothing were combined to form a complete system. Experiments on five public datasets investigated the impacts of the different features for the fusion, frame size, and block size on LRCN temporal relationship learning, and the effects of preprocessing and postprocessing on performance, and the results confirm that the proposed singing voice detection algorithm reached the state-of-the-art level on public datasets.


2012 ◽  
Vol 461 ◽  
pp. 117-122 ◽  
Author(s):  
Ya Hui Zhao ◽  
Hong Li Wang ◽  
Rong Yi Cui

The AR-Tri-training algorithm is proposed for applying to the abnormal voice detection, and voice detection software is designed by mixed programming used Matlab and VC in this paper. Firstly, training samples are collected and the features of each sample are extracted including centroid, spectral entropy, wavelet and MFCC. Secondly, the assistant learning strategy is proposed, AR-Tri-training algorithm is designed by combining the rich information strategy. Finally, Classifiers are trained by using AR-Tri-training algorithm, and the integrated classifier is applied to voice detection. As can be drawn from the experimental results, AR-Tri-training not only removes mislabeled examples in training process, but also takes full advantage of the unlabeled examples and wrong-learning examples on validation set


Sign in / Sign up

Export Citation Format

Share Document