A Practical Singing Voice Detection System Based on GRU-RNN

Singing voice detection, or vocal detection, is a classification task that determines whether a given audio segment contains singing. It plays an important role in vocal-related music information retrieval tasks such as singer identification. Although humans can easily distinguish singing from nonsinging parts, the task remains difficult for machines. Most existing methods pair hand-engineered audio features with classifiers and therefore depend on the experience of the algorithm designer. In recent years, deep learning has been widely used in computer audition. To extract features that reflect the audio content and characterize the vocal context in the time domain, this study adopted a long-term recurrent convolutional network (LRCN) for vocal detection. The convolutional layer in the LRCN performs feature extraction, and the long short-term memory (LSTM) layer learns temporal relationships. Preprocessing by singing voice and accompaniment separation and postprocessing by time-domain smoothing were combined with the network to form a complete system. Experiments on five public datasets investigated how the fused features, frame size, and block size affect LRCN temporal relationship learning, as well as the effects of preprocessing and postprocessing on performance. The results confirm that the proposed singing voice detection algorithm reached the state-of-the-art level on public datasets.
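
The time-domain smoothing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which smoothing method the authors used, so the sliding median filter below is an assumption, and the function name `smooth_predictions` and its `window` and `threshold` parameters are hypothetical names chosen for illustration.

```python
from statistics import median


def smooth_predictions(probs, window=5, threshold=0.5):
    """Median-filter frame-wise vocal probabilities, then threshold to 0/1.

    Assumption: a sliding median filter stands in for the smoothing step;
    the source abstract does not specify the actual method.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(probs)):
        # Clip the window at the sequence boundaries instead of padding.
        start = max(0, i - half)
        end = min(len(probs), i + half + 1)
        smoothed.append(median(probs[start:end]))
    # 1 = singing frame, 0 = nonsinging frame.
    return [1 if p >= threshold else 0 for p in smoothed]


# Isolated flips in the raw frame-level output are removed by the filter.
raw = [0.9, 0.9, 0.1, 0.9, 0.9, 0.1, 0.1, 0.8, 0.1, 0.1]
labels = smooth_predictions(raw, window=3)
```

With a window of three frames, the single low-probability frame inside the sung region and the single high-probability frame inside the nonsinging region are both overruled by their neighbours, which is the kind of frame-level jitter such a postprocessing stage is meant to suppress.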

Singing voice detection for karaoke application

Visual Communications and Image Processing 2005 ◽ 10.1117/12.631645 ◽ 2005 ◽ Cited By ~ 4

Author(s): Arun Shenoy ◽ Yuansheng Wu ◽ Ye Wang

Keyword(s): Singing Voice ◽ Voice Detection

Singing voice detection in pop songs using co-training algorithm

2008 IEEE International Conference on Acoustics, Speech and Signal Processing ◽ 10.1109/icassp.2008.4517938 ◽ 2008 ◽ Cited By ~ 1

Author(s): Swe Zin Kalayar Khine ◽ Tin Lay Nwe ◽ Haizhou Li

Keyword(s): Training Algorithm ◽ Singing Voice ◽ Voice Detection

Singing voice detection in music tracks using direct voice vibrato detection

2009 IEEE International Conference on Acoustics, Speech and Signal Processing ◽ 10.1109/icassp.2009.4959926 ◽ 2009 ◽ Cited By ~ 29

Author(s): L. Regnier ◽ G. Peeters

Keyword(s): Singing Voice ◽ Voice Detection

Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music

10.21437/interspeech.2020-1806 ◽ 2020

Author(s): Yuanbo Hou ◽ Frank K. Soong ◽ Jian Luan ◽ Shengchen Li

Keyword(s): Transfer Learning ◽ Instrumental Music ◽ Singing Voice ◽ Voice Detection

Comparative study of singing voice detection based on deep neural networks and ensemble learning

Human-centric Computing and Information Sciences ◽ 10.1186/s13673-018-0158-1 ◽ 2018 ◽ Vol 8 (1) ◽ Cited By ~ 2

Author(s): Shingchern D. You ◽ Chien-Hung Liu ◽ Woei-Kae Chen

Keyword(s): Neural Networks ◽ Comparative Study ◽ Ensemble Learning ◽ Deep Neural Networks ◽ Singing Voice ◽ Voice Detection

Context-Aware Features for Singing Voice Detection in Polyphonic Music

Adaptive Multimedia Retrieval. Large-Scale Multimedia Retrieval and Evaluation - Lecture Notes in Computer Science ◽ 10.1007/978-3-642-37425-8_4 ◽ 2013 ◽ pp. 43-57 ◽ Cited By ~ 2

Author(s): Vishweshwara Rao ◽ Chitralekha Gupta ◽ Preeti Rao

Keyword(s): Context Aware ◽ Singing Voice ◽ Voice Detection ◽ Polyphonic Music

Singing Voice Detection Using Multi-Feature Deep Fusion with CNN

Lecture Notes in Electrical Engineering - Proceedings of the 7th Conference on Sound and Music Technology (CSMT) ◽ 10.1007/978-981-15-2756-2_4 ◽ 2019 ◽ pp. 41-52

Author(s): Xulong Zhang ◽ Shengchen Li ◽ Zijin Li ◽ Shizhe Chen ◽ Yongwei Gao ◽ ...

Keyword(s): Singing Voice ◽ Voice Detection

Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization

Electronics ◽ 10.3390/electronics10101214 ◽ 2021 ◽ Vol 10 (10) ◽ pp. 1214

Author(s): Michael Krause ◽ Meinard Müller ◽ Christof Weiß

Keyword(s): Machine Learning ◽ Traditional Approach ◽ Training Dataset ◽ Singing Voice ◽ Audio Recordings ◽ Voice Detection ◽ Music Information ◽ Music Audio ◽ Recorded Performances

Automatically detecting the presence of singing in music audio recordings is a central task within music information retrieval. While modern machine-learning systems produce high-quality results on this task, the reported experiments are usually limited to popular music, and the trained systems often overfit to confounding factors. In this paper, we aim to gain a deeper understanding of such machine-learning methods and investigate their robustness in a challenging opera scenario. To this end, we compare two state-of-the-art methods for singing voice detection based on supervised learning: a traditional approach relying on hand-crafted features with a random forest classifier, and a deep-learning approach relying on convolutional neural networks. To evaluate these algorithms, we make use of a cross-version dataset comprising 16 recorded performances (versions) of Richard Wagner’s four-opera cycle Der Ring des Nibelungen. This scenario allows us to systematically investigate generalization to unseen versions, musical works, or both. In particular, we study the trained systems’ robustness depending on the acoustic and musical variety, as well as the overall size of the training dataset. Our experiments show that both systems can robustly detect singing voice in opera recordings even when trained on relatively small datasets with little variety.
