voice detection Latest Research Papers

Singing voice detection or vocal detection is a classification task that determines whether there is a singing voice in a given audio segment. This process is a crucial preprocessing step that can be used to improve the performance of other tasks such as automatic lyrics alignment, singing melody transcription, singing voice separation, vocal melody extraction, and many more. This paper presents a survey on the techniques of singing voice detection with a deep focus on state-of-the-art algorithms such as convolutional LSTM and GRU-RNN. It illustrates a comparison between existing methods for singing voice detection, mainly based on the Jamendo and RWC datasets. Long-term recurrent convolutional networks have reached impressive results on public datasets. The main goal of the present paper is to investigate both classical and state-of-the-art approaches to singing voice detection.

Download Full-text

High-Performance Fake Voice Detection on Automatic Speaker Verification Systems for the Prevention of Cyber Fraud with Convolutional Neural Networks

10.24251/hicss.2022.764 ◽

2022 ◽

Author(s):

Ricardo Buettner ◽

Jan Gross ◽

Philipp Roessler ◽

Julia Winter ◽

Daniel Sauter ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

High Performance ◽

Speaker Verification ◽

Voice Detection ◽

Verification Systems

Download Full-text

Gammatone spectral latitude features extraction for pathological voice detection and classification

Applied Acoustics ◽

10.1016/j.apacoust.2021.108417 ◽

2022 ◽

Vol 185 ◽

pp. 108417

Author(s):

Changwei Zhou ◽

Yuanbo Wu ◽

Ziqi Fan ◽

Xiaojun Zhang ◽

Di Wu ◽

...

Keyword(s):

Features Extraction ◽

Voice Detection ◽

Pathological Voice

Download Full-text

Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks

Applied Sciences ◽

10.3390/app112411838 ◽

2021 ◽

Vol 11 (24) ◽

pp. 11838

Author(s):

Wenming Gui ◽

Yukun Li ◽

Xian Zang ◽

Jinglan Zhang

Keyword(s):

Neural Networks ◽

Feature Learning ◽

Singing Voice ◽

Feature Maps ◽

Multi Scale ◽

Voice Detection ◽

Public Datasets ◽

The Voice ◽

Channel Properties ◽

Accuracy Performance

Singing voice detection is still a challenging task because the voice can be obscured by instruments having the same frequency band, and even the same timbre, produced by mimicking the mechanism of human singing. Because of the poor adaptability and complexity of feature engineering, there is a recent trend towards feature learning in which deep neural networks play the roles of feature extraction and classification. In this paper, we present two methods to explore the channel properties in the convolution neural network to improve the performance of singing voice detection by feature learning. First, channel attention learning is presented to measure the importance of a feature, in which two attention mechanisms are exploited, i.e., the scaled dot-product and squeeze-and-excitation. This method focuses on learning the importance of the feature map so that the neurons can place more attention on the more important feature maps. Second, the multi-scale representations are fed to the input channels, aiming at adding more information in terms of scale. Generally, different songs need different scales of a spectrogram to be represented, and multi-scale representations ensure the network can choose the best one for the task. In the experimental stage, we proved the effectiveness of the two methods based on three public datasets, with the accuracy performance increasing by up to 2.13 percent compared to its already high initial level.

Download Full-text

Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method

Biomedical Engineering / Biomedizinische Technik ◽

10.1515/bmt-2021-0112 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Lei Geng ◽

Hongfeng Shan ◽

Zhitao Xiao ◽

Wei Wang ◽

Mei Wei

Keyword(s):

Short Term Memory ◽

Multimodal Fusion ◽

Speech Signals ◽

Speech Detection ◽

Voice Pathology Detection ◽

Voice Detection ◽

Lstm Network ◽

The Time Domain ◽

Short Time ◽

Pathological Voice

Abstract Automatic voice pathology detection and classification plays an important role in the diagnosis and prevention of voice disorders. To accurately describe the pronunciation characteristics of patients with dysarthria and improve the effect of pathological voice detection, this study proposes a pathological voice detection method based on a multi-modal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to the frequency domain spectrogram via a short-time Fourier transform (STFT). The Mel filter bank acts on the spectrogram to enhance the signal’s harmonics and denoise. Second, a pre-trained convolutional neural network (CNN) is used as the backbone network to extract sound state features and vocal cord vibration features from the two signals. To obtain a better classification effect, the fused features are input into the long short-term memory (LSTM) network for voice feature selection and enhancement. The proposed system achieves 95.73% for accuracy with 96.10% F1-score and 96.73% recall using the Saarbrucken Voice Database (SVD); thus, enabling a new method for pathological speech detection.

Download Full-text

Pathological Voice Detection Using Transfer Learning Methods

10.1109/icsmd53520.2021.9670828 ◽

2021 ◽

Author(s):

Zhang Yihua ◽

Zhu Xincheng ◽

Wu Yuanbo ◽

Zhang Xiaojun ◽

Xu Yishen ◽

...

Keyword(s):

Transfer Learning ◽

Learning Methods ◽

Voice Detection ◽

Pathological Voice

Download Full-text

Optimizing an Automatic Creaky Voice Detection Method for Australian English Speaking Females

10.21437/interspeech.2021-711 ◽

2021 ◽

Author(s):

Hannah White ◽

Joshua Penney ◽

Andy Gibson ◽

Anita Szakay ◽

Felicity Cox

Keyword(s):

Detection Method ◽

English Speaking ◽

Voice Detection ◽

Creaky Voice ◽

Australian English

Download Full-text

Knowledge Distillation for Singing Voice Detection

10.21437/interspeech.2021-636 ◽

2021 ◽

Author(s):

Soumava Paul ◽

Gurunath Reddy M ◽

K. Sreenivasa Rao ◽

Partha Pratim Das

Keyword(s):

Singing Voice ◽

Knowledge Distillation ◽

Voice Detection

Download Full-text

Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database

Applied Sciences ◽

10.3390/app11157149 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7149

Author(s):

Ji-Yeoun Lee

Keyword(s):

Neural Network ◽

Deep Learning ◽

Linear Prediction ◽

Detection System ◽

Mel Frequency Cepstral Coefficients ◽

Learning Methods ◽

Acoustic Measures ◽

Voice Detection ◽

Voice Data ◽

Pathological Voice

This work is focused on deep learning methods, such as feedforward neural network (FNN) and convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOSs) parameters. In total, 518 voice data samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological women and men, respectively, and using /a/, /i/, and /u/ vowels at normal pitch. Significant differences were observed between the normal and the pathological voice signals for normalized skewness (p = 0.000) and kurtosis (p = 0.000), except for normalized kurtosis (p = 0.051) that was estimated in the /u/ samples in women. These parameters are useful and meaningful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with the LPCCs parameter in the /u/ vowel in men. The second-best performance, 80.77%, was obtained with a combination of the FNN classifier, MFCCs, and HOSs for the /i/ vowel samples in women. There was merit in combining the acoustic measures with HOS parameters for better characterization in terms of accuracy. The combination of various parameters and deep learning methods was also useful for distinguishing normal from pathological voices.

Download Full-text

A novel pre-processing technique in pathologic voice detection: Application to Parkinson’s disease phonation

Biomedical Signal Processing and Control ◽

10.1016/j.bspc.2021.102604 ◽

2021 ◽

Vol 68 ◽

pp. 102604

Author(s):

D. Meghraoui ◽

B. Boudraa ◽

T. Merazi ◽

P. Gómez Vilda

Keyword(s):

Parkinson’S Disease ◽

Parkinson's Disease ◽

Processing Technique ◽

Voice Detection

Download Full-text

voice detection
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Singing Voice Detection: A Survey

High-Performance Fake Voice Detection on Automatic Speaker Verification Systems for the Prevention of Cyber Fraud with Convolutional Neural Networks

Gammatone spectral latitude features extraction for pathological voice detection and classification

Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks

Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method

Pathological Voice Detection Using Transfer Learning Methods

Optimizing an Automatic Creaky Voice Detection Method for Australian English Speaking Females

Knowledge Distillation for Singing Voice Detection

Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database

A novel pre-processing technique in pathologic voice detection: Application to Parkinson’s disease phonation

Export Citation Format

voice detectionRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Singing Voice Detection: A Survey

High-Performance Fake Voice Detection on Automatic Speaker Verification Systems for the Prevention of Cyber Fraud with Convolutional Neural Networks

Gammatone spectral latitude features extraction for pathological voice detection and classification

Exploring Channel Properties to Improve Singing Voice Detection with Convolutional Neural Networks

Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method

Pathological Voice Detection Using Transfer Learning Methods

Optimizing an Automatic Creaky Voice Detection Method for Australian English Speaking Females

Knowledge Distillation for Singing Voice Detection

Experimental Evaluation of Deep Learning Methods for an Intelligent Pathological Voice Detection System Using the Saarbruecken Voice Database

A novel pre-processing technique in pathologic voice detection: Application to Parkinson’s disease phonation

voice detection
Recently Published Documents