Ideal Binary Mask: Latest Research Papers

Speech and music segregation from a single channel is a challenging task because of background interference and the intermingled voice and music signals. It is important for a wide range of applications such as music information retrieval, singer identification, and lyrics recognition and alignment. This paper presents an effective method for speech and music segregation. Considering the repeating nature of music, we first detect the local repeating structures in the signal using a locally defined window for each segment. After detecting the repeating structures, we extract them and perform separation using a soft time-frequency mask. We apply an ideal binary mask to enhance speech and music intelligibility. We evaluated the proposed method on mixtures at -5 dB, 0 dB and 5 dB from the Multimedia Information Retrieval-1000 clips (MIR-1K) dataset. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods in terms of Global Normalized Signal-to-Distortion Ratio (GNSDR) values.
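
The GNSDR figure of merit named in this abstract is commonly defined as the length-weighted mean, over all clips, of the normalized SDR (the SDR of the separated estimate minus the SDR of the unprocessed mixture). The sketch below illustrates that definition only. It assumes a simplified energy-ratio SDR rather than the full BSS Eval decomposition, and the function names `sdr_db`, `nsdr_db` and `gnsdr_db` are hypothetical, not taken from the paper.

```python
import numpy as np


def sdr_db(estimate, reference):
    """Simplified SDR in dB: reference energy over residual energy.

    Assumption: a plain energy ratio, without the projections that
    BSS Eval uses to split the residual into distortion components.
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    residual = reference - estimate
    eps = 1e-12  # avoids division by zero and log of zero
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(residual**2) + eps))


def nsdr_db(estimate, reference, mixture):
    """Normalized SDR: improvement of the estimate over the raw mixture."""
    return sdr_db(estimate, reference) - sdr_db(mixture, reference)


def gnsdr_db(clips):
    """Global NSDR over a list of (estimate, reference, mixture) tuples,
    weighted by the length of each reference clip."""
    nsdrs = np.array([nsdr_db(e, r, m) for e, r, m in clips])
    weights = np.array([len(r) for _, r, _ in clips], dtype=float)
    return float(np.sum(weights * nsdrs) / np.sum(weights))
```

Weighting by clip length keeps short clips from dominating the average, which is the usual motivation for reporting GNSDR rather than a plain mean of NSDR values.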

Quality Evaluation of Speech Enhancement Algorithms for Normal and Hearing Loss Listeners

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.l2479.1081219 ◽ 2019 ◽ Vol 8 (12) ◽ pp. 7-12

Keyword(s): Mean Square Error ◽ Speech Enhancement ◽ Quality Evaluation ◽ Minimum Mean Square Error ◽ Subjective Quality ◽ Binary Mask ◽ Mean Square ◽ Regional Language ◽ Speech Database ◽ Ideal Binary Mask

This paper reports subjective quality tests of speech enhanced by different enhancement algorithms, for listeners with normal hearing (NH) as well as listeners with hearing impairment (HI). In the literature, subjective quality evaluation of speech enhancement methods mostly targets NH listeners, and fewer attempts have been made to evaluate subjectively for HI listeners. The algorithms evaluated are from four different classes: the spectral subtraction class (SS), the statistical model based class (minimum mean square error), the subspace class (PKLT) and the auditory class (ideal binary mask using STFT, ideal binary mask using gammatone filterbank and ideal binary mask using gammachirp filterbank). The algorithms are evaluated using four types of real-world noise recorded in Indian scenarios, namely cafeteria, traffic, station and train, at SNRs of -5, 0, 5 and 10 dB. The evaluation follows the ITU-T P.835 standard in terms of three parameters: speech signal alone, background noise and overall quality. A noisy speech database developed in an Indian regional language, Marathi, at the same four SNRs is used for evaluation. Significant improvement is observed for the ideal binary mask algorithm in terms of overall quality and signal distortion ratings for NH and HI listeners. The performance of minimum mean square error is also comparable with that of the ideal binary mask algorithm in some cases.
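
Building a noisy speech database at fixed SNRs such as -5, 0, 5 and 10 dB usually comes down to scaling the noise recording before adding it to the clean speech. The following is a minimal sketch of that step, assuming SNR is measured over the whole utterance from mean signal power. The helper name `mix_at_snr` is hypothetical and the abstract does not state how the authors actually set their levels.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`,
    then add it to `speech`. Returns (mixture, scaled_noise).

    Assumption: global (whole-utterance) SNR from mean power; no
    active-speech-level weighting is applied.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # trim noise to the speech length

    speech_power = np.mean(speech**2)
    noise_power = np.mean(noise**2)
    # Gain that brings the noise to the power implied by the target SNR.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise
```

Returning the scaled noise alongside the mixture is a convenience: oracle methods such as the ideal binary mask need the separate target and noise signals at the level at which they were actually mixed.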

A Competing Voices Test for Hearing-Impaired Listeners Applied to Spatial Separation and Ideal Time-Frequency Masks

Trends in Hearing ◽ 10.1177/2331216519848288 ◽ 2019 ◽ Vol 23 ◽ pp. 233121651984828

Author(s): Lars Bramsløw ◽ Marianna Vatti ◽ Rikke Rossing ◽ Gaurav Naithani ◽ Niels Henrik Pontoppidan

Keyword(s): Spatial Separation ◽ Target Sentence ◽ Hearing Impaired ◽ Temporal Position ◽ Time Frequency ◽ Ideal Binary Mask ◽ Spatial Unmasking ◽ Switching Attention ◽ Spatial Condition ◽ The Ideal

People with hearing impairment find competing voices scenarios to be challenging, both with respect to switching attention from one talker to the other, as well as maintaining attention. With the Danish competing voices test (CVT) presented here, the dual-attention skills can be assessed. The CVT provides sentences spoken by three male and three female talkers, played in sentence pairs. The task of the listener is to repeat the target sentence from the sentence pair based on cueing either before or after playback. One potential way of assisting segregation of two talkers is to take advantage of spatial unmasking by presenting one talker per ear after application of time-frequency masks for separating the mixture. Using the CVT, this study evaluated four spatial conditions in 14 moderate-to-severely hearing-impaired listeners to establish benchmark results for this type of algorithm applied to hearing-impaired listeners. The four spatial conditions were as follows: summed (diotic), separate, the ideal ratio mask, and the ideal binary mask. The results show that the test is sensitive to the change in spatial condition. The temporal position of the cue has a large impact, as cueing the target talker before playback focuses the attention toward the target, whereas cueing after playback requires equal attention to the two talkers, which is more difficult. Furthermore, both applied ideal masks show test scores very close to the ideal separate spatial condition, suggesting that this technique is useful for future separation algorithms using estimated rather than ideal masks.
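
The two oracle masks compared in this study can be sketched for a two-talker mixture as follows. This is an illustration under assumptions, not the authors' implementation: it uses the common square-root power-ratio form of the ideal ratio mask and a 0 dB criterion for the ideal binary mask, and the function name `two_talker_masks` is hypothetical. The inputs are magnitude spectrograms of the two talkers before mixing.

```python
import numpy as np


def two_talker_masks(mag_a, mag_b):
    """Oracle masks for talker A, given the magnitude spectrograms of
    talker A and talker B (same shape, frequency x time).

    Returns (irm_a, ibm_a). The masks for talker B are the complement
    in the binary case (1 - ibm_a) and follow by swapping the arguments
    in the ratio case.
    """
    pow_a = np.asarray(mag_a, dtype=float) ** 2
    pow_b = np.asarray(mag_b, dtype=float) ** 2
    eps = 1e-12  # guards against division by zero in silent units

    # Ideal ratio mask: soft gain in [0, 1] per time-frequency unit.
    irm_a = np.sqrt(pow_a / (pow_a + pow_b + eps))
    # Ideal binary mask: keep a unit only where talker A dominates.
    ibm_a = (pow_a > pow_b).astype(float)
    return irm_a, ibm_a
```

Multiplying the mixture spectrogram by a talker's mask and resynthesizing gives the per-talker signal that would then be routed to one ear, in line with the one-talker-per-ear presentation described in the abstract.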

Singing voice separation using a deep convolutional neural network trained by ideal binary mask and cross entropy

Neural Computing and Applications ◽ 10.1007/s00521-018-3933-z ◽ 2018 ◽ Vol 32 (4) ◽ pp. 1037-1050 ◽ Cited By ~ 6

Author(s): Kin Wah Edward Lin ◽ B. T. Balamurali ◽ Enyan Koh ◽ Simon Lui ◽ Dorien Herremans

Keyword(s): Neural Network ◽ Convolutional Neural Network ◽ Deep Convolutional Neural Network ◽ Cross Entropy ◽ Binary Mask ◽ Singing Voice ◽ Ideal Binary Mask ◽ Singing Voice Separation

Interaural coherence induced ideal binary mask for binaural speech separation and dereverberation

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) ◽ 10.1109/iscslp.2016.7918416 ◽ 2016

Author(s): Yi-Ting Chen ◽ Tzu-Hao Chen ◽ Mao-Chang Huang ◽ Tai-Shih Chi

Keyword(s): Binary Mask ◽ Speech Separation ◽ Ideal Binary Mask

Intelligibility Assessment of Ideal Binary-Masked Noisy Speech with Acceptance of Room Acoustic

Journal of Electrical Engineering ◽ 10.2478/jee-2014-0054 ◽ 2015 ◽ Vol 65 (6) ◽ pp. 325-332

Author(s): Sedlak Vladimír ◽ Durackova Daniela ◽ Zalusky Roman ◽ Kovacik Tomas

Keyword(s): Signal To Noise Ratio ◽ Objective Measures ◽ Binary Mask ◽ Reverberation Time ◽ Noisy Signal ◽ Signal To Noise ◽ Noisy Speech ◽ Time Frequency ◽ Ideal Binary Mask ◽ The Ideal

In this paper the intelligibility of ideal binary-masked noisy signals is evaluated for different signal-to-noise ratios (SNR), mask errors, masker types, distances between source and receiver, reverberation times and local criteria for forming the binary mask. The ideal binary mask is computed from time-frequency decompositions of the target and masker signals by thresholding the local SNR within time-frequency units. The intelligibility of the separated signal is measured using different objective measures computed in the frequency and perceptual domains. The present study replicates and extends previously presented findings, but mainly shows the impact of room acoustics on the intelligibility performance of the IBM technique.
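
The mask construction described in this abstract, thresholding the local SNR in each time-frequency unit against a local criterion, can be sketched as below. This is a minimal illustration that assumes the target and masker power spectrograms are already available from some time-frequency decomposition. The names `ideal_binary_mask`, `flip_mask_units` and `lc_db` are hypothetical, and the random unit-flipping used to model mask error is one simple possibility, not necessarily the procedure used in the paper.

```python
import numpy as np


def ideal_binary_mask(target_power, masker_power, lc_db=0.0):
    """Boolean mask that is True where the local SNR (dB) in a
    time-frequency unit exceeds the local criterion `lc_db`."""
    target_power = np.asarray(target_power, dtype=float)
    masker_power = np.asarray(masker_power, dtype=float)
    eps = 1e-12  # keeps the ratio and the log finite in silent units
    local_snr_db = 10.0 * np.log10((target_power + eps) / (masker_power + eps))
    return local_snr_db > lc_db


def flip_mask_units(mask, error_rate, rng):
    """Simulate mask error by inverting each unit independently with
    probability `error_rate` (assumption: errors are uniform over units)."""
    flips = rng.random(mask.shape) < error_rate
    return np.logical_xor(mask, flips)
```

Lowering `lc_db` below 0 dB retains more units (including some where the masker dominates), while raising it keeps only units with a strong target, which is why the local criterion appears as an experimental factor alongside SNR and mask error.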
