time scale modification
Recently Published Documents

Total documents: 116 (five years: 7)

H-index: 17 (five years: 1)


Using Data Augmentation and Time-Scale Modification to Improve ASR of Children’s Speech in Noisy Environments

Applied Sciences ◽

10.3390/app11188420 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8420

Author(s):

Hemant Kumar Kathania ◽

Sudarsana Reddy Kadiri ◽

Paavo Alku ◽

Mikko Kurimo

Keyword(s):

Additive Noise ◽

Data Augmentation ◽

Training Data ◽

Speaking Rate ◽

Noisy Environments ◽

Testing Phase ◽

Combining Data ◽

Time Scale Modification ◽

Children's Speech

Current automatic speech recognition (ASR) systems perform poorly on children’s speech in noisy environments because recognizers are typically trained on clean adult speech. This creates two mismatches between the training and testing phases: clean speech in training vs. noisy speech in testing, and adult speech in training vs. child speech in testing. This article studies how to reduce the effects of these two mismatches in the recognition of noisy children’s speech by investigating two techniques: data augmentation and time-scale modification. In the former, clean training data from adult speakers are corrupted with additive noise so that the training data better match the noisy testing conditions. In the latter, the fundamental frequency (F0) and speaking rate of children’s speech are modified in the testing phase to reduce the prosodic differences between the testing data of child speakers and the training data of adult speakers. A standard ASR system based on DNN–HMM was built, and the effects of data augmentation, F0 modification, and speaking rate modification on word error rate (WER) were evaluated first separately and then with all three techniques combined. The experiments were conducted using children’s speech corrupted with additive noise of four different noise types in four different signal-to-noise ratio (SNR) categories. The results show that the combination of all three techniques yielded the best ASR performance. As an example, the WER averaged over all four noise types in the 5 dB SNR category dropped from 32.30% to 12.09% when the baseline system, which used neither data augmentation nor time-scale modification, was replaced with a recognizer built using a combination of all three techniques.
In summary, when recognizing noisy children’s speech with ASR systems trained on clean adult speech, considerable improvements in recognition performance can be achieved by combining noise-based data augmentation in the training phase with time-scale modification, based on modifying the F0 and speaking rate of children’s speech, in the testing phase.
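
The data augmentation step described in the abstract, corrupting clean speech with additive noise at a chosen SNR, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `add_noise_at_snr` and `measured_snr_db` are hypothetical, signals are assumed to be plain lists of float samples at a common sampling rate, and the noise is simply repeated to cover the utterance. The noise gain follows from the definition SNR(dB) = 10 log10(P_clean / P_noise).

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the target SNR in dB.

    Hypothetical helper for illustration; assumes both inputs are non-empty
    sequences of float samples recorded at the same sampling rate.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Repeat the noise until it covers the utterance, then trim to length.
    reps = -(-len(clean) // len(noise))  # ceiling division
    noise = (list(noise) * reps)[: len(clean)]
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # Solve 10*log10(p_clean / (gain**2 * p_noise)) = snr_db for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a mixture, treating (noisy - clean) as the noise."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))
```

Calling `add_noise_at_snr(clean, noise, 5.0)` would produce a mixture in the 5 dB category mentioned in the abstract. A real augmentation pipeline would typically also draw random noise segments and guard against clipping, which this sketch omits.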


A Spectral Variation Function for Variable Time-Scale Modification of Speech

10.1109/ncc52529.2021.9530088 ◽

2021 ◽

Author(s):

Pramod H. Kachare ◽

Prem C. Pandey

Keyword(s):

Spectral Variation ◽

Variable Time ◽

Variation Function ◽

Time Scale Modification


Vowel Non-Vowel Based Spectral Warping and Time Scale Modification for Improvement in Children’s ASR

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414116 ◽

2021 ◽

Author(s):

Hemant Kathania ◽

Avinash Kumar ◽

Mikko Kurimo

Keyword(s):

Time Scale Modification


An objective measure of quality for time-scale modification of audio

The Journal of the Acoustical Society of America ◽

10.1121/10.0003753 ◽

2021 ◽

Vol 149 (3) ◽

pp. 1843-1854

Author(s):

Timothy Roberts ◽

Kuldip K. Paliwal

Keyword(s):

Objective Measure ◽

Time Scale Modification


A time-scale modification dataset with subjective quality labels

The Journal of the Acoustical Society of America ◽

10.1121/10.0001567 ◽

2020 ◽

Vol 148 (1) ◽

pp. 201-210

Author(s):

Timothy Roberts ◽

Kuldip K. Paliwal

Keyword(s):

Subjective Quality ◽

Time Scale Modification


Time-Scale Modification Using Fuzzy Epoch-Synchronous Overlap-Add (FESOLA)

2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) ◽

10.1109/waspaa.2019.8937258 ◽

2019 ◽

Author(s):

Timothy Roberts ◽

Kuldip K. Paliwal

Keyword(s):

Time Scale Modification


Speaking-Rate Adaptation of Automatic Speech Recognition System through Fuzzy Classification based Time-Scale Modification

2019 National Conference on Communications (NCC) ◽

10.1109/ncc.2019.8732255 ◽

2019 ◽

Author(s):

S. Shahnawazuddin ◽

Waquar Ahmad ◽

Hemant K. Kathania ◽

Nagaraj Adiga ◽

B. Tarun Sai

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Rate Adaptation ◽

Recognition System ◽

Fuzzy Classification ◽

Speaking Rate ◽

Speech Recognition System ◽

Automatic Speech Recognition System ◽

Time Scale Modification


Frequency Dependent Time-Scale Modification

2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS) ◽

10.1109/icspcs.2018.8631764 ◽

2018 ◽

Author(s):

Timothy Roberts ◽

Kuldip K. Paliwal

Keyword(s):

Frequency Dependent ◽

Time Scale Modification


Stereo Time-Scale Modification Using Sum and Difference Transformation

2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS) ◽

10.1109/icspcs.2018.8631776 ◽

2018 ◽

Author(s):

Timothy Roberts ◽

Kuldip K. Paliwal

Keyword(s):

Time Scale Modification


Effect Of Using Window Type On Time Scale Modification On Voice Recording Using Waveform Similarity Overlap and Add

2018 International Seminar on Intelligent Technology and Its Applications (ISITIA) ◽

10.1109/isitia.2018.8711203 ◽

2018 ◽

Author(s):

Nanda Saputri ◽

Yoyon K. Suprapto ◽

Diah P. Wulandari

Keyword(s):

Time Scale Modification ◽

Waveform Similarity ◽

Voice Recording
