prosody modification
Recently Published Documents


TOTAL DOCUMENTS: 38 (FIVE YEARS 10)

H-INDEX: 7 (FIVE YEARS 1)

2021 ◽  
Vol 35 (3) ◽  
pp. 235-242
Author(s):  
Vivek Bhardwaj ◽  
Vinay Kukreja ◽  
Amitoj Singh

Most automatic speech recognition (ASR) systems are trained on adult speech because children's speech datasets are scarce. The recognition rate of such systems is very low when they are tested on children's speech, owing to inter-speaker acoustic variability between adult and children's speech. This variability arises mainly from children's higher pitch and lower speaking rate. The main objective of this work is therefore to increase the recognition rate of the Punjabi-ASR system by reducing this inter-speaker acoustic variability through prosody modification and speaker adaptive training. Prosody modification can alter the pitch period and duration (speaking rate) of the speech signal without affecting its naturalness or message, which helps to overcome the acoustic differences between adult and children's speech. The developed Punjabi-ASR system is trained on adult speech and prosody-modified adult speech. The prosody-modified speech reduces the need for large amounts of children's speech when training the ASR system and improves the recognition rate. Results show that prosody modification and speaker adaptive training reduce the word error rate (WER) of the Punjabi-ASR system to 8.79% when it is tested on children's speech.
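The abstract above reports its result as a word error rate. As a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; the function name and the toy sentences are assumptions for illustration and are not taken from the paper or its Punjabi data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper only; tokenisation is a plain whitespace split.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy example: one substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

A reported WER such as 8.79% would correspond to this ratio expressed as a percentage, typically with the errors and reference words summed over the whole test set.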


2021 ◽  
Vol 4 (2) ◽  
pp. 59-72
Author(s):  
Philippe Boula de Mareüil ◽  
Albert Rilliard ◽  
Fanny Ivent ◽  
Varvara Kozhevina

In the south of France, the French language has developed in contact with Occitan in Provence and Languedoc, and in contact with Catalan in Roussillon. This study reports on a first analysis of data collected in these regions during a field survey carried out among speakers of Occitan and Catalan, in addition to French. In particular, we compared the prosody of yes/no questions ending in a word stressed on the penultimate syllable (e.g., caserna ‘barracks’ in Occitan or Catalan, caserne with a pronounced final schwa in southern French). On the last two syllables of questions, the rising-rising pitch pattern turns out to be the most common and, according to a perception experiment using prosody modification/resynthesis, it is preferred to a falling-rising pattern by southern French listeners (without significant differences between Provence and Languedoc). A falling-rising pattern was also observed in Roussillon, possibly resulting from a prosodic transfer from Catalan to French. However, it was not associated with that region by southern French listeners who took part in a second perceptual experiment. Yet the intonation patterns found may have different functions: the rising-rising pattern, especially, is most often interpreted as a confirmation query.


2020 ◽  
Vol 131 ◽  
pp. 213-218
Author(s):  
S Shahnawazuddin ◽  
Nagaraj Adiga ◽  
Hemant Kumar Kathania ◽  
B Tarun Sai

Author(s):  
Md Shah Fahad ◽  
Shruti Gupta ◽  
Abhinav ◽  
Shreya Singh ◽  
Akshay Deepak

Background: Emotional speech synthesis is the process of synthesising emotions in neutral speech (potentially generated by a text-to-speech system) to make artificial human-machine interaction more human-like. It typically involves analysis and modification of speech parameters. Existing work on speech synthesis that modifies prosody parameters does so at the sentence, word, and syllable level. However, finer-grained modification at the vowel level has not yet been explored, which motivates our work. Objective: To explore prosody parameters at the vowel level for emotion synthesis. Method: Our work modifies prosody features (duration, pitch, and intensity) for emotion synthesis. Specifically, it modifies the duration parameter of vowel-like and pause regions, and the pitch and intensity parameters of vowel-like regions only. The modification is gender-specific, uses emotional speech templates stored in a database, and is performed with the pitch synchronous overlap and add (PSOLA) method. Result: The approach was compared with existing work on prosody modification at the sentence, word, and syllable level on the IITKGP-SEHSC database. Improvements of 8.14%, 13.56%, and 2.80% in the relative mean opinion score were obtained for the emotions angry, happy, and fear, respectively. This was due to: (1) prosody modification at the vowel level being more fine-grained than at the sentence, word, or syllable level, and (2) prosody patterns not being generated for consonant regions, because the vocal cords do not vibrate during consonant production. Conclusion: Our proposed work shows that emotional speech generated using prosody modification at the vowel level is more convincing than speech generated using prosody modification at the sentence, word, and syllable level.
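The region-selective rule in the Method above (duration changed in vowel-like and pause regions, pitch and intensity changed in vowel-like regions only) can be sketched as follows. This is only an illustration of that selection rule: the `Segment` and `EmotionTemplate` types, the field names, and the template values are hypothetical, and the sketch computes target parameters only. It does not perform PSOLA resynthesis and is not the authors' implementation.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    kind: str            # "vowel", "pause", or "consonant" (hypothetical labels)
    duration_ms: float
    pitch_hz: float      # 0.0 where no pitch is defined
    intensity_db: float


@dataclass(frozen=True)
class EmotionTemplate:
    # Hypothetical per-emotion factors; the paper draws gender-specific
    # templates from a database, whose contents are not reproduced here.
    duration_scale: float
    pitch_scale: float
    intensity_offset_db: float


def apply_template(segments: List[Segment], template: EmotionTemplate) -> List[Segment]:
    """Return target prosody parameters for each segment."""
    modified = []
    for seg in segments:
        if seg.kind == "vowel":
            # Vowel-like regions: duration, pitch, and intensity all change.
            seg = replace(
                seg,
                duration_ms=seg.duration_ms * template.duration_scale,
                pitch_hz=seg.pitch_hz * template.pitch_scale,
                intensity_db=seg.intensity_db + template.intensity_offset_db,
            )
        elif seg.kind == "pause":
            # Pause regions: only the duration changes.
            seg = replace(seg, duration_ms=seg.duration_ms * template.duration_scale)
        # Consonant regions are left untouched.
        modified.append(seg)
    return modified


if __name__ == "__main__":
    utterance = [
        Segment("consonant", 60.0, 0.0, 55.0),
        Segment("vowel", 100.0, 200.0, 60.0),
        Segment("pause", 80.0, 0.0, 0.0),
    ]
    made_up_template = EmotionTemplate(0.5, 2.0, 3.0)  # illustrative values only
    for seg in apply_template(utterance, made_up_template):
        print(seg)
```

In a full system, the resulting targets would be handed to a PSOLA-style resynthesis step that reshapes the waveform, which is outside the scope of this sketch.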


2019 ◽  
Vol 93 ◽  
pp. 34-42
Author(s):  
S. Shahnawazuddin ◽  
Nagaraj Adiga ◽  
B Tarun Sai ◽  
Waquar Ahmad ◽  
Hemant K. Kathania

Author(s):  
Md Shah Fahad ◽  
Shreya Singh ◽  
Shruti Gupta ◽  
Akshay Deepak ◽  
Abhinav Abhinav
