A Multistrategy Approach to Improving Pronunciation by Analogy

2000 ◽  
Vol 26 (2) ◽  
pp. 195-219 ◽  
Author(s):  
Yannick Marchand ◽  
Robert I. Damper

Pronunciation by analogy (PbA) is a data-driven method for relating letters to sound, with potential application to next-generation text-to-speech systems. This paper extends previous work on PbA in several directions. First, we have included “full” pattern matching between input letter string and dictionary entries, as well as including lexical stress in letter-to-phoneme conversion. Second, we have extended the method to phoneme-to-letter conversion. Third, and most important, we have experimented with multiple, different strategies for scoring the candidate pronunciations. Individual scores for each strategy are obtained on the basis of rank and either multiplied or summed to produce a final, overall score. Five strategies have been studied and results obtained from all 31 possible combinations. The two combination methods perform comparably, with the product rule only very marginally superior to the sum rule. Nonparametric statistical analysis reveals that performance improves as more strategies are included in the combination: this trend is very highly significant (p < 0.0005). Accordingly for letter-to-phoneme conversion, best results are obtained when all five strategies are combined: word accuracy is raised to 65.5% relative to 61.7% for our best previous result and 63.0% for the best-performing single strategy. These improvements are very highly significant (p ∼ 0 and p < 0.00011 respectively). Similar results were found for phoneme-to-letter and letter-to-stress conversion, although the former was an easier problem for PbA than letter-to-phoneme conversion and the latter was harder. The main sources of error for the multistrategy approach are very similar to those for the best single strategy, and mostly involve vowel letters and phonemes.
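The rank-based combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact mapping from rank to points, so the scheme here (best candidate gets n points, ties share a value) is an assumption, and the function names `rank_points`, `combine`, and `strategy_subsets` are hypothetical. The sketch also shows why five strategies give 31 combinations: every non-empty subset is counted, 2^5 - 1 = 31.

```python
from itertools import combinations
from math import prod


def rank_points(scores):
    """Turn raw scores (higher is better) into rank points.

    Assumed scheme: a candidate gets n points minus the number of
    candidates that beat it, so the best gets n and ties share a value.
    """
    n = len(scores)
    return [n - sum(other > s for other in scores) for s in scores]


def combine(per_strategy_scores, rule="product"):
    """Fuse rank points across strategies with the product or sum rule.

    per_strategy_scores holds one list of candidate scores per strategy,
    all in the same candidate order.
    """
    points = [rank_points(scores) for scores in per_strategy_scores]
    fuse = prod if rule == "product" else sum
    # zip(*points) regroups the points by candidate.
    return [fuse(candidate_points) for candidate_points in zip(*points)]


def strategy_subsets(n_strategies):
    """Enumerate every non-empty subset of strategy indices."""
    indices = range(n_strategies)
    return [
        subset
        for size in range(1, n_strategies + 1)
        for subset in combinations(indices, size)
    ]


# Illustrative scores only: three candidates scored by two strategies.
strategy_a = [0.9, 0.5, 0.7]   # rank points 3, 1, 2
strategy_b = [10, 30, 20]      # rank points 1, 3, 2
product_scores = combine([strategy_a, strategy_b], rule="product")
sum_scores = combine([strategy_a, strategy_b], rule="sum")
```

In this toy case the sum rule leaves all three candidates tied, while the product rule favours the candidate that both strategies rank in the middle, which shows how the two rules can differ even on identical rank points.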

PLoS ONE ◽  
2016 ◽  
Vol 11 (1) ◽  
pp. e0146490 ◽  
Author(s):  
Huan-Kai Peng ◽  
Hao-Chih Lee ◽  
Jia-Yu Pan ◽  
Radu Marculescu

2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Hao Yu ◽  
Qingquan Jia ◽  
Ning Wang ◽  
Haiyan Dong

This study introduces a data-driven modeling strategy for smart grid power quality (PQ) coupling assessment based on time series pattern matching to quantify the influence of single and integrated disturbance among nodes in different pollution patterns. Periodic and random PQ patterns are constructed by using multidimensional frequency-domain decomposition for all disturbances. A multidimensional piecewise linear representation based on local extreme points is proposed to extract the patterns features of single and integrated disturbance in consideration of disturbance variation trend and severity. A feature distance of pattern (FDP) is developed to implement pattern matching on univariate PQ time series (UPQTS) and multivariate PQ time series (MPQTS) to quantify the influence of single and integrated disturbance among nodes in the pollution patterns. Case studies on a 14-bus distribution system are performed and analyzed; the accuracy and applicability of the FDP in the smart grid PQ coupling assessment are verified by comparing with other time series pattern matching methods.
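As a rough illustration of a piecewise linear representation built on local extreme points, the Python sketch below handles the simplest univariate case. It is not the authors' multidimensional method and does not implement the FDP, whose definition the abstract does not give; the function names `local_extreme_points` and `segment_slopes` are hypothetical, and the sample series is made up for the example.

```python
def local_extreme_points(series):
    """Return (index, value) pairs for the endpoints and strict local extrema."""
    if len(series) < 3:
        return list(enumerate(series))
    keep = [(0, series[0])]
    for i in range(1, len(series) - 1):
        left = series[i] - series[i - 1]
        right = series[i + 1] - series[i]
        if left * right < 0:  # slope changes sign: local maximum or minimum
            keep.append((i, series[i]))
    keep.append((len(series) - 1, series[-1]))
    return keep


def segment_slopes(points):
    """Slope of each straight segment joining consecutive kept points."""
    return [
        (v1 - v0) / (i1 - i0)
        for (i0, v0), (i1, v1) in zip(points, points[1:])
    ]


# Made-up disturbance samples, for illustration only.
series = [0, 2, 4, 3, 1, 2, 5]
points = local_extreme_points(series)
slopes = segment_slopes(points)
```

Keeping only the turning points compresses the series while preserving the trend (slope sign) and the swing size of each segment, which is the kind of information the abstract says the pattern features should capture.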


2015 ◽  
Author(s):  
Mahsa Sadat Elyasi Langarani ◽  
Jan van Santen ◽  
Seyed Hamidreza Mohammadi ◽  
Alexander Kain

2008 ◽  
Vol 14 (4) ◽  
pp. 527-546 ◽  
Author(s):  
TASANAWAN SOONKLANG ◽  
ROBERT I. DAMPER ◽  
YANNICK MARCHAND

Automatic pronunciation of unknown words (i.e., those not in the system dictionary) is a difficult problem in text-to-speech (TTS) synthesis. Currently, many data-driven approaches have been applied to the problem, as a backup strategy for those cases where dictionary matching fails. The difficulty of the problem depends on the complexity of spelling-to-sound mappings according to the particular writing system of the language. Hence, the degree of success achieved varies widely across languages but also across dictionaries, even for the same language with the same method. Further, the sizes of the training and test sets are an important consideration in data-driven approaches. In this paper, we study the variation of letter-to-phoneme transcription accuracy across seven European languages with twelve different lexicons. We also study the relationship between the size of dictionary and the accuracy obtained. The largest dictionaries of each language have been partitioned into ten approximately equal-sized subsets and combined to give ten different-sized test sets. In view of its superior performance in previous work, the transcription method used is pronunciation by analogy (PbA). Best results are obtained for Spanish, generally believed to have a very regular (‘shallow’) orthography, and poorest results for English, a language whose irregular spelling system is legendary. For those languages for which multiple dictionaries were available (i.e., French and English), results were found to vary across dictionaries. For the relationship between dictionary size and transcription accuracy, we find that performance grows monotonically with dictionary size. However, the performance gain decelerates (tends to saturate) as the dictionary increases in size; the relation can simply be described by a logarithmic regression, one parameter of which (α) can be taken as quantifying the depth of orthography of a language. We find that α for a language is significantly correlated with transcription performance on a small dictionary (approximately 10,000 words) for that language, but less so for asymptotic performance. This may be because our measure of asymptotic performance is unreliable, being extrapolated from the fitted logarithmic regression.
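A logarithmic regression of accuracy on dictionary size can be fitted by ordinary least squares, as in the Python sketch below. The model form accuracy = α·ln(size) + β is an assumption: the abstract names the parameter α but does not say which role it plays or which logarithm base is used. The function name `fit_log_regression` is hypothetical, and the data points are synthetic, generated from known parameters purely to check the fit.

```python
import math


def fit_log_regression(sizes, accuracies):
    """Least-squares fit of accuracy = alpha * ln(size) + beta."""
    xs = [math.log(size) for size in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    alpha = sxy / sxx               # slope with respect to ln(size)
    beta = mean_y - alpha * mean_x  # intercept
    return alpha, beta


# Synthetic data generated from alpha = 5, beta = 10 (not results from the paper).
sizes = [1000, 2000, 5000, 10000]
accuracies = [5 * math.log(size) + 10 for size in sizes]
alpha, beta = fit_log_regression(sizes, accuracies)
```

Under this model each doubling of the dictionary adds a constant α·ln 2 to accuracy, which matches the decelerating gain the abstract describes. The curve has no upper bound, so extrapolating it to very large dictionaries should be treated with caution.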

