A Multistrategy Approach to Improving Pronunciation by Analogy

Pronunciation by analogy (PbA) is a data-driven method for relating letters to sound, with potential application to next-generation text-to-speech systems. This paper extends previous work on PbA in several directions. First, we have included “full” pattern matching between the input letter string and dictionary entries, and have incorporated lexical stress into letter-to-phoneme conversion. Second, we have extended the method to phoneme-to-letter conversion. Third, and most important, we have experimented with multiple, different strategies for scoring the candidate pronunciations. Individual scores for each strategy are obtained on the basis of rank and either multiplied or summed to produce a final, overall score. Five strategies have been studied and results obtained from all 31 possible combinations. The two combination methods perform comparably, with the product rule only very marginally superior to the sum rule. Nonparametric statistical analysis reveals that performance improves as more strategies are included in the combination: this trend is very highly significant (p < 0.0005). Accordingly, for letter-to-phoneme conversion, the best results are obtained when all five strategies are combined: word accuracy is raised to 65.5%, compared with 61.7% for our best previous result and 63.0% for the best-performing single strategy. These improvements are very highly significant (p ∼ 0 and p < 0.00011, respectively). Similar results were found for phoneme-to-letter and letter-to-stress conversion, although the former was an easier problem for PbA than letter-to-phoneme conversion and the latter was harder. The main sources of error for the multistrategy approach are very similar to those for the best single strategy, and mostly involve vowel letters and phonemes.
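The rank-based fusion step can be illustrated with a minimal Python sketch. The abstract does not say how a rank becomes a score, so this sketch assumes that, for each strategy, the best of N candidate pronunciations receives N points, the next N - 1, and so on down to 1. These points are then multiplied (product rule) or added (sum rule). The helper names `rank_points`, `fuse` and `all_combinations`, the strategy labels S1 to S5, and the toy scores are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import prod


def rank_points(scores):
    """Turn one strategy's raw scores into rank points (assumed: higher raw score is better).

    The best of N candidates gets N points, the next N - 1, ..., the worst 1.
    Ties keep the candidates' original order (Python's sort is stable).
    """
    order = sorted(scores, key=scores.get, reverse=True)
    n = len(order)
    return {cand: n - i for i, cand in enumerate(order)}


def fuse(strategy_scores, strategies, rule="product"):
    """Combine rank points across the chosen strategies with the product or sum rule."""
    points = {s: rank_points(strategy_scores[s]) for s in strategies}
    # Assumes every strategy scores the same set of candidate pronunciations.
    candidates = list(points[strategies[0]])
    combine = prod if rule == "product" else sum
    totals = {c: combine(points[s][c] for s in strategies) for c in candidates}
    return max(totals, key=totals.get), totals


def all_combinations(names):
    """Every non-empty subset of strategies: 2**5 - 1 = 31 when there are five."""
    return [c for k in range(1, len(names) + 1) for c in combinations(names, k)]


toy_scores = {  # hypothetical raw scores for three candidate pronunciations
    "S1": {"p1": 0.9, "p2": 0.5, "p3": 0.1},
    "S2": {"p1": 0.2, "p2": 0.7, "p3": 0.6},
    "S3": {"p1": 0.8, "p2": 0.6, "p3": 0.3},
}
print(len(all_combinations(["S1", "S2", "S3", "S4", "S5"])))  # 31
print(fuse(toy_scores, ["S1", "S2", "S3"], "product"))  # ('p2', {'p1': 9, 'p2': 12, 'p3': 2})
```

On the same toy scores the sum rule gives totals of 7, 7 and 4, a tie between p1 and p2 that this sketch resolves by picking the first candidate, so the two rules need not agree on every word even when they perform comparably overall.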


An evaluation of Mongolian data-driven Text-to-Speech

2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)

10.1109/icsda.2013.6709881

2013

Author(s): Chagnaa Altangerel, Jaimai Purev, Kerey Yesyenbyek, Chatchawarn Hansakunbuntheung

Keyword(s): Data Driven, Text To Speech


A Hybrid Approach to Pattern Matching for Text-to-Speech Conversion

International Conference on Advances in Pattern Recognition

10.1007/978-1-4471-0833-7_25

1999

pp. 245-254

Author(s): Chew Lim Tan, Yan Rong Chen, Paul Hong Jyh Wu

Keyword(s): Pattern Matching, Hybrid Approach, Text To Speech


Lexical stress assignment model for the Slovenian text-to-speech synthesis system

Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004.

10.1109/isimp.2004.1434156

2005

Author(s): T. Sef

Keyword(s): Speech Synthesis, Lexical Stress, Text To Speech, Synthesis System, Stress Assignment, Assignment Model, Text To Speech Synthesis, Lexical Stress Assignment


Statistical methods in data-driven modeling of Spanish prosody for text to speech

Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96

10.1109/icslp.1996.607870

2002

Cited By ~ 3

Author(s): E. Lopez-Gonzalo, J.M. Rodriguez-Garcia

Keyword(s): Statistical Methods, Data Driven, Text To Speech, Data Driven Modeling


Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization

PLoS ONE

10.1371/journal.pone.0146490

2016

Vol 11 (1)

pp. e0146490

Cited By ~ 1

Author(s): Huan-Kai Peng, Hao-Chih Lee, Jia-Yu Pan, Radu Marculescu

Keyword(s): Pattern Matching, Social Dynamics, Profit Maximization, Data Driven


A Data-Driven Modeling Strategy for Smart Grid Power Quality Coupling Assessment Based on Time Series Pattern Matching

Mathematical Problems in Engineering

10.1155/2018/2765945

2018

Vol 2018

pp. 1-12

Cited By ~ 2

Author(s): Hao Yu, Qingquan Jia, Ning Wang, Haiyan Dong

Keyword(s): Time Series, Smart Grid, Power Quality, Pattern Matching, Distribution System, Piecewise Linear, Extreme Points, Data Driven, Modeling Strategy, Data Driven Modeling

This study introduces a data-driven modeling strategy for smart grid power quality (PQ) coupling assessment, based on time series pattern matching, to quantify the influence of single and integrated disturbances among nodes in different pollution patterns. Periodic and random PQ patterns are constructed for all disturbances using multidimensional frequency-domain decomposition. A multidimensional piecewise linear representation based on local extreme points is proposed to extract the pattern features of single and integrated disturbances, taking into account the variation trend and severity of each disturbance. A feature distance of pattern (FDP) is developed to perform pattern matching on univariate PQ time series (UPQTS) and multivariate PQ time series (MPQTS), quantifying the influence of single and integrated disturbances among nodes in the pollution patterns. Case studies on a 14-bus distribution system are performed and analyzed; the accuracy and applicability of the FDP for smart grid PQ coupling assessment are verified by comparison with other time series pattern matching methods.
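The idea of a piecewise linear representation built on local extreme points can be sketched in one dimension. This is a generic, univariate version of the idea and not the paper's multidimensional method: it keeps the first and last samples plus every strict local maximum or minimum, joins consecutive kept points with straight segments, and records each segment's amplitude change (used here as a stand-in for severity) and slope (a stand-in for trend). The function names and the choice of features are assumptions, and the FDP itself is not described in enough detail here to reproduce.

```python
def extreme_point_indices(series):
    """Indices of the first point, the last point, and every strict local extremum."""
    n = len(series)
    if n <= 2:
        return list(range(n))
    kept = [0]
    for i in range(1, n - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        is_peak = cur > prev and cur > nxt
        is_trough = cur < prev and cur < nxt
        if is_peak or is_trough:
            kept.append(i)
    kept.append(n - 1)
    return kept


def piecewise_linear(series):
    """Join consecutive extreme points into segments.

    Each segment is (start, end, change, slope): `change` is the amplitude
    difference (a severity proxy) and `slope` the change per sample (a trend
    proxy). Both feature choices are illustrative assumptions.
    """
    kept = extreme_point_indices(series)
    segments = []
    for start, end in zip(kept, kept[1:]):
        change = series[end] - series[start]
        segments.append((start, end, change, change / (end - start)))
    return segments


print(piecewise_linear([0, 1, 3, 2, 1, 4]))
# [(0, 2, 3, 1.5), (2, 4, -2, -1.0), (4, 5, 3, 3.0)]
```

Because the comparisons are strict, a flat plateau at a peak or trough is not detected as an extremum; a production version would need an explicit rule for plateaus and, per the abstract, would extend the representation to multiple dimensions.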


Data-driven foot-based intonation generator for text-to-speech synthesis

10.21437/interspeech.2015-370

2015

Author(s): Mahsa Sadat Elyasi Langarani, Jan van Santen, Seyed Hamidreza Mohammadi, Alexander Kain

Keyword(s): Speech Synthesis, Data Driven, Text To Speech, Text To Speech Synthesis


Multilingual pronunciation by analogy

Natural Language Engineering

10.1017/s1351324908004737

2008

Vol 14 (4)

pp. 527-546

Cited By ~ 2

Author(s): Tasanawan Soonklang, Robert I. Damper, Yannick Marchand

Keyword(s): Difficult Problem, Data Driven, Superior Performance, Asymptotic Performance, Dictionary Matching, European Languages, Test Sets, Unknown Words, The Relationship, Pronunciation By Analogy

Automatic pronunciation of unknown words (i.e., those not in the system dictionary) is a difficult problem in text-to-speech (TTS) synthesis. Many data-driven approaches have been applied to the problem, as a backup strategy for cases where dictionary matching fails. The difficulty of the problem depends on the complexity of the spelling-to-sound mappings of the language's writing system. Hence, the degree of success achieved varies widely not only across languages but also across dictionaries, even for the same language with the same method. Further, the sizes of the training and test sets are an important consideration in data-driven approaches. In this paper, we study the variation of letter-to-phoneme transcription accuracy across seven European languages with twelve different lexicons. We also study the relationship between dictionary size and the accuracy obtained. The largest dictionary of each language has been partitioned into ten approximately equal-sized subsets, which are combined to give ten different-sized test sets. In view of its superior performance in previous work, the transcription method used is pronunciation by analogy (PbA). Best results are obtained for Spanish, generally believed to have a very regular (‘shallow’) orthography, and poorest results for English, a language whose irregular spelling system is legendary. For those languages for which multiple dictionaries were available (i.e., French and English), results were found to vary across dictionaries. Regarding the relationship between dictionary size and transcription accuracy, we find that performance grows monotonically with dictionary size. However, the performance gain decelerates (tends to saturate) as the dictionary grows; the relation can be described by a simple logarithmic regression, one parameter of which (α) can be taken as quantifying the depth of orthography of a language.
We find that α for a language is significantly correlated with transcription performance on a small dictionary (approximately 10,000 words) for that language, but less so with asymptotic performance. This may be because our measure of asymptotic performance is unreliable, being extrapolated from the fitted logarithmic regression.
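The dictionary-size experiment can be sketched in Python under stated assumptions. `nested_subsets` shuffles a word list, splits it into ten roughly equal parts, and takes cumulative unions to produce ten increasingly large sets; `fit_log` then fits accuracy = a + b·ln(N) by ordinary least squares. Both function names, the shuffle, and the round-robin split are hypothetical choices. The abstract does not give the exact regression form or say which fitted parameter is α, so this sketch does not identify either a or b with α.

```python
import math
import random


def nested_subsets(words, k=10, seed=0):
    """Split `words` into k roughly equal parts and return the k cumulative unions.

    The shuffle and the round-robin split (words[i::k]) are assumptions made for
    illustration; part sizes differ by at most one word.
    """
    words = list(words)
    random.Random(seed).shuffle(words)
    parts = [words[i::k] for i in range(k)]
    cumulative, current = [], []
    for part in parts:
        current = current + part  # build a new list so earlier sets are not mutated
        cumulative.append(current)
    return cumulative


def fit_log(sizes, accuracies):
    """Ordinary least-squares fit of accuracy = a + b * ln(size); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(accuracies) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

Under this form the marginal gain from one extra word is b/N, which shrinks as the dictionary size N grows; that is one way to see the deceleration (saturation) the abstract describes, and also why an asymptote extrapolated from such a fit can be unreliable.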
