Abstract

Grapheme-to-phoneme conversion is one of the
main subsystems of Text-to-Speech (TTS) systems. Converting
a sequence of written words to the corresponding
phoneme sequence is more challenging for Persian
than for other languages, because the standard orthography
of this language omits the short vowels
and the pronunciation of words depends on their position
in a sentence. Common approaches used in Persian
commercial TTS systems have several modules and complicated
models for natural language processing and homograph
disambiguation that make implementation
harder and reduce the overall precision of the system.
In this paper we define grapheme-to-phoneme conversion
as a sequence labeling problem and use modified
Recurrent Neural Networks (RNNs) to create an
integrated model for this purpose. The recurrent networks
are made bidirectional and equipped with
Long Short-Term Memory (LSTM) blocks so that they can use most
of the past and future contextual information for decision
making. The experiments conducted in this paper show
that, in addition to having a unified structure, the bidirectional
RNN-LSTM performs well in recognizing
the pronunciation of Persian sentences, with a precision
of more than 98 percent.
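
The sequence-labeling formulation described above can be illustrated with a minimal sketch of a bidirectional LSTM forward pass: one LSTM reads the grapheme sequence left to right, a second reads it right to left, and the two hidden states at each position are combined to give a distribution over phoneme labels for that grapheme. This is not the authors' implementation. The layer sizes, the random untrained weights, the omission of bias terms, the one-hot grapheme encoding, and all names (`LSTMCell`, `bilstm_label_probs`, `grapheme_ids`) are assumptions made only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these from data.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(v):
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [a / total for a in exps]


class LSTMCell:
    """Single LSTM layer (input, forget, output gates; biases omitted)."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One weight matrix per gate, applied to [input ; previous hidden].
        self.w = {g: make_matrix(n_hid, n_in + n_hid, rng) for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.w["i"], z)]
        f = [sigmoid(a) for a in matvec(self.w["f"], z)]
        o = [sigmoid(a) for a in matvec(self.w["o"], z)]
        g = [math.tanh(a) for a in matvec(self.w["c"], z)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        h, c = [0.0] * self.n_hid, [0.0] * self.n_hid
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm_label_probs(xs, fwd, bwd, w_out):
    """Return one label distribution per input position."""
    h_fwd = fwd.run(xs)              # past context at each position
    h_bwd = bwd.run(xs[::-1])[::-1]  # future context, re-aligned to positions
    return [softmax(matvec(w_out, f + b)) for f, b in zip(h_fwd, h_bwd)]


def one_hot(index, size):
    return [1.0 if j == index else 0.0 for j in range(size)]


# Hypothetical sizes: 5 grapheme types, 4 hidden units, 6 phoneme labels.
N_GRAPHEMES, N_HIDDEN, N_LABELS = 5, 4, 6
rng = random.Random(0)
fwd = LSTMCell(N_GRAPHEMES, N_HIDDEN, rng)
bwd = LSTMCell(N_GRAPHEMES, N_HIDDEN, rng)
w_out = make_matrix(N_LABELS, 2 * N_HIDDEN, rng)

grapheme_ids = [0, 3, 1, 4]  # a made-up four-grapheme input
xs = [one_hot(i, N_GRAPHEMES) for i in grapheme_ids]
probs = bilstm_label_probs(xs, fwd, bwd, w_out)
```

Because the output has exactly one distribution per grapheme, a label for a position can encode the consonant together with an unwritten short vowel, which is one way the omitted vowels could be recovered under this formulation. With untrained weights the distributions are close to uniform; the sketch shows only the data flow, not the reported accuracy.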