Weighted Finite-State Transducers
Recently Published Documents


TOTAL DOCUMENTS: 69 (FIVE YEARS: 4)

H-INDEX: 11 (FIVE YEARS: 1)

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Bilal Elghadyry ◽  
Faissal Ouardi ◽  
Sébastien Verel

Abstract Weighted finite-state transducers have been shown to be a general and efficient representation in many applications such as text and speech processing, computational biology, and machine learning. The composition of weighted finite-state transducers is a fundamental operation common to these applications. The NP-hardness of the composition computation problem is a challenge that leads us to devise algorithms that remain efficient at large scale when more than two transducers are composed. This paper describes a parallel computation of weighted finite-state transducer composition in the MapReduce framework. To the best of our knowledge, this paper is the first to tackle this task using MapReduce methods. First, we analyze the communication cost of this problem using the model of Afrati et al. Then, we propose three MapReduce methods based respectively on input-alphabet mapping, state mapping, and hybrid mapping. Finally, extensive experiments on a wide range of weighted finite-state transducers are conducted to compare the proposed methods and show their efficiency on large-scale data.
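
Since pairwise composition is the building block that the three MapReduce methods distribute, a minimal single-machine sketch may help fix ideas. This is the textbook product construction over the tropical semiring, written in Python; the data layout and function names are illustrative, not from the paper, and epsilon transitions are ignored for brevity:

```python
from collections import defaultdict, deque

# A toy WFST over the tropical semiring: "times" is +, "plus" is min.
# Representation: {"start": q0, "arcs": {state: [(in, out, weight, next), ...]},
#                  "finals": {state: final_weight}}.
# Epsilon labels are not handled; real toolkits (e.g. OpenFst) use
# composition filters to treat them correctly.

def compose(t1, t2):
    """Product construction: pair up states, matching t1's output labels
    against t2's input labels and combining weights with tropical times (+)."""
    start = (t1["start"], t2["start"])
    arcs, finals = defaultdict(list), {}
    queue, seen = deque([start]), {start}
    while queue:
        q = q1, q2 = queue.popleft()
        if q1 in t1["finals"] and q2 in t2["finals"]:
            finals[q] = t1["finals"][q1] + t2["finals"][q2]
        for i, m1, w1, n1 in t1["arcs"].get(q1, []):
            for m2, o, w2, n2 in t2["arcs"].get(q2, []):
                if m1 == m2:                      # middle labels must agree
                    nxt = (n1, n2)
                    arcs[q].append((i, o, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "arcs": dict(arcs), "finals": finals}

# Tiny usage example: a:b composed with b:c yields a:c with weight 0.75.
t1 = {"start": 0, "arcs": {0: [("a", "b", 0.5, 1)]}, "finals": {1: 0.0}}
t2 = {"start": 0, "arcs": {0: [("b", "c", 0.25, 1)]}, "finals": {1: 0.0}}
print(compose(t1, t2))
```

The paired-state construction is also why composing a chain of transducers blows up: the state space of the result is a subset of the Cartesian product of the inputs' state spaces, which motivates distributing the work.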


2019 ◽  
Vol 25 (2) ◽  
pp. 307-321 ◽  
Author(s):  
Izaskun Etxeberria ◽  
Iñaki Alegria ◽  
Larraitz Uria

Abstract This paper presents a study of methods for the normalization of historical texts. The aim of these methods is to learn relations between historical and contemporary word forms. We have compiled training and test corpora for different languages and scenarios, and we have tried to interpret the results in light of the features of the corpora and languages. Our proposed method, based on weighted finite-state transducers, is compared to previously published ones. The method learns to map phonological changes using a noisy channel model; it is a simple solution that can exploit a limited amount of supervision to achieve adequate performance. The compiled corpora are made available so that other researchers can compare results on them. Concerning the amount of supervision, we investigate how the size of the training corpus affects the results and identify some factors that help to anticipate the difficulty of the task.
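
As an illustration of the noisy channel idea (not the authors' transducer implementation): a toy normalizer that scores contemporary candidates by a weighted edit distance (the channel) minus the log probability of a lexicon entry (the source). Every cost, word, and probability below is invented for the example:

```python
import math

# Toy noisy-channel normalizer: pick the contemporary word w minimizing
# channel_cost(historical | w) - log P(w). In the paper the channel is a
# weighted finite-state transducer learned from aligned word pairs; here it
# is a plain weighted-edit-distance dynamic program.

SUB_COST = {("v", "u"): 0.3, ("y", "i"): 0.3}  # cheap, historically common swaps
DEFAULT_SUB = 1.0
INDEL = 1.0

def channel_cost(hist, word):
    """Weighted Levenshtein distance standing in for -log P(hist | word)."""
    n, m = len(hist), len(word)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * INDEL
    for j in range(1, m + 1):
        d[0][j] = j * INDEL
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if hist[i - 1] == word[j - 1]:
                sub = 0.0
            else:
                sub = SUB_COST.get((hist[i - 1], word[j - 1]), DEFAULT_SUB)
            d[i][j] = min(d[i - 1][j - 1] + sub,
                          d[i - 1][j] + INDEL,
                          d[i][j - 1] + INDEL)
    return d[n][m]

def normalize(hist, lexicon):
    """lexicon maps contemporary words to probabilities (the source model)."""
    return min(lexicon, key=lambda w: channel_cost(hist, w) - math.log(lexicon[w]))

print(normalize("vnyted", {"united": 0.7, "untied": 0.3}))  # -> united
```

Training such a model amounts to estimating the substitution costs from aligned historical/contemporary pairs, which is exactly where the limited supervision enters.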


2017 ◽  
Author(s):  
Lars Hellsten ◽  
Brian Roark ◽  
Prasoon Goyal ◽  
Cyril Allauzen ◽  
Françoise Beaufays ◽  
...  

2016 ◽  
Author(s):  
Ian Holmes

Abstract We describe a strategy for constructing codes for DNA-based information storage by serial composition of weighted finite-state transducers. The resulting state machines can integrate correction of substitution errors; synchronization by interleaving watermark and periodic marker signals; conversion from binary to ternary, quaternary or mixed-radix sequences via an efficient block code; encoding into a DNA sequence that avoids homopolymer, dinucleotide, or trinucleotide runs and other short local repeats; and detection/correction of errors (including local duplications, burst deletions, and substitutions) that are characteristic of DNA sequencing technologies. We present software implementing these codes, available at https://github.com/ihh/dnastore, with simulation results demonstrating that the generated DNA is free of short repeats and can be accurately decoded even in the presence of substitutions, short duplications and deletions.
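
For intuition about the homopolymer-avoiding encoding step, here is a toy rotating code in a similar spirit: it writes the message in base 3 and emits, at each position, one of the three bases that differ from the previous one, so adjacent bases can never repeat. This is only an illustration; dnastore's actual codes are built by transducer composition and layer watermarking and error correction on top, none of which appears here:

```python
# Toy homopolymer-free code: base-3 digits select among the three bases that
# differ from the previously emitted base. The 0x01 sentinel byte preserves
# the message length across the integer round trip.

BASES = "ACGT"

def encode(data: bytes) -> str:
    n = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while n:
        n, d = divmod(n, 3)
        digits.append(d)                 # base-3, least significant first
    seq, prev = [], "T"                  # sentinel: always exactly 3 choices
    for d in digits:
        choices = [b for b in BASES if b != prev]
        prev = choices[d]
        seq.append(prev)
    return "".join(seq)

def decode(seq: str) -> bytes:
    digits, prev = [], "T"
    for base in seq:
        choices = [b for b in BASES if b != prev]
        digits.append(choices.index(base))
        prev = base
    n = 0
    for d in reversed(digits):           # most significant digit first
        n = n * 3 + d
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]                       # drop the 0x01 sentinel

assert decode(encode(b"hello")) == b"hello"
```

Because each emitted base excludes its predecessor, each position carries log2(3) bits, which is the rate the abstract's binary-to-ternary block code is approximating.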


2014 ◽  
Vol 21 (3) ◽  
pp. 333-353 ◽  
Author(s):  
PETER EBDEN ◽  
RICHARD SPROAT

Abstract This paper describes the Kestrel text normalization system, a component of the Google text-to-speech synthesis (TTS) system. At the core of Kestrel are text-normalization grammars that are compiled into libraries of weighted finite-state transducers (WFSTs). While the use of WFSTs for text normalization is itself not new, Kestrel differs from previous systems in its separation of the initial tokenization and classification phase of analysis from verbalization. Input text is first tokenized, and the different tokens are classified using WFSTs. As part of the classification, detected semiotic classes (expressions such as currency amounts, dates, times, and measure phrases) are parsed into protocol buffers (https://code.google.com/p/protobuf/). The protocol buffers are then verbalized, with possible reordering of the elements, again using WFSTs. This paper describes the architecture of Kestrel and the protocol buffer representations of semiotic classes, and presents some examples of grammars for various languages. We also discuss applications and deployments of Kestrel as part of the Google TTS system, which runs server-side and client-side on multiple devices and is used daily by millions of people in nineteen languages and counting.
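
The classify-then-verbalize flow can be sketched in miniature. This Python stand-in is not Kestrel (whose grammars are compiled WFSTs); a regular expression classifies a money token into a dict that plays the role of the protocol buffer, and a small lookup verbalizes it with field reordering:

```python
import re

# Phase 1: classify tokens into structured records (dicts stand in for
# protocol buffers). Phase 2: verbalize the records. The regex, the number
# grammar, and the example sentence are all invented for illustration.

MONEY = re.compile(r"\$(?P<units>\d+)\.(?P<cents>\d{2})")
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TENS = ["ten", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def num(n):  # toy number-name grammar, 0..99 only
    if n < 10:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens - 1] + ("" if ones == 0 else " " + ONES[ones])

def classify(token):
    m = MONEY.fullmatch(token)
    if m:  # an instance of the "money" semiotic class
        return {"class": "money", "currency": "usd",
                "units": int(m["units"]), "cents": int(m["cents"])}
    return {"class": "token", "text": token}

def verbalize(rec):
    if rec["class"] == "money":
        # Reordering: the "$" sigil precedes the digits in text but the
        # currency word follows the number in speech.
        return f'{num(rec["units"])} dollars {num(rec["cents"])} cents'
    return rec["text"]

text = "it costs $3.50 today"
print(" ".join(verbalize(classify(t)) for t in text.split()))
# -> it costs three dollars fifty cents today
```

The value of the intermediate record is the same as in Kestrel's design: classification decides what a token is once, and language-specific verbalizers only consume the structured fields.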


2014 ◽  
Vol 40 (4) ◽  
pp. 733-761
Author(s):  
Richard Sproat ◽  
Mahsa Yarmohammadi ◽  
Izhak Shafran ◽  
Brian Roark

This paper explores lexicographic semirings and their application to problems in speech and language processing. Specifically, we present two instantiations of binary lexicographic semirings, one involving a pair of tropical weights, and the other a tropical weight paired with a novel string semiring we term the categorial semiring. The first of these is used to yield an exact encoding of backoff models with epsilon transitions. This lexicographic language model semiring allows for off-line optimization of exact models represented as large weighted finite-state transducers in contrast to implicit (on-line) failure transition representations. We present empirical results demonstrating that, even in simple intersection scenarios amenable to the use of failure transitions, the use of the more powerful lexicographic semiring is competitive in terms of time of intersection. The second of these lexicographic semirings is applied to the problem of extracting, from a lattice of word sequences tagged for part of speech, only the single best-scoring part of speech tagging for each word sequence. We do this by incorporating the tags as a categorial weight in the second component of a 〈Tropical, Categorial〉 lexicographic semiring, determinizing the resulting word lattice acceptor in that semiring, and then mapping the tags back as output labels of the word lattice transducer. We compare our approach to a competing method due to Povey et al. (2012).
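
The first of these instantiations, a pair of tropical weights combined lexicographically, is easy to state concretely. A minimal Python sketch follows; the operation names and the example weights are illustrative, not taken from the paper:

```python
import math

# A <Tropical, Tropical> lexicographic semiring: "plus" keeps the
# lexicographically smaller pair, "times" adds componentwise.

ZERO = (math.inf, math.inf)   # additive identity: never wins a min
ONE = (0.0, 0.0)              # multiplicative identity

def plus(a, b):
    """Path selection: smaller first component wins; ties fall to the second."""
    return a if a <= b else b  # Python tuple comparison is lexicographic

def times(a, b):
    """Path extension: tropical times is + in each component."""
    return (a[0] + b[0], a[1] + b[1])

# Two paths with equal primary cost are disambiguated by the secondary
# component, which is how an exact backoff encoding can separate an epsilon
# backoff arc from a direct n-gram arc of the same tropical weight:
p1 = times((2.0, 0.0), (1.0, 1.0))   # ends with the penalized arc
p2 = times((2.5, 0.0), (0.5, 0.0))
print(plus(p1, p2))                   # -> (3.0, 0.0): p2 is preferred
```

Distributivity holds because times adds the same pair to both arguments of plus, which preserves their lexicographic order, so the pair construction is a genuine semiring and standard shortest-distance and determinization algorithms apply unchanged.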

