N-WAY COMPOSITION OF WEIGHTED FINITE-STATE TRANSDUCERS

Composition of weighted transducers is a fundamental algorithm used in many applications, including for computing complex edit-distances between automata, or string kernels in machine learning, or to combine different components of a speech recognition, speech synthesis, or information extraction system. We present a generalization of the composition of weighted transducers, n-way composition, which is dramatically faster in practice than the standard composition algorithm when combining more than two transducers. The worst-case complexity of our algorithm for composing three transducers T1, T2, and T3 resulting in T, is O(|T|Q min (d(T1)d(T3), d(T2)) + |T|E), where |·|Q denotes the number of states, |·|E the number of transitions, and d(·) the maximum out-degree. As in regular composition, the use of perfect hashing requires a pre-processing step with linear-time expected complexity in the size of the input transducers. In many cases, this approach significantly improves on the complexity of standard composition. Our algorithm also leads to a dramatically faster composition in practice. Furthermore, standard composition can be obtained as a special case of our algorithm. We report the results of several experiments demonstrating this improvement. These theoretical and empirical improvements significantly enhance performance in the applications already mentioned.

Download Full-text

Hidden semi-Markov Model based earthquake classification system using Weighted Finite-State Transducers

Nonlinear Processes in Geophysics ◽

10.5194/npg-18-81-2011 ◽

2011 ◽

Vol 18 (1) ◽

pp. 81-89 ◽

Cited By ~ 15

Author(s):

M. Beyreuther ◽

J. Wassermann

Keyword(s):

Seismic Data ◽

Speech Synthesis ◽

Markov Models ◽

Characteristic Functions ◽

Time Dependency ◽

General Applicability ◽

Earthquake Detection ◽

Finite State Transducers ◽

Finite State ◽

Weighted Finite State Transducers

Abstract. Automatic earthquake detection and classification is required for efficient analysis of large seismic datasets. Such techniques are particularly important now because access to measures of ground motion is nearly unlimited and the target waveforms (earthquakes) are often hard to detect and classify. Here, we propose to use models from speech synthesis which extend the double stochastic models from speech recognition by integrating a more realistic duration of the target waveforms. The method, which has general applicability, is applied to earthquake detection and classification. First, we generate characteristic functions from the time-series. The Hidden semi-Markov Models are estimated from the characteristic functions and Weighted Finite-State Transducers are constructed for the classification. We test our scheme on one month of continuous seismic data, which corresponds to 370 151 classifications, showing that incorporating the time dependency explicitly in the models significantly improves the results compared to Hidden Markov Models.

Download Full-text

Multilingual text analysis for text-to-speech synthesis

Natural Language Engineering ◽

10.1017/s1351324997001654 ◽

1996 ◽

Vol 2 (4) ◽

pp. 369-380 ◽

Cited By ~ 14

Author(s):

RICHARD SPROAT

Keyword(s):

Text Analysis ◽

Speech Synthesis ◽

Text To Speech ◽

Finite State Transducers ◽

Finite State ◽

Text To Speech Synthesis ◽

Weighted Finite State Transducers ◽

Multilingual Text ◽

Phonological Rules

We present a model of text analysis for text-to-speech (TTS) synthesis based on (weighted) finite state transducers, which serves as the text analysis module of the multilingual Bell Labs TTS system. The transducers are constructed using a lexical toolkit that allows declarative descriptions of lexicons, morphological rules, numeral-expansion rules, and phonological rules, inter alia. To date, the model has been applied to eight languages: Spanish, Italian, Romanian, French, German, Russian, Mandarin and Japanese.

Download Full-text

The Kestrel TTS text normalization system

Natural Language Engineering ◽

10.1017/s1351324914000175 ◽

2014 ◽

Vol 21 (3) ◽

pp. 333-353 ◽

Cited By ~ 12

Author(s):

PETER EBDEN ◽

RICHARD SPROAT

Keyword(s):

Speech Synthesis ◽

The Core ◽

System A ◽

Input Text ◽

Finite State ◽

Multiple Devices ◽

Text To Speech Synthesis ◽

Weighted Finite State Transducers ◽

Client Side ◽

Text Normalization

AbstractThis paper describes the Kestrel text normalization system, a component of the Google text-to-speech synthesis (TTS) system. At the core of Kestrel are text-normalization grammars that are compiled into libraries of weighted finite-state transducers (WFSTs). While the use of WFSTs for text normalization is itself not new, Kestrel differs from previous systems in its separation of the initialtokenization and classificationphase of analysis fromverbalization. Input text is first tokenized and different tokens classified using WFSTs. As part of the classification, detectedsemiotic classes– expressions such as currency amounts, dates, times, measure phases, are parsed into protocol buffers (https://code.google.com/p/protobuf/). The protocol buffers are then verbalized, with possible reordering of the elements, again using WFSTs. This paper describes the architecture of Kestrel, the protocol buffer representations of semiotic classes, and presents some examples of grammars for various languages. We also discuss applications and deployments of Kestrel as part of the Google TTS system, which runs on both server and client side on multiple devices, and is used daily by millions of people in nineteen languages and counting.

Download Full-text

Almost-linear time decoding algorithm for topological codes

Quantum ◽

10.22331/q-2021-12-02-595 ◽

2021 ◽

Vol 5 ◽

pp. 595

Author(s):

Nicolas Delfosse ◽

Naomi H. Nickerson

Keyword(s):

Minimum Distance ◽

Large Scale ◽

Quantum Computer ◽

Linear Time ◽

Decoding Algorithm ◽

Worst Case ◽

Case Complexity ◽

Fast Decoding ◽

Worst Case Complexity ◽

Toric Code

In order to build a large scale quantum computer, one must be able to correct errors extremely fast. We design a fast decoding algorithm for topological codes to correct for Pauli errors and erasure and combination of both errors and erasure. Our algorithm has a worst case complexity of O(nα(n)), where n is the number of physical qubits and α is the inverse of Ackermann's function, which is very slowly growing. For all practical purposes, α(n)≤3. We prove that our algorithm performs optimally for errors of weight up to (d−1)/2 and for loss of up to d−1 qubits, where d is the minimum distance of the code. Numerically, we obtain a threshold of 9.9% for the 2d-toric code with perfect syndrome measurements and 2.6% with faulty measurements.

Download Full-text

Composition of weighted finite transducers in MapReduce

Journal Of Big Data ◽

10.1186/s40537-020-00397-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Bilal Elghadyry ◽

Faissal Ouardi ◽

Sébastien Verel

Keyword(s):

Speech Processing ◽

Large Scale ◽

Large Scale Data ◽

Finite State Transducers ◽

Wide Range ◽

Finite State ◽

Common Operation ◽

Efficient Representation ◽

Weighted Finite State Transducers ◽

Np Hardness

AbstractWeighted finite-state transducers have been shown to be a general and efficient representation in many applications such as text and speech processing, computational biology, and machine learning. The composition of weighted finite-state transducers constitutes a fundamental and common operation between these applications. The NP-hardness of the composition computation problem presents a challenge that leads us to devise efficient algorithms on a large scale when considering more than two transducers. This paper describes a parallel computation of weighted finite transducers composition in MapReduce framework. To the best of our knowledge, this paper is the first to tackle this task using MapReduce methods. First, we analyze the communication cost of this problem using Afrati et al. model. Then, we propose three MapReduce methods based respectively on input alphabet mapping, state mapping, and hybrid mapping. Finally, intensive experiments on a wide range of weighted finite-state transducers are conducted to compare the proposed methods and show their efficiency for large-scale data.

Download Full-text

Average case complexity under the universal distribution equals worst-case complexity

Information Processing Letters ◽

10.1016/0020-0190(92)90138-l ◽

1992 ◽

Vol 42 (3) ◽

pp. 145-149 ◽

Cited By ~ 32

Author(s):

Ming Li ◽

Paul M.B. Vitányi

Keyword(s):

Worst Case ◽

Average Case ◽

Case Complexity ◽

Universal Distribution ◽

Average Case Complexity ◽

Worst Case Complexity

Download Full-text

The worst case complexity of the fredholm equation of the second kind with non-periodic free term and noise information

Numerical Functional Analysis and Optimization ◽

10.1080/01630569808816831 ◽

1998 ◽

Vol 19 (3-4) ◽

pp. 329-343 ◽

Cited By ~ 2

Author(s):

Tianzi Jiang

Keyword(s):

Fredholm Equation ◽

Free Term ◽

Worst Case ◽

Case Complexity ◽

Worst Case Complexity

Download Full-text

European language translation with weighted finite state transducers

Proceedings of the Third Workshop on Statistical Machine Translation - StatMT '08 ◽

10.3115/1626394.1626410 ◽

2008 ◽

Cited By ~ 1

Author(s):

Graeme Blackwood ◽

Adrià de Gispert ◽

Jamie Brunning ◽

William Byrne

Keyword(s):

Language Translation ◽

European Language ◽

Finite State Transducers ◽

Finite State ◽

Weighted Finite State Transducers

Download Full-text

Inductive Synthesis for Probabilistic Programs Reaches New Horizons

Tools and Algorithms for the Construction and Analysis of Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-030-72016-2_11 ◽

2021 ◽

pp. 191-209

Author(s):

Roman Andriushchenko ◽

Milan Češka ◽

Sebastian Junges ◽

Joost-Pieter Katoen

Keyword(s):

Deductive Reasoning ◽

Synthesis Process ◽

Worst Case ◽

New Horizons ◽

The Family ◽

Pruning Strategy ◽

Finite State ◽

Novel Method ◽

Partially Observable ◽

Probabilistic Programs

AbstractThis paper presents a novel method for the automated synthesis of probabilistic programs. The starting point is a program sketch representing a finite family of finite-state Markov chains with related but distinct topologies, and a reachability specification. The method builds on a novel inductive oracle that greedily generates counter-examples (CEs) for violating programs and uses them to prune the family. These CEs leverage the semantics of the family in the form of bounds on its best- and worst-case behaviour provided by a deductive oracle using an MDP abstraction. The method further monitors the performance of the synthesis and adaptively switches between inductive and deductive reasoning. Our experiments demonstrate that the novel CE construction provides a significantly faster and more effective pruning strategy leading to an accelerated synthesis process on a wide range of benchmarks. For challenging problems, such as the synthesis of decentralized partially-observable controllers, we reduce the run-time from a day to minutes.

Download Full-text

Advancing Clinical Cohort Selection with Genomics Analysis on a Distributed Platform

10.21203/rs.2.9249/v1 ◽

2019 ◽

Author(s):

Jaclyn Marjorie Smith ◽

Melvin Lathara ◽

Hollis Wright ◽

Brian Hill ◽

Nalini Ganapati ◽

...

Keyword(s):

Precision Medicine ◽

Large Scale ◽

Linear Time ◽

Distributed Storage ◽

Treatment Options ◽

Ease Of Use ◽

Time Data ◽

Worst Case ◽

Research Perspective ◽

Medicine Analysis

Abstract Background The affordability of next-generation genomic sequencing and the improvement of medical data management have contributed largely to the evolution of biological analysis from both a clinical and research perspective. Precision medicine is a response to these advancements that places individuals into better-defined subsets based on shared clinical and genetic features. The identification of personalized diagnosis and treatment options is dependent on the ability to draw insights from large-scale, multi-modal analysis of biomedical datasets. Driven by a real use case, we premise that platforms that support precision medicine analysis should maintain data in their optimal data stores, should support distributed storage and query mechanisms, and should scale as more samples are added to the system. Results We extended a genomics-based columnar data store, GenomicsDB, for ease of use within a distributed analytics platform for clinical and genomic data integration, known as the ODA framework. The framework supports interaction from an i2b2 plugin as well as a notebook environment. We show that the ODA framework exhibits worst-case linear scaling for array size (storage), import time (data construction), and query time for an increasing number of samples. We go on to show worst-case linear time for both import of clinical data and aggregate query execution time within a distributed environment. Conclusions This work highlights the integration of a distributed genomic database with a distributed compute environment to support scalable and efficient precision medicine queries from a HIPAA-compliant, cohort system in a real-world setting. The ODA framework is currently deployed in production to support precision medicine exploration and analysis from clinicians and researchers at UCLA David Geffen School of Medicine.

Download Full-text