Finite state methods in natural language processing

Finite state methods have been in common use in various areas of natural language processing (NLP) for many years. A series of specialized workshops in this area illustrates this. In 1996, András Kornai organized a very successful workshop entitled Extended Finite State Models of Language. One of the results of that workshop was a special issue of Natural Language Engineering (Volume 2, Number 4). In 1998, Kemal Oflazer organized a workshop called Finite State Methods in Natural Language Processing. A selection of submissions for this workshop was later included in a special issue of Computational Linguistics (Volume 26, Number 1). Inspired by these events, Lauri Karttunen, Kimmo Koskenniemi and Gertjan van Noord took the initiative for a workshop on finite state methods in NLP in Helsinki, as part of the European Summer School in Language, Logic and Information. As a related special event, the 20th anniversary of two-level morphology was celebrated. The appreciation of these events led us to believe that once again it should be possible, with some additional submissions, to compose an interesting special issue of this journal.

Finite-state methods and models in natural language processing

Natural Language Engineering, 2011, Vol 17 (2), pp. 141-144, DOI 10.1017/s1351324911000015

Author(s): Anssi Yli-Jyrä, András Kornai, Jacques Sakarovitch

Keyword(s): Natural Language Processing, Natural Language, Computational Linguistics, Language Processing, Current Issue, Special Issue, The Public, The Past, Finite State, Final Selection

For the past two decades, specialised events on finite-state methods have been successful in presenting interesting studies on natural language processing to the public through journals and collections. The FSMNLP workshops have become well-known among researchers and are now the main forum of the Association for Computational Linguistics' (ACL) Special Interest Group on Finite-State Methods (SIGFSM). The current issue on finite-state methods and models in natural language processing was planned in 2008 in this context as a response to a call for special issue proposals. In 2010, the issue received a total of sixteen submissions, some of which were extended and updated versions of workshop papers, and others that were completely new. The final selection, consisting of only seven papers that could fit into one issue, is not fully representative, but it complements the prior special issues well. The selected papers showcase a few areas where finite-state methods have less than obvious and sometimes even groundbreaking relevance to natural language processing (NLP) applications.

Extended Finite State Models of Language. András Kornai (editor) (BBN Technologies). Cambridge University Press (Studies in Natural Language Processing), 1999, xii+278 pp and CD-ROM; hardbound, ISBN 0-521-63198-X, $59.95

Computational Linguistics, 2000, Vol 26 (2), pp. 282-285, DOI 10.1162/coli.2000.26.2.282

Author(s): Ed Kaiser

Keyword(s): Natural Language Processing, Natural Language, Language Processing, CD-ROM, Cambridge University, Finite State, State Models, Finite State Models

Finite-State Technology

The Oxford Handbook of Computational Linguistics 2nd edition, 2018, DOI 10.1093/oxfordhb/9780199573691.013.39

Author(s): Mans Hulden

Keyword(s): Natural Language Processing, Natural Language, Computational Linguistics, Language Processing, Finite State Machines, Regular Languages, Finite State Automata, State Machines, Computational Phonology, Finite State

Finite-state machines—automata and transducers—are ubiquitous in natural-language processing and computational linguistics. This chapter introduces the fundamentals of finite-state automata and transducers, both probabilistic and non-probabilistic, illustrating the technology with example applications and common usage. It also covers the construction of transducers, which correspond to regular relations, and automata, which correspond to regular languages. The technologies introduced are widely employed in natural language processing, computational phonology and morphology in particular, and this is illustrated through common practical use cases.
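The distinction drawn in this abstract between automata (regular languages) and transducers (regular relations) can be illustrated with a minimal Python sketch. The two machines below are toy examples invented for illustration and are not taken from the chapter: the acceptor recognises the language (ab)*, and the transducer pairs each string of (ab)* with the corresponding string of (xy)*.

```python
from typing import Dict, Optional, Set, Tuple

# Toy acceptor for the regular language (ab)*; state 0 is initial and final.
ACCEPTOR: Dict[Tuple[int, str], int] = {(0, "a"): 1, (1, "b"): 0}

# Toy transducer for the regular relation pairing (ab)* with (xy)*.
# Each arc maps (state, input symbol) to (next state, output string).
TRANSDUCER: Dict[Tuple[int, str], Tuple[int, str]] = {
    (0, "a"): (1, "x"),
    (1, "b"): (0, "y"),
}

FINAL_STATES: Set[int] = {0}


def accepts(word: str) -> bool:
    """Return True if the acceptor ends in a final state after reading word."""
    state = 0
    for symbol in word:
        if (state, symbol) not in ACCEPTOR:
            return False  # no arc for this symbol: reject
        state = ACCEPTOR[(state, symbol)]
    return state in FINAL_STATES


def transduce(word: str) -> Optional[str]:
    """Return the output string for word, or None if word is not in the domain."""
    state, output = 0, []
    for symbol in word:
        if (state, symbol) not in TRANSDUCER:
            return None
        state, piece = TRANSDUCER[(state, symbol)]
        output.append(piece)
    return "".join(output) if state in FINAL_STATES else None


print(accepts("abab"), accepts("aba"))     # True False
print(transduce("abab"), transduce("ba"))  # xyxy None
```

Both machines here are deterministic and unweighted; the probabilistic variants mentioned in the abstract would attach a weight to each arc, which this sketch leaves out.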

Finite-state methods in natural language processing and mathematics of language. Introduction to the special issue

Journal of Language Modelling, 2019, Vol 7 (2), pp. 1, DOI 10.15398/jlm.v7i2.248

Author(s): Frank Drewes, Makoto Kanazawa

Keyword(s): Natural Language Processing, Natural Language, Language Processing, Special Issue, Finite State, And Mathematics, Mathematics Of Language

MorphoBr: an open source large-coverage full-form lexicon for morphological analysis of Portuguese

Texto Livre Linguagem e Tecnologia, 2018, Vol 11 (3), pp. 1-25, DOI 10.17851/1983-3652.11.3.1-25

Author(s): Leonel Figueiredo de Alencar, Bruno Cuconato, Alexandre Rademaker

Keyword(s): Natural Language Processing, Natural Language, Open Source, Computational Linguistics, Language Processing, Morphological Analysis, Computational Techniques, Processing Technologies, Finite State, Full Form

ABSTRACT: One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for morphological analysis of Portuguese. A first version of the resource is already freely available online under an open source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps, and systematically adding new entries. In comparison to the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion in the inventory of nouns and adjectives. This was carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology. This allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative which will directly contribute to the development of more robust natural language processing tools and applications which depend on wide-coverage morphological analysis.

KEYWORDS: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.
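To make the idea of compiling a full-form lexicon into a finite-state device concrete, here is a minimal Python sketch. It is an illustration only: the entry format, the tag names (`+N`, `+DIM`, `+M`, `+SG`, `+PL`) and the function names are assumptions made for this example, not MorphoBr's actual format or tooling. The lexicon is compiled into a trie, which is a deterministic acyclic automaton whose accepting states carry the analyses.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical full-form entries: (surface form, analysis as lemma plus tags).
ENTRIES: List[Tuple[str, str]] = [
    ("gato", "gato+N+M+SG"),
    ("gatos", "gato+N+M+PL"),
    ("gatinho", "gato+N+DIM+M+SG"),
    ("gatinhos", "gato+N+DIM+M+PL"),
]


class Node:
    """One state of the automaton; arcs are labelled with single characters."""

    def __init__(self) -> None:
        self.arcs: Dict[str, "Node"] = {}
        self.analyses: List[str] = []  # non-empty only at accepting states


def compile_lexicon(entries: Iterable[Tuple[str, str]]) -> Node:
    """Build a trie in which shared prefixes of surface forms share states."""
    root = Node()
    for form, analysis in entries:
        node = root
        for char in form:
            node = node.arcs.setdefault(char, Node())
        node.analyses.append(analysis)
    return root


def analyze(root: Node, form: str) -> List[str]:
    """Follow the arcs for form and return its analyses (empty if unknown)."""
    node = root
    for char in form:
        if char not in node.arcs:
            return []
        node = node.arcs[char]
    return list(node.analyses)


lexicon = compile_lexicon(ENTRIES)
print(analyze(lexicon, "gatinho"))  # ['gato+N+DIM+M+SG']
print(analyze(lexicon, "gat"))      # []
```

A trie only shares prefixes; finite-state toolkits additionally minimise the automaton so that shared suffixes are merged as well, which this sketch does not attempt.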

Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction

Computational Linguistics, 2020, Vol 46 (2), pp. 249-255, DOI 10.1162/coli_a_00373

Author(s): Marta R. Costa-jussà, Cristina España-Bonet, Pascale Fung, Noah A. Smith

Keyword(s): Natural Language Processing, Natural Language, Computational Linguistics, Language Processing, Semantic Representations, Special Issue

We introduce the Computational Linguistics special issue on Multilingual and Interlingual Semantic Representations for Natural Language Processing. We situate the special issue’s five articles in the context of our fast-changing field, explaining our motivation for this project. We offer a brief summary of the work in the issue, which includes developments on lexical and sentential semantic representations, from symbolic and neural perspectives.

Editorial Note

Natural Language Engineering, 2010, Vol 16 (1), pp. 1-2, DOI 10.1017/s1351324909990246

Author(s): Ruslan Mitkov

Keyword(s): Natural Language Processing, Natural Language, Computational Linguistics, Language Processing, Industrial Applications, Editorial Note, Original Research, Language Engineering, Practical Applications, Market Opportunities

Natural Language Engineering (NLE) enters the second decade of the twenty-first century having established itself as a leading forum for high-quality articles covering all aspects of applied Natural Language Processing research, including, but not limited to, the engineering of natural language methods and applications. It continues to promote first-class original research and bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use. The journal has responded in several ways to the ongoing interest in and growth of research in this area. In 2007 NLE increased its number of pages per issue, thus enabling the publication of more articles. As of January 2010, new publication types are also promoted. In addition to welcoming articles which report on original, unpublished research, the journal now invites surveys presenting the state of the art in important areas of Natural Language Engineering and Natural Language Processing (such as tasks, tools, resources or applications) as well as squibs discussing specific problems. Book reviews and reports on industrial applications will continue to have a prominent place in the Journal. Conference reports, comparative discussions of Natural Language Engineering products and policy-orientated papers examining, for example, funding programmes or market opportunities, are welcome too. Special issues will remain an important feature of the Journal. We envisage one special issue per year, on average. Special issues are selected on a competitive basis after regular calls for proposals.

Computational Phonology

Linguistics, 2019, DOI 10.1093/obo/9780199772810-0249

Author(s): Jane Chandlee

Keyword(s): Computational Linguistics, Language Processing, Optimality Theory, Learning Algorithms, Point Of Interest, Computational Phonology, Language Technology, Finite State, State Models, Finite State Models

Much like the term “computational linguistics”, the term “computational phonology” has come to mean different things to different people. Research grounded in a variety of methodologies and formalisms can be included in its scope. The common thread of the research that falls under this umbrella term is the use of computational methods to investigate questions of interest in phonology, primarily how to delimit the set of possible phonological patterns from the larger set of “logically possible” patterns and how those patterns are learned. Computational phonology arguably began with the foundational result that Sound Pattern of English (SPE) rules are regular relations (provided they can’t recursively apply to their own structural change), which means they can be modeled with finite-state transducers (FSTs) and that a system of ordered rules can be composed into a single FST. The significance of this result can be seen in the prominence of finite-state models both in theoretical phonology research and in more applied areas like natural language processing and human language technology. The shift in the field of phonology from rule-based grammars to constraint-based frameworks like Optimality Theory (OT) initially sparked interest in the question of how to model OT with FSTs and thereby preserve the noted restriction of phonology to the complexity level of regular. But an additional point of interest for computational work on OT stemmed from the ways in which its architecture readily lends itself to the development of learning algorithms and models, including statistical approaches that address recognized challenges such as gradient acceptability, process optionality, and the learning of underlying forms and hidden structure. Another line of research has taken on the question of to what extent phonology is not just regular, but subregular, meaning describable with proper subclasses of the regular languages and relations. The advantages of subregular modeling of phonological phenomena are argued to be stronger typological explanations, in that the computational properties that establish the subclasses as properly subregular restrict the kinds of phenomena that can be described in desirable ways. Also, these same restrictions lead directly to provably correct learning algorithms. Once again this work has made extensive use of the finite-state formalism, but it has also employed logical characterizations that more readily extend from strings to non-linear phenomena such as autosegmental representations and syllable structure.
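The claim that a system of ordered rules can be composed into a single FST can be sketched in Python under strong simplifying assumptions. The two rules below (`t -> d / a __` and `s -> z / d __`), the five-symbol alphabet and all function names are invented for illustration. Each rule looks only at the preceding input symbol, so it can be written as a deterministic transducer without lookahead; general SPE-style rules with right contexts would need a richer construction than this one.

```python
from itertools import product
from typing import Dict, Hashable, Tuple

ALPHABET = "adstz"
# A transducer table maps (state, input symbol) to (next state, output string).
Table = Dict[Tuple[Hashable, str], Tuple[Hashable, str]]


def left_context_rule(target: str, replacement: str, context: str) -> Table:
    """Table for the toy rule: target -> replacement / context __ .

    State "C" means the previous input symbol was the context; "N" otherwise.
    """
    table: Table = {}
    for state, symbol in product("NC", ALPHABET):
        out = replacement if state == "C" and symbol == target else symbol
        table[(state, symbol)] = ("C" if symbol == context else "N", out)
    return table


def run(table: Table, start: Hashable, word: str) -> str:
    """Apply a transducer to word in a single left-to-right pass."""
    state, pieces = start, []
    for symbol in word:
        state, piece = table[(state, symbol)]
        pieces.append(piece)
    return "".join(pieces)


def compose(first: Table, second: Table) -> Tuple[Table, Hashable]:
    """Build one transducer that feeds the output of first into second."""
    start = ("N", "N")
    table: Table = {}
    todo, seen = [start], {start}
    while todo:
        q1, q2 = todo.pop()
        for symbol in ALPHABET:
            n1, middle = first[(q1, symbol)]
            n2, out = q2, ""
            for mid_symbol in middle:  # run second on first's output
                n2, piece = second[(n2, mid_symbol)]
                out += piece
            table[((q1, q2), symbol)] = ((n1, n2), out)
            if (n1, n2) not in seen:
                seen.add((n1, n2))
                todo.append((n1, n2))
    return table, start


voicing = left_context_rule("t", "d", "a")       # t -> d / a __
assimilation = left_context_rule("s", "z", "d")  # s -> z / d __
combined, combined_start = compose(voicing, assimilation)

print(run(assimilation, "N", run(voicing, "N", "ats")))  # adz
print(run(combined, combined_start, "ats"))              # adz
```

The composed machine reads the input once and produces the same result as applying the two rules in order. Its states are pairs of states of the component transducers, which is why the composition of finite-state rules stays finite-state.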
