An Extendible Regular Expression Compiler for Finite-State Approaches in Natural Language Processing

Finite-State Technology

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.39 ◽

2018 ◽

Author(s):

Mans Hulden

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Finite State Machines ◽

Regular Languages ◽

Finite State Automata ◽

State Machines ◽

Computational Phonology ◽

Finite State

Finite-state machines—automata and transducers—are ubiquitous in natural-language processing and computational linguistics. This chapter introduces the fundamentals of finite-state automata and transducers, both probabilistic and non-probabilistic, illustrating the technology with example applications and common usage. It also covers the construction of transducers, which correspond to regular relations, and automata, which correspond to regular languages. The technologies introduced are widely employed in natural language processing, computational phonology and morphology in particular, and this is illustrated through common practical use cases.
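
To make the correspondence in the abstract above concrete, the following is a minimal sketch of a finite-state automaton (accepting a regular language) and a finite-state transducer (realizing a regular relation). The dictionary encoding, the function names `accepts` and `transduce`, and the two example machines are assumptions chosen for illustration; they are not taken from the chapter.

```python
# A minimal sketch: a deterministic automaton and a deterministic transducer,
# both encoded as plain transition dictionaries (an illustrative encoding).

def accepts(transitions, start, finals, word):
    """Return True if the automaton accepts `word`."""
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition defined for this symbol: reject
        state = transitions[key]
    return state in finals


def transduce(transitions, start, finals, word):
    """Return the output string for `word`, or None if the input is rejected."""
    state, output = start, []
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            return None
        state, emitted = transitions[key]
        output.append(emitted)
    return "".join(output) if state in finals else None


# Automaton for the regular language (ab)*: state 0 is both start and final.
AB_STAR = {(0, "a"): 1, (1, "b"): 0}

# Transducer for the regular relation that rewrites every "a" as "b"
# over the alphabet {a, b}, leaving "b" unchanged.
A_TO_B = {(0, "a"): (0, "b"), (0, "b"): (0, "b")}

print(accepts(AB_STAR, 0, {0}, "abab"))   # True
print(accepts(AB_STAR, 0, {0}, "aba"))    # False
print(transduce(A_TO_B, 0, {0}, "abba"))  # bbbb
```

Both machines here are deterministic and unweighted; the probabilistic variants mentioned in the abstract would additionally attach a weight to each transition.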

Finite-State Methods and Natural Language Processing

10.1007/11780885 ◽

2006 ◽

Cited By ~ 1

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Finite State

Finite-state methods in natural language processing and mathematics of language. Introduction to the special issue

Journal of Language Modelling ◽

10.15398/jlm.v7i2.248 ◽

2019 ◽

Vol 7 (2) ◽

pp. 1

Author(s):

Frank Drewes ◽

Makoto Kanazawa

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Special Issue ◽

Finite State ◽

And Mathematics ◽

Mathematics Of Language

Natural Language to SQL query Generation

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35804 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 5069-5072

Author(s):

Kiran Raj R

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

English Language ◽

Regular Expression ◽

Parts Of Speech ◽

Query Generation ◽

Sql Query ◽

Speech Tagging ◽

The Web

Today, nearly everyone has a personal device for accessing the web, and users look to the internet for the information they need. Much of that information is stored in databases, so a user with limited database knowledge will have difficulty retrieving it. There is therefore a need for a system that lets such users query a database directly. The proposed method is a system that takes a natural-language input and produces an SQL query, which is then used to access the database and retrieve the information with ease. The process involves tokenization, part-of-speech tagging, lemmatization, parsing, and mapping. The project illustrates the use of Natural Language Processing (NLP) and of regular-expression-based mapping from English-language queries to SQL.
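
The final mapping step described in this abstract can be sketched roughly as follows. This is not the authors' system: the request pattern, the function name `to_sql`, and the table and column names in the usage lines are hypothetical. The part-of-speech tagging, lemmatization, and parsing stages are omitted, so only simple whitespace normalization precedes the regular-expression mapping.

```python
import re

# Hypothetical request shape (an assumption for illustration):
#   "show|list|get [the] <column> of|from [all] <table> [where <field> is <value>]"
QUERY_PATTERN = re.compile(
    r"^(?:show|list|get)\s+(?:the\s+)?(?P<column>\w+)"
    r"\s+(?:of|from)\s+(?:all\s+)?(?P<table>\w+)"
    r"(?:\s+where\s+(?P<field>\w+)\s+is\s+(?P<value>\w+))?$",
    re.IGNORECASE,
)


def to_sql(question):
    """Map a restricted English request to a parameterized SQL query.

    Returns (sql, params), or None when the request does not match the pattern.
    """
    # Normalize: trim, drop trailing punctuation, collapse runs of whitespace.
    normalized = " ".join(question.strip().rstrip("?.").split())
    match = QUERY_PATTERN.match(normalized)
    if match is None:
        return None
    # Identifiers are restricted to \w+ by the pattern; the value is passed
    # as a bound parameter rather than spliced into the SQL text.
    sql = f"SELECT {match['column']} FROM {match['table']}"
    params = []
    if match["field"]:
        sql += f" WHERE {match['field']} = ?"
        params.append(match["value"])
    return sql, params


print(to_sql("Show the name of students where city is Paris"))
print(to_sql("list emails from all customers"))
```

A single pattern like this covers only one sentence shape; a fuller system would presumably rely on the tagging and parsing stages to handle varied phrasing before mapping to SQL.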

MorphoBr: an open source large-coverage full-form lexicon for morphological analysis of Portuguese

Texto Livre Linguagem e Tecnologia ◽

10.17851/1983-3652.11.3.1-25 ◽

2018 ◽

Vol 11 (3) ◽

pp. 1-25

Author(s):

Leonel Figueiredo de Alencar ◽

Bruno Cuconato ◽

Alexandre Rademaker

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Open Source ◽

Computational Linguistics ◽

Language Processing ◽

Morphological Analysis ◽

Computational Techniques ◽

Processing Technologies ◽

Finite State ◽

Full Form

ABSTRACT: One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for morphological analysis of Portuguese. A first version of the resource is already freely available online under an open source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps, and systematically adding new entries. In comparison to the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion in the inventory of nouns and adjectives. This was carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology. This allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative which will directly contribute to the development of more robust natural language processing tools and applications which depend on wide-coverage morphological analysis.

KEYWORDS: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.

Finite-State Methods and Natural Language Processing

10.1007/978-3-642-14684-8 ◽

2010 ◽

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Finite State

Using Ensemble Models to Classify the Sentiment Expressed in Suicide Notes

Biomedical Informatics Insights ◽

10.4137/bii.s8931 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S8931 ◽

Cited By ~ 5

Author(s):

James A. McCart ◽

Dezon K. Finch ◽

Jay Jarman ◽

Edward Hickling ◽

Jason D. Lind ◽

...

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Regular Expression ◽

Shared Task ◽

Suicide Notes ◽

The Mean ◽

The U.S

In 2007, suicide was the tenth leading cause of death in the U.S. Given the significance of this problem, suicide was the focus of the 2011 Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing (NLP) shared task competition (track two). Specifically, the challenge concentrated on sentiment analysis, predicting the presence or absence of 15 emotions (labels) simultaneously in a collection of suicide notes spanning over 70 years. Our team explored multiple approaches combining regular expression-based rules, statistical text mining (STM), and an approach that applies weights to text while accounting for multiple labels. Our best submission used an ensemble of both rules and STM models to achieve a micro-averaged F1 score of 0.5023, slightly above the mean from the 26 teams that competed (0.4875).
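
For readers unfamiliar with the metric reported above, the following is a minimal sketch of micro-averaged F1 for multi-label predictions: true positives, false positives, and false negatives are pooled over all notes and labels before precision and recall are computed. The function name `micro_f1` and the toy label sets are illustrative assumptions; this is not the shared task's scoring script or data.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over parallel lists of label sets (one set per note)."""
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        tp += len(gold_labels & predicted_labels)   # labels correctly predicted
        fp += len(predicted_labels - gold_labels)   # predicted but not in gold
        fn += len(gold_labels - predicted_labels)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only: three notes, each with a set of emotion labels.
gold = [{"hopelessness", "love"}, {"guilt"}, set()]
predicted = [{"love"}, {"guilt", "anger"}, {"love"}]

# Pooled counts: tp = 2, fp = 2, fn = 1, so F1 = 4/7.
print(round(micro_f1(gold, predicted), 4))  # 0.5714
```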

Finite-state methods and models in natural language processing

Natural Language Engineering ◽

10.1017/s1351324911000015 ◽

2011 ◽

Vol 17 (2) ◽

pp. 141-144

Author(s):

ANSSI YLI-JYRÄ ◽

ANDRÁS KORNAI ◽

JACQUES SAKAROVITCH

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Current Issue ◽

Special Issue ◽

The Public ◽

The Past ◽

Finite State ◽

Final Selection

For the past two decades, specialised events on finite-state methods have been successful in presenting interesting studies on natural language processing to the public through journals and collections. The FSMNLP workshops have become well-known among researchers and are now the main forum of the Association for Computational Linguistics' (ACL) Special Interest Group on Finite-State Methods (SIGFSM). The current issue on finite-state methods and models in natural language processing was planned in 2008 in this context as a response to a call for special issue proposals. In 2010, the issue received a total of sixteen submissions, some of which were extended and updated versions of workshop papers, and others which were completely new. The final selection, consisting of only seven papers that could fit into one issue, is not fully representative, but complements the prior special issues in a nice way. The selected papers showcase a few areas where finite-state methods have less than obvious and sometimes even groundbreaking relevance to natural language processing (NLP) applications.

Application of Virtual Assistant-Based Natural Language Processing in the Academic Administration Department of STMIK Dharma Wacana

International Research on Big-Data and Computer Technology: I-Robot ◽

10.53514/ir.v5i1.228 ◽

2021 ◽

Vol 5 (1) ◽

pp. 33-47

Author(s):

Arief Adjie Wicaksono ◽

Ridwan Yusuf ◽

Tri Aristi Saputri

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Pattern Matching ◽

Language Processing ◽

Regular Expression

Sekolah Tinggi Ilmu Manajemen Informatika dan Komputer (STMIK) Dharma Wacana has several departments, including the Academic Administration Department, which is responsible for providing academic services. The Academic Administration Department is the source of information on teaching activities. Requests for course information are not handled effectively because staff working hours are limited and many repeated questions reach the department, for example a question already asked by one student being asked again by another. The aim of this study is to observe and interview students and Academic Administration staff and to analyse the weaknesses found, so that the results can serve as a reference for designing an application that applies Natural Language Processing (NLP). In this study, a Virtual Assistant was built in the form of a chatbot available on the messenger platforms LINE, Facebook, and Telegram, acting solely as a course information desk. NLP with a pattern-matching approach using regular expressions is applied to recognise student questions so that the Virtual Assistant can give appropriate answers.
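
The pattern-matching approach named in this abstract can be sketched as a list of regular expressions, each paired with a canned answer. Everything below is a hypothetical illustration: the intents, the answer texts, and the function name `reply` are assumptions, and the messenger-platform integration (LINE, Facebook, Telegram) is left out entirely.

```python
import re

# Hypothetical intents: each regular expression is paired with a canned answer.
# The answer texts are placeholders, not real institutional information.
INTENTS = [
    (re.compile(r"\b(schedule|timetable)\b", re.IGNORECASE),
     "Placeholder answer about the class schedule."),
    (re.compile(r"\b(tuition|fees?|payment)\b", re.IGNORECASE),
     "Placeholder answer about tuition payment."),
    (re.compile(r"\b(register|registration|enrol+ment)\b", re.IGNORECASE),
     "Placeholder answer about course registration."),
]

FALLBACK = "Sorry, I did not understand the question."


def reply(message):
    """Return the answer of the first intent whose pattern occurs in `message`."""
    for pattern, answer in INTENTS:
        if pattern.search(message):
            return answer
    return FALLBACK


print(reply("When is the exam timetable published?"))
print(reply("How do I pay my tuition?"))
print(reply("Hello there"))
```

Because the first matching intent wins, the order of the list acts as a priority; repeated questions of the kind the abstract describes map to the same answer regardless of who asks them.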

Survey: Finite-state technology in natural language processing

Theoretical Computer Science ◽

10.1016/j.tcs.2016.05.030 ◽

2017 ◽

Vol 679 ◽

pp. 2-17 ◽

Cited By ~ 2

Author(s):

Andreas Maletti

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Finite State ◽

State Technology
