Learning Tag Dependencies for Sequence Tagging

Sequence tagging underlies many applications in natural language processing. Despite successes in learning long-term token sequence dependencies with neural networks, tag dependencies have rarely been considered. Yet sequence tagging involves complex dependencies and interactions among the input tokens and the output tags. We propose a novel multi-channel model that handles different ranges of token-tag dependencies and their interactions simultaneously. A tag LSTM is added to manage the output tag dependencies and word-tag interactions, and three mechanisms are presented to efficiently incorporate token context representation and tag dependency. Extensive experiments on part-of-speech tagging and named entity recognition tasks show that the proposed model outperforms the BiLSTM-CRF baseline by effectively incorporating the tag dependency feature.
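
To make "tag dependency" concrete, the following is a minimal sketch of Viterbi decoding over per-token scores plus tag-to-tag transition scores, the kind of first-order dependency a CRF output layer captures. It is not the multi-channel model described in the abstract; the function name, the B/I/O tag set, and all scores are hypothetical values chosen for illustration.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag]      : score of `tag` for the token at position t
    transitions[prev][tag] : score of moving from tag `prev` to `tag`
    """
    if not emissions:
        return []
    tags = list(emissions[0])
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [dict(emissions[0])]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][tag])
            scores[tag] = best[-1][prev] + transitions[prev][tag] + emissions[t][tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores for a two-token input with B/I/O tags.
TAGS = ["O", "B", "I"]
emissions = [
    {"O": 1.0, "B": 0.8, "I": 0.0},
    {"O": 0.2, "B": 0.1, "I": 1.0},
]
transitions = {p: {t: 0.0 for t in TAGS} for p in TAGS}
transitions["O"]["I"] = -10.0  # an I tag should not follow O
transitions["B"]["I"] = 0.5
transitions["I"]["I"] = 0.5

# Tagging each token independently ignores tag dependencies.
greedy = [max(TAGS, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi(emissions, transitions)
print(greedy, decoded)
```

With these scores, tagging each token independently picks the invalid sequence O, I, while decoding with the transition scores picks B, I.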

Bodo Resources for NLP - An Overview of Existing Primary Resources for Bodo

Proceedings of Intelligent Computing and Technologies Conference ◽

10.21467/proceedings.115.12 ◽

2021 ◽

Author(s):

Mwnthai Narzary ◽

Gwmsrang Muchahary ◽

Maharaj Brahma ◽

Sanjib Narzary ◽

Pranav Kumar Singh ◽

...

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Parallel Corpus ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Challenges And Opportunities ◽

Speech Tagging

With over 1.4 million Bodo speakers, there is a need for automated language processing systems such as machine translation, part-of-speech tagging, speech recognition, and named entity recognition. Developing such systems requires a sufficient amount of data. In this paper we present a detailed description of the primary resources available for the Bodo language that can be used as datasets to study natural language processing and its applications. We list the resources available for Bodo: a lexicon dataset of size 8,005 collected from the agriculture and health domains, a raw corpus of 2,915,544 words, a tagged corpus of 30,000 sentences, a parallel corpus of 28,359 sentences from the tourism, agriculture and health domains, and a tagged and parallel corpus of 37,768 sentences. We further discuss the challenges and opportunities present in the Bodo language.

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

10.18653/v1/2021.naacl-demos.1 ◽

2021 ◽

Author(s):

Linh The Nguyen ◽

Dat Quoc Nguyen

Keyword(s):

Named Entity Recognition ◽

Learning Model ◽

Entity Recognition ◽

Dependency Parsing ◽

Named Entity ◽

Part Of Speech Tagging ◽

Task Learning ◽

Part Of Speech ◽

Speech Tagging

Joint Part-of-Speech Tagging and Named Entity Recognition Using Factor Graphs

Text, Speech and Dialogue - Lecture Notes in Computer Science ◽

10.1007/978-3-642-32790-2_28 ◽

2012 ◽

pp. 232-239 ◽

Cited By ~ 1

Author(s):

György Móra ◽

Veronika Vincze

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Factor Graphs ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Named entity recognition based on a Hidden Markov Model in part-of-speech tagging

2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT) ◽

10.1109/icadiwt.2008.4664380 ◽

2008 ◽

Cited By ~ 4

Author(s):

Ryohei Ageishi ◽

Takao Miura

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Part-of-speech Tagging and Named Entity Recognition Using Improved Hidden Markov Model and Bloom Filter

2018 International Conference on Computing, Power and Communication Technologies (GUCON) ◽

10.1109/gucon.2018.8674901 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ankita ◽

K. A. Abdul Nazeer

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Named Entity Recognition ◽

Bloom Filter ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Part of Speech Tagging and Named Entity Recognition

Text Analysis with R - Quantitative Methods in the Humanities and Social Sciences ◽

10.1007/978-3-030-39643-5_18 ◽

2020 ◽

pp. 237-245

Author(s):

Matthew L. Jockers ◽

Rosamond Thalken

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00059 ◽

2017 ◽

Vol 5 ◽

pp. 247-261 ◽

Cited By ~ 1

Author(s):

Gábor Berend

Keyword(s):

Sparse Coding ◽

Named Entity Recognition ◽

Training Data ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Sequence Labeling ◽

Part Of Speech ◽

Proposed Model

In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties, as it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e. 150 sentences per language.
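
As an illustration of what "sparse indicator features" can look like, the sketch below turns an already-computed sparse code of a word embedding into a set of binary feature names, one per non-zero coefficient and marked by sign. The naming scheme (P for positive, N for negative), the function name, and the example vector are assumptions made for this sketch; the sparse coding step itself is not shown, and the abstract does not specify these details.

```python
from typing import List


def indicator_features(sparse_code: List[float], eps: float = 1e-9) -> List[str]:
    """Map a sparse coefficient vector to binary indicator feature names.

    Each non-zero coefficient yields one feature that records only its
    position and sign (hypothetical naming: "P<i>" or "N<i>"); the
    magnitude is discarded.
    """
    features = []
    for i, coefficient in enumerate(sparse_code):
        if coefficient > eps:
            features.append(f"P{i}")
        elif coefficient < -eps:
            features.append(f"N{i}")
    return features


# Hypothetical sparse code for one word (most coefficients are zero).
code = [0.0, 0.7, 0.0, 0.0, -0.2, 0.0]
features = indicator_features(code)
print(features)
```

Features of this kind can be fed to a linear sequence labeler in the same way as hand-crafted indicator features.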

LVBERT: Transformer-Based Model for Latvian Language Understanding

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200610 ◽

2020 ◽

Author(s):

Artūrs Znotiņš ◽

Guntis Barzdiņš

Keyword(s):

State Of The Art ◽

Language Model ◽

Named Entity Recognition ◽

Entity Recognition ◽

Future Research ◽

Dependency Parsing ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

This paper presents LVBERT, the first publicly available monolingual language model pre-trained for Latvian. We show that LVBERT improves the state of the art on three Latvian NLP tasks: part-of-speech tagging, named entity recognition, and Universal Dependency parsing. We release LVBERT to facilitate future research and downstream applications for Latvian NLP.

Correcting Word Segmentation and Part-of-Speech Tagging Errors for Chinese Named Entity Recognition

The Internet Challenge: Technology and Applications ◽

10.1007/978-94-010-0494-7_4 ◽

2002 ◽

pp. 29-36 ◽

Cited By ~ 1

Author(s):

Tianfang Yao ◽

Wei Ding ◽

Gregor Erbach

Keyword(s):

Named Entity Recognition ◽

Word Segmentation ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

An Attention-Based Model Using Character Composition of Entities in Chinese Relation Extraction

Information ◽

10.3390/info11020079 ◽

2020 ◽

Vol 11 (2) ◽

pp. 79 ◽

Cited By ~ 2

Author(s):

Xiaoyu Han ◽

Yue Zhang ◽

Wenkai Zhang ◽

Tinglei Huang

Keyword(s):

Language Processing ◽

Large Scale ◽

Named Entity Recognition ◽

Relation Extraction ◽

Entity Recognition ◽

Additional Information ◽

Named Entity ◽

Proposed Model ◽

The Relationship ◽

Crucial Part

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to help relation extraction. Additional information such as the entity type obtained by NER (Named Entity Recognition) and descriptions provided by a knowledge base both have their limitations. Nevertheless, there is another way to provide additional information that can overcome these limitations in Chinese relation extraction. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information that is helpful for the relation extraction task, especially on large-scale datasets. This assumption has never been verified before. The main obstacle is the lack of large-scale Chinese relation datasets. In this paper, we first generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model using the characters that compose the entities. The results on the generated dataset show that these characters can provide useful information for the Chinese relation extraction task. Using this information, our attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.
