Joint Morphological and Syntactic Analysis for Richly Inflected Languages

Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.

LVBERT: Transformer-Based Model for Latvian Language Understanding

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200610 ◽

2020 ◽

Author(s):

Artūrs Znotiņš ◽

Guntis Barzdiņš

Keyword(s):

State Of The Art ◽

Language Model ◽

Named Entity Recognition ◽

Entity Recognition ◽

Future Research ◽

Dependency Parsing ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

This paper presents LVBERT – the first publicly available monolingual language model pre-trained for Latvian. We show that LVBERT improves the state of the art for three Latvian NLP tasks: Part-of-Speech tagging, Named Entity Recognition, and Universal Dependency parsing. We release LVBERT to facilitate future research and downstream applications for Latvian NLP.

Part-Of-Speech Tagging in French: State-of-the-Art and Obstacles

2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) ◽

10.1109/snams52053.2020.9336546 ◽

2020 ◽

Author(s):

Edouard Ngor Sarr ◽

Ousmane Sall ◽

Lamine Faty

Keyword(s):

State Of The Art ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

A Deep Learning-based Two-Steps Pipeline Model for Korean Morphological Analysis and Part-of-Speech Tagging

Journal of KIISE ◽

10.5626/jok.2021.48.4.444 ◽

2021 ◽

Vol 48 (4) ◽

pp. 444-452

Author(s):

Jun Young Youn ◽

Jae Sung Lee

Keyword(s):

Deep Learning ◽

Morphological Analysis ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Pipeline Model ◽

Speech Tagging

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

10.18653/v1/2021.naacl-demos.1 ◽

2021 ◽

Author(s):

Linh The Nguyen ◽

Dat Quoc Nguyen

Keyword(s):

Named Entity Recognition ◽

Learning Model ◽

Entity Recognition ◽

Dependency Parsing ◽

Named Entity ◽

Part Of Speech Tagging ◽

Task Learning ◽

Part Of Speech ◽

Speech Tagging

Word-Sense Disambiguation

10.1093/oxfordhb/9780199276349.013.0013 ◽

2012 ◽

Cited By ~ 1

Author(s):

Mark Stevenson ◽

Yorick Wilks

Keyword(s):

Language Processing ◽

Word Sense Disambiguation ◽

Syntactic Analysis ◽

Word Sense ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Evaluation Strategies ◽

Sense Disambiguation ◽

Translation Systems ◽

Speech Tagging

Word-sense disambiguation (WSD) is the process of identifying the meanings of words in context. This article begins by discussing the origins of the problem in the earliest machine translation systems. Early attempts to solve the WSD problem suffered from a lack of coverage. The main approaches to tackling the problem were dictionary-based, connectionist, and statistical strategies. This article concludes with a review of evaluation strategies for WSD and possible applications of the technology. WSD is an ‘intermediate’ task in language processing: like part-of-speech tagging or syntactic analysis, it is unlikely that anyone other than linguists would be interested in its results for their own sake. ‘Final’ tasks produce results of use to those without a specific interest in language and often make use of ‘intermediate’ tasks. WSD is a long-standing and important problem in the field of language processing.

To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

Computational Linguistics ◽

10.1162/coli_a_00425 ◽

2021 ◽

pp. 1-38

Author(s):

Gözde Gül Şahin

Keyword(s):

Language Models ◽

Semantic Role ◽

Semantic Role Labeling ◽

Dependency Parsing ◽

Low Resource ◽

Part Of Speech Tagging ◽

High Resource ◽

Part Of Speech ◽

Augmentation Techniques ◽

Speech Tagging

Data-hungry deep neural networks have established themselves as the de facto standard for many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind their statistical counterparts in low-resource scenarios. One methodology to counter this problem is text augmentation, i.e., generating new synthetic training data points from existing data. Although NLP has recently witnessed a large number of textual augmentation techniques, the field still lacks a systematic performance analysis on a diverse set of languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methodologies, which perform changes on the syntax (e.g., cropping sub-sentences), token (e.g., random word insertion), and character (e.g., character swapping) levels. We systematically compare the methods on part-of-speech tagging, dependency parsing, and semantic role labeling for a diverse set of language families using various models, including architectures that rely on pretrained multilingual contextualized language models such as mBERT. Augmentation most significantly improves dependency parsing, followed by part-of-speech tagging and semantic role labeling. We find the experimented techniques to be effective on morphologically rich languages in general rather than on analytic languages such as Vietnamese. Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT, especially for dependency parsing. We identify the character-level methods as the most consistent performers, while synonym replacement and syntactic augmenters provide inconsistent improvements. Finally, we discuss how the results depend most heavily on the task, the language pair (e.g., syntactic-level techniques mostly benefit higher-level tasks and morphologically richer languages), and the model type (e.g., token-level augmentation provides significant improvements for BPE, while character-level ones give generally higher scores for character-based and mBERT-based models).
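
As a rough illustration of the token-level and character-level categories named in this abstract, the following Python sketch shows one possible form of random word insertion and character swapping. It is not the paper's implementation: the function names, the choice to keep the first and last characters fixed, and the caller-supplied vocabulary are all assumptions made for this example.

```python
import random


def swap_characters(word, rng):
    """Character-level augmentation: swap two adjacent inner characters.

    Assumption for this sketch: the first and last characters stay in place,
    and words shorter than four characters are returned unchanged.
    """
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)  # index of the left character to swap
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def insert_random_word(tokens, vocabulary, rng):
    """Token-level augmentation: insert one vocabulary word at a random position.

    `vocabulary` is a hypothetical list of candidate words supplied by the caller.
    """
    position = rng.randrange(len(tokens) + 1)
    return tokens[:position] + [rng.choice(vocabulary)] + tokens[position:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs produce the same output
    sentence = ["augmentation", "helps", "dependency", "parsing"]
    print([swap_characters(tok, rng) for tok in sentence])
    print(insert_random_word(sentence, ["very", "often"], rng))
```

For sequence tagging tasks, an inserted token would also need a label, so a real pipeline has to keep the annotation aligned with the modified sentence; the sketch leaves that step out.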

Joint Model of Korean Part-of-Speech Tagging and Dependency Parsing with Partial Tagged Corpus

International Journal of Knowledge Engineering ◽

10.7763/ijke.2015.v1.8 ◽

2015 ◽

Vol 1 (1) ◽

pp. 49-53

Author(s):

Youngmin Park ◽

◽

Jungyun Seo

Keyword(s):

Joint Model ◽

Dependency Parsing ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

A Yes/No Answer Generator Based on Sentiment-Word Scores in Biomedical Question Answering

Data Analytics in Medicine ◽

10.4018/978-1-7998-1204-3.ch005 ◽

2020 ◽

pp. 103-116

Author(s):

Mourad Sarrouti ◽

Said Ouatik El Alaoui

Keyword(s):

Question Answering ◽

State Of The Art ◽

Biomedical Domain ◽

Open Domain ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Current State ◽

Sentiment Score ◽

Speech Tagging ◽

Sentiment Word

Background and Objective: Yes/no question answering (QA) in the open domain is a longstanding challenge that has been widely studied over the last decades. However, it still requires further effort in the biomedical domain. Yes/no QA aims at answering yes/no questions, which seek a clear “yes” or “no” answer. In this paper, we present a novel yes/no answer generator based on sentiment-word scores in biomedical QA. Methods: In the proposed method, we first use Stanford CoreNLP for tokenization and part-of-speech tagging of all passages relevant to a given yes/no question. We then assign a sentiment score based on SentiWordNet to each word of the passages. Finally, the decision between the answers “yes” and “no” is based on the obtained sentiment-passages score: “yes” for a positive final sentiment-passages score and “no” for a negative one. Results: Experimental evaluations performed on BioASQ collections show that the proposed method is more effective than the current state-of-the-art method, and significantly outperforms it by an average of 15.68% in terms of accuracy.
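
The decision rule described in the Methods part can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a small made-up lexicon (`TOY_LEXICON`) stands in for SentiWordNet, whitespace splitting stands in for the Stanford CoreNLP tokenizer, the part-of-speech step is omitted, and a total score of exactly zero is mapped to "no", a case the abstract does not specify.

```python
# Hypothetical toy lexicon; the scores are invented for illustration and are
# not SentiWordNet values.
TOY_LEXICON = {
    "effective": 0.6,
    "improves": 0.5,
    "ineffective": -0.6,
    "fails": -0.7,
}


def passage_score(passage, lexicon):
    """Sum the sentiment scores of the words in one passage.

    Words missing from the lexicon contribute 0. Tokenization is a plain
    whitespace split with surrounding punctuation stripped (an assumption).
    """
    tokens = (tok.strip(".,;:!?") for tok in passage.lower().split())
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def answer_yes_no(passages, lexicon=TOY_LEXICON):
    """Return "yes" for a positive total score over all passages, else "no".

    Assumption: a total of exactly zero is answered "no".
    """
    total = sum(passage_score(p, lexicon) for p in passages)
    return "yes" if total > 0 else "no"


if __name__ == "__main__":
    print(answer_yes_no(["The drug is effective and improves outcomes."]))
    print(answer_yes_no(["The treatment fails."]))
```

Summing per-word scores is the simplest aggregation consistent with the description; the published method may weight or filter words differently.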

Punjabi Pos Tagger: Rule Based and HMM

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0106 ◽

2017 ◽

Vol 7 (7) ◽

pp. 193

Author(s):

Umrinderpal Singh ◽

Vishal Goyal

Keyword(s):

Information Retrieval ◽

Language Processing ◽

State Of The Art ◽

Input Word ◽

Rule Based ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Unseen Data ◽

Pos Tagger ◽

Speech Tagging

A part-of-speech (POS) tagger assigns a tag to every input word in a given sentence. The tags may include the parts of speech of a particular language, such as noun, pronoun, verb, adjective, and conjunction, and may have subcategories. Part-of-speech tagging is a basic preprocessing task for most Natural Language Processing (NLP) applications, such as Information Retrieval, Machine Translation, and Grammar Checking. The task belongs to a larger set of problems, namely sequence labeling problems. Part-of-speech tagging for Punjabi is not widely explored territory. We discuss rule-based and HMM-based part-of-speech taggers for Punjabi and compare the accuracies of the two approaches. The system is developed using 35 standard part-of-speech tags. We evaluate our system on unseen data, achieving a state-of-the-art accuracy of 93.3%.

Part of Speech Tagging: Shallow or Deep Learning?

Northern European Journal of Language Technology ◽

10.3384/nejlt.2000-1533.1851 ◽

2018 ◽

Vol 5 ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Robert Östling

Keyword(s):

Neural Network ◽

Computational Efficiency ◽

State Of The Art ◽

Neural Network Approach ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

The Neural Network ◽

Efficient Machine ◽

Speech Tagging

Deep neural networks have advanced the state of the art in numerous fields, but they generally suffer from low computational efficiency and the level of improvement compared to more efficient machine learning models is not always significant. We perform a thorough PoS tagging evaluation on the Universal Dependencies treebanks, pitting a state-of-the-art neural network approach against UDPipe and our sparse structured perceptron-based tagger, efselab. In terms of computational efficiency, efselab is three orders of magnitude faster than the neural network model, while being more accurate than either of the other systems on 47 of 65 treebanks.
