Named Entity Recognition for the Indonesian Language: Combining Contextual, Morphological and Part-of-Speech Features into a Knowledge Engineering Approach

Research on Chinese Named Entity Recognition Based on Ontology

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.195-196.1180 ◽

2012 ◽

Vol 195-196 ◽

pp. 1180-1185

Author(s):

Wei Li Chang ◽

Fang Luo ◽

Ji Lai Qian

Keyword(s):

Language Processing ◽

Conditional Random Fields ◽

Critical Role ◽

Named Entity Recognition ◽

Recall Rate ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech ◽

Speech Features ◽

Precision Rate

Chinese Named Entity Recognition (NER) plays a critical role in many Natural Language Processing (NLP) applications, such as Information Extraction and Machine Translation, yet it remains a challenging task because of the characteristics of the language. This paper proposes a method for Chinese NER that combines a Conditional Random Fields (CRFs) model with a domain ontology, used as a semantic feature alongside word and part-of-speech features. Experiments compared the two kinds of feature templates; the precision and recall of Chinese NER rose to 90.86% and 88.23%, respectively, which shows the strong performance of the proposed approach. Combining the ontology with the CRFs method effectively increased the precision and recall of Chinese NER.
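To make the idea of a feature template concrete, the following is a minimal sketch of per-token features that a CRF toolkit could consume, with and without an ontology feature. It is not the authors' template: the window size, the feature names, and the `TOY_ONTOLOGY` lookup table are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ontology mapping a word to a domain concept.
# This stands in for a real domain ontology and is an assumption.
TOY_ONTOLOGY: Dict[str, str] = {"北京": "City", "大学": "Organization"}


def token_features(
    sent: List[Tuple[str, str]], i: int, use_ontology: bool = True
) -> Dict[str, str]:
    """Build features for token i of a sentence given as (word, POS) pairs."""
    word, pos = sent[i]
    feats = {"w0": word, "p0": pos}

    # Left context, or a beginning-of-sentence marker.
    if i > 0:
        feats["w-1"], feats["p-1"] = sent[i - 1]
    else:
        feats["BOS"] = "1"

    # Right context, or an end-of-sentence marker.
    if i < len(sent) - 1:
        feats["w+1"], feats["p+1"] = sent[i + 1]
    else:
        feats["EOS"] = "1"

    # Semantic feature: the ontology concept of the current word, if any.
    if use_ontology:
        feats["onto0"] = TOY_ONTOLOGY.get(word, "NONE")
    return feats


sentence = [("北京", "ns"), ("大学", "n")]
with_onto = [token_features(sentence, i) for i in range(len(sentence))]
without_onto = [token_features(sentence, i, use_ontology=False) for i in range(len(sentence))]
```

Toggling `use_ontology` gives the two feature sets that such an experiment would compare; the resulting dictionaries are in the shape commonly accepted by CRF libraries that take per-token feature maps.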

Learning Task-Specific Representation for Novel Words in Sequence Labeling

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/715 ◽

2019 ◽

Author(s):

Minlong Peng ◽

Qi Zhang ◽

Xiaoyu Xing ◽

Tao Gui ◽

Jinlan Fu ◽

...

Keyword(s):

Empirical Studies ◽

Named Entity Recognition ◽

Learning Task ◽

Training Data ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Sequence Labeling ◽

Part Of Speech ◽

Word Representation

Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually too poor to yield appreciable performance. This is commonly referred to as the out-of-vocabulary (OOV) problem. In this work, we address the OOV problem in sequence labeling using only the training data of the task. To this end, we propose a novel method to predict representations for OOV words from their surface forms (e.g., character sequences) and contexts. The method is specifically designed to avoid the error propagation problem suffered by existing approaches in the same paradigm. To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech (POS) tagging tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method can achieve better or competitive performance on the OOV problem compared with existing state-of-the-art methods.

A Probability based Classification of Named Entities for Malayalam Language combining Word, Part of Speech and Lexicalized features

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a1968.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 839-842

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Supervised Machine Learning ◽

Named Entities ◽

Named Entity ◽

Domain Specific ◽

Part Of Speech ◽

Classification Probability ◽

Malayalam Language

Named Entity Recognition is the process of identifying named entities, which are the designators of a sentence. The designators of a sentence are domain specific. The proposed system identifies named entities in the Malayalam language belonging to the tourism domain, which generally include names of persons, places, and organizations, dates, etc. The system uses word, part-of-speech, and lexicalized features to find the probability of a word belonging to a named entity category and to classify it accordingly. The probability is calculated by supervised machine learning, using the word and part-of-speech features present in a tagged training corpus together with rules applied on the basis of lexicalized features.

Advances in Computational Linguistics and Text Processing Frameworks

Advances in Computer and Electrical Engineering - Handbook of Research on Engineering Innovations and Technology Management in Organizations ◽

10.4018/978-1-7998-2772-6.ch012 ◽

2020 ◽

pp. 217-244

Author(s):

Ayush Srivastav ◽

Hera Khan ◽

Amit Kumar Mishra

Keyword(s):

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Text Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech

The chapter provides an eloquent account of the major methodologies and advances in the field of Natural Language Processing. The most popular models that have been used over time for Natural Language Processing are discussed along with their applications to specific tasks. The chapter begins with the fundamental concepts of regex and tokenization. It provides an insight into text preprocessing and its methodologies, such as Stemming and Lemmatization and Stop Word Removal, followed by Part-of-Speech tagging and Named Entity Recognition. Further, the chapter elaborates on the concept of Word Embedding, its various types, and some common frameworks such as word2vec, GloVe, and fastText. A brief description of classification algorithms used in Natural Language Processing is provided next, followed by Neural Networks and their advanced forms, such as Recursive Neural Networks and Seq2seq models, that are used in Computational Linguistics. A brief description of chatbots and Memory Networks concludes the chapter.

An Arabic Dataset for Disease Named Entity Recognition with Multi-Annotation Schemes

Data ◽

10.3390/data5030060 ◽

2020 ◽

Vol 5 (3) ◽

pp. 60

Author(s):

Nasser Alshammari ◽

Saad Alanazi

Keyword(s):

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Definite Article ◽

Linguistic Features ◽

Annotation Scheme ◽

Named Entity ◽

Part Of Speech ◽

Annotation Process ◽

Arabic Natural Language Processing

This article outlines a novel data descriptor that provides the Arabic natural language processing community with a dataset dedicated to named entity recognition tasks for diseases. The dataset comprises more than 60 thousand words, which were annotated manually by two independent annotators using the inside–outside (IO) annotation scheme. To ensure the reliability of the annotation process, the inter-annotator agreement rate was calculated, and it scored 95.14%. Because few research efforts in the literature are dedicated to studying Arabic multi-annotation schemes, a distinguishing and novel aspect of this dataset is the inclusion of six more annotation schemes, which bridge this gap by allowing researchers to explore and compare the effects of these schemes on the performance of Arabic named entity recognizers. These annotation schemes are IOE, IOB, BIES, IOBES, IE, and BI. Additionally, five linguistic features, including part-of-speech tags, stopwords, gazetteers, lexical markers, and the presence of the definite article, are provided for each record in the dataset.
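As an illustration of how the annotation schemes relate to one another, the sketch below derives IOB-style and IOBES-style tags from IO tags. It is not the dataset's own conversion procedure. It assumes tags of the form `I-TYPE` and `O`, treats every maximal run of identical `I-TYPE` tags as one entity (IO tagging cannot separate two adjacent entities of the same type), and marks every entity start with `B-` (the IOB2 convention); the `DIS` label is a hypothetical entity type.

```python
from typing import List


def io_to_iob(tags: List[str]) -> List[str]:
    """Convert IO tags to IOB tags (IOB2 convention: every entity starts with B-)."""
    out = []
    prev = "O"
    for tag in tags:
        if tag == "O":
            out.append("O")
        elif tag == prev:
            out.append(tag)  # same run continues, keep the I- tag
        else:
            out.append("B-" + tag[2:])  # a new run starts here
        prev = tag
    return out


def io_to_iobes(tags: List[str]) -> List[str]:
    """Convert IO tags to IOBES tags (Begin, Inside, End, Single, Outside)."""
    out = []
    n = len(tags)
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        starts = i == 0 or tags[i - 1] != tag
        ends = i == n - 1 or tags[i + 1] != tag
        if starts and ends:
            prefix = "S"
        elif starts:
            prefix = "B"
        elif ends:
            prefix = "E"
        else:
            prefix = "I"
        out.append(f"{prefix}-{tag[2:]}")
    return out


io_tags = ["O", "I-DIS", "I-DIS", "I-DIS", "O", "I-DIS"]
print(io_to_iob(io_tags))    # ['O', 'B-DIS', 'I-DIS', 'I-DIS', 'O', 'B-DIS']
print(io_to_iobes(io_tags))  # ['O', 'B-DIS', 'I-DIS', 'E-DIS', 'O', 'S-DIS']
```

The richer schemes encode entity boundaries explicitly, which is the kind of difference whose effect on recognizer performance the dataset is meant to let researchers compare.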

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

10.18653/v1/2021.naacl-demos.1 ◽

2021 ◽

Author(s):

Linh The Nguyen ◽

Dat Quoc Nguyen

Keyword(s):

Named Entity Recognition ◽

Learning Model ◽

Entity Recognition ◽

Dependency Parsing ◽

Named Entity ◽

Part Of Speech Tagging ◽

Task Learning ◽

Part Of Speech ◽

Speech Tagging

Joint Part-of-Speech Tagging and Named Entity Recognition Using Factor Graphs

Text, Speech and Dialogue - Lecture Notes in Computer Science ◽

10.1007/978-3-642-32790-2_28 ◽

2012 ◽

pp. 232-239 ◽

Cited By ~ 1

Author(s):

György Móra ◽

Veronika Vincze

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Factor Graphs ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Named entity recognition based on a Hidden Markov Model in part-of-speech tagging

2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT) ◽

10.1109/icadiwt.2008.4664380 ◽

2008 ◽

Cited By ~ 4

Author(s):

Ryohei Ageishi ◽

Takao Miura

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Named entity recognition in Bengali using system combination

Lingvisticae Investigationes ◽

10.1075/li.37.1.01ekb ◽

2014 ◽

Vol 37 (1) ◽

pp. 1-22 ◽

Cited By ~ 2

Author(s):

Asif Ekbal ◽

Sivaji Bandyopadhyay

Keyword(s):

Conditional Random Field ◽

Named Entity Recognition ◽

Machine Learning Algorithms ◽

Unlabeled Data ◽

Entity Recognition ◽

Supervised Machine Learning ◽

Support Vector ◽

System Combination ◽

Named Entity ◽

Part Of Speech

This paper reports a voted Named Entity Recognition (NER) system that exploits appropriate unlabeled data. Initially, we develop NER systems using supervised machine learning algorithms, namely Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM). Each of these models makes use of language-independent features, in the form of different contextual and orthographic word-level features, along with language-dependent features extracted from the Part-of-Speech (POS) tagger and gazetteers. Context patterns generated from the unlabeled data using an active learning method are also used as features in each of the classifiers. A semi-supervised method is proposed that defines measures for automatically selecting effective unlabeled documents, as well as sentences, from the unlabeled data. Finally, the supervised models are combined into a final system by defining an appropriate weighted voting technique. Experimental results for a resource-poor language like Bengali show the effectiveness of the proposed approach, with overall recall, precision and F-measure values of 93.81%, 92.18% and 92.98%, respectively.
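The combination step can be pictured with a minimal token-level weighted voting sketch. This is not the paper's voting technique, whose weighting is not specified in the abstract; here each classifier simply adds a fixed weight to the tag it predicts, and ties are broken alphabetically. The classifier weights and example predictions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_vote(
    predictions: Dict[str, List[str]], weights: Dict[str, float]
) -> List[str]:
    """Combine per-token tag sequences from several classifiers by weighted voting."""
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        scores: Dict[str, float] = defaultdict(float)
        for name, tags in predictions.items():
            scores[tags[i]] += weights[name]
        # sorted() makes tie-breaking deterministic (alphabetically first tag wins).
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


# Hypothetical outputs of three classifiers on a two-token sentence.
preds = {
    "ME": ["B-PER", "O"],
    "CRF": ["B-PER", "B-LOC"],
    "SVM": ["O", "B-LOC"],
}
# Hypothetical weights, e.g. each model's score on held-out data.
model_weights = {"ME": 0.90, "CRF": 0.93, "SVM": 0.92}
print(weighted_vote(preds, model_weights))  # ['B-PER', 'B-LOC']
```

Voting each token independently can produce inconsistent tag sequences (for example, an `I-` tag without a preceding `B-`), so a real system would likely add a repair or sequence-level step.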

Named entity recognition in texts with the help of part of speech tagging

Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics ◽

10.17721/1812-5409.2018/4.11 ◽

2018 ◽

pp. 74-83

Author(s):

M. Bevza

Keyword(s):

State Of The Art ◽

Named Entity Recognition ◽

Recognition Task ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Recent Developments ◽

Future Work

We analyze neural network architectures that yield state-of-the-art results on the named entity recognition task and propose a number of new architectures for improving results even further. We have analyzed a number of ideas and approaches that researchers have used to achieve state-of-the-art results in a variety of NLP tasks. In this work, we present a few architectures that we consider most likely to improve on the existing state-of-the-art solutions for the named entity recognition and part-of-speech tagging tasks. The architectures are inspired by recent developments in multi-task learning. This work tests the hypothesis that NER and POS tagging are related tasks and that adding information about POS tags as input to the network can help achieve better NER results. Conversely, information about NER tags can help solve the task of POS tagging. This work also contains the implementation of the network and the results of the experiments, together with conclusions and future work.
