CustNER

This article describes CustNER, a system for named-entity recognition (NER) of persons, locations, and organizations. Analyzing the annotations produced by existing NER systems, the authors identify four categories of false negatives. Among the unannotated NEs are nationalities, entities that have a corresponding resource in DBpedia, and acronyms of other NEs. CustNER is a rule-based system that builds on existing NER systems and the DBpedia knowledge base. It was trained on the Open Knowledge Extraction (OKE) Challenge 2017 dataset and evaluated on the OKE and CoNLL03 (Conference on Computational Natural Language Learning 2003) datasets; the OKE dataset was additionally annotated with the three entity types. On OKE, CustNER outperforms existing NER systems, with an F score 12.4% higher than Stanford NER and 3.1% higher than Illinois NER. On CoNLL03, a standard evaluation dataset on which the system was not trained, CustNER performs comparably to existing systems: its F score is 3.9% higher than Stanford NER's, although Illinois NER's F score is 1.3% higher than CustNER's.
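The acronym category of false negatives lends itself to a simple post-processing rule. The sketch below is a hypothetical illustration, not CustNER's actual rule set: it assumes a base NER system has already produced typed multi-word mentions, and it labels any unannotated all-caps token that matches the initials of such a mention with the same type. The function names `initials` and `add_acronym_entities` and the input format are assumptions made for illustration.

```python
from typing import Dict, List

def initials(name: str) -> str:
    """Return the first letters of the capitalized words in a name."""
    # Lowercase function words such as "of" or "and" are skipped.
    return "".join(w[0] for w in name.split() if w[:1].isupper())

def add_acronym_entities(tokens: List[str], entities: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical rule: tag unannotated all-caps tokens that are acronyms of known NEs.

    `entities` maps an already-recognized mention to its type (PER, LOC, ORG),
    e.g. as produced by an existing NER system.
    """
    acronym_to_type: Dict[str, str] = {}
    for mention, ent_type in entities.items():
        acro = initials(mention)
        if len(acro) >= 2:  # single letters are too ambiguous to be useful
            acronym_to_type[acro] = ent_type
    result = dict(entities)
    for tok in tokens:
        if tok.isupper() and tok not in result and tok in acronym_to_type:
            result[tok] = acronym_to_type[tok]
    return result

tokens = "The World Health Organization said WHO will meet in Geneva".split()
found = {"World Health Organization": "ORG", "Geneva": "LOC"}
print(add_acronym_entities(tokens, found))
# {'World Health Organization': 'ORG', 'Geneva': 'LOC', 'WHO': 'ORG'}
```

A real system would also need to check that the acronym is not already a different entity (for example via a knowledge-base lookup); that step is omitted here.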

Integrating Rule-Based System with Classification for Arabic Named Entity Recognition

Computational Linguistics and Intelligent Text Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-28604-9_26 ◽

2012 ◽

pp. 311-322 ◽

Cited By ~ 26

Author(s):

Sherief Abdallah ◽

Khaled Shaalan ◽

Muhammad Shoaib

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Rule Based ◽

Rule Based System ◽

Named Entity

Nested Named Entity Recognition via Second-best Sequence Learning and Decoding

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00334 ◽

2020 ◽

Vol 8 ◽

pp. 605-620 ◽

Cited By ~ 1

Author(s):

Takashi Shibuya ◽

Eduard Hovy

Keyword(s):

Random Field ◽

Sequence Learning ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Neural Model ◽

Entity Recognition ◽

Named Entities ◽

Second Best ◽

Named Entity

When an entity name contains other names within it, identifying all combinations of names can become difficult and expensive. We propose a new method to recognize not only outermost named entities but also inner nested ones. We design an objective function for training a neural model that treats the tag sequence for nested entities as the second-best path within the span of their parent entity. In addition, we provide a decoding method for inference that extracts entities iteratively, from outermost ones to inner ones, in an outside-to-inside way. Our method introduces no hyperparameters beyond those of the conditional random field (CRF) based model widely used for flat named entity recognition tasks. Experiments demonstrate that our method performs better than, or at least as well as, existing methods capable of handling nested entities, achieving F1 scores of 85.82%, 84.34%, and 77.36% on the ACE-2004, ACE-2005, and GENIA datasets, respectively.
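To make the "second-best path" idea concrete, here is a toy sketch; it is not the authors' implementation. It assumes a single entity type with BIO tags, hand-set emission and transition scores (all hypothetical), and exact top-2 decoding by brute-force enumeration, which is feasible only for very short spans; a trained model would use a 2-best Viterbi over a learned CRF instead. Outermost entities come from the best path over the whole sentence. Each entity span is then re-decoded: following the description above, the best path inside a parent span is taken to be the parent itself, so the second-best path supplies the inner entities, and the procedure recurses outside-to-inside.

```python
import itertools
from typing import Dict, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")
TAGS = ("O", "B", "I")
# Hypothetical transition scores for one entity type; O -> I is forbidden.
TRANS: Dict[Tuple[str, str], float] = {
    ("O", "O"): 0.0, ("O", "B"): 0.0, ("O", "I"): NEG_INF,
    ("B", "O"): 0.0, ("B", "B"): -2.0, ("B", "I"): 1.5,
    ("I", "O"): 0.0, ("I", "B"): -2.0, ("I", "I"): 1.5,
}

def path_score(emis: Sequence[Dict[str, float]], path: Sequence[str]) -> float:
    """Linear-chain score: emissions plus transitions; a path may not start with I."""
    if path[0] == "I":
        return NEG_INF
    score = emis[0][path[0]]
    for i in range(1, len(path)):
        score += TRANS[(path[i - 1], path[i])] + emis[i][path[i]]
    return score

def top2(emis: Sequence[Dict[str, float]]) -> List[Tuple[str, ...]]:
    """Exact two best paths by brute-force enumeration (toy-sized spans only)."""
    paths = itertools.product(TAGS, repeat=len(emis))
    return sorted(paths, key=lambda p: path_score(emis, p), reverse=True)[:2]

def bio_spans(path: Sequence[str], offset: int) -> List[Tuple[int, int]]:
    """Convert a BIO path into (start, end) spans, end exclusive, shifted by offset."""
    out: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(path):
        if tag in ("B", "O") and start is not None:
            out.append((offset + start, offset + i))
            start = None
        if tag == "B":
            start = i
    if start is not None:
        out.append((offset + start, offset + len(path)))
    return out

def decode_nested(emis: Sequence[Dict[str, float]], start: int = 0,
                  end: Optional[int] = None, outermost: bool = True) -> List[Tuple[int, int]]:
    """Outside-to-inside decoding: best path for the sentence, second-best inside entities."""
    end = len(emis) if end is None else end
    if end <= start or (not outermost and end - start < 2):
        return []
    best, second = top2(emis[start:end])
    path = best if outermost else second  # inside a parent, best is assumed to be the parent
    found: List[Tuple[int, int]] = []
    for s, e in bio_spans(path, start):
        if not outermost and (s, e) == (start, end):
            continue  # do not re-emit the parent span itself
        found.append((s, e))
        found.extend(decode_nested(emis, s, e, outermost=False))
    return found

tokens = ["Bank", "of", "China"]
emis = [{"O": 1.0, "B": 0.5, "I": 0.0},
        {"O": 0.5, "B": -2.0, "I": 0.5},
        {"O": 0.0, "B": 1.5, "I": 0.5}]
print([" ".join(tokens[s:e]) for s, e in decode_nested(emis)])
# ['Bank of China', 'China']
```

With these scores the best sentence path is B I I (score 4.5) and the runner-up inside that span is O O B (score 3.0), which yields the nested mention.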

J-NERD: Joint Named Entity Recognition and Disambiguation with Rich Linguistic Features

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00094 ◽

2016 ◽

Vol 4 ◽

pp. 215-229 ◽

Cited By ~ 11

Author(s):

Dat Ba Nguyen ◽

Martin Theobald ◽

Gerhard Weikum

Keyword(s):

Knowledge Base ◽

Graphical Model ◽

State Of The Art ◽

Named Entity Recognition ◽

Entity Recognition ◽

Probabilistic Graphical Model ◽

Linguistic Features ◽

False Negatives ◽

New Approach ◽

Named Entity

Methods for Named Entity Recognition and Disambiguation (NERD) perform NER and NED (named entity disambiguation) in two separate stages. Therefore, NED may be penalized in precision by NER false positives and suffers in recall from NER false negatives. Conversely, NED does not fully exploit information computed by NER, such as mention types. This paper presents J-NERD, a new approach that performs NER and NED jointly by means of a probabilistic graphical model that captures mention spans, mention types, and the mapping of mentions to entities in a knowledge base. We present experiments with different kinds of texts from the CoNLL’03, ACE’05, and ClueWeb’09-FACC1 corpora. J-NERD consistently outperforms state-of-the-art competitors in end-to-end NERD precision, recall, and F1.
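The argument against two-stage pipelines can be illustrated with toy numbers. The sketch below is not J-NERD's probabilistic graphical model. It assumes hypothetical log-scores for one ambiguous mention ("Jordan") and compares a pipeline that commits to the best mention type before linking with a joint argmax over (type, entity) pairs; the entity identifiers are made up for the example.

```python
from typing import Dict, Tuple

NEG_INF = float("-inf")
# Hypothetical log-scores for the mention "Jordan".
type_score: Dict[str, float] = {"PER": -0.6, "LOC": -0.8}
# Entity scores conditioned on the type; -inf marks a type/entity mismatch.
entity_score: Dict[Tuple[str, str], float] = {
    ("PER", "Michael_Jordan"): -2.0,
    ("PER", "Jordan_(country)"): NEG_INF,
    ("LOC", "Michael_Jordan"): NEG_INF,
    ("LOC", "Jordan_(country)"): -0.3,
}

def pipeline() -> Tuple[str, str]:
    """Stage 1 fixes the type; stage 2 can only link within that type."""
    t = max(type_score, key=type_score.get)
    candidates = [e for (tt, e) in entity_score if tt == t]
    return t, max(candidates, key=lambda e: entity_score[(t, e)])

def joint() -> Tuple[str, str]:
    """Choose type and entity together by maximizing the summed score."""
    return max(entity_score, key=lambda te: type_score[te[0]] + entity_score[te])

print(pipeline(), joint())
# ('PER', 'Michael_Jordan') ('LOC', 'Jordan_(country)')
```

The pipeline's early commitment to PER (score -0.6 vs -0.8) forces a poor link (total -2.6), whereas the joint decision finds the better overall pair (total -1.1).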

Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features

Information ◽

10.3390/info11020082 ◽

2020 ◽

Vol 11 (2) ◽

pp. 82

Author(s):

SaiKiranmai Gorla ◽

Lalita Bhanu Murthy Neti ◽

Aruna Malapati

Keyword(s):

Language Processing ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Entity Recognition ◽

Limited Resources ◽

Support Vector ◽

Named Entity ◽

Word Level ◽

Asian Languages

Named entity recognition (NER) is a fundamental step for many natural language processing tasks, so improving the performance of NER models is always valuable. With limited resources available, NER for South Asian languages like Telugu is a challenging problem. This paper attempts to improve NER performance for Telugu using gazetteer-related features, which are generated automatically from Wikipedia pages. We use these gazetteer features along with other well-known features, such as contextual, word-level, and corpus features, to build NER models. The NER models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve performance, and the MIRA-based NER model fared better than its SVM and CRF counterparts.
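Gazetteer features of this kind are typically binary indicators added to each token's feature set alongside contextual and word-level features. The sketch below assumes small toy gazetteers (the paper builds its Telugu lists from Wikipedia, which is not reproduced here) and a hypothetical `token_features` function that produces the per-token feature dictionaries commonly fed to CRF toolkits.

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteers; the paper derives its lists from Wikipedia pages.
GAZETTEERS: Dict[str, Set[str]] = {
    "person": {"ravi", "sita"},
    "location": {"hyderabad", "vijayawada"},
    "organization": {"isro"},
}

def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, object]:
    """Word-level, contextual and gazetteer features for token i."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "word.lower": word.lower(),
        "suffix3": word[-3:],
        "is_digit": word.isdigit(),
    }
    # Contextual features: neighbouring words within the window.
    for off in range(-window, window + 1):
        if off != 0 and 0 <= i + off < len(tokens):
            feats[f"word[{off:+d}]"] = tokens[i + off].lower()
    # Gazetteer features: one binary indicator per list.
    for name, entries in GAZETTEERS.items():
        feats[f"gaz.{name}"] = word.lower() in entries
    return feats

sent = ["Ravi", "visited", "Hyderabad"]
feats = token_features(sent, 2)
print({k: v for k, v in feats.items() if k.startswith("gaz.")})
# {'gaz.person': False, 'gaz.location': True, 'gaz.organization': False}
```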

Data and knowledge-driven named entity recognition for cyber security

Cybersecurity ◽

10.1186/s42400-021-00072-y ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Chen Gao ◽

Xuan Zhang ◽

Hui Liu

Keyword(s):

Deep Learning ◽

Cyber Security ◽

Named Entity Recognition ◽

Entity Recognition ◽

Data Driven ◽

Named Entity ◽

Text Features ◽

Complex Words ◽

Deep Learning Model

Named Entity Recognition (NER) for cyber security aims to identify and classify cyber security terms in large volumes of heterogeneous, multi-source cyber security text. In machine learning, deep neural networks learn text features automatically from large datasets, but this data-driven approach usually handles rare entities poorly. Gasmi et al. proposed a deep learning method for NER in the cyber security domain that achieved good results, with an F1 value of 82.8%, but it has difficulty accurately identifying rare entities and complex words in the text. To address this challenge, this paper proposes a new model that combines data-driven deep learning with knowledge-driven dictionary methods, building dictionary features to assist rare-entity recognition. In addition, on top of the data-driven deep learning model, an attention mechanism is adopted to enrich the local features of the text, better model the context, and improve the recognition of complex entities. Experimental results show that our method outperforms the baseline model and is more effective at identifying cyber security entities, with precision, recall, and F1 values of 90.19%, 86.60%, and 88.36%, respectively.
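As a quick consistency check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the reported precision (90.19%) and recall (86.60%) reproduces the reported F1 of 88.36%:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (pass both as fractions or both as %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1(90.19, 86.60), 2))  # 88.36
```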

The Automatic Detection of Dataset Names in Scientific Articles

Data ◽

10.3390/data6080084 ◽

2021 ◽

Vol 6 (8) ◽

pp. 84

Author(s):

Jenny Heddes ◽

Pim Meerdink ◽

Miguel Pieters ◽

Maarten Marx

Keyword(s):

Open Access ◽

Gold Standard ◽

State Of The Art ◽

Named Entity Recognition ◽

Entity Recognition ◽

Rule Based ◽

Named Entity ◽

Gold Standard Dataset

We study the task of recognizing named datasets in scientific articles as a Named Entity Recognition (NER) problem. Noticing that available annotated datasets were not adequate for our goals, we annotated 6000 sentences extracted from four major AI conferences, roughly half of which contain one or more named datasets. A distinguishing feature of this set is the large number of sentences that use enumerations, conjunctions and ellipses, resulting in long BI+ tag sequences. On all measures, the SciBERT NER tagger performed best and most robustly. Our baseline rule-based tagger performed remarkably well, better than several state-of-the-art methods. The gold standard dataset, with links and offsets from each sentence to the (open access) articles, together with the annotation guidelines and all code used in the experiments, is available on GitHub.
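Enumerations and conjunctions produce several dataset mentions close together, so tagger output must be decoded into spans carefully: a new B tag has to close the previous span even when no O tag intervenes. The helper below is a generic BIO decoder, not code from the paper's repository; the example sentence and its tags are made up for illustration.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode B/I/O tags into (start, end) spans with an exclusive end.

    A stray I with no open span starts a new span (a common lenient convention).
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # a new B closes the open span
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = ["evaluated", "on", "Penn", "Treebank", ",", "CoNLL", "and", "SQuAD", "v1.1"]
tags = ["O", "O", "B", "I", "O", "B", "O", "B", "I"]
print([" ".join(tokens[s:e]) for s, e in bio_to_spans(tags)])
# ['Penn Treebank', 'CoNLL', 'SQuAD v1.1']
```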

MAF-CNER: A Chinese Named Entity Recognition Model Based on Multifeature Adaptive Fusion

Complexity ◽

10.1155/2021/6696064 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Xuming Han ◽

Feng Zhou ◽

Zhiyuan Hao ◽

Qiaoming Liu ◽

Yong Li ◽

...

Keyword(s):

Language Processing ◽

Short Term Memory ◽

Recognition Performance ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Basic Model ◽

Recognition Ability ◽

Adaptive Fusion

Named entity recognition (NER) is a subtask of natural language processing, and its accuracy strongly affects the effectiveness of downstream tasks. To address the insufficient representation of latent Chinese character features in NER tasks, this paper proposes a multifeature adaptive fusion Chinese named entity recognition (MAF-CNER) model. The model uses a bidirectional long short-term memory (BiLSTM) network to extract stroke and radical features and adopts a weighted concatenation method to fuse the two sets of features adaptively. This method integrates the two feature sets more effectively, thereby improving the model's entity recognition ability. To test the entity recognition performance of the model thoroughly, we compared it with the basic model and other mainstream models on the Microsoft Research Asia (MSRA) dataset and the “China People’s Daily” dataset (January to June 1998). Experimental results show that this model outperforms the other models, with F1 values of 97.01% and 96.78%, respectively.
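The abstract does not give the fusion formula. One plausible reading of "weighted concatenation" is sketched below as an assumption: two learnable scalars are normalized with a softmax, each feature vector (stroke and radical, for example per-character BiLSTM outputs) is scaled by its weight, and the scaled vectors are concatenated. In a trained model the scalars would be learned jointly with the network; here they are fixed numbers, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

def softmax2(a: float, b: float) -> Tuple[float, float]:
    """Normalize two scalars into positive weights that sum to 1."""
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)

def weighted_concat(stroke: List[float], radical: List[float],
                    w_stroke: float, w_radical: float) -> List[float]:
    """Scale each feature vector by its softmax weight, then concatenate."""
    alpha, beta = softmax2(w_stroke, w_radical)
    return [alpha * x for x in stroke] + [beta * x for x in radical]

fused = weighted_concat([1.0, 2.0], [4.0, 0.0], w_stroke=0.0, w_radical=0.0)
print(fused)  # [0.5, 1.0, 2.0, 0.0]: equal raw weights scale both vectors by 0.5
```

Because the weights are normalized, training can shift emphasis between stroke and radical features without changing the overall scale of the fused vector.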

A Neural Multi-Task Learning Framework to Jointly Model Medical Named Entity Recognition and Normalization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301817 ◽

2019 ◽

Vol 33 ◽

pp. 817-824 ◽

Cited By ~ 9

Author(s):

Sendong Zhao ◽

Ting Liu ◽

Sicheng Zhao ◽

Fei Wang

Keyword(s):

Named Entity Recognition ◽

Model Performance ◽

Joint Modeling ◽

Entity Recognition ◽

Named Entity ◽

Learning Framework ◽

Task Learning ◽

Feedback Strategies ◽

Model Recognition

State-of-the-art studies have demonstrated the superiority of joint modeling over pipeline implementations for medical named entity recognition and normalization, owing to the mutual benefits between the two processes. To exploit these benefits in a more sophisticated way, we propose a novel deep neural multi-task learning framework with explicit feedback strategies to jointly model recognition and normalization. On the one hand, our method benefits from the general representations of both tasks provided by multi-task learning. On the other hand, our method successfully converts hierarchical tasks into a parallel multi-task setting while maintaining the mutual support between tasks. Both aspects improve model performance. Experimental results demonstrate that our method performs significantly better than state-of-the-art approaches on two publicly available medical literature datasets.

Rule-based Named Entity Recognition (NER) to Determine Time Expression for Balinese Text Document

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v09.i04.p14 ◽

2021 ◽

Vol 9 (4) ◽

pp. 555

Author(s):

Ni Made Sinta Wahyuni ◽

Ngurah Agus Sanjaya ER

Keyword(s):

Direct Observation ◽

Named Entity Recognition ◽

Entity Recognition ◽

Text Documents ◽

Rule Based ◽

Named Entity ◽

Text Document ◽

Person Location ◽

F Measure

Named Entity Recognition (NER) is the process of identifying words or phrases as named entities, such as a person, location, time expression, or organization. In this research, we develop an NER system that is able to identify time expression entities in Balinese text documents. Time expressions are an important component of a text because they are usually followed by important facts and information. The NER system was built using a rule-based approach. The rules were derived from direct observation of documents, taking their morphological and contextual structures into account. In the experiments conducted, the average precision, recall, and F-measure were 0.85, 0.87, and 0.85, respectively.
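A note on reading these averages: the F-measure computed from the averaged precision and recall is 2(0.85)(0.87) / (0.85 + 0.87) ≈ 0.86, slightly above the reported 0.85. This is not necessarily an inconsistency. If the figures are averages of per-document scores (an assumption; the abstract does not say how they were averaged), the mean of per-document F-measures generally differs from the F-measure of the mean precision and recall. The per-document scores below are made up purely to illustrate the effect.

```python
from statistics import mean

def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# F-measure of the reported averages
print(round(f_measure(0.85, 0.87), 2))  # 0.86

# Hypothetical per-document (precision, recall) pairs, not from the paper.
docs = [(1.0, 0.6), (0.6, 1.0)]
macro_f = mean(f_measure(p, r) for p, r in docs)  # average of per-document F
f_of_means = f_measure(mean(p for p, _ in docs), mean(r for _, r in docs))
print(round(macro_f, 2), round(f_of_means, 2))  # 0.75 0.8
```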

NERA 2.0: Improving coverage and performance of rule-based named entity recognition for Arabic

Natural Language Engineering ◽

10.1017/s1351324916000097 ◽

2016 ◽

Vol 23 (3) ◽

pp. 441-472 ◽

Cited By ~ 8

Author(s):

Mai Oudah ◽

Khaled Shaalan

Keyword(s):

Language Processing ◽

Hybrid Approach ◽

Named Entity Recognition ◽

Entity Recognition ◽

Rule Base ◽

Named Entities ◽

Rule Based ◽

Named Entity ◽

Linguistic Rules ◽

Person Location

Named Entity Recognition (NER) is an essential task for many natural language processing systems and makes use of various linguistic resources. NER becomes more complicated when the language in use is morphologically rich and structurally complex, such as Arabic, which has a set of characteristics that make it particularly challenging to handle. In previous work, we proposed an Arabic NER system that follows the hybrid approach, i.e. it integrates both rule-based and machine learning-based NER approaches. Our hybrid NER system is the state of the art in Arabic NER according to its performance on standard evaluation datasets. In this article, we discuss a novel methodology for overcoming the coverage drawback of rule-based NER systems in order to improve their performance and allow for automated rule updates. The presented mechanism uses the recognition decisions made by the hybrid NER system to identify the weaknesses of the rule-based component and to derive new linguistic rules that enhance the rule base, helping to achieve more reliable and accurate results. We used the ACE 2004 Newswire standard dataset as a resource for extracting and analyzing new linguistic rules for recognizing person, location and organization names. We formulate each new rule based on two distinctive feature groups, i.e. gazetteers for each type of named entity and Part-of-Speech tags, in particular noun and proper noun. Fourteen new patterns are derived, formulated as grammar rules, and evaluated in terms of coverage. The experiments use a POS-tagged version of the ACE 2004 NW dataset. The empirical results show that the enhanced rule-based system, NERA 2.0, improves the coverage of the previously misclassified person, location and organization named entity types by 69.93 per cent, 57.09 per cent and 54.28 per cent, respectively.
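Each derived rule combines a gazetteer lookup with part-of-speech constraints. The sketch below shows the general shape of such a pattern on English toy data. The trigger list, the Penn-style tag `NNP` for proper nouns, and the rule itself are hypothetical and are not among the fourteen Arabic rules reported in the paper.

```python
from typing import List, Tuple

# Hypothetical gazetteer of person-name triggers (titles).
PERSON_TRIGGERS = {"president", "minister", "dr."}

def apply_person_rule(tagged: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Hypothetical rule: a gazetteer trigger followed by one or more proper nouns (NNP)
    marks the proper-noun run as a PERSON. Returns (start, end) spans, end exclusive."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tagged):
        word, _ = tagged[i]
        if word.lower() in PERSON_TRIGGERS:
            j = i + 1
            while j < len(tagged) and tagged[j][1] == "NNP":
                j += 1
            if j > i + 1:  # at least one proper noun follows the trigger
                spans.append((i + 1, j))
                i = j
                continue
        i += 1
    return spans

sent = [("Minister", "NN"), ("Ali", "NNP"), ("Hassan", "NNP"), ("spoke", "VBD")]
print(apply_person_rule(sent))  # [(1, 3)]
```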
