A survey of named entity recognition and classification

2007 ◽  
Vol 30 (1) ◽  
pp. 3-26 ◽  
Author(s):  
David Nadeau ◽  
Satoshi Sekine

This survey covers fifteen years of research in the Named Entity Recognition and Classification (NERC) field, from 1991 to 2006. We report observations about languages, named entity types, domains and textual genres studied in the literature. From the start, NERC systems have been developed using hand-made rules, but now machine learning techniques are widely used. These techniques are surveyed along with other critical aspects of NERC such as features and evaluation methods. Features are word-level, dictionary-level and corpus-level representations of words in a document. Evaluation techniques, ranging from intuitive exact match to very complex matching techniques with adjustable cost of errors, are an indisputable key to progress.
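As a concrete illustration of the "intuitive exact match" end of the evaluation spectrum described above, the sketch below scores predicted entities against gold annotations under strict span-and-type matching. Entities are assumed to be (start, end, type) tuples; this is an illustration, not the MUC or CoNLL scorer, and the "adjustable cost of errors" schemes are deliberately not modelled.

```python
# Minimal sketch: strict exact-match precision/recall/F1 over entity spans.
# Entity format (start, end, type) is an assumption made for illustration.

def exact_match_prf(gold, predicted):
    """An entity counts as correct only if span and type both match exactly."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 2, "PER"), (7, 9, "ORG")]
pred = [(0, 2, "PER"), (7, 8, "ORG")]   # boundary error on the second entity
print(exact_match_prf(gold, pred))      # (0.5, 0.5, 0.5)
```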

Information ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 186 ◽  
Author(s):  
Ajees A P ◽  
Manju K ◽  
Sumam Mary Idicula

Named Entity Recognition (NER) is the process of identifying the elementary units in a text document and classifying them into predefined categories such as person, location, organization, and so forth. NER plays an important role in many Natural Language Processing applications such as information retrieval, question answering, and machine translation. Resolving the ambiguities of lexical items in a text document is a challenging task, and NER in Indian languages is especially complex due to their morphological richness and agglutinative nature. Even though different solutions have been proposed for NER, it remains an unsolved problem. Traditional approaches to Named Entity Recognition were based on applying hand-crafted features to classical machine learning techniques such as Hidden Markov Models (HMM), Support Vector Machines (SVM), and Conditional Random Fields (CRF). The introduction of deep learning techniques changed this picture, with state-of-the-art results now achieved by deep learning architectures. In this paper, we address the problem of effective word representation for NER in Indian languages by capturing syntactic, semantic, and morphological information. We propose a deep-learning-based entity extraction system for Indian languages using a novel combined word representation, including character-level, word-level, and affix-level embeddings. We used the ‘ARNEKT-IECSIL 2018’ shared data for training and testing. Our results highlight the improvement obtained over existing pre-trained word representations.
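The combined word representation described above can be illustrated with a minimal PyTorch sketch. This is not the authors' exact architecture; the vocabulary sizes, dimensions, and the pooled character LSTM encoder are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

# Sketch: concatenate word-level, character-level, and affix-level embeddings
# into a single word representation (sizes and char encoder are assumptions).
class CombinedWordRepresentation(nn.Module):
    def __init__(self, n_words, n_chars, n_affixes,
                 word_dim=100, char_dim=30, affix_dim=20):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_dim, batch_first=True)
        self.affix_emb = nn.Embedding(n_affixes, affix_dim)   # prefix/suffix ids

    def forward(self, word_ids, char_ids, affix_ids):
        # word_ids: (batch, seq); char_ids: (batch, seq, max_word_len); affix_ids: (batch, seq)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * s, c, -1)
        _, (h, _) = self.char_lstm(chars)        # final hidden state per word
        char_repr = h[-1].view(b, s, -1)
        return torch.cat([self.word_emb(word_ids),
                          char_repr,
                          self.affix_emb(affix_ids)], dim=-1)  # feed to a tagger
```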


2020 ◽  
Vol 9 (1) ◽  
pp. 1000-1004

The automatic extraction of bibliographic data remains a difficult task to the present day, since scientific publications do not follow a standard format and each publication has its own template. There are many regular-expression and supervised machine learning techniques for extracting the individual details of the references listed in a bibliography section, but their success rates differ little. Our idea is to find out whether unsupervised machine learning techniques can help increase that success rate. This paper presents a technique for segregating and automatically extracting the individual components of references, such as authors, reference titles, and publication details, using an unsupervised technique together with named entity recognition (NER), and for linking these references to their corresponding full-text articles with the help of Google.
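As a rough illustration of the idea (not the paper's actual pipeline), an off-the-shelf NER model combined with simple patterns can pull candidate fields out of a raw reference string. The sketch assumes spaCy with the en_core_web_sm model installed; field heuristics are illustrative only.

```python
import re
import spacy

# Illustrative only: pretrained NER plus simple patterns for rough field extraction.
nlp = spacy.load("en_core_web_sm")      # assumed to be installed

def rough_reference_fields(reference):
    doc = nlp(reference)
    authors = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
    year = re.search(r"\b(19|20)\d{2}\b", reference)
    title = re.search(r"[\"“](.+?)[\"”]", reference)   # crude: first quoted span
    return {
        "authors": authors,
        "year": year.group(0) if year else None,
        "title": title.group(1) if title else None,
    }

print(rough_reference_fields(
    'Nadeau, D. and Sekine, S. "A survey of named entity recognition '
    'and classification", 2007.'))
```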


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Rakesh Patra ◽  
Sujan Kumar Saha

Support vector machine (SVM) is one of the popular machine learning techniques used in various text processing tasks, including named entity recognition (NER). The performance of the SVM classifier largely depends on the appropriateness of the kernel function. In the last few years a number of task-specific kernel functions have been proposed and used in various text processing tasks, for example, string kernels, graph kernels, and tree kernels. So far, very few efforts have been devoted to developing an NER-task-specific kernel. In the literature we found that the tree kernel has been used in the NER task only for entity boundary detection or reannotation; the conventional tree kernel is unable to execute the complete NER task on its own. In this paper we propose a kernel function, motivated by the tree kernel, which is able to perform the complete NER task. To examine the effectiveness of the proposed kernel, we applied the kernel function to the openly available JNLPBA 2004 data. Our kernel executes the complete NER task and achieves reasonable accuracy.
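The general mechanism of plugging a task-specific kernel into an SVM can be sketched as follows. The toy overlap kernel is not the kernel proposed in the paper; it is only a stand-in showing where such a kernel would slot in, assuming scikit-learn.

```python
import numpy as np
from sklearn.svm import SVC

# Mechanism sketch: SVC accepts a callable kernel, so a task-specific kernel
# (such as the tree-inspired NER kernel) can be plugged in. The overlap kernel
# below is NOT the paper's kernel; it counts features active in both instances
# (a dot product of binarised vectors, hence a valid positive semidefinite kernel).
def overlap_kernel(X, Y):
    return (X > 0).astype(float) @ (Y > 0).astype(float).T

X_train = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])   # toy feature vectors
y_train = np.array([0, 1, 0])                            # toy entity labels

clf = SVC(kernel=overlap_kernel)
clf.fit(X_train, y_train)
print(clf.predict(np.array([[1, 0, 0]])))
```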


Author(s):  
Hema R. ◽  
Ajantha Devi

Chemical entities can be represented in different forms such as chemical names, chemical formulae, and chemical structures. Because of the different classification frameworks for chemical names, identifying or extracting chemical entities with low ambiguity is considered a major challenge. Chemical named entity recognition (NER) is the initial phase in any chemical-related information extraction strategy. Most chemical NER is done using dictionary-based, rule-based, and machine learning procedures. Recently, deep learning methods have evolved, and in this chapter the authors sketch out the various deep learning techniques applied to chemical NER. First, the authors introduce the fundamental concepts of chemical named entity recognition, the textual content of chemical documents, and how chemicals are represented in the chemical literature. The chapter concludes with the strengths and weaknesses of the above methods and the types of chemical entities extracted.
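As one example of the kind of deep learning model surveyed in such chapters, a minimal BiLSTM token tagger is sketched below. A BiLSTM-CRF is more typical for chemical NER; the CRF layer is omitted for brevity, and the vocabulary size, dimensions, and B-CHEM / I-CHEM / O tag set are assumptions.

```python
import torch
import torch.nn as nn

# Minimal BiLSTM tagger sketch (sizes and tag set are illustrative assumptions).
class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, hidden=128, n_tags=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)                      # per-token tag scores

scores = BiLSTMTagger()(torch.randint(0, 5000, (1, 6)))
print(scores.shape)                             # torch.Size([1, 6, 3])
```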


2005 ◽  
Vol 6 (1-2) ◽  
pp. 77-85 ◽  
Author(s):  
Shipra Dingare ◽  
Malvina Nissim ◽  
Jenny Finkel ◽  
Christopher Manning ◽  
Claire Grover

We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.
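The core idea of a maximum entropy tagger with local features can be sketched with multinomial logistic regression, as below. The feature set is a tiny illustrative stand-in for the system's much richer features; external knowledge resources, parsing, and web-search features are omitted, and the sentence and labels are made up.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Sketch of a MaxEnt (multinomial logistic regression) tagger over local features.
def local_features(tokens, i):
    w = tokens[i]
    return {
        "word": w.lower(),
        "suffix3": w[-3:].lower(),
        "is_capitalized": w[0].isupper(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

sent = ["The", "p53", "protein", "binds", "DNA", "."]
labels = ["O", "B-protein", "I-protein", "O", "B-DNA", "O"]   # toy annotations

X = [local_features(sent, i) for i in range(len(sent))]
model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, labels)
print(model.predict([local_features(sent, 1)]))
```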


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 82
Author(s):  
SaiKiranmai Gorla ◽  
Lalita Bhanu Murthy Neti ◽  
Aruna Malapati

Named entity recognition (NER) is a fundamental step for many natural language processing tasks, and hence enhancing the performance of NER models is always appreciated. With limited resources available, NER for South Asian languages like Telugu is quite a challenging problem. This paper attempts to improve NER performance for Telugu using gazetteer-related features, which are automatically generated from Wikipedia pages. We use these gazetteer features along with other well-known features such as contextual, word-level, and corpus features to build NER models. NER models are developed using three well-known classifiers: conditional random fields (CRF), support vector machines (SVM), and the margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve performance, and the MIRA-based NER model fared better than its SVM and CRF counterparts.
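The gazetteer features can be illustrated with a small sketch: a hypothetical person-name list (transliterated placeholders, not the paper's Wikipedia-derived gazetteers) contributes one boolean feature per token, alongside simple contextual features, to a CRF tagger. The sklearn-crfsuite package is assumed; the toy sentence and tags are made up.

```python
import sklearn_crfsuite

# Illustrative sketch: gazetteer membership as a token-level feature for a CRF.
person_gazetteer = {"ramudu", "sita"}            # hypothetical, transliterated entries

def token_features(sent, i):
    w = sent[i]
    return {
        "word": w,
        "in_person_gazetteer": w in person_gazetteer,          # gazetteer feature
        "prev_word": sent[i - 1] if i > 0 else "<BOS>",
        "next_word": sent[i + 1] if i < len(sent) - 1 else "<EOS>",
    }

sent, tags = ["ramudu", "vellaadu"], ["B-PER", "O"]            # toy example
X = [[token_features(sent, i) for i in range(len(sent))]]
y = [tags]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```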

