Study of Named Entity Recognition for Indian Languages

Named Entity Recognition (NER) is the process of identifying the elementary units in a text document and classifying them into predefined categories such as person, location and organization. NER plays an important role in many Natural Language Processing applications, including information retrieval, question answering and machine translation. Resolving the ambiguities of the lexical items in a text document is a challenging task, and NER in Indian languages is especially complex because of their morphological richness and agglutinative nature. Although various solutions have been proposed, NER remains an unsolved problem. Traditional approaches to NER applied hand-crafted features to classical machine learning techniques such as the Hidden Markov Model (HMM), Support Vector Machine (SVM) and Conditional Random Field (CRF). The introduction of deep learning techniques changed this picture: state-of-the-art results have since been achieved with deep learning architectures. In this paper, we address the problem of effective word representation for NER in Indian languages by capturing syntactic, semantic and morphological information. We propose a deep learning based entity extraction system for Indian languages that uses a novel combined word representation, including character-level, word-level and affix-level embeddings. We used the ‘ARNEKT-IECSIL 2018’ shared data for training and testing. Our results highlight the improvement obtained over existing pre-trained word representations.
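The abstract names the three ingredients of the combined word representation (character-level, word-level and affix-level embeddings) but not how they are joined. The following is a minimal sketch assuming plain concatenation, which is our assumption and not necessarily the authors' design. The lookup function `toy_vector` is a hypothetical deterministic stand-in for trained embedding tables, the character vector is a simple average (neural taggers commonly use a CNN or BiLSTM instead), and the affix length of 3 is an arbitrary choice.

```python
import random
from typing import List, Tuple

DIM = 4  # toy embedding size (assumption); real systems learn much larger vectors


def toy_vector(key: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a learned embedding row, deterministic per key."""
    rng = random.Random(key)  # str seeds are converted deterministically, so runs are reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def affixes(word: str, max_len: int = 3) -> Tuple[List[str], List[str]]:
    """Prefixes and suffixes of length 1..max_len, capped by the word length."""
    n = min(max_len, len(word))
    prefixes = [word[:k] for k in range(1, n + 1)]
    suffixes = [word[-k:] for k in range(1, n + 1)]
    return prefixes, suffixes


def mean(vectors: List[List[float]], dim: int = DIM) -> List[float]:
    """Element-wise average; a zero vector for an empty list."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_representation(word: str) -> List[float]:
    """Concatenate word-, character-, prefix- and suffix-level vectors for one token."""
    lowered = word.lower()
    word_vec = toy_vector("word:" + lowered)
    # Character level: averaged here for brevity only.
    char_vec = mean([toy_vector("char:" + ch) for ch in word])
    prefixes, suffixes = affixes(lowered)
    prefix_vec = mean([toy_vector("prefix:" + p) for p in prefixes])
    suffix_vec = mean([toy_vector("suffix:" + s) for s in suffixes])
    return word_vec + char_vec + prefix_vec + suffix_vec


if __name__ == "__main__":
    # "Chennaiyil" is a romanized Tamil form meaning roughly "in Chennai";
    # the case suffix "-il" surfaces in the suffix features.
    print(affixes("chennaiyil"))  # (['c', 'ch', 'che'], ['l', 'il', 'yil'])
    print(len(combined_representation("Chennaiyil")))  # 4 blocks x DIM = 16
```

The affix block is where agglutinative morphology shows up: inflected forms of the same name share prefix vectors even when the full word form is rare or unseen.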

Incorporating Linguistic Expertise Using ILP for Named Entity Recognition in Data Hungry Indian Languages

Inductive Logic Programming (Lecture Notes in Computer Science), 2010, pp. 178-185
DOI: 10.1007/978-3-642-13840-9_16
Cited by ~3

Author(s): Anup Patel, Ganesh Ramakrishnan, Pushpak Bhattacharya

Keyword(s): Named Entity Recognition, Entity Recognition, Indian Languages, Named Entity

The first named entity recognizer in Maithili: Resource creation and system development

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-13
DOI: 10.3233/jifs-210051

Author(s): Ankur Priyadarshi, Sujan Kumar Saha

Keyword(s): Conditional Random Fields, Native Speakers, System Development, Named Entity Recognition, Entity Recognition, Indian Languages, Neural Models, Named Entity, Official Languages, Resource Creation

In this paper, we present our work on the development of a Maithili Named Entity Recognition (NER) system. Maithili is one of the official languages of India, with around 50 million native speakers. Although NER systems have been developed for several Indian languages, we did not find any openly available NER resource or system for Maithili. We therefore manually annotated a Maithili NER corpus of around 200K words. We prepared a baseline classifier using Conditional Random Fields (CRF) and then ran many experiments with various recurrent neural networks (RNN). We collected a larger raw corpus to obtain better word and character embeddings. Our experiments show that neural models outperform CRF, that a CRF layer is effective for predicting the final output of the RNN models, and that character embedding is effective for Maithili. We also investigated the effectiveness of gazetteer lists in neural models: we prepared a few gazetteer lists from various web resources and used them in the neural models, and incorporating the gazetteer layer improved performance. The final system achieved an f-measure of 91.6% with 94.9% precision and 88.53% recall.
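The three reported scores are mutually consistent, since the f-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A short check of that arithmetic; the helper name `f_measure` and its `beta` parameter (the general F-beta weighting) are ours, not from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for the final Maithili system
f1 = f_measure(0.949, 0.8853)
print(f"{f1:.3f}")  # 0.916
```

Because recall (88.53%) is lower than precision (94.9%), the harmonic mean lands closer to recall than an arithmetic mean (about 91.7%) would.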

A multiobjective simulated annealing approach for classifier ensemble: Named entity recognition in Indian languages as case studies

Expert Systems with Applications, Vol. 38 (12), 2011, pp. 14760-14772
DOI: 10.1016/j.eswa.2011.05.004
Cited by ~21

Author(s): Asif Ekbal, Sriparna Saha

Keyword(s): Simulated Annealing, Case Studies, Named Entity Recognition, Entity Recognition, Classifier Ensemble, Indian Languages, Named Entity

A Survey on Various Approach used in Named Entity Recognition for Indian Languages

International Journal of Computer Applications, Vol. 167 (1), 2017, pp. 11-18
DOI: 10.5120/ijca2017913878
Cited by ~1

Author(s): Dikshan N., Harshad Bhadka

Keyword(s): Named Entity Recognition, Entity Recognition, Indian Languages, Named Entity

Named Entity Recognition: A Survey for Indian Languages

2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), 2019
DOI: 10.1109/icicict46008.2019.8993236

Author(s): Krishnanjan Bhattacharjee, Shiva Karthik S, Swati Mehta, Ajai Kumar, Ria Mehta, ...

Keyword(s): Named Entity Recognition, Entity Recognition, Indian Languages, Named Entity

Named Entity Recognition in Indian Languages Using Maximum Entropy Approach

International Journal of Computer Processing of Languages, Vol. 21 (03), 2008, pp. 205-237
DOI: 10.1142/s1793840608001913
Cited by ~1

Author(s): Asif Ekbal, Sivaji Bandyopadhyay

Keyword(s): Maximum Entropy, Named Entity Recognition, Entity Recognition, Indian Languages, Named Entity

A Conditional Random Field Approach for Named Entity Recognition in Bengali and Hindi

Linguistic Issues in Language Technology, Vol. 2, 2009
DOI: 10.33011/lilt.v2i.1203

Author(s): Asif Ekbal, Sivaji Bandyopadhyay

Keyword(s): Random Field, Conditional Random Field, Contextual Information, Named Entity Recognition, Standard Test, Entity Recognition, Indian Languages, Named Entity, Test Sets, Validation Tests

This paper describes the development of Named Entity Recognition (NER) systems for two leading Indian languages, Bengali and Hindi, using the Conditional Random Field (CRF) framework. The system makes use of different types of contextual information along with a variety of features that help predict the different named entity (NE) classes. This feature set includes both language-independent and language-dependent components. We used annotated corpora of 122,467 tokens for Bengali and 502,974 tokens for Hindi, tagged with a set of twelve NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). We considered only the tags that denote person names, location names, organization names, number expressions, time expressions and measurement expressions. A number of experiments were carried out to find the most suitable features for NER in Bengali and Hindi. The system was tested on gold-standard test sets of 35K tokens for Bengali and 50K tokens for Hindi. Evaluation on these test sets yields overall f-scores of 81.15% for Bengali and 78.29% for Hindi, while 10-fold cross-validation yields f-scores of 83.89% for Bengali and 80.93% for Hindi. An ANOVA shows that the performance improvement due to the language-dependent features is statistically significant.
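To make the split between contextual, language-independent and language-dependent features concrete, here is a minimal sketch of per-token feature extraction for a linear-chain CRF. It is a hypothetical illustration, not the paper's feature set: the gazetteer `PERSON_GAZETTEER`, the 3-character affixes and the ±1 context window are assumptions. It emits one dictionary per token, the input shape that CRF toolkits such as sklearn-crfsuite accept. Devanagari and Bengali scripts have no letter case, so no capitalization feature is included, and the romanized example is lowercased for the same reason.

```python
from typing import Dict, List, Set, Union

FeatureValue = Union[str, bool]

# Hypothetical stand-in for a language-dependent resource (a person-name gazetteer).
PERSON_GAZETTEER: Set[str] = {"sachin", "rabindranath"}


def token_features(sentence: List[str], i: int,
                   gazetteer: Set[str] = PERSON_GAZETTEER) -> Dict[str, FeatureValue]:
    """Feature dictionary for token i: surface, affix, gazetteer and +/-1 context features."""
    word = sentence[i].lower()
    return {
        # Language-independent features
        "word": word,
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "is_digit": word.isdigit(),
        # Language-dependent feature (hypothetical gazetteer lookup)
        "in_person_gazetteer": word in gazetteer,
        # Contextual information: neighbouring words, with sentence-boundary markers
        "prev_word": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sentence[i + 1].lower() if i + 1 < len(sentence) else "<EOS>",
    }


def sentence_features(sentence: List[str]) -> List[Dict[str, FeatureValue]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


if __name__ == "__main__":
    # Romanized Hindi: "Sachin lives in Mumbai"
    for feats in sentence_features(["Sachin", "Mumbai", "mein", "rahte", "hain"]):
        print(feats["word"], feats["in_person_gazetteer"], feats["prev_word"], feats["next_word"])
    # first line printed: sachin True <BOS> mumbai
```

Passing these feature sequences, together with the gold NE tag sequences, to a linear-chain CRF trainer would complete the pipeline; the trainer itself is not shown.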

A Survey of Named Entity Recognition in Assamese and other Indian Languages

International Journal on Natural Language Computing, Vol. 3 (3), 2014, pp. 105-112
DOI: 10.5121/ijnlc.2014.3310
Cited by ~1

Author(s): Gitimoni Talukdar, Pranjal Protim Borah, Arup Baruah

Keyword(s): Named Entity Recognition, Entity Recognition, Indian Languages, Named Entity
