Named Entity Recognition for Ontology Population using Background Knowledge from Wikipedia

Named Entity Recognition (NER) deals with identifying atomic text elements and classifying them into pre-defined ontological classes. It is the enabling technique for many complex knowledge acquisition tasks. The recent growth of Web resources has opened new opportunities and challenges for knowledge acquisition. In the domain of NER and its application in ontology population, considerable research has been dedicated to exploiting background knowledge from Web resources to improve system accuracy. This chapter reviews the existing literature in this domain, with an emphasis on using background knowledge extracted from Web resources. The authors discuss the benefits of using background knowledge and the inadequacies of existing work. They then propose a novel method that automatically creates domain-specific background knowledge by exploring the Wikipedia knowledge base in a domain- and language-independent way. The authors show empirically that the method can be adapted to ontology population and that it generates high-quality background knowledge that improves the accuracy of domain-specific NER.

A Probability based Classification of Named Entities for Malayalam Language combining Word, Part of Speech and Lexicalized features

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a1968.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 839-842

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Supervised Machine Learning ◽

Named Entities ◽

Named Entity ◽

Domain Specific ◽

Part Of Speech ◽

Classification Probability ◽

Malayalam Language

Named Entity Recognition is the process of identifying named entities, which act as the designators of a sentence. These designators are domain specific. The proposed system identifies named entities in Malayalam text from the tourism domain, which generally include names of persons, places, and organizations, as well as dates. The system uses word, part-of-speech, and lexicalized features to estimate the probability that a word belongs to a named entity category and to classify it accordingly. The probability is calculated by supervised machine learning from the word and part-of-speech features present in a tagged training corpus, together with rules based on lexicalized features.
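The abstract does not give the exact probability formula, so the following is only a minimal sketch of one plausible reading: relative-frequency estimates of an entity category given a (word, part-of-speech) pair, with a fallback to the part-of-speech tag alone for unseen words. The toy corpus, the tag names, and the functions `train` and `classify` are all hypothetical, and the rule-based lexicalized features mentioned in the abstract are left out.

```python
from collections import Counter, defaultdict

# Toy tagged corpus: (word, part-of-speech tag, entity category).
# Tokens and labels are invented for illustration; they are not from the paper.
TRAIN = [
    ("Kochi", "NNP", "PLACE"),
    ("Kochi", "NNP", "PLACE"),
    ("Kochi", "NNP", "ORG"),
    ("Munnar", "NNP", "PLACE"),
    ("visited", "VB", "O"),
    ("Ravi", "NNP", "PERSON"),
]


def train(samples):
    """Count entity categories per (word, POS) pair and per POS tag."""
    by_word_pos = defaultdict(Counter)
    by_pos = defaultdict(Counter)
    for word, pos, label in samples:
        by_word_pos[(word, pos)][label] += 1
        by_pos[pos][label] += 1
    return by_word_pos, by_pos


def classify(word, pos, model):
    """Return (most probable category, its relative frequency).

    Uses counts for the (word, POS) pair when the pair was seen in
    training, otherwise falls back to counts for the POS tag alone.
    Returns ("O", 0.0) when neither was seen.
    """
    by_word_pos, by_pos = model
    counts = by_word_pos.get((word, pos)) or by_pos.get(pos)
    if not counts:
        return "O", 0.0
    total = sum(counts.values())
    label, n = counts.most_common(1)[0]
    return label, n / total


model = train(TRAIN)
seen_label, seen_prob = classify("Kochi", "NNP", model)
unseen_label, unseen_prob = classify("Thekkady", "NNP", model)
print(seen_label, round(seen_prob, 3))
print(unseen_label, round(unseen_prob, 3))
```

A real system would need smoothing and a much larger corpus; the fallback here only illustrates how part-of-speech evidence can stand in when a word was never seen in training.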

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

10.21203/rs.3.rs-90025/v1 ◽

2020 ◽

Author(s):

Usman Naseem ◽

Matloob Khushi ◽

Vinay Reddy ◽

Sakthivel Rajendran ◽

Imran Razzak ◽

...

Keyword(s):

State Of The Art ◽

Language Model ◽

Named Entity Recognition ◽

Training Data ◽

Entity Recognition ◽

Future Research ◽

Named Entity ◽

Domain Specific ◽

Context Dependent ◽

Biomedical Named Entity Recognition

Background: In recent years, the growing volume of biomedical documents, coupled with advances in natural language processing algorithms, has led to an exponential increase in research on biomedical named entity recognition (BioNER). However, BioNER is challenging because (i) training data are often limited, (ii) an entity can refer to multiple types and concepts depending on its context, and (iii) biomedical text relies heavily on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results. Results: We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), BioALBERT, an effective domain-specific pre-trained language model trained on a huge biomedical corpus and designed to capture biomedical context-dependent NER. We adopted the self-supervised loss function used in ALBERT, which targets modelling inter-sentence coherence, to better learn context-dependent representations, and incorporated parameter reduction strategies to minimise memory usage and reduce training time in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets with four different entity types. Performance increased for (i) disease type corpora by 7.47% (NCBI-disease) and 10.63% (BC5CDR-disease); (ii) drug-chem type corpora by 4.61% (BC5CDR-Chem) and 3.89% (BC4CHEMD); (iii) gene-protein type corpora by 12.25% (BC2GM) and 6.42% (JNLPBA); and (iv) species type corpora by 6.19% (LINNAEUS) and 23.71% (Species-800), leading to state-of-the-art results. Conclusions: The performance of the proposed model on four different biomedical entity types shows that our model is robust and generalizable in recognizing biomedical entities in text. We trained four different variants of BioALBERT models, which are available to the research community for use in future research.
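The abstract mentions parameter reduction strategies without listing them. One such strategy described for the original ALBERT model is factorized embedding parameterization, where a vocabulary-by-hidden embedding matrix is replaced by a vocabulary-by-E matrix followed by an E-by-hidden projection. The sketch below only works through that arithmetic using the ALBERT-base sizes (vocabulary 30,000, hidden size 768, embedding size 128); whether the BioALBERT variants use these exact sizes is an assumption, and `embedding_params` is a hypothetical helper.

```python
def embedding_params(vocab_size, hidden_size, embed_size=None):
    """Number of embedding parameters.

    Without embed_size: a single vocab_size x hidden_size matrix.
    With embed_size: a vocab_size x embed_size matrix plus an
    embed_size x hidden_size projection (factorized form).
    """
    if embed_size is None:
        return vocab_size * hidden_size
    return vocab_size * embed_size + embed_size * hidden_size


# Sizes taken from ALBERT-base; assumed here, not stated in the abstract.
full = embedding_params(30000, 768)
factorized = embedding_params(30000, 768, 128)
print(full, factorized)  # 23040000 3938304
```

Under these assumed sizes the embedding block shrinks by roughly a factor of six, which illustrates why such a factorization can lower memory usage.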

Improving the Performance of a Named Entity Recognition System with Knowledge Acquisition

Lecture Notes in Computer Science - Knowledge Engineering and Knowledge Management ◽

10.1007/978-3-642-33876-2_11 ◽

2012 ◽

pp. 97-113 ◽

Cited By ~ 5

Author(s):

Myung Hee Kim ◽

Paul Compton

Keyword(s):

Knowledge Acquisition ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Named Entity

Is a Common Phrase an Entity Mention or Not? Dual Representations for Domain-Specific Named Entity Recognition

Database Systems for Advanced Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-91452-7_53 ◽

2018 ◽

pp. 830-846

Author(s):

Jiangtao Zhang ◽

Juanzi Li ◽

Xiao-Li Li ◽

Yixin Cao ◽

Lei Hou ◽

...

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Domain Specific ◽

Dual Representations

Towards Chinese clinical named entity recognition by dynamic embedding using domain-specific knowledge

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2020.103435 ◽

2020 ◽

Vol 106 ◽

pp. 103435

Author(s):

Yuan Li ◽

Guodong Du ◽

Yan Xiang ◽

Shaozi Li ◽

Lei Ma ◽

...

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Specific Knowledge ◽

Named Entity ◽

Domain Specific ◽

Domain Specific Knowledge

Techniques for Named Entity Recognition

Advances in Human and Social Aspects of Technology - Collaboration and the Semantic Web ◽

10.4018/978-1-4666-0894-8.ch011 ◽

2012 ◽

pp. 191-217 ◽

Cited By ~ 1

Author(s):

Girish Keshav Palshikar

Keyword(s):

Semantic Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entities ◽

Named Entity ◽

Domain Specific ◽

Number Of Factors ◽

Web Contents ◽

Biological Domain

While building and using a full semantic understanding of Web contents is a distant goal, named entities (NEs) provide a small, tractable set of elements carrying well-defined semantics. Generic named entities are names of persons, locations, and organizations, as well as phone numbers and dates, while domain-specific named entities include, for example, names of proteins, enzymes, organisms, genes, and cells in the biological domain. An ability to perform named entity recognition (NER) automatically, that is, to identify occurrences of NEs in Web contents, can have multiple benefits, such as improving the expressiveness of queries and the quality of search results. A number of factors make building a highly accurate NER system a challenging task. Given the importance of NER in the semantic processing of text, this chapter presents a detailed survey of NER techniques for English text.

Techniques for Named Entity Recognition

Bioinformatics ◽

10.4018/978-1-4666-3604-0.ch022 ◽

2013 ◽

pp. 400-426 ◽

Cited By ~ 2

Author(s):

Girish Keshav Palshikar

Keyword(s):

Semantic Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entities ◽

Named Entity ◽

Domain Specific ◽

Number Of Factors ◽

Web Contents ◽

Biological Domain

The Impact of Domain-Specific Pre-Training on Named Entity Recognition Tasks in Materials Science

SSRN Electronic Journal ◽

10.2139/ssrn.3950755 ◽

2021 ◽

Author(s):

Nicholas Walker ◽

Amalie Trewartha ◽

Haoyan Huo ◽

Sanghoon Lee ◽

Kevin Cruse ◽

...

Keyword(s):

Materials Science ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Domain Specific ◽

The Impact

MenuNER: Domain-Adapted BERT Based NER Approach for a Domain with Limited Dataset and Its Application to Food Menu Domain

Applied Sciences ◽

10.3390/app11136007 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6007

Author(s):

Muzamil Hussain Syed ◽

Sun-Tae Chung

Keyword(s):

Domain Adaptation ◽

Language Model ◽

Named Entity Recognition ◽

Word Embedding ◽

Fine Tuning ◽

Entity Recognition ◽

Language Models ◽

Feature Vectors ◽

Named Entity ◽

Domain Specific

Entity-based information extraction is one of the main applications of Natural Language Processing (NLP). Recently, deep transfer learning using contextualized word embeddings from pre-trained language models has shown remarkable results on many NLP tasks, including named entity recognition (NER). Among contextualized word embedding models, BERT (Bidirectional Encoder Representations from Transformers) has gained prominent attention as a state-of-the-art pre-trained language model. Training a BERT model from scratch for a new application domain is expensive, since it needs a huge dataset and enormous computing time. In this paper, we focus on menu entity extraction from online user reviews of restaurants and propose a simple but effective approach to NER in a new domain, such as the food menu domain, where a large dataset is rarely available or difficult to prepare. The approach is based on a domain adaptation technique for word embeddings and on fine-tuning the popular NER network model ‘Bi-LSTM+CRF’ with extended feature vectors. The proposed NER approach (named ‘MenuNER’) consists of two steps: (1) domain adaptation for the target domain, by further pre-training the off-the-shelf BERT language model (BERT-base) in a semi-supervised fashion on a domain-specific dataset, and (2) supervised fine-tuning of the popular Bi-LSTM+CRF network for the downstream task, with extended feature vectors obtained by concatenating the word embedding from the domain-adapted pre-trained BERT model of the first step, a character embedding, and POS tag feature information. Experimental results on a handcrafted food menu corpus built from a customer review dataset show that the proposed approach to the domain-specific NER task, namely food menu named entity recognition, performs significantly better than one based on the baseline off-the-shelf BERT-base model. The proposed approach achieves a 92.5% F1 score on the YELP dataset for the MenuNER task.
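The extended feature vector in step (2) can be pictured with a small sketch. The abstract only says that a word embedding, a character embedding, and POS tag information are concatenated; the tag set, the one-hot encoding of the POS tag, the tiny vector sizes, and the helper names `one_hot` and `extend_features` below are assumptions made for illustration (a BERT-base word embedding would have 768 dimensions).

```python
# Hypothetical POS tag set; the paper's actual tag inventory is not given here.
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]


def one_hot(tag, tags=POS_TAGS):
    """Encode a POS tag as a one-hot list; unknown tags map to OTHER."""
    if tag not in tags:
        tag = "OTHER"
    return [1.0 if t == tag else 0.0 for t in tags]


def extend_features(word_vec, char_vec, pos_tag):
    """Concatenate word embedding, character embedding, and POS one-hot."""
    return list(word_vec) + list(char_vec) + one_hot(pos_tag)


# Stand-in vectors with tiny dimensions so the result is easy to inspect.
word_vec = [0.1, -0.3, 0.7]   # would come from the domain-adapted BERT model
char_vec = [0.5, 0.2]         # would come from a character-level encoder
features = extend_features(word_vec, char_vec, "NOUN")
print(len(features))  # 3 + 2 + 4 = 9
```

In the described pipeline, one such vector per token would be fed to the Bi-LSTM+CRF tagger; the sketch stops at feature construction.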
