Effective and Efficient Classification of Topically-Enriched Domain-Specific Text Snippets

2015 ◽  
Vol 6 (3) ◽  
pp. 1-17 ◽  
Author(s):  
Marco Spruit ◽  
Bas Vlug

Due to the explosive growth in the number of text snippets over the past few years, and the sparsity of the text they contain, organizations are unable to classify them effectively and efficiently, missing out on business opportunities. This paper presents TETSC: the Topically-Enriched Text Snippet Classification method. TETSC aims to solve the classification problem for text snippets in any domain. TETSC recognizes that there are different types of text snippets and therefore allows for stop word removal, named-entity recognition, and topical enrichment for the different types of text snippets. TETSC has been implemented in the production systems of a personal finance organization, which resulted in a classification error reduction of over 21%. Highlights: The authors create the TETSC method for classifying topically-enriched text snippets; the authors differentiate between different types of text snippets; the authors show a successful application of named-entity recognition to text snippets; using multiple enrichment strategies appears to reduce effectiveness.
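To illustrate the kind of pipeline TETSC describes, the sketch below chains stop-word removal, a placeholder enrichment step, and a linear classifier. It is a minimal sketch assuming scikit-learn, not the authors' implementation; the `enrich` function and the example snippets are hypothetical.

```python
# Hypothetical sketch of a snippet-classification pipeline in the spirit
# of TETSC: stop-word removal, placeholder topical enrichment, and a
# linear classifier. Not the authors' production system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def enrich(snippet: str) -> str:
    """Placeholder for topical enrichment: a real system would append
    topic terms looked up in a domain resource. Identity here."""
    return snippet

snippets = ["payment to grocery store", "monthly rent transfer"]
labels = ["groceries", "housing"]

pipeline = Pipeline([
    # stop_words="english" drops common function words, one of the
    # preprocessing options TETSC allows per snippet type
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression()),
])
pipeline.fit([enrich(s) for s in snippets], labels)
print(pipeline.predict([enrich("transfer for apartment rent")]))
```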

Named Entity Recognition is the process of identifying named entities, the designators of a sentence. Which expressions act as designators is domain-specific. The proposed system identifies named entities in the Malayalam language belonging to the tourism domain, which generally include names of persons, places, organizations, dates, etc. The system uses word, part-of-speech, and lexicalized features to estimate the probability of a word belonging to a named-entity category and to perform the appropriate classification. Probabilities are calculated by supervised machine learning using word and part-of-speech features present in a tagged training corpus, together with rules applied based on lexicalized features.
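A minimal sketch of the supervised probability estimate described above: relative frequencies of NE tags given (word, POS) pairs from a tagged corpus. The corpus format and the example triples are hypothetical stand-ins; a real corpus would be tagged Malayalam tourism text.

```python
# Toy relative-frequency estimate of P(tag | word, POS) from a tagged
# training corpus, as the abstract describes. Example data is invented.
from collections import Counter, defaultdict

# (word, POS, NE-tag) triples standing in for a tagged corpus
training = [
    ("Kochi", "NNP", "PLACE"), ("visited", "VBD", "O"),
    ("Kochi", "NNP", "PLACE"), ("Kerala", "NNP", "PLACE"),
    ("Tourism", "NNP", "ORG"),
]

counts = defaultdict(Counter)
for word, pos, tag in training:
    counts[(word, pos)][tag] += 1

def tag_probability(word, pos, tag):
    seen = counts[(word, pos)]
    total = sum(seen.values())
    return seen[tag] / total if total else 0.0

print(tag_probability("Kochi", "NNP", "PLACE"))  # 1.0
```

In practice such estimates are smoothed and combined with the rule-based lexicalized features the abstract mentions.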


2020 ◽  
Author(s):  
Usman Naseem ◽  
Matloob Khushi ◽  
Vinay Reddy ◽  
Sakthivel Rajendran ◽  
Imran Razzak ◽  
...  

Background: In recent years, with the growing number of biomedical documents, coupled with advances in natural language processing algorithms, research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER research is challenging because NER in the biomedical domain is: (i) often restricted by the limited amount of training data; (ii) complicated by entities that can refer to multiple types and concepts depending on context; and (iii) heavily reliant on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results. Results: We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) - bioALBERT - an effective domain-specific pre-trained language model trained on a huge biomedical corpus and designed to capture biomedical context-dependent NER. We adopted the self-supervised loss function used in ALBERT, which targets modelling inter-sentence coherence to better learn context-dependent representations, and incorporated parameter-reduction strategies to minimise memory usage and shorten training time in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets covering four entity types. Performance increased for: (i) disease corpora by 7.47% (NCBI-disease) and 10.63% (BC5CDR-disease); (ii) drug-chemical corpora by 4.61% (BC5CDR-Chem) and 3.89% (BC4CHEMD); (iii) gene-protein corpora by 12.25% (BC2GM) and 6.42% (JNLPBA); and (iv) species corpora by 6.19% (LINNAEUS) and 23.71% (Species-800), leading to state-of-the-art results. Conclusions: The performance of the proposed model on four different biomedical entity types shows that it is robust and generalizable in recognizing biomedical entities in text. We trained four variants of BioALBERT, which are available for the research community to use in future research.
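For readers who want to try such a model, the sketch below shows the generic Hugging Face Transformers pattern for running an ALBERT-style checkpoint as a token classifier, the BioNER setting. The checkpoint path is a placeholder, not a published BioALBERT identifier.

```python
# Hedged sketch: loading an ALBERT-style checkpoint for token
# classification. The model path is hypothetical.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "path/to/bioalbert-checkpoint"  # placeholder, not an official ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Mutations in BRCA1 are associated with breast cancer."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predictions):
    print(token, model.config.id2label[int(label_id)])
```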


2014 ◽  
Vol 2014 ◽  
pp. 1-6 ◽  
Author(s):  
Buzhou Tang ◽  
Hongxin Cao ◽  
Xiaolong Wang ◽  
Qingcai Chen ◽  
Hua Xu

Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step of natural language processing in the biomedical domain. Various machine learning-based approaches have been applied to BNER tasks and shown good performance. In this paper, we systematically investigated three different types of word representation (WR) features for BNER: clustering-based representation, distributional representation, and word embeddings. We selected one algorithm from each of the three types of WR features and applied them to the JNLPBA and BioCreAtIvE II BNER tasks. Our results showed that all three WR algorithms were beneficial to machine learning-based BNER systems. Moreover, combining these different types of WR features further improved BNER performance, indicating that they are complementary to each other. By combining all three types of WR features, the improvements in F-measure on the BioCreAtIvE II GM and JNLPBA corpora were 3.75% and 1.39%, respectively, compared with systems using baseline features. To the best of our knowledge, this is the first study to systematically evaluate the effect of three different types of WR features for BNER tasks.
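The sketch below illustrates, with toy lookup tables, how the three WR feature types can be concatenated into one feature vector before training a sequence labeller. The dictionaries are hypothetical stand-ins for real clustering, distributional, and embedding models, not the paper's feature extractors.

```python
# Toy combination of the three WR feature types evaluated in the paper:
# a clustering-based id, a distributional vector, and a dense embedding.
import numpy as np

brown_cluster = {"p53": 3, "gene": 7}            # clustering-based (e.g. Brown)
distributional = {"p53": np.array([0.2, 0.8])}   # co-occurrence based
embedding = {"p53": np.array([0.1, -0.4, 0.3])}  # e.g. word2vec

def wr_features(word, n_clusters=8):
    one_hot = np.zeros(n_clusters)
    if word in brown_cluster:
        one_hot[brown_cluster[word]] = 1.0
    dist = distributional.get(word, np.zeros(2))
    emb = embedding.get(word, np.zeros(3))
    # Concatenation mirrors how WR features are appended to a baseline
    # feature vector before training a sequence labeller such as a CRF.
    return np.concatenate([one_hot, dist, emb])

print(wr_features("p53").shape)  # (13,)
```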


2019 ◽  
Vol 129 ◽  
pp. 100-106 ◽  
Author(s):  
Arantza Casillas ◽  
Nerea Ezeiza ◽  
Iakes Goenaga ◽  
Alicia Pérez ◽  
Xabier Soto


Author(s):
Yenan Yi ◽  
Yijie Bian

In this paper, we propose a novel neural network for named entity recognition that is improved in two aspects. On the one hand, our model uses a parallel BiLSTM structure to generate character-level word representations. By feeding the character sequences of words into several independent, parallel BiLSTMs whose parameters are randomly initialized, we obtain word representations from different representation subspaces. This enhances the expressive power of the character-level word representations. On the other hand, we use a two-layer BiLSTM with a gating mechanism to model sentences. Since each layer of a multi-layer LSTM extracts a different type of information from text, we use the gating mechanism to assign appropriate weights to the outputs of each layer and take the weighted sum of these outputs as the final output for named entity recognition. Our model only changes the network structure and needs no feature engineering or external knowledge sources, making it a complete end-to-end NER model. We evaluated the model on the CoNLL-2003 English and German datasets and obtained better results than the baseline models.
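A sketch of the two architectural ideas, assuming PyTorch: parallel, independently initialized character-level BiLSTMs whose outputs are concatenated, and a learned gate that mixes the outputs of a two-layer sentence BiLSTM. Dimensions, word-level inputs, and the output layer (softmax or CRF) are assumptions, not the authors' exact configuration.

```python
# Hedged sketch of (1) parallel character-level BiLSTMs and (2) a gated
# weighted sum over the layers of a two-layer sentence BiLSTM.
import torch
import torch.nn as nn

class ParallelCharBiLSTM(nn.Module):
    def __init__(self, n_chars, char_dim=25, hidden=25, n_parallel=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        # Random initialisation makes each BiLSTM a different subspace.
        self.lstms = nn.ModuleList([
            nn.LSTM(char_dim, hidden, bidirectional=True, batch_first=True)
            for _ in range(n_parallel)
        ])

    def forward(self, char_ids):              # (batch, word_len)
        x = self.embed(char_ids)
        reps = []
        for lstm in self.lstms:
            out, _ = lstm(x)                  # (batch, word_len, 2*hidden)
            reps.append(out[:, -1])           # last step as the word vector
        return torch.cat(reps, dim=-1)        # concatenated subspaces

class GatedTwoLayerBiLSTM(nn.Module):
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.l1 = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        self.l2 = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.gate = nn.Linear(4 * hidden, 2)  # one weight per layer output

    def forward(self, words):                 # (batch, sent_len, in_dim)
        h1, _ = self.l1(words)
        h2, _ = self.l2(h1)
        w = torch.softmax(self.gate(torch.cat([h1, h2], dim=-1)), dim=-1)
        # Weighted sum of the two layers' outputs, as in the abstract.
        return w[..., :1] * h1 + w[..., 1:] * h2
```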


Author(s):  
Ziqi Zhang ◽  
Fabio Ciravegna

Named Entity Recognition (NER) deals with identifying and classifying atomic text elements into pre-defined ontological classes. It is the enabling technique for many complex knowledge acquisition tasks. The recent flourishing of Web resources has opened new opportunities and challenges for knowledge acquisition. In the domain of NER and its application to ontology population, considerable research has been dedicated to exploiting background knowledge from Web resources to enhance system accuracy. This chapter reviews the existing literature in this domain, with an emphasis on background knowledge extracted from Web resources. The authors discuss the benefits of using background knowledge and the inadequacies of existing work. They then propose a novel method that automatically creates domain-specific background knowledge by exploring the Wikipedia knowledge base in a domain- and language-independent way. The authors empirically show that the method can be adapted to ontology population and generates high-quality background knowledge that improves the accuracy of domain-specific NER.
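As a simplified stand-in for the chapter's method (not the authors' algorithm), the sketch below pulls an article's categories from the public MediaWiki API and treats them as coarse background-knowledge evidence for an entity mention.

```python
# Hedged sketch: fetch Wikipedia categories for a term via the public
# MediaWiki API and use them as background-knowledge features for NER.
import requests

def wikipedia_categories(title: str) -> list[str]:
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "format": "json",
        "cllimit": "max",
    }
    data = requests.get("https://en.wikipedia.org/w/api.php",
                        params=params, timeout=10).json()
    cats = []
    for page in data["query"]["pages"].values():
        for cat in page.get("categories", []):
            cats.append(cat["title"].removeprefix("Category:"))
    return cats

print(wikipedia_categories("Aspirin"))  # e.g. drug-related categories
```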


Author(s):  
Girish Keshav Palshikar

While building and using a fully semantic understanding of Web contents remains a distant goal, named entities (NEs) provide a small, tractable set of elements carrying well-defined semantics. Generic named entities are names of persons, locations, organizations, phone numbers, and dates, while domain-specific named entities include names of, for example, proteins, enzymes, organisms, genes, and cells in the biological domain. The ability to automatically perform named entity recognition (NER) – i.e., to identify occurrences of NEs in Web contents – can have multiple benefits, such as improving the expressiveness of queries and the quality of search results. A number of factors make building highly accurate NER a challenging task. Given the importance of NER in semantic processing of text, this chapter presents a detailed survey of NER techniques for English text.
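For the pattern-matchable generic NE types mentioned above (dates, phone numbers), a minimal rule-based recognizer can look like the sketch below; the patterns are illustrative only, and real systems combine such rules with statistical models.

```python
# Toy pattern-based recognition of two generic NE types: dates and
# (US-style) phone numbers. Illustrative patterns, not production rules.
import re

DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

text = "Call 555-123-4567 before 12/31/2024 to confirm."
print("dates:", DATE.findall(text))    # ['12/31/2024']
print("phones:", PHONE.findall(text))  # ['555-123-4567']
```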

