Named Entity System for Tweets in Hindi Language

Due to the growing need for smart-health applications in the Hindi language, there is a rapidly growing demand for a health-related Named Entity Recognition (NER) system for Hindi. To this end, this research uses the Twitter social network to extract tweets dated 1 October 2016 to 15 October 2017 from Patanjali, Dabur, and other Hindi-language, Twitter-based health sites, considering four NE types: Person, Disease, Consumable, and Organization. To the best of the authors' knowledge, the Twitter dataset and NE types considered here are among the first such resources for the Hindi language. This article introduces a three-stage NER system for tweets in Hindi (the HinTwtNER system): a pre-processing stage; a machine learning stage (Hyperspace Analogue to Language (HAL) and Conditional Random Field (CRF)); and a post-processing stage. HinTwtNER explores binary features and achieves an overall F-score of 49.87%, which is comparable to Twitter-based NER systems for English and other languages.
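The HAL part of the machine learning stage can be illustrated with the textbook Hyperspace Analogue to Language construction: a window slides over the text, and each word accumulates weighted counts of the words that precede it, with a word at distance d receiving weight window - d + 1, so nearer neighbours count more. The minimal Python sketch below assumes that standard definition; the function names, the window size, and the romanized example tokens are hypothetical, and the abstract does not state how HinTwtNER turns HAL vectors into CRF features.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """HAL-style weighted co-occurrence counts (textbook definition, assumed here).

    m[target][context] accumulates (window - d + 1) each time `context`
    appears d positions (1 <= d <= window) BEFORE `target`.
    """
    m = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            m[target][tokens[j]] += window - d + 1  # nearer words weigh more
    return m


def hal_vector(m, word, vocab):
    """Concatenate the preceding-context row and the following-context column."""
    row = [m[word].get(c, 0) for c in vocab]  # words seen before `word`
    col = [m[c].get(word, 0) for c in vocab]  # words seen after `word`
    return row + col


# Hypothetical romanized tweet fragment ("Patanjali's amla juice").
tokens = ["patanjali", "ka", "amla", "juice"]
m = hal_matrix(tokens, window=2)
print(hal_vector(m, "ka", tokens))  # [2, 0, 0, 0, 0, 0, 2, 1]
```

A word's HAL vector concatenates its row (preceding context) with its column (following context); vectors like these could be compared or clustered to give the CRF distributional cues, but that use is an assumption here.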


Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach

International Journal of Environmental Research and Public Health; 10.3390/ijerph16193628; 2019; Vol 16 (19); pp. 3628

Cited By ~ 5

Author(s): Erdenebileg Batbaatar, Keun Ho Ryu

Keyword(s): Neural Network, Recurrent Neural Network, Adverse Drug Events, Viterbi Algorithm, Short Term Memory, Conditional Random Field, Named Entity Recognition, Entity Recognition, Named Entity, Health Related

Named Entity Recognition (NER) in the healthcare domain involves identifying and categorizing diseases, drugs, and symptoms for biosurveillance, extracting their related properties and activities, and identifying adverse drug events appearing in texts. These tasks pose important challenges in healthcare. Analyzing user messages on social media networks such as Twitter can provide opportunities to detect and manage public health events. Twitter provides a broad range of short messages that contain useful information for information extraction. In this paper, we present a Health-Related Named Entity Recognition (HNER) task using a healthcare-domain ontology that can recognize health-related entities in large numbers of user messages from Twitter. For this task, we employ a deep learning architecture based on a recurrent neural network (RNN) with little feature engineering. To achieve our goal, we collected a large number of Twitter messages containing health-related information and detected biomedical entities from the Unified Medical Language System (UMLS). A bidirectional long short-term memory (BiLSTM) model learned rich context information, and a convolutional neural network (CNN) was used to produce character-level features. The conditional random field (CRF) model predicted a sequence of labels corresponding to the sequence of inputs, and the Viterbi algorithm was used to find the highest-scoring label sequence and thus detect health-related entities in Twitter messages. We provide comprehensive results that give valuable insights into identifying medical entities on Twitter for various applications. The BiLSTM-CRF model achieved a precision of 93.99%, recall of 73.31%, and F1-score of 81.77% for disease or syndrome HNER; a precision of 90.83%, recall of 81.98%, and F1-score of 87.52% for sign or symptom HNER; and a precision of 94.85%, recall of 73.47%, and F1-score of 84.51% for pharmacologic substance named entities.
The ontology-based manual annotation results show that it is possible to perform high-quality annotation despite the complexity of medical terminology and the lack of context in tweets.
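The decoding step, in which the CRF layer combines per-token scores with tag-transition scores and the Viterbi algorithm picks the highest-scoring label sequence, can be sketched as follows. This is a generic linear-chain CRF decoder in plain Python, not the authors' implementation; the emission scores stand in for BiLSTM-CNN outputs, and the three-tag scheme and scores in the usage lines are hypothetical.

```python
def viterbi(emissions, transitions, start=None):
    """Highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][y]   score of tag y at token t (e.g. BiLSTM/CNN outputs)
    transitions[a][b] score of moving from tag a to tag b
    start[y]          optional score of starting in tag y
    Returns (best_score, best_path).
    """
    n_tags = len(emissions[0])
    start = start or [0.0] * n_tags
    score = [start[y] + emissions[0][y] for y in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for y in range(n_tags):
            # Best previous tag for ending in tag y at position t.
            prev = max(range(n_tags), key=lambda a: score[a] + transitions[a][y])
            new_score.append(score[prev] + transitions[prev][y] + emissions[t][y])
            bp.append(prev)
        score = new_score
        backpointers.append(bp)
    # Follow back-pointers from the best final tag.
    last = max(range(n_tags), key=lambda y: score[y])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return score[last], path


# Hypothetical tags: 0 = O, 1 = B-Disease, 2 = I-Disease.
emissions = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.0], [0.0, 0.0, 1.5]]
transitions = [[0.0, 0.0, -10.0],  # O -> I-Disease is strongly discouraged
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, -10.0]          # a sequence should not start inside an entity
print(viterbi(emissions, transitions, start))  # (4.5, [0, 1, 2])
```

Dynamic programming keeps the cost linear in sentence length (times the square of the tag count), which is why Viterbi rather than exhaustive search is used at inference time.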


Named entity recognition approaches: A study applied to English and Hindi language

2015 International Conference on Circuits, Power and Computing Technologies [ICCPCT-2015]; 10.1109/iccpct.2015.7159443; 2015

Cited By ~ 3

Author(s): Gowri Prasad, K. K. Fousiya

Keyword(s): Named Entity Recognition, Entity Recognition, Named Entity, Hindi Language


Using Conditional Random Field in Named Entity Recognition for Crime Location Identification

International Journal of Mechanical Engineering and Robotics Research; 10.18178/ijmerr.9.2.252-257; 2020; pp. 252-257

Author(s): Quintin Jackson Goraseb, Nathar Shah

Keyword(s): Random Field, Conditional Random Field, Named Entity Recognition, Entity Recognition, Named Entity, Location Identification


Deep Learning-Based Named Entity Recognition and Knowledge Graph Construction for Geological Hazards

ISPRS International Journal of Geo-Information; 10.3390/ijgi9010015; 2019; Vol 9 (1); pp. 15

Cited By ~ 2

Author(s): Runyu Fan, Lizhe Wang, Jining Yan, Weijing Song, Yingqian Zhu, ...

Keyword(s): Deep Learning, Large Scale, Conditional Random Field, Named Entity Recognition, Entity Recognition, Knowledge Graph, Geological Hazard, Geological Hazards, Named Entity, Corpus Construction

Constructing a knowledge graph of geological hazards literature can facilitate the reuse of that literature and provide a reference for geological hazard governance. Named entity recognition (NER), a core technology for constructing a geological hazard knowledge graph, must cope with named entities in geological hazard literature that are diverse in form, ambiguous in semantics, and uncertain in context. This makes it difficult to design practical features for NER classification. To address this problem, this paper proposes a deep learning-based NER model, the deep, multi-branch BiGRU-CRF model, which combines a multi-branch bidirectional gated recurrent unit (BiGRU) layer and a conditional random field (CRF) model. In an end-to-end, supervised process, the proposed model automatically learns and transforms features through the multi-branch BiGRU layer and enhances the output with a CRF layer. In addition to the deep, multi-branch BiGRU-CRF model, we also propose a pattern-based corpus construction method to build the corpus the model requires. Experimental results indicate that the proposed deep, multi-branch BiGRU-CRF model outperforms state-of-the-art models. Using this model, we constructed a large-scale geological hazard literature knowledge graph containing 34,457 entity nodes and 84,561 relations.


CWPC_BiAtt: Character–Word–Position Combined BiLSTM-Attention for Chinese Named Entity Recognition

Information; 10.3390/info11010045; 2020; Vol 11 (1); pp. 45

Cited By ~ 1

Author(s): Shardrom Johnson, Sherlock Shen, Yuanchen Liu

Keyword(s): Language Processing, Short Term Memory, Conditional Random Field, Named Entity Recognition, Attention Mechanism, Entity Recognition, Position Information, Named Entity, Pos Tagging, Word Position

Named Entity Recognition (NER), which usually draws on linguistic features such as Part-Of-Speech (POS) tags, is a major task in Natural Language Processing (NLP). In this paper, we put forward a new comprehensive-embedding that combines three aspects, namely character-embedding, word-embedding, and position-embedding (pos-embedding), concatenated in that order to capture their dependencies. Based on it, we propose a new Character–Word–Position Combined BiLSTM-Attention (CWPC_BiAtt) model for the Chinese NER task. Passing the comprehensive-embedding through a Bidirectional Long Short-Term Memory (BiLSTM) layer captures the connection between past and future information; an attention mechanism then captures the connection between the content at the current position in the sentence and the content at any other position. Finally, we use a Conditional Random Field (CRF) to decode the entire tag sequence. Experiments show that the proposed CWPC_BiAtt model performs well on the NER task on the Microsoft Research Asia (MSRA) dataset and the Weibo NER corpus. High precision and recall were obtained, which verifies the stability of the model. Position-embedding in the comprehensive-embedding compensates for the attention mechanism by providing position information for the otherwise unordered sequence, which shows that the comprehensive-embedding is complete. Overall, the proposed CWPC_BiAtt model has three distinct characteristics: completeness, simplicity, and stability. It achieved the highest F-score, reaching state-of-the-art performance on the MSRA dataset and the Weibo NER corpus.
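Why attention needs position information can be shown with a small experiment: plain dot-product self-attention treats its input as an unordered set, so permuting the tokens only permutes the outputs, whereas concatenating a position vector to each token makes the output depend on where the token sits. The Python sketch below assumes unparameterised scaled dot-product attention applied directly to the embeddings and one-hot position vectors; in CWPC_BiAtt the attention operates on BiLSTM outputs with learned parameters, which this sketch does not model.

```python
import math


def self_attention(xs):
    """Unparameterised scaled dot-product self-attention (queries = keys = values = xs)."""
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in xs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, xs)) / total
                    for i in range(len(q))])
    return out


def comprehensive_embedding(char_vecs, word_vecs, pos_vecs):
    """Concatenate character, word and position vectors per token, in that order."""
    return [c + w + p for c, w, p in zip(char_vecs, word_vecs, pos_vecs)]


# Hypothetical 1-d character and word vectors plus one-hot position vectors.
emb = comprehensive_embedding([[1.0], [0.0]], [[0.0], [1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(emb)  # [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
```

Without the position part, `self_attention` on a permuted sequence returns the same rows in permuted order; with it, swapping two tokens while keeping the position vectors fixed changes the outputs, which is the sense in which a position-embedding supplies order information.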


Information Extraction from Electronic Medical Records Using Multitask Recurrent Neural Network with Contextual Word Embedding

Applied Sciences; 10.3390/app9183658; 2019; Vol 9 (18); pp. 3658

Cited By ~ 6

Author(s): Jianliang Yang, Yuenan Liu, Minghui Qian, Chenghua Guan, Xiangfei Yuan

Keyword(s): Electronic Medical Records, Medical Records, Large Scale, Short Term Memory, Conditional Random Field, Named Entity Recognition, Recognition Task, Entity Recognition, Language Models, Named Entity

Clinical named entity recognition is essential for analyzing large-scale electronic medical records efficiently. Traditional rule-based solutions need considerable human effort to build rules and dictionaries, while machine learning-based solutions need laborious feature engineering. Recently, deep learning solutions such as Long Short-Term Memory with Conditional Random Field (LSTM–CRF) have achieved considerable performance on many datasets. In this paper, we developed a multitask attention-based bidirectional LSTM–CRF (Att-biLSTM–CRF) model with pretrained Embeddings from Language Models (ELMo) to achieve better performance. In the multitask system, an additional task, named entity discovery, was designed to enhance the model's perception of unknown entities. Experiments were conducted on the 2010 Informatics for Integrating Biology & the Bedside/Veterans Affairs (I2B2/VA) dataset. Experimental results show that our model outperforms the state-of-the-art solution for both the single model and the ensemble model. Our work proposes an approach to improving recall in the clinical named entity recognition task based on the multitask mechanism.
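The abstract does not define the auxiliary entity discovery task precisely. One plausible reading, assumed here and not confirmed by the source, is a second tagging head that only marks whether a token belongs to some entity, regardless of its type, trained jointly with the main NER head through a weighted sum of losses. The Python sketch below derives such untyped labels from typed BIO tags using the 2010 i2b2/VA concept types (problem, test, treatment); the function names and the 0.5 default weight are hypothetical.

```python
def discovery_labels(bio_tags):
    """Collapse typed BIO tags into untyped ones for a hypothetical entity discovery head.

    'B-problem' -> 'B-ENT', 'I-test' -> 'I-ENT'; 'O' is unchanged.
    """
    return [t if t == "O" else t.split("-", 1)[0] + "-ENT" for t in bio_tags]


def multitask_loss(main_loss, aux_loss, aux_weight=0.5):
    """Joint objective: main NER loss plus a weighted auxiliary loss (weight is an assumption)."""
    return main_loss + aux_weight * aux_loss


tags = ["O", "B-problem", "I-problem", "O", "B-treatment"]
print(discovery_labels(tags))  # ['O', 'B-ENT', 'I-ENT', 'O', 'B-ENT']
```

Because the auxiliary labels ignore entity type, an unseen entity that the main head would mislabel can still be rewarded for being detected at all, which is one way such a task could raise recall.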


A Low-Cost Named Entity Recognition Research Based on Active Learning

Scientific Programming; 10.1155/2018/1890683; 2018; Vol 2018; pp. 1-10

Cited By ~ 1

Author(s): Han Huang, Hongyu Wang, Dawei Jin

Keyword(s): Active Learning, Language Processing, Selection Process, Conditional Random Field, Low Cost, Named Entity Recognition, Entity Recognition, Training Set, Processing Technologies, Named Entity

Named entity recognition (NER) is an indispensable part of many natural language processing technologies, such as information extraction, information retrieval, and intelligent question answering. This paper describes the development of the AL-CRF model, an NER approach based on active learning (AL). The AL-CRF model performs the following sequence of steps. First, the samples are clustered using the k-means approach. Then, stratified sampling is performed on the resulting clusters to obtain initial samples, which are used to train the basic conditional random field (CRF) classifier. Next, the selection process starts, using entropy as its criterion: the samples with the highest entropy values are added to the training set. The learning process is then repeated, and the CRF classifier is retrained on the updated training set. The AL learning and selection processes run iteratively until the F-measure (the harmonic mean of precision and recall) stabilizes, yielding the final NER model. Several NER experiments were performed on legislative and medical cases to validate AL-CRF performance. The test data include Chinese judicial documents and Chinese electronic medical records (EMRs). Testing indicates that the proposed algorithm achieves better recognition accuracy and recall than the conventional CRF model. Moreover, the main advantage of the approach is that it requires fewer manually labelled training samples while being more effective. This can result in a more cost-effective and more reliable process.
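The selection loop described above can be sketched in Python as a few small pieces: stratified sampling of an initial training set from k-means cluster assignments, ranking unlabeled sentences by the entropy of the CRF's per-token tag marginals, and a stopping test on the F-measure history. The cluster assignments and marginals are assumed to come from an external k-means implementation and a trained CRF, which are not shown; summing token entropies over a sentence, the sample sizes, and the 0.005 tolerance are assumptions, since the abstract does not specify them.

```python
import math
import random
from collections import defaultdict


def sequence_entropy(marginals):
    """Sum of per-token entropies; marginals[t][y] = P(tag y at token t) from a CRF."""
    return -sum(p * math.log(p) for token in marginals for p in token if p > 0)


def stratified_initial_sample(cluster_ids, per_cluster, seed=0):
    """Pick up to `per_cluster` sentence indices from every k-means cluster."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, c in enumerate(cluster_ids):
        by_cluster[c].append(idx)
    chosen = []
    for c in sorted(by_cluster):
        members = by_cluster[c]
        chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return sorted(chosen)


def select_most_uncertain(pool_marginals, batch_size):
    """Indices of the `batch_size` unlabeled sentences with the highest entropy."""
    ranked = sorted(range(len(pool_marginals)),
                    key=lambda i: sequence_entropy(pool_marginals[i]),
                    reverse=True)
    return ranked[:batch_size]


def f_has_stabilized(f_history, tol=0.005):
    """Stop once the F-measure changes by less than `tol` between two rounds."""
    return len(f_history) >= 2 and abs(f_history[-1] - f_history[-2]) < tol
```

In a full loop, the selected sentences would be labelled manually and added to the training set, the CRF retrained, and its F-measure appended to the history until `f_has_stabilized` returns True.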


Nested Named Entity Recognition via Second-best Sequence Learning and Decoding

Transactions of the Association for Computational Linguistics; 10.1162/tacl_a_00334; 2020; Vol 8; pp. 605-620

Cited By ~ 1

Author(s): Takashi Shibuya, Eduard Hovy

Keyword(s): Random Field, Sequence Learning, Conditional Random Field, Named Entity Recognition, Neural Model, Entity Recognition, Named Entities, Second Best, Named Entity, Better Than

When an entity name contains other names within it, identifying all combinations of names can become difficult and expensive. We propose a new method to recognize not only outermost named entities but also inner nested ones. We design an objective function for training a neural model that treats the tag sequence for nested entities as the second-best path within the span of their parent entity. In addition, we provide a decoding method for inference that extracts entities iteratively from the outermost ones to the inner ones (outside-to-inside). Our method adds no hyperparameters beyond those of the conditional random field-based model widely used for flat named entity recognition tasks. Experiments demonstrate that our method performs better than, or at least as well as, existing methods capable of handling nested entities, achieving F1-scores of 85.82%, 84.34%, and 77.36% on the ACE-2004, ACE-2005, and GENIA datasets, respectively.
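The notion of a second-best path can be made concrete with a k-best (list) Viterbi decoder, which keeps the top k partial scores for every tag at every position instead of only the best one. The Python sketch below is that generic technique with k = 2; it does not reproduce the paper's training objective or its outside-to-inside restriction to a parent entity's span, and the abstract does not say that the authors decode exactly this way. Emission and transition scores are assumed to come from a trained CRF-based model.

```python
import heapq


def k_best_viterbi(emissions, transitions, k=2):
    """Return the k highest-scoring tag paths of a linear-chain CRF as (score, path).

    emissions[t][y] is the score of tag y at token t; transitions[a][b] the score
    of moving from tag a to tag b.
    """
    n_tags = len(emissions[0])
    # beams[t][y] holds up to k entries (score, prev_tag, prev_rank), best first.
    beams = [[[(emissions[0][y], None, None)] for y in range(n_tags)]]
    for t in range(1, len(emissions)):
        layer = []
        for y in range(n_tags):
            candidates = [(s + transitions[a][y] + emissions[t][y], a, r)
                          for a in range(n_tags)
                          for r, (s, _, _) in enumerate(beams[-1][a])]
            layer.append(heapq.nlargest(k, candidates, key=lambda c: c[0]))
        beams.append(layer)
    finals = heapq.nlargest(k, [(entry[0], y, r)
                                for y in range(n_tags)
                                for r, entry in enumerate(beams[-1][y])],
                            key=lambda c: c[0])
    results = []
    for score, y, r in finals:
        path = [y]
        # Walk the back-pointers (tag, rank) from the last token to the first.
        for t in range(len(emissions) - 1, 0, -1):
            _, y, r = beams[t][y][r]
            path.append(y)
        results.append((score, path[::-1]))
    return results
```

Keeping k entries per tag is enough because path scores are additive: any globally top-k path must pass through one of the top-k prefixes ending in each tag. Restricted to the tokens of a parent entity, the best path would presumably reproduce the parent's own tagging, so the second entry returned is the candidate for the nested entities.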


Pengenalan Entitas Bernama Otomatis untuk Bahasa Indonesia dengan Pendekatan Pembelajaran Mesin (Automatic Named Entity Recognition for Indonesian Using a Machine Learning Approach)

10.31227/osf.io/vud2p; 2018

Cited By ~ 2

Author(s): Yudi Wibisono, Masayu Leylia Khodra

Keyword(s): Short Term Memory, Conditional Random Field, Named Entity Recognition, Entity Recognition, Short Term, Term Memory, Named Entity, Long Short Term Memory, F Measure, Bahasa Indonesia

Named-entity recognition (NER) is the automatic process of extracting the named entities considered important in a text and assigning them to predefined categories. For example, in news text, NER can extract the names of people, organizations, and locations. NER is useful in many text analysis applications, such as search, question answering systems, text summarization, and machine translation. The main challenge of NER is handling ambiguity of meaning caused by a word's context in the sentence; for example, the word "Cendana" can be a location name (Jalan Cendana), an organization name (Keluarga Cendana), or the name of a plant (sandalwood). Another challenge is determining entity boundaries, for example "[Istora Senayan] [Jakarta]". Various NER tools with good performance have been developed for many languages, especially English, but NER tools for Indonesian still perform poorly. This paper discusses a machine learning-based approach to producing an Indonesian NER model. The approach depends heavily on the corpus used as the learning source and on the machine learning technique used. The technique used is LSTM-CRF (Long Short-Term Memory with Conditional Random Field). The best result (F-measure = 0.72) was obtained using GloVe word embeddings trained on the Indonesian-language Wikipedia.
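The boundary problem in the "[Istora Senayan] [Jakarta]" example is usually handled with BIO tags, where B- opens an entity and I- continues it, so two adjacent entities and one merged entity receive different tag sequences. The Python sketch below decodes BIO tags into spans and scores them with an exact-match, entity-level F-measure; the LOC label and the exact-match scoring are assumptions, since the abstract does not state the tag set or the evaluation protocol behind the reported F-measure of 0.72.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (start, end, label) spans with `end` exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        inside = tag.startswith("I-") and tag[2:] == label
        if start is not None and not inside:
            spans.append((start, i, label))
            start, label = None, None
        # "B-" always opens a new entity; a stray "I-" is treated as a start.
        if tag.startswith("B-") or (tag.startswith("I-") and not inside):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold_spans, pred_spans):
    """Exact-match entity-level F-measure (harmonic mean of precision and recall)."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


tokens = ["Istora", "Senayan", "Jakarta"]
for start, end, label in bio_to_spans(["B-LOC", "I-LOC", "B-LOC"]):
    print(" ".join(tokens[start:end]), label)
# Istora Senayan LOC
# Jakarta LOC
```

Tagging "Jakarta" as I-LOC instead would yield the single span "Istora Senayan Jakarta", which under exact matching earns no credit against the two gold entities.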


Studying the impact of various features on the performance of Conditional Random Field-based Arabic Named Entity Recognition

2013 ACS International Conference on Computer Systems and Applications (AICCSA); 10.1109/aiccsa.2013.6616423; 2013

Cited By ~ 1

Author(s): Alia Morsi, Ahmed Rafea

Keyword(s): Random Field, Conditional Random Field, Named Entity Recognition, Entity Recognition, Named Entity, The Impact
