Named Entity Recognition: Latest Research Papers

Joined Type Length Encoding for Nested Named Entity Recognition

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3487057 ◽

2022 ◽

Vol 21 (3) ◽

pp. 1-23

Author(s):

Mohammad Sadegh Sheikhaei ◽

Hasan Zafari ◽

Yuan Tian

Keyword(s):

State Of The Art ◽

Named Entity Recognition ◽

Ensemble Method ◽

Entity Recognition ◽

Neural Models ◽

Named Entities ◽

Layer Sequence ◽

Named Entity ◽

The Ensemble Method ◽

Single Sequence

In this article, we propose a new encoding scheme for named entity recognition (NER) called Joined Type-Length encoding (JoinedTL). Unlike most existing named entity encoding schemes, which focus on flat entities, JoinedTL can label nested named entities in a single sequence. JoinedTL uses a packed encoding to represent both the type and the span of a named entity, which not only results in fewer tagged tokens than existing encoding schemes but also enables it to support nested NER. We evaluate the effectiveness of JoinedTL for nested NER on three nested NER datasets: GENIA in English, GermEval in German, and PerNest, our newly created nested NER dataset in Persian. We apply CharLSTM+WordLSTM+CRF, a three-layer sequence tagging model, to the three datasets encoded using JoinedTL and two existing nested NE encoding schemes, i.e., JoinedBIO and JoinedBILOU. Our experimental results show that CharLSTM+WordLSTM+CRF trained on JoinedTL-encoded datasets achieves F1 scores competitive with those of models trained on datasets encoded with the other two schemes, but with 27%–48% fewer tagged tokens. To leverage the strengths of the three encodings (JoinedTL, JoinedBIO, and JoinedBILOU), we propose an encoding-based ensemble method for nested NER. Evaluation results show that the ensemble method achieves higher F1 scores on all datasets than each of the three models trained with a single encoding. By using nested NE encodings, including JoinedTL, with CharLSTM+WordLSTM+CRF, we establish new state-of-the-art performance with F1 scores of 83.7 on PerNest, 74.9 on GENIA, and 70.5 on GermEval, surpassing two recent neural models specially designed for nested NER.
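The abstract does not spell out the exact tag format, so the following is only a minimal sketch of the general idea, assuming a hypothetical `TYPE-LENGTH` tag placed on an entity's first token (with `|` joining entities that start on the same token, outermost first), contrasted with a joined BIO labeling in the style of JoinedBIO. The helper names (`joined_bio`, `joined_tl`, `decode_tl`, `vote`) and the toy sentence are illustrative assumptions. The `vote` function shows plain majority voting over decoded spans, which is one simple way to combine models trained on different encodings but not necessarily the paper's ensemble method.

```python
from collections import Counter
from typing import List, Tuple

# An entity is (start, end, type) over token indices, with `end` exclusive.
Entity = Tuple[int, int, str]


def _outer_first(entities: List[Entity]) -> List[Entity]:
    # For properly nested spans, sorting by start and then by descending
    # length visits an enclosing entity before the entities inside it.
    return sorted(entities, key=lambda e: (e[0], -(e[1] - e[0])))


def joined_bio(n_tokens: int, entities: List[Entity]) -> List[str]:
    """Joined BIO: each token gets one B-/I- tag per entity covering it."""
    labels: List[List[str]] = [[] for _ in range(n_tokens)]
    for start, end, etype in _outer_first(entities):
        for i in range(start, end):
            labels[i].append(("B-" if i == start else "I-") + etype)
    return ["|".join(tags) if tags else "O" for tags in labels]


def joined_tl(n_tokens: int, entities: List[Entity]) -> List[str]:
    """Hypothetical type-length tags: only an entity's first token is tagged."""
    labels: List[List[str]] = [[] for _ in range(n_tokens)]
    for start, end, etype in _outer_first(entities):
        labels[start].append(f"{etype}-{end - start}")
    return ["|".join(tags) if tags else "O" for tags in labels]


def decode_tl(labels: List[str]) -> List[Entity]:
    """Recover (start, end, type) spans from the type-length tags."""
    entities: List[Entity] = []
    for i, label in enumerate(labels):
        if label == "O":
            continue
        for tag in label.split("|"):
            etype, length = tag.rsplit("-", 1)  # types may themselves contain '-'
            entities.append((i, i + int(length), etype))
    return entities


def tagged_count(labels: List[str]) -> int:
    """Number of tokens carrying a non-O label."""
    return sum(label != "O" for label in labels)


def vote(predictions: List[List[Entity]], min_votes: int = 2) -> List[Entity]:
    """Majority voting over spans decoded from several models' outputs."""
    counts = Counter(span for pred in predictions for span in set(pred))
    return sorted(span for span, c in counts.items() if c >= min_votes)


tokens = ["the", "University", "of", "Tehran", "campus"]
gold = [(1, 4, "ORG"), (3, 4, "LOC")]  # the ORG span contains a nested LOC span
bio = joined_bio(len(tokens), gold)
tl = joined_tl(len(tokens), gold)
print(bio)  # ['O', 'B-ORG', 'I-ORG', 'I-ORG|B-LOC', 'O']
print(tl)  # ['O', 'ORG-3', 'O', 'LOC-1', 'O']
print(tagged_count(bio), tagged_count(tl))  # 3 2
```

In this toy sentence the type-length labels tag 2 tokens instead of 3, which illustrates (but does not measure) why a packed encoding can produce fewer tagged tokens.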

A Statistical Language Model for Pre-Trained Sequence Labeling: A Case Study on Vietnamese

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3483524 ◽

2022 ◽

Vol 21 (3) ◽

pp. 1-21

Author(s):

Xianwen Liao ◽

Yongzhong Huang ◽

Peng Yang ◽

Lei Chen

Keyword(s):

Language Model ◽

Dynamic Programming Algorithm ◽

Named Entity Recognition ◽

Word Segmentation ◽

Training Data ◽

Entity Recognition ◽

Divide And Conquer ◽

Programming Algorithm ◽

Statistical Language Model ◽

Sequence Labeling

By defining the computable word segmentation unit and studying its probability characteristics, we establish an unsupervised statistical language model (SLM) for a new pre-trained sequence labeling framework in this article. The proposed SLM is an optimization model whose objective is to maximize the total binding force of all candidate word segmentation units in a sentence, without annotated datasets or vocabularies. To solve the SLM, we design a recursive divide-and-conquer dynamic programming algorithm. By integrating the SLM with popular sequence labeling models, we perform Vietnamese word segmentation, part-of-speech tagging, and named entity recognition experiments. The experimental results show that our SLM effectively improves the performance of sequence labeling tasks. Using less than 10% of the training data and no dictionary, our sequence labeling framework outperforms the state-of-the-art Vietnamese word segmentation toolkit VnCoreNLP in a cross-dataset test. The SLM has no hyperparameters to tune, is completely unsupervised, and is applicable to any other analytic language; thus, it has good domain adaptability.
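The optimization described above, choosing the segmentation whose units have the largest total score, can be illustrated with a standard left-to-right dynamic program. This is a minimal sketch only: it is not the paper's recursive divide-and-conquer algorithm, and the `BINDING` table and `toy_score` function are hypothetical stand-ins for the SLM's binding force, which the abstract does not define.

```python
import math
from typing import Callable, List, Sequence, Tuple

Unit = Tuple[str, ...]


def segment(syllables: Sequence[str], score: Callable[[Unit], float],
            max_len: int = 4) -> List[Unit]:
    """Split syllables into units maximizing the summed unit scores."""
    n = len(syllables)
    best = [-math.inf] * (n + 1)  # best[j]: best total score for syllables[:j]
    best[0] = 0.0
    back = [0] * (n + 1)  # back[j]: start index of the last unit in that split
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            total = best[start] + score(tuple(syllables[start:end]))
            if total > best[end]:
                best[end], back[end] = total, start
    units: List[Unit] = []
    end = n
    while end > 0:  # follow back-pointers from the end of the sentence
        start = back[end]
        units.append(tuple(syllables[start:end]))
        end = start
    return units[::-1]


# Hypothetical binding forces for multi-syllable units (toy values).
BINDING = {("sinh", "viên"): 2.0, ("đại", "học"): 2.5}


def toy_score(unit: Unit) -> float:
    if len(unit) == 1:
        return 0.0  # a single syllable is always a valid unit
    return BINDING.get(unit, -math.inf)  # unknown multi-syllable units are ruled out


print(segment(["sinh", "viên", "đại", "học"], toy_score))
# [('sinh', 'viên'), ('đại', 'học')]
```

Because single syllables always score 0, every sentence has at least one valid segmentation; the `max_len` cap (an assumption) bounds the number of candidate units considered at each position.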

LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition

Neural Processing Letters ◽

10.1007/s11063-021-10737-x ◽

2022 ◽

Author(s):

Mingyi Liu ◽

Zhiying Tu ◽

Tong Zhang ◽

Tonghua Su ◽

Xiaofei Xu ◽

...

Keyword(s):

Active Learning ◽

Learning Strategy ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Active Learning Strategy

Named entity recognition for Chinese construction documents based on conditional random field

Frontiers of Engineering Management ◽

10.1007/s42524-021-0179-8 ◽

2022 ◽

Author(s):

Qiqi Zhang ◽

Cong Xue ◽

Xing Su ◽

Peng Zhou ◽

Xiangyu Wang ◽

...

Keyword(s):

Random Field ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Construction Documents

Extraction and Analysis of Social Networks Data to Detect Traffic Accidents

Information ◽

10.3390/info13010026 ◽

2022 ◽

Vol 13 (1) ◽

pp. 26

Author(s):

Nestor Suat-Rojas ◽

Camilo Gutierrez-Osorio ◽

Cesar Pedraza

Keyword(s):

Social Networks ◽

Traffic Accident ◽

Traffic Accidents ◽

Low Cost ◽

Information Source ◽

Named Entity Recognition ◽

Industrial Area ◽

Entity Recognition ◽

Detection Methods ◽

Traffic Information

Traffic accident detection is an important strategy that governments can use to implement policies intended to reduce accidents. Detection usually relies on techniques such as image processing and RFID devices, among others. Social network mining has emerged as a low-cost alternative; however, social networks pose several challenges, such as informal language and misspellings. This paper proposes a method to extract traffic accident data from Spanish-language tweets. The method consists of four phases. The first phase establishes the data collection mechanisms. The second represents the messages as vectors and classifies them as accidents or non-accidents. The third uses named entity recognition techniques to detect the location. In the fourth, the locations pass through a geocoder that returns their geographic coordinates. The method was applied to the city of Bogota, and the Twitter data were compared with the official traffic information source; the comparison showed some influence of Twitter in the commercial and industrial areas of the city. The results show how effective accident reports on Twitter can be, so Twitter should be considered an information source that may complement existing detection methods.
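The four phases can be wired together as in the following sketch. Every concrete piece is an assumption made for illustration: the keyword weights stand in for a trained classifier, simple gazetteer matching stands in for the NER model, and `geocode` is a caller-supplied function (a stub returning `None` here) standing in for a real geocoding service.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Coord = Tuple[float, float]


@dataclass
class AccidentReport:
    text: str
    location: str
    coords: Optional[Coord]


def vectorize(text: str) -> Dict[str, int]:
    """Phase 2a: bag-of-words counts over lowercased word tokens."""
    counts: Dict[str, int] = {}
    for token in re.findall(r"\w+", text.lower()):
        counts[token] = counts.get(token, 0) + 1
    return counts


def is_accident(vec: Dict[str, int], weights: Dict[str, float], bias: float) -> bool:
    """Phase 2b: linear decision rule; real weights would come from training."""
    return bias + sum(weights.get(t, 0.0) * c for t, c in vec.items()) > 0


def find_locations(text: str, gazetteer: Iterable[str]) -> List[str]:
    """Phase 3 stand-in: substring lookup in a place-name list instead of NER."""
    lowered = text.lower()
    return [place for place in gazetteer if place.lower() in lowered]


def pipeline(tweets: Iterable[str], weights: Dict[str, float], bias: float,
             gazetteer: List[str],
             geocode: Callable[[str], Optional[Coord]]) -> List[AccidentReport]:
    reports: List[AccidentReport] = []
    for text in tweets:  # Phase 1 (collection) is assumed to be done already
        if not is_accident(vectorize(text), weights, bias):
            continue
        for place in find_locations(text, gazetteer):
            reports.append(AccidentReport(text, place, geocode(place)))  # Phase 4
    return reports


weights = {"accidente": 2.0, "choque": 1.5}  # hypothetical weights
tweets = ["Grave accidente en la Calle 80 con Avenida Boyacá", "Buen día, Bogotá"]
reports = pipeline(tweets, weights, -1.0, ["Calle 80", "Avenida Boyacá"],
                   geocode=lambda place: None)  # stub geocoder
print([r.location for r in reports])  # ['Calle 80', 'Avenida Boyacá']
```

Keeping each phase as a separate function lets any stand-in be swapped for a trained model or a real geocoder without changing the pipeline itself.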

HULTIG-C: NLP Corpus and Services in the Cloud

10.21203/rs.3.rs-696114/v2 ◽

2022 ◽

Author(s):

Sebastião Pais ◽

João Cordeiro ◽

Muhammad Jamil

Keyword(s):

Question Answering ◽

Named Entity Recognition ◽

Language Identification ◽

Entity Recognition ◽

Corpus Annotation ◽

Named Entity ◽

Corpus Construction ◽

Corpus Creation ◽

Specialized Texts ◽

Main Components

Nowadays, the use of language corpora for many purposes has increased significantly. General corpora exist for numerous languages, but research often needs more specialized corpora. The rapid growth of the Web has greatly improved access to thousands of online documents, highly specialized texts, and comparable texts on the same subject covering several languages in electronic form. However, research has continued to concentrate on corpus annotation rather than on corpus creation tools. Consequently, many researchers create their own corpora, solve problems independently, and build project-specific systems. Corpus construction supports many NLP applications, including machine translation, information retrieval, and question answering. This paper presents HULTIG-C, a new NLP corpus and set of services in the cloud. HULTIG-C covers various languages and includes unique annotations such as keyword sets, sentence sets, named entity recognition sets, and multiword sets. Moreover, a framework that processes HULTIG-C incorporates the main components for license detection, language identification, boilerplate removal, and document deduplication. Finally, this paper discusses some potential issues related to constructing multilingual corpora from the Web.
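The abstract lists document deduplication among the framework's components without describing how it works. As one common approach, assumed here rather than taken from the paper, near-duplicates can be detected by comparing sets of word n-gram shingles with Jaccard similarity; the threshold of 0.8 and shingle size of 3 are arbitrary illustrative choices.

```python
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Set of k-word shingles; texts shorter than k become a single shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: Dict[str, str], threshold: float = 0.8, k: int = 3) -> List[str]:
    """Keep a document unless it is a near-duplicate of one already kept."""
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    kept: List[str] = []
    for doc_id in docs:  # dicts preserve insertion order, so earlier docs win
        if all(jaccard(sigs[doc_id], sigs[other]) < threshold for other in kept):
            kept.append(doc_id)
    return kept


docs = {
    "a": "the corpus was built from web documents in several languages",
    "b": "the corpus was built from web documents in several languages today",
    "c": "language identification runs before boilerplate removal",
}
print(deduplicate(docs))  # ['a', 'c']
```

The pairwise comparison is quadratic in the number of kept documents; at Web scale, MinHash signatures with locality-sensitive hashing are the usual way to approximate the same Jaccard test.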

Borrowing wisdom from world: modeling rich external knowledge for Chinese named entity recognition

Neural Computing and Applications ◽

10.1007/s00521-021-06680-6 ◽

2022 ◽

Author(s):

Yu Nie ◽

Yilai Zhang ◽

Yongkang Peng ◽

Lisha Yang

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

External Knowledge ◽

Named Entity ◽

World Modeling

Predicting the impact of online news articles – is information necessary?

Multimedia Tools and Applications ◽

10.1007/s11042-021-11621-5 ◽

2022 ◽

Author(s):

Judita Preiss

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Syntactic Structure ◽

Named Entity Recognition ◽

Relation Extraction ◽

Online News ◽

Entity Recognition ◽

News Article ◽

Popularity Prediction ◽

The Impact

We exploit the Twitter platform to create a dataset of news articles derived from tweets concerning COVID-19, and we use the associated tweets to define a number of popularity measures. The focus on (potentially) biomedical news articles allows the quantity of biomedically valid information (as extracted by biomedical relation extraction) to be included among the explored features. Besides forming part of a systematic correlation exploration, the features, ranging from semantic relations through readability measures to the article's digital content, are used within a number of machine learning classification and regression algorithms. Unsurprisingly, the results suggest that more complex articles (as determined by a readability measure) tend to have more sophisticated syntactic structure. A weak correlation is found with the information within an article, suggesting that other factors, such as the number of videos, have a notable impact on the popularity of a news article. The best popularity prediction performance is obtained with a random forest algorithm, and the feature describing the quantity of biomedical information is among the three most important features in almost a third of the experiments performed. Additionally, this feature is found to be more valuable than the widely used named entity recognition.

Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals

Big Data and Cognitive Computing ◽

10.3390/bdcc6010004 ◽

2022 ◽

Vol 6 (1) ◽

pp. 4

Author(s):

Dmitry Soshnikov ◽

Tatiana Petrova ◽

Vickie Soshnikova ◽

Andrey Grunin

Keyword(s):

Artificial Intelligence ◽

Named Entity Recognition ◽

Treatment Strategies ◽

Main Idea ◽

Signs And Symptoms ◽

Entity Recognition ◽

Text Corpus ◽

Scientific Papers ◽

Fast Processing ◽

Structured Information

Since the beginning of the COVID-19 pandemic almost two years ago, more than 700,000 scientific papers have been published on the subject. An individual researcher cannot possibly become acquainted with such a huge text corpus, so help from artificial intelligence (AI) is much needed. We propose an AI-based tool to help researchers navigate collections of medical papers in a meaningful way and extract knowledge from scientific COVID-19 papers. The main idea of our approach is to extract as much semi-structured information from the text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and the Text Analytics for Health service, and then store the data in a NoSQL database for fast further processing and insight generation. Additionally, the contexts in which the entities were used (neutral or negative) are determined. Applying NLP and text-based emotion detection (TBED) methods to the COVID-19 text corpus allows us to gain insights into important issues of diagnosis and treatment, such as changes in medical treatment over time, joint treatment strategies using several medications, and the connection between signs and symptoms of coronavirus.
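Insights such as joint treatment strategies can be derived from stored NER output by counting which medications are mentioned together. The sketch below assumes a hypothetical record layout (one document per paper holding `(entity, type)` pairs) and placeholder drug names; it does not reproduce the authors' PubMedBERT or Text Analytics for Health output or their database schema.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

# Hypothetical records, one per paper, as an NER step might store them.
papers: List[Dict] = [
    {"id": "p1", "entities": [("drug_a", "MEDICATION"), ("drug_b", "MEDICATION"), ("fever", "SYMPTOM")]},
    {"id": "p2", "entities": [("drug_b", "MEDICATION"), ("drug_a", "MEDICATION")]},
    {"id": "p3", "entities": [("drug_a", "MEDICATION"), ("drug_c", "MEDICATION")]},
]


def co_mentions(papers: List[Dict], entity_type: str = "MEDICATION") -> Counter:
    """Count unordered pairs of entities of one type mentioned in the same paper."""
    pairs: Counter = Counter()
    for paper in papers:
        names = sorted({name for name, etype in paper["entities"] if etype == entity_type})
        pairs.update(combinations(names, 2))  # sorting makes (a, b) and (b, a) one key
    return pairs


print(co_mentions(papers).most_common(1))  # [(('drug_a', 'drug_b'), 2)]
```

Grouping by a time field instead of counting over the whole corpus would give the kind of "changes over time" view the abstract mentions.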

Hierarchical shared transfer learning for biomedical named entity recognition

BMC Bioinformatics ◽

10.1186/s12859-021-04551-4 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Zhaoying Chai ◽

Han Jin ◽

Shenghui Shi ◽

Siyan Zhan ◽

Lin Zhuo ◽

...

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Medical Information ◽

Named Entity Recognition ◽

Fine Tuning ◽

Entity Recognition ◽

Single Task ◽

Named Entity ◽

Task Learning ◽

Biomedical Named Entity Recognition

Background: Biomedical named entity recognition (BioNER) is a basic and important medical information extraction task that extracts medical entities with special meaning from medical texts. In recent years, deep learning has become the main research direction of BioNER due to its excellent data-driven context encoding ability. However, in the BioNER task, deep learning suffers from poor generalization and instability.

Results: We propose hierarchical shared transfer learning, which combines multi-task learning and fine-tuning and realizes multi-level information fusion between the underlying entity features and the upper data features. We select 14 datasets containing 4 types of entities to train and evaluate the model. The experimental results show that, compared with the single-task XLNet-CRF model, the F1-scores on the six gold standard datasets BC5CDR-chemical, BC5CDR-disease, BC2GM, BC4CHEMD, NCBI-disease, and LINNAEUS changed by +0.57, +0.90, +0.42, +0.77, +0.98, and −2.16, respectively. BC5CDR-chemical, BC5CDR-disease, and BC4CHEMD achieved state-of-the-art results. The reasons why the multi-task results on LINNAEUS are lower than the single-task results are discussed at the dataset level.

Conclusion: Compared with using multi-task learning or fine-tuning alone, the model recognizes medical entities more accurately and has better generalization and stability.
