name entity recognition
Recently Published Documents


TOTAL DOCUMENTS: 49 (FIVE YEARS 22)

H-INDEX: 4 (FIVE YEARS 1)

2021 ◽  
pp. 107558
Author(s):  
Zhao Fang ◽  
Qiang Zhang ◽  
Stanley Kok ◽  
Ling Li ◽  
Anning Wang ◽  
...  

2021 ◽  
Author(s):  
Dao-Ling Huang ◽  
Quanlei Zeng ◽  
Yun Xiong ◽  
Shuixia Liu ◽  
Chaoqun Pang ◽  
...  

We report a study combining high-quality manual annotation with deep-learning natural language processing to achieve accurate named entity recognition (NER) for biomedical literature. We constructed an in-house set of entity annotation guidelines for biomedical literature. Our manual annotations show an overall consistency of over 92% across all four entity types (gene, variant, disease and species) with publicly available corpora previously annotated by other experts. A total of 400 full-text biomedical articles from PubMed were annotated following these guidelines. A BERT-based large model and a DistilBERT-based simplified model were constructed, trained and optimized for offline and online inference, respectively. The NER F1-scores for gene, variant, disease and species are 97.28%, 93.52%, 92.54% and 95.76%, respectively, for the BERT-based model, and 95.14%, 86.26%, 91.37% and 89.92%, respectively, for the DistilBERT-based model. The DistilBERT-based model thus retains 97.8%, 92.2%, 98.7% and 93.9% of the BERT-based model's F1-scores for gene, variant, disease and species, respectively. Moreover, both models outperform the state-of-the-art model BioBERT, indicating the importance of training an NER model on biomedical-domain literature together with high-quality annotated datasets.
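
The retention figures quoted in this abstract follow directly from the two sets of F1-scores. As a quick check, the short sketch below recomputes them; the dictionary and function names are illustrative and not taken from the paper.

```python
# F1-scores (%) as reported in the abstract above.
bert_f1 = {"gene": 97.28, "variant": 93.52, "disease": 92.54, "species": 95.76}
distilbert_f1 = {"gene": 95.14, "variant": 86.26, "disease": 91.37, "species": 89.92}


def retention(small: dict, large: dict) -> dict:
    """Percentage of the larger model's F1 retained by the smaller model, per entity type."""
    return {entity: round(100.0 * small[entity] / large[entity], 1) for entity in large}


retained = retention(distilbert_f1, bert_f1)
for entity, pct in retained.items():
    print(f"{entity}: {pct}%")
```

The largest relative drop is for the variant type, where the DistilBERT-based model loses about 7 F1 points.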


Author(s):  
Rufai Yusuf Zakari ◽  
Zaharaddeen Karami Lawal ◽  
Idris Abdulmumin

Natural language processing (NLP) is an area of computer science that has gained growing attention recently. In other words, NLP helps computers recognize the ways in which people use language. NLP research, however, has been performed predominantly on languages with abundant annotated data, such as English, French, German and Arabic. Although Hausa is Africa's second most commonly used language, only a few studies have so far focused on Hausa Natural Language Processing (HNLP). In this paper, using a keyword index and article title search, we present a systematic analysis of the current literature relevant to HNLP in the Google Scholar database from 2015 to June 2020. Only a few research papers on HNLP have been published, and only recently, in areas such as part-of-speech (POS) tagging, Named Entity Recognition (NER), word embeddings, speech recognition and machine translation. This is because NLP depends on a large amount of human-annotated data for training intelligent models. HNLP is now attracting researchers' attention after extensive research on NLP in English and other languages. The key objectives of this paper are to promote research, to identify likely areas for future HNLP studies, and to help researchers design further investigations in relevant areas.


Informatics ◽  
2021 ◽  
Vol 8 (2) ◽  
pp. 37
Author(s):  
Gonçalo Carnaz ◽  
Vitor Beires Nogueira ◽  
Mário Antunes

Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatic extraction and representation of data and knowledge in such documents is an essential task to reduce the manual analysis burden and to automate the discovery of named entities and the relationships between them that may exist in a case. This paper presents SEMCrime, a framework used to extract and classify named entities and relations in Portuguese criminal reports and documents, and to represent the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely named-entity recognition with an F-Measure of 0.73 and 5W1H information extraction with an F-Measure of 0.65.


Symmetry ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 786
Author(s):  
Siqi Chen ◽  
Yijie Pei ◽  
Zunwang Ke ◽  
Wushour Silamu

Named entity recognition (NER) is an important task in natural language processing that requires determining entity boundaries and classifying the entities into pre-defined categories. Most state-of-the-art systems require tens of thousands of annotated sentences to obtain high performance, which is a problem for low-resource languages: minimal annotated data is available for Uyghur and Hungarian (UH languages) NER tasks. Each task also has its own specificities: differences in words and word order across languages make it a challenging problem. In this paper, we present an effective way to provide a meaningful and easy-to-use feature extractor for named entity recognition tasks: fine-tuning a pre-trained language model. We propose a fine-tuning method for low-resource languages that constructs a fine-tuning dataset through data augmentation, then adds the dataset of a high-resource language, and finally fine-tunes the cross-language pre-trained model on the combined dataset. In addition, we propose an attention-based fine-tuning strategy that uses symmetry to better select relevant semantic and syntactic information from pre-trained language models and applies these symmetry features to named entity recognition tasks. We evaluated our approach on Uyghur and Hungarian datasets, where it performed well compared with several strong baselines. We close with an overview of the available resources for named entity recognition and some of the open research questions.


2021 ◽  
Vol 2020 (1) ◽  
pp. 319-327
Author(s):  
Alif Andika Putra ◽  
Robert Kurniawan

DKI Jakarta Province is one of the regions prone to fires. BPBD DKI Jakarta, as the disaster management agency, has as one of its missions to increase the preparedness of Jakarta's residents for disasters, including fires. Preparedness for fire disasters can be improved by presenting information on fire-prone locations. To this end, BPBD DKI Jakarta can make use of developments in information and communication technology, such as the internet, as an information resource. One of the ways information is disseminated over the internet is through online news websites. The information contained in online news articles can serve as a source for obtaining data. A series of processing steps is needed to extract the information contained in online news articles. In this study, information extraction from online news articles is performed by classifying entities into predefined classes using Named Entity Recognition (NER) with a deep learning hybrid network model, Bidirectional LSTM-CNNs (BLSTM-CNNs). The study shows that the NER model with BLSTM-CNNs performs well according to the computed F1-score, precision and recall. Locations are then mapped based on the location entities found in the online news articles, as classified by the NER model with BLSTM-CNNs.
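
The abstract above evaluates its NER model by F1-score, precision and recall but does not say how they were computed. A common convention is entity-level scoring over BIO tags, where a predicted entity counts as correct only if both its boundaries and its type match the gold annotation. The sketch below assumes that convention; the `LOC` and `DATE` tags and the example sentence are hypothetical and not taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))  # current entity ends before position i
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]  # a new entity begins here
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction is correct only on an exact span and type match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: "Fire in West Jakarta on Monday"
gold = ["O", "O", "B-LOC", "I-LOC", "O", "B-DATE"]
pred = ["O", "O", "B-LOC", "O", "O", "B-DATE"]  # location boundary is cut short
print(precision_recall_f1(gold, pred))
```

Here the truncated location span counts as both a false positive and a false negative, so only the date entity is credited. Exact-match scoring therefore penalizes boundary errors as heavily as missed entities.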


Author(s):  
Chantana Chantrapornchai ◽  
Aphisit Tunsakul

In this paper, we present two methodologies for extracting particular information from the full text returned by a search engine, to assist users. The approaches are based on three tasks: named entity recognition (NER), text classification and text summarization. The first step is building the training data and data cleansing. We consider the tourism domain, with data sets on restaurants, hotels, shopping and tourism crawled from websites. First, the tourism data are gathered and the vocabularies are built. Several minor steps follow, including sentence extraction and relation and named entity extraction for tagging purposes. These steps are needed to create proper training data. Then, the recognition model for a given entity type can be built. In the experiments, given review texts, we demonstrate how to build models that extract the desired entities (i.e., name, location and facility) as well as relation types, classify the reviews, or summarize the reviews. Two tools, SpaCy and BERT, are used to compare performance on these tasks.


Author(s):  
Mahmoud Azimaee ◽  
Gangamma Kalappa ◽  
Nikola Milosevic ◽  
Goran Nenadic ◽  
Hesam Dadafarin ◽  
...  

Introduction: A significant amount of valuable information in Electronic Health Records (EHR), such as laboratory test results or echocardiogram interpretations, is embedded in lengthy free-text fields. Often patients’ personal information is also included in these narratives. Privacy legislation in different jurisdictions requires de-identification of this information prior to making it available for research. This process can be challenging and time-consuming. In particular, rule-based algorithms may lead to over-masking of essential medical terms, conditions, or devices that are named after individuals.
Objectives and Approach: We aimed to enhance ICES’ existing rule-based application to make it contextually driven by applying Artificial Intelligence (AI). The ICES team collaborated with computer scientists at the University of Manchester, who had already published work in this area, and with Evenset, a Toronto-based software company. Based on the University of Manchester de-identification framework for named entity recognition, three machine learning-based algorithms for named entity recognition were implemented: a CRF, and BiLSTM recurrent neural networks with GloVe and with ELMo word embeddings. The models were trained on three different types of ICES data: laboratory results, Electronic Medical Record (EMR) data and echocardiogram data. Evenset developed the user interface and the masking modules.
Results: Preliminary tests have generated very promising results. To improve the accuracy of the models, additional data annotation to expand the training datasets is currently being undertaken at ICES. The final framework will be available to the public as an open-source tool.
Conclusion / Implications: A collaborative approach to solving complex problems like de-identification of text-based medical data is highly efficient, especially where stakeholders bring distinct expertise, resources, data and clinical knowledge.
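
The over-masking problem described in this abstract can be illustrated with a toy example. The sketch below is not the ICES, Manchester or Evenset implementation, which uses trained CRF and BiLSTM models. It only shows how a naive surname rule also masks eponymous medical terms, and how a simple allow-list (a stand-in technique chosen here for brevity) avoids that. The surname list, the eponym list and the sample note are all hypothetical.

```python
import re

# Hypothetical lists, for illustration only.
KNOWN_SURNAMES = {"Parkinson", "Smith", "Holter"}
MEDICAL_EPONYMS = {"Parkinson's disease", "Holter monitor"}

SURNAME_PATTERN = re.compile(r"\b(" + "|".join(sorted(KNOWN_SURNAMES)) + r")\b")


def mask_names_naive(text: str) -> str:
    """Rule-based masking: replace every known surname, regardless of context."""
    return SURNAME_PATTERN.sub("[NAME]", text)


def mask_names_with_allowlist(text: str) -> str:
    """Protect eponymous medical terms with placeholders, mask, then restore them."""
    protected = {}
    for i, term in enumerate(sorted(MEDICAL_EPONYMS)):
        if term in text:
            key = f"\x00{i}\x00"  # placeholder that cannot match a surname
            text = text.replace(term, key)
            protected[key] = term
    text = mask_names_naive(text)
    for key, term in protected.items():
        text = text.replace(key, term)
    return text


note = "Dr. Smith reviewed the Holter monitor data; history of Parkinson's disease."
print(mask_names_naive(note))
print(mask_names_with_allowlist(note))
```

The naive rule masks the device and the condition along with the clinician's name, which removes clinically essential information. An allow-list fixes only the terms it enumerates, which is one reason a context-aware model is attractive for this task.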


2020 ◽  
Vol 31 (2) ◽  
pp. 31-38
Author(s):  
Marwa Elgamal ◽  
Mohamed Abou-Kreisha ◽  
Reda Abo Elezz ◽  
Salwa Hamada

Author(s):  
Nadhia Salsabila Azzahra ◽  
Muhammad Okky Ibrohim ◽  
Junaedi Fahmi ◽  
Bagus Fajar Apriyanto ◽  
Oskar Riandi
