name entity
Recently Published Documents


TOTAL DOCUMENTS: 60 (FIVE YEARS: 20)
H-INDEX: 6 (FIVE YEARS: 1)

2021 ◽  
pp. 107558
Author(s):  
Zhao Fang ◽  
Qiang Zhang ◽  
Stanley Kok ◽  
Ling Li ◽  
Anning Wang ◽  
...  

2021 ◽  
Author(s):  
Dao-Ling Huang ◽  
Quanlei Zeng ◽  
Yun Xiong ◽  
Shuixia Liu ◽  
Chaoqun Pang ◽  
...  

A study combining high-quality manual annotation with deep-learning natural language processing is reported for accurate named entity recognition (NER) in biomedical literature. An in-house set of entity annotation guidelines for biomedical literature was constructed. Our manual annotations show over 92% overall consistency with publicly available corpora previously annotated by other experts, across all four entity types: gene, variant, disease and species. A total of 400 full-text biomedical articles from PubMed were annotated according to these guidelines. A BERT-based large model and a DistilBERT-based simplified model were constructed, trained and optimized for offline and online inference, respectively. The F1 scores of the BERT-based model for gene, variant, disease and species NER are 97.28%, 93.52%, 92.54% and 95.76%, respectively, while those of the DistilBERT-based model are 95.14%, 86.26%, 91.37% and 89.92%, respectively. The DistilBERT-based NER model thus retains 97.8%, 92.2%, 98.7% and 93.9% of the BERT-based model's F1 scores for gene, variant, disease and species, respectively. Moreover, both our BERT-based and DistilBERT-based NER models outperform the state-of-the-art model, BioBERT, indicating the importance of training NER models on biomedical-domain literature together with high-quality annotated datasets.
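The retention percentages follow directly from dividing each DistilBERT F1 score by the corresponding BERT F1 score. The short check below reproduces that arithmetic from the figures in the abstract; the function and variable names are illustrative only.

```python
# F1 scores (percent) reported in the abstract, per entity type.
BERT_F1 = {"gene": 97.28, "variant": 93.52, "disease": 92.54, "species": 95.76}
DISTILBERT_F1 = {"gene": 95.14, "variant": 86.26, "disease": 91.37, "species": 89.92}


def retention(small: dict[str, float], large: dict[str, float]) -> dict[str, float]:
    """Return the small model's F1 as a percentage of the large model's F1, rounded to 0.1."""
    return {entity: round(100 * small[entity] / large[entity], 1) for entity in large}


if __name__ == "__main__":
    for entity, pct in retention(DISTILBERT_F1, BERT_F1).items():
        print(f"{entity}: {pct}%")
```

The ratio is a relative figure: the largest absolute gap (variant, about 7.3 points) is also the lowest retention, so the simplified model's cost is concentrated in that entity type.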


Informatics ◽  
2021 ◽  
Vol 8 (2) ◽  
pp. 37
Author(s):  
Gonçalo Carnaz ◽  
Vitor Beires Nogueira ◽  
Mário Antunes

Organizations are challenged by the need to process an increasing amount of structured and unstructured data retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they must manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatically extracting and representing the data and knowledge in such documents is essential to reduce the burden of manual analysis and to automate the discovery of names and of the relationships between entities that may exist in a case. This paper presents SEMCrime, a framework that extracts and classifies named entities and relations in Portuguese criminal reports and documents and represents the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. A prototype developed to evaluate the framework obtained promising results: named-entity recognition with an F-measure of 0.73 and 5W1H information extraction with an F-measure of 0.65.
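As a rough illustration of how NER output can feed a 5W1H record and then a graph store, the sketch below groups labeled entities into slots and emits (source, relation, target) edges. It is not SEMCrime's method: the label set (`PER`, `ORG`, `LOC`, `DATE`, `TIME`), the `LABEL_TO_SLOT` mapping and the event identifier are hypothetical assumptions, and the What, Why and How slots are left empty because they generally require relation or event extraction rather than entity labels alone.

```python
from collections import defaultdict

# Hypothetical mapping from NER labels to 5W1H slots (assumption, not from the paper).
# What, Why and How are not derivable from entity labels alone, so they are omitted.
LABEL_TO_SLOT = {"PER": "who", "ORG": "who", "LOC": "where", "DATE": "when", "TIME": "when"}


def to_5w1h(entities):
    """Group (text, label) pairs into 5W1H slots, ignoring labels with no slot."""
    record = defaultdict(list)
    for text, label in entities:
        slot = LABEL_TO_SLOT.get(label)
        if slot is not None:
            record[slot].append(text)
    return dict(record)


def to_edges(event_id, record):
    """Turn a 5W1H record into (source, relation, target) edges for a graph database."""
    return [(event_id, slot.upper(), value) for slot, values in record.items() for value in values]


if __name__ == "__main__":
    ents = [("João Silva", "PER"), ("Lisboa", "LOC"), ("12 de março", "DATE")]
    for edge in to_edges("case-1", to_5w1h(ents)):
        print(edge)
```

Each edge can then be written to a graph database as an event node linked to entity nodes, which is what makes cross-document queries such as "all events involving the same person" cheap.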


2021 ◽  
Vol 2020 (1) ◽  
pp. 319-327
Author(s):  
Alif Andika Putra ◽  
Robert Kurniawan

DKI Jakarta Province is one of the regions prone to fires. BPBD DKI Jakarta, as the disaster management agency, has as one of its missions increasing the preparedness of Jakarta's residents for disasters, including fires. Preparedness for fire disasters can be improved by presenting information about fire-prone locations. To this end, BPBD DKI Jakarta can take advantage of developments in information and communication technology, such as the internet as an information resource. One way information is disseminated over the internet is through online news websites, and the information in online news articles can serve as a data source. A series of processing steps is needed to extract the information contained in online news articles. In this study, information is extracted from online news articles by classifying entities into specific classes using Named Entity Recognition (NER) with a hybrid deep-learning network model, Bidirectional LSTM-CNNs (BLSTM-CNNs). The study shows that the BLSTM-CNNs NER model performs well in terms of F1-score, precision and recall. The location entities found in the online news articles, as classified by the BLSTM-CNNs NER model, were then mapped.
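Between the tagger and the map sits a decoding step: per-token tags have to be turned into entity spans before location mentions can be counted and geocoded. The sketch below assumes a BIO tagging scheme with a `LOC` type, which the abstract does not specify, and treats a stray `I-` tag as the start of a new span; the function names are hypothetical.

```python
from collections import Counter


def bio_spans(tokens, tags):
    """Decode BIO tags (assumed scheme) into (entity_text, entity_type) spans."""
    spans, current, ctype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != ctype):
            # A B- tag, or an I- tag whose type differs, starts a new span.
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O" closes any open span
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        spans.append((" ".join(current), ctype))
    return spans


def count_locations(sentences):
    """Count LOC mentions across (tokens, tags) pairs, e.g. as input to a map layer."""
    counts = Counter()
    for tokens, tags in sentences:
        counts.update(text for text, etype in bio_spans(tokens, tags) if etype == "LOC")
    return counts


if __name__ == "__main__":
    sents = [
        (["Kebakaran", "di", "Tanah", "Abang", "Jakarta", "Pusat"],
         ["O", "O", "B-LOC", "I-LOC", "B-LOC", "I-LOC"]),
        (["Api", "di", "Tanah", "Abang"], ["O", "O", "B-LOC", "I-LOC"]),
    ]
    print(count_locations(sents).most_common())
```

The resulting counts per location name are what a fire-prone-area map would aggregate; geocoding the names to coordinates is a separate step not shown here.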


Author(s):  
Chantana Chantrapornchai ◽  
Aphisit Tunsakul

In this paper, we present two methodologies that extract particular information from the full text returned by a search engine in order to assist users. The approaches are based on three tasks: named entity recognition (NER), text classification and text summarization. The first step is building and cleansing the training data. We consider the tourism domain, such as restaurants, hotels and shopping, with tourism datasets crawled from websites. First, the tourism data are gathered and the vocabularies are built. Several minor steps, including sentence extraction and relation and named entity extraction, are performed for tagging purposes; these steps are needed to create proper training data. Then, a recognition model for a given entity type can be built. In the experiments, given review texts, we demonstrate how to build models that extract the desired entities (i.e., name, location and facility) as well as relation types, and that classify or summarize the reviews. Two tools, SpaCy and BERT, are used to compare performance on these tasks.
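The "creating proper training data" step usually means turning tagged entity strings into character-offset annotations. The sketch below builds `(text, {"entities": [(start, end, label)]})` pairs, the annotation layout that spaCy v3 accepts through `Example.from_dict(doc, annotations)`. The helper name, the sample review and the use of the abstract's name/location/facility types as label strings are illustrative assumptions, not the authors' pipeline.

```python
def make_example(text, entities):
    """Build a (text, {"entities": [(start, end, label)]}) pair from (surface, label) pairs.

    Each surface string is located at its first occurrence at or after the end of
    the previous match, so entities must be listed in reading order.
    """
    spans, cursor = [], 0
    for surface, label in entities:
        start = text.find(surface, cursor)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text after offset {cursor}")
        end = start + len(surface)
        spans.append((start, end, label))
        cursor = end
    return text, {"entities": spans}


if __name__ == "__main__":
    review = "Riverside Hotel near Nimman Road has free parking."
    print(make_example(review, [
        ("Riverside Hotel", "NAME"),
        ("Nimman Road", "LOCATION"),
        ("free parking", "FACILITY"),
    ]))
```

Searching from the previous match's end keeps repeated surface strings (for example, a hotel mentioned twice) aligned to the right occurrence, provided the entities are supplied in order.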


Author(s):  
Russa Biswas ◽  
Radina Sofronova ◽  
Mehwish Alam ◽  
Nicolas Heist ◽  
Heiko Paulheim ◽  
...  

2020 ◽  
Vol 9 (6) ◽  
pp. 1-22
Author(s):  
Omar ASBAYOU

This article describes our rule-based Arabic named entity recognition (NER) and classification system. It relies on lists of classified proper names (PN) and, in particular, on syntactico-semantic patterns that yield a fine-grained classification of Arabic NEs. These patterns are syntactico-semantic combinations of morpho-syntactic and syntactic entities. The system also uses a lexical classification of trigger words and NE extensions. These linguistic data are essential not only for named entity extraction but also for taxonomic classification and for determining NE boundaries. Our method also relies on contextualisation and on the notion of NE class attributes and values. Inspired by X-bar theory and immediate constituent analysis, we built a rule-based NER system composed of five levels of syntactico-semantic combination. We also show how the fine-grained NE annotations in our system's output (an XML database) are exploited in information retrieval and information extraction.
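To make the role of trigger words concrete, the sketch below implements a single hypothetical rule: a trigger word followed by a known proper name forms an NE whose class comes from the trigger, and a per-trigger flag decides whether the trigger belongs inside the NE boundary. The `TRIGGERS` and `PROPER_NAMES` lexicons are small illustrative assumptions; this is one flat rule, not the article's five-level system.

```python
# Hypothetical trigger lexicon: trigger -> (NE class, trigger is part of the NE?).
# "جامعة القاهرة" (Cairo University) keeps the trigger; titles and "city of" do not.
TRIGGERS = {"جامعة": ("ORG", True), "مدينة": ("LOC", False), "الرئيس": ("PERS", False)}
# Hypothetical list of known proper names (PN).
PROPER_NAMES = {"القاهرة", "محمد"}


def trigger_rule(tokens):
    """Tag TRIGGER + PROPER_NAME sequences; the trigger fixes the class and the boundary."""
    found, i = [], 0
    while i < len(tokens) - 1:
        entry = TRIGGERS.get(tokens[i])
        if entry and tokens[i + 1] in PROPER_NAMES:
            ne_class, trigger_inside = entry
            text = f"{tokens[i]} {tokens[i + 1]}" if trigger_inside else tokens[i + 1]
            found.append((text, ne_class))
            i += 2  # skip past the matched pair
        else:
            i += 1
    return found


if __name__ == "__main__":
    print(trigger_rule(["درس", "في", "جامعة", "القاهرة"]))
    print(trigger_rule(["زار", "الرئيس", "محمد", "مدينة", "القاهرة"]))
```

The same proper name (القاهرة) ends up as part of an ORG or as a LOC depending on its trigger, which is the kind of contextual decision the article's class attributes and boundary rules handle more generally.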


Author(s):  
Mahmoud Azimaee ◽  
Gangamma Kalappa ◽  
Nikola Milosevic ◽  
Goran Nenadic ◽  
Hesam Dadafarin ◽  
...  

Introduction
A significant amount of valuable information in Electronic Health Records (EHR), such as laboratory test results or echocardiogram interpretations, is embedded in lengthy free-text fields. Patients' personal information is often included in these narratives as well. Privacy legislation in different jurisdictions requires this information to be de-identified before it is made available for research. This process can be challenging and time-consuming. In particular, rule-based algorithms may over-mask essential medical terms, conditions, or devices that are named after individuals.

Objectives and Approach
We aimed to make ICES' existing rule-based application contextually driven by applying Artificial Intelligence (AI). The ICES team collaborated with computer scientists at the University of Manchester who had already published work in this area, and with Evenset, a Toronto-based software company. Based on the University of Manchester de-identification framework for named entity recognition, three machine-learning-based named entity recognition algorithms were implemented: a CRF, and BiLSTM recurrent neural networks with GloVe and with ELMo word embeddings. The models were trained on three different types of ICES data: laboratory results, Electronic Medical Record (EMR) data and echocardiogram data. Evenset developed the user interface and the masking modules.

Results
Preliminary tests have generated very promising results. To improve the accuracy of the models, additional data annotation to expand the training datasets is currently being undertaken at ICES. The final framework will be available to the public as an open-source tool.

Conclusion / Implications
A collaborative approach to solving complex problems such as de-identification of text-based medical data is highly efficient, especially where stakeholders bring unique sets of expertise, resources, data and clinical knowledge.
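The over-masking problem can be shown in a few lines. Below, a naive dictionary rule masks every listed surname regardless of context, while a span-based masker replaces only the spans an NER model (such as the CRF or BiLSTM models mentioned above) labels as personal information. The `SURNAMES` list and the hard-coded span standing in for model output are hypothetical, and `mask_spans` assumes non-overlapping spans.

```python
import re

SURNAMES = {"Smith", "Parkinson"}  # hypothetical surname dictionary


def naive_mask(text):
    """Rule-based masking: replace every capitalized word found in the dictionary."""
    return re.sub(r"\b[A-Z][a-z]+\b",
                  lambda m: "[NAME]" if m.group() in SURNAMES else m.group(),
                  text)


def mask_spans(text, spans):
    """Replace non-overlapping (start, end, label) spans with [LABEL] placeholders."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    note = "Seen by Dr. Smith for Parkinson disease."
    print(naive_mask(note))                       # eponym "Parkinson" is masked too
    print(mask_spans(note, [(12, 17, "NAME")]))   # span as a model might predict it
```

Separating prediction (which spans are identifiers) from masking (how they are replaced) mirrors the split the abstract describes between the models and Evenset's masking modules.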


2020 ◽  
Vol 31 (2) ◽  
pp. 31-38
Author(s):  
Marwa Elgamal ◽  
Mohamed Abou-Kreisha ◽  
Reda Abo Elezz ◽  
Salwa Hamada

Author(s):  
Nadhia Salsabila Azzahra ◽  
Muhammad Okky Ibrohim ◽  
Junaedi Fahmi ◽  
Bagus Fajar Apriyanto ◽  
Oskar Riandi
