A real time Named Entity Recognition system for Arabic text mining

Abstract. Most BioCreative tasks to date have focused on assessing the quality of text-mining annotations in terms of precision and recall. Interoperability, speed, and stability are, however, other important factors to consider for practical applications of text mining. The new BioCreative/BeCalm TIPS task focuses purely on these. To participate in this task, I implemented a BeCalm API within the real-time tagging server also used by the Reflect and EXTRACT tools. In addition to retrieval of patent abstracts, PubMed abstracts, and PubMed Central open-access articles as required in the TIPS task, the BeCalm API implementation facilitates retrieval of documents from other sources specified as custom request parameters. As in earlier tests, the tagger proved to be both highly efficient and stable, being able to consistently process requests of 5000 abstracts in less than half a minute, including retrieval of the document text.

Text mining of 15 million full-text scientific articles

10.1101/162099 ◽

2017 ◽

Cited By ~ 5

Author(s):

David Westergaard ◽

Hans-Henrik Stærfeldt ◽

Christian Tønsberg ◽

Lars Juhl Jensen ◽

Søren Brunak

Keyword(s):

Text Mining ◽

Full Text ◽

Disease Gene ◽

Scientific Literature ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Data Sets ◽

Named Entity ◽

Benchmark Data

Abstract. Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823–2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein–protein, disease–gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.

Real-time tagging of biomedical entities

10.1101/078469 ◽

2016 ◽

Author(s):

Evangelos Pafilis ◽

Lars Juhl Jensen

Keyword(s):

Real Time ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Manual Annotation ◽

Automatic Annotation ◽

Web Resources ◽

Named Entity ◽

Bulk Processing

Automatic annotation of text is an important complement to manual annotation, because the latter is highly labor intensive. We have developed a fast dictionary-based named entity recognition system, which is used for both real-time and bulk processing of text in a variety of biomedical web resources. We propose to adapt the system to make it interoperable with the PubAnnotation and Open Annotation standards.
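As a rough illustration of what dictionary-based tagging can look like, the sketch below matches the longest dictionary name starting at each token. It is not the authors' implementation: the dictionary entries, the `example:` identifiers, and the function name `tag` are all hypothetical, and a production tagger would need a far larger dictionary and more careful handling of spelling variants.

```python
import re

# Hypothetical toy dictionary: normalized name -> (entity type, identifier).
# The entries and identifiers are made up for illustration only.
DICTIONARY = {
    "p53": ("protein", "example:1"),
    "breast cancer": ("disease", "example:2"),
    "cancer": ("disease", "example:3"),
}

# Longest dictionary name, measured in words; bounds the lookup window.
MAX_WORDS = max(len(name.split()) for name in DICTIONARY)


def tag(text):
    """Return (start, end, surface form, type, identifier) for each match."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "breast cancer" wins over "cancer".
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tok[0] for tok in tokens[i:i + n])
            entry = DICTIONARY.get(candidate)
            if entry is not None:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((start, end, text[start:end], *entry))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no name starts at this token
    return matches
```

Calling `tag("Mutations in p53 are common in breast cancer.")` yields one match for "p53" and one for "breast cancer"; the shorter name "cancer" is not reported separately because the longer match consumes those tokens. Each lookup is a hash-table access, which is one reason dictionary-based approaches of this kind can be fast.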

PIN64 Identifying COVID-19 Patients from Unstructured Notes: Performance of a Commercial Clinical Named Entity Recognition System

Value in Health ◽

10.1016/j.jval.2021.04.1252 ◽

2021 ◽

Vol 24 ◽

pp. S117-S118

Author(s):

V. Kumar ◽

L. Rasouliyan ◽

S. Long ◽

M.B. Rao

Keyword(s):

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Named Entity

A deep learning-based bilingual Hindi and Punjabi named entity recognition system using enhanced word embeddings

Knowledge-Based Systems ◽

10.1016/j.knosys.2021.107601 ◽

2021 ◽

pp. 107601

Author(s):

Archana Goyal ◽

Vishal Gupta ◽

Manish Kumar

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Word Embeddings ◽

Named Entity

Adaptation of a named entity recognition system for the ESTER 2 evaluation campaign

2009 International Conference on Natural Language Processing and Knowledge Engineering ◽

10.1109/nlpke.2009.5313775 ◽

2009 ◽

Cited By ~ 2

Author(s):

Caroline Brun ◽

Maud Ehrmann

Keyword(s):

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Named Entity

Constructing Uyghur Named Entity Recognition System Using Neural Machine Translation Tag Projection

Lecture Notes in Computer Science - Chinese Computational Linguistics ◽

10.1007/978-3-030-63031-7_18 ◽

2020 ◽

pp. 247-260

Author(s):

Azmat Anwar ◽

Xiao Li ◽

Yating Yang ◽

Rui Dong ◽

Turghun Osman

Keyword(s):

Machine Translation ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Neural Machine Translation ◽

Named Entity

SicknessMiner: a deep-learning-driven text-mining tool to abridge disease-disease associations

BMC Bioinformatics ◽

10.1186/s12859-021-04397-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Nícia Rosário-Ferreira ◽

Victor Guimarães ◽

Vítor S. Costa ◽

Irina S. Moreira

Keyword(s):

Text Mining ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Disease Similarity ◽

Disease Associations ◽

Named Entity Normalization ◽

Mining Tool ◽

Or Gene ◽

Text Mining Tool

Abstract. Background: Blood cancers (BCs) are responsible for over 720 K yearly deaths worldwide. Their prevalence and mortality-rate uphold the relevance of research related to BCs. Despite the availability of different resources establishing Disease-Disease Associations (DDAs), the knowledge is scattered and not accessible in a straightforward way to the scientific community. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared to the DisGeNET resource for qualitative and quantitative comparison. Results: We obtained the DDAs via co-mention using our SicknessMiner or gene- or variant-disease similarity on DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results and nearly 15% of the SicknessMiner results were specific to our pipeline. Conclusions: SicknessMiner is a valuable tool to extract disease-disease relationship from RAW input corpus.
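The co-mention idea mentioned in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: it takes the document as the unit of co-mention, assumes NER and NEN have already reduced each document to a list of normalized disease names, and uses a hypothetical function name `co_mentions`. The abstract does not specify how SicknessMiner itself counts or scores co-mentions.

```python
from collections import Counter
from itertools import combinations


def co_mentions(documents):
    """Count how many documents mention each unordered pair of diseases.

    `documents` is an iterable of lists of normalized disease names,
    one list per document (assumed output of prior NER and NEN steps).
    """
    counts = Counter()
    for entities in documents:
        # Deduplicate within a document and sort so that each pair
        # has a single canonical ordering.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts
```

For example, given three documents mentioning `["leukemia", "anemia"]`, `["anemia", "leukemia", "lymphoma"]`, and `["lymphoma"]`, the pair `("anemia", "leukemia")` is counted twice and the other two pairs once each. Raw counts like these would normally be filtered or weighted before being treated as associations.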
