Supporting Named Entity Recognition and Document Classification for Effective Text Retrieval

Mapping Intimacies ◽

10.5772/intechopen.95076 ◽

2021 ◽

Author(s):

Philippe Tamla ◽

Florian Freund ◽

Matthias Hemmje

Keyword(s):

Machine Learning ◽

Knowledge Management ◽

Management System ◽

Named Entity Recognition ◽

Document Classification ◽

Knowledge Management System ◽

Entity Recognition ◽

Text Documents ◽

Named Entity ◽

Target Environment

In this research paper, we present a system for named entity recognition and automatic document classification in an innovative knowledge management system for Applied Gaming. The objective of this project is to facilitate the management of machine learning-based named entity recognition models that can be used both for extracting different types of named entities and for classifying text documents from different sources on the Web. We present real-world use case scenarios and derive features for training and managing NER models with the Stanford NLP machine learning API. We then present the integration of our NER system with an expert rule-based system, which allows text documents to be classified automatically into the taxonomy categories available in the knowledge management system. Finally, we present the results of two evaluations: first, a functional evaluation that demonstrates the portability of our NER system using a standard text corpus from the medical domain; second, a qualitative evaluation conducted to optimize the overall user interface of our system and enable a suitable integration into the target environment.

Supporting Named Entity Recognition and Document Classification in a Knowledge Management System for Applied Gaming

Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management ◽

10.5220/0010145001080121 ◽

2020 ◽

Author(s):

Philippe Tamla ◽

Florian Freund ◽

Matthias Hemmje

Keyword(s):

Knowledge Management ◽

Management System ◽

Named Entity Recognition ◽

Document Classification ◽

Knowledge Management System ◽

Entity Recognition ◽

Named Entity

An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing

Data ◽

10.3390/data6070071 ◽

2021 ◽

Vol 6 (7) ◽

pp. 71

Author(s):

Gonçalo Carnaz ◽

Mário Antunes ◽

Vitor Beires Nogueira

Keyword(s):

Machine Learning ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Automatic Identification ◽

Named Entities ◽

Related Data ◽

Named Entity ◽

Chain Of Custody ◽

Evidence Collection

Criminal investigations collect and analyze the facts related to a crime, from which investigators can deduce evidence to be used in court. Criminal investigation is a multidisciplinary and applied science that includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities mentioned in the investigation. The computerized processing of these documents supports the criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. A wide set of dedicated tools exists, but they share a major limitation: they are unable to process criminal reports in the Portuguese language, because an annotated corpus for that purpose does not exist. This paper presents an annotated corpus composed of a collection of anonymized crime-related documents, which were extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated, and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained for the classification of the annotated named entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools that detect and correlate entities in the documents. Some examples are sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
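
A note on the figures in the abstract above: the reported F1-score (0.733) is not the harmonic mean of the reported mean precision and recall, which would be about 0.763. One plausible reading, and it is only an assumption since the abstract does not state the averaging scheme, is that all three values are averaged per entity class (macro-averaging). The Python sketch below illustrates that effect; the class names and counts are hypothetical and are not taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # entities correctly labelled with this class
    fp: int  # entities wrongly labelled with this class
    fn: int  # entities of this class that were missed


def prf(c: Counts) -> tuple[float, float, float]:
    """Precision, recall and F1 for a single entity class."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(per_class: dict[str, Counts]) -> tuple[float, ...]:
    """Unweighted mean of per-class precision, recall and F1."""
    scores = [prf(c) for c in per_class.values()]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Hypothetical counts, for illustration only.
example = {
    "PERSON": Counts(tp=90, fp=10, fn=10),
    "LOCATION": Counts(tp=40, fp=10, fn=60),
}

macro_p, macro_r, macro_f1 = macro_average(example)

# F1 computed from the averaged precision and recall; in general this
# differs from the average of the per-class F1 scores.
f1_of_means = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"macro P={macro_p:.3f} R={macro_r:.3f} F1={macro_f1:.3f}")
print(f"F1 from averaged P and R={f1_of_means:.3f}")
```

With these made-up counts, the macro-averaged F1 comes out lower than the F1 computed from the averaged precision and recall, which is the same direction as the gap in the reported figures.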

Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics

JCO Clinical Cancer Informatics ◽

10.1200/cci.19.00008 ◽

2019 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Tomasz Oliwa ◽

Steven B. Maron ◽

Leah M. Chase ◽

Samantha Lomnicki ◽

Daniel V.T. Catenacci ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Named Entity Recognition ◽

Entity Recognition ◽

Classification Model ◽

Supervised Machine Learning ◽

Named Entity ◽

Pathology Reports

PURPOSE Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions. PATIENTS AND METHODS Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step. RESULTS We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance. CONCLUSION Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
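
For readers unfamiliar with the 10-fold cross-validation mentioned in the results above, the following Python sketch shows how 188 items can be partitioned so that each one is used for testing exactly once. It is a generic illustration, not the authors' pipeline; the function name `k_fold_indices`, the fixed seed, and the shuffling strategy are assumptions made for the example.

```python
import random


def k_fold_indices(n_items: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Every k-th shuffled index goes into the same fold, so fold sizes
    # differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 188 is the number of pathology reports given in the abstract.
splits = list(k_fold_indices(188, k=10))

for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training items, {len(test)} test items")
```

A classifier would be trained on `train` and scored on `test` in each round, and the reported accuracy would then be the average over the ten rounds.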

A Comparative Study of Dictionary-based and Machine Learning-based Named Entity Recognition in Pashto

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval ◽

10.1145/3443279.3443307 ◽

2020 ◽

Author(s):

Rafiullah Momand ◽

Shakirullah Waseeb ◽

Ahmad Masood Latif Rai

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

SCIENTIFIC NAMED ENTITY RECOGNITION WITH THE HELP OF MODERN METHODS

Bulletin Series of Physics & Mathematical Sciences ◽

10.51889/2021-3.1728-7901.11 ◽

2021 ◽

Vol 75 (3) ◽

pp. 94-99

Author(s):

A.M. Yelenov ◽

A.B. Jaxylykova ◽

Keyword(s):

Machine Learning ◽

Language Processing ◽

Named Entity Recognition ◽

Recognition Task ◽

Entity Recognition ◽

Support Vector ◽

Scientific Article ◽

Natural Languages ◽

Named Entity ◽

Learning Area

This research presents a comparative study of the Named Entity Recognition task for scientific article texts. Natural language processing can be considered one of the cornerstones of the machine learning area; it addresses problems connected with the understanding of different natural languages and with linguistic analysis. Current deep learning techniques have already been shown to perform well in areas such as image recognition, pattern recognition, and computer vision, which suggests that they would probably also be successful in natural language processing and could lead to a sharp increase in research interest in this topic. For a long time, fairly simple algorithms were used in this area, such as support vector machines or various types of regression, together with basic encodings of text data, which did not yield strong results. The experiments used the Scientific Entity Relation Core dataset. The algorithms used were Long Short-Term Memory, a Random Forest Classifier with Conditional Random Fields, and named-entity recognition with Bidirectional Encoder Representations from Transformers. In the findings, the metric scores of all models are compared with each other. This research is devoted to the processing of scientific articles in the machine learning area because the subject has not yet been investigated sufficiently. Addressing this task can help machines understand natural languages better, so that they can also solve other natural language processing tasks better and achieve higher scores in general.

A systematic exposition of Punjabi Named Entity Recognition using different Machine Learning models

10.1109/icirca51532.2021.9544894 ◽

2021 ◽

Author(s):

Amandeep Kaur ◽

Sonam Khattar

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Models ◽

Named Entity ◽

Systematic Exposition ◽

Machine Learning Models

Machine Learning Algorithms for Portuguese Named Entity Recognition

INTELIGENCIA ARTIFICIAL ◽

10.4114/ia.v11i36.893 ◽

2007 ◽

Vol 11 (36) ◽

Cited By ~ 7

Author(s):

R. L. Milidiú ◽

J. C. Duarte ◽

R. Cavalcante

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Named Entity Recognition ◽

Machine Learning Algorithms ◽

Entity Recognition ◽

Named Entity

Bringing Named Entity Recognition on Drupal Content Management System

Advances in Intelligent Systems and Computing - 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014) ◽

10.1007/978-3-319-07581-5_31 ◽

2014 ◽

pp. 261-268 ◽

Cited By ~ 1

Author(s):

José Ferrnandes ◽

Anália Lourenço

Keyword(s):

Management System ◽

Named Entity Recognition ◽

Content Management ◽

Entity Recognition ◽

Content Management System ◽

Named Entity

A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets

Natural Language Processing and Information Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-319-59569-6_30 ◽

2017 ◽

pp. 254-259 ◽

Cited By ~ 2

Author(s):

Mete Taşpınar ◽

Murat Can Ganiz ◽

Tankut Acarman

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Approach ◽

Word Embeddings ◽

Named Entity ◽

Simple Machine ◽

Machine Learning Approach ◽

Feature Based

Using machine learning to maintain rule-based named-entity recognition and classification systems

10.3115/1073012.1073067 ◽

2001 ◽

Cited By ~ 17

Author(s):

Georgios Petasis ◽

Frantz Vichot ◽

Francis Wolinski ◽

Georgios Paliouras ◽

Vangelis Karkaletsis ◽

...

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Classification Systems ◽

Entity Recognition ◽

Rule Based ◽

Named Entity
