Developing and Deploying Algorithms for Information Extraction using Classification Measures for Named Entity Recognition

2018 ◽  
Vol 6 (10) ◽  
pp. 235-248
Author(s):  
Rehan Khan ◽  
A.J. Singh
2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Kenny Kurniadi ◽  
Ngurah Agus Sanjaya ER

Named Entity Recognition (NER) is a subtask of information extraction that classifies spans of text into categories such as names of people (figures), organizations, and locations. In this study, the authors propose an NER system that identifies the names of characters in Balinese-language documents. The study uses a rule-based method, with rules built from the morphological structure and linguistic meaning of Balinese names. The evaluation shows that the system achieves an accuracy of 67.41%, precision of 83.42%, recall of 77.83%, and F-score of 80.53%.
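As a quick consistency check, the reported F-score can be reproduced from the reported precision and recall. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
precision = 0.8342
recall = 0.7783

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 80.53%"
```

Under that assumption the computed value agrees with the 80.53% given in the abstract. The accuracy figure cannot be checked this way, because it also depends on the true-negative count, which is not reported.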


2019 ◽  
Vol 37 (6) ◽  
pp. 7401-7413 ◽  
Author(s):  
Redwanul Karim ◽  
M. A. Muhiminul Islam ◽  
Sazid Rahman Simanto ◽  
Saif Ahmed Chowdhury ◽  
Kalyan Roy ◽  
...  

2014 ◽  
Vol 32 (2) ◽  
pp. 276-284 ◽  
Author(s):  
Ping Bao ◽  
Suoling Zhu

Purpose – The purpose of this paper is to present a system for recognition of location names in ancient books written in languages, such as Chinese, in which proper names are not signaled by an initial capital letter. Design/methodology/approach – Rule-based and statistical methods were combined to develop a set of rules for identification of product-related location names in the local chronicles of Guangdong. A name recognition system, with functions of document management, information extraction and storage, rule management, location name recognition, and inquiry and statistics, was developed using Microsoft's .NET framework, SQL Server 2005, ADO.NET and XML. The system was evaluated with precision ratio, recall ratio and the comprehensive index, F. Findings – The system was quite successful at recognizing product-related location names (F was 71.8 percent), demonstrating the potential for application of automatic named entity recognition techniques in digital collation of ancient books such as local chronicles. Research limitations/implications – Results suffered from limitations in initial digitization of the text. Statistical methods, such as the hidden Markov model, should be combined with an extended set of recognition rules to improve recognition scores and system efficiency. Practical implications – Electronic access to local chronicles by location name saves time for chorographers and provides researchers with new opportunities. Social implications – Named entity recognition brings previously isolated ancient documents together in a knowledge base of scholarly and cultural value. Originality/value – Automatic name recognition can be implemented in information extraction from ancient books in languages other than English. The system described here can also be adapted to modern texts and other named entities.


Author(s):  
Ginger Tsueng ◽  
Max Nanis ◽  
Jennifer T Fouquier ◽  
Michael Mayers ◽  
Benjamin M Good ◽  
...  

Abstract. Motivation: Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from the large volume of literature, many researchers use information extraction algorithms to harvest information in biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and able to perform named entity recognition of disease mentions in biomedical abstracts, but did not know whether the same was true of relationship extraction. Results: In this paper, we introduce the Relationship Extraction Module of the web-based application Mark2Cure and demonstrate that citizen scientists can perform relationship extraction. We confirm the importance of accurate named entity recognition for user performance in relationship extraction and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the Mark2Cure Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as the potential synergies between citizen science, manual biocuration, and natural language processing. Availability: Mark2Cure platform: https://mark2cure.org. Mark2Cure source code: https://github.com/sulab/mark2cure. Data and analysis code for this paper: https://github.com/gtsueng/M2C_rel_nb. Supplementary information: Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Ginger Tsueng ◽  
Steven M. Nanis ◽  
Jennifer Fouquier ◽  
Benjamin M Good ◽  
Andrew I Su

Abstract: Biomedical literature represents one of the largest and fastest growing collections of unstructured biomedical knowledge. Finding critical information buried in the literature can be challenging. In order to extract information from free-flowing text, researchers need to: 1. identify the entities in the text (named entity recognition), 2. apply a standardized vocabulary to these entities (normalization), and 3. identify how entities in the text are related to one another (relationship extraction). Researchers have primarily approached these information extraction tasks through manual expert curation and computational methods. We have previously demonstrated that named entity recognition (NER) tasks can be crowdsourced to a group of nonexperts via the paid microtask platform Amazon Mechanical Turk (AMT), and that doing so can dramatically reduce the cost and increase the throughput of biocuration efforts. However, given the size of the biomedical literature, even information extraction via paid microtask platforms is not scalable. With our web-based application Mark2Cure (http://mark2cure.org), we demonstrate that NER tasks can also be performed by volunteer citizen scientists with high accuracy. We apply metrics from the Zooniverse Matrices of Citizen Science Success and provide the results here to serve as a basis of comparison for other citizen science projects. Further, we discuss design considerations, issues, and the application of analytics for successfully moving a crowdsourcing workflow from a paid microtask platform to a citizen science platform. To our knowledge, this study is the first application of citizen science to a natural language processing task.
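The three steps named in this abstract (named entity recognition, normalization, relationship extraction) can be illustrated with a toy sketch. This is not how Mark2Cure works, since that platform relies on human volunteers; the code below is a deliberately naive dictionary-and-co-occurrence baseline. The lexicon, the identifiers such as `CHEM:0001`, and all function names are hypothetical placeholders, not entries from any real vocabulary.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon: surface form -> (entity type, placeholder ID).
# Two surface forms share one ID to show what normalization buys us.
LEXICON = {
    "aspirin": ("Chemical", "CHEM:0001"),
    "acetylsalicylic acid": ("Chemical", "CHEM:0001"),
    "headache": ("Disease", "DIS:0001"),
}


def recognize(text):
    """Step 1 (NER): find lexicon terms in the text, longest terms first."""
    mentions = []
    taken = set()  # character offsets already covered by a mention
    lowered = text.lower()
    for term in sorted(LEXICON, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            span = range(m.start(), m.end())
            if taken.isdisjoint(span):
                taken.update(span)
                mentions.append((m.start(), m.end(), text[m.start():m.end()]))
    return sorted(mentions)


def normalize(mention_text):
    """Step 2 (normalization): map a mention to its type and identifier."""
    return LEXICON[mention_text.lower()]


def relate(mentions):
    """Step 3 (relationship extraction): naive Chemical-Disease co-occurrence."""
    pairs = []
    for a, b in combinations(mentions, 2):
        (type_a, id_a), (type_b, id_b) = normalize(a[2]), normalize(b[2])
        if {type_a, type_b} == {"Chemical", "Disease"}:
            pairs.append((id_a, id_b))
    return pairs


sentence = "Aspirin is often taken for headache."
mentions = recognize(sentence)
relations = relate(mentions)
print(mentions)
print(relations)
```

Co-occurrence only says that two entities appear together, not how they are related, which is one reason the relationship step is the hardest to automate and a natural candidate for human annotation.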

