Named entity recognition goes to old regime France: geographic text analysis for early modern French corpora

Purpose: By mapping out the capabilities, challenges and limitations of named-entity recognition (NER), this article aims to synthesise the state of the art of NER in the context of the early modern research field and to inform discussions about the kind of resources, methods and directions that may be pursued to enrich the application of the technique going forward.

Design/methodology/approach: Through an extensive literature review, this article maps out the current capabilities, challenges and limitations of NER and establishes the state of the art of the technique in the context of the early modern, digitally augmented research field. It also presents a new case study of NER research undertaken by Enlightenment Architectures: Sir Hans Sloane's Catalogues of his Collections (2016–2021), a Leverhulme-funded research project and collaboration between the British Museum and University College London, with contributing expertise from the British Library and the Natural History Museum.

Findings: Currently, it is not possible to benchmark the capabilities of NER as applied to documents of the early modern period. The authors also draw attention to the situated nature of authority files and of current conceptualisations of NER, leading them to the conclusion that more robust reporting and critical analysis of NER approaches and findings is required.

Research limitations/implications: This article examines NER as applied to early modern textual sources, which are mostly studied by Humanists. As addressed in this article, detailed reporting of NER processes and outcomes is not necessarily valued by the disciplines of the Humanities, with the result that it can be difficult to locate relevant data and metrics in project outputs. The authors have tried to mitigate this by contacting the projects discussed in this paper directly, to further verify the details they report here.

Practical implications: The authors suggest that a forum is needed where tools are evaluated according to community standards. Within the wider NER community, the MUC and CoNLL corpora are used for such experimental set-ups and are accompanied by a conference series; they may be seen as a useful model for this. The ultimate nature of such a forum must be discussed with the whole research community of the early modern domain.

Social implications: NER is an algorithmic intervention that transforms data according to certain rules, patterns or training data and ultimately affects how the authors interpret the results. The creation, use and promotion of algorithmic technologies like NER is not a neutral process, and neither is their output. This paper calls for a more critical understanding of the role and impact of NER on early modern documents and research, and for greater focus on some of the data- and human-centric aspects of NER routines that are currently overlooked.

Originality/value: This article presents a state-of-the-art snapshot of NER, its applications and potential, in the context of early modern research. It also seeks to inform discussions about the kinds of resources, methods and directions that may be pursued to enrich the application of NER going forward. It draws attention to the situated nature of authority files and of current conceptualisations of NER, and concludes that more robust reporting of NER approaches and findings is urgently required. The Appendix sets out a comprehensive summary of digital tools and resources surveyed in this article.

SZTE-NLP: Clinical Text Analysis with Named Entity Recognition

10.3115/v1/s14-2108 ◽

2014 ◽

Author(s):

Melinda Katona ◽

Richárd Farkas

Keyword(s):

Text Analysis ◽

Named Entity Recognition ◽

Entity Recognition ◽

Clinical Text ◽

Named Entity

MetaMap Lite: an evaluation of a new Java implementation of MetaMap

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocw177 ◽

2017 ◽

Vol 24 (4) ◽

pp. 841-844 ◽

Cited By ~ 37

Author(s):

Dina Demner-Fushman ◽

Willie J Rogers ◽

Alan R Aronson

Keyword(s):

Text Analysis ◽

Text Processing ◽

Named Entity Recognition ◽

Biomedical Literature ◽

Entity Recognition ◽

Biomedical Text ◽

Clinical Text ◽

Unified Medical Language System ◽

Named Entity ◽

Medical Language

Abstract: MetaMap is a widely used named entity recognition tool that identifies concepts from the Unified Medical Language System Metathesaurus in text. This study presents MetaMap Lite, an implementation of some of the basic MetaMap functions in Java. On several collections of biomedical literature and clinical text, MetaMap Lite demonstrated real-time speed and precision, recall, and F1 scores comparable to or exceeding those of MetaMap and two other popular biomedical text processing tools, the clinical Text Analysis and Knowledge Extraction System (cTAKES) and DNorm.

Developing and Deploying Algorithms for Information Extraction using Classification Measures for Named Entity Recognition

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i10.235248 ◽

2018 ◽

Vol 6 (10) ◽

pp. 235-248

Author(s):

Rehan Khan ◽

A.J. Singh

Keyword(s):

Information Extraction ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

Arabic named entity recognition using optimized feature sets

10.3115/1613715.1613755 ◽

2008 ◽

Cited By ~ 38

Author(s):

Yassine Benajiba ◽

Mona Diab ◽

Paolo Rosso

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Feature Sets ◽

Named Entity

Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition

10.3115/v1/p14-5003 ◽

2014 ◽

Cited By ~ 25

Author(s):

Jana Straková ◽

Milan Straka ◽

Jan Hajič

Keyword(s):

Open Source ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Pos Tagging

Semantische Suche nach wissenschaftlichen Videos – Automatische Verschlagwortung durch Named Entity Recognition

Zeitschrift für Bibliothekswesen und Bibliographie ◽

10.3196/18642950146145154 ◽

2014 ◽

Vol 61 (4-5) ◽

pp. 254-258 ◽

Cited By ~ 2

Author(s):

Margret Plank ◽

Sven Strobel

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

Developing a RadLex-based Named Entity Recognition Tool for Mining Textual Radiology Reports (Preprint)

10.2196/preprints.25378 ◽

2020 ◽

Author(s):

Shintaro Tsuji ◽

Andrew Wen ◽

Naoki Takahashi ◽

Hongjian Zhang ◽

Katsuhiko Ogasawara ◽

...

Keyword(s):

Named Entity Recognition ◽

Noun Phrases ◽

General Purpose ◽

Entity Recognition ◽

Free Text ◽

Clinical Text ◽

Named Entity ◽

Radiology Reports ◽

Two Measures ◽

F Measure

BACKGROUND: Named entity recognition (NER) plays an important role in extracting the features of descriptions for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities they recognize depends on dictionary lookup. In particular, recognizing compound terms is complicated because they follow a variety of patterns.

OBJECTIVE: The objective of the study is to develop and evaluate an NER tool for compound terms, using RadLex, for mining free-text radiology reports.

METHODS: We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (Cts) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). We also created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. In addition, we evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR).

RESULTS: The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but it still fell short in capturing Cts. The MR showed that 71.9% of stem terms matched those of the ontologies, and that RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies.

CONCLUSIONS: We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance toward expanding vocabularies.

Arabic Named Entity Recognition on Social Media based on feature selection techniques using SVM-RFE

2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS) ◽

10.1109/icds50568.2020.9268762 ◽

2020 ◽

Author(s):

Brahim AIT BEN ALI ◽

Soukaina MIHI ◽

Ismail EL BAZI ◽

Nabil LAACHFOUBI

Keyword(s):

Social Media ◽

Feature Selection ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Feature Selection Techniques
