annotation system
Recently Published Documents


TOTAL DOCUMENTS

296
(FIVE YEARS 55)

H-INDEX

22
(FIVE YEARS 3)

2021 ◽  
pp. 567-583
Author(s):  
Pranav Guruprasad ◽  
S. Sujith Kumar ◽  
C. Vigneswaran ◽  
V. Srinivasa Chakravarthy

2021 ◽  
Vol 19 (3) ◽  
pp. e23
Author(s):  
Sizhuo Ouyang ◽  
Yuxing Wang ◽  
Kaiyin Zhou ◽  
Jingbo Xia

Currently, the coronavirus disease 2019 (COVID-19) literature is growing dramatically, and the increased volume of text makes large-scale text mining and knowledge discovery possible. Curation of these texts has therefore become a crucial issue for the Biomedical Natural Language Processing (BioNLP) community as a way to retrieve important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Inspired by the idea of integrating multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. The corpus covers 50,018 COVID-19 abstracts in LitCovid and uses 12 labels: Mutation, Species, Gene and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg and Reg from AGAC. It contains abundant information that may help unveil hidden knowledge about the pathological mechanism of COVID-19.
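The merge step can be pictured with a small sketch. It assumes each tool's output for one abstract is a list of span annotations shaped loosely like PubAnnotation's JSON denotations (a `span` with `begin`/`end` offsets and the label in `obj`). The `source` and `covered_text` fields, the function name `merge_denotations`, the example sentence and its labels are illustrative assumptions, not the LitCovid-AGAC pipeline itself.

```python
from typing import Dict, List

def merge_denotations(text: str, sources: Dict[str, List[dict]]) -> dict:
    """Merge span annotations produced by several tools over the same text.

    Each input denotation looks like {"span": {"begin": b, "end": e}, "obj": label}.
    The output keeps the label in "obj" and records the producing tool in a
    "source" field (a hypothetical field name used only for this sketch).
    """
    merged, seen = [], set()
    for source, denotations in sources.items():
        for d in denotations:
            begin, end = d["span"]["begin"], d["span"]["end"]
            if not 0 <= begin < end <= len(text):
                raise ValueError(f"{source}: span {begin}-{end} lies outside the text")
            key = (begin, end, d["obj"])
            if key in seen:  # identical span and label from two tools: keep one
                continue
            seen.add(key)
            merged.append({"span": {"begin": begin, "end": end}, "obj": d["obj"],
                           "source": source, "covered_text": text[begin:end]})
    # Order by position (stable sort, so ties keep source order), then renumber IDs.
    merged.sort(key=lambda d: (d["span"]["begin"], d["span"]["end"]))
    for i, d in enumerate(merged, start=1):
        d["id"] = f"T{i}"
    return {"text": text, "denotations": merged}

# Illustrative input: one sentence with labels from two of the three resources.
text = "SARS-CoV-2 spike mutation D614G increases infectivity."
sources = {
    "PubTator": [{"span": {"begin": 0, "end": 10}, "obj": "Species"},
                 {"span": {"begin": 26, "end": 31}, "obj": "Mutation"}],
    "AGAC": [{"span": {"begin": 26, "end": 31}, "obj": "Var"},
             {"span": {"begin": 32, "end": 41}, "obj": "PosReg"}],
}
corpus_entry = merge_denotations(text, sources)
for d in corpus_entry["denotations"]:
    print(d["id"], d["obj"], d["source"], repr(d["covered_text"]))
```

In this sketch, different labels on the same span (Mutation from one tool, Var from another on `D614G`) are kept side by side, so a single text carries layers from several annotation resources.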


JAMIA Open ◽  
2021 ◽  
Author(s):  
Himanshu S Sahoo ◽  
Greg M Silverman ◽  
Nicholas E Ingraham ◽  
Monica I Lupei ◽  
Michael A Puskarich ◽  
...  

Abstract Objective With COVID-19 there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, limiting real-world integration with CDS. A potential solution to mitigate these issues is to use the rule-based gazetteer developed at our institution. Materials and Methods The performance, resource utilization and runtime of the rule-based gazetteer were compared with those of five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP and MedTagger. Results The rule-based gazetteer was the fastest, had a low resource footprint and showed similar performance on weighted micro-average and macro-average measures of precision, recall and F1-score compared with the other annotation systems. Discussion Opportunities to increase its performance include fine-tuning lexical rules for symptom identification. Additionally, it could run on multiple compute nodes for faster runtime. Conclusion This rule-based gazetteer overcame key technical limitations, facilitating real-time symptomatology identification for COVID-19 and integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of health care settings for surveillance of acute COVID-19 symptoms and integration into prognostic modeling. Such a system is currently being used to monitor the progression of post-acute sequelae of COVID-19 (PASC) in COVID-19 survivors. This study conducted the first in-depth analysis and developed a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime, and weighted micro-average and macro-average precision, recall and F1-score similar to those of industry-standard annotation systems.
Lay Summary With COVID-19 came an unprecedented need to identify symptoms of COVID-19 patients under investigation (PUIs) in a time-sensitive, resource-efficient and accurate manner. While available annotation systems perform well in smaller healthcare settings, they fail to scale in larger healthcare systems where 10,000+ clinical notes are generated a day. This study covers 3 improvements addressing key limitations of current annotation systems. (1) High resource utilization and poor scalability of existing annotation systems: the presented rule-based gazetteer is a high-throughput annotation system for processing a high volume of notes, thus giving clinicians the opportunity to make more informed, time-sensitive decisions about patient care. (2) Equally important, our rule-based gazetteer performs similarly to or better than current annotation systems for symptom identification. (3) Because of its minimal resource needs, the rule-based gazetteer could be deployed at healthcare sites lacking robust infrastructure, where industry-standard annotation systems cannot be deployed because of low resource availability.
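A rule-based gazetteer of this kind can be sketched as a lexicon lookup over clinical text. The lexicon entries, canonical symptom labels, negation cues, 30-character look-back window and function names below are hypothetical; the abstract does not describe the gazetteer's actual rules. The sketch compiles the lexicon into one regular expression (longest terms first) and flags a hit as negated when a cue appears earlier in the same sentence.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon: surface form (lowercase) -> canonical symptom label.
GAZETTEER: Dict[str, str] = {
    "fever": "fever",
    "febrile": "fever",
    "cough": "cough",
    "shortness of breath": "dyspnea",
    "sob": "dyspnea",
    "loss of smell": "anosmia",
    "anosmia": "anosmia",
}

# Hypothetical negation cues searched for before each hit.
NEGATION_CUES = ("no", "denies", "without", "negative for")

def build_pattern(terms) -> "re.Pattern[str]":
    """Compile the lexicon into one case-insensitive, word-bounded regex."""
    # Longest terms first so multi-word entries beat shorter overlapping entries.
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

PATTERN = build_pattern(GAZETTEER)

def extract_symptoms(note: str, window: int = 30) -> List[Tuple[str, bool, int, int]]:
    """Return (symptom, negated, start, end) for every gazetteer hit in a note."""
    hits = []
    for m in PATTERN.finditer(note):
        # Look back at most `window` characters, but never past a sentence break.
        preceding = note[max(0, m.start() - window):m.start()]
        preceding = re.split(r"[.;\n]", preceding)[-1].lower()
        negated = any(re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES)
        hits.append((GAZETTEER[m.group(0).lower()], negated, m.start(), m.end()))
    return hits

note = "Pt reports fever and SOB. Denies cough."
for symptom, negated, start, end in extract_symptoms(note):
    print(symptom, "negated" if negated else "present", note[start:end])
```

A single compiled pattern and a short look-back are cheap per note, which is in the spirit of the low-resource design the abstract describes, though the real system's rules may differ considerably.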


2021 ◽  
Vol 1 (XXIII) ◽  
pp. 101-122
Author(s):  
Katarzyna Góra

Valence dictionaries are very often specialized works for advanced readers that present how particular linguistic units combine with their subordinates. The article is a critical analysis of the dictionary entry for the lexical unit "reward" in A Valency Dictionary of English: A Corpus-Based Analysis of the Complementation Patterns of English Verbs, Nouns and Adjectives [2004]. A complementary proposal regarding the predicate-argument structure and its annotation system is provided, based on the theoretical model proposed by S. Karolak [1984; 2002] called Semantic Syntax (SS), and more specifically on its extended model called explicative syntax [Kiklewicz et al. 2010; 2019]. The research findings demonstrate the need for coordinated international projects that integrate both the syntactic and the semantic levels, in order to gradually meet the objective of an integrated language description encompassing both the grammar and the lexicon.


2021 ◽  
Vol 11 (10) ◽  
pp. 4378
Author(s):  
Davide Colla ◽  
Annamaria Goy ◽  
Marco Leontino ◽  
Diego Magro

The research question this paper aims to answer is the following: in an ontology-driven annotation system, can information extracted from external resources (namely, Wikidata) provide users with useful suggestions for characterizing entities used to annotate documents from historical archives? The research is set in the PRiSMHA project, whose main goal is the development of a proof-of-concept prototype ontology-driven system for semantic metadata generation. The assumption behind this effort is that effective access to historical archives requires rich semantic knowledge, relying on a domain ontology, that describes the content of archival resources. In the paper, we present a new feature of the annotation system: when a new entity (e.g., a person) is characterized, some properties describing it are automatically pre-filled, and more complex semantic representations (e.g., events the entity is involved in) are suggested; both kinds of suggestions are based on information retrieved from Wikidata. We describe the automatic algorithm devised to support the definition of the mappings between the Wikidata semantic model and the PRiSMHA ontology, as well as the process used to extract information from Wikidata and to generate suggestions based on the defined mappings. Finally, we discuss the results of a qualitative evaluation of the suggestions, which provides a positive answer to the initial research question and indicates possible improvements.
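The pre-filling and event-suggestion steps can be sketched with a hand-written mapping table (the paper instead describes an automatic algorithm that supports defining the mappings). The Wikidata property IDs are real (P569 date of birth, P19 place of birth, P570 date of death, P20 place of death, P106 occupation), but the target property names, the event templates, the simplified claims format (property ID mapped to a list of plain values, rather than Wikidata's full JSON) and the example values are hypothetical and are not taken from the PRiSMHA ontology.

```python
from typing import Dict, List

# Hypothetical mapping: real Wikidata property IDs -> invented ontology property names.
PROPERTY_MAP: Dict[str, str] = {
    "P569": "birthDate",    # date of birth
    "P19": "birthPlace",    # place of birth
    "P570": "deathDate",    # date of death
    "P20": "deathPlace",    # place of death
    "P106": "occupation",   # occupation
}

# Hypothetical event templates: an event is suggested only when every
# ontology property it needs has been pre-filled.
EVENT_TEMPLATES: Dict[str, tuple] = {
    "Birth": ("birthDate", "birthPlace"),
    "Death": ("deathDate", "deathPlace"),
}

def prefill(claims: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Copy values of mapped Wikidata properties; unmapped or empty ones are ignored."""
    return {PROPERTY_MAP[pid]: list(values)
            for pid, values in claims.items() if pid in PROPERTY_MAP and values}

def suggest_events(properties: Dict[str, List[str]]) -> List[dict]:
    """Propose one event per template whose required properties are all present."""
    suggestions = []
    for event_type, required in EVENT_TEMPLATES.items():
        if all(properties.get(p) for p in required):
            # Take the first value of each property; a curator would confirm or edit it.
            suggestions.append({"type": event_type, **{p: properties[p][0] for p in required}})
    return suggestions

# Made-up claims for an imaginary person; P27 (country of citizenship) is unmapped.
claims = {"P569": ["1890-03-01"], "P19": ["Turin"], "P106": ["trade unionist"], "P27": ["Italy"]}
properties = prefill(claims)
print(properties)
print(suggest_events(properties))
```

Keeping suggestions separate from the pre-filled properties mirrors the two kinds of help the abstract distinguishes: simple property values that can be filled in directly, and richer structures (events) that are only proposed to the user.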


2021 ◽  
Vol 4 (2) ◽  
pp. 17-57
Author(s):  
Piet Mertens

This paper first proposes a labeling scheme for tonal aspects of speech and then describes an automatic annotation system using this transcription. This fine-grained transcription provides labels indicating the pitch level and pitch movement of individual syllables. Of the five pitch levels, three (low, mid, high) are defined on the basis of pitch changes in the local context, and two (bottom, top) are defined relative to the boundaries of the speaker’s global pitch range. For pitch movements, both simple and compound, the transcription indicates direction (rise, fall, level) and size, using size categories (pitch intervals) adjusted relative to the speaker’s pitch range. The automatic tonal annotation system combines several processing steps: segmentation into syllable peaks, pause detection, pitch stylization, pitch range estimation, classification of the intra-syllabic pitch contour, and pitch level assignment. It uses a dedicated, rule-based procedure which, unlike commonly used supervised learning techniques, does not require a labeled corpus for training a model. The paper also includes a preliminary evaluation of the annotation system on a reference corpus of nearly 14 minutes of spontaneous speech in French and Dutch, in order to quantify annotation errors. The results, expressed in terms of standard measures of precision, recall, accuracy and F-measure, are encouraging. For the pitch levels low, mid and high, an F-measure between 0.815 and 0.946 is obtained, and for pitch movements, a value between 0.708 and 1. Given additional modules for the detection of prominence and prosodic boundaries, the resulting annotation may serve as input for a phonological annotation.
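The interval arithmetic behind the movement and range labels can be sketched as follows. Pitch intervals are measured in semitones, 12 * log2(f2 / f1). The 1.5-semitone level threshold, the size cut-offs expressed as fractions of the speaker's range, the 1-semitone margin for bottom and top, and all function names are assumptions for illustration, not the thresholds or procedure used in the paper; the context-dependent low/mid/high assignment is not covered.

```python
import math
from typing import Tuple

def semitones(f1: float, f2: float) -> float:
    """Interval from f1 to f2 in semitones (positive means upward)."""
    return 12.0 * math.log2(f2 / f1)

def classify_movement(f_start: float, f_end: float, range_st: float,
                      level_threshold_st: float = 1.5) -> Tuple[str, str]:
    """Label a syllable's pitch movement as (direction, size).

    Thresholds are hypothetical: movements smaller than level_threshold_st
    count as level, and size is the interval as a share of the speaker's range.
    """
    interval = semitones(f_start, f_end)
    if abs(interval) < level_threshold_st:
        return "level", "none"
    direction = "rise" if interval > 0 else "fall"
    share = abs(interval) / range_st
    size = "small" if share < 0.25 else "medium" if share < 0.5 else "large"
    return direction, size

def range_level(f0: float, bottom_hz: float, top_hz: float, margin_st: float = 1.0) -> str:
    """Return 'bottom' or 'top' when f0 is within margin_st of a range boundary, else 'inside'."""
    if semitones(bottom_hz, f0) <= margin_st:
        return "bottom"
    if semitones(f0, top_hz) <= margin_st:
        return "top"
    return "inside"

# Example speaker whose global range spans 100-200 Hz, i.e. 12 semitones.
bottom, top = 100.0, 200.0
range_st = semitones(bottom, top)
print(classify_movement(150.0, 200.0, range_st))  # a rise of about 5 semitones
print(classify_movement(200.0, 190.0, range_st))  # a change of under 1 semitone
print(range_level(102.0, bottom, top), range_level(150.0, bottom, top))
```

Measuring intervals in semitones and scaling size by the speaker's own range makes the same labels comparable across speakers with different voice registers, which matches the abstract's point that size categories are adjusted relative to the speaker's pitch range.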

