automatic annotation
Recently Published Documents


Total documents: 453 (five years: 130)
H-index: 22 (five years: 3)

2022, Vol 12 (1), pp. 25
Author(s): Varvara Koshman, Anastasia Funkner, Sergey Kovalchuk

Electronic medical records (EMRs) contain a great deal of valuable patient data, which is, however, unstructured. In addition, there is a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, it is currently hardly feasible for researchers to use EMR text data to train machine learning models in the biomedical domain. We present an unsupervised approach to medical data annotation. Syntactic trees are produced from the initial sentences using morphological and syntactic analyses. In the retrieved trees, similar subtrees are grouped using Node2Vec and Word2Vec and labeled using domain vocabularies and Wikidata categories. Using Wikidata categories increased the fraction of labeled sentences 5.5-fold compared with labeling with domain vocabularies only. We show on a validation dataset that the proposed labeling method generates meaningful labels correctly for 92.7% of groups. Annotation with domain vocabularies and Wikidata categories covered more than 82% of the sentences in the corpus; when extended with timestamp and event labels, coverage reached 97% of sentences. The resulting method can be used to label EMRs in Russian automatically. Additionally, the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.
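The labeling step described in this abstract (assigning a label to each group of similar fragments from a vocabulary, then measuring how much of the corpus is covered) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy vocabulary, the English example phrases, the majority-vote rule, and the function names `label_group` and `coverage` are hypothetical and are not taken from the paper, which works on Russian syntactic subtrees grouped with Node2Vec and Word2Vec.

```python
from collections import Counter

# Hypothetical toy vocabulary mapping a term to a label (not from the paper).
DOMAIN_VOCAB = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "headache": "symptom",
    "fever": "symptom",
}


def label_group(phrases, vocab):
    """Return the most frequent vocabulary label among the words of a group.

    Returns None when no word of the group is found in the vocabulary.
    """
    votes = Counter()
    for phrase in phrases:
        for word in phrase.lower().split():
            label = vocab.get(word)
            if label is not None:
                votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def coverage(groups, vocab):
    """Fraction of phrases that belong to a group that received a label."""
    total = sum(len(group) for group in groups)
    labeled = sum(
        len(group) for group in groups if label_group(group, vocab) is not None
    )
    return labeled / total if total else 0.0


# Pre-formed groups stand in for the clusters of similar subtrees.
groups = [
    ["took aspirin daily", "ibuprofen twice a day"],
    ["severe headache", "mild fever", "fever at night"],
    ["visited last spring"],
]
labels = [label_group(group, DOMAIN_VOCAB) for group in groups]
covered = coverage(groups, DOMAIN_VOCAB)
```

In this toy setup the third group stays unlabeled, which is the kind of gap that a broader resource such as Wikidata categories, or extra timestamp and event labels, would be expected to narrow.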


2022, Vol 40 (1), pp. 71-82
Author(s): Shogo Okano, Tatsuhito Makino, Kosei Demura

2021, Vol 21 (1)
Author(s): Xuan Gu, Zhengya Sun, Wensheng Zhang

Background: Symptom phrase recognition is essential to improve the use of unstructured medical consultation corpora for the development of automated question answering systems. Most previous work requires either sufficient manually annotated training data or as complete a symptom dictionary as possible. However, when applied to real scenarios, such approaches face a dilemma due to the scarcity of annotated textual resources and the diversity of spoken language expressions. Methods: In this paper, we propose a composition-driven method to recognize symptom phrases from Chinese medical consultation corpora without any annotations. The basic idea is to directly learn models that capture the composition, i.e., the arrangement of the symptom components (semantic units of words). We introduce an automatic annotation strategy for the standard symptom phrases, which are collected from multiple data sources. In particular, we combine the position information and the interaction scores between symptom components to characterize the symptom phrases. Equipped with such models, we can robustly extract symptom phrases that have not been seen before. Results: Without any manual annotations, our method achieves strong positive results on symptom phrase recognition tasks. Experiments also show that our method has great potential when given access to plenty of corpora. Conclusions: Compositionality offers a feasible solution for extracting information from unstructured free text with scarce labels.


2021, Vol 72 (2), pp. 383-393
Author(s): Svetlozara Leseva, Ivelina Stoyanova, Hristina Kukova

The paper presents work in progress on the compilation and automatic annotation of a dataset comprising examples of stative verbs in parallel Bulgarian-Russian corpora, with the goal of facilitating a classification of stative verbs in the two languages based on their lexical and semantic properties. We extract stative verbs from the Bulgarian and the Russian WordNets together with their assigned conceptual information (frames) from FrameNet. We then assign the set of probable Bulgarian and Russian stative verbs to the verb instances in a parallel Bulgarian-Russian corpus, using WordNet correspondences to filter out unlikely stative candidates. Further manual inspection will ensure the high quality of the resource and its applicability for the purposes of semantic analysis.


2021, Vol 3
Author(s): Dennis Fassmeyer, Gabriel Anzer, Pascal Bauer, Ulf Brefeld

We study the automatic annotation of situations in soccer games. At first sight, this translates nicely into a standard supervised learning problem. However, in a fully supervised setting, predictive accuracies are supposed to correlate positively with the number of labeled situations: more labeled training data simply promise better performance. Unfortunately, non-trivially annotated situations in soccer games are scarce, expensive, and almost always require human experts; a fully supervised approach appears infeasible. Hence, we split the problem into two parts and learn (i) a meaningful feature representation using variational autoencoders on unlabeled data at large scales and (ii) a large-margin classifier that acts in this feature space but uses only a few (manually) annotated examples of the situation of interest. We propose four different architectures of the variational autoencoder and empirically study the detection of corner kicks, crosses, and counterattacks. We observe high predictive accuracies above 90% AUC irrespective of the task.
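The second stage of the two-part setup in this abstract, a large-margin classifier trained on a handful of labeled points in a learned feature space, can be sketched as follows. This is only an illustration under stated assumptions: the 2-D vectors are made-up stand-ins for autoencoder latent codes, the classifier is a plain linear SVM trained by subgradient descent on the regularized hinge loss, and the names `train_linear_svm` and `predict` and all hyperparameters are hypothetical. It does not reproduce the authors' architectures.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Train a linear large-margin classifier with hinge-loss subgradient steps.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # Shrink the weights (gradient of the L2 regularizer).
            w = [wj - lr * lam * wj for wj in w]
            # Points inside the margin contribute a hinge-loss subgradient.
            if yi * score < 1:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical latent codes: three labeled positives and three negatives.
X_train = [
    [2.0, 2.0], [2.5, 1.5], [3.0, 2.5],
    [-2.0, -1.5], [-1.5, -2.5], [-2.5, -2.0],
]
y_train = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
train_predictions = [predict(w, b, x) for x in X_train]
new_predictions = [predict(w, b, [2.2, 1.8]), predict(w, b, [-2.0, -2.0])]
```

The point of the split is that the expensive labels are only needed here, for a low-dimensional classifier, while the representation itself is learned from unlabeled data.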


2021
Author(s): Varvara Koshman, Anastasia Funkner, Sergey Kovalchuk

Electronic medical records (EMRs) contain a lot of valuable data about patients, which is, however, unstructured. There is a lack of labeled medical text data in Russian, and there are no tools for automatic annotation. We present an unsupervised approach to medical data annotation. Morphological and syntactic analyses of the initial sentences produce syntactic trees, from which similar subtrees are then grouped by Word2Vec and labeled using dictionaries and Wikidata categories. This method can be used to automatically label EMRs in Russian, and the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.


Author(s): Asim Abbas, Muhammad Afzal, Jamil Hussain, Taqdir Ali, Hafiz Syed Muhammad Bilal, ...

Extracting clinical concepts, such as problems, diagnoses, and treatments, from unstructured clinical narrative documents enables data-driven approaches such as machine and deep learning to support advanced applications such as clinical decision-support systems, the assessment of disease progression, and the intelligent analysis of treatment efficacy. Various tools such as cTAKES, Sophia, MetaMap, and other rule-based approaches and algorithms have been used for automatic concept extraction. Recently, machine- and deep-learning approaches have been used to extract, classify, and accurately annotate terms and phrases. However, the need for an annotated dataset, which is labor-intensive to produce, impedes the success of data-driven approaches. A rule-based mechanism could support the process of annotation, but existing rule-based approaches fail to adequately capture contextual, syntactic, and semantic patterns. This study introduces a comprehensive rule-based system that automatically extracts clinical concepts from unstructured narratives with higher accuracy and transparency. The proposed system is a pipelined approach capable of recognizing clinical concepts of three types (problem, treatment, and test) in the dataset collected from a published repository as part of the i2b2 2010 challenge. The system's performance is compared with that of three existing systems: Quick UMLS, BIO-CRF, and the Rules (i2b2) model. The system's average F1-score of 72.94% was 13% better than Quick UMLS, 3% better than BIO-CRF, and 30.1% better than the Rules (i2b2) model. Individually, the system performance was noticeably higher for problem-related concepts, with an F1-score of 80.45%, followed by treatment-related concepts and test-related concepts, with F1-scores of 76.06% and 55.3%, respectively.
The proposed methodology significantly improves the performance of concept extraction from unstructured clinical narratives by exploiting the linguistic and lexical semantic features. The approach can ease the automatic annotation process of clinical data, which ultimately improves the performance of supervised data-driven applications trained with these data.
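The F1-scores quoted in this abstract are the harmonic mean of precision and recall over extracted concepts. A minimal sketch of that computation is given below, assuming exact matching of (text, type) pairs; the matching criterion, the toy gold and predicted sets, and the function name `precision_recall_f1` are assumptions for illustration, not the evaluation protocol used in the study.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for exact-match concept extraction.

    Both arguments are iterables of hashable items, here (text, type) pairs.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations using the three concept types named above.
gold = [
    ("chest pain", "problem"),
    ("aspirin", "treatment"),
    ("ecg", "test"),
    ("fever", "problem"),
]
predicted = [
    ("chest pain", "problem"),
    ("aspirin", "treatment"),
    ("ecg", "problem"),  # right span, wrong type: counted as an error
]

precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Here two of three predictions are correct and two of four gold concepts are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.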

