Indexed Text-Analysis

1974 ◽  
Vol 13 (03) ◽  
pp. 179-183 ◽  
Author(s):  
K. Kayser ◽  
W. W. Höpker ◽  
U. Müller

General conditions for medical text analysis are discussed. By means of formal description the errors which occur during manual codification with the over-cross method are analysed by distribution in different classes of diagnoses. It is pointed out that the largest error arises through incorrect correlation of the diagnoses in the summary of findings with those of the thesaurus and that, furthermore, a thesaurus of 4,500 medical terms is not sufficient for documentation in pathology. The entropy losses were only slightly larger than the losses of diagnoses calculated by percentage. The distribution of the classes of diagnoses follows a general statistical theory. In the over-cross method a loss of information Iμ = 1.532 in a total entropy of HD = 5.789 must be reckoned with as shown in an example.
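
The closing figures of this abstract can be put in proportion with a short calculation. The following is a minimal sketch, not the authors' method: it assumes that Iμ and HD are expressed in the same unit, so their ratio can be read as the fraction of total entropy lost. The helper `shannon_entropy` is a hypothetical illustration of how an entropy over diagnosis classes could be computed from class counts (base-2 logarithm assumed); the abstract does not give the underlying counts.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits, an assumed unit) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Figures quoted in the abstract.
H_D = 5.789   # total entropy of the diagnoses
I_mu = 1.532  # information lost with the over-cross method

# Assumption: both values share one unit, so the ratio is the relative loss.
relative_loss = I_mu / H_D
print(f"relative information loss: {relative_loss:.1%}")
```

Under that assumption, roughly a quarter of the total entropy is lost in the quoted example.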

1991 ◽  
Vol 30 (04) ◽  
pp. 275-283 ◽  
Author(s):  
P. M. Pietrzyk

Abstract: Much information about patients is stored in free text. Hence, the computerized processing of medical language data has long been a goal of medical informatics and has given rise to different paradigms. In Göttingen, a Medical Text Analysis System for German (abbr. MediTAS) has been under development for some time, aiming to combine and extend these paradigms. This article concentrates on the automated syntax analysis of German medical utterances. The investigated text material consists of 8,790 distinct utterances extracted from the summary sections of about 18,400 cytopathological findings reports. The parsing is based upon a new approach called Left-Associative Grammar (LAG) developed by Hausser. By considerably extending the LAG approach, most of the grammatical constructions occurring in the text material could be covered.


Author(s):  
Karl G. Jöreskog ◽  
Ulf H. Olsson ◽  
Fan Y. Wallentin

2021 ◽  
Author(s):  
Yunjin Yum ◽  
Jeong Moon Lee ◽  
Moon Joung Jang ◽  
Yoojoong Kim ◽  
Jong-Ho Kim ◽  
...  

BACKGROUND The fact that medical terms require special expertise and are becoming increasingly complex makes it difficult to employ natural language processing techniques in medical informatics. Several human-validated reference standards for medical terms have been developed to evaluate word embedding models using the semantic similarity and relatedness of medical word pairs. However, there are very few reference standards in non-English languages. In addition, because the existing reference standards were developed a long time ago, there is a need to develop an updated standard to represent recent findings in medical sciences. OBJECTIVE We propose a new Korean word pair reference set to verify embedding models. METHODS From January 2010 to December 2020, 518 medical textbooks, 72,844 health information news items, and 15,698 medical research articles were collected, and the top 10,000 medical terms were selected to develop medical word pairs. Sixteen attending physicians participated in the verification of the developed set with 607 word pairs. RESULTS The proportion of word pairs answered by all participants was 90.8% (551/607) for the similarity task and 86.5% (525/605) for the relatedness task. The similarity and relatedness of the word pairs showed a high correlation (ρ=0.70, P<.001). The intraclass correlation coefficients to assess the inter-rater agreements of the word pair sets were 0.47 on the similarity task and 0.53 on the relatedness task. The final reference standard was 604 word pairs for the similarity task and 599 word pairs for relatedness, excluding word pairs with answers corresponding to outliers and word pairs that were answered by fewer than 50% of all the respondents. When FastText models were applied to the final reference standard word pair sets, the embedding models trained on medical documents showed a higher correlation between the calculated cosine similarity scores and the human-judged similarity and relatedness scores (ρ=0.12 with namu vs. ρ=0.47 with medical text for the similarity task, and ρ=0.02 with namu vs. ρ=0.30 with medical text for the relatedness task). CONCLUSIONS Korean medical word pair reference standard sets for semantic similarity and relatedness were developed based on medical documents from the past 10 years. It is expected that our word pair reference sets will be actively utilized in the development of medical and multilingual natural language processing technology in the future.
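
The evaluation described in this abstract, correlating model cosine similarities with human-judged scores, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: ρ is assumed to be Spearman's rank correlation, and the words, vectors, and human scores below are invented toy values, not data from the study or output of a FastText model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy embeddings and human ratings (illustration only).
embeddings = {
    "fever": [0.9, 0.1, 0.0],
    "pyrexia": [0.8, 0.2, 0.1],
    "cough": [0.5, 0.1, 0.6],
    "fracture": [0.0, 0.9, 0.4],
}
pairs = [("fever", "pyrexia"), ("fever", "cough"), ("fever", "fracture")]
human_scores = [3.8, 2.1, 0.4]

model_scores = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman rho between model and human scores: {rho:.2f}")
```

A rank-based coefficient is used here because human ratings and cosine similarities live on different scales; only the agreement in ordering of the word pairs is compared.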

