A hybrid cross-language name matching technique using novel modified Levenshtein Distance

Abstract

Background: Standardized coding plays an important role in the secondary use of radiology reports, such as data analytics, data-driven decision support, and personalized medicine. RadLex, a standard radiological lexicon, can reduce subjective variability and improve clarity in radiology reports. RadLex coding of radiology reports is widely used in many countries, but the translation and localization of RadLex in China are far from established. Although automatic RadLex coding is a common way to handle non-standard radiology reports, high-accuracy cross-language RadLex coding is hard to achieve because of the limitations of current auto-translation and text similarity algorithms, and it still requires further research.

Methods: We present an approach that combines hybrid translation with a Multilayer Perceptron (MLP) weighting text similarity ensemble algorithm for automatic RadLex coding of Chinese structured radiology reports. First, a hybrid method that integrates Google neural machine translation (GNMT) and dictionary translation optimizes the translation of Chinese radiology phrases into English. The dictionary consists of 21,863 Chinese–English radiological term pairs extracted from several free medical dictionaries. Second, four typical text similarity algorithms are introduced: Levenshtein distance, the Jaccard similarity coefficient, the Word2vec continuous bag-of-words model, and WordNet Wup similarity. Last, the MLP model synthesizes the contextual, lexical, character, and syntactical information of the four text similarity algorithms to improve precision: the four similarity scores of two terms are taken as input, and the output indicates whether the two terms are synonyms.

Results: The approach achieves an F1-score of 90.15%, a precision of 91.78%, and a recall of 88.59%. The hybrid translation algorithm has no negative effect on the final coding: the F1-score increases by 21.44% and 8.12% compared with the GNMT algorithm and dictionary translation, respectively. The MLP weighting similarity algorithm achieves a 4.48% increase over the best single similarity algorithm, WordNet Wup.

Conclusions: The paper proposes an automatic cross-language RadLex coding approach for standardizing Chinese structured radiology reports, which can serve as a reference for automatic cross-language coding.
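As a rough illustration of two quantities named in this abstract, the sketch below computes a word-level Jaccard similarity coefficient and the F1-score as the harmonic mean of precision and recall. The function names and the choice to tokenize on whitespace are assumptions made for this example; the abstract does not say how the authors tokenize phrases or implement either measure.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity coefficient between the word sets of two phrases.

    Whitespace tokenization and lowercasing are assumptions for this sketch.
    """
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty phrases are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def f1_score(precision: float, recall: float) -> float:
    """F1-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two of three distinct words are shared between the phrases.
    print(jaccard("pleural effusion", "left pleural effusion"))
    # Precision and recall as reported in the abstract.
    print(f1_score(0.9178, 0.8859))
```

Applied to the reported precision and recall, `f1_score` gives roughly 0.90, which agrees with the reported F1-score up to rounding of the inputs.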


Bloom-Quotient Based Name Matching Technique in Content Centric Networks

Journal of Physics Conference Series ◽ 10.1088/1742-6596/1818/1/012030 ◽ 2021 ◽ Vol 1818 (1) ◽ pp. 012030

Author(s): Mohammad Alhisnawi

Keyword(s): Matching Technique ◽ Name Matching


Distributions of cognates in Europe as based on Levenshtein distance

Bilingualism Language and Cognition ◽ 10.1017/s1366728910000623 ◽ 2011 ◽ Vol 15 (1) ◽ pp. 157-166 ◽ Cited By ~ 43

Author(s): Job Schepens ◽ Ton Dijkstra ◽ Franc Grootjen

Keyword(s): Distance Function ◽ Formal Model ◽ Experimental Studies ◽ Levenshtein Distance ◽ Semantic Equivalence ◽ European Languages ◽ Orthographic Similarity ◽ Similarity Ratings ◽ Cross Language ◽ Stimulus Materials

Researchers on bilingual processing can benefit from computational tools developed in artificial intelligence. We show that a normalized Levenshtein distance function can efficiently and reliably simulate bilingual orthographic similarity ratings. Orthographic similarity distributions of cognates and non-cognates were identified across pairs of six European languages: English, German, French, Spanish, Italian, and Dutch. Semantic equivalence was determined using the conceptual structure of a translation database. By using a similarity threshold, large numbers of cognates could be selected that nearly completely included the stimulus materials of experimental studies. The identified numbers of form-similar and identical cognates correlated highly with branch lengths of phylogenetic language family trees, supporting the usefulness of the new measure for cross-language comparison. The normalized Levenshtein distance function can be considered as a new formal model of cross-language orthographic similarity.
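A normalized Levenshtein measure of the kind described here can be sketched as follows. The normalization used (edit distance divided by the length of the longer word, subtracted from 1) is a common choice and an assumption for this example, as are the function names and the threshold value of 0.5; the abstract does not state the exact formula or threshold the authors used.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical spellings (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_cognate_candidate(a: str, b: str, threshold: float = 0.5) -> bool:
    """Flag a translation pair as form-similar; the threshold is hypothetical."""
    return normalized_similarity(a, b) >= threshold


if __name__ == "__main__":
    # English "night" and German "Nacht" differ in two of five letters.
    print(normalized_similarity("night", "nacht"))
    print(is_cognate_candidate("night", "nacht"))
```

In this sketch the threshold is applied only to pairs already known to be translations of each other, mirroring the abstract's separation of semantic equivalence (from a translation database) and orthographic similarity.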


Cross language name matching

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09 ◽ 10.1145/1571941.1572065 ◽ 2009

Author(s): J. Scott McCarley

Keyword(s): Name Matching ◽ Cross Language


Psychometric Parameters of the Spanish Version of the Kuwait University Anxiety Scale (S-KUAS)

European Journal of Psychological Assessment ◽ 10.1027/1015-5759.20.4.349 ◽ 2004 ◽ Vol 20 (4) ◽ pp. 349-357 ◽ Cited By ~ 28

Author(s): Ahmed M. Abdel-Khalek ◽ Joaquín Tomás-Sábado ◽ Juana Gómez-Benito

Keyword(s): Reliability And Validity ◽ Anxiety Scale ◽ Spanish Version ◽ Good Reliability ◽ Kuwait University ◽ Males And Females ◽ Anxiety Levels ◽ Arabic And English ◽ Cross Language ◽ Language Equivalence

Summary: To construct a Spanish version of the Kuwait University Anxiety Scale (S-KUAS), the Arabic and English versions of the KUAS have been separately translated into Spanish. To check the comparability in terms of meaning, the two Spanish preliminary translations were thoroughly scrutinized vis-à-vis both the Arabic and English forms by several experts. Bilingual subjects served to explore the cross-language equivalence of the English and Spanish versions of the KUAS. The correlation between the total scores on both versions was .93, and the t value was .30 (n.s.), denoting good similarity. The Alphas and 4-week test-retest reliabilities were greater than .84, while the criterion-related validity was .70 against scores on the trait subscale of the STAI. These findings denote good reliability and validity of the S-KUAS. Factor analysis yielded three high-loaded factors of Behavioral/Subjective, Cognitive/Affective, and Somatic Anxiety, equivalent to the original Arabic version. Female (n = 210) undergraduates attained significantly higher mean scores than their male (n = 102) counterparts. For the combined group of males and females, the correlation between the total score on the S-KUAS and age was -.17 (p < .01). By and large, the findings of the present study provide evidence of the utility of the S-KUAS in assessing trait anxiety levels in the Spanish undergraduate context.
