Applying text similarity algorithm to analyze the triangular citation behavior of scientists

Text similarity research, particularly on rumor texts, builds a calculation model from known rumors and uses it to measure how similar new texts are to them. This lets people recognize rumors in advance, stay more vigilant, and effectively block and control the spread of rumors. A similarity calculation model for microblog rumor texts was built on a Bayesian network. Because not only rumor texts but also rumor producers share similar characteristics, a similarity calculation model for rumor makers was constructed as well. The text similarity and user similarity were then integrated to establish a microblog similarity calculation model. Finally, the performance of the proposed model was studied experimentally on a data set of microblog rumor texts and users. The experimental results indicate that the proposed similarity algorithm identifies rumor texts and predicts user characteristics more accurately and effectively.
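The abstract does not say how the text and user similarities are computed or combined, so the Python sketch below is only an illustration under stated assumptions, not the paper's Bayesian-network model. It scores a post by its highest bag-of-words cosine similarity to known rumor texts, scores a user by cosine similarity of hypothetical numeric profile features to a known rumor maker, and integrates the two with an assumed weight `alpha`.

```python
import math
from collections import Counter


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def bag_of_words(text: str) -> dict[str, float]:
    """Lowercased whitespace tokens with counts.
    Assumption: Chinese microblog text would need word segmentation first."""
    return {w: float(c) for w, c in Counter(text.lower().split()).items()}


def text_similarity(post: str, known_rumors: list[str]) -> float:
    """Highest similarity between a post and any known rumor text."""
    vec = bag_of_words(post)
    return max((cosine(vec, bag_of_words(r)) for r in known_rumors), default=0.0)


def user_similarity(user: dict[str, float], rumor_maker: dict[str, float]) -> float:
    """Similarity of two users' numeric profile features (feature names are hypothetical)."""
    return cosine(user, rumor_maker)


def microblog_similarity(text_sim: float, user_sim: float, alpha: float = 0.5) -> float:
    """Integrate text and user similarity with an assumed weight alpha in [0, 1].
    This weighted sum is a placeholder, not the paper's integration method."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * text_sim + (1.0 - alpha) * user_sim


if __name__ == "__main__":
    rumors = ["celebrity X died in car crash today", "drinking salt water cures flu"]
    t = text_similarity("breaking celebrity X died in car crash", rumors)
    u = user_similarity({"new_account": 1.0, "reposts_per_day": 3.0},
                        {"new_account": 1.0, "reposts_per_day": 3.0})
    print(round(t, 3), round(u, 3), round(microblog_similarity(t, u), 3))
```

With `alpha = 0.5`, text evidence and user evidence count equally; the weight is an illustrative default, not a value reported in the paper.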

Research and application of news-text similarity algorithm based on Chinese word segmentation

2013 3rd International Conference on Consumer Electronics, Communications and Networks

10.1109/cecnet.2013.6703375

2013

Cited By ~ 2

Author(s): Wei Guan, Pengzhou Zhang

Keyword(s): Word Segmentation, Chinese Word, Chinese Word Segmentation, Text Similarity, Similarity Algorithm

Automatic RadLex coding of Chinese structured radiology reports based on text similarity ensemble

BMC Medical Informatics and Decision Making

10.1186/s12911-021-01604-9

2021

Vol 21 (S9)

Author(s): Yani Chen, Shan Nan, Qi Tian, Hailing Cai, Huilong Duan, ...

Keyword(s): Multilayer Perceptron, Levenshtein Distance, Text Similarity, Neural Machine Translation, Secondary Use, Radiology Reports, Similarity Algorithm, Negative Effect, Cross Language, Dictionary Translation

Abstract

Background: Standardized coding plays an important role in the secondary use of radiology reports, such as data analytics, data-driven decision support, and personalized medicine. RadLex, a standard radiological lexicon, can reduce subjective variability and improve clarity in radiology reports. RadLex coding of radiology reports is widely used in many countries, but the translation and localization of RadLex in China are far from established. Although automatic RadLex coding is a common approach for non-standard radiology reports, high-accuracy cross-language RadLex coding is hard to achieve because of the limitations of current automatic translation and text similarity algorithms, and it still requires further research.

Methods: We present an effective approach that combines hybrid translation with a Multilayer Perceptron (MLP) weighted text similarity ensemble algorithm for automatic RadLex coding of Chinese structured radiology reports. First, a hybrid method integrating Google neural machine translation (GNMT) and dictionary translation optimizes the translation of Chinese radiology phrases into English. The dictionary consists of 21,863 Chinese–English radiological term pairs extracted from several free medical dictionaries. Second, four typical text similarity algorithms are introduced: Levenshtein distance, the Jaccard similarity coefficient, the Word2vec continuous bag-of-words (CBOW) model, and WordNet Wu-Palmer (Wup) similarity. Finally, an MLP model synthesizes the contextual, lexical, character, and syntactic information captured by the four algorithms to improve precision: the four similarity scores of two terms are its input, and its output indicates whether the two terms are synonyms.

Results: The approach achieves an F1-score of 90.15%, a precision of 91.78%, and a recall of 88.59%. The hybrid translation algorithm has no negative effect on the final coding: the F1-score increased by 21.44% and 8.12% compared with GNMT alone and dictionary translation alone, respectively. The MLP-weighted similarity algorithm also outperforms the single similarity algorithms, with a 4.48% increase over the best of them, WordNet Wup.

Conclusions: The paper proposes an innovative automatic cross-language RadLex coding approach for standardizing Chinese structured radiology reports, which can serve as a reference for automatic cross-language coding.
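Two of the four measures named in the Methods are simple enough to sketch without external resources. The Python code below is an illustration under assumptions, not the authors' implementation: Levenshtein distance is turned into a [0, 1] similarity by dividing by the longer string's length, and the Jaccard coefficient is computed over lowercase word sets. Word2vec CBOW and WordNet Wup similarity need trained embeddings and the WordNet database, so they are left out; `similarity_features` is a hypothetical helper showing how per-pair scores would be collected as input for the MLP classifier.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars are equal)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over lowercase word sets (whitespace tokenization is an assumption)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def similarity_features(translated: str, radlex_term: str) -> list[float]:
    """Hypothetical helper: two of the four scores the MLP would take as input.
    The Word2vec CBOW and WordNet Wup scores are omitted in this sketch."""
    return [levenshtein_similarity(translated, radlex_term),
            jaccard_similarity(translated, radlex_term)]


if __name__ == "__main__":
    print(similarity_features("left lung lobe", "lobe of left lung"))
```

For example, "left lung lobe" and "lobe of left lung" share three of four distinct words, giving a Jaccard similarity of 0.75, while their character-level Levenshtein similarity is lower because the word order differs. Scores that disagree in this way are what a learned weighting can exploit.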

Mapping texts into graphs: An improved text similarity algorithm

Proceedings of 2012 2nd International Conference on Computer Science and Network Technology

10.1109/iccsnt.2012.6526173

2012

Author(s): Zuoguo Liu, Xiaorong Chen

Keyword(s): Text Similarity, Similarity Algorithm

Applying text similarity algorithm to analyze the triangular citation behavior of scientists

A Short Text Similarity Algorithm for Finding Similar Police 110 Incidents

A Framework System Using Word Mover’s Distance Text Similarity Algorithm for Assessing Privacy Policy Compliance

Text similarity algorithm based on semantic vector space model

An improved text similarity algorithm research for clinical decision support system

san_sim: Factual and efficient URL text similarity algorithm

Chinese Text Similarity Algorithm Based on Part-of-Speech Tagging and Word Vector Model

Text Similarity Computation Model for Identifying Rumor Based on Bayesian Network in Microblog

Research and application of news-text similarity algorithm based on Chinese word segmentation

Automatic RadLex coding of Chinese structured radiology reports based on text similarity ensemble

Mapping texts into graphs: An improved text similarity algorithm