Improving Text Similarity Measurement by Critical Sentence Vector Model

Author(s):  
Wei Li ◽  
Kam-Fai Wong ◽  
Chunfa Yuan ◽  
Wenjie Li ◽  
Yunqing Xia
Information ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 421
Author(s):  
Jiapeng Wang ◽  
Yihong Dong

Text similarity measurement underlies many natural language processing tasks and plays an important role in information retrieval, automatic question answering, machine translation, dialogue systems, and document matching. This paper systematically reviews the state of research on similarity measurement, analyzes the advantages and disadvantages of current methods, develops a more comprehensive classification system for text similarity measurement algorithms, and summarizes future directions. To provide a reference for related research and applications, text similarity measurement methods are described along two dimensions: text distance and text representation. Text distance is divided into length distance, distribution distance, and semantic distance; text representation is divided into string-based, corpus-based, single-semantic text, multi-semantic text, and graph-structure-based representation. Finally, the development of text similarity is summarized in the discussion section.
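
As a rough illustration of the kind of methods the abstract above classifies, the following Python sketch compares two sentences with a string-based measure (Jaccard overlap of token sets) and with a vector distance (cosine over term counts). The naive whitespace tokenizer and the function names are assumptions made for this example; they are not taken from the survey.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def jaccard_similarity(a, b):
    """String-based view: size of the shared token set over the union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Vector view: cosine of the angle between term-count vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Both functions return a score between 0 and 1. Neither captures meaning beyond shared surface tokens, which is the gap that semantic distances and learned representations are meant to address.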


2018 ◽  
Vol 10 (11) ◽  
pp. 4330 ◽  
Author(s):  
Xinglong Yuan ◽  
Wenbing Chang ◽  
Shenghan Zhou ◽  
Yang Cheng

Sequential pattern mining (SPM) is an effective and important method for analyzing time series. This paper proposes an SPM algorithm to mine fault sequential patterns in text data. Because text data is poorly structured and the same concept can be expressed in many different forms, traditional SPM algorithms cannot be applied to text data directly. The proposed algorithm is designed to solve this problem. First, the study measures the similarity of fault text data and groups similar faults into one class. To do so, it proposes a new text similarity measurement model based on word embedding distance. Compared with classic text similarity measurement methods, this model achieves good results in short text classification. Then, on the basis of the fault classification, the paper proposes an SPM algorithm with an event window, a soft time constraint that allows a chosen number of sequential patterns to be obtained as needed. Finally, the study uses the fault text records of a certain aircraft as experimental data for mining fault sequential patterns. Experiments showed that the algorithm can effectively mine sequential patterns in text data. The proposed algorithm can be applied widely to text time series data in fields such as industry, business, and finance.
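
The abstract above does not give the details of the event window, so the following Python sketch shows only one possible reading: after faults have been assigned class labels, count ordered pairs of labels in which the second occurs within a fixed number of events after the first. The function name `mine_pair_patterns`, the `window` and `min_support` parameters, and the sample labels are all hypothetical and do not come from the paper.

```python
from collections import Counter


def mine_pair_patterns(events, window, min_support):
    """Count ordered pairs (a, b) where b follows a within `window` events.

    `events` is a list of fault-class labels in time order. Interpreting the
    window as a count of subsequent events is an assumption of this sketch.
    """
    counts = Counter()
    for i, first in enumerate(events):
        seen = set()  # count each follower label once per starting event
        for second in events[i + 1 : i + 1 + window]:
            if second not in seen:
                counts[(first, second)] += 1
                seen.add(second)
    # Keep only the pairs that occur at least `min_support` times.
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical, already-classified fault records in time order.
faults = ["leak", "overheat", "leak", "overheat", "vibration", "leak", "overheat"]
patterns = mine_pair_patterns(faults, window=2, min_support=2)
```

Widening `window` or lowering `min_support` yields more candidate patterns, which is one way a soft constraint could be tuned to return as many patterns as needed.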


2021 ◽  
Author(s):  
Shanping Zhang ◽  
Xiaowei Xu ◽  
Ye Tao ◽  
Xiaodong Wang ◽  
Qiuchen Wang ◽  
...  
