short text similarity
Recently Published Documents


TOTAL DOCUMENTS

35
(FIVE YEARS 17)

H-INDEX

5
(FIVE YEARS 3)

2022 ◽  
Vol 18 (2) ◽  
pp. 1-27
Author(s):  
Hang Cui ◽  
Tarek Abdelzaher

This article narrows the gap between physical sensing systems that measure physical signals and social sensing systems that measure information signals by (i) defining a novel algorithm for extracting information signals (building on results from text embedding) and (ii) showing that it increases the accuracy of truth discovery—the separation of true information from false/manipulated one. The work is applied in the context of separating true and false facts on social media, such as Twitter and Reddit, where users post predominantly short microblogs. The new algorithm decides how to aggregate the signal across words in the microblog for purposes of clustering the miscroblogs in the latent information signal space, where it is easier to separate true and false posts. Although previous literature extensively studied the problem of short text embedding/representation, this article improves previous work in three important respects: (1) Our work constitutes unsupervised truth discovery, requiring no labeled input or prior training. (2) We propose a new distance metric for efficient short text similarity estimation, we call Semantic Subset Matching , that improves our ability to meaningfully cluster microblog posts in the latent information signal space. (3) We introduce an iterative framework that jointly improves miscroblog clustering and truth discovery. The evaluation shows that the approach improves the accuracy of truth-discovery by 6.3%, 2.5%, and 3.8% (constituting a 38.9%, 14.2%, and 18.7% reduction in error, respectively) in three real Twitter data traces.


Author(s):  
Shijie Qiu ◽  
Yan Niu ◽  
Jun Li ◽  
Xing Li

The research on semantic similarity of short text plays an important role in machine translation, emotion analysis, information retrieval and other AI business applications. However, according to existing short text similarity research, the characteristics of ambiguous vocabularies are difficult to be effectively analyzed, the solution of the problem caused by words order needs to be further optimized as well. This paper proposes a short text semantic similarity calculation method that combines BERT and time warping distance algorithm, in order to solve the problem of vocabulary ambiguity. The model first uses the pre trained Bert model to extract the semantic features of the short text from the whole level, and obtains a 768 dimensional short text feature vector. Then, it transforms the extracted feature vector into a point sequence in space, uses the CTW algorithm to calculate the time warping distance between the curves connected by the point sequence, and finally uses the weight function designed by the analysis, according to the smaller the time warpage distance is, the higher the degree of small similarity is, to calculate the similarity between short texts. The experimental results show that this model can mine the feature information of ambiguous words, and calculate the similarity of short texts with lexical ambiguity effectively. Compared with other models, it can distinguish the semantic features of ambiguous words more accurately.


2021 ◽  
Vol 114 ◽  
pp. 169-180
Author(s):  
Jiaqi Yang ◽  
Yongjun Li ◽  
Congjie Gao ◽  
Yinyin Zhang

Author(s):  
Shuiqiao Yang ◽  
Guangyan Huang ◽  
Bahadorreza Ofoghi ◽  
John Yearwood

Sign in / Sign up

Export Citation Format

Share Document