Research on Semantic Similarity of Short Text Based on Bert and Time Warping Distance

Author(s):  
Shijie Qiu ◽  
Yan Niu ◽  
Jun Li ◽  
Xing Li

Research on the semantic similarity of short texts plays an important role in machine translation, sentiment analysis, information retrieval, and other AI business applications. However, existing short text similarity methods struggle to analyze the characteristics of ambiguous vocabulary effectively, and their handling of word-order problems also needs further optimization. This paper proposes a short text semantic similarity calculation method that combines BERT with a time warping distance algorithm in order to address vocabulary ambiguity. The model first uses the pre-trained BERT model to extract the semantic features of the short text as a whole, obtaining a 768-dimensional short text feature vector. It then transforms the extracted feature vector into a point sequence in space and uses the CTW algorithm to calculate the time warping distance between the curves formed by connecting the point sequences. Finally, it applies a weight function, designed on the principle that a smaller time warping distance means a higher degree of similarity, to calculate the similarity between short texts. The experimental results show that the model can mine the feature information of ambiguous words and effectively calculate the similarity of short texts with lexical ambiguity. Compared with other models, it distinguishes the semantic features of ambiguous words more accurately.
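The distance-then-weight step described above can be illustrated with a minimal sketch. Several things here are assumptions rather than the paper's method: the code uses plain dynamic time warping instead of the CTW algorithm, the helper `to_points` shows only one possible way to turn a flat feature vector into a point sequence, and `similarity` is a hypothetical weight function chosen only because it decreases as the distance grows.

```python
import math


def to_points(vec, dim=2):
    """Split a flat vector into consecutive dim-dimensional points (assumed mapping)."""
    if len(vec) % dim != 0:
        raise ValueError("vector length must be a multiple of dim")
    return [tuple(vec[i:i + dim]) for i in range(0, len(vec), dim)]


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two point sequences.

    This is standard DTW, used here as a stand-in for the CTW algorithm.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity(distance):
    """Hypothetical weight function: 1 at distance 0, approaching 0 as distance grows."""
    return 1.0 / (1.0 + distance)


# Toy vectors standing in for 768-dimensional BERT features.
u = to_points([0, 0, 1, 1, 2, 2])
v = to_points([0, 0, 1, 1, 2, 3])
print(similarity(dtw_distance(u, v)))
```

Any monotonically decreasing function of the distance would satisfy the stated principle; the reciprocal form is used here only because it maps distances to the interval (0, 1].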

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yudong Liu ◽  
Wen Chen

In the field of information science, helping users quickly and accurately find the information they need in a tremendous amount of short texts has become an urgent problem. Recommendation models are an important way to find such information. However, existing recommendation models have limitations when applied to short text recommendation. To address these issues, this paper proposes a recommendation model based on semantic features and a knowledge graph. More specifically, we first select DBpedia as the knowledge graph to extend the short text features of items and obtain the semantic features of the items from the extended text. We then calculate the item vectors and from them obtain the degree of semantic similarity between users. Finally, based on the semantic features of the items and the semantic similarity of the users, we apply collaborative filtering to calculate the predicted ratings. A series of experiments demonstrates the effectiveness of our model on the evaluation metrics of mean absolute error (MAE) and root mean square error (RMSE) compared with several other recommendation algorithms. The best MAE achieved by the proposed model is 0.6723, and the best RMSE is 0.8442. These results show that, in the movie domain, the model's recommendations are significantly better than those of the existing algorithms.
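For reference, the two evaluation metrics named above can be computed as in the following sketch. The function names and the toy ratings are illustrative assumptions and are not taken from the paper's experiments.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up ratings, for illustration only.
actual = [4, 3, 5]
predicted = [3, 3, 3]
print(mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is why the two are usually reported together.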


2012 ◽  
Vol 170-173 ◽  
pp. 3711-3714 ◽  
Author(s):  
Pei Ying Zhang

Text classification is the task of assigning natural language textual documents to predefined categories based on their context. The main concern of this paper is to improve the accuracy of a text classification system by combining an improved CHI method with a semantic similarity metric. First, an improved CHI method is used to select features from the raw features in order to reduce the feature dimensionality. Second, the semantic distance between the text feature vector and the category feature vector is calculated to determine the document's category. Finally, we carry out a series of experiments comparing the approach with other methods using the F1-measure. The experimental results show that the new method yields a substantial improvement in all categories.
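The abstract does not say how the CHI method is improved, so the sketch below shows only the standard chi-square statistic for a term and a category that such methods usually start from. The function name and the counts A, B, C, D are illustrative assumptions.

```python
def chi_square(a, b, c, d):
    """Standard chi-square score of a term with respect to a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # The term or the category is degenerate (always or never present).
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# A term that appears in every in-category document and nowhere else.
print(chi_square(10, 0, 0, 10))
```

Feature selection then keeps the terms with the highest scores per category, which is how the method reduces the dimensionality of the feature space.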


2020 ◽  
Author(s):  
M Krishna Siva Prasad ◽  
Poonam Sharma

Short text or sentence similarity is crucial in various natural language processing tasks. Traditional measures of sentence similarity consider word order, semantic features, and role annotations of the text to derive the similarity. These measures are not suited to short texts or to sentences with negation. Hence, this paper proposes an approach to determine the semantic similarity of sentences and also presents an algorithm to handle negation. Word pair similarity plays a significant role in sentence similarity, so this paper also discusses the similarity between word pairs. Existing semantic similarity measures do not handle antonyms accurately, so this paper proposes an algorithm to handle antonyms. It also presents an antonym dataset with 111 word pairs and corresponding expert ratings. The existing semantic similarity measures are tested on the dataset, and the correlation results show that the expert ratings are consistent with the scores obtained from the semantic similarity measures. Sentence similarity is handled by two proposed algorithms: the first deals with typical sentences, and the second deals with contradiction in sentences. The SICK dataset, which contains sentences with negation, is used to evaluate sentence similarity. The algorithm improved the sentence similarity results.


Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 285
Author(s):  
Wenjing Yang ◽  
Liejun Wang ◽  
Shuli Cheng ◽  
Yongming Li ◽  
Anyu Du

Recently, deep learning to hash has been extensively applied to image retrieval because of its low storage cost and fast query speed. However, when existing hashing methods use a convolutional neural network (CNN) to extract image semantic features, the extraction is insufficient and imbalanced: the extracted features do not include contextual information and lack relevance to one another. Furthermore, relaxing the hash code inevitably introduces a quantization error. To solve these problems, this paper proposes deep hash with improved dual attention for image retrieval (DHIDA), whose main contributions are as follows: (1) this paper introduces the improved dual attention mechanism (IDA), built on the pre-trained ResNet18 module, to extract the feature information of the image; it consists of the position attention module and the channel attention module; (2) when calculating the spatial attention matrix and the channel attention matrix, the average value and maximum value of the columns of the feature map matrix are integrated in order to improve the feature representation ability and fully leverage the features of each position; and (3) to reduce the quantization error, this study designs a new piecewise function to directly guide the discrete binary code. Experiments on CIFAR-10, NUS-WIDE and ImageNet-100 show that the DHIDA algorithm achieves better performance.
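To make the notion of quantization error concrete, the following sketch binarizes a relaxed (real-valued) code with a plain sign function and compares binary codes by Hamming distance. This is a generic illustration and an assumption on our part: it is not the piecewise function proposed for DHIDA, which the abstract does not specify, and all names are hypothetical.

```python
def binarize(relaxed):
    """Map each real-valued entry to +1 or -1 by its sign (0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in relaxed]


def quantization_error(relaxed):
    """Squared distance between a relaxed code and its binarized version."""
    return sum((x - b) ** 2 for x, b in zip(relaxed, binarize(relaxed)))


def hamming(code_a, code_b):
    """Number of positions at which two binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have equal length")
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


# A made-up 4-bit relaxed code; the entry 0.1 contributes most of the error.
relaxed = [0.9, -0.8, 0.1, -1.0]
print(binarize(relaxed), quantization_error(relaxed))
```

Entries that lie far from both +1 and -1 dominate the error, which is why hashing methods try to push the network outputs toward the binary values during training.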

