Research on Semantic Similarity of Short Text Based on Bert and Time Warping Distance

Research on the semantic similarity of short texts plays an important role in machine translation, sentiment analysis, information retrieval, and other AI applications. However, existing short text similarity methods struggle to analyze the characteristics of ambiguous words effectively, and their handling of word order problems also needs further optimization. To address vocabulary ambiguity, this paper proposes a short text semantic similarity calculation method that combines BERT with a time warping distance algorithm. The model first uses the pre-trained BERT model to extract the semantic features of the short text as a whole, obtaining a 768-dimensional short text feature vector. It then transforms the extracted feature vector into a point sequence in space and uses the CTW algorithm to calculate the time warping distance between the curves formed by connecting the point sequences. Finally, it calculates the similarity between short texts with a purpose-designed weight function, under which a smaller time warping distance corresponds to a higher similarity. The experimental results show that the model can mine the feature information of ambiguous words and effectively calculate the similarity of short texts with lexical ambiguity. Compared with other models, it distinguishes the semantic features of ambiguous words more accurately.
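
The pipeline described in this abstract (feature vector, point sequence, warping distance, similarity) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses classic dynamic time warping (DTW) as a stand-in for the CTW algorithm the abstract names, it splits the vector into 2-dimensional points (the abstract does not say how the point sequence is built), and it uses 1 / (1 + d) as a hypothetical weight function. The function names `to_points`, `dtw_distance`, and `similarity` are invented for this example.

```python
import math


def to_points(vec, dim=2):
    """Split a flat feature vector into consecutive dim-dimensional points.

    The chunking scheme is an assumption made for illustration only.
    """
    if len(vec) % dim != 0:
        raise ValueError("vector length must be a multiple of dim")
    return [tuple(vec[i:i + dim]) for i in range(0, len(vec), dim)]


def dtw_distance(a, b):
    """Classic DTW distance between two point sequences (not CTW)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def similarity(vec_a, vec_b, dim=2):
    """Hypothetical weight function: smaller distance, higher similarity."""
    d = dtw_distance(to_points(vec_a, dim), to_points(vec_b, dim))
    return 1.0 / (1.0 + d)
```

In practice the two inputs would be 768-dimensional BERT feature vectors; any flat list of floats whose length is a multiple of `dim` works with this sketch.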

Recommendation Model Based on Semantic Features and a Knowledge Graph

Wireless Communications and Mobile Computing ◽ 10.1155/2021/2382892 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-9

Author(s): Yudong Liu ◽ Wen Chen

Keyword(s): Semantic Similarity ◽ Information Science ◽ Absolute Error ◽ Knowledge Graph ◽ Semantic Features ◽ Short Text ◽ Model Based ◽ Recommendation Algorithms ◽ Tremendous Amount ◽ Series Of Experiments

In the field of information science, helping users quickly and accurately find the information they need in a tremendous amount of short texts has become an urgent problem. Recommendation models are an important way to find such information, but existing models have limitations when applied to short text recommendation. To address these issues, this paper proposes a recommendation model based on semantic features and a knowledge graph. More specifically, we first select DBpedia as a knowledge graph to extend the short text features of items and derive the semantic features of the items from the extended text. We then calculate the item vectors and obtain the semantic similarity between users. Finally, based on the semantic features of the items and the semantic similarity of the users, we apply collaborative filtering to calculate the predicted ratings. A series of experiments demonstrates the effectiveness of our model in terms of mean absolute error (MAE) and root mean square error (RMSE) compared with several recommendation algorithms. The optimal MAE of the proposed model is 0.6723, and its RMSE is 0.8442. These results show that, in the movie domain, the model's recommendations are significantly better than those of the existing algorithms.
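
The final steps of this abstract (a similarity-based rating prediction, scored with MAE and RMSE) can be illustrated with a minimal sketch. The prediction rule below is the common mean-centered, similarity-weighted form of user-based collaborative filtering, which is an assumption: the abstract does not state the exact formula the authors use. The names `predict_rating`, `mae`, and `rmse` are hypothetical.

```python
import math


def predict_rating(target_mean, neighbors):
    """Mean-centered, similarity-weighted rating prediction.

    neighbors: list of (similarity, neighbor_rating, neighbor_mean) tuples.
    This is a textbook formulation, not necessarily the paper's.
    """
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(abs(s) for s, _, _ in neighbors)
    # Fall back to the target user's mean when no neighbor carries weight.
    return target_mean if den == 0 else target_mean + num / den


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ratings."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )
```

Lower values of both metrics indicate predictions closer to the observed ratings; RMSE penalizes large individual errors more heavily than MAE.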

Semantic Similarity Metric and its Application in Text Classification

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.170-173.3711 ◽ 2012 ◽ Vol 170-173 ◽ pp. 3711-3714 ◽ Cited By ~ 1

Author(s): Pei Ying Zhang

Keyword(s): Semantic Similarity ◽ Text Classification ◽ Feature Vector ◽ Semantic Distance ◽ Main Concern ◽ Similarity Metric ◽ Important Improvement ◽ Text Feature ◽ Series Of Experiments ◽ Document Categorization

Text classification is the task of assigning natural language documents to predefined categories based on their context. The main concern of this paper is to improve the accuracy of a text classification system by combining an improved CHI method with a semantic similarity metric. First, an improved CHI method selects features from the raw feature set to reduce its dimensionality. Second, the semantic distance between the text feature vector and each category feature vector is calculated to determine the document's category. Finally, a series of experiments compares the method with other methods using the F1-measure. The experimental results show that the new method makes an important improvement in all categories.
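
For orientation, the feature selection step can be sketched with the standard chi-square (CHI) statistic for a term and a category. This is the textbook formula, shown as an assumption: the abstract does not describe how the authors' improved CHI method differs from it. The names `chi_square` and `select_top_k` are hypothetical.

```python
def chi_square(a, b, c, d):
    """Standard CHI statistic for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate contingency table: no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def select_top_k(scores, k):
    """Keep the k terms with the highest CHI scores.

    scores: dict mapping term -> CHI score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:k]]
```

A term that occurs only inside one category scores high, while a term spread evenly across categories scores zero and is dropped first.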

Similarity of Sentences With Contradiction Using Semantic Similarity Measures

The Computer Journal ◽ 10.1093/comjnl/bxaa100 ◽ 2020

Author(s): M Krishna Siva Prasad ◽ Poonam Sharma

Keyword(s): Natural Language Processing ◽ Natural Language ◽ Semantic Similarity ◽ Language Processing ◽ Word Order ◽ Similarity Measures ◽ Semantic Features ◽ Short Text ◽ Sentence Similarity ◽ Expert Ratings

Short text or sentence similarity is crucial in various natural language processing tasks. Traditional sentence similarity measures consider the word order, semantic features, and role annotations of a text to derive the similarity. These measures do not suit short texts or sentences with negation. Hence, this paper proposes an approach to determine the semantic similarity of sentences and also presents an algorithm to handle negation. Because word pair similarity plays a significant role in sentence similarity, the paper also discusses the similarity between word pairs. Existing semantic similarity measures do not handle antonyms accurately, so the paper proposes an algorithm to handle antonyms. It also presents an antonym dataset with 111 word pairs and corresponding expert ratings, on which the existing semantic similarity measures are tested. The correlation results show that the expert ratings are in line with the results obtained from the semantic similarity measures. Sentence similarity is handled by two proposed algorithms: the first deals with typical sentences, and the second deals with contradiction in sentences. The SICK dataset, which contains sentences with negation, is used to evaluate sentence similarity. The proposed algorithm improved the sentence similarity results.
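
Comparing a similarity measure against expert ratings is usually done with a correlation coefficient. The sketch below computes the Pearson correlation, which is an assumption: the abstract does not say which correlation the authors report. The function name `pearson` and the example values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: expert ratings vs. scores from a similarity measure.
expert_ratings = [0.1, 0.4, 0.8]
measure_scores = [0.2, 0.5, 0.9]
agreement = pearson(expert_ratings, measure_scores)
```

A value near 1 means the measure ranks word pairs much as the experts do; a value near 0 means the two are unrelated.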

Combining Statistical Information and Semantic Similarity for Short Text Feature Extension

Intelligent Information Processing VIII - IFIP Advances in Information and Communication Technology ◽ 10.1007/978-3-319-48390-0_21 ◽ 2016 ◽ pp. 205-210 ◽ Cited By ~ 2

Author(s): Xiaohong Li ◽ Yun Su ◽ Huifang Ma ◽ Lin Cao

Keyword(s): Semantic Similarity ◽ Statistical Information ◽ Short Text ◽ Text Feature

The Differential Role of the Cerebral Hemispheres in the Retrieval of the Semantic Features of Ambiguous Words

PsycEXTRA Dataset ◽ 10.1037/e413782005-530 ◽ 1999

Author(s): Heather Humphrey ◽ Ruth Ann Atchley ◽ Michael Wilson

Keyword(s): Cerebral Hemispheres ◽ Semantic Features ◽ Ambiguous Words

Deep Hash with Improved Dual Attention for Image Retrieval

Information ◽ 10.3390/info12070285 ◽ 2021 ◽ Vol 12 (7) ◽ pp. 285

Author(s): Wenjing Yang ◽ Liejun Wang ◽ Shuli Cheng ◽ Yongming Li ◽ Anyu Du

Keyword(s): Image Retrieval ◽ Contextual Information ◽ Quantization Error ◽ Feature Representation ◽ Semantic Features ◽ Average Value ◽ Hash Code ◽ Feature Information ◽ Learning To Hash ◽ Study Designs

Recently, deep learning to hash has been applied extensively to image retrieval because of its low storage cost and fast query speed. However, when existing hashing methods use a convolutional neural network (CNN) to extract image semantic features, the extracted features are insufficient and imbalanced: they do not include contextual information and lack relevance to one another. Furthermore, relaxing the hash code leads to an inevitable quantization error. To solve these problems, this paper proposes deep hash with improved dual attention for image retrieval (DHIDA), which makes the following contributions: (1) it introduces an improved dual attention mechanism (IDA), built on a pre-trained ResNet18 module and consisting of a position attention module and a channel attention module, to extract the feature information of the image; (2) when calculating the spatial attention matrix and channel attention matrix, it integrates the column-wise average and maximum values of the feature map matrix to strengthen feature representation and fully leverage the features of each position; and (3) to reduce quantization error, it designs a new piecewise function that directly guides the discrete binary code. Experiments on CIFAR-10, NUS-WIDE, and ImageNet-100 show that the DHIDA algorithm achieves better performance.
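
The quantization error mentioned in this abstract arises when a continuous network output is rounded to a binary code. The sketch below illustrates the generic idea with plain sign binarization and a mean squared error; it does not reproduce the paper's piecewise function, whose form the abstract does not give. The names `binarize`, `quantization_error`, and `hamming` are hypothetical.

```python
def binarize(h):
    """Map a continuous (relaxed) hash vector to a {-1, +1} code by sign."""
    return [1 if x >= 0 else -1 for x in h]


def quantization_error(h):
    """Mean squared gap between the relaxed vector and its binary code."""
    b = binarize(h)
    return sum((x - y) ** 2 for x, y in zip(h, b)) / len(h)


def hamming(a, b):
    """Hamming distance between two binary codes of equal length."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Outputs close to -1 or +1 lose little information when binarized;
# outputs near 0 lose the most.
confident = [0.9, -0.95, 0.8]
uncertain = [0.1, -0.05, 0.2]
```

Retrieval then ranks database images by the Hamming distance between their codes and the query's code, which is why codes that binarize with little error tend to preserve the learned similarity better.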
