Strategies for Short Text Representation in the Word Vector Space

Author(s): Marcelo Pita, Gisele L. Pappa
IEEE Access, 2019, Vol 7, pp. 166578-166592

Author(s): Surender Singh Samant, N. L. Bhanu Murthy, Aruna Malapati

Author(s): Ming Hao, Weijing Wang, Fang Zhou

Short text classification is an important foundation for natural language processing (NLP) tasks. Although text classification based on deep language models (DLMs) has made significant headway, in practical applications some texts are ambiguous and hard to classify in multi-class settings, especially short texts whose context length is limited. Mainstream methods make ambiguous text easier to distinguish by adding context information. However, these methods rely only on the text representation and ignore the fact that categories overlap and are not completely independent of each other. In this paper, we establish a new general method for ambiguous text classification that introduces a label embedding to represent each category, making the differences between categories measurable. Further, a new compositional loss function is proposed to train the model; it pulls the text representation closer to the embedding of the ground-truth label and pushes it away from the other labels. Finally, a constraint is obtained by calculating the similarity between the text representation and the label embeddings. Errors caused by ambiguous text can be corrected by adding this constraint to the output layer of the model. We apply the method to three classical models and conduct experiments on six public datasets. Experiments show that our method effectively improves classification accuracy on ambiguous texts. In addition, by combining our method with BERT, we obtain state-of-the-art results on the CNT dataset.
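The abstract does not give the exact form of the compositional loss or of the output-layer constraint. The following is only a minimal sketch of the general idea, assuming cosine similarity between vectors, a hinge-style margin term for the loss, and a fixed linear mix of classifier probabilities with similarity-based probabilities as the constraint. The function names and the `margin` and `alpha` parameters are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def label_similarities(text_vec, label_embs):
    """Similarity of one text representation to every label embedding."""
    return [cosine(text_vec, e) for e in label_embs]

def margin_loss(text_vec, label_embs, gold, margin=0.2):
    """Hypothetical hinge-style loss: penalize every non-gold label whose
    similarity comes within `margin` of the gold label's similarity."""
    sims = label_similarities(text_vec, label_embs)
    pos = sims[gold]
    return sum(max(0.0, margin - pos + s) for i, s in enumerate(sims) if i != gold)

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def constrained_prediction(logits, text_vec, label_embs, alpha=0.5):
    """Hypothetical output-layer constraint: mix classifier probabilities with
    a softmax over text-label similarities, then return the argmax index."""
    p_cls = softmax(logits)
    p_sim = softmax(label_similarities(text_vec, label_embs))
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(p_cls, p_sim)]
    return max(range(len(mixed)), key=mixed.__getitem__)

# Toy 2-D label embeddings, text vector, and classifier logits (made-up values).
label_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text_vec = [0.9, 0.1]
logits = [1.0, 1.1, 0.0]
print(margin_loss(text_vec, label_embs, gold=0))                       # 0.0
print(constrained_prediction(logits, text_vec, label_embs, alpha=0.0))  # 1
print(constrained_prediction(logits, text_vec, label_embs))             # 0
```

With these toy vectors, the unconstrained classifier (`alpha=0.0`) narrowly prefers label 1, while mixing in the text-label similarities shifts the prediction to label 0, whose embedding lies closest to the text vector. This mirrors the abstract's claim that a similarity-based constraint can correct errors on ambiguous inputs; how the paper actually weights the two signals is not stated.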


2020, Vol 10 (14), pp. 4893
Author(s): Wenfeng Hou, Qing Liu, Longbing Cao

Short text is common in many applications, including the Internet of Things (IoT). The sparsity and shortness of short text can severely hamper its representation and classification. One important solution is to enrich short text representation by incorporating cognitive aspects of text, such as semantic concepts, knowledge, and categories. In this paper, we propose a named Entity-based Concept Knowledge-Aware (ECKA) representation model, which incorporates semantic information into short text representation. ECKA is a multi-level short text semantic representation model that uses CNNs to extract semantic features at the word, entity, concept, and knowledge levels, respectively. Since the words, entities, concepts, and knowledge entities in the same short text differ in their cognitive informativeness for short text classification, attention networks are built to capture category-related attentive representations from each level of textual features. The final multi-level semantic representation is formed by concatenating all of these individual-level representations and is used for text classification. Experiments on three tasks demonstrate that our method significantly outperforms state-of-the-art methods.
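The abstract describes per-level attention followed by concatenation. As a rough sketch of only that attend-then-concatenate step, the code below assumes the per-level CNN feature vectors are already computed and uses plain dot-product attention with one hypothetical query vector per level. It does not reproduce ECKA's CNN extractors or its actual attention formulation; the names `attention_pool`, `multi_level_representation`, and the toy values are illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Dot-product attention: score each feature vector against a query,
    normalize the scores with softmax, and return the weighted sum."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]

LEVELS = ("word", "entity", "concept", "knowledge")

def multi_level_representation(level_features, level_queries):
    """Attend within each level separately, then concatenate the pooled
    vectors in a fixed level order into one multi-level representation."""
    combined = []
    for level in LEVELS:
        combined.extend(attention_pool(level_features[level], level_queries[level]))
    return combined

# Toy 2-D features standing in for per-level CNN outputs (hypothetical values).
features = {
    "word": [[1.0, 0.0], [0.0, 1.0]],
    "entity": [[0.5, 0.5]],
    "concept": [[2.0, 0.0], [0.0, 2.0]],
    "knowledge": [[1.0, 1.0], [3.0, 1.0]],
}
queries = {level: [0.0, 0.0] for level in LEVELS}
rep = multi_level_representation(features, queries)
print(len(rep))  # 8: four levels, each pooled to 2 dimensions
```

With all-zero queries the attention weights are uniform, so each level reduces to the mean of its feature vectors. Learned, category-related queries would instead up-weight the items that are more informative for the target classes.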


2021, Vol 2021, pp. 1-7
Author(s): Hu Wang, Tianbao Liang, Yanxia Cheng

Perceived value is the customer's subjective understanding of the value they obtain, that is, their subjective evaluation of the product or service they enjoy. This value is assessed after deducting the cost of the product or service. To understand and predict consumers' specific perception of the value of products or services, and to distinguish it from the objective value of products or services in the general sense, this paper uses an LSTM-based deep learning method to build a model that predicts the benefits consumers perceive. Analyzing consumer emotion or recognizing perceived value from the diverse texts on online trading platforms is a challenging task. This paper proposes a new short-text representation method based on a bidirectional LSTM, which is effective for prediction tasks. In addition, an attention mechanism is used to learn emotion-specific vocabulary. The short-text representation can be used for both emotion classification and emotion intensity prediction. The proposed model is evaluated on classification and regression datasets. Compared with the baseline on the corresponding dataset, the result reached 93%. The research shows that using deep neural networks to predict perceived utility from consumer comments can reduce reliance on hand-crafted features, lower labor costs, and help predict the perceived utility of products to consumers.
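The abstract says a single short-text representation serves both emotion classification and emotion intensity prediction. Setting the BiLSTM and attention encoder aside, the sketch below only illustrates that sharing: it assumes the representation vector is already computed and feeds it into two linear heads, a softmax classifier and a scalar regressor. The head design, the function `two_head_predict`, and all weights are hypothetical and are not the paper's architecture.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def two_head_predict(rep, cls_weights, cls_bias, reg_weights, reg_bias):
    """Feed one shared representation into two hypothetical linear heads:
    a softmax classifier over emotion classes and a scalar intensity regressor."""
    logits = [sum(w * x for w, x in zip(row, rep)) + b
              for row, b in zip(cls_weights, cls_bias)]
    probs = softmax(logits)
    intensity = sum(w * x for w, x in zip(reg_weights, rep)) + reg_bias
    return probs, intensity

# Toy representation and untrained weights (hypothetical values).
rep = [1.0, 2.0]
probs, intensity = two_head_predict(
    rep,
    cls_weights=[[1.0, 0.0], [0.0, 1.0]],
    cls_bias=[0.0, 0.0],
    reg_weights=[0.5, 0.25],
    reg_bias=0.1,
)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class, round(intensity, 2))  # 1 1.1
```

Sharing the representation means the encoder is trained once and only the lightweight heads differ between the classification and regression tasks.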

