Short-text representation using diffusion wavelets

Author(s):  
Vidit Jain ◽  
Jay Mahadeokar


Author(s):  
Ming Hao ◽  
Weijing Wang ◽  
Fang Zhou

Short text classification is an important foundation for natural language processing (NLP) tasks. Although text classification based on deep language models (DLMs) has made significant headway, in practical applications some texts remain ambiguous and hard to classify, especially in multi-class settings where short texts offer only limited context. Mainstream methods improve the distinction of ambiguous texts by adding context information. However, these methods rely only on the text representation and ignore the fact that the categories overlap and are not completely independent of each other. In this paper, we establish a new general method for ambiguous text classification by introducing label embeddings to represent each category, which makes the differences between categories measurable. Furthermore, a new compositional loss function is proposed to train the model, which pulls the text representation closer to the ground-truth label and pushes it farther from the others. Finally, a constraint is obtained by calculating the similarity between the text representation and the label embeddings. Errors caused by ambiguous texts can be corrected by adding this constraint to the output layer of the model. We apply the method to three classical models and conduct experiments on six public datasets. The experiments show that our method effectively improves classification accuracy on ambiguous texts. In addition, combining our method with BERT, we obtain state-of-the-art results on the CNT dataset.
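The similarity constraint described above can be sketched as cosine similarity between a text representation and per-class label embeddings, combined with a margin-style loss. This is a minimal numpy illustration, not the paper's actual compositional loss; the function names and the choice of a max-margin form are assumptions:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity with a small epsilon for numerical safety
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def label_similarities(text_vec, label_embs):
    # similarity between a text representation and every label embedding;
    # these scores can constrain the model's output layer
    return np.array([cosine(text_vec, e) for e in label_embs])

def margin_label_loss(text_vec, label_embs, gold, margin=0.5):
    # pull the text toward its ground-truth label embedding and
    # push it away from the hardest competing label
    sims = label_similarities(text_vec, label_embs)
    hardest_other = max(s for i, s in enumerate(sims) if i != gold)
    return max(0.0, margin - sims[gold] + hardest_other)
```

A text vector aligned with its gold label embedding incurs zero loss, while one aligned with a competing label is penalized, which is the intuition behind correcting ambiguous predictions at the output layer.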


2020 ◽  
Vol 10 (14) ◽  
pp. 4893 ◽  
Author(s):  
Wenfeng Hou ◽  
Qing Liu ◽  
Longbing Cao

Short text is widely seen in applications including the Internet of Things (IoT). The appropriate representation and classification of short texts can be severely disrupted by their sparsity and shortness. One important solution is to enrich the short-text representation by involving cognitive aspects of the text, including semantic concepts, knowledge, and categories. In this paper, we propose an Entity-based Concept Knowledge-Aware (ECKA) representation model that incorporates semantic information into the short-text representation. ECKA is a multi-level short-text semantic representation model that extracts semantic features at the word, entity, concept, and knowledge levels using CNNs. Since the words, entities, concepts, and knowledge entities in the same short text have different cognitive informativeness for short-text classification, attention networks are built to capture category-related attentive representations from the textual features at each level. The final multi-level semantic representation is formed by concatenating all of these individual-level representations and is used for text classification. Experiments on three tasks demonstrate that our method significantly outperforms the state-of-the-art methods.
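The per-level attentive pooling and concatenation described above can be sketched as follows. This is a minimal numpy illustration with generic dot-product attention; ECKA's actual attention networks are learned, and all names and shapes here are placeholders:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attentive_pool(features, query):
    # features: (n, d) vectors at one level; query: (d,) category-related query
    weights = softmax(features @ query)
    return weights @ features          # (d,) attention-weighted summary

def multi_level_representation(levels, query):
    # concatenate the attentive summaries of every level
    # (e.g. word, entity, concept, knowledge features)
    return np.concatenate([attentive_pool(f, query) for f in levels])
```

With four levels of d-dimensional features, the final representation is a single 4d-dimensional vector fed to the classifier.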


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Hu Wang ◽  
Tianbao Liang ◽  
Yanxia Cheng

Perceived value is the customer’s subjective understanding of the value they obtain and their subjective evaluation of the product or service they enjoy; this value is net of the cost of the product or service. In order to understand and predict consumers’ specific cognition of the value of products or services, and to distinguish it from the objective value of products or services in the general sense, this paper uses a deep learning method based on LSTM to build a model that predicts the perceived benefits of consumers. Analyzing consumers’ emotions or recognizing their perceived value from the various texts of online trading platforms is a challenging task. This paper proposes a new short-text representation method based on bidirectional LSTM, which is very effective for forecasting. In addition, we use an attention mechanism to learn emotion-specific vocabulary. The short-text representation can be used for emotion classification and emotion-intensity prediction. We evaluate the proposed model on classification and regression datasets; compared with the baselines on the corresponding datasets, the results reached 93%. The research shows that using deep neural networks to predict the perceived utility of consumer comments can reduce the intervention of handcrafted features, lower labor costs, and help predict the perceived utility of products to consumers.
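The bidirectional encoding described above can be illustrated with a toy recurrence. A plain tanh RNN cell stands in for the LSTM cell purely for brevity, and all names and shapes are assumptions; the point is only the forward/backward pass and concatenation:

```python
import numpy as np

def rnn_pass(xs, W_in, W_rec):
    # a plain tanh recurrence standing in for an LSTM cell
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_in @ x + W_rec @ h)
        states.append(h)
    return np.stack(states)

def bidirectional_encode(xs, W_in, W_rec):
    # run the recurrence left-to-right and right-to-left,
    # then concatenate the two state sequences per time step
    fwd = rnn_pass(xs, W_in, W_rec)
    bwd = rnn_pass(xs[::-1], W_in, W_rec)[::-1]
    return np.concatenate([fwd, bwd], axis=1)   # (T, 2 * hidden)
```

Each token thus carries context from both directions, which is what lets an attention layer on top single out emotion-bearing words.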


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3728 ◽  
Author(s):  
Zhou ◽  
Wang ◽  
Sun ◽  
Sun

Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which can lead to a lack of semantic information as well as high dimensionality and high sparsity. At present, a popular way to address these problems is to use deep learning methods. In this paper, feature weighting, word embedding, and topic models are combined to propose an unsupervised text representation method named the feature, probability, and word embedding method. The main idea is to use the word embedding technique Word2Vec to obtain word vectors and then combine them with TF-IDF feature weighting and the LDA topic model. Compared with traditional feature engineering, the proposed method not only increases the expressive ability of the vector space model but also reduces the dimensionality of the document vector. Besides this, it alleviates the insufficient information, high dimensionality, and high sparsity of BoW. We use the proposed method for text categorization and verify its validity.
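The combination described above (TF-IDF-weighted Word2Vec vectors concatenated with an LDA topic distribution) can be sketched as follows. The function names and the exact weighting scheme are my own assumptions, since the abstract does not fully specify them:

```python
import numpy as np

def tfidf_weighted_embedding(tokens, word_vecs, idf):
    # average the word vectors, weighting each token by its TF-IDF score
    tf = {t: tokens.count(t) / len(tokens) for t in set(tokens)}
    weights = np.array([tf[t] * idf.get(t, 0.0) for t in tokens])
    vecs = np.stack([word_vecs[t] for t in tokens])
    total = weights.sum()
    if total == 0.0:
        return vecs.mean(axis=0)       # fall back to a plain average
    return (weights[:, None] * vecs).sum(axis=0) / total

def combined_representation(tokens, word_vecs, idf, topic_dist):
    # concatenate the weighted embedding with the document's
    # LDA topic mixture to form the final document vector
    return np.concatenate([tfidf_weighted_embedding(tokens, word_vecs, idf),
                           np.asarray(topic_dist)])
```

The resulting vector has embedding-dimension plus topic-count entries, which is typically far smaller than a BoW vocabulary-sized vector.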


Mathematics ◽  
2021 ◽  
Vol 9 (10) ◽  
pp. 1129
Author(s):  
Shihong Chen ◽  
Tianjiao Xu

QA matching is a very important task in natural language processing, but current research on text matching focuses more on short-text matching than on long-text matching. Compared with short texts, long texts are rich in information, but distracting information is also frequent. This paper extracts question-and-answer pairs about psychological counseling to study long-text QA-matching technology based on deep learning. We adjusted DSSM (Deep Structured Semantic Model) to make it suitable for the QA-matching task. Moreover, for better extraction of long-text features, we also improved DSSM by enriching the text representation layer with a bidirectional neural network and an attention mechanism. The experimental results show that BiGRU–Dattention–DSSM performs better at matching questions and answers.
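The DSSM-style matching step can be sketched as a cosine-similarity softmax over candidate answers. This is the standard DSSM posterior over already-encoded vectors, not the paper's full BiGRU–Dattention–DSSM pipeline; the names and the smoothing factor are placeholders:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity with a small epsilon for numerical safety
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def match_posterior(question_vec, answer_vecs, gamma=10.0):
    # cosine relevance of each candidate answer, turned into a
    # softmax posterior; gamma is the usual DSSM smoothing factor
    sims = gamma * np.array([cosine(question_vec, a) for a in answer_vecs])
    e = np.exp(sims - sims.max())
    return e / e.sum()
```

At training time the negative log-probability of the correct answer under this posterior serves as the loss; at inference time the highest-probability candidate is returned as the match.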

