A Sentiment Classification Model Using Group Characteristics of Writing Style Features

Author(s):  
Huan Zhao ◽  
Xixiang Zhang ◽  
Keqin Li

Sentiment analysis is becoming increasingly important, mainly because of the growth of web comments. Sentiment polarity classification is a popular task in this field. Writing style features, such as lexical and word-based features, are often used in the authorship identification and gender classification of online messages. In sentiment classification, however, writing style features have been used only for feature selection. This research presents an exploratory study of the group characteristics of writing style features on the Internet Movie Database (IMDb) movie sentiment data set. Furthermore, this study uses the group characteristics of specific writing styles to improve the performance of sentiment classification. We determine the optimum number of clusters of user reviews based on the distribution of writing style features. Using classification models trained on training subsets tagged with specific writing style clusters, we find that a model trained on the data of a specific writing style group achieves better classification accuracy for a particular positive or negative polarity than a model trained on the entire data set. Based on the polarity characteristics of specific writing style groups, we propose a general model for improving the performance of the existing classification approach. Experiments on sentiment classification using the IMDb data set demonstrate that the proposed model improves classification accuracy.

2018 ◽  
Vol 13 (3) ◽  
pp. 408-428 ◽  
Author(s):  
Phu Vo Ngoc

We have surveyed many significant approaches over many years because sentiment classification makes many crucial contributions that can be applied in everyday life, such as in political activities, commodity production, and commercial activities. We propose a novel model using Latent Semantic Analysis (LSA) and a Dennis Coefficient (DNC) for big data sentiment classification in English. Many LSA vectors (LSAVs) have successfully been reformed by using the DNC. We use the DNC and the LSAVs to classify the 11,000,000 documents of our testing data set based on the 5,000,000 documents of our training data set in English. This novel model uses many sentiment lexicons of our basis English sentiment dictionary (bESD). We have tested the proposed model in both a sequential environment and a distributed network system. The results of the sequential system are not as good as those of the parallel environment. We have achieved 88.76% accuracy on the testing data set, which is better than the accuracies of many previous semantic analysis models. We have also compared the novel model with previous models, and the experimental results of our proposed model are better than those of the previous models. The results of the novel model can be used widely across many fields, in commercial applications and in surveys of sentiment classification.


2020 ◽  
Vol 39 (4) ◽  
pp. 4935-4945
Author(s):  
Qiuyun Cheng ◽  
Yun Ke ◽  
Ahmed Abdelmouty

Aiming at the limitation of using only word features in traditional deep learning sentiment classification, this paper combines topic features with deep learning models to build a topic-fused deep learning sentiment classification model. The model can fuse topic features to obtain high-quality high-level text features. Experiments show that in binary sentiment classification, the highest classification accuracy of the model can reach more than 90%, which is higher than that of commonly used deep learning models. This paper focuses on the combination of deep neural networks and emerging text processing technologies, and improves and perfects them from two aspects of model architecture and training methods, and designs an efficient deep network sentiment analysis model. A CNN (Convolutional Neural Network) model based on polymorphism is proposed. The model constructs the CNN input matrix by combining the word vector information of the text, the emotion information of the words, and the position information of the words, and adjusts the importance of different feature information in the training process by means of weight control. The multi-objective sample data set is used to verify the effectiveness of the proposed model in the sentiment analysis task of related objects from the classification effect and training performance.


2020 ◽  
Vol 63 (11) ◽  
pp. 1775-1787
Author(s):  
Yong Fang ◽  
Yue Yang ◽  
Cheng Huang

Emails are often used for cybercrime today, so it is important to verify the identity of an email's author. This paper proposes a general model for solving the problem of anonymous email author attribution, which can be used in both email authorship identification and email authorship verification. The first task is to find the author of an anonymous email among many suspected targets. The second is to verify whether an email was written by its sender. This paper extracts features from the email header and email body and analyzes the writing style and other behaviors of email authors. The behaviors of email authors are extracted from email headers through a statistical algorithm. The author's writing style in the email body is extracted by a sequence-to-sequence bidirectional long short-term memory (BiLSTM) algorithm. The model combines multiple factors to solve the problem of anonymous email author attribution. The experiments show that the accuracy and other indicators of the proposed model are better than those of other methods. In the email authorship verification experiment, our average accuracy, average recall and average F1-score reached 89.9%. In the email authorship identification experiment, our model's accuracy is 98.9% for 10 authors, 92.9% for 25 authors and 89.5% for 50 authors.


2019 ◽  
Vol 16 (1) ◽  
pp. 93-113 ◽  
Author(s):  
Shengye Pang ◽  
Guobing Zou ◽  
Yanglan Gan ◽  
Sen Niu ◽  
Bofeng Zhang

Web service classification has become an urgent demand in service-oriented applications. Most existing classification algorithms rely mainly on the original service descriptions. This leads to low classification accuracy, since the descriptions cannot fully reflect the semantic features specific to a service category. To solve this issue, this article proposes a novel approach for web service classification that includes service topic feature extraction, service functionality augmentation, and service classification model learning. Its distinguishing characteristic is that the original service descriptions are semantically augmented and then used to derive a service classifier via a labeled probabilistic topic model. A benefit of this approach is that it can be applied to an online service management platform, where it assists service providers in the registration process. Extensive experiments have been conducted on a large-scale real-world data set crawled from ProgrammableWeb. The results demonstrate that the approach outperforms state-of-the-art methods in terms of service classification accuracy and convergence speed.


Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1336
Author(s):  
Gihyeon Choi ◽  
Shinhyeok Oh ◽  
Harksoo Kim

Previous researchers have considered sentiment analysis as a document classification task, in which input documents are classified into predefined sentiment classes. Although some sentences in a document provide important evidence for sentiment analysis and others do not, they have treated the document as a bag of sentences. In other words, they have not considered the importance of each sentence in the document. To effectively determine the polarity of a document, each sentence in the document should be treated with a different degree of importance. To address this problem, we propose a document-level sentence classification model based on deep neural networks, in which the importance degrees of sentences in documents are automatically determined through gate mechanisms. To verify our new sentiment analysis model, we conducted experiments using sentiment datasets in four different domains: movie reviews, hotel reviews, restaurant reviews, and music reviews. In the experiments, the proposed model outperformed previous state-of-the-art models that do not consider differences in the importance of sentences in a document. The experimental results show that the importance of sentences should be considered in a document-level sentiment classification task.
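
The abstract does not specify the network architecture, so the following is only a minimal sketch of the general idea of gating sentences by importance. It assumes each sentence has already been encoded as a fixed-length vector and that a single linear gate followed by a sigmoid scores each sentence; the names `gated_document_vector`, `gate_weights`, and `gate_bias` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_document_vector(sentence_vectors, gate_weights, gate_bias=0.0):
    """Combine sentence vectors into one document vector.

    Each sentence receives a gate value in (0, 1) computed from a linear
    score of its own vector (an illustrative assumption). The document
    vector is the gate-weighted average of the sentence vectors, so
    sentences with larger gates contribute more.
    """
    gates = [
        sigmoid(sum(w * x for w, x in zip(gate_weights, vec)) + gate_bias)
        for vec in sentence_vectors
    ]
    total = sum(gates)
    dim = len(sentence_vectors[0])
    document = [
        sum(g * vec[d] for g, vec in zip(gates, sentence_vectors)) / total
        for d in range(dim)
    ]
    return document, gates


# Two toy sentence vectors; the gate favors the first dimension.
doc_vec, gate_values = gated_document_vector(
    [[1.0, 0.0], [0.0, 1.0]], gate_weights=[5.0, -5.0]
)
```

In a trained model the gate parameters would be learned jointly with the classifier rather than set by hand as in this toy call.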


Author(s):  
Philippe Schwaller ◽  
Daniel Probst ◽  
Alain C. Vaucher ◽  
Vishnu H Nair ◽  
Teodoro Laino ◽  
...  

Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. The classification process is a tedious task, requiring first an accurate mapping of the reaction (atom mapping) followed by the identification of the corresponding reaction class template. In this work, we present two transformer-based models that infer reaction classes from the SMILES representation of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We study the incorrect predictions of the models and show that they reveal different biases and mistakes in the underlying data set. Using the embeddings of our classification model, we introduce reaction fingerprints that do not require knowing the reaction center or distinguishing between reactants and reagents. This conversion from chemical reactions to feature vectors enables efficient clustering and similarity search in the reaction space. We compare the reaction clustering for combinations of self-supervised, supervised, and molecular shingle-based reaction representations.


2021 ◽  
Vol 23 (06) ◽  
pp. 10-22
Author(s):  
Ms. Anshika Shukla ◽  
Mr. Sanjeev Kumar Shukla

In recent years, various methods for source code classification using deep learning approaches have been proposed. The classification accuracy of a deep learning method is greatly influenced by the training data set. Therefore, it is possible to create a model with higher accuracy by improving how the training data set is constructed. In this study, we propose a dynamic training data set improvement method for source code classification using deep learning. In the proposed method, we first train and verify the source code classification model using the training data set. Next, we reconstruct the training data set based on the verification result. We create a high-precision model by repeating this training and reconstruction, improving the training data set each time. In the evaluation experiment, the source code classification model was trained using the proposed method, and its classification accuracy was compared with that of three baseline methods. The model trained using the proposed method was found to have the highest classification accuracy. We also confirmed that the proposed method improves the classification accuracy of the model from 0.64 to 0.96.


2020 ◽  
Vol 309 ◽  
pp. 03015
Author(s):  
Wenbin Liu ◽  
Bojian Wen ◽  
Shang Gao ◽  
Jiesheng Zheng ◽  
Yinlong Zheng

Text classification is a common application in natural language processing. We propose a multi-label text classification model based on ELMo and an attention mechanism, which helps solve two problems for the sentiment classification task: power supply related text follows no grammar or writing convention, and the sentiment-related information is dispersed throughout the text. First, we use pre-trained word embedding vectors to extract the features of text from the Internet. Second, the analyzed deep information features are weighted according to the attention mechanism. Finally, an improved ELMo model, in which we replace the LSTM module with a GRU module, is used to characterize the text, and the information is classified. The experimental results on Kaggle's toxic comment classification data set show that the accuracy of sentiment classification is as high as 98%.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
P. Padmavathy ◽  
S. Pakkir Mohideen ◽  
Zameer Gulzar

Purpose: The purpose of this paper is to initially perform Senti-WordNet (SWN)- and point wise mutual information (PMI)-based polarity computation and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed. Design/methodology/approach: Recently, in domains like social media (SM), healthcare, hotel, car, product data, etc., research on sentiment analysis (SA) has massively increased. In addition, there is no approach for analyzing the positive or negative orientations of every single aspect in a document (a tweet, a review, as well as a piece of news, among others). For SA as well as polarity classification, several researchers have used SWN as a lexical resource. Nevertheless, these lexicons show lower-level performance for sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is utilized differently between domain and general knowledge lexicons. While concerning different domains, most words have one sentiment class in SWN, and in the annotated data set, their occurrence signifies a strong inclination with the other sentiment class. Hence, this paper chiefly concentrates on the drawbacks of adapting domain-dependent sentiment lexicon (DDSL) from a collection of labeled user reviews and domain-independent lexicon (DIL) for proposing a framework centered on the information theory that could predict the correct polarity of the words (positive, neutral and negative). The proposed work initially performs SWN- and PMI-based polarity computation and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed. Finally, the predicted polarity is inputted to the mtf-idf-based SVM-NN classifier for the SC of reviews. The outcomes are examined and contrasted to the other existing techniques to verify that the proposed work has predicted the class of the reviews more effectually for different datasets. Findings: There is no approach for analyzing the positive or negative orientations of every single aspect in a document (a tweet, a review, as well as a piece of news, among others). For SA as well as polarity classification, several researchers have used SWN as a lexical resource. Nevertheless, these lexicons show lower-level performance for sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is utilized differently between domain and general knowledge lexicons. While concerning different domains, most words have one sentiment class in SWN, and in the annotated data set their occurrence signifies a strong inclination with the other sentiment class. Originality/value: The proposed work initially performs SWN- and PMI-based polarity computation, and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed.
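
The abstract names PMI-based polarity computation without giving its formula, so the following is a minimal sketch of one common formulation, not the authors' procedure. It assumes a word's polarity is PMI(word, positive) minus PMI(word, negative), which reduces to log2 of P(word | positive) / P(word | negative), estimated from document counts with additive smoothing. The function name `pmi_polarity`, the `smoothing` parameter, and the toy reviews are hypothetical; the vote flipping algorithm and the SWN lookup are not shown because the abstract does not describe them.

```python
import math
from collections import Counter


def pmi_polarity(reviews, smoothing=1.0):
    """Score each word by PMI(word, pos) - PMI(word, neg).

    `reviews` is an iterable of (tokens, label) pairs with label "pos" or
    "neg". The PMI difference equals log2(P(word|pos) / P(word|neg)); both
    conditional probabilities are estimated from document counts with
    additive smoothing so unseen combinations do not produce log(0).
    """
    docs_per_label = Counter()
    docs_with_word = Counter()
    vocabulary = set()
    for tokens, label in reviews:
        docs_per_label[label] += 1
        for word in set(tokens):  # count each word once per review
            docs_with_word[(word, label)] += 1
            vocabulary.add(word)

    scores = {}
    for word in vocabulary:
        p_word_given_pos = (docs_with_word[(word, "pos")] + smoothing) / (
            docs_per_label["pos"] + 2 * smoothing
        )
        p_word_given_neg = (docs_with_word[(word, "neg")] + smoothing) / (
            docs_per_label["neg"] + 2 * smoothing
        )
        scores[word] = math.log2(p_word_given_pos / p_word_given_neg)
    return scores


toy_reviews = [
    (["great", "movie"], "pos"),
    (["great", "fun"], "pos"),
    (["bad", "movie"], "neg"),
    (["bad", "boring"], "neg"),
]
polarity = pmi_polarity(toy_reviews)
```

A positive score suggests the word leans positive in the labeled reviews and a negative score suggests the opposite; a domain-derived score of this kind is what could then be compared against a general-purpose lexicon entry.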


Author(s):  
Yadala Sucharitha ◽  
Y. Vijayalata ◽  
V. Kamakshi Prasad

Introduction: At present, social media networks play a significant role in sharing information between individuals. This includes information about news and events currently occurring in the real world. Anticipating election results through social media is becoming a fascinating research topic. In this article, we propose a strategy to anticipate election results by combining sub-event discovery and sentiment analysis in micro-blogs to analyze and visualize the political inclinations revealed by social media users. Methodology: This approach discovers and investigates sentiment data from micro-blogs to anticipate the popularity of contestants. In general, many organizations and media houses conduct pre-poll reviews and gather experts' perspectives to anticipate the result of an election; our model instead gathers Twitter data and analyzes the sentiment of tweets about the contestants to predict the result. Results: The number of seats won by the first, second, and third parties in the AP Assembly Election 2019 has been determined using the PSSs of these parties by means of equations (2), (3), and (4), respectively. Table 2 shows the actual results of the election alongside our model's predictions, and the predictions are very close to the actual results. We used an SVM with 15-fold cross-validation for sentiment polarity classification on our training set, which gives a precision of 94.2%. There are 7500 tuples in our training data set, with 3750 positive tweets and 3750 negative tweets. Conclusions: Our outcomes indicate that the proposed model can forecast the election results with an accuracy of 94.2% over the given baselines. The experimental outcomes are very close to the actual election results and were compared with the conventional strategies used by various survey agencies for exit polls; validation of the results demonstrated that social media data can predict outcomes with better accuracy. Discussion: In the future, we would like to extend this work to other regions and countries where Twitter is gaining popularity as a political campaigning tool and where politicians and individuals are turning to micro-blogs for political communication and information. We would likewise extend this research to fields other than general elections, and from politicians to state organizations.
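
To make the 15-fold cross-validation setup concrete, the following is a minimal sketch of how 7500 balanced tuples could be partitioned into 15 stratified folds. It is an illustration under stated assumptions, not the authors' code: the abstract does not say whether the folds were stratified or shuffled, and the function name `stratified_folds` is hypothetical. The SVM training itself is omitted.

```python
def stratified_folds(labels, k):
    """Assign example indices to k folds, keeping class balance per fold.

    Indices of each class are dealt round-robin across the folds, so every
    fold receives (as nearly as possible) the same number of examples from
    each class. No shuffling is done here; in practice the data would
    usually be shuffled first.
    """
    folds = [[] for _ in range(k)]
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)
    for indices in indices_by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Label layout matching the counts in the abstract: 3750 positive, 3750 negative.
labels = ["pos"] * 3750 + ["neg"] * 3750
folds = stratified_folds(labels, 15)
fold_sizes = [len(fold) for fold in folds]
```

With these counts each fold holds 500 tweets, 250 per class; in each of the 15 rounds one fold would serve as the held-out set and the remaining 7000 tweets as training data.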

