PMI-based polarity computation for SVM-NN-based sentiment classification from user-generated reviews

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
P. Padmavathy ◽  
S. Pakkir Mohideen ◽  
Zameer Gulzar

Purpose The purpose of this paper is to initially perform Senti-WordNet (SWN)- and pointwise mutual information (PMI)-based polarity computation and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed. Design/methodology/approach Research on sentiment analysis (SA) has recently increased massively in domains such as social media (SM), healthcare, hotels, cars and product data. In addition, there is no approach for analyzing the positive or negative orientation of every single aspect in a document (a tweet, a review or a piece of news, among others). Several researchers have used SWN as a lexical resource for SA and polarity classification. Nevertheless, such lexicons perform worse at sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is used differently in domain lexicons and general knowledge lexicons. Across different domains, most words have one sentiment class in SWN, while their occurrences in the annotated data set lean strongly toward the other sentiment class. Hence, this paper concentrates on the drawbacks of adapting a domain-dependent sentiment lexicon (DDSL) from a collection of labeled user reviews and a domain-independent lexicon (DIL), and proposes a framework grounded in information theory that predicts the correct polarity of words (positive, neutral or negative). The proposed work initially performs SWN- and PMI-based polarity computation and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed. Finally, the predicted polarity is fed to the mtf-idf-based SVM-NN classifier for the SC of reviews. The outcomes are examined and compared with other existing techniques to verify that the proposed work predicts the class of the reviews more effectively for different data sets. Findings There is no approach for analyzing the positive or negative orientation of every single aspect in a document (a tweet, a review or a piece of news, among others). Several researchers have used SWN as a lexical resource for SA and polarity classification. Nevertheless, such lexicons perform worse at sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is used differently in domain lexicons and general knowledge lexicons. Across different domains, most words have one sentiment class in SWN, while their occurrences in the annotated data set lean strongly toward the other sentiment class. Originality/value The proposed work initially performs SWN- and PMI-based polarity computation, and based polarity updation. When the SWN polarity and polarity mismatched, the vote flipping algorithm (VFA) is employed.
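
The PMI-based polarity step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothing constant, the review data and the `prior` lexicon (a stand-in for SWN) are all hypothetical, and the vote flipping algorithm itself is not reproduced because the abstract does not specify it. The sketch only scores each word by PMI(word, pos) - PMI(word, neg), which reduces to log2(P(word | pos) / P(word | neg)), and lists the words whose sign disagrees with the prior lexicon.

```python
import math
from collections import Counter


def pmi_polarity(reviews, smoothing=1.0):
    """Score each word as PMI(word, pos) - PMI(word, neg).

    `reviews` is an iterable of (tokens, label) pairs with label in
    {"pos", "neg"}. The difference of the two PMI values simplifies to
    log2(P(word | pos) / P(word | neg)); additive smoothing (an assumption
    of this sketch) avoids division by zero for unseen pairs.
    """
    word_label = Counter()   # number of reviews of a label containing a word
    label_count = Counter()  # number of reviews per label
    vocab = set()
    for tokens, label in reviews:
        label_count[label] += 1
        for word in set(tokens):  # count each word once per review
            word_label[(word, label)] += 1
            vocab.add(word)

    n_pos = label_count["pos"] + 2 * smoothing
    n_neg = label_count["neg"] + 2 * smoothing
    scores = {}
    for word in vocab:
        p_w_pos = (word_label[(word, "pos")] + smoothing) / n_pos
        p_w_neg = (word_label[(word, "neg")] + smoothing) / n_neg
        scores[word] = math.log2(p_w_pos / p_w_neg)
    return scores


def mismatches(scores, prior):
    """Words whose PMI sign disagrees with a prior lexicon label."""
    return sorted(
        word
        for word, label in prior.items()
        if word in scores
        and scores[word] != 0
        and (scores[word] > 0) != (label == "pos")
    )


# Hypothetical labeled reviews and a hypothetical general-purpose lexicon.
reviews = [
    (["great", "battery"], "pos"),
    (["great", "screen"], "pos"),
    (["poor", "battery"], "neg"),
    (["poor", "screen", "cheap"], "neg"),
]
prior = {"great": "pos", "cheap": "pos"}

scores = pmi_polarity(reviews)
flagged = mismatches(scores, prior)  # candidates for a polarity update
```

In this toy data, "cheap" is positive in the prior lexicon but occurs only in a negative review, so it is the kind of word a domain-specific update step would reconsider.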

2019 ◽  
Vol 120 (3) ◽  
pp. 508-525
Author(s):  
Futao Zhao ◽  
Zhong Yao ◽  
Jing Luan ◽  
Hao Liu

Purpose The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets. Design/methodology/approach This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a novel method based on keyword extraction is used to build a high-quality seed lexicon and an ensemble mechanism is developed to integrate the knowledge derived from distinct language sources. Meanwhile, two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure is performed to validate the effectiveness of the method compared with four traditional lexicons. Findings The experimental results from the three real-world testing data sets show that the ensemble lexicons can significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and corresponding sentiment analysis tasks. Originality/value This work appears to be the first to construct financial sentiment lexicons from over 2m posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable to extract collective sentiment from multiple media sources and provide decision-making support for stock market participants.


2018 ◽  
Vol 6 ◽  
pp. 269-285 ◽  
Author(s):  
Andrius Mudinas ◽  
Dell Zhang ◽  
Mark Levene

There is often the need to perform sentiment classification in a particular domain where no labeled document is available. Although we could make use of a general-purpose off-the-shelf sentiment classifier or a pre-built one for a different domain, the effectiveness would be inferior. In this paper, we explore the possibility of building domain-specific sentiment classifiers with unlabeled documents only. Our investigation indicates that in the word embeddings learned from the unlabeled corpus of a given domain, the distributed word representations (vectors) for opposite sentiments form distinct clusters, though those clusters are not transferable across domains. Exploiting such a clustering structure, we are able to utilize machine learning algorithms to induce a quality domain-specific sentiment lexicon from just a few typical sentiment words (“seeds”). An important finding is that simple linear model based supervised learning algorithms (such as linear SVM) can actually work better than more sophisticated semi-supervised/transductive learning algorithms which represent the state-of-the-art technique for sentiment lexicon induction. The induced lexicon could be applied directly in a lexicon-based method for sentiment classification, but a higher performance could be achieved through a two-phase bootstrapping method which uses the induced lexicon to assign positive/negative sentiment scores to unlabeled documents first, and then uses those documents found to have clear sentiment signals as pseudo-labeled examples to train a document sentiment classifier via supervised learning algorithms (such as LSTM). On several benchmark datasets for document sentiment classification, our end-to-end pipelined approach which is overall unsupervised (except for a tiny set of seed words) outperforms existing unsupervised approaches and achieves an accuracy comparable to that of fully supervised approaches.


Author(s):  
Jalel Akaichi

In this work, we focus on the application of text mining and sentiment analysis techniques to analyze Tunisian users' status updates on Facebook. We aim to extract useful information about their sentiment and behavior, especially during the “Arab Spring” era. To achieve this, we describe a method for sentiment analysis that uses Support Vector Machine and Naïve Bayes algorithms and applies a combination of more than two features. The output of this work consists, on the one hand, of a sentiment lexicon built from the emoticon and acronym lexicons that we developed from the extracted status updates and, on the other hand, of detailed comparative experiments between the above algorithms, carried out by creating a training model for sentiment classification.


Author(s):  
S Joshika

Yelp connects people with local businesses in the USA and maintains a site for searching for and finding any business there. It helps users compare businesses based on the star ratings and reviews given by other users and identify the best company for their needs. The data set provided in the Yelp challenge contains tip, review, user, check-in and business details, known for short as the TURBO set, and participants have used it in various ways to find interesting patterns. This paper surveys work on pre-processing, sentiment analysis, sentiment classification techniques and the various classification algorithms proposed to perform better than existing algorithms. Most of the surveyed papers apply their algorithms to the Yelp data set; the others apply them to different data sets.


2019 ◽  
Vol 27 (1) ◽  
pp. 43-69 ◽  
Author(s):  
Syed Moudud-Ul-Huq

Purpose This paper aims to empirically investigate the impact of bank diversification on performance and risk-taking behavior. The analysis uses an unbalanced panel data set covering the period between 2007 and 2015 for a total of 1,397 banks from ASEAN-5 and BRICS economies. Design/methodology/approach The dynamic panel generalized method of moments (GMM) is used primarily to examine the effect of bank diversification on performance and risk-taking; the core results are then validated using two-stage least squares (2SLS). Findings Similar to the results of previous studies of developed economies, this study also confirms the portfolio diversification hypothesis. The key robust result is that the benefits from revenue and assets diversification are heterogeneous and the BRICS banks achieve higher benefit from using both diversification strategies. On the other hand, ASEAN-5 banks fail to show a significant advantage from assets diversification. Among the diverse sources of income, interest is not a major determinant of efficiency and bank stability, while ASEAN-5 banks should foster commission and other income as mechanisms for diversification benefit in the region. Originality/value A few studies in the current literature examine the impact of revenue and assets diversification on either bank performance or risk-taking in the context of developed economies. However, very few studies examine the relationship between bank diversification, performance and risk-taking together. Moreover, to the best of the author’s knowledge, there is a dearth of literature on this topic that builds on a comparative analysis between two regions, i.e. ASEAN-5 and BRICS. As a result, the empirical results of this research provide useful information to stakeholders so that they can enhance bank diversification strategies and implement them successfully by considering the other factors.


2018 ◽  
Vol 12 (1) ◽  
pp. 45-78 ◽  
Author(s):  
Jianjun Zhu ◽  
David K.C. Tse ◽  
Qiang Fei

Purpose To explain and empirically test how different marketing communication channels interact with each other and contribute to brands’ diverging marketplace performance. Design/methodology/approach With a unique data set combining key variables of major passenger car brands, the paper takes a source-based perspective to investigate how firm-based communications, expert opinions and online consumer reviews interact and affect brands’ marketplace performance. Then the paper studies the three special boundary conditions under which online consumer reviews’ influence varies in competition with the other two established information sources. Lastly, a study was done to demonstrate the financial significance of investing in different information sources. Findings The results show that online consumer reviews mitigate the effectiveness of the other two information sources in driving brand sales. This mitigation effect is also magnified when the brand is weak, firm-based communications are modest and expert opinions are less favorable. The findings further suggest that in the emerging communication enterprise, firm-based and expert-based communications remain the core while user-based communication plays an indispensable competing and complementary role. Practical implications In the new digital era, firms are facing the daunting task of understanding and integrating multiple communication channels. The study provides important implications for both researchers and practitioners with respect to brand management and integrated communications. Originality/value Existing studies have demonstrated that each of the three communication efforts (by firms, experts and consumers) exerts a significant influence on product sales, but few studies have been conducted in settings marked by the coexistence of these efforts. In addition, the three communication efforts are likely to have different effects on brands with different market positions. The current study contributes to the literature by filling the above gaps.


2015 ◽  
Vol 39 (3) ◽  
pp. 326-345 ◽  
Author(s):  
David Martín-Moncunill ◽  
Miguel-Ángel Sicilia-Urban ◽  
Elena García-Barriocanal ◽  
Salvador Sánchez-Alonso

Purpose – Large terminologies usually contain a mix of terms that are either generic or domain specific, which makes the use of the terminology itself a difficult task that may limit the positive effects of these systems. The purpose of this paper is to systematically evaluate the degree of domain specificity of the AGROVOC controlled vocabulary terms as a representative of a large terminology in the agricultural domain and discuss the generic/specific boundaries across its hierarchy. Design/methodology/approach – A user-oriented study with domain-experts in conjunction with quantitative and systematic analysis. First an in-depth analysis of AGROVOC was carried out to make a proper selection of terms for the experiment. Then domain-experts were asked to classify the terms according to their domain specificity. An evaluation was conducted to analyse the domain-experts’ results. Finally, the resulting data set was automatically compared with the terms in SUMO, an upper ontology, and MILO, a mid-level ontology, to analyse the coincidences. Findings – Results show the existence of a high number of generic terms. The motivation for several of the unclear cases is also discussed. The automatic evaluation showed that there is no direct way to assess the specificity degree of a term by using the SUMO and MILO ontologies; however, it provided additional validation of the results gathered from the domain-experts. Research limitations/implications – The “domain-analysis” concept has long been discussed and it could be addressed from different perspectives. A summary of these perspectives and an explanation of the approach followed in this experiment are included in the background section. Originality/value – The authors propose an approach to identify the domain specificity of terms in large domain-specific terminologies and a criterion to measure the overall domain specificity of a knowledge organisation system, based on domain-experts analysis. The authors also provide a first insight about using automated measures to determine the degree to which a given term can be considered domain specific. The resulting data set from the domain-experts’ evaluation can be reused as a gold standard for further research about these automatic measures.


Author(s):  
Huan Zhao ◽  
Xixiang Zhang ◽  
Keqin Li

Sentiment analysis is becoming increasingly important mainly because of the growth of web comments. Sentiment polarity classification is a popular process in this field. Writing style features, such as lexical and word-based features, are often used in the authorship identification and gender classification of online messages. However, writing style features were only used in feature selection for sentiment classification. This research presents an exploratory study of the group characteristics of writing style features on the Internet Movie Database (IMDb) movie sentiment data set. Furthermore, this study utilizes the specific group characteristics of writing style in improving the performance of sentiment classification. We determine the optimum clustering number of user reviews based on writing style features distribution. According to the classification model trained on a training subset with specific writing style clustering tags, we determine that the model trained on the data set of a specific writing style group has an optimal effect on the classification accuracy, which is better than the model trained on the entire data set in a particular positive or negative polarity. Through the polarity characteristics of specific writing style groups, we propose a general model in improving the performance of the existing classification approach. Results of the experiments on sentiment classification using the IMDb data set demonstrate that the proposed model improves the performance in terms of classification accuracy.


2018 ◽  
Vol 3 (2) ◽  
pp. 236-254
Author(s):  
Rohit Bansal ◽  
Arun Singh ◽  
Sushil Kumar ◽  
Rajni Gupta

Purpose The purpose of this paper is to quantify several measures to examine the determinants of profitability for the listed Indian banks. The authors include both public sector (PSU) and private sector banks in the study. The sample comprises all the banks that are registered on the Bombay Stock Exchange (BSE). This paper also intends to identify the association of the net profit margin (PM) and return on assets (ROA) with several other independent variables of the Indian banking sector, including private banks and public banks, over the five financial years from April 1, 2012 to March 31, 2017. Therefore, a sample of 39 listed banking companies and a total of 195 balanced observations are selected for the analysis. Design/methodology/approach The authors have used profitability as the dependent variable, represented by net PM and ROA, and several financial ratios as independent variables. Financial statements and income statements of all listed banks were obtained from the BSE and the respective companies’ websites. Panel data regression was carried out with both fixed effects and random effects. The authors also verified both panel techniques with Hausman’s specification test, which is a widely used procedure for selecting a panel effect. The authors applied the PP–Fisher χ2, PP–Choi Z-statistic and Hadri tests to check whether the data set is free from the unit root problem, i.e. is a stationary series. Findings Results imply that interest expended to interest earned (IEIE) and credit deposit ratio (CRDR) reduced the profitability of private banks in India. IEIE, CRDR and quick ratio (QR) reduced the profitability of public banks in India, while cash deposit ratio (CDR) and Advances to Loan Funds (ALF) increased the effectiveness of public banks. For all banks taken together, IEIE and CRDR reduced profitability, whereas CDR, ALF and Total Debt to Owners Fund (TDOF) increased it. With ROA as the dependent variable, CRDR and TDOF reduced the return of private banks in India, while CDR, ALF and QR enhanced the profitability of private banks. Originality/value No variables were found to be significant for public banks with ROA as the dependent variable. In the overall banking data, CRDR reduced profitability, whereas capital adequacy ratio and ALF increased the profitability of all banks in India. The findings of this study will support policymakers, financial executives and investors in making investment decisions.


Author(s):  
Thomas Bos ◽  
Flavius Frasincar

Financial investors make trades based on available information. Previous research has proved that microblogs are a useful source for supporting stock market decisions. However, the financial domain lacks specific sentiment lexicons that could be utilized to extract the sentiment from these microblogs. In this research, we investigate automatic approaches that can be used to build financial sentiment lexicons. We introduce weighted versions of the Pointwise Mutual Information approaches to build sentiment lexicons automatically. Furthermore, existing sentiment lexicons often neglect negation while building the sentiment lexicons. In this research, we also propose two methods (Negated Word and Flip Sentiment) to extend the sentiment building approaches to take into account negation when constructing a sentiment lexicon. We build the financial sentiment lexicons by leveraging 200,000 messages from StockTwits. We evaluate the constructed financial sentiment lexicons in two different sentiment classification tasks (unsupervised and supervised). In addition, the created financial sentiment lexicons are compared with each other and with other existing sentiment lexicons. The best performing financial sentiment lexicon is built by combining our Weighted Normalized Pointwise Mutual Information approach with the Negated Word approach. It outperforms all the other sentiment lexicons in the two sentiment classification tasks. In the unsupervised sentiment classification task, it has, on average, a balanced accuracy of 69.4%, and in the supervised setting, a balanced accuracy of 75.1%. Moreover, the various sentiment classification tasks confirm that the sentiment lexicons could be improved by taking into account negation while building the sentiment lexicons. The improvement could be made by using one of the proposed methods to incorporate negation in the sentiment lexicon construction process.
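
As a rough illustration of the ideas named in this abstract, the sketch below builds a lexicon with plain (unweighted) normalized PMI and a simple negation-marking step. It is an assumption-laden reading, not the authors' method: the abstract does not give the weighting scheme or the exact definition of the Negated Word approach, so the `NOT_` prefix convention, the `NEGATORS` list, the "pos"/"neg" labels and the toy messages are all hypothetical.

```python
import math
from collections import Counter

NEGATORS = {"not", "no", "never"}  # assumed, illustrative negator list


def mark_negation(tokens):
    """Drop each negator and prefix the token right after it with 'NOT_'."""
    out, negate = [], False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        out.append("NOT_" + token if negate else token)
        negate = False
    return out


def npmi(n_wc, n_w, n_c, n):
    """Normalized PMI in [-1, 1] from raw counts.

    Returns -1 when the word and class never co-occur and 1 when they
    always co-occur (the two limiting cases of the formula).
    """
    if n_wc == 0:
        return -1.0
    p_wc, p_w, p_c = n_wc / n, n_w / n, n_c / n
    if p_wc == 1.0:
        return 1.0
    return math.log(p_wc / (p_w * p_c)) / -math.log(p_wc)


def build_lexicon(messages):
    """Map each (possibly negated) word to NPMI(word, pos) - NPMI(word, neg)."""
    word_label, word_total, label_total = Counter(), Counter(), Counter()
    for tokens, label in messages:
        label_total[label] += 1
        for word in set(mark_negation(tokens)):  # once per message
            word_label[(word, label)] += 1
            word_total[word] += 1
    n = sum(label_total.values())
    return {
        word: npmi(word_label[(word, "pos")], word_total[word], label_total["pos"], n)
        - npmi(word_label[(word, "neg")], word_total[word], label_total["neg"], n)
        for word in word_total
    }


# Hypothetical labeled messages.
messages = [
    (["buy", "now"], "pos"),
    (["not", "buy"], "neg"),
    (["sell", "now"], "neg"),
    (["not", "sell"], "pos"),
]
lexicon = build_lexicon(messages)
```

Treating "NOT_buy" as its own lexicon entry lets the negated form receive a score independent of "buy", which is one plausible way to account for negation during lexicon construction.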

