Building Large Scale Cloud System for Product Sentiment Analysis using Hybrid Group Search Optimization Based Feature Selection

Cloud computing is a powerful technology for performing complex computation at massive scale, and the volume of data generated through it, often big data, has grown enormously. Sentiment analysis, on the other hand, extracts users' opinions from review documents. Sentiment classification with Machine Learning (ML) methods can suffer from the high dimensionality of the feature vector, so feature selection is needed to eliminate noisy and irrelevant features and let the ML algorithms work efficiently. Because feature selection is a Non-deterministic Polynomial (NP) hard problem, the features chosen are generally sub-optimal. A feature selection method based on the Group Search Optimization (GSO) algorithm searches for optimal feature subsets by eliminating redundant features. In this work, GSO-based feature selection is applied to sentiment classification, and a hybrid feature selection method combining GSO with Local Beam Search (LBS) is also proposed. The proposed methods were evaluated on the Amazon product review dataset. The experimental results showed that the hybrid feature selection method outperforms the other feature selection methods for sentiment classification.
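The abstract does not say how GSO and LBS are combined, so the following is only a minimal Python sketch of the local beam search half, under stated assumptions. The starting masks stand in for subsets that a GSO stage might hand over, and `RELEVANCE`, `PENALTY`, and `toy_fitness` are hypothetical stand-ins for a classifier-based fitness measured on the review data.

```python
# Hypothetical per-feature relevance scores and a per-feature cost (assumptions,
# not values from the paper). A real fitness would be classifier accuracy.
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.6]
PENALTY = 0.3


def toy_fitness(mask):
    """Reward relevant features and penalise subset size."""
    gain = sum(r for r, bit in zip(RELEVANCE, mask) if bit)
    return gain - PENALTY * sum(mask)


def local_beam_search(start_masks, fitness, beam_width=3, max_iters=20):
    """Keep the beam_width best masks; expand each by flipping one bit."""
    def rank(masks):
        # Sort by fitness, breaking ties on the mask itself for determinism.
        return sorted(set(masks), key=lambda m: (fitness(m), m), reverse=True)

    beam = rank(start_masks)[:beam_width]
    for _ in range(max_iters):
        candidates = list(beam)
        for mask in beam:
            for i in range(len(mask)):
                candidates.append(mask[:i] + (1 - mask[i],) + mask[i + 1:])
        new_beam = rank(candidates)[:beam_width]
        if fitness(new_beam[0]) <= fitness(beam[0]):
            break  # no neighbour improves on the current best mask
        beam = new_beam
    return beam[0]


# Seeds that a population-based stage such as GSO might supply (hand-picked here).
seeds = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
best = local_beam_search(seeds, toy_fitness)
print(best, round(toy_fitness(best), 2))
```

Each mask is a tuple of 0/1 flags, one per feature. Keeping several candidates in the beam, rather than a single one as in hill climbing, is what lets the search refine more than one seed subset at a time.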

Author(s):  
Midde Venkateswarlu Naik ◽  
D. Vasumathi ◽  
A.P. Siva Kumar

Aims: This research proposes an evolutionary-enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role in helping people today reach sound decisions about subjects such as movie ratings, educational institutes, or politics. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification through a Support Vector Machine (SVM). Its performance is evaluated with statistical measures such as precision, recall, sensitivity, and specificity, and compared with existing approaches. Earlier authors have reported sentiment classifier accuracy on English text of up to 94%. In the proposed scheme, the average accuracy of the sentiment classifier across different datasets reached 99%, obtained by tuning SVM parameters such as the constant C and the kernel gamma value in association with PSO. The method used three publicly available datasets: airline sentiment, weather, and global warming. The models were trained and tested using 10-fold cross-validation (FCV), with a confusion matrix used to estimate sentiment classifier accuracy.
Background: Sentiment Analysis (SA), or opinion mining, has attracted considerable scientific interest as a research domain. A key area is sentiment classification of semi-structured or unstructured data in different languages, which has become a major research topic. User-Generated Content (UGC) from different sources has increased significantly with the rapid growth of the web. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, and trends, or for extracting sentiment about a specific entity. SA is a computational analysis that determines the actual opinion about an entity expressed as text; it is also described as the computation of emotional polarity expressed over social media as natural text in various languages. Usually, the best-performing automatic sentiment classifier model depends on its feature selection and classification algorithms.
Methods: The proposed work uses a support vector machine for classification and particle swarm optimization for feature selection. Various combinations of parameters are tuned, with and without a kernel, to obtain the desired results for sentiment classification on three datasets (airline, global warming, and weather sentiment) that are freely hosted for research.
Results: The proposed method achieved an average accuracy of 99.2% in classifying sentiment on the different datasets, outperforming other machine learning techniques. This high accuracy in classifying sentiment or opinion in review text indicates superior effectiveness over existing sentiment classifiers.
Conclusion: Sentiment classifier accuracy was increased with the help of a kernel-based Support Vector Machine (SVM) with parameter optimization. The optimal feature selection for classifying sentiment or opinion in review documents was determined with a particle swarm optimization approach.
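The evaluation protocol named in this abstract (10-fold cross-validation and confusion-matrix measures) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the confusion-matrix counts are hypothetical, the folds are contiguous rather than shuffled or stratified, and nothing here reproduces the authors' SVM or PSO setup.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def binary_metrics(tp, fp, fn, tn):
    """Compute the usual measures from a 2x2 confusion matrix."""
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # also called sensitivity
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts for one fold (made up for illustration).
folds = list(k_fold_indices(25, k=10))
metrics = binary_metrics(tp=45, fp=5, fn=3, tn=47)
print(len(folds), metrics)
```

In a full experiment the classifier would be retrained on each `train` split and the per-fold confusion matrices summed or averaged before the measures are reported.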


2018 ◽  
Vol 2018 ◽  
pp. 1-5 ◽  
Author(s):  
Asriyanti Indah Pratiwi ◽  
Adiwijaya

Sentiment analysis of movie reviews has become a need of today's lifestyle. Unfortunately, an enormous number of features makes sentiment analysis slow and less sensitive. Finding the optimum feature selection and classification is still a challenge. To handle an enormous number of features and provide better sentiment classification, an information-based feature selection method and a classification scheme are proposed. The proposed method removes more than 90% of unnecessary features, while the proposed classification scheme achieves 96% accuracy in sentiment classification. From the experimental results, it can be concluded that the combination of the proposed feature selection and classification achieves the best performance so far.


Twitter sentiment analysis is a vital tool for determining public opinion about products, services, events, or personalities. Analyzing medical tweets on a specific topic can provide immense benefits to the medical industry. However, medical tweets require an efficient feature selection approach to produce accurate results. The Penguin Search Optimization Algorithm (PeSOA) is able to tackle NP-hard problems. This paper aims to develop an automated opinion mining framework by modeling feature selection as an NP-hard optimization problem and solving it with a PeSOA-based feature selection approach. Initially, medical tweets matching cancer and drug keywords are extracted and pre-processed to filter the relevant, informative tweets. Features are then extracted using Natural Language Processing (NLP) concepts, and the optimal features are selected using PeSOA; the results are fed as input to three baseline classifiers to achieve accurate sentiment classification. The experimental results, obtained through MATLAB simulations on cancer and drug tweets using k-Nearest Neighbor (KNN), Naïve Bayes (NB), and Support Vector Machine (SVM) classifiers, indicate that the proposed PeSOA feature selection significantly improves the classification performance of tweet opinion mining. PeSOA feature selection combined with the SVM classifier provides better sentiment classification than the other classifiers.


2018 ◽  
Vol 15 (2) ◽  
pp. 437-445 ◽  
Author(s):  
S. Radha ◽  
C. Nelson Kennedy Babu

Cloud computing is an emerging technology for processing large data sets capably. Because data is growing quickly, large-scale data processing is becoming a central concern of information systems, and customers can now judge the quality of brands and products using the information provided by new digital marketing channels in social media. Every enterprise therefore needs to find and analyze large amounts of digital data in order to build its reputation among customers. In this paper, SLA (Service Level Agreement) based BDAAs (Big Data Analytic Applications) using adaptive resource scheduling, together with cloud-based big data sentiment analysis, are proposed to provide deep web mining and QoS and to analyze customer behavior towards a product. In this process, a spatio-temporal compression technique can be applied to reduce the volume of big data. The data is classified as positive, negative, or neutral by an SVM with a lexicon dictionary, based on customers' behavior towards brands or products. In a cloud computing environment, reducing resource cost is complicated by the fluctuating resource requirements of BDAAs. As a result, a common Analytics as a Service (AaaS) platform is needed that provides BDAAs to customers in different fields as services that are simple to use and lower in cost. SLA-based BDAAs are therefore developed to use adaptive resource scheduling that depends on customer behavior, and they can provide visualization and data integrity. The method can protect the privacy of the cloud owner's information with the help of data integrity and authentication processes. Experimental results show that the proposed sentiment analysis method for online products using cloud-based big data classifies customer opinions accurately and that the algorithm is effective in guaranteeing the SLA.


Author(s):  
Manitosh Chourasiya ◽  
Prof. Devendra Singh Rathod

Sentiment analysis is the detection of emotions from text features and is regarded as one of the most important parts of opinion extraction. Through this process, we can determine whether a text is positive, negative, or neutral. In this research, sentiment analysis is performed on textual data. A text sentiment analyzer combines natural language processing (NLP) and machine learning techniques to assign weighted assessment scores to entities, subjects, and categories within a sentence or phrase. To express mood, the polarity of text reviews can be graded on a negative-to-positive scale using a learning algorithm. The current decade has seen significant developments in artificial intelligence, and the machine learning revolution has changed the entire AI industry. Machine learning techniques have become an integral part of models throughout today's computing world. Ensemble learning techniques, in particular, promise a high level of automation through the extraction of generalized rules for text and sentiment classification tasks. This thesis aims to design and implement an optimized functionality matrix using ensemble learning for sentiment classification and its applications.


2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Monalisa Ghosh ◽  
Goutam Sanyal

Sentiment classification, or sentiment analysis, has been acknowledged as an open research domain. In recent years, an enormous amount of research has been carried out in this field using a wide range of methodologies. Feature generation and selection are consequential for text mining, as a high-dimensional feature set can affect the performance of sentiment analysis. This paper investigates the shortcomings of the widely used feature selection methods (IG, Chi-square, and Gini Index) with unigram and bigram feature sets on four machine learning classification algorithms (MNB, SVM, KNN, and ME). The proposed methods are evaluated on three standard datasets, namely the IMDb movie review dataset and the electronics and kitchen product review datasets. Initially, unigram and bigram features are extracted by applying the n-gram method. In addition, we generate a composite feature vector CompUniBi (unigram + bigram), which is passed to the feature selection methods Information Gain (IG), Gini Index (GI), and Chi-square (CHI) to obtain an optimal feature subset by assigning a score to each feature. These methods rank the features by score, so a prominent feature vector (CompIG, CompGI, and CompCHI) can easily be generated for classification. Finally, the machine learning classifiers SVM, MNB, KNN, and ME use the prominent feature vector to classify each review document as either positive or negative. The performance of the algorithm is measured with precision, recall, and F-measure. Experimental results show that the composite feature vector achieves better performance than the unigram features alone, which is encouraging as well as comparable to related research. The best results, in terms of highest accuracy, were obtained from the combination of Information Gain with SVM.
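The composite unigram + bigram idea and the Information Gain ranking can be illustrated with a small Python sketch. It assumes binary presence/absence features, whitespace tokenisation, and a made-up four-review corpus; the function names are hypothetical and none of this is taken from the paper's implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def composite_features(text):
    """Unigram + bigram feature set for one document (presence only)."""
    tokens = text.lower().split()
    return set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, feature):
    """IG of a binary feature: H(class) - H(class | feature present/absent)."""
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - conditional


def rank_features(docs, labels):
    """Score every feature and sort by IG, ties broken alphabetically."""
    vocab = set().union(*docs)
    scored = [(f, information_gain(docs, labels, f)) for f in vocab]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy corpus (made up for illustration).
reviews = ["great phone", "great battery", "not great phone", "not great battery"]
labels = ["pos", "pos", "neg", "neg"]
docs = [composite_features(r) for r in reviews]
ranked = rank_features(docs, labels)
print(ranked[:3])
```

In this toy corpus the unigram "great" appears in every review and so carries no information about the class, while the bigram "not great" separates the two classes completely, which is the kind of signal a composite feature vector is meant to capture.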


Author(s):  
Vimal Kumar Stephen K ◽  
V Mathivanan ◽  
Anoud Saleh Rashid Al-Alawi ◽  
Sausan Shinoon Al-Sulti

With the growing popularity of big data analytics for online product reviews, the biggest issue is the sheer volume of data. Sentiment analysis and opinion mining are useful for solving text- and web-based problems. For sentiment analysis, this work makes use of the Hadoop framework, which is both reliable and fault-tolerant when processing huge amounts of data. Sentiment analysis plays a critical role in text mining applications such as consumer attitude recognition, trade name and product spotting, customer relationship management, and market research. Data is labelled as either subjective or objective by subjectivity classification. The subjective data is then further divided into positive, negative, or neutral by sentiment classification. The sentiment is classified using features taken from the data. Because feature selection helps reduce the cost of classification in terms of time and computational load, it has gained a lot of prominence. This work uses Term Frequency (TF) feature extraction. The objective is to use Information Gain (IG) and Particle Swarm Optimization (PSO) for feature selection in sentiment classification. These schemes shrink the original feature set by eliminating redundant features for text sentiment categorization and thus improve classification accuracy. The running time of the learning algorithms is also decreased. A K-nearest neighbour (KNN) classifier is used to evaluate the suggested scheme. Empirical results show that, compared to IG-based feature selection, the PSO-based feature selection scheme attains better and more robust performance.
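As a rough illustration of PSO-based feature selection, here is a minimal binary PSO sketch in Python using the common sigmoid transfer rule. It is not the authors' implementation: the fitness function, the relevance values, and all parameter settings (`w`, `c1`, `c2`, swarm size) are assumptions, and in the setting described above the fitness would instead be the accuracy of a KNN classifier trained on the selected TF features.

```python
import math
import random

# Hypothetical relevance scores and per-feature cost (assumptions for the demo).
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.6, 0.2]
PENALTY = 0.3


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward relevance, penalise size."""
    return sum(r for r, bit in zip(RELEVANCE, mask) if bit) - PENALTY * sum(mask)


def binary_pso(n_features, fitness, n_particles=8, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: velocities are real-valued, positions are 0/1 masks."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (personal_best[i][d] - positions[i][d])
                     + c2 * r2 * (global_best[d] - positions[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                velocities[i][d] = v
                # Sigmoid of the velocity gives the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob else 0
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
            if score > global_score:
                global_best, global_score = positions[i][:], score
    return global_best, global_score


best_mask, best_score = binary_pso(len(RELEVANCE), toy_fitness)
print(best_mask, round(best_score, 2))
```

Because the search is stochastic, the selected subset depends on the seed and is not guaranteed to be optimal; fixing `seed` only makes a run repeatable.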

