A HYBRID SENTIMENT ANALYSIS APPROACH USING BLACK WIDOW OPTIMIZATION BASED FEATURE SELECTION

2022
Vol 12 (1)
pp. 0-0

This paper proposes a novel hybrid framework with a BWO-based feature reduction technique that combines the merits of both machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises from noisy, irrelevant, and unique features among those extracted by the proposed approach; these can be eliminated by adopting an effective feature reduction technique. In our proposed BWO approach, the feature-set size is reduced up to 43% without changing the accuracy (90%). The proposed feature selection technique outperforms the commonly used PSO- and GA-based feature selection techniques, with a reduced computation time of 21 sec. Moreover, our sentiment analysis approach is analysed using performance metrics such as precision, recall, F-measure, and computation time. Many organizations can use these online reviews to make well-informed decisions about users' interests and preferences, to enhance customer satisfaction and product quality, to find the aspects of products that need improvement, and thereby to generate more profit.
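The hybrid idea of combining lexicon-based and machine learning signals can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical polarity lexicon (`LEXICON`) and a plain bag-of-words representation, and appends the lexicon score as one extra feature column that a classifier could consume. The lexicon, tokenizer and feature layout used by the authors are not described here, so none of these names should be read as their implementation.

```python
from collections import Counter

# Hypothetical mini polarity lexicon (word -> score); a real system would use an
# established sentiment lexicon instead.
LEXICON = {"good": 1.0, "great": 2.0, "excellent": 2.0,
           "bad": -1.0, "poor": -1.5, "terrible": -2.0}


def tokenize(text):
    """Lower-case, split on whitespace and strip surrounding punctuation."""
    tokens = (t.strip(".,!?;:") for t in text.lower().split())
    return [t for t in tokens if t]


def lexicon_score(tokens):
    """Lexicon-based signal: sum of the polarities of known words."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def hybrid_features(texts):
    """Bag-of-words counts (machine learning signal) plus one lexicon-score column."""
    token_lists = [tokenize(t) for t in texts]
    vocab = sorted({t for toks in token_lists for t in toks})
    rows = []
    for toks in token_lists:
        counts = Counter(toks)
        row = [float(counts[w]) for w in vocab]
        row.append(lexicon_score(toks))  # last column: lexicon polarity
        rows.append(row)
    return vocab + ["<lexicon_score>"], rows


names, X = hybrid_features(["Great phone, excellent battery!",
                            "Terrible screen and poor battery."])
```

In this layout it is the bag-of-words columns that grow with the vocabulary, which is where a feature reduction step would operate.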

Author(s):
Anand Joseph Daniel
M Janaki Meena

With the rapid development of Internet and e-commerce technologies, people rely on the product reviews that users post on the web. Sentiment analysis of online reviews has become a mainstream way for businesses on e-commerce platforms to satisfy their customers. This paper proposes a novel hybrid framework with a Black Widow Optimization (BWO) based feature reduction technique that combines the merits of both machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises from noisy, irrelevant, and unique features among those extracted by the proposed approach; these can be eliminated by adopting an effective feature reduction technique. In our proposed BWO approach, the feature-set size is reduced up to 43% without changing the accuracy (90%). The proposed feature selection technique outperforms the commonly used Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) based feature selection techniques, with a reduced computation time of 21 sec. Moreover, our sentiment analysis approach is analyzed using performance metrics such as precision, recall, F-measure, and computation time. Many organizations can use these online reviews to make well-informed decisions about users' interests and preferences, to enhance customer satisfaction and product quality, to find the aspects of products that need improvement, and thereby to generate more profit.
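Wrapper-style feature selection with BWO can be outlined with a simplified binary variant of Black Widow Optimization: procreation by arithmetic crossover, sexual cannibalism (the father is discarded), sibling cannibalism (only the best fraction `cr` of each brood survives) and swap mutation. This is a sketch, not the authors' code: the 0.5 threshold that turns weights into a feature mask, the parameter defaults and the `toy_fitness` function (which rewards three hypothetical relevant features and penalizes subset size) are all assumptions. In a real pipeline the fitness would be something like the validated accuracy of the sentiment classifier trained on the selected features.

```python
import random


def bwo_feature_selection(fitness, n_features, pop_size=20, generations=30,
                          pp=0.6, cr=0.44, pm=0.4, seed=0):
    """Simplified binary Black Widow Optimization (BWO) for feature selection.

    Each widow is a weight vector in [0, 1]; feature j is selected when its
    weight exceeds 0.5 (an assumed threshold). `fitness(mask)` returns a
    score to MAXIMIZE. Returns the best (mask, score) seen in any generation.
    """
    rng = random.Random(seed)

    def to_mask(w):
        return [x > 0.5 for x in w]

    def score(w):
        return fitness(to_mask(w))

    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    pop.sort(key=score, reverse=True)
    best = pop[0][:]

    for _ in range(generations):
        # Procreation: the fittest pp fraction of widows mate in random pairs.
        parents = pop[:max(2, int(pp * len(pop)))]
        survivors = []
        for _ in range(len(parents) // 2):
            mother, father = rng.sample(parents, 2)
            brood = [mother]  # sexual cannibalism: the father is discarded
            for _ in range(max(1, n_features // 2)):
                a = rng.random()
                brood.append([a * m + (1 - a) * f for m, f in zip(mother, father)])
                brood.append([a * f + (1 - a) * m for m, f in zip(mother, father)])
            # Sibling cannibalism: only the best cr fraction of the brood survives.
            brood.sort(key=score, reverse=True)
            survivors.extend(brood[:max(1, int(cr * len(brood)))])
        # Mutation: swap two randomly chosen weights in copies of random parents.
        mutants = []
        for _ in range(max(1, int(pm * len(parents)))):
            w = rng.choice(parents)[:]
            i, j = rng.sample(range(n_features), 2)
            w[i], w[j] = w[j], w[i]
            mutants.append(w)
        # Next generation: the fittest survivors and mutants.
        pop = sorted(survivors + mutants, key=score, reverse=True)[:pop_size]
        if score(pop[0]) > score(best):
            best = pop[0][:]
    return to_mask(best), score(best)


RELEVANT = {0, 1, 2}  # hypothetical: pretend only the first three features matter


def toy_fitness(mask):
    hits = sum(1 for j, s in enumerate(mask) if s and j in RELEVANT)
    extras = sum(1 for j, s in enumerate(mask) if s and j not in RELEVANT)
    return hits - 0.5 * extras  # reward relevant features, penalize subset size


mask, fit = bwo_feature_selection(toy_fitness, n_features=10)
```

Keeping the best solution seen so far outside the population makes the returned score non-decreasing across generations, even though the population itself is fully replaced each round.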


2021
Vol 22 (1)
pp. 53-66
Author(s):  
D. Anand Joseph Daniel
M. Janaki Meena

Sentiment analysis of online product reviews has become a mainstream way for businesses on e-commerce platforms to promote their products and improve user satisfaction. Hence, it is necessary to construct a sentiment analyser that automatically identifies the sentiment polarity of online product reviews. Traditional lexicon-based approaches to sentiment analysis suffer from several accuracy issues, while machine learning techniques require labelled training data. This paper introduces a hybrid sentiment analysis framework to bridge the gap between machine learning and lexicon-based approaches. A novel tunicate swarm algorithm (TSA) based feature reduction is integrated with the proposed hybrid method to solve the scalability issue that arises from a large feature set. It reduces the feature set size to 43% without changing the accuracy (93%). Besides, it improves scalability, reduces computation time and enhances the overall performance of the proposed framework. The experimental analysis shows that TSA outperforms existing feature selection techniques such as particle swarm optimization and the genetic algorithm. Moreover, the proposed approach is analysed with performance metrics such as recall, precision, F1-score, feature size and computation time.
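Precision, recall and F1-score for a binary polarity task follow the standard definitions, computed for one chosen positive class. The plain Python sketch below assumes labels are encoded as 1 for positive and 0 for negative; that encoding is an assumption, not something the abstract specifies.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Standard precision, recall and F1 for the `positive` class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and predictions for six reviews.
p, r, f = precision_recall_f1([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1])
```

Here there are 3 true positives, 1 false positive and 1 false negative, so all three scores equal 0.75.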


2019
Vol 8 (3)
pp. 2138-2143

Aspect-oriented sentiment analysis is performed in two phases: identifying aspect terms in a review and determining the related opinion. Features play an important role in determining the accuracy of such a model, and both feature extraction and feature selection techniques help increase classification accuracy. Feature selection strategies reduce computation time, improve prediction performance, and provide a better understanding of the data in machine learning and pattern recognition applications. This work focuses specifically on aspect extraction from a restaurant review dataset, but the approach can also be used for other datasets. We propose a multivariate filter strategy for feature selection that works on lemma features and helps select relevant features while avoiding redundant ones. First, the extracted features are preprocessed and a "term-frequency matrix" is generated, containing the occurrence count of each feature with respect to each aspect category. Next, different feature selection strategies are applied: selecting features based on correlation, on weighted term frequency, and on weighted term frequency combined with the correlation coefficient. The weighted term frequency with correlation coefficient approach is compared with the existing system and shows a significant improvement in F1 score.
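The term-frequency matrix and a combined "weighted term frequency with correlation" score can be sketched as follows. The abstract does not give the exact formulas, so the sketch assumes hypothetical definitions: the weighted term frequency of a lemma for a category is the share of its occurrences that fall in that category, the correlation is the Pearson coefficient between the lemma's per-review counts and a category indicator, and a lemma's score is the maximum over categories of their product. The four reviews are invented.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(reviews, k):
    """reviews: list of (lemmas, aspect_category). Returns (top-k lemmas, TF matrix)."""
    categories = sorted({c for _, c in reviews})
    counts = [Counter(lemmas) for lemmas, _ in reviews]
    vocab = sorted({w for cnt in counts for w in cnt})

    # Term-frequency matrix: occurrences of each lemma per aspect category.
    tf = {w: {c: 0 for c in categories} for w in vocab}
    for cnt, (_, c) in zip(counts, reviews):
        for w, n in cnt.items():
            tf[w][c] += n

    scores = {}
    for w in vocab:
        total = sum(tf[w].values())
        x = [cnt[w] for cnt in counts]  # per-review occurrences of w
        best = 0.0
        for c in categories:
            y = [1.0 if rc == c else 0.0 for _, rc in reviews]
            weighted_tf = tf[w][c] / total  # assumed weighting: share of w in category c
            best = max(best, weighted_tf * abs(pearson(x, y)))
        scores[w] = best
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))  # ties broken alphabetically
    return ranked[:k], tf


reviews = [(["pizza", "tasty", "the"], "food"),
           (["pasta", "tasty", "the"], "food"),
           (["waiter", "rude", "the"], "service"),
           (["waiter", "slow", "the"], "service")]
top, tf = select_features(reviews, k=2)
```

A lemma such as "the" that occurs evenly in every review gets a zero correlation and therefore a zero score, which is the redundancy-filtering behaviour the filter is meant to provide.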


The analysis of cancer data and normal data to predict somatic mutation occurrences in a dataset plays an important role, yet several challenges persist in detecting somatic mutations, notably the complexity of classifying large volumes of data with good accuracy. In many situations the dataset may contain redundant and less significant features, which need to be removed to improve classification performance. Feature selection techniques are useful for dimensionality reduction. Principal component analysis (PCA), a dimensionality reduction technique that can be used to identify significant attributes, is adopted in this paper. A novel technique, a PCA-based regression decision tree, is proposed for the classification of somatic mutation data. The performance of this classification process for detecting somatic mutations is compared with existing algorithms, and the proposed model obtains satisfactory results.
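A PCA-then-regression-tree pipeline can be sketched in plain Python under simplifying assumptions: only the first principal component is kept (found by power iteration on the covariance matrix), and a depth-1 regression tree (a single split) stands in for the full decision tree. The toy matrix `X` and 0/1 targets `y` are invented for illustration and are not somatic mutation data.

```python
import math


def first_principal_component(X, iters=200):
    """Column means and top covariance eigenvector of X via power iteration.

    Assumes the all-ones start vector is not orthogonal to the top eigenvector.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[row[j] - means[j] for j in range(d)] for row in X]  # centred data
    cov = [[sum(r[a] * r[b] for r in C) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(x, means, v):
    """Score of sample x on the principal component."""
    return sum((xi - m) * vi for xi, m, vi in zip(x, means, v))


def fit_stump(z, y):
    """Depth-1 regression tree on 1-D inputs z: the split minimising squared error."""
    pairs = sorted(zip(z, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal inputs
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:  # all inputs equal: predict the overall mean
        mean = sum(y) / len(y)
        return lambda s: mean
    _, thr, ml, mr = best
    return lambda s: ml if s <= thr else mr


X = [[1, 1], [2, 2], [3, 3], [8, 8], [9, 9], [10, 10]]  # toy data
y = [0, 0, 0, 1, 1, 1]
means, v = first_principal_component(X)
predict = fit_stump([project(x, means, v) for x in X], y)
```

A real pipeline would keep several components and grow a deeper tree, but the structure (reduce dimensionality first, then fit the tree on the reduced scores) is the same.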


2021
pp. 2796-2812
Author(s):  
Nishath Ansari

Feature selection, a method of dimensionality reduction, is the process of choosing an appropriate subset of features from the full set of features. In this paper, the feature selection problem and the techniques for evaluating candidate solutions are reviewed in detail. I begin with a straightforward formulation in which feature selection problems are handled with meta-heuristic strategies, which help in obtaining the best feature subsets. Thereafter, this paper discusses nature-inspired models and the computations needed to solve feature selection problems in complex and massive data. In particular, I discuss algorithms such as the genetic algorithm (GA), the Non-Dominated Sorting Genetic Algorithm (NSGA-II), Particle Swarm Optimization (PSO), and other meta-heuristic strategies for solving feature selection problems. A comparison of these algorithms has been performed; the results show that feature selection benefits machine learning algorithms by improving their performance. This paper also presents various real-world applications of feature selection.
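NSGA-II, one of the algorithms surveyed, ranks candidate solutions by fast non-dominated sorting. For feature selection, a natural pair of objectives to minimize is the classification error and the number of selected features. The sketch below implements only that sorting step (no crowding distance, crossover or mutation), and the five candidate subsets are hypothetical.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective and better in at least one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Fast non-dominated sorting as in NSGA-II; returns fronts as lists of indices."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]  # solutions that p dominates
    counts = [0] * n                       # number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical candidates as (error rate, number of selected features).
candidates = [(0.10, 5), (0.08, 9), (0.12, 3), (0.15, 6), (0.10, 7)]
fronts = non_dominated_sort(candidates)
```

The first front holds the trade-off subsets (0.10, 5), (0.08, 9) and (0.12, 3): none of them is both more accurate and smaller than another.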


2022
Vol 9 (1)
Author(s):  
Joffrey L. Leevy
John Hancock
Taghi M. Khoshgoftaar
Jared M. Peterson

Recent years have seen a proliferation of Internet of Things (IoT) devices and an associated security risk from an increasing volume of malicious traffic worldwide. For this reason, datasets such as Bot-IoT were created to train machine learning classifiers to identify attack traffic in IoT networks. In this study, we build predictive models with Bot-IoT to detect attacks represented by dataset instances from the Information Theft category, as well as dataset instances from the data exfiltration and keylogging subcategories. Our contribution is centered on evaluating the effect of ensemble feature selection techniques (FSTs) on classification performance for these specific attack instances. A group, or ensemble, of FSTs will often perform better than the best individual technique. The classifiers that we use are a diverse set of four ensemble learners (LightGBM, CatBoost, XGBoost, and random forest (RF)) and four non-ensemble learners (logistic regression (LR), decision tree (DT), Naive Bayes (NB), and a multi-layer perceptron (MLP)). The metrics used for evaluating classification performance are the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPRC). For the most part, we determined that our ensemble FSTs do not affect classification performance but are beneficial because feature reduction eases the computational burden and provides insight through improved data visualization.
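The abstract does not say how the individual FST outputs are combined, so the following sketch assumes one common option, mean-rank aggregation: each technique ranks the features, the ranks are averaged, and the top k features are kept. The rankings over `f1` to `f5` are hypothetical.

```python
def aggregate_rankings(rankings, k):
    """Combine feature rankings (best first) by mean rank and return the top k.

    A feature missing from a ranking gets rank len(ranking), one past its end.
    """
    features = sorted({f for r in rankings for f in r})

    def mean_rank(f):
        return sum(r.index(f) if f in r else len(r) for r in rankings) / len(rankings)

    return sorted(features, key=lambda f: (mean_rank(f), f))[:k]


# Hypothetical rankings produced by three different feature selection techniques.
rankings = [["f3", "f1", "f2", "f5", "f4"],
            ["f1", "f3", "f4", "f2", "f5"],
            ["f3", "f2", "f1", "f4", "f5"]]
top3 = aggregate_rankings(rankings, k=3)
```

Averaging ranks smooths out disagreements between techniques: `f3` is never ranked worse than second, so it comes first even though one technique prefers `f1`.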


2019
Vol 23 (1)
pp. 159-189
Author(s):  
Siti Rohaidah Ahmad
Azuraliza Abu Bakar
Mohd Ridzwan Yaakub

Author(s):  
Abhishek Bhattacharya
Radha Tamal Goswami
Kuntal Mukherjee
Nhu Gia Nguyen

Each Android application requests a set of permissions at installation time, and these permissions can be used as features for the permission-based identification of Android malware. Recently, ensemble feature selection techniques have received increasing attention over conventional techniques in different applications. In this work, a cluster-based voted ensemble feature selection technique combining five base wrapper approaches from R libraries is proposed for identifying the most prominent set of features in the predictive modeling of Android malware. The proposed method preserves both desirable properties of an ensemble feature selector: accuracy and diversity. Moreover, five different data partitioning ratios are considered, and the impact of those ratios on the predictive model is measured using the coefficient of determination (r-square) and the root mean square error. The proposed strategy produced significantly better outcomes in terms of the number of selected features and classification accuracy.
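The voting step of such an ensemble and the two regression metrics mentioned can be sketched as below. The sketch assumes each base wrapper returns a set of selected permissions and that a feature is kept when at least `min_votes` wrappers agree. The clustering stage of the proposed method and the R libraries it uses are not modeled, and the permission subsets are hypothetical.

```python
import math
from collections import Counter


def vote_features(subsets, min_votes):
    """Keep features chosen by at least `min_votes` of the base selectors."""
    votes = Counter(f for s in subsets for f in set(s))
    return sorted(f for f, v in votes.items() if v >= min_votes)


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


# Hypothetical outputs of five base wrapper selectors.
subsets = [{"SEND_SMS", "READ_PHONE_STATE", "INTERNET"},
           {"SEND_SMS", "RECEIVE_SMS"},
           {"SEND_SMS", "READ_PHONE_STATE"},
           {"READ_CONTACTS", "READ_PHONE_STATE"},
           {"SEND_SMS", "INTERNET"}]
selected = vote_features(subsets, min_votes=3)  # simple majority of five
```

Raising `min_votes` trades a smaller feature set against the risk of discarding permissions that only some wrappers consider informative.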

