Analysis of the Effect of Feature Reduction on Accuracy and Computational Time in Mushroom Dataset Classification

Agus Prayogo; I Gede Santi Astawa

doi:10.24843/jlk.2021.v10.i01.p15

Analysis of the Effect of Feature Reduction on Accuracy and Computational Time in Mushroom Dataset Classification

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v10.i01.p15 ◽

2021 ◽

Vol 10 (1) ◽

pp. 117

Author(s):

Agus Prayogo ◽

I Gede Santi Astawa

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Computation Time ◽

Feature Reduction ◽

Computational Time ◽

Test Scenario ◽

Classification Result ◽

Distant Relationship ◽

Significant Difference ◽

Feature Values

Classification is a technique for mapping data to a class based on its attribute or feature values. One factor that affects the classification result is how strongly the features correlate with the class. This research was conducted to determine the effect of removing the features that are least correlated with, or most distantly related to, the class (the dependent variable), on the premise that features with little correlation have no effect on the classification result. Across the test scenarios, removing each such feature yielded accuracies between 83% and 88%, higher than the initial accuracy of 82% without feature selection. Meanwhile, the computation time, in the range of 2.3-2.7, did not differ significantly from that obtained without feature reduction. The data used is the Mushroom dataset from the UCI Machine Learning Repository.
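The abstract does not say which correlation measure was used to find the least-related features. Because every Mushroom attribute is categorical, one reasonable choice is Cramér's V, computed from the chi-square statistic of the feature-by-class contingency table. The sketch below is written under that assumption; `cramers_v` and `drop_least_correlated` are hypothetical helper names, and the toy rows only borrow two Mushroom attribute names (odor, cap-color).

```python
import math
from collections import Counter


def cramers_v(feature, target):
    """Cramér's V between two categorical columns: 0 = no association, 1 = perfect."""
    n = len(feature)
    f_counts = Counter(feature)
    t_counts = Counter(target)
    joint = Counter(zip(feature, target))
    # Pearson chi-square statistic of the feature-by-class contingency table.
    chi2 = 0.0
    for f_val, f_n in f_counts.items():
        for t_val, t_n in t_counts.items():
            expected = f_n * t_n / n
            chi2 += (joint[(f_val, t_val)] - expected) ** 2 / expected
    k = min(len(f_counts), len(t_counts)) - 1
    if k == 0:
        return 0.0  # a constant column (e.g. Mushroom's veil-type) carries no information
    return math.sqrt(chi2 / (n * k))


def drop_least_correlated(rows, target, names, k=1):
    """Return the feature names that remain after removing the k weakest features."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, cramers_v(column, target)))
    # Rank from most to least associated with the class, then cut the tail.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[: len(scores) - k]]


# Made-up toy rows: (odor, cap-color) with edible/poisonous labels.
rows = [("n", "w"), ("n", "y"), ("f", "w"), ("f", "y")]
target = ["e", "e", "p", "p"]
```

Here `drop_least_correlated(rows, target, ["odor", "cap-color"], k=1)` keeps only `odor`, since odor separates the two classes perfectly while cap-color is independent of them. Retraining the classifier after each removal (one scenario per value of k) could approximate the per-scenario testing described above.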

A Novel Sentiment Analysis for Amazon Data with TSA based Feature Selection

Scalable Computing Practice and Experience ◽

10.12694/scpe.v22i1.1839 ◽

2021 ◽

Vol 22 (1) ◽

pp. 53-66

Author(s):

D. Anand Joseph Daniel ◽

M. Janaki Meena

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

User Satisfaction ◽

Performance Metrics ◽

Computation Time ◽

Feature Reduction ◽

Training Data ◽

Product Reviews ◽

Online Product Reviews

Sentiment analysis of online product reviews has become a mainstream way for businesses on e-commerce platforms to promote their products and improve user satisfaction. Hence, an automatic sentiment analyser is needed to identify the sentiment polarity of online product reviews. Traditional lexicon-based approaches to sentiment analysis suffer from several accuracy issues, while machine learning techniques require labelled training data. This paper introduces a hybrid sentiment analysis framework to bridge the gap between machine learning and lexicon-based approaches. A novel tunicate swarm algorithm (TSA) based feature reduction is integrated with the proposed hybrid method to solve the scalability issue that arises from a large feature set. It reduces the feature set size to 43% without changing the accuracy (93%). It also improves scalability, reduces computation time, and enhances the overall performance of the proposed framework. The experimental analysis shows that TSA outperforms existing feature selection techniques such as particle swarm optimization and genetic algorithm. Moreover, the proposed approach is analysed with performance metrics such as recall, precision, F1-score, feature size and computation time.
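Wrapper-based swarm selectors such as the TSA described here usually search over binary feature masks and score each mask by classifier error and subset size. The abstract does not give TSA's update rules, so this sketch shows only two generic building blocks that binary swarm methods commonly share: an S-shaped (sigmoid) transfer function that turns a continuous agent position into a mask, and a weighted fitness. It is not the authors' TSA; the weight `alpha=0.99` and both function names are assumptions.

```python
import math
import random


def sigmoid_transfer(position, rng):
    """Binarise a continuous search-agent position into a 0/1 feature mask."""
    mask = []
    for x in position:
        x = max(-60.0, min(60.0, x))  # clip so math.exp cannot overflow
        prob = 1.0 / (1.0 + math.exp(-x))
        mask.append(1 if rng.random() < prob else 0)
    return mask


def wrapper_fitness(mask, error_rate, alpha=0.99):
    """Lower is better: weighted classification error plus fraction of features kept."""
    selected = sum(mask)
    if selected == 0:
        return float("inf")  # an empty subset cannot train a classifier
    return alpha * error_rate + (1 - alpha) * selected / len(mask)
```

An error rate of 0.1 with two of four features kept scores 0.99 * 0.1 + 0.01 * 0.5 = 0.104; a smaller mask with the same error scores lower, which is how such a fitness steers the search toward reduced feature sets without sacrificing accuracy.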

A hybrid sentiment analysis approach using black widow optimization based feature selection

Journal of Engineering Research ◽

10.36909/jer.12039 ◽

2021 ◽

Author(s):

Anand Joseph Daniel ◽

M Janaki Meena

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Performance Metrics ◽

Computation Time ◽

Online Reviews ◽

Reduction Technique ◽

Feature Reduction ◽

Analysis Approach ◽

Black Widow ◽

Feature Selection Technique

With the massive development of Internet and e-commerce technologies, people rely on product reviews that users provide through the web. Sentiment analysis of online reviews has become a mainstream way for businesses on e-commerce platforms to satisfy customers. This paper proposes a novel hybrid framework with a Black Widow Optimization (BWO) based feature reduction technique that combines the merits of machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises from noisy, irrelevant and unique features among those extracted by the proposed approach, which can be eliminated by adopting an effective feature reduction technique. In our proposed BWO approach, without changing the accuracy (90%), the feature-set size is reduced up to 43%. The proposed feature selection technique outperforms other commonly used Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) based feature selection techniques, with a reduced computation time of 21 sec. Moreover, our sentiment analysis approach is analyzed using performance metrics such as precision, recall, F-measure, and computation time. Many organizations can use these online reviews to make well-informed decisions about users’ interests and preferences, to enhance customer satisfaction and product quality, and to find aspects of the products to improve, thereby generating more profits.
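Both sentiment papers report precision, recall, and F-measure. For a binary positive/negative polarity task these metrics follow directly from confusion-matrix counts, as in the sketch below; whether the authors macro-average over classes is not stated, so scoring a single `positive` label is an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive="pos"):
    """Precision, recall and F1 for one positive class, from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With true labels `pos, pos, neg, neg, pos` and predictions `pos, neg, pos, neg, pos`, there are 2 true positives, 1 false positive and 1 false negative, so all three metrics equal 2/3.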

Recent Neuro-Fuzzy Approaches for Feature Selection and Classification

Exploring Critical Approaches of Evolutionary Computation - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-5225-5832-3.ch001 ◽

2019 ◽

pp. 1-19 ◽

Cited By ~ 2

Author(s):

Heisnam Rohen Singh ◽

Saroj Kr Biswas ◽

Monali Bordoloi

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Computation Time ◽

Prediction Performance ◽

Learning Problems ◽

Fuzzy Approach ◽

Symbolic Forms ◽

Redundant Data ◽

Neuro Fuzzy ◽

And Performance

Classification is the task of assigning objects to one of several predefined categories. However, developing a classification system is mostly hampered by the size of the data. As the dimension of the data increases, so does the chance of irrelevant, redundant, and noisy features or attributes. Feature selection acts as a catalyst in reducing computation time and dimensionality, enhancing prediction performance or accuracy, and curtailing irrelevant or redundant data. The neuro-fuzzy approach is used for feature selection and classification with better insight, by representing knowledge in symbolic forms. It combines the merits of neural networks and fuzzy logic to solve many complex machine learning problems. The objective of this article is to provide a generic introduction to, and a recent survey of, neuro-fuzzy approaches for feature selection and classification across a wide range of machine learning problems. Some of the existing neuro-fuzzy models are also applied to standard datasets to demonstrate their applicability and performance.
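To make the "symbolic forms" concrete: in ANFIS-style neuro-fuzzy networks, the first layer maps each input to membership degrees of fuzzy sets, and the next layer combines them into rule firing strengths, so each rule reads as "IF x1 is near c1 AND x2 is near c2 THEN ...". The sketch below shows those two layers with Gaussian memberships and the product t-norm; it is a generic illustration, not one of the surveyed models, and the rule parameters are made up.

```python
import math


def gaussian_mf(x, center, sigma):
    """Degree in (0, 1] to which x belongs to a fuzzy set centred at `center`."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def rule_strengths(sample, rules):
    """Normalised firing strength of each rule (product t-norm over features).

    Each rule is a list of (center, sigma) pairs, one pair per input feature.
    """
    raw = []
    for rule in rules:
        strength = 1.0
        for x, (center, sigma) in zip(sample, rule):
            strength *= gaussian_mf(x, center, sigma)
        raw.append(strength)
    total = sum(raw)
    # Normalisation as in the ANFIS normalisation layer; guard against all-zero strengths.
    return [s / total for s in raw] if total else raw
```

For the sample `[0.0, 1.0]`, a rule centred at (0, 1) fires with normalised strength of about 0.98, while a rule centred at (2, 3) gets the remaining 0.02, so the first rule dominates the network's output for that input.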

Feature Entropy Estimation (FEE) for Malicious IoT Traffic and Detection Using Machine Learning

Mobile Information Systems ◽

10.1155/2021/8091363 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Tarun Dhar Diwan ◽

Siddartha Choubey ◽

H. S. Hota ◽

S. B Goyal ◽

Sajjad Shaukat Jamal ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Low Cost ◽

Pearson Correlation ◽

Low Complexity ◽

Computational Time ◽

Chi Square ◽

Feature Selection Technique ◽

Detection Techniques ◽

Entropy Estimation

Identification of anomalous and malicious traffic in Internet of Things (IoT) networks is essential for IoT security. Tracking and blocking unwanted traffic flows in IoT networks requires a framework that identifies attacks accurately, quickly, and with low complexity. Many machine learning (ML) algorithms have proved efficient at detecting intrusions in IoT networks, but they suffer from misclassification caused by inappropriate and irrelevant features. This paper presents an in-depth study addressing these issues. We present lightweight, low-cost feature selection techniques for IoT intrusion detection with low complexity, low computational time, and high accuracy. A novel feature selection technique integrates rank-based chi-square, Pearson correlation, and score correlation to extract the relevant features from all the features available in the dataset. Feature entropy estimation is then applied to validate the relationships among all extracted features in order to identify malicious traffic in IoT networks. Finally, an extreme gradient boosting ensemble approach is used to classify the features into the relevant attack types. Simulations were performed on three datasets, NSL-KDD, UNSW-NB15, and CICIDS2017, and results are presented on different test sets. Accuracy was approximately 97.48% on NSL-KDD, 99.96% on UNSW-NB15, and 99.93% on CICIDS2017. A comparison with existing state-of-the-art techniques is also presented.
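The abstract does not give formulas for its combined chi-square, Pearson and score-correlation ranking or for the feature entropy estimation step. As a hedged sketch of two ingredients only, the code below ranks numeric features by absolute Pearson correlation with a 0/1 attack label and computes the Shannon entropy of a discretised feature. Feature names such as `bytes` are hypothetical, and this is not the authors' FEE method.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient r between two numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature or label: r is undefined, treat as no correlation
    return cov / math.sqrt(var_x * var_y)


def shannon_entropy(values):
    """Entropy in bits of a discrete (already binned) feature."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def rank_by_abs_pearson(columns, labels):
    """Feature names sorted from strongest to weakest |r| with the 0/1 label."""
    scores = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature that rises with the attack label (for example byte counts of 10, 12, 90, 95 against labels 0, 0, 1, 1) ranks above one that alternates independently of it, whose r is exactly 0.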

Review on Feature Selection and Classification using Neuro-Fuzzy Approaches

International Journal of Applied Evolutionary Computation ◽

10.4018/ijaec.2016100102 ◽

2016 ◽

Vol 7 (4) ◽

pp. 28-44 ◽

Cited By ~ 4

Author(s):

Saroj Biswas ◽

Monali Bordoloi ◽

Biswajit Purkayastha

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Computation Time ◽

Recent Survey ◽

Learning Problems ◽

Fuzzy Approach ◽

Redundant Data ◽

Classification Feature ◽

Research Article ◽

Neuro Fuzzy

This research article attempts to provide a recent survey of neuro-fuzzy approaches for feature selection and classification. Feature selection acts as a catalyst in reducing computation time and dimensionality, enhancing prediction performance or accuracy, and curtailing irrelevant or redundant data. The neuro-fuzzy approach is used for feature selection and for giving the user some insight into the symbolic knowledge embedded within the network. The neuro-fuzzy approach combines the merits of neural networks and fuzzy logic to solve many complex machine learning problems. The objective of this article is to provide a generic introduction to, and a recent survey of, neuro-fuzzy approaches for feature selection and classification across a wide range of machine learning problems. Some of the existing neuro-fuzzy models are also applied to standard datasets to demonstrate the applicability of neuro-fuzzy approaches.

Feature Reduction with Inconsistency

International Journal of Cognitive Informatics and Natural Intelligence ◽

10.4018/jcini.2010040106 ◽

2010 ◽

Vol 4 (2) ◽

pp. 77-87 ◽

Cited By ~ 6

Author(s):

Yong Liu ◽

Yunliang Jiang ◽

Jianhua Yang

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Rough Set ◽

Attribute Reduction ◽

Classical Problem ◽

Feature Reduction ◽

Temporal Complexity ◽

Semantic Correlation ◽

Positive Region ◽

Original Feature

Feature selection is a classical problem in machine learning, and designing a method that selects features preserving all the internal semantic correlation of the original feature set is a challenge. The authors present a general approach to select features via rough set based reduction, so that the selected features keep the same semantic correlation as the original feature set. A new concept named inconsistency is proposed, which can be used to calculate the positive region easily and quickly in only linear time. Some properties of inconsistency are also given, such as its monotonicity. The authors also propose three inconsistency-based attribute reduction generation algorithms with different search policies. Finally, a “mini-saturation” bias is presented to choose the proper reduction for further predictive designing.
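The linear-time claim rests on a standard rough-set idea: the positive region POS_B(D) contains the objects whose equivalence class under attribute subset B has only one decision value. Grouping objects by their B-values with a hash map yields that region in a single pass over the data. The sketch below implements this textbook definition and the dependency degree; it does not reproduce the authors' inconsistency measure, whose exact definition the abstract omits.

```python
from collections import defaultdict


def positive_region(rows, decisions, attrs):
    """Indices of objects whose equivalence class under `attrs` has one decision.

    One hashing pass groups the objects and a second pass keeps the consistent
    groups, so the cost is linear in the number of objects for a fixed `attrs`.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    pos = []
    for members in groups.values():
        if len({decisions[i] for i in members}) == 1:
            pos.extend(members)
    return pos


def dependency(rows, decisions, attrs):
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|."""
    return len(positive_region(rows, decisions, attrs)) / len(rows)


# Made-up decision table: two condition attributes (indices 0 and 1).
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
dec = ["y", "n", "y", "n"]
```

On this toy table `dependency(rows, dec, [1])` is 1.0 while `dependency(rows, dec, [0])` is 0.0, so attribute 1 alone preserves the decision and forms a reduct of {0, 1}.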

Optimized Breast Cancer Classification using Feature Selection and Outliers Detection

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.331 ◽

2021 ◽

pp. 298-307

Author(s):

A. B Yusuf ◽

R. M Dima ◽

S. K Aina

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Cancer Diagnosis ◽

Breast Cancer Diagnosis ◽

Feature Reduction ◽

Machine Learning Algorithms ◽

Breast Cancer Dataset ◽

The Impact

Breast cancer is the second most commonly diagnosed cancer in women worldwide. Its incidence is rising, especially in developing countries, where most cases are discovered late. Breast cancer develops when cancerous tumors form on the surface of the breast cells. The absence of accurate prognostic models to help physicians recognize symptoms early makes it difficult to develop treatment plans that would help patients live longer. Recently, however, machine learning techniques have been used to improve the accuracy and speed of breast cancer diagnosis; the more accurate the model, the more useful it is for diagnosis. Nevertheless, the primary difficulty for systems that detect breast cancer with machine learning models is attaining the highest classification accuracy while selecting the most predictive features. As a result, breast cancer prognosis remains a difficulty in today's society. This research seeks to address a flaw in an existing technique that is unable to improve the classification of continuous-valued data, particularly its accuracy and the selection of optimal features for breast cancer prediction. To address these issues, this study examines the impact of outliers and feature reduction on the Wisconsin Diagnostic Breast Cancer Dataset, tested with seven different machine learning algorithms. The results show that the Logistic Regression, Random Forest, and AdaBoost classifiers achieved the highest accuracy, 99.12%, after outliers were removed from the dataset. With feature selection applied to this filtered dataset, the Random Forest and Gradient Boosting classifiers achieved accuracies of 100% and 99.12%, respectively. Compared with other state-of-the-art approaches, the two proposed strategies outperformed the unfiltered data in terms of accuracy.
The suggested architecture might be a useful tool for radiologists to reduce the number of false negatives and positives. As a result, the efficiency of breast cancer diagnosis analysis will be increased.
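The abstract does not say how outliers were detected. One common option for continuous features such as those in the Wisconsin Diagnostic dataset is Tukey's interquartile-range (IQR) rule, sketched below under that assumption; the column name `radius` and the 1.5 multiplier are illustrative.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, columns, k=1.5):
    """Drop every row in which any listed column falls outside its fences."""
    bounds = {c: iqr_bounds([row[c] for row in rows], k) for c in columns}
    return [
        row for row in rows
        if all(bounds[c][0] <= row[c] <= bounds[c][1] for c in columns)
    ]
```

For the made-up values 12, 13, 14, 13, 12, 14, 13, 60, the quartiles are 12.25 and 14.0, the fences are 9.625 and 16.625, and only the row with 60 is dropped. `statistics.quantiles` uses the "exclusive" method by default, so other tools may place the quartiles slightly differently.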

Handwritten English Digit Recognition: A Machine Learning Formulation

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8634.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 6055-6058

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Handwriting Recognition ◽

Computation Time ◽

Learning Task ◽

Computational Time ◽

Wide Range ◽

Potential Applications ◽

Numeral Recognition ◽

Handwritten Recognition

Handwriting recognition is a challenging machine learning task. Handwriting recognition (HR) systems have become commercially popular due to their potential applications. The challenges that arise from the wide variation in shape, structure, size, and individual writing style can be handled by combining a powerful feature extraction technique with an efficient classifier. In this paper, four feature extraction and classifier schemes are compared for English handwritten numeral recognition in terms of computational time and recognition accuracy. The observations show that a single decision tree requires less computation time, while SVM yields better accuracy.
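A comparison like this one needs the same accuracy and timing protocol for every scheme. The harness below is a hypothetical sketch: it times training plus prediction with `time.perf_counter` and reports accuracy. The nearest-centroid classifier is only a stand-in for illustration; the paper's schemes (including the decision tree and SVM) are not reproduced.

```python
import time


def evaluate(name, fit, predict, train_X, train_y, test_X, test_y):
    """Return (name, accuracy, seconds) for one extractor/classifier scheme."""
    start = time.perf_counter()
    model = fit(train_X, train_y)
    predictions = [predict(model, x) for x in test_X]
    elapsed = time.perf_counter() - start  # wall-clock time for training + prediction
    correct = sum(p == y for p, y in zip(predictions, test_y))
    return name, correct / len(test_y), elapsed


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
```

Each scheme would be passed as its own `fit`/`predict` pair, and the resulting (name, accuracy, seconds) tuples can then be tabulated side by side. Timing a single run is noisy, so repeating each measurement and reporting the median is usually more reliable.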

A HYBRID SENTIMENT ANALYSIS APPROACH USING BLACK WIDOW OPTIMIZATION BASED FEATURE SELECTION

International Journal of Information Retrieval Research ◽

10.4018/ijirr.289955 ◽

2022 ◽

Vol 12 (1) ◽

pp. 0-0

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Computation Time ◽

Online Reviews ◽

Reduction Technique ◽

Feature Reduction ◽

Analysis Approach ◽

Feature Selection Technique ◽

Set Size ◽

Feature Selection Techniques

This paper proposes a novel hybrid framework with a Black Widow Optimization (BWO) based feature reduction technique that combines the merits of machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises from noisy, irrelevant and unique features among those extracted by the proposed approach, which can be eliminated by adopting an effective feature reduction technique. In our proposed BWO approach, without changing the accuracy (90%), the feature-set size is reduced up to 43%. The proposed feature selection technique outperforms other commonly used Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) based feature selection techniques, with a reduced computation time of 21 sec. Moreover, our sentiment analysis approach is analysed using performance metrics such as precision, recall, F-measure, and computation time. Many organizations can use these online reviews to make well-informed decisions about users’ interests and preferences, to enhance customer satisfaction and product quality, and to find aspects of the products to improve, thereby generating more profits.

Exploring the Time-efficient Evolutionary-based Feature Selection Algorithms for Speech Data under Stressful Work Condition

EMITTER International Journal of Engineering Technology ◽

10.24003/emitter.v9i1.571 ◽

2021 ◽

Vol 9 (1) ◽

pp. 60-74

Author(s):

Derry Pramono Adi ◽

Lukman Junaedi ◽

Frismanda ◽

Agustinus Bimo Gumelar ◽

Andreas Agung Kristanto

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Customer Service ◽

Harmony Search ◽

Computation Time ◽

Principal Component ◽

Support Vector ◽

Bee Colony ◽

Speech Data ◽

Selection Algorithms

A goal of Machine Learning (ML) advancements is faster computation with fewer computational resources, while the curse of dimensionality burdens both computation time and resources. This paper describes the benefits of Feature Selection Algorithms (FSA) for speech data under workload stress. FSAs reduce both data dimension and computation time while retaining the speech information. We chose the robust Evolutionary Algorithm, Harmony Search, Principal Component Analysis, Genetic Algorithm, Particle Swarm Optimization, Ant Colony Optimization, and Bee Colony Optimization, which were then evaluated using hierarchical machine learning models. These FSAs are explored with conversational workload-stress data from a Customer Service hotline, whose daily complaints trigger stress in speaking. We employed exactly 223 acoustic-based features. Using Random Forest, our evaluation showed that computation time became 3.6 times faster than with the original 223 features. Evaluation using Support Vector Machine set the best result, with 0.001 seconds of computation time.
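The Genetic Algorithm is one of the selectors the abstract evaluates. A minimal elitist GA over 0/1 masks of the acoustic features could look like the sketch below; population size, size-2 tournament selection, one-point crossover, and mutation rate are all assumptions, and `toy_fitness` is a hypothetical stand-in for training a classifier such as Random Forest or SVM on the selected features.

```python
import random


def tournament(population, fitness, rng):
    """Size-2 tournament: return the fitter (lower-fitness) of two random masks."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) <= fitness(b) else b


def ga_select(n_features, fitness, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Minimise fitness(mask) over 0/1 feature masks with a basic elitist GA.

    Returns the best mask and the best fitness recorded after each generation.
    Requires n_features >= 2 (one-point crossover) and pop_size >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips independently with mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = min(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical fitness: features 0 and 3 are the only useful ones,
# and every kept feature adds a small cost (standing in for computation time).
RELEVANT = {0, 3}


def toy_fitness(mask):
    missing = sum(1 for i in RELEVANT if mask[i] == 0)
    return missing + 0.01 * sum(mask)
```

Because the best mask is copied into every new generation, the recorded best fitness can only stay equal or improve. A reported speed-up such as 3.6 times would presumably be the ratio of computation time with all 223 features to the time with the selected subset.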
