Analysis of the Effect of Feature Reduction on Accuracy and Computational Time in Mushroom Dataset Classification

2021 ◽  
Vol 10 (1) ◽  
pp. 117
Author(s):  
Agus Prayogo ◽  
I Gede Santi Astawa

Classification is a technique to mapping the class of a certain data from its attribute or feature values. One of things that affects the classification result is the correlation of its features to the class classification results. Research conducted to determine the effect of the reduction in features that are least correlated or have a distant relationship with the classification result class (dependent variable). Because features that do not have much correlation, have no effect on the classification results. From the research, the accuracy of the reduction of each feature per test scenario has a range between 83% -88% higher than the initial accuracy without feature selection at 82% accuracy. Meanwhile, the computation time obtained does not have a significant difference in changing compared to without feature reduction, in the range of 2.3-2.7. For the data used is the Mushroom dataset obtained from the UCI Machine Learning Repository

2021 ◽  
Vol 22 (1) ◽  
pp. 53-66
Author(s):  
D. Anand Joseph Daniel ◽  
M. Janaki Meena

Sentiment analysis of online product reviews has become a mainstream way for businesses on e-commerce platforms to promote their products and improve user satisfaction. Hence, it is necessary to construct an automatic sentiment analyser for automatic identification of sentiment polarity of the online product reviews. Traditional lexicon-based approaches used for sentiment analysis suffered from several accuracy issues while machine learning techniques require labelled training data. This paper introduces a hybrid sentiment analysis framework to bond the gap between both machine learning and lexicon-based approaches. A novel tunicate swarm algorithm (TSA) based feature reduction is integrated with the proposed hybrid method to solve the scalability issue that arises due to a large feature set. It reduces the feature set size to 43% without changing the accuracy (93%). Besides, it improves the scalability, reduces the computation time and enhances the overall performance of the proposed framework. From experimental analysis, it can be observed that TSA outperforms existing feature selection techniques such as particle swarm optimization and genetic algorithm. Moreover, the proposed approach is analysed with performance metrics such as recall, precision, F1-score, feature size and computation time.


Author(s):  
Anand Joseph Daniel ◽  
◽  
M Janaki Meena ◽  

With the massive development of Internet technologies and e-commerce technology, people rely on the product reviews provided by users through web. Sentiment analysis of online reviews has become a mainstream way for businesses on e-commerce platforms to satisfy the customers. This paper proposes a novel hybrid framework with Black Widow Optimization (BWO) based feature reduction technique which combines the merits of both machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises due to noisy, irrelevant and unique features present in the extracted features from proposed approach, which can be eliminated by adopting an effective feature reduction technique. In our proposed BWO approach, without changing the accuracy (90%), the feature-set size is reduced up to 43%. The proposed feature selection technique outperforms other commonly used Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) based feature selection techniques with reduced computation time of 21 sec. Moreover, our sentiment analysis approach is analyzed using performance metrics such as precision, recall, F-measure, and computation time. Many organizations can use these online reviews to make well-informed decisions towards the users’ interests and preferences to enhance customer satisfaction, product quality and to find the aspects to improve the products, thereby to generate more profits.


Author(s):  
Heisnam Rohen Singh ◽  
Saroj Kr Biswas ◽  
Monali Bordoloi

Classification is the task of assigning objects to one of several predefined categories. However, developing a classification system is mostly hampered by the size of data. With the increase in the dimension of data, the chance of irrelevant, redundant, and noisy features or attributes also increases. Feature selection acts as a catalyst in reducing computation time and dimensionality, enhancing prediction performance or accuracy, and curtailing irrelevant or redundant data. The neuro-fuzzy approach is used for feature selection and classification with better insight by representing knowledge in symbolic forms. The neuro-fuzzy approach combines the merits of neural network and fuzzy logic to solve many complex machine learning problems. The objective of this article is to provide a generic introduction and a recent survey to neuro-fuzzy approaches for feature selection and classification in a wide area of machine learning problems. Some of the existing neuro-fuzzy models are also applied to standard datasets to demonstrate their applicability and performance.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Tarun Dhar Diwan ◽  
Siddartha Choubey ◽  
H. S. Hota ◽  
S. B Goyal ◽  
Sajjad Shaukat Jamal ◽  
...  

Identification of anomaly and malicious traffic in the Internet of things (IoT) network is essential for IoT security. Tracking and blocking unwanted traffic flows in the IoT network is required to design a framework for the identification of attacks more accurately, quickly, and with less complexity. Many machine learning (ML) algorithms proved their efficiency to detect intrusion in IoT networks. But this ML algorithm suffers many misclassification problems due to inappropriate and irrelevant feature size. In this paper, an in-depth study is presented to address such issues. We have presented lightweight low-cost feature selection IoT intrusion detection techniques with low complexity and high accuracy due to their low computational time. A novel feature selection technique was proposed with the integration of rank-based chi-square, Pearson correlation, and score correlation to extract relevant features out of all available features from the dataset. Then, feature entropy estimation was applied to validate the relationship among all extracted features to identify malicious traffic in IoT networks. Finally, an extreme gradient ensemble boosting approach was used to classify the features in relevant attack types. The simulation is performed on three datasets, i.e., NSL-KDD, USNW-NB15, and CCIDS2017, and results are presented on different test sets. It was observed that on the NSL-KDD dataset, accuracy was approx. 97.48%. Similarly, the accuracy of USNW-NB15 and CCIDS2017 was approx. 99.96% and 99.93%, respectively. Along with that, state-of-the-art comparison is also presented with existing techniques.


2016 ◽  
Vol 7 (4) ◽  
pp. 28-44 ◽  
Author(s):  
Saroj Biswas ◽  
Monali Bordoloi ◽  
Biswajit Purkayastha

This research article attempts to provide a recent survey on neuro-fuzzy approaches for feature selection and classification. Feature selection acts as a catalyst in reducing computation time and dimensionality, enhancing prediction performance or accuracy and curtailing irrelevant or redundant data. The neuro-fuzzy approach is used for feature selection and for providing some insight to the user about the symbolic knowledge embedded within the network. The neuro–fuzzy approach combines the merits of neural network and fuzzy logic to solve many complex machine learning problems. The objective of this article is to provide a generic introduction and a recent survey to neuro-fuzzy approaches for feature selection and classification in a wide area of machine learning problems. Some of the existing neuro-fuzzy models are also applied on standard datasets to demonstrate the applicability of neuro-fuzzy approaches.


Author(s):  
Yong Liu ◽  
Yunliang Jiang ◽  
Jianhua Yang

Feature selection is a classical problem in machine learning, and how to design a method to select the features that can contain all the internal semantic correlation of the original feature set is a challenge. The authors present a general approach to select features via rough set based reduction, which can keep the selected features with the same semantic correlation as the original feature set. A new concept named inconsistency is proposed, which can be used to calculate the positive region easily and quickly with only linear temporal complexity. Some properties of inconsistency are also given, such as the monotonicity of inconsistency and so forth. The authors also propose three inconsistency based attribute reduction generation algorithms with different search policies. Finally, a “mini-saturation” bias is presented to choose the proper reduction for further predictive designing.


Author(s):  
A. B Yusuf ◽  
R. M Dima ◽  
S. K Aina

Breast cancer is the second most commonly diagnosed cancer in women throughout the world. It is on the rise, especially in developing countries, where the majority of cases are discovered late. Breast cancer develops when cancerous tumors form on the surface of the breast cells. The absence of accurate prognostic models to assist physicians recognize symptoms early makes it difficult to develop a treatment plan that would help patients live longer. However, machine learning techniques have recently been used to improve the accuracy and speed of breast cancer diagnosis. If the accuracy is flawless, the model will be more efficient, and the solution to breast cancer diagnosis will be better. Nevertheless, the primary difficulty for systems developed to detect breast cancer using machine-learning models is attaining the greatest classification accuracy and picking the most predictive feature useful for increasing accuracy. As a result, breast cancer prognosis remains a difficulty in today's society. This research seeks to address a flaw in an existing technique that is unable to enhance classification of continuous-valued data, particularly its accuracy and the selection of optimal features for breast cancer prediction. In order to address these issues, this study examines the impact of outliers and feature reduction on the Wisconsin Diagnostic Breast Cancer Dataset, which was tested using seven different machine learning algorithms. The results show that Logistic Regression, Random Forest, and Adaboost classifiers achieved the greatest accuracy of 99.12%, on removal of outliers from the dataset. Also, this filtered dataset with feature selection, on the other hand, has the greatest accuracy of 100% and 99.12% with Random Forest and Gradient boost classifiers, respectively. When compared to other state-of-the-art approaches, the two suggested strategies outperformed the unfiltered data in terms of accuracy. The suggested architecture might be a useful tool for radiologists to reduce the number of false negatives and positives. As a result, the efficiency of breast cancer diagnosis analysis will be increased.


2019 ◽  
Vol 8 (4) ◽  
pp. 6055-6058

Handwriting recognition is a challenging machine learning task. Handwritten Recognition (HR) systems have become commercially popular due to their potential applications. The challenges that arise due to wide range of variations in shape, structure ,size and individual writing style can be handled with the combination of a powerful feature extraction technique and an efficient classifier. In this paper, an attempt has been made to compare four different feature extraction cum classifier schemes for English handwritten numeral recognition in terms of computational time and accuracy of recognition. Observations show that single decision tree requires less computation time while SVM yields better accuracy.


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

This paper proposes a novel hybrid framework with BWO based feature reduction technique which combines the merits of both machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises due to noisy, irrelevant and unique features present in the extracted features from proposed approach, which can be eliminated by adopting an effective feature reduction technique. In our proposed BWO approach, without changing the accuracy (90%), the feature-set size is reduced up to 43%. The proposed feature selection technique outperforms other commonly used PSO and GAbased feature selection techniques with reduced computation time of 21 sec. Moreover, our sentiment analysis approach is analysed using performance metrices such as precision, recall, F-measure, and computation time. Many organizations can use these online reviews to make well-informed decisions towards the users’ interests and preferences to enhance customer satisfaction, product quality and to find the aspects to improve the products, thereby to generate more profits.


2021 ◽  
Vol 9 (1) ◽  
pp. 60-74
Author(s):  
Derry Pramono Adi ◽  
Lukman Junaedi ◽  
Frismanda ◽  
Agustinus Bimo Gumelar ◽  
Andreas Agung Kristanto

Initially, the goal of Machine Learning (ML) advancements is faster computation time and lower computation resources, while the curse of dimensionality burdens both computation time and resource. This paper describes the benefits of the Feature Selection Algorithms (FSA) for speech data under workload stress. FSA contributes to reducing both data dimension and computation time and simultaneously retains the speech information. We chose to use the robust Evolutionary Algorithm, Harmony Search, Principal Component Analysis, Genetic Algorithm, Particle Swarm Optimization, Ant Colony Optimization, and Bee Colony Optimization, which are then to be evaluated using the hierarchical machine learning models. These FSAs are explored with the conversational workload stress data of a Customer Service hotline, which has daily complaints that trigger stress in speaking. Furthermore, we employed precisely 223 acoustic-based features. Using Random Forest, our evaluation result showed computation time had improved 3.6 faster than the original 223 features employed. Evaluation using Support Vector Machine beat the record with 0.001 seconds of computation time.


Sign in / Sign up

Export Citation Format

Share Document