Mobile Phone Price Class Prediction Using Different Classification Algorithms with Feature Selection and Parameter Optimization

Rapid advances in information and communication technology have made ubiquitous computing and the Internet of Things popular and practicable. These applications generate enormous volumes of data, which can be analyzed and classified to support decision-making. Among the techniques used to handle big data in classification, feature selection has proven particularly effective. One common approach searches for the subset of features that is most relevant to the topic or that most accurately describes the dataset. Unfortunately, searching for such a subset is a combinatorial problem that can be very time-consuming. Metaheuristic algorithms are commonly used to facilitate feature selection. The artificial fish swarm algorithm (AFSA) uses the intelligence underlying fish swarming behavior to solve combinatorial optimization problems. AFSA has proven highly successful in a variety of applications; however, shortcomings remain, such as the likelihood of falling into a local optimum and a lack of multiplicity. This study proposes a modified AFSA (MAFSA) to improve feature selection and parameter optimization for support vector machine classifiers. Experimental results on the given UCI datasets demonstrate that MAFSA achieves higher classification accuracy with smaller feature subsets than the original AFSA.
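
To make the combinatorial cost mentioned above concrete, the following minimal sketch counts the non-empty feature subsets and runs a brute-force search over them. It is not the MAFSA method from the abstract; `toy_score` is a hypothetical stand-in for a classifier's accuracy, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def count_feature_subsets(n_features: int) -> int:
    """Number of non-empty feature subsets: 2^n - 1."""
    return 2 ** n_features - 1


def exhaustive_search(n_features, score):
    """Score every non-empty subset and return the best (small n only)."""
    best_subset, best_score = None, float("-inf")
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            value = score(subset)
            if value > best_score:
                best_subset, best_score = subset, value
    return best_subset, best_score


def toy_score(subset):
    # Hypothetical stand-in for classifier accuracy: features 0 and 2 are
    # treated as informative, and each selected feature costs a small penalty.
    return len(set(subset) & {0, 2}) - 0.1 * len(subset)


best_subset, _ = exhaustive_search(4, toy_score)
```

With 20 features there are already 1,048,575 candidate subsets, which is why metaheuristic search is usually preferred over enumeration.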

A Survey on Phishing Detection and The Importance of Feature Selection In Data Mining Classification Algorithms

Issue 4 - Journal of Science and Technology ◽

10.46243/jst.2020.v5.i6.pp11-18 ◽

2020 ◽

pp. 11-18

Keyword(s):

Data Mining ◽

Feature Selection ◽

Support Vector ◽

Classification Algorithms ◽

End User ◽

Preparation Methods ◽

Survey Paper ◽

Vector Machines ◽

Feature Selection Techniques ◽

Phishing Detection

In this era of the Internet, information security is a pressing concern. One of the main threats in this cyber world is phishing, an email or website fraud method that targets a genuine webpage or email and compromises it without the consent of the end user. There are various techniques that help classify whether a website or an email is legitimate or fake. The major contributors to detecting these phishing frauds include classification algorithms, feature selection techniques or dataset preparation methods, and feature extraction, which plays an important role in both detecting and preventing these attacks. This survey paper studies the effect of all these contributors and the approaches applied in recent papers. The classification algorithms implemented include Decision Tree, Random Forest, Support Vector Machines, Logistic Regression, Lazy K Star, Naive Bayes, and J48.

A Review on Time-domain Peak Detection and Classification Algorithms for Electroencephalogram Signals

Mekatronika ◽

10.15282/mekatronika.v1i2.4995 ◽

2019 ◽

Vol 1 (2) ◽

pp. 115-121

Author(s):

Asrul Adam ◽

Ammar Faiz Zainal Abidin ◽

Zulkifli Md Yusof ◽

Norrima Mokhtar ◽

Mohd Ibrahim Shapiai

Keyword(s):

Feature Selection ◽

Time Domain ◽

Time Domain Analysis ◽

Domain Analysis ◽

Peak Detection ◽

Classification Algorithm ◽

Classification Algorithms ◽

Classification Methods ◽

Eeg Signals ◽

Selection Algorithms

This paper discusses developments in peak detection and classification methods for EEG signals based on time-domain analysis. Peak classification algorithms have become the most significant approach in several applications. Generally, peak detection and classification is the first step in detecting any event related to variations in the signal. The paper reviews the variety of peak models together with their respective classification methods and applications. In addition, it discusses existing feature selection algorithms in the field of peak classification.

Development of a Machine Learning Model for Optimal Applicator Selection in High-Dose-Rate Cervical Brachytherapy

Frontiers in Oncology ◽

10.3389/fonc.2021.611437 ◽

2021 ◽

Vol 11 ◽

Author(s):

Kailyn Stenhouse ◽

Michael Roumeliotis ◽

Robyn Banerjee ◽

Svetlana Yanushkevich ◽

Philip McGeachy

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dose Rate ◽

High Dose Rate ◽

Gradient Boosting ◽

Classification Algorithms ◽

High Dose ◽

Discriminative Performance ◽

The Individual ◽

Voting Model

Purpose: To develop and validate a preliminary machine learning (ML) model aiding in the selection of intracavitary (IC) versus hybrid interstitial (IS) applicators for high-dose-rate (HDR) cervical brachytherapy.

Methods: From a dataset of 233 treatments using IC or IS applicators, a set of geometric features of the structure set were extracted, including the volumes of OARs (bladder, rectum, sigmoid colon) and HR-CTV, proximity of OARs to the HR-CTV, mean and maximum lateral and vertical HR-CTV extent, and offset of the HR-CTV centre-of-mass from the applicator tandem axis. Feature selection using an ANOVA F-test and mutual information removed uninformative features from this set. Twelve classification algorithms were trained and tested over 100 iterations to determine the highest performing individual models through nested 5-fold cross-validation. Three models with the highest accuracy were combined using soft voting to form the final model. This model was trained and tested over 1,000 iterations, during which the relative importance of each feature in the applicator selection process was determined.

Results: Feature selection indicated that the mean and maximum lateral and vertical extent, volume, and axis offset of the HR-CTV were the most informative features and were thus provided to the ML models. Relative feature importances indicated that the HR-CTV volume and mean lateral extent were most important for applicator selection. From the comparison of the individual classification algorithms, it was found that the highest performing algorithms were tree-based ensemble methods – AdaBoost Classifier (ABC), Gradient Boosting Classifier (GBC), and Random Forest Classifier (RFC). The accuracy of the individual models was compared to the voting model for 100 iterations (ABC = 91.6 ± 3.1%, GBC = 90.4 ± 4.1%, RFC = 89.5 ± 4.0%, Voting Model = 92.2 ± 1.8%) and the voting model was found to have superior accuracy. Over the final 1,000 evaluation iterations, the final voting model demonstrated a high predictive accuracy (91.5 ± 0.9%) and F1 Score (90.6 ± 1.1%).

Conclusion: The presented model demonstrates high discriminative performance, highlighting the potential for utilization in informing applicator selection prospectively following further clinical validation.
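
The soft-voting step described in this abstract can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation: the function name `soft_vote` and the example probabilities are assumptions, and the class order (index 0 = IC, index 1 = IS) is chosen only for the example.

```python
def soft_vote(probabilities):
    """Average per-class probabilities across models and pick the top class.

    probabilities: one list of class probabilities per model, all covering
    the same classes in the same order.
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [
        sum(model[c] for model in probabilities) / n_models
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged


# Made-up outputs of three models for one case (index 0 = IC, index 1 = IS).
model_outputs = [
    [0.55, 0.45],  # leans slightly towards class 0
    [0.55, 0.45],  # leans slightly towards class 0
    [0.10, 0.90],  # strongly favours class 1
]
label, averaged = soft_vote(model_outputs)
```

In this made-up case a simple majority vote would pick class 0, while soft voting picks class 1 because the third model's confidence outweighs the two marginal votes. Libraries such as scikit-learn provide the same behaviour through `VotingClassifier(voting="soft")`.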

Evolutionary Machine Learning for Classification with Incomplete Data

10.26686/wgtn.17072123 ◽

2021 ◽

Author(s):

Cao Truong Tran

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Genetic Programming ◽

Incomplete Data ◽

Missing Values ◽

Machine Learning Techniques ◽

Feature Construction ◽

Classification Algorithms ◽

Learning Techniques ◽

Effectiveness And Efficiency

Classification is a major task in machine learning and data mining. Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be handled carefully because inadequate treatment of missing values causes large classification errors. Most existing research on classification with incomplete data has focused on improving effectiveness, but has not adequately addressed the efficiency of applying classifiers to unseen instances, which is much more important than the act of creating classifiers. A common approach to classification with incomplete data is to use imputation methods to replace missing values with plausible values before building classifiers and classifying unseen instances. This approach provides complete data that can then be used by any classification algorithm, but sophisticated imputation methods are usually computationally intensive, especially in the application phase of classification. Another approach is to build a classifier that can work directly with missing values. This approach requires no time for estimating missing values, but it often generates inaccurate and complex classifiers when faced with numerous missing values. A recent approach, which also avoids estimating missing values, is to build a set of classifiers from which applicable classifiers are selected to classify unseen instances. However, this approach is also often inaccurate and takes a long time to find applicable classifiers when faced with numerous missing values. The overall goal of the thesis is to simultaneously improve the effectiveness and efficiency of classification with incomplete data by using evolutionary machine learning techniques for feature selection, clustering, ensemble learning, feature construction and constructing classifiers.
The thesis develops approaches for improving imputation for classification with incomplete data by integrating clustering and feature selection with imputation. The approaches improve both the effectiveness and the efficiency of using imputation for classification with incomplete data. The thesis develops wrapper-based feature selection methods to improve the input space for classification algorithms that are able to work directly with incomplete data. The methods not only improve classification accuracy, but also reduce the complexity of classifiers able to work directly with incomplete data. The thesis develops a feature construction method to improve the input space for classification algorithms with incomplete data by proposing interval genetic programming, that is, genetic programming with a set of interval functions. The method improves classification accuracy and reduces the complexity of classifiers. The thesis develops an ensemble approach to classification with incomplete data by integrating imputation, feature selection, and ensemble learning. The results show that the approach is more accurate and faster than previous common methods for classification with incomplete data. The thesis develops interval genetic programming to directly evolve classifiers for incomplete data. The results show that classifiers generated by interval genetic programming can be more effective and efficient than classifiers generated by the combination of imputation and traditional genetic programming. Interval genetic programming is also more effective than common classification algorithms able to work directly with incomplete data. In summary, the thesis develops a range of approaches for simultaneously improving the effectiveness and efficiency of classification with incomplete data by using a range of evolutionary machine learning techniques.
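
As a minimal sketch of the imputation approach mentioned in this abstract, the snippet below fills each missing value with the mean of the observed values in the same feature. Mean imputation is used here only as the simplest example; it is not one of the methods developed in the thesis, and the function name `mean_impute` and the use of `None` for missing values are assumptions.

```python
def mean_impute(rows):
    """Replace None entries with the mean of the observed values per column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


incomplete = [
    [1.0, None],
    [3.0, 4.0],
    [None, 8.0],
]
complete = mean_impute(incomplete)
```

The completed table can be passed to any standard classifier. Note that the column means should be estimated on the training data and reused when imputing unseen instances.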

Classification of Imaginary motor task from Electroencephalographic Signals: A Comparison of Feature Selection Methods and Classification Algorithms

10.17488/rmib.39.1.8 ◽

2017 ◽

Author(s):

H. J. Vélez-Lora

Keyword(s):

Feature Selection ◽

Motor Task ◽

Classification Algorithms ◽

Selection Methods ◽

Electroencephalographic Signals

Fault diagnosis of bearing based on relevance vector machine classifier with improved binary bat algorithm for feature selection and parameter optimization

Advances in Mechanical Engineering ◽

10.1177/1687814016685294 ◽

2017 ◽

Vol 9 (1) ◽

pp. 168781401668529 ◽

Cited By ~ 4

Author(s):

Sheng-wei Fei

Keyword(s):

Feature Selection ◽

Fault Diagnosis ◽

Parameter Optimization ◽

Bat Algorithm ◽

Relevance Vector Machine ◽

Experimental Results ◽

Machine Method ◽

Training Samples ◽

Kernel Parameter

This article proposes a bearing fault diagnosis method based on a relevance vector machine classifier with an improved binary bat algorithm, in which the improved binary bat algorithm selects the appropriate features and the kernel parameter of the relevance vector machine. The improved binary bat algorithm introduces a new velocity-updating method for the bats. It decreases the probability of changing an element of a bat's position vector when that element equals the corresponding element of the current best location, and increases the probability when the element differs, which helps strengthen the optimization ability of the binary bat algorithm. A traditional relevance vector machine trained on samples with the unreduced features serves as the baseline for comparison with the proposed improved binary bat algorithm–relevance vector machine method. The experimental results indicate that the improved binary bat algorithm–relevance vector machine has a stronger bearing fault diagnosis ability than the traditional relevance vector machine trained on samples with the unreduced features, and that bearing fault diagnosis based on the improved binary bat algorithm–relevance vector machine is feasible.
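
The idea stated in this abstract, a lower chance of changing a position bit that already matches the current best solution and a higher chance when it differs, can be illustrated with a simple flip-probability rule. This is not the article's velocity-updating formula, which the abstract does not give; the function names and the probabilities `p_same` and `p_diff` are hypothetical.

```python
import random


def flip_probability(bit, best_bit, p_same=0.1, p_diff=0.6):
    """Hypothetical rule: bits that match the best solution flip less often."""
    return p_same if bit == best_bit else p_diff


def update_position(position, best, rng, p_same=0.1, p_diff=0.6):
    """Flip each bit of a binary position vector with its own probability."""
    return [
        1 - bit
        if rng.random() < flip_probability(bit, best_bit, p_same, p_diff)
        else bit
        for bit, best_bit in zip(position, best)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
bat = [1, 0, 0, 1, 1]   # 1 = feature selected, 0 = feature dropped
best = [1, 1, 0, 0, 1]
new_bat = update_position(bat, best, rng)
```

Each bit encodes whether a feature is selected, so the rule pulls a bat towards the best feature subset found so far while still allowing occasional exploration.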
