ATS Drug Classification by Using Whale Optimization Based Descriptors

Author(s):  
Chiranjeevi Karri ◽  
M.S.R. Naidu ◽  
Vuppula Manohar ◽  
B. Suribabu Naick ◽  
G Rameshbabu

Swarm intelligence (SI) has become a preferred choice for improving wrapper feature selection techniques. This research applies a binary whale optimization algorithm (BWOA) to the molecular descriptor selection problem in amphetamine-type stimulant (ATS) drug classification. The aim is to improve the classifier's learning and prediction abilities in order to produce better classification results. Binary solutions in BWOA are generated using S-shaped transfer functions and then evaluated with a k-Nearest Neighbor (k-NN) classifier in the wrapper feature selection loop. Our goal is to see how different sigmoid transfer functions affect significant feature selection and classification in BWOA. For performance assessment, several indicators and Wilcoxon's rank-sum test are used. According to the experimental data, BWOA-S3 delivers performance improvements with the lowest fitness value, fast convergence, good classification accuracy, and a compact feature subset. Three distinct classifiers also confirm the generalization of the best feature subset.
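As a sketch of the binarization step this abstract describes, an S-shaped (sigmoid) transfer function maps each continuous whale position value to a selection probability for the corresponding descriptor. The function and threshold scheme below follow the common BWOA recipe and are illustrative assumptions, not the paper's exact S3 variant:

```python
import math
import random

def s_shaped_tf(x):
    """Classic S-shaped (sigmoid) transfer function: maps a continuous
    position value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng):
    """Convert a continuous whale position vector into a binary feature
    mask: descriptor d is selected (1) when a uniform draw falls below
    its transfer probability."""
    return [1 if rng.random() < s_shaped_tf(x) else 0 for x in position]

rng = random.Random(42)
mask = binarize([2.0, -2.0, 0.0, 5.0], rng)
print(mask)  # a 0/1 mask over the four descriptors
```

Large positive position values make selection likely; large negative ones make it unlikely.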

2017 ◽  
Vol 25 (4) ◽  
pp. 103-124 ◽  
Author(s):  
Le Nguyen Bao ◽  
Dac-Nhuong Le ◽  
Gia Nhu Nguyen ◽  
Le Van Chung ◽  
Nilanjan Dey

Feature selection is an important step which can affect the performance of a face recognition system. In this paper, the authors propose a novel Max-Min Ant System algorithm for optimal feature selection based on Discrete Wavelet Transform features for video-based face recognition. The length of the culled feature vector is adopted as heuristic information for the ants' pheromone in their algorithm. They selected the optimal feature subset in terms of the shortest feature length and the best performance of a k-nearest neighbor classifier. Experiments on face recognition show that the authors' algorithm can be easily implemented without any a priori information about the features. The evaluated performance of their algorithm is better than that of previous feature selection approaches.
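The trade-off this abstract describes (shortest feature length plus best classifier performance) is commonly collapsed into a single wrapper fitness value that the search minimizes. The weighting alpha=0.99 below is a conventional assumption in the wrapper-FS literature, not a value taken from the paper:

```python
def wrapper_fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Common wrapper-FS objective: weighted trade-off between the
    classifier's error rate and the relative size of the feature subset.
    Lower is better; alpha weights accuracy over compactness."""
    error = 1.0 - accuracy
    return alpha * error + (1.0 - alpha) * (n_selected / n_total)

# A 10-feature subset at 95% accuracy beats a 40-feature subset at the
# same accuracy, because the size penalty breaks the tie.
f_small = wrapper_fitness(0.95, 10, 100)
f_large = wrapper_fitness(0.95, 40, 100)
print(f_small < f_large)  # True
```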


2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

This research presents an approach to the feature selection problem for sentiment classification that uses an ensemble-based classifier. It combines the minimum redundancy maximum relevance (mRMR) technique with the Forest Optimization Algorithm (FOA) into a hybrid feature selection method (mRMR-FOA). Before applying it to sentiment analysis, FOA was validated as a feature selection technique on 10 classification datasets publicly available in the UCI machine learning repository, using an ensemble of k-Nearest Neighbor (k-NN), Support Vector Machine (SVM) and Naïve Bayes classifiers. The mRMR-FOA then selects the significant features from Blitzer's dataset (customer reviews on electronic products). Sentiment classification is observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB and SVM, reaching an accuracy of 88.47% on the sentiment classification task.
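The mRMR criterion the hybrid builds on can be sketched as a greedy selection over precomputed mutual-information estimates: at each step, pick the feature most relevant to the class after penalizing redundancy with what is already selected. The toy relevance/redundancy values below are illustrative, not from the paper:

```python
def mrmr_select(relevance, redundancy, k):
    """Greedy mRMR: repeatedly pick the unselected feature maximizing
    relevance to the class minus mean redundancy with already-selected
    features. relevance[i] and redundancy[i][j] are assumed precomputed
    mutual-information estimates."""
    n = len(relevance)
    selected = [max(range(n), key=lambda i: relevance[i])]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        def score(i):
            red = sum(redundancy[i][j] for j in selected) / len(selected)
            return relevance[i] - red
        selected.append(max(remaining, key=score))
    return selected

relevance = [0.9, 0.8, 0.3]
redundancy = [[0.0, 0.7, 0.1],
              [0.7, 0.0, 0.1],
              [0.1, 0.1, 0.0]]
# Feature 1 is highly relevant but redundant with feature 0, so the
# weakly relevant, non-redundant feature 2 is chosen second.
print(mrmr_select(relevance, redundancy, 2))  # [0, 2]
```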


2019 ◽  
Vol 10 (3) ◽  
pp. 667-678 ◽  
Author(s):  
Jalil Nourmohammadi-Khiarak ◽  
Mohammad-Reza Feizi-Derakhshi ◽  
Khadijeh Behrouzi ◽  
Samaneh Mazaheri ◽  
Yashar Zamani-Harghalani ◽  
...  

Abstract The number and size of medical databases are rapidly increasing, and advanced data mining techniques could help physicians make efficient and applicable decisions. The challenges of heart disease data include feature selection, the number of samples, imbalance of the samples, missing values for some features, etc. This study mainly focuses on improving feature selection and decreasing the number of features. An imperialist competitive algorithm, a meta-heuristic approach, is suggested for selecting prominent features of heart disease. Compared with genetic and other optimization algorithms, this algorithm can provide a more optimal solution for feature selection. The K-nearest neighbor algorithm is used for classification. Evaluation results show that the proposed algorithm improves the accuracy of the feature selection technique.
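The assimilation move at the heart of the imperialist competitive algorithm can be sketched as follows; beta = 2 is the conventional setting, and treating the vectors as candidate feature weightings is an assumption, since the paper's exact encoding is not given here:

```python
import random

def assimilate(colony, imperialist, rng, beta=2.0):
    """ICA assimilation step: each colony (candidate solution) moves a
    random fraction of the distance toward its imperialist; with beta=2
    it may also overshoot, which aids exploration."""
    return [c + beta * rng.random() * (i - c)
            for c, i in zip(colony, imperialist)]

rng = random.Random(7)
new_pos = assimilate([0.0, 0.0], [1.0, 1.0], rng)
print(new_pos)  # each coordinate moved toward (possibly past) the imperialist
```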


2020 ◽  
Vol 39 (5) ◽  
pp. 6205-6216
Author(s):  
Ramazan Algin ◽  
Ali Fuat Alkaya ◽  
Mustafa Agaoglu

Feature selection (FS) has become an essential task in overcoming high-dimensional and complex machine learning problems. FS is a process used for reducing the size of the dataset by removing unnecessary and unrelated properties from it. This process improves the performance of classification algorithms and reduces the evaluation time by enabling the use of small-sized datasets with useful features during the classification process. FS aims to obtain a minimal feature subset in a problem domain while retaining the accuracy of the original data. In this study, four computational intelligence techniques, namely migrating birds optimization (MBO), simulated annealing (SA), differential evolution (DE) and particle swarm optimization (PSO), are implemented as search algorithms for the FS problem and compared on 17 well-known datasets taken from the UCI machine learning repository, where the dimension of the tackled datasets varies from 4 to 500. This is the first time that MBO has been applied to solving the FS problem. In order to judge the quality of the subsets generated by the search algorithms, two different subset evaluation methods are implemented in this study: probabilistic consistency-based FS (PCFS) and correlation-based FS (CFS). Performance comparison of the algorithms is done by using three well-known classifiers: k-nearest neighbor, naive Bayes and decision tree (C4.5). As a benchmark, the accuracy values found by the classifiers using the datasets with all features are used. Results of the experiments show that our MBO-based filter approach outperforms the other three approaches in terms of accuracy values. In the experiments, it is also observed that CFS outperforms PCFS as a subset evaluator, and C4.5 gets better results as a classifier when compared to k-nearest neighbor and naive Bayes.
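The CFS evaluator mentioned above scores a candidate subset with Hall's merit formula, rewarding strong feature-class correlation and weak inter-feature correlation. A minimal sketch, with the correlation averages assumed to be precomputed:

```python
import math

def cfs_merit(k, mean_feat_class_corr, mean_feat_feat_corr):
    """Correlation-based FS merit (Hall's formula): a subset of k
    features scores higher when features correlate strongly with the
    class but weakly with each other."""
    return (k * mean_feat_class_corr) / math.sqrt(
        k + k * (k - 1) * mean_feat_feat_corr)

# Adding redundancy (higher inter-feature correlation) lowers merit even
# at the same feature-class correlation.
print(cfs_merit(5, 0.6, 0.1))
print(cfs_merit(5, 0.6, 0.8))
```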


2019 ◽  
Vol 29 (1) ◽  
pp. 1453-1467 ◽  
Author(s):  
Ritam Guha ◽  
Manosij Ghosh ◽  
Pawan Kumar Singh ◽  
Ram Sarkar ◽  
Mita Nasipuri

Abstract The feature selection process is very important in the field of pattern recognition: it selects the informative features so as to reduce the curse of dimensionality, thus improving the overall classification accuracy. In this paper, a new feature selection approach named Memory-Based Histogram-Oriented Multi-objective Genetic Algorithm (M-HMOGA) is introduced to identify the informative feature subset to be used for a pattern classification problem. The proposed M-HMOGA approach is applied to two recently used feature sets, namely Mojette transform and Regional Weighted Run Length features. The experiments are carried out on Bangla, Devanagari, and Roman numeral datasets, which are the three most popular scripts used in the Indian subcontinent. In-house Bangla and Devanagari script datasets and the Competition on Handwritten Digit Recognition (HDRC) 2013 Roman numeral dataset are used for evaluating our model. Moreover, as proof of robustness, we have applied an innovative approach of using different datasets for training and testing: the in-house Bangla and Devanagari script datasets are used for training, and the trained model is then tested on Indian Statistical Institute numeral datasets. For Roman numerals, we have used the HDRC 2013 dataset for training and the Modified National Institute of Standards and Technology dataset for testing. Comparison of the results obtained by the proposed model with existing HMOGA and MOGA techniques clearly indicates the superiority of M-HMOGA over both of its ancestors. Moreover, the use of K-nearest neighbor as well as multi-layer perceptron classifiers speaks for the classifier-independent nature of M-HMOGA. Compared with using all features for classification, the proposed M-HMOGA model achieves around a 1% increase in classification ability using only about 45–50% of the total feature set when the same datasets are partitioned for training and testing, and a 2–3% increase using only 35–45% of the features when different datasets are used for training and testing.


Extraction and analysis of public opinions from social network data can provide interesting outcomes and inferences about a product, service, event or personality. Twitter is one of the most popular platforms for analyzing public sentiment through user tweets. Feature-specific opinion analysis provides highly accurate and effective classification and categorization of public opinions. This paper focuses on developing an opinion mining framework for automated analysis of tweet opinions using efficient feature selection and classification algorithms. For this purpose, an Improved Dolphin Echolocation Algorithm (IDEA) is developed by enhancing the optimization performance of the Dolphin Echolocation Algorithm (DEA). DEA's limitations are its insufficient exploration and exploitation, which trap it in local optima and slow its convergence rate; the proposed IDEA overcomes these shortcomings. In this work, the tweets are first collected and pre-processed to extract features using Part-of-Speech (POS) tagging and n-grams aided by a dictionary. Using IDEA, the feature subset candidates are selected, and the outcomes are fed as input to the baseline classifiers to obtain highly accurate opinion classification. The evaluation of the k-Nearest Neighbor (KNN), Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers using the two feature selection approaches, DEA and IDEA, is performed on cancer and drug tweet datasets, and the results illustrate that the classification accuracy of opinions is enhanced significantly through IDEA-based feature selection compared with the traditional DEA algorithm. These results justify the use of the proposed IDEA algorithm for improving opinion mining applications in different fields.
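The n-gram step of the pre-processing pipeline described above can be sketched in a few lines; the tokenized tweet is a made-up example:

```python
def ngrams(tokens, n):
    """Extract contiguous n-grams from a token list, a standard step in
    tweet feature extraction before feature selection."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tweet = "this drug worked really well".split()
print(ngrams(tweet, 2))
# ['this drug', 'drug worked', 'worked really', 'really well']
```

In practice these bigrams (together with POS tags) would form the candidate feature pool that IDEA searches over.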


2021 ◽  
Vol 11 (21) ◽  
pp. 10237
Author(s):  
Thaer Thaher ◽  
Atef Zaguia ◽  
Sana Al Azwari ◽  
Majdi Mafarja ◽  
Hamouda Chantar ◽  
...  

The students’ performance prediction (SPP) problem is a challenging problem that managers face at any institution. Collecting educational quantitative and qualitative data from many resources such as exam centers, virtual courses, e-learning educational systems, and other resources is not a simple task. Even after collecting data, we might face imbalanced data, missing data, biased data, and different data types such as strings, numbers, and letters. One of the most common challenges in this area is the large number of attributes (features); determining the most valuable features is needed to improve the overall students’ performance. This paper proposes an evolutionary-based SPP model utilizing an enhanced form of the Whale Optimization Algorithm (EWOA) as a wrapper feature selection method to keep the most informative features and enhance the prediction quality. The proposed EWOA combines the Whale Optimization Algorithm (WOA) with the Sine Cosine Algorithm (SCA) and a Logistic Chaotic Map (LCM) to improve the overall performance of WOA. The SCA empowers the exploitation process inside WOA and minimizes the probability of getting stuck in local optima; the main idea is to enhance the worst half of the population in WOA using SCA. In addition, the LCM strategy is employed to control the population diversity and improve the exploration process. We handled the imbalanced data using the Adaptive Synthetic (ADASYN) sampling technique and converted WOA to a binary variant by employing transfer functions (TFs) belonging to different families (S-shaped and V-shaped). Two real educational datasets are used, and five different classifiers are employed: Decision Trees (DT), k-Nearest Neighbors (k-NN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), and LogitBoost (LB). The obtained results show that the LDA classifier is the most reliable classifier on both datasets. In addition, the proposed EWOA with the selected transfer functions outperforms other wrapper feature selection methods in the literature.
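The SCA move used to refresh the worst half of the WOA population can be sketched as follows; parameter ranges follow the standard SCA description (a = 2), and the low-dimensional positions are illustrative assumptions:

```python
import math
import random

def sca_update(position, best, t, t_max, rng):
    """One Sine Cosine Algorithm move toward the best solution, as might
    be applied to the worst half of the WOA population. r1 shrinks over
    iterations, shifting the move from exploration to exploitation."""
    r1 = 2.0 * (1.0 - t / t_max)
    new = []
    for x, b in zip(position, best):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        r4 = rng.random()
        step = r1 * (math.sin(r2) if r4 < 0.5 else math.cos(r2)) * abs(r3 * b - x)
        new.append(x + step)
    return new

rng = random.Random(0)
moved = sca_update([0.0, 0.0], [1.0, 1.0], t=1, t_max=100, rng=rng)
print(moved)
```

At the final iteration (t = t_max), r1 reaches zero and positions freeze, which is the mechanism that hands control back to pure exploitation.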


2015 ◽  
Vol 83 ◽  
pp. 81-91 ◽  
Author(s):  
Aiguo Wang ◽  
Ning An ◽  
Guilin Chen ◽  
Lian Li ◽  
Gil Alterovitz
