Comparative analysis of proficiencies of various textures and geometric features in breast mass classification using k-nearest neighbor

Author(s):  
Harmandeep Singh ◽  
Vipul Sharma ◽  
Damanpreet Singh

Abstract: This paper presents a comparative analysis of the proficiencies of various texture and geometric features in the diagnosis of breast masses on mammograms. An improved machine learning-based framework was developed for this study. The proposed system was tested on 106 full-field digital mammography images from the INbreast dataset, containing a total of 115 breast mass lesions. The proficiencies of individual features and of various combinations of the computed texture and geometric features were investigated by evaluating their contributions towards attaining higher classification accuracies. Four state-of-the-art filter-based feature selection algorithms (Relief-F, Pearson correlation coefficient, neighborhood component analysis, and term variance) were employed to select the top 20 most discriminative features. The Relief-F algorithm outperformed the other feature selection algorithms, reporting 85.2% accuracy, 82.0% sensitivity, and 88.0% specificity. A set of the nine most discriminative features was then selected from these 20 features through further simulations. The classification performances of six state-of-the-art machine learning classifiers, namely k-nearest neighbor (k-NN), support vector machine, decision tree, naive Bayes, random forest, and ensemble tree, were investigated; the best classification results (accuracy = 90.4%, sensitivity = 92.0%, specificity = 88.0%) were obtained with the k-NN classifier using k = 5 neighbors and squared inverse distance weighting. The key finding is the identification of the nine most discriminative features, namely FD26 (Fourier descriptor), Euler number, solidity, mean, FD14, FD13, periodicity, skewness, and contrast, out of a pool of 125 texture and geometric features.
These results indicate that the selected nine features can be used for the classification of breast masses in mammograms.
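The best-performing configuration reported above (k-NN, k = 5, squared inverse distance weighting) can be sketched in scikit-learn as follows. This is a minimal illustration, not the authors' pipeline: synthetic stand-in data replaces the INbreast features, which are not available from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def squared_inverse(distances):
    # Weight each neighbor by 1 / d^2; a small epsilon avoids division by zero.
    return 1.0 / (distances ** 2 + 1e-12)

# Synthetic stand-in for the 115 lesions described by 9 selected features.
X, y = make_classification(n_samples=115, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, weights=squared_inverse)
knn.fit(X_train, y_train)
print(f"test accuracy: {knn.score(X_test, y_test):.2f}")
```

`KNeighborsClassifier` accepts a callable for `weights`, which is how a squared-inverse scheme (rather than the built-in `'distance'` weighting) is expressed.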

Author(s):  
Sandy C. Lauguico ◽  
Ronnie S. Concepcion II ◽  
Jonnel D. Alejandrino ◽  
Rogelio Ruzcko Tobias ◽  
...  

The rising problem of food scarcity is driving innovation in urban farming. One urban farming method is smart aquaponics. However, for smart aquaponics to yield crops successfully, it needs intensive monitoring, control, and automation. An efficient way of implementing this is to use vision systems and machine learning algorithms to optimize the capabilities of the farming technique. To realize this, a comparative analysis of three machine learning estimators was conducted: Logistic Regression (LR), K-Nearest Neighbor (KNN), and Linear Support Vector Machine (L-SVM). This was done by modeling each algorithm on machine vision features extracted from images of lettuce raised in a smart aquaponics setup. Each model was optimized to improve its cross-validation and hold-out validation performance. The results showed that KNN, with tuned hyperparameters n_neighbors=24, weights='distance', algorithm='auto', and leaf_size=10, was the most effective model for the given dataset, yielding a cross-validation mean accuracy of 87.06% and a classification accuracy of 91.67%.
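The three-estimator comparison can be sketched with scikit-learn as below. The KNN hyperparameters are the ones reported in the abstract; the placeholder data standing in for the lettuce image features is an assumption, since the dataset is not available here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Placeholder data standing in for the machine-vision features of lettuce.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=24, weights="distance",
                                algorithm="auto", leaf_size=10),
    "L-SVM": LinearSVC(max_iter=5000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: cross-validation mean accuracy = {scores.mean():.4f}")
```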


2014 ◽  
Vol 701-702 ◽  
pp. 110-113
Author(s):  
Qi Rui Zhang ◽  
He Xian Wang ◽  
Jiang Wei Qin

This paper reports a comparative study of feature selection algorithms on a hyperlipidemia data set. Three feature selection methods were evaluated: document frequency (DF), information gain (IG), and the χ² statistic (CHI). The classification systems represent each document as a vector and use tfidfie (term frequency, inverted document frequency, and inverted entropy) to compute term weights. To compare the effectiveness of the feature selection methods, we used three classification methods: naive Bayes (NB), k-nearest neighbor (kNN), and support vector machines (SVM). The experimental results show that IG and CHI significantly outperform DF, and that SVM and NB are more effective than kNN when the macro-averaged F1 measure is used. DF is suitable for large-scale text classification tasks.


2021 ◽  
Vol 15 (1) ◽  
pp. 1-20
Author(s):  
Knitchepon Chotchantarakun ◽  
Ohm Sornil

In the past few decades, the large amount of available data has become a major challenge in data mining and machine learning. Feature selection is a significant preprocessing step that selects the most informative features by removing irrelevant and redundant ones, especially for large datasets. These selected features play an important role in information searching and in enhancing the performance of machine learning models. In this research, we propose a new technique called One-level Forward Multi-level Backward Selection (OFMB). The proposed algorithm consists of two phases. The first phase creates preliminary selected subsets. The second phase improves on the previous result with an adaptive multi-level backward searching technique. Hence, the idea is to apply an improvement step during feature addition and an adaptive search method in the backtracking step. We tested our algorithm on twelve standard UCI datasets using k-nearest neighbor and naive Bayes classifiers, and compared its accuracy with some popular methods. OFMB showed better results than the other sequential forward searching techniques for most of the tested datasets.
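For context, the family of methods OFMB improves on is plain sequential forward selection with a classifier wrapper. The sketch below shows that baseline with a k-NN wrapper on a UCI-style dataset; OFMB's adaptive multi-level backward step is not reproduced here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Try adding each remaining feature; keep the one with the best CV score.
    trials = {f: cross_val_score(knn, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best, s_best = max(trials.items(), key=lambda kv: kv[1])
    if s_best <= best_score:
        break  # no remaining feature improves the subset
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print("selected features:", selected, "CV accuracy:", round(best_score, 4))
```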


Author(s):  
Minh Tuan Le ◽  
Minh Thanh Vo ◽  
Nhat Tan Pham ◽  
Son V.T Dao

In the current health system, it is very difficult for medical practitioners/physicians to diagnose the effectiveness of heart contraction. In this research, we propose a machine learning model that predicts heart contraction using an artificial neural network (ANN). We also propose a novel wrapper-based feature selection method utilizing grey wolf optimization (GWO) to reduce the number of required input attributes. We compared the results achieved by our method with several conventional machine learning approaches, such as support vector machine, decision tree, k-nearest neighbor, naive Bayes, random forest, and logistic regression. Computational results show not only that far fewer features are needed, but also that a higher prediction accuracy of around 87% can be achieved. This work has the potential to be applied in clinical practice and to become a supporting tool for doctors/physicians.
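The conventional baselines listed above can be compared with a few lines of scikit-learn. This sketch uses placeholder data in place of the heart-contraction attributes, and the GWO-based wrapper itself is not reproduced:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the heart-contraction attributes.
X, y = make_classification(n_samples=400, n_features=13, random_state=2)

baselines = {
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=2),
    "k-NN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Random forest": RandomForestClassifier(random_state=2),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
for name, clf in baselines.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```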


2020 ◽  
Vol 39 (5) ◽  
pp. 6205-6216
Author(s):  
Ramazan Algin ◽  
Ali Fuat Alkaya ◽  
Mustafa Agaoglu

Feature selection (FS) has become an essential task in overcoming high-dimensional and complex machine learning problems. FS is a process that reduces the size of the dataset by removing unnecessary and unrelated properties from it. This process improves the performance of classification algorithms and reduces the evaluation time by enabling the use of small datasets with useful features during the classification process. FS aims to find a minimal feature subset in a problem domain while retaining the accuracy of the original data. In this study, four computational intelligence techniques, namely migrating birds optimization (MBO), simulated annealing (SA), differential evolution (DE), and particle swarm optimization (PSO), are implemented as search algorithms for the FS problem and compared on 17 well-known datasets taken from the UCI machine learning repository, where the dimension of the tackled datasets varies from 4 to 500. This is the first time that MBO has been applied to the FS problem. To judge the quality of the subsets generated by the search algorithms, two different subset evaluation methods are implemented in this study: probabilistic consistency-based FS (PCFS) and correlation-based FS (CFS). Performance comparison of the algorithms is done using three well-known classifiers: k-nearest neighbor, naive Bayes, and decision tree (C4.5). As a benchmark, the accuracy values found by the classifiers on the datasets with all features are used. Results of the experiments show that our MBO-based filter approach outperforms the other three approaches in terms of accuracy. The experiments also show that CFS outperforms PCFS as a subset evaluator, and that C4.5 gets better results as a classifier than k-nearest neighbor and naive Bayes.
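One of the two subset evaluators mentioned, CFS, scores a candidate subset S by Hall's merit, merit(S) = k·mean(|r_cf|) / sqrt(k + k(k−1)·mean(|r_ff|)), rewarding features correlated with the class but uncorrelated with each other. A minimal sketch with plain Pearson correlations on toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Features 0 and 1 drive the target; 2-4 are pure noise (toy assumption).
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

def cfs_merit(X, y, subset):
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over all pairs in the subset.
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

print("merit of {0,1}:", cfs_merit(X, y, [0, 1]))
print("merit of {3,4}:", cfs_merit(X, y, [3, 4]))
```

A search algorithm such as MBO or SA would then explore subsets to maximize this merit, which is what makes the overall approach a filter rather than a wrapper.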


Author(s):  
Aqliima Aziz ◽  
Cik Feresa Mohd Foozy ◽  
Palaniappan Shamala ◽  
Zurinah Suradi

Social networking platforms such as YouTube and Facebook are very popular nowadays. One of YouTube's best features is that users can subscribe to channels and give their opinions in the comment section. However, this attracts spammers, who spam the comments on videos. Thus, this study develops a YouTube spam detection framework using Support Vector Machine (SVM) and K-Nearest Neighbor (k-NN). Five (5) phases are involved in this research: Data Collection, Pre-processing, Feature Selection, Classification, and Detection. The experiments were conducted using Weka and RapidMiner, and both SVM and k-NN achieved good accuracy with both machine learning tools. Another way to avoid spam attacks is simply not to click links in comments.
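The classification phase of such a framework can be sketched with scikit-learn in place of Weka/RapidMiner. The bag-of-words features and the toy comments below are assumptions for illustration; the paper's own feature set differs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labeled comments (1 = spam), standing in for the collected dataset.
comments = ["great video thanks", "subscribe to my channel free gift",
            "loved this tutorial", "click this link win money",
            "very helpful explanation", "free followers visit my page"]
labels = [0, 1, 0, 1, 0, 1]

for clf in (LinearSVC(), KNeighborsClassifier(n_neighbors=3)):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(comments, labels)
    print(type(clf).__name__, model.predict(["win a free gift click here"]))
```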


Author(s):  
Titin Yulianti ◽  
Mareli Telaumbanua ◽  
Hery Dian Septama ◽  
Helmy Fitriawan ◽  
Afri Yudamson

Identifying beef manually has drawbacks, because human vision has limitations and people differ in how they assess object quality. Several studies have developed beef quality assessment methods based on image feature extraction. However, not all features contribute to classification results with high accuracy. Efficiency is achieved when the classifier analyzes only the relevant features. Therefore, a feature selection process is required to select relevant features and eliminate irrelevant ones, yielding more accurate and faster classification. One feature selection algorithm is the F-score, a simple technique that measures the discrimination between two sets of real numbers. The lowest-ranked features according to the F-score are eliminated one by one until only the most relevant features remain. The evaluation analyzes the classification results in terms of sensitivity, specificity, and accuracy. The results of this research show that, using F-score feature selection, the most relevant features for classifying the freshness level of local beef with the K-Nearest Neighbor (KNN) method are the average R color intensity and the standard deviation, with a sensitivity of 0.8, a specificity of 0.93, and an accuracy of 86%.  Keywords: Classification, Feature Selection, F-Score, K-Nearest Neighbor, Local beef
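F-score ranking as described above can be sketched in a few lines. The sketch assumes the common Chen and Lin definition of the per-feature F-score for two classes, and uses toy data in place of the beef-image color features:

```python
import numpy as np

def f_score(X, y):
    """Per-feature F-score for a binary labeling y in {0, 1}."""
    Xp, Xn = X[y == 1], X[y == 0]
    num = (Xp.mean(0) - X.mean(0)) ** 2 + (Xn.mean(0) - X.mean(0)) ** 2
    den = Xp.var(0, ddof=1) + Xn.var(0, ddof=1)
    return num / den

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 4))
X[:, 0] += 2 * y        # feature 0 separates the two classes well (toy)

scores = f_score(X, y)
print("F-scores:", np.round(scores, 3))
print("ranking (best first):", np.argsort(scores)[::-1])
```

Features at the bottom of this ranking are the candidates for one-by-one elimination, with the classifier re-evaluated after each removal.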


Author(s):  
Zoelkarnain Rinanda Tembusai ◽  
Herman Mawengkang ◽  
Muhammad Zarlis

This study analyzes the performance of the k-Nearest Neighbor method, with the k-Fold Cross Validation algorithm as the evaluation model and the Analytic Hierarchy Process method as feature selection for the data classification process, in order to obtain the best accuracy and machine learning model. The best test result is in fold-3, with an accuracy rate of 95%. Evaluating the k-Nearest Neighbor model with k-Fold Cross Validation yields a good machine learning model, and the Analytic Hierarchy Process as a feature selector also gives optimal results and can reduce the computational load of the k-Nearest Neighbor method, because only the features selected by their level of importance for decision making are used.
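The evaluation scheme above, k-NN assessed fold by fold with k-fold cross-validation, can be sketched as follows. A standard dataset stands in for the study's data, and the AHP feature weighting itself is not reproduced here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

folds = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(knn, X, y, cv=folds)
for i, s in enumerate(scores, start=1):
    print(f"fold-{i}: accuracy = {s:.3f}")
print(f"best fold: fold-{scores.argmax() + 1} ({scores.max():.3f})")
```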


2020 ◽  
Vol 5 (2) ◽  
pp. 153
Author(s):  
Rizki Tri Prasetio

Computer-assisted medical diagnosis is a major machine learning problem being actively researched. General classifiers learn from the data itself through a training process, compensating for the inexperience of an expert in determining parameters. This research proposes a methodology based on the machine learning paradigm that integrates a search heuristic inspired by natural evolution, the genetic algorithm, with one of the simplest and most widely used learning algorithms, k-nearest neighbor. The genetic algorithm was used for feature selection and parameter optimization, while k-nearest neighbor was used as the classifier. The proposed method was evaluated on five benchmark medical datasets from the University of California Irvine Machine Learning Repository and compared with the original k-NN and other feature selection algorithms, i.e., forward selection, backward elimination, and greedy feature selection. Experiment results show that the proposed method achieves good performance with a significant improvement (t-test p-value of 0.0011).
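A compact, hedged sketch of GA-based feature selection with a k-NN wrapper follows: individuals are feature bit-masks and fitness is cross-validated accuracy. This is a generic GA (truncation selection, one-point crossover, bit-flip mutation), not the paper's exact configuration, and a standard UCI dataset stands in for the five medical benchmarks.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]
knn = KNeighborsClassifier(n_neighbors=5)

def fitness(mask):
    # Cross-validated accuracy of k-NN on the selected feature columns.
    if not mask.any():
        return 0.0
    return cross_val_score(knn, X[:, mask], y, cv=3).mean()

pop = rng.random((10, n_features)) < 0.5          # random initial bit-masks
for generation in range(5):
    scores = np.array([fitness(m) for m in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[:5]]                      # truncation selection
    children = []
    for _ in range(5):
        a, b = parents[rng.choice(5, 2, replace=False)]
        cut = rng.integers(1, n_features)         # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.02      # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected", int(best.sum()), "of", n_features, "features")
```

Extending the bit-mask with encoded k-NN parameters (e.g. the number of neighbors) would recover the joint feature-selection-plus-parameter-optimization idea described in the abstract.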


2019 ◽  
Vol 20 (5) ◽  
pp. 488-500 ◽  
Author(s):  
Yan Hu ◽  
Yi Lu ◽  
Shuo Wang ◽  
Mengying Zhang ◽  
Xiaosheng Qu ◽  
...  

Background: Globally, the numbers of cancer patients and deaths continue to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics.

Objective: In this review, in order to study the application of machine learning in predicting anticancer drug activity, several machine learning approaches were selected, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naive Bayes (NB), and examples of their applications in anticancer drug design are listed.

Results: Machine learning contributes a great deal to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design.

Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many successful examples of identification and prediction in the area of anticancer drug activity are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

