BREAST CANCER DETECTION USING RSFS-BASED FEATURE SELECTION ALGORITHMS IN THERMAL IMAGES

Breast cancer is a common cancer among women. Accurate and early detection of breast cancer can play a vital role in treatment. This paper presents and evaluates a thermogram-based Computer-Aided Detection (CAD) system for the detection of breast cancer. In this CAD system, the Random Subset Feature Selection (RSFS) algorithm, as well as hybrids of RSFS with the minimum Redundancy Maximum Relevance (mRMR) algorithm and with the Genetic Algorithm (GA), are used for feature selection. The Support Vector Machine (SVM) and k-Nearest Neighbors (kNN) algorithms are used as classifiers. The proposed CAD system is verified using MATLAB 2017 and a dataset composed of breast images from 78 patients. The implementation results show that, with RSFS for feature selection, the kNN and SVM classifiers achieve accuracies of 85.36% and 75% and sensitivities of 94.11% and 79.31%, respectively. With the hybrid GA and RSFS feature selection, kNN and SVM achieve accuracies of 83.87% and 69.56% and sensitivities of 96% and 81.81%, respectively; with the hybrid mRMR and RSFS feature selection, they achieve accuracies of 77.41% and 73.07% and sensitivities of 98% and 72.72%, respectively.
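The abstract does not spell out how RSFS works. The sketch below is a simplified reading of the random-subset idea, not the authors' MATLAB implementation: repeatedly draw a random feature subset, score it with a kNN classifier on held-out data, and credit each feature in the subset by how far that score lies above the running mean score. As a simplifying assumption, every feature with positive accumulated relevance is kept; the function names, subset size, and iteration count are hypothetical, and the thermal-image features themselves are not reproduced.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def subset_accuracy(train_X, train_y, val_X, val_y, cols, k=3):
    """Validation accuracy of kNN restricted to the feature columns in `cols`."""
    def project(rows):
        return [[row[c] for c in cols] for row in rows]

    tX, vX = project(train_X), project(val_X)
    hits = sum(knn_predict(tX, train_y, x, k) == y for x, y in zip(vX, val_y))
    return hits / len(val_y)


def rsfs(train_X, train_y, val_X, val_y, subset_size=2, iterations=300, k=3, seed=0):
    """Simplified random subset feature selection (hypothetical parameters)."""
    rng = random.Random(seed)
    n_features = len(train_X[0])
    relevance = [0.0] * n_features
    history = []
    for _ in range(iterations):
        cols = rng.sample(range(n_features), subset_size)
        acc = subset_accuracy(train_X, train_y, val_X, val_y, cols, k)
        history.append(acc)
        baseline = sum(history) / len(history)  # running mean accuracy
        for c in cols:
            relevance[c] += acc - baseline  # credit or penalise each member
    # Simplification: keep every feature with positive accumulated relevance
    selected = [i for i, r in enumerate(relevance) if r > 0]
    return selected, relevance
```

Features that repeatedly appear in above-average subsets accumulate positive relevance, while features that only ride along with good partners tend to average out below zero.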

The Impact of Feature Selection Methods for Classifying Arabic Textual Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7163.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 1333-1338

Keyword(s):

Feature Selection ◽

Text Classification ◽

Information Gain ◽

Feature Space ◽

Support Vector ◽

Selection Methods ◽

K Nearest Neighbors ◽

Chi Square ◽

Selection Algorithms ◽

The Impact

Text classification is a vital process due to the large volume of electronic articles. One of the drawbacks of text classification is the high dimensionality of the feature space. Scholars have developed several algorithms to choose relevant features from article text, such as Chi-square (χ²), Information Gain (IG), and Correlation-based Feature Selection (CFS). These algorithms have been investigated widely for English text, while studies for Arabic text are still limited. In this paper, we investigated four well-known classification algorithms, Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Decision Tree, on benchmark Arabic textual datasets from the Saudi Press Agency (SPA) to evaluate the impact of feature selection methods. Using the WEKA tool, we applied the four classification algorithms with and without feature selection. The results provide clear evidence that the three feature selection methods often improve classification accuracy by eliminating irrelevant features.
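To make the Chi-square criterion concrete, the sketch below scores a term against a target class from the usual 2×2 document contingency table (A: in class and contains the term, B: outside the class and contains it, C: in class without it, D: neither), using χ² = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)). This is the textbook formula, not the WEKA implementation used in the paper; the toy English tokens stand in for Arabic terms, and `top_terms` is a hypothetical helper.

```python
def chi_square(docs, labels, term, target):
    """Chi-square association between `term` and class `target` over token sets."""
    A = B = C = D = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            A += 1
        elif has_term:
            B += 1
        elif in_class:
            C += 1
        else:
            D += 1
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0


def top_terms(docs, labels, target, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, target), t))
    return ranked[:k]
```

Note that χ² rewards strong negative association too: a term that never occurs in the target class can score as high as one that always does.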

Breast Cancer Detection Using Random Forest Classifier

Handbook of Research on Deep Learning-Based Image Analysis Under Constrained and Unconstrained Environments - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-6690-9.ch005 ◽

2021 ◽

pp. 85-98

Author(s):

Pavithra Suchindran ◽

Vanithamani R. ◽

Judith Justin

Keyword(s):

Breast Cancer ◽

Random Forest ◽

Cancer Detection ◽

Region Of Interest ◽

Texture Features ◽

Speckle Noise ◽

Random Forest Classifier ◽

Breast Cancer Detection ◽

K Nearest Neighbors ◽

Cad System

Breast cancer is the second most prevalent type of cancer among women. Breast ultrasound (BUS) imaging is one of the most frequently used diagnostic tools to detect and classify abnormalities in the breast. A computer-aided diagnosis (CAD) system can help improve diagnostic accuracy in breast cancer detection and classification. A CAD system normally consists of four stages: pre-processing, segmentation, feature extraction, and classification. In this chapter, the pre-processing step removes speckle noise using a speckle reducing anisotropic diffusion (SRAD) filter. Segmentation aims to locate the region of interest (ROI); active contour-based segmentation and fuzzy C-means (FCM) segmentation are used in this work. Texture features are extracted and fed to a classifier that categorizes the images as normal, benign, or malignant. Three classifiers, namely the k-nearest neighbors (KNN) algorithm, the decision tree algorithm, and the random forest classifier, are used, and their performance is compared based on classification accuracy.
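As an illustration of the FCM segmentation step, the sketch below clusters one-dimensional grey levels with the standard fuzzy C-means updates: memberships u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m−1)) and centers as u^m-weighted means. It ignores spatial information and the SRAD pre-processing, and the fuzzifier m = 2, the cluster count, and the intensity values are assumptions for illustration, not the chapter's settings.

```python
import random


def fuzzy_c_means(values, c=2, m=2.0, iterations=100, tol=1e-6, seed=0):
    """Cluster 1-D intensities into c fuzzy clusters; returns (centers, memberships)."""
    rng = random.Random(seed)
    centers = [float(v) for v in rng.sample(values, c)]
    memberships = []
    for _ in range(iterations):
        # Membership update: u[i] = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
        memberships = []
        for x in values:
            d = [abs(x - ck) + 1e-12 for ck in centers]  # avoid division by zero
            memberships.append(
                [1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(c)) for i in range(c)]
            )
        # Center update: mean of the values weighted by u ** m
        new_centers = []
        for i in range(c):
            w = [u[i] ** m for u in memberships]
            new_centers.append(sum(wj * x for wj, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

A crisp ROI mask can then be taken by assigning each pixel to the cluster with its largest membership; which cluster corresponds to the lesion depends on the imaging modality.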

Efficiency and Scalability Methods in Cancer Detection Problems

Efficiency and Scalability Methods for Computational Intellect ◽

10.4018/978-1-4666-3942-3.ch004 ◽

2013 ◽

pp. 75-94

Author(s):

Inna Stainvas ◽

Alexandra Manevitch

Keyword(s):

Feature Selection ◽

Cancer Detection ◽

Learning Algorithms ◽

Computer Aided Detection ◽

Large Dataset ◽

Cad Systems ◽

X Ray ◽

Cad System ◽

Computer Aided ◽

New Challenges

Computer-aided detection (CAD) systems for cancer detection from X-ray images are in high demand among radiologists. For CAD systems to be successful, a large amount of data has to be collected. This poses new challenges for developing learning algorithms that are efficient and scalable to large datasets. One way to achieve this efficiency is through good feature selection.

Breast Cancer Detection, Diagnosis, and Prediction

International Journal of Information Systems and Computer Sciences ◽

10.30534/ijiscs/2020/01962020 ◽

2020 ◽

Vol 9 (6) ◽

pp. 38-42

Keyword(s):

Breast Cancer ◽

Cancer Detection ◽

Breast Imaging ◽

Malignant Tumors ◽

Region Of Interest ◽

Breast Cancer Detection ◽

Support Vector ◽

Svm Classifier ◽

Cad System ◽

Mammogram Images

The early detection, diagnosis, prediction, and treatment of breast cancer are challenging healthcare problems. This study outlines the traditional and emerging techniques used for breast cancer detection, diagnosis, and prediction, including noninvasive, nonionizing, and genetic-biomarker techniques. In addition, a Computer-Aided Detection (CAD) system is introduced to classify benign and malignant tumors in mammograms. This CAD system involves three steps. First, the Region of Interest (ROI) that includes the tumor is identified using a threshold-based method. Second, a deep learning Convolutional Neural Network (CNN) processes the ROI to extract relevant mammogram features. Finally, a Support Vector Machine (SVM) classifier distinguishes between two classes of mammogram structures: Benign (B) and Malignant (M) nodules. Training and implementation were carried out using 2800 mammogram images from the Curated Breast Imaging Subset of DDSM (CBIS-DDSM). The CNN-SVM system achieved an accuracy of 85.1% using the AlexNet CNN. Comparison with related work shows the promise of the proposed CAD system.
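The abstract only says the ROI is found with "a threshold-based method". As one plausible instance, not necessarily the study's, the sketch below uses Otsu's method to pick a global grey-level threshold and returns the bounding box of the pixels above it, assuming the mass is brighter than its surroundings. The helper names and the toy image are hypothetical; the CNN feature extraction and SVM stages are not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Grey level t maximising (unnormalised) between-class variance; pixels <= t are background."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def roi_bounding_box(image):
    """Threshold a 2-D grey image; return (t, (top, left, bottom, right)) of the bright pixels."""
    t = otsu_threshold([p for row in image for p in row])
    coords = [(r, c) for r, row in enumerate(image) for c, p in enumerate(row) if p > t]
    if not coords:
        return t, None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return t, (min(rows), min(cols), max(rows), max(cols))
```

The cropped box would then be resized to the CNN's input size before feature extraction.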

Classification of Masses in Mammogram Images Using a Combination of F-Score Feature Selection and LS-SVM

Teknologi ◽

10.26594/teknologi.v6i1.558 ◽

2016 ◽

Vol 6 (1) ◽

pp. 27

Author(s):

Muhammad I. Rosadi ◽

Agus Z. Arifin ◽

Anny Yuniarti

Keyword(s):

Breast Cancer ◽

Support Vector Machine ◽

Feature Extraction ◽

Feature Selection ◽

Support Vector ◽

Gray Level ◽

Computer Aided Detection ◽

Computer Aided ◽

Occurrence Matrix ◽

Kernel Parameters

Breast cancer is the most common disease among women in many countries. Breast cancer can be examined using mammogram images with Computer-Aided Detection (CAD) systems. The CAD analyses developed so far consist of GLCM feature extraction, feature reduction/selection, and SVM classification. Both SVM (Support Vector Machine) and LS-SVM (Least Squares Support Vector Machine) raise three problems: how to choose the kernel function, how many input features are optimal, and how to determine the best kernel parameters. The number of features and the required kernel parameter values affect each other, so feature selection is needed when building a classification system. This study aims to classify masses in mammogram images into two classes: benign and malignant. Features are extracted using the Gray Level Co-occurrence Matrix (GLCM) and then selected using the F-Score method. The F-Score is obtained by computing how well each extracted feature discriminates between the two classes in the training data. The F-Score values of the features are then sorted in descending order, and this ranking is used to build feature combinations, which serve as input to the LS-SVM. The experiments show that the feature-selection combination strongly affects accuracy. The best accuracy, 97.5%, was obtained using LS-SVM with an RBF kernel and SVM with an RBF kernel, both with and without the feature-selection combination. Feature selection also reduced computation time.

Keywords: Breast Cancer, F-Score, GLCM, LS-SVM.
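The ranking step can be sketched with the common two-class F-score: the squared distances of each class mean from the overall mean, divided by the sum of the two within-class sample variances. The sketch assumes this is the variant the study uses and that label 1 marks malignant and 0 benign; the GLCM features are replaced by a small made-up matrix, and the LS-SVM training on each combination is omitted.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature (needs at least two values per class)."""
    overall = mean(pos + neg)
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)  # sample variances, n - 1
    return numerator / denominator if denominator else float("inf")


def rank_features(X, y):
    """Feature indices in descending F-score order (1 = malignant, 0 = benign)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return order, scores


def nested_combinations(order):
    """Top-1, top-2, ... feature sets, each a candidate classifier input."""
    return [order[:k] for k in range(1, len(order) + 1)]
```

Evaluating each nested combination and keeping the smallest one with the best validation accuracy is one way to trade accuracy against computation time.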

Predicting Breast Cancer: A Comparative Analysis of Machine Learning Algorithms

Proceeding International Conference on Science and Engineering ◽

10.14421/icse.v3.545 ◽

2020 ◽

Vol 3 ◽

pp. 455-459

Author(s):

Pulung Hendro Prastyo ◽

I Gede Yudi Paramartha ◽

Michael S. Moses Pakpahan ◽

Igi Ardiyanto

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Confusion Matrix ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbors ◽

Common Cancer

Breast cancer is the most common cancer among women (43.3 cases per 100,000 women) and has the highest mortality (14.3 deaths per 100,000 women). Early detection is critical for survival. Machine learning approaches can effectively classify, predict, and analyze this problem. In this study, we compared eight machine learning algorithms: Gaussian Naïve Bayes (GNB), k-Nearest Neighbors (K-NN), Support Vector Machine (SVM), Random Forest (RF), AdaBoost, Gradient Boosting (GB), XGBoost, and Multi-Layer Perceptron (MLP). The experiments were conducted on the Breast Cancer Wisconsin dataset and evaluated with a confusion matrix and 5-fold cross-validation. XGBoost provided the best performance, with an accuracy of 97.19%, recall of 96.75%, precision of 97.28%, F1-score of 96.99%, and AUC of 99.61%. Our results show that XGBoost is the most effective of the compared methods for predicting breast cancer on the Breast Cancer Wisconsin dataset.
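The evaluation protocol can be sketched without any particular learning library: split the sample indices into five folds, and derive accuracy, precision, recall, and F1 from the confusion-matrix counts of each fold (AUC needs ranked scores and is not covered). The helper names are hypothetical, the folds are contiguous and unshuffled for simplicity, and in practice one would shuffle or stratify them first.

```python
def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived scores for one fold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision, "recall": recall, "f1": f1,
    }
```

Reported cross-validation figures are then typically the mean of each score over the five folds.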

Breast Cancer Identification from Patients’ Tweet Streaming Using Machine Learning Solution on Spark

Complexity ◽

10.1155/2021/6653508 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Nahla F. Omran ◽

Sara F. Abd-el Ghany ◽

Hager Saleh ◽

Ayman Nabil

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Real Time ◽

Random Forest Classifier ◽

Streaming Data ◽

Support Vector ◽

Real Time System ◽

Selection Algorithms

Twitter integrates with streaming data technologies and machine learning to add new value to healthcare. This paper presents a real-time system that predicts breast cancer from patients' health data streamed from Twitter. The proposed system consists of two major components: an offline model-building stage and an online prediction pipeline. For the first component, we computed the correlations between features to reduce the number of features in the Breast Cancer Wisconsin Diagnostic dataset. Two feature selection algorithms, recursive feature elimination and univariate feature selection, are then applied to the remaining features to select the essential ones. Four classifiers (decision tree, logistic regression, support vector machine, and random forest) were trained on the features retained after correlation analysis and feature selection. Hyperparameter tuning and cross-validation were also applied to optimize the models and improve accuracy. Apache Spark, Apache Kafka, and the Twitter Streaming API are used to build the second component, in which the most accurate model from the first component predicts breast cancer in real time from streamed tweets. The results showed that the random forest classifier achieved the best accuracy.
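The correlation-based reduction step can be sketched as a greedy filter: walk through the features in order and keep one only if its absolute Pearson correlation with every feature already kept is below a threshold. The threshold of 0.9, the greedy order, and the toy column values are assumptions; the feature names follow the Wisconsin Diagnostic naming style but the numbers are made up.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def drop_correlated(columns, names, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[i])) < threshold for i in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

Because the filter is greedy, the column order decides which member of a correlated pair survives; the later wrapper and univariate selectors then work on this smaller set.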

Effective Feature Set Selection and Centroid Classifier Algorithm for Web Services Discovery

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i2.pp441-450 ◽

2017 ◽

Vol 5 (2) ◽

pp. 441 ◽

Cited By ~ 2

Author(s):

Venkatachalam K ◽

Karthikeyan NK

Keyword(s):

Feature Selection ◽

Web Services ◽

Selection Procedure ◽

Vital Role ◽

Document Classification ◽

Support Vector ◽

Bayes Classifier ◽

K Nearest Neighbors ◽

Vector Machines ◽

Centroid Classifier

Text preprocessing and document classification play a vital role in web services discovery. Nearest centroid classifiers have mostly been employed in high-dimensional applications such as genomics. Feature selection is a major problem for all classifiers, and in this paper we propose an effective feature selection procedure followed by web services discovery through a centroid classifier algorithm. The task is to assign each document effectively to one or more classes. Despite being simple and robust, the centroid classifier has not been used effectively for document classification because of its computational complexity and larger memory requirements. We address these problems through dimensionality reduction and effective feature set selection before training and testing the classifier. Our preliminary experiments show that the proposed method outperforms other algorithms reported in the literature, including k-nearest neighbors, the naive Bayes classifier, and support vector machines.
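A basic centroid classifier for documents can be sketched in a few lines: average the term-weight vectors of each class, then assign a new document to the class whose centroid is most similar by cosine similarity. This is a generic single-label version, not the authors' procedure; the vocabulary, term counts, and service categories are hypothetical, and the feature selection stage is assumed to have already produced the reduced vectors.

```python
from math import sqrt


def fit_centroids(X, y):
    """Mean term-weight vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(centroids, x):
    """Assign x to the class whose centroid has the highest cosine similarity."""
    return max(centroids, key=lambda label: cosine(centroids[label], x))
```

Training stores only one vector per class, which is why shrinking the vocabulary first directly cuts both memory use and prediction cost.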

Two-Way Threshold-Based Intelligent Water Drops Feature Selection Algorithm for Accurate Detection of Breast Cancer

10.21203/rs.3.rs-613900/v1 ◽

2021 ◽

Author(s):

Dhruba Jyoti Kalita ◽

Vibhav Prakash Singh ◽

Vinay Kumar

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Early Stage ◽

System Optimization ◽

Support Vector ◽

Features Selection ◽

Cad System ◽

Optimal Subset ◽

Water Drops ◽

Intelligent Water Drops

Breast cancer is a common cause of death among women across the globe. It has been found that a Computer-Aided Diagnosis (CAD) system designed using X-ray mammograms can enable early-stage detection of breast cancer, which can decrease the death rate to a large extent. This paper proposes a novel 2-way threshold-based Intelligent Water Drops (IWD) algorithm for feature selection to design an effective and efficient CAD system that can detect breast cancer at an early stage. The approach first extracts Local Binary Patterns (LBP) in the wavelet domain from mammograms and then applies the introduced 2-way threshold-based IWD algorithm to select the most important subset of features from the extracted feature set. 2-way thresholding is a technique to find a lower bound (LB) and an upper bound (UB) on the number of features to be selected in the optimal subset. Using these threshold values, IWD can produce multiple optimal feature subsets rather than a single one. The best of these subsets is then used to train and deploy a Support Vector Machine (SVM) to classify new mammograms. The results show that the proposed model outperforms many existing CAD systems. We further compared the introduced feature selection technique with other metaheuristic feature selection techniques, such as Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Simulated Annealing (SA), Genetic Algorithm (GA), Gravitational Search Algorithm (GSA), Inclined Planes System Optimization (IPO), and Grey Wolf Optimization Algorithm (GWO), and found that it outperforms them. The accuracy, precision, recall, specificity, and F1-score of the proposed framework are 99%, 98.7%, 98.123%, 96.2%, and 98.4%, respectively.
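The effect of the two bounds can be illustrated independently of IWD. In the sketch below, plain random search stands in for the water-drop search, which is not reproduced: candidate subset sizes are restricted to the range [LB, UB], and the best subset found for every size in that range is retained, giving several "optimal" subsets instead of one. The `evaluate` callback (for example, SVM validation accuracy on LBP features) and all parameter values are hypothetical.

```python
import random


def bounded_subset_search(n_features, evaluate, lower, upper, trials=200, seed=0):
    """Best subset found for each size in [lower, upper]; random search stands in for IWD."""
    rng = random.Random(seed)
    best = {}  # subset size -> (score, subset)
    for _ in range(trials):
        size = rng.randint(lower, upper)  # both bounds are inclusive
        subset = tuple(sorted(rng.sample(range(n_features), size)))
        score = evaluate(subset)
        if size not in best or score > best[size][0]:
            best[size] = (score, subset)
    return best
```

The final subset used to train the classifier would then be `max(best.values())`, i.e. the highest-scoring candidate across all permitted sizes.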