An Efficient Machine Learning Approach for Breast Cancer Prediction

Breast cancer is the most common cancer in women, and its death rate ranks second among cancers. Related studies often report high accuracy for the machine learning approaches they propose. However, without efficient pre-processing, the proposed breast cancer prediction models remain open to question. This research therefore aims to improve the accuracy of machine learning methods through pre-processing that is more efficient for breast cancer prediction: Missing Value Replacement, Data Transformation, Smoothing Noisy Data, Feature Selection / Attribute Weighting, Data Validation, and Unbalanced Class Reduction. The study proposes several approaches: C4.5 - Z-Score - Genetic Algorithm for the Breast Cancer Dataset with 77.27% accuracy; 7-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Original with 97.85% accuracy; Artificial Neural Network - Z-Score - Forward Selection for the Wisconsin Breast Cancer Dataset - Diagnostics with 98.24% accuracy; and 11-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Prognostic with 83.33% accuracy. These approaches perform better than standard machine learning methods and than the best methods proposed in previous related studies.
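
The two data transformations named in the abstract, Z-Score and Min-Max Normalization, can be sketched as follows. This is a minimal Python illustration and not the authors' code: the function names `z_score` and `min_max`, the use of the population standard deviation, and the toy feature values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature is mapped to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical values for a single feature column.
feature = [2.0, 4.0, 6.0, 8.0]
scaled = min_max(feature)
standardized = z_score(feature)
```

Distance-based classifiers such as k-Nearest Neighbor are sensitive to feature scale, which is one plausible reason a normalization step is paired with them here.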

Breast Cancer Prediction and Classification Using Supervised Learning Techniques

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.8924 ◽

2020 ◽

Vol 17 (6) ◽

pp. 2519-2522

Author(s):

Kalpna Guleria ◽

Avinash Sharma ◽

Umesh Kumar Lilhore ◽

Devendra Prasad

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Supervised Learning ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

World Health ◽

Breast Cancer Dataset ◽

Cancer Awareness ◽

Cancer Dataset ◽

Cancer Prediction

Approximately 2.1 million women are affected by breast cancer every year, and it has become one of the major causes of cancer-related deaths among women. The World Health Organization's (WHO) 2018 report reveals that around 15% of deaths among women are due to breast cancer. Lack of awareness is one of the major reasons breast cancer is detected at a later stage. Another major reason is limited access to health resources, which makes the problem worse. Early or timely detection of breast cancer is of utmost importance for increasing the survival rate of patients. The WHO cancer awareness guidelines recommend that women aged 40–49 or 70–75 undergo mammographic screening, which provides timely detection of the problem if it exists. This article uses the Breast Cancer dataset from the UCI machine learning repository to predict and diagnose the class of breast cancer, benign or malignant, using supervised learning. The supervised machine learning algorithms K-Nearest Neighbor (K-NN), Naive Bayes, logistic regression and decision tree are used for breast cancer prediction. These classification algorithms are evaluated on several performance measures: accuracy, sensitivity, specificity and F-measure.
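
The four performance measures listed in the abstract can be computed from the counts of a binary confusion matrix. The sketch below is a minimal Python illustration under stated assumptions: "malignant" is treated as the positive class, the function names `confusion_counts` and `evaluate` are hypothetical, and the labels are toy data rather than results from the paper.

```python
def confusion_counts(y_true, y_pred, positive="malignant"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive="malignant"):
    """Return accuracy, sensitivity, specificity and F-measure."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Toy labels for illustration only.
M, B = "malignant", "benign"
y_true = [M, M, M, B, B, B, B, B]
y_pred = [M, M, B, B, B, B, B, M]
scores = evaluate(y_true, y_pred)
```

In a screening setting, sensitivity is often watched most closely, because a false negative corresponds to a missed malignant case.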

Breast Cancer Prediction Using Machine Learning

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206457 ◽

2020 ◽

pp. 278-284

Author(s):

Gaurav Singh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbor ◽

Cancer Dataset ◽

Implementation Phase ◽

Machine Learning Classification

Breast cancer is a prevalent cause of death, and it is the only type of cancer that is widespread among women worldwide. The prime objective of this paper is to create a model for predicting breast cancer using several machine learning classification algorithms: k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Logistic Regression (LR), and Gaussian Naive Bayes (NB). It also assesses and compares the performance of these classifiers in terms of accuracy, precision, recall, F1-score, and Jaccard index. The breast cancer dataset is publicly available from the UCI Machine Learning Repository. In the implementation phase, the dataset is partitioned into 80% for training and 20% for testing, and the machine learning algorithms are then applied. k-Nearest Neighbors achieved significant performance with respect to all measures.
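
The 80%/20% partition followed by k-Nearest Neighbor classification can be sketched as below. This is a minimal standard-library Python illustration, not the paper's implementation: the helper names `train_test_split` and `knn_predict`, the seeds, the choice of k = 3, and the synthetic two-cluster data standing in for the UCI dataset are all assumptions.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle the indices and hold out test_fraction of the data for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )


def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic, well-separated clusters used only as stand-in data.
rng = random.Random(42)
rows, labels = [], []
for label, centre in (("benign", 0.0), ("malignant", 10.0)):
    for _ in range(25):
        rows.append((centre + rng.uniform(-1, 1), centre + rng.uniform(-1, 1)))
        labels.append(label)

X_train, y_train, X_test, y_test = train_test_split(rows, labels)
predictions = [knn_predict(X_train, y_train, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```

Because the synthetic clusters do not overlap, this toy run classifies every held-out point correctly; real clinical data would not be expected to behave this way.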

Breast Cancer Prediction using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8292.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 4879-4881

Keyword(s):

Breast Cancer ◽

Random Forest ◽

Data Science ◽

Breast Cancer Dataset ◽

Random Forest Algorithm ◽

Medical Field ◽

Cancer Dataset ◽

Cancer Prediction ◽

Time Consumption ◽

Simulated Environment

Breast cancer is one of the most dreaded diseases and a potential cause of death in women. Every year, the death rate due to breast cancer increases drastically. Classification and data mining are effective ways to categorize data. They are especially useful in the medical field, where diagnosis and analysis are carried out with these techniques. The Wisconsin Breast Cancer dataset is used to compare SVM, Logistic Regression, Naïve Bayes and Random Forest. The main objective is to determine the efficiency of the algorithms by evaluating how correctly they classify the data, based on accuracy and time consumption. In the experiments performed, the Random Forest algorithm shows the highest accuracy (99.76%) with the lowest error rate. The ANACONDA Data Science Platform is used to run all the experiments in a simulated environment.

Classifications of Breast Cancer Diagnosis using Machine Learning

International Journal of Computers ◽

10.46300/9108.2020.14.13 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Breast Cancer Diagnosis ◽

Performance Comparison ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbors ◽

Cancer Dataset ◽

Machine Learning Classification

Breast Cancer (BC) is among the most common and leading causes of death in women throughout the world. Classification and data analysis tools are now widely used in the medical field for diagnosis, prognosis and decision making, to help lower the risk of people dying or suffering from diseases. Advanced machine learning methods have given hope to patients by helping doctors detect potentially fatal diseases such as Breast Cancer early while providing accurate outcomes. However, the results depend heavily on the techniques used for feature selection and classification, which determine how strong the resulting machine learning model is. In this paper, a performance comparison is conducted using four classifiers, Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest, on the Wisconsin Breast Cancer dataset to identify the most effective predictors. The main goal is to apply the best machine learning classification methods to predict Breast Cancer as benign or malignant, evaluated using measures such as accuracy, f-measure, precision and recall. Experimental results show that Random Forest achieves the highest accuracy of 99.26% on this dataset and feature set, while SVM and KNN show 97.78% and 97.04% accuracy respectively. MLP shows the lowest accuracy of 94.07%. All the experiments are conducted using RStudio as the data mining tool platform.

Rainfall-runoff modelling using improved machine learning methods: Harris hawks optimizer vs. particle swarm optimization

Journal of Hydrology ◽

10.1016/j.jhydrol.2020.125133 ◽

2020 ◽

Vol 589 ◽

pp. 125133 ◽

Cited By ~ 7

Author(s):

Yazid Tikhamarine ◽

Doudja Souag-Gamane ◽

Ali Najah Ahmed ◽

Saad Sh. Sammen ◽

Ozgur Kisi ◽

...

Keyword(s):

Machine Learning ◽

Particle Swarm Optimization ◽

Particle Swarm ◽

Rainfall Runoff ◽

Learning Methods ◽

Swarm Optimization ◽

Machine Learning Methods

Entropy Based k Nearest Neighbor Pattern Classification (EbkNN): En-route to Achieving a High Accuracy in Breast Cancer Diagnosis

Asian Journal of Applied Sciences ◽

10.24203/ajas.v8i6.6386 ◽

2020 ◽

Vol 8 (6) ◽

Author(s):

Pushpam Sinha ◽

Ankita Sinha

Keyword(s):

Breast Cancer ◽

Pattern Classification ◽

Test Data ◽

Nearest Neighbor ◽

Training Dataset ◽

Breast Cancer Dataset ◽

K Nearest Neighbor ◽

Cancer Dataset ◽

Test Dataset ◽

Data Points

Entropy based k-Nearest Neighbor pattern classification (EbkNN) is a variation of the conventional k-Nearest Neighbor rule of pattern classification that optimizes the value of k separately for each test data point, based on entropy calculations. The entropy formula used in EbkNN is the one commonly defined in information theory for a set of n different types of information (classes) attached to a total of m objects (data points), with each object defined by f features. In EbkNN, the value of k chosen to classify a given test data point is the one for which the entropy takes its least non-zero value. The other rules of conventional kNN are retained in EbkNN. It is concluded that EbkNN works best for binary classification; it is computationally prohibitive to use EbkNN to separate the data points of the test dataset into more than two classes. The biggest advantage of EbkNN over conventional kNN is that a single run of the EbkNN algorithm yields the optimum classification of the test data, whereas the conventional kNN algorithm has to be run separately for each value of k in the selected range, and the optimum k then has to be chosen from among them. We also tested our EbkNN method on the WDBC (Wisconsin Diagnostic Breast Cancer) dataset. There are 569 instances in this dataset; we arbitrarily took the first 290 instances as the training dataset and the remaining 279 instances as the test dataset. We obtained a remarkable result with the EbkNN method: accuracy close to 100%, better than the results obtained by most other researchers who have worked on the WDBC dataset.
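
The per-point choice of k described above can be illustrated with a short Python sketch. This is only one possible reading of the rule stated in the abstract, not the authors' algorithm: it assumes the entropy is the Shannon entropy of the class labels among the k nearest neighbours, that candidate values of k run from 1 to a hypothetical cap `k_max`, and that the largest candidate is used when every candidate has zero entropy. The names `shannon_entropy` and `ebknn_predict` and the toy points are likewise assumptions.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the class distribution in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def ebknn_predict(train_X, train_y, x, k_max=15):
    """Pick k with the least non-zero neighbour entropy, then vote."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    ranked = [train_y[i] for i in order[:k_max]]  # labels, nearest first

    best_k, best_h = None, math.inf
    for k in range(1, len(ranked) + 1):
        h = shannon_entropy(ranked[:k])
        if 0 < h < best_h:  # keep the smallest strictly positive entropy
            best_k, best_h = k, h

    if best_k is None:
        # Assumption: if all candidate neighbourhoods are pure, use them all.
        best_k = len(ranked)

    label = Counter(ranked[:best_k]).most_common(1)[0][0]
    return label, best_k


# Toy two-class data for illustration only.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["benign"] * 4 + ["malignant"] * 4
label, chosen_k = ebknn_predict(train_X, train_y, (0.5, 0.5), k_max=6)
```

In this toy case the four nearest neighbours are all benign (zero entropy), so the first non-zero entropy appears at k = 5, which is also the smallest such value among the candidates.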

Using Attribute Oriented Induction High Level Emerging Pattern (AOI-HEP) to Mine Frequent Patterns

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i6.pp3037-3046 ◽

2016 ◽

Vol 6 (6) ◽

pp. 3037

Author(s):

Harco Leslie Hendric Spits Warnars

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Growth Rate ◽

Frequent Pattern ◽

Breast Cancer Dataset ◽

Frequent Patterns ◽

Cancer Dataset ◽

Extended Experiment ◽

High Level ◽

Emerging Pattern

Frequent patterns in Attribute Oriented Induction High level Emerging Pattern (AOI-HEP) are recognized when they have maximum subsumption of the target (superset) over the contrasting (subset) dataset (contrasting ⊂ target) and a large High Emerging Pattern (HEP) growth rate and support in the target dataset. HEP frequent patterns were successfully mined with AOI-HEP on four UCI machine learning datasets, adult, breast cancer, census and IPUMS, with 48842, 569, 2458285 and 256932 instances respectively; each dataset has concept hierarchies built from its five chosen attributes. Two frequent patterns were found in the adult dataset and one in the breast cancer dataset, while no frequent pattern was found in the census and IPUMS datasets. The HEP frequent patterns found in the adult dataset are adults with a government workclass and an intermediate education (80.53%) and America as native country (33%). The only HEP frequent pattern found in the breast cancer dataset is breast cancer with a clump thickness type of AboutAverClump and a cell size of VeryLargeSize (3.56%). Finding HEP frequent patterns with AOI-HEP is influenced by learning on the high-level concept in one of the chosen attributes: an extended experiment on the adult dataset that learned on the marital-status attribute found no frequent pattern.

Comparative Study of Machine Learning Algorithms using a Breast Cancer Dataset

2020 IEEE International Conference on Electro Information Technology (EIT) ◽

10.1109/eit48999.2020.9208315 ◽

2020 ◽

Author(s):

Zaid A. El-Shair ◽

Luis A. Sanchez-Perez ◽

Samir A. Rawashdeh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Comparative Study ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Breast Cancer Dataset ◽

Cancer Dataset

Multi-modal prediction of breast cancer using particle swarm optimization with non-dominating sorting

International Journal of Distributed Sensor Networks ◽

10.1177/1550147720971505 ◽

2020 ◽

Vol 16 (11) ◽

pp. 155014772097150

Author(s):

Vijayalakshmi S ◽

John A ◽

Sunder R ◽

Senthilkumar Mohan ◽

Sweta Bhattacharya ◽

...

Keyword(s):

Breast Cancer ◽

Particle Swarm Optimization ◽

Density Estimation ◽

Kernel Density Estimation ◽

Particle Swarm ◽

Kernel Density ◽

Swarm Optimization ◽

Cancer Prediction ◽

Proposed Model ◽

Sensitivity Specificity

Cancer is listed as the second leading cause of death across the world, with almost one person in six dying of cancer. Breast cancer is one of the most common forms of cancer in women and has the second highest mortality rate in the world. Various scientific studies have been conducted to combat this disease, and machine learning approaches have been an extremely popular choice. Particle swarm optimization has been identified as one of the most powerful and efficient techniques for the diagnosis of breast cancer, guiding physicians towards timely and accurate treatment. Multi-modal prediction methods are used to make decisions depending on different scenarios and aspects, whereas the non-dominating sorting feature is useful for sorting different objects based on differing requirements. The main novelty of this work is the proposed multi-modal prediction algorithm for breast cancer. The work encompasses the use of particle swarm optimization, non-dominating sorting and multi-classifier techniques, namely the k-nearest neighbour method, fast decision tree and kernel density estimation. Finally, Bayes' theorem is applied to revise the results and achieve optimum accuracy in breast cancer prediction. The proposed model, which combines particle swarm optimization and non-domination sorting with classifier techniques, helps to select the most significant features relevant to breast cancer prediction. The selected features define the objective of the problem model. The proposed model is implemented on the WBCD and WDBC breast cancer data sets, publicly available from the UCI machine learning data repository. The metrics considered are sensitivity, specificity, accuracy and time complexity. The experimental results of the study are evaluated against state-of-the-art algorithms, namely genetic algorithm kernel density estimation and particle swarm optimization kernel density estimation, and the results justify the superiority of the proposed model.

A five-year (2015 to 2019) analysis of studies focused on breast cancer prediction using machine learning: A systematic review and bibliometric analysis

Journal of Public Health Research ◽

10.4081/jphr.2020.1772 ◽

2020 ◽

Vol 9 (1) ◽

Author(s):

Zakia Salod ◽

Yashik Singh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Systematic Review ◽

Bibliometric Analysis ◽

The United States ◽

Blood Analysis ◽

Google Scholar ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Data Source

Objective 1 of this study was to investigate trends in publications on breast cancer (BC) prediction using machine learning (ML) by analysing country, first author, journal, institutional collaborations and co-occurrence of author keywords. Objective 2 was to provide a review of studies on BC prediction using ML and a blood analysis dataset (Breast Cancer Coimbra Dataset [BCCD]). Objective 3 was to provide a brief review of studies on BC prediction using ML and patients' fine needle aspirate cytology data (Wisconsin Breast Cancer Dataset [WBCD]). The design of this study was as follows: for objective 1, bibliometric analysis, data source: PubMed (2015-2019); for objective 2, systematic review, data source: Google and Google Scholar (2018-2019); for objective 3, systematic review, data source: Google Scholar (2016-2019). The results showed that the United States of America (USA) produced the highest number of publications (n=803). In total, 2419 first authors contributed towards the publications. Breast Cancer Research and Treatment was the highest ranked journal. Institutional collaborations mainly occurred within the USA. The use of ML for BC screening and detection was the most researched topic. A total of 19 distinct papers were included for objectives 2 and 3. The findings from these studies were never presented to clinicians for validation. In conclusion, the use of ML for BC screening and detection is promising.
