Prediction of benign and malignant breast cancer using data mining techniques

Breast cancer is the second leading cancer occurring in women compared to all other cancers. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization, and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including in Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. The objective of this paper is to report on prediction models for breast cancer survivability that we developed by taking advantage of available technological advancements. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models on a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the three models' performance for comparison. The results (based on average accuracy on the Breast Cancer dataset) indicated that Naïve Bayes is the best predictor, with 97.36% accuracy on the holdout sample (better than any prediction accuracy reported in the literature); RBF Network came second with 96.77% accuracy, and J48 third with 93.41%.
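
The 10-fold cross-validation mentioned in the abstract above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`k_fold_indices`, `cross_val_accuracy`, `fit_majority`) and the majority-class baseline learner are assumptions introduced here, and the labels in the demo are toy values rather than data from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train, test) index pairs after one shuffle."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_val_accuracy(X: Sequence, y: Sequence, fit: Callable, k: int = 10, seed: int = 0) -> float:
    """Mean test accuracy over k folds (assumes k <= len(X)).

    `fit(X_train, y_train)` must return a function mapping one sample to a label.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X_train: Sequence, y_train: List) -> Callable:
    """Hypothetical baseline learner: always predict the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda sample: majority


if __name__ == "__main__":
    # Toy labels only; real experiments would plug in an actual classifier for `fit`.
    toy_X = [[i] for i in range(10)]
    toy_y = ["benign"] * 8 + ["malignant"] * 2
    print(cross_val_accuracy(toy_X, toy_y, fit_majority, k=5))
```

Every sample is used for testing exactly once, which is why the averaged score is less dependent on one particular train/test split than a single holdout estimate.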

Accuracy Analysis of K-Nearest Neighbor and Naïve Bayes Algorithm in the Diagnosis of Breast Cancer

JURNAL INFOTEL ◽

10.20895/infotel.v12i4.547 ◽

2020 ◽

Vol 12 (4) ◽

pp. 151-159

Author(s):

Irma Handayani ◽

Ikrimach Ikrimach

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Type ◽

Breast Cancer Dataset ◽

K Nearest Neighbor ◽

Cancer Dataset ◽

Bayes Algorithm

In the medical field, there are many records of disease sufferers, one of which is data on breast cancer. The process of extracting previously unknown information from data is known as data mining. Data mining uses pattern recognition techniques, such as statistics and mathematics, to find patterns in old data or cases. One of the main roles of data mining is classification. A classification dataset has one objective attribute, also called the label attribute. For new data, this attribute is predicted from the other attributes, based on past cases. The number of attributes can affect the performance of an algorithm. As a result, if the classification process is inaccurate, the researcher needs to double-check each previous stage to look for errors. The best algorithm for one data type is not necessarily good for another data type. For this reason, the K-Nearest Neighbor and Naïve Bayes algorithms are used as a solution to this problem. The research method was to prepare data from the breast cancer dataset, train and test on the data, and then perform a comparative analysis. The research target is to identify the best algorithm for classifying breast cancer, so that patients with existing parameters can be predicted as having malignant or benign breast cancer. This pattern can be used as a diagnostic measure so that the disease can be detected earlier, which is expected to reduce the mortality rate from breast cancer. In the comparison, the method yields 95.79% for K-Nearest Neighbor and 93.39% for Naïve Bayes.
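
As a rough illustration of how K-Nearest Neighbor predicts the label attribute of a new case from past cases, the sketch below classifies a query by majority vote among its nearest training points. It assumes Euclidean distance and numeric features; the function name `knn_predict` and the two-feature toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-feature samples (illustrative values only).
    toy_X = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
    toy_y = ["benign"] * 3 + ["malignant"] * 3
    print(knn_predict(toy_X, toy_y, (2, 2), k=3))
```

Because every attribute contributes to the distance, irrelevant or differently scaled attributes can distort the vote, which is one way the number of attributes affects performance.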

Analysis of efficiency of classification and prediction algorithms (Naïve Bayes) for Breast Cancer dataset

2015 International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT) ◽

10.1109/erect.2015.7498997 ◽

2015 ◽

Cited By ~ 2

Author(s):

Rashmi G D ◽

A Lekha ◽

Neelam Bawane

Keyword(s):

Breast Cancer ◽

Naive Bayes ◽

Naïve Bayes ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Prediction Algorithms

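Analisis Perbandingan Kinerja Algoritma Naïve Bayes, Decision Tree-J48 dan Lazy-IBK [Comparative Analysis of the Performance of the Naïve Bayes, Decision Tree-J48 and Lazy-IBK Algorithms]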

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.3055 ◽

2021 ◽

Vol 5 (3) ◽

pp. 1038

Author(s):

Indra Rukmana ◽

Arvin Rasheda ◽

Faiz Fathulhuda ◽

Muh Rizky Cahyadi ◽

Fitriyani Fitriyani

Keyword(s):

Breast Cancer ◽

Decision Tree ◽

Thoracic Surgery ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Breast Cancer Dataset ◽

Decision Tree Algorithm ◽

K Nearest Neighbor ◽

Cancer Dataset

This research focuses on the performance of three classification algorithms: Naïve Bayes, Decision Tree-J48, and K-Nearest Neighbor. Speed and percentage accuracy are the benchmarks for algorithm performance. The study uses the Breast Cancer and Thoracic Surgery datasets, downloaded from the UCI Machine Learning Repository website. The classification algorithms were tested with Weka version 3.8.5. The results show that the J48 Decision Tree algorithm has the best accuracy: 75.6% in cross-validation test mode for the Breast Cancer dataset and 84.5% for the Thoracic Surgery dataset.

Performance of Naïve Bayes, C4.5 and KNN using Breast Cancer, Iris and Hypothyroid Datasets

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8795.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2193-2197

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Specific Pattern ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Digital Format ◽

Tree Classifier

Data mining usually refers to the discovery of specific patterns in, or the analysis of, data from a large dataset. Classification is an efficient data mining technique in which the classes into which the data are sorted are predefined using existing datasets. Classifying medical records by their symptoms using computerized methods, and storing the predicted information in digital format, is of great importance in diagnosing various diseases in the medical field. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in finding the exact data according to the diagnosis report, and on their execution rate, to identify how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), to determine which algorithm is best suited for classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. In terms of performance, the C4.5 algorithm is considerably better than the Naïve Bayes and K-Nearest Neighbor algorithms.

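Perbandingan Teknik Klasifikasi Neural Network, Support Vector Machine, dan Naive Bayes dalam Mendeteksi Kanker Payudara [Comparison of Neural Network, Support Vector Machine, and Naive Bayes Classification Techniques in Detecting Breast Cancer]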

BINA INSANI ICT JOURNAL ◽

10.51211/biict.v7i1.1343 ◽

2020 ◽

Vol 7 (1) ◽

pp. 53

Author(s):

Derisma Derisma ◽

Fajri Febrian

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Data Mining ◽

Support Vector Machine ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Accuracy Rate ◽

Cancer Disease ◽

Network Support

Abstract: Breast cancer is the type of cancer most commonly found in women. In Indonesia, breast cancer ranks first among hospitalized patients across all hospitals. This study aimed to perform a computation-based diagnosis of breast cancer that reports the state of an individual's cancer based on algorithm accuracy. The study used Orange Python programming and the Wisconsin Breast Cancer dataset to model breast cancer classification. The data mining methods applied were Neural Network, Support Vector Machine, and Naive Bayes. The best classification algorithm was Kernel SVM, with an accuracy rate of 98.9%, and the lowest was Naive Bayes, with 96.1%. Keywords: breast cancer, neural network, support vector machine, naive bayes

Optimasi Model Prediksi Kelulusan Mahasiswa Menggunakan Algoritma Naive Bayes [Optimization of a Student Graduation Prediction Model Using the Naive Bayes Algorithm]

Indonesian Journal of Applied Informatics ◽

10.20961/ijai.v5i1.44379 ◽

2021 ◽

Vol 5 (1) ◽

pp. 32

Author(s):

Hartatik Hartatik

Keyword(s):

Data Mining ◽

Big Data ◽

Prediction Models ◽

Naive Bayes ◽

Program Planning ◽

Naïve Bayes ◽

Bayes Method ◽

Student Graduation ◽

Bayes Algorithm ◽

Naive Bayes Method

Abstract: Predicting a student's graduation status is a problem in its own right for higher-education institutions. In the era of Big Data, it is important for institutions to predict the academic behavior of active students so that they can estimate how likely students are to complete their studies on time and identify preventive steps when planning programs. One approach is data mining using the Naive Bayes algorithm, which is one of the methods used to predict student graduation. The researchers applied the Naive Bayes method using the cumulative grade point average (GPA) as a parameter and compared it with Naive Bayes predictions based on GPA plus social parameters, namely gender and residence status. The study thus starts from academic parameters and optimizes the model using social parameters inherent to the students. In the evaluation, the Naive Bayes method achieved an accuracy of 75%, and the prediction model with social parameters achieved 85%, a difference of 10 percentage points.

Prediction and Classification into Benign and Malignant using the Clinical Testing Features

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j7411.0891020 ◽

2020 ◽

Vol 9 (10) ◽

pp. 55-61

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Image Classification ◽

Naive Bayes ◽

Malignant Tumors ◽

Naïve Bayes ◽

Support Vector ◽

Natural Image ◽

Data Set ◽

Classification Techniques

Breast cancer is the most frequently diagnosed cancer among women and a major reason for the increased mortality rate among women. Because manual diagnosis of this disease takes long hours and few systems are available, there is a need to develop an automatic diagnosis system for early detection of cancer. Advanced natural image classification techniques and Artificial Intelligence methods have largely been used for the breast-image classification task. Data mining techniques contribute a lot to the development of such a system, and classification and data mining methods are an effective way to classify data. For the classification of benign and malignant tumors, we used machine learning classification techniques in which the machine learns from past data and can predict the category of new input. This is a comparative study of models implemented using Support Vector Machine (SVM) and Naïve Bayes on the Breast Cancer Wisconsin (Original) Data Set. The efficiency of each algorithm is measured and compared in terms of accuracy, precision, sensitivity, specificity, error rate, and F1 score. Our experiments have shown that SVM is the best for predictive analysis, with an accuracy of 99.28%, followed by Naïve Bayes with an accuracy of 98.56%. It is inferred from this study that SVM is the better-suited algorithm for prediction.
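
The six measures named in the abstract above can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the counts in the demo are illustrative assumptions, not figures from the study, and the code assumes no denominator is zero.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn count malignant cases predicted correctly/incorrectly;
    tn/fp count benign cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)      # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)    # recall: share of actual positives found
    specificity = tn / (tn + fp)    # share of actual negatives found
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "error_rate": 1 - accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=45, fp=5, tn=40, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters for diagnosis, since a missed malignant case and a false alarm carry different costs.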

Prediction of Breast Cancer Using Machine Learning

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190617160834 ◽

2020 ◽

Vol 13 (5) ◽

pp. 901-908

Author(s):

Somil Jain ◽

Puneet Kumar

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Prediction Accuracy ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

Breast Cancer Dataset

Background: Breast cancer is one of the diseases that cause a large number of deaths every year across the globe; early detection and diagnosis of this type of disease is a challenging task, but one needed to reduce the number of deaths. Nowadays, various machine learning and data mining techniques are used for medical diagnosis. These techniques have proven their mettle in predicting chronic diseases such as cancer, which can save the lives of patients suffering from such diseases. The major concern of this study is to find the prediction accuracy of classification algorithms such as Support Vector Machine, J48, Naïve Bayes, and Random Forest, and to suggest the best algorithm. Objective: The objective of this study is to assess the prediction accuracy of the classification algorithms in terms of efficiency and effectiveness. Methods: This paper provides a detailed analysis of classification algorithms such as Support Vector Machine, J48, Naïve Bayes, and Random Forest in terms of their prediction accuracy, applying 10-fold cross-validation on the Wisconsin Diagnostic Breast Cancer dataset using the open-source WEKA tool. Results: Support Vector Machine achieved the highest prediction accuracy, 97.89%, with a low error rate of 0.14%. Conclusion: This paper provides a clear view of the predictive performance of the classification algorithms, which offers a helping hand to medical practitioners in diagnosing chronic diseases such as breast cancer effectively.

Extracting Subset of Relevant Features for Breast Cancer to Improve Accuracy of Classifier

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1507.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1670-1674

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Feature Extraction ◽

Language Processing ◽

Clustering Algorithms ◽

Training Dataset ◽

Mining Machine ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Hidden Patterns

Data mining is the essential step that identifies hidden patterns in large repositories. Medical diagnosis has become a major area of current research in data mining. Machine learning techniques use statistical methods to enable machines to improve with experience and identify hidden patterns in data; examples include regression algorithms, clustering algorithms, classification algorithms, neural networks (ANN, CNN, DL), recommender system algorithms, Apriori algorithms, page ranking algorithms, text search, and natural language processing (NLP). However, due to a lack of evaluation, these algorithms have been unsuccessful in finding a better classifier for images to estimate classification accuracy in medical image processing. Classification is a form of supervised learning that predicts the class of an unknown object. The main purpose is to identify an unknown class by consulting the characteristics of neighboring classes. Clustering is a form of unsupervised learning, as it labels objects based on the degree of similarity in their characteristics without consulting a class label. The main principle of clustering is to measure distance, near or far, based on similarities and dissimilarities, and to group the objects accordingly; it can therefore be used to identify outliers (objects that are far away from the others). Feature extraction, or variable selection, is a method of obtaining a subset of relevant characteristics from a large dataset. Too many features may reduce the accuracy of a classifier. Therefore, feature extraction can be used to eliminate irrelevant attributes and increase the accuracy of the classifier. In this paper, we carried out an investigation to increase classifier accuracy by applying mining techniques in the WEKA tool. A Breast Cancer dataset was chosen from a learning repository, and an experimental analysis was conducted with the WEKA tool on the training dataset by applying the Naïve Bayes, BayesNet, PART, ZeroR, J48, and Random Forest techniques to the Wisconsin Breast Cancer dataset. Finally, we present the classifier with the highest accuracy.

Android Based Naive Bayes Probabilistic Detection Model for Breast Cancer and Mobile Cloud Computing: Design and Implementation

International Journal of Engineering Research in Africa ◽

10.4028/www.scientific.net/jera.21.197 ◽

2015 ◽

Vol 21 ◽

pp. 197-208 ◽

Cited By ~ 6

Author(s):

George Gatuha ◽

Tao Jiang

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Cloud Computing ◽

Health Services ◽

Data Storage ◽

Mobile Cloud Computing ◽

Naive Bayes ◽

Healthcare Delivery ◽

Naïve Bayes ◽

Mobile Cloud

Mobile phone technology initiatives are revolutionizing healthcare delivery in Africa and other developing countries. M-health services have transformed maternal health, the management of communicable diseases such as Ebola, and the prevention of chronic diseases. Technological innovations in m-health have improved healthcare efficiency and effectiveness and have extended health services to remote locations in rural African communities. This paper describes a ubiquitous m-health system based on the user-centric paradigm of Mobile Cloud Computing (MCC) and Android medical data mining techniques. The development of ultra-fast 4G mobile networks and sophisticated smartphones and tablets has brought the cloud computing paradigm to the mobile domain. The system comprises an Android client for breast bio-data collection, a data mining component based on the Naïve Bayes probabilistic classifier (NBC) algorithm for predicting malignancy in breast tissue, and server-side MCC data storage. Experimental results indicate that the Android Naïve Bayes classifier achieves 96.4% accuracy on Wisconsin Breast Cancer (WBC) data from the UCI machine learning database.
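
To make the idea of a Naïve Bayes probabilistic classifier concrete, here is a minimal sketch of one common variant, Gaussian Naïve Bayes. The abstract does not say which variant the system uses, so the Gaussian likelihood is an assumption, as are the function names (`fit_gaussian_nb`, `predict_gaussian_nb`), the variance floor, and the toy data.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Model = Dict[Hashable, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[Hashable], var_floor: float = 1e-9) -> Model:
    """Estimate a log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model: Model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance positive for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model: Model, row: Sequence[float]) -> Hashable:
    """Return the class with the highest log posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        # "Naive" step: features are treated as independent given the class,
        # so their log-likelihoods simply add up.
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy two-feature samples (illustrative values only).
    toy_X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8], [9, 9], [8, 8]]
    toy_y = ["benign"] * 4 + ["malignant"] * 4
    model = fit_gaussian_nb(toy_X, toy_y)
    print(predict_gaussian_nb(model, [1.5, 1.5]))
```

Training reduces to computing a handful of means and variances, and prediction to a few additions per feature, which is part of why Naïve Bayes is a plausible fit for resource-constrained mobile clients.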
