A Comparative Analysis and Predicting for Breast Cancer Detection Based on Data Mining Models

Breast cancer is one of the most common diseases among women, accounting for many deaths each year. Even though cancer can be treated and cured in its early stages, many patients are diagnosed at a late stage. Data mining is the process of finding or extracting information from massive databases or datasets, and it is a field of computer science with considerable potential. It covers a wide range of areas, one of which is classification, which can itself be accomplished using a variety of methods or algorithms. Five classification algorithms were compared using MATLAB. This paper presents a performance comparison among the classifiers: Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (K-NN), Weighted K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB). The data set was taken from the UCI Machine Learning Repository. The main objective of this study is to classify breast cancer cases in women using machine learning algorithms and to compare the algorithms by their accuracy. The results reveal that Weighted K-NN (96.7%) has the highest accuracy among all the classifiers.
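
The weighted K-NN idea named in this abstract can be illustrated with a minimal sketch. It is not the MATLAB implementation used in the paper: the inverse-distance weighting, the function name `weighted_knn_predict`, and the toy two-feature data are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Predict a label by inverse-distance-weighted voting among the k nearest points.

    Hypothetical helper for illustration; the weighting scheme (1 / distance)
    is an assumption, not the scheme reported in the paper.
    """
    # Euclidean distance from the query to every training point, nearest first.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for distance, label in neighbours[:k]:
        # Closer neighbours contribute larger votes; eps avoids division by zero.
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Made-up data: two features per sample, two classes.
train_X = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.0), (4.2, 3.9), (3.8, 4.1)]
train_y = ["benign", "benign", "malignant", "malignant", "malignant"]
prediction = weighted_knn_predict(train_X, train_y, (1.1, 1.0), k=3)

# A case where weighting changes the outcome: one very close "a" neighbour
# outvotes two more distant "b" neighbours, which a plain majority vote would not.
close_call = weighted_knn_predict(
    [(0.1, 0.0), (2.0, 0.0), (0.0, 2.0)], ["a", "b", "b"], (0.0, 0.0), k=3
)
```

With plain (unweighted) K-NN every one of the k neighbours counts equally; the weighted variant lets a very near neighbour dominate, which is the distinction the comparison above turns on.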

A Novel Approach for Improving Breast Cancer Risk Prediction using Machine Learning Algorithms : A Survey

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset196634 ◽

2019 ◽

pp. 113-118 ◽

Cited By ~ 1

Author(s):

Madhuri Maru ◽

Saket Swarndeep

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Predictive Analytics ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Predictive Analysis ◽

Support Vector ◽

K Nearest Neighbors

Breast cancer is one of the diseases that cause a high number of deaths every year. It is the most common type of all cancers and the main cause of women's deaths worldwide. Classification and data mining methods are an effective way to classify data, especially in the medical field, where those methods are widely used in diagnosis and analysis to make decisions. A common misconception is that predictive analytics and machine learning are the same thing: predictive analysis is a form of statistical learning, whereas machine learning is pattern recognition and explores the notion that algorithms can learn from and make predictions on data. In this paper, we address the problem of predictive analysis by adding machine learning techniques for better prediction of breast cancer. A performance comparison between different machine learning algorithms, namely Support Vector Machine (SVM), Decision Tree (C4.5), Naive Bayes (NB) and k Nearest Neighbors (k-NN), is conducted on the Wisconsin Breast Cancer (original) dataset. The main objective is to assess the correctness in classifying data with respect to the efficiency and effectiveness of a hybrid algorithm in terms of accuracy, precision, sensitivity and specificity.
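
The four evaluation measures listed in this abstract follow directly from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics`, the choice of "malignant" as the positive class, and the labels are assumptions made up for illustration, not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive="malignant"):
    """Compute accuracy, precision, sensitivity and specificity for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
    }


# Made-up labels: M = malignant, B = benign.
M, B = "malignant", "benign"
y_true = [M, M, M, B, B, B, B, B]
y_pred = [M, M, B, B, B, B, B, M]
metrics = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters on medical data because a classifier can reach high accuracy while still missing many malignant cases.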

Classifications of Breast Cancer Diagnosis using Machine Learning

International Journal of Computers ◽

10.46300/9108.2020.14.13 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Breast Cancer Diagnosis ◽

Performance Comparison ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbors ◽

Cancer Dataset ◽

Machine Learning Classification

Breast Cancer (BC) is amongst the most common and leading causes of death in women throughout the world. Recently, classification and data analysis tools have been widely used in the medical field for diagnosis, prognosis and decision making to help lower the risk of people dying or suffering from diseases. Advanced machine learning methods have given hope to patients, as they have helped doctors detect potentially fatal diseases like Breast Cancer early while providing accurate outcomes. However, the results depend heavily on the techniques used for feature selection and classification, which determine how strong the resulting machine learning model is. In this paper, a performance comparison is conducted using four classifiers, Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest, on the Wisconsin Breast Cancer dataset to identify the most effective predictors. The main goal is to apply the best machine learning classification methods to predict Breast Cancer as benign or malignant, evaluated in terms of accuracy, f-measure, precision and recall. Experimental results show that Random Forest achieves the highest accuracy of 99.26% on this dataset and feature set, while SVM and KNN show 97.78% and 97.04% accuracy respectively. MLP shows the lowest accuracy of 94.07%. All the experiments are conducted using RStudio as the data mining platform.

PigLeg: prediction of swine phenotype using machine learning

PeerJ ◽

10.7717/peerj.8764 ◽

2020 ◽

Vol 8 ◽

pp. e8764 ◽

Cited By ~ 2

Author(s):

Siroj Bakoev ◽

Lyubov Getmantseva ◽

Maria Kolosova ◽

Olga Kostyunina ◽

Duane R. Chartier ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithms ◽

Average Daily Gain ◽

Nearest Neighbors ◽

The State ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Leg Weakness

Industrial pig farming is associated with negative technological pressure on the bodies of pigs. Leg weakness and lameness are the sources of significant economic loss in raising pigs. Therefore, it is important to identify the predictors of limb condition. This work presents assessments of the state of limbs using indicators of growth and meat characteristics of pigs based on machine learning algorithms. We have evaluated and compared the accuracy of prediction for nine ML classification algorithms (Random Forest, K-Nearest Neighbors, Artificial Neural Networks, C50Tree, Support Vector Machines, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) and have identified the Random Forest and K-Nearest Neighbors as the best-performing algorithms for predicting pig leg weakness using a small set of simple measurements that can be taken at an early stage of animal development. Measurements of Muscle Thickness, Back Fat amount, and Average Daily Gain were found to be significant predictors of the conformation of pig limbs. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the state of limbs in pigs based on growth rate and meat characteristics.

SVM & Ga-clustering Based Feature Selection Approach for Breast Cancer Detection

International Journal on Soft Computing Artificial Intelligence and Applications ◽

10.5121/ijscai.2020.9401 ◽

2020 ◽

Vol 1 (10) ◽

pp. 1-10

Author(s):

Rashmi Priya ◽

Syed Wajahat Abbas Rizvi

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Data Mining ◽

Genetic Programming ◽

Developed Countries ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Data Set ◽

Malignant Breast ◽

Intelligent Methods

Breast cancer is a leading cause of mortality among women in developed countries, and it is women's second most prominent cause of cancer mortality worldwide. In recent decades, the prevalence of breast cancer among women has risen dramatically. This paper discusses several data analysis methods used to detect breast cancer early. Breast cancer diagnosis distinguishes benign from malignant breast lumps. We tackled the analysis of this disease using data processing tools. Data mining is an important step of knowledge discovery in which intelligent methods are used to detect patterns. Several clinical breast cancer studies have been conducted using soft computing and machine learning techniques. Some of their algorithms are easier or more comprehensive than others. This research focuses on genetic programming and machine learning algorithms to reliably distinguish benign from malignant breast cancer. The study aimed to optimise the testing algorithm. We used genetic programming methods to choose the best features and parameter values for the classifiers. We analyse the Wisconsin data set available from the U.C.I. machine learning repository. In this experiment, we compare four Weka clustering strategies with genetic clustering. A comparison of results reveals that sequential minimal optimization (S.M.O.) performs better than the I.B.K. and B.F. Tree methods, achieving 97.71%.

APTITUDE Framework for Learning Data Classification Based on Machine Learning

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2020.14.51 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Machine Learning ◽

Data Classification ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Course Content ◽

Applied Model ◽

Vector Machines ◽

Learning Data

Learning analytics refers to the use of machine learning to provide predictions of learner success and prescriptions to learners and teachers. The main goal of the paper is to propose the APTITUDE framework for learning data classification in order to adapt and recommend course content or the flow of course activities. The framework applies a model for student learning prediction based on machine learning. Five machine learning algorithms are used to provide learning data classification: random forest, Naïve Bayes, k-nearest neighbors, logistic regression and support vector machines.

Using Support Vector Machine Detection of Breast Cancer in Early stage

International Journal for Research in Engineering Application & Management ◽

10.35291/2454-9150.2020.0465 ◽

2020 ◽

pp. 213-216

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Early Stage ◽

Breast Cancer Diagnosis ◽

Support Vector ◽

Svm Classifier ◽

K Nearest Neighbors ◽

Data Set ◽

Sensitivity Specificity

Breast cancer is a disease whose incidence among women has increased tremendously. Mammography is a low-powered X-ray technique for detecting and diagnosing cancer at an early stage. The proposed system addresses two problems: first, detecting tumors as suspicious regions with weak contrast against their background, and second, extracting features that categorize tumors. This classification can be done with SVM, a statistical learning method that has achieved significant results in various fields. It was introduced in the early 1990s and led to an interest in machine learning. Here, the different image types (Benign, Malignant, or Normal) are classified using the SVM classifier. The technique shows that the tumor region in mammogram images can be detected with accuracy rates of more than 80% for linear classification using SVM. The proposed system uses 10-fold cross validation to obtain an accurate outcome. The Wisconsin breast cancer diagnosis data set is taken from the UCI machine learning repository. Accuracy, sensitivity, specificity, false discovery rate, false omission rate and Matthews correlation coefficient are evaluated in the proposed system, which provides good results for both the training and testing phases. The technique also shows accuracies of 98.57% and 97.14% using Support Vector Machine and K-Nearest Neighbors.
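
Two of the ingredients mentioned in this abstract, 10-fold cross validation and the less common measures (false discovery rate, false omission rate, Matthews correlation coefficient), can be sketched in a few lines. This is a generic illustration, not the authors' pipeline: the helper names `k_fold_indices` and `extra_metrics`, the sample count of 25, and the confusion-matrix counts are all made up.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def extra_metrics(tp, fp, tn, fn):
    """False discovery rate, false omission rate and Matthews correlation coefficient."""
    fdr = fp / (fp + tp) if (fp + tp) else float("nan")   # wrong among predicted positives
    for_ = fn / (fn + tn) if (fn + tn) else float("nan")  # wrong among predicted negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"fdr": fdr, "for": for_, "mcc": mcc}


splits = list(k_fold_indices(25, k=10))
scores = extra_metrics(tp=45, fp=5, tn=40, fn=10)
```

In a full evaluation, a classifier would be trained on each `train_idx` and scored on the matching `test_idx`, and the per-fold confusion counts would be summed or averaged before computing the measures.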

Using Data Mining to Identify COSMIC Function Point Measurement Competence

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i6.pp5253-5259 ◽

2018 ◽

Vol 8 (6) ◽

pp. 5253

Author(s):

Selami Bagriyanik ◽

Adem Karahoca

Keyword(s):

Machine Learning ◽

Data Mining ◽

Measurement Errors ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Point Measurement ◽

Function Point ◽

Training Need

COSMIC Function Point (CFP) measurement errors lead to budget, schedule and quality problems in software projects. Therefore, it is important to identify and plan requirements engineers' CFP training needs quickly and correctly. The purpose of this paper is to identify software requirements engineers' CFP measurement competence development needs by using machine learning algorithms and requirements artifacts created by engineers. The artifacts used were provided by a large service and technology company ecosystem in Telco. First, a feature set was extracted from the requirements model at hand. To prepare the data for educational data mining, requirements and CFP audit documents were converted into a CFP data set based on the designed feature set. This data set was used to train and test the machine learning models in two different experiment settings designed to reach statistically significant results. Ten different machine learning algorithms were used. Finally, algorithm performances were compared with a baseline and with each other to find the best performing models on this data set. In conclusion, REPTree, OneR, and Support Vector Machines (SVM) with Sequential Minimal Optimization (SMO) achieved top performance in forecasting requirements engineers' CFP training needs.

APLICAÇÃO DE MACHINE LEARNING NA IDENTIFICAÇÃO DE E-MAILS COMO SPAM [Application of Machine Learning in the Identification of E-mails as Spam]

Colloquium Exactarum ◽

10.5747/ce.2020.v12.n3.e327 ◽

2021 ◽

Vol 12 (3) ◽

pp. 31-38

Author(s):

Michelle Tais Garcia Furuya ◽

Danielle Elis Garcia Furuya

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

The Other ◽

Support Vector ◽

K Nearest Neighbors ◽

Mail Service ◽

E Mail

The e-mail service is one of the main tools used today and is an example of how technology facilitates the exchange of information. On the other hand, one of the biggest obstacles faced by e-mail services is spam, the name given to unsolicited messages received by a user. Machine learning has been gaining prominence in recent years as an alternative for the efficient identification of spam. In this area, different algorithms can be evaluated to identify which one performs best. The aim of the study is to assess the ability of machine learning algorithms to correctly classify e-mails and to identify which algorithm obtains the greatest accuracy. The database used was taken from the Kaggle platform and the data were processed by the Orange software with four algorithms: Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM) and Naive Bayes (NB). The data were divided into 80% for training and 20% for testing. The results show that Random Forest was the best performing algorithm with 99% accuracy.
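
The 80/20 division into training and testing data described in this abstract can be sketched as a shuffled hold-out split. The study itself used the Orange software; the function `train_test_split` below is a hypothetical standard-library helper, and the message identifiers are placeholders rather than the Kaggle data.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=42):
    """Shuffle the items and split them into training and testing portions."""
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Placeholder e-mail identifiers standing in for real messages.
emails = [f"message_{i}" for i in range(10)]
train, test = train_test_split(emails)
```

Shuffling before cutting avoids a split that merely reflects the order in which the messages were stored; for heavily imbalanced spam data, a stratified split that preserves the class ratio would be a reasonable refinement.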

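PENERAPAN DECISION TREE C4.5 SEBAGAI SELEKSI FITUR DAN SUPPORT VECTOR MACHINE (SVM) UNTUK DIAGNOSA KANKER PAYUDARA [Application of Decision Tree C4.5 as Feature Selection and Support Vector Machine (SVM) for Breast Cancer Diagnosis]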

Jurnal Informatika ◽

10.30873/ji.v19i1.1442 ◽

2019 ◽

Vol 19 (1) ◽

pp. 54-61

Author(s):

Pakarti Riswanto ◽

RZ. Abdul Aziz ◽

Sriyanto

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Data Mining ◽

Decision Tree ◽

Cancer Cells ◽

Support Vector ◽

Data Set ◽

Advantages And Disadvantages ◽

New Findings ◽

Tree Classifier

In the field of medicine, data mining plays an important and evolving role that can change the perspective of doctors, practitioners and health researchers in the process of detecting breast cancer in a patient. It has two classification applications: diagnosis, which distinguishes benign tumors from malignant cancer, and prognosis, which estimates the likelihood that cancer cells will reappear in patients who have undergone surgery. Data mining aims to describe new findings in the dataset and is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and related knowledge from the database. Classification with data mining can be done using several methods, namely Decision Tree, K-Nearest Neighbor, Naive Bayes, ID3, CART, Linear Discriminant Analysis, etc., each of which has advantages and disadvantages. In this study, the author focuses on classification using the Support Vector Machine and Decision Tree algorithms. The study analyzes the Breast Cancer Wisconsin Original data set obtained from the UCI Machine Learning Repository to classify breast cancer malignancies. The author combines the Decision Tree classifier, which is well suited to processing large databases, as a feature selection step with an SVM method that is appropriate and relevant for analyzing and diagnosing breast cancer patients because it gives accurate results for existing problems and several bases. Keywords: Data Mining, diagnosis, Decision Tree, SVM Method

Predicting Readability of Health Educational Resources for Children Using Semantic Features

International Linguistics Research ◽

10.30560/ilr.v4n2p10 ◽

2021 ◽

Vol 4 (2) ◽

pp. p10

Author(s):

Yanmeng Liu

Keyword(s):

Machine Learning ◽

Health Education ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Ensemble Classifier ◽

Machine Learning Algorithms ◽

Support Vector ◽

Semantic Features ◽

K Nearest Neighbors ◽

Education Resources

The success of health education resources largely depends on their readability, as health information can only be understood and accepted by the target readers when it is presented at an appropriate reading difficulty. Unlike other populations, children have limited knowledge and underdeveloped reading comprehension, which poses more challenges for readability research on health education resources. This research aims to explore the readability prediction of health education resources for children by using semantic features to develop machine learning algorithms. A data-driven method was applied in this research: 1000 health education articles were collected from international health organization websites, and they were grouped into resources for kids and resources for non-kids according to their sources. Moreover, 73 semantic features were used to train five machine learning algorithms (decision tree, support vector machine, k-nearest neighbors algorithm, ensemble classifier, and logistic regression). The results showed that the k-nearest neighbors algorithm and ensemble classifier outperformed the other algorithms in terms of area under the receiver operating characteristic curve, sensitivity, specificity, and accuracy, and achieved good performance in predicting whether the readability of health education resources is suitable for children or not.
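
The area under the ROC curve used as a criterion in this abstract has a simple pairwise interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way; the function name `roc_auc`, the labels and the classifier scores are made up for illustration and are unrelated to the study's data.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 on a tie, 0 otherwise.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = suitable for children, 0 = not suitable.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; a rank-based computation gives the same value more efficiently on large data sets.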
