scholarly journals BREAST CANCER DETECTION USING MAMMOGRAM FEATURES USING RANDOM FOREST ALGORITHM

Breast Cancer is one of the most dangerous diseases for women. This cancer occurs when some breast cells begin to grow abnormally. Machine learning is the subfield of computer science that studies programs that generalize from past experience. This project looks at classification, where an algorithm tries to predict the label for a sample. The machine learning algorithm takes many of these samples, called the training set, and builds an internal model. This built model is used to classify and predict the data. There are two classes, benign and malignant. Random Forest classifier is used to predict whether the cancer is benign or malignant. Training and testing of the model are done by Wisconsin Diagnosis Breast Cancer dataset.

2019 ◽  
Vol 8 (4) ◽  
pp. 4879-4881

One of the most dreadful disease is breast cancer and it has a potential cause for death in women. Every year, death rate increases drastically due to breast cancer. An effective way to classify data is through classification or data mining. This becomes very handy, especially in the medical field where diagnosis and analysis are done through these techniques. Wisconsin Breast cancer dataset is used to perform a comparison between SVM, Logistic Regression, Naïve Bayes and Random Forest. Evaluating the correctness in classifying data based on accuracy and time consumption is used to determine the efficiency of the algorithms, which is the main objective. Based on the result of performed experiments, the Random Forest algorithm shows the highest accuracy (99.76%) with the least error rate. ANACONDA Data Science Platform is used to execute all the experiments in a simulated environment.


2020 ◽  
Vol 14 ◽  

Breast Cancer (BC) is amongst the most common and leading causes of deaths in women throughout the world. Recently, classification and data analysis tools are being widely used in the medical field for diagnosis, prognosis and decision making to help lower down the risks of people dying or suffering from diseases. Advanced machine learning methods have proven to give hope for patients as this has helped the doctors in early detection of diseases like Breast Cancer that can be fatal, in support with providing accurate outcomes. However, the results highly depend on the techniques used for feature selection and classification which will produce a strong machine learning model. In this paper, a performance comparison is conducted using four classifiers which are Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest on the Wisconsin Breast Cancer dataset to spot the most effective predictors. The main goal is to apply best machine learning classification methods to predict the Breast Cancer as benign or malignant using terms such as accuracy, f-measure, precision and recall. Experimental results show that Random forest is proven to achieve the highest accuracy of 99.26% on this dataset and features, while SVM and KNN show 97.78% and 97.04% accuracy respectively. MLP shows the least accuracy of 94.07%. All the experiments are conducted using RStudio as the data mining tool platform.


Author(s):  
Jain Aditi

Breast Cancer is a global problem currently. It is a disease which is a common cause of death for women worldwide. Earlier doctors used Mammography to find out that whether the person is suffering from it or not. But sometimes even Mammography was not able to detect whether the result is a yes or a no. for the detection of breast cancer, Machine Learning language can do wonders. Hence, we are making a web application which will tell whether the person is suffering from Breast Cancer in just a minute by giving the input. The input we have used is the Wisconsin Breast Cancer Dataset (WBCD). This web application will be very helpful for doctors and radiologists. Initially, we researched about Breast Cancer and its different treatments. But could not get any idea related to its detection process as any process did not surety. Hence, we came up with the Breast Cancer Detection Web Application.


2020 ◽  
Vol 17 (6) ◽  
pp. 2519-2522
Author(s):  
Kalpna Guleria ◽  
Avinash Sharma ◽  
Umesh Kumar Lilhore ◽  
Devendra Prasad

Approximately 2.1 million women every year are affected due to breast cancer which has become one of the major causes for cancer related deaths among women. World Health Organization’s (WHO) report 2018, reveals that around 15% of deaths among women are due to breast cancer. Lack of awareness is one of the major reason which has led to the detection of breast cancer at the later stage. Another major reason is access to limited health resources which make the problem worse. Early or timely detection of breast cancer is utmost important to increase the survival rate of the patients. World Health Organization’s (WHO) cancer awareness guidelines recommend that women aged between 40–49 years of age or 70–75 years of age must be subjected to mammographic screening which will provide the timely detection of the problem, if it persist. This article uses Breast Cancer dataset from UCI machine learning repository to predict and diagnose the class of breast cancer: benign or malignant by using supervised learning. Supervised machine learning algorithms: KNearest Neighbor (K-NN), Naive Bayes, logistic regression and decision tree have been utilized for breast cancer prediction. The performance evaluation of these classification algorithms is done based on various performance measures: accuracy, sensitivity, specificity and F -measure.


2019 ◽  
Vol 3 (3) ◽  
pp. 458-469
Author(s):  
Azminuddin I. S. Azis ◽  
Irma Surya Kumala Idris ◽  
Budy Santoso ◽  
Yasin Aril Mustofa

Breast Cancer is the most common cancer found in women and the death rate is still in second place among other cancers. The high accuracy of the machine learning approach that has been proposed by related studies is often achieved. However, without efficient pre-processing, the model of Breast Cancer prediction that was proposed is still in question. Therefore, this research objective to improve the accuracy of machine learning methods through pre-processing: Missing Value Replacement, Data Transformation, Smoothing Noisy Data, Feature Selection / Attribute Weighting, Data Validation, and Unbalanced Class Reduction which is more efficient for Breast Cancer prediction. The results of this study propose several approaches: C4.5 - Z-Score - Genetic Algorithm for Breast Cancer Dataset with 77,27% accuracy, 7-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for Wisconsin Breast Cancer Dataset - Original with 97,85% accuracy, Artificial Neural Network - Z-Score - Forward Selection for Wisconsin Breast Cancer Dataset - Diagnostics with 98,24% accuracy, and 11-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for Wisconsin Breast Cancer Dataset - Prognostic with 83,33% accuracy. The performance of these approaches is better than standard/normal machine learning methods and the proposed methods by the best of previous related studies.  


Author(s):  
Harco Leslie Hendric Spits Warnars

<p><span lang="EN-US">Frequent patterns in Attribute Oriented Induction High level Emerging Pattern (AOI-HEP), are recognized when have maximum subsumption target (superset) into contrasting (subset) datasets (contrasting </span><span lang="EN-US">⊂</span><span lang="EN-US"> target) and having large High Emerging Pattern (HEP) growth rate and support in target dataset. HEP Frequent patterns had been successful mined with AOI-HEP upon 4 UCI machine learning datasets such as adult, breast cancer, census and IPUMS with the number of instances of 48842, 569, 2458285 and 256932 respectively and each dataset has concept hierarchies built from its five chosen attributes. There are 2 and 1 finding frequent patterns from adult and breast cancer datasets, while there is no frequent pattern from census and IPUMS datasets. The finding HEP frequent patterns from adult dataset are adult which have government workclass with an intermediate education (80.53%) and America as native country(33%). Meanwhile, the only 1 HEP frequent pattern from breast cancer dataset is breast cancer which have clump thickness type of AboutAverClump with cell size of VeryLargeSize(3.56%). Finding HEP frequent patterns with AOI-HEP are influenced by learning on high level concept in one of chosen attribute and extended experiment upon adult dataset where learn on marital-status attribute showed that there is no finding frequent pattern.</span></p>


2021 ◽  
Vol 8 (3) ◽  
pp. 209-221
Author(s):  
Li-Li Wei ◽  
Yue-Shuai Pan ◽  
Yan Zhang ◽  
Kai Chen ◽  
Hao-Yu Wang ◽  
...  

Abstract Objective To study the application of a machine learning algorithm for predicting gestational diabetes mellitus (GDM) in early pregnancy. Methods This study identified indicators related to GDM through a literature review and expert discussion. Pregnant women who had attended medical institutions for an antenatal examination from November 2017 to August 2018 were selected for analysis, and the collected indicators were retrospectively analyzed. Based on Python, the indicators were classified and modeled using a random forest regression algorithm, and the performance of the prediction model was analyzed. Results We obtained 4806 analyzable data from 1625 pregnant women. Among these, 3265 samples with all 67 indicators were used to establish data set F1; 4806 samples with 38 identical indicators were used to establish data set F2. Each of F1 and F2 was used for training the random forest algorithm. The overall predictive accuracy of the F1 model was 93.10%, area under the receiver operating characteristic curve (AUC) was 0.66, and the predictive accuracy of GDM-positive cases was 37.10%. The corresponding values for the F2 model were 88.70%, 0.87, and 79.44%. The results thus showed that the F2 prediction model performed better than the F1 model. To explore the impact of sacrificial indicators on GDM prediction, the F3 data set was established using 3265 samples (F1) with 38 indicators (F2). After training, the overall predictive accuracy of the F3 model was 91.60%, AUC was 0.58, and the predictive accuracy of positive cases was 15.85%. Conclusions In this study, a model for predicting GDM with several input variables (e.g., physical examination, past history, personal history, family history, and laboratory indicators) was established using a random forest regression algorithm. The trained prediction model exhibited a good performance and is valuable as a reference for predicting GDM in women at an early stage of pregnancy. In addition, there are certain requirements for the proportions of negative and positive cases in sample data sets when the random forest algorithm is applied to the early prediction of GDM.


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Zakia Salod ◽  
Yashik Singh

The objective 1 of this study was to investigate trends in breast cancer (BC) prediction using machine learning (ML) publications by analysing country, first author, journal, institutional collaborations and co-occurrence of author keywords. The objective 2 was to provide a review of studies on BC prediction using ML and a blood analysis dataset (Breast Cancer Coimbra Dataset [BCCD]), the objective 3 was to provide a brief review of studies based on BC prediction using ML and patients’ fine needle aspirate cytology data (Wisconsin Breast Cancer Dataset [WBCD]). The design of this study was as follows: for objective 1: bibliometric analysis, data source PubMed (2015-2019); for objective 2: systematic review, data source: Google and Google Scholar (2018-2019); for objective 3: systematic review, data source: Google Scholar (2016-2019). The results showed that the United States of America (USA) produced the highest number of publications (n=803). In total, 2419 first authors contributed towards the publications. Breast Cancer Research and Treatment was the highest ranked journal. Institutional collaborations mainly occurred within the USA. The use of ML for BC screening and detection was the most researched topic. A total of 19 distinct papers were included for objectives 2 and 3. The findings from these studies were never presented to clinicians for validations. In conclusion, the use of ML for BC screening and detection is promising.


2020 ◽  
Vol 13 (5) ◽  
pp. 901-908
Author(s):  
Somil Jain ◽  
Puneet Kumar

Background:: Breast cancer is one of the diseases which cause number of deaths ever year across the globe, early detection and diagnosis of such type of disease is a challenging task in order to reduce the number of deaths. Now a days various techniques of machine learning and data mining are used for medical diagnosis which has proven there metal by which prediction can be done for the chronic diseases like cancer which can save the life’s of the patients suffering from such type of disease. The major concern of this study is to find the prediction accuracy of the classification algorithms like Support Vector Machine, J48, Naïve Bayes and Random Forest and to suggest the best algorithm. Objective:: The objective of this study is to assess the prediction accuracy of the classification algorithms in terms of efficiency and effectiveness. Methods: This paper provides a detailed analysis of the classification algorithms like Support Vector Machine, J48, Naïve Bayes and Random Forest in terms of their prediction accuracy by applying 10 fold cross validation technique on the Wisconsin Diagnostic Breast Cancer dataset using WEKA open source tool. Results:: The result of this study states that Support Vector Machine has achieved the highest prediction accuracy of 97.89 % with low error rate of 0.14%. Conclusion:: This paper provides a clear view over the performance of the classification algorithms in terms of their predicting ability which provides a helping hand to the medical practitioners to diagnose the chronic disease like breast cancer effectively.


Sign in / Sign up

Export Citation Format

Share Document