A five-year (2015 to 2019) analysis of studies focused on breast cancer prediction using machine learning: A systematic review and bibliometric analysis

2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Zakia Salod ◽  
Yashik Singh

The first objective of this study was to investigate trends in publications on breast cancer (BC) prediction using machine learning (ML) by analysing country, first author, journal, institutional collaborations and co-occurrence of author keywords. The second objective was to review studies on BC prediction using ML and a blood analysis dataset (Breast Cancer Coimbra Dataset [BCCD]). The third objective was to provide a brief review of studies on BC prediction using ML and patients’ fine needle aspirate cytology data (Wisconsin Breast Cancer Dataset [WBCD]). The study design was as follows. Objective 1: bibliometric analysis, data source: PubMed (2015-2019). Objective 2: systematic review, data source: Google and Google Scholar (2018-2019). Objective 3: systematic review, data source: Google Scholar (2016-2019). The results showed that the United States of America (USA) produced the highest number of publications (n=803). In total, 2419 first authors contributed towards the publications. Breast Cancer Research and Treatment was the highest ranked journal. Institutional collaborations mainly occurred within the USA. The use of ML for BC screening and detection was the most researched topic. A total of 19 distinct papers were included for objectives 2 and 3. The findings from these studies were never presented to clinicians for validation. In conclusion, the use of ML for BC screening and detection is promising.

Author(s):  
P. Hamsagayathri ◽  
P. Sampath

Breast cancer is one of the most dangerous cancers among women above 35 years of age worldwide. The breast is made up of lobules that secrete milk and thin milk ducts that carry milk from the lobules to the nipple. Breast cancer mostly occurs either in the lobules or in the milk ducts. The most common type of breast cancer is ductal carcinoma, which starts in the ducts and spreads across the lobules and surrounding tissues. According to the medical survey, each year in the United States about 125.0 new cases of breast cancer per 100,000 women are diagnosed and 21.5 per 100,000 women die of this disease. Also, 246,660 new cases of women with cancer are estimated for the year 2016. Early diagnosis of breast cancer is a key factor in the long-term survival of cancer patients. Classification plays an important role in breast cancer detection and is used by researchers to analyse and classify medical data. In this research work, a priority-based decision tree classifier algorithm has been implemented for the Wisconsin Breast Cancer dataset. This paper analyzes different decision tree classifier algorithms on the Wisconsin original, diagnostic and prognostic datasets using WEKA software. The performance of the classifiers is evaluated against parameters such as accuracy, Kappa statistic, entropy, RMSE, TP rate, FP rate, precision, recall, F-measure, ROC, specificity and sensitivity.
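Of the evaluation parameters listed above, the Kappa statistic is perhaps the least self-explanatory. The following is a minimal sketch of Cohen's kappa computed from a confusion matrix, written in plain Python rather than WEKA; the function name `cohen_kappa` and the example counts are illustrative assumptions, not values from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: sum over classes of (row share * column share).
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(k)
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Made-up counts: 45 + 40 correct predictions out of 100 samples.
example = [[45, 5],
           [10, 40]]
kappa = cohen_kappa(example)
```

Kappa discounts the agreement that would be expected by chance, which is why it is often reported alongside raw accuracy for datasets with uneven class sizes.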


2020 ◽  
Vol 14 ◽  

Breast Cancer (BC) is amongst the most common cancers and a leading cause of death in women throughout the world. Recently, classification and data analysis tools have been widely used in the medical field for diagnosis, prognosis and decision making to help lower the risk of people dying or suffering from diseases. Advanced machine learning methods have given hope to patients, as they have helped doctors with the early detection of potentially fatal diseases such as Breast Cancer while providing accurate outcomes. However, the results depend heavily on the techniques used for feature selection and classification, which determine how strong the machine learning model is. In this paper, a performance comparison is conducted using four classifiers, Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest, on the Wisconsin Breast Cancer dataset to identify the most effective predictors. The main goal is to apply the best machine learning classification methods to predict Breast Cancer as benign or malignant, using measures such as accuracy, F-measure, precision and recall. Experimental results show that Random Forest achieves the highest accuracy of 99.26% on this dataset and feature set, while SVM and KNN show 97.78% and 97.04% accuracy respectively. MLP shows the lowest accuracy of 94.07%. All the experiments are conducted using RStudio as the data mining platform.


2020 ◽  
Vol 17 (6) ◽  
pp. 2519-2522
Author(s):  
Kalpna Guleria ◽  
Avinash Sharma ◽  
Umesh Kumar Lilhore ◽  
Devendra Prasad

Approximately 2.1 million women every year are affected by breast cancer, which has become one of the major causes of cancer-related deaths among women. The World Health Organization’s (WHO) 2018 report reveals that around 15% of deaths among women are due to breast cancer. Lack of awareness is one of the major reasons why breast cancer is detected at a late stage. Another major reason is limited access to health resources, which makes the problem worse. Early or timely detection of breast cancer is of the utmost importance for increasing the survival rate of patients. The WHO cancer awareness guidelines recommend that women aged 40–49 years or 70–75 years undergo mammographic screening, which provides timely detection of the problem if it exists. This article uses the Breast Cancer dataset from the UCI machine learning repository to predict and diagnose the class of breast cancer, benign or malignant, using supervised learning. The supervised machine learning algorithms K-Nearest Neighbor (K-NN), Naive Bayes, logistic regression and decision tree have been used for breast cancer prediction. The performance of these classification algorithms is evaluated using several performance measures: accuracy, sensitivity, specificity and F-measure.
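The four performance measures named above can all be derived from the counts of a binary confusion matrix. The sketch below shows the standard definitions in plain Python, assuming malignant is treated as the positive class; the function name `binary_metrics` and the example counts are hypothetical and are not results from the article.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F-measure from confusion counts.

    tp/fn: malignant cases predicted malignant/benign.
    tn/fp: benign cases predicted benign/malignant.
    (Hypothetical helper; which class counts as positive is an assumption.)
    """
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)   # also called recall or TP rate
    specificity = _ratio(tn, tn + fp)   # TN rate
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # F-measure: harmonic mean of precision and sensitivity.
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

In a screening context, sensitivity and specificity are usually read together, since a classifier can raise one at the cost of the other.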


Author(s):  
Krishnaveni Arumugam et al.

Objective: One in every three individuals will be diagnosed with cancer in the course of their life. Currently, more than 3.8 million women in the United States have been diagnosed with breast cancer. 2021 is just around the corner, yet there is still time to help women facing breast cancer in 2020. In this paper, a chaotic-based duck travel optimization (cDTO) meta-heuristic algorithm is introduced to classify input images from the Mammographic Image Analysis Society (MIAS) database. Methods: Linear Discriminant Analysis (LDA) is used to extract the mammogram image features. cDTO-LDA is an intrinsic algorithm that removes irrelevant features and selects the optimal features using the wavelet families Haar (haar), Daubechies (db4), Biorthogonal (bior4.4), Symlets (sym8) and the discrete FIR approximation of the Meyer wavelet (dmey). Results: The selected features are evaluated using quality measures such as accuracy, sensitivity, specificity and error rate, which show that the cDTO classifier achieves a high accuracy of 98.5%. The CSA-LDA classifier has the lowest accuracy. Conclusion: The promising results achieved by the proposed algorithm demonstrate its efficiency in selecting the best features for breast cancer classification.


2019 ◽  
Vol 3 (3) ◽  
pp. 458-469
Author(s):  
Azminuddin I. S. Azis ◽  
Irma Surya Kumala Idris ◽  
Budy Santoso ◽  
Yasin Aril Mustofa

Breast Cancer is the most common cancer found in women, and its death rate still ranks second among cancers. Related studies often report high accuracy for their proposed machine learning approaches. However, without efficient pre-processing, the proposed Breast Cancer prediction models remain in question. Therefore, this research aims to improve the accuracy of machine learning methods through more efficient pre-processing for Breast Cancer prediction: Missing Value Replacement, Data Transformation, Smoothing Noisy Data, Feature Selection / Attribute Weighting, Data Validation, and Unbalanced Class Reduction. The study proposes several approaches: C4.5 - Z-Score - Genetic Algorithm for the Breast Cancer Dataset with 77.27% accuracy, 7-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Original with 97.85% accuracy, Artificial Neural Network - Z-Score - Forward Selection for the Wisconsin Breast Cancer Dataset - Diagnostics with 98.24% accuracy, and 11-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Prognostic with 83.33% accuracy. These approaches perform better than standard machine learning methods and than the best methods proposed in previous related studies.
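Two of the data transformations named above, Z-Score and Min-Max Normalization, are simple enough to sketch directly. The example below is a plain-Python illustration of the textbook definitions applied to a single feature column; the function names, the use of the population standard deviation and the sample values are assumptions, not details taken from the study.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to zero mean and unit variance.

    Uses the population standard deviation (an assumption; the study
    does not say which variant was used).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature maps to the lower bound of the target range.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Made-up feature column for illustration only.
feature = [2, 4, 6]
standardized = z_score(feature)
rescaled = min_max(feature)
```

Distance-based classifiers such as k-Nearest Neighbor are sensitive to feature scale, which may be why the study pairs them with Min-Max Normalization.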


Author(s):  
Harco Leslie Hendric Spits Warnars

Frequent patterns in Attribute Oriented Induction High level Emerging Pattern (AOI-HEP) are recognized when they have maximum subsumption of the target (superset) into the contrasting (subset) dataset (contrasting ⊂ target) and a large High Emerging Pattern (HEP) growth rate and support in the target dataset. HEP frequent patterns were successfully mined with AOI-HEP from 4 UCI machine learning datasets, adult, breast cancer, census and IPUMS, with 48842, 569, 2458285 and 256932 instances respectively; each dataset has concept hierarchies built from its five chosen attributes. Two frequent patterns were found in the adult dataset and one in the breast cancer dataset, while no frequent pattern was found in the census and IPUMS datasets. The HEP frequent patterns found in the adult dataset are adults who have a government workclass with an intermediate education (80.53%) and America as native country (33%). Meanwhile, the only HEP frequent pattern found in the breast cancer dataset is breast cancer with a clump thickness type of AboutAverClump and a cell size of VeryLargeSize (3.56%). Finding HEP frequent patterns with AOI-HEP is influenced by learning on the high-level concept in one of the chosen attributes; an extended experiment on the adult dataset, which learned on the marital-status attribute, found no frequent pattern.



