A new generalization of Weibull distribution with application to a breast cancer data set

Abdus S. Wahed; The Minh Luong; Jong-Hyeon Jeong

doi:10.1002/sim.3598

Predicting Breast Cancer Using Logistic Regression and Multi-Class Classifiers

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.20.22115 ◽

2018 ◽

Vol 7 (4.20) ◽

pp. 22 ◽

Cited By ~ 4

Author(s):

Jabeen Sultana ◽

Abdul Khader Jilani ◽

. .

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Logistic Regression ◽

Regression Method ◽

Breast Cancer Dataset ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data ◽

Logistic Regression Method ◽

Simple Logistic

Early identification and prediction of cancer type has become a necessity in cancer research, in order to assist and monitor patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) approaches. Logistic regression and multi-class classifiers are proposed here to predict breast cancer and to produce deep predictions in a new environment on the breast cancer data. This paper explores different data mining approaches using classification that can be applied to breast cancer data to build deep predictions. In addition, the study identifies the best-performing model by evaluating the dataset on various classifiers. The breast cancer dataset, collected from the UCI Machine Learning Repository, has 569 instances with 31 attributes. The data set is first pre-processed and then fed to various classifiers: Simple Logistic regression, IBK, K-star, Multi-Layer Perceptron (MLP), Random Forest, Decision Table, Decision Trees (DT), PART, Multi-Class Classifiers and REP Tree. 10-fold cross-validation is applied, and training is performed so that new models are developed and tested. The results are evaluated on various measures: accuracy, RMSE, sensitivity, specificity, F-measure, ROC curve area, Kappa statistic, and the time taken to build the model. Result analysis reveals that, among all the classifiers, Simple Logistic Regression yields the deep predictions and obtains the best model, with high and accurate results, followed by IBK (nearest neighbor classifier), K-Star (instance-based classifier) and MLP (neural network). The other methods obtained lower accuracy than the logistic regression method.
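Several of the evaluation measures named in this abstract (accuracy, sensitivity, specificity, F-measure and the Kappa statistic) can be derived from a single 2x2 confusion matrix. The following is a minimal sketch in plain Python, not the authors' code; the function name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common evaluation measures from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts (not taken from the paper): 200 test cases in total.
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With 10-fold cross-validation, one common choice is to sum the confusion-matrix counts over the ten held-out folds and compute the measures once on the pooled counts.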

Using new artificial bee colony as probabilistic neural network for breast cancer data classification

Frontiers in Engineering and Built Environment ◽

10.1108/febe-03-2021-0015 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Habib Shah

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Artificial Bee Colony ◽

Probabilistic Neural Network ◽

Breast Cancers ◽

Breast Cancer Data ◽

Data Set ◽

Content Type ◽

Cancer Data ◽

Bee Colony

Purpose: Breast cancer is an important medical disorder, which is not a single disease but a cluster of more than 200 different serious medical complications. Design/methodology/approach: The new artificial bee colony (ABC) implementation has been applied to a probabilistic neural network (PNN) for training and testing purposes to classify the breast cancer data set. Findings: The new ABC algorithm along with PNN has been successfully applied to the breast cancer data set for prediction purposes with a minimum number of iterations. Originality/value: The new implementation of ABC along with PNN can be easily applied to time series problems for accurate prediction or classification.

A Markov chain model of a longitudinal breast cancer data set.

Journal of Clinical Oncology ◽

10.1200/jco.2014.32.15_suppl.11040 ◽

2014 ◽

Vol 32 (15_suppl) ◽

pp. 11040-11040

Author(s):

Paul K. Newton ◽

Jorge J. Nieva ◽

Peter Kuhn ◽

Larry Norton ◽

Elizabeth Anne Comen ◽

...

Keyword(s):

Breast Cancer ◽

Markov Chain ◽

Markov Chain Model ◽

Chain Model ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data

Supervised Learning Breast Cancer Data Set Analysis in MATLAB Using Novel SVM Classifier

Advances in Intelligent Systems and Computing - Machine Intelligence and Soft Computing ◽

10.1007/978-981-15-9516-5_22 ◽

2021 ◽

pp. 255-263

Author(s):

Prasanna Priya Golagani ◽

Tummala Sita Mahalakshmi ◽

Shaik Khasim Beebi

Keyword(s):

Breast Cancer ◽

Supervised Learning ◽

Svm Classifier ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data

Synthesising artificial patient-level data for Open Science - an evaluation of five methods

10.1101/2020.10.09.20210138 ◽

2020 ◽

Author(s):

Michael Allen ◽

Andrew Salmon

Keyword(s):

Breast Cancer ◽

Logistic Regression ◽

Synthetic Data ◽

Original Data ◽

Classification Model ◽

Data Sets ◽

List Type ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data

Background: Open science is a movement seeking to make scientific research accessible to all, including publication of code and data. Publishing patient-level data may, however, compromise the confidentiality of that data if there is any significant risk that data may later be associated with individuals. Use of synthetic data offers the potential to release data that may be used to evaluate methods or perform preliminary research without risk to patient confidentiality.

Methods: We have tested five synthetic data methods: (1) a technique based on Principal Component Analysis (PCA), which samples data from distributions derived from the transformed data; (2) Synthetic Minority Oversampling Technique (SMOTE), which is based on interpolation between near neighbours; (3) Generative Adversarial Network (GAN), an artificial neural network approach with competing networks: a discriminator network trained to distinguish between synthetic and real data, and a generator network trained to produce data that can fool the discriminator network; (4) CT-GAN, a refinement of GANs specifically for the production of structured tabular synthetic data; (5) Variational Auto Encoders (VAE), a method of encoding data in a reduced number of dimensions and sampling from distributions based on the encoded dimensions.

Two data sets are used to evaluate the methods: (1) the Wisconsin Breast Cancer data set, a histology data set where all features are continuous variables; (2) a stroke thrombolysis pathway data set, describing characteristics for patients where a decision is made whether to treat with clot-busting medication. Features are mostly categorical, binary, or integers.

Methods are evaluated in three ways: (1) the ability of synthetic data to train a logistic regression classification model; (2) a comparison of means and standard deviations between original and synthetic data; (3) a comparison of covariance between features in the original and synthetic data.

Results: Using the Wisconsin Breast Cancer data set, the original data gave 98% accuracy in a logistic regression classification model. Synthetic data sets gave between 93% and 99% accuracy. Performance (best to worst) was SMOTE > PCA > GAN > CT-GAN = VAE. All methods produced a high accuracy in reproducing original data means and standard deviations (all R-square > 0.96 for all methods and data classes). CT-GAN and VAE suffered a significant loss of covariance between features in the synthetic data sets. Using the Stroke Pathway data set, the original data gave 82% accuracy in a logistic regression classification model. Synthetic data sets gave between 66% and 82% accuracy. Performance (best to worst) was SMOTE > PCA > CT-GAN > GAN > VAE. CT-GAN and VAE suffered loss of covariance between features in the synthetic data sets, though less pronounced than with the Wisconsin Breast Cancer data set.

Conclusions: The pilot work described here shows, as proof of concept, that synthetic data may be produced which is of sufficient quality to publish with open methodology, to allow people to better understand and test methodology. The quality of the synthetic data also gives promise of data sets that may be used for screening of ideas, or for research projects (perhaps especially in an education setting). More work is required to further refine and test methods across a broader range of patient-level data sets.
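The SMOTE idea mentioned in this abstract, interpolation between near neighbours, can be illustrated with a short sketch. This is a simplified plain-Python version written for illustration, not the authors' implementation and not the API of any SMOTE library; the function name `smote_sample`, its parameters and the toy points are all assumptions.

```python
import random


def smote_sample(points, k=3, n_new=5, seed=0):
    """Create synthetic points by interpolating between a randomly chosen
    point and one of its k nearest neighbours (simplified SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # Rank the other points by squared Euclidean distance to the base point.
        others = [p for p in points if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature points from a single class (toy values only).
class_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 1.9)]
new_points = smote_sample(class_points, k=3, n_new=5)
for point in new_points:
    print(point)
```

Because each synthetic point lies on the line segment between two existing points of the same class, the generated data stays inside the region already occupied by that class.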

A Comparative Analysis of Breast Cancer Data Set Using Different Classification Methods

Smart Intelligent Computing and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-13-1921-1_17 ◽

2018 ◽

pp. 175-181 ◽

Cited By ~ 2

Author(s):

M. Navya Sri ◽

J. S. V. S. Hari Priyanka ◽

D. Sailaja ◽

M. Ramakrishna Murthy

Keyword(s):

Breast Cancer ◽

Comparative Analysis ◽

Classification Methods ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data

Detection of Breast Cancer Using Machine Learning Support Vector Machine Algorithm

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.7747 ◽

2019 ◽

Vol 16 (2) ◽

pp. 441-444

Author(s):

D. V. Soundari ◽

R. Padmapriya ◽

C. Thirumariselvi ◽

N. Nanthini ◽

K. Priyadharsini

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Support Vector ◽

Learning Support ◽

Support Vector Machine Algorithm ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data ◽

Hormone Imbalance

Many women suffer from breast cancer, which is due to hormone imbalance. It has led to a large number of deaths in recent years. Early detection of breast cancer is important for saving lives. Image processing plays an important role in classifying and detecting it. This paper therefore proposes machine-learning-based cancer classification using a support vector machine with the Wisconsin breast cancer data set.

Survival Analysis for a Breast Cancer Data Set

Advances in Breast Cancer Research ◽

10.4236/abcr.2017.61001 ◽

2017 ◽

Vol 06 (01) ◽

pp. 1-15

Author(s):

Hong Li

Keyword(s):

Breast Cancer ◽

Survival Analysis ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data

TP8.1.7 The utility of a timeline and episode structured breast cancer data system to study outcomes following neoadjuvant chemotherapy for cases stratified by HER2 status

British Journal of Surgery ◽

10.1093/bjs/znab362.069 ◽

2021 ◽

Vol 108 (Supplement_7) ◽

Author(s):

Fatima Rahman ◽

Ellen Copson ◽

Alan Hales ◽

David Rew

Keyword(s):

Breast Cancer ◽

Neoadjuvant Chemotherapy ◽

Disease Progression ◽

Data System ◽

Record Management ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data ◽

Wide Range ◽

Her 2

Background: Breast neoplasia displays complex patterns of whole-of-life disease progression, which are difficult to study using legacy data systems. Our timeline- and episode-structured breast cancer data set of 20,000 records allows direct visualisation of the entire documentary record of every patient. The embedded data mining module permits research into a wide range of patient cohorts by pathology, treatment and outcome. Methods: We selected the cohort of patients aged between 15 and 75 with HER2-negative and HER2-positive breast cancer who were treated with neoadjuvant chemotherapy (NAC), with or without anti-HER2 therapy, between 2002 and 2019. We also studied the patterns and time intervals (in months) of disease progression and response to treatment from primary diagnosis, through loco-regional recurrence and distant metastasis, to final outcome. Results: Of 301 women with confirmed early-stage breast cancer who were treated with NAC over that time, 186 had HER2-negative and 115 had HER2-positive tumours. The patterns and intervals of disease progression, as displayed on the Master Lifetrack, were mapped and measured for every patient. The proportions of patients with HER2-positive tumours receiving trastuzumab and analogues, and the tumour responses to treatment, were audited. The underlying data set was validated by review of the original records. Conclusions: The whole-of-life timeline-structured cancer data system introduces a new direction for clinical data visualisation, record management and user utility in surgical practice. This study validates the model as a tool for the better understanding of treatment effects and longitudinal behaviours in any selected range of cancer phenotypes.

Hyperparameter Tuning and Pipeline Optimization via Grid Search Method and Tree-Based AutoML in Breast Cancer Prediction

Journal of Personalized Medicine ◽

10.3390/jpm11100978 ◽

2021 ◽

Vol 11 (10) ◽

pp. 978

Author(s):

Siti Fairuz Mat Radzi ◽

Muhammad Khalis Abdul Karim ◽

M Iqbal Saripan ◽

Mohd Amiruddin Abdul Rahman ◽

Iza Nurzawani Che Isa ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Model Selection ◽

Principal Component ◽

Receiver Operating Curve ◽

Support Vector ◽

Grid Search ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data

Automated machine learning (AutoML) has been recognized as a powerful tool for building systems that automate the design and optimize the model selection of machine learning (ML) pipelines. In this study, we present a tree-based pipeline optimization tool (TPOT) as a method for determining ML models with significant performance and less complex breast cancer diagnostic pipelines. Some features of pre-processors and ML models are defined as expression trees and optimal gene programming (GP) pipelines, a stochastic search system. Features of radiomics have been presented as a guide for the ML pipeline selection from the breast cancer data set based on TPOT. Breast cancer data were used in a comparative analysis of the TPOT-generated ML pipelines with the selected ML classifiers, optimized by a grid search approach. The principal component analysis (PCA) random forest (RF) classification was proven to be the most reliable pipeline with the lowest complexity. The TPOT model selection technique exceeded the performance of grid search (GS) optimization. The RF classifier showed an outstanding outcome amongst the models in combination with only two pre-processors, with a precision of 0.83. The grid search optimized for support vector machine (SVM) classifiers generated a difference of 12% in comparison, while the other two classifiers, naïve Bayes (NB) and artificial neural network-multilayer perceptron (ANN-MLP), generated a difference of almost 39%. The method's performance was assessed using sensitivity, specificity, accuracy, precision, and receiver operating curve (ROC) analysis.
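The grid search approach that this abstract compares against TPOT amounts to scoring every combination of candidate hyperparameter values with cross-validation and keeping the best one. The following is a minimal plain-Python sketch of that idea on a toy k-nearest-neighbour classifier. It does not use TPOT or the authors' pipelines, and the function names, the hyperparameter grid and the data are all hypothetical.

```python
from itertools import product


def knn_predict(train, query, k, p):
    """Majority label among the k nearest training rows
    (Minkowski distance of order p)."""
    ranked = sorted(
        train,
        key=lambda row: sum(abs(a - b) ** p for a, b in zip(row[0], query)),
    )
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def cv_accuracy(data, k, p, folds=5):
    """Cross-validated accuracy: each row is held out exactly once."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k, p) == y for x, y in test)
    return correct / len(data)


def grid_search(data, grid):
    """Score every (k, p) combination and return the best score and setting."""
    best_score, best_params = None, None
    for k, p in product(grid["k"], grid["p"]):
        score = cv_accuracy(data, k, p)
        if best_score is None or score > best_score:
            best_score, best_params = score, {"k": k, "p": p}
    return best_score, best_params


# Toy one-feature data with two well-separated classes (illustrative only).
data = [((0.1 * i,), 0) for i in range(10)]
data += [((3.0 + 0.1 * i,), 1) for i in range(10)]
grid = {"k": [1, 3, 5], "p": [1, 2]}

best_score, best_params = grid_search(data, grid)
print(best_score, best_params)
```

The cost of an exhaustive search grows with the product of the grid sizes, which is one reason stochastic search methods are considered as alternatives.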
