A new generalization of Weibull distribution with application to a breast cancer data set

2009 ◽  
Vol 28 (16) ◽  
pp. 2077-2094 ◽  
Author(s):  
Abdus S. Wahed ◽  
The Minh Luong ◽  
Jong-Hyeon Jeong
2018 ◽  
Vol 7 (4.20) ◽  
pp. 22 ◽  
Author(s):  
Jabeen Sultana ◽  
Abdul Khader Jilani ◽  
...

Early identification and prediction of cancer type has become a necessity in cancer research, in order to assist and manage patients. The importance of classifying cancer patients into high- and low-risk groups has led many research teams, from both the biomedical and the bioinformatics fields, to study and analyze the application of machine learning (ML) approaches. Logistic regression and multi-class classifiers have been proposed to predict breast cancer and to produce more accurate predictions on breast cancer data in a new setting. This paper explores different classification-based data mining approaches that can be applied to breast cancer data to build accurate predictions. In addition, the study identifies the best-performing model by evaluating the data set on various classifiers. The breast cancer data set used in this paper was collected from the UCI Machine Learning Repository and has 569 instances with 31 attributes. The data set is first pre-processed and then fed to various classifiers: Simple Logistic (logistic regression), IBK, K-star, Multi-Layer Perceptron (MLP), Random Forest, Decision Table, Decision Trees (DT), PART, Multi-Class Classifiers and REP Tree. Models are trained and tested using 10-fold cross-validation. The results are evaluated on accuracy, RMSE, sensitivity, specificity, F-measure, ROC curve area, the kappa statistic and the time taken to build each model. The analysis shows that, among all the classifiers, Simple Logistic Regression yields the most accurate predictions and the best model, followed by IBK (nearest-neighbour classifier), K-Star (instance-based classifier) and MLP (neural network). The other methods obtained lower accuracy than logistic regression.
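The evaluation protocol above (10-fold cross-validation scored by accuracy, sensitivity, specificity, F-measure and the kappa statistic) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: `k_fold_indices` and `binary_metrics` are hypothetical helper names, label 1 is assumed to be the positive (malignant) class, and the rows are assumed to be shuffled before the folds are cut.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, test) pairs.

    Assumes the rows were shuffled beforehand; fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, sensitivity, specificity, F-measure and Cohen's kappa.

    Label 1 is treated as the positive class (assumption: malignant = 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    # Chance agreement from the row and column totals of the confusion matrix.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f_measure": f_measure, "kappa": kappa}
```

For example, with `y_true = [1, 1, 1, 0, 0, 0, 0, 1]` and `y_pred = [1, 1, 0, 0, 0, 1, 0, 1]` (toy labels, not study data), `binary_metrics` gives an accuracy of 0.75 and a kappa of 0.5, because the chance agreement implied by the marginal totals is 0.5. In practice the metrics would be computed on each held-out fold and then averaged.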


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Habib Shah

Purpose: Breast cancer is an important medical disorder; it is not a single disease but a cluster of more than 200 different serious medical complications.
Design/methodology/approach: A new artificial bee colony (ABC) implementation has been applied to a probabilistic neural network (PNN) for training and testing, in order to classify the breast cancer data set.
Findings: The new ABC algorithm combined with PNN has been successfully applied to the breast cancer data set for prediction, requiring a minimal number of iterations.
Originality/value: The new implementation of ABC with PNN can easily be applied to time-series problems for accurate prediction or classification.
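The abstract does not say which PNN parameters the ABC algorithm trains. A common choice is the Gaussian smoothing parameter σ, so the sketch below assumes that: `pnn_predict` implements the standard PNN decision rule (one Gaussian kernel per training sample, averaged per class), and `abc_search` is a minimal one-dimensional artificial bee colony with employed, onlooker and scout phases. Both names, the default settings and the fitness function are illustrative assumptions, not the paper's implementation.

```python
import math
import random
from typing import Callable, Dict, Sequence


def pnn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[int],
                x: Sequence[float], sigma: float) -> int:
    """Classify x with a probabilistic neural network (Gaussian Parzen windows)."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for xi, yi in zip(train_x, train_y):
        # Pattern layer: one Gaussian kernel centred on each training sample.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    # Summation and output layers: class with the largest mean activation.
    return max(sums, key=lambda c: sums[c] / counts[c])


def abc_search(fitness: Callable[[float], float], low: float, high: float,
               n_food: int = 6, limit: int = 5, cycles: int = 30,
               seed: int = 0) -> float:
    """Minimal artificial bee colony search over one parameter in [low, high].

    fitness(value) must be non-negative; higher is better.
    """
    rng = random.Random(seed)
    foods = [rng.uniform(low, high) for _ in range(n_food)]
    fits = [fitness(v) for v in foods]
    trials = [0] * n_food
    best_i = max(range(n_food), key=lambda i: fits[i])
    best, best_fit = foods[best_i], fits[best_i]

    def explore(i: int) -> None:
        # Move source i relative to a different, randomly chosen source k.
        k = rng.choice([j for j in range(n_food) if j != i])
        phi = rng.uniform(-1.0, 1.0)
        cand = min(high, max(low, foods[i] + phi * (foods[i] - foods[k])))
        f = fitness(cand)
        if f > fits[i]:  # greedy selection keeps the better source
            foods[i], fits[i], trials[i] = cand, f, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_food):  # employed bee phase
            explore(i)
        for _ in range(n_food):  # onlooker phase: fitness-proportional choice
            if sum(fits) > 0:
                explore(rng.choices(range(n_food), weights=fits)[0])
            else:
                explore(rng.randrange(n_food))
        for i in range(n_food):
            if fits[i] > best_fit:  # memorise the best source found so far
                best, best_fit = foods[i], fits[i]
            if trials[i] > limit:  # scout phase: abandon an exhausted source
                foods[i] = rng.uniform(low, high)
                fits[i] = fitness(foods[i])
                trials[i] = 0
    for i in range(n_food):
        if fits[i] > best_fit:
            best, best_fit = foods[i], fits[i]
    return best
```

To tune a PNN, `fitness` could be the accuracy of `pnn_predict` on a held-out validation split for a candidate σ; accuracy is non-negative, which the fitness-proportional onlooker step requires.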


2014 ◽  
Vol 32 (15_suppl) ◽  
pp. 11040-11040
Author(s):  
Paul K. Newton ◽  
Jorge J. Nieva ◽  
Peter Kuhn ◽  
Larry Norton ◽  
Elizabeth Anne Comen ◽  
...  

2020 ◽  
Author(s):  
Michael Allen ◽  
Andrew Salmon

Abstract
Background: Open science is a movement seeking to make scientific research accessible to all, including publication of code and data. Publishing patient-level data may, however, compromise the confidentiality of that data if there is any significant risk that data may later be associated with individuals. Use of synthetic data offers the potential to release data that may be used to evaluate methods or perform preliminary research without risk to patient confidentiality.
Methods: We have tested five synthetic data methods:
1. A technique based on Principal Component Analysis (PCA), which samples data from distributions derived from the transformed data.
2. Synthetic Minority Oversampling Technique (SMOTE), which is based on interpolation between near neighbours.
3. Generative Adversarial Network (GAN), an artificial neural network approach with competing networks: a discriminator network trained to distinguish between synthetic and real data, and a generator network trained to produce data that can fool the discriminator network.
4. CT-GAN, a refinement of GANs specifically for the production of structured tabular synthetic data.
5. Variational Auto Encoders (VAE), a method of encoding data in a reduced number of dimensions and sampling from distributions based on the encoded dimensions.
Two data sets are used to evaluate the methods:
1. The Wisconsin Breast Cancer data set, a histology data set where all features are continuous variables.
2. A stroke thrombolysis pathway data set, describing characteristics of patients for whom a decision is made whether to treat with clot-busting medication. Features are mostly categorical, binary, or integers.
Methods are evaluated in three ways:
1. The ability of synthetic data to train a logistic regression classification model.
2. A comparison of means and standard deviations between original and synthetic data.
3. A comparison of covariance between features in the original and synthetic data.
Results: Using the Wisconsin Breast Cancer data set, the original data gave 98% accuracy in a logistic regression classification model. Synthetic data sets gave between 93% and 99% accuracy. Performance (best to worst) was SMOTE > PCA > GAN > CT-GAN = VAE. All methods reproduced the original data means and standard deviations with high accuracy (R-squared > 0.96 for all methods and data classes). CT-GAN and VAE suffered a significant loss of covariance between features in the synthetic data sets. Using the Stroke Pathway data set, the original data gave 82% accuracy in a logistic regression classification model. Synthetic data sets gave between 66% and 82% accuracy. Performance (best to worst) was SMOTE > PCA > CT-GAN > GAN > VAE. CT-GAN and VAE suffered loss of covariance between features in the synthetic data sets, though less pronounced than with the Wisconsin Breast Cancer data set.
Conclusions: The pilot work described here shows, as a proof of concept, that synthetic data of sufficient quality to publish with open methodology may be produced, allowing people to better understand and test methodology. The quality of the synthetic data also holds promise for data sets that may be used for screening ideas, or for research projects (perhaps especially in an educational setting). More work is required to further refine and test methods across a broader range of patient-level data sets.
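SMOTE, the best-performing method on both data sets, generates each synthetic point by interpolating between a real sample and one of its nearest neighbours. The sketch below assumes SMOTE is applied separately to the samples of each class and that all features are numeric; `smote` and `compare_moments` are hypothetical names, and `compare_moments` only mirrors the second evaluation above (per-feature means and standard deviations).

```python
import random
import statistics
from typing import List, Sequence


def smote(samples: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points for one class by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen sample
    and one of its k nearest neighbours (Euclidean distance) in the same class.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Rank every other sample by squared distance to base; keep the k nearest.
        others = sorted((j for j in range(len(samples)) if j != i),
                        key=lambda j: sum((a - b) ** 2
                                          for a, b in zip(base, samples[j])))
        neighbour = samples[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def compare_moments(original: Sequence[Sequence[float]],
                    synthetic: Sequence[Sequence[float]]) -> list:
    """Per-feature ((mean, stdev) original, (mean, stdev) synthetic) pairs."""
    return [((statistics.mean(o), statistics.stdev(o)),
             (statistics.mean(s), statistics.stdev(s)))
            for o, s in zip(zip(*original), zip(*synthetic))]
```

Because every synthetic point is a convex combination of two real points from the same class, synthetic values stay inside the range of the real data for that class.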


2019 ◽  
Vol 16 (2) ◽  
pp. 441-444
Author(s):  
D. V. Soundari ◽  
R. Padmapriya ◽  
C. Thirumariselvi ◽  
N. Nanthini ◽  
K. Priyadharsini

Breast cancer, which is linked to hormone imbalance, is a major cause of suffering among women and has caused a large number of deaths in recent years. Early detection of breast cancer is important for saving lives, and image processing plays an important role in classifying and detecting it. This paper therefore proposes machine-learning-based cancer classification using a support vector machine (SVM) on the Wisconsin breast cancer data set.
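The abstract does not specify the SVM kernel or solver. As an assumption, the sketch below trains a linear soft-margin SVM with Pegasos-style stochastic subgradient descent on the hinge loss, with labels in {-1, +1}; the bias is handled by appending a constant feature of 1.0, which also regularises it. `train_linear_svm` and `svm_predict` are hypothetical names, and a library SVM with a non-linear kernel may behave differently on the Wisconsin data.

```python
import random
from typing import List, Sequence


def train_linear_svm(x: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Linear soft-margin SVM via Pegasos-style stochastic subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)) with y_i in {-1, +1}.
    A constant feature 1.0 is appended to every sample, so the last weight acts
    as a (regularised) bias term.
    """
    rng = random.Random(seed)
    data = [list(xi) + [1.0] for xi in x]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Subgradient step on the regulariser: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move towards classifying sample i correctly.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def svm_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 from the sign of w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

The step size 1/(λt) is the Pegasos schedule. In practice the Wisconsin features would be standardised first, since they lie on very different scales.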


2021 ◽  
Vol 108 (Supplement_7) ◽  
Author(s):  
Fatima Rahman ◽  
Ellen Copson ◽  
Alan Hales ◽  
David Rew

Abstract
Background: Breast neoplasia displays complex patterns of whole-of-life disease progression, which are difficult to study using legacy data systems. Our timeline- and episode-structured breast cancer data set of 20,000 records allows direct visualisation of the entire documentary record of every patient. The embedded data mining module permits research into a wide range of patient cohorts by pathology, treatment and outcome.
Methods: We selected the cohort of patients aged between 15 and 75 with HER2-negative and HER2-positive breast cancer who were treated with neoadjuvant chemotherapy (NAC), with or without anti-HER2 therapy, between 2002 and 2019. We also studied the patterns and time intervals (in months) of disease progression and response to treatment, from primary diagnosis through loco-regional recurrence and distant metastasis to final outcome.
Results: Of the 301 women with confirmed early-stage breast cancer treated with NAC over that period, 186 had HER2-negative and 115 had HER2-positive tumours. The patterns and intervals of disease progression, as displayed on the Master Lifetrack, were mapped and measured for every patient. The proportions of patients with HER2-positive tumours receiving trastuzumab and analogues, and the tumour responses to treatment, were audited. The underlying data set was validated by review of the original records.
Conclusions: The whole-of-life, timeline-structured cancer data system introduces a new direction for clinical data visualisation, record management and user utility in surgical practice. This study validates the model as a tool for better understanding treatment effects and longitudinal behaviours in any selected range of cancer phenotypes.
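Measuring the intervals (in months) between consecutive events on a patient timeline can be sketched as follows. The event names, the dictionary layout and the convention of counting only complete calendar months are assumptions for illustration; the abstract does not describe how the Lifetrack system stores events or rounds intervals, and `whole_months_between` and `progression_intervals` are hypothetical names.

```python
from datetime import date
from typing import Dict, List, Tuple


def whole_months_between(start: date, end: date) -> int:
    """Number of complete calendar months from start to end (end >= start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def progression_intervals(events: Dict[str, date],
                          order: List[str]) -> List[Tuple[str, str, int]]:
    """Intervals in months between consecutive recorded events, in timeline order.

    Events listed in `order` but absent from `events` are skipped.
    """
    present = [e for e in order if e in events]
    return [(a, b, whole_months_between(events[a], events[b]))
            for a, b in zip(present, present[1:])]
```

If a patient has no recorded loco-regional recurrence, that event is skipped and the interval runs directly from primary diagnosis to the next recorded event.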


2021 ◽  
Vol 11 (10) ◽  
pp. 978
Author(s):  
Siti Fairuz Mat Radzi ◽  
Muhammad Khalis Abdul Karim ◽  
M Iqbal Saripan ◽  
Mohd Amiruddin Abdul Rahman ◽  
Iza Nurzawani Che Isa ◽  
...  

Automated machine learning (AutoML) has been recognized as a powerful tool for building systems that automate the design of machine learning (ML) pipelines and optimize model selection. In this study, we present the tree-based pipeline optimization tool (TPOT) as a method for finding ML models with significant performance and less complex breast cancer diagnostic pipelines. Pre-processors and ML models are represented as expression trees, and optimal pipelines are searched for with genetic programming (GP), a stochastic search method. Radiomics features were used as a guide for ML pipeline selection from the breast cancer data set based on TPOT. The breast cancer data were used in a comparative analysis of the TPOT-generated ML pipelines against selected ML classifiers optimized by a grid search approach. The pipeline combining principal component analysis (PCA) with a random forest (RF) classifier proved to be the most reliable, with the lowest complexity. TPOT model selection exceeded the performance of grid search (GS) optimization. The RF classifier showed an outstanding outcome amongst the models in combination with only two pre-processors, with a precision of 0.83. The grid-search-optimized support vector machine (SVM) classifier showed a difference of 12% in comparison, while the other two classifiers, naïve Bayes (NB) and artificial neural network multilayer perceptron (ANN-MLP), showed a difference of almost 39%. Performance was assessed by sensitivity, specificity, accuracy, precision, and receiver operating characteristic (ROC) curve analysis.
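Grid search, the baseline that TPOT outperformed in this study, evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below is a generic, library-free version; `grid_search` and the `score` callback are hypothetical, and in a real comparison `score` would return a cross-validated metric such as precision. It does not reproduce TPOT, whose genetic-programming search evolves whole pipelines rather than enumerating a fixed grid.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(param_grid: Dict[str, List[Any]],
                score: Callable[[Dict[str, Any]], float]
                ) -> Tuple[Optional[Dict[str, Any]], float]:
    """Exhaustive grid search: evaluate every parameter combination, keep the best.

    score(params) should return a validation metric where higher is better.
    """
    names = sorted(param_grid)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:  # strict ">" keeps the first of any tied combinations
            best_params, best_score = params, s
    return best_params, best_score
```

With a hypothetical SVM-style grid such as `{"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}`, `score` is called 3 × 2 = 6 times; the number of evaluations is the product of the grid sizes, so it grows quickly as parameters are added.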

