Evaluation of Modified Categorical Data Fuzzy Clustering Algorithm on the Wisconsin Breast Cancer Dataset

The early diagnosis of breast cancer is an important step in the fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. Because medical datasets contain data points that cannot be precisely assigned to a class, fuzzy methods have been useful for studying these datasets. Breast cancer datasets are sometimes described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, most of these methods use the Hamming distance to define the distance between two categorical feature values. In this paper, we use a probabilistic distance measure to compute the distance between a pair of categorical feature values. Experiments demonstrate that this distance measure performs better than the Hamming distance on the Wisconsin breast cancer data.
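The contrast between the Hamming distance and a probabilistic distance can be sketched as follows. The abstract does not specify which probabilistic measure is used, so this example assumes one common co-occurrence-based choice: the total variation distance between the conditional distributions of the other features. The function names and the toy records are hypothetical.

```python
from collections import Counter, defaultdict

def hamming(a, b):
    """Number of features on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def conditional_dist(rows, i, j):
    """Estimate P(feature j = v | feature i = x) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[i]][row[j]] += 1
    return {
        x: {v: c / sum(cnt.values()) for v, c in cnt.items()}
        for x, cnt in counts.items()
    }

def value_distance(rows, i, x, y):
    """Distance between values x and y of feature i (assumed measure):
    total variation between P(. | x) and P(. | y), averaged over the
    other features."""
    others = [j for j in range(len(rows[0])) if j != i]
    total = 0.0
    for j in others:
        cond = conditional_dist(rows, i, j)
        px, py = cond.get(x, {}), cond.get(y, {})
        support = set(px) | set(py)
        total += 0.5 * sum(abs(px.get(v, 0.0) - py.get(v, 0.0)) for v in support)
    return total / len(others)

# Hypothetical toy records: (thickness, size, shape)
rows = [
    ("low", "small", "round"),
    ("low", "small", "round"),
    ("mid", "small", "round"),
    ("mid", "large", "irregular"),
    ("high", "large", "irregular"),
    ("high", "large", "irregular"),
]

d_low_mid = value_distance(rows, 0, "low", "mid")
d_low_high = value_distance(rows, 0, "low", "high")
```

The Hamming distance treats "low" vs "mid" and "low" vs "high" as the same single mismatch, whereas the co-occurrence-based distance places "mid" closer to "low" than "high" is, because "mid" shares part of its context with "low".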

Ensemble Comparative Study for Diagnosis of Breast Cancer Datasets

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i4.15.23007 ◽ 2018 ◽ Vol 7 (4.15) ◽ pp. 281

Author(s): Bibhuprasad Sahu ◽ Sujata Dash ◽ Sachi Nandan Mohanty ◽ Saroj Kumar Rout

Keyword(s): Breast Cancer ◽ Neural Network ◽ Early Stage ◽ Breast Cancer Dataset ◽ Classification Rate ◽ Cancer Dataset ◽ Cancer Data ◽ CAD System ◽ Result Analysis ◽ Sensitivity Specificity

Every disease is curable if a small amount of human effort is applied to early diagnosis. The death rate in the world increases day by day as patients fail to detect disease before it becomes chronic. Breast cancer is curable if it is detected at an early stage, before it spreads to other parts of the body. Nowadays, computer-aided diagnosis (CAD) systems provide automated assistance to doctors in producing accurate predictions about the stage of a disease. This study provides a CAD system for the diagnosis of breast cancer. The method uses a neural network (NN) as the classifier model and PCA/LDA for dimension reduction to attain a higher classification rate. Multiple layers of the neural network are applied to classify the breast cancer data. The experiments are performed on the Wisconsin breast cancer dataset (WBCD) from the UCI repository. The dataset is divided into two parts, train and test. Performance is measured in terms of accuracy, sensitivity, specificity, precision and recall. The result obtained in this study is 97% using ANN and PCA-ANN, which is better than other state-of-the-art methods. According to the result analysis, this system outperforms the existing system.
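The evaluation measures named above can be computed from confusion-matrix counts. The following is a minimal sketch, not the authors' code; the function names and the label vectors are hypothetical, and it assumes both classes occur and at least one positive prediction is made (otherwise a denominator is zero). Note that sensitivity and recall are the same quantity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), specificity and precision."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Hypothetical labels: 1 = malignant, 0 = benign
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```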

Image Descriptors for Weakly Annotated Histopathological Breast Cancer Data

Frontiers in Digital Health ◽ 10.3389/fdgth.2020.572671 ◽ 2020 ◽ Vol 2

Author(s): Panagiotis Stanitsas ◽ Anoop Cherian ◽ Vassilios Morellas ◽ Resha Tejpaul ◽ Nikolaos Papanikolopoulos ◽ ...

Keyword(s): Breast Cancer ◽ Computer Vision ◽ State Of The Art ◽ Area Under The Curve ◽ Multiple Instance Learning ◽ Superior Performance ◽ Breast Cancer Dataset ◽ Cancer Dataset ◽ Cancer Data ◽ Learning Framework

Introduction: Cancerous Tissue Recognition (CTR) methodologies are continuously integrating advancements at the forefront of machine learning and computer vision, providing a variety of inference schemes for histopathological data. Histopathological data, in most cases, come in the form of high-resolution images, and thus methodologies operating at the patch level are more computationally attractive. Such methodologies capitalize on pixel-level annotations (tissue delineations) from expert pathologists, which are then used to derive labels at the patch level. In this work, we envision a digital connected health system that augments the capabilities of clinicians by providing powerful feature descriptors that may describe malignant regions. Material and Methods: We start with a patch-level descriptor, termed the Covariance-Kernel Descriptor (CKD), capable of compactly describing tissue architectures associated with carcinomas. To extend the recognition capability of the CKDs to larger slide regions, we resort to a multiple instance learning framework. In that direction, we derive the Weakly Annotated Image Descriptor (WAID) as the parameters of classifier decision boundaries in a Multiple Instance Learning framework. The WAID is computed on bags of patches corresponding to larger image regions for which binary labels (malignant vs. benign) are provided, thus obviating the necessity for tissue delineations. Results: The CKD was seen to outperform all the considered descriptors, reaching a classification accuracy (ACC) of 92.83% and an area under the curve (AUC) of 0.98. The CKD captures higher-order correlations between features and was shown to achieve superior performance against a large collection of computer vision features on a private breast cancer dataset. The WAID outperforms all other descriptors on the Breast Cancer Histopathological database (BreakHis), where correctly classified malignant (CCM) instances reached 91.27% and 92.00% at the patient and image level, respectively; it thus achieves state-of-the-art performance without resorting to a deep learning scheme. Discussion: Our proposed derivation of the CKD and WAID can help medical experts accomplish their work more accurately and faster than the current state of the art.

Toward a direct and scalable identification of reduced models for categorical processes

Proceedings of the National Academy of Sciences ◽ 10.1073/pnas.1612619114 ◽ 2017 ◽ Vol 114 (19) ◽ pp. 4863-4868 ◽ Cited By ~ 8

Author(s): Susanne Gerber ◽ Illia Horenko

Keyword(s): Breast Cancer ◽ A Priori ◽ Dynamical Model ◽ Breast Cancer Dataset ◽ Collective Variables ◽ Cancer Dataset ◽ Reduced Models ◽ Tuning Parameters ◽ Cancer Data ◽ Machine Learning Applications

The applicability of many computational approaches depends on the identification of reduced models defined on a small set of collective variables (colvars). A methodology for scalable probability-preserving identification of reduced models and colvars directly from the data is derived. It does not rely on the availability of the full relation matrices at any stage of the resulting algorithm, allows for a robust quantification of reduced model uncertainty, and allows us to impose a priori available physical information. We show two applications of the methodology: (i) to obtain a reduced dynamical model for polypeptide dynamics in water and (ii) to identify diagnostic rules from a standard breast cancer dataset. For the first example, we show that the obtained reduced dynamical model can reproduce the full statistics of spatial molecular configurations, opening possibilities for robust dimension and model reduction in molecular dynamics. For the breast cancer data, this methodology identifies a very simple diagnostic rule that is free of any tuning parameters and exhibits the same performance quality as the state-of-the-art machine learning applications with multiple tuning parameters reported for this problem.

Proposal of new hybrid fuzzy clustering algorithms — Application to breast cancer dataset

2017 IEEE Latin American Conference on Computational Intelligence (LA-CCI) ◽ 10.1109/la-cci.2017.8285679 ◽ 2017

Author(s): Pedro Henrique S. Coutinho ◽ Thiago P. das Chagas

Keyword(s): Breast Cancer ◽ Fuzzy Clustering ◽ Clustering Algorithms ◽ Breast Cancer Dataset ◽ Cancer Dataset

Breast Cancer Diagnosis using Sequential Pattern Mining

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.f9171.038620 ◽ 2020 ◽ Vol 8 (6) ◽ pp. 5515-5519

Keyword(s): Breast Cancer ◽ Pattern Mining ◽ Age Groups ◽ Experimental Studies ◽ Breast Cancer Diagnosis ◽ Sequential Pattern ◽ Breast Cancer Dataset ◽ Cancer Dataset ◽ Cancer Data ◽ Abnormal Growth

Cancer is a disease common among both rural and urban people. It is the abnormal growth of some body cells, which then destroy the normal functioning of surrounding cells. Cancer has different stages and can be cured easily if diagnosed early. Breast cancer is widespread among women of different age groups and results in the untimely death of many women. As the cause of every cancer arises before the cancer actually appears in the human body, sequential patterns from cancer datasets can be useful for determining the cause of breast cancer before it actually occurs. In this article, we put forward a technique for mining such patterns from breast cancer data. The effectiveness of the proposed technique is demonstrated by experimental studies on a real breast cancer dataset.

Diagnosis for Early Stage of Breast Cancer using Outlier Detection Algorithm Combined with Classification Technique

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.b4514.129219 ◽ 2019 ◽ Vol 9 (2) ◽ pp. 3422-3426

Keyword(s): Breast Cancer ◽ Data Mining ◽ Outlier Detection ◽ Clustering Algorithm ◽ Early Stage ◽ Developed Countries ◽ Breast Cancer Dataset ◽ Body Parts ◽ Cancer Dataset ◽ Comparison Results

Breast cancer is one of the most dangerous cancers leading to death in women. Particularly in developed countries, it ranks second among the causes that increase the chance of death in women. It cannot be easily diagnosed in the lab and is difficult to identify at the beginning stage. This cancer begins in the breast and disseminates to other body parts. It can be cured easily if it is identified at the beginning stage. The correct classification of benign cases can spare patients superfluous treatment. This paper focuses on diagnosing the early stage of breast cancer using data mining algorithms. The automatic diagnosis process plays an important role in data mining. The proposed method has three stages. First, data objects are grouped into clusters using the k-means clustering algorithm; the size of the dataset shrinks, and the computation time is also reduced. Second, an outlier detection (OD) algorithm is used to detect the outliers in the cancer dataset. Finally, the cancer is diagnosed as either benign or malignant using a decision tree classification algorithm. A breast cancer dataset was used to test the efficiency of the proposed method. The experiments were conducted on the breast cancer dataset before and after the removal of outliers. Comparison results show that the proposed method performs better, with high accuracy. This research will help medical practitioners diagnose breast cancer and thereby help patients recover.
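The three stages described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the outlier detection rule, so a distance-to-centroid cutoff (a multiple of the cluster's median distance) is assumed, and a one-split decision stump stands in for the full decision tree. All function names and the toy points are hypothetical.

```python
import math
from statistics import median

def kmeans(points, k, iters=20):
    """Stage 1: plain k-means, initialised with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def find_outliers(points, centroids, labels, factor=3.0):
    """Stage 2 (assumed rule): flag points farther from their centroid
    than `factor` times the median distance within that cluster."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    outliers = []
    for c in range(len(centroids)):
        cluster = [d for d, lab in zip(dists, labels) if lab == c]
        if not cluster:
            continue
        cutoff = factor * median(cluster)
        outliers += [i for i, (d, lab) in enumerate(zip(dists, labels))
                     if lab == c and d > cutoff]
    return sorted(outliers)

def fit_stump(points, classes):
    """Stage 3 (stand-in for a decision tree): pick the feature and
    threshold with the fewest errors for the rule x[f] > t -> class 1."""
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum((p[f] > t) != bool(y) for p, y in zip(points, classes))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def predict(stump, point):
    f, t = stump
    return int(point[f] > t)

# Hypothetical toy data: two features, 0 = benign, 1 = malignant
points = [(1, 1), (8, 8), (1, 2), (8, 9), (2, 1), (9, 8), (2, 2), (9, 9), (1, 16)]
classes = [0, 1, 0, 1, 0, 1, 0, 1, 0]

centroids, labels = kmeans(points, k=2)
outliers = find_outliers(points, centroids, labels)
kept = [i for i in range(len(points)) if i not in outliers]
stump = fit_stump([points[i] for i in kept], [classes[i] for i in kept])
```

In this toy run the point (1, 16) sits far from both groups and is removed before the classifier is fitted, which mirrors the before/after-removal comparison the abstract describes.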

A Novel Breast Cancer Detection Technology Using an Advanced Transfer Maximal Entropy Clustering Algorithm

Journal of Medical Imaging and Health Informatics ◽ 10.1166/jmihi.2019.2775 ◽ 2019 ◽ Vol 9 (8) ◽ pp. 1639-1644

Author(s): Lifang Peng ◽ Bin Huang ◽ Kefu Chen ◽ Leyuan Zhou

Keyword(s): Breast Cancer ◽ Cancer Detection ◽ Clustering Algorithm ◽ Small Sample Size ◽ Clustering Algorithms ◽ Small Sample ◽ Maximal Entropy ◽ Breast Cancer Dataset ◽ Cancer Dataset ◽ Detection Technology

The initial diagnosis of breast cancer involves analyzing the patient's examination report to determine whether the tumor is benign or malignant. Unsupervised clustering algorithms can be used for this type of problem: a cluster analysis of a patient's examination data yields the clustering results and, from them, the preliminary diagnosis. However, due to the high cost of detection, medical datasets often have a small sample size or lack information, and traditional clustering techniques usually perform poorly in such scenarios. To solve this problem, this paper proposes an advanced transfer learning mechanism based on the classic maximum entropy clustering algorithm, resulting in an advanced transfer maximal entropy clustering (AT-MEC) algorithm. A simulation experiment using the Wisconsin Breast Cancer Dataset is performed. This paper verifies that the proposed AT-MEC algorithm achieves a better clustering effect than other clustering algorithms on the Wisconsin Breast Cancer Dataset.
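For orientation, the classic maximum entropy clustering (MEC) algorithm that AT-MEC builds on can be sketched as below. This covers only the standard MEC updates, with memberships proportional to exp(-d²/γ) and centers as membership-weighted means; the transfer mechanism of AT-MEC is not described in the abstract and is not reproduced here. The function names, the parameter `gamma`, the starting centers, and the one-dimensional toy data are assumptions for illustration.

```python
import math

def mec_memberships(points, centers, gamma):
    """Soft memberships: u_ij proportional to exp(-||x_i - v_j||^2 / gamma)."""
    memberships = []
    for p in points:
        weights = [math.exp(-math.dist(p, c) ** 2 / gamma) for c in centers]
        total = sum(weights)
        memberships.append([w / total for w in weights])
    return memberships

def mec_update_centers(points, memberships, n_centers):
    """Each center is the membership-weighted mean of all points."""
    dims = len(points[0])
    centers = []
    for j in range(n_centers):
        weight = sum(u[j] for u in memberships)
        centers.append([
            sum(u[j] * p[d] for p, u in zip(points, memberships)) / weight
            for d in range(dims)
        ])
    return centers

def mec(points, centers, gamma=4.0, iters=50):
    """Alternate membership and center updates for a fixed number of steps."""
    for _ in range(iters):
        memberships = mec_memberships(points, centers, gamma)
        centers = mec_update_centers(points, memberships, len(centers))
    return centers, mec_memberships(points, centers, gamma)

# Hypothetical one-dimensional toy data with two obvious groups
points = [(1.0,), (2.0,), (9.0,), (10.0,)]
centers, memberships = mec(points, centers=[[0.0], [5.0]])
hard_labels = [max(range(len(u)), key=u.__getitem__) for u in memberships]
```

A larger `gamma` makes the memberships softer, while a smaller one pushes the result toward hard k-means-style assignments.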

The improved fuzzy clustering algorithm based on AFS theory and its applications to Wisconsin breast cancer data

2010 International Conference on Intelligent Control and Information Processing ◽ 10.1109/icicip.2010.5564290 ◽ 2010 ◽ Cited By ~ 1

Author(s): Xianchang Wang ◽ Xiaodong Liu ◽ Lishi Zhang

Keyword(s): Breast Cancer ◽ Fuzzy Clustering ◽ Clustering Algorithm ◽ Breast Cancer Data ◽ Cancer Data ◽ Fuzzy Clustering Algorithm ◽ AFS Theory

Design of novel multi filter union feature selection framework for breast cancer dataset

Concurrent Engineering ◽ 10.1177/1063293x211016046 ◽ 2021 ◽ pp. 1063293X2110160

Author(s): Dinesh Morkonda Gunasekaran ◽ Prabha Dhandayudam

Keyword(s): Breast Cancer ◽ Feature Selection ◽ Care Center ◽ Feature Selection Method ◽ Selection Method ◽ Cancer Center ◽ Breast Cancer Dataset ◽ Data Set ◽ Health Care Center ◽ Cancer Data

Nowadays women are commonly diagnosed with breast cancer. Feature selection is an important step in constructing a classification framework. We propose a multi filter union (MFU) feature selection method for breast cancer data sets. A union model based on the random forest algorithm and the logistic regression (LG) algorithm is used to select the important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset of the selected dataset. The experiments are carried out first with the data set of the Wisconsin diagnostic breast cancer center and then with a real data set from a women's health care center. The results show that the proposed approach achieves high performance and efficiency compared with existing feature selection algorithms.
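The union step can be illustrated with a minimal sketch: each filter ranks the features, the top k of each ranking are kept, and the selections are merged. The abstract does not say how the two models score features or how many are kept, so the score tables below (random forest importances and absolute logistic regression coefficients), the feature names, and the value of `k` are made-up assumptions.

```python
def top_k(scores, k):
    """Names of the k highest-scoring features in one score table."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def multi_filter_union(score_tables, k):
    """Union of the top-k features chosen by each filter."""
    selected = set()
    for scores in score_tables:
        selected |= top_k(scores, k)
    return selected

# Hypothetical scores; in practice these would come from fitted models.
rf_importance = {"clump_thickness": 0.30, "cell_size": 0.25,
                 "bare_nuclei": 0.20, "mitoses": 0.05}
lr_abs_coef = {"clump_thickness": 0.9, "cell_size": 0.2,
               "bare_nuclei": 1.1, "mitoses": 0.6}

selected = multi_filter_union([rf_importance, lr_abs_coef], k=2)
```

Taking the union rather than the intersection keeps any feature that at least one filter considers important, at the cost of a larger subset.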

Prediction of benign and malignant breast cancer using data mining techniques

Journal of Algorithms & Computational Technology ◽ 10.1177/1748301818756225 ◽ 2018 ◽ Vol 12 (2) ◽ pp. 119-126 ◽ Cited By ~ 43

Author(s): Vikas Chaurasia ◽ Saurabh Pal ◽ BB Tiwari

Keyword(s): Breast Cancer ◽ Data Mining ◽ Low Income ◽ Prediction Models ◽ Naïve Bayes ◽ Low Income Countries ◽ Breast Cancer Dataset ◽ Cancer Dataset ◽ RBF Network

Breast cancer is the second leading cancer occurring in women compared to all other cancers. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including within Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. The objective of this research paper is to present a report on breast cancer in which we took advantage of available technological advancements to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the three prediction models for performance comparison purposes. The results (based on average accuracy on the breast cancer dataset) indicated that Naïve Bayes is the best predictor with 97.36% accuracy on the holdout sample (this prediction accuracy is better than any reported in the literature), RBF Network came second with 96.77% accuracy, and J48 came third with 93.41% accuracy.
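The 10-fold cross-validation protocol mentioned above can be sketched in a few lines. This is a generic illustration rather than the authors' setup: the helper names, the seed, and the majority-class baseline used in place of Naïve Bayes, RBF Network, or J48 are all assumptions. With 683 cases and 10 folds, three folds hold 69 cases and seven hold 68.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Average test accuracy over k folds; each fold is held out once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

def fit_majority(X_train, y_train):
    """Baseline model (assumed for illustration): remember the majority class."""
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

folds = k_fold_indices(683, k=10)
fold_sizes = sorted(len(f) for f in folds)

# Hypothetical toy labels: 15 benign (0) and 5 malignant (1)
X_toy = list(range(20))
y_toy = [0] * 15 + [1] * 5
baseline_acc = cross_val_accuracy(X_toy, y_toy, fit_majority, predict_majority, k=5)
```

Reporting a trivial baseline alongside the real classifiers helps put figures such as 97.36% in context, since accuracy alone depends on the class balance.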
