Classification performance of data mining algorithms applied to breast cancer data

2021 ◽  
Vol 10 (1) ◽  
pp. 60
Author(s):  
Mahsa Dehghani Soufi ◽  
Reza Ferdousi

Introduction: Growing evidence has shown that some overweight factors could be implicated in tumorigenesis, higher recurrence and mortality. However, the association of various overweight factors with breast cancer has not been extensively explored. The goal of this research was to explore and evaluate the association of various overweight/obesity factors with breast cancer, based on an obesity breast cancer data set. Material and Methods: Several studies show a significantly stronger association between overweight and higher breast cancer incidence, but the role of some overweight factors such as BMI, insulin resistance, Homeostasis Model Assessment (HOMA), leptin, adiponectin, glucose and MCP.1 is still debatable. For the experiments in this work, several clinical and biochemical overweight factors were therefore analyzed, including age, Body Mass Index (BMI), glucose, insulin, HOMA, leptin, adiponectin, resistin and monocyte chemoattractant protein-1 (MCP-1). Data mining algorithms including k-means, Apriori and the hierarchical clustering algorithm (HCM) were applied using Orange version 3.22, an open source data mining tool. Results: The Apriori algorithm generated a list of frequent item sets and some strong rules from the dataset and found that insulin, HOMA and leptin are items often seen simultaneously in BC patients, a combination that leads to cancer progression. The k-means algorithm divided the samples into three clusters, and its results showed that the pair <Adiponectin, MCP.1> has the greatest effect on the separation of clusters. In addition, hierarchical clustering (HCM) with average linkage and without pruning was carried out, classifying BC patients into 1–32 clusters in order to identify BC patients with similar characteristics. Conclusion: These findings indicate that the algorithms employed in this study can be helpful to our aim.
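
As a rough illustration of the average-linkage hierarchical clustering mentioned in the abstract above, the following sketch merges the two clusters with the smallest mean pairwise distance until a requested number of clusters remains. It is not the Orange implementation used by the authors; the function name `average_linkage` and the toy points are assumptions made purely for illustration and have no relation to the study's data.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage.

    Starts with one cluster per point and repeatedly merges the two
    clusters whose mean pairwise Euclidean distance is smallest.
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            mean_distance = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or mean_distance < best[0]:
                best = (mean_distance, a, b)
        _, a, b = best
        # b is always greater than a, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical 2-D toy points (not patient data).
toy_points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.3, 4.9), (10.0, 0.0)]
result = average_linkage(toy_points, 3)
print(result)
```

Cutting the merge sequence at different values of `n_clusters` corresponds to reading the dendrogram at different heights, which is one way to obtain a range of cluster counts from a single hierarchy.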


passer ◽  
2019 ◽  
Vol 3 (1) ◽  
pp. 174-179
Author(s):  
Noor Bahjat ◽  
Snwr Jamak

Cancer is a common disease that threatens the life of one in every three people. This dangerous disease urgently requires early detection and diagnosis. The recent progress in data mining methods, such as classification, has demonstrated the need to apply machine learning algorithms to large datasets. This paper mainly aims to utilise data mining techniques to classify cancer data sets into blood cancer and non-blood cancer based on pre-defined information and post-defined information obtained after blood tests and CT scan tests. The research was conducted using the WEKA data mining tool with 10-fold cross-validation to evaluate and compare different classification algorithms, extract meaningful information from the dataset and accurately identify the most suitable and predictive model. The paper showed that the most suitable classifier, with the best ability to predict the cancerous dataset, is the multilayer perceptron, with an accuracy of 99.3967%.
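
The 10-fold cross-validation procedure referred to above can be sketched as follows. This is a minimal stand-alone illustration, not WEKA's implementation: the helper names (`k_fold_indices`, `nearest_neighbour_predict`, `cross_val_accuracy`), the 1-nearest-neighbour stand-in classifier and the toy data are all assumptions, and WEKA additionally stratifies folds by class by default, which this sketch does not do.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_neighbour_predict(train, x):
    """Return the label of the training pair whose feature value is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def cross_val_accuracy(data, k=10):
    """Hold out each fold once, train on the rest, and pool the correct predictions."""
    folds = k_fold_indices(len(data), k)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        correct += sum(
            nearest_neighbour_predict(train, data[i][0]) == data[i][1]
            for i in test_idx
        )
    return correct / len(data)


# Made-up, well-separated toy data: (feature value, label).
# Classes are interleaved so that every fold contains both labels.
data = [(float(i), "a") if i % 2 == 0 else (100.0 + i, "b") for i in range(40)]
accuracy = cross_val_accuracy(data, k=10)
print(accuracy)
```

Because every sample is used for testing exactly once, the pooled accuracy is an estimate over the whole dataset rather than over a single held-out split, which is why the approach is common when comparing classifiers.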


Author(s):  
Imane Chakour ◽  
Yousef El Mourabit ◽  
Mohamed Baslam

Recently, data mining and intelligent agents have emerged as two domains with tremendous potential for research. The capacity of agents to learn from their experience complements the data mining process. This chapter studies a multi-agent system that evaluates the performance of three well-known data mining algorithms, namely artificial neural network (ANN), support vector machines (SVM), and logistic regression or logit model (LR), on breast cancer data (WBCD). The system then aggregates the classifications of these algorithms with a controller agent, using a majority vote to increase classification accuracy. Extensive studies are performed to evaluate the performance of these algorithms using various performance metrics, such as classification rate, sensitivity, and specificity, and different software modules. In the end, the authors find that this system gives more autonomy and initiative in medical diagnosis and that the agents can communicate to share their knowledge.
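
The majority-vote aggregation and the sensitivity and specificity metrics described above can be illustrated with a short sketch. The function names and the label lists below are hypothetical and do not come from the chapter or from WBCD; "M" and "B" are assumed here to stand for malignant and benign, with "M" treated as the positive class.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (aligned by sample) by majority vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def sensitivity_specificity(truth, predicted, positive="M"):
    """Return (sensitivity, specificity) with `positive` as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, predicted))
    fn = sum(t == positive and p != positive for t, p in zip(truth, predicted))
    tn = sum(t != positive and p != positive for t, p in zip(truth, predicted))
    fp = sum(t != positive and p == positive for t, p in zip(truth, predicted))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outputs of three classifiers on four samples.
ann_labels = ["M", "B", "B", "M"]
svm_labels = ["M", "M", "B", "B"]
lr_labels = ["B", "B", "B", "M"]
truth = ["M", "B", "M", "M"]

combined = majority_vote([ann_labels, svm_labels, lr_labels])
sensitivity, specificity = sensitivity_specificity(truth, combined)
print(combined, sensitivity, specificity)
```

With three voters and two classes a tie cannot occur; with an even number of classifiers or more than two classes, a tie-breaking rule would have to be chosen explicitly.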


2007 ◽  
Vol 46 (01) ◽  
pp. 05-18 ◽  
Author(s):  
Y. Lee ◽  
K. Dharmala ◽  
C.H. Lee

Summary Objectives: A number of controversial studies have been reported on the potential risk of breast cancer caused by hormone replacement therapy (HRT). Some studies showed a positive relationship between HRT and breast cancer onset, but other studies have not confirmed these results. To clarify the contradictory outcomes in the relationships between HRT and the onset of breast cancer, we have designed an intelligent data mining model (IDM), which is able to find proper prognostic factors for cancer onset and provides alternate measures for interpreting the outcome of clinical data through hierarchies of attributes. Methods: Based on the selection criteria, we selected 22 sets of random and case-control studies from the last 15 years that identified any involvement of HRT in breast cancer. We analyzed the relationship between HRT and breast cancer using an IDM model consisting of data mining algorithms and public domain data mining tools. Prognostic factors that underlie the major etiological dispositions of breast cancer were identified. Results: The variables that are closely associated with cancer onset to some degree are age 60-69, age at menopause 40-49, parity 0, age 40-49, and oophorectomy as the type of menopause. Applying the IDM model to the overall pooled data indicated that there is no significant relationship between breast cancer onset and HRT. It is suggested that HRT patients with specific physiological and pathological conditions related to the higher-ranked prognostic factors may have a greater chance of developing breast cancer. Conclusion: The results of this study may guide biomedical research directed at establishing the causal relationships between various medications and their complications, allowing an accurate assessment of the efficacy and side effects of new therapeutic treatments in clinical trials without reliance on a large control population.


Author(s):  
Kyriacos Chrysostomou

It is well known that the performance of most data mining algorithms can be degraded by features that do not add any value to learning tasks. Feature selection can be used to limit the effects of such features by seeking only the relevant subset from the original features (de Souza et al., 2006). This subset of relevant features is discovered by removing those that are considered irrelevant or redundant. By reducing the number of features in this way, the time taken to perform classification is significantly reduced; the reduced dataset is easier to handle as fewer training instances are needed (because fewer features are present), subsequently resulting in simpler classifiers which are often more accurate. Due to these benefits, feature selection has been widely applied to reduce the number of features in many data mining applications where data have hundreds or even thousands of features. A large number of approaches exist for performing feature selection, including filters (Kira & Rendell, 1992), wrappers (Kohavi & John, 1997), and embedded methods (Quinlan, 1993). Among these approaches, the wrapper appears to be the most widely used. Wrappers have proven popular in many research areas, including bioinformatics (Ni & Liu, 2004), image classification (Puig & Garcia, 2006) and web page classification (Piramuthu, 2003). One of the reasons for the popularity of wrappers is that they make use of a classifier to help in the selection of the most relevant feature subset (John et al., 1994). On the other hand, the remaining methods, especially filters, evaluate the merit of a feature subset based on the characteristics of the data and statistical measures, e.g., chi-square, rather than the classifiers intended for use (Huang et al., 2007). Discarding the classifier when performing feature selection can subsequently result in poor classification performance.
This is because the selected feature subset will not reflect the classifier’s specific characteristics. In this way, the resulting subset may not contain those features that are most relevant to the classifier and learning task. The wrapper is therefore superior to other feature selection methods like filters, since it finds feature subsets that are more suited to the data mining problem.
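
To make the wrapper idea concrete, the sketch below performs greedy forward selection in which each candidate feature subset is scored by the classifier itself. Everything here is an illustrative assumption rather than a method taken from the cited works: the function names, the choice of a 1-nearest-neighbour classifier scored by leave-one-out accuracy, and the made-up toy data in which feature 0 is informative and feature 1 is noise.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the columns listed in `features`."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def forward_wrapper(rows, labels, n_features):
    """Greedy forward selection: repeatedly add the feature that gives the
    best classifier score, and stop when no addition improves the score."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda item: item[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Made-up toy data: column 0 separates the classes, column 1 does not.
rows = [(0.0, 5.0), (0.2, 1.0), (0.4, 9.0), (1.0, 4.8), (1.2, 1.1), (1.4, 9.2)]
labels = [0, 0, 0, 1, 1, 1]

selected, score = forward_wrapper(rows, labels, n_features=2)
print(selected, score)
```

A filter method would instead rank the columns with a statistic computed from the data alone, such as chi-square, without ever training the classifier. The wrapper's dependence on repeated classifier evaluations is also what makes it more expensive to run.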


Early detection and diagnosis of breast cancer play a significant role in the welfare of women. The mortality rate due to breast cancer is at an all-time high. Factors such as food habits, environmental pollution, hectic lifestyle and genetics are commonly linked to breast cancer. Intelligent systems are implemented to detect and diagnose such types of cancer. The value of automated diagnosis depends on its prediction accuracy when compared with surgical biopsy. Bioinformatics mining has emerged as an area of research that combines data mining and bioinformatics. Finding statistically significant associations in a breast cancer data set is feasible, and using a larger data set makes it possible to discover correlations among a larger set of genes. The algorithm has to be improved to detect interactions with low marginal effects. This research field offers intelligent and reliable data mining models for breast cancer prediction and decision making. This survey reviews various data mining algorithms applied to large biological breast cancer datasets. The merits and demerits of the various procedures and a comparison of their corresponding results are presented in this work.

