Predicting Breast Cancer using Modern Data Science Methodology

Breast cancer is the most commonly occurring cancer in women according to the World Health Organization (WHO), but early prediction of breast cancer helps the recovery of those affected. Cited causes of breast cancer include hormone replacement therapy, exposure to harmful radioactive rays, and late childbearing. The aim is to diagnose cancer by using a machine learning technique, Random Forest, for accurate results. The dataset used is the Wisconsin Breast Cancer dataset. The resulting error rate was only about 0.0177.

2019 ◽  
Vol 8 (4) ◽  
pp. 4879-4881

Breast cancer is one of the most dreadful diseases and a potential cause of death in women. Every year, the death rate due to breast cancer increases drastically. An effective way to classify data is through classification or data mining. This is very useful, especially in the medical field, where diagnosis and analysis are done through these techniques. The Wisconsin Breast Cancer dataset is used to compare SVM, Logistic Regression, Naïve Bayes and Random Forest. The main objective is to determine the efficiency of the algorithms by evaluating how correctly they classify the data, based on accuracy and time consumption. Based on the results of the experiments, the Random Forest algorithm shows the highest accuracy (99.76%) with the lowest error rate. The ANACONDA Data Science Platform is used to execute all the experiments in a simulated environment.


2020 ◽  
Vol 17 (6) ◽  
pp. 2519-2522
Author(s):  
Kalpna Guleria ◽  
Avinash Sharma ◽  
Umesh Kumar Lilhore ◽  
Devendra Prasad

Approximately 2.1 million women are affected by breast cancer every year, and it has become one of the major causes of cancer-related deaths among women. The World Health Organization’s (WHO) 2018 report reveals that around 15% of deaths among women are due to breast cancer. Lack of awareness is one of the major reasons that breast cancer is detected at a later stage. Another major reason is limited access to health resources, which makes the problem worse. Early or timely detection of breast cancer is of utmost importance to increase the survival rate of patients. The WHO cancer awareness guidelines recommend that women aged 40–49 years or 70–75 years undergo mammographic screening, which provides timely detection of the problem, if it persists. This article uses the Breast Cancer dataset from the UCI machine learning repository to predict and diagnose the class of breast cancer, benign or malignant, by using supervised learning. The supervised machine learning algorithms K-Nearest Neighbor (K-NN), Naive Bayes, logistic regression and decision tree have been utilized for breast cancer prediction. The performance of these classification algorithms is evaluated using several measures: accuracy, sensitivity, specificity and F-measure.
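The four performance measures named in this abstract can be illustrated with a short Python sketch. This is not the authors' code: the function name `binary_metrics`, the `BinaryMetrics` container, and the choice of "malignant" as the positive class are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f_measure: float


def binary_metrics(y_true, y_pred, positive="malignant"):
    """Compute the four measures from paired true and predicted labels.

    `positive` names the class treated as the positive outcome
    (assumed here to be "malignant").
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): share of positive cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of negative cases that were correctly cleared.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return BinaryMetrics(accuracy, sensitivity, specificity, f_measure)
```

In a screening setting, sensitivity is usually the measure to watch first, since a missed malignant case (a false negative) is costlier than a false alarm.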


Author(s):  
Alfred O. Mueck ◽  
Harald Seeger ◽  
Samuel Shapiro

Abstract: Regarding estrogen replacement therapy, two main mechanisms have to be considered for it to be discussed as a potential carcinogen in the breast, and also considering the World Health Organization definition of estrogens and estrogen/progestogen combinations as “carcinogenic”: (i) the proliferative/apoptotic effects on already pre-existing estrogen-sensitive cancer cells and (ii) the production of possible genotoxic estrogen metabolites. By addition of the progestogen component, as is usual in non-hysterectomized women, both mechanisms can lead to an increased risk compared to estrogen-only therapy. The detailed mechanisms underlying the development of the benign breast epithelial cell into clinically relevant breast cancer cells are very complicated. Based on these mechanisms, the following simplified summary of the main steps explains that: (i) an increased risk cannot be excluded, (ii) especially when estrogens are combined with progestogens, but (iii) there are differences between the preparations used in therapy; (iv) the risk seems to be very rare, needing very special cellular and extracellular conditions, (v) and could even be decreased in special situations of estrogen therapy. It is concluded that when critically reviewed, an increased risk of breast cancer during hormone replacement therapy cannot be excluded in very rare cases. Definitive mechanistic evidence for a possible causal relationship with carcinogenesis still remains open.


2018 ◽  
Vol 12 (2) ◽  
pp. 119-126 ◽  
Author(s):  
Vikas Chaurasia ◽  
Saurabh Pal ◽  
BB Tiwari

Breast cancer is the second leading cancer in women compared to all other cancers. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization, and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including within Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. The objective of this research paper is to present a report on breast cancer in which we took advantage of available technological advancements to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the performance of the three prediction models for comparison purposes. The results (based on average accuracy on the breast cancer dataset) indicated that Naïve Bayes is the best predictor, with 97.36% accuracy on the holdout sample (this prediction accuracy is better than any reported in the literature); RBF Network came second with 96.77% accuracy, and J48 came third with 93.41% accuracy.
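The 10-fold cross-validation procedure mentioned in this abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' setup: the helper names `k_fold_indices` and `cross_val_accuracy`, the shuffling seed, and the `fit`/`predict` callables are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that split range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Average held-out accuracy over k folds.

    `fit(X_train, y_train)` returns a model and `predict(model, X_test)`
    returns a list of labels; both are supplied by the caller.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

With 683 cases and k = 10, this split produces three folds of 69 cases and seven folds of 68, so every case is held out exactly once.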


Author(s):  
P. Hamsagayathri ◽  
P. Sampath

Breast cancer is one of the most dangerous cancers among women above 35 years of age worldwide. The breast is made up of lobules that secrete milk and thin milk ducts that carry milk from the lobules to the nipple. Breast cancer mostly occurs either in the lobules or in the milk ducts. The most common type of breast cancer is ductal carcinoma, which starts in the ducts and spreads across the lobules and surrounding tissues. According to the medical survey, each year about 125.0 per 100,000 new cases of breast cancer are diagnosed and 21.5 per 100,000 women die due to this disease in the United States. Also, 246,660 new cases of women with cancer are estimated for the year 2016. Early diagnosis of breast cancer is a key factor for the long-term survival of cancer patients. Classification plays an important role in breast cancer detection and is used by researchers to analyse and classify medical data. In this research work, a priority-based decision tree classifier algorithm has been implemented for the Wisconsin Breast Cancer dataset. This paper analyzes different decision tree classifier algorithms for the Wisconsin original, diagnostic and prognostic datasets using WEKA software. The performance of the classifiers is evaluated against parameters such as accuracy, Kappa statistic, entropy, RMSE, TP rate, FP rate, precision, recall, F-measure, ROC, specificity and sensitivity.
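Of the evaluation parameters listed above, the Kappa statistic is probably the least familiar. A minimal Python sketch of Cohen's kappa, which compares observed agreement with the agreement expected by chance, might look like this. The function name `cohen_kappa` is an assumption for illustration, and this is not a reproduction of WEKA's implementation.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")

    # Observed agreement: fraction of cases where both labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: product of the marginal label frequencies, summed per class.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Both sequences contain a single identical label; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)
```

A kappa of 1 means perfect agreement between predictions and true labels, while a value near 0 means the classifier does no better than chance given the class frequencies.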


2018 ◽  
Vol 9 (2) ◽  
pp. 165-170
Author(s):  
Husni Husni

One of the effects of not maintaining hygiene during menstruation is the risk of developing cancer of the cervix (cervical cancer). Based on data from the World Health Organization (WHO), cervical cancer is the second most common cancer in women aged 15-45 years, after breast cancer. No fewer than 500,000 new cases with 280,000 patient deaths occur each year worldwide. Indonesia ranked first, with at least 555 women dying per day and 200,000 women annually. This study aims to determine the correlation between knowledge and attitude and personal hygiene practice during menstruation at SMAN 2 Bengkulu City. This research is descriptive-analytic. There were 84 respondents, selected by stratified random sampling. Data were collected using a questionnaire and presented in frequency distribution tables. The data were analyzed using univariate and bivariate analysis with the chi-square test. The results showed that most respondents had good knowledge (54.8%), an unfavorable or unsupportive attitude (53.6%), and good practice (52.4%). The bivariate analysis found no correlation between knowledge and personal hygiene practice during menstruation (p = 0.794), and no relation between attitude and personal hygiene practice during menstruation (p = 0.975). Schools, educators and parents need to be more proactive in improving knowledge and providing useful information about the process of menstruation and how to maintain hygiene during menstruation.


2020 ◽  
Vol 14 ◽  

Breast Cancer (BC) is among the most common and leading causes of death in women throughout the world. Recently, classification and data analysis tools have been widely used in the medical field for diagnosis, prognosis and decision making, to help lower the risk of people dying or suffering from diseases. Advanced machine learning methods have given hope to patients, as they have helped doctors in the early detection of potentially fatal diseases like Breast Cancer while providing accurate outcomes. However, the results depend heavily on the techniques used for feature selection and classification, which determine the strength of the machine learning model. In this paper, a performance comparison is conducted on the Wisconsin Breast Cancer dataset using four classifiers, Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest, to identify the most effective predictors. The main goal is to apply the best machine learning classification methods to predict Breast Cancer as benign or malignant, evaluated in terms of accuracy, F-measure, precision and recall. Experimental results show that Random Forest achieves the highest accuracy of 99.26% on this dataset and these features, while SVM and KNN show 97.78% and 97.04% accuracy respectively. MLP shows the lowest accuracy of 94.07%. All the experiments are conducted using RStudio as the data mining tool platform.


2021 ◽  
Author(s):  
Rahibu A. Abassi ◽  
Amina S. Msengwa ◽  
Rocky R. J. Akarro

Abstract. Background: Clinical data are at risk of having missing or incomplete values for several reasons, including patients’ failure to attend clinical measurements, wrong interpretations of measurements, and defects in the measurement recorder. Missing data can significantly affect the analysis, and results might be doubtful due to bias caused by omitting the missing observations during statistical analysis, especially if the dataset is small. The objective of this study is to compare several imputation methods in terms of their efficiency in filling in the missing data, so as to increase prediction and classification accuracy in a breast cancer dataset. Methods: Five imputation methods, namely series mean, k-nearest neighbour, hot deck, predictive mean matching, and multiple imputation, were applied to replace the missing values in a real breast cancer dataset. The efficiency of the imputation methods was compared by using the Root Mean Square Error and Mean Absolute Error to obtain a suitable complete dataset. Binary logistic regression and linear discriminant classifiers were applied to the imputed dataset to compare their efficacy in classification and discrimination. Results: The evaluation of the imputation methods revealed that predictive mean matching performed better than the other imputation methods. In addition, the binary logistic regression and linear discriminant analyses yielded almost the same overall classification rates, sensitivity and specificity. Conclusion: Predictive mean matching imputation showed higher accuracy in estimating and replacing missing or incomplete data values in the real breast cancer dataset under study. It is a more effective method for handling missing data in this scenario.
We recommend replacing missing data by using predictive mean matching, since it is a plausible approach toward multiple imputation for numerical variables and improves estimation and prediction accuracy over complete-case analysis, especially when the percentage of missing data is not very small.
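To make the evaluation idea in this abstract concrete, the sketch below applies the simplest of the five methods, series mean imputation, and scores it with RMSE and MAE over the positions that were missing. It is an illustrative assumption about how such a comparison can be set up (true values are known and some are masked on purpose), not the authors' procedure; the names `mean_impute` and `imputation_errors` and the use of `None` for a missing value are hypothetical.

```python
import math


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def imputation_errors(true_values, masked_values, imputed_values):
    """Return (RMSE, MAE) computed only over positions missing in masked_values."""
    diffs = [
        imputed - truth
        for truth, masked, imputed in zip(true_values, masked_values, imputed_values)
        if masked is None
    ]
    if not diffs:
        raise ValueError("no missing positions to evaluate")
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae
```

The same `imputation_errors` function could score any other imputation method, which is what makes a like-for-like comparison across methods possible.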


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Pooja Rani ◽  
Rajneesh Kumar ◽  
Anurag Jain

Purpose: Decision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently. Design/methodology/approach: The proposed HIOC is developed by using a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), mean and mode imputation methods in an optimum way. Performance of HIOC has been compared to MICE, KNN, and mean and mode methods. Four classifiers, support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT), have been used to evaluate the performance of imputation methods. Findings: The results show that HIOC performed efficiently even with a high rate of missing values. It reduced root mean square error (RMSE) by up to 17.32% in the heart disease dataset and 34.73% in the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases. It increased classification accuracy by up to 18.61% in the heart disease dataset and 6.20% in the breast cancer dataset. Originality/value: The proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.


Author(s):  
Yagya Buttan ◽  
Alka Chaudhary ◽  
Komal Saxena ◽  
Samriddh Kohli ◽  
Ajay Rana
