Results of a Data Mining Analysis Using the Naive Bayes Method to Diagnose Breast Cancer (Hasil Analisis Teknik Data Mining dengan Metode Naive Bayes untuk Mendiagnosa Penyakit Kanker Payudara)

2020 ◽  
Vol 1 (2) ◽  
pp. 130
Author(s):  
Elma Tiana ◽  
Sri Wahyuni

Breast cancer, or mammary carcinoma, is uncontrolled cell growth in the milk-producing glands (lobules), the ducts that run from the lobules to the nipple, and the supporting breast tissue that surrounds the lobules, ducts, blood vessels, and lymph vessels, but not the breast skin. The research begins with a preprocessing stage that handles missing values through imputation. Feature selection is then performed to see which attributes have a major impact on the data. The last stage is classification with two methods, including Naïve Bayes. The study concludes by identifying the method best suited to classifying recurrence data for breast cancer patients.
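The classification stage described above can be illustrated with a minimal categorical Naïve Bayes sketch. This is not the authors' implementation: the feature names, the toy records, the function names, and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # distinct values seen per feature, needed for add-one smoothing
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the class with the highest log-posterior for one record."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            # add-one smoothing so an unseen value does not zero the product
            score += math.log((seen + 1) / (count + len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up records (hypothetical features: tumor size band, node involvement)
rows = [("small", "no"), ("small", "no"), ("large", "yes"),
        ("large", "yes"), ("large", "no"), ("small", "yes")]
labels = ["no-recurrence", "no-recurrence", "recurrence",
          "recurrence", "recurrence", "no-recurrence"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("large", "yes")))
```

Working in log space avoids numerical underflow when many features are multiplied together, which is a common design choice for this classifier.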

2014 ◽  
Vol 111 ◽  
pp. S59
Author(s):  
S. Tortajada ◽  
J.L. Lopez Guerra ◽  
D. Palacios ◽  
A. Pérez-González ◽  
J.M. García-Gómez ◽  
...  

2018 ◽  
Vol 12 (2) ◽  
pp. 119-126 ◽  
Author(s):  
Vikas Chaurasia ◽  
Saurabh Pal ◽  
BB Tiwari

Breast cancer is the second most common cancer in women. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization, and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including in Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. This paper reports on the use of available technological advances to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the performance of the three prediction models for comparison. The results (based on average accuracy on the breast cancer dataset) indicated that Naïve Bayes is the best predictor, with 97.36% accuracy on the holdout sample (according to the authors, better than any accuracy reported in the literature); RBF Network came second with 96.77% accuracy, and J48 came third with 93.41% accuracy.
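The 10-fold cross-validation mentioned above can be sketched as follows. This is a generic illustration, not the authors' procedure: the function names, the shuffling seed, and the `fit`/`predict` callables are assumptions, and only the sample count of 683 comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average held-out accuracy; `fit` and `predict` are supplied by the caller."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Fold sizes for a dataset of 683 cases, the size reported in the abstract
splits = list(k_fold_indices(683, k=10))
print([len(test) for _, test in splits])
```

Each model is trained on nine folds and scored on the remaining one, so every case contributes to the accuracy estimate exactly once as unseen data.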


Data mining usually refers to the discovery of specific patterns in, or the analysis of data from, a large dataset. Classification is an efficient data mining technique in which the classes that records are assigned to are predefined from existing datasets. Classifying medical records by their symptoms with computerized methods, and storing the predicted information in digital form, is of great importance in diagnosing various diseases. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on how accurately they match the diagnosis report and on their execution rate, which shows how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), with the aim of predicting which algorithm is best suited to classifying any kind of medical dataset. The Breast Cancer, Iris, and Hypothyroid datasets are used to determine which of the three algorithms classifies them with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. On these results, the C4.5 algorithm performs considerably better than the Naïve Bayes and K-Nearest Neighbor algorithms.
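A comparison of this kind measures both accuracy and classification time. The sketch below shows one way that could look for a K-Nearest Neighbor classifier. The data points, the choice of Euclidean distance, k = 3, and all function names are assumptions for illustration and are not taken from the paper.

```python
import math
import time
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def timed_accuracy(predict_one, test_X, test_y):
    """Return (accuracy, elapsed seconds) for classifying the test records."""
    start = time.perf_counter()
    hits = sum(predict_one(x) == y for x, y in zip(test_X, test_y))
    elapsed = time.perf_counter() - start
    return hits / len(test_y), elapsed


# Made-up two-feature records forming two well-separated groups
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(1.1, 1.0), (5.1, 5.0)]
test_y = ["a", "b"]

accuracy, seconds = timed_accuracy(
    lambda x: knn_predict(train_X, train_y, x), test_X, test_y)
print(accuracy, seconds)
```

The same `timed_accuracy` helper could wrap any other classifier's prediction function, which keeps the accuracy and timing comparison on equal terms.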


2021 ◽  
Vol 11 ◽  
Author(s):  
Hongwei Yu ◽  
Xianqi Meng ◽  
Huang Chen ◽  
Jian Liu ◽  
Wenwen Gao ◽  
...  

Objectives: This study aimed to investigate whether radiomics classifiers from mammography can help predict tumor-infiltrating lymphocyte (TIL) levels in breast cancer. Methods: Data from 121 consecutive patients with pathologically-proven breast cancer who underwent preoperative mammography from February 2018 to May 2019 were retrospectively analyzed. Patients were randomly divided into a training dataset (n = 85) and a validation dataset (n = 36). A total of 612 quantitative radiomics features were extracted from mammograms using the Pyradiomics software. Radiomics feature selection and radiomics classifier were generated through recursive feature elimination and logistic regression analysis model. The relationship between radiomics features and TIL levels in breast cancer patients was explored. The predictive capacity of the radiomics classifiers for the TIL levels was investigated through receiver operating characteristic curves in the training and validation groups. A radiomics score (Rad score) was generated using a logistic regression analysis method to compute the training and validation datasets, and combining the Mann–Whitney U test to evaluate the level of TILs in the low and high groups. Results: Among the 121 patients, 32 (26.44%) exhibited high TIL levels, and 89 (73.56%) showed low TIL levels. The ER negativity (p = 0.01) and the Ki-67 negative threshold level (p = 0.03) in the low TIL group was higher than that in the high TIL group. Through the radiomics feature selection, six top-class features [Wavelet GLDM low gray-level emphasis (mediolateral oblique, MLO), GLRLM short-run low gray-level emphasis (craniocaudal, CC), LBP2D GLRLM short-run high gray-level emphasis (CC), LBP2D GLDM dependence entropy (MLO), wavelet interquartile range (MLO), and LBP2D median (MLO)] were selected to constitute the radiomics classifiers. The radiomics classifier had an excellent predictive performance for TIL levels both in the training and validation sets [area under the curve (AUC): 0.83, 95% confidence interval (CI), 0.738–0.917, with positive predictive value (PPV) of 0.913; AUC: 0.79, 95% CI, 0.615–0.964, with PPV of 0.889, respectively]. Moreover, the Rad score in the training dataset was higher than that in the validation dataset (p = 0.007 and p = 0.001, respectively). Conclusion: Radiomics from digital mammograms not only predicts the TIL levels in breast cancer patients, but can also serve as non-invasive biomarkers in precision medicine, allowing for the development of treatment plans.
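The AUC and PPV figures quoted above can be made concrete with a small sketch. It relies on the standard identity that AUC equals the Mann–Whitney U statistic divided by the number of positive-negative pairs. The scores, the 0.6 threshold, the function names, and the choice of high TIL as the positive class are assumptions for illustration, not values or conventions from the study.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as P(pos > neg) + 0.5 * P(tie), i.e. U / (n_pos * n_neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def positive_predictive_value(scores, labels, threshold):
    """PPV = TP / (TP + FP) among cases scored at or above the threshold."""
    flagged = [label for s, label in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")


# Made-up classifier scores (assumption: high TIL is the positive class)
high_til = [0.9, 0.8, 0.6, 0.4]
low_til = [0.7, 0.5, 0.3, 0.2, 0.1]

auc = auc_from_scores(high_til, low_til)
ppv = positive_predictive_value(
    high_til + low_til, [1] * len(high_til) + [0] * len(low_til), threshold=0.6)
print(auc, ppv)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based computation would be preferable for large datasets.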


2020 ◽  
Vol 7 (1) ◽  
pp. 53
Author(s):  
Derisma Derisma ◽  
Fajri Febrian

Abstract: Breast cancer is a type of cancer commonly found in women. In Indonesia, breast cancer ranks first among hospitalized patients across all hospitals. This study aimed to perform a computation-based diagnosis of breast cancer that reports a patient's cancer status, with the algorithms judged by their accuracy. The study used Orange with Python programming and the Wisconsin Breast Cancer dataset to model breast cancer classification. The data mining methods applied were Neural Network, Support Vector Machine, and Naive Bayes. The best classification algorithm was kernel SVM, with an accuracy of 98.9%, and the lowest was Naive Bayes, with an accuracy of 96.1%. Keywords: breast cancer, neural network, support vector machine, naive bayes


Breast cancer is the most frequently diagnosed cancer among women and a major reason for the increased mortality rate among women. Because manual diagnosis of this disease takes long hours and diagnostic systems are not widely available, there is a need to develop an automatic diagnosis system for early detection of cancer. Advances in natural image classification techniques and artificial intelligence methods have been widely used for the breast-image classification task. Data mining techniques contribute a great deal to the development of such a system, and classification methods are an effective way to categorize data. To classify benign and malignant tumors, we used machine learning classification techniques, in which the machine learns from past data and can predict the category of new input. This study is a comparative study of models implemented with Support Vector Machine (SVM) and Naïve Bayes on the Breast Cancer Wisconsin (Original) Data Set. The efficiency of each algorithm is measured and compared in terms of accuracy, precision, sensitivity, specificity, error rate, and F1 score. Our experiments showed that SVM is the best for predictive analysis, with an accuracy of 99.28%, followed by Naïve Bayes with an accuracy of 98.56%. It is inferred from this study that SVM is the better-suited algorithm for prediction.
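The evaluation measures listed above all follow from the four cells of a binary confusion matrix. The sketch below uses their standard definitions. The counts are made up for illustration, the function name is hypothetical, and the code assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (malignant = positive)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "error_rate": (fp + fn) / total,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Made-up counts, not results from the study
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters for tumor classification because the cost of a missed malignant case differs from the cost of a false alarm.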


2019 ◽  
Vol 63 (3) ◽  
pp. 435-447
Author(s):  
Mohsen Salehi ◽  
Jafar Razmara ◽  
Shahriar Lotfi

Abstract: Breast cancer survivability has always been an important and challenging issue for researchers. Different methods, mostly based on machine learning techniques, have been used to predict survivability among cancer patients. The most comprehensive available database of cancer incidence is SEER in the United States, which has been frequently used for different research purposes. In this paper, a new data mining analysis has been performed on the SEER database to investigate the ability of machine learning techniques to predict the survivability of breast cancer patients. To this end, the data related to breast cancer incidence were preprocessed to remove unusable records from the dataset. Subsequently, two machine learning techniques based on the Multi-Layer Perceptron (MLP) learner, MLP stacked generalization and a mixture of MLP-experts, were developed to make predictions over the database. The models were evaluated using K-fold cross-validation. The evaluation revealed an accuracy of 84.32% for the mixture of MLP-experts and 83.86% for MLP stacked generalization. This indicates that the predictors can be useful for survivability prediction, suggesting time- and cost-effective treatment for breast cancer patients.


2021 ◽  
Vol 104 (6) ◽  
pp. 902-910

Background: Detection of human papillomavirus (HPV) in breast cancer patients has suggested a possible contributing role of the virus in cancer progression in this population. Objective: To investigate the presence of HPVs in Thai breast cancer patients and examine the potential activities of the HPVs identified in both breast and cervical cancer cells. Materials and Methods: Fifty-five breast cancer tissues from Thai patients were subjected to HPV detection using PCR-EIA and DNA sequencing. HPV E6 proteins in sample tissues were detected by fluorescence immunohistochemistry. Cervical and two types of breast cancer cell lines expressing HPV oncogenes were established. The separate and combined activity of HPV oncoproteins in p53 degradation and specific gene regulation was investigated using western blot analysis and qPCR. Cell proliferation was assessed by MTT assay. Results: Twenty-two percent (10/45) of invasive breast cancers were found to be infected with various high-risk HPV types, with HPV58 E6D4G/E7T20IG63S being the most common variant. HPV58 alone accounted for approximately 50% (5/10) of all HPV-positive samples. Similar potential oncogenic activity for this variant was observed in breast and cervical cancer cells. A separate analysis of single or combined 58E6 (prototype or E6D4G) with 58E7 (prototype or E7T20IG63S) demonstrated that co-expression of 58E7T20IG63S with 58E6 (either prototype or E6D4G) significantly promoted cell proliferation compared to prototype 58E6/E7. Enhanced proliferation was mediated through elevated p53 degradation and reduced p21 expression. While p53 degradation activity was greatly diminished for E6 with the D4G mutation, co-expression with E7T20IG63S cooperated to enhance degradation of p53 and promoted cell growth. Conclusion: HPV58 E6D4G/E7T20IG63S was the most common HPV oncogene variant detected in Thai breast cancer patients. This variant promoted cell proliferation and p53 degradation. A cooperative effect was observed when the HPV oncoproteins were combined. Keywords: Human papillomavirus type 58; oncogene variant; breast cancer; Thai patients; altered cell growth

