Building a Machine Learning Model on Breast Cancer Data with Focus on Cross Validation and Accuracy

Author(s):  
Sagar Rai ◽  
Aditya Anand ◽  
Kunal Singh
Author(s):  
Yuhong Huang ◽  
Wenben Chen ◽  
Xiaoling Zhang ◽  
Shaofu He ◽  
Nan Shao ◽  
...  

Aim: After neoadjuvant chemotherapy (NACT), the tumor shrinkage pattern is a more reasonable outcome than pathological complete response (pCR) for deciding whether breast-conserving surgery (BCS) is possible. The aim of this article was to establish a machine learning model combining radiomics features from multiparametric MRI (mpMRI) with clinicopathologic characteristics, for early prediction of the tumor shrinkage pattern prior to NACT in breast cancer. Materials and Methods: This study included 199 patients with breast cancer who successfully completed NACT and subsequently underwent breast surgery. For each patient, 4,198 radiomics features were extracted from the segmented 3D regions of interest (ROI) in mpMRI sequences such as T1-weighted dynamic contrast-enhanced imaging (T1-DCE), fat-suppressed T2-weighted imaging (T2WI), and the apparent diffusion coefficient (ADC) map. Feature selection and supervised machine learning algorithms were used to identify the predictors correlated with the tumor shrinkage pattern as follows: (1) reducing the feature dimension by using ANOVA and the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation, (2) splitting the dataset into a training dataset and a testing dataset, and constructing prediction models using 12 classification algorithms, and (3) assessing model performance through the area under the curve (AUC), accuracy, sensitivity, and specificity. We also compared the most discriminative model across different molecular subtypes of breast cancer. Results: The Multilayer Perceptron (MLP) neural network achieved higher AUC and accuracy than the other classifiers. The radiomics model achieved a mean AUC of 0.975 (accuracy = 0.912) on the training dataset and 0.900 (accuracy = 0.828) on the testing dataset with 30-round 6-fold cross-validation. When clinicopathologic characteristics were incorporated, the mean AUC was 0.985 (accuracy = 0.930) on the training dataset and 0.939 (accuracy = 0.870) on the testing dataset. The model further achieved good AUC on the testing dataset with 30-round 5-fold cross-validation in three molecular subtypes of breast cancer, as follows: (1) HR+/HER2–: 0.901 (accuracy = 0.816), (2) HER2+: 0.940 (accuracy = 0.865), and (3) TN: 0.837 (accuracy = 0.811). Conclusions: Our machine learning model combining radiomics features and clinical characteristics could feasibly provide a tool to predict tumor shrinkage patterns prior to NACT. Our prediction model will be valuable in guiding NACT and surgical treatment in breast cancer.
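
The four performance measures named in the abstract above (AUC, accuracy, sensitivity, and specificity) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all cases that were classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positive cases that were predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of negative cases that were predicted negative."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"AUC={auc(y_true, scores):.3f}  accuracy={accuracy(y_true, y_pred):.3f}")
print(f"sensitivity={sensitivity(y_true, y_pred):.3f}  specificity={specificity(y_true, y_pred):.3f}")
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC depends only on how the scores rank the cases, which is one reason the two kinds of measure are often reported together.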


2018 ◽  
Vol 7 (4.20) ◽  
pp. 22 ◽  
Author(s):  
Jabeen Sultana ◽  
Abdul Khader Jilani ◽  
. .

Early identification and prediction of the type of cancer has become a necessity in cancer research, in order to assist and monitor patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) approaches. Logistic regression and multi-classifiers have been proposed to predict breast cancer. This paper explores different data mining approaches using classification that can be applied to breast cancer data to build deep predictions in a new environment. In addition, this study identifies the best-performing model by evaluating the dataset on various classifiers. The breast cancer dataset, collected from the UCI machine learning repository, has 569 instances with 31 attributes. The dataset is first pre-processed and then fed to various classifiers: Simple Logistic Regression, IBK, K-star, Multi-Layer Perceptron (MLP), Random Forest, Decision Table, Decision Trees (DT), PART, Multi-Class Classifiers, and REP Tree. 10-fold cross-validation is applied, and training is performed so that new models are developed and tested. The results are evaluated on various measures: accuracy, RMSE, sensitivity, specificity, F-measure, ROC curve area, Kappa statistic, and the time taken to build the model. The analysis reveals that, among all the classifiers, Simple Logistic Regression yields the deep predictions and obtains the best model, with high and accurate results, followed by IBK (a nearest-neighbor classifier), K-star (an instance-based classifier), and MLP (a neural network). The other methods obtained lower accuracy than logistic regression.
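
The 10-fold cross-validation procedure mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's setup: the data are synthetic (only the instance count of 569 is taken from the abstract), and `fit_threshold` is a hypothetical single-feature stand-in for the classifiers the paper evaluates. All function names are illustrative.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices once and split them into k folds whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Stand-in classifier: a threshold halfway between the two class means.

    Assumes the positive class (label 1) has the larger mean feature value.
    """
    mean_pos = mean(x for x, y in zip(xs, ys) if y == 1)
    mean_neg = mean(x for x, y in zip(xs, ys) if y == 0)
    return (mean_pos + mean_neg) / 2.0


def cross_val_accuracy(xs, ys, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return the per-fold accuracies."""
    fold_accuracies = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        threshold = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct = sum((xs[i] > threshold) == (ys[i] == 1) for i in held_out)
        fold_accuracies.append(correct / len(held_out))
    return fold_accuracies


# Synthetic one-feature data, invented for illustration; not the UCI dataset.
rng = random.Random(42)
ys = [rng.randint(0, 1) for _ in range(569)]
xs = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in ys]

fold_accuracies = cross_val_accuracy(xs, ys, k=10)
print(f"mean 10-fold accuracy: {mean(fold_accuracies):.3f}")
```

Each instance is used for testing exactly once and for training in the other nine folds, so the averaged accuracy reflects performance on data the model did not see during fitting.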


2020 ◽  
Author(s):  
Athira B ◽  
Josette Jones ◽  
Sumam Mary Idicula ◽  
Anand Kulanthaivel ◽  
Sunandan Chakraborty ◽  
...  

BACKGROUND The widespread influence of social media has had ramifications in all walks of life over the last few decades. Interestingly, the healthcare sector is a significant beneficiary of the reports and pronouncements that appear on social media. Although physicians and other health professionals are the final decision-makers, advice or recommendations from fellow patients play a consequential role. In recognition of this trend, the present paper explores the topics that patients diagnosed with breast cancer, as well as survivors, discuss on online forums. OBJECTIVE The study examines the online forum of Breast Cancer.org (BCO), automatically maps discussion entries to formal topics, and proposes a machine learning model to characterize the topics in the health-related discussion, so as to elicit meaningful deliberations. The study of these messages thereby draws conclusions about what matters to patients. METHODS Posts from a few randomly selected forums were manually annotated. To explore the topics of breast cancer patients and survivors, 736 posts were selected for semantic annotation. The entire process was then automated using supervised machine learning models, and the effectiveness of the algorithms used was compared. RESULTS The method could classify posts into the following 8 high-level topics: writing medication reviews, explaining the adverse effects of medication, clinician knowledge, various treatment options, seeking and supporting various matters, diagnostic procedures, financial issues, and implications in everyday life. The Ensembled Neural Network (ENN) model achieved a promising F1-score of 83.4% among the four models compared. CONCLUSIONS The research was able to segregate and name all the posts into a set of 8 classes. Supported by an efficient scheme for encoding text to vectors, the current machine learning models are shown to give impressive performance in modelling the annotation process.


2019 ◽  
Vol 60 (6) ◽  
pp. 818-824 ◽  
Author(s):  
Takuya Mizutani ◽  
Taiki Magome ◽  
Hiroshi Igaki ◽  
Akihiro Haga ◽  
Kanabu Nawa ◽  
...  

ABSTRACT The purpose of this study was to predict the survival time of patients with malignant glioma after radiotherapy with high accuracy by considering additional clinical factors, and to optimize the prescription dose and treatment duration for individual patients by using a machine learning model. A total of 35 patients with malignant glioma were included in this study. The candidate features included 12 clinical features and 192 dose–volume histogram (DVH) features. The appropriate input features and parameters of the support vector machine (SVM) were selected using a genetic algorithm based on Akaike’s information criterion for three feature sets: clinical, DVH, and both clinical and DVH features. The prediction accuracy of the SVM models was evaluated through a leave-one-out cross-validation test with residual error, which was defined as the absolute difference between the actual and predicted survival times after radiotherapy. Moreover, the influences of various values of prescription dose and treatment duration on the predicted survival time were evaluated. The prediction accuracy was significantly improved with the combined use of clinical and DVH features compared with the use of either feature set alone (P < 0.01, Wilcoxon signed rank test). The mean ± standard deviation of the residual error in the leave-one-out cross-validation using the combined clinical and DVH features, only clinical features, and only DVH features was 104.7 ± 96.5, 144.2 ± 126.1, and 204.5 ± 186.0 days, respectively. The prediction accuracy could be improved with the combination of clinical and DVH features, and our results show the potential to optimize the treatment strategy for individual patients based on a machine learning model.
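
The evaluation described in the abstract above, leave-one-out cross-validation scored by the absolute difference between actual and predicted survival time, can be sketched as below. This is an illustrative sketch only: the predictor is a hypothetical stand-in that returns the mean of the training targets (the paper uses an SVM), the toy feature vectors and survival times are invented, and the use of the sample standard deviation is an assumption, since the abstract does not say which form was used.

```python
from statistics import mean, stdev


def fit_mean(train_x, train_y):
    """Stand-in model (not an SVM): remember the mean training target."""
    return mean(train_y)


def predict_mean(model, x):
    """The stand-in model ignores the features and returns the stored mean."""
    return model


def leave_one_out_residuals(xs, ys, fit, predict):
    """Hold out each case in turn, fit on the rest, and return the absolute residuals."""
    residuals = []
    for i in range(len(ys)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        residuals.append(abs(ys[i] - predict(model, xs[i])))
    return residuals


# Toy feature vectors and survival times in days, invented for illustration only.
xs = [[60.0], [54.0], [50.0], [66.0]]
ys = [100.0, 200.0, 300.0, 400.0]

residuals = leave_one_out_residuals(xs, ys, fit_mean, predict_mean)
print(f"residual error: {mean(residuals):.1f} +/- {stdev(residuals):.1f} days")
```

With only 35 patients, as in the study, leave-one-out makes the most of the data: every model is trained on all cases but one, at the cost of fitting as many models as there are patients.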


2021 ◽  
Author(s):  
Sidhant Mallick ◽  
Rasmita Dash ◽  
Rajashree Dash ◽  
Rasmita Rautray

2011 ◽  
Vol 36 (5) ◽  
pp. 2841-2847 ◽  
Author(s):  
Sheau-Ling Hsieh ◽  
Sung-Huai Hsieh ◽  
Po-Hsun Cheng ◽  
Chi-Huang Chen ◽  
Kai-Ping Hsu ◽  
...  
