Using Machine Learning Algorithms for Breast Cancer Diagnosis

There are many cancer patients, especially breast cancer patients, as breast cancer is the most common type of cancer. Because of this volume, many breast cancer-focused hospitals cannot process all of their patients in time, which may leave some women undiagnosed until the late stages of the disease. Automating the process can therefore help these hospitals speed up cancer detection. In this paper, the authors test several machine learning models, including k-nearest neighbours (KNN), support vector machine (SVM), and artificial neural network (ANN). They then compare the accuracies and losses of these models with one another and with models developed by other researchers, to judge whether their approach is efficient and to decide which machine learning algorithm is best to use.
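
As a rough illustration of one of the model families named in this abstract, the sketch below implements a plain k-nearest-neighbours classifier in Python. The toy feature vectors, the labels, and the `knn_predict` helper are hypothetical; they are not taken from the paper and say nothing about the authors' actual pipeline.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Order the training points by Euclidean distance to the query.
    neighbours = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Majority vote among the labels of the k closest points.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature toy data (not from the paper).
train_x = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (3.0, 3.2), (3.1, 2.9), (2.9, 3.1)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

prediction = knn_predict(train_x, train_y, (2.8, 3.0), k=3)
```

In practice the choice of `k` and the scaling of the features both affect the result, which is one reason such models are usually compared under cross-validation.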

Machine Learning Algorithms and Whole Exome Sequencing Data from Breast Cancer Patients in the UK Biobank Predict Survival

10.21203/rs.3.rs-115867/v1 ◽

2020 ◽

Author(s):

Bum-Sup Jang ◽

In Ah Kim

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Overall Survival ◽

Cancer Patients ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Uk Biobank ◽

Breast Cancer Patients ◽

Gene Set ◽

The Uk

Abstract. Background: Using machine learning algorithms, we aimed to identify the mutated gene set from whole exome sequencing (WES) data of blood that is associated with overall survival in breast cancer patients. Methods: WES data from 1,181 female breast cancer patients within the UK Biobank cohort were collected. The number of mutations for each gene was summed and defined as the blood-based mutation burden per patient. Using a long short-term memory (LSTM) machine learning algorithm and XGBoost, a gradient-boosted tree algorithm, we developed a model to predict patient overall survival. Results: In the UK Biobank breast cancer cohort, the most frequently altered genes in blood samples were related to the TP53 pathway. In the LSTM model, a minimum of 50 genes was found to predict high vs. low mutation burden. In the XGBoost survival model, the gene set could predict overall survival, with a concordance index of 0.75 and a scaled Brier score of 0.146 on the held-out testing set (20%, N=236). In older patients (≥ 56 years), the high mutation group based on this gene set showed inferior overall survival compared to the low mutation group (log-rank test, P=0.042). Conclusion: The machine learning algorithms revealed a gene signature in the UK Biobank breast cancer cohort. Mutational burden observed in blood was associated with overall survival in relatively old patients. This gene signature should be verified in a prospective setting.
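
The concordance index reported above measures how often a model ranks two patients' risks in the same order as their observed survival. The following is a minimal sketch of Harrell's formulation, assuming right-censored data; the function name `concordance_index` and the toy follow-up times, event flags, and risk scores are hypothetical and unrelated to the UK Biobank data.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk, earlier event: correct order
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up time, event flag (1 = death observed), risk score.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.75 sits between the two.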

PCN277 Analysis of Treatment Sequences in HER2-Positive EARLY Breast Cancer Patients: A Retrospective Study from the French National Hospital Database Using a Machine Learning Algorithm, the TAK

Value in Health ◽

10.1016/j.jval.2020.08.414 ◽

2020 ◽

Vol 23 ◽

pp. S471-S472

Author(s):

M. Laurent ◽

M. Prodel ◽

A. Vainchtock ◽

M. Gilberg ◽

R. Ghorbal ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Retrospective Study ◽

Cancer Patients ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Breast Cancer Patients ◽

National Hospital ◽

Her2 Positive ◽

Hospital Database

Improving Model Performance on the Stratification of Breast Cancer Patients by Integrating Multiscale Genomic Features

BioMed Research International ◽

10.1155/2020/1475368 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Yingyi Hao ◽

Li He ◽

Yifan Zhou ◽

Yiru Zhao ◽

Menglong Li ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cancer Patients ◽

Machine Learning Algorithms ◽

Gene Set Enrichment Analysis ◽

Single Type ◽

Breast Cancer Patients ◽

Clinical Cancer Research ◽

Genomic Features ◽

Gene Set

In clinical cancer research, how to accurately stratify patients based on genomic data is a hot topic. With the development of next-generation sequencing technology, more and more types of genomic features, such as mRNA expression level, can be used to distinguish cancer patients. Previous studies commonly stratified patients by using a single type of genomic feature, which can only reflect one aspect of the cancer. In fact, multiscale genomic features provide more information and may be helpful for clinical prediction. In addition, most conventional machine learning algorithms use a handcrafted gene set as features to construct models, and this set is generally selected by a statistical method with an arbitrary cut-off, e.g., p value < 0.05. The genes in such a gene set are not necessarily related to the cancer and can make the model unreliable. Therefore, in our study, we thoroughly investigated the performance of different machine learning methods on stratifying breast cancer patients with a single type of genomic feature. Then, we proposed a strategy that takes into account the degree of correlation between genes and cancer patients to identify features from mRNAs and microRNAs, and we evaluated the performance of models built on the combined multiscale genomic features. The results showed that, compared with the models constructed with a single type of feature, the models with the multiscale genomic features generated by our proposed method achieved better performance on stratifying the ER status of breast cancer patients. Moreover, gene set enrichment analysis showed that the identified multiscale genomic features were closely related to the cancer, indicating that our proposed strategy reflects the biological relevance of the genes to breast cancer well. In conclusion, modelling with multiscale genomic features closely related to the cancer not only can guarantee the prediction performance of the models but also can effectively provide candidate genes for interpreting the mechanisms of cancer.

Potential determinants of radiation-induced lymphocyte decrease and lymphopenia in breast cancer patients by machine learning approaches.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.e12567 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. e12567-e12567

Author(s):

Hao Yu ◽

Fang Chen ◽

Li Yang ◽

Jian-Yue Jin ◽

Feng-Ming Spring Kong

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Radiation Therapy ◽

Cancer Patients ◽

Predictive Power ◽

Machine Learning Algorithms ◽

Breast Cancer Patients ◽

Value Analysis ◽

Extreme Gradient Boosting ◽

Radiation Induced

e12567 Background: Radiation-induced lymphopenia accompanying radiation therapy is associated with inferior clinical outcomes in a wide variety of solid malignancies. This study aimed to examine the potential determinants of radiation-induced lymphocyte decrease and radiation-induced lymphopenia in breast cancer patients who underwent radiotherapy. Methods: Patients with breast cancer who underwent radiotherapy were enrolled at the University of Hong Kong-Shenzhen Hospital (our cohort). Circulating lymphocyte levels were evaluated within 7 days prior to and at the end of radiation therapy. Feature groups including clinical data, tumor characteristics, radiotherapy dosimetrics, and treatment regimens were also collected. We applied a machine learning algorithm (Extreme Gradient Boosting, XGBoost) to predict the ratio of the lymphocyte level after radiotherapy to the baseline lymphocyte level, as well as the event of lymphopenia, and compared it with Lasso regression approaches. Next, we used Shapley additive explanation (SHAP) to explore the directional contribution of each feature to lymphocyte decrease and lymphopenia. For model validation and proof of concept, an independent cohort of patients enrolled in a prospective trial was used (IP cohort). Results: A total of 589 patients were enrolled in our cohort and 203 patients in the IP cohort. The XGBoost models trained in our cohort achieved a mean RMSE of 0.157 and R2 of 53.9% for the ratio of lymphocyte levels, and a mean accuracy of 0.757 and ROC-AUC of 0.733 for lymphopenia events. In the fully independent IP cohort, these models predicted the ratio of lymphocyte levels with a mean RMSE of 0.175 and R2 of 47%, and predicted lymphopenia events with a mean accuracy of 0.739 and ROC-AUC of 0.737. The dosimetrics feature group had the largest predictive power (RMSE: 0.192, R2: 29.8%, accuracy: 0.678, ROC-AUC: 0.667), followed by the group of baseline blood cells (RMSE: 0.207, R2: 18.9%, accuracy: 0.669, ROC-AUC: 0.645). Next, by SHAP value analysis, we found that integral dose of the total body, V5 dose, mean lung dose, and V20 dose of the ipsilateral lung/bilateral lungs were important promoting factors for lymphocyte decrease and for the event of lymphopenia, while baseline monocytes, mean heart dose, and tumor size were protective to some extent. Conclusions: In this study, we constructed robust XGBoost models for predicting lymphocyte decrease and the event of lymphopenia in breast cancer patients who underwent radiation therapy. We also applied SHAP analysis to reveal the directional contribution of features. These results are important both for understanding the contribution of dosimetrics to the immune response and for refining radiation dosimetrics before treatment in future clinical use.
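
The regression results above are reported as RMSE and R2. As a small worked sketch of how these two quantities are conventionally computed, assuming the usual definitions (root mean squared error, and one minus the ratio of residual to total sum of squares), the helpers `rmse` and `r_squared` and the toy lymphocyte ratios below are hypothetical and are not the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical post/pre-radiotherapy lymphocyte ratios (illustrative values only).
observed = [0.50, 0.70, 0.40, 0.90]
predicted = [0.55, 0.65, 0.45, 0.85]

error = rmse(observed, predicted)
explained = r_squared(observed, predicted)
```

Multiplying `explained` by 100 gives the percentage form used in the abstract.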

Machine Learning Algorithm to Predict Survivability In Breast Cancer Patients

International Journal on Computer Science and Engineering ◽

10.21817/ijcse/2018/v10i4/181004013 ◽

2018 ◽

Vol 10 (4) ◽

pp. 97-101 ◽

Cited By ~ 2

Author(s):

Kahksha . ◽

Sameena Naaz

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cancer Patients ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Breast Cancer Patients

Design and analysis of quantum powered support vector machines for malignant breast cancer diagnosis

Journal of Intelligent Systems ◽

10.1515/jisys-2020-0089 ◽

2021 ◽

Vol 30 (1) ◽

pp. 998-1013

Author(s):

Shubham Vashisth ◽

Ishika Dhall ◽

Garima Aggarwal

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cancer Diagnosis ◽

Breast Cancer Diagnosis ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Malignant Breast ◽

Quantum Technology ◽

Classical Computer

Abstract. The rapid pace of development over the last few decades in the domain of machine learning mirrors the advances made in the field of quantum computing. It is natural to ask whether conventional machine learning algorithms could be optimized using present-day noisy intermediate-scale quantum technology. There are certain computational limitations when training a machine learning model on a classical computer. Using quantum computation, it is possible to surpass these limitations and carry out such calculations in an optimized manner. This study illustrates the working of the quantum support vector machine classification model, which guarantees an exponential speed-up over its typical alternatives. This research uses the quantum SVM model to solve the classification task of malignant breast cancer diagnosis. This study also presents a comparative analysis of distinct forms of SVM algorithms with respect to their time complexity and their performance on standard evaluation metrics, namely accuracy, precision, recall, and F1-score, to exemplify the supremacy of quantum SVM over its conventional variants.
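
The four evaluation metrics named above follow directly from the confusion-matrix counts of a binary classifier. The sketch below shows the standard definitions; the function name `classification_metrics` and the toy labels (1 = malignant, 0 = benign) are hypothetical and do not reproduce the study's results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels and predictions (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
```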

Machine Learning Algorithms for Prediction of Survival Curves in Breast Cancer Patients

Applied Bionics and Biomechanics ◽

10.1155/2021/9338091 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Roqia Saleem Awad Maabreh ◽

Malik Bader Alazzam ◽

Ahmed S. AlGhamdi

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cancer Patients ◽

Statistical Methods ◽

Learning Algorithms ◽

Survival Rates ◽

Machine Learning Algorithms ◽

Breast Cancer Patients ◽

Appropriate Treatment ◽

The Impact

Today, cancer is the second leading cause of death worldwide, and the number of people diagnosed with the disease is expected to rise. Breast cancer is the most commonly diagnosed cancer in women, and it has one of the highest survival rates when treated properly. Because treatment effectiveness and, as a result, patient survival depend on each case, it is critical to model survival ahead of time. Artificial intelligence is a rapidly expanding field, and its clinical applications are following suit (having surpassed humans in many evidence-based medical tasks). Since the first stable risk estimator based on statistical methods appeared in survival analysis, numerous versions of it have been created, with machine learning being used in only a few of them. Nonlinear relationships between variables and the impact they have on the variable to be predicted are very easy to evaluate using statistical methods. However, because they are just mathematical equations, they have flaws that limit the quality of their output. The main goal of this study is to find the best machine learning algorithms for predicting the individualised survival of breast cancer patients, as well as the most appropriate treatment, and to propose new numerical variable stratifications. These stratifications will be carried out using unsupervised machine learning methods that divide patients into groups based on their risk in each dataset. We will compare them to standard groupings to see whether they have more significance. Knowing that the greatest challenge in dealing with clinical data is its quantity and quality, we have gone to great lengths to ensure the quality of the data before replicating them. We used the Cox statistical method in conjunction with other statistical methods and tests to find the best possible dataset with which to train our model, despite its ease of multivariate analysis.

Automated Breast Cancer Diagnosis Based on Machine Learning Algorithms

Journal of Healthcare Engineering ◽

10.1155/2019/4253641 ◽

2019 ◽

Vol 2019 ◽

pp. 1-11 ◽

Cited By ~ 8

Author(s):

Habib Dhahri ◽

Eslam Al Maghayreh ◽

Awais Mahmood ◽

Wail Elkilani ◽

Mohammed Faisal Nagi

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Genetic Programming ◽

Learning Algorithm ◽

Empirical Studies ◽

Learning Algorithms ◽

Breast Cancer Diagnosis ◽

Machine Learning Algorithms ◽

Programming Technique ◽

Soft Computing Techniques

There have been several empirical studies addressing breast cancer using machine learning and soft computing techniques. Many claim that their algorithms are faster, easier, or more accurate than others. This study is based on genetic programming and machine learning algorithms and aims to construct a system that accurately differentiates between benign and malignant breast tumors. The aim of this study was to optimize the learning algorithm. In this context, we applied the genetic programming technique to select the best features and the best parameter values for the machine learning classifiers. The performance of the proposed method was evaluated using sensitivity, specificity, precision, accuracy, and ROC curves. The present study shows that genetic programming can automatically find the best model by combining feature preprocessing methods and classifier algorithms.

Machine Learning Models of Breast Cancer Risk Prediction

10.1101/723304 ◽

2019 ◽

Author(s):

Md. Mohaimenul Islam ◽

Tahmina Narin Poly

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Clinical Features ◽

Machine Learning Algorithms ◽

Receiver Operating Curve ◽

Support Vector ◽

Breast Cancer Patients ◽

K Nearest Neighbors ◽

Cancer Prediction ◽

Artificial Neural Network Ann

Abstract. Breast cancer is the most common cancer in women in both the developed and the less developed world. Early detection based on clinical features can greatly increase the chances of successful treatment. Our goal was to construct a breast cancer prediction model based on machine learning algorithms. A total of 10 potential clinical features, such as age, BMI, glucose, insulin, HOMA, leptin, adiponectin, resistin, and MCP-1, were collected from 116 patients. In this report, the most commonly used machine learning models, namely decision tree (DT), random forest (RF), K-nearest neighbors (KNN), support vector machine (SVM), logistic regression (LR), and artificial neural network (ANN), were tested for breast cancer prediction. A repeated 10-fold cross-validation model was used to rank variables on the randomly split dataset. The accuracy of DT, RF, SVM, LR, ANN, and KNN was 0.71, 0.71, 0.77, 0.80, 0.81, and 0.86, respectively. The KNN model showed the highest accuracy, with an area under the receiver operating curve, sensitivity, and specificity of 0.95, 0.80, and 0.91, respectively. Therefore, correctly identifying breast cancer patients would create care opportunities, such as monitoring and adopting intervention plans, that may benefit the quality of care in the long term.
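
The area under the receiver operating curve quoted above can be read as the probability that a randomly chosen patient with cancer receives a higher score than a randomly chosen patient without it. A minimal sketch of that pairwise (rank-based) computation follows; the `roc_auc` helper and the toy labels and scores are hypothetical and are not drawn from the 116-patient dataset.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0    # positive case scored higher: correct ranking
            elif pos_score == neg_score:
                wins += 0.5    # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (illustrative only): 1 = cancer, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
```

This pairwise form is quadratic in the number of samples, which is acceptable for a dataset of this size; larger datasets typically use a sort-based equivalent.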

A Proposal of Quantum-Inspired Machine Learning for Medical Purposes: An Application Case

Mathematics ◽

10.3390/math9040410 ◽

2021 ◽

Vol 9 (4) ◽

pp. 410

Author(s):

Domenico Pomarico ◽

Annarita Fanizzi ◽

Nicola Amoroso ◽

Roberto Bellotti ◽

Albino Biafora ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Feature Selection ◽

Cancer Patients ◽

Early Stage ◽

Support Vector ◽

Training Procedure ◽

Sampled Data ◽

Breast Cancer Patients ◽

Data Set

Learning tasks are implemented via mappings of the sampled data set, including both the classical and the quantum framework. Biomedical data characterizing complex diseases such as cancer typically require an algorithmic support for clinical decisions, especially for early stage tumors that typify breast cancer patients, which are still controllable in a therapeutic and surgical way. Our case study consists of the prediction during the pre-operative stage of lymph node metastasis in breast cancer patients resulting in a negative diagnosis after clinical and radiological exams. The classifier adopted to establish a baseline is characterized by the result invariance for the order permutation of the input features, and it exploits stratifications in the training procedure. The quantum one mimics support vector machine mapping in a high-dimensional feature space, yielded by encoding into qubits, while being characterized by complexity. Feature selection is exploited to study the performances associated with a low number of features, thus implemented in a feasible time. Wide variations in sensitivity and specificity are observed in the selected optimal classifiers during cross-validations for both classification system types, with an easier detection of negative or positive cases depending on the choice between the two training schemes. Clinical practice is still far from being reached, even if the flexible structure of quantum-inspired classifier circuits guarantees further developments to rule interactions among features: this preliminary study is solely intended to provide an overview of the particular tree tensor network scheme in a simplified version adopting just product states, as well as to introduce typical machine learning procedures consisting of feature selection and classifier performance evaluation.
