An efficient classification framework for breast cancer using hyper parameter tuned Random Decision Forest Classifier and Bayesian Optimization

Introduction: This paper presents a lifelong learning framework that continually adapts to changing data patterns through incremental learning. In many big data systems, repeatedly re-training on high-dimensional data from scratch is computationally infeasible, because constant stream ingestion on top of a historical data pool increases the training time exponentially. The question is therefore how to retain past learning and quickly update the model incrementally from new data. In addition, current machine learning approaches make predictions without providing a comprehensive root cause analysis. To address these limitations, our framework builds on an ensemble of stream data and historical batch data for an incremental lifelong learning (LML) model. Case description: A cancer patient's pathological tests, such as blood, DNA, urine, or tissue analysis, provide a unique signature based on the DNA combinations. Our analysis allows personalized, targeted medication to achieve a therapeutic response. The model is evaluated on data from the National Cancer Institute's Genomic Data Commons unified data repository. The aim is to prescribe personalized medicine based on the thousands of genotype and phenotype parameters for each patient. Discussion and evaluation: The model uses a dimension reduction method to reduce training time in an online sliding-window setting. We identify the Gleason score as a determining factor for cancer possibility and substantiate this claim with the Lilliefors and Kolmogorov–Smirnov tests. We present clustering and Random Decision Forest (RDF) results. The model's prediction accuracy is compared with standard machine learning algorithms for numeric and categorical fields. Conclusion: We propose an ensemble framework of stream and batch data for incremental lifelong learning. The framework first applies a streaming clustering technique and then a Random Decision Forest regressor/classifier to isolate anomalous patient data, and it provides reasoning through root cause analysis based on feature correlations, with the aim of improving the overall survival rate. While the stream clustering technique creates groups of patient profiles, RDF drills further into each group for comparison and reasoning, yielding actionable insights. The proposed MALA architecture retains past learned knowledge, transfers it to future learning, and becomes more knowledgeable over time.
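The abstract does not name its streaming clustering algorithm. As a rough illustration of how cluster groups can be updated incrementally without re-training from scratch, the sketch below swaps in sequential (online) k-means, which is an assumption and not the authors' method. The function name `update_centroids`, the seed centroids, and the two-feature stream are all hypothetical.

```python
from math import dist

def update_centroids(centroids, counts, point):
    """Assign `point` to its nearest centroid and move that centroid toward it.

    Sequential k-means: each centroid is the running mean of the points
    assigned to it so far, so no historical data needs to be revisited.
    """
    # Index of the nearest centroid by Euclidean distance.
    k = min(range(len(centroids)), key=lambda i: dist(centroids[i], point))
    counts[k] += 1
    # Running-mean update: c <- c + (x - c) / n
    centroids[k] = [c + (x - c) / counts[k] for c, x in zip(centroids[k], point)]
    return k

# Hypothetical seed centroids; each one counts as a single prior observation.
centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]

# Hypothetical stream of two-feature records arriving one at a time.
stream = [[1.0, 1.0], [9.0, 11.0], [0.0, 2.0], [11.0, 9.0]]
labels = [update_centroids(centroids, counts, p) for p in stream]
print(labels)     # cluster index assigned to each arriving record
print(centroids)  # centroids after all updates
```

In a pipeline like the one described, the resulting cluster labels would define the patient groups that a per-group forest model then examines; that second stage is not shown here.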


Detection of Respiratory Effort-Related Arousals Using a Hidden Markov Model and Random Decision Forest

2018 Computing in Cardiology Conference (CinC) ◽

10.22489/cinc.2018.089 ◽

2018 ◽

Cited By ~ 2

Author(s):

János Szalma ◽

András Bánhalmi ◽

Vilmos Bilicki

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Respiratory Effort ◽

Random Decision Forest ◽

Decision Forest


Novel breast cancer classification framework based on deep learning

IET Image Processing ◽

10.1049/iet-ipr.2020.0122 ◽

2020 ◽

Vol 14 (13) ◽

pp. 3254-3259

Author(s):

Wessam M. Salama ◽

Azza M. Elbagoury ◽

Moustafa H. Aly

Keyword(s):

Breast Cancer ◽

Deep Learning ◽

Cancer Classification ◽

Breast Cancer Classification ◽

Classification Framework


Feature Selection for Breast Cancer Classification by Integrating Somatic Mutation and Gene Expression

Frontiers in Genetics ◽

10.3389/fgene.2021.629946 ◽

2021 ◽

Vol 12 ◽

Author(s):

Qin Jiang ◽

Min Jin

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Feature Selection ◽

Somatic Mutation ◽

Mutation Frequency ◽

Bayesian Optimization ◽

Optimal Model ◽

Gene Set ◽

Cancer Data ◽

Integrative Network

Exploring the molecular mechanisms of breast cancer is essential for the early prediction, diagnosis, and treatment of cancer patients. The large scale of data obtained from high-throughput sequencing technology makes it difficult to identify the driver mutations and a minimal optimal set of genes that are critical to the classification of cancer. In this study, we propose a novel method that identifies mutated genes associated with breast cancer without any prior information. The somatic mutation data are processed into a mutated matrix, from which the mutation frequency of each gene is obtained. By setting a reasonable threshold on the mutation frequency, a mutated gene set is filtered from the mutated matrix. The gene expression data are used to generate a gene expression matrix, onto which the mutated gene set is mapped to construct a co-expression profile. For feature selection, we propose a staged algorithm that uses fold change and false discovery rate to select differentially expressed genes, mutual information to remove irrelevant and redundant features, and an embedded method based on a gradient boosting decision tree with Bayesian optimization to obtain an optimal model. For evaluation, we propose a weighted metric that modifies the traditional accuracy to address the sample imbalance problem. We apply the proposed method to The Cancer Genome Atlas breast cancer data and identify a mutated gene set in which the implicated genes are oncogenes or tumor suppressors previously reported to be associated with carcinogenesis. For comparison with the integrative network, we also run the optimal model on the gene expression data alone and on the gold-standard PAM50. The results show that the integrative network outperforms gene expression alone and PAM50 on average for most metrics, which indicates the effectiveness of integrating multiple data sources and shows that the method can discover mutated genes associated with breast cancer.
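The mutation-frequency filtering step can be sketched as follows. This is a minimal illustration under stated assumptions: the mutated matrix is taken to be binary (rows are samples, columns are genes), the frequency of a gene is the fraction of samples carrying a mutation in it, and genes at or above the threshold are kept. The function name `mutated_gene_set`, the placeholder gene names, the toy matrix, and the threshold value of 0.5 are hypothetical and do not come from the paper.

```python
def mutated_gene_set(matrix, genes, threshold):
    """Return per-gene mutation frequencies and the genes passing the threshold.

    matrix[i][j] is 1 if sample i carries a mutation in genes[j], else 0.
    """
    n_samples = len(matrix)
    # Mutation frequency: fraction of samples in which the gene is mutated.
    freq = {
        gene: sum(row[j] for row in matrix) / n_samples
        for j, gene in enumerate(genes)
    }
    # Keeping genes with frequency >= threshold is an assumed convention.
    selected = [gene for gene in genes if freq[gene] >= threshold]
    return freq, selected

# Hypothetical toy data: 4 samples, 3 placeholder genes.
genes = ["GENE_A", "GENE_B", "GENE_C"]
matrix = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
freq, selected = mutated_gene_set(matrix, genes, threshold=0.5)
print(freq)      # {'GENE_A': 0.75, 'GENE_B': 0.25, 'GENE_C': 0.25}
print(selected)  # ['GENE_A']
```

The selected genes would then index the columns of the expression matrix to build the co-expression profile mentioned in the abstract.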


Pattern Classification with Random Decision Forest

2012 International Conference on Industrial Control and Electronics Engineering ◽

10.1109/icicee.2012.42 ◽

2012 ◽

Cited By ~ 2

Author(s):

Honghai Wang

Keyword(s):

Pattern Classification ◽

Random Decision Forest ◽

Decision Forest


A Hybrid Approach for Sub-Acute Ischemic Stroke Lesion Segmentation Using Random Decision Forest and Gravitational Search Algorithm

Current Medical Imaging (formerly Current Medical Imaging Reviews) ◽

10.2174/1573405614666180209150338 ◽

2019 ◽

Vol 15 (2) ◽

pp. 170-183

Author(s):

Sunil Babu Melingi ◽

V. Vijayalakshmi

Keyword(s):

Ischemic Stroke ◽

Magnetic Resonance ◽

Acute Ischemic Stroke ◽

Search Algorithm ◽

Gravitational Search Algorithm ◽

Mr Images ◽

Random Decision Forest ◽

Number Of Leaves ◽

Gravitational Search ◽

Decision Forest

Background: Sub-acute ischemic stroke is one of the most common causes of death worldwide. We evaluate the impact of the segmentation technique when analyzing the functions of the brain. Objective: The main objective of this paper is to segment ischemic stroke lesions in Magnetic Resonance (MR) images in the presence of other pathologies such as neurological disorders, encephalopathy, brain damage, and multiple sclerosis (MS). Methods: In this paper, we use a hybrid approach to segment ischemic stroke lesions from other pathologies in MR images using Random Decision Forest (RDF) and the Gravitational Search Algorithm (GSA). RDF is an effective machine learning approach. Results: The RDF strategy has two parameters, the number of trees in the forest and the number of leaves per tree; it runs quickly and efficiently on large data. GSA is used to choose the best number of trees and the best number of leaves per tree in the forest. Conclusion: This paper provides a new hybrid GSA-RDF classifier technique to segment ischemic stroke lesions in MR images. The experimental results show that the proposed technique yields a Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Mean Bias Error (MBE) of 16.5485 %, 7.2654 %, and 2.4585 %, respectively. The proposed RDF-GSA algorithm has better precision and execution when compared with the existing ischemic stroke segmentation method.
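For readers unfamiliar with the three error measures reported above, the sketch below computes them using their common textbook definitions. The abstract does not state the exact formulas used, so the sign convention for MBE (predicted minus actual) and the percentage scaling of MAPE are assumptions, and the sample values are made up for illustration.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    return sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mbe(actual, predicted):
    """Mean Bias Error: mean signed difference (predicted minus actual, assumed)."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values, not taken from the paper.
actual = [2.0, 4.0, 5.0]
predicted = [2.5, 3.0, 5.5]

print(rmse(actual, predicted))  # about 0.7071
print(mape(actual, predicted))  # about 20.0
print(mbe(actual, predicted))   # 0.0, over- and under-estimates cancel
```

The toy data show why MBE is reported alongside the other two: a bias of zero does not imply small errors, since positive and negative deviations can cancel.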


Prediction of Pathologic Complete Response to Neoadjuvant Chemotherapy Using Machine Learning Models in Patients with Breast Cancer

10.21203/rs.3.rs-217080/v1 ◽

2021 ◽

Author(s):

Ji-Yeon Kim ◽

Eunjoo Jeon ◽

Soonhwan Kwon ◽

Hyungsik Jung ◽

Sunghoon Joo ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Neoadjuvant Chemotherapy ◽

Prediction Model ◽

Prediction Models ◽

Locally Advanced ◽

Pathologic Complete Response ◽

Complete Response ◽

Bayesian Optimization ◽

Pathological Characteristics

Background: The aim of this study was to develop a machine learning (ML) based model to accurately predict pathologic complete response (pCR) to neoadjuvant chemotherapy (NAC) using pretreatment clinical and pathological characteristics from electronic medical record (EMR) data in breast cancer (BC). Methods: The EMR data of patients diagnosed with early and locally advanced BC who received NAC followed by curative surgery were reviewed. A total of 16 clinical and pathological characteristics were selected to develop the ML model. We trained six ML models with default settings for multivariate analysis of the extracted variables. Results: In total, 2,065 patients were included in this analysis. Overall, 30.6% (n=632) of patients achieved pCR. Among the six ML models, LightGBM had the highest area under the curve (AUC) for pCR prediction. After hyper-parameter tuning with Bayesian optimization, the AUC was 0.810. The performance of the pCR prediction models was compared across histology-based subtypes. The AUC was highest in the HR+/HER2- subgroup and lowest in the HR-/HER2- subgroup (HR+/HER2- 0.841, HR+/HER2+ 0.716, HR-/HER2+ 0.753, HR-/HER2- 0.653). Conclusions: An ML-based pCR prediction model using pretreatment clinical and pathological characteristics provided useful information for predicting pCR during NAC. This prediction model would help determine the treatment strategy for patients with BC scheduled for NAC.
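The AUC values above can be read as the probability that a randomly chosen pCR case receives a higher predicted score than a randomly chosen non-pCR case. The sketch below computes AUC directly from that pairwise definition, counting ties as one half. It is only an illustration of the metric, not the study's evaluation code; the labels and scores are made up, and the last line simply rechecks the reported pCR rate from the counts given in the abstract.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = pCR, 0 = no pCR) and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
toy_auc = auc(labels, scores)
print(toy_auc)  # 4 of 6 positive/negative pairs are ranked correctly

# Recheck of the reported rate: 632 of 2,065 patients achieved pCR.
pcr_rate = round(100 * 632 / 2065, 1)
print(pcr_rate)  # 30.6
```

This quadratic pairwise form is fine for small examples; for cohorts of thousands of patients a rank-based implementation would normally be used instead.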
