Prediction of Tumor Shrinkage Pattern to Neoadjuvant Chemotherapy Using a Multiparametric MRI-Based Machine Learning Model in Patients With Breast Cancer

Author(s):  
Yuhong Huang ◽  
Wenben Chen ◽  
Xiaoling Zhang ◽  
Shaofu He ◽  
Nan Shao ◽  
...  

Aim: After neoadjuvant chemotherapy (NACT), the tumor shrinkage pattern is a more reasonable outcome for deciding on possible breast-conserving surgery (BCS) than pathological complete response (pCR). The aim of this article was to establish a machine learning model combining radiomics features from multiparametric MRI (mpMRI) with clinicopathologic characteristics for early prediction of the tumor shrinkage pattern prior to NACT in breast cancer. Materials and Methods: This study included 199 patients with breast cancer who successfully completed NACT and underwent subsequent breast surgery. For each patient, 4,198 radiomics features were extracted from the segmented 3D regions of interest (ROI) in mpMRI sequences, namely T1-weighted dynamic contrast-enhanced imaging (T1-DCE), fat-suppressed T2-weighted imaging (T2WI), and the apparent diffusion coefficient (ADC) map. Feature selection and supervised machine learning algorithms were used to identify the predictors of tumor shrinkage pattern as follows: (1) reducing the feature dimension using ANOVA and the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation, (2) splitting the dataset into a training dataset and a testing dataset and constructing prediction models using 12 classification algorithms, and (3) assessing model performance through the area under the curve (AUC), accuracy, sensitivity, and specificity. We also compared the most discriminative model across the molecular subtypes of breast cancer. Results: The Multilayer Perceptron (MLP) neural network achieved higher AUC and accuracy than the other classifiers. The radiomics model achieved a mean AUC of 0.975 (accuracy = 0.912) on the training dataset and 0.900 (accuracy = 0.828) on the testing dataset with 30-round 6-fold cross-validation. When incorporating clinicopathologic characteristics, the mean AUC was 0.985 (accuracy = 0.930) on the training dataset and 0.939 (accuracy = 0.870) on the testing dataset. The model further achieved good AUC on the testing dataset with 30-round 5-fold cross-validation in three molecular subtypes of breast cancer as follows: (1) HR+/HER2–: 0.901 (accuracy = 0.816), (2) HER2+: 0.940 (accuracy = 0.865), and (3) TN: 0.837 (accuracy = 0.811). Conclusions: Our machine learning model combining radiomics features and clinical characteristics is a feasible, potential tool for predicting tumor shrinkage patterns prior to NACT, and it will be valuable in guiding NACT and surgical treatment in breast cancer.
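A minimal sketch of the three-step pipeline described above (ANOVA filtering, LASSO with 10-fold cross-validation, then an MLP classifier), assuming scikit-learn; the feature matrix, labels, and the number of retained features are illustrative placeholders, not the authors' settings.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: 199 patients x 4,198 radiomics features, binary pattern label
rng = np.random.default_rng(0)
X, y = rng.random((199, 4198)), rng.integers(0, 2, 199)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Step 1: ANOVA F-test keeps the top-k features (k=200 is an assumption)
anova = SelectKBest(f_classif, k=200).fit(X_train, y_train)

# Step 2: LASSO with 10-fold CV retains features with nonzero coefficients
lasso = LassoCV(cv=10, random_state=0).fit(
    StandardScaler().fit_transform(anova.transform(X_train)), y_train)
mask = lasso.coef_ != 0
if not mask.any():          # fallback: this toy noise data may zero everything out
    mask[:] = True

# Step 3: train the MLP classifier on the selected features and report AUC
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(max_iter=1000, random_state=0))
clf.fit(anova.transform(X_train)[:, mask], y_train)
proba = clf.predict_proba(anova.transform(X_test)[:, mask])[:, 1]
print("test AUC:", roc_auc_score(y_test, proba))
```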

Author(s):  
D. Mabuni ◽  
S. Aquter Babu

In machine learning, the data used matters more than the logic of the program. With very large and moderately sized datasets it is possible to obtain robust and high classification accuracies, but not with small and very small datasets; in particular, only large training datasets can produce robust decision tree classification results. Classification results obtained from a single training/testing dataset pair are not reliable. Cross-validation instead uses many random folds of the same dataset for training and validation: to obtain reliable and statistically sound classification results, the same algorithm must be applied to different pairs of training and validation datasets. To overcome the limitation of using only a single training dataset and a single testing dataset, the existing k-fold cross-validation technique uses a cross-validation plan to obtain improved decision tree classification accuracy. In this paper, a new cross-validation technique called prime fold is proposed, thoroughly tested experimentally, and verified on many benchmark UCI machine learning datasets. The prime fold based decision tree classification accuracies obtained after experimentation are observed to be far better than those of existing techniques.
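For concreteness, a brief sketch of the standard k-fold cross-validation baseline the paper compares against, using a decision tree on a UCI-style dataset (iris as a stand-in); the proposed prime fold scheme itself is not reproduced here, since its fold-construction rule is not spelled out in the abstract.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Varying the fold count shows how the choice of k affects the mean and
# the stability (standard deviation) of the reported accuracy.
for k in (5, 7, 11):
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=k)
    print(f"{k}-fold: mean accuracy = {scores.mean():.3f}, std = {scores.std():.3f}")
```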


Author(s):  
Kui Fang ◽  
Zheqing Dong ◽  
Xiling Chen ◽  
Ji Zhu ◽  
Bing Zhang ◽  
...  

Objectives: A sample with a blood clot may produce an inaccurate outcome in coagulation testing, which may mislead clinicians into making improper clinical decisions. Currently, there is no efficient method to automatically detect clots. This study demonstrates the feasibility of utilizing machine learning (ML) to identify clotted specimens. Methods: The results of coagulation testing for 192 clotted samples and 2,889 no-clot-detected (NCD) samples were retrospectively retrieved from a laboratory information system to form the training and testing datasets. Standard and momentum backpropagation neural networks (BPNNs) were trained and validated on the training dataset with five-fold cross-validation, and the predictive performance of the models was then assessed on the testing dataset. Results: Our results demonstrated intrinsic distinctions between the clotted and NCD specimens, reflected both in differences in the test results and in the separation of the two groups in a t-SNE analysis. The standard and momentum BPNNs identified the sample status (clotted vs. NCD) with areas under the ROC curve of 0.966 (95% CI, 0.958–0.974) and 0.971 (95% CI, 0.9641–0.9784), respectively. Conclusions: We have described the application of ML algorithms to identifying sample status from the results of coagulation testing. This approach provides a proof-of-concept application of ML algorithms to evaluating sample quality, and it has the potential to facilitate clinical laboratory automation.
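An illustrative sketch of the two networks, using scikit-learn's SGD-trained MLP as a stand-in for standard and momentum backpropagation under five-fold cross-validation; the feature count and sample sizes are placeholders for the coagulation test results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((3081, 4))      # placeholder coagulation results (e.g. PT, APTT, TT, FIB)
y = rng.integers(0, 2, 3081)   # 1 = clotted, 0 = NCD

# momentum=0.0 approximates standard backpropagation; momentum=0.9 adds momentum
for name, momentum in [("standard BPNN", 0.0), ("momentum BPNN", 0.9)]:
    net = make_pipeline(
        StandardScaler(),
        MLPClassifier(solver="sgd", momentum=momentum, hidden_layer_sizes=(16,),
                      max_iter=2000, random_state=0))
    auc = cross_val_score(net, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean five-fold CV AUC = {auc.mean():.3f}")
```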


Author(s):  
Lydia T Tam ◽  
Kristen W Yeom ◽  
Jason N Wright ◽  
Alok Jaju ◽  
Alireza Radmanesh ◽  
...  

Background: Diffuse intrinsic pontine gliomas (DIPGs) are lethal pediatric brain tumors, and MRI is presently the mainstay of disease diagnosis and surveillance. We identify clinically significant computational features from MRI and create a prognostic machine learning model. Methods: We isolated tumor volumes from T1 post-contrast (T1) and T2-weighted (T2) MRIs of 177 treatment-naïve DIPG patients from an international cohort for model training and testing. The Quantitative Image Feature Pipeline and PyRadiomics were used for feature extraction. Ten-fold cross-validation of LASSO Cox regression selected the optimal features for predicting overall survival (OS) in the training dataset, and the model was then tested in the independent testing dataset. We analyzed model performance using clinical variables (age at diagnosis and sex) only, radiomics only, and radiomics plus clinical variables. Results: All selected features were intensity- and texture-based features of the wavelet-filtered images (three T1 grey-level co-occurrence matrix (GLCM) texture features, one T2 GLCM texture feature, and the T2 first-order mean). This multivariable Cox model demonstrated a concordance of 0.68 [95% CI: 0.61-0.74] in the training dataset, significantly outperforming the clinical-only model (C=0.57 [95% CI: 0.49-0.64]). Adding clinical features to radiomics slightly improved performance (C=0.70 [95% CI: 0.64-0.77]). The combined radiomics and clinical model was validated in the independent testing dataset (C=0.59 [95% CI: 0.51-0.67], Noether’s test p=0.02). Conclusion: In this international study, we demonstrate the use of radiomic signatures to create a machine learning model for DIPG prognostication. Standardized, quantitative approaches that objectively measure DIPG changes, including computational MRI evaluation, could offer new ways of assessing tumor phenotype and serve a future role in optimizing clinical trial eligibility and tumor surveillance.
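A hedged sketch of a LASSO-penalized Cox survival model, here via the lifelines package (the paper's exact tooling beyond the Quantitative Image Feature Pipeline and PyRadiomics is not specified); all columns, values, and the penalty strength are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(177, 5)),
                  columns=[f"wavelet_feature_{i}" for i in range(5)])
df["os_months"] = rng.exponential(12.0, 177)   # placeholder overall survival times
df["event"] = rng.integers(0, 2, 177)          # 1 = death observed, 0 = censored

# l1_ratio=1.0 makes the penalty pure LASSO, shrinking weak radiomic
# coefficients toward zero; the concordance index mirrors the C values above.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="os_months", event_col="event")
print("training concordance:", cph.concordance_index_)
```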


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: To develop an improved and robust machine learning model for predicting Myocardial Infarction (MI) that could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of Myocardial Infarction (MI), using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to remove missing values from the dataset and then applies principal component analysis (PCA) to extract the optimal features and enhance classifier performance. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data is given as input to four well-known classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) to train them, and the remaining 30% is used to evaluate the model using performance metrics such as the confusion matrix, classification accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: The outputs of the classifiers were evaluated using these performance measures; logistic regression provided higher accuracy than the k-NN, SVM, and decision tree classifiers, and PCA performed well as a feature extraction method for enhancing model performance. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score of around 70%, compared to the k-NN and decision tree algorithms. Conclusion: From the result analysis, we infer that this proposed machine learning model will act as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine learning based prediction models; it can predict the presence of acute myocardial infarction from heart disease risk factors, helping decide when to start lifestyle modification and medical treatment to prevent heart disease.
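A minimal sketch of the preprocessing and comparison pipeline described above (mean imputation, PCA, a 70/30 split, four classifiers), assuming scikit-learn; a synthetic stand-in replaces the Framingham data, and all parameter values are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((4000, 15))                     # placeholder risk-factor matrix
X[rng.random(X.shape) < 0.05] = np.nan         # simulate missing values
y = rng.integers(0, 2, 4000)                   # 1 = MI, 0 = no MI (toy labels)

# 70/30 split as described above
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {"logistic regression": LogisticRegression(max_iter=1000),
          "SVM": SVC(probability=True),
          "k-NN": KNeighborsClassifier(),
          "decision tree": DecisionTreeClassifier(random_state=0)}
for name, clf in models.items():
    # mean imputation -> scaling -> PCA -> classifier
    pipe = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(),
                         PCA(n_components=8), clf)
    pipe.fit(X_tr, y_tr)
    print(f"{name}: AUC = {roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1]):.3f}")
```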


Electronics ◽  
2019 ◽  
Vol 8 (6) ◽  
pp. 607 ◽  
Author(s):  
Ihab Ahmed Najm ◽  
Alaa Khalaf Hamoud ◽  
Jaime Lloret ◽  
Ignacio Bosch

The 5G network is a next-generation wireless form of communication and the latest mobile technology. In practice, 5G supports Internet of Things (IoT) deployments in high-traffic networks where multiple nodes/sensors attempt to transmit their packets to a destination simultaneously, a defining characteristic of IoT applications. To this end, 5G offers vast bandwidth, low delay, and extremely high data transfer speeds. Thus, 5G presents opportunities and motivations for utilizing next-generation protocols, especially the Stream Control Transmission Protocol (SCTP). However, the congestion control mechanisms of conventional SCTP negatively influence overall performance, and existing mechanisms contribute to reduced 5G and IoT performance. Thus, a new machine learning model based on a decision tree (DT) algorithm is proposed in this study to predict the optimal enhancement of congestion control in the wireless sensors of 5G IoT networks. The model was implemented on a training dataset to determine the optimal parametric settings in a 5G environment; the dataset was used to train the machine learning model and enable the prediction of optimal alternatives that can enhance the performance of the congestion control approach. The DT approach can also be used for other functions, especially prediction and classification, and DT algorithms produce graphs that any user can inspect to understand the prediction approach. The C4.5 decision tree provided promising results, with more than 92% precision and recall.
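A hedged sketch of the classification step, using scikit-learn's entropy-criterion decision tree as a close analogue of C4.5; the SCTP parameter names (cwnd_init, rto_min, sack_delay) and the labeling rule are illustrative assumptions, not the paper's actual features.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
feature_names = ["cwnd_init", "rto_min", "sack_delay"]   # hypothetical SCTP knobs
X = rng.uniform(size=(1000, 3))
y = (X[:, 0] > 0.5).astype(int)        # toy rule standing in for "good throughput"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4).fit(X_tr, y_tr)
pred = tree.predict(X_te)
print("precision:", precision_score(y_te, pred), "recall:", recall_score(y_te, pred))

# export_text yields the human-readable rule graph mentioned above
print(export_text(tree, feature_names=feature_names))
```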


2020 ◽  
Author(s):  
Jihane Elyahyioui ◽  
Valentijn Pauwels ◽  
Edoardo Daly ◽  
Francois Petitjean ◽  
Mahesh Prakash

Flooding is one of the most common and costly natural hazards at the global scale, and flood models are important in supporting flood management. Flood modelling, however, is a computationally expensive process, due to the high nonlinearity of the governing equations and the complexity of the surface topography. New modelling approaches based on deep learning algorithms have recently emerged for multiple applications.

This study aims to investigate the capacity of machine learning to achieve spatio-temporal flood modelling. The combination of spatial and temporal input data to obtain dynamic results of water levels and flows from a machine learning model on multiple domains, for applications in flood risk assessment, has not yet been achieved. Here, we develop increasingly complex architectures aimed at interpreting the raw input data of precipitation and terrain to generate essential spatio-temporal variables (water level and velocity fields) and derived products (flood maps), trained on hydrodynamic simulations.

An extensive training dataset is generated by solving the 2D shallow water equations on simplified topographies using Lisflood-FP.

As a first task, the machine learning model is trained to reproduce the maximum water depth, using the precipitation time series and the topographic grid as inputs. The models combine the spatial and temporal information through a combination of 1D and 2D convolutional layers, pooling, merging, and upscaling. Multiple variations of this generic architecture are trained to determine the best one(s). Overall, the trained models return good results on performance indices (mean squared error, mean absolute error, and classification accuracy) but fail to predict the maximum water depths with sufficient precision for practical applications.

A major limitation of this approach is the availability of training examples. As a second task, models will be trained to bring the state of the system (spatially distributed water depth and velocity) from one time step to the next, based on the same inputs as before, generating a full solution equivalent to that of a hydrodynamic solver. The training database becomes much larger, as each pair of consecutive time steps constitutes one training example.

Assuming that a reliable model can be built and trained, this methodology could be applied to build models that are faster and less computationally demanding than hydrodynamic models. Indeed, in the synthetic cases shown here, the simulation times of the machine learning models (under a second) are far shorter than those of the hydrodynamic model (a few minutes at least). These data-driven models could be used for interpolation and forecasting. The potential for extrapolation beyond the range of the training datasets (different topographies and high-intensity precipitation events) will also be investigated.
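A hedged sketch of the generic architecture described in the first task, assuming tf.keras: a 1D convolutional branch for the precipitation series and a 2D branch for the terrain grid, merged and upscaled to a maximum-water-depth map. The grid size, series length, and layer widths are illustrative, not the study's actual configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

rain_in = layers.Input(shape=(48, 1))          # 48-step rainfall series (assumed)
dem_in = layers.Input(shape=(64, 64, 1))       # 64x64 topographic grid (assumed)

# Temporal branch: 1D convolutions reduce rainfall to a feature vector,
# then a dense layer lifts it onto a coarse spatial grid
r = layers.Conv1D(16, 3, activation="relu")(rain_in)
r = layers.GlobalAveragePooling1D()(r)
r = layers.Dense(16 * 16, activation="relu")(r)
r = layers.Reshape((16, 16, 1))(r)

# Spatial branch: 2D convolutions plus pooling encode the terrain
d = layers.Conv2D(16, 3, padding="same", activation="relu")(dem_in)
d = layers.MaxPooling2D(4)(d)                  # 64x64 -> 16x16

# Merge the two branches, then upscale back to full resolution
x = layers.Concatenate()([r, d])
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = layers.UpSampling2D(4)(x)                  # 16x16 -> 64x64
depth_out = layers.Conv2D(1, 3, padding="same")(x)

model = tf.keras.Model([rain_in, dem_in], depth_out)
model.compile(optimizer="adam", loss="mse")    # trained against Lisflood-FP outputs
model.summary()
```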


2021 ◽  
Vol 12 ◽  
Author(s):  
Sijie Chen ◽  
Wenjing Zhou ◽  
Jinghui Tu ◽  
Jian Li ◽  
Bo Wang ◽  
...  

Purpose: To establish a suitable machine learning model that identifies the primary lesions of metastatic tumors through an integrated learning approach, improving the diagnostic efficiency for primary lesions. Methods: After deleting the features whose expression level is lower than the threshold, we used two methods to perform feature selection and used XGBoost for classification. After the optimal model was selected through 10-fold cross-validation, it was verified on an independent test set. Results: Selecting around 800 genes as training features, the R2-score of 10-fold CV on the training data reached 96.38%, and the R2-score on the test data reached 83.3%. Conclusion: These findings suggest that by combining tumor data with machine learning methods, each cancer has a corresponding classification accuracy that can be used to predict the location of primary metastatic tumors. The machine-learning-based method can serve as an orthogonal diagnostic method for judging the model's output against actual clinical pathological conditions.
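A minimal sketch of this workflow, assuming xgboost and scikit-learn: low-expression genes are dropped, roughly 800 features are selected, and XGBoost is evaluated with 10-fold cross-validation. The expression matrix, thresholds, and 12-class label set are illustrative placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.lognormal(size=(600, 5000))    # placeholder gene-expression matrix
y = rng.integers(0, 12, 600)           # placeholder primary-site labels

X = X[:, X.mean(axis=0) > 1.0]         # drop genes below an expression threshold
X = SelectKBest(f_classif, k=800).fit_transform(X, y)   # keep ~800 genes, as above

clf = XGBClassifier(n_estimators=200, eval_metric="mlogloss")
print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())
```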


2020 ◽  
Author(s):  
Athira B ◽  
Josette Jones ◽  
Sumam Mary Idicula ◽  
Anand Kulanthaivel ◽  
Sunandan Chakraborty ◽  
...  

BACKGROUND The widespread influence of social media has had ramifications on all walks of life over the last few decades. Interestingly, the healthcare sector is a significant beneficiary of the reports and pronouncements that appear on social media. Although medics and other health professionals are the final decision-makers, advice and recommendations from kindred patients play a consequential role. In full appreciation of this trend, the present paper explores the topics that patients diagnosed with breast cancer, as well as survivors, discuss on online fora. OBJECTIVE The study examines the online forum of BreastCancer.org (BCO), automatically maps discussion entries to formal topics, and proposes a machine learning model to characterize the topics of the health-related discussion so as to elicit meaningful deliberations. The study of these communication messages therefore draws conclusions about what matters to patients. METHODS The posts of a few randomly selected forums were manually annotated, and 736 posts were selected for semantic annotation to explore the topics of breast cancer patients and survivors. The entire process was automated using machine learning models from the family of supervised learning algorithms, and the effectiveness of the algorithms was compared. RESULTS The method classified the following eight high-level topics: writing medication reviews, explaining the adverse effects of medication, clinician knowledge, various treatment options, seeking and offering support on various matters, diagnostic procedures, financial issues, and implications for everyday life. The best model, an Ensembled Neural Network (ENN), achieved a promising F1-score of 83.4% among the four models compared. CONCLUSIONS The research was able to segregate the posts into a set of eight classes; supported by an efficient scheme for encoding text as vectors, current machine learning models are shown to give impressive performance in modelling the annotation process.
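An illustrative sketch of the post-classification step, with a TF-IDF plus logistic-regression baseline standing in for the Ensembled Neural Network (whose architecture the abstract does not specify); the example posts and topic labels are toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy posts labeled with three of the eight topic classes named above
posts = ["tamoxifen gave me terrible hot flashes",      # adverse effects
         "how much did your mastectomy end up costing", # financial issues
         "my oncologist recommended an MRI first"]      # diagnostic procedures
labels = ["adverse_effects", "financial_issues", "diagnostic_procedures"]

# Encode posts as TF-IDF vectors, then train a multiclass classifier
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(posts, labels)
print(model.predict(["what does a biopsy involve"]))
```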

