Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction

Background: Saliva is a non-invasively accessible and informative biological fluid with high potential for the early diagnosis of various diseases. The aim of this study is to develop machine learning methods and to explore new salivary biomarkers that discriminate breast cancer patients from healthy controls. Methods: We conducted a comprehensive metabolite analysis of saliva samples obtained from 101 patients with invasive carcinoma (IC), 23 patients with ductal carcinoma in situ (DCIS) and 42 healthy controls (C), using capillary electrophoresis and liquid chromatography with mass spectrometry to quantify hundreds of hydrophilic metabolites. Saliva samples were collected after 9 h of fasting and were split into training and validation data. Conventional statistical analyses and artificial intelligence-based methods were used to assess the discrimination abilities of the quantified metabolites, including a multiple logistic regression (MLR) model and an alternating decision tree (ADTree)-based machine learning method. The generalization abilities of these mathematical models were validated with various computational tests, such as cross-validation and resampling methods. Results: Of the 260 quantified metabolites, amino acids and polyamines were significantly elevated in saliva from breast cancer patients; e.g., spermine showed the highest area under the receiver operating characteristic curve (AUC) for discriminating IC from C: 0.766 (95% confidence interval [CI], 0.671 – 0.840; P < 0.0001). These metabolites showed no significant difference between C and DCIS, i.e., they were elevated only in the IC samples. The MLR model yielded a higher AUC for discriminating IC from C: 0.790 (95% CI, 0.699 – 0.859; P < 0.0001). The ADTree with an ensemble approach showed the best AUC: 0.912 (95% CI, 0.838 – 0.961; P < 0.0001).
In a subtype analysis of these metabolites, seven metabolites differed significantly between the Luminal A-like and Luminal B-like subtypes, but few differed significantly among the other subtypes. Conclusions: These data indicate that combined salivary metabolomic profiles, including polyamines, have the potential to screen for breast cancer non-invasively.
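The abstract reports AUCs with 95% confidence intervals and mentions resampling methods, but it does not say how the intervals were computed. The following is a minimal sketch, assuming a rank-based (Mann-Whitney) AUC and a percentile bootstrap interval; the function names and any scores passed to them are hypothetical illustrations, not the study's data or code.

```python
import random
from typing import List, Sequence, Tuple


def auc_mann_whitney(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC as the probability that a random case scores higher than a random control.

    Ties count as one half, which is the Mann-Whitney U formulation of the ROC AUC.
    """
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(
    cases: Sequence[float],
    controls: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI (assumed method): resample each group, recompute AUC."""
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        # Resample cases and controls separately so both groups keep their sizes.
        boot_cases = [rng.choice(cases) for _ in cases]
        boot_controls = [rng.choice(controls) for _ in controls]
        stats.append(auc_mann_whitney(boot_cases, boot_controls))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

For example, `auc_mann_whitney([3.1, 2.8, 4.0, 3.5], [2.0, 2.9, 1.5])` returns 11/12 (about 0.917), because 11 of the 12 case-control pairs are ranked correctly. The quadratic pair loop is chosen for readability; a sort-based rank computation scales better for large cohorts.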

Performance Evaluation of Pseudo Code with Weka for Accuracy Calculation

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d5394.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 7818-7823

Keyword(s):

Machine Learning ◽

Performance Evaluation ◽

Data Transmission ◽

Software Metrics ◽

Fault Prediction ◽

Learning Methods ◽

Accuracy Measurement ◽

Machine Learning Methods ◽

Interoperability Test

Software testing is a fundamental and essential step in the software development life cycle, used to identify defects in software and then fix them. For some systems, testing can assess the reliability of data transmission or the quality of processing, maintenance and retrieval of information on a server. Accuracy is also a factor commonly used by the Joint Interoperability Test Command as a criterion for assessing interoperability. According to our research, this is the first investigation of computer fault prediction and accuracy that focuses on the use of datasets from the PROMISE database. Several PROMISE datasets are used to compare a pseudocode implementation (Python) with actual software (Weka), applying software metrics and machine learning methods that are effective for computer fault prediction and accuracy measurement.
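The abstract compares accuracy computed by a Python implementation against Weka's output, but it does not show the calculation. A minimal sketch of the standard definition, assuming binary fault labels; the labels "defective" and "clean" and the function names are hypothetical, not taken from the paper or from the PROMISE datasets.

```python
from typing import Dict, Hashable, Sequence


def confusion_counts(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable
) -> Dict[str, int]:
    """Count TP, FP, TN, FN for a binary task, treating `positive` as the fault class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of correct predictions; for binary labels this is (TP + TN) / total."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

With `y_true = ["defective", "clean", "clean", "defective", "clean"]` and `y_pred = ["defective", "clean", "defective", "clean", "clean"]`, the counts are TP = 1, FP = 1, TN = 2, FN = 1, so accuracy is 3/5 = 0.6. Reporting the confusion counts alongside accuracy helps when fault datasets are imbalanced, since a high accuracy can hide many missed defects.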

Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data

10.1101/214254 ◽

2017 ◽

Author(s):

Fadhl M Alakwaa ◽

Kumardeep Chaudhary ◽

Lana X Garmire

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Estrogen Receptor ◽

Deep Learning ◽

Support Vector ◽

Integrated Analysis ◽

Learning Method ◽

Learning Methods ◽

Metabolomics Data ◽

Machine Learning Methods

Metabolomics holds promise as a new technology for diagnosing highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, a class of increasingly popular machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor positive (ER+) and 67 estrogen receptor negative (ER-), to test the accuracies of an autoencoder, a deep learning (DL) framework, as well as six widely used machine learning models, namely Random Forest (RF), Support Vector Machines (SVM), Recursive Partitioning and Regression Trees (RPART), Linear Discriminant Analysis (LDA), Prediction Analysis for Microarrays (PAM), and Generalized Boosted Models (GBM). The DL framework has the highest area under the curve (AUC), 0.93, in classifying ER+/ER- patients, compared with the other six machine learning algorithms. Furthermore, biological interpretation of the first hidden layer reveals eight commonly enriched, significant metabolomics pathways (adjusted P-value < 0.05) that cannot be discovered by the other machine learning methods. Among them, the protein digestion & absorption and ATP-binding cassette (ABC) transporter pathways are also confirmed by an integrated analysis of metabolomics and gene expression data in these samples. In summary, the deep learning method shows advantages for metabolomics-based classification of breast cancer ER status, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of autoencoder-based deep learning methods for classification in the metabolomics research community.
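The abstract reports pathways with an adjusted P-value below 0.05 but does not name the correction method. The sketch below assumes the Benjamini-Hochberg false discovery rate procedure, a common choice in enrichment analysis; it is an illustration only, not the authors' pipeline, and the p-values in the example are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices sorted by ascending p-value; rank r (1-based) maps to order[r - 1].
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted
```

For p-values `[0.01, 0.04, 0.03, 0.20]` the adjusted values are approximately `[0.04, 0.0533, 0.0533, 0.20]`, so only the first test passes a 0.05 threshold. Note that 0.03 and 0.04 are significant unadjusted but not after correction, which is the point of controlling the false discovery rate across many pathways.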

An Efficient Machine Learning Approach for Breast Cancer Prediction

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v3i3.1347 ◽

2019 ◽

Vol 3 (3) ◽

pp. 458-469

Author(s):

Azminuddin I. S. Azis ◽

Irma Surya Kumala Idris ◽

Budy Santoso ◽

Yasin Aril Mustofa

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Particle Swarm Optimization ◽

Nearest Neighbor ◽

Breast Cancer Dataset ◽

Z Score ◽

Cancer Dataset ◽

Swarm Optimization ◽

Cancer Prediction ◽

Machine Learning Methods

Breast cancer is the most common cancer in women, and its death rate still ranks second among cancers. Related studies have often reported high accuracy for their proposed machine learning approaches; however, without efficient pre-processing, the validity of the proposed breast cancer prediction models remains in question. Therefore, this research aims to improve the accuracy of machine learning methods for breast cancer prediction through more efficient pre-processing: Missing Value Replacement, Data Transformation, Smoothing Noisy Data, Feature Selection / Attribute Weighting, Data Validation, and Unbalanced Class Reduction. This study proposes several approaches: C4.5 - Z-Score - Genetic Algorithm for the Breast Cancer Dataset, with 77.27% accuracy; 7-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Original, with 97.85% accuracy; Artificial Neural Network - Z-Score - Forward Selection for the Wisconsin Breast Cancer Dataset - Diagnostic, with 98.24% accuracy; and 11-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Prognostic, with 83.33% accuracy. These approaches outperform standard machine learning methods and the best methods proposed in previous related studies.
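The abstract names Min-Max Normalization, Z-Score, and k-Nearest Neighbor (k = 7 and k = 11) as pipeline components but gives no implementation. Below is a minimal sketch of those three pieces in plain Python, assuming Euclidean distance and a simple majority vote; the function names and toy data are hypothetical, and the sketch omits Particle Swarm Optimization and the other steps of the authors' pipeline.

```python
import math
import statistics
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Sequence[float]


def min_max_fit(rows: Sequence[Row]) -> List[Tuple[float, float]]:
    """Per-column (min, max), fitted on training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(rows: Sequence[Row], bounds: List[Tuple[float, float]]) -> List[List[float]]:
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(r, bounds)]
        for r in rows
    ]


def z_score_fit(rows: Sequence[Row]) -> List[Tuple[float, float]]:
    """Per-column (mean, population standard deviation)."""
    return [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*rows)]


def z_score_apply(rows: Sequence[Row], params: List[Tuple[float, float]]) -> List[List[float]]:
    """Standardize each column; zero-variance columns map to 0.0."""
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(r, params)]
        for r in rows
    ]


def knn_predict(train_x: Sequence[Row], train_y: Sequence[Hashable], query: Row, k: int) -> Hashable:
    """Majority vote among the k training rows closest to `query` (Euclidean)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training rows")
    by_distance = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(y for _, y in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train_x = [[1.0, 200.0], [2.0, 180.0], [8.0, 20.0], [9.0, 40.0]]
    train_y = ["benign", "benign", "malignant", "malignant"]
    bounds = min_max_fit(train_x)  # fit on training data only
    scaled_train = min_max_apply(train_x, bounds)
    scaled_query = min_max_apply([[7.0, 50.0]], bounds)[0]  # reuse training bounds
    print(knn_predict(scaled_train, train_y, scaled_query, k=3))
```

The scaling parameters are fitted on the training rows and reused for the query, so test data does not leak into pre-processing. With two classes, an odd k such as 7 or 11 also avoids tied votes. On the toy data above, the query's three nearest neighbours are two "malignant" rows and one "benign" row, so the script prints `malignant`.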
