Validation of miRNAs as Breast Cancer Biomarkers with a Machine Learning Approach

Certain small noncoding microRNAs (miRNAs) are differentially expressed in normal tissues and cancers, which makes them strong candidates for cancer biomarkers. Previously, a selected subset of miRNAs was experimentally verified to be linked to breast cancer. In this paper, we validated the importance of these miRNAs using a machine learning approach on miRNA expression data. We performed feature selection on the set of these relevant miRNAs, using Information Gain (IG), Chi-Squared (CHI2) and the Least Absolute Shrinkage and Selection Operator (LASSO), to rank them by importance. We then performed cancer classification with these miRNAs as features, using Random Forest (RF) and Support Vector Machine (SVM) classifiers. Our results demonstrated that the miRNAs ranked higher by our analysis yielded higher classifier performance. Performance decreased as the rank of the miRNA decreased, confirming that these miRNAs had different degrees of importance as biomarkers. Furthermore, we found that using as few as three miRNAs as biomarkers for breast cancer can be as effective as using the entire set of 1800 miRNAs. This work suggests that machine learning is a useful tool for functional studies of miRNAs for cancer detection and diagnosis.
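The ranking step described above can be illustrated with a minimal Information Gain sketch. This is not the authors' code: the miRNA names, the high/low discretization of expression values and the toy labels are all hypothetical, chosen only to show how IG orders features by how well they separate the classes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Return (name, IG) pairs sorted from most to least informative."""
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: expression already discretized into high/low (an assumption).
labels = ["tumor"] * 4 + ["normal"] * 4
features = {
    "miR-A": ["high", "high", "high", "high", "low", "low", "low", "low"],  # separates classes
    "miR-B": ["high", "high", "high", "low", "low", "low", "low", "high"],  # partly informative
    "miR-C": ["high", "low", "high", "low", "high", "low", "high", "low"],  # uninformative
}

ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: IG = {score:.3f}")
```

In practice the top-ranked features would then be passed to a classifier such as RF or SVM; that step is omitted here to keep the sketch dependency-free.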

Performance and efficiency of machine learning algorithms for analyzing rectangular biomedical data

10.1101/2020.09.13.295592 ◽ 2020

Author(s): Fei Deng ◽ Jibing Huang ◽ Xiaoling Yuan ◽ Chao Cheng ◽ Lanjing Zhang

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Dimension Reduction ◽ Cause Of Death ◽ Information Gain ◽ Machine Learning Algorithms ◽ Support Vector ◽ Biomedical Data ◽ Breast Cancers ◽ Similar Accuracy

Abstract: Most biomedical datasets, including those from omics, population studies and surveys, are rectangular in shape and have few missing data. Recently, their sample sizes have grown significantly. Rigorous analyses of these large datasets demand considerably more efficient and more accurate algorithms. Machine learning (ML) algorithms, including random forests (RF), decision trees (DT), artificial neural networks (ANN) and support vector machines (SVM), have been used to classify outcomes in biomedical datasets. However, their performance and efficiency in classifying multi-category outcomes in rectangular data are poorly understood. Therefore, we aimed to compare these metrics among the 4 ML algorithms. As an example, we created a large rectangular dataset using the female breast cancers in the Surveillance, Epidemiology, and End Results-18 (SEER-18) database which were diagnosed in 2004 and followed up until December 2016. The outcome was the 6-category cause of death, namely alive, non-breast cancer, breast cancer, cardiovascular disease, infection and other cause. We included 58 dichotomized features from ~53,000 patients. All analyses were performed using MATLAB (version 2018a) and the 10-fold cross-validation approach. The accuracy in classifying 6-category cause of death with DT, RF, ANN and SVM was 72.68%, 72.66%, 70.01% and 71.85%, respectively. Based on the information entropy and information gain of feature values, we optimized dimension reduction (i.e. reducing the number of features in models). We found that 22 or more features were required to maintain similar accuracy, while the running time decreased from 440s for 58 features to 90s for 22 features in RF, from 70s to 40s in ANN and from 440s to 80s in SVM. In summary, we here show that RF, DT, ANN and SVM had similar accuracy for classifying multi-category outcomes in this large rectangular dataset. Dimension reduction based on information gain will significantly increase model efficiency while maintaining classification accuracy.
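The 10-fold cross-validation protocol mentioned in the abstract can be sketched as follows. This is a plain-Python illustration, not the authors' MATLAB code; the function names, the majority-class stand-in classifier and the toy labels are assumptions made for the example.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean per-fold accuracy of a classifier given as fit/predict callables."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical stand-in classifier: always predicts the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Toy data (an assumption): 53 samples with a dominant "alive" class.
y = ["alive"] * 40 + ["breast cancer"] * 13
X = [[0] for _ in y]
acc = cross_validated_accuracy(X, y, fit_majority, predict_majority)
print(f"10-fold accuracy of the majority baseline: {acc:.3f}")
```

Any of the four classifiers compared in the study could be plugged in through the `fit` and `predict` arguments; a majority-class baseline is used here only because it needs no external library.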

Machine Learning Approach towards Mammographic Breast Density Measurement for Breast Cancer Risk Prediction: An Overview

SSRN Electronic Journal ◽ 10.2139/ssrn.3599187 ◽ 2020

Author(s): Shivaji Pawar ◽ Suhas Sapate ◽ Kamal Sharma

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Breast Cancer Risk ◽ Cancer Risk ◽ Risk Prediction ◽ Density Measurement ◽ Mammographic Breast Density ◽ Learning Approach ◽ Machine Learning Approach ◽ Breast Density Measurement

Applying a Machine Learning Approach to Predict Acute Toxicities During Radiation for Breast Cancer Patients

International Journal of Radiation Oncology*Biology*Physics ◽ 10.1016/j.ijrobp.2018.06.167 ◽ 2018 ◽ Vol 102 (3) ◽ pp. S59

Author(s): J. Reddy ◽ W.D. Lindsay ◽ C.G. Berlind ◽ C.A. Ahern ◽ B.D. Smith

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Cancer Patients ◽ Learning Approach ◽ Breast Cancer Patients ◽ Machine Learning Approach

A machine learning approach to predict healthcare cost of breast cancer patients

Scientific Reports ◽ 10.1038/s41598-021-91580-x ◽ 2021 ◽ Vol 11 (1)

Author(s): Pratyusha Rakshit ◽ Onintze Zaballa ◽ Aritz Pérez ◽ Elisa Gómez-Inhiesto ◽ Maria T. Acaiturri-Ayesta ◽ ...

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Cancer Patients ◽ Healthcare Cost ◽ Percentage Error ◽ Learning Approach ◽ Early Prediction ◽ Breast Cancer Patients ◽ Machine Learning Approach ◽ Clinical Records

Abstract: This paper presents a novel machine learning approach to perform an early prediction of the healthcare cost of breast cancer patients. The learning phase of our prediction method consists of two steps: (1) first, the patients are clustered according to their sequences of actions, so that each group contains patients undergoing similar clinical activities and incurring similar healthcare costs, and (2) a Markov chain is then learned for each group to describe the action sequences of the patients in the cluster. A two-step procedure is also followed in the prediction phase: (1) first, the healthcare cost of a new patient’s treatment is estimated based on the average healthcare cost of its k-nearest neighbors in each group, and (2) finally, an aggregate measure of the healthcare costs estimated by each group is used as the final predicted cost. Experiments reveal a mean absolute percentage error as small as 6%, even when half of the clinical records of a patient are available, substantiating the early prediction capability of the proposed method. Comparative analysis substantiates the superiority of the proposed algorithm over state-of-the-art techniques.
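The second learning step, fitting a Markov chain to the action sequences of one patient group, can be sketched as a maximum-likelihood estimate of first-order transition probabilities. This is only an illustration under that assumption; the abstract does not specify the estimator, and the action names below are hypothetical.

```python
from collections import defaultdict


def fit_markov_chain(sequences):
    """Estimate first-order transition probabilities from action sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Count each observed (current action -> next action) transition.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    transitions = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        transitions[current] = {nxt: c / total for nxt, c in next_counts.items()}
    return transitions


# Hypothetical action sequences for the patients of one cluster.
cluster_sequences = [
    ["consult", "mammogram", "biopsy", "surgery"],
    ["consult", "mammogram", "surgery"],
    ["consult", "biopsy", "surgery"],
]

chain = fit_markov_chain(cluster_sequences)
for action, next_probs in chain.items():
    print(action, "->", next_probs)
```

One such chain would be fitted per cluster; how the chains feed into the k-nearest-neighbor cost estimate is not detailed in the abstract and is left out here.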

Supervised Machine Learning Approach For The Prediction of Breast Cancer

2020 International Conference on System, Computation, Automation and Networking (ICSCAN) ◽ 10.1109/icscan49426.2020.9262403 ◽ 2020

Author(s): Tarun Jain ◽ Vivek Kumar Verma ◽ Mahek Agarwal ◽ Anju Yadav ◽ Ashish Jain

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Supervised Machine Learning ◽ Learning Approach ◽ Machine Learning Approach

Distribution Grids Fault Location employing ST based Optimized Machine Learning Approach

Energies ◽ 10.3390/en11092328 ◽ 2018 ◽ Vol 11 (9) ◽ pp. 2328 ◽ Cited By ~ 12

Author(s): Md Shafiullah ◽ M. Abido ◽ Taher Abdel-Fattah

Keyword(s): Machine Learning ◽ Fault Location ◽ Percentage Error ◽ Support Vector ◽ Learning Approach ◽ Efficiency Coefficient ◽ Learning Tools ◽ Performance Indices ◽ Machine Learning Approach ◽ Distribution Grids

Precise fault location information plays a vital role in expediting the restoration process after a power distribution grid is subjected to any kind of fault. This paper proposed a Stockwell transform (ST) based optimized machine learning approach to locate the faults and identify the faulty sections in distribution grids. This research employed the ST to extract useful features from the recorded three-phase current signals and fed them as inputs to different machine learning tools (MLT), including multilayer perceptron neural networks (MLP-NN), support vector machines (SVM), and extreme learning machines (ELM). The proposed approach employed the constriction-factor particle swarm optimization (CF-PSO) technique to optimize the parameters of the SVM and ELM for better generalization performance. It then compared the results obtained on the test datasets in terms of selected statistical performance indices, including the root mean squared error (RMSE), mean absolute percentage error (MAPE), percent bias (PBIAS), RMSE-observations to standard deviation ratio (RSR), coefficient of determination (R²), Willmott’s index of agreement (WIA), and Nash–Sutcliffe model efficiency coefficient (NSEC), to confirm the effectiveness of the developed fault location scheme. The satisfactory values of the statistical performance indices indicated the superiority of the optimized machine learning tools over the non-optimized tools in locating faults. In addition, this research confirmed the efficacy of the faulty section identification scheme based on overall accuracy. Furthermore, the presented results validated the robustness of the developed approach against measurement noise and the uncertainties associated with pre-fault loading condition, fault resistance, and inception angle.
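The statistical performance indices listed above can be computed as in the following sketch. The abstract does not give the formulas, so the definitions below follow common textbook conventions and are assumptions: in particular, the sign convention of PBIAS varies between sources, and R² is taken here as the squared Pearson correlation. The sample values are hypothetical.

```python
import math


def performance_indices(observed, predicted):
    """Common regression indices; assumes non-zero, non-constant observations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_o) ** 2 for o in observed)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    willmott_den = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(observed, predicted)
    )
    return {
        "RMSE": math.sqrt(sq_err / n),
        "MAPE": 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted)),
        # Convention assumed here: positive PBIAS means under-prediction.
        "PBIAS": 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed),
        "RSR": math.sqrt(sq_err) / math.sqrt(ss_obs),
        "R2": cov ** 2 / (ss_obs * ss_pred),
        "WIA": 1.0 - sq_err / willmott_den,
        "NSEC": 1.0 - sq_err / ss_obs,
    }


# Hypothetical fault distances (observed) and model estimates (predicted).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
indices = performance_indices(observed, predicted)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

Lower RMSE, MAPE and RSR and values of R², WIA and NSEC closer to 1 indicate a better fit, which is how such indices are typically used to compare optimized and non-optimized models.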

Hybrid Machine Learning Approach for Skin Disease Detection Using Optimal Support Vector Machine

Intelligent Data Communication Technologies and Internet of Things - Lecture Notes on Data Engineering and Communications Technologies ◽ 10.1007/978-3-030-34080-3_73 ◽ 2019 ◽ pp. 647-658

Author(s): K. Melbin ◽ Y. Jacob Vetha Raj

Keyword(s): Machine Learning ◽ Support Vector Machine ◽ Skin Disease ◽ Support Vector ◽ Disease Detection ◽ Learning Approach ◽ Machine Learning Approach ◽ Hybrid Machine

Arabic English Cross-Lingual Plagiarism Detection Based on Keyphrases Extraction, Monolingual and Machine Learning Approach

Asian Journal of Research in Computer Science ◽ 10.9734/ajrcos/2018/v2i330075 ◽ 2019 ◽ pp. 1-12

Author(s): Mokhtar Al-Suhaiqi ◽ Muneer A. S. Hazaa ◽ Mohammed Albared

Keyword(s): Machine Learning ◽ Machine Learning Techniques ◽ Detection Methods ◽ Support Vector ◽ Svm Classifier ◽ Learning Approach ◽ Plagiarism Detection ◽ Machine Learning Approach ◽ Cross Lingual ◽ Cross Language

Due to the rapid growth of research articles in various languages, the cross-lingual plagiarism detection problem has received increasing interest in recent years. Cross-lingual plagiarism detection is a more challenging task than monolingual plagiarism detection. This paper addresses the problem of cross-lingual plagiarism detection (CLPD) by proposing a method that combines keyphrase extraction, monolingual detection methods and a machine learning approach. The research methodology used in this study made it possible to design, develop, and implement an efficient Arabic-English cross-lingual plagiarism detection method. This paper empirically evaluates five different monolingual plagiarism detection methods, namely i) N-Grams Similarity, ii) Longest Common Subsequence, iii) Dice Coefficient, iv) Fingerprint-based Jaccard Similarity and v) Fingerprint-based Containment Similarity. In addition, three machine learning approaches, namely i) naïve Bayes, ii) Support Vector Machine, and iii) linear logistic regression classifiers, are used for Arabic-English cross-language plagiarism detection. Several experiments are conducted to evaluate the performance of the keyphrase extraction methods. In addition, several experiments are conducted to investigate the performance of machine learning techniques and find the best method for Arabic-English cross-language plagiarism detection. In these experiments, the highest result was obtained using the SVM classifier, with a 92% F-measure. In addition, all classifiers achieved their highest results when most of the monolingual plagiarism detection methods were used.
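Several of the monolingual similarity measures named above can be sketched in a few lines. This is an illustration rather than the paper's implementation: it compares sets of word bigrams instead of document fingerprints, it assumes both texts are already in the same language, and the two example sentences are hypothetical.

```python
def word_ngrams(text, n=2):
    """Set of word n-grams (as tuples) from lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Shared items divided by all distinct items."""
    return len(a & b) / len(a | b) if a | b else 0.0


def dice(a, b):
    """Twice the shared items divided by the total size of both sets."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def containment(a, b):
    """Fraction of the items of a that also occur in b."""
    return len(a & b) / len(a) if a else 0.0


def lcs_length(x, y):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical suspicious and source sentences.
suspicious = "the cat sat on the mat"
source = "the cat lay on the mat"
a, b = word_ngrams(suspicious), word_ngrams(source)

print("Jaccard:", jaccard(a, b))
print("Dice:", dice(a, b))
print("Containment:", containment(a, b))
print("LCS length:", lcs_length(suspicious.split(), source.split()))
```

Scores like these could serve as input features for a classifier such as SVM, which matches the abstract's observation that combining several detection methods helps the classifiers.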

Driver Stress State Evaluation by Means of Thermal Imaging: A Supervised Machine Learning Approach Based on ECG Signal

Applied Sciences ◽ 10.3390/app10165673 ◽ 2020 ◽ Vol 10 (16) ◽ pp. 5673 ◽ Cited By ~ 2

Author(s): Daniela Cardone ◽ David Perpetuini ◽ Chiara Filippini ◽ Edoardo Spadolini ◽ Lorenza Mancini ◽ ...

Keyword(s): Machine Learning ◽ Stress State ◽ Thermal Imaging ◽ Driving Simulator ◽ Supervised Machine Learning ◽ Support Vector ◽ Learning Approach ◽ Machine Learning Approach ◽ Thermal Features ◽ Driver Stress

Traffic accidents cause a large number of injuries, sometimes fatal, every year. Among the factors affecting a driver’s performance, stress plays an important role, since it can decrease decision-making capabilities and situational awareness. It would therefore be beneficial to develop a non-invasive driver stress monitoring system able to recognize the driver’s altered state. In this study, a contactless procedure for assessing drivers’ stress state by means of thermal infrared imaging was investigated. Thermal imaging data were acquired during an experiment on a driving simulator, and thermal features of stress were investigated in comparison with a gold-standard metric (i.e., the stress index, SI) extracted from contact electrocardiography (ECG). A data-driven multivariate machine learning approach based on non-linear support vector regression (SVR) was employed to estimate the SI from thermal features extracted from facial regions of interest (i.e., nose tip, nostrils, glabella). The predicted SI showed a good correlation with the real SI (r = 0.61, p ≈ 0). A two-level classification of the stress state (STRESS, SI ≥ 150, versus NO STRESS, SI < 150) was then performed based on the predicted SI. The ROC analysis showed good classification performance, with an AUC of 0.80, a sensitivity of 77%, and a specificity of 78%.
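The two-level classification and its evaluation can be sketched as follows. The SI threshold of 150 comes from the abstract; the SI values are hypothetical, and the SVR model that would produce the predicted SI is not reproduced here.

```python
def classify_stress(si_values, threshold=150):
    """True for STRESS (SI >= threshold), False for NO STRESS."""
    return [si >= threshold for si in si_values]


def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity); assumes both classes occur in true_labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t and p for t, p in pairs)          # stressed, detected
    fn = sum(t and not p for t, p in pairs)      # stressed, missed
    tn = sum(not t and not p for t, p in pairs)  # calm, correctly cleared
    fp = sum(not t and p for t, p in pairs)      # calm, falsely flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical values: SI from ECG (reference) and SI predicted from thermal features.
real_si = [120, 160, 180, 140, 155, 130]
predicted_si = [125, 158, 149, 152, 170, 110]

true_state = classify_stress(real_si)
predicted_state = classify_stress(predicted_si)
sens, spec = sensitivity_specificity(true_state, predicted_state)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Sweeping the threshold applied to the predicted SI and recording sensitivity against one minus specificity at each step would trace the ROC curve from which an AUC is computed.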
