Integrative Predictive Modeling of Metastasis in Melanoma Cancer Based on MicroRNA, mRNA, and DNA Methylation Data

Introduction: Despite the significant progress in understanding cancer biology, the deduction of metastasis is still a challenge in the clinic. Transcriptional regulation is one of the critical mechanisms underlying cancer development. Even though mRNA, microRNA, and DNA methylation mechanisms have a crucial impact on the metastatic outcome, there are no comprehensive data mining models that combine all transcriptional regulation aspects for metastasis prediction. This study focused on identifying the regulatory impact of genetic biomarkers for monitoring metastatic molecular signatures of melanoma by investigating the consolidated effect of miRNA, mRNA, and DNA methylation.Method: We developed multiple machine learning models to distinguish the metastasis by integrating miRNA, mRNA, and DNA methylation markers. We used the TCGA melanoma dataset to differentiate between metastatic melanoma samples by assessing a set of predictive models. For this purpose, machine learning models using a support vector machine with different kernels, artificial neural networks, random forests, AdaBoost, and Naïve Bayes are compared. An iterative combination of differentially expressed miRNA, mRNA, and methylation signatures is used as a candidate marker to reveal each new biomarker category’s impact. In each iteration, the performances of the combined models are calculated. During all comparisons, the choice of the feature selection method and under and oversampling approaches are analyzed. Selected biomarkers of the highest performing models are further analyzed for the biological interpretation of functional enrichment.Results: In the initial model, miRNA biomarkers can identify metastatic melanoma with an 81% F-score. The addition of mRNA markers upon miRNA increased the F-score to 92%. In the final integrated model, the addition of the methylation data resulted in a similar F-score of 92% but produced a stable model with low variance across multiple trials.Conclusion: Our results support the role of miRNA regulation in metastatic melanoma as miRNA markers model metastasis outcomes with high accuracy. Moreover, the integrated evaluation of miRNA with mRNA and methylation biomarkers increases the model’s power. It populates selected biomarkers on the metastasis-associated pathways of melanoma, such as the “osteoclast”, “Rap1 signaling”, and “chemokine signaling” pathways.Source Code:https://github.com/aysegul-kt/MelonomaMetastasisPrediction/

Download Full-text

Machine Learning Models For Predicting Lymph Node And Distant Metastases In Colorectal Cancer

10.21203/rs.3.rs-45372/v1 ◽

2020 ◽

Author(s):

Xiaoyu Dai ◽

Siqi Dai ◽

Xi Yang ◽

Jing Zhuang ◽

Jin Liu ◽

...

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Lymph Node ◽

Distant Metastases ◽

Functional Enrichment ◽

Support Vector ◽

Biological Functions ◽

Learning Models ◽

Optimal Model ◽

Machine Learning Models

Abstract Background: Colorectal cancer (CRC) is the third most common malignancy in the world and metastasis is responsible for a major proportion of the cancer-related deaths in CRC patients.Aims: To construct machine learning models for predicting lymph node and distant metastases in colorectal cancer and analyze biological functions features of metastasis-related genes.Methods: RNA-seq and miRNA-seq data as well as corresponding clinical data from colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ) were obtained from The Cancer Genome Atlas (TCGA) database. The differentially expressed RNAs (DE-RNAs) in non-LNM (N0) and LNM (N1/N2) as well as non-distant metastases (M0) and distant metastases (M1) were analyzed. Six machine learning models including logistic regression (LR), random forest (RF), support vector machine (SVM), Catboost, gradient boosting decision tree (GBDT), and artificial neural network (NN) were constructed to predict cancer metastasis and the feature genes of the optimal model were further analyzed by functional enrichment, protein-protein interaction (PPI) network, and drug-target analyses.Results: Differential RNA expression profiles of LNM and non-LNM as well as M0 vs. M1 were observed in both COAD and READ samples. NN model was determined to be the optimal model for predicting distant metastases, while Catboost and LR models were the optimal models for predicting LNM in COAD and READ samples, respectively. PPI analysis indicated that KIR2DL4, chemokine-related genes CXCL9/10/11/13 and CCL25, and gamma-aminobutyric acid (GABA) receptor genes (GABRR1, GABRB2 and GABRA3) were key genes in metastasis. In addition, atorvastatin and eszopiclone were identified as potential therapeutic agents as they target these genes.Conclusions: We constructed six machine learning models for predicting colorectal cancer metastases and identify the optimal model. We analyzed biological functions features of metastasis-related RNAs in colorectal cancer.

Download Full-text

Machine learning models for predicting lymph node and distant metastases in colorectal cancer

10.21203/rs.3.rs-45372/v2 ◽

2020 ◽

Author(s):

Xiaoyu Dai ◽

Siqi Dai ◽

Xi Yang ◽

Jing Zhuang ◽

Jin Liu ◽

...

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Lymph Node ◽

Distant Metastases ◽

Functional Enrichment ◽

Support Vector ◽

Biological Functions ◽

Learning Models ◽

Optimal Model ◽

Machine Learning Models

Download Full-text

Autoencoded DNA methylation data to predict breast cancer recurrence: Machine learning models and gene-weight significance

Artificial Intelligence in Medicine ◽

10.1016/j.artmed.2020.101976 ◽

2020 ◽

Vol 110 ◽

pp. 101976

Author(s):

Laura Macías-García ◽

María Martínez-Ballesteros ◽

José María Luna-Romera ◽

José M. García-Heredia ◽

Jorge García-Gutiérrez ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Dna Methylation ◽

Cancer Recurrence ◽

Breast Cancer Recurrence ◽

Methylation Data ◽

Learning Models ◽

Machine Learning Models

Download Full-text

Monitoring the Foliar Nutrients Status of Mango Using Spectroscopy-Based Spectral Indices and PLSR-Combined Machine Learning Models

Remote Sensing ◽

10.3390/rs13040641 ◽

2021 ◽

Vol 13 (4) ◽

pp. 641

Author(s):

Gopal Ramdas Mahajan ◽

Bappa Das ◽

Dayesh Murgaokar ◽

Ittai Herrmann ◽

Katja Berger ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Support Vector ◽

Spectral Indices ◽

Learning Models ◽

Leaf Nutrients ◽

Machine Learning Models

Conventional methods of plant nutrient estimation for nutrient management need a huge number of leaf or tissue samples and extensive chemical analysis, which is time-consuming and expensive. Remote sensing is a viable tool to estimate the plant’s nutritional status to determine the appropriate amounts of fertilizer inputs. The aim of the study was to use remote sensing to characterize the foliar nutrient status of mango through the development of spectral indices, multivariate analysis, chemometrics, and machine learning modeling of the spectral data. A spectral database within the 350–1050 nm wavelength range of the leaf samples and leaf nutrients were analyzed for the development of spectral indices and multivariate model development. The normalized difference and ratio spectral indices and multivariate models–partial least square regression (PLSR), principal component regression, and support vector regression (SVR) were ineffective in predicting any of the leaf nutrients. An approach of using PLSR-combined machine learning models was found to be the best to predict most of the nutrients. Based on the independent validation performance and summed ranks, the best performing models were cubist (R2 ≥ 0.91, the ratio of performance to deviation (RPD) ≥ 3.3, and the ratio of performance to interquartile distance (RPIQ) ≥ 3.71) for nitrogen, phosphorus, potassium, and zinc, SVR (R2 ≥ 0.88, RPD ≥ 2.73, RPIQ ≥ 3.31) for calcium, iron, copper, boron, and elastic net (R2 ≥ 0.95, RPD ≥ 4.47, RPIQ ≥ 6.11) for magnesium and sulfur. The results of the study revealed the potential of using hyperspectral remote sensing data for non-destructive estimation of mango leaf macro- and micro-nutrients. The developed approach is suggested to be employed within operational retrieval workflows for precision management of mango orchard nutrients.

Download Full-text

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Download Full-text

414 Deep Neural Networks: A Survey Tool for Obstructive Sleep Apnea Prediction

SLEEP ◽

10.1093/sleep/zsab072.413 ◽

2021 ◽

Vol 44 (Supplement_2) ◽

pp. A164-A164

Author(s):

Pahnwat Taweesedt ◽

JungYoon Kim ◽

Jaehyun Park ◽

Jangwoon Park ◽

Munish Sharma ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Obstructive Sleep Apnea ◽

Sleep Apnea ◽

Deep Neural Networks ◽

Support Vector ◽

Learning Models ◽

Obstructive Sleep ◽

Screening Questionnaires ◽

Machine Learning Models

Abstract Introduction Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder with an estimation of one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive and is not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction with high sensitivity and low specificity. The present study is intended to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods Subjects who underwent full-night polysomnography in Torr sleep center, Texas and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed by using Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models including Deep Neural Networks with the scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), and K-Nearest Neighbors classifier (KNC) and Support Vector Machine Classifier (SVMC). Training:Testing subject ratio of 65:35 was used. All features including demographic data, body measurement, snoring and sleepiness history were obtained from 5 OSA screening questionnaires/scores (STOP-BANG questionnaires, Berlin questionnaires, NoSAS score, NAMES score and No-Apnea score). Performance parametrics were used to compare between machine learning models. Results Of 180 subjects, 51.5 % of subjects were male with mean (SD) age of 53.6 (15.1). One hundred and nineteen subjects were diagnosed with OSA. Area Under the Receiver Operating Characteristic Curve (AUROC) of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score were 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0,61, 0.58 and 0,58 respectively. DNN-PCA showed the highest AUROC with sensitivity of 0.79, specificity of 0.67, positive-predictivity of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion Our result showed that DNN-PCA outperforms OSA screening questionnaires, scores and other machine learning models. Support (if any):

Download Full-text

QUBO formulations for training machine learning models

Scientific Reports ◽

10.1038/s41598-021-89461-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Prasanna Date ◽

Davis Arthur ◽

Lauren Pusey-Nazzaro

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Large Scale ◽

Support Vector ◽

Quantum Computers ◽

Np Hard ◽

Learning Models ◽

Moore’S Law ◽

Moore's Law ◽

Machine Learning Models

AbstractTraining machine learning models on classical computers is usually a time and compute intensive process. With Moore’s law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as the quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post Moore’s law era. In order to solve problems on adiabatic quantum computers, they must be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models—linear regression, support vector machine (SVM) and balanced k-means clustering—as QUBO problems, making them conducive to be trained on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better (in case of SVM and balanced k-means clustering) or equivalent (in case of linear regression) to their classical counterparts.

Download Full-text

A Comparative Study of Machine Learning Models with Hyperparameter Optimization Algorithm for Mapping Mineral Prospectivity

Minerals ◽

10.3390/min11020159 ◽

2021 ◽

Vol 11 (2) ◽

pp. 159

Author(s):

Nan Lin ◽

Yongliang Chen ◽

Haiqi Liu ◽

Hanlin Liu

Keyword(s):

Machine Learning ◽

Swarm Intelligence ◽

Optimization Algorithm ◽

Geochemical Data ◽

Support Vector ◽

Learning Models ◽

Hyperparameter Optimization ◽

Mineral Prospectivity ◽

Swarm Intelligence Optimization ◽

Machine Learning Models

Selecting internal hyperparameters, which can be set by the automatic search algorithm, is important to improve the generalization performance of machine learning models. In this study, the geological, remote sensing and geochemical data of the Lalingzaohuo area in Qinghai province were researched. A multi-source metallogenic information spatial data set was constructed by calculating the Youden index for selecting potential evidence layers. The model for mapping mineral prospectivity of the study area was established by combining two swarm intelligence optimization algorithms, namely the bat algorithm (BA) and the firefly algorithm (FA), with different machine learning models. The receiver operating characteristic (ROC) and prediction-area (P-A) curves were used for performance evaluation and showed that the two algorithms had an obvious optimization effect. The BA and FA differentiated in improving multilayer perceptron (MLP), AdaBoost and one-class support vector machine (OCSVM) models; thus, there was no optimization algorithm that was consistently superior to the other. However, the accuracy of the machine learning models was significantly enhanced after optimizing the hyperparameters. The area under curve (AUC) values of the ROC curve of the optimized machine learning models were all higher than 0.8, indicating that the hyperparameter optimization calculation was effective. In terms of individual model improvement, the accuracy of the FA-AdaBoost model was improved the most significantly, with the AUC value increasing from 0.8173 to 0.9597 and the prediction/area (P/A) value increasing from 3.156 to 10.765, where the mineral targets predicted by the model occupied 8.63% of the study area and contained 92.86% of the known mineral deposits. The targets predicted by the improved machine learning models are consistent with the metallogenic geological characteristics, indicating that the swarm intelligence optimization algorithm combined with the machine learning model is an efficient method for mineral prospectivity mapping.

Download Full-text

Machine Learning Models for Finger Bend Evaluation using Implemented Low cost Flex Sensor

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35742 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 3605-3611

Author(s):

Pratyush Kaware

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Low Cost ◽

Learning Algorithms ◽

Cost Effective ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

In this paper a cost-effective sensor has been implemented to read finger bend signals, by attaching the sensor to a finger, so as to classify them based on the degree of bent as well as the joint about which the finger was being bent. This was done by testing with various machine learning algorithms to get the most accurate and consistent classifier. Finally, we found that Support Vector Machine was the best algorithm suited to classify our data, using we were able predict live state of a finger, i.e., the degree of bent and the joints involved. The live voltage values from the sensor were transmitted using a NodeMCU micro-controller which were converted to digital and uploaded on a database for analysis.

Download Full-text

Machine Learning Models for Sarcopenia Identification Based on Radiomic Features of Muscles in Computed Tomography

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18168710 ◽

2021 ◽

Vol 18 (16) ◽

pp. 8710

Author(s):

Young Jae Kim

Keyword(s):

Machine Learning ◽

Computed Tomography ◽

Ct Images ◽

Erector Spinae ◽

Support Vector ◽

Gray Level ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Nsclc Patients ◽

Machine Learning Models

The diagnosis of sarcopenia requires accurate muscle quantification. As an alternative to manual muscle mass measurement through computed tomography (CT), artificial intelligence can be leveraged for the automation of these measurements. Although generally difficult to identify with the naked eye, the radiomic features in CT images are informative. In this study, the radiomic features were extracted from L3 CT images of the entire muscle area and partial areas of the erector spinae collected from non-small cell lung carcinoma (NSCLC) patients. The first-order statistics and gray-level co-occurrence, gray-level size zone, gray-level run length, neighboring gray-tone difference, and gray-level dependence matrices were the radiomic features analyzed. The identification performances of the following machine learning models were evaluated: logistic regression, support vector machine (SVM), random forest, and extreme gradient boosting (XGB). Sex, coarseness, skewness, and cluster prominence were selected as the relevant features effectively identifying sarcopenia. The XGB model demonstrated the best performance for the entire muscle, whereas the SVM was the worst-performing model. Overall, the models demonstrated improved performance for the entire muscle compared to the erector spinae. Although further validation is required, the radiomic features presented here could become reliable indicators for quantifying the phenomena observed in the muscles of NSCLC patients, thus facilitating the diagnosis of sarcopenia.

Download Full-text