CPT Data Interpretation Employing Different Machine Learning Techniques

Geosciences ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 265
Author(s):  
Stefan Rauter ◽  
Franz Tschuchnigg

The classification of soils into categories with a similar range of properties is a fundamental geotechnical engineering procedure. At present, this classification is based on various types of cost- and time-intensive laboratory and/or in situ tests. These soil investigations are essential for each individual construction site and have to be performed prior to the design of a project. Since Machine Learning could play a key role in reducing the costs and time needed for a suitable site investigation program, the basic ability of Machine Learning models to classify soils from Cone Penetration Tests (CPT) is evaluated. To find an appropriate classification model, 24 different Machine Learning models, based on three different algorithms, are built and trained on a dataset consisting of 1339 CPTs. The applied algorithms are a Support Vector Machine, an Artificial Neural Network, and a Random Forest. As input features, different combinations of direct cone penetration test data (tip resistance qc, sleeve friction fs, friction ratio Rf, depth d), combined with “defined” (i.e., not directly measured) data (total vertical stresses σv, effective vertical stresses σ’v, and hydrostatic pore pressure u0), are used. Standard soil classes based on grain size distributions and soil classes based on soil behavior types according to Robertson are applied as targets. The different models are compared with respect to their prediction performance and the required learning time. The best results for all targets were obtained with models using a Random Forest classifier. For the soil classes based on grain size distribution, an accuracy of about 75% was reached; for the soil classes according to Robertson, the accuracy was about 97–99%.
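
As an illustration of the winning setup, here is a minimal scikit-learn sketch of a Random Forest classifier trained on the four direct CPT features. Everything below is a synthetic stand-in: the value ranges, the seven-class labels, and the one-row-per-sounding simplification are assumptions, not the authors' data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1339                          # one synthetic row per CPT (simplified)
qc = rng.uniform(0.1, 30, n)      # tip resistance [MPa], assumed range
fs = rng.uniform(0.001, 0.5, n)   # sleeve friction [MPa], assumed range
Rf = 100 * fs / qc                # friction ratio [%]
depth = rng.uniform(0, 30, n)     # depth [m]
X = np.column_stack([qc, fs, Rf, depth])
y = rng.integers(0, 7, n)         # stand-in soil-class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"accuracy: {accuracy_score(y_te, clf.predict(X_te)):.2f}")
```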

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.
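
A hedged sketch of the per-age-group comparison loop follows. The synthetic DataFrame, column names, and default hyperparameters are placeholders; the KNHANES V extract itself is not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# synthetic stand-in for the 815-patient, 16-predictor survey extract
X, y = make_classification(n_samples=815, n_features=16, random_state=0)
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(16)])
df["vaccinated"] = y
df["age"] = np.random.default_rng(0).integers(19, 90, 815)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "XGB": XGBClassifier(eval_metric="logloss"),
}
for elderly, sub in df.groupby(df["age"] >= 65):   # separate models per age group
    Xg, yg = sub[[f"x{i}" for i in range(16)]], sub["vaccinated"]
    for name, model in models.items():
        acc = cross_val_score(model, Xg, yg, cv=5).mean()
        print(f"age>=65={elderly}  {name}: {acc:.3f}")
```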


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6019
Author(s):  
José Manuel Lozano Domínguez ◽  
Faroq Al-Tam ◽  
Tomás de J. Mateo Sanguino ◽  
Noélia Correia

Improving road safety through artificial intelligence-based systems is now crucial to turning smart cities into a reality. Within this highly relevant and extensive field, an approach is proposed to improve vehicle detection in smart crosswalks using machine learning models. In contrast to classic fuzzy classifiers, machine learning models do not require the readjustment of labels that depend on the location of the system and the road conditions. Several machine learning models were trained and tested using real traffic data taken from urban scenarios in both Portugal and Spain. These include random forest, time-series forecasting, multi-layer perceptron, support vector machine, and logistic regression models. A deep reinforcement learning agent, based on a state-of-the-art double-deep recurrent Q-network, is also designed and compared with the machine learning models just mentioned. Results show that the machine learning models can efficiently replace the classic fuzzy classifier.
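
The supervised replacement for the fuzzy classifier can be sketched as below: a model fitted once on labelled sensor windows, with no location-dependent label tuning. The per-window features (summary statistics of roadside sensor readings) and the data are invented stand-ins; the Portuguese and Spanish traffic datasets are not public.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# stand-in for per-window sensor statistics (e.g., mean/peak readings)
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(f"F1 = {f1_score(y_te, clf.predict(X_te)):.3f}")  # vehicle present/absent
```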


2020 ◽  
Vol 11 (40) ◽  
pp. 8-23
Author(s):  
Pius MARTHIN ◽  
Duygu İÇEN

Online product reviews have become a valuable source of information that facilitates customer decisions with respect to a particular product. Given the wealth of information regarding users' satisfaction and experiences with a particular drug, pharmaceutical companies make use of online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models which facilitate decision making in various fields. In this manuscript we applied the drug review dataset used by Gräßer, Kallumadi, Malberg, and Zaunseder (2018), freely available from the machine learning repository of the University of California Irvine (UCI), to identify the machine learning model that best predicts overall drug performance with respect to users' reviews. Apart from several manipulations done to improve model accuracy, all procedures required for text analysis were followed, including text cleaning and transformation of the texts to a numeric format suitable for training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customers' reviews were summarized and visualized using a bar plot and a word cloud to explore the most frequent terms. Due to scalability issues, we used only a sample of the dataset: 15,000 observations were randomly sampled from the 161,297 training reviews and 10,000 from the 53,766 testing reviews. Several machine learning models were trained using 10-fold cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), a classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Splines (MARS), Support Vector Machines (SVM) with both radial and linear kernels, and a classification tree using random forest (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency. The SVM with a linear kernel was significantly the best, with an accuracy of 83%. Using only a small portion of the dataset, we attained reasonable accuracy in our models by applying the TF-IDF transformation and the Latent Semantic Analysis (LSA) technique to our term-document matrix (TDM).
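
The winning pipeline (TF-IDF weighting of the term-document matrix, LSA via truncated SVD, and a linear-kernel SVM) can be sketched in scikit-learn as follows. The example reviews and labels are invented; on the full corpus, the LSA dimensionality would be far larger than the two components used here.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

texts = [
    "worked well, mild nausea at first",
    "great relief with no side effects",
    "no effect and severe headaches",
    "made my symptoms worse",
]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # TF-IDF-weighted term-document matrix
    TruncatedSVD(n_components=2),           # LSA; use many more components on real data
    LinearSVC(),                            # the linear-kernel SVM that won here
)
model.fit(texts, labels)
print(model.predict(["helped my pain but caused headaches"]))
```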


Author(s):  
Tsehay Admassu Assegie

Machine learning approaches have become widely applicable in the disease diagnosis and prediction process because of the accuracy and precision of machine learning models in disease prediction. However, different machine learning models achieve different accuracy and precision in disease prediction, and selecting the model that yields better prediction accuracy and precision is an open research problem. In this study, we propose machine learning models for liver disease prediction using the Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) learning algorithms, and we evaluate the accuracy and precision of the models on liver disease prediction using the Indian liver disease data repository. The analysis of results showed 82.90% accuracy for SVM and 72.64% accuracy for the KNN algorithm. Based on these experimental test results, SVM performs better than KNN on liver disease prediction.
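
A minimal sketch of the SVM-versus-KNN comparison follows, assuming scaled inputs and accuracy/precision scoring. The features are synthetic stand-ins with the Indian Liver Patient Dataset's approximate shape (583 records, 10 attributes), not the actual repository data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score

# synthetic stand-in for the ILPD features (age, bilirubin, enzymes, ...)
X, y = make_classification(n_samples=583, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: accuracy={accuracy_score(y_te, pred):.4f}, "
          f"precision={precision_score(y_te, pred):.4f}")
```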


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257069
Author(s):  
Jae-Geum Shim ◽  
Kyoung-Ho Ryu ◽  
Sung Hyun Lee ◽  
Eun-Ah Cho ◽  
Sungho Lee ◽  
...  

Objective To construct a prediction model for optimal tracheal tube depth in pediatric patients using machine learning. Methods Pediatric patients aged <7 years who received post-operative ventilation after undergoing surgery between January 2015 and December 2018 were investigated in this retrospective study. The optimal location of the tracheal tube was defined as the median of the distance between the upper margin of the first thoracic (T1) vertebral body and the lower margin of the third thoracic (T3) vertebral body. We applied four machine learning models (random forest, elastic net, support vector machine, and artificial neural network) and compared their prediction accuracy with that of three formula-based methods, which were based on age, height, and tracheal tube internal diameter (ID). Results For each method, the percentage of optimal tracheal tube depth predictions in the test set was calculated as follows: 79.0 (95% confidence interval [CI], 73.5 to 83.6) for random forest, 77.4 (95% CI, 71.8 to 82.2; P = 0.719) for elastic net, 77.0 (95% CI, 71.4 to 81.8; P = 0.486) for support vector machine, 76.6 (95% CI, 71.0 to 81.5; P = 1.0) for artificial neural network, 66.9 (95% CI, 60.9 to 72.5; P < 0.001) for the age-based formula, 58.5 (95% CI, 52.3 to 64.4; P < 0.001) for the tube ID-based formula, and 44.4 (95% CI, 38.3 to 50.6; P < 0.001) for the height-based formula. Conclusions In this study, the machine learning models predicted the optimal tracheal tube tip location for pediatric patients more accurately than the formula-based methods. Machine learning models using biometric variables may help clinicians make decisions regarding optimal tracheal tube depth in pediatric patients.
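
The evaluation idea (regress tube depth from biometric inputs, then count predictions landing inside the optimal T1-T3 window) can be sketched as follows. Everything here is synthetic: the depth-height relation, the ±0.8 cm window half-width, and the sample size are illustrative assumptions, not the study's values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
age = rng.uniform(0, 7, 500)                        # years
height = 75 + 8 * age + rng.normal(0, 3, 500)       # cm, invented relation
depth = 0.05 * height + 5 + rng.normal(0, 0.4, 500) # optimal depth [cm], invented

X = np.column_stack([age, height])
model = RandomForestRegressor(random_state=0).fit(X[:400], depth[:400])
pred = model.predict(X[400:])

half_width = 0.8  # assumed half-width of the T1-T3 window [cm]
within = np.abs(pred - depth[400:]) <= half_width
print(f"optimal placement rate: {within.mean():.1%}")
```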


2020 ◽  
Vol 12 (16) ◽  
pp. 2655 ◽  
Author(s):  
Hugo Crisóstomo de Castro Filho ◽  
Osmar Abílio de Carvalho Júnior ◽  
Osmar Luiz Ferreira de Carvalho ◽  
Pablo Pozzobon de Bem ◽  
Rebeca dos Santos de Moura ◽  
...  

Synthetic Aperture Radar (SAR) time series make it possible to describe the rice phenological cycle through its backscattering time signature. The advent of the Copernicus Sentinel-1 program therefore expands studies of radar (C-band) data for rice monitoring at regional scales, thanks to its high temporal resolution and free data distribution. Recurrent Neural Network (RNN) models have reached the state of the art in pattern recognition of time-sequenced data, obtaining a significant advantage in crop classification on remote sensing images. One of the most widely used RNN approaches is the Long Short-Term Memory (LSTM) model and its improvements, such as the Bidirectional LSTM (Bi-LSTM). Bi-LSTM models are more effective because their output depends on both the previous and the next segment, in contrast to unidirectional LSTM models. The present research aims to map rice crops from Sentinel-1 (C-band) time series using LSTM and Bi-LSTM models in West Rio Grande do Sul (Brazil). We compared the results with traditional Machine Learning techniques: Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Normal Bayes (NB). The developed methodology can be subdivided into the following steps: (a) acquisition of the Sentinel time series over two years; (b) data pre-processing, minimizing noise with 3D spatial-temporal filters and smoothing with a Savitzky-Golay filter; (c) time series classification procedures; (d) accuracy analysis and comparison among the methods. The results show high overall accuracy and Kappa (>97% for all methods and metrics). Bi-LSTM was the best model, presenting statistical differences in the McNemar test at a significance level of 0.05. However, the LSTM and traditional Machine Learning models also achieved high accuracy values. The study establishes an adequate methodology for mapping rice crops in West Rio Grande do Sul.
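
A hedged Keras sketch of the Bi-LSTM classifier: each sample is one pixel's Sentinel-1 time series (timesteps × polarization bands, VV/VH assumed), and the output is rice versus non-rice. The layer width, sequence length, and random stand-in data are illustrative, not the paper's configuration.

```python
import numpy as np
import tensorflow as tf

T, BANDS = 60, 2  # ~60 acquisitions over two years, VV/VH (assumed)
x = np.random.rand(256, T, BANDS).astype("float32")  # stand-in for real pixels
y = np.random.randint(0, 2, 256)                     # rice = 1, non-rice = 0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(T, BANDS)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # reads both directions
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=2, batch_size=32, verbose=0)
```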


Author(s):  
Nelson Yego ◽  
Juma Kasozi ◽  
Joseph Nkrunziza

The role of insurance in financial inclusion as well as in economic growth is immense. However, low uptake seems to impede the growth of the sector, hence the need for a model that robustly predicts the uptake of insurance among potential clients. In this research, we compared the performances of eight (8) machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines, and Extreme Gradient Boosting. The data used in the classification were from the 2016 Kenya FinAccess Household Survey. The comparison of performance was done for both upsampled and downsampled data because of class imbalance. For upsampled data, the Random Forest classifier showed the highest accuracy and precision compared with the other classifiers, while for downsampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than the others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest compared with the other tree-based models. Also, the confusion matrix for Random Forest showed the fewest false positives and the most true positives, so it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product, so bancassurance could be said to be a plausible channel for the distribution of insurance products.
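
A minimal sketch of the imbalance handling: upsample the minority class in the training split only, fit a Random Forest, and report the area under the ROC curve. The synthetic data and the 90/10 class ratio are assumptions standing in for the FinAccess survey.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import resample
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# upsample the minority class in the training split only
minority = np.flatnonzero(y_tr == 1)
majority = np.flatnonzero(y_tr == 0)
up = resample(minority, n_samples=len(majority), replace=True, random_state=0)
idx = np.concatenate([majority, up])

clf = RandomForestClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
print(f"ROC AUC: {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.3f}")
```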


2018 ◽  
Vol 69 ◽  
pp. 01004 ◽  
Author(s):  
Chih-Feng Yen ◽  
He-Yen Hsieh ◽  
Kuan-Wu Su ◽  
Min-Chieh Yu ◽  
Jenq-Shiou Leu

Due to the variability and instability of photovoltaic (PV) output, accurate prediction of PV output power plays a major role in enabling PV operators to optimize their profits in the energy market. In order to predict PV output, environmental parameters such as temperature, humidity, rainfall, and wind speed are gathered as indicators, and different machine learning models are built for each solar panel inverter. In this paper, we propose two different solar prediction schemes for one-hour-ahead forecasting of solar output using Support Vector Machine (SVM) and Random Forest (RF).
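
A sketch of the one-hour-ahead setup follows, assuming the previous hour's output is available as a lagged feature alongside the weather indicators. The data and the linear relation generating them are invented; the paper's per-inverter measurements are not public.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 1000
temp = rng.normal(25, 5, n)       # temperature [C]
hum = rng.uniform(0, 1, n)        # relative humidity
rain = rng.exponential(1, n)      # rainfall [mm]
wind = rng.rayleigh(3, n)         # wind speed [m/s]
power = 5 + 0.2 * temp - 2 * hum - 0.5 * rain + rng.normal(0, 0.5, n)  # invented

# features at hour t plus output at t-1, target is output at t
X = np.column_stack([temp, hum, rain, wind, np.roll(power, 1)])[1:]
y = power[1:]

for name, model in {"SVM": SVR(), "RF": RandomForestRegressor(random_state=0)}.items():
    model.fit(X[:800], y[:800])
    print(name, f"R^2 = {model.score(X[800:], y[800:]):.2f}")
```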


2020 ◽  
Vol 12 (6) ◽  
pp. 962 ◽  
Author(s):  
Changyu Liu ◽  
Xiaodong Huang ◽  
Xubing Li ◽  
Tiangang Liang

To improve the poor accuracy of the MODIS (Moderate Resolution Imaging Spectroradiometer) daily fractional snow cover product over the complex terrain of the Tibetan Plateau (RMSE = 0.30), unmanned aerial vehicle and machine learning technologies are employed to map the fractional snow cover based on MODIS over this terrain. Three machine learning models, including random forest, support vector machine, and back-propagation artificial neural network models, are trained and compared in this study. The results indicate that compared with the MODIS daily fractional snow cover product, the introduction of a highly accurate snow map acquired by unmanned aerial vehicles as a reference into machine learning models can significantly improve the MODIS fractional snow cover mapping accuracy. The random forest model shows the best accuracy among the three machine learning models, with an RMSE (root-mean-square error) of 0.23, especially over forestland and shrubland, with RMSEs of 0.13 and 0.18, respectively. Although the accuracies of the support vector machine and back-propagation artificial neural network models are worse over forestland and shrubland, their average errors are still better than that of MOD10A1. Different fractional snow cover gradients also affect the accuracy of the machine learning algorithms. Nevertheless, the random forest model remains stable across different fractional snow cover gradients and is, therefore, the best machine learning algorithm for MODIS fractional snow cover mapping in Tibetan Plateau areas with complex terrain and severely fragmented snow cover.
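
The core regression can be sketched as below: predict a fractional snow cover value in [0, 1] from MODIS band reflectances, with UAV-derived snow maps as the reference target. The seven-band input, the synthetic relation, and the split are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
bands = rng.uniform(0, 1, (2000, 7))  # stand-in MODIS band reflectances
# invented relation standing in for the UAV-derived reference snow cover
fsc = np.clip(bands[:, 3] - 0.3 * bands[:, 0] + rng.normal(0, 0.05, 2000), 0, 1)

rf = RandomForestRegressor(random_state=0).fit(bands[:1500], fsc[:1500])
rmse = np.sqrt(mean_squared_error(fsc[1500:], rf.predict(bands[1500:])))
print(f"RMSE = {rmse:.2f}")
```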


2020 ◽  
Vol 27 (3) ◽  
pp. 437-443 ◽  
Author(s):  
Zina M Ibrahim ◽  
Honghan Wu ◽  
Ahmed Hamoud ◽  
Lukas Stappen ◽  
Richard J B Dobson ◽  
...  

Abstract Objectives Current machine learning models aiming to predict sepsis from electronic health records (EHR) do not account for the heterogeneity of the condition, despite its emerging importance in prognosis and treatment. This work demonstrates the added value of stratifying the types of organ dysfunction observed in patients who develop sepsis in the intensive care unit (ICU) in improving the ability to recognize patients at risk of sepsis from their EHR data. Materials and Methods Using an ICU dataset of 13,728 records, we identify clinically significant sepsis subpopulations with distinct organ dysfunction patterns. We perform classification experiments with random forest, gradient boosted trees, and support vector machines, using the identified subpopulations to distinguish patients who develop sepsis in the ICU from those who do not. Results The classification results show that features selected using sepsis subpopulations as background knowledge yield superior performance in distinguishing septic from non-septic patients regardless of the classification model used. The improved performance is especially pronounced in specificity, which is a current bottleneck in sepsis prediction machine learning models. Conclusion Our findings can steer machine learning efforts toward more personalized models for complex conditions including sepsis.
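
Because the headline gain is in specificity, the sketch below computes specificity from a confusion matrix after fitting gradient boosted trees on a chosen feature subset, a stand-in for the paper's subpopulation-informed feature selection. The data and the selected columns are synthetic assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# shuffle=False keeps the 8 informative columns first, so "selected"
# stands in for features chosen with subpopulation background knowledge
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           shuffle=False, random_state=0)
selected = list(range(8))
X_tr, X_te, y_tr, y_te = train_test_split(X[:, selected], y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print(f"specificity = {tn / (tn + fp):.3f}")
```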

