MODIS Fractional Snow Cover Mapping Using Machine Learning Technology in a Mountainous Area

To improve on the poor accuracy of the MODIS (Moderate Resolution Imaging Spectroradiometer) daily fractional snow cover product over the complex terrain of the Tibetan Plateau (RMSE = 0.30), unmanned aerial vehicle and machine learning technologies are employed to map fractional snow cover from MODIS data over this terrain. Three machine learning models (random forest, support vector machine, and back-propagation artificial neural network) are trained and compared in this study. The results indicate that, compared with the MODIS daily fractional snow cover product, introducing a highly accurate snow map acquired by unmanned aerial vehicles as a reference for the machine learning models can significantly improve MODIS fractional snow cover mapping accuracy. The random forest model shows the best accuracy among the three machine learning models, with an RMSE (root-mean-square error) of 0.23, and performs especially well over forestland and shrubland, with RMSEs of 0.13 and 0.18, respectively. Although the accuracy of the support vector machine and back-propagation artificial neural network models is worse over forestland and shrubland, their average errors are still lower than that of MOD10A1. Different fractional snow cover gradients also affect the accuracy of the machine learning algorithms. Nevertheless, the random forest model remains stable across different fractional snow cover gradients and is, therefore, the best machine learning algorithm for MODIS fractional snow cover mapping in Tibetan Plateau areas with complex terrain and severely fragmented snow cover.
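
The RMSE figures quoted in the abstract above compare predicted snow fractions against a reference map. As a minimal sketch of that metric (the function name and the sample pixel values are hypothetical and are not taken from the study), it could be computed like this:

```python
import math

def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical per-pixel snow fractions (0 = snow-free, 1 = fully covered).
predicted_fsc = [0.10, 0.40, 0.80, 1.00]
reference_fsc = [0.00, 0.50, 0.70, 0.90]
print(round(rmse(predicted_fsc, reference_fsc), 3))
```

Because the errors are squared before averaging, a few badly mapped pixels raise the RMSE more than many slightly wrong ones.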

COMPARATIVE ANALYSIS OF MACHINE LEARNING MODELS AND REGRESSIONS FOR CAR PRICE PREDICTION

Bulletin of V. N. Karazin Kharkiv National University Economic Series ◽

10.26565/2311-2379-2019-97-04 ◽

2019 ◽

Keyword(s):

Neural Network ◽

Machine Learning ◽

Comparative Analysis ◽

Random Forest ◽

Network Models ◽

Gradient Boosting ◽

Learning Models ◽

Neural Network Models ◽

Boosting Algorithms ◽

Machine Learning Models

The purpose of the research described in this article is a comparative analysis of the predictive quality of several machine learning and regression models. The model factors are the consumer characteristics of a used car: brand, transmission type, drive type, engine type, mileage, body type, year of manufacture, seller's region in Ukraine, condition of the car, accident history, average price of comparable cars in Ukraine, engine volume, number of doors, availability of extra equipment, number of passenger seats, first registration of the car, and whether the car was brought in from abroad. Qualitative variables were encoded as binary variables or by mean target encoding. Data on more than 200 thousand cars were used for modeling. All models were evaluated in Python using the Sklearn, Catboost, StatModels and Keras libraries. The following regression and machine learning models were considered in the course of the study: linear regression; polynomial regression; decision tree; neural network; models based on the "k-nearest neighbors", "random forest" and "gradient boosting" algorithms; and an ensemble of models. The article presents the best option from each class of models in terms of quality (according to the criteria R2, MAE, MAD, MAPE). It was found that the price of a passenger car is best predicted by non-linear models. The modeling results show that the dependence between the price of a car and its characteristics is best described by the ensemble of models, which includes a neural network and models using the "random forest" and "gradient boosting" algorithms. The ensemble of models showed an average relative approximation error of 11.2% and an average relative forecast error of 14.34%. All non-linear models for car price have approximately the same predictive quality in this research (their MAPE values differ by no more than 2%).
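
Two ideas from the abstract above, mean target encoding and MAPE, can be illustrated with a short sketch. The function names, brands and prices below are hypothetical; this is not the authors' code, only one plain way the two calculations might look.

```python
from collections import defaultdict

def mean_target_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    return [means[category] for category in categories], means

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical brands and prices, for illustration only.
brands = ["A", "B", "A", "C", "B"]
prices = [10000.0, 20000.0, 14000.0, 8000.0, 22000.0]
encoded, brand_means = mean_target_encode(brands, prices)
print(brand_means)
# Using the per-brand mean as a naive price prediction:
print(round(mape(prices, encoded), 2))
```

In practice the encoding is usually fitted on training rows only, since computing it on the same rows that are later scored lets the target leak into the feature.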

CPT Data Interpretation Employing Different Machine Learning Techniques

Geosciences ◽

10.3390/geosciences11070265 ◽

2021 ◽

Vol 11 (7) ◽

pp. 265

Author(s):

Stefan Rauter ◽

Franz Tschuchnigg

Keyword(s):

Machine Learning ◽

Grain Size ◽

Random Forest ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Models ◽

Cone Penetration ◽

Tip Resistance ◽

Machine Learning Models

The classification of soils into categories with a similar range of properties is a fundamental geotechnical engineering procedure. At present, this classification is based on various types of cost- and time-intensive laboratory and/or in situ tests. These soil investigations are essential for each individual construction site and have to be performed prior to the design of a project. Since Machine Learning could play a key role in reducing the costs and time needed for a suitable site investigation program, the basic ability of Machine Learning models to classify soils from Cone Penetration Tests (CPT) is evaluated. To find an appropriate classification model, 24 different Machine Learning models, based on three different algorithms, are built and trained on a dataset consisting of 1339 CPT. The applied algorithms are a Support Vector Machine, an Artificial Neural Network and a Random Forest. As input features, different combinations of direct cone penetration test data (tip resistance qc, sleeve friction fs, friction ratio Rf, depth d), combined with “defined”, thus, not directly measured data (total vertical stresses σv, effective vertical stresses σ’v and hydrostatic pore pressure u0), are used. Standard soil classes based on grain size distributions and soil classes based on soil behavior types according to Robertson are applied as targets. The different models are compared with respect to their prediction performance and the required learning time. The best results for all targets were obtained with models using a Random Forest classifier. For the soil classes based on grain size distribution, an accuracy of about 75%, and for soil classes according to Robertson, an accuracy of about 97–99%, was reached.
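
The abstract above distinguishes directly measured CPT inputs from "defined" quantities derived from them. The sketch below shows how such derived features are commonly computed; the function name, the uniform soil unit weight, and the example readings are all assumptions made for illustration and do not come from the paper.

```python
GAMMA_WATER = 9.81  # unit weight of water in kN/m^3

def derived_cpt_features(qc_kpa, fs_kpa, depth_m, water_table_m, gamma_soil=18.0):
    """Derive friction ratio and stress features from one CPT reading.

    Assumes a single uniform soil unit weight (gamma_soil, kN/m^3) above the
    reading, which is a simplification chosen for this sketch.
    """
    friction_ratio = 100.0 * fs_kpa / qc_kpa               # Rf in percent
    sigma_v = gamma_soil * depth_m                         # total vertical stress, kPa
    u0 = GAMMA_WATER * max(0.0, depth_m - water_table_m)   # hydrostatic pore pressure, kPa
    sigma_v_eff = sigma_v - u0                             # effective vertical stress, kPa
    return {"Rf": friction_ratio, "sigma_v": sigma_v, "u0": u0, "sigma_v_eff": sigma_v_eff}

# Hypothetical reading: qc = 2000 kPa, fs = 40 kPa at 5 m depth, water table at 2 m.
print(derived_cpt_features(2000.0, 40.0, 5.0, 2.0))
```

A feature row for a classifier would then combine the measured values (qc, fs, depth) with whichever of these derived values a given model variant uses.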

Application of Natural Language Processing with Supervised Machine Learning Techniques to Predict the Overall Drugs Performance

AJIT-e Online Academic Journal of Information Technology ◽

10.5824/ajite.2020.01.001.x ◽

2020 ◽

Vol 11 (40) ◽

pp. 8-23

Author(s):

Pius MARTHIN ◽

Duygu İÇEN

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Semantic Analysis ◽

Classification Tree ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

Online product reviews have become a valuable source of information that facilitates customer decisions about a particular product. Given the wealth of information on users' satisfaction and experiences with a particular drug, pharmaceutical companies use online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models that facilitate decision making in various fields. In this manuscript we applied a drug review dataset used by Gräßer, Kallumadi, Malberg, & Zaunseder (2018), freely available from the machine learning repository website of the University of California Irvine (UCI), to identify the machine learning model that best predicts overall drug performance from users' reviews. Apart from several manipulations done to improve model accuracy, all necessary text analysis procedures were followed, including text cleaning and transformation of the texts to numeric format for training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customers' reviews were summarized and visualized using a bar plot and a word cloud to explore the most frequent terms. Due to scalability issues, we were able to use only a sample of the dataset: 15000 observations were randomly sampled from the 161297 observations of the training dataset and 10000 from the 53766 observations of the testing dataset. Several machine learning models were trained using 10-fold cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), a classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Spline (MARS), Support Vector Machine (SVM) with both radial and linear kernels, and a classification tree ensemble using random forest (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency. The Support Vector Machine (SVM) with a linear kernel was significantly better than the rest, with an accuracy of 83%. Using only a small portion of the dataset, we managed to attain reasonable accuracy in our models by applying the TF-IDF transformation and the Latent Semantic Analysis (LSA) technique to our term-document matrix (TDM).
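
The TF-IDF transformation mentioned in the abstract above turns each review into term weights. The following is a minimal sketch using one common variant (term frequency as count divided by document length, inverse document frequency as log(N / document frequency)); the exact weighting used in the paper is not stated here, and the sample reviews are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency),
    which is only one of several common TF-IDF variants.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented example reviews.
reviews = ["works well no side effects", "bad side effects", "works well"]
for row in tf_idf(reviews):
    print(row)
```

Terms that occur in every document get a weight of zero under this variant, which is why very common words contribute nothing to the matrix that LSA would later decompose.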

Machine learning model for predicting the optimal depth of tracheal tube insertion in pediatric patients: A retrospective cohort study

PLoS ONE ◽

10.1371/journal.pone.0257069 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0257069

Author(s):

Jae-Geum Shim ◽

Kyoung-Ho Ryu ◽

Sung Hyun Lee ◽

Eun-Ah Cho ◽

Sungho Lee ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Support Vector Machine ◽

Random Forest ◽

Tracheal Tube ◽

Pediatric Patients ◽

Support Vector ◽

Learning Models ◽

Machine Learning Models

Objective: To construct a prediction model for optimal tracheal tube depth in pediatric patients using machine learning. Methods: Pediatric patients aged <7 years who received post-operative ventilation after undergoing surgery between January 2015 and December 2018 were investigated in this retrospective study. The optimal location of the tracheal tube was defined as the median of the distance between the upper margin of the first thoracic (T1) vertebral body and the lower margin of the third thoracic (T3) vertebral body. We applied four machine learning models (random forest, elastic net, support vector machine, and artificial neural network) and compared their prediction accuracy to three formula-based methods, which were based on age, height, and tracheal tube internal diameter (ID). Results: For each method, the percentage with optimal tracheal tube depth predictions in the test set was calculated as follows: 79.0 (95% confidence interval [CI], 73.5 to 83.6) for random forest, 77.4 (95% CI, 71.8 to 82.2; P = 0.719) for elastic net, 77.0 (95% CI, 71.4 to 81.8; P = 0.486) for support vector machine, 76.6 (95% CI, 71.0 to 81.5; P = 1.0) for artificial neural network, 66.9 (95% CI, 60.9 to 72.5; P < 0.001) for the age-based formula, 58.5 (95% CI, 52.3 to 64.4; P < 0.001) for the tube ID-based formula, and 44.4 (95% CI, 38.3 to 50.6; P < 0.001) for the height-based formula. Conclusions: In this study, the machine learning models predicted the optimal tracheal tube tip location for pediatric patients more accurately than the formula-based methods. Machine learning models using biometric variables may help clinicians make decisions regarding optimal tracheal tube depth in pediatric patients.
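
The results above report each success percentage with a 95% confidence interval. The abstract does not say which interval method was used, so the sketch below uses the Wilson score interval purely as an assumption, with hypothetical counts rather than the study's data.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

# Hypothetical counts: 80 optimal predictions out of 100 test cases.
low, high = wilson_interval(80, 100)
print(f"{100 * low:.1f} to {100 * high:.1f}")
```

Unlike the simple normal approximation, this interval is not symmetric around the observed proportion and stays within 0 to 1 even for proportions near the extremes.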

Machine Learning Models for Forecasting of Individual Stocks Price Patterns

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Pattern Engineering System Development for Big Data Analytics ◽

10.4018/978-1-5225-3870-7.ch008 ◽

2018 ◽

pp. 111-129 ◽

Cited By ~ 1

Author(s):

Dilip Singh Sisodia ◽

Sagar Jadhav

Keyword(s):

Machine Learning ◽

Time Window ◽

Stock Exchange ◽

Back Propagation ◽

Future Price ◽

Support Vector ◽

Learning Models ◽

Closing Price ◽

Feature Values ◽

Machine Learning Models

Stock investors always consider potential future prices before investing in any stock to make a profit. A large number of studies address the prediction of stock market indices. However, little work focuses on predicting the closing price of individual stocks well ahead of time. In this chapter, a comparative study of machine-learning-based models for predicting the closing price of a particular stock is presented. The proposed models are designed using back propagation neural networks (BPNN), support vector regression (SVR) with SMOReg, and linear regression (LR). A total of 37 technical indicators (features) derived from historical closing prices of stocks are considered for predicting the future price of a stock in a time window of five days. The experiment is performed on stocks listed on the Bombay Stock Exchange (BSE), India. The model is trained and tested using feature values extracted from the past five years of closing prices of stocks from different sectors, including aviation, pharma, banking, entertainment, and IT.
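
The chapter summarized above derives features from past closing prices and predicts the price five days ahead. As a rough sketch of how such training samples could be assembled (the two indicators, the function name, and the prices are illustrative assumptions, not the chapter's 37 indicators):

```python
def make_samples(closes, lookback=5, horizon=5):
    """Pair window-based features at day t with the closing price at day t + horizon."""
    samples = []
    for t in range(lookback - 1, len(closes) - horizon):
        window = closes[t - lookback + 1 : t + 1]
        simple_moving_average = sum(window) / lookback
        momentum = window[-1] - window[0]
        target = closes[t + horizon]
        samples.append(((simple_moving_average, momentum), target))
    return samples

# Hypothetical closing prices for 12 consecutive trading days.
closes = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5,
          105.1, 106.0, 105.4, 107.2, 108.0, 107.5]
for features, target in make_samples(closes):
    print(features, target)
```

Each feature vector uses only prices up to day t, so the target five days later is never visible to the features, which matters when the samples are later split into training and test periods.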

Comparison Between Traditional Machine Learning Models And Neural Network Models For Vietnamese Hate Speech Detection

2020 RIVF International Conference on Computing and Communication Technologies (RIVF) ◽

10.1109/rivf48685.2020.9140745 ◽

2020 ◽

Cited By ~ 2

Author(s):

Son T. Luu ◽

Hung P. Nguyen ◽

Kiet Van Nguyen ◽

Ngan Luu-Thuy Nguyen

Keyword(s):

Neural Network ◽

Machine Learning ◽

Hate Speech ◽

Network Models ◽

Learning Models ◽

Neural Network Models ◽

Speech Detection ◽

Machine Learning Models

A Comparative Analysis of Machine Learning Models for Prediction of Insurance Uptake in Kenya

10.20944/preprints202010.0186.v1 ◽

2020 ◽

Author(s):

Nelson Yego ◽

Juma Kasozi ◽

Joseph Nkrunziza

Keyword(s):

Machine Learning ◽

Random Forest ◽

Characteristic Curve ◽

Confusion Matrix ◽

Gradient Boosting ◽

Support Vector ◽

Sampled Data ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

The role of insurance in financial inclusion as well as in economic growth is immense. However, low uptake seems to impede the growth of the sector, hence the need for a model that robustly predicts uptake of insurance among potential clients. In this research, we compared the performance of eight (8) machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines and Extreme Gradient Boosting. The data used in the classification were from the 2016 Kenya FinAccess Household Survey. Because of data imbalance, performance was compared for both upsampled and downsampled data. For upsampled data, the Random Forest classifier showed the highest accuracy and precision compared to the other classifiers, but for downsampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than the others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest as compared to the other tree-based models. Also, the confusion matrix for Random Forest showed the fewest false positives and the most true positives, so it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product; hence bancassurance could be said to be a plausible channel for distributing insurance products.
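
The abstract above handles class imbalance by upsampling and downsampling. A minimal sketch of plain random resampling follows; the function name, the `mode` values and the toy data are assumptions for illustration, and the study may have used a different resampling procedure.

```python
import random

def resample(rows, labels, mode, seed=0):
    """Balance classes by random upsampling ("up") or downsampling ("down")."""
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        if mode == "up":
            # Keep every original row and draw extra copies with replacement.
            picked = group + [rng.choice(group) for _ in range(target - len(group))]
        else:
            # Keep a random subset, drawn without replacement.
            picked = rng.sample(group, target)
        out_rows.extend(picked)
        out_labels.extend([label] * len(picked))
    return out_rows, out_labels

# Toy data: 2 insured respondents (label 1) versus 8 uninsured (label 0).
rows = list(range(10))
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(resample(rows, labels, "up"))
print(resample(rows, labels, "down"))
```

Resampling is normally applied to the training portion only, so that the test set keeps the original class proportions.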

Solar Power Prediction via Support Vector Machine and Random Forest

E3S Web of Conferences ◽

10.1051/e3sconf/20186901004 ◽

2018 ◽

Vol 69 ◽

pp. 01004 ◽

Cited By ~ 2

Author(s):

Chih-Feng Yen ◽

He-Yen Hsieh ◽

Kuan-Wu Su ◽

Min-Chieh Yu ◽

Jenq-Shiou Leu

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Output Power ◽

Environmental Parameters ◽

Energy Market ◽

Support Vector ◽

Learning Models ◽

Power Prediction ◽

Machine Learning Models

Due to the variability and instability of photovoltaic (PV) output, accurate prediction of PV output power plays a major role in helping PV operators optimize their profits in the energy market. In order to predict PV output, environmental parameters such as temperature, humidity, rainfall and wind speed are gathered as indicators, and different machine learning models are built for each solar panel inverter. In this paper, we propose two different kinds of solar prediction schemes for one-hour-ahead forecasting of solar output using Support Vector Machine (SVM) and Random Forest (RF).

Vibration characteristic analyses of medium-and small-span girder bridge groups in highway systems based on machine learning models

Advances in Structural Engineering ◽

10.1177/1369433221997722 ◽

2021 ◽

pp. 136943322199772

Author(s):

Guanya Lu ◽

Kehai Wang ◽

Weizuo Guo

Keyword(s):

Machine Learning ◽

Random Forest ◽

Large Scale ◽

Vibration Characteristics ◽

Structural Vibration ◽

Learning Models ◽

Longitudinal Vibrations ◽

Neural Network Models ◽

Artificial Neural ◽

Machine Learning Models

There are large numbers of small- and medium-span girder bridges that bear structural similarity, while large-scale bridge structures are generally limited in the timely application of structural vibration characteristics. Therefore, in this study a framework based on machine learning models was proposed to analyze the vibration characteristics of bridge groups on specific lines. The probability distributions of the structural, geometric, and material properties of bridge groups on specific lines were obtained using statistical tools, a Latin hypercube sampling method was used to generate reasonable sample sets for the bridge groups, and parameterized finite element models of the bridges were established. Then, using the numerical simulation results, the optimal models for predicting the fundamental mode and period were tuned and selected by 10-fold cross-validation. The results showed that the random forest models divided the vibration modes of the bridge groups into the longitudinal vibrations of the main girders and the longitudinal vibrations of the adjacent spans and side piers with a classification accuracy of greater than 90%, while the artificial neural network models exhibited the lowest normalized mean square error for the periods. The periods mainly ranged between 0.7 and 1.5 s. Furthermore, the bearing settings, ratios of pier height to section diameter, and boundary types were determined to be the most significant properties influencing the fundamental modes and periods of the examined bridges, by observing, respectively, the reduction in the random forest Gini indices and the distribution of the generalized weights of the input variables in the artificial neural networks. This study provides an intelligent and efficient method for obtaining the vibration characteristics of bridge groups for a specific network.
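
The abstract above uses Latin hypercube sampling to generate bridge parameter sets. The sketch below shows the basic idea for uniformly distributed parameters; the uniform ranges, the parameter meanings and the function name are assumptions, whereas the study sampled from fitted probability distributions.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension has one point per stratum.

    bounds is a list of (low, high) pairs; each dimension is treated as
    uniform on its range, which is a simplification made for this sketch.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One random point inside each of n_samples equal-width strata of [0, 1).
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(points)
        columns.append([low + u * (high - low) for u in points])
    return [tuple(column[i] for column in columns) for i in range(n_samples)]

# Hypothetical ranges: span length in metres and pier height in metres.
for sample in latin_hypercube(5, [(20.0, 40.0), (5.0, 15.0)]):
    print(sample)
```

Compared with plain random sampling, every stratum of every parameter is hit exactly once, so a small sample set still covers the full range of each property.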

Machine learning-based analysis of adolescent gambling factors

Journal of Behavioral Addictions ◽

10.1556/2006.2020.00063 ◽

2020 ◽

Vol 9 (3) ◽

pp. 734-743

Author(s):

Wonju Seo ◽

Namho Kim ◽

Sang-Kyu Lee ◽

Sung-Min Park

Keyword(s):

Machine Learning ◽

Random Forest ◽

Problem Gambling ◽

Online Gambling ◽

Easy Access ◽

Support Vector ◽

Gambling Problems ◽

Learning Models ◽

The Past ◽

Machine Learning Models

Background and aims: Problem gambling among adolescents has recently attracted attention because of easy access to gambling in online environments and its serious effects on adolescent lives. We proposed a machine learning-based analysis method for predicting the degree of problem gambling. Methods: Of the 17,520 respondents in the 2018 National Survey on Youth Gambling Problems dataset (collected by the Korea Center on Gambling Problems), 5,045 students who had gambled in the past 3 months were included in this study. The Gambling Problem Severity Scale was used to provide the binary label information. After random forest-based feature selection, we trained four models: random forest (RF), support vector machine (SVM), extra trees (ETs), and ridge regression. Results: Online gambling behavior in the past 3 months, experience of winning money or goods, and gambling of personal relationship were the three factors exhibiting high feature importance. All four models demonstrated an area under the curve (AUC) of >0.7; ET showed the highest AUC (0.755), RF demonstrated the highest accuracy (71.8%), and SVM showed the highest F1 score (0.507) on a testing set. Discussion: The results indicate that machine learning models can convey meaningful information to support predictions regarding the degree of problem gambling. Conclusion: Machine learning models trained using important features showed moderate accuracy in a large-scale Korean adolescent dataset. These findings suggest that the method will help screen adolescents at risk of problem gambling. We believe that expandable machine learning-based approaches will become more powerful as more datasets are collected.
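
The results above are reported as AUC, accuracy and F1 score for a binary label. The following sketch shows how these three metrics relate to predictions; the function names, labels and scores are hypothetical and unrelated to the survey data.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return accuracy, f1

def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and model scores; predictions use a 0.5 threshold.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy_and_f1(y_true, y_pred), roc_auc(y_true, scores))
```

AUC depends only on how the scores rank the cases, while accuracy and F1 depend on the chosen threshold, which is one reason different models can lead on different metrics.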
