Estimation of Soil Cohesion Using Machine Learning Method: A Random Forest Approach

Soil cohesion (C) is a critical soil property that is closely related to basic soil properties such as particle size distribution, pore size, and shear strength. It is mainly determined by experimental methods, which are often time-consuming and costly. Developing an alternative approach based on machine learning (ML) techniques is therefore highly recommended. In this study, three machine learning models, namely support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF), were built on a data set of 145 soil samples collected from the Da Nang-Quang Ngai expressway project, Vietnam. The database includes six input parameters: clay content, moisture content, liquid limit, plastic limit, specific gravity, and void ratio. The performance of the models was assessed by three statistical criteria: the correlation coefficient (R), mean absolute error (MAE), and root mean square error (RMSE). The results demonstrated that the proposed RF model predicts soil cohesion with high accuracy (R = 0.891) and low error (RMSE = 3.323 and MAE = 2.511), and that its predictive capability is better than that of SVM and GPR. Therefore, the RF model can be used as a cost-effective approach for predicting the soil cohesion used in the design and inspection of constructions.
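
As a rough illustration of the three criteria named in this abstract (R, MAE, and RMSE), the following minimal sketch computes them with the Python standard library. The function names and the sample values are hypothetical and are not taken from the paper; R is assumed here to be the Pearson correlation coefficient.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical measured and predicted values, for illustration only.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

print("MAE :", mae(measured, predicted))
print("RMSE:", rmse(measured, predicted))
print("R   :", round(pearson_r(measured, predicted), 4))
```

A higher R together with lower MAE and RMSE indicates a better fit, which is the sense in which the abstract ranks RF above SVM and GPR.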

Using Machine Learning-Based Algorithms to Analyze Erosion Rates of a Watershed in Northern Taiwan

Sustainability ◽

10.3390/su12052022 ◽

2020 ◽

Vol 12 (5) ◽

pp. 2022 ◽

Cited By ~ 1

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Random Forest ◽

Fuzzy Inference ◽

Absolute Error ◽

Organic Content ◽

Rank Test ◽

Support Vector ◽

Inference System ◽

Erosion Rates ◽

Northern Taiwan

This study continues a previous study with further analysis of watershed-scale erosion pin measurements. Three machine learning (ML) algorithms—Support Vector Machine (SVM), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Artificial Neural Network (ANN)—were used to analyze depth of erosion of a watershed (Shihmen reservoir) in northern Taiwan. In addition to three previously used statistical indexes (Mean Absolute Error, Root Mean Square of Error, and R-squared), Nash–Sutcliffe Efficiency (NSE) was calculated to compare the predictive performances of the three models. To see if there was a statistical difference between the three models, the Wilcoxon signed-rank test was used. The research utilized 14 environmental attributes as the input predictors of the ML algorithms. They are distance to river, distance to road, type of slope, sub-watershed, slope direction, elevation, slope class, rainfall, epoch, lithology, and the amount of organic content, clay, sand, and silt in the soil. Additionally, measurements of a total of 550 erosion pins installed on 55 slopes were used as the target variable of the model prediction. The dataset was divided into a training set (70%) and a testing set (30%) using the stratified random sampling with sub-watershed as the stratification variable. The results showed that the ANFIS model outperforms the other two algorithms in predicting the erosion rates of the study area. The average RMSE of the test data is 2.05 mm/yr for ANFIS, compared to 2.36 mm/yr and 2.61 mm/yr for ANN and SVM, respectively. Finally, the results of this study (ANN, ANFIS, and SVM) were compared with the previous study (Random Forest, Decision Tree, and multiple regression). It was found that Random Forest remains the best predictive model, and ANFIS is the second-best among the six ML algorithms.
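
The Nash–Sutcliffe Efficiency mentioned above is commonly defined as one minus the ratio of the squared prediction error to the variance of the observations around their mean. The sketch below assumes that standard definition; the function name and the sample erosion depths are hypothetical.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 means the model is no better than
    always predicting the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread

# Hypothetical erosion depths (mm/yr), for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [3.0, 4.0, 5.0, 8.0]

print(nash_sutcliffe_efficiency(observed, simulated))
```

Unlike RMSE, NSE is dimensionless, which makes it easier to compare models across data sets with different scales.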

Waste Management System Fraud Detection Using Machine Learning Algorithms to Minimize Penalties Avoidance and Redemption Abuse

Recycling ◽

10.3390/recycling6040065 ◽

2021 ◽

Vol 6 (4) ◽

pp. 65

Author(s):

Ali Hewiagh ◽

Kannan Ramakrishnan ◽

Timothy Tzen Vun Yap ◽

Ching Seong Tan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Waste Management ◽

Management System ◽

Learning Algorithms ◽

Fraud Detection ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Waste Management System

Online fraud has pernicious impacts on different system domains, including waste management systems. Fraudsters illegally obtain rewards for recycling activities, or avoid the penalties imposed on those who are required to recycle their own waste. Although some approaches have been introduced to prevent such fraudulent activities, fraudsters continuously seek new ways to commit illegal actions. Machine learning has shown significant results in identifying new online fraud patterns in domains such as e-commerce, insurance, and banking. The purpose of this paper, therefore, is to analyze a waste management system and develop a machine learning model to detect fraud in the system. The intended system allows consumers, individuals, and organizations to track, monitor, and update their performance in their recycling activities. The data set, provided by a waste management organization, is used for the analysis and the model training. It contains transactions of users’ recycling activities and behaviors. Three machine learning algorithms (random forest, support vector machine, and multi-layer perceptron) are used in the experiments, and the best detection model is selected based on performance. Results show that each of these algorithms can be used for fraud detection in waste management systems with high accuracy. The random forest algorithm produces the best model, with an accuracy of 96.33%, an F1-score of 95.20%, and an ROC of 98.92%.

A Machine Learning Approach to Determine Maturity Stages of Tomatoes

Oriental journal of computer science and technology ◽

10.13005/ojcst/10.03.19 ◽

2017 ◽

Vol 10 (3) ◽

pp. 683-690 ◽

Cited By ~ 2

Author(s):

Kamalpreet Kaur ◽

O.P. Guptata

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Fruits And Vegetables ◽

Support Vector ◽

Large Set ◽

Accuracy Score ◽

Data Set ◽

Maturity Stages ◽

Total Data

Maturity checking has become mandatory for the food industry as well as for farmers, to ensure that fruits and vegetables are ripe and not diseased. However, manual inspection is prone to human error, and unripe fruits and vegetables may decrease production [3]. This study therefore proposes a tomato classification system that determines the maturity stages of tomatoes through machine learning, training several algorithms: Decision Tree, Logistic Regression, Gradient Boosting, Random Forest, Support Vector Machine, K-NN, and XGBoost. The system consists of image collection, feature extraction, and training of the classifiers on 80% of the total data. The remaining 20% of the data is used for testing. The results indicate that classifier performance depends on the size and kind of features extracted from the data set. The results are reported in the form of a learning curve, a confusion matrix, and an accuracy score. Of the seven classifiers, Random Forest is the most successful, with 92.49% accuracy, owing to its capability of handling large data sets. Support Vector Machine showed the lowest accuracy due to its inability to train on a large data set.

Amino Acid Composition and Charge Based Prediction of Antisepsis Peptides by Random Forest Machine Learning Algorithm

10.1101/2021.09.26.461860 ◽

2021 ◽

Author(s):

Aayushi Rathore ◽

Anu Saini ◽

Navjot Kaur ◽

Aparna Singh ◽

Ojasvi Dutta ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Learning Algorithms ◽

Multiple Organ ◽

The Body ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Organ Systems

Abstract: Sepsis is a severe infectious disease with high mortality. It occurs when chemicals released into the bloodstream to fight an infection trigger inflammation throughout the body, which can cause a cascade of changes that damage multiple organ systems, leading them to fail and even resulting in death. Antiseptics are used to reduce the possibility of sepsis or infection, and the process is known as antisepsis. Antiseptic peptides (ASPs) show properties similar to anti-Gram-negative peptides, anti-Gram-positive peptides, and many more. Machine learning algorithms are useful in the screening and identification of therapeutic peptides, and thus provide initial filters or build confidence before time-consuming and laborious experimental approaches are used. In this study, several machine learning algorithms, including Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbour (KNN), and Logistic Regression (LR), were evaluated for the prediction of ASPs. The characteristic physicochemical features of ASPs were also explored for use in machine learning. Both manual and automatic feature selection methodologies were employed to achieve the best performance of the machine learning algorithms. Five-fold cross-validation and validation on an independent data set showed RF to be the best model for the prediction of ASPs. Our RF model showed an accuracy of 97% and a Matthews Correlation Coefficient (MCC) of 0.93, which indicate a robust and good model. To our knowledge, this is the first attempt to build a machine learning classifier for the prediction of ASPs.
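
For readers unfamiliar with the Matthews Correlation Coefficient reported above, the following sketch computes it from the four cells of a binary confusion matrix using the standard formula. The function name and the counts are hypothetical and do not reproduce the paper's data.

```python
import math

def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 45, 45, 5, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("accuracy:", accuracy)
print("MCC     :", matthews_corrcoef_from_counts(tp, tn, fp, fn))
```

MCC uses all four cells of the confusion matrix, so it stays informative on imbalanced data sets where accuracy alone can look misleadingly high.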

Malaria Prediction Model Using Machine Learning Algorithms

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i10.5655 ◽

2021 ◽

Vol 12 (10) ◽

pp. 7488-7496

Author(s):

Yusuf Aliyu Adamu et al.

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Model ◽

Public Awareness ◽

Health Sector ◽

Weather Condition ◽

Machine Learning Algorithms ◽

Support Vector ◽

African Countries ◽

Data Set

Measures have been taken to protect individuals from the burden of vector-borne disease, yet it remains a greater cause of death than any other disease in Africa. Many human lives are lost, particularly those of children under five, regardless of the efforts made. The effect of malaria is most challenging in developing countries. In 2019, 51% of malaria fatalities occurred in Africa, a figure that increased by 20% in 2020 due to the COVID-19 pandemic. The majority of African countries lack a sound health care system and proper environmental settlement, and face economic hardship, limited funding in the health sector, and an absence of good policies to ensure the safety of individuals. Information on the effects of malaria has to be made available to people through public awareness programs, so that people become acquainted with the disease and certain measures can be maintained. A prediction model can help policymakers learn more about the expected time of malaria occurrence based on existing features, so that people receive information about the disease on time and so that health equipment and medication are made available by the government through its policies. In this research, weather conditions, non-climatic features, and malaria cases are considered in designing the prediction model. The performance of six machine learning classifiers (Support Vector Machine, K-Nearest Neighbour, Random Forest, Decision Tree, Logistic Regression, and Naïve Bayes) is evaluated, and Random Forest is found to be the best, with an accuracy of 97.72%, an AUC of 98%, and a precision of 100% on the data set used in the analysis.

Estimation of Precipitation Area Using S-Band Dual-Polarization Radar Measurements

Remote Sensing ◽

10.3390/rs13112039 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2039

Author(s):

Joon Jin Song ◽

Melissa Innerst ◽

Kyuhee Shin ◽

Bo-Young Ye ◽

Minho Kim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Support Vector ◽

Classification Methods ◽

Dual Polarization ◽

Data Set ◽

Spatial Classification ◽

Precipitation Area ◽

Radar Measurements

Estimating precipitation area is important for weather forecasting as well as for real-time applications. This paper aims to develop an analytical framework for efficient precipitation area estimation using S-band dual-polarization radar measurements. Several factors, such as types of sensors, thresholds, and models, are considered and compared to form a data set. After building the appropriate data set, this paper provides a rigorous comparison of statistical classification methods (logistic regression and linear discriminant analysis) and machine learning classification methods (decision tree, support vector machine, and random forest). To achieve better performance, spatial classification, which incorporates the latitude and longitude of the observation location, is considered and compared with non-spatial classification. The data used in this study were collected by a rain detector and a present weather sensor in a network of automated weather systems (AWS), and by an S-band dual-polarimetric weather radar, during ten rainfall events of varying lengths. The mean squared prediction error (MSPE) from leave-one-out cross validation (LOOCV) is computed to assess the performance of the methods. Of the methods, the decision tree and random forest result in the lowest MSPE, and spatial classification outperforms non-spatial classification. In particular, machine-learning-based spatial classification methods accurately estimate the precipitation area in the northern areas of the study region.
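
The evaluation scheme described above (MSPE from LOOCV) can be sketched as follows. To keep the example self-contained, a 1-nearest-neighbour classifier stands in for the paper's models, which is a deliberate substitution and not the authors' method. The feature tuples, labels, and function names are hypothetical.

```python
import math

def nearest_neighbor_label(train_x, train_y, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]

def loocv_mspe(features, labels):
    """Leave-one-out CV: hold out each sample once, predict it from the rest,
    and average the squared prediction errors (0/1 labels)."""
    squared_errors = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predicted = nearest_neighbor_label(train_x, train_y, features[i])
        squared_errors.append((labels[i] - predicted) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Hypothetical (feature_1, feature_2) pairs; 1 = precipitation, 0 = none.
features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = [0, 0, 0, 1, 1, 1]

print("LOOCV MSPE:", loocv_mspe(features, labels))
```

With binary labels and hard 0/1 predictions, the MSPE equals the misclassification rate. A spatial variant in the spirit of the abstract would simply append latitude and longitude to each feature tuple.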

Predicting Future Products Rate using Machine Learning Algorithms

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2020.05.04 ◽

2020 ◽

Vol 12 (5) ◽

pp. 41-51

Author(s):

Shaimaa Mahmoud ◽

Mahmoud Hussein ◽

Arabi Keshk

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Regression ◽

Data Set ◽

Squared Error

Opinion mining in social network data is considered one of the most important research areas because a large number of users interact with different topics on these networks. This paper discusses the problem of predicting future product ratings according to users’ comments. Researchers have approached this problem using machine learning algorithms (e.g., Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, and Decision Tree). However, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product ratings using LR, RFR, and SVR. Our data set consists of tweets and their ratings on a scale from 1 to 5. The main goal of our approach is to improve prediction accuracy over existing techniques. SVR predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986, and Random Forest Regression with an MSE of 0.4770. These results are better than those of the existing approaches.

Machine Learning Application for Classification Prediction of Household’s Welfare Status

Journal on Information Technology and Computer Engineering ◽

10.25077/jitce.4.02.72-82.2020 ◽

2020 ◽

Vol 4 (02) ◽

pp. 72-82

Author(s):

Nofriani Nofriani

Keyword(s):

Machine Learning ◽

Random Forest ◽

Social Welfare ◽

Nearest Neighbor ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

The Government ◽

Classification Prediction

Various approaches have been attempted by the Government of Indonesia to eradicate poverty throughout the country, one of which is the equitable distribution of social assistance to target households according to their social welfare status classification. This research aims to re-evaluate a prior evaluation of five well-known machine learning techniques (Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and the C4.5 algorithm) on how well they predict social welfare status classifications. Afterwards, the best-performing technique is implemented in an executable machine learning application that can predict the user’s social welfare status. Other objectives are to analyze the reliability of the chosen algorithm in predicting new data, and to generate a simple classification-prediction application. This research uses the Python programming language, the Scikit-Learn library, Jupyter Notebook, and PyInstaller to perform all steps of the methodology. The results show that Random Forest is the best machine learning technique for predicting a household’s social welfare status, with a classification accuracy of 74.20%, and that the resulting application based on it correctly predicted the user’s social welfare status for 60.00% of 40 entries.

Heart Disease Prediction and Performance Assessment through Attribute Element Diminution using Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1597.0881119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 604-609

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Heart Disease ◽

Random Forest ◽

Heart Diseases ◽

Support Vector ◽

Disease Prediction ◽

Data Set ◽

Raw Data ◽

Reduced Data

In today’s world, people are affected by heart disease irrespective of age. Despite technological advances, predicting the occurrence of heart disease remains a challenging issue, and the difficulty persists because symptoms are often not available. According to the World Health Organization, 33% of the population died due to heart diseases. The diagnosis of heart disease is therefore made from a complex combination of clinical data. With this overview, we used the Heart Disease Prediction dataset from the UCI Machine Learning Repository to predict the level of heart disease. The prediction of heart disease classes is carried out in four steps. First, the data set is preprocessed with feature scaling and missing-value handling. Second, the raw data set is fitted to classifiers: logistic regression, KNN, Support Vector Machine, Kernel Support Vector Machine, Naive Bayes, Random Forest, and Decision Tree. Third, the raw data set is subjected to dimensionality reduction using Principal Component Analysis (PCA) to project the data onto its important components, and the PCA-reduced data set is fitted to the above-mentioned classifiers. Fourth, the performance on the raw data set and the PCA-reduced data set is compared using metrics such as Precision, Recall, Accuracy, and F-score. The implementation is done in Python under the Spyder platform with Anaconda Navigator. Experimental results show that Random Forest is effective, with an accuracy of 89% without PCA, 85% with five-component PCA, and 86% with seven-component PCA.

Machine learning based prognostic model for predicting infection susceptibility of COVID-19 using health care data

10.21203/rs.3.rs-46681/v1 ◽

2020 ◽

Author(s):

R Srivat ◽

Prithviraj N Indi ◽

Swapnil Agrahari ◽

Siddharth Menon ◽

S. Denis Ashok

Keyword(s):

Machine Learning ◽

Health Care ◽

Random Forest ◽

Support Vector Regression ◽

Prognostic Model ◽

Principal Component ◽

Random Forest Classifier ◽

Support Vector ◽

Data Set ◽

Health Care Data

Abstract: From the public health perspective of the COVID-19 pandemic, accurate estimates of the infection severity of individuals are extremely valuable for informed decision making and a targeted response to an emerging pandemic. This paper presents a machine learning based prognostic model that provides early warning to individuals about COVID-19 infection using a health care data set. In the present work, a prognostic model using a Random Forest classifier and support vector regression is developed for predicting susceptibility to COVID-19 infection, and it is applied to an open health care data set containing 27 field values. The typical fields of the health care data set include basic personal details such as age, gender, number of children in the household, and marital status, along with medical data such as coma score, pulmonary score, blood glucose level, and HDL cholesterol. Preprocessing is carried out to handle the numerical values, categorical (non-numerical) values, and missing data in the health care data set. Principal component analysis is applied for dimensionality reduction of the health care data set. From the classification results, it is noted that the Random Forest classifier provides a higher accuracy than support vector regression for the given health data set. The proposed machine learning approach can help individuals take additional precautions to protect against COVID-19 infection. Based on the results of the proposed method, clinicians and government officials can focus on highly susceptible people to limit the pandemic spread. Methods: In the present work, Random Forest classifier and support vector regression techniques are applied to a medical health care data set containing 27 variables to predict the susceptibility score of an individual towards COVID-19 infection, and the accuracy of prediction is compared. Preprocessing is carried out to handle the missing data in the health care data set. Principal Component Analysis is carried out on the data set for dimensionality reduction of the feature vectors. Results: From the classification results, it is noted that the Random Forest classifier provides an accuracy of 90%, a sensitivity of 94%, and a specificity of 81% for the given medical data set. Conclusion: The proposed machine learning approach can help individuals take additional precautions to protect against COVID-19 infection, and clinicians and government officials can focus on highly susceptible people to limit the pandemic spread.
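
Sensitivity and specificity, as reported in the Results above, are the true-positive rate and true-negative rate of a binary classifier. The sketch below derives them from paired label lists under that standard definition. The labels and function names are hypothetical and unrelated to the paper's data set.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of actual negatives that were predicted negative."""
    return tn / (tn + fp)

# Hypothetical labels: 1 = susceptible, 0 = not susceptible.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy   :", (tp + tn) / len(y_true))
```

In a screening setting like the one described, high sensitivity means few susceptible individuals are missed, while specificity limits false alarms.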
