A Comparative Study of Different Machine Learning Algorithms in Predicting the Content of Ilmenite in Titanium Placer

2020 ◽  
Vol 10 (2) ◽  
pp. 635 ◽  
Author(s):  
Yingli LV ◽  
Qui-Thao Le ◽  
Hoang-Bac Bui ◽  
Xuan-Nam Bui ◽  
Hoang Nguyen ◽  
...  

In this study, the ilmenite content in beach placer sand was estimated using seven soft computing techniques, namely random forest (RF), artificial neural network (ANN), k-nearest neighbors (kNN), Cubist, support vector machine (SVM), stochastic gradient boosting (SGB), and classification and regression tree (CART). A total of 405 beach placer borehole samples were collected from the Southern Suoi Nhum deposit, Binh Thuan province, Vietnam, to test the feasibility of these techniques for estimating ilmenite content. Heavy mineral analysis indicated that the valuable minerals in the placer sand are zircon, ilmenite, leucoxene, rutile, anatase, and monazite. Five of these minerals, namely rutile, anatase, leucoxene, zircon, and monazite, were used as input variables to estimate ilmenite content with the above-mentioned models. Of the whole dataset, 325 samples were used to build the models and the remaining 80 samples were used to verify them. Root-mean-squared error (RMSE), the coefficient of determination (R²), a simple ranking method, and residual analysis were used as statistical criteria for assessing model performance. The numerical experiments revealed that soft computing techniques can estimate ilmenite content with high accuracy. The residual analysis also indicated that the SGB model was the most suitable for determining ilmenite content in the context of this research.
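The two accuracy criteria named above can be computed as in the following minimal Python sketch. It assumes R² is defined as 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation), and `rank_models` is a hypothetical rank-sum helper for illustration, not necessarily the ranking method the authors used.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(scores):
    """Hypothetical rank-sum scheme: rank models by RMSE (ascending) and by
    R2 (descending), add the two ranks, and sort by the total (lower is better).
    Ties are not given shared ranks in this sketch."""
    names = list(scores)
    by_rmse = sorted(names, key=lambda m: scores[m]["rmse"])
    by_r2 = sorted(names, key=lambda m: -scores[m]["r2"])
    totals = {m: (by_rmse.index(m) + 1) + (by_r2.index(m) + 1) for m in names}
    return sorted(names, key=lambda m: totals[m])
```

In practice the metrics would be computed on the 80 held-out verification samples for each model, and the resulting dictionary of scores passed to the ranking helper.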

Water ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3490
Author(s):  
Noor Hafsa ◽  
Sayeed Rushd ◽  
Mohammed Al-Yaari ◽  
Muhammad Rahman

Applications of machine learning algorithms (MLAs) to modeling the adsorption efficiencies of different heavy metals have been limited by the adsorbate–adsorbent pair and the selection of specific MLAs. In the current study, adsorption efficiencies of fourteen heavy metal–adsorbent (HM-AD) pairs were modeled with a variety of ML models, including support vector regression with polynomial and radial basis function kernels, random forest (RF), stochastic gradient boosting, and Bayesian additive regression trees (BART). The actual measurements from wet experiments were supplemented with synthetic data samples. A first batch of dry experiments was performed to model the removal efficiency of an individual HM with a specific AD. ML modeling was then applied to the whole dataset to develop a generalized model. Ten-fold cross-validation was used for model selection, while the comparative performance of the MLAs was evaluated with statistical metrics comprising Spearman’s rank correlation coefficient, the coefficient of determination (R²), mean absolute error, and root-mean-squared error. The regression tree methods, BART and RF, demonstrated the most robust and optimal performance, with 0.96 ≤ R² ≤ 0.99. The current study provides a generalized methodology for implementing ML to model the efficiency of not only a specific adsorption process but also a group of comparable processes involving multiple HM-AD pairs.
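Spearman's rank correlation and the mean absolute error, two of the metrics listed above, can be sketched in plain Python as follows. Tied values receive the average of their ranks, and Spearman's coefficient is taken as the Pearson correlation of those ranks; the function names are illustrative only.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```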


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Susan Idicula-Thomas ◽  
Ulka Gawde ◽  
Prabhat Jha

Background: Machine learning (ML) algorithms have been successfully employed to predict outcomes in clinical research. In this study, we explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS). Methods: From the MDS, 18,826 unique childhood deaths at ages 1–59 months during 2004–13 were selected to generate the prediction models; over 70% of these deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML algorithms, namely support vector machine (SVM), gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, and classification and regression tree, were used to build the CoD prediction models. Results: The SVM algorithm was the best performer, with a prediction accuracy of over 0.8. The highest accuracy was found for diarrhoeal diseases (accuracy = 0.97) and the lowest for meningitis/encephalitis (accuracy = 0.80). The top signs/symptoms for classifying each of these CoDs were also extracted. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis. Conclusions: Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve the classification of causes of death.


2020 ◽  
Vol 12 (19) ◽  
pp. 3265
Author(s):  
Rei Sonobe ◽  
Hiroto Yamashita ◽  
Harumi Mihara ◽  
Akio Morita ◽  
Takashi Ikka

Japanese horseradish (wasabi) grows under very specific conditions, and recent climate changes have damaged wasabi production. In addition, the optimal culture methods are not well known, and it is becoming increasingly difficult for novice farmers to cultivate it. Chlorophyll a, chlorophyll b, and carotenoid contents, as well as their allocation, could be adequate indicators for evaluating production and environmental stress; thus, an in situ method for monitoring photosynthetic pigments based on reflectance could be useful for agricultural management. Besides original reflectance (OR), five pre-processing techniques, namely first derivative reflectance (FDR), continuum-removed (CR), de-trending (DT), multiplicative scatter correction (MSC), and standard normal variate transformation (SNV), were compared to assess estimation accuracy. Furthermore, five machine learning algorithms, namely random forest (RF), support vector machine (SVM), kernel-based extreme learning machine (KELM), Cubist, and stochastic gradient boosting (SGB), were considered. To classify the samples under different pH or sulphur ion concentration conditions, the end of the red edge bands was effective for OR, FDR, DT, MSC, and SNV, while a green-peak band was effective for CR. Overall, KELM and Cubist showed high performance, and incorporating pre-processing techniques was effective for obtaining highly accurate estimates. The best combinations were DT–KELM for chl a (RPD = 1.511–5.17, RMSE = 1.23–3.62 μg cm−2) and chl a:b (RPD = 0.73–3.17, RMSE = 0.13–0.60); CR–KELM for chl b (RPD = 1.92–5.06, RMSE = 0.41–1.03 μg cm−2) and chl a:car (RPD = 1.31–3.23, RMSE = 0.26–0.50); SNV–Cubist for car (RPD = 1.63–3.32, RMSE = 0.31–1.89 μg cm−2); and DT–Cubist for chl:car (RPD = 1.53–3.96, RMSE = 0.27–0.74).
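As a rough illustration of the pre-processing and accuracy measures mentioned above, the sketch below implements SNV, a finite-difference approximation of first derivative reflectance, and RPD (standard deviation of the observed values divided by RMSE). The use of the sample standard deviation and of forward differences are assumptions; the study's exact formulations may differ.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own (sample) standard deviation."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(r - mu) / sd for r in spectrum]


def first_derivative(spectrum, wavelengths):
    """First derivative reflectance approximated by forward differences;
    the output has one fewer value than the input spectrum."""
    return [
        (spectrum[i + 1] - spectrum[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(spectrum) - 1)
    ]


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of observed values / RMSE."""
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    return statistics.stdev(y_true) / rmse
```

Because SNV normalises each spectrum independently, it is applied row by row before the spectra are passed to a regression model.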


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e2849 ◽  
Author(s):  
Chunrong Mi ◽  
Falk Huettmann ◽  
Yumin Guo ◽  
Xuesong Han ◽  
Lijia Wen

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree), and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and the true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets against which model predictions were confronted. We found that Random Forest demonstrated the best performance under most assessment methods, provided a better model fit to the testing data, and produced better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is increasing, it remains widely underused in conservation, (spatial) ecological applications, and inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
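The ensemble forecast and the TSS evaluation described above can be sketched as follows. The sketch assumes each model outputs a presence probability for the same set of sites and that presence/absence test labels are available (the study itself relies on presence records, so how absences or pseudo-absences were obtained is not shown here). The 0.5 threshold is an assumed default, not a value from the study.

```python
def ensemble_average(prob_lists):
    """Average the presence probabilities of several models, site by site.
    All inner lists must be aligned to the same sites."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


def true_skill_statistic(observed, probs, threshold=0.5):
    """TSS = sensitivity + specificity - 1, computed on the binary
    presence/absence map obtained by thresholding the probabilities."""
    tp = fn = tn = fp = 0
    for obs, p in zip(observed, probs):
        pred = p >= threshold
        if obs and pred:
            tp += 1
        elif obs:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1
```

A TSS of 0 corresponds to no skill beyond chance and 1 to perfect agreement, which is one reason it is often reported alongside AUC.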


2016 ◽  
Author(s):  
Chunrong Mi ◽  
Falk Huettmann ◽  
Yumin Guo ◽  
Xuesong Han ◽  
Lijia Wen

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution, and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree), and Maxent (Maximum Entropy Models). Besides, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and the true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets against which model predictions were confronted. We found that Random Forest demonstrated the best performance under most assessment methods, provided a better model fit to the testing data, and produced better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and, by now, is known to perform extremely well in ecological predictions. However, although its use is increasing, it remains widely underused in conservation, (spatial) ecological applications, and inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and it allows robust and rapid assessments and decisions for efficient conservation.


Resources ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 156 ◽  
Author(s):  
Oluwaseun Oyebode ◽  
Desmond Eseoghene Ighravwe

Previous studies have shown that soft computing models are excellent predictive models for demand management problems. However, their applications in solving water demand forecasting problems have been scantily reported. In this study, feedforward artificial neural networks (ANNs) and a support vector machine (SVM) were used to forecast water consumption. Two ANN models were trained using different algorithms: differential evolution (DE) and conjugate gradient (CG). The performance of these soft computing models was investigated with real-world data sets from the City of Ekurhuleni, South Africa, and compared with conventionally used exponential smoothing (ES) and multiple linear regression (MLR). The results obtained showed that the ANN model that was trained with DE performed better than the CG-trained ANN and other predictive models (SVM, ES and MLR). This observation further demonstrates the robustness of evolutionary computation techniques amongst soft computing techniques.
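Exponential smoothing, one of the conventional baselines mentioned above, can be illustrated with a minimal simple exponential smoothing sketch. The smoothing constant `alpha` and the initialisation of the level to the first observation are assumptions; the study may have used a different ES variant (for example, one with trend or seasonal terms).

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing with one-step-ahead forecasts.

    forecasts[t] is the forecast for series[t] made from data up to t-1.
    The initial level is set to the first observation (an assumption).
    Returns (forecasts, next_forecast).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    forecasts = []
    for x in series:
        forecasts.append(level)
        # New level is a weighted blend of the latest observation and old level.
        level = alpha * x + (1 - alpha) * level
    return forecasts, level  # level is the forecast for the next, unseen period
```

Larger values of `alpha` make the forecast react faster to recent consumption changes, at the cost of passing more noise through.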


Weather forecasting and warning is the application of science and technology to predict the state of the weather at a given location for a future time. Adverse weather has endangered the lives of the general public in previous years, and unpredicted floods and super cyclones have caused havoc in many places. Government and private agencies are studying weather behaviour, but forecasting remains challenging and incomplete. However, the application of soft computing techniques to weather prediction has recently shown significant performance. This research presents a comparative study of soft computing techniques, namely the Multilayer Perceptron (MLP), Support Vector Machine (SVM), and J48 Decision Tree, for forecasting the weather of Delhi using ten years of data comprising temperature, dew, humidity, air pressure, wind speed, and visibility. The paper compares these models, together with a proposed model based on a new algorithm, using four error measures: Relative Absolute Error (RAE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Root Relative Squared Error (RRSE). Performance could be further enhanced by applying text mining to the proposed model.
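The four error measures named above can be computed as in the sketch below. It follows the common definitions in which RAE and RRSE normalise the model's errors by those of a naive predictor that always outputs the mean of the actual values. Here that mean is taken from the evaluated values themselves, which is an assumption (some tools use the training-set mean instead), and RAE/RRSE are returned as fractions rather than percentages.

```python
import math


def error_measures(actual, predicted):
    """Return MAE, RMSE, RAE and RRSE for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the naive "always predict the mean" baseline.
    abs_dev = sum(abs(a - mean_a) for a in actual)
    sq_dev = sum((a - mean_a) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,
        "RRSE": math.sqrt(sq_err / sq_dev),
    }
```

Values of RAE or RRSE below 1 indicate that a model beats the mean-value baseline.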


2022 ◽  
Vol 4 ◽  
Author(s):  
Matthew D. Stocker ◽  
Yakov A. Pachepsky ◽  
Robert L. Hill

The microbial quality of irrigation water is an important issue, as the use of contaminated waters has been linked to several foodborne outbreaks. To expedite microbial water quality determinations, many researchers estimate concentrations of the microbial contamination indicator Escherichia coli (E. coli) from the concentrations of physicochemical water quality parameters. However, these relationships are often non-linear and change above or below certain threshold values. Machine learning (ML) algorithms have been shown to make accurate predictions in datasets with complex relationships. The purpose of this work was to evaluate several ML models for predicting E. coli in agricultural pond waters. Two ponds in Maryland were monitored from 2016 to 2018 during the irrigation season. E. coli concentrations, along with 12 other water quality parameters, were measured in water samples. The resulting datasets were used to predict E. coli using stochastic gradient boosting (SGB) machines, random forest (RF), support vector machines (SVM), and k-nearest neighbor (kNN) algorithms. The RF model provided the lowest RMSE for predicted E. coli concentrations in both ponds, in individual years and over consecutive years, in almost all cases. For individual years, the RMSE of the predicted E. coli concentrations (log10 CFU 100 ml−1) ranged from 0.244 to 0.346 for Pond 1 and from 0.304 to 0.418 for Pond 2. For the 3-year datasets, these values were 0.334 and 0.381 for Ponds 1 and 2, respectively. In most cases, there was no significant difference (P > 0.05) between the RMSE of RF and that of the other ML models when these RMSE values were treated as statistics derived from 10-fold cross-validation performed with five repeats. Important E. coli predictors were turbidity, dissolved organic matter content, specific conductance, chlorophyll concentration, and temperature. Model predictive performance did not differ significantly when 5 predictors were used instead of 8 or 12, indicating that more tedious and costly measurements provide no substantial improvement in the predictive accuracy of the evaluated algorithms.
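The repeated cross-validation scheme described above (10 folds, five repeats) can be sketched as an index generator. The round-robin fold assignment and the fixed `seed` are illustrative choices, and `repeated_kfold_indices` is a hypothetical helper, not part of the study's code.

```python
import random


def repeated_kfold_indices(n_samples, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample order and deals the shuffled indices
    round-robin into k folds, so fold sizes differ by at most one.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms fold f
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield r, f, train, test
```

Fitting a model on each `train` split and scoring it on the matching `test` split yields k × repeats RMSE values per model (50 here), which is the kind of sample the study treats as statistics when comparing models.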


Author(s):  
Cheng-Chien Lai ◽  
Wei-Hsin Huang ◽  
Betty Chia-Chen Chang ◽  
Lee-Ching Hwang

Predictors of success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study was to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data were acquired from patients who underwent a smoking cessation program at a medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models based on artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the patients' final smoking status over a six-month period. Sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine model performance. We adopted the ANN model, which achieved slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for predicting smoking cessation outcome. The resulting predictive model could help provide a predicted success rate for every smoker and has the potential to support personalized, precision medicine in smoking cessation treatment.
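The evaluation measures listed above can be sketched in plain Python as follows, assuming that a label of 1 denotes successful cessation and that a decision threshold of 0.5 is used (both assumptions). AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), which equals the area under the ROC curve; the pairwise loop is quadratic and meant only for illustration.

```python
def binary_metrics(y_true, scores, threshold=0.5):
    """Return (sensitivity, specificity, accuracy, auc) for binary labels
    (1 = positive) and model scores such as predicted probabilities."""
    pred = [s >= threshold for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, pred) if not t and not p)
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    sensitivity = tp / len(pos)
    specificity = tn / len(neg)
    accuracy = (tp + tn) / len(y_true)
    # Mann-Whitney formulation of AUC: compare every positive with every negative.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    return sensitivity, specificity, accuracy, auc
```

Unlike the first three measures, AUC does not depend on the chosen threshold, which is why it is often used to compare models whose operating points differ.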

