adaptive regression splines
Recently Published Documents


Total documents: 373 (five years: 113)
H-index: 39 (five years: 9)

2022
Author(s): Alberto Celma, Richard Bade, Juan V. Sancho, Félix Hernández, Melissa Humpries, ...

Abstract Ultra-high performance liquid chromatography coupled to ion mobility separation and high-resolution mass spectrometry has proven very valuable for screening emerging contaminants in the aquatic environment. However, in suspect or non-target approaches (i.e. when no reference standards are available), no retention time (RT) or collision cross section (CCS) values are available to support identification. In-silico prediction of RT and CCS can therefore help reduce the number of candidates to investigate. In this work, Multiple Adaptive Regression Splines (MARS) was evaluated for the prediction of both RT and CCS. MARS prediction models were developed and validated using a database of 477 protonated molecules, 169 deprotonated molecules and 249 sodium adducts. Multivariate and univariate models were evaluated, and the univariate models fitted the empirical data better. The RT model (R² = 0.855) showed a deviation between predicted and empirical data of ± 2.32 min (95% confidence interval). The deviation observed for CCS data of protonated molecules using the CCSH model (R² = 0.966) was ± 4.05% (95% confidence interval). The CCSH model was also tested on deprotonated molecules, giving deviations below ± 5.86% for 95% of the cases. Finally, a third model was developed for sodium adducts (CCSNa, R² = 0.954), with deviations below ± 5.25% for 95% of the cases. The developed models have been incorporated into an open-access, user-friendly online platform, allowing third-party research laboratories to predict both RT and CCS data.
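As background for the univariate models mentioned above, a MARS model is a weighted sum of hinge functions of the form max(0, x - t) or max(0, t - x), where t is a knot. The sketch below only evaluates such a model. The knots, coefficients and function names are hypothetical and are not taken from the paper, and the fitting procedure (forward selection and backward pruning of terms) is not shown.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis function: max(0, x - knot) for direction +1,
    max(0, knot - x) for direction -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a univariate MARS-style model.

    terms is a list of (coefficient, knot, direction) tuples; the
    prediction is intercept + sum(coefficient * hinge(x, knot, direction)).
    """
    return intercept + sum(
        coef * hinge(x, knot, direction) for coef, knot, direction in terms
    )


# Hypothetical model with two hinge terms (illustrative values only).
example_terms = [(2.0, 1.0, +1), (-0.5, 3.0, -1)]
example_prediction = mars_predict(2.0, 1.0, example_terms)
print(example_prediction)
```

Because each hinge is zero on one side of its knot, the fitted curve is piecewise linear, which is what lets MARS capture changes of slope without a global polynomial.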


2022, Vol 14 (2), pp. 798
Author(s): Snezhana Gocheva-Ilieva, Atanas Ivanov, Maya Stoimenova-Minova

A novel framework for stacked regression based on machine learning was developed to predict the daily average concentrations of particulate matter (PM10), one of Bulgaria’s primary health concerns. The measurements of nine meteorological parameters were introduced as independent variables. The goal was to carefully study a limited number of initial predictors and extract stochastic information from them to build an extended data set that allowed the creation of highly efficient predictive models. Four base models using random forest, CART ensemble and bagging, and their rotation variants, were built and evaluated. The heterogeneity of these base models was achieved by introducing five types of diversity, including a new simplified selective ensemble algorithm. The predictions from the four base models were then used as predictors in multivariate adaptive regression splines (MARS) models. All models were statistically tested using out-of-bag estimates or 5-fold and 10-fold cross-validation. In addition, a variable importance analysis was conducted. The proposed framework was used for short-term forecasting of out-of-sample data for seven days. The stacked models outperformed all single base models. An index of agreement IA = 0.986 and a coefficient of determination of about 95% were achieved.
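The abstract does not define the index of agreement. The sketch below assumes it is Willmott's index, IA = 1 - Σ(o - p)² / Σ(|p - ō| + |o - ō|)², where o are observations, p are predictions and ō is the mean observation. The function name and the sample values are illustrative only.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition, see lead-in).

    Returns 1.0 for a perfect match. The denominator is zero only when
    every observation and every prediction equals the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - squared_error / potential_error


# Illustrative values, not data from the paper.
print(index_of_agreement([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(index_of_agreement([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Values close to 1, such as the reported 0.986, indicate that predictions track both the level and the variability of the observations.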


2022, Vol 2022, pp. 1-9
Author(s): Eman H. Alkhammash, Abdelmonaim Fakhry Kamel, Saud M. Al-Fattah, Ahmed M. Elshewey

This paper presents an optimized model combining linear regression with multivariate adaptive regression splines (LR-MARS) for predicting crude oil demand in Saudi Arabia, based on the social spider optimization (SSO) algorithm. The SSO algorithm is applied to optimize LR-MARS performance by fine-tuning its hyperparameters. The proposed prediction model was trained and tested using historical oil data gathered from different sources. The results suggest that the demand for crude oil in Saudi Arabia will continue to increase during the forecast period (1980–2015). Several accuracy metrics, including Mean Absolute Error (MAE), Median Absolute Error (MedAE), Mean Square Error (MSE), Root Mean Square Error (RMSE), and the coefficient of determination (R²), were used to examine and verify the predictive performance of the various models. Analysis of variance (ANOVA) was also applied to assess the predicted crude oil demand in Saudi Arabia and to compare the actual test data with the predicted results across the different models. The experimental results show that the optimized LR-MARS model performs better than the other models in predicting crude oil demand.
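The five accuracy metrics listed above can be computed directly from paired actual and predicted values. The following is a minimal sketch using their standard definitions. The function name and sample numbers are illustrative and do not reproduce the paper's data or code.

```python
import math
import statistics


def regression_metrics(actual, predicted):
    """Return MAE, MedAE, MSE, RMSE and R² for paired observations."""
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    mse = sum(e * e for e in errors) / len(errors)
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "MAE": sum(abs_errors) / len(abs_errors),
        "MedAE": statistics.median(abs_errors),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Illustrative values only.
metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
print(metrics)
```

MedAE is far less sensitive to a single large miss than MAE or RMSE, which is one reason to report several of these metrics together.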


Author(s): Shen Xing-xing, Cao Wei-wei, Li Kai

Abstract In this study, multivariate adaptive regression splines (MARS) models of order two and three (MARS-O2 and MARS-O3) were developed for predicting the California bearing ratio (CBR) value of pond ash stabilized with lime and lime sludge. The models took five input variables (maximum dry density, optimum moisture content, lime percentage, lime sludge percentage, and curing period) and CBR as the output variable. MARS-O3 gave the best results, with R² of 0.9565 and 0.9312 and PI of 0.0709 and 0.1061 for the training and testing phases, respectively. For both models, the estimated CBR values in the training and testing stages agree acceptably with the experimental results, indicating that the proposed equations can predict CBR values with high accuracy. Comparison of the two equations showed that MARS-O3 performs better than MARS-O2. Based on the error curves, MARS-O3 yields the lowest error percentage in CBR prediction and is more accurate than the other developed models. Therefore, MARS-O3 is recommended as the proposed model.


Author(s): Paulino José García-Nieto, E. García-Gonzalo, José Ramón Alonso Fernández, Cristina Díaz Muñiz

Abstract Total phosphorus (TP) and chlorophyll-a (Chl-a) are recognized indicators of phytoplankton abundance and biomass, and thus of the actual eutrophic state, of water bodies (i.e., reservoirs, lakes and seas). This investigation describes a robust nonparametric method, support vector regression (SVR), for forecasting Chl-a and TP concentrations from 268 samples obtained in the Tanes reservoir. Beforehand, the main features (biological and physico-chemical predictors) were selected using the multivariate adaptive regression splines approximation, in order to construct reduced models that are easier to interpret and less prone to overfitting. The whale optimization algorithm (WOA), a heuristic iterative technique, was used to optimize the regression parameters. Two main results were obtained. Firstly, the relative relevance of the model variables was established. Secondly, Chl-a and TP can be successfully predicted using this hybrid WOA/SVR-based approximation. The agreement between the predicted values and the observed data demonstrates the quality of this novel technique.


2021
Author(s): Georgios Baskozos, Andreas Themistocleous, Harry L Hebert, Mathilde Pascal, Jishi John, ...

Abstract Background: To improve the treatment of painful Diabetic Peripheral Neuropathy (DPN) and associated co-morbidities, a better understanding of the pathophysiology and risk factors for painful DPN is required. Using harmonised cohorts (N = 1230) we have built models that classify painful versus painless DPN. Methods: The Random Forest, Adaptive Regression Splines and Naive Bayes machine learning models were trained for classifying painful/painless DPN. Their performance was estimated using cross-validation in large cross-sectional cohorts (N = 935). Models were externally validated in a large population-based cohort (N = 295) in the presence of missing values. Variables were ranked for importance using model-specific metrics, and marginal effects of predictors were aggregated and assessed at the global level. Model selection was carried out using the Matthews Correlation Coefficient (MCC), and model performance was quantified in the validation set using MCC, the area under the precision/recall curve (AUPRC) and accuracy. Results: Random Forest (MCC = 0.28, AUPRC = 0.76) and Adaptive Regression Splines (MCC = 0.29, AUPRC = 0.77) were the best performing models and showed the smallest reduction in performance between the training and validation datasets. EQ5D index, the 10-item personality dimensions, HbA1c, Depression and Anxiety t-scores, age and Body Mass Index were consistently amongst the most powerful predictors in classifying painful vs painless DPN. Conclusions: Machine learning models trained on large cross-sectional cohorts were able to accurately classify painful or painless DPN on an independent population-based dataset. Painful DPN is associated with more depression, anxiety and certain personality traits. It is also associated with poorer self-reported quality of life, younger age, poor glucose control and high Body Mass Index (BMI). The models showed good performance in realistic conditions in the presence of missing values and noisy datasets.
These models can be used either in the clinical context to assist patient stratification based on the risk of painful DPN or to return broad risk categories based on user input. The models' performance and calibration suggest that in both cases they could potentially improve diagnosis and outcomes by changing modifiable factors like BMI and HbA1c control and by instituting earlier preventive or supportive measures like psychological interventions.
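For reference, the Matthews Correlation Coefficient used above for model selection can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition. The function name and the counts are illustrative, not taken from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    a common convention for degenerate confusion matrices.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only.
print(mcc(5, 5, 0, 0))   # perfect classifier
print(mcc(1, 1, 1, 1))   # chance-level classifier
```

Unlike accuracy, MCC uses all four cells, so it stays informative when the painful and painless classes are imbalanced.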


2021, Vol 2021 (1), pp. 1044-1053
Author(s): Nuri Taufiq, Siti Mariyah

The method used to rank the socio-economic status of households in the Basis Data Terpadu (Unified Database) is to predict household expenditure using Proxy Mean Testing (PMT). In general, this method is a prediction model based on regression techniques. The statistical model chosen is forward-stepwise. In practice, it is assumed that the predictor variables used in PMT are linearly correlated with the expenditure variable. This study applies a machine learning approach as an alternative prediction method to the forward-stepwise model. Models were built using several machine learning algorithms, such as Multivariate Adaptive Regression Splines (MARS), K-Nearest Neighbors, Decision Tree, and Bagging. The modelling results show that the machine learning models produce a lower average inclusion error (IE) than average exclusion error (EE). The machine learning models are effective at reducing IE but are not yet sensitive enough to reduce EE. The average IE of the machine learning models is 0.21, whereas the average IE of the PMT model is 0.29.


Mathematics, 2021, Vol 9 (21), pp. 2696
Author(s): Nawin Raj, Zahra Gharineiat

Mean sea level rise is a significant emerging risk from climate change. This research paper uses artificial intelligence models to assess and predict the trend in mean sea level around northern Australian coastlines. The study uses sea-level time series from four sites (Broome, Darwin, Cape Ferguson, Rosslyn Bay) to make the prediction. Multivariate adaptive regression splines (MARS) and artificial neural network (ANN) algorithms were implemented to build the prediction models. Both models show high accuracy (R2 > 0.98) and low error values (RMSE < 27%) overall. The ANN model performed slightly better than MARS over the selected sites. The ANN performance was further assessed for modelling storm surges associated with cyclones. The model reproduced the surge profile with maximum correlation coefficients of ~0.99 and minimum RMS errors of ~4 cm at selected validation sites. In addition, the ANN model predicted the maximum surge at Rosslyn Bay for cyclone Marcia to within 2 cm of the measured peak and the maximum surge at Broome for cyclone Narelle to within 7 cm of the measured peak. The results are comparable with a MARS model previously used in this region; however, the ANN shows better agreement with the measured peak and arrival time, although its predictions are slightly higher than the sea level observed at the tide gauge stations.

