adaptive regression
Recently Published Documents


TOTAL DOCUMENTS

676
(FIVE YEARS 217)

H-INDEX

48
(FIVE YEARS 11)

2022 ◽  
Author(s):  
Alberto Celma ◽  
Richard Bade ◽  
Juan V. Sancho ◽  
Félix Hernández ◽  
Melissa Humpries ◽  
...  

Abstract Ultra-high performance liquid chromatography coupled to ion mobility separation and high-resolution mass spectrometry instruments have proven very valuable for screening of emerging contaminants in the aquatic environment. However, when applying suspect or non-target approaches (i.e. when no reference standards are available) there is no information on retention time (RT) and collision cross section (CCS) values to facilitate identification. In silico tools for predicting RT and CCS can therefore be of great utility in reducing the number of candidates to investigate. In this work, Multiple Adaptive Regression Splines (MARS) was evaluated for the prediction of both RT and CCS. MARS prediction models were developed and validated using a database of 477 protonated molecules, 169 deprotonated molecules and 249 sodium adducts. Multivariate and univariate models were evaluated, with univariate models showing a better fit to the empirical data. The RT model (R² = 0.855) showed a deviation between predicted and empirical data of ± 2.32 min (95% confidence intervals). The deviation observed for CCS data of protonated molecules using the CCSH model (R² = 0.966) was ± 4.05% with 95% confidence intervals. The CCSH model was also tested for the prediction of deprotonated molecules, resulting in deviations below ± 5.86% for 95% of the cases. Finally, a third model was developed for sodium adducts (CCSNa, R² = 0.954) with deviations below ± 5.25% for 95% of the cases. The developed models have been incorporated in an open-access and user-friendly online platform, which represents a great advantage for third-party research laboratories for predicting both RT and CCS data.
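For readers unfamiliar with MARS, the following is a minimal sketch of the hinge basis functions that such models are built from. It is an illustration only: the function names, knot and coefficients are hypothetical and are not taken from the paper or its online platform.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a univariate MARS-style model at x.

    `terms` is a list of (coefficient, knot, direction) tuples.
    All numeric values used below are made up for illustration.
    """
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model: gentler slope below the knot at 2.0, steeper above it.
toy_terms = [(1.5, 2.0, +1), (-0.5, 2.0, -1)]
```

A fitted model of this kind is piecewise linear in the descriptor, with the slope changing at each knot; a real fit would choose knots and coefficients from the training data.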


2022 ◽  
Vol 14 (2) ◽  
pp. 798
Author(s):  
Snezhana Gocheva-Ilieva ◽  
Atanas Ivanov ◽  
Maya Stoimenova-Minova

A novel framework for stacked regression based on machine learning was developed to predict the daily average concentrations of particulate matter (PM10), one of Bulgaria’s primary health concerns. The measurements of nine meteorological parameters were introduced as independent variables. The goal was to carefully study a limited number of initial predictors and extract stochastic information from them to build an extended set of data that allowed the creation of highly efficient predictive models. Four base models using random forest, CART ensemble and bagging, and their rotation variants, were built and evaluated. The heterogeneity of these base models was achieved by introducing five types of diversities, including a new simplified selective ensemble algorithm. The predictions from the four base models were then used as predictors in multivariate adaptive regression splines (MARS) models. All models were statistically tested using out-of-bag estimates or 5-fold and 10-fold cross-validation. In addition, a variable importance analysis was conducted. The proposed framework was used for short-term forecasting of out-of-sample data for seven days. It was shown that the stacked models outperformed all single base models. An index of agreement IA = 0.986 and a coefficient of determination of about 95% were achieved.
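The abstract does not state how the index of agreement was computed. As a hedged sketch, the function below assumes Willmott's commonly used definition, IA = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², where O are observations, P are predictions and Ō is the observed mean. The function name is hypothetical.

```python
def index_of_agreement(observed, predicted):
    """Willmott-style index of agreement (assumed definition, see lead-in).

    Returns 1.0 for a perfect match. The denominator is zero only when every
    observation and every prediction equals the observed mean; that case is
    not handled here.
    """
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - numerator / denominator
```

Under this definition a model that always predicts the observed mean scores 0, so values close to 1, such as the reported 0.986, indicate close agreement.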


2022 ◽  
Vol 2022 ◽  
pp. 1-9
Author(s):  
Eman H. Alkhammash ◽  
Abdelmonaim Fakhry Kamel ◽  
Saud M. Al-Fattah ◽  
Ahmed M. Elshewey

This paper presents an optimized linear regression with multivariate adaptive regression splines (LR-MARS) model for predicting crude oil demand in Saudi Arabia, based on the social spider optimization (SSO) algorithm. The SSO algorithm is applied to optimize LR-MARS performance by fine-tuning its hyperparameters. The proposed prediction model was trained and tested using historical oil data gathered from different sources. The results suggest that the demand for crude oil in Saudi Arabia will continue to increase during the forecast period (1980–2015). A number of prediction accuracy metrics, including Mean Absolute Error (MAE), Median Absolute Error (MedAE), Mean Square Error (MSE), Root Mean Square Error (RMSE), and coefficient of determination (R²), were used to examine and verify the predictive performance of the various models. Analysis of variance (ANOVA) was also applied to the crude oil demand predictions for Saudi Arabia, to compare the actual test data with the predicted results across the different models. The experimental results show that the optimized LR-MARS model performs better than the other models in predicting crude oil demand.
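The accuracy metrics named in this abstract have standard definitions, sketched below using only the Python standard library. The function name and the toy numbers in the usage note are illustrative assumptions, not data from the paper.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return MAE, MedAE, MSE, RMSE and R2 for paired actual/predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mse = mean(e * e for e in errors)
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": mean(abs_errors),
        "MedAE": median(abs_errors),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }
```

For example, `regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])` has a single large error, which shows up in MAE (0.5) and RMSE (1.0) but leaves MedAE at 0; this is one reason to report several metrics side by side.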


2021 ◽  
Vol 2 (2) ◽  
pp. 94-104
Author(s):  
Arash Nejatian ◽  
Maryam Khaksar ◽  
Alireza Zahiroddin ◽  
Leila Azimi

The present research studied the efficacy of Bonyan-Method Experiential Marathon Structured Groups on the ego functions of a nonclinical population. This study was a quasi-experimental trial with a control group. The trial group participated in the marathon group on three consecutive days (36 hours) and in weekly sessions for three weeks. The ego function evaluation questionnaire was then given to both groups simultaneously. All ego functions in the trial group showed significant growth compared to the control group. Among these, the largest statistical effect sizes were related to "Adaptive Regression in Service of the Ego" and "Stimulus barrier." The relationship between improving ego functions and mental health can be anticipated, and steps can be taken to promote the community’s mental health by using these groups.


2021 ◽  
Vol 16 ◽  
pp. 686-695
Author(s):  
Endang Krisnawati ◽  
Adji Achmad Rinaldo Fernandes ◽  
Solimun Solimun

The purpose of this study is to develop a nonparametric path analysis with the MARS (Multivariate Adaptive Regression Spline) approach, applied to the credit payment compliance behavior of bank debtors. The data used in this study are primary data collected with a research instrument in the form of a questionnaire. There are 7 variables: 5 exogenous variables in the form of the 5C variables (Character (X1), Capacity (X2), Capital (X3), Collateral (X4), Condition of Economy (X5)), and two endogenous variables, namely Punctual Payment (Y1) and Obedient Paying Behavior (Y2). Variables were measured by calculating the average score on the items. Sampling used a purposive sampling technique, with the criterion that respondents were mortgage (House Ownership Credit) debtors at Bank X. The study obtained 100 respondents. The analysis used is a nonparametric path with the Multivariate Adaptive Regression Spline (MARS) approach. The result of this research is the estimation of the nonparametric path function using the MARS approach on various interactions. The best estimate of the function of obedient behavior in paying credit is obtained when it involves 4 variables, namely Character (X1), Capacity (X2), Condition of Economy (X5), and Punctual Payment (Y1), with the smallest generalized cross-validation (GCV) value obtained, 0.2496. The originality of this research is the development of a nonparametric path with the MARS approach that is able to capture interactions between existing variables and is also able to handle the limitations of the truncated spline in determining the position and number of knot points when many predictor variables are involved. There has been no previous research that has examined the development of a nonparametric path with the MARS approach.
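The abstract selects its model by the smallest GCV value. As a rough sketch, assuming the commonly cited form of the criterion for MARS, GCV = (RSS / N) / (1 − C / N)², where RSS is the residual sum of squares, N the number of observations and C the effective number of parameters, the calculation looks as follows. How C is counted varies between implementations, so it is taken here as an input; the function name and example numbers are hypothetical.

```python
def gcv(rss, n_obs, effective_params):
    """Generalized cross-validation score (assumed form, see lead-in).

    rss              -- residual sum of squares of the fitted model
    n_obs            -- number of observations N
    effective_params -- effective number of parameters C, must be < N
    """
    if effective_params >= n_obs:
        raise ValueError("effective_params must be smaller than n_obs")
    shrink = 1.0 - effective_params / n_obs
    return (rss / n_obs) / (shrink * shrink)
```

With the same RSS, a model that spends more effective parameters gets a larger GCV, so the criterion penalizes complexity when candidate models are compared.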


Author(s):  
Shen Xing-xing ◽  
Cao Wei-wei ◽  
Li Kai

Abstract In this study, multivariate adaptive regression splines (MARS) models of order two and three were developed for predicting the California bearing ratio (CBR) value of pond ash stabilized with lime and lime sludge. To this end, the models took five input variables (maximum dry density, optimum moisture content, lime percentage, lime sludge percentage, and curing period), with CBR as the output variable. MARS-O3 gave the best results, with R² of 0.9565 and 0.9312 and PI of 0.0709 and 0.1061 for the training and testing phases, respectively. In both developed models, the estimated CBR values in the training and testing stages showed acceptable agreement with the experimental results, indicating that the proposed equations can predict CBR values with high accuracy. Comparison of the two developed equations showed that MARS-O3 performs better than MARS-O2. Based on error curves, the MARS-O3 model yields the lowest error percentage in CBR prediction among the developed models. Therefore, MARS-O3 is recommended as the proposed model.


Author(s):  
Paulino José García-Nieto ◽  
E. García-Gonzalo ◽  
José Ramón Alonso Fernández ◽  
Cristina Díaz Muñiz

Abstract Total phosphorus (hereafter TP) and chlorophyll-a (hereafter Chl-a) are recognized indicators of phytoplankton abundance and biomass, and thus of the eutrophic state, of water bodies (i.e., reservoirs, lakes and seas). This investigation describes a robust nonparametric method, the support vector regression (SVR) approach, for forecasting the output Chl-a and TP concentrations from 268 samples obtained in the Tanes reservoir. Beforehand, the main features (biological and physico-chemical predictors) were selected using the multivariate adaptive regression splines approximation, in order to construct reduced models that are easier for researchers and readers to interpret and that reduce overfitting. The heuristic technique known as the whale optimization algorithm (WOA) was employed successfully as the optimizer for the regression parameters. Two main results have been obtained. Firstly, the relative relevance of the model variables was established. Secondly, Chl-a and TP can be successfully predicted using this hybrid WOA/SVR-based approximation. The agreement between the predicted approximation and the observed data demonstrates the quality of this novel technique.


2021 ◽  
Author(s):  
Georgios Baskozos ◽  
Andreas Themistocleous ◽  
Harry L Hebert ◽  
Mathilde Pascal ◽  
Jishi John ◽  
...  

Abstract Background: To improve the treatment of painful Diabetic Peripheral Neuropathy (DPN) and associated co-morbidities, a better understanding of the pathophysiology and risk factors for painful DPN is required. Using harmonised cohorts (N = 1230) we have built models that classify painful versus painless DPN. Methods: The Random Forest, Adaptive Regression Splines and Naive Bayes machine learning models were trained for classifying painful/painless DPN. Their performance was estimated using cross-validation in large cross-sectional cohorts (N = 935). Models were externally validated in a large population-based cohort (N = 295) in the presence of missing values. Variables were ranked for importance using model-specific metrics, and marginal effects of predictors were aggregated and assessed at the global level. Model selection was carried out using the Matthews Correlation Coefficient (MCC), and model performance was quantified in the validation set using MCC, the area under the precision/recall curve (AUPRC) and accuracy. Results: Random Forest (MCC = 0.28, AUPRC = 0.76) and Adaptive Regression Splines (MCC = 0.29, AUPRC = 0.77) were the best performing models and showed the smallest reduction in performance between the training and validation datasets. EQ5D index, the 10-item personality dimensions, HbA1c, Depression and Anxiety t-scores, age and Body Mass Index were consistently amongst the most powerful predictors in classifying painful vs painless DPN. Conclusions: Machine learning models trained on large cross-sectional cohorts were able to accurately classify painful or painless DPN on an independent population-based dataset. Painful DPN is associated with more depression, anxiety and certain personality traits. It is also associated with poorer self-reported quality of life, younger age, poor glucose control and high Body Mass Index (BMI). The models showed good performance in realistic conditions in the presence of missing values and noisy datasets.
These models can be used either in the clinical context to assist patient stratification based on the risk of painful DPN or to return broad risk categories based on user input. The models' performance and calibration suggest that in both cases they could potentially improve diagnosis and outcomes by prompting changes to modifiable factors such as BMI and HbA1c control and the earlier institution of preventive or supportive measures such as psychological interventions.
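The Matthews Correlation Coefficient used for model selection above can be computed directly from the four cells of a binary confusion matrix. The sketch below uses the standard formula; the function name and the convention of returning 0.0 when the denominator is zero are assumptions made for illustration, and the counts in the usage note are not from the study.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn -- true positives, true negatives, false positives,
                      false negatives
    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

For example, `mcc_from_counts(50, 50, 0, 0)` is 1.0 for a perfect classifier and `mcc_from_counts(25, 25, 25, 25)` is 0.0 for one no better than chance. Because all four cells enter the formula, the score stays informative when the two classes are unbalanced.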

