A hydrological model skill score and revised R-squared

2021 ◽  
Author(s):  
Charles Onyutha

Abstract Despite the advances in methods of statistical and mathematical modeling, there is considerable lack of focus on improving how to judge models’ quality. Coefficient of determination (R2) is arguably the most widely applied ‘goodness-of-fit’ metric in modelling and prediction of environmental systems. However, known issues of R2 are that it: (i) can be low and high for an accurate and imperfect model, respectively; (ii) yields the same value when we regress observed on modelled series and vice versa; and (iii) does not quantify a model's bias (B). A new model skill score E and revised R-squared (RRS) are presented to combine correlation, term B and capacity to capture variability. Differences between E and RRS lie in the forms of correlation and the term B used for each metric. Acceptability of E and RRS was demonstrated through comparison of results from a large number of hydrological simulations. By applying E and RRS, the modeller can diagnostically identify and expose systematic issues behind model optimizations based on other ‘goodness-of-fits’ such as Nash–Sutcliffe efficiency (NSE) and mean squared error. Unlike NSE, which varies from −∞ to 1, E and RRS occur over the range 0–1. MATLAB codes for computing E and RRS are provided.
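Issues (ii) and (iii) listed in the abstract can be illustrated with a minimal Python sketch. It assumes R2 is computed as the squared Pearson correlation and uses the standard definition of NSE; the series `obs` and `sim` are made-up numbers, and the function names are illustrative rather than the MATLAB codes the abstract mentions. E and RRS are not reproduced here because the abstract does not state their formulas.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) /
    (sum of squared deviations of the observations from their mean)."""
    mo = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1 - sse / sst

# Hypothetical data: the simulation tracks the observations perfectly
# but carries a constant bias of +10.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [o + 10.0 for o in obs]

r2_forward = r_squared(obs, sim)   # swapping the arguments gives the same value
r2_reverse = r_squared(sim, obs)
nse_value = nse(obs, sim)          # strongly negative because of the bias

print(r2_forward, r2_reverse, nse_value)
```

In this example the squared correlation equals 1 in both directions even though every simulated value is off by 10, while NSE falls far below zero. That is the kind of disagreement between metrics that motivates a score combining correlation, bias and variability.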

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Rashmi Bhardwaj ◽  
Aashima Bangia

An imbalance in the pH of water turns this precious resource into an extremely dangerous liquid for human health and plant growth. Changes in the pH of drinking water have raised major concerns about diverse health issues such as heart problems, infant mortality, skin pigmentation, and cholera outbreaks. It is therefore necessary to monitor essential water quality components, including the acidic or basic nature of water. According to the US Environmental Protection Agency (USEPA), drinking water should have a pH between 6.5 and 8.5. Two sampling sites where high pollutant levels had been reported were identified and analyzed with artificial intelligence (AI) techniques. Wavelet-denoised signals fed into least squares support vector regression (LSSVR) and the M5 prime regression tree (M5pRT) predicted more accurately, as judged by the following performance errors: (a) root mean squared error (RMSE); (b) mean squared error (MSE); (c) mean absolute error (MAE). On the basis of these errors, the coefficient of determination, or goodness of fit (R2), is computed for the prototypes developed in this study. Overall, RMSE decreases when the training and forecasting data division is applied via WLSSVR and WM5pRT, compared with fitting the normalized data through LSSVR and M5pRT. These performance measures are essential for analyzing pH levels in the river streams at the identified study sites. The pattern observed in this study may therefore help future estimation of water quality at its sources, so that further increases in acidic or basic salts, which prove lethal for the environment, can be prevented. These predictors would thus be helpful in formulating strategies for the protection of ecosystems and human health.


2008 ◽  
Vol 51 (4) ◽  
pp. 329-337
Author(s):  
Ö. Koçak ◽  
B. Ekiz

Abstract. The objective of this study was to compare the goodness of fit of seven mathematical models (including the gamma function, the exponential model, the mixed log model, the inverse quadratic polynomial model and their various modifications) on daily milk yield records. The criteria used to compare models were mean R2, root mean squared errors (RMSE) and difference between actual and predicted lactation milk yields. The effect of lactation number on curve parameters was significant for models with three parameters. Third lactation cows had the highest intercept post-calving, greatest incline between calving and peak milk yield and greatest decline between peak milk yield and end of lactation. Latest peak production occurred in first lactation for all models, while third lactation cows had the earliest day of peak production. The R2 values ranged between 0.590 and 0.650 for first lactation, between 0.703 and 0.773 for second lactation and between 0.686 and 0.824 for third lactation, depending on the model fitted. The root mean squared error values of different models varied between 1.748 kg and 2.556 kg for first parity cows, between 2.133 kg and 3.284 kg for second parity cows and between 2.342 kg and 7.898 kg for third parity cows. Lactation milk yield deviations of Ali and Schaeffer, Wilmink and Guo and Swalve Models were close to zero for all lactations. Ali and Schaeffer Model had the highest R2 for all lactations and also yielded smallest RMSE and actual and predicted lactation milk yield differences. Wilmink and Guo and Swalve Models gave better fit than other three parameter models.


2018 ◽  
Vol 80 (01) ◽  
pp. 072-078 ◽  
Author(s):  
Berdine Heesterman ◽  
John-Melle Bokhorst ◽  
Lisa de Pont ◽  
Berit Verbist ◽  
Jean-Pierre Bayley ◽  
...  

Background To improve our understanding of the natural course of head and neck paragangliomas (HNPGL) and ultimately differentiate between cases that benefit from early treatment and those that are best left untreated, we studied the growth dynamics of 77 HNPGL managed with primary observation. Methods Using digitally available magnetic resonance images, tumor volume was estimated at three time points. Subsequently, nonlinear least squares regression was used to fit seven mathematical models to the observed growth data. Goodness of fit was assessed with the coefficient of determination (R2) and root-mean-squared error. The models were compared with Kruskal–Wallis one-way analysis of variance and subsequent post-hoc tests. In addition, the credibility of predictions (age at onset of neoplastic growth and estimated volume at age 90) was evaluated. Results Equations generating sigmoidal-shaped growth curves (Gompertz, logistic, Spratt and Bertalanffy) provided a good fit (median R2: 0.996–1.00) and better described the observed data compared with the linear, exponential, and Mendelsohn equations (p < 0.001). Although there was no statistically significant difference between the sigmoidal-shaped growth curves regarding the goodness of fit, a realistic age at onset and estimated volume at age 90 were most often predicted by the Bertalanffy model. Conclusions Growth of HNPGL is best described by decelerating tumor growth laws, with a preference for the Bertalanffy model. To the best of our knowledge, this is the first time that this often-neglected model has been successfully fitted to clinically obtained growth data.


2021 ◽  
Author(s):  
Hangsik Shin

BACKGROUND Arterial stiffness due to vascular aging is a major indicator for evaluating cardiovascular risk. OBJECTIVE In this study, we propose a method of estimating age by applying machine learning to the photoplethysmogram for non-invasive vascular age assessment. METHODS A machine learning-based age estimation model consisting of three convolutional layers and two fully connected layers was developed using photoplethysmograms segmented by pulse from a total of 752 adults aged 19–87 years. The performance of the developed model was quantitatively evaluated using mean absolute error, root mean squared error, Pearson’s correlation coefficient, and coefficient of determination. Grad-CAM was used to explain the contribution of photoplethysmogram waveform characteristics to vascular age estimation. RESULTS 10-fold cross-validation yielded a mean absolute error of 8.03, a root mean squared error of 9.96, a correlation coefficient of 0.62, and a coefficient of determination of 0.38. Grad-CAM, used to determine the weight that the input signal contributes to the result, confirmed that the contribution of the photoplethysmogram segment to the age estimation was high around the systolic peak. CONCLUSIONS The machine learning-based vascular aging analysis method using the PPG waveform showed comparable or superior performance compared to previous studies without complex feature detection in evaluating vascular aging. CLINICALTRIAL 2015-0104


2019 ◽  
Vol 32 (1) ◽  
pp. 251-258 ◽  
Author(s):  
Francisco Arthur Arré ◽  
José Elivalto Guimarães Campelo ◽  
José Lindenberg Rocha Sarmento ◽  
Luiz Antônio Silva Figueiredo Filho ◽  
Diego Helcias Cavalcante

ABSTRACT The objective of this study was to determine the optimum age at last weighing and compare the goodness of fit of nonlinear models used to fit longitudinal weight-age data to describe the growth pattern of Anglo-Nubian does. Weights of 104 animals from birth to 60 months of age were grouped into 10 age groups at six-month intervals. In each age group, parameters A (asymptotic weight), B (integration constant), and K (maturity index) were estimated using the Brody, Gompertz, logistic, and von Bertalanffy models. Data were analyzed using analysis of variance in a factorial design (10 age groups × 4 nonlinear models). The age group × model interaction was not significant. Mean estimates of A, B, and K were significantly different between age groups up to 30 months (p < 0.05), indicating that the estimated curve is affected by weights taken before this age independent of the model. The values of mean squared error (MSE), mean absolute deviation (MAD), coefficient of determination (R2) and Rate of convergence (RC) at each age group up to 30 months were compared to determine the goodness of fit of nonlinear models. The ranking of fit was logistic, Brody, von Bertalanffy, and Gompertz. The logistic and Brody models respectively estimated the smallest and largest asymptotic weight. Longitudinal weight records taken until 30 months of age are most appropriate for estimating the growth of Anglo-Nubian does using nonlinear models.


2020 ◽  
Vol 20 (9) ◽  
pp. 5716-5719 ◽  
Author(s):  
Cho Hwe Kim ◽  
Young Chul Kim

The application of an artificial neural network (ANN) to modeling combined steam-carbon dioxide reforming of methane over nickel-based catalysts was investigated. The ANN model consisted of a 3-layer feed-forward network with a hyperbolic tangent activation function. The number of hidden neurons was optimized by minimizing the mean squared error and maximizing R2 (R-squared, the coefficient of determination), and was set to 8. With feed ratio, flow rate, and temperature as independent variables, methane conversion, carbon dioxide conversion, and H2/CO ratio were modelled using the ANN. Coefficient of determination (R2) values of 0.9997, 0.9962, and 0.9985 were obtained, and MAE (Mean Absolute Error), MSE (Mean Squared Error), RMSE (Root Mean Squared Error), and MAPE (Mean Absolute Percentage Error) showed low values. This study indicates that an ANN can successfully model a highly nonlinear process and function.
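The four error measures named in this abstract follow standard textbook definitions, sketched below in Python. The function name `error_metrics` and the sample values are hypothetical and are not taken from the study; MAPE is expressed here in percent and is undefined when any actual value is zero.

```python
from math import sqrt

def error_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)
    # MAPE divides by the actual values, so it requires them to be nonzero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Hypothetical measurements and model outputs, for illustration only.
metrics = error_metrics([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(metrics)
```

MSE and RMSE penalize the single error of 20 more heavily than MAE does, which is why studies often report several of these measures side by side.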


2018 ◽  
Author(s):  
Cailey Elizabeth Fitzgerald ◽  
Ryne Estabrook ◽  
Daniel Patrick Martin ◽  
Andreas Markus Brandmaier ◽  
Timo von Oertzen

Missing data are ubiquitous in both small and large datasets. Missing data may come about as a result of coding or computer error, participant absences, or it may be intentional, as in planned missing designs. We discuss missing data as it relates to goodness-of-fit indices in Structural Equation Modeling (SEM), specifically the effects of missing data on the Root Mean Squared Error of Approximation (RMSEA). We use simulations to show that naive implementations of the RMSEA have a downward bias in the presence of missing data and, thus, overestimate model goodness-of-fit. Unfortunately, many state-of-the-art software packages report the biased form of RMSEA. As a consequence, the community may have been accepting a much larger fraction of models with non-acceptable model fit. We propose a bias-correction for the RMSEA based on information-theoretic considerations that take into account the expected misfit of a person with fully observed data. This results in an RMSEA which is asymptotically independent of the proportion of missing data for misspecified models. Importantly, results of the corrected RMSEA computation are identical to naive RMSEA if there are no missing data.


Author(s):  
Hanna Unterauer ◽  
Norbert Brunner ◽  
Manfred Kühleitner

Scientific growth literature often uses the models of Brody, Gompertz, Verhulst, and von Bertalanffy. The versatile five-parameter Bertalanffy-Pütter (BP) model generalizes them. Using the least-squares method, we fitted the BP model to mass-at-age data of 161 calves, cows, bulls, and oxen of cattle breeds that are common in Austria and Southern Germany. We used three measures to assess the goodness of fit: R-squared, normalized root-mean squared error, and the Akaike information criterion together with a correction for sample size. Although the BP model improved the fit of the linear growth model considerably in terms of R-squared, the better fit did not, in general, justify the use of its additional parameters, because most of the data had a non-sigmoidal character. In terms of the Akaike criterion, we could identify only a small core of data (15%) where sigmoidal models were indispensable.    
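The trade-off described here, where a better fit may not justify extra parameters, can be sketched with the commonly used least-squares form of the Akaike criterion and its small-sample correction (AICc). This is an illustration under stated assumptions, not the authors' computation: the residual sums of squares and the sample size are invented, the linear model is assumed to have two parameters, and conventions differ on whether the error variance counts as a parameter.

```python
from math import log

def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to an additive
    constant: n * ln(RSS / n) + 2k."""
    return n * log(rss / n) + 2 * k

def aicc_least_squares(rss, n, k):
    """AIC with the small-sample correction term 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic_least_squares(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical mass-at-age data set with 12 points.
n_points = 12
aicc_linear = aicc_least_squares(rss=12.0, n=n_points, k=2)  # linear growth
aicc_bp = aicc_least_squares(rss=9.0, n=n_points, k=5)       # five-parameter BP

print(aicc_linear, aicc_bp)
```

With these made-up numbers the five-parameter model has the smaller residual sum of squares, yet the linear model has the lower (better) AICc, because the penalty for three extra parameters outweighs the gain in fit on so few data points.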


2020 ◽  
Vol 11 (1) ◽  
pp. 44
Author(s):  
Rahmat Robi Waliyansyah ◽  
Nugroho Dwi Saputro

Higher education institutions regularly hold new student admissions, and the number of new students can rise or fall. This study considers new student admissions at the University of PGRI Semarang (UPGRIS) from the 2014/2015 to the 2018/2019 academic year, which involve many admission selection stages. To meet the minimum required ratio between the number of students and the development of human resources, facilities, and infrastructure, it is necessary to predict how much the number of students increases each year. Building a forecasting system for the number of prospective new students requires a good forecasting method and sufficiently precise calculations to predict how many prospective students will register. In this study, the Random Forest method is used. The forecasting models are evaluated using random sampling and cross-validation. The metrics used are Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Coefficient of Determination (R2). The results identify the five study programs with the highest and the five with the lowest new student admissions. UPGRIS will therefore develop a new strategy for the five lowest study programs so that the desired number of new students is achieved.


2014 ◽  
Vol 625 ◽  
pp. 188-191
Author(s):  
Muhammad Zubair Shahid ◽  
Abdulhalim Shah Maulud ◽  
Mohammad Azmi Bustam

Carbon dioxide (CO2) capture has been an important issue for decades. Alkanolamines, such as diethanolamine (DEA), have been widely used for CO2 separation by absorption. During this process, CO2 loading measurement is imperative for proper process control. The methods currently used are titration based and require a long processing time. In this work, Raman spectroscopy is used to model and predict the CO2 loading over a wide range (0-0.97 CO2 mole/amine mole). The models are developed using Raman peak ratios to minimize the error due to peak fluctuations. The Raman peak ratio of 1022 cm-1/1461 cm-1 was found to give a good fit, with a coefficient of determination (R2) of 0.92 and a mean squared error (MSE) of 0.00656 CO2 mole2/amine mole2 in the prediction of CO2 loading.

