Multivariable Integrated Evaluation of Model Performance with the Vector Field Evaluation Diagram

Author(s):  
Zhongfeng Xu ◽  
Ying Han ◽  
Congbin Fu

Abstract. This paper develops a multivariable integrated evaluation (MVIE) method to measure the overall performance of a climate model in simulating multiple fields. The general idea of MVIE is to group various scalar fields into a vector field and compare the constructed vector field against the observed one using the vector field evaluation (VFE) diagram. The VFE diagram was devised based on the cosine relationship between three statistical quantities: root mean square length (RMSL) of a vector field, vector field similarity coefficient, and root mean square vector deviation (RMSVD). The three statistical quantities can reasonably represent the corresponding statistics between two multidimensional vector fields. Therefore, one can summarize the three statistics of multiple scalar fields using the VFE diagram and facilitate the intercomparison of model performance. The VFE diagram can illustrate how much of the overall root mean square deviation of various fields is attributable to the differences in the root mean square value and how much is due to the poor pattern similarity. The MVIE method can be flexibly applied to full fields (including both the mean and anomaly) or anomaly fields depending on the application. We also propose a multivariable integrated evaluation index (MIEI) which takes the amplitude and pattern similarity of multiple scalar fields into account. The MIEI is expected to provide a more accurate evaluation of model performance in simulating multiple fields. The MIEI, VFE diagram, and commonly used statistical metrics for individual variables constitute a hierarchical evaluation methodology, which can provide a more comprehensive evaluation of model performance.

2017 ◽  
Vol 10 (10) ◽  
pp. 3805-3820 ◽  
Author(s):  
Zhongfeng Xu ◽  
Ying Han ◽  
Congbin Fu

Abstract. This paper develops a multivariable integrated evaluation (MVIE) method to measure the overall performance of a climate model in simulating multiple fields. The general idea of MVIE is to group various scalar fields into a vector field and compare the constructed vector field against the observed one using the vector field evaluation (VFE) diagram. The VFE diagram was devised based on the cosine relationship between three statistical quantities: root mean square length (RMSL) of a vector field, vector field similarity coefficient, and root mean square vector deviation (RMSVD). The three statistical quantities can reasonably represent the corresponding statistics between two multidimensional vector fields. Therefore, one can summarize the three statistics of multiple scalar fields using the VFE diagram and facilitate the intercomparison of model performance. The VFE diagram can illustrate how much of the overall root mean square deviation of various fields is attributable to the differences in the root mean square value and how much is due to the poor pattern similarity. The MVIE method can be flexibly applied to full fields (including both the mean and anomaly) or anomaly fields depending on the application. We also propose a multivariable integrated evaluation index (MIEI) which takes the amplitude and pattern similarity of multiple scalar fields into account. The MIEI is expected to provide a more accurate evaluation of model performance in simulating multiple fields. The MIEI, VFE diagram, and commonly used statistical metrics for individual variables constitute a hierarchical evaluation methodology, which can provide a more comprehensive evaluation of model performance.
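The grouping step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the field names, the synthetic data, and in particular the choice to normalize each scalar field by the RMS of its observed counterpart are assumptions made here so that fields with different units can share one vector. The sketch shows that the root mean square vector deviation of the stacked field summarizes the normalized root mean square deviations of the individual fields.

```python
import numpy as np

def rms(x):
    """Root mean square value of a scalar field."""
    return float(np.sqrt(np.mean(np.square(x))))

def stack_fields(fields, scales):
    """Divide each scalar field by its scale and stack into (n_points, n_fields)."""
    return np.column_stack([np.ravel(f) / s for f, s in zip(fields, scales)])

rng = np.random.default_rng(7)
n_points = 400

# Hypothetical observed fields with very different units and magnitudes.
obs_fields = [
    280.0 + 10.0 * rng.normal(size=n_points),   # e.g. temperature (K)
    3.0 * np.abs(rng.normal(size=n_points)),    # e.g. precipitation (mm/day)
    1000.0 + 8.0 * rng.normal(size=n_points),   # e.g. pressure (hPa)
]
# Hypothetical model fields: observations plus made-up errors.
mod_fields = [f + s * rng.normal(size=n_points)
              for f, s in zip(obs_fields, (2.0, 1.0, 3.0))]

# Assumption: each variable is normalized by the RMS of the observed field.
scales = [rms(f) for f in obs_fields]
obs_vec = stack_fields(obs_fields, scales)
mod_vec = stack_fields(mod_fields, scales)

# Root mean square vector deviation of the stacked (vector) field.
rmsvd = float(np.sqrt(np.mean(np.sum((mod_vec - obs_vec) ** 2, axis=1))))

# Normalized root mean square deviation of each scalar field on its own.
per_var = [rms((m - o) / s) for m, o, s in zip(mod_fields, obs_fields, scales)]

print(rmsvd ** 2, sum(p ** 2 for p in per_var))  # the two values agree
```

With this normalization, the squared RMSVD of the vector field equals the sum of the squared normalized deviations of the scalar fields, which is one way to see how a single vector statistic can summarize several variables.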


2016 ◽  
Author(s):  
Ben Kravitz ◽  
Cary Lynch ◽  
Corinne Hartin ◽  
Ben Bond-Lamberty

Abstract. Pattern scaling is a well-established method for approximating modeled spatial distributions of changes in temperature by assuming a time-invariant pattern that scales with changes in global mean temperature. We compare three methods of pattern scaling for precipitation (regression, epoch difference, and a physically-based method) and evaluate which methods are “better” in particular circumstances by quantifying their robustness to interpolation/extrapolation, inter-model variations, and inter-scenario variations. Although the regression and epoch difference methods (the two most commonly used methods of pattern scaling) have better absolute performance in reconstructing the climate model output by two orders of magnitude (measured as an area-weighted root mean square error), the physically-based method shows a greater degree of robustness (less relative root-mean-square variation than the other two methods) and could be a particularly advantageous method if outstanding biases could be reduced. We decompose the precipitation response in the RCP8.5 scenario into a CO2 portion and a non-CO2 portion; these two patterns oppose each other in sign. Due to low signal-to-noise ratios, extrapolating RCP8.5 patterns to reconstruct precipitation change in the RCP2.6 scenario results in double the error of reconstructing the RCP8.5 scenario for the regression and epoch difference methods. The methodologies discussed in this paper can help provide precipitation fields for other models (including integrated assessment models or impacts assessment models) for a wide variety of scenarios of future climate change.
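The two most commonly used methods named in this abstract can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's code: the function names, the epoch length, and the synthetic data are hypothetical, and the paper's exact definitions (for example, area weighting or the choice of epochs) may differ. The physically-based method is not sketched because the abstract does not describe it.

```python
import numpy as np

def regression_pattern(local, global_mean):
    """Least-squares slope of each grid cell's series against global mean temperature.

    local: array of shape (n_time, n_cells); global_mean: array of shape (n_time,).
    """
    g = global_mean - global_mean.mean()
    anomalies = local - local.mean(axis=0)
    return g @ anomalies / (g @ g)

def epoch_difference_pattern(local, global_mean, epoch_len):
    """Last-epoch mean minus first-epoch mean, per unit of global mean warming."""
    d_local = local[-epoch_len:].mean(axis=0) - local[:epoch_len].mean(axis=0)
    d_global = global_mean[-epoch_len:].mean() - global_mean[:epoch_len].mean()
    return d_local / d_global

# Synthetic example: a local response that is exactly linear in global warming.
rng = np.random.default_rng(0)
n_time, n_cells = 100, 5
global_mean = np.linspace(0.0, 4.0, n_time)        # warming in K (made up)
true_pattern = rng.normal(size=n_cells)            # mm/day per K (made up)
local = np.outer(global_mean, true_pattern) + 2.0  # constant 2 mm/day baseline

reg = regression_pattern(local, global_mean)
epo = epoch_difference_pattern(local, global_mean, epoch_len=20)

# Reconstructed change relative to the baseline: the pattern scaled by warming.
reconstruction = np.outer(global_mean, reg)
```

On noise-free linear data both estimators recover the same pattern; they differ once internal variability or nonlinearity is present, which is the setting the abstract evaluates.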


2014 ◽  
Vol 7 (3) ◽  
pp. 1247-1250 ◽  
Author(s):  
T. Chai ◽  
R. R. Draxler

Abstract. Both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed in model evaluation studies. Willmott and Matsuura (2005) have suggested that the RMSE is not a good indicator of average model performance and might be a misleading indicator of average error, and thus the MAE would be a better metric for that purpose. While some concerns over using RMSE raised by Willmott and Matsuura (2005) and Willmott et al. (2009) are valid, the proposed avoidance of RMSE in favor of MAE is not the solution. Citing the aforementioned papers, many researchers chose MAE over RMSE to present their model evaluation statistics when presenting or adding the RMSE measures could be more beneficial. In this technical note, we demonstrate that the RMSE is not ambiguous in its meaning, contrary to what was claimed by Willmott et al. (2009). The RMSE is more appropriate to represent model performance than the MAE when the error distribution is expected to be Gaussian. In addition, we show that the RMSE satisfies the triangle inequality requirement for a distance metric, whereas Willmott et al. (2009) indicated that the sums-of-squares-based statistics do not satisfy this rule. In the end, we discuss some circumstances where using the RMSE will be more beneficial. However, we do not contend that the RMSE is superior to the MAE. Instead, a combination of metrics, including but certainly not limited to RMSEs and MAEs, is often required to assess model performance.
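The two metrics and the triangle inequality claim can be checked numerically with a short sketch. The random data below are made up for illustration, and the Gaussian ratio of sqrt(2/pi) is a standard property of zero-mean normal errors rather than a result quoted from the abstract.

```python
import numpy as np

def rmse(errors):
    """Root mean square error of an array of errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

def mae(errors):
    """Mean absolute error of an array of errors."""
    return float(np.mean(np.abs(errors)))

rng = np.random.default_rng(42)
n = 1000
x, y, z = rng.normal(size=(3, n))  # three arbitrary series of equal length

# Treat rmse(a - b) as a distance between series a and b and check
# d(x, z) <= d(x, y) + d(y, z), with a small tolerance for rounding.
d_xz, d_xy, d_yz = rmse(x - z), rmse(x - y), rmse(y - z)
triangle_holds = d_xz <= d_xy + d_yz + 1e-12

# For zero-mean Gaussian errors the MAE/RMSE ratio tends to sqrt(2/pi), about 0.80.
errors = rng.normal(scale=2.0, size=100_000)
ratio = mae(errors) / rmse(errors)

print(triangle_holds, ratio)
```

The MAE never exceeds the RMSE for the same set of errors, and the gap between them widens as large errors become more frequent, which is one reason reporting both can be informative.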


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5876
Author(s):  
Mohsen Sharifi Renani ◽  
Abigail M. Eustace ◽  
Casey A. Myers ◽  
Chadd W. Clary

Gait analysis based on inertial sensors has become an effective method of quantifying movement mechanics, such as joint kinematics and kinetics. Machine learning techniques are used to reliably predict joint mechanics directly from streams of IMU signals for various activities. These data-driven models require comprehensive and representative training datasets to be generalizable across the movement variability seen in the population at large. Bottlenecks in model development frequently occur due to the lack of sufficient training data and the significant time and resources necessary to acquire these datasets. Reliable methods to generate synthetic biomechanical training data could streamline model development and potentially improve model performance. In this study, we developed a methodology to generate synthetic kinematics and the associated predicted IMU signals using open source musculoskeletal modeling software. These synthetic data were used to train neural networks to predict three degree-of-freedom joint rotations at the hip and knee during gait either in lieu of or along with previously measured experimental gait data. The accuracy of the models’ kinematic predictions was assessed using experimentally measured IMU signals and gait kinematics. Models trained using the synthetic data outperformed models using only the experimental data in five of the six rotational degrees of freedom at the hip and knee. On average, root mean square errors in joint angle predictions were improved by 38% at the hip (synthetic data RMSE: 2.3°, measured data RMSE: 4.5°) and 11% at the knee (synthetic data RMSE: 2.9°, measured data RMSE: 3.3°), when models trained solely on synthetic data were compared to models trained on measured data. When models were trained on both measured and synthetic data, root mean square errors were reduced by 54% at the hip (measured + synthetic data RMSE: 1.9°) and 45% at the knee (measured + synthetic data RMSE: 1.7°), compared to measured data alone. These findings enable future model development for different activities of clinical significance without the burden of generating large quantities of gait lab data for model training, streamlining model development, and ultimately improving model performance.


Metals ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1549
Author(s):  
Francis Gyakwaa ◽  
Tuomas Alatarvas ◽  
Qifeng Shu ◽  
Matti Aula ◽  
Timo Fabritius

Steel quality and properties can be affected by the formation of complex inclusions, including Ti-based inclusions such as TiN and Ti2O3 and oxides like Al2O3 and MgO·Al2O3 (MA). This study assessed the prospective use of Raman spectroscopy to characterize synthetic binary inclusion samples of TiN–Al2O3, TiN–MA, Ti2O3–MA, and Ti2O3–Al2O3 with varying phase fractions. The relative intensities of the Raman peaks were used for qualitative evaluation, and linear regression calibration models were used for the quantitative prediction of individual phases. The model performance was evaluated with root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP). For the raw Raman spectra data, R2 values were between 0.48 and 0.98, the RMSECV values varied between 3.26 and 14.60 wt%, and the RMSEP ranged between 2.98 and 15.01 wt% for estimating the phases. The SNV Raman spectra data had estimated R2 values within 0.94–0.99, and RMSECV and RMSEP values ranged between 2.50–3.26 wt% and 2.80–9.01 wt%, respectively, showing improved model performance. The study shows that the specific phases of TiN, Al2O3, MA, and Ti2O3 in synthetic inclusion mixtures of TiN–(Al2O3 or MA) and Ti2O3–(Al2O3 or MA) could be characterized by Raman spectroscopy.


2019 ◽  
Vol 16 (17) ◽  
pp. 3457-3474 ◽  
Author(s):  
Marcos A. S. Scaranello ◽  
Michael Keller ◽  
Marcos Longo ◽  
Maiza N. dos-Santos ◽  
Veronika Leitold ◽  
...  

Abstract. Coarse dead wood is an important component of forest carbon stocks, but it is rarely measured in Amazon forests and is typically excluded from regional forest carbon budgets. Our study is based on line intercept sampling for fallen coarse dead wood conducted along 103 transects with a total length of 48 km matched with forest inventory plots where standing coarse dead wood was measured in the footprints of larger areas of airborne lidar acquisitions. We developed models to relate lidar metrics and Landsat time series variables to coarse dead wood stocks for intact, logged, burned, or logged and burned forests. Canopy characteristics such as gap area produced significant individual relations for logged forests. For total fallen plus standing coarse dead wood (hereafter defined as total coarse dead wood), the relative root mean square error for models with only lidar metrics ranged from 33 % in logged forest to up to 36 % in burned forests. The addition of historical information improved model performance slightly for intact forests (31 % against 35 % relative root mean square error), not justifying the use of a number of disturbance events from historical satellite images (Landsat) with airborne lidar data. Lidar-derived estimates of total coarse dead wood compared favorably with independent ground-based sampling for areas up to several hundred hectares. The relations found between total coarse dead wood and variables quantifying forest structure derived from airborne lidar highlight the opportunity to quantify this important but rarely measured component of forest carbon over large areas in tropical forests.


2014 ◽  
Vol 7 (1) ◽  
pp. 1525-1534 ◽  
Author(s):  
T. Chai ◽  
R. R. Draxler

Abstract. Both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed in model evaluation studies. Willmott and Matsuura (2005) have suggested that the RMSE is not a good indicator of average model performance and might be a misleading indicator of average error and thus the MAE would be a better metric for that purpose. Their paper has been widely cited and may have influenced many researchers in choosing MAE when presenting their model evaluation statistics. However, we contend that the proposed avoidance of RMSE and the use of MAE is not the solution to the problem. In this technical note, we demonstrate that the RMSE is not ambiguous in its meaning, contrary to what was claimed by Willmott et al. (2009). The RMSE is more appropriate to represent model performance than the MAE when the error distribution is expected to be Gaussian. In addition, we show that the RMSE satisfies the triangle inequality requirement for a distance metric.


2016 ◽  
Vol 9 (12) ◽  
pp. 4365-4380 ◽  
Author(s):  
Zhongfeng Xu ◽  
Zhaolu Hou ◽  
Ying Han ◽  
Weidong Guo

Abstract. Vector quantities, e.g., vector winds, play an extremely important role in climate systems. The energy and water exchanges between different regions are strongly dominated by wind, which in turn shapes the regional climate. Thus, how well climate models can simulate vector fields directly affects model performance in reproducing the nature of a regional climate. This paper devises a new diagram, termed the vector field evaluation (VFE) diagram, which is a generalized Taylor diagram and able to provide a concise evaluation of model performance in simulating vector fields. The diagram can measure how well two vector fields match each other in terms of three statistical variables, i.e., the vector similarity coefficient, root mean square length (RMSL), and root mean square vector difference (RMSVD). Similar to the Taylor diagram, the VFE diagram is especially useful for evaluating climate models. The pattern similarity of two vector fields is measured by a vector similarity coefficient (VSC) that is defined by the arithmetic mean of the inner product of normalized vector pairs. Examples are provided, showing that VSC can identify how closely one vector field resembles another. Note that VSC can only describe the pattern similarity, and it does not reflect the systematic difference in the mean vector length between two vector fields. To measure the vector length, RMSL is included in the diagram. The third variable, RMSVD, is used to identify the magnitude of the overall difference between two vector fields. Examples show that the VFE diagram can clearly illustrate the extent to which the overall RMSVD is attributed to the systematic difference in RMSL and how much is due to the poor pattern similarity.
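The three statistics can be sketched numerically. The formulas below are one reading of the abstract's wording (in particular, "normalized" is taken to mean division by each field's own RMSL), so they should be treated as assumptions rather than the paper's exact definitions; the function names and the synthetic wind-like data are hypothetical.

```python
import numpy as np

def rmsl(v):
    """Root mean square length of a vector field stored as (n_points, n_components)."""
    return float(np.sqrt(np.mean(np.sum(v * v, axis=1))))

def vsc(a, b):
    """Mean inner product of vector pairs, each field normalized by its own RMSL."""
    return float(np.mean(np.sum(a * b, axis=1)) / (rmsl(a) * rmsl(b)))

def rmsvd(a, b):
    """Root mean square length of the vector difference a - b."""
    return rmsl(a - b)

rng = np.random.default_rng(1)
obs = rng.normal(size=(500, 2))                      # made-up (u, v) "observations"
model = 1.3 * obs + 0.4 * rng.normal(size=(500, 2))  # made-up "model": scaled plus noise

la, lb, rv = rmsl(model), rmsl(obs), vsc(model, obs)

# Under these definitions the three statistics obey a law-of-cosines identity,
# with the similarity coefficient in the role of the cosine.
lhs = rmsvd(model, obs) ** 2
rhs = la ** 2 + lb ** 2 - 2 * la * lb * rv

print(la, lb, rv, lhs, rhs)
```

Because `vsc` is unchanged when either field is multiplied by a positive constant, it captures pattern similarity only, which matches the abstract's note that RMSL is needed alongside it to measure vector length.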


2016 ◽  
Author(s):  
Zhongfeng Xu ◽  
Zhaolu Hou ◽  
Ying Han ◽  
Weidong Guo

Abstract. Vector quantities, e.g. vector winds, play an extremely important role in the climate system. Energy and water exchanges between different regions are strongly dominated by wind, which in turn shapes regional climate. Thus, how well climate models can simulate vector fields directly affects model performance in reproducing the nature of regional climate. The paper devises a new diagram, termed the vector field evaluation (VFE) diagram, which is very similar to the Taylor diagram but provides a concise evaluation of model performance in simulating vector fields. The diagram can measure how well two vector fields match each other in terms of three statistical variables, i.e. the vector similarity coefficient, root-mean-square (RMS) length (RMSL), and RMS vector difference (RMSVD). Like the Taylor diagram, the VFE diagram is especially useful in evaluating climate models. The pattern similarity of two vector fields is measured by a vector similarity coefficient (VSC) that is defined by the arithmetic mean of the inner product of normalized vector pairs. Examples are given showing that VSC can identify how closely one vector field resembles another. Note that VSC can only describe the pattern similarity and does not reflect the systematic difference in the mean vector length between two vector fields. To measure the vector length, RMSL is included in the diagram. The third variable, RMSVD, is used to identify the magnitude of the overall difference between two vector fields. Examples show that the new diagram can clearly illustrate how much of the overall RMSVD is attributed to the systematic difference in RMSL and how much is due to the poor pattern similarity.

