Comparison of statistical indices for the evaluation of crop models performance

2021 ◽  
Vol 74 (3) ◽  
pp. 9675-9684
Author(s):  
Tatiana María Saldaña Villota ◽  
José Miguel Cotes Torres

This study presents a comparison of the usual statistical methods used for crop model assessment. A case study was conducted using a data set of observations of total dry weight in a diploid potato crop, together with six simulated data sets that were derived from the observations and aimed to predict the measured data. Statistical indices such as the coefficient of determination, the root mean squared error, the relative root mean squared error, mean error, index of agreement, modified index of agreement, revised index of agreement, modeling efficiency, and revised modeling efficiency were compared. The results showed that the coefficient of determination is not a useful statistical index for model evaluation. The root mean squared error together with the relative root mean squared error give an excellent notion of how far the simulations deviate, both in the unit of the variable and in percentage terms, and they leave no doubt when evaluating the quality of a model's simulations.
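As a rough illustration of the indices compared in this study, the sketch below computes several of them with NumPy using their commonly cited textbook definitions. The function name `evaluation_indices`, the sign convention for the mean error (simulated minus observed), and the sample numbers are assumptions made for illustration and are not taken from the paper; the paper's exact formulations, in particular of the modified and revised indices (omitted here), may differ.

```python
import numpy as np


def evaluation_indices(observed, simulated):
    """Return common goodness-of-fit indices for paired observed/simulated values."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    err = sim - obs                      # assumed convention: simulated minus observed
    obs_mean = obs.mean()
    sq_err = np.sum(err ** 2)

    rmse = np.sqrt(np.mean(err ** 2))    # same unit as the variable
    rrmse = 100.0 * rmse / obs_mean      # percent of the observed mean
    me = err.mean()                      # mean error (bias)
    # Index of agreement (Willmott's d)
    d = 1.0 - sq_err / np.sum(
        (np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** 2
    )
    # Modeling efficiency
    ef = 1.0 - sq_err / np.sum((obs - obs_mean) ** 2)
    # Coefficient of determination, taken here as the squared Pearson correlation
    r2 = np.corrcoef(obs, sim)[0, 1] ** 2
    return {"r2": r2, "rmse": rmse, "rrmse": rrmse, "me": me, "d": d, "ef": ef}


# Hypothetical data: every simulated value overshoots the observation by 1 unit.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [2.0, 3.0, 4.0, 5.0, 6.0]
for name, value in evaluation_indices(observed, simulated).items():
    print(f"{name}: {value:.3f}")
```

With this constant offset the simulation is perfectly correlated with the observations, so the coefficient of determination equals 1 even though every prediction is off by one unit. The RMSE (1 unit) and RRMSE (about 33% of the observed mean) expose the deviation, which is in line with the abstract's conclusion.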

2021 ◽  
Vol 19 (1) ◽  
pp. 2-20
Author(s):  
Piyush Kant Rai ◽  
Alka Singh ◽  
Muhammad Qasim

This article introduces calibration estimators under different distance measures based on two auxiliary variables in stratified sampling. The theory of the calibration estimator is presented, and the calibrated weights based on different distance functions are derived. A simulation study was carried out to judge the performance of the proposed estimators using the minimum relative root mean squared error criterion. A real-life data set is also used to confirm the superiority of the proposed method.


2020 ◽  
Vol 10 (5) ◽  
pp. 1751 ◽  
Author(s):  
Wonsuk Ko ◽  
Hamsakutty Vettikalladi ◽  
Seung-Ho Song ◽  
Hyeong-Jin Choi

In this paper, we present the development of a demand-side management solution (DSMS) for a demand response (DR) aggregator, along with actual demand response operation cases in South Korea. To share this experience, we describe an outline of Korea's demand response market, the functions of the DSMS, actual contracted capacity, payments between consumers and the load aggregator, and DR operation cases. The DSMS computes the customer baseline load (CBL), the relative root mean squared error (RRMSE), and the payments to customers in real time. The case of customers with 10 MW of contracted capacity shows a 108.03% delivery rate and a benefit of 854,900,394 KRW over two years. The results illustrate that an integrated demand-side management solution contributes to participation in a DR market and brings benefit and satisfaction to the consumer.


2019 ◽  
pp. 1-6

The objective of this study was to test the efficiency of the hydraulic pedotransfer functions (PTFs) employed in the Decision Support System for Agrotechnology Transfer – Crop Simulation Model (DSSAT-CSM) in modeling topsoil water holding capacity (WHC) in the Northern Guinea Savanna (NGS) and Sudan Savanna (SS) of Kano State, Nigeria. The coefficient of determination (R²), root mean squared error (RMSE), and index of agreement (d-index) were the three statistical methods used to test the fit between predicted and laboratory-observed WHC of disturbed, auger-sampled topsoil. The findings established that the PTFs fitted in the algorithm of the DSSAT-CSM soil water submodule made a significant estimation of topsoil WHC in NGS, with R² = 0.352, RMSE = 0.03, and d-index = 0.71. However, the model did not estimate WHC validly in the Sudan Savanna, with insignificant statistics of R² = 0.031, RMSE = 0.10, and d-index = 0.44. The conclusion drawn was that DSSAT made fair and poor predictions of topsoil WHC in NGS and SS soils respectively, irrespective of texture and other intrinsic properties. Based on these findings, we recommend the development of local PTF alternatives to be used with DSSAT's algorithm for Nigerian Savanna soils.


2017 ◽  
Vol 47 (6) ◽  
pp. 703-715 ◽  
Author(s):  
Francisco Mauro ◽  
Zane Haxtema ◽  
Hailemariam Temesgen

Neighborhood-based indices such as mingling index and diameter differentiation are a set of diversity measures that are based on the relationship between a reference tree and a certain number of nearest neighbors (i.e., trees to which it has the lowest horizontal distance). Using stem-mapped data from eight headwater sites, we compared the relative bias and relative root mean square error (relative to the true mean of each site) of several different methods of choosing reference trees for calculation of diameter differentiation ([Formula: see text]) and species mingling ([Formula: see text]) index. Indices were defined using two, three, and four neighbors and methods for selection of the reference tree were random selection of a tree in a fixed-radius plot (FI), random selection of a tree in a variable-radius plot (VA), azimuth selection method (AZ), and nearest tree selection (NT). In general, the relative bias was lower than ±2.5% for [Formula: see text] and lower than ±10% for [Formula: see text] regardless of the method. The FI method consistently had the lowest relative bias and relative root mean squared error. The NT and AZ methods were second in terms of relative root mean squared error for [Formula: see text] and [Formula: see text], respectively. Simplicity of these two methods might outweigh their slightly worse performance.


2012 ◽  
Vol 61 (2) ◽  
pp. 277-290 ◽  
Author(s):  
Ádám Csorba ◽  
Vince Láng ◽  
László Fenyvesi ◽  
Erika Michéli

There is a growing demand for the development and application of technologies and methods that enable rapid, cost-effective, and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range (350–2500 nm) of the electromagnetic spectrum, meets these demands. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper, we present the first steps of a methodological development, built on the fundamentals of reflectance spectroscopy, aimed at determining the composition of soils. We built and tested predictive models based on multivariate mathematical-statistical methods (partial least squares regression, PLSR) for estimating the organic carbon and CaCO3 content of soils. Testing the models showed that the procedure gave high R² values for both soil parameters [R²(organic carbon) = 0.815; R²(CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508], and could be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that combining reflectance spectroscopy with multivariate chemometric procedures yields a rapid and cost-effective method for data collection and evaluation.


2021 ◽  
pp. 1-21
Author(s):  
Elsa Arrua-Duarte ◽  
Marta Migoya-Borja ◽  
Igor Barahona ◽  
Lena C. Quilty ◽  
Sakina J. Rizvi ◽  
...  

Abstract Objective: The Dimensional Anhedonia Rating Scale (DARS) is a novel, recently validated questionnaire for assessing anhedonia. In this work we aim to study the equivalence between the traditional paper-and-pencil format and the digital format of the DARS. Methods: 69 patients completed the DARS in both paper-based and digital versions. We assessed differences between formats (Wilcoxon test), validity of the scales (Kappa and Intraclass Correlation Coefficients), and reliability (Cronbach’s alpha and Guttman’s coefficient). We calculated the Comparative Fit Index and the Root Mean Squared Error associated with the proposed one-factor structure. Results: Total scores were higher for the paper-based format. Significant differences between the two formats were found for three items. The weighted Kappa coefficient was approximately 0.40 for most of the items. Internal consistency was greater than 0.94, and the Intraclass Correlation Coefficient was 0.95 for the digital version and 0.94 for the paper-and-pencil version (F = 16.7, p < 0.001). The Comparative Fit Index was 0.97 for the digital DARS and 0.97 for the paper-and-pencil DARS, and the Root Mean Squared Error was 0.11 for the digital DARS and 0.10 for the paper-and-pencil DARS. Conclusion: The digital DARS is consistent in many respects with the paper-and-pencil questionnaire, but equivalence with this format cannot be assumed without caution.


2018 ◽  
Vol 4 (1) ◽  
pp. 24
Author(s):  
Imam Halimi ◽  
Wahyu Andhyka Kusuma

Stock investment is a familiar activity, both to hear about and to practice. There are many kinds of stocks in Indonesia, one of which is the Indeks Harga Saham Gabungan (IHSG), known in English as the Indonesia Composite Index, ICI, or IDX Composite. The IHSG is an important parameter to consider when investing, given that it is a composite stock index. This study aims to predict IHSG movements with data mining techniques using a neural network algorithm, compared against a linear regression algorithm, so that the results can serve as a reference for investors. The results are Root Mean Squared Error (RMSE) values, along with an additional label holding the predicted figures, obtained after validation using sliding window validation. The best results for the neural network algorithm were 37.786 with windowing and 13.597 without windowing; for the linear regression algorithm they were 35.026 with windowing and 12.657 without windowing. A t-test showed that the difference between the neural network and linear regression results was not significant; the t-test value was the same with and without windowing, namely 1.000.


2021 ◽  
Vol 13 (22) ◽  
pp. 4675
Author(s):  
William Yamada ◽  
Wei Zhao ◽  
Matthew Digman

An automatic method of obtaining geographic coordinates of bales using monovision un-crewed aerial vehicle imagery was developed utilizing a data set of 300 images with a 20-megapixel resolution containing a total of 783 labeled bales of corn stover and soybean stubble. The relative performance of image processing with Otsu’s segmentation, you only look once version three (YOLOv3), and region-based convolutional neural networks was assessed. As a result, the best option in terms of accuracy and speed was determined to be YOLOv3, with 80% precision, 99% recall, 89% F1 score, 97% mean average precision, and a 0.38 s inference time. Next, the impact of using lower-cost cameras was evaluated by reducing image quality to one megapixel. The lower-resolution images resulted in decreased performance, with 79% precision, 97% recall, 88% F1 score, 96% mean average precision, and 0.40 s inference time. Finally, the output of the YOLOv3 trained model, density-based spatial clustering, photogrammetry, and map projection were utilized to predict the geocoordinates of the bales with a root mean squared error of 2.41 m.
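For readers less familiar with detection metrics, the short sketch below shows how an F1 score follows from precision and recall as their harmonic mean. The helper name `f1_score` is hypothetical, and the inputs are the rounded percentages quoted in the abstract rather than the authors' raw counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    return 2.0 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract for the 20-megapixel images.
print(f1_score(0.80, 0.99))
```

With these rounded inputs the formula gives roughly 0.885; the small gap to the reported 89% is presumably due to rounding in the published precision and recall.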


10.2196/27386 ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. e27386
Author(s):  
Qingyu Chen ◽  
Alex Rankine ◽  
Yifan Peng ◽  
Elaheh Aghaarabi ◽  
Zhiyong Lu

Background Semantic textual similarity (STS) measures the degree of relatedness between sentence pairs. The Open Health Natural Language Processing (OHNLP) Consortium released an expertly annotated STS data set and called for the National Natural Language Processing Clinical Challenges. This work describes our entry, an ensemble model that leverages a range of deep learning (DL) models. Our team from the National Library of Medicine obtained a Pearson correlation of 0.8967 in an official test set during 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task and achieved a second rank. Objective Although our models strongly correlate with manual annotations, annotator-level correlation was only moderate (weighted Cohen κ=0.60). We are cautious of the potential use of DL models in production systems and argue that it is more critical to evaluate the models in-depth, especially those with extremely high correlations. In this study, we benchmark the effectiveness and efficiency of top-ranked DL models. We quantify their robustness and inference times to validate their usefulness in real-time applications. Methods We benchmarked five DL models, which are the top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT. We evaluated a random forest model as an additional baseline. For each model, we repeated the experiment 10 times, using the official training and testing sets. We reported 95% CI of the Wilcoxon rank-sum test on the average Pearson correlation (official evaluation metric) and running time. We further evaluated Spearman correlation, R², and mean squared error as additional measures. Results Using only the official training set, all models obtained highly effective results. BioSentVec and BioBERT achieved the highest average Pearson correlations (0.8497 and 0.8481, respectively). 
BioSentVec also had the highest results in 3 of 4 effectiveness measures, followed by BioBERT. However, their robustness to sentence pairs of different similarity levels varies significantly. A particular observation is that BERT models made the most errors (a mean squared error of over 2.5) on highly similar sentence pairs. They cannot capture highly similar sentence pairs effectively when they have different negation terms or word orders. In addition, time efficiency is dramatically different from the effectiveness results. On average, the BERT models were approximately 20 times and 50 times slower than the Convolutional Neural Network and BioSentVec models, respectively. This results in challenges for real-time applications. Conclusions Despite the excitement of further improving Pearson correlations in this data set, our results highlight that evaluations of the effectiveness and efficiency of STS models are critical. In future, we suggest more evaluations on the generalization capability and user-level testing of the models. We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence-relatedness.


2005 ◽  
Vol 5 ◽  
pp. 89-97 ◽  
Author(s):  
P. Krause ◽  
D. P. Boyle ◽  
F. Bäse

Abstract. The evaluation of hydrologic model behaviour and performance is commonly made and reported through comparisons of simulated and observed variables. Frequently, comparisons are made between simulated and measured streamflow at the catchment outlet. In distributed hydrological modelling approaches, additional comparisons of simulated and observed measurements for multi-response validation may be integrated into the evaluation procedure to assess overall modelling performance. In both approaches, single and multi-response, efficiency criteria are commonly used by hydrologists to provide an objective assessment of the "closeness" of the simulated behaviour to the observed measurements. While there are a few efficiency criteria such as the Nash-Sutcliffe efficiency, coefficient of determination, and index of agreement that are frequently used in hydrologic modeling studies and reported in the literature, there are a large number of other efficiency criteria to choose from. The selection and use of specific efficiency criteria and the interpretation of the results can be a challenge for even the most experienced hydrologist since each criterion may place different emphasis on different types of simulated and observed behaviours. In this paper, the utility of several efficiency criteria is investigated in three examples using a simple observed streamflow hydrograph.
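To make the point about differing emphasis concrete, the sketch below implements the Nash-Sutcliffe efficiency with an adjustable exponent `j` (`j=2` is the standard squared-error form; `j=1` uses absolute errors). The function name, the exponent parameter, and the toy hydrograph are assumptions made for illustration; the abstract does not say which variants the paper actually investigates.

```python
import numpy as np


def nash_sutcliffe(observed, simulated, j=2):
    """Nash-Sutcliffe efficiency; j=2 is the standard form, j=1 an absolute-error variant."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    model_error = np.sum(np.abs(obs - sim) ** j)
    obs_spread = np.sum(np.abs(obs - obs.mean()) ** j)
    return 1.0 - model_error / obs_spread


# Hypothetical hydrograph with a single flood peak.
observed = [1.0, 1.0, 2.0, 10.0, 3.0, 1.0]
peak_miss = [1.0, 1.0, 2.0, 7.0, 3.0, 1.0]   # underestimates the peak by 3 units
low_miss = [2.0, 2.0, 2.0, 10.0, 3.0, 2.0]   # overestimates three low flows by 1 unit

for label, simulated in [("peak_miss", peak_miss), ("low_miss", low_miss)]:
    print(label, nash_sutcliffe(observed, simulated, j=2),
          nash_sutcliffe(observed, simulated, j=1))
```

Both simulations carry the same total absolute error, so they score identically with `j=1`. With the standard `j=2`, the single peak error is penalized three times as heavily as the three low-flow errors, which illustrates how a squared-error criterion emphasizes high flows.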

