validation metrics
Recently Published Documents

Total documents: 73 (five years: 25)
H-index: 14 (five years: 1)

2021 · Vol 2021 · pp. 1-14
Author(s): Ivandro Ortet Lopes, Deqing Zou, Francis A Ruambo, Saeed Akbar, Bin Yuan

Distributed Denial of Service (DDoS) attacks are a predominant threat to the availability of online services because of their scale and frequency. However, developing an effective security mechanism to protect a network from this threat is a major challenge, because DDoS attacks use a wide variety of approaches and possible combinations of them. Furthermore, most existing deep learning- (DL-) based models impose a high processing overhead or may not perform well in detecting recently reported DDoS attacks, since they are trained and evaluated on outdated datasets. To address these issues, we propose CyDDoS, an integrated intrusion detection system (IDS) framework that combines an ensemble of feature engineering algorithms with a deep neural network. The ensemble feature selection is based on five machine learning classifiers used to identify and extract the most relevant features used by the predictive model. This approach improves model performance by processing only a subset of relevant features while reducing the computational requirements. We evaluate the model on CICDDoS2019, a modern and realistic dataset consisting of normal and DDoS attack traffic. The evaluation considers validation metrics such as accuracy, precision, F1-score, and recall to demonstrate the effectiveness of the proposed framework against state-of-the-art IDSs.
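As a rough sketch of the ensemble feature selection idea described in this abstract, the snippet below lets several scikit-learn classifiers vote for their top-ranked features and keeps those selected by a majority. The choice of classifiers, the top-k ranking rule, the majority threshold, and the synthetic data are illustrative assumptions, not the CyDDoS implementation, and the downstream deep neural network is omitted.

```python
# Hypothetical majority-vote feature selection across several classifiers,
# in the spirit of the ensemble feature engineering described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a traffic dataset (assumption, not CICDDoS2019).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)

def top_k_features(model, X, y, k=10):
    """Return the indices of the k features the fitted model ranks highest."""
    model.fit(X, y)
    scores = getattr(model, "feature_importances_", None)
    if scores is None:                      # linear models expose coefficients
        scores = np.abs(model.coef_).ravel()
    return set(np.argsort(scores)[::-1][:k])

voters = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    ExtraTreesClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),
    DecisionTreeClassifier(random_state=0),
    LogisticRegression(max_iter=1000),
]

votes = np.zeros(X.shape[1], dtype=int)
for clf in voters:
    for idx in top_k_features(clf, X, y):
        votes[idx] += 1

selected = np.where(votes >= 3)[0]          # keep features chosen by a majority
print("features kept by majority vote:", selected)
```

Only the selected feature subset would then be passed to the predictive model, which is where the reduced computational requirement comes from.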


2021
Author(s): Zhe Wang, Ardan Patwardhan, Gerard J Kleywegt

The Electron Microscopy Data Bank (EMDB) is the central archive of the electron cryo-microscopy (cryo-EM) community for storing and disseminating volume maps and tomograms. With input from the community, EMDB has developed new resources for the validation of cryo-EM structures, focussing on the quality of the volume data alone and on the quality of the fit to the volume data of any models, themselves archived in the Protein Data Bank (PDB). Based on recommendations from community experts, the validation resources are organized in a three-tiered system. Tier 1 covers an extensive and evolving set of validation metrics, including tried-and-tested as well as more experimental ones, which are calculated for all EMDB entries and presented in the Validation Analysis (VA) web resource. This system is particularly useful for cryo-EM experts, both to validate individual structures and to assess the utility of new validation metrics. Tier 2 comprises a subset of the validation metrics covered by the VA resource that have been subjected to extensive testing and are considered useful for specialists as well as non-specialists. These metrics are presented on the entry-specific web pages for the entire archive on the EMDB website. As more experience is gained with the metrics included in the VA resource, it is expected that consensus will emerge in the community regarding a subset that is suitable for inclusion in the tier 2 system. Tier 3, finally, consists of the validation reports and servers produced by the Worldwide Protein Data Bank (wwPDB) Consortium. Successful metrics from tier 2 will be proposed for inclusion in the wwPDB validation pipeline and reports. We describe the details of the new resource, with an emphasis on the tier 1 system. The output of all three tiers is publicly available, either through the EMDB website (tiers 1 and 2) or through the wwPDB ftp sites (tier 3), although the content of all three will evolve over time (fastest for tier 1 and slowest for tier 3). It is our hope that these validation resources will help the cryo-EM community to gain a better understanding of the quality of cryo-EM structures in EMDB and PDB, and of the best ways to assess that quality.


2021 · Vol 10 (11) · pp. 735
Author(s): Lih Wei Yeow, Raymond Low, Yu Xiang Tan, Lynette Cheah

Point-of-interest (POI) data from map sources are increasingly used in a wide range of applications, including real estate, land use, and transport planning. However, uncertainties in data quality arise because some of these data are crowdsourced and because proprietary validation workflows lack transparency. Comparing data quality between POI sources without standardized validation metrics is a challenge. This study reviews and implements the available POI validation methods, working towards identifying a set of metrics that is applicable across datasets. Twenty-three validation methods were found and categorized. Most methods evaluated positional accuracy, while logical consistency and usability were the least represented. A subset of nine methods was implemented to assess four real-world POI datasets extracted for a highly urbanized neighborhood in Singapore. The datasets were found to have poor completeness, with errors of commission and omission, although spatial errors were reasonably low (<60 m). Thematic accuracy in names and place types varied. The move towards standardized validation metrics depends on factors such as data availability for intrinsic or extrinsic methods, varying levels of detail across POI datasets, the influence of matching procedures, and the intended application of POI data.
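To make the completeness and positional-accuracy notions concrete, here is a minimal sketch that matches two tiny POI lists by name, counts errors of omission and commission, and reports the haversine distance between matched points. The example records, the name-based matching rule, and the coordinates are invented for illustration and do not reproduce the study's matching procedure.

```python
# Illustrative sketch (not the study's implementation): matching two small POI
# lists by name and computing completeness and positional error.
from math import radians, sin, cos, asin, sqrt

reference = {          # "ground truth" POIs: name -> (lat, lon); invented
    "Cafe Alpha": (1.3000, 103.8000),
    "Museum Beta": (1.3012, 103.8021),
    "Clinic Gamma": (1.2995, 103.8010),
}
candidate = {          # POIs from the dataset under validation; invented
    "Cafe Alpha": (1.3001, 103.8003),
    "Museum Beta": (1.3035, 103.8022),
    "Shop Delta": (1.2990, 103.8030),
}

def haversine_m(p, q):
    """Great-circle distance between two (lat, lon) points in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

matched = set(reference) & set(candidate)
omission = set(reference) - set(candidate)     # missing from the candidate set
commission = set(candidate) - set(reference)   # extra records not in the reference

completeness = len(matched) / len(reference)
errors = {name: haversine_m(reference[name], candidate[name]) for name in matched}

print(f"completeness: {completeness:.2f}")
print("errors of omission:", omission, "| errors of commission:", commission)
for name, d in errors.items():
    print(f"positional error for {name}: {d:.1f} m")
```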


2021 · Vol 11 (1)
Author(s): Roberto Coscarelli, Giulio Nils Caroletti, Magnus Joelsson, Erik Engström, Tommaso Caloiero

Abstract In order to correctly detect climate signals and discard possible instrumentation errors, establishing coherent data records has become increasingly relevant. However, since real measurements can be inhomogeneous, they cannot be used directly to assess homogenization techniques, and the performance of these techniques must be studied on homogeneous datasets subjected to controlled, artificial inhomogeneities. In this paper, considering two European temperature networks over the 1950–2005 period, up to 7 artificial breaks and an average of 107 missing values per station were introduced, in order to determine whether the mean square error, the absolute bias and the factor of exceedance can be meaningfully used to identify the best-performing homogenization technique. Three techniques were used: ACMANT and two versions of HOMER, the standard automated setup mode and a manual setup. Results showed that the HOMER techniques performed better with regard to the factor of exceedance, while ACMANT was best with regard to absolute error and root mean square error. Regardless of the technique used, it was also established that homogenization quality was meaningfully anti-correlated with the number of breaks. On the other hand, as missing data are almost always replaced in the two HOMER techniques, only the performance of ACMANT is significantly, negatively affected by the amount of missing data.
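A minimal sketch of two of the error metrics mentioned above, root mean square error and absolute bias, computed between a synthetic "clean" temperature series and an imperfect homogenized reconstruction. The series are invented, and the factor of exceedance is omitted here because its exact definition is study-specific.

```python
# Toy error metrics for ranking homogenization output against a known series.
import numpy as np

rng = np.random.default_rng(42)
true_series = 10 + 0.02 * np.arange(600) + rng.normal(0, 1.0, 600)   # synthetic "clean" record
homogenized = true_series + rng.normal(0.1, 0.3, 600)                # imperfect reconstruction

rmse = np.sqrt(np.mean((homogenized - true_series) ** 2))
abs_bias = np.abs(np.mean(homogenized - true_series))

print(f"RMSE: {rmse:.3f} °C, absolute bias: {abs_bias:.3f} °C")
```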


Author(s): Mwangi Muhindi George, Geoffrey Mariga Wambugu, Aaron Mogeni Oirere

This paper presents an Extended Client Based Technique (ECBT) that classifies emails using a Bayesian classifier, attaining defense in depth by performing textual analysis on email messages and on attachment extensions to detect and flag snooping emails. The technique was implemented using Python 3.6 in a Jupyter notebook. An experimental research method on a personal computer was used to validate the developed technique using different metrics. The validation results produced a high, acceptable percentage rate on the four calculated validation metrics, indicating that the technique was valid. The cosine similarity between the validation labels was high, indicating strong agreement between the known and output message labels. A direction for further study is to conduct replica experiments that enhance the classification and flagging of snooped emails using an advanced classification method.
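For illustration only, the following sketch combines a scikit-learn Naive Bayes text classifier with a simple attachment-extension check, mirroring the kind of textual-plus-extension analysis the abstract describes. The toy training messages, labels, and suspicious-extension list are assumptions; this is not the ECBT implementation.

```python
# Hedged sketch: Naive Bayes on toy email bodies plus an attachment check.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_msgs = [                                   # invented examples
    "quarterly report attached for your review",
    "please find the meeting agenda enclosed",
    "your account credentials must be verified immediately",
    "click this link to confirm your password now",
]
train_labels = ["benign", "benign", "snooping", "snooping"]

vectorizer = CountVectorizer()
clf = MultinomialNB()
clf.fit(vectorizer.fit_transform(train_msgs), train_labels)

SUSPICIOUS_EXTENSIONS = (".exe", ".js", ".scr")  # assumed example list

def flag_email(body, attachments):
    """Label an email from its text, then escalate on risky attachment extensions."""
    label = clf.predict(vectorizer.transform([body]))[0]
    if any(name.lower().endswith(SUSPICIOUS_EXTENSIONS) for name in attachments):
        label = "snooping"
    return label

print(flag_email("verify your password at this link", ["invoice.exe"]))
```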


2021 · Vol 13 (9) · pp. 1747
Author(s): Shanlei Sun, Jiazhi Wang, Wanrong Shi, Rongfan Chai, Guojie Wang

Assessing the capacity of satellite-based precipitation products to detect precipitation and its linear trends is fundamental for accurately characterizing precipitation and its changes, especially in regions with scarce or even no observations. In this study, we used daily gauge observations across the Huai River Basin (HRB) during 1983–2012 and four validation metrics to evaluate the capacity of the Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) to detect extreme precipitation and its linear trends. The PERSIANN-CDR captured well the climatological characteristics of the amount-based (PRCPTOT, R85p, R95p, and R99p), duration-based (CDD and CWD), and frequency-based indices (R10mm, R20mm, and Rnnmm), with moderate performance for the intensity-based indices (Rx1day, Rx5day, and SDII). Based on the different validation metrics, the capacity of the PERSIANN-CDR to detect extreme precipitation varied spatially, and the metric-based performance differed among these indices. Furthermore, evaluation of the PERSIANN-CDR linear trends indicated that this product has very limited, and in places no, capacity to represent extreme precipitation changes across the HRB. In brief, this study provides a useful reference for PERSIANN-CDR developers seeking to improve product accuracy from the perspective of extreme precipitation, and for potential users in the HRB.
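The abstract does not name the four validation metrics, so the sketch below computes four metrics commonly used in satellite precipitation evaluation (correlation coefficient, relative bias, probability of detection, and false alarm ratio) on synthetic daily gauge/satellite pairs, purely to illustrate how such point-wise validation is typically set up. The wet-day threshold and the synthetic data are assumptions.

```python
# Commonly used validation metrics on synthetic gauge/satellite daily rainfall.
import numpy as np

rng = np.random.default_rng(7)
gauge = rng.gamma(shape=0.6, scale=8.0, size=365)          # synthetic daily gauge (mm)
satellite = gauge * rng.normal(1.0, 0.3, 365).clip(0)      # noisy satellite estimate

wet = 1.0  # mm/day threshold separating wet from dry days (assumed)
hits   = np.sum((gauge >= wet) & (satellite >= wet))
misses = np.sum((gauge >= wet) & (satellite <  wet))
false  = np.sum((gauge <  wet) & (satellite >= wet))

cc  = np.corrcoef(gauge, satellite)[0, 1]                  # correlation coefficient
rb  = (satellite.sum() - gauge.sum()) / gauge.sum()        # relative bias
pod = hits / (hits + misses)                               # probability of detection
far = false / (hits + false)                               # false alarm ratio

print(f"CC={cc:.2f}  relative bias={rb:+.2%}  POD={pod:.2f}  FAR={far:.2f}")
```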


Author(s): Nathan W. Porter, Kathryn A. Maupin, Laura P. Swiler, Vincent A. Mousseau

Abstract The modern scientific process often involves the development of a predictive computational model. To improve its accuracy, a computational model can be calibrated to a set of experimental data. A variety of validation metrics can be used to quantify this process. Some of these metrics have direct physical interpretations and a history of use, while others, especially those for probabilistic data, are more difficult to interpret. In this work, a variety of validation metrics are used to quantify the accuracy of different calibration methods. Frequentist and Bayesian perspectives are used with both fixed-effects and mixed-effects statistical models. Through a quantitative comparison of the resulting distributions, the most accurate calibration method can be selected. Two examples are included which compare the results of various validation metrics for different calibration methods. It is quantitatively shown that, in the presence of significant laboratory biases, a fixed-effects calibration is significantly less accurate than a mixed-effects calibration. This is because the mixed-effects statistical model better characterizes the underlying parameter distributions than the fixed-effects model. The results suggest that validation metrics can be used to select the most accurate calibration model for a particular empirical model with corresponding experimental data.
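A toy illustration, not the authors' statistical models or validation metrics: data from three "laboratories" with different systematic biases are fitted once with a single shared intercept (fixed-effects style) and once with a per-laboratory intercept (a crude stand-in for a mixed-effects model), and the root mean square error against the bias-free response is used as the validation metric.

```python
# Toy fixed-effects vs per-lab-intercept comparison under laboratory biases.
import numpy as np

rng = np.random.default_rng(1)
true_slope = 2.0
lab_bias = {"A": 0.5, "B": 1.5, "C": 3.0}   # invented systematic per-lab offsets

x_parts, y_parts, labs = [], [], []
for lab, bias in lab_bias.items():
    xi = rng.uniform(0, 10, 50)
    x_parts.append(xi)
    y_parts.append(true_slope * xi + bias + rng.normal(0, 0.5, 50))
    labs += [lab] * 50
x, y, labs = np.concatenate(x_parts), np.concatenate(y_parts), np.array(labs)

# "Fixed effects": one slope and one intercept shared by all laboratories.
slope_fe, intercept_fe = np.linalg.lstsq(
    np.column_stack([x, np.ones_like(x)]), y, rcond=None)[0]

# Crude mixed-effects-style fit: shared slope, one intercept per laboratory.
dummies = np.column_stack([(labs == lab).astype(float) for lab in lab_bias])
slope_me = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)[0][0]

# Validation metric: RMSE of each fit against the bias-free "true" response.
truth = true_slope * x
rmse_fixed = np.sqrt(np.mean((slope_fe * x + intercept_fe - truth) ** 2))
rmse_mixed = np.sqrt(np.mean((slope_me * x - truth) ** 2))
print(f"fixed-effects RMSE: {rmse_fixed:.3f}  per-lab-intercept RMSE: {rmse_mixed:.3f}")
```

In this toy setup the shared intercept absorbs the average laboratory bias, so its predictions of the bias-free response score noticeably worse, which is the qualitative behaviour the abstract reports.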


2021 · Vol 13 (1)
Author(s): Agnieszka Gajewicz-Skretna, Supratik Kar, Magdalena Piotrowska, Jerzy Leszczynski

Abstract The ability to accurately predict the biological response (biological activity/property/toxicity) of a given chemical makes quantitative structure-activity/property/toxicity relationship (QSAR/QSPR/QSTR) models unique among in silico tools. In addition, experimental data for selected species can also be used as an independent variable, along with other structural and physicochemical variables, to predict the response for different species, formulating the quantitative activity–activity relationship (QAAR)/quantitative structure–activity–activity relationship (QSAAR) approach. Irrespective of the model type, the quality and reliability of the developed model need to be checked through multiple classical, stringent validation metrics. Among the validation metrics, error-based metrics are the most significant, as the basic idea of a good predictive model is to improve the quality of predictions by lowering the predicted residuals for new query compounds. Following this concept, we have checked the predictive quality of QSAR and QSAAR models employing the kernel-weighted local polynomial regression (KwLPR) approach against traditional linear and non-linear regression-based tools such as multiple linear regression (MLR) and k nearest neighbors (kNN). Five datasets that were previously modeled using linear and non-linear regression methods were considered to implement the KwLPR approach, followed by a comparison of their validation metric outcomes. For all five cases, the KwLPR-based models reported better results than the traditional approaches. The focus of the present study is not to develop a better or improved QSAR/QSAAR model over the previous ones, but to demonstrate the advantage, predictive power, and reliability of the KwLPR algorithm and to establish it as a novel, powerful cheminformatic tool. To facilitate the use of the KwLPR algorithm for QSAR/QSPR/QSTR/QSAAR modeling, the authors provide an in-house-developed KwLPR.RMD script under the open-source R programming language.
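The authors distribute KwLPR as an R script; the sketch below is an assumed Python re-statement of the core idea, kernel-weighted local (here linear) polynomial regression, compared against a global least-squares fit on a one-dimensional toy problem. The Gaussian kernel, bandwidth, and data are illustrative choices, not the published workflow.

```python
# Kernel-weighted local linear regression vs a global linear fit (toy data).
import numpy as np

rng = np.random.default_rng(3)
x_train = np.sort(rng.uniform(-3, 3, 120))
y_train = np.sin(x_train) + 0.1 * x_train ** 2 + rng.normal(0, 0.15, 120)
x_test = np.linspace(-2.5, 2.5, 50)
y_true = np.sin(x_test) + 0.1 * x_test ** 2

def kwlpr_predict(x0, xs, ys, bandwidth=0.5):
    """Fit a weighted linear model around x0 using Gaussian kernel weights."""
    w = np.exp(-0.5 * ((xs - x0) / bandwidth) ** 2)
    X = np.column_stack([np.ones_like(xs), xs - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ ys)
    return beta[0]                           # local intercept = prediction at x0

y_local = np.array([kwlpr_predict(x0, x_train, y_train) for x0 in x_test])

# Global ordinary least squares baseline (one straight line for all data).
coef = np.polyfit(x_train, y_train, 1)
y_ols = np.polyval(coef, x_test)

rmse = lambda pred: np.sqrt(np.mean((pred - y_true) ** 2))
print(f"local polynomial RMSE: {rmse(y_local):.3f}   global OLS RMSE: {rmse(y_ols):.3f}")
```

The lower error of the local fit on nonlinear data is the kind of residual-based improvement the error-based validation metrics in the abstract are meant to capture.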


Mathematics · 2021 · Vol 9 (2) · pp. 156
Author(s): Darío Ramos-López, Ana D. Maldonado

Multi-class classification in imbalanced datasets is a challenging problem. In these cases, common validation metrics (such as accuracy or recall) are often not suitable. In many of these problems, which are often real-world problems related to health, some classification errors may be tolerated whereas others must be avoided completely. Therefore, a cost-sensitive variable selection procedure for building a Bayesian network classifier is proposed. In it, a flexible validation metric (cost/loss function) encoding the impact of the different classification errors is employed, so that the model is learned to optimize the cost function specified a priori. The proposed approach was applied to forecasting an air quality index from current levels of air pollutants and climatic variables, using a highly imbalanced dataset. For this problem, the method yielded better results in the less frequent class states than approaches based on standard validation metrics. The possibility of fine-tuning the objective validation function can improve prediction quality for imbalanced data or when asymmetric misclassification costs have to be considered.
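A small sketch of the cost-sensitive idea: given posterior class probabilities (for example from a Bayesian network classifier) and an asymmetric misclassification-cost matrix, predict the class with the lowest expected cost rather than the most probable one. The three-class cost values and probabilities are invented; the study's actual cost function is not specified in the abstract.

```python
# Minimum-expected-cost decision rule under an asymmetric cost matrix.
import numpy as np

# cost[i, j] = cost of predicting class j when the true class is i
# (e.g. missing the rare "poor air quality" class is penalised heavily).
cost = np.array([
    [0.0,  1.0, 2.0],
    [1.0,  0.0, 1.0],
    [10.0, 5.0, 0.0],
])

# Posterior class probabilities for a few instances (invented values).
proba = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.50, 0.40],
    [0.45, 0.45, 0.10],
])

expected_cost = proba @ cost                  # E[cost | predict class j]
argmax_pred = proba.argmax(axis=1)            # standard most-probable-class rule
min_cost_pred = expected_cost.argmin(axis=1)  # cost-sensitive rule

print("most-probable-class predictions:", argmax_pred)
print("minimum-expected-cost predictions:", min_cost_pred)
```

On the second and third sample rows the two rules disagree, which is exactly where a cost-sensitive objective changes behaviour on the less frequent, more costly classes.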


2021 · Vol 77 (1) · pp. 48-61
Author(s): Dorothee Liebschner, Pavel V. Afonine, Nigel W. Moriarty, Billy K. Poon, Vincent B. Chen, ...

The field of electron cryomicroscopy (cryo-EM) has advanced quickly in recent years as the result of numerous technological and methodological developments. This has led to an increase in the number of atomic structures determined using this method. Recently, several tools for the analysis of cryo-EM data and models have been developed within the Phenix software package, such as phenix.real_space_refine for the refinement of atomic models against real-space maps. Also, new validation metrics have been developed for low-resolution cryo-EM models. To understand the quality of deposited cryo-EM structures and how they might be improved, models deposited in the Protein Data Bank that have map resolutions of better than 5 Å were automatically re-refined using current versions of Phenix tools. The results are available on a publicly accessible web page (https://cci.lbl.gov/ceres). The implementation of a Cryo-EM Re-refinement System (CERES) for the improvement of models deposited in the wwPDB, and the results of the re-refinements, are described. Based on these results, contents are proposed for a 'cryo-EM Table 1', which summarizes experimental details and validation metrics in a similar way to 'Table 1' in crystallography. The consistent use of robust metrics for the evaluation of cryo-EM models and data should accompany every structure deposition and be reported in scientific publications.

