Reduced Perplexity

2020 ◽  
pp. 325-346
Author(s):  
Kenric P. Nelson

This chapter introduces a simple, intuitive approach to the assessment of probabilistic inferences. The Shannon information metrics are translated to the probability domain. The translation shows that the negative logarithmic score and the geometric mean are equivalent measures of the accuracy of a probabilistic inference. The geometric mean of forecasted probabilities is thus a measure of forecast accuracy and represents the central tendency of the forecasts. The reciprocal of the geometric mean is referred to as the perplexity and defines the number of independent choices needed to resolve the uncertainty. The assessment method introduced in this chapter is intended to reduce the ‘qualitative’ perplexity relative to the potpourri of scoring rules currently used to evaluate machine learning and other probabilistic algorithms. Utilization of this assessment will provide insight into designing algorithms with reduced ‘quantitative’ perplexity and thus improved accuracy of probabilistic forecasts. The translation of information metrics to the probability domain incorporates the generalized entropy functions developed by Rényi and Tsallis. Both generalizations translate to the weighted generalized mean. The generalized mean of probabilistic forecasts forms a spectrum of performance metrics referred to as a Risk Profile. The arithmetic mean is used to measure the decisiveness, while the –2/3 mean is used to measure the robustness.
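The metrics described above all reduce to the weighted generalized mean evaluated at different orders. A minimal sketch in Python (the forecast values are illustrative, not from the chapter):

```python
import numpy as np

def weighted_generalized_mean(p, r, w=None):
    """Weighted generalized (power) mean of order r; r = 0 gives the geometric mean."""
    p = np.asarray(p, dtype=float)
    w = np.full(len(p), 1.0 / len(p)) if w is None else np.asarray(w, dtype=float)
    if r == 0:
        return float(np.exp(np.sum(w * np.log(p))))
    return float(np.sum(w * p ** r) ** (1.0 / r))

# Probabilities a forecaster assigned to the events that actually occurred
forecasts = [0.9, 0.7, 0.5, 0.8]

accuracy   = weighted_generalized_mean(forecasts, 0)       # geometric mean
perplexity = 1.0 / accuracy                                # independent choices
decisive   = weighted_generalized_mean(forecasts, 1)       # arithmetic mean
robust     = weighted_generalized_mean(forecasts, -2 / 3)  # robustness metric
```

By the power-mean inequality the –2/3 mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the three orders trace out a Risk Profile from robustness to decisiveness.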

2021 ◽  
Vol 29 (3) ◽  
Author(s):  
Péter Orosz ◽  
Tamás Tóthfalusi

Abstract The increasing number of Voice over LTE deployments and IP-based voice services raises the demand for user-centric service quality monitoring. This domain’s leading challenge is measuring user experience quality reliably without performing subjective assessments or applying the standard full-reference objective models. While the former is time- and resource-consuming and primarily executed ad hoc, the latter depends upon a reference source and processes the voice payload, which may compromise user privacy. This paper presents a packet-level measurement method (introducing a novel metric set) to objectively assess network and service quality online. It is accomplished without inspecting the voice payload or needing a reference voice sample. The proposal has three contributions: (i) our method focuses on the timeliness of the media traffic. It introduces new performance metrics that describe and measure the service’s time-domain behavior from the voice application viewpoint. (ii) Based on the proposed metrics, we also present a no-reference Quality of Experience (QoE) estimation model. (iii) Additionally, we propose a new method to identify the pace of the speech (slow or dynamic) as long as voice activity detection (VAD) is present between the endpoints. This identification helps the introduced quality model estimate the perceived quality with higher accuracy. The performance of the proposed model is validated against a full-reference voice quality estimation model called AQuA, using real VoIP traffic (originated from assorted voice samples) in controlled transmission scenarios.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Tawfiq Hasanin ◽  
Taghi M. Khoshgoftaar ◽  
Joffrey L. Leevy ◽  
Richard A. Bauder

Abstract Severe class imbalance between majority and minority classes in Big Data can bias the predictive performance of Machine Learning algorithms toward the majority (negative) class. Where the minority (positive) class holds greater value than the majority (negative) class and the occurrence of false negatives incurs a greater penalty than false positives, the bias may lead to adverse consequences. Our paper incorporates two case studies, each utilizing three learners, six sampling approaches, two performance metrics, and five sampled distribution ratios, to uniquely investigate the effect of severe class imbalance on Big Data analytics. The learners (Gradient-Boosted Trees, Logistic Regression, Random Forest) were implemented within the Apache Spark framework. The first case study is based on a Medicare fraud detection dataset. The second case study, unlike the first, includes training data from one source (SlowlorisBig Dataset) and test data from a separate source (POST dataset). Results from the Medicare case study are not conclusive regarding the best sampling approach using Area Under the Receiver Operating Characteristic Curve and Geometric Mean performance metrics. However, it should be noted that the Random Undersampling approach performs adequately in the first case study. For the SlowlorisBig case study, Random Undersampling convincingly outperforms the other five sampling approaches (Random Oversampling, Synthetic Minority Over-sampling TEchnique, SMOTE-borderline1, SMOTE-borderline2, ADAptive SYNthetic) when measuring performance with Area Under the Receiver Operating Characteristic Curve and Geometric Mean metrics. Based on its classification performance in both case studies, Random Undersampling is the best choice, as it results in models trained on a significantly smaller number of samples, thus reducing computational burden and training time.
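Random Undersampling discards majority-class samples until a target class ratio is reached. A simplified, framework-free sketch (the paper’s experiments ran on Apache Spark; labels, data, and the `ratio` parameter here are illustrative):

```python
import random

def random_undersample(X, y, ratio=1.0, seed=42):
    """Randomly discard majority-class (label 0) samples until
    n_majority == ratio * n_minority (label 1)."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == 1]
    majority = [(x, t) for x, t in zip(X, y) if t == 0]
    keep = rng.sample(majority, min(len(majority), int(ratio * len(minority))))
    data = minority + keep
    rng.shuffle(data)
    Xs, ys = zip(*data)
    return list(Xs), list(ys)

X = list(range(100))
y = [1] * 5 + [0] * 95          # severe 5:95 imbalance
Xs, ys = random_undersample(X, y, ratio=1.0)
# ratio=1.0 yields a balanced set: all 5 positives, 5 sampled negatives
```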


2020 ◽  
Author(s):  
Raphael Schneider ◽  
Hans Jørgen Henriksen ◽  
Simon Stisen

The Continuous Ranked Probability Score (CRPS) is a popular evaluation tool for probabilistic forecasts. We suggest using it, outside its original scope, as an objective function in the calibration of large-scale groundwater models, due to its robustness to large residuals in the calibration data.

Groundwater models commonly require their parameters to be estimated in an optimization where some objective function measuring the model’s performance is to be minimized. Many performance metrics are squared error-based, which are known to be sensitive to large values or outliers. Consequently, an optimization algorithm using squared error-based metrics will focus on reducing the very largest residuals of the model. In many cases, for example when working with large-scale groundwater models in combination with calibration data from large datasets of groundwater heads with varying and unknown quality, there are two issues with that focus on the largest residuals: such outliers are often related to i) observational uncertainty or ii) model structural uncertainty and model scale. Hence, fitting groundwater models to such deficiencies can be undesired, and calibration often results in parameter compensation for such deficiencies.

Therefore, we suggest the use of a CRPS-based objective function that is less sensitive to (the few) large residuals, and instead is more sensitive to fitting the majority of observations with least bias. We apply the novel CRPS-based objective function to the calibration of large-scale coupled surface-groundwater models and compare to conventional squared error-based objective functions. These calibration tests show that the CRPS-based objective function successfully limits the influence of the largest residuals and reduces overall bias. Moreover, it allows for better identification of areas where the model fails to simulate groundwater heads appropriately (e.g. due to model structural errors), that is, where model structure should be investigated.

Many real-world large-scale hydrological models face similar optimization problems related to uncertain model structures and large, uncertain calibration datasets where observation uncertainty is hard to quantify. The CRPS-based objective function is an attempt to practically address the shortcomings of squared error minimization in model optimization, and is expected to also be of relevance outside our context of groundwater models.
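A sample-based estimate of the CRPS for an ensemble forecast illustrates the robustness argument: an outlier residual is penalized roughly linearly rather than quadratically. This is a generic sketch of the score itself, not the authors’ calibration code, and the ensemble values are hypothetical:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Sample-based CRPS estimate for an ensemble forecast:
    E|X - y| - 0.5 * E|X - X'|  (lower is better)."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))                      # spread around observation
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # internal ensemble spread
    return float(term1 - term2)

ens = [1.0, 1.2, 0.8, 1.1]
near = crps_ensemble(ens, 1.0)    # observation inside the ensemble -> small score
far  = crps_ensemble(ens, 10.0)   # large residual penalized linearly, not squared
```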


2020 ◽  
Author(s):  
Alev Mutlu ◽  
Furkan Goz

Abstract Landslide susceptibility assessment is the problem of determining the likelihood of a landslide occurrence in a particular area with respect to the geographical and morphological properties of the area. This paper presents a hybrid method, namely SkySlide, that incorporates clustering, the skyline operator, classification, and the majority voting principle for region-scale landslide susceptibility assessment. Clustering and the skyline operator are utilized to model landslides, while classification and the majority voting principle are utilized to assess landslide susceptibility. The contribution of the study is twofold. First, the proposed method requires only the properties of landslide occurrences to model landslides. Second, the proposed method is evaluated on imbalanced data, and the experimental results include performance metrics for imbalanced data. Experiments conducted on two real-life datasets show that clustering greatly improves the performance of SkySlide. Experiments further demonstrate that SkySlide achieves higher class balance accuracy, Matthews correlation coefficient, geometric mean, and bookmaker informedness scores compared with the most commonly used methods for landslide susceptibility assessment such as support vector machines, logistic regression, and decision trees.
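The skyline operator that SkySlide builds on returns the Pareto-optimal subset of a dataset: the points not dominated by any other point. A small sketch (the two attributes and the smaller-is-better preference are hypothetical, not the paper’s actual landslide descriptors):

```python
def skyline(points):
    """Return the Pareto-optimal subset of `points`.
    p dominates q if p <= q in every dimension and p < q in at least one
    (assuming smaller values are preferred)."""
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical two-attribute records
pts = [(1, 4), (2, 2), (3, 3), (4, 1), (3, 1)]
result = skyline(pts)   # (3, 3) and (4, 1) are dominated and drop out
```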


2020 ◽  
Vol 10 ◽  
pp. 38
Author(s):  
Jordan A. Guerra ◽  
Sophie A. Murray ◽  
D. Shaun Bloomfield ◽  
Peter T. Gallagher

One essential component of operational space weather forecasting is the prediction of solar flares. With a multitude of flare forecasting methods now available online, it is still unclear which of these methods performs best, and none are substantially better than climatological forecasts. Space weather researchers are increasingly looking towards methods used by the terrestrial weather community to improve current forecasting techniques. Ensemble forecasting has been used in numerical weather prediction for many years as a way to combine different predictions in order to obtain a more accurate result. Here we construct ensemble forecasts for major solar flares by linearly combining the full-disk probabilistic forecasts from a group of operational forecasting methods (ASAP, ASSA, MAG4, MOSWOC, NOAA, and MCSTAT). Forecasts from each method are weighted by a factor that accounts for the method’s ability to predict previous events, and several performance metrics (both probabilistic and categorical) are considered. It is found that most ensembles achieve a better skill metric (between 5% and 15%) than any of the members alone. Moreover, over 90% of ensembles perform better (as measured by forecast attributes) than a simple equal-weights average. Finally, ensemble uncertainties are highly dependent on the internal metric being optimized and they are estimated to be less than 20% for probabilities greater than 0.2. This simple multi-model, linear ensemble technique can provide operational space weather centres with the basis for constructing a versatile ensemble forecasting system – an improved starting point to their forecasts that can be tailored to different end-user needs.
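The linear weighted combination at the heart of such an ensemble is simple to sketch. The member probabilities and skill weights below are hypothetical; in the paper the weights are optimized against the performance metrics of previous events:

```python
import numpy as np

def ensemble_forecast(forecasts, weights):
    """Linear combination of member probabilities; weights reflect each
    member's past skill and are normalized to sum to one."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.clip(np.dot(w, forecasts), 0.0, 1.0))

# Hypothetical full-disk flare probabilities from four methods
members = [0.10, 0.25, 0.40, 0.15]
skill   = [0.6, 0.9, 0.3, 0.7]   # e.g. derived from past verification scores
prob = ensemble_forecast(members, skill)
```

An equal-weights average is the special case `skill = [1, 1, 1, 1]`, which the paper uses as the baseline that most skill-weighted ensembles beat.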


Axioms ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 144
Author(s):  
Radu Iordanescu ◽  
Florin Felix Nichita ◽  
Ovidiu Pasarescu

The main concepts in this paper are means and Euler-type formulas; the generalized mean, which incorporates the harmonic mean, the geometric mean, the arithmetic mean, and the quadratic mean, can be further generalized. Results on Euler’s formula, the (modified) Yang–Baxter equation, coalgebra structures, and non-associative structures are also included in the current paper.
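For reference, the generalized mean and the named special cases it incorporates can be written as:

```latex
M_r(a_1,\dots,a_n) = \left( \frac{1}{n} \sum_{i=1}^{n} a_i^{\,r} \right)^{1/r},
\qquad
\lim_{r \to 0} M_r = \left( \prod_{i=1}^{n} a_i \right)^{1/n},
```

with $M_{-1}$ the harmonic mean, $M_0$ (the limit above) the geometric mean, $M_1$ the arithmetic mean, and $M_2$ the quadratic mean.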


Author(s):  
Ji Hoon Seo ◽  
Hyun Woo Jeon ◽  
Joung Sook Choi ◽  
Jong-Ryeul Sohn

Indoor microbiological air quality, including airborne bacteria and fungi, is associated with hospital-acquired infections (HAIs) and is emerging as an environmental issue in hospital environments. Many studies have been carried out using culture-based methods to evaluate bioaerosol levels. However, conventional biomonitoring requires a laborious process and specialists, and cannot provide data quickly. In order to assess the concentration of bioaerosol in real time, particles were subdivided according to aerodynamic diameter for surrogate measurement. Particle number concentration (PNC) and meteorological conditions, selected by analyzing their correlation with bioaerosol, were included in the prediction model, and the forecast accuracy of each model was evaluated by the mean absolute percentage error (MAPE). The prediction model for airborne bacteria demonstrated highly accurate prediction (R2 = 0.804, MAPE = 8.5%) from PNC1-3, PNC3-5, and PNC5-10 as independent variables. Meanwhile, the fungal prediction model showed reasonable, but weak, prediction results (R2 = 0.489, MAPE = 42.5%) with PNC3-5, PNC5-10, PNC>10, and relative humidity. As a result of external verification, even when the model was applied in a similar hospital environment, the bioaerosol concentration could be sufficiently predicted. The prediction model constructed in this study can be used as a pre-assessment method for monitoring microbial contamination in indoor environments.
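The MAPE used to score the models is straightforward to compute. A minimal sketch with hypothetical concentration values (not the study’s data):

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical measured vs. predicted airborne-bacteria concentrations (CFU/m^3)
measured  = [200.0, 150.0, 300.0, 250.0]
predicted = [190.0, 165.0, 280.0, 260.0]
error_pct = mape(measured, predicted)
```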


1973 ◽  
Vol 14 (2) ◽  
pp. 123-127
Author(s):  
P. H. Diananda

Let $\{a_n\}$ be a sequence of non-negative real numbers. Suppose that

$$M_{r,n} = \left( \frac{1}{n} \sum_{i=1}^{n} a_i^{\,r} \right)^{1/r} \text{ for } r \neq 0, \qquad M_{0,n} = \left( \prod_{i=1}^{n} a_i \right)^{1/n}.$$

Then $M_{1,n}$ is the arithmetic mean, $M_{0,n}$ the geometric mean, and $M_{r,n}$ the generalized mean of order $r$, of $a_1, a_2, \dots, a_n$. By a result of Everitt [1] and McLaughlin and Metcalf [5], $\{n(M_{r,n} - M_{s,n})\}$, where $r \geq 1 \geq s$, is a monotonic increasing sequence. It follows that this sequence tends to a finite or an infinite limit as $n \to \infty$. Everitt [2, 3] found a necessary and sufficient condition for the finiteness of this limit in the cases $r, s = 1, 0$ and $r \geq 1 > s > 0$. His results are included in the following theorem.
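The monotonicity result is easy to check numerically. A small sketch for the case r = 2, s = 0, with arbitrary positive sample values:

```python
def gen_mean(a, r):
    """Generalized mean of order r; r = 0 gives the geometric mean."""
    n = len(a)
    if r == 0:
        prod = 1.0
        for x in a:
            prod *= x
        return prod ** (1.0 / n)
    return (sum(x ** r for x in a) / n) ** (1.0 / r)

# For r >= 1 >= s, the sequence n * (M_{r,n} - M_{s,n}) should be
# monotonically increasing in n (Everitt; McLaughlin and Metcalf)
a = [1.0, 4.0, 2.0, 8.0, 3.0, 5.0]
seq = [n * (gen_mean(a[:n], 2) - gen_mean(a[:n], 0))
       for n in range(1, len(a) + 1)]
```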


Author(s):  
Gregg Willcox

The aggregation of individual personality assessments to predict team performance is widely accepted in management theory but has significant limitations: the isolated nature of individual personality surveys fails to capture much of the team dynamics that drive real-world team performance. Artificial Swarm Intelligence (ASI)—a technology that enables networked teams to think together in real-time and answer questions as a unified system—promises a solution to these limitations by enabling teams to collectively complete a personality assessment, whereby the team uses ASI to converge upon answers that best represent the group’s disposition. In the present study, the group personality of 94 small teams was assessed by having teams take a standard Big Five Inventory (BFI) assessment both as individuals, and as a real-time system enabled by an ASI technology known as Swarm AI. The predictive accuracy of each personality assessment method was assessed by correlating the BFI personality traits to a range of real-world performance metrics. The results showed that assessments of personality generated using Swarm AI were far more predictive of team performance than the traditional aggregation methods, showing at least a 91.8% increase in average correlation with the measured outcome variables, and in no case showing a significant decrease in predictive performance. This suggests that Swarm AI technology may be used as a highly effective team personality assessment tool that more accurately predicts future team performance than traditional survey approaches.

