optimal resampling
Recently Published Documents


TOTAL DOCUMENTS: 12 (FIVE YEARS: 4)

H-INDEX: 4 (FIVE YEARS: 0)

Biometrika ◽  
2021 ◽  
Author(s):  
Yichao Li ◽  
Wenshuo Wang ◽  
Ke Deng ◽  
Jun S Liu

Abstract: Sequential Monte Carlo algorithms are widely accepted as a powerful computational tool for making inference about dynamical systems. A key step in sequential Monte Carlo is resampling, which steers the algorithm towards future dynamics. Several strategies have been used in practice, including multinomial resampling, residual resampling, optimal resampling, stratified resampling, and optimal transport resampling. In the one-dimensional case, we show that optimal transport resampling is equivalent to stratified resampling on the sorted particles, and both minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions. In the general d-dimensional case, if the particles are first sorted using the Hilbert curve, we show that the variance of stratified resampling is O(m^{-(1+2/d)}), an improvement over the previously known best rate O(m^{-(1+1/d)}), where m is the number of resampled particles. We show this improved rate is optimal for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost-sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these results, we show that, for dimension d > 1, the mean squared error of sequential quasi-Monte Carlo with n particles can be O(n^{-1-4/{d(d+4)}}) if Hilbert curve resampling is used and a specific low-discrepancy set is chosen. To our knowledge, this is the first known convergence rate lower than o(n^{-1}).
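As a concrete illustration of the one-dimensional result above, here is a minimal sketch of stratified resampling applied to sorted particles; the function name and the toy weighted particle cloud are illustrative, not taken from the paper.

```python
import numpy as np

def stratified_resample_sorted(particles, weights, m, rng=None):
    """Stratified resampling on sorted particles (1-D case).

    Per the abstract, in one dimension this is equivalent to optimal
    transport resampling and minimizes the resampling variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(particles)            # sort so strata follow the quantiles
    sorted_particles = particles[order]
    sorted_weights = weights[order] / weights.sum()

    # One uniform draw per stratum [i/m, (i+1)/m), i = 0..m-1.
    u = (np.arange(m) + rng.random(m)) / m

    # Invert the weighted empirical CDF of the sorted particles.
    cdf = np.cumsum(sorted_weights)
    return sorted_particles[np.searchsorted(cdf, u)]

# Toy example: resample m = 5 particles from a weighted 1-D cloud.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
w = np.exp(-0.5 * x**2)                      # unnormalized importance weights
print(stratified_resample_sorted(x, w, m=5, rng=rng))
```

Sorting before stratification is what ties the strata to the quantiles of the empirical distribution, which is the source of the variance reduction the paper quantifies.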


Author(s):  
Kathleen R Kerwin ◽  
Nathaniel D Bastian

Predicting fraud is challenging due to inherent issues in the structure of fraud data: the crimes are committed through trickery or deceit, with an ever-present moving target of changing modus operandi to circumvent human and system controls. Fraud is also a national security challenge, as criminals continually exploit the electronic financial system to defraud consumers and businesses by finding weaknesses in the system, including in audit controls. This study uses stacked generalization with meta (or super) learners: in step one, the performance of the base algorithms is improved (minimizing each algorithm's error rate to reduce its bias on the learning set), and in step two, their results are input into the meta learner, whose stacked, blended output lets the weakest algorithms learn better. A fundamental property of fraud data is that it is inherently unsystematic, and an optimal resampling methodology has not yet been identified. Building a test harness over all permutations of algorithm and sample-set pairs ensures that the complex, intrinsic data structures are thoroughly tested. A comparative analysis applying stacked generalizations to fraud data provides useful insight for finding the optimal mathematical formula for imbalanced fraud data sets, which is necessary to improve fraud detection for national security.
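The following is a minimal sketch of the two-step stacking pipeline described above, using scikit-learn's StackingClassifier; the dataset, base learners, and imbalance ratio are assumptions for illustration rather than the authors' configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data standing in for fraud records (~1% positives).
X, y = make_classification(n_samples=5000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Step one: heterogeneous base learners, fitted with cross-validation so the
# meta learner sees out-of-fold predictions (reducing bias on the learning set).
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Step two: the meta learner blends the stacked base-learner outputs.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_tr, y_tr)
print(f"held-out accuracy: {stack.score(X_te, y_te):.3f}")
```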


Author(s):  
Filippo Galli ◽  
Marco Vannucci ◽  
Valentina Colla

Classification of imbalanced datasets is a critical problem in numerous contexts. In these applications, standard methods are unable to satisfactorily detect rare patterns due to multiple factors that bias the classifiers toward the frequent class. This paper overviews a novel family of methods for resampling an imbalanced dataset in order to maximize the performance of arbitrary data-driven classifiers. The presented approaches exploit genetic algorithms (GAs) to optimize the data selection process according to a set of criteria that assess each candidate sample's suitability. A comparison of the presented techniques on a set of industrial and literature datasets demonstrates the validity of this family of approaches, which is able not only to improve the performance of a standard classifier but also to determine the optimal resampling rate automatically. Future work on the proposed approach will include the development of new criteria for assessing sample suitability.
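The sketch below illustrates the general idea of GA-driven resampling under assumed details: each chromosome is a binary mask selecting which majority-class samples to keep, and fitness is the cross-validated F1 of a classifier on the resampled set. The fitness criterion and GA settings are illustrative, not the authors' exact suitability criteria.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
maj = np.where(y == 0)[0]                    # majority-class indices
mino = np.where(y == 1)[0]                   # minority-class indices

def fitness(mask):
    """Cross-validated F1 of a classifier trained on the resampled set."""
    keep = np.concatenate([maj[mask], mino])
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    return cross_val_score(clf, X[keep], y[keep], cv=3, scoring="f1").mean()

# Initialize a population of random undersampling masks over the majority class.
pop = rng.random((20, len(maj))) < 0.5
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]          # truncation selection
    cuts = rng.integers(1, len(maj), size=10)        # one-point crossover
    children = np.array([np.concatenate([parents[i % 10][:c],
                                         parents[(i + 1) % 10][c:]])
                         for i, c in enumerate(cuts)])
    children ^= rng.random(children.shape) < 0.01    # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(f"selected resampling rate: {best.mean():.2f}")
```

The final line reports the fraction of majority samples retained, i.e., the resampling rate the GA settles on automatically rather than one fixed in advance.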


2015 ◽  
Vol 42 (6Part40) ◽  
pp. 3694-3694 ◽  
Author(s):  
S Shrestha ◽  
S Vedantham ◽  
A Karellas ◽  
R Bellazzini ◽  
G Spandre ◽  
...  

2015 ◽  
Vol 2015 ◽  
pp. 1-6
Author(s):  
Zhou Xing ◽  
Diao Xingchun ◽  
Cao Jianjun

Classifiers are often used in entity resolution to classify record pairs into matches, nonmatches, and possible matches; the performance of these classifiers directly determines the performance of entity resolution. In this paper, we develop a multiple classifier system using resampling and ensemble selection. We make full use of the characteristics of entity resolution to distinguish ambiguous instances before classification, so that the algorithm can focus on the ambiguous instances in parallel. Instead of developing an empirically optimal resampling ratio, we vary the ratio over a range to generate multiple resampled data sets. We then use the resampled data to train multiple classifiers and apply ensemble selection to choose the best classifier subset, which is also the best resampling-ratio combination. An empirical study shows our method achieves relatively high accuracy compared with other state-of-the-art multiple classifier systems.
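A minimal sketch of this pipeline under assumed details: one classifier is trained per resampling ratio, then greedy forward ensemble selection picks the subset (equivalently, the ratio combination) whose majority vote scores best on a validation split. The ratios, model, and selection rule are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)
maj, mino = np.where(y_tr == 0)[0], np.where(y_tr == 1)[0]

# Vary the resampling ratio over a range instead of fixing one value.
ensemble = []
for ratio in [0.2, 0.4, 0.6, 0.8, 1.0]:
    keep = np.concatenate([rng.choice(maj, int(ratio * len(maj)), replace=False), mino])
    clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr[keep], y_tr[keep])
    ensemble.append(clf)

def vote_acc(subset):
    """Validation accuracy of the subset's majority vote."""
    votes = np.mean([c.predict(X_val) for c in subset], axis=0) >= 0.5
    return np.mean(votes == y_val)

# Greedy forward ensemble selection: add classifiers while the vote improves.
selected = []
for _ in ensemble:
    best = max((c for c in ensemble if c not in selected),
               key=lambda c: vote_acc(selected + [c]))
    if selected and vote_acc(selected + [best]) <= vote_acc(selected):
        break
    selected.append(best)
print(f"selected {len(selected)} of {len(ensemble)} classifiers")
```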


Biometrics ◽  
2013 ◽  
Vol 69 (3) ◽  
pp. 693-702 ◽  
Author(s):  
Christoph Bernau ◽  
Thomas Augustin ◽  
Anne-Laure Boulesteix
