Bootstrap‐Based Inference for Cube Root Asymptotics

Econometrica ◽  
2020 ◽  
Vol 88 (5) ◽  
pp. 2203-2219 ◽  
Author(s):  
Matias D. Cattaneo ◽  
Michael Jansson ◽  
Kenichi Nagasawa

This paper proposes a valid bootstrap‐based distributional approximation for M‐estimators exhibiting a Chernoff (1964)‐type limiting distribution. For estimators of this kind, the standard nonparametric bootstrap is inconsistent. The method proposed herein is based on the nonparametric bootstrap, but restores consistency by altering the shape of the criterion function defining the estimator whose distribution we seek to approximate. This modification leads to a generic and easy‐to‐implement resampling method for inference that is conceptually distinct from other available distributional approximations. We illustrate the applicability of our results with four examples in econometrics and machine learning.
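For context, the standard nonparametric bootstrap referred to here resamples the data with replacement and recomputes the estimator. The following is a minimal sketch on hypothetical data, using the sample median as a stand-in M-estimator; it is this generic recipe that fails under cube-root (Chernoff-type) asymptotics, motivating the paper's reshaped criterion function:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=200)              # hypothetical sample
n = data.size

# Standard nonparametric bootstrap: resample with replacement,
# recompute the estimator, and use the empirical distribution of
# the bootstrap replicates as a distributional approximation.
replicates = np.empty(1000)
for b in range(1000):
    resample = rng.choice(data, size=n, replace=True)
    replicates[b] = np.median(resample)  # the M-estimator

ci = np.percentile(replicates, [2.5, 97.5])  # percentile 95% interval
print(ci)
```

For root-n-consistent, asymptotically normal estimators such as the median this recipe is consistent; the paper's point is that it breaks down for cube-root-consistent estimators.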

2021 ◽  
Vol 9 ◽  
Author(s):  
Daniel Lowell Weller ◽  
Tanzy M. T. Love ◽  
Martin Wiedmann

Recent studies have shown that predictive models can supplement or provide alternatives to E. coli testing for assessing the potential presence of food safety hazards in water used for produce production. However, these studies used balanced training data and focused on enteric pathogens. As such, research is needed to determine 1) whether predictive models can be used to assess Listeria contamination of agricultural water, and 2) how resampling (to deal with imbalanced data) affects the performance of these models. To address these knowledge gaps, this study developed models that predict nonpathogenic Listeria spp. (excluding L. monocytogenes) and L. monocytogenes presence in agricultural water using various combinations of learner (e.g., random forest, regression), feature type, and resampling method (none, oversampling, SMOTE). Four feature types were used in model training: microbial, physicochemical, spatial, and weather. “Full models” were trained using all four feature types, while “nested models” used between one and three types. In total, 45 full (15 learners × 3 resampling approaches) and 108 nested (5 learners × 9 feature sets × 3 resampling approaches) models were trained per outcome. Model performance was compared against baseline models where E. coli concentration was the sole predictor. Overall, the machine learning models outperformed the baseline E. coli models, with random forests outperforming models built using other learners (e.g., rule-based learners). Resampling produced more accurate models than not resampling, with SMOTE models outperforming, on average, oversampling models. Regardless of resampling method, spatial and physicochemical water quality features drove accurate predictions for the nonpathogenic Listeria spp. and L. monocytogenes models, respectively. Overall, these findings 1) illustrate the need for alternatives to existing E. coli-based monitoring programs for assessing agricultural water for the presence of potential food safety hazards, and 2) suggest that predictive models may be one such alternative. Moreover, these findings provide a conceptual framework for how such models can be developed in the future, with the ultimate aim of producing models that can be integrated into on-farm risk management programs. For example, future studies should consider using random forest learners, SMOTE resampling, and spatial features to develop models to predict the presence of foodborne pathogens, such as L. monocytogenes, in agricultural water when the training data are imbalanced.
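The SMOTE step the abstract recommends can be illustrated with a minimal, hand-rolled sketch of SMOTE's core interpolation idea on hypothetical data; real pipelines would typically use a library implementation such as imbalanced-learn rather than this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

def smote(X_min, n_new, k=5):
    """Generate n_new synthetic minority samples (minimal SMOTE sketch).

    For each synthetic point: pick a minority sample at random, pick one
    of its k nearest minority neighbours, and interpolate between them.
    """
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest-neighbour indices
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(n)                     # random minority sample
        nb = X_min[rng.choice(nn[j])]           # one of its neighbours
        gap = rng.random()                      # interpolation weight in [0, 1)
        synthetic[i] = X_min[j] + gap * (nb - X_min[j])
    return synthetic

# hypothetical imbalanced data: 100 majority vs 10 minority samples
X_maj = rng.normal(0.0, 1.0, size=(100, 4))
X_min = rng.normal(2.0, 1.0, size=(10, 4))
X_new = smote(X_min, n_new=90)                  # balance the classes
print(X_new.shape)                              # (90, 4)
```

Each synthetic point lies on a segment between two real minority samples, which is why SMOTE adds variety compared with plain duplication (oversampling).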


2020 ◽  
Vol 41 (2) ◽  
pp. 133
Author(s):  
Ariani Indrawati ◽  
Hendro Subagyo ◽  
Andre Sihombing ◽  
Wagiyah Wagiyah ◽  
Sjaeful Afandi

Extremely skewed data in artificial intelligence, machine learning, and data mining applications often yield misleading results, because machine learning algorithms are designed to work best with balanced data. In real situations, however, we often encounter imbalanced data. The most popular technique for handling imbalanced data is resampling the dataset to adjust the number of instances in the majority and minority classes toward a balanced distribution. Many resampling techniques, based on oversampling, undersampling, or a combination of both, have been proposed and continue to be developed. Resampling techniques may increase or decrease classifier performance. Comparative research on resampling methods for structured data has been widely carried out, but studies comparing resampling methods on unstructured data are rare. This raises the question of whether these methods are applicable to unstructured data such as text, which has high dimensionality and very diverse characteristics. To understand how different resampling techniques affect the learning of classifiers on imbalanced text data, we perform an experimental analysis using various resampling methods with several classification algorithms to classify articles in the Indonesian Scientific Journal Database (ISJD). The experiment shows that resampling techniques on imbalanced text data generally improve classifier performance, but the improvement is not significant because text data are high-dimensional and very diverse.
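One of the simplest resampling techniques compared in such studies, random oversampling, just duplicates minority-class documents until every class matches the largest one. A stdlib-only sketch on a hypothetical corpus:

```python
import random
from collections import Counter

random.seed(0)

# hypothetical imbalanced text corpus as (label, document) pairs
corpus = [("sports", f"sports doc {i}") for i in range(50)]
corpus += [("poetry", f"poetry doc {i}") for i in range(5)]

def oversample(corpus):
    """Random oversampling: duplicate minority-class documents until
    every class matches the size of the largest class."""
    by_label = {}
    for label, doc in corpus:
        by_label.setdefault(label, []).append(doc)
    target = max(len(docs) for docs in by_label.values())
    balanced = []
    for label, docs in by_label.items():
        extra = random.choices(docs, k=target - len(docs))  # duplicates
        balanced += [(label, d) for d in docs + extra]
    return balanced

balanced = oversample(corpus)
print(Counter(label for label, _ in balanced))  # both classes at 50
```

Duplication adds no new information, which is one reason interpolation-based methods such as SMOTE are often preferred; on high-dimensional text features, as the abstract notes, the gains from either can be modest.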


Foods ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 2472
Author(s):  
Shogo Okamoto

In the last decade, temporal dominance of sensations (TDS) methods have proven to be potent approaches in the field of food sciences. Accordingly, methods for analyzing TDS curves, the major outputs of TDS methods, have been developed. This study proposes a bootstrap resampling method for TDS tasks. The proposed method enables the production of random TDS curves to estimate the uncertainties of the curves, that is, their 95% confidence intervals and standard errors. Based on Monte Carlo simulation studies, the estimated uncertainties are valid and match those estimated by approximated normal distributions when the number of independent TDS tasks or samples is 50–100 or greater. The proposed resampling method enables researchers to apply statistical analyses and machine-learning approaches that require a large sample size of TDS curves.
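The general shape of such a task-level bootstrap (a generic sketch, not necessarily the paper's exact algorithm) can be shown on hypothetical dominance data: resample whole TDS tasks with replacement, recompute the dominance-rate curve, and read off pointwise confidence bands and standard errors:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical TDS data for one attribute: entry (i, t) is 1 if the
# attribute was dominant in task i at time point t, else 0
n_tasks, n_time = 80, 30
tasks = (rng.random((n_tasks, n_time)) < 0.4).astype(float)

curve = tasks.mean(axis=0)              # observed dominance-rate curve

# bootstrap over tasks: resample whole tasks with replacement and
# recompute the curve, giving an ensemble of random TDS curves
B = 2000
boot = np.empty((B, n_time))
for b in range(B):
    idx = rng.integers(0, n_tasks, size=n_tasks)
    boot[b] = tasks[idx].mean(axis=0)

lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)  # pointwise 95% band
se = boot.std(axis=0, ddof=1)                      # bootstrap standard error
```

Resampling at the task level (rather than at individual time points) preserves the within-task temporal structure of each curve, which is the point of bootstrapping whole TDS tasks.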


Author(s):  
Zheng Zukang ◽  
Wu Lipeng

Abstract A recursive resampling method is discussed in this paper. Let X1, X2, …, Xn be i.i.d. random variables with distribution function F, and construct the empirical distribution function Fn. A new sample Xn+1 is drawn from Fn, and the new empirical distribution function F̃1, in the wide sense, is computed from X1, X2, …, Xn, Xn+1. Then Xn+2 is drawn from F̃1 and F̃2 is obtained. In this way, Xn+m and F̃m are found. It will be proved that F̃m converges to a random variable almost surely as m goes to infinity and that the limiting distribution is a compound beta distribution. In comparison with the usual non-recursive bootstrap, the main advantage of this procedure is a reduction in unconditional variance.
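A minimal sketch of this recursion, on hypothetical data with illustrative sizes n = 20 and m = 1000, might look like:

```python
import numpy as np

rng = np.random.default_rng(2)

# start from an i.i.d. sample X1, ..., Xn and its empirical distribution Fn
X = list(rng.normal(size=20))

# recursive resampling: each new draw comes from the empirical
# distribution of ALL points generated so far (not just the original n),
# in contrast to the usual bootstrap, which always resamples from Fn
m = 1000
for _ in range(m):
    X.append(X[rng.integers(len(X))])   # draw X_{n+k+1} from the current EDF

# the final empirical distribution function, evaluated at a point t
def F_m(t):
    return sum(x <= t for x in X) / len(X)

print(F_m(0.0))
```

Because later draws reinforce earlier ones, the sequence of empirical distributions forms a Polya-urn-like scheme, which is consistent with the compound beta limit stated in the abstract.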


2012 ◽  
Vol 29 (3) ◽  
pp. 482-516 ◽  
Author(s):  
Dong Li ◽  
Shiqing Ling ◽  
Wai Keung Li

This paper studies the asymptotic theory of least squares estimation in a threshold moving average model. Under some mild conditions, it is shown that the estimator of the threshold is n-consistent and that its limiting distribution is related to a two-sided compound Poisson process, whereas the estimators of the other coefficients are strongly consistent and asymptotically normal. This paper also provides a resampling method to tabulate the limiting distribution of the estimated threshold in practice, the first successful effort in this direction and a contribution to the threshold literature. Simulation studies are also carried out to assess the performance of least squares estimation in finite samples.
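The profile least-squares idea behind threshold estimation can be sketched in a simpler threshold-regression setting (a hypothetical analogue, not the paper's moving average model): for each candidate threshold, fit each regime by least squares and pick the threshold that minimizes the total sum of squared residuals:

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical two-regime model with true threshold r0 = 0.5:
# y = 1.0 * x if x <= r0, else -1.0 * x, plus noise
n, r0 = 500, 0.5
x = rng.uniform(-1, 2, size=n)
y = np.where(x <= r0, 1.0, -1.0) * x + 0.1 * rng.normal(size=n)

def ssr(r):
    """Profiled sum of squared residuals: fit each regime by OLS given r."""
    total = 0.0
    for mask in (x <= r, x > r):
        if not mask.any():
            return np.inf               # reject thresholds emptying a regime
        xi, yi = x[mask], y[mask]
        beta = (xi @ yi) / (xi @ xi)    # OLS slope through the origin
        total += np.sum((yi - beta * xi) ** 2)
    return total

# grid search: the least squares threshold estimate minimizes the
# profiled SSR over candidate thresholds
grid = np.linspace(-0.5, 1.5, 201)
r_hat = grid[np.argmin([ssr(r) for r in grid])]
print(r_hat)
```

The n-consistency result in the abstract reflects how sharply this SSR profile kinks at the true threshold, which is much faster than the usual root-n rate for smooth parameters.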


2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


2020 ◽  
Author(s):  
Mohammed J. Zaki ◽  
Wagner Meira, Jr

2020 ◽  
Author(s):  
Marc Peter Deisenroth ◽  
A. Aldo Faisal ◽  
Cheng Soon Ong
