Prediction Games and Arcing Algorithms

1999 ◽  
Vol 11 (7) ◽  
pp. 1493-1517 ◽  
Author(s):  
Leo Breiman

The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund & Schapire, 1996a, 1997) and others in reducing generalization error has not been well understood. By formulating prediction as a game where one player makes a selection from instances in the training set and the other a convex linear combination of predictors from a finite set, existing arcing algorithms are shown to be algorithms for finding good game strategies. The minimax theorem is an essential ingredient of the convergence proofs. An arcing algorithm is described that converges to the optimal strategy. A bound on the generalization error for the combined predictors in terms of their maximum error is proven that is sharper than bounds to date. Schapire, Freund, Bartlett, and Lee (1997) offered an explanation of why Adaboost works in terms of its ability to produce generally high margins. The empirical comparison of Adaboost to the optimal arcing algorithm shows that their explanation is not complete.
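
As a concrete illustration of an arcing algorithm, here is a minimal AdaBoost sketch in the spirit of the reweight-and-combine game described above: one "player" reweights training instances, the other accumulates a weighted combination of weak predictors. The decision-stump weak learner and the {-1, +1} label convention are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """Illustrative AdaBoost: reweight training instances (one player's move)
    and build a weighted combination of weak predictors (the other's move)."""
    n = len(y)                       # y assumed to take values in {-1, +1}
    w = np.full(n, 1.0 / n)          # instance weights: the "selection" player
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:               # weak learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)   # upweight misclassified instances
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)

    def predict(X_new):
        votes = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
        return np.sign(votes)        # sign of the weighted vote
    return predict
```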

2019 ◽  
Author(s):  
Amelia R. Hunt ◽  
Warren James ◽  
Josephine Reuther ◽  
Melissa Spilioti ◽  
Eleanor Mackay ◽  
...  

Here we report persistent choice variability in the presence of a simple decision rule. Two analogous choice problems are presented, both of which involve making decisions about how to prioritize goals. In one version, participants choose a place to stand to throw a beanbag into one of two hoops. In the other, they must choose a place to fixate to detect a target that could appear in one of two boxes. In both cases, participants do not know which of the locations will be the target when they make their choice. The optimal solution to both problems follows the same, simple logic: when targets are close together, standing at/fixating the midpoint is the best choice. When the targets are far apart, accuracy from the midpoint falls, and standing/fixating close to one potential target achieves better accuracy. People do not follow, or even approach, this optimal strategy, despite substantial potential benefits for performance. Two interventions were introduced to try to shift participants from sub-optimal, variable responses to following a fixed, rational rule. First, we put participants into circumstances in which the solution was obvious. After participants correctly solved the problem there, we immediately presented the slightly less obvious context. Second, we guided participants to make choices that followed an optimal strategy, and then removed the guidance and let them freely choose. Following both of these interventions, participants immediately returned to a variable, sub-optimal pattern of responding. The results show that while constructing and implementing rational decision rules is possible, making variable responses to choice problems is a strong and persistent default mode. Borrowing concepts from classic animal learning studies, we suggest this default may persist because choice variability can provide opportunities for reinforcement learning.
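
The geometry of the decision rule can be made concrete with a toy calculation. The Gaussian falloff of accuracy with distance used below is an illustrative assumption, not the performance curve measured in the study; it simply shows why the midpoint wins for close targets and a position at one target wins for far-apart targets.

```python
import numpy as np

def expected_accuracy(stand_pos, targets, sigma=1.0):
    """Expected hit rate over two equally likely targets, assuming accuracy
    falls off as a Gaussian of distance (an illustrative assumption)."""
    return 0.5 * sum(np.exp(-(stand_pos - t) ** 2 / (2 * sigma ** 2))
                     for t in targets)

for separation in (0.5, 4.0):                    # close vs far-apart targets
    targets = (-separation / 2, separation / 2)
    midpoint = expected_accuracy(0.0, targets)
    at_target = expected_accuracy(targets[0], targets)
    best = "the midpoint" if midpoint > at_target else "one target"
    print(f"separation {separation}: midpoint {midpoint:.2f}, "
          f"at one target {at_target:.2f} -> stand at {best}")
```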


1997 ◽  
Vol 9 (1) ◽  
pp. 1-42 ◽  
Author(s):  
Sepp Hochreiter ◽  
Jürgen Schmidhuber

We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a “flat” minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to “simple” networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a “good” weight prior. Instead we have a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and “optimal brain surgeon/optimal brain damage.”
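
The flat-minimum search itself relies on second-order derivatives, but the underlying notion of flatness can be illustrated more simply: probe how much the error changes under small random weight perturbations and prefer minima where it barely moves. The sketch below is a simplified stand-in for illustration only, not the authors' algorithm.

```python
import numpy as np

def flatness_probe(loss_fn, weights, radius=0.01, n_samples=100, seed=0):
    """Crude flatness estimate: average loss increase over random perturbations
    of the weights within a ball of the given radius (smaller = flatter)."""
    rng = np.random.default_rng(seed)
    base = loss_fn(weights)
    increases = []
    for _ in range(n_samples):
        delta = rng.normal(size=weights.shape)
        delta *= radius / np.linalg.norm(delta)
        increases.append(loss_fn(weights + delta) - base)
    return np.mean(increases)

# Toy comparison: a sharp quadratic bowl vs. a flat one around the same minimum.
def sharp(w): return 100.0 * np.sum(w ** 2)
def flat(w): return 0.01 * np.sum(w ** 2)

w0 = np.zeros(5)
print(flatness_probe(sharp, w0), flatness_probe(flat, w0))
```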


Author(s):  
David Kong

In the game of Skunk, a pair of dice is rolled again and again, and as long as you remain “standing” you keep adding the totals to your score. At any time you can “sit” and take home what you have won. However, if you are standing and a “one” comes up on either die, the game is over and you lose everything. An optimal strategy is traditionally developed by comparing the expected score of standing with the expected score of sitting: as long as E(standing) > E(sitting), you continue to roll the dice. As you accumulate points, you risk more and more, and at some point it becomes too risky to go forward. However, we found this traditional methodology to be flawed (though the answer remains the same). It considers only the expected score of the next roll, instead of factoring in the expected total score that can be gained over the indefinite future. The error does not come to light until we analyze a variation in which you are also allowed to choose how many dice to use. All the equations we solved using the E(standing) > E(sitting) idea led us down a misleading and often intractable road. Yet, through reasoning, we concluded that no strategy is better than throwing one die at a time. Proving this is quite difficult, because it had to be shown that the pure one-die strategy is better than all of the (infinitely many) other strategies, since the game can go on forever.
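
The traditional one-roll comparison described above is easy to compute for the two-dice game; the short calculation below just spells out the E(standing) versus E(sitting) threshold (the very reasoning the authors go on to question for the multi-dice variation).

```python
from fractions import Fraction

# Two-dice Skunk: a roll busts if either die shows a one.
p_bust = Fraction(11, 36)          # P(at least one die shows a 1)
p_safe = 1 - p_bust                # 25/36
# Given that neither die shows a 1, each die is uniform on {2,...,6} with
# mean 4, so a safe roll adds 8 points on average.
mean_safe_sum = 8

def expected_gain_of_rolling(score):
    """One-roll comparison: E(standing) minus E(sitting) at the current score."""
    return p_safe * mean_safe_sum - p_bust * score

score = 0
while expected_gain_of_rolling(score) > 0:
    score += 1
print(f"Keep rolling while your score is below {score}")   # prints 19
```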


Stroke ◽  
2016 ◽  
Vol 47 (suppl_1) ◽  
Author(s):  
Nerses Sanossian ◽  
David S Liebeskind ◽  
Sidney Starkman ◽  
Marc Eckstein ◽  
Samuel Stratton ◽  
...  

Background: We sought to develop a clinical scale identifying intracerebral hemorrhage (ICH) using prospectively collected data elicited by paramedics in the field. Methods: Subjects were enrolled in the Field Administration of Stroke Therapy Magnesium (FAST-MAG) trial of prehospital neuroprotective therapy. Data obtained by paramedics in the field included vital signs, examination findings, demographic information, and medications. Subjects were randomly split into training (n=1133) and validation (n=567) sets. A logistic regression model using all 26 potential predictors as candidates was fit to the training data using backward stepwise variable selection with a liberal p < 0.10 retention criterion. A classification tree model using the same 26 potential predictors as candidates was fit with a Gini splitting criterion equivalent to p < 0.10. Results: 1700 cases were assessed by paramedics a median of 23 (IQR 14-42) minutes after symptom onset, and 23% had ICH. Of the 26 candidates, 12 variables were retained in the logistic model. Holding the other factors constant, increasing Los Angeles Motor Score (LAMS), Glasgow Coma Scale (GCS) verbal sub-score, history of hypertension, Hispanic ethnicity, field systolic BP, and taking antiplatelet medication were associated with increasing risk of ICH. Increasing GCS eye sub-score, diabetes, atrial fibrillation, valvular heart disease, age, female gender, and Black race were associated with decreasing risk of ICH. The training set model accuracy was 74%, sensitivity 65%, and specificity 83% with C=0.81; the validation set accuracy was 67%, sensitivity 76%, and specificity 58% with C=0.73. For the classification tree, seven predictors were retained: prehospital systolic BP, diastolic BP, Hispanic ethnicity, history of atrial fibrillation, GCS verbal sub-score, age, and LAMS. The tree forms 10 prediction groups (terminal nodes); five predict ICH (“positive”) and the other five predict no ICH (“negative”). The training set accuracy for the tree was 71.9% with C=0.782, and the validation set accuracy was 63.2% with C=0.632. Conclusion: Paramedics can identify ICH in the field with moderate accuracy, allowing the opportunity to develop targeted prehospital therapeutics and care delivery.
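
A hedged sketch of the modelling setup described above (random training/validation split, a logistic regression, and a Gini-based classification tree with about ten terminal nodes) is shown below. The synthetic feature matrix stands in for the 26 FAST-MAG field predictors, and the backward stepwise selection with a p < 0.10 retention criterion is omitted; this is an illustration of the model families, not a reproduction of the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic stand-in for the 26 prehospital predictors and the ICH label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1700, 26))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1700) > 1.2).astype(int)  # imbalanced outcome

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, train_size=1133, test_size=567, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
tree = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=10,  # ~10 terminal nodes
                              random_state=0).fit(X_train, y_train)

for name, model in [("logistic regression", logit), ("classification tree", tree)]:
    acc = accuracy_score(y_valid, model.predict(X_valid))
    c_stat = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
    print(f"{name}: validation accuracy {acc:.2f}, C-statistic {c_stat:.2f}")
```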


2019 ◽  
Vol 7 (4) ◽  
pp. T911-T922
Author(s):  
Satyakee Sen ◽  
Sribharath Kainkaryam ◽  
Cen Ong ◽  
Arvind Sharma

Salt model building has long been considered a severe bottleneck for large-scale 3D seismic imaging projects. It is one of the most time-consuming, labor-intensive, and difficult-to-automate processes in the entire depth imaging workflow, requiring significant intervention by domain experts to manually interpret the salt bodies on noisy, low-frequency, and low-resolution seismic images at each iteration of the salt model building process. The difficulty and the need to automate this task are well recognized by the imaging community and have propelled the use of deep-learning-based convolutional neural network (CNN) architectures to carry it out. However, significant challenges remain for reliable production-scale deployment of CNN-based methods for salt model building. This is mainly due to the poor generalization capabilities of these networks. When used on new surveys, never seen by the CNN models during the training stage, the interpretation accuracy of these models drops significantly. To remediate this key problem, we have introduced a U-shaped encoder-decoder type CNN architecture trained using a specialized regularization strategy aimed at reducing the generalization error of the network. Our regularization scheme perturbs the ground truth labels in the training set. Two different perturbations are discussed: one that randomly changes the labels of the training set, flipping salt labels to sediment and vice versa, and a second that smooths the labels. We have determined that such perturbations act as a strong regularizer, preventing the network from making highly confident predictions on the training set and thus reducing overfitting. An ensemble strategy is also used for test-time augmentation and is shown to further improve the accuracy. The robustness of our CNN models, in terms of reduced generalization error and improved interpretation accuracy, is demonstrated with real data examples from the Gulf of Mexico.
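
The two label perturbations can be sketched directly on a binary salt mask. The flip probability, smoothing width, and toy mask below are illustrative choices, not values from the paper; training a segmentation network against such perturbed targets is what discourages overconfident predictions on the training set.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def flip_labels(mask, flip_prob=0.05, seed=0):
    """Perturbation 1: randomly flip salt (1) to sediment (0) and vice versa."""
    rng = np.random.default_rng(seed)
    flips = rng.random(mask.shape) < flip_prob
    return np.where(flips, 1 - mask, mask)

def smooth_labels(mask, sigma=2.0):
    """Perturbation 2: blur the hard 0/1 mask into soft targets in [0, 1]."""
    return gaussian_filter(mask.astype(float), sigma=sigma)

# Toy 64x64 mask with a square salt body; real masks come from interpreted seismic.
mask = np.zeros((64, 64), dtype=int)
mask[20:44, 20:44] = 1

noisy_target = flip_labels(mask)     # targets with randomly flipped labels
soft_target = smooth_labels(mask)    # smoothed (soft) targets
```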


2019 ◽  
Vol 7 (1) ◽  
pp. 20-51 ◽  
Author(s):  
Philip Leifeld ◽  
Skyler J. Cranmer

The temporal exponential random graph model (TERGM) and the stochastic actor-oriented model (SAOM, e.g., SIENA) are popular models for longitudinal network analysis. We compare these models theoretically, via simulation, and through a real-data example in order to assess their relative strengths and weaknesses. Though we do not aim to make a general claim about either being superior to the other across all specifications, we highlight several theoretical differences the analyst might consider and find that, with some specifications, the two models behave very similarly, while each model out-predicts the other the more closely its own assumptions are met.


Author(s):  
Petr Berka ◽  
Ivan Bruha

Genuine symbolic machine learning (ML) algorithms can process only symbolic, categorical data. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes. Discretizing (categorizing) numerical attributes is therefore an important issue in ML, and quite a few discretization procedures exist in the field. This paper describes two newer algorithms for categorization (discretization) of numerical attributes. The first is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure. Its idea is to discretize the numerical attributes in such a way that the resulting categorization corresponds to the KEX knowledge acquisition algorithm. Since the categorization for KEX is done "off-line" before the KEX machine learning algorithm is run, it can also be used as a preprocessing step for other machine learning algorithms. The other discretization procedure is implemented in CN4, a large extension of the well-known CN2 machine learning algorithm. The range of a numerical attribute is divided into intervals that may form a complex generated by the algorithm as part of the class description. Experimental results compare the performance of KEX and CN4 on some well-known ML databases. To make the comparison more informative, we also used the discretization procedure of the MLC++ library, and other ML algorithms such as ID3 and C4.5 were run in our experiments as well. The results are compared and discussed.
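
As a generic illustration of what an "off-line" discretization preprocessing step does (not the KEX or CN4 procedures themselves), the sketch below cuts a numerical attribute at quantile boundaries and replaces each value with an interval code.

```python
import numpy as np

def equal_frequency_cuts(values, n_bins=4):
    """Generic discretization: cut points at quantile boundaries.
    (Illustrative only; KEX and CN4 use their own, class-aware criteria.)"""
    quantiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    return np.percentile(values, quantiles)

def discretize(values, cut_points):
    """Map each numerical value to the index of its interval."""
    return np.digitize(values, cut_points)

# Example: a numerical attribute (say, patient age) mapped to 4 intervals.
ages = np.array([23, 35, 47, 52, 61, 30, 44, 58, 29, 66])
cuts = equal_frequency_cuts(ages, n_bins=4)
print("cut points:", cuts)
print("interval codes:", discretize(ages, cuts))
```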


Perception ◽  
1997 ◽  
Vol 26 (1_suppl) ◽  
pp. 313-313
Author(s):  
T A Podugolnikova ◽  
G I Rozhkova ◽  
I S Kondakova

Coding tests are regularly used to estimate the capacity for mental work in children entering school and in younger schoolchildren. The task of the child is to fill in a special form by putting conventional symbols (codes) under the rows of test objects in accordance with a sample. The results of such testing reflect both visuomotor and intellectual capabilities since, on the one hand, a subject has to perform fast eye and hand movements, comparing test objects with the sample and drawing codes, but, on the other hand, it is not forbidden to memorise codes and to use an optimal strategy for filling the form. In order to make the coding test more suitable for estimating purely visual capabilities, we developed a computerised version in which the codes changed at each step, making their memorisation useless. This coding test was used in an examination of 22 children (aged 6 – 7 years) with binocular anomalies (strabismus, amblyopia) from special kindergartens and 190 normal children (aged 6 – 9 years; 63 from kindergartens and 127 from school forms 1 – 3). The difference in coding indices between children with binocular anomalies and normal children of the same age was statistically significant (p<0.005). The average indices for normal children of different ages differed significantly, increasing from 11.8 (at 6 years) to 24.6 (at 9 years) symbols per minute. The effect of learning was also evident: the indices of 7-year-old children from the first school form were better than those of children of the same age from a kindergarten. The correlation between coding indices and reading rate was positive but rather weak (0.28) in the 52 first-form children tested.


1998 ◽  
Vol 38 (11) ◽  
pp. 31-39 ◽  
Author(s):  
W. Rauch

Current environmental policy guidelines are mainly based on cost-benefit analysis and concerned with the restriction of emissions. Sustainable development, on the other hand, focuses on determining the optimal strategy for the overall performance of both the environment and the socio-economic system. This paper highlights some of the basic problems that arise when developing strategies with this aim in mind. The implications for decision making are investigated by means of a fictitious model of the economic and environmental interactions in a lake region.


Author(s):  
Andrea Moro

Understanding the nature and the structure of human language coincides with capturing the constraints which make a conceivable language possible or, equivalently, with discovering whether there can be any impossible languages at all. This book explores these related issues, paralleling the effort of a biologist who attempts to describe the class of impossible animals. In biology, one can appeal, for example, to physical laws of nature (such as entropy or gravity), but when it comes to language the path becomes intricate and difficult, because the physical laws cannot be exploited. In linguistics, in fact, there are two distinct empirical domains to explore: on the one hand, the formal domain of syntax, where different languages are compared in an attempt to understand how much they can differ; on the other, the neurobiological domain, where the flow of information through complex neural networks and the electric code exploited by neurons are uncovered and measured. By referring to the most advanced experiments in neurolinguistics, the book offers an updated description of modern linguistics and allows the reader to formulate new and surprising questions. Moreover, since syntax - the capacity to generate novel structures (sentences) by recombining a finite set of elements (words) - is the fingerprint of all and only human languages, this book ultimately deals with the fundamental questions that characterize the search for our origins.

