Prediction Games and Arcing Algorithms

1999 ◽  
Vol 11 (7) ◽  
pp. 1493-1517 ◽  
Author(s):  
Leo Breiman

The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund & Schapire, 1996a, 1997) and others in reducing generalization error has not been well understood. By formulating prediction as a game where one player makes a selection from instances in the training set and the other a convex linear combination of predictors from a finite set, existing arcing algorithms are shown to be algorithms for finding good game strategies. The minimax theorem is an essential ingredient of the convergence proofs. An arcing algorithm is described that converges to the optimal strategy. A bound on the generalization error for the combined predictors in terms of their maximum error is proven that is sharper than bounds to date. Schapire, Freund, Bartlett, and Lee (1997) offered an explanation of why Adaboost works in terms of its ability to produce generally high margins. The empirical comparison of Adaboost to the optimal arcing algorithm shows that their explanation is not complete.
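
As a concrete illustration of an arcing algorithm, here is a minimal AdaBoost sketch in the spirit of the reweight-and-combine game described above: one "player" reweights training instances, the other accumulates a weighted combination of weak predictors. The decision-stump weak learner and the {-1, +1} label convention are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """Illustrative AdaBoost: reweight training instances (one player's move)
    and build a weighted combination of weak predictors (the other's move)."""
    n = len(y)                       # y assumed to take values in {-1, +1}
    w = np.full(n, 1.0 / n)          # instance weights: the "selection" player
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:               # weak learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)   # upweight misclassified instances
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)

    def predict(X_new):
        votes = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
        return np.sign(votes)        # sign of the weighted vote
    return predict
```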

2019 ◽  
Author(s):  
Amelia R. Hunt ◽  
Warren James ◽  
Josephine Reuther ◽  
Melissa Spilioti ◽  
Eleanor Mackay ◽  
...  

Here we report persistent choice variability in the presence of a simple decision rule. Two analogous choice problems are presented, both of which involve making decisions about how to prioritize goals. In one version, participants choose a place to stand to throw a beanbag into one of two hoops. In the other, they must choose a place to fixate to detect a target that could appear in one of two boxes. In both cases, participants do not know which of the locations will be the target when they make their choice. The optimal solution to both problems follows the same, simple logic: when targets are close together, standing at/fixating the midpoint is the best choice. When the targets are far apart, accuracy from the midpoint falls, and standing/fixating close to one potential target achieves better accuracy. People do not follow, or even approach, this optimal strategy, despite substantial potential benefits for performance. Two interventions were introduced to try to shift participants from sub-optimal, variable responses to following a fixed, rational rule. First, we put participants into circumstances in which the solution was obvious. After participants correctly solved the problem there, we immediately presented the slightly less obvious context. Second, we guided participants to make choices that followed an optimal strategy, and then removed the guidance and let them freely choose. Following both of these interventions, participants immediately returned to a variable, sub-optimal pattern of responding. The results show that while constructing and implementing rational decision rules is possible, making variable responses to choice problems is a strong and persistent default mode. Borrowing concepts from classic animal learning studies, we suggest this default may persist because choice variability can provide opportunities for reinforcement learning.
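
The geometry of the decision rule can be made concrete with a toy calculation. The Gaussian falloff of accuracy with distance used below is an illustrative assumption, not the performance curve measured in the study; it simply shows why the midpoint wins for close targets and a position at one target wins for far-apart targets.

```python
import numpy as np

def expected_accuracy(stand_pos, targets, sigma=1.0):
    """Expected hit rate over two equally likely targets, assuming accuracy
    falls off as a Gaussian of distance (an illustrative assumption)."""
    return 0.5 * sum(np.exp(-(stand_pos - t) ** 2 / (2 * sigma ** 2))
                     for t in targets)

for separation in (0.5, 4.0):                    # close vs far-apart targets
    targets = (-separation / 2, separation / 2)
    midpoint = expected_accuracy(0.0, targets)
    at_target = expected_accuracy(targets[0], targets)
    best = "the midpoint" if midpoint > at_target else "one target"
    print(f"separation {separation}: midpoint {midpoint:.2f}, "
          f"at one target {at_target:.2f} -> stand at {best}")
```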


1997 ◽  
Vol 9 (1) ◽  
pp. 1-42 ◽  
Author(s):  
Sepp Hochreiter ◽  
Jürgen Schmidhuber

We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a “flat” minimum of the error function. A flat minimum is a large connected region in weight space where the error remains approximately constant. An MDL-based, Bayesian argument suggests that flat minima correspond to “simple” networks and low expected overfitting. The argument is based on a Gibbs algorithm variant and a novel way of splitting generalization error into underfitting and overfitting error. Unlike many previous approaches, ours does not require Gaussian assumptions and does not depend on a “good” weight prior. Instead we have a prior over input-output functions, thus taking into account net architecture and training set. Although our algorithm requires the computation of second-order derivatives, it has backpropagation's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feedforward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms conventional backprop, weight decay, and “optimal brain surgeon/optimal brain damage.”
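
The flat-minimum search itself relies on second-order derivatives, but the underlying notion of flatness can be illustrated more simply: probe how much the error changes under small random weight perturbations and prefer minima where it barely moves. The sketch below is a simplified stand-in for illustration only, not the authors' algorithm.

```python
import numpy as np

def flatness_probe(loss_fn, weights, radius=0.01, n_samples=100, seed=0):
    """Crude flatness estimate: average loss increase over random perturbations
    of the weights within a ball of the given radius (smaller = flatter)."""
    rng = np.random.default_rng(seed)
    base = loss_fn(weights)
    increases = []
    for _ in range(n_samples):
        delta = rng.normal(size=weights.shape)
        delta *= radius / np.linalg.norm(delta)
        increases.append(loss_fn(weights + delta) - base)
    return np.mean(increases)

# Toy comparison: a sharp quadratic bowl vs. a flat one around the same minimum.
def sharp(w): return 100.0 * np.sum(w ** 2)
def flat(w): return 0.01 * np.sum(w ** 2)

w0 = np.zeros(5)
print(flatness_probe(sharp, w0), flatness_probe(flat, w0))
```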


Author(s):  
David Kong

In the game of Skunk, a pair of dice is rolled again and again, and as long as you remain “standing” you keep adding the totals to your score. At any time you can “sit” and take home what you have won. However, if you are standing and a “one” comes up on either die, the game is over and you lose everything. An optimal strategy is traditionally developed by comparing the expected score of standing with the expected score of sitting: as long as E(standing) > E(sitting), you continue to roll the dice. As you accumulate points, you risk more and more, and at some point it becomes too risky to go forward. However, we found this traditional methodology to be flawed (though the answer remains the same). It considers only the expected score of the next roll, instead of factoring in the expected total score that can be gained over the indefinite future. The error does not come to light until we analyze a variation in which you are also allowed to choose how many dice to use. All the equations we solved using the E(standing) > E(sitting) idea led us down a misleading and often intractable road. Yet, through reasoning, we concluded that no strategy is better than throwing one die at a time. Proving this is quite difficult, because it had to be shown that the pure one-die strategy is better than all of the (infinitely many) other strategies, since the game can go on forever.
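
The traditional one-roll comparison described above is easy to compute for the two-dice game; the short calculation below just spells out the E(standing) versus E(sitting) threshold (the very reasoning the authors go on to question for the multi-dice variation).

```python
from fractions import Fraction

# Two-dice Skunk: a roll busts if either die shows a one.
p_bust = Fraction(11, 36)          # P(at least one die shows a 1)
p_safe = 1 - p_bust                # 25/36
# Given that neither die shows a 1, each die is uniform on {2,...,6} with
# mean 4, so a safe roll adds 8 points on average.
mean_safe_sum = 8

def expected_gain_of_rolling(score):
    """One-roll comparison: E(standing) minus E(sitting) at the current score."""
    return p_safe * mean_safe_sum - p_bust * score

score = 0
while expected_gain_of_rolling(score) > 0:
    score += 1
print(f"Keep rolling while your score is below {score}")   # prints 19
```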


Stroke ◽  
2016 ◽  
Vol 47 (suppl_1) ◽  
Author(s):  
Nerses Sanossian ◽  
David S Liebeskind ◽  
Sidney Starkman ◽  
Marc Eckstein ◽  
Samuel Stratton ◽  
...  

Background: We sought to develop a clinical scale identifying intracerebral hemorrhage (ICH) using prospectively collected data elicited by paramedics in the field. Methods: Subjects were enrolled in the Field Administration of Stroke Therapy Magnesium (FAST-MAG) trial of prehospital neuroprotective therapy. Data obtained by paramedics in the field included vital signs, examination findings, demographic information, and medications. Subjects were randomly split into training (n=1133) and validation (n=567) sets. A logistic regression model using all 26 potential predictors as candidates was fit to the training data using backward stepwise variable selection with a liberal p < 0.10 retention criterion. A classification tree model using the same 26 potential predictors as candidates was fit with a Gini splitting criterion equivalent to p < 0.10. Results: 1700 cases were assessed by paramedics a median of 23 (IQR 14-42) minutes after symptom onset, and 23% had ICH. Of the 26 candidates, 12 variables were retained in the logistic model. Holding the other factors constant, increasing Los Angeles Motor Score (LAMS), Glasgow Coma Scale (GCS) verbal sub-score, history of hypertension, Hispanic ethnicity, field systolic BP, and taking antiplatelet medication were associated with increasing risk of ICH. Increasing GCS eye sub-score, diabetes, atrial fibrillation, valvular heart disease, age, female gender, and Black race were associated with decreasing risk of ICH. The training set model accuracy was 74%, sensitivity 65%, and specificity 83% with C=0.81; the validation set accuracy was 67%, sensitivity 76%, and specificity 58% with C=0.73. For the classification tree, seven predictors were retained: prehospital systolic BP, diastolic BP, Hispanic ethnicity, history of atrial fibrillation, GCS verbal sub-score, age, and LAMS. The tree forms 10 prediction groups (terminal nodes); five predict ICH (“positive”) and the other five predict no ICH (“negative”). The training set accuracy for the tree was 71.9% with C=0.782, and the validation set accuracy was 63.2% with C=0.632. Conclusion: Paramedics can identify ICH in the field with moderate accuracy, allowing the opportunity to develop targeted prehospital therapeutics and care delivery.
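
A hedged sketch of the modelling setup described above (random training/validation split, a logistic regression, and a Gini-based classification tree with about ten terminal nodes) is shown below. The synthetic feature matrix stands in for the 26 FAST-MAG field predictors, and the backward stepwise selection with a p < 0.10 retention criterion is omitted; this is an illustration of the model families, not a reproduction of the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic stand-in for the 26 prehospital predictors and the ICH label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1700, 26))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1700) > 1.2).astype(int)  # imbalanced outcome

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, train_size=1133, test_size=567, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
tree = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=10,  # ~10 terminal nodes
                              random_state=0).fit(X_train, y_train)

for name, model in [("logistic regression", logit), ("classification tree", tree)]:
    acc = accuracy_score(y_valid, model.predict(X_valid))
    c_stat = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
    print(f"{name}: validation accuracy {acc:.2f}, C-statistic {c_stat:.2f}")
```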


2019 ◽  
Vol 7 (4) ◽  
pp. T911-T922
Author(s):  
Satyakee Sen ◽  
Sribharath Kainkaryam ◽  
Cen Ong ◽  
Arvind Sharma

Salt model building has long been considered a severe bottleneck for large-scale 3D seismic imaging projects. It is one of the most time-consuming, labor-intensive, and difficult-to-automate processes in the entire depth imaging workflow, requiring significant intervention by domain experts to manually interpret the salt bodies on noisy, low-frequency, and low-resolution seismic images at each iteration of the salt model building process. The difficulty and the need to automate this task are well recognized by the imaging community and have propelled the use of deep-learning-based convolutional neural network (CNN) architectures to carry it out. However, significant challenges remain for reliable production-scale deployment of CNN-based methods for salt model building. This is mainly due to the poor generalization capabilities of these networks. When used on new surveys, never seen by the CNN models during the training stage, the interpretation accuracy of these models drops significantly. To remediate this key problem, we have introduced a U-shaped encoder-decoder type CNN architecture trained using a specialized regularization strategy aimed at reducing the generalization error of the network. Our regularization scheme perturbs the ground truth labels in the training set. Two different perturbations are discussed: one that randomly changes the labels of the training set, flipping salt labels to sediment and vice versa, and a second that smooths the labels. We have determined that such perturbations act as a strong regularizer, preventing the network from making highly confident predictions on the training set and thus reducing overfitting. An ensemble strategy is also used for test-time augmentation and is shown to further improve the accuracy. The robustness of our CNN models, in terms of reduced generalization error and improved interpretation accuracy, is demonstrated with real data examples from the Gulf of Mexico.
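
The two label perturbations can be sketched directly on a binary salt mask. The flip probability, smoothing width, and toy mask below are illustrative choices, not values from the paper; training a segmentation network against such perturbed targets is what discourages overconfident predictions on the training set.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def flip_labels(mask, flip_prob=0.05, seed=0):
    """Perturbation 1: randomly flip salt (1) to sediment (0) and vice versa."""
    rng = np.random.default_rng(seed)
    flips = rng.random(mask.shape) < flip_prob
    return np.where(flips, 1 - mask, mask)

def smooth_labels(mask, sigma=2.0):
    """Perturbation 2: blur the hard 0/1 mask into soft targets in [0, 1]."""
    return gaussian_filter(mask.astype(float), sigma=sigma)

# Toy 64x64 mask with a square salt body; real masks come from interpreted seismic.
mask = np.zeros((64, 64), dtype=int)
mask[20:44, 20:44] = 1

noisy_target = flip_labels(mask)     # targets with randomly flipped labels
soft_target = smooth_labels(mask)    # smoothed (soft) targets
```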


2019 ◽  
Vol 7 (1) ◽  
pp. 20-51 ◽  
Author(s):  
Philip Leifeld ◽  
Skyler J. Cranmer

The temporal exponential random graph model (TERGM) and the stochastic actor-oriented model (SAOM, e.g., SIENA) are popular models for longitudinal network analysis. We compare these models theoretically, via simulation, and through a real-data example in order to assess their relative strengths and weaknesses. Though we do not aim to make a general claim about either being superior to the other across all specifications, we highlight several theoretical differences the analyst might consider and find that, with some specifications, the two models behave very similarly, while each model out-predicts the other the more closely its own assumptions are met.


Author(s):  
Petr Berka ◽  
Ivan Bruha

Genuine symbolic machine learning (ML) algorithms can process only symbolic, categorical data. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes. Discretizing (categorizing) numerical attributes is therefore an important issue in ML, and quite a few discretization procedures exist in the field. This paper describes two newer algorithms for categorization (discretization) of numerical attributes. The first is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure. Its idea is to discretize the numerical attributes in such a way that the resulting categorization corresponds to the KEX knowledge acquisition algorithm. Since the categorization for KEX is done "off-line" before the KEX machine learning algorithm is run, it can also be used as a preprocessing step for other machine learning algorithms. The other discretization procedure is implemented in CN4, a large extension of the well-known CN2 machine learning algorithm. The range of a numerical attribute is divided into intervals that may form a complex generated by the algorithm as part of the class description. Experimental results compare the performance of KEX and CN4 on some well-known ML databases. To make the comparison more informative, we also used the discretization procedure of the MLC++ library, and other ML algorithms such as ID3 and C4.5 were run in our experiments as well. The results are compared and discussed.
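
As a generic illustration of what an "off-line" discretization preprocessing step does (not the KEX or CN4 procedures themselves), the sketch below cuts a numerical attribute at quantile boundaries and replaces each value with an interval code.

```python
import numpy as np

def equal_frequency_cuts(values, n_bins=4):
    """Generic discretization: cut points at quantile boundaries.
    (Illustrative only; KEX and CN4 use their own, class-aware criteria.)"""
    quantiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    return np.percentile(values, quantiles)

def discretize(values, cut_points):
    """Map each numerical value to the index of its interval."""
    return np.digitize(values, cut_points)

# Example: a numerical attribute (say, patient age) mapped to 4 intervals.
ages = np.array([23, 35, 47, 52, 61, 30, 44, 58, 29, 66])
cuts = equal_frequency_cuts(ages, n_bins=4)
print("cut points:", cuts)
print("interval codes:", discretize(ages, cuts))
```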


Perception ◽  
1997 ◽  
Vol 26 (1_suppl) ◽  
pp. 313-313
Author(s):  
T A Podugolnikova ◽  
G I Rozhkova ◽  
I S Kondakova

Coding tests are regularly used to estimate the capacity for mental work in children entering school and in younger schoolchildren. The task of the child is to fill in a special form by putting conventional symbols (codes) under the rows of test objects in accordance with a sample. The results of such testing reflect both visuomotor and intellectual capabilities since, on the one hand, a subject has to perform fast eye and hand movements, comparing test objects with the sample and drawing codes, but, on the other hand, it is not forbidden to memorise codes and to use an optimal strategy for filling the form. In order to make the coding test more suitable for estimating purely visual capabilities, we developed a computerised version in which the codes changed at each step, making their memorisation useless. This coding test was used in an examination of 22 children (aged 6 – 7 years) with binocular anomalies (strabismus, amblyopia) from special kindergartens and 190 normal children (aged 6 – 9 years; 63 from kindergartens and 127 from school forms 1 – 3). The difference in coding indices between children with binocular anomalies and normal children of the same age was statistically significant (p<0.005). The average indices for normal children of different ages differed significantly, increasing from 11.8 (at 6 years) to 24.6 (at 9 years) symbols per minute. The effect of learning was also evident: the indices of 7-year-old children from the first school form were better than those of children of the same age from a kindergarten. The correlation between coding indices and reading rate was positive but rather weak (0.28) in the 52 first-form children tested.


1998 ◽  
Vol 38 (11) ◽  
pp. 31-39 ◽  
Author(s):  
W. Rauch

Current environmental policy guidelines are mainly based on cost-benefit analysis and concerned with the restriction of emissions. Sustainable development, on the other hand, focuses on determining the optimal strategy for the overall performance of both the environment and the socio-economic system. This paper highlights some of the basic problems that arise when developing strategies with this aim in mind. The implications for decision making are investigated by means of a fictitious model of the economic and environmental interactions in a lake region.


Author(s):  
Andrea Moro

Understanding the nature and the structure of human language coincides with capturing the constraints which make a conceivable language possible or, equivalently, with discovering whether there can be any impossible languages at all. This book explores these related issues, paralleling the effort of a biologist who attempts to describe the class of impossible animals. In biology, one can appeal, for example, to physical laws of nature (such as entropy or gravity), but when it comes to language the path becomes intricate and difficult, because the physical laws cannot be exploited. In linguistics, in fact, there are two distinct empirical domains to explore: on the one hand, the formal domain of syntax, where different languages are compared in an attempt to understand how much they can differ; on the other, the neurobiological domain, where the flow of information through complex neural networks and the electric code exploited by neurons are uncovered and measured. By referring to the most advanced experiments in neurolinguistics, the book offers an updated description of modern linguistics and allows the reader to formulate new and surprising questions. Moreover, since syntax - the capacity to generate novel structures (sentences) by recombining a finite set of elements (words) - is the fingerprint of all and only human languages, this book ultimately deals with the fundamental questions that characterize the search for our origins.

