What Size Net Gives Valid Generalization?

1989 ◽  
Vol 1 (1) ◽  
pp. 151-160 ◽  
Author(s):  
Eric B. Baum ◽  
David Haussler

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < ∊ ≤ 1/8. We show that if m ≥ O((W/∊) log(N/∊)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 − ∊/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 − ∊ of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/∊) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 − ∊ fraction of the future test examples.
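To make the upper bound concrete, here is a minimal Python sketch that evaluates m = c · (W/∊) · ln(N/∊) for a given network size. The constant c absorbed by the O(·) notation is not specified in the abstract, so the value used below is an arbitrary placeholder; treat the output as an order-of-magnitude estimate only.

```python
import math

def baum_haussler_sample_size(W, N, eps, c=1.0):
    """Rough sample-size estimate m = c * (W/eps) * ln(N/eps).

    W: number of weights, N: number of threshold units, eps: target error rate.
    The constant c hidden by the O(.) in the stated bound is unknown here;
    c=1.0 is an assumed placeholder, so the result is indicative only.
    """
    assert 0 < eps <= 1/8, "the bound is stated for 0 < eps <= 1/8"
    return math.ceil(c * (W / eps) * math.log(N / eps))

# Example: a net with 100 threshold units and 2,000 weights, target error 10%
print(baum_haussler_sample_size(W=2000, N=100, eps=0.1))  # ~138,156 examples at c = 1
```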

1971 ◽  
Vol 8 (03) ◽  
pp. 573-588 ◽  
Author(s):  
Barry Belkin

The problem of computing the distribution of the time of first passage to a constant threshold for special classes of stochastic processes has been the subject of considerable study. For example, Baxter and Donsker (1957) have considered the problem for processes with stationary, independent increments, Darling and Siegert (1953) for continuous Markov processes, Mehr and McFadden (1965) for Gauss-Markov processes, and Stone (1969) for semi-Markov processes. The results, however, generally express the first passage distribution in terms of transforms which can be inverted in only a relatively few special cases, such as the classical case of the Wiener process and certain stable and compound Poisson processes. In the case of a linear threshold and processes with non-negative interchangeable increments, the first passage problem has been studied by Takács (1957) (an explicit result was obtained by Pyke (1959) in the special case of a Poisson process). Again in the case of a linear threshold, an explicit form for the first passage distribution was found by Slepian (1961) for the Wiener process. For the Ornstein-Uhlenbeck process and certain U-shaped thresholds the problem has recently been studied by Daniels (1969).
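As a concrete instance of the first-passage problem surveyed above, the following sketch (not from Belkin's paper) simulates a standard Wiener process and estimates the probability that it reaches a constant threshold a by time t, comparing the estimate with the classical reflection-principle formula P(T_a ≤ t) = 2(1 − Φ(a/√t)). The threshold, horizon, and step counts below are arbitrary illustrative choices.

```python
import math
import random

def first_passage_probability(a=1.0, t=1.0, n_paths=20000, n_steps=2000, seed=0):
    """Monte Carlo estimate of P(T_a <= t) for a standard Wiener process,
    where T_a is the first time the path reaches the constant threshold a.
    Parameters are illustrative assumptions, not taken from the paper."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    hits = 0
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sd)
            if x >= a:          # threshold crossed within [0, t]
                hits += 1
                break
    return hits / n_paths

def exact_probability(a=1.0, t=1.0):
    """Reflection-principle result: P(T_a <= t) = 2 * (1 - Phi(a / sqrt(t)))."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(a / math.sqrt(2.0 * t))))

print(first_passage_probability())  # estimate; slightly low due to time discretization
print(exact_probability())          # ~0.3173 for a = 1, t = 1
```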


1992 ◽  
Vol 03 (01) ◽  
pp. 19-30 ◽  
Author(s):  
AKIRA NAMATAME ◽  
YOSHIAKI TSUKAMOTO

We propose a new learning algorithm, structural learning with complementary coding, for concept learning problems. We introduce a new grouping measure that forms a similarity matrix over the training set and show that this similarity matrix provides a sufficient condition for linear separability of the set. Using this sufficient condition, one can determine a suitable composition of linearly separable threshold functions that exactly classifies the set of labeled vectors. In the nonlinearly separable case, the internal representation of the connectionist network (the number of hidden units and the value space of those units) is pre-determined before learning, based on the structure of the similarity matrix. A three-layer neural network is then constructed in which each linearly separable threshold function is computed by a linear-threshold unit whose weights are determined by a one-shot learning algorithm requiring only a single presentation of the training set. The structural learning algorithm then finds the connection weights that realize the pre-determined internal representation. This pre-structured internal representation, the activation value space at the hidden layer, defines intermediate concepts, and the target concept is learned as a combination of those intermediate concepts. The ability to create the pre-structured internal representation from the grouping measure distinguishes structural learning from earlier methods such as backpropagation.
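The paper's grouping measure and one-shot weight assignment are not reproduced here; the minimal sketch below, with hand-chosen (hypothetical) weights, only illustrates the underlying idea of composing linearly separable threshold functions in a three-layer net so that a non-linearly separable labeling (XOR) is classified exactly by combining two intermediate concepts.

```python
def threshold_unit(weights, bias, x):
    """Linear threshold unit: outputs 1 iff w . x + bias >= 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

def three_layer_net(x):
    """Hand-constructed composition of two linearly separable threshold
    functions (intermediate concepts) whose combination realizes XOR,
    which no single threshold unit can classify. Weights are illustrative
    assumptions, not produced by the paper's one-shot algorithm."""
    h1 = threshold_unit([1, 1], -0.5, x)    # fires when x1 OR x2
    h2 = threshold_unit([-1, -1], 1.5, x)   # fires when NOT (x1 AND x2)
    return threshold_unit([1, 1], -1.5, [h1, h2])  # output concept: h1 AND h2

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, three_layer_net(x))  # prints 0, 1, 1, 0 -> XOR
```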


Author(s):  
Sourav Chakraborty ◽  
Sushrut Karmalkar ◽  
Srijita Kundu ◽  
Satyanarayana V. Lokam ◽  
Nitin Saurabh

2012 ◽  
Vol 22 (3) ◽  
pp. 623-677 ◽  
Author(s):  
Ilias Diakonikolas ◽  
Rocco A. Servedio

2020 ◽  
Vol 1441 ◽  
pp. 012138
Author(s):  
Sh Fazilov ◽  
R Khamdamov ◽  
G Mirzaeva ◽  
D Gulyamova ◽  
N Mirzaev
