What Size Net Gives Valid Generalization?

1989 ◽  
Vol 1 (1) ◽  
pp. 151-160 ◽  
Author(s):  
Eric B. Baum ◽  
David Haussler

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < ∊ ≤ 1/8. We show that if m ≥ O((W/∊) log(N/∊)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 − ∊/2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 − ∊ of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/∊) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 − ∊ fraction of the future test examples.
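To make the upper bound concrete, here is a minimal Python sketch that evaluates m = c · (W/∊) · ln(N/∊) for a given network size. The constant c absorbed by the O(·) notation is not specified in the abstract, so the value used below is an arbitrary placeholder; treat the output as an order-of-magnitude estimate only.

```python
import math

def baum_haussler_sample_size(W, N, eps, c=1.0):
    """Rough sample-size estimate m = c * (W/eps) * ln(N/eps).

    W: number of weights, N: number of threshold units, eps: target error rate.
    The constant c hidden by the O(.) in the stated bound is unknown here;
    c=1.0 is an assumed placeholder, so the result is indicative only.
    """
    assert 0 < eps <= 1/8, "the bound is stated for 0 < eps <= 1/8"
    return math.ceil(c * (W / eps) * math.log(N / eps))

# Example: a net with 100 threshold units and 2,000 weights, target error 10%
print(baum_haussler_sample_size(W=2000, N=100, eps=0.1))  # ~138,156 examples at c = 1
```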

1971 ◽  
Vol 8 (03) ◽  
pp. 573-588 ◽  
Author(s):  
Barry Belkin

The problem of computing the distribution of the time of first passage to a constant threshold for special classes of stochastic processes has been the subject of considerable study. For example, Baxter and Donsker (1957) have considered the problem for processes with stationary, independent increments, Darling and Siegert (1953) for continuous Markov processes, Mehr and McFadden (1965) for Gauss-Markov processes, and Stone (1969) for semi-Markov processes. The results, however, generally express the first passage distribution in terms of transforms which can be inverted in only a relatively few special cases, such as the classical case of the Wiener process and certain stable and compound Poisson processes. In the case of a linear threshold and processes with non-negative interchangeable increments, the first passage problem has been studied by Takács (1957) (an explicit result was obtained by Pyke (1959) in the special case of a Poisson process). Again in the case of a linear threshold, an explicit form for the first passage distribution was found by Slepian (1961) for the Wiener process. For the Ornstein-Uhlenbeck process and certain U-shaped thresholds the problem has recently been studied by Daniels (1969).
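As a concrete instance of the first-passage problem surveyed above, the following sketch (not from Belkin's paper) simulates a standard Wiener process and estimates the probability that it reaches a constant threshold a by time t, comparing the estimate with the classical reflection-principle formula P(T_a ≤ t) = 2(1 − Φ(a/√t)). The threshold, horizon, and step counts below are arbitrary illustrative choices.

```python
import math
import random

def first_passage_probability(a=1.0, t=1.0, n_paths=20000, n_steps=2000, seed=0):
    """Monte Carlo estimate of P(T_a <= t) for a standard Wiener process,
    where T_a is the first time the path reaches the constant threshold a.
    Parameters are illustrative assumptions, not taken from the paper."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    hits = 0
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sd)
            if x >= a:          # threshold crossed within [0, t]
                hits += 1
                break
    return hits / n_paths

def exact_probability(a=1.0, t=1.0):
    """Reflection-principle result: P(T_a <= t) = 2 * (1 - Phi(a / sqrt(t)))."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(a / math.sqrt(2.0 * t))))

print(first_passage_probability())  # estimate; slightly low due to time discretization
print(exact_probability())          # ~0.3173 for a = 1, t = 1
```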


1992 ◽  
Vol 03 (01) ◽  
pp. 19-30 ◽  
Author(s):  
AKIRA NAMATAME ◽  
YOSHIAKI TSUKAMOTO

We propose a new learning algorithm, structural learning with complementary coding, for concept learning problems. We introduce a new grouping measure that forms a similarity matrix over the training set and show that this similarity matrix provides a sufficient condition for linear separability of the set. Using this sufficient condition, one can determine a suitable composition of linearly separable threshold functions that exactly classifies the set of labeled vectors. In the nonlinearly separable case, the internal representation of the connectionist network (the number of hidden units and the value space of those units) is pre-determined before learning, based on the structure of the similarity matrix. A three-layer neural network is then constructed in which each linearly separable threshold function is computed by a linear-threshold unit whose weights are determined by a one-shot learning algorithm requiring only a single presentation of the training set. The structural learning algorithm then finds the connection weights that realize the pre-determined internal representation. This pre-structured internal representation, the activation value space at the hidden layer, defines intermediate concepts, and the target concept is learned as a combination of those intermediate concepts. The ability to create the pre-structured internal representation from the grouping measure distinguishes structural learning from earlier methods such as backpropagation.
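The paper's grouping measure and one-shot weight assignment are not reproduced here; the minimal sketch below, with hand-chosen (hypothetical) weights, only illustrates the underlying idea of composing linearly separable threshold functions in a three-layer net so that a non-linearly separable labeling (XOR) is classified exactly by combining two intermediate concepts.

```python
def threshold_unit(weights, bias, x):
    """Linear threshold unit: outputs 1 iff w . x + bias >= 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

def three_layer_net(x):
    """Hand-constructed composition of two linearly separable threshold
    functions (intermediate concepts) whose combination realizes XOR,
    which no single threshold unit can classify. Weights are illustrative
    assumptions, not produced by the paper's one-shot algorithm."""
    h1 = threshold_unit([1, 1], -0.5, x)    # fires when x1 OR x2
    h2 = threshold_unit([-1, -1], 1.5, x)   # fires when NOT (x1 AND x2)
    return threshold_unit([1, 1], -1.5, [h1, h2])  # output concept: h1 AND h2

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, three_layer_net(x))  # prints 0, 1, 1, 0 -> XOR
```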


Author(s):  
Sourav Chakraborty ◽  
Sushrut Karmalkar ◽  
Srijita Kundu ◽  
Satyanarayana V. Lokam ◽  
Nitin Saurabh

2012 ◽  
Vol 22 (3) ◽  
pp. 623-677 ◽  
Author(s):  
Ilias Diakonikolas ◽  
Rocco A. Servedio

2020 ◽  
Vol 1441 ◽  
pp. 012138
Author(s):  
Sh Fazilov ◽  
R Khamdamov ◽  
G Mirzaeva ◽  
D Gulyamova ◽  
N Mirzaev
