Hierarchical Learning Machines and Neuroscience of Visual Cortex

Machine Learning and Knowledge Discovery in Databases - Lecture Notes in Computer Science ◽

10.1007/978-3-642-15880-3_5 ◽

2010 ◽

pp. 5-5

Author(s):

Tomaso Poggio

Keyword(s):

Visual Cortex ◽

Hierarchical Learning ◽

Learning Machines

Download Full-text

Algebraic geometrical methods for hierarchical learning machines

Neural Networks ◽

10.1016/s0893-6080(01)00069-7 ◽

2001 ◽

Vol 14 (8) ◽

pp. 1049-1060 ◽

Cited By ~ 43

Author(s):

Sumio Watanabe

Keyword(s):

Hierarchical Learning ◽

Learning Machines

Download Full-text

Learning Coefficients of Layered Models When the True Distribution Mismatches the Singularities

Neural Computation ◽

10.1162/089976603765202640 ◽

2003 ◽

Vol 15 (5) ◽

pp. 1013-1033 ◽

Cited By ~ 15

Author(s):

Sumio Watanabe ◽

Shun-ichi Amari

Keyword(s):

Information Matrix ◽

Bayes Estimation ◽

Generalization Error ◽

Parameter Spaces ◽

Hierarchical Learning ◽

Learning Machines ◽

Regular Model ◽

Training Samples ◽

Learning Machine ◽

True Distribution

Hierarchical learning machines such as layered neural networks have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, with the result that the conventional learning theory of regular statistical models does not hold. Recently, it was proved that if the parameter of the true distribution is contained in the singularities of the learning machine, the generalization error in Bayes estimation is asymptotically equal toλ/n, where 2λ is smaller than the dimension of the parameter andn is the number of training samples. However, the constantλ strongly depends on the local geometrical structure of singularities; hence, the generalization error is not yet clarified when the true distribution is almost but not completely contained in the singularities. In this article, in order to analyze such cases, we study the Bayes generalization error under the condition that the Kullback distance of the true distribution from the distribution represented by singularities is in proportion to 1/n and show two results. First, if the dimension of the parameter from inputs to hidden units is not larger than three, then there exists a region of true parameters such that the generalization error is larger than that of the corresponding regular model. Second, if the dimension from inputs to hidden units is larger than three, then for arbitrary true distribution, the generalization error is smaller than that of the corresponding regular model.

Download Full-text