Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

Mathematics ◽  
2019 ◽  
Vol 7 (10) ◽  
pp. 992 ◽  
Author(s):  
Boris Hanin

This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: What is the minimal width \(w_{\min}(d)\) so that ReLU nets of width \(w_{\min}(d)\) (and arbitrary depth) can approximate any continuous function on the unit cube \([0,1]^d\) arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given function? We obtain an essentially complete answer to these questions for convex functions. Our approach is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well suited to represent convex functions. In particular, we prove that ReLU nets with width \(d+1\) can approximate any continuous convex function of \(d\) variables arbitrarily well. These results then give quantitative depth estimates for the rate of approximation of any continuous scalar function on the \(d\)-dimensional cube \([0,1]^d\) by ReLU nets with width \(d+3\).
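The observation behind the abstract can be illustrated numerically. The sketch below is not the paper's width-\(d+1\) construction; it only demonstrates the underlying facts that a convex function is the pointwise maximum of its affine minorants and that a running maximum can be computed with ReLU units via \(\max(a,b)=a+\mathrm{relu}(b-a)\). The grid of tangent planes for \(f(x,y)=x^2+y^2\) is an assumption chosen for the demo.

```python
# Illustrative sketch (not the paper's construction): a continuous convex
# function is the max of its affine minorants, and a running max can be
# computed with ReLU operations via max(a, b) = a + relu(b - a).

def relu(t):
    return max(t, 0.0)

def convex_approx(x, planes):
    """Approximate a convex function at point x (list of d coordinates)
    as the max of affine minorants. `planes` is a list of (w, b) pairs,
    assumed to be tangent planes of the target function; the running max
    is expressed using only ReLU operations."""
    value = None
    for w, b in planes:
        affine = sum(wi * xi for wi, xi in zip(w, x)) + b
        if value is None:
            value = affine
        else:
            # max(value, affine) realized with a single ReLU
            value = value + relu(affine - value)
    return value

# Example target: f(x, y) = x^2 + y^2 on [0,1]^2, tangent planes on a grid.
def f(x, y):
    return x * x + y * y

pts = [(i / 4, j / 4) for i in range(5) for j in range(5)]
# Tangent plane at (a, b): 2a*x + 2b*y - (a^2 + b^2)
planes = [([2 * a, 2 * b], -(a * a + b * b)) for a, b in pts]

approx = convex_approx([0.3, 0.7], planes)
print(abs(f(0.3, 0.7) - approx))  # small: the grid of minorants is dense
```

Because every plane is a tangent minorant, the approximation never overshoots, and refining the grid drives the error to zero, mirroring the role convexity plays in the paper's argument.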

1990 ◽  
Vol 42 (2) ◽  
pp. 201-213 ◽  
Author(s):  
Bernice Sharp

In this paper, topological linear spaces are categorised according to the differentiability properties of their continuous convex functions. Mazur's Theorem for Banach spaces is generalised: all separable Baire topological linear spaces are weak Asplund. A class of spaces is given for which Gateaux and Fréchet differentiability of a continuous convex function coincide; together with Mazur's Theorem, this implies that all Montel Fréchet spaces are Asplund spaces. The effect of weakening the topology of a given space is studied in terms of the space's classification. Any topological linear space with its weak topology is an Asplund space; at the opposite end of the topological spectrum, an example is given of an inductive limit of Asplund spaces which is not even a Gateaux differentiability space.


1994 ◽  
Vol 23 (484) ◽  
Author(s):  
Brian H. Mayoh

Two papers in one! The first, "On Patterns and Graphs", describes the pattern version of context-free grammars for many kinds of substitution structures. By giving striking examples, particularly for emotional neural nets and other forms of graph grammars, it shows that the extra expressive power of pattern multigrammars is worth having. The second paper, "DNA pattern multigrammars", describes the pattern approach to the analysis of the secondary structure of DNA and RNA; in particular, it gives an analysis of Tobacco Mosaic Virus RNA, Transfer RNA, and genetic switching in the bacteriophage λ. The genetic algorithm approach to the machine learning of DNA and RNA grammars is also discussed.


2021 ◽  
Vol 11 (1) ◽  
pp. 427
Author(s):  
Sunghwan Moon

Deep neural networks have shown very successful performance across a wide range of tasks, but the theory of why they work so well is still in its early stages. Recently, the expressive power of neural networks, which is important for understanding deep learning, has received considerable attention. Classic results by Cybenko, Barron, and others state that a network with a single hidden layer and a suitable activation function is a universal approximator. More recently, attention has turned to how width affects the expressiveness of neural networks, i.e., to universal approximation theorems for deep neural networks with the Rectified Linear Unit (ReLU) activation function and bounded width. Here, we show how any continuous function on a compact subset of \(\mathbb{R}^{n_{in}}\), \(n_{in}\in\mathbb{N}\), can be approximated by a ReLU network whose hidden layers have at most \(n_{in}+5\) nodes, in view of an approximate identity.
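The classic single-hidden-layer result mentioned above has a concrete one-dimensional instance: the piecewise-linear interpolant of a continuous function on a grid is exactly a sum of shifted ReLUs. The sketch below shows that flavour of universal approximation; it is not the bounded-width construction of this paper, and the grid size and test function are assumptions for the demo.

```python
# Illustrative one-hidden-layer ReLU approximation on [0, 1]: the
# piecewise-linear interpolant of f on a uniform grid can be written
# as f(0) + sum_k c_k * relu(x - x_k), i.e. one hidden ReLU per knot.

import math

def relu(t):
    return max(t, 0.0)

def relu_interpolant(f, n):
    """Return g(x), the piecewise-linear interpolant of f on the grid
    x_k = k/n, expressed as a width-n one-hidden-layer ReLU network."""
    xs = [k / n for k in range(n + 1)]
    # Slope of the interpolant on each interval [x_k, x_{k+1}]
    slopes = [(f(xs[k + 1]) - f(xs[k])) * n for k in range(n)]
    # ReLU coefficients: initial slope, then each change of slope
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    y0 = f(xs[0])
    def g(x):
        return y0 + sum(c * relu(x - xk) for c, xk in zip(coeffs, xs))
    return g

g = relu_interpolant(math.sin, 50)
err = max(abs(math.sin(k / 200) - g(k / 200)) for k in range(201))
print(err)  # uniform error on [0,1]; shrinks as the grid is refined
```

Bounded-width results such as the one in this abstract trade this growing width for depth, stacking narrow layers instead of widening a single one.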


2018 ◽  
Vol 25 (3) ◽  
pp. 291-311
Author(s):  
Mikhail V. Nevskii ◽  
Alexey Yu. Ukhalov

Let \(n\in{\mathbb N}\), and let \(Q_n\) be the unit cube \([0,1]^n\). By \(C(Q_n)\) we denote the space of continuous functions \(f:Q_n\to{\mathbb R}\) with the norm \(\|f\|_{C(Q_n)}:=\max\limits_{x\in Q_n}|f(x)|,\) and by \(\Pi_1\left({\mathbb R}^n\right)\) the set of polynomials of \(n\) variables of degree \(\leq 1\) (or linear functions). Let \(x^{(j)},\) \(1\leq j\leq n+1,\) be the vertices of an \(n\)-dimensional nondegenerate simplex \(S\subset Q_n\). The interpolation projector \(P:C(Q_n)\to \Pi_1({\mathbb R}^n)\) corresponding to the simplex \(S\) is defined by the equalities \(Pf\left(x^{(j)}\right)= f\left(x^{(j)}\right).\) The norm of \(P\) as an operator from \(C(Q_n)\) to \(C(Q_n)\) may be calculated by the formula \(\|P\|=\max\limits_{x\in ver(Q_n)} \sum\limits_{j=1}^{n+1} |\lambda_j(x)|.\) Here \(\lambda_j\) are the basic Lagrange polynomials with respect to \(S,\) and \(ver(Q_n)\) is the set of vertices of \(Q_n\). Let us denote by \(\theta_n\) the minimal possible value of \(\|P\|.\) Earlier, the first author proved various relations and estimates for the values \(\|P\|\) and \(\theta_n\), in particular, of geometric character. The equivalence \(\theta_n\asymp \sqrt{n}\) takes place; for example, for any dimension \(n\) we have \(\frac{1}{4}\sqrt{n}<\theta_n<3\sqrt{n}.\) If the nodes of the projector \(P^*\) coincide with the vertices of an arbitrary simplex of maximum possible volume, we have \(\|P^*\|\asymp\theta_n.\) When a Hadamard matrix of order \(n+1\) exists, we have \(\theta_n\leq\sqrt{n+1}.\) In the paper, we give more precise upper bounds of \(\theta_n\) for \(21\leq n \leq 26\). These estimates were obtained using maximum-volume simplices in the cube.
To construct such simplices, we use maximal determinants with entries \(\pm 1.\) We also systematize and comment on the best currently known upper and lower estimates of \(\theta_n\) for concrete values of \(n.\)
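The norm formula \(\|P\|=\max_{x\in ver(Q_n)} \sum_{j=1}^{n+1} |\lambda_j(x)|\) is directly computable for a given simplex. The pure-Python sketch below evaluates it by solving for the Lagrange basis coefficients and maximizing over cube vertices; it is illustrative only (tiny \(n\), no claim about optimal or maximum-volume simplices).

```python
# Sketch of the projector-norm formula from the text:
# ||P|| = max over cube vertices x of sum_j |lambda_j(x)|,
# where lambda_j are the Lagrange basis polynomials of the simplex S.

from itertools import product

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def projector_norm(vertices):
    """vertices: the n+1 vertices of a nondegenerate simplex in [0,1]^n."""
    n = len(vertices) - 1
    # lambda_j(x) = a_j . x + c_j is fixed by lambda_j(x^(i)) = delta_ij.
    A = [list(v) + [1.0] for v in vertices]          # (n+1) x (n+1) system
    basis = []
    for j in range(n + 1):
        e = [1.0 if i == j else 0.0 for i in range(n + 1)]
        basis.append(solve(A, e))                    # coefficients of lambda_j
    # Maximize the Lebesgue-type sum over the 2^n vertices of the cube.
    return max(
        sum(abs(sum(c * x for c, x in zip(coef[:n], v)) + coef[n])
            for coef in basis)
        for v in product((0.0, 1.0), repeat=n)
    )

# Corner simplex in [0,1]^2 with vertices (0,0), (1,0), (0,1):
# lambda's are 1-x-y, x, y, and the max is attained at (1,1).
print(projector_norm([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))  # 3.0
```

Minimizing this quantity over all simplices \(S\subset Q_n\) gives \(\theta_n\), which is what the paper bounds for \(21\leq n\leq 26\).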


2020 ◽  
Vol 4 (2) ◽  
pp. 1-14
Author(s):  
Pardeep Kaur ◽  
Sukhwinder Singh Billing

Author(s):  
Richard C. Kittler

Analysis of manufacturing data as a tool for failure analysts often meets with roadblocks due to the complex non-linear relationships between failure rates and explanatory variables drawn from process history. The current work describes how the use of a comprehensive engineering database and data mining technology overcomes some of these difficulties and enables new classes of problems to be solved. The characteristics of the database design necessary for adequate data coverage and unit traceability are discussed. Data mining technology is explained and contrasted with traditional statistical approaches as well as with those of expert systems, neural nets, and signature analysis. Data mining is applied to a number of common problem scenarios. Finally, future trends in data mining technology relevant to failure analysis are discussed.

