Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

Mathematics ◽  
2019 ◽  
Vol 7 (10) ◽  
pp. 992 ◽  
Author(s):  
Boris Hanin

This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: What is the minimal width \(w_{\min}(d)\) so that ReLU nets of width \(w_{\min}(d)\) (and arbitrary depth) can approximate any continuous function on the unit cube \([0,1]^d\) arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given function? We obtain an essentially complete answer to these questions for convex functions. Our approach is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well suited to represent convex functions. In particular, we prove that ReLU nets with width \(d+1\) can approximate any continuous convex function of \(d\) variables arbitrarily well. These results then give quantitative depth estimates for the rate of approximation of any continuous scalar function on the \(d\)-dimensional cube \([0,1]^d\) by ReLU nets with width \(d+3\).
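The observation behind the abstract can be illustrated numerically. The sketch below is not the paper's width-\(d+1\) construction; it only demonstrates the underlying facts that a convex function is the pointwise maximum of its affine minorants and that a running maximum can be computed with ReLU units via \(\max(a,b)=a+\mathrm{relu}(b-a)\). The grid of tangent planes for \(f(x,y)=x^2+y^2\) is an assumption chosen for the demo.

```python
# Illustrative sketch (not the paper's construction): a continuous convex
# function is the max of its affine minorants, and a running max can be
# computed with ReLU operations via max(a, b) = a + relu(b - a).

def relu(t):
    return max(t, 0.0)

def convex_approx(x, planes):
    """Approximate a convex function at point x (list of d coordinates)
    as the max of affine minorants. `planes` is a list of (w, b) pairs,
    assumed to be tangent planes of the target function; the running max
    is expressed using only ReLU operations."""
    value = None
    for w, b in planes:
        affine = sum(wi * xi for wi, xi in zip(w, x)) + b
        if value is None:
            value = affine
        else:
            # max(value, affine) realized with a single ReLU
            value = value + relu(affine - value)
    return value

# Example target: f(x, y) = x^2 + y^2 on [0,1]^2, tangent planes on a grid.
def f(x, y):
    return x * x + y * y

pts = [(i / 4, j / 4) for i in range(5) for j in range(5)]
# Tangent plane at (a, b): 2a*x + 2b*y - (a^2 + b^2)
planes = [([2 * a, 2 * b], -(a * a + b * b)) for a, b in pts]

approx = convex_approx([0.3, 0.7], planes)
print(abs(f(0.3, 0.7) - approx))  # small: the grid of minorants is dense
```

Because every plane is a tangent minorant, the approximation never overshoots, and refining the grid drives the error to zero, mirroring the role convexity plays in the paper's argument.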

1990 ◽  
Vol 42 (2) ◽  
pp. 201-213 ◽  
Author(s):  
Bernice Sharp

In this paper, topological linear spaces are categorised according to the differentiability properties of their continuous convex functions. Mazur's Theorem for Banach spaces is generalised: all separable Baire topological linear spaces are weak Asplund. A class of spaces is given for which Gateaux and Fréchet differentiability of a continuous convex function coincide; together with Mazur's Theorem, this implies that all Montel Fréchet spaces are Asplund spaces. The effect of weakening the topology of a given space is studied in terms of the space's classification. Any topological linear space with its weak topology is an Asplund space; at the opposite end of the topological spectrum, an example is given of an inductive limit of Asplund spaces which is not even a Gateaux differentiability space.


1994 ◽  
Vol 23 (484) ◽  
Author(s):  
Brian H. Mayoh

Two papers in one! The first, "On Patterns and Graphs", describes the pattern version of context-free grammars for many kinds of substitution structures. By giving striking examples, particularly for emotional neural nets and other forms of graph grammars, it shows that the extra expressive power of pattern multigrammars is worth having. The second paper, "DNA pattern multigrammars", describes the pattern approach to the analysis of the secondary structure of DNA and RNA; in particular, it gives an analysis of Tobacco Mosaic Virus RNA, Transfer RNA, and genetic switching in the bacteriophage λ. The genetic algorithm approach to the machine learning of DNA and RNA grammars is also discussed.


2021 ◽  
Vol 11 (1) ◽  
pp. 427
Author(s):  
Sunghwan Moon

Deep neural networks have shown very successful performance across a wide range of tasks, but the theory of why they work so well is still in its early stages. Recently, the expressive power of neural networks, which is important for understanding deep learning, has received considerable attention. Classic results by Cybenko, Barron, and others state that a network with a single hidden layer and a suitable activation function is a universal approximator. More recently, attention has turned to how width affects the expressiveness of neural networks, i.e., to universal approximation theorems for deep neural networks with the Rectified Linear Unit (ReLU) activation function and bounded width. Here, we show how any continuous function on a compact subset of \(\mathbb{R}^{n_{in}}\), \(n_{in}\in\mathbb{N}\), can be approximated by a ReLU network whose hidden layers have at most \(n_{in}+5\) nodes, in view of an approximate identity.
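The classic single-hidden-layer result mentioned above has a concrete one-dimensional instance: the piecewise-linear interpolant of a continuous function on a grid is exactly a sum of shifted ReLUs. The sketch below shows that flavour of universal approximation; it is not the bounded-width construction of this paper, and the grid size and test function are assumptions for the demo.

```python
# Illustrative one-hidden-layer ReLU approximation on [0, 1]: the
# piecewise-linear interpolant of f on a uniform grid can be written
# as f(0) + sum_k c_k * relu(x - x_k), i.e. one hidden ReLU per knot.

import math

def relu(t):
    return max(t, 0.0)

def relu_interpolant(f, n):
    """Return g(x), the piecewise-linear interpolant of f on the grid
    x_k = k/n, expressed as a width-n one-hidden-layer ReLU network."""
    xs = [k / n for k in range(n + 1)]
    # Slope of the interpolant on each interval [x_k, x_{k+1}]
    slopes = [(f(xs[k + 1]) - f(xs[k])) * n for k in range(n)]
    # ReLU coefficients: initial slope, then each change of slope
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    y0 = f(xs[0])
    def g(x):
        return y0 + sum(c * relu(x - xk) for c, xk in zip(coeffs, xs))
    return g

g = relu_interpolant(math.sin, 50)
err = max(abs(math.sin(k / 200) - g(k / 200)) for k in range(201))
print(err)  # uniform error on [0,1]; shrinks as the grid is refined
```

Bounded-width results such as the one in this abstract trade this growing width for depth, stacking narrow layers instead of widening a single one.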


2018 ◽  
Vol 25 (3) ◽  
pp. 291-311
Author(s):  
Mikhail V. Nevskii ◽  
Alexey Yu. Ukhalov

Let \(n\in{\mathbb N}\), and let \(Q_n\) be the unit cube \([0,1]^n\). By \(C(Q_n)\) we denote the space of continuous functions \(f:Q_n\to{\mathbb R}\) with the norm \(\|f\|_{C(Q_n)}:=\max\limits_{x\in Q_n}|f(x)|,\) and by \(\Pi_1\left({\mathbb R}^n\right)\) the set of polynomials of \(n\) variables of degree \(\leq 1\) (or linear functions). Let \(x^{(j)},\) \(1\leq j\leq n+1,\) be the vertices of an \(n\)-dimensional nondegenerate simplex \(S\subset Q_n\). The interpolation projector \(P:C(Q_n)\to \Pi_1({\mathbb R}^n)\) corresponding to the simplex \(S\) is defined by the equalities \(Pf\left(x^{(j)}\right)= f\left(x^{(j)}\right).\) The norm of \(P\) as an operator from \(C(Q_n)\) to \(C(Q_n)\) may be calculated by the formula \(\|P\|=\max\limits_{x\in ver(Q_n)} \sum\limits_{j=1}^{n+1} |\lambda_j(x)|.\) Here \(\lambda_j\) are the basic Lagrange polynomials with respect to \(S,\) and \(ver(Q_n)\) is the set of vertices of \(Q_n\). Let us denote by \(\theta_n\) the minimal possible value of \(\|P\|.\) Earlier, the first author proved various relations and estimates for the values \(\|P\|\) and \(\theta_n\), in particular, of geometric character. The equivalence \(\theta_n\asymp \sqrt{n}\) takes place; for example, for any dimension \(n\) we have \(\frac{1}{4}\sqrt{n}<\theta_n<3\sqrt{n}.\) If the nodes of the projector \(P^*\) coincide with the vertices of an arbitrary simplex of maximum possible volume, we have \(\|P^*\|\asymp\theta_n.\) When a Hadamard matrix of order \(n+1\) exists, we have \(\theta_n\leq\sqrt{n+1}.\) In the paper, we give more precise upper bounds of \(\theta_n\) for \(21\leq n \leq 26\). These estimates were obtained using maximum-volume simplices in the cube.
To construct such simplices, we use maximal determinants with entries \(\pm 1.\) We also systematize and comment on the best currently known upper and lower estimates of \(\theta_n\) for concrete values of \(n.\)
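The norm formula \(\|P\|=\max_{x\in ver(Q_n)} \sum_{j=1}^{n+1} |\lambda_j(x)|\) is directly computable for a given simplex. The pure-Python sketch below evaluates it by solving for the Lagrange basis coefficients and maximizing over cube vertices; it is illustrative only (tiny \(n\), no claim about optimal or maximum-volume simplices).

```python
# Sketch of the projector-norm formula from the text:
# ||P|| = max over cube vertices x of sum_j |lambda_j(x)|,
# where lambda_j are the Lagrange basis polynomials of the simplex S.

from itertools import product

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def projector_norm(vertices):
    """vertices: the n+1 vertices of a nondegenerate simplex in [0,1]^n."""
    n = len(vertices) - 1
    # lambda_j(x) = a_j . x + c_j is fixed by lambda_j(x^(i)) = delta_ij.
    A = [list(v) + [1.0] for v in vertices]          # (n+1) x (n+1) system
    basis = []
    for j in range(n + 1):
        e = [1.0 if i == j else 0.0 for i in range(n + 1)]
        basis.append(solve(A, e))                    # coefficients of lambda_j
    # Maximize the Lebesgue-type sum over the 2^n vertices of the cube.
    return max(
        sum(abs(sum(c * x for c, x in zip(coef[:n], v)) + coef[n])
            for coef in basis)
        for v in product((0.0, 1.0), repeat=n)
    )

# Corner simplex in [0,1]^2 with vertices (0,0), (1,0), (0,1):
# lambda's are 1-x-y, x, y, and the max is attained at (1,1).
print(projector_norm([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))  # 3.0
```

Minimizing this quantity over all simplices \(S\subset Q_n\) gives \(\theta_n\), which is what the paper bounds for \(21\leq n\leq 26\).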


2020 ◽  
Vol 4 (2) ◽  
pp. 1-14
Author(s):  
Pardeep Kaur ◽  
Sukhwinder Singh Billing

Author(s):  
Richard C. Kittler

Analysis of manufacturing data as a tool for failure analysts often meets with roadblocks due to the complex non-linear relationships between failure rates and explanatory variables drawn from process history. The current work describes how the use of a comprehensive engineering database and data mining technology overcomes some of these difficulties and enables new classes of problems to be solved. The characteristics of the database design necessary for adequate data coverage and unit traceability are discussed. Data mining technology is explained and contrasted with traditional statistical approaches as well as with those of expert systems, neural nets, and signature analysis. Data mining is applied to a number of common problem scenarios. Finally, future trends in data mining technology relevant to failure analysis are discussed.

