Learning without loss

Author(s):  
Veit Elser

Abstract We explore a new approach for training neural networks in which all loss functions are replaced by hard constraints. The same approach is very successful in phase retrieval, where signals are reconstructed from magnitude constraints and general characteristics (sparsity, support, etc.). Instead of taking gradient steps, the optimizer in the constraint-based approach, called relaxed–reflect–reflect (RRR), derives its steps from projections to local constraints. In neural networks, one such projection makes the minimal modification to the inputs $x$, the associated weights $w$, and the pre-activation value $y$ at each neuron to satisfy the equation $x \cdot w = y$. These projections, along with a host of other local projections (constraining pre- and post-activations, etc.), can be partitioned into two sets such that all the projections in each set can be applied concurrently, across the network and across all data in the training batch. This partitioning into two sets is analogous to the situation in phase retrieval and is the setting for which the general-purpose RRR optimizer was designed. Owing to the novelty of the method, this paper also serves as a self-contained tutorial. Starting with a single-layer network that performs nonnegative matrix factorization and concluding with a generative model comprising an autoencoder and classifier, all applications and their implementations by projections are described in complete detail. Although the new approach has the potential to extend the scope of neural networks (e.g. by defining activations not through functions but through constraint sets), most of the featured models are standard to allow comparison with stochastic gradient descent.
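The RRR update rule itself is compact: given projections $P_A$ and $P_B$ onto the two constraint sets, the iterate moves as $x \leftarrow x + \beta\,(P_B(2P_A(x) - x) - P_A(x))$. As a minimal sketch, the following toy example applies this rule to a two-set feasibility problem (finding a point on both a unit circle and a horizontal line); the constraint sets in the paper are far richer, but the iteration is the same. All names here are illustrative, not from the paper's code.

```python
import math

def proj_circle(p):
    # projection onto the unit circle (constraint set A)
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def proj_line(p):
    # projection onto the line y = 0.5 (constraint set B)
    return (p[0], 0.5)

def rrr(x, beta=0.5, iters=2000):
    # RRR iterate: x <- x + beta * (P_B(2*P_A(x) - x) - P_A(x))
    for _ in range(iters):
        pa = proj_circle(x)
        reflected = (2 * pa[0] - x[0], 2 * pa[1] - x[1])
        pb = proj_line(reflected)
        x = (x[0] + beta * (pb[0] - pa[0]), x[1] + beta * (pb[1] - pa[1]))
    # at a fixed point, the projection of the iterate satisfies both constraints
    return proj_circle(x)

sol = rrr((1.0, 0.1))
```

At convergence `sol` lies (to numerical tolerance) on both sets, i.e. on the circle with $y = 0.5$.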

2021 ◽  
Author(s):  
Justin Sirignano ◽  
Konstantinos Spiliopoulos

We prove that a single-layer neural network trained with the Q-learning algorithm converges in distribution to a random ordinary differential equation as the size of the model and the number of training steps become large. Analysis of the limit differential equation shows that it has a unique stationary solution that is the solution of the Bellman equation, thus giving the optimal control for the problem. In addition, we study the convergence of the limit differential equation to the stationary solution. As a by-product of our analysis, we obtain the limiting behavior of single-layer neural networks when trained on independent and identically distributed data with stochastic gradient descent under the widely used Xavier initialization.
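For reference, the "Xavier" (Glorot) initialization invoked in the last sentence draws each weight of a layer uniformly from $[-a, a]$ with $a = \sqrt{6/(\text{fan\_in} + \text{fan\_out})}$, which keeps activation variance roughly constant across layer widths. A minimal sketch (not the authors' code):

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=random.Random(0)):
    # Glorot/Xavier uniform initialization: scale chosen so that the
    # variance of activations is approximately preserved across the layer
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]

# weight matrix of a single-layer network with 256 inputs and 1 output
W = xavier_uniform(fan_in=256, fan_out=1)
```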


2017 ◽  
Vol 29 (3) ◽  
pp. 861-866 ◽  
Author(s):  
Nolan Conaway ◽  
Kenneth J. Kurtz

Since the work of Minsky and Papert (1969), it has been understood that single-layer neural networks cannot solve nonlinearly separable classifications (e.g., XOR). We describe and test a novel divergent autoassociative architecture capable of solving nonlinearly separable classifications with a single layer of weights. The proposed network consists of class-specific linear autoassociators. The power of the model comes from treating classification problems as within-class feature prediction rather than directly optimizing a discriminant function. We show unprecedented learning capabilities for a simple, single-layer network (i.e., solving XOR) and demonstrate that the famous limitation in acquiring nonlinearly separable problems is not just about the need for a hidden layer; it is about the choice between directly predicting classes and learning to classify indirectly by predicting features.
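The core decision rule can be sketched independently of the authors' training procedure: fit one linear autoassociator per class and assign an input to the class whose autoassociator reconstructs it best. The toy example below is hypothetical (it uses orthogonal projectors onto 1-D class subspaces, which is the minimal-norm least-squares autoassociator for rank-1 class data) and only illustrates reconstruction-error classification, not the paper's exact architecture.

```python
import math

def projector(direction):
    # rank-1 orthogonal projector u u^T onto a 1-D class subspace
    n = math.hypot(*direction)
    u = (direction[0] / n, direction[1] / n)
    return [[u[0] * u[0], u[0] * u[1]],
            [u[1] * u[0], u[1] * u[1]]]

def recon_error(W, x):
    # distance between x and its reconstruction W x
    rx = (W[0][0] * x[0] + W[0][1] * x[1],
          W[1][0] * x[0] + W[1][1] * x[1])
    return math.hypot(rx[0] - x[0], rx[1] - x[1])

# hypothetical class subspaces: class 0 along (1, 0.2), class 1 along (0.2, 1)
classes = {0: projector((1.0, 0.2)), 1: projector((0.2, 1.0))}

def classify(x):
    # assign x to the class that predicts (reconstructs) its features best
    return min(classes, key=lambda k: recon_error(classes[k], x))
```

An input near a class's subspace incurs a small reconstruction error for that class's autoassociator and a large one for the other, so no explicit discriminant function is ever trained.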


2021 ◽  
Vol 2094 (3) ◽  
pp. 032009
Author(s):  
T A Zolotareva

Abstract In this paper, two technologies for training large artificial neural networks are considered: the first is based on multilayer “deep” neural networks; the second uses a “wide” single-layer network of neurons producing 256 private binary decisions. A list of attacks aimed at the simplest one-bit neural network decision rule is given, covering knowledge-extraction attacks and software data-modification attacks. All single-bit decision rules are unsafe to apply; other decision rules must be used. The vulnerability of neural network decision rules to deliberate hacker attacks is significantly reduced by using a decision rule with a large number of output bits. The most important property of such neural network converters is that, when trained on 20 examples of the “Friend” image, the 256-bit “Friend” output code is correctly reproduced with a confidence level of 0.95. This means that the entropy of the “Friend” output codes is close to zero: a well-trained neural network virtually eliminates the ambiguity of the “Friend” image data. On the contrary, for “Foe” images, their initial natural entropy is amplified by the neural network. The considered works made it possible to create a draft of the second national standard for automatic training of networks of quadratic neurons with multilevel quantizers.


2021 ◽  
Vol 11 (1) ◽  
pp. 377
Author(s):  
Michele Scarpiniti ◽  
Enzo Baccarelli ◽  
Alireza Momenzadeh ◽  
Sima Sarv Ahrabi

The recent introduction of the so-called Conditional Deep Neural Networks (CDNNs) with multiple early exits, executed atop virtualized multi-tier Fog platforms, makes feasible the real-time and energy-efficient execution of the analytics required by future Internet applications. However, until now, toolkits for evaluating the energy-vs.-delay performance of the inference phase of CDNNs executed on such platforms have not been available. Motivated by these considerations, in this contribution we present DeepFogSim, a MATLAB-supported software toolbox for testing the performance of virtualized technological platforms for the real-time distributed execution of the inference phase of CDNNs with early exits in IoT realms. The main features of the proposed DeepFogSim toolbox are that: (i) it allows the joint dynamic energy-aware optimization of the Fog-hosted computing-networking resources under hard constraints on the tolerated inference delays; (ii) it allows the repeatable and customizable simulation of the resulting energy-delay performance of the overall Fog execution platform; (iii) it allows the dynamic tracking of the performed resource allocation under time-varying operating conditions and/or failure events; and (iv) it is equipped with a user-friendly Graphical User Interface (GUI) that supports a number of graphic formats for data rendering. Some numerical results give evidence of the actual capabilities of the proposed DeepFogSim toolbox.


2012 ◽  
Vol 198-199 ◽  
pp. 1783-1788
Author(s):  
Jun Ting Lin ◽  
Jian Wu Dang

As a dedicated digital mobile communication system designed for railway application, GSM-R must provide a reliable bidirectional channel for transmitting security data between trackside equipment and on-train computers on high-speed railways. To ensure the safety of running trains, redundant network architectures are commonly used to guarantee the reliability of GSM-R. Because of the rigid demands of railway security, it is important to build mathematical reliability models, predict network reliability, and select a suitable architecture. Two common GSM-R wireless architectures, the co-sited double-layer network and the intercross single-layer network, are modeled and compared in this paper. By calculating the reliability of each model, it is clear that the more redundant the architecture, the more reliable the system, and the fewer failure hours per year it will suffer. Meanwhile, as the redundancy of the GSM-R system rises, equipment and maintenance costs increase substantially while reliability improves only gently. From the standpoint of transmission-system interruption and network-equipment failure, the reliability of the co-sited double-layer architecture is higher than that of the intercross single-layer one, while the survivability and cost of the intercross redundant network are better in natural disasters such as floods and lightning. Taking reliability, survivability, and cost fully into account, we suggest that the intercross redundant network be chosen for high-speed railways.
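The basic trade-off described here follows from standard series/parallel reliability algebra: a redundant (parallel) pair of elements fails only when both fail, so its availability is $1 - (1 - R)^2$ for element availability $R$. The sketch below is a generic illustration with an assumed availability figure, not the paper's actual GSM-R reliability models or data.

```python
# hours in a (non-leap) year, used to convert availability into downtime
HOURS_PER_YEAR = 8760.0

def parallel(r1, r2):
    # a redundant pair works if at least one branch works
    return 1.0 - (1.0 - r1) * (1.0 - r2)

def downtime_hours(availability):
    # expected unavailable time per year
    return (1.0 - availability) * HOURS_PER_YEAR

single = 0.999                      # assumed availability of one network layer
redundant = parallel(single, single)  # co-sited redundant pair of such layers
```

With the assumed $R = 0.999$, a single layer is down about 8.76 hours per year, while the redundant pair's expected downtime drops by roughly three orders of magnitude, which is the "fewer failure hours per year" benefit the abstract describes, purchased at the cost of duplicated equipment.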

