Network Approximation using Tensor Sketching

Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: given a target network architecture, can we design a 'smaller' network architecture that 'approximates' the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever-increasing storage and memory requirements of these networks pose a problem in resource-constrained environments. In this work, we focus on deep convolutional neural network architectures, and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly, and has a classification accuracy that is comparable to the original network.
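
As a rough illustration of the parameter-reduction idea, the following sketch compresses the input dimension of a fully connected layer with a random sign matrix. It is not the paper's construction, which the abstract does not spell out; the functions `sketch_matrix` and `sketched_layer`, the choice of a sign matrix, and all sizes are assumptions made for this example.

```python
import numpy as np

def sketch_matrix(k, d, rng):
    """Random sign matrix (k x d), scaled so that E[S.T @ S] is the identity."""
    return rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

def sketched_layer(W, S):
    """Compress W (d_out x d_in) along its input dimension to d_out x k."""
    return W @ S.T

rng = np.random.default_rng(0)
d_in, d_out, k = 256, 64, 32          # hypothetical layer and sketch sizes
W = rng.standard_normal((d_out, d_in))
x = rng.standard_normal(d_in)

S = sketch_matrix(k, d_in, rng)
W_small = sketched_layer(W, S)        # d_out * k stored weights instead of d_out * d_in
y_approx = W_small @ (S @ x)          # equals W @ x in expectation over the draw of S

print(W.size, W_small.size)
```

In a setting like the one the abstract describes, the small matrix would be trained directly rather than derived from a trained `W`; the projection above only shows where the parameter saving comes from.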

Sharing Residual Units Through Collective Tensor Factorization To Improve Deep Neural Networks

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/88 ◽

2018 ◽

Cited By ~ 6

Author(s):

Yunpeng Chen ◽

Xiaojie Jin ◽

Bingyi Kang ◽

Jiashi Feng ◽

Shuicheng Yan

Keyword(s):

Neural Networks ◽

Network Architecture ◽

Deep Neural Networks ◽

Tensor Decomposition ◽

Classification Performance ◽

Model Parameters ◽

Tensor Factorization ◽

Unified Framework ◽

Benchmark Datasets ◽

Basic Network

The residual unit and its variations are widely used in building very deep neural networks to alleviate optimization difficulty. In this work, we revisit the standard residual function as well as several of its successful variants and propose a unified framework based on tensor Block Term Decomposition (BTD) to explain these apparently different residual functions from the tensor decomposition view. With the BTD framework, we further propose a novel basic network architecture, named the Collective Residual Unit (CRU). CRU further enhances the parameter efficiency of deep residual neural networks by sharing core factors derived from collective tensor factorization over the involved residual units. It enables efficient knowledge sharing across multiple residual units, reduces the number of model parameters, lowers the risk of over-fitting, and provides better generalization ability. Extensive experimental results show that our proposed CRU network brings outstanding parameter efficiency -- it achieves classification performance comparable to ResNet-200 while using a model size as small as ResNet-50 on the ImageNet-1k and Places365-Standard benchmark datasets.

Using Summary Layers to Probe Neural Network Behaviour

South African Computer Journal ◽

10.18489/sacj.v32i2.861 ◽

2020 ◽

Vol 32 (2) ◽

Author(s):

Marelie Hattingh Davel

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

Network Architectures ◽

Feedforward Networks ◽

Empirical Results ◽

Analysis Process ◽

Generalisation Ability ◽

Neural Network Architectures ◽

Fully Connected

No framework exists that can explain and predict the generalisation ability of deep neural networks in general circumstances. In fact, this question has not been answered for some of the least complicated of neural network architectures: fully-connected feedforward networks with rectified linear activations and a limited number of hidden layers. For such an architecture, we show how adding a summary layer to the network makes it more amenable to analysis, and allows us to define the conditions that are required to guarantee that a set of samples will all be classified correctly. This process does not describe the generalisation behaviour of these networks, but produces a number of metrics that are useful for probing their learning and generalisation behaviour. We support the analytical conclusions with empirical results, both to confirm that the mathematical guarantees hold in practice, and to demonstrate the use of the analysis process.

Physics Inspired Deep Neural Networks for Top Quark Reconstruction

EPJ Web of Conferences ◽

10.1051/epjconf/202024506029 ◽

2020 ◽

Vol 245 ◽

pp. 06029

Author(s):

Kevin Greif ◽

Kevin Lannon

Keyword(s):

Neural Network ◽

Neural Networks ◽

Computer Vision ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Top Quark ◽

Deep Neural Networks ◽

Great Success ◽

Fully Connected

Deep neural networks (DNNs) have been applied to the fields of computer vision and natural language processing with great success in recent years. The success of these applications has hinged on the development of specialized DNN architectures that take advantage of specific characteristics of the problem to be solved, namely convolutional neural networks for computer vision and recurrent neural networks for natural language processing. This research explores whether a neural network architecture specific to the task of identifying t → Wb decays in particle collision data yields better performance than a generic, fully-connected DNN. Although applied here to resolved top quark decays, this approach is inspired by a DNN technique for tagging boosted top quarks, which consists of defining custom neural network layers known as the combination and Lorentz layers. These layers encode knowledge of relativistic kinematics applied to combinations of particles, and the output of these specialized layers can then be fed into a fully connected neural network to learn tasks such as classification. This research compares the performance of these physics-inspired networks to that of a generic, fully-connected DNN, to see if there is any advantage in terms of classification performance, size of the network, or ease of training.
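
To make the idea of encoding relativistic kinematics over particle combinations concrete, here is a minimal sketch. It assumes four-vectors stored as (E, px, py, pz) in natural units; the functions `combination_layer` and `invariant_mass` and the example momenta are hypothetical, and the actual combination and Lorentz layers of the cited technique may differ.

```python
import itertools
import numpy as np

def combination_layer(p4, size):
    """Sum the four-vectors (E, px, py, pz) of every `size`-particle combination."""
    idx = list(itertools.combinations(range(len(p4)), size))
    return np.array([p4[list(c)].sum(axis=0) for c in idx])

def invariant_mass(p4):
    """Lorentz-invariant mass m = sqrt(E^2 - |p|^2), in natural units (c = 1)."""
    m2 = p4[..., 0] ** 2 - np.sum(p4[..., 1:] ** 2, axis=-1)
    return np.sqrt(np.clip(m2, 0.0, None))  # clip guards against round-off below zero

# Three massless particles with made-up momenta.
particles = np.array([
    [1.0,  1.0, 0.0, 0.0],
    [1.0, -1.0, 0.0, 0.0],
    [1.0,  0.0, 1.0, 0.0],
])
pairs = combination_layer(particles, 2)   # one summed four-vector per pair
masses = invariant_mass(pairs)            # features a dense network could consume
print(masses)
```

Features of this kind are unchanged by boosts and rotations of the whole event, which is one way a layer can carry physics knowledge that a generic fully-connected network would have to learn from data.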

A Survey on Bias in Deep NLP

Applied Sciences ◽

10.3390/app11073184 ◽

2021 ◽

Vol 11 (7) ◽

pp. 3184

Author(s):

Ismael Garrido-Muñoz ◽

Arturo Montejo-Ráez ◽

Fernando Martínez-Santiago ◽

L. Alfonso Ureña-López

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Natural Language Processing ◽

Probability Distribution ◽

Natural Language ◽

Network Design ◽

Language Processing ◽

Deep Neural Networks ◽

Learning Processes ◽

Relevant Issue

Deep neural networks are hegemonic approaches to many machine learning areas, including natural language processing (NLP). Thanks to the availability of large corpora collections and the capability of deep architectures to shape internal language mechanisms in self-supervised learning processes (also known as “pre-training”), versatile and performing models are released continuously for every new network design. These networks, somehow, learn a probability distribution of words and relations across the training collection used, inheriting the potential flaws, inconsistencies and biases contained in such a collection. As pre-trained models have been found to be very useful approaches to transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias in a formal way and explore how it has been treated in several networks, in terms of detection and correction. In addition, available resources are identified and a strategy to deal with bias in deep NLP is proposed.

Reynolds averaged turbulence modelling using deep neural networks with embedded invariance

Journal of Fluid Mechanics ◽

10.1017/jfm.2016.615 ◽

2016 ◽

Vol 807 ◽

pp. 155-166 ◽

Cited By ~ 274

Author(s):

Julia Ling ◽

Andrew Kurzawski ◽

Jeremy Templeton

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reynolds Stress ◽

Network Architecture ◽

Eddy Viscosity ◽

Deep Neural Networks ◽

Test Cases ◽

Neural Network Architecture ◽

Stress Anisotropy ◽

Anisotropy Tensor

There exists significant demand for improved Reynolds-averaged Navier–Stokes (RANS) turbulence models that are informed by and can represent a richer set of turbulence physics. This paper presents a method of using deep neural networks to learn a model for the Reynolds stress anisotropy tensor from high-fidelity simulation data. A novel neural network architecture is proposed which uses a multiplicative layer with an invariant tensor basis to embed Galilean invariance into the predicted anisotropy tensor. It is demonstrated that this neural network architecture provides improved prediction accuracy compared with a generic neural network architecture that does not embed this invariance property. The Reynolds stress anisotropy predictions of this invariant neural network are propagated through to the velocity field for two test cases. For both test cases, significant improvement versus baseline RANS linear eddy viscosity and nonlinear eddy viscosity models is demonstrated.
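
The multiplicative layer with an invariant tensor basis can be sketched as follows. The abstract does not list the basis or the inputs, so this example assumes a symmetric tensor S and an antisymmetric tensor R as inputs and uses only three basis tensors; `toy_coeffs` is a hypothetical stand-in for the trained network, and the check covers only rotations of the coordinate frame.

```python
import numpy as np

def tensor_basis(S, R):
    """Three basis tensors built from S (symmetric) and R (antisymmetric)."""
    T1 = S
    T2 = S @ R - R @ S
    T3 = S @ S - np.trace(S @ S) / 3.0 * np.eye(3)
    return np.stack([T1, T2, T3])

def invariants(S, R):
    """Scalars that do not change when the coordinate frame is rotated."""
    return np.array([np.trace(S @ S), np.trace(R @ R)])

def predict_anisotropy(S, R, coeff_fn):
    """Multiplicative layer: b = sum_n g_n(invariants) * T_n."""
    g = coeff_fn(invariants(S, R))
    return np.tensordot(g, tensor_basis(S, R), axes=1)

def toy_coeffs(lam):
    # Hypothetical stand-in for the network mapping invariants to coefficients.
    return np.array([np.tanh(lam[0]), 0.1 * lam[1], 0.05 * lam[0] * lam[1]])

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S, R = 0.5 * (A + A.T), 0.5 * (A - A.T)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix

b = predict_anisotropy(S, R, toy_coeffs)
b_rot = predict_anisotropy(Q @ S @ Q.T, Q @ R @ Q.T, toy_coeffs)
print(np.allclose(b_rot, Q @ b @ Q.T))
```

Because the coefficients depend only on invariants and the basis tensors rotate with the inputs, the prediction rotates with the frame by construction, whatever function produces the coefficients.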

Part-of-Speech Tagging via Deep Neural Networks for Northern-Ethiopic Languages

Information Technology And Control ◽

10.5755/j01.itc.49.4.26808 ◽

2020 ◽

Vol 49 (4) ◽

pp. 482-494

Author(s):

Jurgita Kapočiūtė-Dzikienė ◽

Senait Gebremichael Tesfagergish

Keyword(s):

Neural Network ◽

Neural Networks ◽

Language Processing ◽

Deep Neural Networks ◽

Short Term Memory ◽

Parameter Tuning ◽

Feed Forward Neural Network ◽

Pos Tagging ◽

Part Of Speech ◽

Pos Tagger

Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP) and Part-Of-Speech (POS) tagging, which is the process of mapping words to their corresponding POS labels depending on the context. Despite the recent development of language technologies, low-resourced languages (such as the East African language Tigrinya) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We have selected Tigrinya as the testbed example and have tested state-of-the-art DL approaches, seeking to build the most accurate POS tagger. We have evaluated DNN classifiers (Feed Forward Neural Network – FFNN, Long Short-Term Memory method – LSTM, Bidirectional LSTM, and Convolutional Neural Network – CNN) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus. To determine the best DNN classifier type, its architecture, and its hyper-parameter set, both manual and automatic hyper-parameter tuning have been performed. The BiLSTM method proved to be the most suitable for our task: it achieved the highest accuracy, 92%, which is 65% above the random baseline.

A Unified Framework for Improving Misclassifications in Modern Deep Neural Networks for Sentiment Analysis

10.1109/ijcnn52387.2021.9534168 ◽

2021 ◽

Author(s):

Ahoud Alhazmi ◽

Abdulwahab Aljubairy ◽

Wei Emma Zhang ◽

Quan Z Sheng ◽

Elaf Alhazmi

Keyword(s):

Neural Networks ◽

Sentiment Analysis ◽

Deep Neural Networks ◽

Unified Framework

State-Space Representations of Deep Neural Networks

Neural Computation ◽

10.1162/neco_a_01165 ◽

2019 ◽

Vol 31 (3) ◽

pp. 538-554

Author(s):

Michael Hauser ◽

Sean Gunn ◽

Samer Saab ◽

Asok Ray

Keyword(s):

Neural Networks ◽

State Space ◽

Deep Neural Networks ◽

The State ◽

Network Architectures ◽

Embedding Dimension ◽

Dynamical Equations ◽

Closed Form Solutions ◽

Dense Networks ◽

Finite Difference Equations

This letter deals with neural networks as dynamical systems governed by finite difference equations. It shows that the introduction of [Formula: see text]-many skip connections into network architectures, such as residual networks and additive dense networks, defines [Formula: see text]th order dynamical equations on the layer-wise transformations. Closed-form solutions for the state-space representations of general [Formula: see text]th order additive dense networks, where the concatenation operation is replaced by addition, as well as [Formula: see text]th order smooth networks, are found. The developed provision endows deep neural networks with an algebraic structure. Furthermore, it is shown that imposing [Formula: see text]th order smoothness on network architectures with [Formula: see text]-many nodes per layer increases the state-space dimension by a multiple of [Formula: see text], and so the effective embedding dimension of the data manifold by the neural network is [Formula: see text]-many dimensions. It follows that network architectures of these types reduce the number of parameters needed to maintain the same embedding dimension by a factor of [Formula: see text] when compared to an equivalent first-order, residual network. Numerical simulations and experiments on CIFAR10, SVHN, and MNIST have been conducted to help understand the developed theory and efficacy of the proposed concepts.
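
The view of a network as a finite difference equation can be illustrated with a small sketch. The first-order case is the standard residual update; the second-order state-space form is one plausible reading of the abstract and is an assumption, as are the layer function `f` and the single weight matrix shared across layers.

```python
import numpy as np

def f(x, W):
    """Hypothetical layer transformation; a real network would use a different W per layer."""
    return np.tanh(W @ x)

def first_order(x0, W, depth):
    """Residual update x[l+1] = x[l] + f(x[l]), i.e. x[l+1] - x[l] = f(x[l])."""
    x = x0
    for _ in range(depth):
        x = x + f(x, W)
    return x

def second_order(x0, W, depth):
    """State-space form of x[l+1] - 2 x[l] + x[l-1] = f(x[l]), with state (x, v)."""
    x, v = x0, np.zeros_like(x0)
    for _ in range(depth):
        v = v + f(x, W)   # velocity-like state carries the extra skip connection
        x = x + v
    return x, v

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))
x0 = rng.standard_normal(4)
x1 = first_order(x0, W, depth=5)
x2, v2 = second_order(x0, W, depth=5)
print(x1.shape, x2.shape, v2.shape)
```

In the second-order form the state is the pair (x, v), so the state space has twice the dimension of a layer, which is the kind of increase in state-space dimension the abstract refers to.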

Analyzing and interpreting neural networks for NLP: A report on the first BlackboxNLP workshop

Natural Language Engineering ◽

10.1017/s135132491900024x ◽

2019 ◽

Vol 25 (4) ◽

pp. 543-557 ◽

Cited By ~ 3

Author(s):

Afra Alishahi ◽

Grzegorz Chrupała ◽

Tal Linzen

Keyword(s):

Neural Network ◽

Neural Networks ◽

Natural Language Processing ◽

Language Processing ◽

Performance Testing ◽

Network Architectures ◽

Empirical Methods ◽

Neural Models ◽

The Impact ◽

Systematic Manipulation

The Empirical Methods in Natural Language Processing (EMNLP) 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to neural networks and investigating the impact on their performance, testing whether interpretable knowledge can be decoded from intermediate representations acquired by neural networks, proposing modifications to neural network architectures to make their knowledge state or generated output more explainable, and examining the performance of networks on simplified or formal languages. Here we review a number of representative studies in each category.

Accelerating Deep Neural Networks by Combining Block-Circulant Matrices and Low-Precision Weights

Electronics ◽

10.3390/electronics8010078 ◽

2019 ◽

Vol 8 (1) ◽

pp. 78 ◽

Cited By ~ 1

Author(s):

Zidi Qin ◽

Di Zhu ◽

Xingwei Zhu ◽

Xuan Chen ◽

Yinghuan Shi ◽

...

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Circulant Matrix ◽

Circulant Matrices ◽

Matrix Vector Multiplication ◽

Processing Power ◽

Measurement Results ◽

Block Circulant Matrix ◽

Storage Complexity ◽

Fully Connected

As a key ingredient of deep neural networks (DNNs), fully-connected (FC) layers are widely used in various artificial intelligence applications. However, FC layers contain many parameters, so efficient processing of FC layers is restricted by memory bandwidth. In this paper, we propose a compression approach combining block-circulant matrix-based weight representation and power-of-two quantization. Applying block-circulant matrices in FC layers can reduce the storage complexity from O(k^2) to O(k). By quantizing the weights into integer powers of two, the multiplications in inference can be replaced by shift and add operations. The memory usage of models for MNIST, CIFAR-10 and ImageNet can be compressed by 171×, 2731× and 128×, respectively, with minimal accuracy loss. A configurable parallel hardware architecture is then proposed for processing the compressed FC layers efficiently. Without multipliers, a block matrix-vector multiplication module (B-MV) is used as the computing kernel. The architecture is flexible enough to support FC layers of various compression ratios with a small footprint. At the same time, memory access can be significantly reduced by using the configurable architecture. Measurement results show that the accelerator has a processing power of 409.6 GOPS, and achieves 5.3 TOPS/W energy efficiency at 800 MHz.
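
The two ingredients named above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: each circulant block is stored as its first column, the product is computed with FFTs (the paper's hardware uses shift and add operations instead), and `quantize_pow2` rounds in the log domain, which may differ from the authors' scheme. All function names and sizes are hypothetical.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix whose first column is c by x, using FFTs."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x):
    """blocks[i][j] is the first column (length k) of the (i, j) circulant block."""
    k = len(blocks[0][0])
    xs = x.reshape(-1, k)
    return np.concatenate([
        sum(circulant_matvec(c, xj) for c, xj in zip(row, xs)) for row in blocks
    ])

def quantize_pow2(w):
    """Round each nonzero weight to the nearest power of two in the log domain, keeping its sign."""
    w = np.asarray(w, dtype=float)
    out = np.zeros_like(w)
    nz = w != 0
    out[nz] = np.sign(w[nz]) * 2.0 ** np.round(np.log2(np.abs(w[nz])))
    return out

def dense_circulant(c):
    """Explicit k x k circulant matrix, used only to check the FFT result."""
    return np.column_stack([np.roll(c, s) for s in range(len(c))])

rng = np.random.default_rng(0)
p, q, k = 2, 3, 4                                   # 2 x 3 grid of 4 x 4 blocks
blocks = quantize_pow2(rng.standard_normal((p, q, k)))
x = rng.standard_normal(q * k)

y = block_circulant_matvec(blocks, x)
W = np.block([[dense_circulant(c) for c in row] for row in blocks])
print(np.allclose(y, W @ x), blocks.size, W.size)
```

Each k × k block is described by k numbers rather than k^2, which is the source of the O(k^2) to O(k) storage reduction; restricting those numbers to powers of two is what allows a multiplication to become a bit shift in fixed-point hardware.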
