Topological measurement of deep neural networks using persistent homology

Annals of Mathematics and Artificial Intelligence

10.1007/s10472-021-09761-3

2021

Author(s): Satoru Watanabe, Hayato Yamana

Keyword(s): Neural Networks, Deep Neural Networks, Persistent Homology, Topological Data Analysis, Data Sets, One Dimensional, Novel Approach, The One, Fully Connected, Fully Connected Networks

The inner representation of deep neural networks (DNNs) is indecipherable, which makes it difficult to tune DNN models, control their training process, and interpret their outputs. In this paper, we propose a novel approach to investigating the inner representation of DNNs through topological data analysis (TDA). Persistent homology (PH), one of the most prominent methods in TDA, was employed to investigate the complexity of trained DNNs. We constructed clique complexes on trained DNNs and calculated their one-dimensional PH. The PH reveals the combined effects of multiple neurons in DNNs at different resolutions, which are difficult to capture without PH. Evaluations were conducted using fully connected networks (FCNs) and networks combining FCNs and convolutional neural networks (CNNs) trained on the MNIST and CIFAR-10 data sets. The results demonstrate that the PH of DNNs reflects both the excess of neurons and the problem difficulty, making PH a prominent method for investigating the inner representation of DNNs.
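The abstract does not say how the weights of a trained network become filtration values for the clique complex. The sketch below therefore starts from an already-weighted graph and leaves open, as an assumption, how those values are derived from a DNN. It computes the one-dimensional persistence of the graph's clique (flag) complex, truncated at triangles, using the textbook boundary-matrix reduction over Z/2. The function name `flag_complex_h1` and the toy edge values are hypothetical. Dedicated libraries such as GUDHI or Ripser do the same computation far more efficiently.

```python
from itertools import combinations


def flag_complex_h1(n_vertices, edge_weights):
    """One-dimensional persistence of the clique (flag) complex of a weighted graph.

    edge_weights maps frozenset({u, v}) to the (assumed non-negative) filtration
    value at which that edge appears. Vertices appear at 0.0 and a triangle
    appears together with its last edge. Returns sorted (birth, death) pairs of
    nonzero length; death is float("inf") for cycles that are never filled in.
    """
    # (filtration value, dimension, sorted vertex tuple) for every simplex up to dim 2
    simplices = [(0.0, 0, (v,)) for v in range(n_vertices)]
    for edge, w in edge_weights.items():
        simplices.append((w, 1, tuple(sorted(edge))))
    for tri in combinations(range(n_vertices), 3):
        sides = [frozenset(pair) for pair in combinations(tri, 2)]
        if all(s in edge_weights for s in sides):
            simplices.append((max(edge_weights[s] for s in sides), 2, tri))
    # Sorting by (value, dimension) guarantees every face precedes its cofaces.
    simplices.sort(key=lambda s: (s[0], s[1], s[2]))
    index = {verts: i for i, (_, _, verts) in enumerate(simplices)}

    # Column reduction of the Z/2 boundary matrix; each column is a set of row indices.
    reduced, low_to_col = [], {}
    for _, dim, verts in simplices:
        col = {index[face] for face in combinations(verts, dim)} if dim > 0 else set()
        while col and max(col) in low_to_col:
            col ^= reduced[low_to_col[max(col)]]
        if col:
            low_to_col[max(col)] = len(reduced)
        reduced.append(col)

    bars = []
    for i, (birth, dim, _) in enumerate(simplices):
        if dim != 1 or reduced[i]:
            continue  # only edges whose column reduces to zero open a cycle (H1 class)
        death = simplices[low_to_col[i]][0] if i in low_to_col else float("inf")
        if death > birth:
            bars.append((birth, death))
    return sorted(bars)


# Toy graph (hypothetical values): a square of edges at 1.0, plus a diagonal at 2.0.
square = {frozenset(e): 1.0 for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
filled = {**square, frozenset((0, 2)): 2.0}
print(flag_complex_h1(4, square))  # [(1.0, inf)]
print(flag_complex_h1(4, filled))  # [(1.0, 2.0)]
```

The square alone carries one loop that never closes. Adding the diagonal creates two triangles at 2.0, which fill the loop, so the bar ends there.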

Notes on the Symmetries of 2-Layer ReLU-Networks

Proceedings of the Northern Lights Deep Learning Workshop

10.7557/18.5150

2020

Vol 1

pp. 6

Author(s): Henning Petzka, Martin Trimmel, Cristian Sminchisescu

Keyword(s): Neural Networks, Deep Neural Networks, Complete Characterization, Activation Functions, Network Function, Fully Connected, Fully Connected Networks

Symmetries in neural networks allow different weight configurations to produce the same network function. For odd activation functions, the set of transformations mapping between such configurations has been studied extensively, but less is known for neural networks with ReLU activation functions. We give a complete characterization for fully connected networks with two layers. Apart from two well-known transformations, only degenerate situations allow additional transformations that leave the network function unchanged. Reduction steps can remove only some of the degenerate cases. Finally, we present a non-degenerate situation for deep neural networks that leads to new transformations leaving the network function intact.
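The abstract does not name the two well-known transformations. For ReLU networks they are usually understood to be (a) permuting the hidden neurons and (b) scaling one hidden neuron's incoming weights and bias by some c > 0 while dividing its outgoing weights by c. This is our reading, not something the abstract states. The minimal NumPy check below uses arbitrary, hypothetical layer sizes and confirms that both transformations leave a two-layer network's function unchanged.

```python
import numpy as np


def two_layer_relu(x, W1, b1, W2, b2):
    """f(x) = W2 ReLU(W1 x + b1) + b2, applied row-wise to inputs x of shape (n, d)."""
    hidden = np.maximum(x @ W1.T + b1, 0.0)
    return hidden @ W2.T + b2


rng = np.random.default_rng(0)
d, h, k = 3, 5, 2  # input, hidden and output sizes (arbitrary)
W1, b1 = rng.normal(size=(h, d)), rng.normal(size=h)
W2, b2 = rng.normal(size=(k, h)), rng.normal(size=k)
x = rng.normal(size=(10, d))

# Transformation 1: permute hidden neurons (rows of W1 and b1, columns of W2, in step).
perm = rng.permutation(h)
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]

# Transformation 2: positive per-neuron rescaling, valid because ReLU(c*z) = c*ReLU(z) for c > 0.
c = rng.uniform(0.5, 2.0, size=h)
W1_s, b1_s, W2_s = W1 * c[:, None], b1 * c, W2 / c[None, :]

y = two_layer_relu(x, W1, b1, W2, b2)
y_perm = two_layer_relu(x, W1_p, b1_p, W2_p, b2)
y_scaled = two_layer_relu(x, W1_s, b1_s, W2_s, b2)
print(np.allclose(y, y_perm), np.allclose(y, y_scaled))  # True True
```

The rescaling trick relies on positive homogeneity. It would fail for c < 0, which is one way ReLU networks differ from networks with odd activation functions.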

VECA: A Method for Detecting Overfitting in Neural Networks (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i10.7167

2020

Vol 34 (10)

pp. 13791-13792

Author(s): Liangzhu Ge, Yuexian Hou, Yaju Jiang, Shuai Yao, Chao Yang

Keyword(s): Neural Networks, Strong Correlation, Good Predictor, Deep Neural Networks, Training Data, Training Process, Generalization Performance, Validation Set, Fully Connected, Fully Connected Networks

Despite their widespread applications, deep neural networks often overfit the training data. Here, we propose a measure called VECA (Variance of Eigenvalues of Covariance matrix of Activation matrix) and demonstrate that VECA is a good predictor of a network's generalization performance during training. Experiments on fully connected networks and convolutional neural networks trained on benchmark image datasets show a strong correlation between test loss and VECA. This suggests that VECA can be used to estimate generalization performance without setting aside training data as a validation set.
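Read literally, the name describes a short computation. The sketch below assumes that rows of the activation matrix are samples and columns are the neurons of one layer, and it uses the sample covariance. The abstract does not fix these choices, and the function name `veca` is hypothetical.

```python
import numpy as np


def veca(activations):
    """Variance of the eigenvalues of the covariance matrix of an activation matrix.

    activations: array of shape (n_samples, n_units), e.g. one hidden layer's
    outputs on a batch of inputs (orientation and normalisation are assumptions).
    """
    cov = np.cov(activations, rowvar=False)   # (n_units, n_units), symmetric
    eigenvalues = np.linalg.eigvalsh(cov)     # real eigenvalues of a symmetric matrix
    return float(np.var(eigenvalues))


rng = np.random.default_rng(0)
iso = rng.normal(size=(1000, 4))                # roughly isotropic activations
aniso = iso * np.array([10.0, 1.0, 1.0, 1.0])   # one dominant direction
print(veca(iso) < veca(aniso))  # True
```

Activations spread evenly across directions give a small value. Activations concentrated along a few directions give a large one.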

A Novel Approach to the Analysis of the Soil Consolidation Problem by Using Non-Classical Rheological Schemes

Applied Sciences

10.3390/app11051980

2021

Vol 11 (5)

pp. 1980

Author(s): Kazimierz Józefiak, Artur Zbiciak, Karol Brzeziński, Maciej Maślakowski

Keyword(s): Constitutive Models, Organic Soil, Organic Soils, Soil Consolidation, Flexible Tool, One Dimensional, Novel Approach, Soil Skeleton, The One, The Relationship

The paper presents classical and non-classical rheological schemes used to formulate constitutive models of the one-dimensional consolidation problem. The authors paid special attention to secondary consolidation effects in organic soils as well as to the soil over-consolidation phenomenon. A system of partial differential equations was formulated for each model and solved numerically to obtain settlement curves. Selected numerical results were compared with data from standard oedometer laboratory tests that the authors carried out on organic soil samples. Additionally, plasticity and non-classical rheological elements were included to account for soil over-consolidation behaviour in the one-dimensional settlement model. A new way of formulating constitutive equations for the soil skeleton and of predicting the relationship between effective stress and strain or void ratio is presented. Rheological structures provide a flexible tool for creating complex constitutive relationships for soil.

Binary and Multiclass Text Classification by Means of Separable Convolutional Neural Network

Inventions

10.3390/inventions6040070

2021

Vol 6 (4)

pp. 70

Author(s): Elena Solovyeva, Ali Abdullah

Keyword(s): Neural Network, Neural Networks, Convolutional Neural Network, Recurrent Neural Networks, Low Cost, Computational Cost, High Accuracy, Activation Functions, Fully Connected, Fully Connected Networks

In this paper, a separable convolutional neural network consisting of an embedding layer, separable convolutional layers, a convolutional layer, and global average pooling is presented for binary and multiclass text classification. The advantage of the proposed structure is that it avoids multiple fully connected layers, which are used to increase classification accuracy but raise the computational cost. Low-cost separable convolutional layers are combined with a convolutional layer to achieve high accuracy while reducing the complexity of the neural classifier. The advantages are demonstrated on binary and multiclass classification of written texts with the proposed networks, using sigmoid and Softmax activation functions in the convolutional layer. In both binary and multiclass classification, the separable convolutional neural networks achieve higher accuracy than some of the investigated types of recurrent neural networks and fully connected networks.
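The cost argument rests on depthwise-separable convolution. It replaces one kernel of size k × C_in × C_out with two steps: a per-channel depthwise filter (k × C_in) and a 1 × 1 pointwise convolution (C_in × C_out). The short count below uses hypothetical layer sizes, ignores biases, and makes the saving concrete. The function names are illustrative only.

```python
def conv1d_weights(kernel_size, c_in, c_out):
    """Weights in a standard 1D convolution (biases ignored)."""
    return kernel_size * c_in * c_out


def separable_conv1d_weights(kernel_size, c_in, c_out, depth_multiplier=1):
    """Weights in a depthwise-separable 1D convolution (biases ignored):
    a per-channel depthwise filter followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise


print(conv1d_weights(3, 128, 128))            # 49152
print(separable_conv1d_weights(3, 128, 128))  # 16768
```

For these sizes the separable layer needs about a third of the weights. With a depth multiplier of 1, the ratio is 1/C_out + 1/k in general.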

One-Dimensional Convolutional Neural Networks with Feature Selection for Highly Concise Rule Extraction from Credit Scoring Datasets with Heterogeneous Attributes

Electronics

10.3390/electronics9081318

2020

Vol 9 (8)

pp. 1318

Author(s): Yoichi Hayashi, Naoki Takano

Keyword(s): Neural Networks, Credit Scoring, Extraction Methods, Rule Extraction, Financial Industry, New Approach, New Era, One Dimensional, Recursive Rule, Fully Connected

Convolutional neural networks (CNNs) have proven effective, but they are not applicable to all datasets, such as those with heterogeneous attributes, which are often used in the finance and banking industries. Such datasets are difficult to classify. To date, existing high-accuracy classifiers and rule extraction methods have not achieved sufficiently high classification accuracy or sufficiently concise classification rules. This study aims to provide a new approach for achieving transparency and conciseness in credit scoring datasets with heterogeneous attributes. It uses a one-dimensional (1D) fully-connected-layer-first CNN combined with the Recursive-Rule Extraction (Re-RX) algorithm with a J48graft decision tree (hereafter 1D FCLF-CNN). In a comparison between the proposed 1D FCLF-CNN and existing rule extraction methods, our architecture extracted the most concise rules (6.2) and achieved the best accuracy (73.10%), i.e., the highest interpretability–priority rule extraction. These results suggest that the 1D FCLF-CNN with Re-RX with J48graft is very effective at extracting highly concise rules for heterogeneous credit scoring datasets. Although it does not completely overcome the accuracy–interpretability dilemma for deep learning, it does appear to resolve this issue for credit scoring datasets with heterogeneous attributes, and thus could lead to a new era in the financial industry.

Accelerating Deep Neural Networks by Combining Block-Circulant Matrices and Low-Precision Weights

Electronics

10.3390/electronics8010078

2019

Vol 8 (1)

pp. 78

Cited By ~ 1

Author(s): Zidi Qin, Di Zhu, Xingwei Zhu, Xuan Chen, Yinghuan Shi, ...

Keyword(s): Neural Networks, Deep Neural Networks, Circulant Matrix, Circulant Matrices, Matrix Vector Multiplication, Processing Power, Measurement Results, Block Circulant Matrix, Storage Complexity, Fully Connected

As a key ingredient of deep neural networks (DNNs), fully connected (FC) layers are widely used in various artificial intelligence applications. However, FC layers have many parameters, so their efficient processing is limited by memory bandwidth. In this paper, we propose a compression approach combining block-circulant matrix-based weight representation and power-of-two quantization. Applying block-circulant matrices in FC layers can reduce the storage complexity from O(k²) to O(k). By quantizing the weights to integer powers of two, the multiplications in inference can be replaced by shift and add operations. The memory usage of models for MNIST, CIFAR-10, and ImageNet can be compressed by 171×, 2731×, and 128×, respectively, with minimal accuracy loss. A configurable parallel hardware architecture is then proposed for processing the compressed FC layers efficiently. Without multipliers, a block matrix-vector multiplication module (B-MV) is used as the computing kernel. The architecture flexibly supports FC layers of various compression ratios with a small footprint. Moreover, the configurable architecture significantly reduces memory access. Measurement results show that the accelerator has a processing power of 409.6 GOPS and achieves an energy efficiency of 5.3 TOPS/W at 800 MHz.
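A k × k circulant block is fully determined by its first column, which is where the O(k²) to O(k) storage reduction comes from. Its product with a vector is a circular convolution, which software can compute with FFTs. The FFT route below is one standard way to exploit that structure and is not necessarily what the B-MV module does in hardware. The power-of-two quantizer rounds in the log domain, which is an assumption: the abstract does not give the paper's exact rounding rule or bit widths. All function names are hypothetical.

```python
import numpy as np


def circulant(first_col):
    """Dense k x k circulant matrix C with C[i, j] = first_col[(i - j) % k]."""
    k = len(first_col)
    return np.array([[first_col[(i - j) % k] for j in range(k)] for i in range(k)])


def block_circulant_matvec(blocks, x):
    """y = W x where W consists of p x q circulant blocks of size k x k.

    blocks: array (p, q, k) holding only the first column of each block,
    i.e. O(k) storage per block instead of O(k^2).
    x: vector of length q * k.
    """
    p, q, k = blocks.shape
    x_hat = np.fft.fft(x.reshape(q, k), axis=1)             # FFT of each input chunk
    y_hat = np.einsum("pqk,qk->pk", np.fft.fft(blocks, axis=2), x_hat)
    return np.real(np.fft.ifft(y_hat, axis=1)).reshape(p * k)


def quantize_pow2(w):
    """Round each nonzero weight to a signed power of two (nearest in log2 scale)."""
    w = np.asarray(w, dtype=float)
    out = np.zeros_like(w)
    nz = w != 0
    out[nz] = np.sign(w[nz]) * 2.0 ** np.round(np.log2(np.abs(w[nz])))
    return out


rng = np.random.default_rng(1)
blocks = rng.normal(size=(2, 3, 4))  # an 8 x 12 matrix stored as 24 numbers instead of 96
x = rng.normal(size=12)
dense = np.block([[circulant(blocks[i, j]) for j in range(3)] for i in range(2)])
print(np.allclose(dense @ x, block_circulant_matvec(blocks, x)))  # True
print(quantize_pow2([0.3, -0.7, 0.0, 5.0]))  # values 0.25, -0.5, 0.0, 4.0
```

Once a weight equals ±2^e, multiplying an integer activation by it becomes a shift by e bits plus a sign. That is why the quantized layer can drop hardware multipliers.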

Enhanced Fusion of Deep Neural Networks for Classification of Benchmark High-Resolution Image Data Sets

IEEE Geoscience and Remote Sensing Letters

10.1109/lgrs.2018.2839092

2018

Vol 15 (9)

pp. 1451-1455

Cited By ~ 23

Author(s): Grant J. Scott, Kyle C. Hagan, Richard A. Marcum, James Alex Hurt, Derek T. Anderson, ...

Keyword(s): Neural Networks, High Resolution, Deep Neural Networks, Image Data, Data Sets, Resolution Image, High Resolution Image

Convergence Behavior of DNNs with Mutual-Information-Based Regularization

Entropy

10.3390/e22070727

2020

Vol 22 (7)

pp. 727

Cited By ~ 1

Author(s): Hlynur Jónsson, Giovanni Cherubini, Evangelos Eleftheriou

Keyword(s): Neural Networks, Mutual Information, Low Complexity, High Dimensional, Test Accuracy, Compression Phase, Hidden Layer, Low Dimensional, Fully Connected, Fully Connected Networks

Information theory concepts are leveraged with the goal of better understanding and improving deep neural networks (DNNs). The information plane of a neural network describes how the mutual information between the input/output and the hidden-layer variables at various depths evolves during training. Previous analysis revealed that, in some networks where the finiteness of the mutual information can be established, most training epochs are spent compressing the input. However, estimating mutual information is nontrivial for high-dimensional continuous random variables. Therefore, computations of mutual information for DNNs and its visualization on the information plane have mostly focused on low-complexity fully connected networks. In fact, even the existence of the compression phase in complex DNNs has been questioned and viewed as an open problem. In this paper, we present the convergence of mutual information on the information plane for a high-dimensional VGG-16 convolutional neural network (CNN) by resorting to Mutual Information Neural Estimation (MINE), thus confirming and extending the results obtained with low-dimensional fully connected networks. Furthermore, we demonstrate the benefits of regularizing a network, especially for a large number of training epochs, by adding mutual information estimates as extra terms in the network's loss function. Experimental results show that the regularization stabilizes the test accuracy and significantly reduces its variance.
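MINE estimates mutual information by training a critic T to maximize the Donsker-Varadhan lower bound I(X; Z) ≥ E_joint[T] - log E_marginals[exp(T)]. Once the critic's scores are available, the bound itself takes only a few lines. The sketch below omits the critic network and its training, and the function name is hypothetical.

```python
import math


def donsker_varadhan_bound(t_joint, t_marginal):
    """Donsker-Varadhan lower bound on mutual information, the objective MINE maximizes.

    t_joint: critic values T(x, z) on samples drawn from the joint distribution.
    t_marginal: critic values T(x, z') on samples from the product of marginals
    (in practice z' is z shuffled across the batch).
    Returns mean(t_joint) - log(mean(exp(t_marginal))), in nats.
    """
    mean_joint = sum(t_joint) / len(t_joint)
    # log-mean-exp computed stably by factoring out the maximum
    m = max(t_marginal)
    log_mean_exp = m + math.log(sum(math.exp(t - m) for t in t_marginal) / len(t_marginal))
    return mean_joint - log_mean_exp


print(donsker_varadhan_bound([0.0, 0.0], [0.0, 0.0]))  # 0.0
print(donsker_varadhan_bound([1.0, 1.0], [0.0, 0.0]))  # 1.0
```

A constant critic always yields zero. A critic that scores joint samples higher than shuffled ones yields a positive estimate, and training pushes that estimate toward the true mutual information.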

A novel layerwise pruning method for model reduction of fully connected deep neural networks

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

10.1109/icassp.2017.7952583

2017

Cited By ~ 4

Author(s): Lukas Mauch, Bin Yang

Keyword(s): Neural Networks, Model Reduction, Deep Neural Networks, Pruning Method, Fully Connected

SPATIAL DISORDER OF CNN — WITH ASYMMETRIC OUTPUT FUNCTION

International Journal of Bifurcation and Chaos

10.1142/s0218127401003358

2001

Vol 11 (08)

pp. 2085-2095

Cited By ~ 11

Author(s): Jung-Chao Ban, Kai-Ping Chien, Song-Sun Lin, Cheng-Hsiung Hsu

Keyword(s): Neural Networks, Steady State, Three Dimensional, Cellular Neural Networks, Output Function, One Dimensional, Spatial Entropy, The One, Steady State Solutions

This investigation describes the spatial disorder of one-dimensional cellular neural networks (CNN). Under certain parameters, the steady-state solutions of the one-dimensional CNN can be represented by a one-dimensional iteration map. These maps are then chaotic, and the spatial entropy of the steady-state solutions is a three-dimensional devil's-staircase-like function.
