A fast and efficient pre-training method based on layer-by-layer maximum discrimination for deep neural networks

2015 ◽  
Vol 168 ◽  
pp. 669-680 ◽  
Author(s):  
Seyyede Zohreh Seyyedsalehi ◽  
Seyyed Ali Seyyedsalehi


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 428
Author(s):  
Hyun Kwon ◽  
Jun Lee

This paper presents research focusing on visualization and pattern recognition based on computer science. Although deep neural networks demonstrate satisfactory performance in image and voice recognition, pattern analysis, and intrusion detection, they are vulnerable to adversarial examples: inputs created by adding a small amount of noise to the original data so that deep neural networks misclassify them, even though humans still perceive them as normal. In this paper, a diversity adversarial training method that is robust against adversarial attacks is demonstrated. In this approach, the target model becomes more robust to unknown adversarial examples because it is trained on a variety of adversarial samples. In the experiments, TensorFlow was employed as the deep learning framework, while MNIST and Fashion-MNIST were used as the datasets. The results reveal that the diversity training method lowers the attack success rate by an average of 27.2% and 24.3% for various adversarial examples, while maintaining accuracy rates of 98.7% and 91.5% on the original MNIST and Fashion-MNIST data, respectively.
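The abstract does not give implementation details, but the core idea, training on a mix of clean and diverse adversarial samples, can be sketched in TensorFlow. This is a minimal sketch only: FGSM is used as a stand-in attack, and the model, epsilon values, and loop structure are assumptions rather than the authors' exact method.

```python
import tensorflow as tf

# Minimal sketch of diversity adversarial training. Assumptions: FGSM as
# the attack, several epsilon values as the "diverse" adversarial examples.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

def fgsm(x, y, eps):
    """Generate FGSM adversarial examples at perturbation size eps."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)

@tf.function
def train_step(x, y):
    # Mix clean inputs with adversarial variants at several strengths.
    batches = [x] + [fgsm(x, y, eps) for eps in (0.05, 0.1, 0.2)]
    with tf.GradientTape() as tape:
        loss = tf.add_n([loss_fn(y, model(xb)) for xb in batches]) / len(batches)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```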


1997 ◽  
Vol 08 (05n06) ◽  
pp. 509-515
Author(s):  
Yan Li ◽  
A. B. Rad

A new structure and training method for multilayer neural networks are presented. The proposed method is based on cascade training of subnetworks and on optimizing the weights layer by layer. The training procedure is completed in two steps. First, a subnetwork with m inputs and n outputs, matching the format of the training samples, is trained on those samples. Second, another subnetwork with n inputs and n outputs is trained, taking the outputs of the first subnetwork as its inputs and the desired outputs of the training samples as its targets. Finally, the two trained subnetworks are connected to form a trained multilayer neural network. Numerical simulation results based on both the linear least squares back-propagation (LSB) and traditional back-propagation (BP) algorithms demonstrate the efficiency of the proposed method.
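The two-step procedure translates directly into code. The sketch below follows the cascade scheme under stated assumptions: the dimensions, optimizer, and epoch counts are illustrative, and ordinary gradient training stands in for the paper's LSB variant.

```python
import numpy as np
import tensorflow as tf

# Sketch of the two-step cascade scheme (dimensions, optimizer, and epoch
# counts are illustrative; the paper's LSB variant is not reproduced here).
m, n = 8, 3                            # input and output dimensionality
X = np.random.rand(1000, m)            # placeholder training samples
Y = np.random.rand(1000, n)            # placeholder desired outputs

# Step 1: subnetwork with m inputs and n outputs, trained on the samples.
sub1 = tf.keras.Sequential([tf.keras.Input(shape=(m,)),
                            tf.keras.layers.Dense(n, activation="sigmoid")])
sub1.compile(optimizer="adam", loss="mse")
sub1.fit(X, Y, epochs=20, verbose=0)

# Step 2: subnetwork with n inputs and n outputs, trained on sub1's
# outputs against the same desired outputs.
H = sub1.predict(X, verbose=0)
sub2 = tf.keras.Sequential([tf.keras.Input(shape=(n,)),
                            tf.keras.layers.Dense(n, activation="sigmoid")])
sub2.compile(optimizer="adam", loss="mse")
sub2.fit(H, Y, epochs=20, verbose=0)

# Finally: connect the trained subnetworks into one multilayer network.
cascade = tf.keras.Sequential([sub1, sub2])
```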


2018 ◽  
Vol 6 (1) ◽  
pp. 74-86 ◽  
Author(s):  
Zhi-Hua Zhou ◽  
Ji Feng

Abstract Current deep-learning models are mostly built upon neural networks, i.e. multiple layers of parameterized differentiable non-linear modules that can be trained by backpropagation. In this paper, we explore the possibility of building deep models based on non-differentiable modules such as decision trees. After a discussion about the mystery behind deep neural networks, particularly by contrasting them with shallow neural networks and traditional machine-learning techniques such as decision trees and boosting machines, we conjecture that the success of deep neural networks owes much to three characteristics, i.e. layer-by-layer processing, in-model feature transformation and sufficient model complexity. On one hand, our conjecture may offer inspiration for theoretical understanding of deep learning; on the other hand, to verify the conjecture, we propose an approach that generates a deep forest holding these characteristics. This is a decision-tree ensemble approach, with fewer hyper-parameters than deep neural networks, and its model complexity can be automatically determined in a data-dependent way. Experiments show that its performance is quite robust to hyper-parameter settings, such that in most cases, even across different data from different domains, it is able to achieve excellent performance by using the same default setting. This study opens the door to deep learning based on non-differentiable modules without gradient-based adjustment, and exhibits the possibility of constructing deep models without backpropagation.
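The layer-by-layer processing and in-model feature transformation the abstract names can be illustrated with a small cascade of scikit-learn forests: each level's class-probability vectors are appended to the input features of the next level. This is a hedged sketch in the spirit of the idea, not the authors' implementation; multi-grained scanning and the automatic depth control are omitted, and the forest types, sizes, and level count are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

def cascade_forest_fit_predict(X_train, y_train, X_test, n_levels=3):
    # Each level augments the raw features with the previous level's
    # class-probability vectors (layer-by-layer feature transformation).
    aug_train, aug_test = X_train, X_test
    for _ in range(n_levels):
        level_train, level_test = [], []
        for Forest in (RandomForestClassifier, ExtraTreesClassifier):
            clf = Forest(n_estimators=100, n_jobs=-1)
            # Out-of-fold probabilities avoid leaking labels level to level.
            p_train = cross_val_predict(clf, aug_train, y_train,
                                        cv=3, method="predict_proba")
            clf.fit(aug_train, y_train)
            level_train.append(p_train)
            level_test.append(clf.predict_proba(aug_test))
        aug_train = np.hstack([X_train] + level_train)
        aug_test = np.hstack([X_test] + level_test)
    # Final prediction: average the last level's probability vectors.
    return np.mean(level_test, axis=0).argmax(axis=1)

# Usage: preds = cascade_forest_fit_predict(X_tr, y_tr, X_te)
```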


2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Yaochun Wu ◽  
Rongzhen Zhao ◽  
Wuyin Jin ◽  
Linfeng Deng ◽  
Tianjing He ◽  
...  

Deep learning (DL) has been successfully used in fault diagnosis. Training deep neural networks, such as convolutional neural networks (CNNs), requires plenty of labeled samples. However, in mechanical fault diagnosis, labeled data are costly and time-consuming to collect. A novel method based on a deep convolutional autoencoding network (DCAEN) and the adaptive nonparametric weighted-feature extraction Gustafson–Kessel (ANW-GK) clustering algorithm was developed for the fault diagnosis of bearings. First, the DCAEN, which is pretrained layer by layer on unlabeled samples and fine-tuned with a few labeled samples, is applied to learn representative features from the vibration signals. Then, the learned representative features are reduced by t-distributed stochastic neighbor embedding (t-SNE), and the low-dimensional main features are obtained. Finally, the low-dimensional features are input into the ANW-GK clustering algorithm for fault identification. Two datasets were used to validate the effectiveness of the proposed method. The experimental results show that the proposed method can effectively diagnose different fault types with only a few labeled samples.
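The three-stage pipeline (unsupervised feature learning, t-SNE reduction, clustering) can be sketched as follows. Heavy hedging applies: the convolutional autoencoder is reduced to a dense one, the data are placeholders, and scikit-learn's KMeans stands in for ANW-GK clustering, which is not available in standard libraries.

```python
import numpy as np
import tensorflow as tf
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

# Sketch of the feature-learning-then-clustering pipeline. KMeans is a
# stand-in for the ANW-GK algorithm; the autoencoder is simplified to
# dense layers and all sizes are illustrative.
signals = np.random.rand(500, 1024).astype("float32")  # placeholder vibration data

encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(1024,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
])
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1024),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(signals, signals, epochs=10, verbose=0)   # unsupervised pretraining

features = encoder.predict(signals, verbose=0)            # learned representations
embedded = TSNE(n_components=2).fit_transform(features)   # dimension reduction
labels = KMeans(n_clusters=4).fit_predict(embedded)       # fault-type clusters
```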


Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1429
Author(s):  
Scythia Marrow ◽  
Eric J. Michaud ◽  
Erik Hoel

Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets. Yet DNNs can also be examined at the level of causation, exploring “what does what” within the layers of the network itself. Historically, analyzing the causal structure of DNNs has received less attention than understanding their responses to input. Yet definitionally, generalizability must be a function of a DNN’s causal structure as it reflects how the DNN responds to unseen or even not-yet-defined future inputs. Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training. Specifically, we introduce the effective information (EI) of a feedforward DNN, which is the mutual information between layer input and output following a maximum-entropy perturbation. The EI can be used to assess the degree of causal influence nodes and edges have over their downstream targets in each layer. We show that the EI can be further decomposed in order to examine the sensitivity of a layer (measured by how well edges transmit perturbations) and the degeneracy of a layer (measured by how edge overlap interferes with transmission), along with estimates of the amount of integrated information of a layer. Together, these properties define where each layer lies in the “causal plane”, which can be used to visualize how layer connectivity becomes more sensitive or degenerate over time, and how integration changes during training, revealing how the layer-by-layer causal structure differentiates. These results may help in understanding the generalization capabilities of DNNs and provide foundational tools for making DNNs both more generalizable and more explainable.
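The definition of EI above lends itself to a rough numerical sketch: inject maximum-entropy (uniform) inputs into a layer, observe the outputs, and estimate mutual information by histogram binning. The code below approximates EI for a single dense layer by summing pairwise node-to-node MI estimates; the layer, bin count, sample size, and this pairwise decomposition are illustrative assumptions, and the paper's exact estimator may differ.

```python
import numpy as np

def layer(x, W, b):
    # A single tanh layer standing in for one layer of a feedforward DNN.
    return np.tanh(x @ W + b)

def effective_information(W, b, n_samples=100_000, bins=16):
    # Maximum-entropy perturbation: uniform inputs over the layer's domain.
    x = np.random.uniform(-1.0, 1.0, size=(n_samples, W.shape[0]))
    y = layer(x, W, b)
    ei = 0.0
    # Coarse approximation: sum binned MI estimates over input/output pairs.
    for i in range(x.shape[1]):
        for j in range(y.shape[1]):
            pxy, _, _ = np.histogram2d(x[:, i], y[:, j], bins=bins)
            pxy /= pxy.sum()
            px, py = pxy.sum(axis=1), pxy.sum(axis=0)
            nz = pxy > 0
            ei += np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz]))
    return ei
```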


Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1360
Author(s):  
Xin Du ◽  
Katayoun Farrahi ◽  
Mahesan Niranjan

In solving challenging pattern recognition problems, deep neural networks have shown excellent performance by forming powerful mappings between inputs and targets, learning representations (features) and making subsequent predictions. A recent tool to help understand how representations are formed is based on observing the dynamics of learning on an information plane using mutual information, linking the input to the representation (I(X;T)) and the representation to the target (I(T;Y)). In this paper, we use an information-theoretic approach to understand how Cascade Learning (CL), a method to train deep neural networks layer-by-layer, learns representations, as CL has shown comparable results while saving computation and memory costs. We observe that performance is not linked to information compression, which differs from observations on End-to-End (E2E) learning. Additionally, CL can inherit information about targets and gradually specialise extracted features layer-by-layer. We evaluate this effect by proposing an information transition ratio, I(T;Y)/I(X;T), and show that it can serve as a useful heuristic in setting the depth of a neural network that achieves satisfactory classification accuracy.
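As a sketch of how the proposed ratio could drive a depth heuristic, the hypothetical helper below grows the cascade until I(T;Y)/I(X;T) reaches a threshold. The threshold value and the interface for the mutual-information estimates are assumptions, not the authors' procedure; any standard MI estimator (binning, kernel-based) could supply the per-layer values.

```python
# Hypothetical depth heuristic built on the information transition ratio.
def transition_ratio(mi_ty, mi_xt):
    """I(T;Y) / I(X;T) for one trained layer's representation T."""
    return mi_ty / mi_xt

def choose_depth(layer_stats, threshold=0.9):
    # layer_stats: list of (I(X;T), I(T;Y)) tuples, one per cascaded layer,
    # computed by an external MI estimator (assumed, not prescribed here).
    for depth, (mi_xt, mi_ty) in enumerate(layer_stats, start=1):
        if transition_ratio(mi_ty, mi_xt) >= threshold:
            return depth          # representation is specialised enough
    return len(layer_stats)
```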


2016 ◽  
Vol 31 (4) ◽  
pp. 267
Author(s):  
Bao Quoc Nguyen ◽  
Thang Tat Vu ◽  
Mai Chi Luong

In this paper, a pre-training method based on the denoising auto-encoder is investigated and shown to provide good initial models for the bottleneck networks of a Vietnamese speech recognition system, resulting in better recognition performance than the base bottleneck features reported previously. The experiments are carried out on a dataset containing speech from the Voice of Vietnam (VOV) channel. The results show that the DBNF extraction for Vietnamese recognition decreases the word error rate by a relative 14% and 39% compared to the base bottleneck features and the MFCC baseline, respectively.
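Denoising-autoencoder pretraining of a bottleneck network can be sketched briefly: corrupt the input features, train the network to reconstruct the clean version, then keep the encoder as the bottleneck feature extractor. The layer sizes, noise level, and placeholder features below are assumptions, not the paper's settings.

```python
import numpy as np
import tensorflow as tf

# Sketch of denoising-autoencoder pretraining for bottleneck features.
feats = np.random.rand(2000, 440).astype("float32")  # placeholder acoustic features
noisy = feats + np.random.normal(0.0, 0.1, feats.shape).astype("float32")

encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(440,)),
    tf.keras.layers.Dense(1024, activation="sigmoid"),
    tf.keras.layers.Dense(40, activation="sigmoid"),   # bottleneck layer
])
decoder = tf.keras.layers.Dense(440)
dae = tf.keras.Sequential([encoder, decoder])
dae.compile(optimizer="adam", loss="mse")
dae.fit(noisy, feats, epochs=10, verbose=0)   # reconstruct clean from noisy

bottleneck_features = encoder.predict(feats, verbose=0)  # DBNF-style features
```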


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 4977
Author(s):  
Ji-Won Kang ◽  
Jae-Eun Lee ◽  
Jang-Hwan Choi ◽  
Woosuk Kim ◽  
Jin-Kyum Kim ◽  
...  

This paper proposes a method to embed and extract a watermark in a digital hologram using a deep neural network. The entire watermarking algorithm for digital holograms consists of three sub-networks. For robustness, an attack simulation is inserted inside the deep neural network. By including the attack simulation and holographic reconstruction in the network, the deep neural network for watermarking can be trained for invisibility and robustness simultaneously. We propose a network training method that uses the hologram and its reconstruction. After training the proposed network, we analyze the robustness against each attack and re-train the network according to these results to improve its robustness. We quantitatively evaluate the robustness against various attacks and show the reliability of the proposed technique.
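The embed-attack-extract layout can be sketched with the Keras functional API: an embedder hides the watermark in the hologram, a differentiable attack simulation distorts the result, and an extractor recovers the watermark, all trained jointly. This is a sketch under stated assumptions: the shapes, layer choices, Gaussian noise as the attack, and the loss weighting are illustrative, and the holographic reconstruction step is omitted.

```python
import tensorflow as tf

# Sketch of the three-part watermarking layout with in-network attack
# simulation (assumptions: shapes, losses, and Gaussian noise attack).
H, W = 64, 64
hologram = tf.keras.Input(shape=(H, W, 1))
watermark = tf.keras.Input(shape=(H, W, 1))

# Sub-network 1: embedder produces the watermarked hologram.
x = tf.keras.layers.Concatenate()([hologram, watermark])
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
marked = tf.keras.layers.Conv2D(1, 3, padding="same")(x)

# Sub-network 2: attack simulation, active during training only.
attacked = tf.keras.layers.GaussianNoise(0.1)(marked)

# Sub-network 3: extractor recovers the watermark from the attacked hologram.
y = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(attacked)
extracted = tf.keras.layers.Conv2D(1, 3, padding="same",
                                   activation="sigmoid")(y)

model = tf.keras.Model([hologram, watermark], [marked, extracted])
# Joint objective: invisibility (marked close to hologram) plus robustness
# (extracted close to watermark); the 1:1 weighting is an arbitrary choice.
model.compile(optimizer="adam",
              loss=["mse", "binary_crossentropy"],
              loss_weights=[1.0, 1.0])
# Usage: model.fit([holos, marks], [holos, marks], ...)
```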

