HLHLp: Quantized Neural Networks Training for Reaching Flat Minima in Loss Surface

Sungho Shin; Jinhwan Park; Yoonho Boo; Wonyong Sung

doi:10.1609/aaai.v34i04.6035

HLHLp: Quantized Neural Networks Training for Reaching Flat Minima in Loss Surface

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6035 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5784-5791

Author(s):

Sungho Shin ◽

Jinhwan Park ◽

Yoonho Boo ◽

Wonyong Sung

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Fine Tuning ◽

Quantization Noise ◽

Quantization Scheme ◽

Performance Improvements ◽

Training Scheme ◽

Network Training ◽

Training Technique

Quantization of deep neural networks is extremely essential for efficient implementations. Low-precision networks are typically designed to represent original floating-point counterparts with high fidelity, and several elaborate quantization algorithms have been developed. We propose a novel training scheme for quantized neural networks to reach flat minima in the loss surface with the aid of quantization noise. The proposed training scheme employs high-low-high-low precision in an alternating manner for network training. The learning rate is also abruptly changed at each stage for coarse- or fine-tuning. With the proposed training technique, we show quite good performance improvements for convolutional neural networks when compared to the previous fine-tuning based quantization scheme. We achieve the state-of-the-art results for recurrent neural network based language modeling with 2-bit weight and activation.

Download Full-text

Modified Deep Neural Networks for Dog Breeds Identification

10.20944/preprints201812.0232.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Aydin Ayanzadeh ◽

Sahand Vahidnia

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

The State ◽

Fine Tuning ◽

Test Accuracy ◽

Data Sets ◽

Data Set

In this paper, we leverage state of the art models on Imagenet data-sets. We use the pre-trained model and learned weighs to extract the feature from the Dog breeds identification data-set. Afterwards, we applied fine-tuning and dataaugmentation to increase the performance of our test accuracy in classification of dog breeds datasets. The performance of the proposed approaches are compared with the state of the art models of Image-Net datasets such as ResNet-50, DenseNet-121, DenseNet-169 and GoogleNet. we achieved 89.66% , 85.37% 84.01% and 82.08% test accuracy respectively which shows thesuperior performance of proposed method to the previous works on Stanford dog breeds datasets.

Download Full-text

Tri-net for Semi-Supervised Deep Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/278 ◽

2018 ◽

Cited By ~ 11

Author(s):

Dong-Dong Chen ◽

Wei Wang ◽

Wei Gao ◽

Zhi-Hua Zhou

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Error Rate ◽

Deep Neural Network ◽

Deep Neural Networks ◽

State Of The Art ◽

Fine Tuning ◽

Learning Methods ◽

Model Initialization

Deep neural networks have witnessed great successes in various real applications, but it requires a large number of labeled data for training. In this paper, we propose tri-net, a deep neural network which is able to use massive unlabeled data to help learning with limited labeled data. We consider model initialization, diversity augmentation and pseudo-label editing simultaneously. In our work, we utilize output smearing to initialize modules, use fine-tuning on labeled data to augment diversity and eliminate unstable pseudo-labels to alleviate the influence of suspicious pseudo-labeled data. Experiments show that our method achieves the best performance in comparison with state-of-the-art semi-supervised deep learning methods. In particular, it achieves 8.30% error rate on CIFAR-10 by using only 4000 labeled examples.

Download Full-text

Gastric Pathology Image Classification Using Stepwise Fine-Tuning for Deep Neural Networks

Journal of Healthcare Engineering ◽

10.1155/2018/8961781 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 18

Author(s):

Jia Qu ◽

Nobuyuki Hiruta ◽

Kensuke Terai ◽

Hirokazu Nosato ◽

Masahiro Murakawa ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Image Classification ◽

Deep Neural Networks ◽

State Of The Art ◽

Image Data ◽

Fine Tuning ◽

Data Annotation ◽

Gastric Pathology ◽

Pathology Image

Deep learning using convolutional neural networks (CNNs) is a distinguished tool for many image classification tasks. Due to its outstanding robustness and generalization, it is also expected to play a key role to facilitate advanced computer-aided diagnosis (CAD) for pathology images. However, the shortage of well-annotated pathology image data for training deep neural networks has become a major issue at present because of the high-cost annotation upon pathologist’s professional observation. Faced with this problem, transfer learning techniques are generally used to reinforcing the capacity of deep neural networks. In order to further boost the performance of the state-of-the-art deep neural networks and alleviate insufficiency of well-annotated data, this paper presents a novel stepwise fine-tuning-based deep learning scheme for gastric pathology image classification and establishes a new type of target-correlative intermediate datasets. Our proposed scheme is deemed capable of making the deep neural network imitating the pathologist’s perception manner and of acquiring pathology-related knowledge in advance, but with very limited extra cost in data annotation. The experiments are conducted with both well-annotated gastric pathology data and the proposed target-correlative intermediate data on several state-of-the-art deep neural networks. The results congruously demonstrate the feasibility and superiority of our proposed scheme for boosting the classification performance.

Download Full-text

HyperAdam: A Learnable Task-Adaptive Adam for Network Training

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015297 ◽

2019 ◽

Vol 33 ◽

pp. 5297-5304 ◽

Cited By ~ 4

Author(s):

Shipeng Wang ◽

Jian Sun ◽

Zongben Xu

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Black Box ◽

Research Topic ◽

Decay Rates ◽

Adaptive Combination ◽

Network Training ◽

Stochastic Optimization Algorithms

Deep neural networks are traditionally trained using humandesigned stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience in human-designed optimizers, therefore have limitation in generalization ability. In this paper, a new optimizer, dubbed as HyperAdam, is proposed that combines the idea of “learning to optimize” and traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates . The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is justified to be state-of-the-art for various network training, such as multilayer perceptron, CNN and LSTM.

Download Full-text

Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibit relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods in solving the considered problems.

Download Full-text

Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme

Electronics ◽

10.3390/electronics10030230 ◽

2021 ◽

Vol 10 (3) ◽

pp. 230

Author(s):

Jaechan Cho ◽

Yongchul Jung ◽

Seongjoo Lee ◽

Yunho Jung

Keyword(s):

Neural Networks ◽

High Throughput ◽

Deep Neural Networks ◽

State Of The Art ◽

Throughput Performance ◽

Adaptive Parallelism ◽

Sensor Applications ◽

Binary Neural Network ◽

Target Layer ◽

Network Topologies

Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable resources can support various network topologies, but they operate with the same parallelism for all the layers of the BNN, which degrades throughput performance at certain layers. To overcome this problem, we propose a BNN accelerator with adaptive parallelism that offers high throughput performance in all layers. The proposed accelerator analyzes target layer parameters and operates with optimal parallelism using reasonable resources. In addition, this architecture is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators. In performance evaluation using state-of-the-art BNN topologies, the designed BNN accelerator achieved an area–speed efficiency 9.69 times higher than previous FPGA implementations and 24% higher than existing VLSI implementations for BNNs.

Download Full-text

Towards pixel-to-pixel deep nucleus detection in microscopy images

BMC Bioinformatics ◽

10.1186/s12859-019-3037-5 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 3

Author(s):

Fuyong Xing ◽

Yuanpu Xie ◽

Xiaoshuang Shi ◽

Pingjun Chen ◽

Zizhao Zhang ◽

...

Keyword(s):

Neural Networks ◽

Large Scale ◽

Deep Neural Networks ◽

Image Data ◽

Fine Tuning ◽

Cell Detection ◽

Imaging Protocol ◽

Microscopy Image ◽

Microscopy Images ◽

Target Data

Abstract Background Nucleus or cell detection is a fundamental task in microscopy image analysis and supports many other quantitative studies such as object counting, segmentation, tracking, etc. Deep neural networks are emerging as a powerful tool for biomedical image computing; in particular, convolutional neural networks have been widely applied to nucleus/cell detection in microscopy images. However, almost all models are tailored for specific datasets and their applicability to other microscopy image data remains unknown. Some existing studies casually learn and evaluate deep neural networks on multiple microscopy datasets, but there are still several critical, open questions to be addressed. Results We analyze the applicability of deep models specifically for nucleus detection across a wide variety of microscopy image data. More specifically, we present a fully convolutional network-based regression model and extensively evaluate it on large-scale digital pathology and microscopy image datasets, which consist of 23 organs (or cancer diseases) and come from multiple institutions. We demonstrate that for a specific target dataset, training with images from the same types of organs might be usually necessary for nucleus detection. Although the images can be visually similar due to the same staining technique and imaging protocol, deep models learned with images from different organs might not deliver desirable results and would require model fine-tuning to be on a par with those trained with target data. We also observe that training with a mixture of target and other/non-target data does not always mean a higher accuracy of nucleus detection, and it might require proper data manipulation during model training to achieve good performance. Conclusions We conduct a systematic case study on deep models for nucleus detection in a wide variety of microscopy images, aiming to address several important but previously understudied questions. We present and extensively evaluate an end-to-end, pixel-to-pixel fully convolutional regression network and report a few significant findings, some of which might have not been reported in previous studies. The model performance analysis and observations would be helpful to nucleus detection in microscopy images.

Download Full-text

Natural Images Allow Universal Adversarial Attacks on Medical Image Classification Using Deep Neural Networks with Transfer Learning

10.21203/rs.3.rs-757225/v1 ◽

2021 ◽

Author(s):

Akinori Minagi ◽

Hokuto Hirano ◽

Kazuhiro Takemoto

Keyword(s):

Neural Networks ◽

Image Classification ◽

Transfer Learning ◽

Medical Image ◽

Deep Neural Networks ◽

Disease Diagnosis ◽

Natural Images ◽

Fine Tuning ◽

Security And Privacy ◽

Medical Image Classification

Abstract Transfer learning from natural images is well used in deep neural networks (DNNs) for medical image classification to achieve computer-aided clinical diagnosis. Although the adversarial vulnerability of DNNs hinders practical applications owing to the high stakes of diagnosis, adversarial attacks are expected to be limited because training data — which are often required for adversarial attacks — are generally unavailable in terms of security and privacy preservation. Nevertheless, we hypothesized that adversarial attacks are also possible using natural images because pre-trained models do not change significantly after fine-tuning. We focused on three representative DNN-based medical image classification tasks (i.e., skin cancer, referable diabetic retinopathy, and pneumonia classifications) and investigated whether medical DNN models with transfer learning are vulnerable to universal adversarial perturbations (UAPs), generated using natural images. UAPs from natural images are useful for both non-targeted and targeted attacks. The performance of UAPs from natural images was significantly higher than that of random controls, although slightly lower than that of UAPs from training images. Vulnerability to UAPs from natural images was observed between different natural image datasets and between different model architectures. The use of transfer learning causes a security hole, which decreases the reliability and safety of computer-based disease diagnosis. Model training from random initialization (without transfer learning) reduced the performance of UAPs from natural images; however, it did not completely avoid vulnerability to UAPs. The vulnerability of UAPs from natural images will become a remarkable security threat.

Download Full-text

Label Distribution for Learning with Noisy Labels

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/356 ◽

2020 ◽

Author(s):

Yun-Peng Liu ◽

Ning Xu ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Learning Algorithm ◽

State Of The Art ◽

Confidence Estimation ◽

Novel Method ◽

Real World Datasets ◽

Label Distribution ◽

Noisy Labels

The performances of deep neural networks (DNNs) crucially rely on the quality of labeling. In some situations, labels are easily corrupted, and therefore some labels become noisy labels. Thus, designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distribution. Then, the boundary between clean labels and noisy labels becomes clear according to confidence scores. To verify the effectiveness of the method, LDCE is combined with the existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm against state-of-the-art methods.

Download Full-text

Framework for TCAD augmented machine learning on multi- I–V characteristics using convolutional neural network and multiprocessing

Journal of Semiconductors ◽

10.1088/1674-4926/42/12/124101 ◽

2021 ◽

Vol 42 (12) ◽

pp. 124101

Author(s):

Thomas Hirtz ◽

Steyn Huurman ◽

He Tian ◽

Yi Yang ◽

Tian-Ling Ren

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Information Technologies ◽

Deep Neural Networks ◽

State Of The Art ◽

Data Driven ◽

Sufficient Data ◽

Learning Models ◽

Simulation Tools ◽

New Information

Abstract In a world where data is increasingly important for making breakthroughs, microelectronics is a field where data is sparse and hard to acquire. Only a few entities have the infrastructure that is required to automate the fabrication and testing of semiconductor devices. This infrastructure is crucial for generating sufficient data for the use of new information technologies. This situation generates a cleavage between most of the researchers and the industry. To address this issue, this paper will introduce a widely applicable approach for creating custom datasets using simulation tools and parallel computing. The multi-I–V curves that we obtained were processed simultaneously using convolutional neural networks, which gave us the ability to predict a full set of device characteristics with a single inference. We prove the potential of this approach through two concrete examples of useful deep learning models that were trained using the generated data. We believe that this work can act as a bridge between the state-of-the-art of data-driven methods and more classical semiconductor research, such as device engineering, yield engineering or process monitoring. Moreover, this research gives the opportunity to anybody to start experimenting with deep neural networks and machine learning in the field of microelectronics, without the need for expensive experimentation infrastructure.

Download Full-text