Optimization based Layer-wise Magnitude-based Pruning for DNN Compression

Layer-wise magnitude-based pruning (LMP) is a very popular method for deep neural network (DNN) compression. However, tuning the layer-specific thresholds is a difficult task, since the space of threshold candidates is exponentially large and the evaluation is very expensive. Previous methods are mainly by hand and require expertise. In this paper, we propose an automatic tuning approach based on optimization, named OLMP. The idea is to transform the threshold tuning problem into a constrained optimization problem (i.e., minimizing the size of the pruned model subject to a constraint on the accuracy loss), and then use powerful derivative-free optimization algorithms to solve it. To compress a trained DNN, OLMP is conducted within a new iterative pruning and adjusting pipeline. Empirical results show that OLMP can achieve the best pruning ratio on LeNet-style models (i.e., 114 times for LeNet-300-100 and 298 times for LeNet-5) compared with some state-of-the- art DNN pruning methods, and can reduce the size of an AlexNet-style network up to 82 times without accuracy loss.

Download Full-text

An empirical study of derivative-free-optimization algorithms for targeted black-box attacks in deep neural networks

Optimization and Engineering ◽

10.1007/s11081-021-09652-w ◽

2021 ◽

Author(s):

Giuseppe Ughi ◽

Vinayak Abrol ◽

Jared Tanner

Keyword(s):

Deep Neural Network ◽

Deep Neural Networks ◽

State Of The Art ◽

Black Box ◽

Derivative Free Optimization ◽

Derivative Free ◽

Perturbation Energy ◽

Adversarial Example ◽

New Algorithms ◽

Comprehensive Study

AbstractWe perform a comprehensive study on the performance of derivative free optimization (DFO) algorithms for the generation of targeted black-box adversarial attacks on Deep Neural Network (DNN) classifiers assuming the perturbation energy is bounded by an $$\ell _\infty$$ ℓ ∞ constraint and the number of queries to the network is limited. This paper considers four pre-existing state-of-the-art DFO-based algorithms along with a further developed algorithm built on BOBYQA, a model-based DFO method. We compare these algorithms in a variety of settings according to the fraction of images that they successfully misclassify given a maximum number of queries to the DNN. The experiments disclose how the likelihood of finding an adversarial example depends on both the algorithm used and the setting of the attack; algorithms limiting the search of adversarial example to the vertices of the $$\ell ^\infty$$ ℓ ∞ constraint work particularly well without structural defenses, while the presented BOBYQA based algorithm works better for especially small perturbation energies. This variance in performance highlights the importance of new algorithms being compared to the state-of-the-art in a variety of settings, and the effectiveness of adversarial defenses being tested using as wide a range of algorithms as possible.

Download Full-text

Learning Compact Model for Large-Scale Multi-Label Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015385 ◽

2019 ◽

Vol 33 ◽

pp. 5385-5392 ◽

Cited By ~ 2

Author(s):

Tong Wei ◽

Yu-Feng Li

Keyword(s):

Large Scale ◽

Optimization Problem ◽

Performance Metrics ◽

State Of The Art ◽

Optimization Method ◽

Model Parameters ◽

Generalization Capability ◽

Constrained Optimization Problem ◽

Unseen Data ◽

Model Size

Large-scale multi-label learning (LMLL) aims to annotate relevant labels from a large number of candidates for unseen data. Due to the high dimensionality in both feature and label spaces in LMLL, the storage overheads of LMLL models are often costly. This paper proposes a POP (joint label and feature Parameter OPtimization) method. It tries to filter out redundant model parameters to facilitate compact models. Our key insights are as follows. First, we investigate labels that have little impact on the commonly used LMLL performance metrics and only preserve a small number of dominant parameters for these labels. Second, for the remaining influential labels, we reduce spurious feature parameters that have little contribution to the generalization capability of models, and preserve parameters for only discriminative features. The overall problem is formulated as a constrained optimization problem pursuing minimal model size. In order to solve the resultant difficult optimization, we show that a relaxation of the optimization can be efficiently solved using binary search and greedy strategies. Experiments verify that the proposed method clearly reduces the model size compared to state-of-the-art LMLL approaches, in addition, achieves highly competitive performance.

Download Full-text

FNNC: Achieving Fairness through Neural Networks

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/315 ◽

2020 ◽

Author(s):

Manisha Padala ◽

Sujit Gujar

Keyword(s):

Gradient Descent ◽

Optimization Problem ◽

State Of The Art ◽

High Accuracy ◽

Stochastic Gradient Descent ◽

Classification Models ◽

Constrained Optimization Problem ◽

Lagrangian Multipliers ◽

Fairness Constraints ◽

Generalization Errors

In classification models, fairness can be ensured by solving a constrained optimization problem. We focus on fairness constraints like Disparate Impact, Demographic Parity, and Equalized Odds, which are non-decomposable and non-convex. Researchers define convex surrogates of the constraints and then apply convex optimization frameworks to obtain fair classifiers. Surrogates serve as an upper bound to the actual constraints, and convexifying fairness constraints is challenging. We propose a neural network-based framework, \emph{FNNC}, to achieve fairness while maintaining high accuracy in classification. The above fairness constraints are included in the loss using Lagrangian multipliers. We prove bounds on generalization errors for the constrained losses which asymptotically go to zero. The network is optimized using two-step mini-batch stochastic gradient descent. Our experiments show that FNNC performs as good as the state of the art, if not better. The experimental evidence supplements our theoretical guarantees. In summary, we have an automated solution to achieve fairness in classification, which is easily extendable to many fairness constraints.

Download Full-text

Experienced Optimization with Reusable Directional Model for Hyper-Parameter Search

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/315 ◽

2018 ◽

Cited By ~ 1

Author(s):

Yi-Qi Hu ◽

Yang Yu ◽

Zhi-Hua Zhou

Keyword(s):

State Of The Art ◽

Optimization Method ◽

Parameter Selection ◽

Optimization Approach ◽

Network Architectures ◽

Directional Model ◽

Derivative Free Optimization ◽

Derivative Free ◽

Parameter Search ◽

Difficult Issue

Hyper-parameter selection is a crucial yet difficult issue in machine learning. For this problem, derivative-free optimization has being playing an irreplaceable role. However, derivative-free optimization commonly requires a lot of hyper-parameter samples, while each sample could have a high cost for hyper-parameter selection due to the costly evaluation of a learning model. To tackle this issue, in this paper, we propose an experienced optimization approach, i.e., learning how to optimize better from a set of historical optimization processes. From the historical optimization processes on previous datasets, a directional model is trained to predict the direction of the next good hyper-parameter. The directional model is then reused to guide the optimization in learning new datasets. We implement this mechanism within a state-of-the-art derivative-free optimization method SRacos, and conduct experiments on learning the hyper-parameters of heterogeneous ensembles and neural network architectures. Experimental results verify that the proposed approach can significantly improve the learning accuracy within a limited hyper-parameter sample budget.

Download Full-text

HyperNOMAD

ACM Transactions on Mathematical Software ◽

10.1145/3450975 ◽

2021 ◽

Vol 47 (3) ◽

pp. 1-27

Author(s):

Dounia Lakhmiri ◽

Sébastien Le Digabel ◽

Christophe Tribes

Keyword(s):

Neural Network ◽

Learning Process ◽

Deep Neural Network ◽

Search Space ◽

Categorical Variables ◽

Current State ◽

Derivative Free Optimization ◽

Highly Sensitive ◽

Derivative Free ◽

Application Tuning

The performance of deep neural networks is highly sensitive to the choice of the hyperparameters that define the structure of the network and the learning process. When facing a new application, tuning a deep neural network is a tedious and time-consuming process that is often described as a “dark art.” This explains the necessity of automating the calibration of these hyperparameters. Derivative-free optimization is a field that develops methods designed to optimize time-consuming functions without relying on derivatives. This work introduces the HyperNOMAD package, an extension of the NOMAD software that applies the MADS algorithm [7] to simultaneously tune the hyperparameters responsible for both the architecture and the learning process of a deep neural network (DNN). This generic approach allows for an important flexibility in the exploration of the search space by taking advantage of categorical variables. HyperNOMAD is tested on the MNIST, Fashion-MNIST, and CIFAR-10 datasets and achieves results comparable to the current state of the art.

Download Full-text

Constrained multi-fidelity surrogate framework using Bayesian optimization with non-intrusive reduced-order basis

Advanced Modeling and Simulation in Engineering Sciences ◽

10.1186/s40323-020-00176-z ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Hanane Khatouri ◽

Tariq Benamara ◽

Piotr Breitkopf ◽

Jean Demange ◽

Paul Feliot

Keyword(s):

Optimization Problem ◽

Optimization Techniques ◽

Process Models ◽

Bayesian Optimization ◽

Test Case ◽

High Fidelity ◽

Reduced Order ◽

Derivative Free Optimization ◽

Derivative Free ◽

Trend Functions

AbstractThis article addresses the problem of constrained derivative-free optimization in a multi-fidelity (or variable-complexity) framework using Bayesian optimization techniques. It is assumed that the objective and constraints involved in the optimization problem can be evaluated using either an accurate but time-consuming computer program or a fast lower-fidelity one. In this setting, the aim is to solve the optimization problem using as few calls to the high-fidelity program as possible. To this end, it is proposed to use Gaussian process models with trend functions built from the projection of low-fidelity solutions on a reduced-order basis synthesized from scarce high-fidelity snapshots. A study on the ability of such models to accurately represent the objective and the constraints and a comparison of two improvement-based infill strategies are performed on a representative benchmark test case.

Download Full-text

Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

10.21236/ada622645 ◽

2015 ◽

Author(s):

Katya Scheinberg

Keyword(s):

Machine Learning ◽

Complex Systems ◽

Learning Models ◽

Statistical Machine Learning ◽

Derivative Free Optimization ◽

Derivative Free ◽

Machine Learning Models

Download Full-text

Optimal Wells Placement to Maximize the Field Coverage Using Derivative-Free Optimization

Procedia Computer Science ◽

10.1016/j.procs.2020.11.008 ◽

2020 ◽

Vol 178 ◽

pp. 65-74

Author(s):

Ksenia Balabaeva ◽

Liya Akmadieva ◽

Sergey Kovalchuk

Keyword(s):

Derivative Free Optimization ◽

Derivative Free

Download Full-text

Black box operation optimization of basic oxygen furnace steelmaking process with derivative free optimization algorithm

Computers & Chemical Engineering ◽

10.1016/j.compchemeng.2021.107311 ◽

2021 ◽

Vol 150 ◽

pp. 107311

Author(s):

Yongxia Liu ◽

Lixin Tang ◽

Chang Liu ◽

Lijie Su ◽

Jian Wu

Keyword(s):

Optimization Algorithm ◽

Basic Oxygen Furnace ◽

Black Box ◽

Steelmaking Process ◽

Basic Oxygen ◽

Operation Optimization ◽

Derivative Free Optimization ◽

Derivative Free

Download Full-text

Fast Accurate and Automatic Brushstroke Extraction

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3429742 ◽

2021 ◽

Vol 17 (2) ◽

pp. 1-24

Author(s):

Yunfei Fu ◽

Hongchuan Yu ◽

Chih-Kuo Yeh ◽

Tong-Yee Lee ◽

Jian J. Zhang

Keyword(s):

Neural Network ◽

Efficient Algorithm ◽

Deep Neural Network ◽

High Efficiency ◽

State Of The Art ◽

High Reliability ◽

The Other ◽

Manual Annotation ◽

Stroke Extraction ◽

Art Research

Brushstrokes are viewed as the artist’s “handwriting” in a painting. In many applications such as style learning and transfer, mimicking painting, and painting authentication, it is highly desired to quantitatively and accurately identify brushstroke characteristics from old masters’ pieces using computer programs. However, due to the nature of hundreds or thousands of intermingling brushstrokes in the painting, it still remains challenging. This article proposes an efficient algorithm for brush Stroke extraction based on a Deep neural network, i.e., DStroke. Compared to the state-of-the-art research, the main merit of the proposed DStroke is to automatically and rapidly extract brushstrokes from a painting without manual annotation, while accurately approximating the real brushstrokes with high reliability. Herein, recovering the faithful soft transitions between brushstrokes is often ignored by the other methods. In fact, the details of brushstrokes in a master piece of painting (e.g., shapes, colors, texture, overlaps) are highly desired by artists since they hold promise to enhance and extend the artists’ powers, just like microscopes extend biologists’ powers. To demonstrate the high efficiency of the proposed DStroke, we perform it on a set of real scans of paintings and a set of synthetic paintings, respectively. Experiments show that the proposed DStroke is noticeably faster and more accurate at identifying and extracting brushstrokes, outperforming the other methods.

Download Full-text