Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

2020 ◽  
Vol 34 (04) ◽  
pp. 5700-5708 ◽  
Author(s):  
Jianghao Shen ◽  
Yue Wang ◽  
Pengfei Xu ◽  
Yonggan Fu ◽  
Zhangyang Wang ◽  
...  

While increasingly deep networks are in general still desired for achieving state-of-the-art performance, for many specific inputs a simpler network may already suffice. Existing works exploit this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue that their binary decision scheme, i.e., either fully executing or completely bypassing a layer for a given input, can be enhanced by introducing finer-grained, “softer” decisions. We therefore propose a Dynamic Fractional Skipping (DFS) framework. The core idea of DFS is to treat layer-wise quantization (to different bitwidths) as intermediate “soft” choices between fully utilizing and skipping a layer. For each input, DFS dynamically assigns a bitwidth to both the weights and activations of each layer, so that full execution and skipping can be viewed as the two “extremes” (full bitwidth and zero bitwidth, respectively). In this way, DFS can “fractionally” exploit a layer's expressive power during input-adaptive inference, enabling finer-grained accuracy-computational cost trade-offs, and it presents a unified view linking input-adaptive layer skipping with input-adaptive hybrid quantization. Extensive experimental results demonstrate the superior trade-off between computational cost and model expressive power (accuracy) achieved by DFS. Visualizations further indicate a smooth and consistent transition in DFS behavior, especially in the learned choices between layer skipping and different quantizations as the total computational budget varies, validating our hypothesis that layer quantization can be viewed as an intermediate variant of layer skipping. Our source code and supplementary material are available at https://github.com/Torment123/DFS.
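To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of a "fractionally skipped" layer: a bitwidth is chosen per input for each layer, with bitwidth 0 meaning a full skip and full precision meaning full execution. The function names and the uniform quantizer are illustrative assumptions.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bitwidth.
    bits >= 32 is treated as full precision and passed through."""
    if bits >= 32:
        return x
    scale = np.abs(x).max() + 1e-8
    levels = 2 ** (bits - 1) - 1          # symmetric signed range
    return np.round(x / scale * levels) / levels * scale

def fractional_layer(x, weight, bits):
    """One 'fractionally skipped' fully connected layer:
    bits == 0      -> bypass the layer entirely (identity skip),
    0 < bits < 32  -> execute with quantized weights and activations,
    bits >= 32     -> full execution."""
    if bits == 0:
        return x                          # complete skip: zero bitwidth
    w_q = quantize(weight, bits)
    x_q = quantize(x, bits)
    return np.maximum(x_q @ w_q, 0.0)     # ReLU activation

# The same input routed through the layer at different "skip fractions".
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))
w = rng.standard_normal((8, 8))
for b in (0, 4, 8, 32):
    print(b, fractional_layer(x, w, b).sum())
```

In the full framework a learned, input-dependent controller would pick `bits` per layer; here the loop simply enumerates the choices to show the continuum between skipping and full execution.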

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2156
Author(s):  
Juan M. Cebrian ◽  
Baldomero Imbernón ◽  
Jesús Soto ◽  
José M. Cecilia

Clustering algorithms are among the most widely used kernels for generating knowledge from large datasets. These algorithms group a set of data elements (e.g., images, points, patterns) into clusters to identify patterns or common features of a sample. However, they are computationally expensive, as they often involve costly fitness functions that must be evaluated for every point in the dataset. This cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for state-of-the-art fuzzy clustering algorithms, namely the Fuzzy C-means (FCM), the Gustafson–Kessel FCM (GK-FCM), and the Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on its computational pattern, its mathematical foundation, and the amount of data to be processed, each algorithm performs best on a different platform.
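For reference, the kernel that dominates FCM's cost is the alternating membership/centroid update; the sketch below is a plain serial NumPy version of one iteration (the paper studies parallelised implementations of kernels like this), with the fuzzifier m = 2 as an illustrative default.

```python
import numpy as np

def fcm_step(X, centroids, m=2.0, eps=1e-9):
    """One Fuzzy C-means iteration: recompute memberships, then centroids.
    X: (n, d) data, centroids: (c, d)."""
    # Distances between every point and every centroid: shape (n, c)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + eps
    # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
    power = 2.0 / (m - 1.0)
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** power, axis=2)
    # Centroid update, weighted by u^m
    um = u ** m
    new_centroids = (um.T @ X) / um.sum(axis=0)[:, None]
    return u, new_centroids

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
c = X[rng.choice(len(X), size=3, replace=False)]   # initial centroids
for _ in range(20):
    u, c = fcm_step(X, c)
```

The all-pairs distance and membership tensors are what make the fitness evaluation expensive, and they are also what makes the kernel amenable to the data-parallel mappings evaluated in the paper.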


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 511
Author(s):  
Syed Mohammad Minhaz Hossain ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Takeshi Koshiba

Proper plant leaf disease (PLD) detection is challenging in complex backgrounds and under varying capture conditions. For this reason, a modified adaptive centroid-based segmentation (ACS) is first used to trace the proper region of interest (ROI). Automatically initializing the number of clusters (K) in the modified ACS before recognition makes ROI tracing scalable, even for symmetrical features across various plants. Convolutional neural network (CNN)-based PLD recognition models achieve adequate accuracy to some extent; however, their memory requirements (large numbers of parameters) and high computational cost are pressing issues for memory-restricted mobile and IoT devices. Therefore, after tracing the ROIs, three proposed depth-wise separable convolutional PLD (DSCPLD) models, namely segmented modified DSCPLD (S-modified MobileNet), segmented reduced DSCPLD (S-reduced MobileNet), and segmented extended DSCPLD (S-extended MobileNet), are used to strike a constructive trade-off among accuracy, model size, and computational latency. We compare our proposed DSCPLD recognition models with state-of-the-art models such as MobileNet, VGG16, VGG19, and AlexNet. Among the segmented DSCPLD models, S-modified MobileNet achieves the best accuracy of 99.55% and F1-score of 97.07%. We also evaluate our DSCPLD models on both full and segmented plant leaf images and conclude that all models improve in accuracy and F1-score after the modified ACS is applied. Furthermore, a new plant leaf dataset containing 6580 images of eight plants was used to experiment with several depth-wise separable convolution models.
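The model-size advantage of the DSCPLD family comes from the depth-wise separable factorization itself. The sketch below (a generic illustration, not the authors' architecture) counts parameters for a standard convolution versus the depthwise-plus-pointwise pair that replaces it.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    """Depthwise separable convolution = depthwise k x k (one filter per
    input channel) followed by a pointwise 1 x 1 convolution."""
    return c_in * k * k + c_in * c_out

# Example layer: 128 -> 256 channels with 3x3 kernels.
std, dsc = conv_params(128, 256, 3), dsc_params(128, 256, 3)
print(std, dsc, std / dsc)   # 294912 vs 33920: roughly an 8-9x reduction
```

The same factorization reduces multiply-accumulate counts by a similar factor, which is why these models suit memory- and compute-restricted devices.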


2021 ◽  
Vol 14 (5) ◽  
pp. 785-798
Author(s):  
Daokun Hu ◽  
Zhiwen Chen ◽  
Jianbing Wu ◽  
Jianhua Sun ◽  
Hao Chen

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially since the release of Intel Optane DC Persistent Memory Modules. However, most such structures have been evaluated on DRAM-based emulators under unrealistic assumptions, or with a focus on specific metrics while important properties are sidestepped. It is therefore essential to understand how well the proposed hash indexes perform on real PM and how they differ from one another when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we evaluate six state-of-the-art hash tables, Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, on real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. Through in-depth analysis, we identify design trade-offs and effective paradigms in prior art, and suggest desirable optimizations and directions for the future development of PM-based hash tables.
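As a rough illustration of what one phase of such a unified micro-benchmark looks like, here is a Python stand-in. The evaluated structures are C/C++ and PM-specific, so a plain dict substitutes for the hash table here; only the workload/measurement shape (operation mix, uniform key distribution, throughput reporting) is meaningful, and all names are hypothetical.

```python
import time, random

def run_workload(table, n_ops=100_000, read_ratio=0.5, key_space=100_000):
    """One micro-benchmark phase: a fixed mix of inserts and lookups over a
    uniform key distribution, reporting throughput. `table` is a plain dict
    standing in for a PM-backed hash table."""
    rng = random.Random(42)
    start = time.perf_counter()
    for _ in range(n_ops):
        k = rng.randrange(key_space)
        if rng.random() < read_ratio:
            table.get(k)                  # search operation
        else:
            table[k] = k                  # insert/update operation
    elapsed = time.perf_counter() - start
    return n_ops / elapsed                # operations per second

print(f"{run_workload({}):.0f} ops/s")
```

A real PM harness would additionally pin threads (for the NUMA experiments), vary the read/write mix and key skew, and account for cache-line flush and fence instructions on the write path.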


2018 ◽  
Vol 11 (10) ◽  
pp. 4155-4174 ◽  
Author(s):  
Benjamin Brown-Steiner ◽  
Noelle E. Selin ◽  
Ronald Prinn ◽  
Simone Tilmes ◽  
Louisa Emmons ◽  
...  

Abstract. While state-of-the-art complex chemical mechanisms expand our understanding of atmospheric chemistry, their sheer size and computational requirements often limit simulations to short lengths or ensembles to only a few members. Here we present and compare three 25-year present-day offline simulations with chemical mechanisms of different levels of complexity, using the Community Earth System Model (CESM) Version 1.2 CAM-chem (CAM4): the Model for Ozone and Related Chemical Tracers, version 4 (MOZART-4) mechanism, the Reduced Hydrocarbon mechanism, and the Super-Fast mechanism. We show that, for most regions and time periods, differences in simulated ozone chemistry between these three mechanisms are smaller than the model–observation differences themselves. The MOZART-4 and Reduced Hydrocarbon mechanisms agree closely in their representation of ozone throughout the troposphere over all time periods (annual, seasonal, and diurnal). While the Super-Fast mechanism tends to show higher simulated ozone variability and differs from the MOZART-4 mechanism over regions of high biogenic emissions, it is surprisingly capable of simulating ozone adequately given its simplicity. We explore the trade-offs between chemical mechanism complexity and computational cost by identifying regions where the simpler mechanisms are comparable to the MOZART-4 mechanism and regions where they are not. The Super-Fast mechanism is three times as fast as the MOZART-4 mechanism, which allows for longer simulations or larger ensembles that may not be feasible with the MOZART-4 mechanism under limited computational resources.


2016 ◽  
Author(s):  
Alexander J. Turner ◽  
Alexis A. Shusterman ◽  
Brian C. McDonald ◽  
Virginia Teige ◽  
Robert A Harley ◽  
...  

2020 ◽  
Vol 67 ◽  
pp. 607-651
Author(s):  
Margarita Paz Castro ◽  
Chiara Piacentini ◽  
Andre Augusto Cire ◽  
J. Christopher Beck

We investigate the use of relaxed decision diagrams (DDs) for computing admissible heuristics for the cost-optimal delete-free planning (DFP) problem. Our main contributions are two novel DD encodings of a DFP task: a multivalued decision diagram that includes the sequencing aspect of the problem, and a binary decision diagram representation of its sequential relaxation. We present construction algorithms for each DD that leverage these different perspectives on the DFP task, and we provide theoretical and empirical analyses of the associated heuristics. We further show that relaxed DDs can be used beyond heuristic computation to extract delete-free plans, find action landmarks, and identify redundant actions. Our empirical analysis shows that while DD-based heuristics trail the state of the art, even small relaxed DDs are competitive with the linear programming heuristic for the DFP task, revealing novel ways of designing admissible heuristics.
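To convey the general principle (a generic sketch, not the paper's MDD/BDD encodings for DFP): a relaxed DD is built layer by layer under a width limit, and when a layer overflows, surplus nodes are merged with a problem-specific relaxation operator that over-approximates their behaviour, so the cheapest root-to-terminal cost can only decrease and remains an admissible lower bound. All names below are hypothetical.

```python
def relaxed_dd_bound(initial, expand, is_goal, merge_states, max_width,
                     max_layers=50):
    """Layer-by-layer relaxed-DD lower bound. `expand(state)` yields
    (cost, successor) pairs; `merge_states` over-approximates two states,
    so merging only adds paths and the best terminal cost stays admissible."""
    layer, best = {initial: 0}, float("inf")
    for _ in range(max_layers):
        nxt = {}
        for state, g in layer.items():
            if is_goal(state):
                best = min(best, g)
                continue
            for cost, succ in expand(state):
                nxt[succ] = min(nxt.get(succ, float("inf")), g + cost)
        if not nxt:
            break
        if len(nxt) > max_width:              # enforce the width limit
            items = sorted(nxt.items(), key=lambda kv: kv[1])
            keep, rest = items[:max_width - 1], items[max_width - 1:]
            m_state, m_g = rest[0]
            for s, g in rest[1:]:             # fold surplus nodes into one
                m_state, m_g = merge_states(m_state, s), min(m_g, g)
            nxt = dict(keep)
            nxt[m_state] = min(nxt.get(m_state, float("inf")), m_g)
        layer = nxt
    return best

# Toy usage: reach a count of at least 10 with +1/+3 steps of unit cost.
# Merging by max() over-approximates progress, keeping the bound admissible.
h = relaxed_dd_bound(0, lambda s: [(1, s + 1), (1, s + 3)],
                     lambda s: s >= 10, max, 4)
print(h)   # a lower bound on the true optimal cost (which is 4)
```

The paper's encodings are far richer (they capture action sequencing and its sequential relaxation), but the width-limit-and-merge mechanism is the shared core of relaxed-DD heuristics.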


Author(s):  
Andrew Cropper ◽  
Sebastijan Dumančic

A major challenge in inductive logic programming (ILP) is learning large programs. We argue that a key limitation of existing systems is their use of entailment to guide the hypothesis search. This approach is limited because entailment is a binary decision: a hypothesis either entails an example or it does not, with no intermediate position. To address this limitation, we go beyond entailment and use 'example-dependent' loss functions to guide the search, so that a hypothesis can partially cover an example. We implement our idea in Brute, a new ILP system that uses best-first search, guided by an example-dependent loss function, to incrementally build programs. Our experiments on three diverse program synthesis domains (robot planning, string transformations, and ASCII art) show that Brute can substantially outperform existing ILP systems, in terms of both predictive accuracy and learning time, and can learn programs 20 times larger than state-of-the-art systems can.
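The search idea can be sketched generically as below (this is not Brute's actual program representation or loss): candidates sit in a priority queue ordered by an example-dependent loss, so a hypothesis that partially covers the examples is preferred over one that fails outright.

```python
import heapq
from itertools import count

def best_first_search(initial, refine, loss, examples, max_nodes=10_000):
    """Loss-guided best-first search over candidate programs.
    `refine(prog)` yields larger candidates; `loss(prog, example)` gives
    partial credit instead of a binary entails/does-not-entail test."""
    tie = count()     # tie-breaker so the heap never compares programs
    frontier = [(sum(loss(initial, ex) for ex in examples), next(tie), initial)]
    expanded = 0
    while frontier and expanded < max_nodes:
        score, _, prog = heapq.heappop(frontier)
        if score == 0:                 # zero loss on every example: done
            return prog
        expanded += 1
        for child in refine(prog):
            child_score = sum(loss(child, ex) for ex in examples)
            heapq.heappush(frontier, (child_score, next(tie), child))
    return None

# Toy usage: grow a string toward "abc"; loss counts mismatched positions,
# so "ab" (loss 1) is expanded before "ca" (loss 3).
target = "abc"
prog = best_first_search(
    "", lambda p: [p + ch for ch in "abc"] if len(p) < 3 else [],
    lambda p, ex: sum(a != b for a, b in zip(p.ljust(3), ex)), [target])
print(prog)   # 'abc'
```

For string transformations, for instance, the loss could be the edit distance between a program's output and the target, which is exactly the kind of graded signal a binary entailment test cannot provide.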


Author(s):  
Wenbin Li ◽  
Lei Wang ◽  
Jing Huo ◽  
Yinghuan Shi ◽  
Yang Gao ◽  
...  

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes in order to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which cannot effectively estimate a class's distribution given the scarcity of samples. Some recent work shows that local-descriptor-based representations are richer than image-level ones. However, such works still rely on a less effective instance-level metric, and in particular a symmetric one, to measure the relation between a query image and a support class. Given the naturally asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning, which computes a joint local and global asymmetric measure between the multivariate local distributions of a query and of a class. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On the popular miniImageNet and tieredImageNet benchmarks, ADM achieves state-of-the-art results, validating our design of asymmetric distribution measures for few-shot learning. The source code can be downloaded from https://github.com/WenbinLee/ADM.git.
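One standard asymmetric measure between distributions is the KL divergence. The sketch below fits Gaussians to local descriptors and compares them with KL purely to illustrate why asymmetry matters; it is an assumption for illustration, not necessarily the exact measure used by ADM.

```python
import numpy as np

def gaussian_from_descriptors(D, ridge=1e-3):
    """Fit a multivariate Gaussian (mean, covariance) to local descriptors D
    of shape (n_descriptors, d); the ridge keeps the covariance invertible."""
    mu = D.mean(axis=0)
    cov = np.cov(D, rowvar=False) + ridge * np.eye(D.shape[1])
    return mu, cov

def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL(N0 || N1) in closed form: an asymmetric measure, so the
    query-to-class and class-to-query distances generally differ."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + logdet1 - logdet0)

# Local descriptors of one query image vs. the pooled support set of a class.
rng = np.random.default_rng(0)
q_mu, q_cov = gaussian_from_descriptors(rng.standard_normal((64, 16)))
c_mu, c_cov = gaussian_from_descriptors(rng.standard_normal((320, 16)) + 0.5)
print(kl_gaussian(q_mu, q_cov, c_mu, c_cov),
      kl_gaussian(c_mu, c_cov, q_mu, q_cov))   # generally unequal
```

The asymmetry mirrors the problem structure: a single query provides far fewer descriptors than a pooled support class, so measuring "query under the class distribution" is not the same question as the reverse.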


Author(s):  
George Dasoulas ◽  
Ludovic Dos Santos ◽  
Kevin Scaman ◽  
Aladin Virmaux

In this paper, we show that a simple coloring scheme can improve, both theoretically and empirically, the expressive power of Message Passing Neural Networks (MPNNs). More specifically, we introduce a graph neural network called Colored Local Iterative Procedure (CLIP) that uses colors to disambiguate identical node attributes, and we show that this representation is a universal approximator of continuous functions on graphs with node attributes. Our method relies on separability, a key topological characteristic that allows well-chosen neural networks to be extended into universal representations. Finally, we show experimentally that CLIP can capture structural characteristics that traditional MPNNs fail to distinguish, while remaining state-of-the-art on benchmark graph classification datasets.
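A simplified sketch of the coloring idea (an illustration under our own assumptions, not the exact CLIP scheme): append random one-hot colors to the node attributes so that identical nodes become distinguishable, run plain message passing, and average the readout over several colorings.

```python
import numpy as np

def color_features(X, k, rng):
    """Append a random one-hot 'color' to every node's attributes so that
    nodes with identical attributes become distinguishable."""
    colors = np.eye(k)[rng.integers(0, k, size=X.shape[0])]
    return np.concatenate([X, colors], axis=1)

def mpnn_layer(A, H, W):
    """One plain message-passing layer: aggregate neighbours, transform, ReLU."""
    return np.maximum((A @ H) @ W, 0.0)

def colored_readout(A, X, W, k=3, n_colorings=8, seed=0):
    """Average the graph representation over several random colorings."""
    rng = np.random.default_rng(seed)
    reps = [mpnn_layer(A, color_features(X, k, rng), W).sum(axis=0)
            for _ in range(n_colorings)]
    return np.mean(reps, axis=0)

# A 4-cycle whose nodes all carry identical attributes: without colors, every
# node's message-passing view is the same; colors break the symmetry.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
X = np.ones((4, 2))
W = np.random.default_rng(1).standard_normal((2 + 3, 5))
print(colored_readout(A, X, W))
```

Averaging over colorings keeps the representation (approximately) invariant to the arbitrary color assignment while retaining the added discriminative power.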

