Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

2020 ◽  
Vol 34 (04) ◽  
pp. 5700-5708 ◽  
Author(s):  
Jianghao Shen ◽  
Yue Wang ◽  
Pengfei Xu ◽  
Yonggan Fu ◽  
Zhangyang Wang ◽  
...  

While increasingly deep networks are in general still desired for achieving state-of-the-art performance, for many specific inputs a simpler network may already suffice. Existing works exploit this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue that their binary decision scheme, i.e., either fully executing or completely bypassing a layer for a given input, can be enhanced by introducing finer-grained, “softer” decisions. We therefore propose a Dynamic Fractional Skipping (DFS) framework. The core idea of DFS is to treat layer-wise quantization (to different bitwidths) as intermediate “soft” choices between fully utilizing and skipping a layer. For each input, DFS dynamically assigns a bitwidth to both the weights and activations of each layer, so that full execution and skipping can be viewed as the two “extremes” (full bitwidth and zero bitwidth, respectively). In this way, DFS can “fractionally” exploit a layer's expressive power during input-adaptive inference, enabling finer-grained accuracy-computational cost trade-offs, and it presents a unified view linking input-adaptive layer skipping with input-adaptive hybrid quantization. Extensive experimental results demonstrate the superior trade-off between computational cost and model expressive power (accuracy) achieved by DFS. Visualizations further indicate a smooth and consistent transition in DFS behavior, especially in the learned choices between layer skipping and different quantizations as the total computational budget varies, validating our hypothesis that layer quantization can be viewed as an intermediate variant of layer skipping. Our source code and supplementary material are available at https://github.com/Torment123/DFS.
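To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of a "fractionally skipped" layer: a bitwidth is chosen per input for each layer, with bitwidth 0 meaning a full skip and full precision meaning full execution. The function names and the uniform quantizer are illustrative assumptions.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bitwidth.
    bits >= 32 is treated as full precision and passed through."""
    if bits >= 32:
        return x
    scale = np.abs(x).max() + 1e-8
    levels = 2 ** (bits - 1) - 1          # symmetric signed range
    return np.round(x / scale * levels) / levels * scale

def fractional_layer(x, weight, bits):
    """One 'fractionally skipped' fully connected layer:
    bits == 0      -> bypass the layer entirely (identity skip),
    0 < bits < 32  -> execute with quantized weights and activations,
    bits >= 32     -> full execution."""
    if bits == 0:
        return x                          # complete skip: zero bitwidth
    w_q = quantize(weight, bits)
    x_q = quantize(x, bits)
    return np.maximum(x_q @ w_q, 0.0)     # ReLU activation

# The same input routed through the layer at different "skip fractions".
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))
w = rng.standard_normal((8, 8))
for b in (0, 4, 8, 32):
    print(b, fractional_layer(x, w, b).sum())
```

In the full framework a learned, input-dependent controller would pick `bits` per layer; here the loop simply enumerates the choices to show the continuum between skipping and full execution.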

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2156
Author(s):  
Juan M. Cebrian ◽  
Baldomero Imbernón ◽  
Jesús Soto ◽  
José M. Cecilia

Clustering algorithms are among the most widely used kernels for generating knowledge from large datasets. These algorithms group a set of data elements (e.g., images, points, patterns) into clusters to identify patterns or common features of a sample. However, they are computationally expensive, as they often involve costly fitness functions that must be evaluated for every point in the dataset. This cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for state-of-the-art fuzzy clustering algorithms, namely the Fuzzy C-means (FCM), the Gustafson–Kessel FCM (GK-FCM), and the Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on its computational pattern, its mathematical foundation, and the amount of data to be processed, each algorithm performs best on a different platform.
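For reference, the kernel that dominates FCM's cost is the alternating membership/centroid update; the sketch below is a plain serial NumPy version of one iteration (the paper studies parallelised implementations of kernels like this), with the fuzzifier m = 2 as an illustrative default.

```python
import numpy as np

def fcm_step(X, centroids, m=2.0, eps=1e-9):
    """One Fuzzy C-means iteration: recompute memberships, then centroids.
    X: (n, d) data, centroids: (c, d)."""
    # Distances between every point and every centroid: shape (n, c)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + eps
    # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
    power = 2.0 / (m - 1.0)
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** power, axis=2)
    # Centroid update, weighted by u^m
    um = u ** m
    new_centroids = (um.T @ X) / um.sum(axis=0)[:, None]
    return u, new_centroids

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
c = X[rng.choice(len(X), size=3, replace=False)]   # initial centroids
for _ in range(20):
    u, c = fcm_step(X, c)
```

The all-pairs distance and membership tensors are what make the fitness evaluation expensive, and they are also what makes the kernel amenable to the data-parallel mappings evaluated in the paper.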


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 511
Author(s):  
Syed Mohammad Minhaz Hossain ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Takeshi Koshiba

Proper plant leaf disease (PLD) detection is challenging in complex backgrounds and under varying capture conditions. For this reason, a modified adaptive centroid-based segmentation (ACS) is first used to trace the proper region of interest (ROI). Automatically initializing the number of clusters (K) in the modified ACS before recognition makes ROI tracing scalable, even for symmetrical features across various plants. Convolutional neural network (CNN)-based PLD recognition models achieve adequate accuracy to some extent; however, their memory requirements (large numbers of parameters) and high computational cost are pressing issues for memory-restricted mobile and IoT devices. Therefore, after tracing the ROIs, three proposed depth-wise separable convolutional PLD (DSCPLD) models, namely segmented modified DSCPLD (S-modified MobileNet), segmented reduced DSCPLD (S-reduced MobileNet), and segmented extended DSCPLD (S-extended MobileNet), are used to strike a constructive trade-off among accuracy, model size, and computational latency. We compare our proposed DSCPLD recognition models with state-of-the-art models such as MobileNet, VGG16, VGG19, and AlexNet. Among the segmented DSCPLD models, S-modified MobileNet achieves the best accuracy of 99.55% and F1-score of 97.07%. We also evaluate our DSCPLD models on both full and segmented plant leaf images and conclude that all models improve in accuracy and F1-score after the modified ACS is applied. Furthermore, a new plant leaf dataset containing 6580 images of eight plants was used to experiment with several depth-wise separable convolution models.
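The model-size advantage of the DSCPLD family comes from the depth-wise separable factorization itself. The sketch below (a generic illustration, not the authors' architecture) counts parameters for a standard convolution versus the depthwise-plus-pointwise pair that replaces it.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    """Depthwise separable convolution = depthwise k x k (one filter per
    input channel) followed by a pointwise 1 x 1 convolution."""
    return c_in * k * k + c_in * c_out

# Example layer: 128 -> 256 channels with 3x3 kernels.
std, dsc = conv_params(128, 256, 3), dsc_params(128, 256, 3)
print(std, dsc, std / dsc)   # 294912 vs 33920: roughly an 8-9x reduction
```

The same factorization reduces multiply-accumulate counts by a similar factor, which is why these models suit memory- and compute-restricted devices.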


2021 ◽  
Vol 14 (5) ◽  
pp. 785-798
Author(s):  
Daokun Hu ◽  
Zhiwen Chen ◽  
Jianbing Wu ◽  
Jianhua Sun ◽  
Hao Chen

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially since the release of Intel Optane DC Persistent Memory Modules. However, most such structures have been evaluated on DRAM-based emulators under unrealistic assumptions, or with a focus on specific metrics while important properties are sidestepped. It is therefore essential to understand how well the proposed hash indexes perform on real PM and how they differ from one another when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we evaluate six state-of-the-art hash tables, Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, on real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. Through in-depth analysis, we identify design trade-offs and effective paradigms in prior art, and suggest desirable optimizations and directions for the future development of PM-based hash tables.
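As a rough illustration of what one phase of such a unified micro-benchmark looks like, here is a Python stand-in. The evaluated structures are C/C++ and PM-specific, so a plain dict substitutes for the hash table here; only the workload/measurement shape (operation mix, uniform key distribution, throughput reporting) is meaningful, and all names are hypothetical.

```python
import time, random

def run_workload(table, n_ops=100_000, read_ratio=0.5, key_space=100_000):
    """One micro-benchmark phase: a fixed mix of inserts and lookups over a
    uniform key distribution, reporting throughput. `table` is a plain dict
    standing in for a PM-backed hash table."""
    rng = random.Random(42)
    start = time.perf_counter()
    for _ in range(n_ops):
        k = rng.randrange(key_space)
        if rng.random() < read_ratio:
            table.get(k)                  # search operation
        else:
            table[k] = k                  # insert/update operation
    elapsed = time.perf_counter() - start
    return n_ops / elapsed                # operations per second

print(f"{run_workload({}):.0f} ops/s")
```

A real PM harness would additionally pin threads (for the NUMA experiments), vary the read/write mix and key skew, and account for cache-line flush and fence instructions on the write path.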


2018 ◽  
Vol 11 (10) ◽  
pp. 4155-4174 ◽  
Author(s):  
Benjamin Brown-Steiner ◽  
Noelle E. Selin ◽  
Ronald Prinn ◽  
Simone Tilmes ◽  
Louisa Emmons ◽  
...  

Abstract. While state-of-the-art complex chemical mechanisms expand our understanding of atmospheric chemistry, their sheer size and computational requirements often limit simulations to short lengths or ensembles to only a few members. Here we present and compare three 25-year present-day offline simulations with chemical mechanisms of different levels of complexity, using the Community Earth System Model (CESM) Version 1.2 CAM-chem (CAM4): the Model for Ozone and Related Chemical Tracers, version 4 (MOZART-4) mechanism, the Reduced Hydrocarbon mechanism, and the Super-Fast mechanism. We show that, for most regions and time periods, differences in simulated ozone chemistry between these three mechanisms are smaller than the model–observation differences themselves. The MOZART-4 and Reduced Hydrocarbon mechanisms agree closely in their representation of ozone throughout the troposphere over all time periods (annual, seasonal, and diurnal). While the Super-Fast mechanism tends to show higher simulated ozone variability and differs from the MOZART-4 mechanism over regions of high biogenic emissions, it is surprisingly capable of simulating ozone adequately given its simplicity. We explore the trade-offs between chemical mechanism complexity and computational cost by identifying regions where the simpler mechanisms are comparable to the MOZART-4 mechanism and regions where they are not. The Super-Fast mechanism is three times as fast as the MOZART-4 mechanism, which allows for longer simulations or larger ensembles that may not be feasible with the MOZART-4 mechanism under limited computational resources.


2016 ◽  
Author(s):  
Alexander J. Turner ◽  
Alexis A. Shusterman ◽  
Brian C. McDonald ◽  
Virginia Teige ◽  
Robert A Harley ◽  
...  

2020 ◽  
Vol 67 ◽  
pp. 607-651
Author(s):  
Margarita Paz Castro ◽  
Chiara Piacentini ◽  
Andre Augusto Cire ◽  
J. Christopher Beck

We investigate the use of relaxed decision diagrams (DDs) for computing admissible heuristics for the cost-optimal delete-free planning (DFP) problem. Our main contributions are two novel DD encodings of a DFP task: a multivalued decision diagram that includes the sequencing aspect of the problem, and a binary decision diagram representation of its sequential relaxation. We present construction algorithms for each DD that leverage these different perspectives on the DFP task, and we provide theoretical and empirical analyses of the associated heuristics. We further show that relaxed DDs can be used beyond heuristic computation to extract delete-free plans, find action landmarks, and identify redundant actions. Our empirical analysis shows that while DD-based heuristics trail the state of the art, even small relaxed DDs are competitive with the linear programming heuristic for the DFP task, revealing novel ways of designing admissible heuristics.
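To convey the general principle (a generic sketch, not the paper's MDD/BDD encodings for DFP): a relaxed DD is built layer by layer under a width limit, and when a layer overflows, surplus nodes are merged with a problem-specific relaxation operator that over-approximates their behaviour, so the cheapest root-to-terminal cost can only decrease and remains an admissible lower bound. All names below are hypothetical.

```python
def relaxed_dd_bound(initial, expand, is_goal, merge_states, max_width,
                     max_layers=50):
    """Layer-by-layer relaxed-DD lower bound. `expand(state)` yields
    (cost, successor) pairs; `merge_states` over-approximates two states,
    so merging only adds paths and the best terminal cost stays admissible."""
    layer, best = {initial: 0}, float("inf")
    for _ in range(max_layers):
        nxt = {}
        for state, g in layer.items():
            if is_goal(state):
                best = min(best, g)
                continue
            for cost, succ in expand(state):
                nxt[succ] = min(nxt.get(succ, float("inf")), g + cost)
        if not nxt:
            break
        if len(nxt) > max_width:              # enforce the width limit
            items = sorted(nxt.items(), key=lambda kv: kv[1])
            keep, rest = items[:max_width - 1], items[max_width - 1:]
            m_state, m_g = rest[0]
            for s, g in rest[1:]:             # fold surplus nodes into one
                m_state, m_g = merge_states(m_state, s), min(m_g, g)
            nxt = dict(keep)
            nxt[m_state] = min(nxt.get(m_state, float("inf")), m_g)
        layer = nxt
    return best

# Toy usage: reach a count of at least 10 with +1/+3 steps of unit cost.
# Merging by max() over-approximates progress, keeping the bound admissible.
h = relaxed_dd_bound(0, lambda s: [(1, s + 1), (1, s + 3)],
                     lambda s: s >= 10, max, 4)
print(h)   # a lower bound on the true optimal cost (which is 4)
```

The paper's encodings are far richer (they capture action sequencing and its sequential relaxation), but the width-limit-and-merge mechanism is the shared core of relaxed-DD heuristics.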


Author(s):  
Andrew Cropper ◽  
Sebastijan Dumančic

A major challenge in inductive logic programming (ILP) is learning large programs. We argue that a key limitation of existing systems is their use of entailment to guide the hypothesis search. This approach is limited because entailment is a binary decision: a hypothesis either entails an example or it does not, with no intermediate position. To address this limitation, we go beyond entailment and use 'example-dependent' loss functions to guide the search, so that a hypothesis can partially cover an example. We implement our idea in Brute, a new ILP system that uses best-first search, guided by an example-dependent loss function, to incrementally build programs. Our experiments on three diverse program synthesis domains (robot planning, string transformations, and ASCII art) show that Brute can substantially outperform existing ILP systems, in terms of both predictive accuracy and learning time, and can learn programs 20 times larger than state-of-the-art systems can.
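The search idea can be sketched generically as below (this is not Brute's actual program representation or loss): candidates sit in a priority queue ordered by an example-dependent loss, so a hypothesis that partially covers the examples is preferred over one that fails outright.

```python
import heapq
from itertools import count

def best_first_search(initial, refine, loss, examples, max_nodes=10_000):
    """Loss-guided best-first search over candidate programs.
    `refine(prog)` yields larger candidates; `loss(prog, example)` gives
    partial credit instead of a binary entails/does-not-entail test."""
    tie = count()     # tie-breaker so the heap never compares programs
    frontier = [(sum(loss(initial, ex) for ex in examples), next(tie), initial)]
    expanded = 0
    while frontier and expanded < max_nodes:
        score, _, prog = heapq.heappop(frontier)
        if score == 0:                 # zero loss on every example: done
            return prog
        expanded += 1
        for child in refine(prog):
            child_score = sum(loss(child, ex) for ex in examples)
            heapq.heappush(frontier, (child_score, next(tie), child))
    return None

# Toy usage: grow a string toward "abc"; loss counts mismatched positions,
# so "ab" (loss 1) is expanded before "ca" (loss 3).
target = "abc"
prog = best_first_search(
    "", lambda p: [p + ch for ch in "abc"] if len(p) < 3 else [],
    lambda p, ex: sum(a != b for a, b in zip(p.ljust(3), ex)), [target])
print(prog)   # 'abc'
```

For string transformations, for instance, the loss could be the edit distance between a program's output and the target, which is exactly the kind of graded signal a binary entailment test cannot provide.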


Author(s):  
Wenbin Li ◽  
Lei Wang ◽  
Jing Huo ◽  
Yinghuan Shi ◽  
Yang Gao ◽  
...  

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes in order to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which cannot effectively estimate a class's distribution given the scarcity of samples. Some recent work shows that local-descriptor-based representations are richer than image-level ones. However, such works still rely on a less effective instance-level metric, and in particular a symmetric one, to measure the relation between a query image and a support class. Given the naturally asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning, which computes a joint local and global asymmetric measure between the multivariate local distributions of a query and of a class. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On the popular miniImageNet and tieredImageNet benchmarks, ADM achieves state-of-the-art results, validating our design of asymmetric distribution measures for few-shot learning. The source code can be downloaded from https://github.com/WenbinLee/ADM.git.
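One standard asymmetric measure between distributions is the KL divergence. The sketch below fits Gaussians to local descriptors and compares them with KL purely to illustrate why asymmetry matters; it is an assumption for illustration, not necessarily the exact measure used by ADM.

```python
import numpy as np

def gaussian_from_descriptors(D, ridge=1e-3):
    """Fit a multivariate Gaussian (mean, covariance) to local descriptors D
    of shape (n_descriptors, d); the ridge keeps the covariance invertible."""
    mu = D.mean(axis=0)
    cov = np.cov(D, rowvar=False) + ridge * np.eye(D.shape[1])
    return mu, cov

def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL(N0 || N1) in closed form: an asymmetric measure, so the
    query-to-class and class-to-query distances generally differ."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + logdet1 - logdet0)

# Local descriptors of one query image vs. the pooled support set of a class.
rng = np.random.default_rng(0)
q_mu, q_cov = gaussian_from_descriptors(rng.standard_normal((64, 16)))
c_mu, c_cov = gaussian_from_descriptors(rng.standard_normal((320, 16)) + 0.5)
print(kl_gaussian(q_mu, q_cov, c_mu, c_cov),
      kl_gaussian(c_mu, c_cov, q_mu, q_cov))   # generally unequal
```

The asymmetry mirrors the problem structure: a single query provides far fewer descriptors than a pooled support class, so measuring "query under the class distribution" is not the same question as the reverse.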


Author(s):  
George Dasoulas ◽  
Ludovic Dos Santos ◽  
Kevin Scaman ◽  
Aladin Virmaux

In this paper, we show that a simple coloring scheme can improve, both theoretically and empirically, the expressive power of Message Passing Neural Networks (MPNNs). More specifically, we introduce a graph neural network called Colored Local Iterative Procedure (CLIP) that uses colors to disambiguate identical node attributes, and we show that this representation is a universal approximator of continuous functions on graphs with node attributes. Our method relies on separability, a key topological characteristic that allows well-chosen neural networks to be extended into universal representations. Finally, we show experimentally that CLIP can capture structural characteristics that traditional MPNNs fail to distinguish, while remaining state-of-the-art on benchmark graph classification datasets.
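A simplified sketch of the coloring idea (an illustration under our own assumptions, not the exact CLIP scheme): append random one-hot colors to the node attributes so that identical nodes become distinguishable, run plain message passing, and average the readout over several colorings.

```python
import numpy as np

def color_features(X, k, rng):
    """Append a random one-hot 'color' to every node's attributes so that
    nodes with identical attributes become distinguishable."""
    colors = np.eye(k)[rng.integers(0, k, size=X.shape[0])]
    return np.concatenate([X, colors], axis=1)

def mpnn_layer(A, H, W):
    """One plain message-passing layer: aggregate neighbours, transform, ReLU."""
    return np.maximum((A @ H) @ W, 0.0)

def colored_readout(A, X, W, k=3, n_colorings=8, seed=0):
    """Average the graph representation over several random colorings."""
    rng = np.random.default_rng(seed)
    reps = [mpnn_layer(A, color_features(X, k, rng), W).sum(axis=0)
            for _ in range(n_colorings)]
    return np.mean(reps, axis=0)

# A 4-cycle whose nodes all carry identical attributes: without colors, every
# node's message-passing view is the same; colors break the symmetry.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
X = np.ones((4, 2))
W = np.random.default_rng(1).standard_normal((2 + 3, 5))
print(colored_readout(A, X, W))
```

Averaging over colorings keeps the representation (approximately) invariant to the arbitrary color assignment while retaining the added discriminative power.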

