Verified tensor-program optimization via high-level scheduling rewrites

We present a lightweight Coq framework for optimizing tensor kernels written in a pure, functional array language. Optimizations rely on user scheduling using series of verified, semantics-preserving rewrites. Unusually for compilation targeting imperative code with arrays and nested loops, all rewrites are source-to-source within a purely functional language. Our language comprises a set of core constructs for expressing high-level computation detail and a set of what we call reshape operators, which can be derived from core constructs but trigger low-level decisions about storage patterns and ordering. We demonstrate that not only is this system capable of deriving the optimizations of existing state-of-the-art languages like Halide and generating comparably performant code, it is also able to schedule a family of useful program transformations beyond what is reachable in Halide.
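To make the idea of a source-to-source, semantics-preserving scheduling rewrite concrete, here is a minimal sketch in Python rather than Coq. The names `gen`, `flatten`, `split`, and `blur` are hypothetical and are not taken from the framework; the sketch only assumes that an array can be described by a length and an index function, and that a "split" rewrite turns one generator into a tiled nest of two generators without changing the result.

```python
from typing import Callable, List


def gen(n: int, f: Callable[[int], object]) -> List[object]:
    """Pure array constructor: a length-n array whose i-th element is f(i)."""
    return [f(i) for i in range(n)]


def flatten(xss: List[List[object]]) -> List[object]:
    """Concatenate a list of tiles back into one flat array."""
    return [x for xs in xss for x in xs]


def split(n: int, k: int, f: Callable[[int], object]) -> List[object]:
    """Rewritten form of gen(n, f): n // k outer tiles of k elements each.

    Assumes k divides n; a real system would also have to handle remainders.
    """
    assert n % k == 0
    return flatten(gen(n // k, lambda i: gen(k, lambda j: f(i * k + j))))


def blur(xs: List[float]) -> Callable[[int], float]:
    """Index function of a 3-point blur over xs (hypothetical example kernel)."""
    return lambda i: (xs[i] + xs[i + 1] + xs[i + 2]) / 3.0


data = [float(v) for v in range(10)]
original = gen(8, blur(data))      # unscheduled program
tiled = split(8, 4, blur(data))    # same program after the "split" rewrite

# The rewrite changes loop structure only, never the computed values.
assert original == tiled
```

In a verified setting the final assertion would be a theorem proved once for all `n`, `k`, and `f`, rather than a check on one input.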

Saliency Detection by Multilevel Deep Pyramid Model

Journal of Sensors ◽

10.1155/2018/8249180 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Hai Wang ◽

Lei Dai ◽

Yingfeng Cai ◽

Long Chen ◽

Yong Zhang

Keyword(s):

Background Noise ◽

State Of The Art ◽

Saliency Detection ◽

Saliency Map ◽

Multiple Features ◽

Low Level ◽

Pyramid Model ◽

High Level ◽

Different Levels ◽

Better Than

Traditional salient object detection models are divided into several classes based on low-level features and contrast between pixels. In this paper, we propose a model based on a multilevel deep pyramid (MLDP), which fuses multiple features at different levels. First, the MLDP uses the original image as the input to a VGG16 model to extract high-level features and form an initial saliency map. Next, the MLDP further extracts high-level features to form a saliency map based on a deep pyramid. Then, the MLDP extracts low-level features to obtain a saliency map fused with superpixels. After that, the MLDP applies background noise filtering to the superpixel-fused saliency map to suppress interference from background noise and form a foreground-based saliency map. Lastly, the MLDP combines the superpixel-fused saliency map with the foreground-based saliency map to produce the final saliency map. Because it fuses multiple features rather than relying on low-level features alone, the MLDP achieves good results when extracting salient targets. As shown in our experiments, the MLDP outperforms seven other state-of-the-art models on three public saliency datasets. The MLDP is therefore well suited and widely applicable to the extraction of salient targets.

Static Profiling of Assembly Code Performance and Optimization Effectiveness using Instructions Performed and Program Latency

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1083.0882s819 ◽

2019 ◽

Vol 8 (2S8) ◽

pp. 1463-1468

Keyword(s):

Assembly Language ◽

Optimization Method ◽

Program Optimization ◽

Low Level ◽

Code Performance ◽

Execution Speed ◽

The Difference ◽

And Performance ◽

High Level ◽

Optimization And Performance

Software can be optimized for improved execution speed by modifying the program. Programs are usually written in high-level languages and then translated into low-level assembly language. Optimization and performance analysis can be covered more thoroughly at the low level than at the high level. The improvement from an optimization is measured as the difference in program execution performance. The available methods for measuring program performance are classified into static approaches and dynamic approaches. This paper presents an alternative method that measures code performance statically and more accurately than commonly used code analysis metrics. The proposed metrics are designed to expose the effectiveness of optimizations performed on code, specifically unroll optimizations. One optimization method, loop unrolling, is used to demonstrate the increased accuracy of the proposed metrics. The results of the study show that measuring Instructions Performed and Instruction Latency gives a more accurate static metric than Instruction Count and the metrics derived from it.
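The contrast between a plain instruction count and an execution-weighted count can be illustrated with a small sketch. The definitions below are assumptions made for illustration, not the paper's exact metrics: "instruction count" is taken to be the number of instructions in the loop as written, and "instructions performed" is that number multiplied by the loop's trip count. The `Loop` type and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    work_per_iter: int      # instructions doing useful work in one iteration
    overhead_per_iter: int  # loop-control instructions (increment, compare, branch)
    trips: int              # number of iterations, assumed known statically


def instruction_count(loop: Loop) -> int:
    """Instructions in the loop as written, ignoring how often they run."""
    return loop.work_per_iter + loop.overhead_per_iter


def instructions_performed(loop: Loop) -> int:
    """Instructions weighted by trip count (assumed definition)."""
    return instruction_count(loop) * loop.trips


def unroll(loop: Loop, factor: int) -> Loop:
    """Unroll by `factor`: the body is replicated, the overhead is paid once per trip."""
    assert loop.trips % factor == 0
    return Loop(loop.work_per_iter * factor, loop.overhead_per_iter, loop.trips // factor)


rolled = Loop(work_per_iter=3, overhead_per_iter=2, trips=100)
unrolled = unroll(rolled, 4)

# The plain count makes unrolling look worse: 5 instructions become 14.
print(instruction_count(rolled), instruction_count(unrolled))
# The weighted count shows the saving: 500 executed instructions become 350.
print(instructions_performed(rolled), instructions_performed(unrolled))
```

Under these assumptions the unrolled loop has a larger code size but executes fewer instructions, which is the kind of effect a count of written instructions cannot show.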

Backbone Cannot Be Trained at Once: Rolling Back to Pre-Trained Network for Person Re-Identification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018859 ◽

2019 ◽

Vol 33 ◽

pp. 8859-8867 ◽

Cited By ~ 4

Author(s):

Youngmin Ro ◽

Jongwon Choi ◽

Dae Ung Jo ◽

Byeongho Heo ◽

Jongin Lim ◽

...

Keyword(s):

Network Architecture ◽

State Of The Art ◽

Fine Tuning ◽

Neural Network Architecture ◽

Large Dataset ◽

Low Level ◽

Tuning Method ◽

Improved Performance ◽

High Level ◽

Tuning Strategy

In the person re-identification (ReID) task, training data are scarce, so it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network because of the vanishing gradient problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained weights. Our strategy alleviates the vanishing gradient problem in low-level layers and robustly trains the low-level layers to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture.
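The rolling-back idea can be sketched schematically as follows. This is only an illustration under stated assumptions: `fake_train_period` is a hypothetical stand-in for a period of fine-tuning (it just nudges every weight), the layer names are invented, and the schedule of three periods with a roll-back after the first two is an assumption, not the schedule used in the paper.

```python
import copy
from typing import Dict, List

Weights = Dict[str, List[float]]


def fake_train_period(weights: Weights, step: float = 0.1) -> Weights:
    """Hypothetical stand-in for one fine-tuning period: shifts every weight by `step`."""
    return {name: [w - step for w in ws] for name, ws in weights.items()}


def roll_back(weights: Weights, pretrained: Weights, layers: List[str]) -> Weights:
    """Reset the named layers to their pre-trained values; keep all other layers as tuned."""
    restored = dict(weights)
    for name in layers:
        restored[name] = list(pretrained[name])
    return restored


pretrained: Weights = {"low": [1.0, 2.0], "mid": [3.0], "high": [4.0, 5.0]}
weights = copy.deepcopy(pretrained)

num_periods = 3
for period in range(num_periods):
    weights = fake_train_period(weights)
    if period < num_periods - 1:
        # Roll the high-level layer back so later periods keep pushing
        # updates into the low-level layers instead.
        weights = roll_back(weights, pretrained, ["high"])

print(weights)
```

After the loop, the low-level layer has accumulated updates from all three periods, while the high-level layer has only been trained during the final period starting from its pre-trained values.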

Context-Aware Image Inpainting with Learned Semantic Priors

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/183 ◽

2021 ◽

Author(s):

Wendong Zhang ◽

Junwei Zhu ◽

Ying Tai ◽

Yunbo Wang ◽

Wenqing Chu ◽

...

Keyword(s):

State Of The Art ◽

Contextual Information ◽

Image Inpainting ◽

Context Aware ◽

Global Context ◽

Low Level ◽

Complex Scenes ◽

Knowledge Distillation ◽

Image Generator ◽

High Level

Recent advances in image inpainting have shown impressive results for generating plausible visual details on rather simple backgrounds. However, for complex scenes, it is still challenging to restore reasonable contents because the contextual information within the missing regions tends to be ambiguous. To tackle this problem, we introduce pretext tasks that are semantically meaningful for estimating the missing contents. In particular, we perform knowledge distillation on pretext models and adapt the features to image inpainting. The learned semantic priors ought to be partially invariant between the high-level pretext task and low-level image inpainting; they not only help to understand the global context but also provide structural guidance for the restoration of local textures. Based on the semantic priors, we further propose a context-aware image inpainting model, which adaptively integrates global semantics and local features in a unified image generator. The semantic learner and the image generator are trained in an end-to-end manner. We name the model SPL to highlight its ability to learn and leverage semantic priors. It achieves the state of the art on the Places2, CelebA, and Paris StreetView datasets.

Deep Convolutional Neural Network for Pedestrian Detection with Multi-Levels Features Fusion

MATEC Web of Conferences ◽

10.1051/matecconf/201823201061 ◽

2018 ◽

Vol 232 ◽

pp. 01061

Author(s):

Danhua Li ◽

Xiaofeng Di ◽

Xuan Qu ◽

Yunfei Zhao ◽

Honggang Kong

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

State Of The Art ◽

Pedestrian Detection ◽

Deep Convolutional Neural Network ◽

High Quality ◽

Low Level ◽

Features Fusion ◽

Current State ◽

High Level

Pedestrian detection aims to localize and recognize every pedestrian instance in an image with a bounding box. The current state-of-the-art method is Faster RCNN, a network that uses a region proposal network (RPN) to generate high-quality region proposals, while Fast RCNN is used to classify the extracted features into corresponding categories. The contribution of this paper is to integrate low-level and high-level features into a Faster RCNN-based pedestrian detection framework, which efficiently increases the capacity of the features. We comprehensively evaluate our framework on the Caltech pedestrian detection benchmark, where our method achieves competitive, state-of-the-art accuracy.

TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/91 ◽

2021 ◽

Author(s):

Zhihao Fan ◽

Zhongyu Wei ◽

Siyuan Wang ◽

Ruize Wang ◽

Zejun Li ◽

...

Keyword(s):

State Of The Art ◽

Representation Learning ◽

Experimental Results ◽

Text Representation ◽

Image Captioning ◽

Scene Graph ◽

Low Level ◽

Language And Vision ◽

High Level ◽

Cross Language

Existing research on image captioning usually represents an image using a scene graph with low-level facts (objects and relations) and fails to capture high-level semantics. In this paper, we propose a Theme Concepts extended Image Captioning (TCIC) framework that incorporates theme concepts to represent high-level cross-modality semantics. In practice, we model theme concepts as memory vectors and propose Transformer with Theme Nodes (TTN) to incorporate those vectors for image captioning. Considering that theme concepts can be learned from both images and captions, we propose two settings for their representation learning based on TTN. On the vision side, TTN is configured to take both scene-graph-based features and theme concepts as input for visual representation learning. On the language side, TTN is configured to take both captions and theme concepts as input for text representation reconstruction. Both settings aim to generate target captions with the same transformer-based decoder. During training, we further align the representations of theme concepts learned from images and from the corresponding captions to enforce cross-modality learning. Experimental results on MS COCO show the effectiveness of our approach compared to some state-of-the-art models.

A SysML and CLEAN Based Methodology for RISC Processor Micro-Architecture Design

International Journal of Embedded and Real-Time Communication Systems ◽

10.4018/ijertcs.2015010105 ◽

2015 ◽

Vol 6 (1) ◽

pp. 101-131

Author(s):

Zakaria Lakhdara ◽

Salah Merniz

Keyword(s):

Code Generation ◽

Architecture Design ◽

Functional Language ◽

Low Level ◽

Design Methodologies ◽

Risc Processor ◽

High Level

Processor micro-architectures are becoming more and more complex. Consequently, designers increasingly need powerful abstraction and structuring mechanisms, as well as design methodologies that automatically and formally derive low-level concrete designs from high-level abstract ones. In this context, this paper proposes a methodology for RISC processor micro-architecture design. The proposed methodology mainly uses SysML to model both the ISA and MA levels and the functional language CLEAN to describe them. Functional specifications in CLEAN are automatically generated from the ISA and MA models. These specifications, which are executable and formally verifiable, are used for simulation and verification. The proposed approach is validated by a case study that consists of designing the micro-architecture of a MIPS processor. It shows how to easily model and generate CLEAN specifications describing the ISA and MA levels. It also illustrates, with multiple cases, how the generated specifications are used to simulate the MA. The results of the simulation phase demonstrate the efficiency of the proposed modeling and code generation techniques.

Deep ChaosNet for Action Recognition in Videos

Complexity ◽

10.1155/2021/6634156 ◽

2021 ◽

Vol 2021 ◽

pp. 1-5

Author(s):

Huafeng Chen ◽

Maosheng Zhang ◽

Zhengming Gao ◽

Yunhong Zhao

Keyword(s):

Neural Network ◽

Action Recognition ◽

Deep Neural Network ◽

Recognition Accuracy ◽

State Of The Art ◽

Experimental Results ◽

Low Level ◽

Hidden Layer ◽

High Level ◽

Standard Action

Current methods for chaos-based action recognition in videos are limited to artificial features, which results in low recognition accuracy. In this paper, we extend ChaosNet into a deep neural network and apply it to action recognition. First, we extend ChaosNet to deep ChaosNet for extracting action features. Then, we send the features to a low-level LSTM encoder and a high-level LSTM encoder to obtain low-level and high-level coding outputs, respectively. The agent is a behavior recognizer that produces recognition results. The manager is a hidden layer responsible for giving behavioral segmentation targets at the high level. Our experiments are conducted on two standard action datasets: UCF101 and HMDB51. The experimental results show that the proposed algorithm outperforms the state of the art.

Unifying Search-based and Compilation-based Approaches to Multi-agent Path Finding through Satisfiability Modulo Theories

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/164 ◽

2019 ◽

Cited By ~ 6

Author(s):

Pavel Surynek

Keyword(s):

Undirected Graph ◽

State Of The Art ◽

The State ◽

Satisfiability Modulo Theories ◽

Path Finding ◽

Low Level ◽

Multi Agent ◽

High Level ◽

Novel Algorithm

We unify search-based and compilation-based approaches to multi-agent path finding (MAPF) through satisfiability modulo theories (SMT). The task in MAPF is to navigate agents in an undirected graph to given goal vertices so that they do not collide. We rephrase Conflict-Based Search (CBS), one of the state-of-the-art algorithms for optimal MAPF solving, in terms of SMT. This idea combines SAT-based solving known from MDD-SAT, a SAT-based optimal MAPF solver, at the low level with the conflict elimination of CBS at the high level. Where standard CBS branches the search after a conflict, we refine the propositional model with a disjunctive constraint. Our novel algorithm, called SMT-CBS, hence does not branch at the high level but incrementally extends the propositional model. We experimentally compare SMT-CBS with CBS, ICBS, and MDD-SAT.

A Modern Look at GRIN, an Optimizing Functional Language Back End

Acta Cybernetica ◽

10.14232/actacyb.282969 ◽

2021 ◽

Author(s):

Peter Podlovics ◽

Csaba Hruska ◽

Andor Pénzes

Keyword(s):

Program Analysis ◽

Ad Hoc ◽

Functional Languages ◽

Functional Language ◽

Program Optimization ◽

Graph Reduction ◽

Code Transformations ◽

Low Level ◽

Intermediate Language ◽

The Dead

GRIN is short for Graph Reduction Intermediate Notation, a modern back end for lazy functional languages. Most of the currently available compilers for such languages share a common flaw: they can only optimize programs on a per-module basis. The GRIN framework allows for interprocedural whole-program analysis, enabling optimizing code transformations across functions and modules as well. Some implementations of GRIN already exist, but most of them were developed only for experimentation purposes. Thus, they either compromise on low-level efficiency or contain ad hoc modifications compared to the original specification. Our goal is to provide a full-fledged implementation of GRIN by combining the best currently available technologies, such as LLVM, and to evaluate the framework's effectiveness by measuring how the optimizer improves the performance of certain programs. We also present some improvements to the existing components of the framework. These improvements include a typed representation for the intermediate language and an interprocedural program optimization, dead data elimination.
