FPGA-Based Inter-layer Pipelined Accelerators for Filter-Wise Weight-Balanced Sparse Fully Convolutional Networks with Overlapped Tiling

Abstract: Convolutional neural networks (CNNs) exhibit state-of-the-art performance while performing computer-vision tasks. CNNs require high-speed, low-power, and high-accuracy hardware for various scenarios, such as edge environments. However, the number of weights is so large that embedded systems cannot store them owing to their limited on-chip memory. A different method is used to minimize the input image size, for real-time processing, but it causes a considerable drop in accuracy. Although pruned sparse CNNs and special accelerators are proposed, the requirement of random access incurs a large number of wide multiplexers for a high degree of parallelism, which becomes more complicated and unsuitable for FPGA implementation. To address this problem, we propose filter-wise pruning with distillation and block RAM (BRAM)-based zero-weight skipping accelerator. It eliminates weights such that each filter has the same number of nonzero weights, performing retraining with distillation, while retaining comparable accuracy. Further, filter-wise pruning enables our accelerator to exploit inter-filter parallelism, where a processing block for a layer executes filters concurrently, with a straightforward architecture. We also propose an overlapped tiling algorithm, where tiles are extracted with overlap to prevent both accuracy degradation and high utilization of BRAMs storing high-resolution images. Our evaluation using semantic-segmentation tasks showed a 1.8 times speedup and 18.0 times increase in power efficiency of our FPGA design compared with a desktop GPU. Additionally, compared with the conventional FPGA implementation, the speedup and accuracy improvement were 1.09 times and 6.6 points, respectively. Therefore, our approach is useful for FPGA implementation and exhibits considerable accuracy for applications in embedded systems.
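The two ideas in this abstract, pruning every filter to the same number of nonzero weights and extracting tiles with overlap, can be illustrated with a small NumPy sketch. It rests on assumptions the abstract does not confirm: weights are selected by magnitude, the tensor layout is (filters, channels, height, width), and the tiling is shown along a single axis. The names `filter_wise_prune` and `tile_starts` are hypothetical, and the retraining with distillation is not shown.

```python
import numpy as np


def filter_wise_prune(weights, keep):
    """Keep the `keep` largest-magnitude weights of each filter and zero the rest.

    weights: array of shape (num_filters, in_channels, kh, kw) (assumed layout).
    Returns the pruned weights and the boolean keep-mask.
    """
    flat = weights.reshape(weights.shape[0], -1)
    if not 0 < keep <= flat.shape[1]:
        raise ValueError("keep must be between 1 and the filter size")
    # Column indices of the `keep` largest |w| in every row (one row per filter).
    top = np.argpartition(np.abs(flat), -keep, axis=1)[:, -keep:]
    mask = np.zeros(flat.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return (flat * mask).reshape(weights.shape), mask.reshape(weights.shape)


def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length).

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    placed flush with the edge, so it may overlap its neighbour by more.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts
```

Because every filter ends up with the same number of kept weights, each filter needs the same number of multiply-accumulate steps, which is the property the abstract credits for allowing filters to run concurrently on a simple architecture.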


Concrete Cracks Detection Based on FCN with Dilated Convolution

Applied Sciences ◽

10.3390/app9132686 ◽

2019 ◽

Vol 9 (13) ◽

pp. 2686 ◽

Cited By ~ 15

Author(s):

Jianming Zhang ◽

Chaoquan Lu ◽

Jin Wang ◽

Lei Wang ◽

Xiao-Guang Yue

Keyword(s):

Crack Detection ◽

Receptive Fields ◽

Semantic Segmentation ◽

Concrete Surface ◽

Input Image ◽

Feature Maps ◽

Test Set ◽

Dilated Convolution ◽

Fully Convolutional Networks ◽

Segmentation Task

In civil engineering, the stability of concrete is of great significance to the safety of people’s lives and property, so it is necessary to detect concrete damage effectively. In this paper, we treat crack detection on concrete surfaces as a semantic segmentation task that distinguishes background from crack at the pixel level. Inspired by Fully Convolutional Networks (FCN), we propose a fully convolutional network based on dilated convolution for concrete crack detection, which consists of an encoder and a decoder. Specifically, we first used the residual network to extract the feature maps of the input image, designed the dilated convolutions with different dilation rates to extract the feature maps of different receptive fields, and fused the extracted features from multiple branches. Then, we used stacked deconvolutions to up-sample the fused feature maps. Finally, we used the SoftMax function to classify the feature maps at the pixel level. In order to verify the validity of the model, we introduced the commonly used evaluation indicators of semantic segmentation: Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), Mean Intersection over Union (MIoU), and Frequency Weighted Intersection over Union (FWIoU). The experimental results show that the proposed model converges faster and has better generalization performance on the test set by introducing dilated convolutions with different dilation rates and a multi-branch fusion strategy. Our model has a PA of 96.84%, MPA of 92.55%, MIoU of 86.05% and FWIoU of 94.22% on the test set, which is superior to other models.
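The four evaluation indicators named in this abstract have standard definitions in terms of a per-class confusion matrix. The sketch below computes them under those usual definitions; it is not taken from the paper's code, the function name `segmentation_metrics` is hypothetical, and it assumes every class occurs at least once in the ground truth.

```python
import numpy as np


def segmentation_metrics(conf):
    """Compute PA, MPA, MIoU and FWIoU from a confusion matrix.

    conf[i, j] = number of pixels whose true class is i and predicted class is j.
    Assumes every class has at least one ground-truth pixel.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)            # correctly classified pixels per class
    gt = conf.sum(axis=1)         # ground-truth pixels per class
    pred = conf.sum(axis=0)       # predicted pixels per class
    total = conf.sum()
    iou = tp / (gt + pred - tp)   # per-class intersection over union
    return {
        "PA": tp.sum() / total,                # overall pixel accuracy
        "MPA": np.mean(tp / gt),               # per-class accuracy, averaged
        "MIoU": iou.mean(),                    # per-class IoU, averaged
        "FWIoU": np.sum(gt / total * iou),     # IoU weighted by class frequency
    }
```

For a two-class task such as crack versus background, the background class usually dominates the pixel count, which is why PA and FWIoU tend to be higher than MPA and MIoU, as in the figures reported above.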


The Integrated Display System in Aircraft Cockpit Based on FPGA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.651-653.911 ◽

2014 ◽

Vol 651-653 ◽

pp. 911-915

Author(s):

Jing Gao ◽

Yin Liang Jia ◽

Bing Yang Li

Keyword(s):

Real Time ◽

High Speed ◽

Random Access ◽

Display System ◽

Logic Device ◽

Access Memory ◽

Real Time Processing ◽

Main Research ◽

Ping Pong ◽

Integrated Display

The main research object is an FPGA-based graphics generation and display system, used mainly for the integrated display in an aircraft cockpit. Graphics generation in such a display system involves a large amount of data and real-time processing. Given these characteristics, the paper uses a programmable logic device, because an FPGA offers high speed and real-time operation. To further improve the efficiency of the system, the paper also designs a ping-pong operation across two SSRAMs (Synchronous Static Random Access Memory). Experiments show that the system runs well and achieves the desired objectives.


Single Ended Static Random Access Memory for Low-Vdd, High-Speed Embedded Systems

2009 22nd International Conference on VLSI Design ◽

10.1109/vlsi.design.2009.38 ◽

2009 ◽

Cited By ~ 3

Author(s):

Jawar Singh ◽

Jimson Mathew ◽

Saraju P. Mohanty ◽

Dhiraj K. Pradhan

Keyword(s):

Embedded Systems ◽

High Speed ◽

Random Access ◽

Random Access Memory ◽

Static Random Access Memory ◽

Access Memory


Simultaneous pixel-level concrete defect detection and grouping using a fully convolutional model

Structural Health Monitoring ◽

10.1177/1475921720985437 ◽

2021 ◽

pp. 147592172098543

Author(s):

Chaobo Zhang ◽

Chih-chen Chang ◽

Maziar Jamshidi

Keyword(s):

Deep Learning ◽

Defect Detection ◽

Visual Inspection ◽

Surface Defects ◽

Semantic Segmentation ◽

Input Image ◽

Image Size ◽

Bounding Box ◽

Proposed Model ◽

Image Pixels

Deep learning techniques have attracted significant attention in the field of visual inspection of civil infrastructure systems recently. Currently, most deep learning-based visual inspection techniques utilize a convolutional neural network to recognize surface defects either by detecting a bounding box of each defect or classifying all pixels on an image without distinguishing between different defect instances. These outputs cannot be directly used for acquiring the geometric properties of each individual defect in an image, thus hindering the development of fully automated structural assessment techniques. In this study, a novel fully convolutional model is proposed for simultaneously detecting and grouping the image pixels for each individual defect on an image. The proposed model integrates an optimized mask subnet with a box-level detection network, where the former outputs a set of position-sensitive score maps for pixel-level defect detection and the latter predicts a bounding box for each defect to group the detected pixels. An image dataset containing three common types of concrete defects, crack, spalling and exposed rebar, is used for training and testing of the model. Results demonstrate that the proposed model is robust to various defect sizes and shapes and can achieve a mask-level mean average precision (mAP) of 82.4% and a mean intersection over union (mIoU) of 75.5%, with a processing speed of about 10 FPS at input image size of 576 × 576 when tested on an NVIDIA GeForce GTX 1060 GPU. Its performance is compared with the state-of-the-art instance segmentation network Mask R-CNN and the semantic segmentation network U-Net. The comparative studies show that the proposed model has a distinct defect boundary delineation capability and outperforms the Mask R-CNN and the U-Net in both accuracy and speed.


Camera Assisted Roadside Monitoring for Invasive Alien Plant Species Using Deep Learning

Sensors ◽

10.3390/s21186126 ◽

2021 ◽

Vol 21 (18) ◽

pp. 6126

Author(s):

Mads Dyrmann ◽

Anders Krogh Mortensen ◽

Lars Linneberg ◽

Toke Thomas Høye ◽

Kim Bjerge

Keyword(s):

Plant Species ◽

High Speed ◽

Input Image ◽

Image Size ◽

Alien Plant Species ◽

Deep Convolutional Neural Networks ◽

Alien Plant ◽

Lupinus Polyphyllus ◽

Invasive Alien Plant

Invasive alien plant species (IAPS) pose a threat to biodiversity as they propagate and outcompete natural vegetation. In this study, a system for monitoring IAPS on the roadside is presented. The system consists of a camera that acquires images at high speed mounted on a vehicle that follows the traffic. Images of seven IAPS (Cytisus scoparius, Heracleum, Lupinus polyphyllus, Pastinaca sativa, Reynoutria, Rosa rugosa, and Solidago) were collected on Danish motorways. Three deep convolutional neural networks for classification (ResNet50V2 and MobileNetV2) and object detection (YOLOv3) were trained and evaluated at different image sizes. The results showed that the performance of the networks varied with the input image size and also the size of the IAPS in the images. Binary classification of IAPS vs. non-IAPS showed an increased performance, compared to the classification of individual IAPS. This study shows that automatic detection and mapping of invasive plants along the roadside is possible at high speeds.


High speed image processing based on Java processor for embedded systems

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.02873 ◽

2010 ◽

Vol 30 (11) ◽

pp. 2873-2875 ◽

Cited By ~ 1

Author(s):

Ming-kai ZHU ◽

Zhen-hua GAO ◽

Zhi-lei CHAI

Keyword(s):

Image Processing ◽

Embedded Systems ◽

High Speed ◽

High Speed Image Processing ◽

Java Processor


Algorithm for optimal allocation of limited resources based on the game iteration method

Informacionno-technologicheskij vestnik ◽

10.21499/2409-1650-2019-2-89-99 ◽

2019 ◽

pp. 89-99

Author(s):

V. Ya. Vilisov

Keyword(s):

Optimal Control ◽

Linear Programming ◽

Embedded Systems ◽

Iterative Method ◽

High Speed ◽

Optimal Allocation ◽

Software Implementation ◽

Limited Resources ◽

Matrix Game ◽

Acceptable Accuracy

The article proposes an algorithm for solving a linear programming problem (LPP) by representing it as an antagonistic matrix game and then solving the game with an iterative method. The algorithm is implemented as a computer program. The rate at which the solution estimates converge to the actual value with the required accuracy has been studied. The software implementation obtains the LPP solution with acceptable accuracy quickly, within fractions of a second to a few seconds. This allows the algorithm to be used in embedded systems for optimal control.
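The abstract does not say which iterative method is used to solve the matrix game. As an illustration only, the sketch below assumes fictitious play in the Brown-Robinson style, where each player repeatedly best-responds to the opponent's accumulated history. It covers only the game-solving step, not the conversion of an LPP into a game, and the function name `fictitious_play` is hypothetical.

```python
import numpy as np


def fictitious_play(payoff, iterations=10000):
    """Approximate the solution of a zero-sum matrix game by fictitious play.

    payoff[i, j] is what the row player (maximiser) receives when row i
    meets column j. Returns the empirical mixed strategies of both players
    and a lower and an upper bound on the value of the game.
    """
    a = np.asarray(payoff, dtype=float)
    m, n = a.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    row_payoff = np.zeros(m)  # cumulative payoff of each row vs. column history
    col_payoff = np.zeros(n)  # cumulative payoff of each column vs. row history
    i = 0                     # arbitrary first move of the row player
    for _ in range(iterations):
        row_counts[i] += 1
        col_payoff += a[i]
        j = int(np.argmin(col_payoff))   # column player's best response
        col_counts[j] += 1
        row_payoff += a[:, j]
        i = int(np.argmax(row_payoff))   # row player's next best response
    lower = col_payoff.min() / iterations  # guaranteed by the row strategy
    upper = row_payoff.max() / iterations  # guaranteed by the column strategy
    return row_counts / iterations, col_counts / iterations, lower, upper
```

The gap between the two bounds gives a stopping criterion: the true value of the game always lies between them, so iteration can stop once the gap falls below the required accuracy.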


Automatic Deep Learning Semantic Segmentation of Ultrasound Thyroid Cineclips using Recurrent Fully Convolutional Networks

IEEE Access ◽

10.1109/access.2020.3045906 ◽

2020 ◽

pp. 1-1

Author(s):

Jeremy M. Webb ◽

Duane D. Meixner ◽

Shaheeda A. Adusei ◽

Eric C. Polley ◽

Mostafa Fatemi ◽

...

Keyword(s):

Deep Learning ◽

Semantic Segmentation ◽

Convolutional Networks ◽

Fully Convolutional Networks


A Stochastic Model for Block Segmentation of Images Based on the Quadtree and the Bayes Code for It

Entropy ◽

10.3390/e23080991 ◽

2021 ◽

Vol 23 (8) ◽

pp. 991

Author(s):

Yuta Nakahara ◽

Toshiyasu Matsushima

Keyword(s):

Computational Cost ◽

Block Size ◽

Input Image ◽

Generative Model ◽

Image Size ◽

Variable Block ◽

General Data ◽

The Difference ◽

Segmentation Of Images ◽

Target Data

In information theory, lossless compression of general data is based on an explicit assumption of a stochastic generative model on target data. However, in lossless image compression, researchers have mainly focused on the coding procedure that outputs the coded sequence from the input image, and the assumption of the stochastic generative model is implicit. In these studies, it is difficult to discuss the difference between the expected code length and the entropy of the stochastic generative model. We resolve this difficulty for a class of images that exhibit non-stationarity among segments. In this paper, we propose a novel stochastic generative model of images by redefining the implicit stochastic generative model in a previous coding procedure. Our model is based on the quadtree so that it effectively represents the variable block size segmentation of images. Then, we construct the Bayes code optimal for the proposed stochastic generative model. It requires the summation of all possible quadtrees weighted by their posterior. In general, its computational cost increases exponentially with the image size. However, we introduce an efficient algorithm to calculate it in the polynomial order of the image size without loss of optimality. As a result, the derived algorithm has a better average coding rate than that of JBIG.
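To make the phrase "variable block size segmentation" concrete, the sketch below builds one quadtree over a square image by splitting any block that is not uniform. It is a deterministic illustration only, not the paper's stochastic model or its Bayes code, which weights all possible quadtrees by their posterior. The function name `quadtree_blocks` is hypothetical, and the image side is assumed to be a power of two.

```python
import numpy as np


def quadtree_blocks(img, top=0, left=0, size=None):
    """Split a square image into uniform blocks with a quadtree.

    img: 2-D array whose side length is a power of two (assumed).
    Returns a list of (top, left, size) tuples, one per leaf block.
    """
    if size is None:
        size = img.shape[0]
    block = img[top:top + size, left:left + size]
    # A block becomes a leaf when it is a single pixel or all pixels agree.
    if size == 1 or block.min() == block.max():
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dt in (0, half):
        for dl in (0, half):
            leaves += quadtree_blocks(img, top + dt, left + dl, half)
    return leaves
```

Each distinct image can induce a different tree, and the number of possible trees grows very quickly with the image size, which is the source of the exponential cost that the abstract says the proposed algorithm avoids.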
