Hardware Architecture of Embedded Inference Accelerator and Analysis of Algorithms for Depthwise and Large-Kernel Convolutions

This chapter introduces some of the fundamental concepts of numerical network calculations. The chapter starts with a discussion of basic concepts of computational complexity and data structures for storing network data, then progresses to the description and analysis of algorithms for a range of network calculations: breadth-first search and its use for calculating shortest paths, shortest distances, components, closeness, and betweenness; Dijkstra's algorithm for shortest paths and distances on weighted networks; and the augmenting path algorithm for calculating maximum flows, minimum cut sets, and independent paths in networks.
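As a rough illustration of the breadth-first search calculations named in this abstract, the following Python sketch computes shortest distances (in hops) and components on an unweighted network stored as an adjacency list. The function names, the dictionary-based adjacency list, and the example graph are assumptions made for illustration; they are not taken from the chapter.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return hop-count distances from source to every node reachable from it."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            # A node is labeled the first time it is reached, which in BFS
            # is along a shortest path from the source.
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def connected_components(adjacency):
    """Group the nodes of an undirected network into components via repeated BFS."""
    seen = set()
    components = []
    for node in adjacency:
        if node not in seen:
            component = set(bfs_distances(adjacency, node))
            seen |= component
            components.append(component)
    return components


# Hypothetical undirected network with two components.
example_graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
    "e": ["f"],
    "f": ["e"],
}

if __name__ == "__main__":
    print(bfs_distances(example_graph, "a"))
    print(connected_components(example_graph))
```

With an adjacency list, each node and each edge is examined a bounded number of times, so a single search runs in time proportional to the number of nodes plus the number of edges.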

Motion-estimation/motion-compensation hardware architecture for a scene-adaptive algorithm on a single-chip MPEG-2 MP@ML video encoder

DOI: 10.1117/12.334739, 1998

Author(s): Koyo Nitta, Toshihiro Minami, Toshio Kondo, Takeshi Ogura

Keyword(s): Motion Estimation, Motion Compensation, Adaptive Algorithm, Hardware Architecture, Single Chip, Video Encoder

4D-DCT Hardware Architecture for JPEG Pleno Light Field Coding

2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), DOI: 10.1109/vcip49819.2020.9301781, 2020

Author(s): Matheus Jahnke, Jones Goebel, Daniel Palomino, Guilherme Correa, Luciano Agostini, ...

Keyword(s): Light Field, Hardware Architecture

Real-time FPGA-based implementation of the AKAZE algorithm with nonlinear scale space generation using image partitioning

Journal of Real-Time Image Processing, DOI: 10.1007/s11554-021-01089-9, 2021

Author(s): Parastoo Soleimani, David W. Capson, Kin Fun Li

Keyword(s): Real Time, Memory Management, Scale Space, Image Resolution, Hardware Architecture, Frame Rate, Time Sharing, Scale Invariant, Nonlinear Scale Space, Nonlinear Scale

The first step in a scale-invariant image matching system is scale space generation. Nonlinear scale space generation algorithms such as AKAZE reduce noise and distortion at different scales while retaining the borders and key-points of the image. An FPGA-based hardware architecture for AKAZE nonlinear scale space generation is proposed to speed up this algorithm for real-time applications. The three contributions of this work are (1) mapping the two passes of the AKAZE algorithm onto a hardware architecture that realizes parallel processing of multiple sections, (2) multi-scale line buffers which can be used for different scales, and (3) a time-sharing mechanism in the memory management unit to process multiple sections of the image in parallel. The time-sharing mechanism also prevents the artifacts that would otherwise result from processing the image partitions separately. We also use approximations in the algorithm to make hardware implementation more efficient while maintaining the repeatability of the detection. A frame rate of 304 frames per second is achieved at an image resolution of 1280 × 768, which compares favorably with other work.

A Fast and Configurable Pattern Matching Hardware Architecture for Intrusion Detection

2009 Second International Workshop on Knowledge Discovery and Data Mining, DOI: 10.1109/wkdd.2009.111, 2009

Cited By ~ 2

Author(s): Yizhen Liu, Daxiong Xu, Dong Liu, Lingge Sun

Keyword(s): Intrusion Detection, Pattern Matching, Hardware Architecture

Tiered Sampling

ACM Transactions on Knowledge Discovery from Data, DOI: 10.1145/3441299, 2021, Vol 15 (5), pp. 1-52

Author(s): Lorenzo De Stefani, Erisa Terolli, Eli Upfal

Keyword(s): Large Scale, Analysis Of Algorithms, Base Layer, Single Edge, Real World Data, High Quality, Large Graphs, Massive Graphs, Variance Estimate, Low Probability

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs (sub-graph patterns) that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method which allows generalizing Tiered Sampling to obtain high-quality estimates for the number of occurrences of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete analysis and an extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single-edge sample approach for counting sparse motifs in large-scale graphs.
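The base layer described in this abstract is a standard reservoir sample of edges. The Python sketch below shows only that building block: a uniform sample of fixed size drawn from a stream in a single pass. It is not the Tiered Sampling estimator itself, and the function name, parameters, and synthetic edge stream are assumptions made for illustration.

```python
import random


def reservoir_sample(stream, capacity, rng=None):
    """Keep a uniform random sample of at most `capacity` items from a stream.

    Each item is seen once; memory use is bounded by `capacity`.
    """
    rng = rng or random.Random()
    sample = []
    for seen, item in enumerate(stream, start=1):
        if len(sample) < capacity:
            # The first `capacity` items always fill the reservoir.
            sample.append(item)
        else:
            # Draw a slot in [0, seen); the new item replaces a stored one
            # with probability capacity / seen.
            slot = rng.randrange(seen)
            if slot < capacity:
                sample[slot] = item
    return sample


if __name__ == "__main__":
    # Hypothetical edge stream: (u, v) pairs on a path graph.
    edges = ((i, i + 1) for i in range(10_000))
    print(reservoir_sample(edges, capacity=5, rng=random.Random(42)))
```

After t items have been seen, every one of them is in the reservoir with the same probability, capacity / t. That uniformity is what makes it possible to rescale counts observed in the sample into an unbiased estimate for the whole stream.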
