Hardware Architecture of Embedded Inference Accelerator and Analysis of Algorithms for Depthwise and Large-Kernel Convolutions

Author(s):  
Tse-Wei Chen ◽  
Wei Tao ◽  
Deyu Wang ◽  
Dongchao Wen ◽  
Kinya Osa ◽  
...  
2016 ◽  
Vol 12 (2) ◽  
pp. 188-197
Author(s):  
Aumalhuda Gani Abood ◽  
Dr. Mohammed A. Jodha ◽  
Dr. Majid A. Alwan

2019 ◽  
Author(s):  
Pallavi Saindane ◽  
Gayatri Ganapathy ◽  
Neha Prabhavalkar ◽  
Nilesh Bhatia ◽  
Aishwarya Vaidya

Author(s):  
Mark Newman

This chapter introduces some of the fundamental concepts of numerical network calculations. The chapter starts with a discussion of basic concepts of computational complexity and data structures for storing network data, then progresses to the description and analysis of algorithms for a range of network calculations: breadth-first search and its use for calculating shortest paths, shortest distances, components, closeness, and betweenness; Dijkstra's algorithm for shortest paths and distances on weighted networks; and the augmenting path algorithm for calculating maximum flows, minimum cut sets, and independent paths in networks.
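
As an illustration of the first calculation the abstract lists, the following is a minimal sketch of breadth-first search computing shortest distances in an unweighted network. The adjacency-list representation (a dict mapping each node to its neighbours) and the function name `bfs_distances` are assumptions made for this example, not taken from the chapter.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest distances (in edges) from source to every reachable node.

    adj is assumed to be a dict mapping each node to an iterable of neighbours.
    Nodes that cannot be reached from source are absent from the result.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Small example network; node 5 is isolated and therefore unreachable from 0.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
distances = bfs_distances(adj, 0)
```

With an adjacency list and a queue, each node and edge is handled a constant number of times, so the running time grows linearly with the size of the network. The set of keys in the returned dict is also the component containing the source node.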


Author(s):  
Matheus Jahnke ◽  
Jones Goebel ◽  
Daniel Palomino ◽  
Guilherme Correa ◽  
Luciano Agostini ◽  
...  

Author(s):  
Parastoo Soleimani ◽  
David W. Capson ◽  
Kin Fun Li

The first step in a scale-invariant image matching system is scale space generation. Nonlinear scale space generation algorithms such as AKAZE reduce noise and distortion at different scales while retaining the borders and key-points of the image. An FPGA-based hardware architecture for AKAZE nonlinear scale space generation is proposed to speed up this algorithm for real-time applications. The three contributions of this work are (1) mapping the two passes of the AKAZE algorithm onto a hardware architecture that realizes parallel processing of multiple sections, (2) multi-scale line buffers which can be used for different scales, and (3) a time-sharing mechanism in the memory management unit to process multiple sections of the image in parallel. The time-sharing mechanism for memory management prevents artifacts that would otherwise result from processing the image partitions separately. We also use approximations in the algorithm to make hardware implementation more efficient while maintaining the repeatability of the detection. A frame rate of 304 frames per second at an image resolution of 1280 × 768 is achieved, which compares favorably with other work.


2021 ◽  
Vol 15 (5) ◽  
pp. 1-52
Author(s):  
Lorenzo De Stefani ◽  
Erisa Terolli ◽  
Eli Upfal

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs (sub-graph patterns) that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method which allows generalizing Tiered Sampling to obtain high-quality estimates for the number of occurrences of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete theoretical analysis and an extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large-scale graphs.
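
The base layer described in the abstract is a standard reservoir sample of edges. The sketch below shows only that building block, using the classic single-pass reservoir sampling scheme; it is not the authors' Tiered Sampling algorithm, and the function name `reservoir_sample_edges` and the optional `rng` parameter are assumptions introduced for illustration.

```python
import random


def reservoir_sample_edges(edge_stream, M, rng=None):
    """Keep a uniform random sample of at most M edges from a stream, in one pass.

    After t edges have been seen (t >= M), each of them is in the sample
    with probability M / t.
    """
    rng = rng or random.Random()
    sample = []
    for t, edge in enumerate(edge_stream, start=1):
        if t <= M:
            sample.append(edge)        # fill the reservoir first
        else:
            j = rng.randrange(t)       # uniform integer in [0, t)
            if j < M:                  # happens with probability M / t
                sample[j] = edge       # evict a uniformly chosen stored edge
    return sample


# Hypothetical stream of 1000 edges on a path graph, sampled with M = 10.
stream = [(i, i + 1) for i in range(1000)]
sample = reservoir_sample_edges(stream, 10, rng=random.Random(0))
```

Memory use stays fixed at M edges regardless of stream length, which is the constraint the abstract places on every tier.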

