Hardware Acceleration of OpenSSL Cryptographic Functions for High-Performance Internet Security

High-Performance Image Filters via Sparse Approximations

Proceedings of the ACM on Computer Graphics and Interactive Techniques

10.1145/3406182

2020

Vol 3 (2)

pp. 1-19

Author(s):

Kersten Schuster

Philip Trettner

Leif Kobbelt

Keyword(s):

High Performance

Hardware Acceleration

Optimization Method

Translation Invariant

Approximation Quality

Trade Offs

Sparse Approximations

Image Filters

Good Trade

And Performance

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
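The abstract describes each pass as a weighted sum of bilinearly interpolated sparse samples. The sketch below shows what one such pass computes on the CPU, assuming a grayscale image stored as a list of rows, clamp-to-edge addressing, and pixel indices used directly as sample coordinates (a GPU shader would add the half-texel offset). The names `bilinear_sample`, `sparse_pass`, `apply_passes`, `BOX4_TAPS` and the tap format `(dx, dy, weight)` are hypothetical; the paper's optimizer, which chooses tap positions and weights under a performance budget, is not reproduced here.

```python
import math


def bilinear_sample(img, x, y):
    """Bilinearly interpolate img (list of rows) at fractional (x, y),
    clamping coordinates to the image like a clamp-to-edge GPU sampler."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1.0 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1.0 - fx) + img[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy


def sparse_pass(img, taps):
    """One pass: every output pixel is a weighted sum of a few bilinear
    samples at fractional offsets; taps is a list of (dx, dy, weight)."""
    h, w = len(img), len(img[0])
    return [[sum(wt * bilinear_sample(img, x + dx, y + dy) for dx, dy, wt in taps)
             for x in range(w)]
            for y in range(h)]


def apply_passes(img, passes):
    """Multi-pass filter: feed the output of each pass into the next."""
    for taps in passes:
        img = sparse_pass(img, taps)
    return img


# Hypothetical, hand-picked taps (not an optimized result from the paper):
# two half-pixel-offset samples that reproduce a 4-pixel horizontal box.
BOX4_TAPS = [(-0.5, 0.0, 0.5), (1.5, 0.0, 0.5)]
```

Away from the borders, `BOX4_TAPS` matches the exact box (x-1 .. x+2) because a bilinear sample halfway between two pixels is their average; on a GPU that is two texture fetches instead of four. Stacking a horizontal and a vertical pass with `apply_passes` gives a 2D box in four fetches per pixel.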

A novel hardware acceleration technique for high performance parallel FDTD method

2011 IEEE International Symposium on Antennas and Propagation (APSURSI)

10.1109/aps.2011.5997202

2011

Author(s):

Wenhua Yu

Xiaoling Yang

Yongjun Liu

Raj Mittra

Keyword(s):

High Performance

FDTD Method

Hardware Acceleration

Acceleration Technique

Towards A Multi-FPGA Infrared Simulator

The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology

10.1177/154851290700400404

2007

Vol 4 (4)

pp. 343-355

Cited By ~ 1

Author(s):

Vinay Sriram

David Kearney

Keyword(s):

Homeland Security

Reconfigurable Computing

High Speed

High Performance

Large Scale

Computation Time

CCD Camera

Hardware Acceleration

Limiting Factor

Scene Simulation

High-speed infrared (IR) scene simulation is used extensively in defense and homeland security to test the sensitivity of IR cameras and the accuracy of the IR threat detection and tracking algorithms commonly used in IR missile approach warning systems (MAWS). A typical MAWS requires an input rate of over 100 scenes per second. Yet on a 2.8 GHz Pentium 4 dual-core processor, a typical infrared scene simulation takes 32 minutes to simulate a single IR scene that accounts for the effects of atmospheric turbulence, refraction, optical blurring and charge-coupled device (CCD) camera electronic noise [7]. Thus, in IR scene simulation, the processing power of modern computers is a limiting factor. In this paper we report our research on accelerating IR scene simulation with high-performance reconfigurable computing. We constructed a multi-FPGA (Field Programmable Gate Array) hardware acceleration platform and accelerated a key computationally intensive IR algorithm on it, reducing the computation time of IR scene simulation by over 36%. This research serves as a unique case study for accelerating large-scale defense simulations using a high-performance multi-FPGA reconfigurable computer.
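The figures quoted in the abstract show how far a 36% reduction is from the MAWS input rate. Assuming, for illustration only, that the reduction applies to the full 32-minute per-scene baseline:

```latex
\begin{aligned}
t_{\text{base}} &= 32\ \text{min} = 1920\ \text{s},
\qquad t_{\text{target}} = \frac{1}{100\ \text{scenes/s}} = 0.01\ \text{s}, \\
\frac{t_{\text{base}}}{t_{\text{target}}} &= \frac{1920}{0.01} = 192\,000, \\
t_{\text{accel}} &< (1 - 0.36)\, t_{\text{base}} = 0.64 \times 1920\ \text{s} = 1228.8\ \text{s} \approx 20.5\ \text{min}.
\end{aligned}
```

Under that assumption each accelerated scene still takes roughly $1.2 \times 10^{5}$ times longer than the 10 ms budget, which is why the work is framed as a case study rather than a real-time solution.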

High Level Design of a Flexible PCA Hardware Accelerator Using a New Block-Streaming Method

Electronics

10.3390/electronics9030449

2020

Vol 9 (3)

pp. 449

Author(s):

Mohammad Amir Mansoori

Mario R. Casu

Keyword(s):

High Performance

Principal Component

Hardware Acceleration

Design Flow

Hardware Accelerator

Field Programmable

Point Solution

Active Research

High Level

Many Core

Principal Component Analysis (PCA) is a dimensionality-reduction technique that removes redundant information from data in applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made its hardware acceleration an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper we propose a flexible PCA hardware accelerator for Field-Programmable Gate Arrays (FPGAs), designed entirely in HLS. To make the internal PCA computations more efficient, we also introduce a new block-streaming method. Several HLS optimization strategies are adopted to create efficient hardware. The flexibility of our design lets us target different FPGAs with flexible input data dimensions, and lets us easily switch from a more accurate floating-point implementation to a faster fixed-point one. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time and power consumption.
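The abstract does not detail the PCA computation itself. As a reference for what such an accelerator evaluates, the sketch below computes a sample covariance matrix, extracts the dominant principal component with power iteration, and projects the data onto it. Power iteration is used here only for brevity; it is not the paper's block-streaming method, and the function names `covariance`, `top_component` and `project` are hypothetical.

```python
import math
import random


def covariance(data):
    """Return (covariance matrix, mean vector) for data given as a list of
    observations, each a list of d features; uses the n - 1 normalization."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, mean


def top_component(cov, iters=200, seed=0):
    """Dominant eigenvector of a symmetric matrix via power iteration."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # strictly positive start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: there is no dominant direction
            break
        v = [x / norm for x in w]
    return v


def project(data, mean, component):
    """Reduce each observation to its coordinate along one component."""
    return [sum((row[j] - mean[j]) * component[j] for j in range(len(mean)))
            for row in data]
```

For points lying on the line y = 2x, for example `[[1, 2], [2, 4], [3, 6], [4, 8]]`, the dominant component is the unit vector along (1, 2), and each 2D point collapses to a single coordinate along it with no information lost. A hardware design would replace the inner sums with streamed multiply-accumulate units.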

Hardware acceleration of lucky-region fusion (LRF) algorithm for high-performance real-time video processing

10.1117/12.2085864

2015

Author(s):

Tyler Browning

Christopher Jackson

Furkan Cayci

Gary W. Carhart

J. J. Liu

...

Keyword(s):

Real Time

Video Processing

High Performance

Hardware Acceleration

A novel hardware acceleration technique for high performance parallel FDTD method

2011 IEEE International Conference on Microwave Technology & Computational Electromagnetics

10.1109/icmtce.2011.5915554

2011

Author(s):

Wenhua Yu

Keyword(s):

High Performance

FDTD Method

Hardware Acceleration

Acceleration Technique

A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set

International Journal of Antennas and Propagation

10.1155/2012/851465

2012

Vol 2012

pp. 1-10

Cited By ~ 1

Author(s):

Dau-Chyrh Chang

Lihong Zhang

Xiaoling Yang

Shao-Hsiang Yen

Wenhua Yu

Keyword(s):

High Performance

FDTD Method

Hardware Acceleration

Single Instruction Multiple Data

Instruction Set

Computer Cluster

Simulation Performance

Acceleration Technique

Multiple Data

Difference Time

We introduce a hardware acceleration technique for the parallel finite-difference time-domain (FDTD) method using the SSE (Streaming SIMD Extensions, where SIMD stands for single instruction, multiple data) instruction set. Applying the SSE instruction set to the parallel FDTD method significantly improves simulation performance. Benchmarks of SSE acceleration on both a multi-CPU workstation and a computer cluster demonstrate the advantages of vector arithmetic logic unit (VALU) acceleration over GPU acceleration. Several engineering applications are used to demonstrate the performance of the parallel FDTD method enhanced by the SSE instruction set.
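The abstract does not spell out the update equations being vectorized. For orientation, the sketch below is the textbook one-dimensional Yee update in normalized units: a Gaussian E-field pulse between two perfectly conducting (PEC) walls, with an update coefficient of 0.5, which is the Courant number in cell units. It is plain scalar Python, not SSE code, and the function name `fdtd_1d` and its parameters are hypothetical. What it illustrates is that every iteration of each inner loop reads only the other field plus its own cell, so an SSE implementation can update four single-precision cells per 128-bit instruction.

```python
import math


def fdtd_1d(n_cells=201, steps=80, courant=0.5, width=5.0):
    """1D FDTD (Yee) in normalized units with PEC walls at both ends.

    ez[k] sits on integer grid points; hy[k] sits between ez[k] and ez[k+1].
    Returns the final (ez, hy) lists.
    """
    mid = n_cells // 2
    ez = [math.exp(-0.5 * ((k - mid) / width) ** 2) for k in range(n_cells)]
    ez[0] = ez[-1] = 0.0  # PEC walls: tangential E is zero
    hy = [0.0] * n_cells
    for _ in range(steps):
        # H half-step: iterations are independent of each other, so this loop
        # is the kind SSE can process four float cells at a time.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])
        # E half-step: same structure; wall cells 0 and n_cells - 1 stay fixed.
        for k in range(1, n_cells - 1):
            ez[k] += courant * (hy[k - 1] - hy[k])
    return ez, hy
```

Because the pulse starts with zero magnetic field, it splits into two half-amplitude pulses travelling in opposite directions at half a cell per step, so after 80 steps the peaks sit about 40 cells on either side of the starting point while the field at the center has returned to nearly zero.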

Design of Distributed Reconfigurable Robotics Systems with ReconROS

ACM Transactions on Reconfigurable Technology and Systems

10.1145/3494571

2022

Vol 15 (3)

pp. 1-20

Author(s):

Christian Lienen

Marco Platzner

Keyword(s):

Operating System

Energy Efficiency

High Performance

Hardware Acceleration

Design Flow

Programming Models

Unique Combination

Reconfigurable Computers

Multithreaded Programming

Robot Operating System

Robotics applications process large amounts of data in real time and require compute platforms that provide high performance and energy efficiency. FPGAs are well suited for many of these applications, but the robotics community is reluctant to use hardware acceleration because of increased design complexity and the lack of consistent programming models across the software/hardware boundary. In this article, we present ReconROS, a framework that integrates the widely used Robot Operating System (ROS) with ReconOS, which supports multithreaded programming of hardware and software threads on reconfigurable computers. This unique combination gives ROS 2 developers the flexibility to transparently accelerate parts of their robotics applications in hardware. We describe the architecture and the design flow of ReconROS and report on a set of experiments that underline the feasibility and flexibility of our approach.

Hardware acceleration prospects and challenges for high performance computing

2009 IEEE/ACS International Conference on Computer Systems and Applications

10.1109/aiccsa.2009.5069426

2009

Cited By ~ 2

Author(s):

Gregory B. Newby

Keyword(s):

High Performance Computing

High Performance

Hardware Acceleration

Performance Computing

A novel hardware acceleration technique for high performance parallel FDTD method

IEEE iWEM2011

10.1109/iwem.2011.6021490

2011

Author(s):

Wenhua Yu

Xiaoling Yang

Yongjun Liu

Keyword(s):

High Performance

FDTD Method

Hardware Acceleration

Acceleration Technique