A novel hardware acceleration technique for high performance parallel FDTD method

Author(s):  
Wenhua Yu ◽  
Xiaoling Yang ◽  
Yongjun Liu ◽  
Raj Mittra
2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Dau-Chyrh Chang ◽  
Lihong Zhang ◽  
Xiaoling Yang ◽  
Shao-Hsiang Yen ◽  
Wenhua Yu

We introduce a hardware acceleration technique for the parallel finite difference time domain (FDTD) method using the SSE (streaming (single instruction multiple data) SIMD extensions) instruction set. The implementation of SSE instruction set to parallel FDTD method has achieved the significant improvement on the simulation performance. The benchmarks of the SSE acceleration on both the multi-CPU workstation and computer cluster have demonstrated the advantages of (vector arithmetic logic unit) VALU acceleration over GPU acceleration. Several engineering applications are employed to demonstrate the performance of parallel FDTD method enhanced by SSE instruction set.


Author(s):  
Kersten Schuster ◽  
Philip Trettner ◽  
Leif Kobbelt

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.


Author(s):  
Vinay Sriram ◽  
David Kearney

High speed infrared (IR) scene simulation is used extensively in defense and homeland security to test sensitivity of IR cameras and accuracy of IR threat detection and tracking algorithms used commonly in IR missile approach warning systems (MAWS). A typical MAWS requires an input scene rate of over 100 scenes/second. Infrared scene simulations typically take 32 minutes to simulate a single IR scene that accounts for effects of atmospheric turbulence, refraction, optical blurring and charge-coupled device (CCD) camera electronic noise on a Pentium 4 (2.8GHz) dual core processor [7]. Thus, in IR scene simulation, the processing power of modern computers is a limiting factor. In this paper we report our research to accelerate IR scene simulation using high performance reconfigurable computing. We constructed a multi Field Programmable Gate Array (FPGA) hardware acceleration platform and accelerated a key computationally intensive IR algorithm over the hardware acceleration platform. We were successful in reducing the computation time of IR scene simulation by over 36%. This research acts as a unique case study for accelerating large scale defense simulations using a high performance multi-FPGA reconfigurable computer.


2021 ◽  
Vol 2086 (1) ◽  
pp. 012166
Author(s):  
D A Savelyev

Abstract The diffraction of vortex laser beams with circular polarization by ring gratings with the variable height was investigated in this paper. Modelling of near zone diffraction is numerically investigated by the finite difference time domain (FDTD) method. The changes in the length size of the light needle and focal spot size are shown depending on the type of the ring grating.


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 449
Author(s):  
Mohammad Amir Mansoori ◽  
Mario R. Casu

Principal Component Analysis (PCA) is a technique for dimensionality reduction that is useful in removing redundant information in data for various applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made the hardware acceleration of PCA an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper we propose a flexible PCA hardware accelerator in Field-Programmable Gate Arrays (FPGA) that we designed entirely in HLS. In order to make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to create an efficient hardware. The flexibility of our design allows us to use it for different FPGA targets, with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time and power consumption.


2015 ◽  
Author(s):  
Tyler Browning ◽  
Christopher Jackson ◽  
Furkan Cayci ◽  
Gary W. Carhart ◽  
J. J. Liu ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document