Studying Execution Time and Memory Transfer Time of Image Processing Using GPU Cards

Author(s):  
Abu Asaduzzaman ◽  
Srinivas Jojigiri ◽  
Thushar Sabu ◽  
Sanath Tailam
2020 ◽  
Vol 21 (1) ◽  
pp. 47-56
Author(s):  
K Indragandhi ◽  
Jawahar P K

Recent embedded devices are equipped with multicore processors, which significantly improve system performance. To use all the cores of a multicore processor efficiently, application programs need to be parallelized. This paper proposes an efficient thread-level parallelism (ETLP) scheme and evaluates it with a computationally intensive edge detection algorithm. Edge detection is an important step in various real-time applications, such as vehicle detection in traffic control and medical image processing. The main objective of the ETLP scheme is to reduce execution time and increase CPU core utilization. The performance of the ETLP scheme is compared with that of a basic edge detection scheme (BEDS) for different image sizes. The experimental results show that the proposed ETLP scheme achieves an efficiency of 49% and 72% for image sizes of 300 x 256 and 1024 x 1024, respectively. Furthermore, the ETLP scheme reduces execution time by 66% for the 1024 x 1024 image compared with BEDS.
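
The abstract does not describe how ETLP divides the work, so the following is only a minimal sketch of thread-level parallel edge detection under stated assumptions: a Sobel operator as the edge detector, an image stored as a list of rows, and a split into horizontal bands with one band per thread. The names `sobel_rows` and `sobel_parallel` are hypothetical and are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

# 3x3 Sobel kernels (assumed edge detector; the paper's operator is not stated).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_rows(image, row_start, row_end):
    """Return |gx| + |gy| for rows [row_start, row_end); border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(row_start, row_end):
        row = [0] * width
        if 0 < y < height - 1:
            for x in range(1, width - 1):
                gx = gy = 0
                for dy in range(3):
                    for dx in range(3):
                        pixel = image[y + dy - 1][x + dx - 1]
                        gx += SOBEL_X[dy][dx] * pixel
                        gy += SOBEL_Y[dy][dx] * pixel
                row[x] = abs(gx) + abs(gy)
        out.append(row)
    return out


def sobel_parallel(image, workers=4):
    """Split the image into horizontal bands and process one band per thread."""
    height = len(image)
    band = max(1, -(-height // workers))  # ceiling division
    ranges = [(start, min(start + band, height)) for start in range(0, height, band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: sobel_rows(image, r[0], r[1]), ranges))
    # Bands come back in submission order, so concatenation restores the image.
    return [row for part in parts for row in part]
```

Each band reads the shared input image and writes only its own output rows, so the threads need no locking. In CPython, threads running pure-Python arithmetic share the global interpreter lock, so this sketch illustrates the decomposition rather than the speedup reported in the abstract.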


Template matching forms the basis of many image processing algorithms and hence of many computer vision algorithms. Existing template matching algorithms include Sum of Absolute Difference (SAD), Normalized SAD (NSAD), Correlation methods (CORR), Normalized CORR (NCORR), Sum of Squared Difference (SSD), and Normalized SSD (NSSD). In general, images require considerable memory for storage and considerable time for processing, and the methods listed above are computationally expensive. Efficiency depends on many factors, especially the accuracy of the results and the speed of processing, so any approach that reduces execution time is valuable. This paper therefore proposes a novel partial NCC (PNCC) template matching technique. A block window approach is used to reduce the number of operations and hence speed up processing. The existing NCC algorithm is compared with the proposed PNCC algorithm. The experiments show that PNCC reduces execution time by a factor of approximately 8 to 47, depending on the template image and the main image. The accuracy of the results obtained is 100%. The proposed algorithm works for various types of images, and the experiment is repeated for various template sizes and main image sizes. Execution speed can be improved further by implementing the proposed algorithm on parallel processors. The method may prove useful in real-time image processing.
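
For orientation, the baseline that PNCC is compared against can be sketched as an exhaustive normalized cross-correlation search. This is a minimal sketch, not the proposed PNCC: the block window approach is not detailed in the abstract and is not reproduced here. It assumes the plain (not mean-subtracted) form of normalized correlation and images stored as lists of rows; `ncc_score` and `match_template` are hypothetical names.

```python
import math


def ncc_score(image, template, top, left):
    """Normalized cross-correlation of the template with one image window."""
    num = img_energy = tmpl_energy = 0.0
    for r, template_row in enumerate(template):
        for c, t in enumerate(template_row):
            p = image[top + r][left + c]
            num += p * t
            img_energy += p * p
            tmpl_energy += t * t
    denom = math.sqrt(img_energy * tmpl_energy)
    return num / denom if denom else 0.0


def match_template(image, template):
    """Exhaustive search: score every window and return (best score, (top, left))."""
    t_height, t_width = len(template), len(template[0])
    best = (-1.0, (0, 0))
    for top in range(len(image) - t_height + 1):
        for left in range(len(image[0]) - t_width + 1):
            score = ncc_score(image, template, top, left)
            if score > best[0]:
                best = (score, (top, left))
    return best
```

The search visits every window and touches every template pixel in each one, so its cost grows with the product of the image and template sizes. That cost is the motivation the abstract gives for reducing the number of operations.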


2007 ◽  
Vol 17 (01) ◽  
pp. 47-59
Author(s):  
EMMANUEL JEANNOT ◽  
KEITH SEYMOUR ◽  
ASYM YARKHAN ◽  
JACK J. DONGARRA

In this paper we address the problem of accurately estimating the runtime and communication time of a client request in a Network Enabled Server (NES) middleware such as GridSolve. We use a template based model for the runtime estimation and a client-server communication test for the transfer time estimation. We implement these two mechanisms in GridSolve and test them on a real testbed. Experiments show that they allow for significant improvement in terms of client execution time on various scenarios.


In the present scenario, every process has to be fast, efficient and simple. The Fast Fourier Transform (FFT) is an efficient algorithm for calculating the N-point Discrete Fourier Transform (DFT). It has wide applications in communication systems, signal processing, image processing and instrumentation. However, implementing the FFT requires a large number of complex multiplications, so the multiplier itself must be fast and power efficient. To address this problem, the combination of the Urdhva Tiryagbhyam and Karatsuba algorithms offers an efficient technique of multiplication [1]. Vedic mathematics is an ancient system of mathematics with a distinctive technique of calculation based on sixteen Sutras. Using these techniques in the calculation algorithms of the coprocessor can reduce complexity, execution time, area, power, etc. The novelty of this project is an FFT design methodology that uses a floating-point multiplier based on the combination of the Urdhva Tiryagbhyam and Karatsuba algorithms. By combining these two approaches, the proposed design methodology is efficient in time, area and power [1] [2]. The code is written in Verilog, and FPGA synthesis on a Virtex-5 is performed using Xilinx ISE 14.5.
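
The Karatsuba step mentioned above replaces four half-width multiplications with three. The following is a minimal software sketch of that idea for non-negative integers, written in Python purely for illustration. It is not the paper's Verilog design, it does not model the Urdhva Tiryagbhyam stage or floating-point handling, and the base-case threshold of 16 is an arbitrary assumption.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers using three recursive half-width products."""
    if x < 16 or y < 16:  # small operands: multiply directly (threshold is arbitrary)
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    hi = karatsuba(x_hi, y_hi)
    lo = karatsuba(x_lo, y_lo)
    # (x_hi + x_lo)(y_hi + y_lo) - hi - lo equals x_hi*y_lo + x_lo*y_hi
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo) - hi - lo
    return (hi << (2 * half)) + (mid << half) + lo
```

For example, `karatsuba(1234, 5678)` returns `7006652`, the same value as `1234 * 5678`.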


Author(s):  
Moath Alsafasfeh ◽  
Bradely Bazuin ◽  
Ikhlas Abdel-Qader

Real-time inspection of large-scale solar systems may take a long time to identify hazardous situations arising from failures in the normal operation of solar panels, where early hazard detection is important. Reducing execution time and improving system performance are the ultimate goals of multiprocessing or multicore systems. The embedded system proposed for solar panel inspection performs real-time processing and analysis of video from two camcorders, one thermal and one charge-coupled device (CCD), mounted on a drone. The inspection method needs considerable time for capturing and processing the frames and detecting the faulty panels. The system can determine the longitude and latitude of a defect in real time. In this work, we investigate parallel processing of the image processing operations, which reduces the processing time of the inspection system. Using the multiprocessing module in Python, we execute fault detection algorithms on streamed frames from both video cameras. The experimental results show a super-linear speedup for thermal and CCD video processing in real-time condition monitoring of large-scale solar systems: execution time is reduced by an average factor of 3.1 and 6.3 using 2 and 4 processes, respectively.
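
The frame-level parallelism described above can be sketched with `multiprocessing.Pool` from the Python standard library. This is a minimal sketch under stated assumptions: frames are lists of pixel rows, and the per-frame check `count_hot_pixels` with its threshold of 200 is a hypothetical stand-in, since the abstract does not describe the actual fault detection algorithms.

```python
from multiprocessing import Pool


def count_hot_pixels(frame, threshold=200):
    """Hypothetical per-frame check: count pixels at or above the threshold."""
    return sum(1 for row in frame for pixel in row if pixel >= threshold)


def inspect_frames(frames, processes=4):
    """Apply the per-frame check to every frame, spreading frames over processes."""
    if processes <= 1:
        # Serial fallback: useful as a timing baseline and for debugging.
        return [count_hot_pixels(frame) for frame in frames]
    with Pool(processes=processes) as pool:
        # pool.map preserves frame order in the returned list.
        return pool.map(count_hot_pixels, frames)
```

On platforms that start worker processes with the spawn method, `inspect_frames` should be called from under an `if __name__ == "__main__":` guard. Timing the serial fallback against the pooled call gives the kind of speedup ratio the abstract reports.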


Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 266 ◽  
Author(s):  
Jaewon Sa ◽  
Younchang Choi ◽  
Hanhaesol Lee ◽  
Yongwha Chung ◽  
Daihee Park ◽  
...  

The fast detection of pigs is crucial for a surveillance environment whose ultimate purpose is the 24 h tracking of individual pigs. In a realistic pig farm environment in particular, one should consider various illumination conditions such as sunlight, but such consideration has not been reported yet. We propose a fast method to detect pigs under various illumination conditions by exploiting the complementary information from depth and infrared images. By applying spatiotemporal interpolation, we first remove the noise caused by sunlight. Then, we carefully analyze the characteristics of both the depth and infrared information and detect pigs using only simple image processing techniques. Rather than exploiting highly time-consuming techniques, such as frequency-, optimization-, or deep learning-based detections, our image processing-based method can guarantee a fast execution time for the final goal, i.e., intelligent pig monitoring applications. In the experimental results, the proposed method detected pigs effectively in terms of both accuracy (0.79) and execution time (8.71 ms), even under various illumination conditions.


2020 ◽  
Vol 8 (1) ◽  
Author(s):  
Sa'ed Abed

Digital image processing is the computer manipulation of images, which includes algorithms such as image enhancement and target recognition. Some of these algorithms involve operations like convolution and edge detection, which require high computation. Generally, software running on a processor performs these manipulations. To achieve higher computation performance in terms of execution time, these algorithms are implemented on reconfigurable hardware such as an FPGA. Parallel and pipelined architectures can be implemented on an FPGA to gain speedup. In this work, we provide a detailed description of implementing an edge detection algorithm on the SGI RC100 platform. The algorithm is implemented using ANSI C for the host program and the Mitrion-C language. Mitrion-C offers an efficient way to write code for parallel and pipelined architectures to perform edge detection. The algorithm is then tested on an Intel Itanium 2 based architecture, and its execution time is compared with that of the RC100 platform based algorithm to measure the speedup gained by the FPGA-based algorithm. The experimental results show that the reconfigurable hardware FPGA-based algorithm outperformed the software-based approach by more than 50 times.

