Studying Execution Time and Memory Transfer Time of Image Processing Using GPU Cards

Recent embedded devices are equipped with multicore processors, which significantly improve system performance. To use all the cores of a multicore processor efficiently, application programs need to be parallelized. This paper proposes an efficient thread-level parallelism (ETLP) scheme and evaluates it with a computationally intensive edge detection algorithm. Edge detection is an important step in various real-time applications, such as vehicle detection in traffic control and medical image processing. The main objective of the ETLP scheme is to reduce execution time and increase CPU core utilization. The performance of the ETLP scheme is compared with that of a basic edge detection scheme (BEDS) for different image sizes. The experimental results show that the proposed ETLP scheme achieves efficiencies of 49% and 72% for image sizes of 300 x 256 and 1024 x 1024, respectively. Furthermore, the ETLP scheme reduces execution time by 66% for the 1024 x 1024 image compared with BEDS.
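
The abstract does not say how ETLP divides the work, so the following is only a minimal sketch of one common decomposition: the image is split into horizontal bands and each band is processed by a separate task. The Sobel operator, the function names `sobel_rows` and `sobel_parallel`, and the `workers` parameter are assumptions chosen for illustration, not the paper's method.

```python
from concurrent.futures import ThreadPoolExecutor

# Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_rows(image, row_start, row_end):
    """Return gradient magnitudes (|gx| + |gy|) for rows [row_start, row_end)."""
    height, width = len(image), len(image[0])
    band = []
    for y in range(row_start, row_end):
        row = [0] * width  # border pixels are left at 0
        if 0 < y < height - 1:
            for x in range(1, width - 1):
                gx = gy = 0
                for dy in range(3):
                    for dx in range(3):
                        pixel = image[y + dy - 1][x + dx - 1]
                        gx += GX[dy][dx] * pixel
                        gy += GY[dy][dx] * pixel
                row[x] = abs(gx) + abs(gy)
        band.append(row)
    return band


def sobel_parallel(image, workers=4):
    """Split the image into horizontal bands and process each band as its own task."""
    height = len(image)
    step = max(1, -(-height // workers))  # ceiling division
    bounds = [(s, min(s + step, height)) for s in range(0, height, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bands = pool.map(lambda b: sobel_rows(image, *b), bounds)
        return [row for band in bands for row in band]


# A 6x6 test image with a vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
edges = sobel_parallel(image, workers=3)
serial = sobel_rows(image, 0, len(image))
print(edges == serial)  # the banded result should match the serial one
```

Each band reads only the shared input image and writes its own output rows, so the tasks need no locking. In standard CPython the global interpreter lock keeps pure-Python threads from running on several cores at once, so this sketch shows the decomposition rather than a real speedup; native threads or a process pool would be needed for that.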

An Efficient and Fast Partial Template Matching Technique – Enhancement in Normalized Cross Correlation

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8362.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 7232-7237

Keyword(s):

Image Processing ◽

Execution Time ◽

Template Matching ◽

Absolute Difference ◽

Processing Efficiency ◽

Memory Space ◽

Matching Technique ◽

Speed Up ◽

Novel Method ◽

Window Approach

Template matching forms the basis of many image processing algorithms and hence of many computer vision algorithms. Existing template matching algorithms include Sum of Absolute Difference (SAD), Normalized SAD (NSAD), Correlation (CORR), Normalized CORR (NCORR), Sum of Squared Difference (SSD), and Normalized SSD (NSSD). In general, images require considerable memory for storage and considerable time for processing, and the methods above are computationally expensive. In any processing task, efficiency depends on many factors, especially the accuracy of the results and the speed of processing, so an approach that reduces execution time is always valuable. This paper therefore proposes a novel partial NCC (PNCC) template matching technique. A block window approach is used to reduce the number of operations and hence speed up the processing. The existing NCC algorithm is compared with the proposed PNCC algorithm. Experiments show that PNCC reduces execution time by a factor of approximately 8 to 47, depending on the template image and main image used. The accuracy of the results obtained is 100%. The proposed algorithm works for various types of images, and the experiment is repeated for various sizes of template and main image. Further improvement in execution speed can be achieved by implementing the proposed algorithm on parallel processors. It may prove important in real-time image processing.
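
For orientation, here is a minimal sketch of the baseline that PNCC is compared against: exhaustive template matching with normalized cross-correlation. It uses the zero-mean variant of NCC, which is an assumption, and the names `ncc_score` and `best_match` are hypothetical. The block window shortcut of PNCC is not described in the abstract and is not reproduced here.

```python
import math


def ncc_score(image, template, top, left):
    """Zero-mean NCC between the template and the image window at (top, left)."""
    th, tw = len(template), len(template[0])
    window = [image[top + r][left + c] for r in range(th) for c in range(tw)]
    patch = [v for row in template for v in row]
    mean_w = sum(window) / len(window)
    mean_t = sum(patch) / len(patch)
    num = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, patch))
    den = math.sqrt(sum((w - mean_w) ** 2 for w in window) *
                    sum((t - mean_t) ** 2 for t in patch))
    return num / den if den else 0.0  # flat windows carry no correlation


def best_match(image, template):
    """Slide the template over every valid position; return (position, score)."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0  # NCC scores lie in [-1, 1]
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = ncc_score(image, template, top, left)
            if score > best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


image = [
    [1, 2, 1, 0, 3],
    [0, 1, 9, 2, 1],
    [2, 0, 3, 8, 0],
    [1, 3, 0, 1, 2],
    [0, 1, 2, 0, 1],
]
template = [[9, 2], [3, 8]]  # copied from the image at row 1, column 2
position, score = best_match(image, template)
print(position, score)
```

The exhaustive search evaluates every window in full, which is the cost that a partial or block-based scheme aims to cut.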

A Design of Programmable Fragment Shader with Reduction of Memory Transfer Time

The Journal of the Korean Institute of Information and Communication Engineering ◽

10.6109/jkiice.2010.14.12.2675 ◽

2010 ◽

Vol 14 (12) ◽

pp. 2675-2680

Author(s):

Tae-Ryoung Park

Keyword(s):

Transfer Time ◽

Memory Transfer

IMPROVED RUNTIME AND TRANSFER TIME PREDICTION MECHANISMS IN A NETWORK ENABLED SERVERS MIDDLEWARE

Parallel Processing Letters ◽

10.1142/s0129626407002867 ◽

2007 ◽

Vol 17 (01) ◽

pp. 47-59

Author(s):

EMMANUEL JEANNOT ◽

KEITH SEYMOUR ◽

ASYM YARKHAN ◽

JACK J. DONGARRA

Keyword(s):

Execution Time ◽

Time Estimation ◽

Transfer Time ◽

Client Server ◽

Time Prediction ◽

Communication Time ◽

Client Request ◽

Testbed Experiments ◽

Runtime Estimation

In this paper, we address the problem of accurately estimating the runtime and communication time of a client request in a Network Enabled Server (NES) middleware such as GridSolve. We use a template-based model for the runtime estimation and a client-server communication test for the transfer time estimation. We implement these two mechanisms in GridSolve and test them on a real testbed. Experiments show that they significantly improve client execution time in various scenarios.

Design of High Speed FFT using Urdhva-Tiryagbhyam Algorithm and Karatsuba Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d8783.049420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1904-1908

Keyword(s):

Image Processing ◽

Signal Processing ◽

Fourier Transform ◽

Fast Fourier Transform ◽

Execution Time ◽

Communication Systems ◽

High Speed ◽

Floating Point ◽

Calculation Algorithms ◽

Floating Point Number

In the present scenario, every process has to be fast, efficient, and simple. The fast Fourier transform (FFT) is an efficient algorithm for computing the N-point discrete Fourier transform (DFT). It has wide applications in communication systems, signal processing, image processing, and instrumentation. However, implementing the FFT requires a large number of complex multiplications, so to make the process fast and simple, the multiplier must be fast and power efficient. To address this problem, the combination of the Urdhva Tiryagbhyam and Karatsuba algorithms offers an efficient multiplication technique [1]. Vedic mathematics is the ancient system of mathematics that has a distinctive technique of calculation based on sixteen Sutras. Using these techniques in the calculation algorithms of the coprocessor can reduce complexity, execution time, area, power, etc. The distinctive feature of this project is an FFT design methodology that uses a floating-point multiplier based on the combination of the Urdhva Tiryagbhyam and Karatsuba algorithms. By combining these two approaches, the proposed design methodology is time, area, and power efficient [1] [2]. The code is written in Verilog, and FPGA synthesis on a Virtex-5 is done using Xilinx ISE 14.5.
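
The two multiplication schemes named above can be sketched in software as follows. This is a sketch under stated assumptions: the paper targets a Verilog floating-point multiplier, and the abstract does not say how the two algorithms are combined, so they are shown separately on non-negative integers. The function names, the base-case threshold, and the little-endian digit-list representation are illustrative choices.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers using three half-size products."""
    if x < 16 or y < 16:
        return x * y  # small operands: multiply directly
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    low = karatsuba(x_lo, y_lo)
    high = karatsuba(x_hi, y_hi)
    # (x_lo + x_hi)(y_lo + y_hi) - low - high gives the two cross terms at once.
    cross = karatsuba(x_lo + x_hi, y_lo + y_hi) - low - high
    return (high << (2 * half)) + (cross << half) + low


def urdhva_tiryagbhyam(a_digits, b_digits, base=10):
    """'Vertically and crosswise' product of little-endian digit lists.

    Each result column k sums every a[i] * b[k - i], then the carry
    is passed on to the next column.
    """
    n, m = len(a_digits), len(b_digits)
    result, carry = [], 0
    for k in range(n + m - 1):
        column = carry + sum(a_digits[i] * b_digits[k - i]
                             for i in range(max(0, k - m + 1), min(k, n - 1) + 1))
        result.append(column % base)
        carry = column // base
    while carry:
        result.append(carry % base)
        carry //= base
    return result


print(karatsuba(123456789, 987654321) == 123456789 * 987654321)
print(urdhva_tiryagbhyam([3, 2, 1], [6, 5, 4]))  # digits of 123 * 456, least significant first
```

In hardware, the column sums of the Urdhva Tiryagbhyam scheme can be generated in parallel, while Karatsuba trades one of four sub-multiplications for extra additions.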

A novel approach to improve execution time performance of Medical image processing

2014 IEEE International Advance Computing Conference (IACC) ◽

10.1109/iadcc.2014.6779413 ◽

2014 ◽

Cited By ~ 1

Author(s):

H. Sarojadevi

Keyword(s):

Image Processing ◽

Execution Time ◽

Medical Image ◽

Medical Image Processing ◽

Time Performance ◽

Novel Approach

Super-linear speedup for real-time condition monitoring using image processing and drones

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v12i2.pp1548-1557 ◽

2022 ◽

Vol 12 (2) ◽

pp. 1548

Author(s):

Moath Alsafasfeh ◽

Bradely Bazuin ◽

Ikhlas Abdel-Qader

Keyword(s):

Image Processing ◽

Real Time ◽

Condition Monitoring ◽

Video Processing ◽

Execution Time ◽

Large Scale ◽

Solar Panels ◽

Position Information ◽

Time Condition ◽

Linear Speedup

Real-time inspection of large-scale solar systems can take a long time to identify hazardous conditions caused by failures in normal solar panel operation, yet early hazard detection is important. Reducing execution time and improving system performance are the ultimate goals of multiprocessing or multicore systems. The proposed embedded system for solar panel inspection performs real-time processing and analysis of video from two cameras, one thermal and one charge-coupled device (CCD), mounted on a drone. The inspection method needs additional time to capture and process the frames and to detect the faulty panels. The system can determine the longitude and latitude of a defect's position in real time. In this work, we investigate parallelizing the image processing operations to reduce the processing time of the inspection system. Using the multiprocessing module in Python, we execute fault detection algorithms on streamed frames from both video cameras. The experimental results show a super-linear speedup for real-time condition monitoring in large-scale solar systems, for both thermal and CCD video processing: execution time is reduced by an average factor of 3.1 and 6.3 using 2 and 4 processes, respectively.
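
To make the term concrete: a speedup is super-linear when it exceeds the number of processes used, which is the same as a parallel efficiency above 1. The short sketch below applies the standard definitions to the two average figures reported in the abstract. The helper names are hypothetical and no timing data beyond those two figures is assumed.

```python
def speedup(serial_time, parallel_time):
    """Standard speedup: serial execution time over parallel execution time."""
    return serial_time / parallel_time


def efficiency(speedup_value, processes):
    """Speedup per process; 1.0 corresponds to ideal linear scaling."""
    return speedup_value / processes


def is_super_linear(speedup_value, processes):
    """True when the speedup exceeds the number of processes."""
    return speedup_value > processes


# Average speedups reported in the abstract, keyed by process count.
reported = {2: 3.1, 4: 6.3}
for processes, value in reported.items():
    print(processes, value,
          round(efficiency(value, processes), 3),
          is_super_linear(value, processes))
```

Both reported figures give an efficiency above 1.5, so each process count clears the linear bound by a wide margin.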

An extended architecture to optimize execution time of 3D image processing deflectometry algorithm using FPGA

2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) ◽

10.1109/icsipa.2017.8120617 ◽

2017 ◽

Author(s):

Faraz Bhatti ◽

Thomas Greiner ◽

Michael Heizmann ◽

Mathias Ziebarth

Keyword(s):

Image Processing ◽

Execution Time ◽

3D Image ◽

3D Image Processing

Fast Pig Detection with a Top-View Camera under Various Illumination Conditions

Symmetry ◽

10.3390/sym11020266 ◽

2019 ◽

Vol 11 (2) ◽

pp. 266 ◽

Cited By ~ 5

Author(s):

Jaewon Sa ◽

Younchang Choi ◽

Hanhaesol Lee ◽

Yongwha Chung ◽

Daihee Park ◽

...

Keyword(s):

Image Processing ◽

Execution Time ◽

Fast Method ◽

Fast Detection ◽

Spatiotemporal Interpolation ◽

Monitoring Applications ◽

Pig Farm ◽

Frequency Optimization ◽

Processing Techniques ◽

Top View

The fast detection of pigs is a crucial aspect of a surveillance environment intended for the ultimate purpose of the 24 h tracking of individual pigs. Particularly, in a realistic pig farm environment, one should consider various illumination conditions such as sunlight, but such consideration has not been reported yet. We propose a fast method to detect pigs under various illumination conditions by exploiting the complementary information from depth and infrared images. By applying spatiotemporal interpolation, we first remove the noise caused by sunlight. Then, we carefully analyze the characteristics of both the depth and infrared information and detect pigs using only simple image processing techniques. Rather than exploiting highly time-consuming techniques, such as frequency-, optimization-, or deep learning-based detection, our image processing-based method can guarantee a fast execution time for the final goal, i.e., intelligent pig monitoring applications. In the experimental results, pigs could be detected effectively through the proposed method in terms of both accuracy (i.e., 0.79) and execution time (i.e., 8.71 ms), even with various illumination conditions.

Implementation of Edge Detection Algorithm using FPGA Reconfigurable Hardware

Journal of Engineering Research ◽

10.36909/jer.v8i1.7956 ◽

2020 ◽

Vol 8 (1) ◽

Author(s):

Sa'ed Abed

Keyword(s):

Image Processing ◽

Edge Detection ◽

Execution Time ◽

Parallel Architecture ◽

Detection Algorithm ◽

Reconfigurable Hardware ◽

Edge Detection Algorithm ◽

C Language ◽

Pipelined Architecture ◽

Speed Up

Digital image processing is the computer manipulation of images and includes algorithms such as image enhancement and target recognition. Some of these algorithms involve operations such as convolution and edge detection, which require heavy computation. Generally, software running on a processor performs these manipulations. To achieve higher performance in terms of execution time, these algorithms are implemented on reconfigurable hardware such as FPGAs. One can implement parallel and pipelined architectures on an FPGA to gain speedup. In this work, we provide a detailed description of implementing an edge detection algorithm on the SGI–RC100 platform. The algorithm is implemented using ANSI-C for the host program and the Mitrion–C language. Mitrion–C offers an efficient way to write code for parallel and pipelined architectures to perform edge detection. The algorithm is then tested on an Intel Itanium 2 based architecture, and its execution time is compared with that of the RC100 platform based algorithm to measure the speedup gained by the FPGA-based algorithm. The experimental results showed that the FPGA-based algorithm on reconfigurable hardware was more than 50 times faster than the software-based approach.
