Efficient automatic scheduling of imaging and vision pipelines for the GPU

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-28
Author(s):  
Luke Anderson ◽  
Andrew Adams ◽  
Karima Ma ◽  
Tzu-Mao Li ◽  
Tian Jin ◽  
...  

We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. We address the scalability challenge of extending search-based automatic scheduling to map large real-world programs to the deep hierarchies of memory and parallelism on GPU architectures in reasonable compile time. We achieve this using (1) a two-phase search algorithm that first ‘freezes’ decisions for the lowest cost sections of a program, allowing relatively more time to be spent on the important stages, (2) a hierarchical sampling strategy that groups schedules based on their structural similarity, then samples representatives to be evaluated, allowing us to explore a large space with few samples, and (3) memoization of repeated partial schedules, amortizing their cost over all their occurrences. We guide the process with an efficient cost model combining machine learning, program analysis, and GPU architecture knowledge. We evaluate our method’s performance on a diverse suite of real-world imaging and vision pipelines. Our scalability optimizations lead to average compile time speedups of 49x (up to 530x). We find schedules that are on average 1.7x faster than existing automatic solutions (up to 5x), and competitive with what the best human experts were able to achieve in an active effort to beat our automatic results.
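Of the three scalability optimizations, the memoization of repeated partial schedules is the easiest to illustrate in isolation. The sketch below is a minimal, hypothetical illustration of that idea only: the names (schedule_key, memoized_cost, evaluate_cost) and the toy cost model are assumptions, not the paper's implementation.

```python
# Minimal illustrative sketch of memoizing partial-schedule costs. All names
# are hypothetical stand-ins, not the autoscheduler's actual code.

_cost_cache = {}

def schedule_key(partial_schedule):
    # Assumption: the scheduling decisions can be summarized as a hashable,
    # canonical key (e.g., stage name -> tiling/parallelization choice).
    return tuple(sorted(partial_schedule.items()))

def memoized_cost(partial_schedule, evaluate_cost):
    """Predict the cost of a partial schedule, evaluating the model only once."""
    key = schedule_key(partial_schedule)
    if key not in _cost_cache:
        _cost_cache[key] = evaluate_cost(partial_schedule)
    return _cost_cache[key]

# Usage: identical partial schedules recur across many candidate expansions
# during search, so the cost-model call is amortized over all occurrences.
fake_cost_model = lambda s: sum(len(v) for v in s.values())
candidate = {"blur_x": "tile_32x8", "blur_y": "inline"}
print(memoized_cost(candidate, fake_cost_model))        # evaluates the model
print(memoized_cost(dict(candidate), fake_cost_model))  # cache hit
```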


2011 ◽  
Vol 181-182 ◽  
pp. 623-628
Author(s):  
Dan Zhang ◽  
Rong Cai Zhao ◽  
Lin Han ◽  
Jin Qu

Using FPGAs for general-purpose computing has become an important research direction in high-performance computing. However, it is not a universally beneficial optimization: because of hardware reconfiguration overhead, data transmission cost, program-specific characteristics, and other factors, the speedup achieved by general-purpose computing on FPGAs varies considerably. Based on an in-depth analysis of the FPGA architecture and development process, the main factors affecting FPGA implementation performance are identified, and a parallel cost model for FPGAs based on static program analysis is proposed to provide a basis for deciding whether to use an FPGA for general-purpose computing. Experimental results show that the model accurately estimates FPGA execution performance.
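As a hedged illustration of the kind of decision such a cost model supports, the sketch below weighs an estimated FPGA execution time (reconfiguration overhead plus data transfer plus kernel time, the factors named above) against a CPU estimate. The formula and parameter names are assumptions for illustration, not the model proposed in the paper.

```python
# Illustrative-only static cost estimate for FPGA offload. All parameters are
# hypothetical inputs a static analysis might supply; the formula is an
# assumption, not the paper's cost model.

def fpga_offload_speedup(reconfig_s, bytes_moved, link_gbps, fpga_kernel_s, cpu_s):
    transfer_s = bytes_moved / (link_gbps * 1e9)             # host <-> FPGA traffic
    fpga_total_s = reconfig_s + transfer_s + fpga_kernel_s   # end-to-end FPGA time
    return cpu_s / fpga_total_s                              # >1 means offload pays off

# Example: a kernel that is 10x faster on the fabric can still lose once
# reconfiguration and data movement are charged against it.
print(fpga_offload_speedup(reconfig_s=0.8, bytes_moved=2e9,
                           link_gbps=8, fpga_kernel_s=0.5, cpu_s=5.0))
```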



2019 ◽  
Vol 33 (7) ◽  
pp. 2335-2356 ◽  
Author(s):  
Sarita Gajbhiye Meshram ◽  
M. A. Ghorbani ◽  
Ravinesh C. Deo ◽  
Mahsa Hasanpour Kashani ◽  
Chandrashekhar Meshram ◽  
...  


2021 ◽  
Vol 11 (3) ◽  
pp. 1286 ◽  
Author(s):  
Mohammad Dehghani ◽  
Zeinab Montazeri ◽  
Ali Dehghani ◽  
Om P. Malik ◽  
Ruben Morales-Menendez ◽  
...  

Population-based optimization algorithms inspired by nature are among the most powerful tools for solving optimization problems. These algorithms find solutions by stochastically searching the search space, with their central design ideas drawn from natural phenomena, the behavior and living conditions of organisms, laws of physics, and so on. A new population-based optimization algorithm, the Binary Spring Search Algorithm (BSSA), is introduced to solve optimization problems. BSSA simulates Hooke's law for a classical system of weights and springs: the population consists of weights connected by springs. The mathematical model of the proposed algorithm is presented and used to obtain solutions to optimization problems. The results were thoroughly validated on different unimodal and multimodal functions, and the BSSA was compared with high-performance algorithms: the binary grasshopper optimization algorithm, binary dragonfly algorithm, binary bat algorithm, binary gravitational search algorithm, binary particle swarm optimization, and the binary genetic algorithm. The results show the superiority of the BSSA, and the Friedman test corroborates that it is the more competitive algorithm.
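The abstract does not give the BSSA update equations, so the sketch below is only an assumed illustration of the core idea: weights are pulled toward better solutions by Hooke's-law-style spring forces, and a transfer function maps the resulting continuous displacements onto binary bit flips. The force model and the V-shaped transfer function are assumptions, not the published algorithm.

```python
import numpy as np

# Illustrative-only sketch of a Hooke's-law-driven binary update; not the
# published BSSA equations. Each row of `pop` is a binary solution vector.

rng = np.random.default_rng(0)

def spring_step(pop, fitness, k=1.0):
    best = pop[np.argmin(fitness)]              # best weight acts as the anchor
    displacement = k * (best - pop)             # spring pull toward the best (F = -k*x)
    flip_prob = np.abs(np.tanh(displacement))   # assumed V-shaped transfer to [0, 1]
    flips = rng.random(pop.shape) < flip_prob
    return np.where(flips, 1 - pop, pop)

# Toy run on a minimization objective (number of ones in the bit string).
pop = rng.integers(0, 2, size=(8, 16))
for _ in range(30):
    pop = spring_step(pop, pop.sum(axis=1))
print(pop[np.argmin(pop.sum(axis=1))])
```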





2008 ◽  
Author(s):  
Dalibor Jajcevic ◽  
Raimund A. Almbauer ◽  
Stephan P. Schmidt ◽  
Karl Glinsner


1994 ◽  
Vol 6 (3) ◽  
pp. 225-235 ◽  
Author(s):  
Shinji Sakurai ◽  
Bruce Elliott ◽  
J. Robert Grove

Three-dimensional (3-D) high speed photography was used to record the overarm throwing actions of five open-age, four 18-year-old, six 16-year-old, and six 14-year-old high-performance baseball catchers. The direct linear transformation method was used for 3-D space reconstruction from 2-D images of the catchers throwing from home plate to second base recorded using two phase-locked cameras operating at a nominal rate of 200 Hz. Selected physical capacity measures were also recorded and correlated with ball release speed. In general, anthropometric and strength measures significantly increased through the 14-year-old to open-age classifications, while a range of correlation coefficients from .50 to .84 was recorded between these physical capacities and ball speed at release. While many aspects of the kinematic data at release were similar, the key factors of release angle and release speed varied for the different age groups.
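For readers unfamiliar with the direct linear transformation (DLT) method mentioned above, the sketch below shows the textbook least-squares reconstruction of a 3-D point from two calibrated camera views using the standard 11-parameter DLT form. The function name and data layout are illustrative only and are not taken from the study.

```python
import numpy as np

def dlt_reconstruct(views):
    """Standard 11-parameter DLT reconstruction of one 3-D point.

    views: list of (L, (u, v)) pairs, where L holds a camera's 11 DLT
    coefficients and (u, v) is the digitized 2-D image point in that view.
    Returns the least-squares estimate of (X, Y, Z); two or more views
    give an overdetermined 4x3 (or larger) system.
    """
    A, b = [], []
    for L, (u, v) in views:
        # From u = (L1*X + L2*Y + L3*Z + L4) / (L9*X + L10*Y + L11*Z + 1),
        # and the analogous v equation, rearranged into linear form.
        A.append([L[0] - u * L[8], L[1] - u * L[9], L[2] - u * L[10]])
        A.append([L[4] - v * L[8], L[5] - v * L[9], L[6] - v * L[10]])
        b.extend([u - L[3], v - L[7]])
    xyz, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return xyz
```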



Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 596
Author(s):  
Marco Buzzelli ◽  
Luca Segantin

We address the task of classifying car images at multiple levels of detail, ranging from the top-level car type down to the specific car make, model, and year. We analyze existing datasets for car classification and identify CompCars as an excellent starting point for our task. We show that convolutional neural networks achieve an accuracy above 90% on the finest-level classification task. This high performance, however, is scarcely representative of real-world situations, as it is evaluated on a biased training/test split. In this work, we revisit the CompCars dataset by first defining a new training/test split that better represents real-world scenarios, yielding a more realistic baseline of 61% accuracy on the new test set. We also propagate the existing (but limited) type-level annotation to the entire dataset, and we provide a car-tight bounding box for each image, automatically generated by an ad hoc car detector. To evaluate this revisited dataset, we design and implement three different approaches to car classification, two of which exploit the hierarchical nature of the car annotations. Our experiments show that higher-level classification in terms of car type positively impacts classification at a finer grain, now reaching 70% accuracy. The achieved performance constitutes a baseline benchmark for future research, and our enriched set of annotations is made available for public download.
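One hedged reading of "exploiting the hierarchical nature of the car annotations" is to feed the coarse type prediction into the fine-grained head. The PyTorch sketch below illustrates that idea only; the layer layout, feature size, and class counts are placeholders, not the architectures evaluated in the paper.

```python
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    """Illustrative sketch: a coarse car-type head whose prediction conditions
    the fine-grained make/model/year head. Class counts are placeholders."""

    def __init__(self, feat_dim=512, n_types=12, n_models=1000):
        super().__init__()
        self.type_head = nn.Linear(feat_dim, n_types)
        # The fine head sees both the image features and the coarse distribution.
        self.model_head = nn.Linear(feat_dim + n_types, n_models)

    def forward(self, feats):
        type_logits = self.type_head(feats)
        type_probs = type_logits.softmax(dim=-1)
        model_logits = self.model_head(torch.cat([feats, type_probs], dim=-1))
        return type_logits, model_logits

# Training would attach this head to a CNN backbone and sum a cross-entropy
# loss for each level, so the coarse prediction can guide the fine one.
head = HierarchicalHead()
type_logits, model_logits = head(torch.randn(4, 512))
print(type_logits.shape, model_logits.shape)
```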



Author(s):  
Fanny Pinto Delgado ◽  
Ziyou Song ◽  
Heath F. Hofmann ◽  
Jing Sun

Permanent Magnet Synchronous Machines (PMSMs) have been preferred for high-performance applications due to their high torque density, high power density, high control accuracy, and high efficiency over a wide operating range. During operation, monitoring the PMSM's health condition is crucial for detecting anomalies so that performance degradation, maintenance/downtime costs, and safety hazards can be avoided. In particular, demagnetization can lead not only to degraded performance but also to high maintenance costs, since the permanent magnets are among the most expensive components of a PMSM. In this paper, an equivalent two-phase model for surface-mount permanent magnet (SMPM) machines under permanent magnet demagnetization is formulated and a parameter estimator is proposed for condition monitoring purposes. The performance of the proposed estimator is investigated through analysis and simulation under different conditions, and compared with a parameter estimator based on the standard SMPM machine model. In terms of the information that can be extracted for fault diagnosis and condition monitoring, the proposed estimator exhibits advantages over the standard-model-based estimator, as it can differentiate between uniform demagnetization over all poles and asymmetric demagnetization between the north and south poles.
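To make the condition-monitoring idea concrete, the sketch below runs a textbook recursive-least-squares estimate of the permanent-magnet flux linkage from the standard SMPM q-axis voltage equation in steady state; this corresponds to the standard-model route the paper compares against, not the authors' proposed two-phase-model estimator, and all signal names and parameter values are assumptions.

```python
import numpy as np

# Illustrative-only flux-linkage monitor based on the standard SMPM dq model in
# steady state: v_q = R*i_q + w*L*i_d + w*lambda_m. A drop in the estimated
# lambda_m suggests demagnetization. R, L, and all signals are assumed known or
# measured; this is NOT the paper's proposed two-phase-model estimator.

def estimate_lambda_m(v_q, i_d, i_q, w, R, L, forget=0.99):
    lam, P = 0.0, 1e3                        # scalar RLS state
    for vq, id_, iq, wk in zip(v_q, i_d, i_q, w):
        y = vq - R * iq - wk * L * id_       # measurement with known terms removed
        phi = wk                             # regressor: y = phi * lambda_m
        gain = P * phi / (forget + phi * P * phi)
        lam += gain * (y - phi * lam)
        P = (P - gain * phi * P) / forget
    return lam

# Synthetic check: generate data with lambda_m = 0.08 Wb and recover it.
rng = np.random.default_rng(1)
w = rng.uniform(100, 300, 500)
i_d = np.zeros(500)
i_q = rng.uniform(1, 5, 500)
R, L, lam_true = 0.5, 2e-3, 0.08
v_q = R * i_q + w * L * i_d + w * lam_true + rng.normal(0, 0.01, 500)
print(estimate_lambda_m(v_q, i_d, i_q, w, R, L))  # approximately 0.08
```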


