An Auto-Programming Approach to Vulkan

Author(s):  
Vladimir Alexandrovich Frolov ◽  
Vadim Sanzharov ◽  
Vladimir Alexandrovich Galaktionov ◽  
Alexandr Scherbakov

We propose a novel high-level approach to GPU software development using the Vulkan API. Our goal is to speed up development and performance studies of complex algorithms on the GPU, which is difficult and laborious in Vulkan because of the large number of hardware features and low-level details. The proposed approach uses auto-programming to translate ordinary C++ into an optimized Vulkan implementation with automatic shader generation, resource binding, and fine-grained barrier placement. Our model is not general-purpose programming, but it is extensible and customer-focused. For a single C++ input, our tool can generate multiple implementations of an algorithm in Vulkan for different cases or types of hardware. For example, we automatically detect reductions in C++ source code and then generate several variants of parallel reduction on the GPU: optimized for different warp sizes, with or without atomics, and with or without subgroup operations. Another example is GPU ray tracing applications, for which we can generate different variants: a pure software implementation in a compute shader, one using hardware-accelerated ray queries, or one using the full RTX pipeline. The goal of our work is to increase the productivity of developers who are forced to use Vulkan because their software requires particular hardware features, but who still care about the cross-platform portability of that software and want to debug their algorithm logic on the CPU. Therefore, we assume that the user will take the generated code and integrate it with hand-written Vulkan code.
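To make the reduction example concrete, the sketch below contrasts a sequential accumulation loop, the kind of pattern a reduction detector would need to recognize, with a tree-style reduction of the sort a GPU work-group performs. It is written in Python for brevity rather than C++ or GLSL, it is not the authors' generated code, and the names `sequential_sum`, `tree_reduce`, and `group_size` are illustrative assumptions.

```python
def sequential_sum(values):
    """Plain accumulation loop: the pattern a reduction detector looks for."""
    total = 0
    for v in values:
        total += v
    return total


def tree_reduce(values, group_size=8):
    """Simulate a two-level parallel reduction on the CPU.

    Each "work-group" of `group_size` elements is reduced by repeatedly
    halving the active range; the per-group partial sums are then combined
    in a final pass. `group_size` must be a power of two.
    """
    if group_size <= 0 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")
    partials = []
    for start in range(0, len(values), group_size):
        # Pad the last group with the identity element (0 for addition).
        shared = list(values[start:start + group_size])
        shared += [0] * (group_size - len(shared))
        stride = group_size // 2
        while stride > 0:
            # On a GPU each `lane` would be a separate thread, with a
            # barrier between successive strides.
            for lane in range(stride):
                shared[lane] += shared[lane + stride]
            stride //= 2
        partials.append(shared[0])
    # Final pass: on a GPU this could be an atomic add or a second dispatch.
    return sequential_sum(partials)
```

Changing `group_size` stands in, loosely, for generating variants tuned to different warp sizes, and the final pass marks the place where an atomics-based and an atomics-free variant would differ.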

Author(s):  
Zhuliang Yao ◽  
Shijie Cao ◽  
Wencong Xiao ◽  
Chen Zhang ◽  
Lanshun Nie

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity to prune or regularize consecutive weights for efficient computation, but this method often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, to achieve high model accuracy efficiently on commercial hardware. Our approach adapts to the high-parallelism property of GPUs, showing incredible potential for sparsity in the wide deployment of deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPU while retaining the same high model accuracy as fine-grained sparsity.
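The abstract does not spell out the pruning rule. One plausible reading, assumed here purely for illustration, is that each weight row is split into equal-size blocks and every block keeps the same number of largest-magnitude weights, so that parallel workers receive equal amounts of work. The function name `balanced_prune` and its parameters are hypothetical.

```python
def balanced_prune(row, block_size, keep_per_block):
    """Zero all but the `keep_per_block` largest-magnitude weights in each
    consecutive block of `block_size` entries (assumed pruning rule)."""
    if block_size <= 0 or len(row) % block_size != 0:
        raise ValueError("row length must be a positive multiple of block_size")
    if not 0 <= keep_per_block <= block_size:
        raise ValueError("keep_per_block must lie in [0, block_size]")
    pruned = []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        # Rank positions inside the block by absolute weight, largest first.
        order = sorted(range(block_size), key=lambda i: abs(block[i]), reverse=True)
        keep = set(order[:keep_per_block])
        # Every block ends up with exactly keep_per_block surviving weights.
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(block))
    return pruned
```

Under this assumption every block carries an identical number of nonzeros, in contrast to unstructured pruning, where the nonzeros can cluster unevenly across a row.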


2019 ◽  
Author(s):  
J-Donald Tournier ◽  
Robert Smith ◽  
David Raffelt ◽  
Rami Tabbara ◽  
Thijs Dhollander ◽  
...  

MRtrix3 is an open-source, cross-platform software package for medical image processing, analysis and visualization, with a particular emphasis on the investigation of the brain using diffusion MRI. It is implemented using a fast, modular and flexible general-purpose code framework for image data access and manipulation, enabling efficient development of new applications, whilst retaining high computational performance and a consistent command-line interface between applications. In this article, we provide a high-level overview of the features of the MRtrix3 framework and general-purpose image processing applications provided with the software.


2021 ◽  
pp. 197-210
Author(s):  
John Toner ◽  
Barbara Gail Montero ◽  
Aidan Moran

The final chapter synthesizes the arguments presented over the course of the book by suggesting that skill execution continues to be governed by conscious processes even after performers have attained a high level of expertise. It argues that skill-focused attention is necessary if experts are to eschew proceduralization and react flexibly to ‘crises’ and fine-grained changes in situational demands. In doing so, it discusses the role played by conscious control, reflection, and bodily awareness in maintaining performance proficiency. It suggests that skill maintenance and continuous improvement are underpinned by the use of both automated procedures (acknowledging that these are inherently active and flexible) and metacognitive knowledge. The chapter concludes by briefly considering how skill-focused attention needs to be applied in both training and performance contexts in order to facilitate continuous improvement.


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Duc-Thang Nguyen ◽  
Taehong Kim

In recent years, the prevalence of Wi-Fi-enabled devices such as smartphones, smart appliances, and various sensors has increased. As most IoT devices lack a display or a keypad owing to their tiny size, it is difficult to set connectivity information such as the service set identifier (SSID) and password without help from external devices such as smartphones. Moreover, it is much more complex to apply advanced connectivity options such as SSID hiding, MAC ID filtering, and Wi-Fi Protected Access (WPA) to these devices. Thus, we need a new Wi-Fi network management system which not only facilitates client access operations but also provides a high-level authentication procedure. In this paper, we introduce a remote connectivity control system for Wi-Fi devices based on software-defined networking (SDN) in a wireless environment. The main contributions of the proposed system are twofold: (i) it enables the network owner or administrator to manage and approve connection requests from Wi-Fi devices through remote services, which is essential for easy connection management across diverse IoT devices; (ii) it also allows fine-grained access control at the device level through remote control. We describe the architecture of SDN-based remote connectivity control of Wi-Fi devices. While verifying the feasibility and performance of the proposed system, we discuss how the proposed system can benefit both service providers and users.


2013 ◽  
Vol 10 (2) ◽  
Author(s):  
Wisma Nugraha Christianto ◽  
Timbul Haryono ◽  
G. R. Lono L. Simatupang ◽  
Soetarno

Pathêt: On Paper and on the Stage of Shadow Puppet Theater from a Practice Theory Perspective. Research on karawitan focusing on pathêt has produced conceptual results that revise and clarify the results of previous research conducted under different paradigms. Some are even repetitive. On the other hand, at the practical level and in social development, the paradigms of the performing arts have changed. If the research and the research paradigms on this matter cannot accommodate the changes, the results will be static. There have been some pathêt studies that inspired a critical development of karawitan and performance studies. The results of that research implied a temporal context, demanding a revisiting of the temporary context and its relation to non-theoretical matters, partiality, practicality, and social reality. A scientific construction based on human activity as a practice is needed to construct a scientific atmosphere to balance social dynamics, performance, and the harmony of gending karawitan related to the utilization of pathêt as part of a coherent structure. The social reality of the wayang kulit stage is close to the aesthetic and artistic feel of karawitan music. Social structure and performance structure at the present time do not always possess a linear relation to the symbolic structure and past aesthetics. Symbolic and aesthetic constructs have the ability of symbolic formation and a high level of musical appreciation. Therefore, Pierre Bourdieu’s practical critique serves as an alternative encouragement reacting to critical pretension towards problem solution and reception of pathêt at the conceptual and practical level.


Author(s):  
Severin Sadjina ◽  
Lars Tandle Kyllingstad ◽  
Martin Rindarøy ◽  
Stian Skjong ◽  
Vilmar Æsøy ◽  
...  

Here, we present the concept of an open virtual prototyping framework (VPF) for maritime systems and operations that enables its users to develop reusable component or subsystem models, and combine them in full-system simulations for prototyping, verification, training, and performance studies. This framework consists of a set of guidelines for model coupling, high-level and low-level coupling interfaces to guarantee interoperability, a full-system simulation software, and example models and demonstrators. We discuss the requirements for such a framework, address the challenges and the possibilities in fulfilling them, and aim to give a list of best practices for modular and efficient virtual prototyping and full-system simulation. The context of our work is within maritime systems and operations, but the issues and solutions we present here are general enough to be of interest to a much broader audience, both industrial and scientific.


Universe ◽  
2021 ◽  
Vol 7 (7) ◽  
pp. 218
Author(s):  
Iuri La Rosa ◽  
Pia Astone ◽  
Sabrina D’Antonio ◽  
Sergio Frasca ◽  
Paola Leaci ◽  
...  

We present a new approach to searching for continuous gravitational waves (CWs) emitted by isolated rotating neutron stars, using the high parallel computing efficiency and computational power of modern Graphics Processing Units (GPUs). Specifically, this paper describes the porting of one of the algorithms used to search for CW signals, the so-called FrequencyHough transform, to the TensorFlow framework. The new code has been fully tested and its performance on GPUs has been compared to that on a multicore CPU system of the same class, showing a factor of 10 speed-up. This demonstrates that GPU programming with the general-purpose libraries of a high-level programming language (here, those of the TensorFlow framework) can significantly improve the performance of data analysis, opening new perspectives on wide-parameter searches for CWs.


2012 ◽  
pp. 658-676
Author(s):  
Fabio Garzia ◽  
Roberto Airoldi ◽  
Jari Nurmi

This paper describes two general-purpose architectures targeted at Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of an FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution, but the MP-SoC provides an easier programming interface that is based entirely on the C language. The authors’ approach shows that implementing a programmable architecture on an FPGA and then programming it using a high-level software language is a viable alternative to designing a dedicated hardware block with a hardware description language (HDL) and mapping it onto an FPGA.


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Ilia Lebedev ◽  
Christopher Fletcher ◽  
Shaoyi Cheng ◽  
James Martin ◽  
Austin Doupnik ◽  
...  

We present a highly productive approach to hardware design based on a many-core microarchitectural template used to implement compute-bound applications expressed in a high-level data-parallel language such as OpenCL. The template is customized on a per-application basis via a range of high-level parameters such as the interconnect topology or processing element architecture. The key benefits of this approach are that it (i) allows programmers to express parallelism through an API defined in a high-level programming language, (ii) supports coarse-grained multithreading and fine-grained threading while permitting bit-level resource control, and (iii) reduces the effort required to repurpose the system for different algorithms or different applications. We compare template-driven design to both full-custom and programmable approaches by studying implementations of a compute-bound data-parallel Bayesian graph inference algorithm across several candidate platforms. Specifically, we examine a range of template-based implementations on both FPGA and ASIC platforms and compare each against full custom designs. Throughout this study, we use a general-purpose graphics processing unit (GPGPU) implementation as a performance and area baseline. We show that our approach, similar in productivity to programmable approaches such as GPGPU applications, yields implementations with performance approaching that of full-custom designs on both FPGA and ASIC platforms.


Author(s):  
Graham Pullan

Engineers are acquiring data at an ever-increasing rate: data from computational design studies; measurement data from manufacturing processes, development tests, and products in service; contemporary data and legacy data. In this paper, two recommendations are made to allow engineers to make better use of these expanding databases. First, we should build on the hierarchical nature of our data; we can navigate and filter the database using high-level descriptors such as design specifications and performance metrics, and then request comparative plots of detailed data such as line, contour and surface plots. Second, we can speed up the rate at which we learn from data by making the visualisations dynamic; in so doing, we enable virtual experiments to be performed that highlight connections between input parameters, output metrics and physical mechanisms. The embodiment of these two principles in the open source project, dbslice, is described. Three example applications (an aerodynamic design study for a compressor stator; the application of machine learning to aid navigation of large databases; and visualisation of a database of snapshots from an unsteady simulation) are presented. In each case, the hierarchical data and dynamic visualisations allow the user to explore the database and experience the connections and patterns within it. By making use of our data to interactively navigate existing and new design spaces in this way, engineers can accelerate their response to the challenges of future products.

