An Auto-Programming Approach to Vulkan

We propose a novel high-level approach to GPU software development with the Vulkan API. Our goal is to speed up development and performance studies of complex GPU algorithms, which are difficult and laborious in Vulkan because of the large number of hardware features and low-level details. The proposed approach uses auto-programming to translate ordinary C++ into an optimized Vulkan implementation with automatic shader generation, resource binding and fine-grained barrier placement. Our model is not general-purpose programming, but it is extensible and customer-focused. From a single C++ input, our tool can generate multiple different Vulkan implementations of an algorithm for different cases or types of hardware. For example, we automatically detect a reduction in the C++ source code and then generate several variants of parallel reduction on the GPU: optimized for different warp sizes, with or without atomics, and with or without subgroup operations. Another example is GPU ray tracing applications, for which we can generate different variants: a pure software implementation in a compute shader, one using hardware-accelerated ray queries, and one using the full RTX pipeline. The goal of our work is to increase the productivity of developers who are forced to use Vulkan because their software requires various hardware features, but who still care about the cross-platform portability of that software and want to debug their algorithm logic on the CPU. Therefore, we assume that the user will take the generated code and integrate it with hand-written Vulkan code.
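To make the reduction example concrete, here is a minimal Python sketch of the tree-shaped pairwise reduction that parallel GPU reductions are commonly built on. It is not the authors' tool or its generated Vulkan code; the name `tree_reduce` and the sequential simulation of GPU threads are assumptions made purely for illustration.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Reduce `values` with an associative `op` using a pairwise tree.

    Each pass of the while loop stands in for one parallel step: on a GPU,
    every iteration of the inner for loop would run on its own thread, and
    the passes would be separated by a barrier.
    """
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    n = len(data)
    stride = 1
    while stride < n:
        # Combine element i with its partner `stride` positions to the right.
        for i in range(0, n - stride, 2 * stride):
            data[i] = op(data[i], data[i + stride])
        stride *= 2
    # After about log2(n) passes the result sits in slot 0.
    return data[0]


if __name__ == "__main__":
    print(tree_reduce(range(1, 101)))      # sum of 1..100
    print(tree_reduce([3, 1, 4, 1, 5], max))
```

The sequential loop keeps the CPU version easy to debug, which matches the abstract's stated motivation; the variants it lists (atomics, subgroup operations, warp-size tuning) would change how each pass is executed, not the result.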

Balanced Sparsity for Efficient DNN Inference on GPU

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015676 ◽

2019 ◽

Vol 33 ◽

pp. 5676-5683 ◽

Cited By ~ 3

Author(s):

Zhuliang Yao ◽

Shijie Cao ◽

Wencong Xiao ◽

Chen Zhang ◽

Lanshun Nie

Keyword(s):

Deep Neural Networks ◽

General Purpose ◽

Coarse Grained ◽

Efficient Computation ◽

Model Accuracy ◽

Sparse Model ◽

Model Inference ◽

Fine Grained ◽

Practical Inference ◽

Speed Up

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity, which prunes or regularizes consecutive weights for efficient computation. But this method often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, to achieve high model accuracy efficiently on commercial hardware. Our approach adapts to the highly parallel nature of GPUs, showing great potential for sparsity in the wide deployment of deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPU, while retaining the same high model accuracy as fine-grained sparsity.
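The abstract does not spell out the pruning rule. The sketch below assumes the block-balanced scheme usually associated with this approach: each weight row is split into equal-sized blocks, and every block keeps the same number of largest-magnitude weights. The function name `balanced_prune` and its parameters are hypothetical and chosen only for illustration.

```python
def balanced_prune(row, block_size, keep):
    """Zero all but the `keep` largest-magnitude weights in each block.

    Assumption: `row` is one row of a weight matrix, split into consecutive
    blocks of `block_size` entries. Every block ends up with the same number
    of non-zeros, so parallel workers would get equal amounts of work.
    """
    if block_size <= 0 or len(row) % block_size != 0:
        raise ValueError("row length must be a positive multiple of block_size")
    if not 0 <= keep <= block_size:
        raise ValueError("keep must be between 0 and block_size")
    pruned = []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        # Positions of the `keep` largest |w| within this block.
        order = sorted(range(block_size), key=lambda i: abs(block[i]), reverse=True)
        kept = set(order[:keep])
        pruned.extend(w if i in kept else 0.0 for i, w in enumerate(block))
    return pruned


if __name__ == "__main__":
    weights = [0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]
    print(balanced_prune(weights, block_size=4, keep=2))
```

Pruning is still decided weight by weight, so the sparsity stays fine-grained, while the equal non-zero count per block is what gives the regular workload that suits a GPU.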

MRtrix3: A fast, flexible and open software framework for medical image processing and visualisation

10.1101/551739 ◽

2019 ◽

Cited By ~ 16

Author(s):

J-Donald Tournier ◽

Robert Smith ◽

David Raffelt ◽

Rami Tabbara ◽

Thijs Dhollander ◽

...

Keyword(s):

Image Processing ◽

Medical Image ◽

Medical Image Processing ◽

Image Data ◽

Data Access ◽

General Purpose ◽

Computational Performance ◽

Cross Platform ◽

High Level ◽

New Applications

MRtrix3 is an open-source, cross-platform software package for medical image processing, analysis and visualization, with a particular emphasis on the investigation of the brain using diffusion MRI. It is implemented using a fast, modular and flexible general-purpose code framework for image data access and manipulation, enabling efficient development of new applications, whilst retaining high computational performance and a consistent command-line interface between applications. In this article, we provide a high-level overview of the features of the MRtrix3 framework and general-purpose image processing applications provided with the software.

Explaining continuous improvement

10.1093/oso/9780198852261.003.0010 ◽

2021 ◽

pp. 197-210

Author(s):

John Toner ◽

Barbara Gail Montero ◽

Aidan Moran

Keyword(s):

Continuous Improvement ◽

Focused Attention ◽

Fine Grained ◽

Skill Maintenance ◽

And Performance ◽

Situational Demands ◽

High Level ◽

Conscious Processes ◽

Automated Procedures

The final chapter synthesizes the arguments presented over the course of the book by suggesting that skill execution continues to be governed by conscious processes even after performers have attained a high level of expertise. It argues that skill-focused attention is necessary if experts are to eschew proceduralization and react flexibly to ‘crises’ and fine-grained changes in situational demands. In doing so, it discusses the role played by conscious control, reflection, and bodily awareness in maintaining performance proficiency. It suggests that skill maintenance and continuous improvement are underpinned by the use of both automated procedures (acknowledging that these are inherently active and flexible) and metacognitive knowledge. The chapter concludes by briefly considering how skill-focused attention needs to be applied in both training and performance contexts in order to facilitate continuous improvement.

An SDN-Based Connectivity Control System for Wi-Fi Devices

Wireless Communications and Mobile Computing ◽

10.1155/2018/9359878 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Duc-Thang Nguyen ◽

Taehong Kim

Keyword(s):

Control System ◽

Service Providers ◽

Connection Management ◽

Fine Grained ◽

Connectivity Information ◽

Wireless Environment ◽

Remote Services ◽

Iot Devices ◽

And Performance ◽

High Level

In recent years, the prevalence of Wi-Fi-enabled devices such as smartphones, smart appliances, and various sensors has increased. As most IoT devices lack a display or a keypad owing to their tiny size, it is difficult to set connectivity information such as the service set identifier (SSID) and password without help from external devices such as smartphones. Moreover, it is much more complex to apply advanced connectivity options such as SSID hiding, MAC ID filtering, and Wi-Fi Protected Access (WPA) to these devices. Thus, we need a new Wi-Fi network management system that not only facilitates client access operations but also provides a high-level authentication procedure. In this paper, we introduce a remote connectivity control system for Wi-Fi devices based on software-defined networking (SDN) in a wireless environment. The main contributions of the proposed system are twofold: (i) it enables the network owner or administrator to manage and approve connection requests from Wi-Fi devices through remote services, which is essential for easy connection management across diverse IoT devices; (ii) it also allows fine-grained access control at the device level through remote control. We describe the architecture of SDN-based remote connectivity control of Wi-Fi devices. While verifying the feasibility and performance of the proposed system, we discuss how it can benefit both service providers and users.

Pathêt: di Atas Kertas dan di Atas Panggung Wayang Kulit dalam Perspektif Teori Praktik

RESITAL JURNAL SENI PERTUNJUKAN ◽

10.24821/resital.v10i2.486 ◽

2013 ◽

Vol 10 (2) ◽

Author(s):

Wisma Nugraha Christianto ◽

Timbul Haryono ◽

G. R. Lono L. Simatupang ◽

Soetarno

Keyword(s):

Performance Studies ◽

Social Dynamics ◽

Social Reality ◽

Problem Solution ◽

Puppet Theater ◽

And Performance ◽

High Level ◽

Wayang Kulit ◽

Symbolic Formation ◽

Practical Level

Pathêt: on Paper and on the Stage of Shadow Puppet Theater from the Perspective of Practice Theory. Research on karawitan that focuses on pathêt has produced conceptual results that revise and clarify the results of previous research conducted under different paradigms. Some of it is even repetitive. Meanwhile, at the practical level and in social development, the paradigms of the performing arts have shifted and become more dynamic. If research and research paradigms on this matter cannot accommodate these changes, the results will be static. Some pathêt research has inspired critical developments in karawitan and performance studies. The results of that research implied a temporal context, which demands revisiting that context and its relation to non-theoretical, partial, and practical matters and to social reality. A scientific construction based on human activity as practice is needed to build a scientific atmosphere that balances social dynamics, performance, and the harmony of gending karawitan in relation to the use of pathêt as part of a coherent structure. The social reality of the wayang kulit stage is close to the aesthetic and artistic feeling for karawitan music. Social structure and performance structure at present do not always have a linear relation to the symbolic structure and aesthetics of the past. Symbolic and aesthetic constructs have the capacity for symbolic formation and a high level of musical appreciation. Therefore, Pierre Bourdieu’s critique of practice serves as an alternative impetus in response to critical pretensions toward problem solving and the reception of pathêt at the conceptual and practical levels.

Distributed Co-simulation of Maritime Systems and Operations

Journal of Offshore Mechanics and Arctic Engineering ◽

10.1115/1.4040473 ◽

2018 ◽

Vol 141 (1) ◽

Cited By ~ 7

Author(s):

Severin Sadjina ◽

Lars Tandle Kyllingstad ◽

Martin Rindarøy ◽

Stian Skjong ◽

Vilmar Æsøy ◽

...

Keyword(s):

Best Practices ◽

Performance Studies ◽

System Simulation ◽

Virtual Prototyping ◽

Simulation Software ◽

Model Coupling ◽

Full System ◽

Reusable Component ◽

And Performance ◽

High Level

Here, we present the concept of an open virtual prototyping framework (VPF) for maritime systems and operations that enables its users to develop reusable component or subsystem models and combine them in full-system simulations for prototyping, verification, training, and performance studies. This framework consists of a set of guidelines for model coupling, high-level and low-level coupling interfaces to guarantee interoperability, full-system simulation software, and example models and demonstrators. We discuss the requirements for such a framework, address the challenges and the possibilities in fulfilling them, and aim to give a list of best practices for modular and efficient virtual prototyping and full-system simulation. The context of our work is maritime systems and operations, but the issues and solutions we present here are general enough to be of interest to a much broader audience, both industrial and scientific.

Continuous Gravitational-Wave Data Analysis with General Purpose Computing on Graphic Processing Units

Universe ◽

10.3390/universe7070218 ◽

2021 ◽

Vol 7 (7) ◽

pp. 218

Author(s):

Iuri La Rosa ◽

Pia Astone ◽

Sabrina D’Antonio ◽

Sergio Frasca ◽

Paola Leaci ◽

...

Keyword(s):

Data Analysis ◽

General Purpose ◽

Gpu Programming ◽

Computational Power ◽

Graphic Processing Units ◽

New Approach ◽

Multicore System ◽

Speed Up ◽

High Level ◽

Graphic Processing

We present a new approach to searching for continuous gravitational waves (CWs) emitted by isolated rotating neutron stars, using the high parallel computing efficiency and computational power of modern Graphic Processing Units (GPUs). Specifically, this paper describes the porting of one of the algorithms used to search for CW signals, the so-called FrequencyHough transform, to the TensorFlow framework. The new code has been fully tested, and its performance on GPUs has been compared to that on a CPU multicore system of the same class, showing a factor of 10 speed-up. This demonstrates that GPU programming with the general-purpose libraries of a high-level programming language (here, those of the TensorFlow framework) can significantly improve the performance of data analysis, opening new perspectives on wide-parameter searches for CWs.

Implementation of FFT on General-Purpose Architectures for FPGA

Computer Engineering ◽

10.4018/978-1-61350-456-7.ch310 ◽

2012 ◽

pp. 658-676

Author(s):

Fabio Garzia ◽

Roberto Airoldi ◽

Jari Nurmi

Keyword(s):

General Purpose ◽

Reference Architecture ◽

Processor Core ◽

General Purpose Processor ◽

Programmable Architecture ◽

Reconfigurable Array ◽

Field Programmable ◽

Speed Up ◽

On Chip ◽

High Level

This paper describes two general-purpose architectures targeted at Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of an FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution, but the MP-SoC provides an easier programming interface that is based entirely on the C language. The authors’ approach shows that implementing a programmable architecture on an FPGA and then programming it using a high-level software language is a viable alternative to designing a dedicated hardware block with a hardware description language (HDL) and mapping it onto an FPGA.

Exploring Many-Core Design Templates for FPGAs and ASICs

International Journal of Reconfigurable Computing ◽

10.1155/2012/439141 ◽

2012 ◽

Vol 2012 ◽

pp. 1-15 ◽

Cited By ~ 4

Author(s):

Ilia Lebedev ◽

Christopher Fletcher ◽

Shaoyi Cheng ◽

James Martin ◽

Austin Doupnik ◽

...

Keyword(s):

Graphics Processing Unit ◽

General Purpose ◽

Coarse Grained ◽

Processing Unit ◽

Fine Grained ◽

Data Parallel ◽

Level Data ◽

Graph Inference ◽

High Level ◽

Many Core

We present a highly productive approach to hardware design based on a many-core microarchitectural template used to implement compute-bound applications expressed in a high-level data-parallel language such as OpenCL. The template is customized on a per-application basis via a range of high-level parameters such as the interconnect topology or processing element architecture. The key benefits of this approach are that it (i) allows programmers to express parallelism through an API defined in a high-level programming language, (ii) supports coarse-grained multithreading and fine-grained threading while permitting bit-level resource control, and (iii) reduces the effort required to repurpose the system for different algorithms or different applications. We compare template-driven design to both full-custom and programmable approaches by studying implementations of a compute-bound data-parallel Bayesian graph inference algorithm across several candidate platforms. Specifically, we examine a range of template-based implementations on both FPGA and ASIC platforms and compare each against full custom designs. Throughout this study, we use a general-purpose graphics processing unit (GPGPU) implementation as a performance and area baseline. We show that our approach, similar in productivity to programmable approaches such as GPGPU applications, yields implementations with performance approaching that of full-custom designs on both FPGA and ASIC platforms.

Making use of our data

Journal of the Global Power and Propulsion Society ◽

10.33737/jgpps/134645 ◽

2021 ◽

pp. 1-13

Author(s):

Graham Pullan

Keyword(s):

Performance Metrics ◽

Computational Design ◽

Virtual Experiments ◽

Physical Mechanisms ◽

Large Databases ◽

Speed Up ◽

Unsteady Simulation ◽

Hierarchical Nature ◽

And Performance ◽

High Level

Engineers are acquiring data at an ever-increasing rate: data from computational design studies; measurement data from manufacturing processes, development tests, and products in service; contemporary data and legacy data. In this paper, two recommendations are made to allow engineers to make better use of these expanding databases. First, we should build on the hierarchical nature of our data; we can navigate and filter the database using high-level descriptors such as design specifications and performance metrics, and then request comparative plots of detailed data such as line, contour and surface plots. Second, we can speed up the rate at which we learn from data by making the visualisations dynamic; in so doing, we enable virtual experiments to be performed that highlight connections between input parameters, output metrics and physical mechanisms. The embodiment of these two principles in the open-source project dbslice is described. Three example applications (an aerodynamic design study for a compressor stator; the application of machine learning to aid navigation of large databases; and visualisation of a database of snapshots from an unsteady simulation) are presented. In each case, the hierarchical data and dynamic visualisations allow the user to explore the database and experience the connections and patterns within it. By Making Use of Our Data to interactively navigate existing and new design spaces in this way, engineers can accelerate their response to the challenges of future products.
