USING METAPROGRAMMING TO PARALLELIZE FUNCTIONAL SPECIFICATIONS

2002 ◽  
Vol 12 (02) ◽  
pp. 193-210 ◽  
Author(s):  
CHRISTOPH A. HERRMANN ◽  
CHRISTIAN LENGAUER

Metaprogramming is a paradigm for enhancing a general-purpose programming language with features catering for a special-purpose application domain, without the need to reimplement the language. In a staged compilation, the special-purpose features are translated and optimised by a domain-specific preprocessor, which hands over to the general-purpose compiler for translation of the domain-independent part of the program. The domain we work in is high-performance parallel computing. We use metaprogramming to enhance the functional language Haskell with features for the efficient, parallel implementation of certain computational patterns, called skeletons.
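
To make the skeleton idea concrete, here is a minimal Haskell sketch of a parallel map skeleton, written against the standard parallel package rather than the authors' metaprogramming framework; the name farm is ours (compile with GHC's -threaded flag):

    import Control.Parallel.Strategies (parMap, rseq)

    -- A "farm" skeleton: apply a worker function to all tasks in parallel.
    -- The parallel pattern is written once, in the skeleton; callers supply
    -- only the sequential worker.
    farm :: (a -> b) -> [a] -> [b]
    farm worker = parMap rseq worker

    main :: IO ()
    main = print (sum (farm (\x -> x * x) [1 .. 10000 :: Int]))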

2014 ◽  
Vol 24 (03) ◽  
pp. 1441003 ◽  
Author(s):  
Marcel Köster ◽  
Roland Leißa ◽  
Sebastian Hack ◽  
Richard Membarth ◽  
Philipp Slusallek

A straightforward implementation of an algorithm in a general-purpose programming language usually does not deliver peak performance: compilers often fail to automatically tune the code for hardware peculiarities such as the memory hierarchy or vector execution units. Manually tuning the code is error-prone and time-consuming, and it taints the code by exposing those peculiarities in the implementation. A popular way to avoid these problems is to implement the algorithm in a Domain-Specific Language (DSL); a DSL compiler can then automatically tune the code for the target platform. In this article we show how to embed a DSL for stencil codes in another language. In contrast to prior approaches, we use only a single language for this task, one which offers explicit control over code refinement. We use this control to specialize stencils for particular scenarios. Our results show that our specialized programs achieve performance competitive with hand-tuned CUDA programs while maintaining a convenient coding experience.
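
For readers unfamiliar with stencil codes: each output element is computed from a fixed neighbourhood of input elements. A toy sequential illustration of the pattern, in Haskell (our own sketch; the article's embedding uses staged code refinement in a single systems language):

    -- 3-point smoothing stencil over a 1-D array; for brevity the two
    -- boundary cells are dropped rather than handled specially.
    blur3 :: [Double] -> [Double]
    blur3 xs = zipWith3 weight xs (drop 1 xs) (drop 2 xs)
      where weight l c r = (l + 2 * c + r) / 4

    main :: IO ()
    main = print (blur3 [0, 0, 4, 4, 4, 0, 0])  -- [1.0,3.0,4.0,3.0,1.0]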


1999 ◽  
Vol 9 (5) ◽  
pp. 483-525 ◽  
Author(s):  
PETER THIEMANN

We present a general method to transform a compositional specification of a specializer for a functional programming language into a set of combinators that can be used to perform the same specialization more efficiently. The main transformation steps are the transition to higher-order abstract syntax and untagging. All transformation steps are proved correct. The resulting combinators can be implemented in any functional language, typed or untyped, pure or impure. They may also be viewed as a domain-specific language for metaprogramming. We demonstrate the generality of the method by applying it to several specializers of increasing strength, and its efficiency by comparison with a traditional specialization system based on self-application.
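
The first transformation step, the transition to higher-order abstract syntax (HOAS), can be illustrated with a small Haskell sketch (ours, not the paper's formulation): object-language binders are represented by metalanguage functions, so the evaluator needs no variable names, environments, or substitution machinery of its own.

    -- Object language in HOAS style: Lam carries a Haskell function.
    data Exp
      = Lit Int
      | Lam (Exp -> Exp)
      | App Exp Exp

    -- Beta reduction is inherited from Haskell's own function application.
    eval :: Exp -> Exp
    eval (App f a) = case eval f of
                       Lam body -> eval (body (eval a))
                       g        -> App g (eval a)
    eval e         = e

    main :: IO ()
    main = case eval (App (Lam (\x -> x)) (Lit 42)) of
             Lit n -> print n          -- prints 42
             _     -> putStrLn "stuck term"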


Author(s):  
Richard Schumi ◽  
Jun Sun

Abstract Compilers are error-prone due to their high complexity. They are relevant not only for general-purpose programming languages but also for many domain-specific languages, and bugs in compilers can put all compiled programs at risk. It is thus crucial that compilers are systematically tested, if not verified. Recently, a number of efforts have been made to formalise and standardise programming language semantics, which can be applied to verify the correctness of the respective compilers. In this work, we present a novel specification-based testing method named SpecTest to better utilise these semantics for testing. By applying an executable semantics as test oracle, SpecTest can discover deep semantic errors in compilers. Compared to existing approaches, SpecTest is built upon a novel test coverage criterion called semantic coverage, which brings together mutation testing and fuzzing to specifically target less-tested language features. We apply SpecTest to systematically test two compilers, namely the Java compiler and the Solidity compiler. SpecTest improves the semantic coverage of both compilers considerably and reveals multiple previously unknown bugs.
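
The oracle at the heart of such an approach can be sketched abstractly in Haskell (hypothetical types and names, ours; SpecTest's actual interfaces may differ): each generated test program is run both through the compiler under test and through the executable semantics, and any disagreement is flagged as a potential compiler bug.

    type Program = String
    type Output  = String

    -- Differential check of one test program: the executable semantics
    -- serves as the oracle for the compiler under test.
    checkWithOracle :: (Program -> IO Output)  -- compile and execute (under test)
                    -> (Program -> Output)     -- executable semantics (oracle)
                    -> Program
                    -> IO Bool
    checkWithOracle compileAndRun semantics prog = do
      actual <- compileAndRun prog
      pure (actual == semantics prog)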


2018 ◽  
Author(s):  
Neel Gala ◽  
Madhusudan G. S. ◽  
Paul George ◽  
Anmol Sahoo ◽  
Arjun Menon ◽  
...  

Processors have become ubiquitous in the appliances and machines we use, in both consumer and industrial settings. They range from extremely small, low-power microcontrollers (used in motor controls, home robots and appliances) to high-performance multi-core processors (used in servers and supercomputers). However, the growth of modern AI/ML environments (like Caffe [Jia et al. 2014] and TensorFlow [Abadi et al. 2016]) and the need for features like enhanced security have forced the industry to look beyond general-purpose solutions and towards domain-specific customizations. While a large number of companies today can develop custom ASICs (Application-Specific Integrated Circuits) and license specific silicon blocks from chip vendors to build customized SoCs (Systems on Chip), at the heart of every design is the processor and its associated hardware. To serve modern workloads better, these processors also need to be suitably customized, upgraded, re-designed and augmented. This requires that vendors and consumers have access to appropriate processor variants, along with the flexibility to make modifications and ship them at an affordable cost.


Author(s):  
Harry H. Cheng

Abstract The CH programming language, a high-performance C, is designed to be a superset of ANSI C. CH bridges the gap between ANSI C and FORTRAN: the current implementation encompasses almost all the programming capabilities of FORTRAN 77 and incorporates features of many other programming languages and software packages. Unlike other general-purpose programming languages, CH is designed to be especially suitable for applications in mechanical systems engineering. Because of our research interests, many programming features in CH have been implemented for design automation, although they are useful in other applications as well. In this paper we describe these new programming features for design automation, as currently implemented in CH, in comparison with ANSI C and FORTRAN 77.


2012 ◽  
Vol 17 (4) ◽  
pp. 207-216 ◽  
Author(s):  
Magdalena Szymczyk ◽  
Piotr Szymczyk

Abstract MATLAB is a technical computing language used in a variety of fields, such as control systems, image and signal processing, visualization, and financial process simulation, in an easy-to-use environment. MATLAB offers "toolboxes", specialized libraries for a variety of scientific domains, as well as a simplified interface to high-performance libraries (LAPACK, BLAS and FFTW, among others). MATLAB has also been extended with parallel computing capabilities through the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™. In this article we present some of the key features of parallel MATLAB applications, focusing on the use of GPU processors for image processing.


2011 ◽  
Vol 28 (1) ◽  
pp. 1-14 ◽  
Author(s):  
W. van Straten ◽  
M. Bailes

Abstract dspsr is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, dspsr is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.
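
For background (standard pulsar-astronomy material, not a result of this paper): cold-plasma dispersion delays a signal observed at frequency f by approximately

    Δt ≈ 4.15 × 10³ s × DM × (f / MHz)⁻²

where DM is the dispersion measure in pc cm⁻³. Incoherent dedispersion corrects this delay channel by channel, while the phase-coherent algorithms analysed in the paper deconvolve the dispersion transfer function from the raw voltage signal in the Fourier domain, removing the smearing within channels as well.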


Author(s):  
Breno A. de Melo Menezes ◽  
Nina Herrmann ◽  
Herbert Kuchen ◽  
Fernando Buarque de Lima Neto

Abstract Parallel implementations of swarm intelligence algorithms such as ant colony optimization (ACO) have been widely used to shorten execution time when solving complex optimization problems. When targeting a GPU environment, developing efficient parallel versions of such algorithms in CUDA can be a difficult and error-prone task even for experienced programmers. To overcome this issue, the parallel programming model of algorithmic skeletons simplifies parallel programs by abstracting from low-level features: common programming patterns (e.g. map, fold and zip) are defined once and later converted to efficient parallel code. In this paper, we show how algorithmic skeletons formulated in the domain-specific language Musket support the development of a parallel implementation of ACO, and how the result compares to a low-level implementation. Our experimental results show that Musket is well suited to the development of ACO. Besides making it easier for the programmer to deal with parallelization aspects, Musket generates high-performance code with execution times similar to those of the low-level implementations.
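
To illustrate how ACO decomposes into the skeletons named above, here is a language-neutral Haskell sketch of ours (not Musket syntax): tour construction is a map over the ants, pheromone evaporation a map over the trails, and selecting the iteration's best tour a fold-style reduction.

    import Data.List (minimumBy)
    import Data.Ord (comparing)

    type Tour = [Int]

    -- "map" skeleton: every ant constructs its tour independently.
    constructTours :: (Int -> Tour) -> [Int] -> [Tour]
    constructTours buildTour ants = map buildTour ants

    -- "map" skeleton: multiplicative pheromone evaporation on each trail.
    evaporate :: Double -> [Double] -> [Double]
    evaporate rho = map (* (1 - rho))

    -- "fold"-style reduction: pick the shortest tour of this iteration.
    bestTour :: (Tour -> Double) -> [Tour] -> Tour
    bestTour tourLength = minimumBy (comparing tourLength)

    main :: IO ()
    main = print (evaporate 0.1 [1.0, 2.0, 3.0])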

