Linnea

2021 ◽  
Vol 47 (3) ◽  
pp. 1-26
Author(s):  
Henrik Barthels ◽  
Christos Psarras ◽  
Paolo Bientinesi

The translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for matrix computations (e.g., Matlab, Eigen) internally use optimized kernels such as those provided by BLAS and LAPACK; however, their translation algorithms are often too simplistic and thus lead to a suboptimal use of said kernels, resulting in significant performance losses. To combine the productivity offered by high-level languages with the performance of low-level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem; as output, it returns an efficient sequence of calls to high-performance kernels. Linnea uses a custom best-first search algorithm to find a first solution in less than a second, and increasingly better solutions when given more time. In 125 test problems, the code generated by Linnea almost always outperforms Matlab, Julia, Eigen, and Armadillo, with speedups up to and exceeding 10×.
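To make the gap concrete, the sketch below contrasts a simplistic operator-by-operator translation with the kind of kernel sequence a generator like Linnea can derive, using NumPy/SciPy as stand-ins for BLAS/LAPACK. The least-squares expression and all names are illustrative only, not Linnea's actual input or output.

```python
# Illustrative only: the kind of rewrite Linnea automates, shown with
# NumPy/SciPy as stand-ins for BLAS/LAPACK kernels.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500))
b = rng.standard_normal(2000)

# Naive translation of x = (A^T A)^{-1} A^T b, as a simplistic
# high-level-language mapping might evaluate it: an explicit inverse
# and one kernel call per operator.
x_naive = np.linalg.inv(A.T @ A) @ (A.T @ b)

# A kernel sequence closer to what a code generator can derive:
# form the Gram matrix once (SYRK-like), then one Cholesky
# factorization plus triangular solves (POTRF + POTRS).
G = A.T @ A
x_fast = cho_solve(cho_factor(G, lower=True), A.T @ b)

assert np.allclose(x_naive, x_fast)
```

The second variant avoids the explicit inverse entirely, which is both faster and numerically safer; choosing such a sequence automatically is exactly the search problem Linnea addresses.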

Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it remains extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks onto shared resources at runtime. This work addresses the problem by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration, and static off-line partitioning to allow a more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches while varying system design parameters. A comprehensive suite of real high-performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach provides on average 2.9× and 2.3× higher system throughput for compute- and mixed-intensity tasks, respectively, but 0.2× lower throughput for memory-intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended with a novel scheduling scheme that enhances the temporal utilisation of resources under the proposed approach. Additional results for large queues of mixed-intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.
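As a rough illustration of the static-partitioning-plus-scheduling idea (not the paper's actual algorithm), the toy Python sketch below dispatches a queue of heterogeneous tasks onto fixed, off-line-sized FPGA slots. Slot areas, task parameters, and the greedy policy are all invented for illustration; the paper's design space exploration and scheduler are considerably more involved.

```python
# Toy sketch: the FPGA is split off-line into fixed slots, and queued
# tasks are dispatched greedily to the earliest-free slot that is large
# enough to hold them.
import heapq

SLOTS = [(0, 0.5), (1, 0.3), (2, 0.2)]  # (slot id, fraction of FPGA area)

def schedule(tasks):
    """tasks: iterable of (name, area_needed, runtime) -> [(name, slot, start)]."""
    free_at = [(0.0, sid, area) for sid, area in SLOTS]  # (free time, id, area)
    heapq.heapify(free_at)
    plan = []
    for name, area, runtime in tasks:
        too_small = []
        # Pop earliest-free slots until one can hold the task.
        while free_at and free_at[0][2] < area:
            too_small.append(heapq.heappop(free_at))
        if free_at:
            t, sid, sarea = heapq.heappop(free_at)
            plan.append((name, sid, t))
            heapq.heappush(free_at, (t + runtime, sid, sarea))
        for slot in too_small:  # return skipped slots to the pool
            heapq.heappush(free_at, slot)
    return plan

print(schedule([("fft", 0.4, 2.0), ("gemm", 0.5, 3.0), ("stencil", 0.15, 1.0)]))
```

Even this toy version shows why temporal scheduling matters: "gemm" must wait for the only slot large enough, while "stencil" starts immediately in a small slot.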


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for the scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated on the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm for m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of 8-connected clusters from image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with results obtained in previous work using the same data from the same experimental setup, but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN reduces the number of operations by 43% to 75%, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores, compared to the original KNN algorithm.
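The comparison-reduction idea can be sketched as follows. This is a simplified reading of the 8-connected-cluster rule, with ideal constellation centroids standing in for the paper's trained KNN clusters; the function and parameter names are invented for illustration.

```python
# Hedged sketch: instead of measuring the distance from a received sample
# to every 16-QAM cluster centroid, quantize to a grid cell first and
# compare only against that cell and its 8-connected neighbours.
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])  # ideal 16-QAM I/Q levels

def demodulate(sample, centroids):
    """centroids: 4x4 complex array of cluster centres, indexed by (I, Q) level."""
    # Coarse hard decision: nearest ideal level on each axis.
    i = int(np.argmin(np.abs(LEVELS - sample.real)))
    q = int(np.argmin(np.abs(LEVELS - sample.imag)))
    # Search only the 8-connected neighbourhood of cell (i, q):
    # at most 9 distance computations instead of 16.
    best, best_d = (i, q), np.inf
    for di in (-1, 0, 1):
        for dq in (-1, 0, 1):
            ni, nq = i + di, q + dq
            if 0 <= ni < 4 and 0 <= nq < 4:
                d = abs(sample - centroids[ni, nq])
                if d < best_d:
                    best, best_d = (ni, nq), d
    return best  # grid indices of the decided constellation point

centroids = LEVELS[:, None] + 1j * LEVELS[None, :]  # ideal, undistorted grid
print(demodulate(0.8 - 2.6j, centroids))
```

In this toy version an interior symbol needs at most 9 of the 16 distance computations (about a 44% saving) and a corner symbol only 4 (a 75% saving), consistent with the 43-75% range reported above.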


Retos ◽  
2015 ◽  
pp. 42-47
Author(s):  
Angeles Filgueira Perez

In this study, we sought to approximate the ideal profile of the high-performance athletics coach. In the world of elite sport, the coach must act as teacher, technician, and leader, which makes it difficult to delimit his or her competencies (knowledge, skills, and personal qualities). The main motivation of this work is therefore to define the role the coach should play in the physical, technical, tactical, psychological, and moral preparation of elite athletes. To this end, the target population comprised the high-performance coaches who were active at the time of the investigation, since we were interested in studying the question from their own perspective. The data collected form part of a wider investigation, conducted by survey, for which we designed a 78-question questionnaire covering three topics: the profile of the coach, the profile of the athlete, and the figure of the coach educator in the Practicum. Here we focus on the first topic; analysis of the information obtained allows us to conclude that almost all athletics coaches consider it necessary to master the technical and methodological aspects with precision. They also consider that their professional ethics should be governed by the principles of autonomy and beneficence, so that values such as honesty and justice must prevail in the performance of their duties.


Author(s):  
JOST BERTHOLD ◽  
HANS-WOLFGANG LOIDL ◽  
KEVIN HAMMOND

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation, and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection, synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends the existing, well-developed support for shared-memory parallelism provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.


2014 ◽  
Vol 556-562 ◽  
pp. 3949-3951
Author(s):  
Jian Xin Zhu

Data mining is a technique that aims to analyze and understand large volumes of source data and to reveal the knowledge hidden in them. It has been viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because of the wide availability of huge amounts of data and the imminent need to turn such data into valuable information. Over the past decade and more, the concepts and techniques of data mining have been developed, and some of them have been discussed at higher levels in recent years. Data mining involves an integration of techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation towards business profit. Unlike OLTP applications, data mining provides in-depth data analysis and support for business decisions.


2020 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum ◽  
Brian Vinter

So far, our pure Python, primitive equation ocean model Veros has been about 50% slower than a corresponding Fortran implementation. But recent benchmarks show that, thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree on CPU, are within reach. On GPU, we find that the same model code can reach a 2-5 times higher energy efficiency compared to a traditional Fortran model. We thus propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and leverages modern developments in high-performance computing on the other. We discuss what there is to gain from building models in high-level programming languages, what we have achieved, and what the future holds for us and the modelling community.
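A minimal sketch of the backend-swapping pattern behind such portability, assuming a NumPy-compatible GPU library such as JAX: the same array code runs unchanged on CPU or GPU by exchanging the array module. The diffusion kernel is an invented stand-in for an actual Veros routine.

```python
# Model kernels are written once against a NumPy-compatible API and
# executed on CPU or GPU by swapping the array module.
import numpy as np
try:
    import jax.numpy as jnp  # GPU-capable drop-in, if installed
    xp = jnp
except ImportError:
    xp = np

def diffuse(field, kappa, dt):
    """One explicit diffusion step on a 2-D field (periodic boundaries)."""
    lap = (xp.roll(field, 1, 0) + xp.roll(field, -1, 0)
           + xp.roll(field, 1, 1) + xp.roll(field, -1, 1)
           - 4.0 * field)
    return field + dt * kappa * lap

field = xp.asarray(np.random.default_rng(0).standard_normal((256, 256)))
for _ in range(100):
    field = diffuse(field, kappa=0.1, dt=0.1)
```

Because the kernel only uses operations common to both modules, no model code changes when moving between backends; that is the property the benchmarks above exploit.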

