Linnea

2021 ◽  
Vol 47 (3) ◽  
pp. 1-26
Author(s):  
Henrik Barthels ◽  
Christos Psarras ◽  
Paolo Bientinesi

The translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for matrix computations (e.g., Matlab, Eigen) internally use optimized kernels such as those provided by BLAS and LAPACK; however, their translation algorithms are often too simplistic and thus lead to a suboptimal use of said kernels, resulting in significant performance losses. To combine the productivity offered by high-level languages with the performance of low-level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem; as output, it returns an efficient sequence of calls to high-performance kernels. Linnea uses a custom best-first search algorithm to find a first solution in less than a second, and increasingly better solutions when given more time. In 125 test problems, the code generated by Linnea almost always outperforms Matlab, Julia, Eigen, and Armadillo, with speedups up to and exceeding 10×.
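To make the gap concrete, the sketch below contrasts a simplistic operator-by-operator translation with the kind of kernel sequence a generator like Linnea can derive, using NumPy/SciPy as stand-ins for BLAS/LAPACK. The least-squares expression and all names are illustrative only, not Linnea's actual input or output.

```python
# Illustrative only: the kind of rewrite Linnea automates, shown with
# NumPy/SciPy as stand-ins for BLAS/LAPACK kernels.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500))
b = rng.standard_normal(2000)

# Naive translation of x = (A^T A)^{-1} A^T b, as a simplistic
# high-level-language mapping might evaluate it: an explicit inverse
# and one kernel call per operator.
x_naive = np.linalg.inv(A.T @ A) @ (A.T @ b)

# A kernel sequence closer to what a code generator can derive:
# form the Gram matrix once (SYRK-like), then one Cholesky
# factorization plus triangular solves (POTRF + POTRS).
G = A.T @ A
x_fast = cho_solve(cho_factor(G, lower=True), A.T @ b)

assert np.allclose(x_naive, x_fast)
```

The second variant avoids the explicit inverse entirely, which is both faster and numerically safer; choosing such a sequence automatically is exactly the search problem Linnea addresses.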

Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it remains extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks onto shared resources at runtime. This work addresses the problem by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration, and static off-line partitioning to allow a more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches while varying system design parameters. A comprehensive suite of real high-performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach provides on average 2.9× and 2.3× higher system throughput for compute- and mixed-intensity tasks, respectively, but 0.2× lower throughput for memory-intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended with a novel scheduling scheme that enhances the temporal utilisation of resources under the proposed approach. Additional results for large queues of mixed-intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.
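As a rough illustration of the static-partitioning-plus-scheduling idea (not the paper's actual algorithm), the toy Python sketch below dispatches a queue of heterogeneous tasks onto fixed, off-line-sized FPGA slots. Slot areas, task parameters, and the greedy policy are all invented for illustration; the paper's design space exploration and scheduler are considerably more involved.

```python
# Toy sketch: the FPGA is split off-line into fixed slots, and queued
# tasks are dispatched greedily to the earliest-free slot that is large
# enough to hold them.
import heapq

SLOTS = [(0, 0.5), (1, 0.3), (2, 0.2)]  # (slot id, fraction of FPGA area)

def schedule(tasks):
    """tasks: iterable of (name, area_needed, runtime) -> [(name, slot, start)]."""
    free_at = [(0.0, sid, area) for sid, area in SLOTS]  # (free time, id, area)
    heapq.heapify(free_at)
    plan = []
    for name, area, runtime in tasks:
        too_small = []
        # Pop earliest-free slots until one can hold the task.
        while free_at and free_at[0][2] < area:
            too_small.append(heapq.heappop(free_at))
        if free_at:
            t, sid, sarea = heapq.heappop(free_at)
            plan.append((name, sid, t))
            heapq.heappush(free_at, (t + runtime, sid, sarea))
        for slot in too_small:  # return skipped slots to the pool
            heapq.heappush(free_at, slot)
    return plan

print(schedule([("fft", 0.4, 2.0), ("gemm", 0.5, 3.0), ("stencil", 0.15, 1.0)]))
```

Even this toy version shows why temporal scheduling matters: "gemm" must wait for the only slot large enough, while "stencil" starts immediately in a small slot.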


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for the scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated on the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm for m-QAM demodulators, using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of 8-connected clusters from image processing. The real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with results obtained in previous work using the same data from the same experimental setup, but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN reduces the number of operations by 43% to 75%, depending on the symbol's position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores, compared to the original KNN algorithm.
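The comparison-reduction idea can be sketched as follows. This is a simplified reading of the 8-connected-cluster rule, with ideal constellation centroids standing in for the paper's trained KNN clusters; the function and parameter names are invented for illustration.

```python
# Hedged sketch: instead of measuring the distance from a received sample
# to every 16-QAM cluster centroid, quantize to a grid cell first and
# compare only against that cell and its 8-connected neighbours.
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])  # ideal 16-QAM I/Q levels

def demodulate(sample, centroids):
    """centroids: 4x4 complex array of cluster centres, indexed by (I, Q) level."""
    # Coarse hard decision: nearest ideal level on each axis.
    i = int(np.argmin(np.abs(LEVELS - sample.real)))
    q = int(np.argmin(np.abs(LEVELS - sample.imag)))
    # Search only the 8-connected neighbourhood of cell (i, q):
    # at most 9 distance computations instead of 16.
    best, best_d = (i, q), np.inf
    for di in (-1, 0, 1):
        for dq in (-1, 0, 1):
            ni, nq = i + di, q + dq
            if 0 <= ni < 4 and 0 <= nq < 4:
                d = abs(sample - centroids[ni, nq])
                if d < best_d:
                    best, best_d = (ni, nq), d
    return best  # grid indices of the decided constellation point

centroids = LEVELS[:, None] + 1j * LEVELS[None, :]  # ideal, undistorted grid
print(demodulate(0.8 - 2.6j, centroids))
```

In this toy version an interior symbol needs at most 9 of the 16 distance computations (about a 44% saving) and a corner symbol only 4 (a 75% saving), consistent with the 43-75% range reported above.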


Retos ◽  
2015 ◽  
pp. 42-47
Author(s):  
Angeles Filgueira Perez

In this study, we sought to approximate the ideal profile of the high-performance athletics coach. In the world of elite sport, the coach must act as teacher, technician, and leader, which makes it difficult to delimit his or her competencies (knowledge, skills, and personal qualities). The main motivation of this work is therefore to define the role the coach should play in the physical, technical, tactical, psychological, and moral preparation of elite athletes. To this end, the target population comprised the high-performance coaches who were active at the time of the investigation, since we were interested in studying the question from their own perspective. The data collected form part of a wider investigation, conducted by survey, for which we designed a 78-question questionnaire covering three topics: the profile of the coach, the profile of the athlete, and the figure of the coach educator in the Practicum. Here we focus on the first topic; analysis of the information obtained allows us to conclude that almost all athletics coaches consider it necessary to master the technical and methodological aspects with precision. They also consider that their professional ethics should be governed by the principles of autonomy and beneficence, so that values such as honesty and justice must prevail in the performance of their duties.


Author(s):  
JOST BERTHOLD ◽  
HANS-WOLFGANG LOIDL ◽  
KEVIN HAMMOND

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared memory model at the programmer, implementation, and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage collection, synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing runtime system design aims to provide a portable and high-level shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends the existing, well-developed support for shared-memory parallelism provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PArallEl shAred Nothing abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.


2014 ◽  
Vol 556-562 ◽  
pp. 3949-3951
Author(s):  
Jian Xin Zhu

Data mining is a technique that aims to analyze and understand large volumes of source data and to reveal the knowledge hidden in them. It has been viewed as an important evolution in information processing. It has attracted growing attention from researchers and businesses because of the wide availability of huge amounts of data and the imminent need to turn such data into valuable information. Over the past decade and more, the concepts and techniques of data mining have been developed, and some of them have been discussed at higher levels in recent years. Data mining involves an integration of techniques from databases, artificial intelligence, machine learning, statistics, knowledge engineering, object-oriented methods, information retrieval, high-performance computing, and visualization. Essentially, data mining is a high-level analysis technology with a strong orientation towards business profit. Unlike OLTP applications, data mining provides in-depth data analysis and support for business decisions.


2020 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum ◽  
Brian Vinter

So far, our pure Python, primitive equation ocean model Veros has been about 50% slower than a corresponding Fortran implementation. But recent benchmarks show that, thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree on CPU, are within reach. On GPU, we find that the same model code can reach a 2-5 times higher energy efficiency compared to a traditional Fortran model. We thus propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on the one hand, and leverages modern developments in high-performance computing on the other. We discuss what there is to gain from building models in high-level programming languages, what we have achieved, and what the future holds for us and the modelling community.
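A minimal sketch of the backend-swapping pattern behind such portability, assuming a NumPy-compatible GPU library such as JAX: the same array code runs unchanged on CPU or GPU by exchanging the array module. The diffusion kernel is an invented stand-in for an actual Veros routine.

```python
# Model kernels are written once against a NumPy-compatible API and
# executed on CPU or GPU by swapping the array module.
import numpy as np
try:
    import jax.numpy as jnp  # GPU-capable drop-in, if installed
    xp = jnp
except ImportError:
    xp = np

def diffuse(field, kappa, dt):
    """One explicit diffusion step on a 2-D field (periodic boundaries)."""
    lap = (xp.roll(field, 1, 0) + xp.roll(field, -1, 0)
           + xp.roll(field, 1, 1) + xp.roll(field, -1, 1)
           - 4.0 * field)
    return field + dt * kappa * lap

field = xp.asarray(np.random.default_rng(0).standard_normal((256, 256)))
for _ in range(100):
    field = diffuse(field, kappa=0.1, dt=0.1)
```

Because the kernel only uses operations common to both modules, no model code changes when moving between backends; that is the property the benchmarks above exploit.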

