A Code Generator for High-Performance Tensor Contractions on GPUs

Author(s):  
Jinsung Kim ◽  
Aravind Sukumaran-Rajam ◽  
Vineeth Thumma ◽  
Sriram Krishnamoorthy ◽  
Ajay Panyala ◽  
...  
2014 ◽  
Vol 24 (03) ◽  
pp. 1441004 ◽  
Author(s):  
Stefan Kronawitter ◽  
Holger Stengel ◽  
Georg Hager ◽  
Christian Lengauer

Our aim is to apply program transformations to stencil codes in order to yield the highest possible performance. We recognize memory bandwidth as a major limitation in stencil code performance. We conducted a study in which we applied optimizing transformations to two Jacobi smoother kernels: one 3D 1st-order 7-point stencil and one 3D 3rd-order 19-point stencil. To obtain high performance, the optimizations have to be customized for the execution platform at hand. We illustrate this by experiments on two consumer and two server architectures. We also verified the need for complex optimizations with the help of the Execution-Cache-Memory performance model. A code generator with knowledge about stencil codes and execution platforms should be able to apply our transformations automatically. We are working towards such a generator in project ExaStencils.


2021 ◽  
Vol 47 (3) ◽  
pp. 1-26
Author(s):  
Henrik Barthels ◽  
Christos Psarras ◽  
Paolo Bientinesi

The translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for matrix computations (e.g., Matlab, Eigen) internally use optimized kernels such as those provided by BLAS and LAPACK; however, their translation algorithms are often too simplistic and thus lead to a suboptimal use of said kernels, resulting in significant performance losses. To combine the productivity offered by high-level languages, and the performance of low-level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem; as output, it returns an efficient sequence of calls to high-performance kernels. Linnea uses a custom best-first search algorithm to find a first solution in less than a second, and increasingly better solutions when given more time. In 125 test problems, the code generated by Linnea almost always outperforms Matlab, Julia, Eigen, and Armadillo, with speedups up to and exceeding 10×.


2014 ◽  
Vol 2014 ◽  
pp. 1-23 ◽  
Author(s):  
Songqing Yue ◽  
Jeff Gray

Metaprogramming has shown much promise for improving the quality of software by offering programming language techniques to address issues of modularity, reusability, maintainability, and extensibility. Thus far, the power of metaprogramming has not been explored deeply in the area of high performance computing (HPC). There is a vast body of legacy code written in Fortran running throughout the HPC community. In order to facilitate software maintenance and evolution in HPC systems, we introduce a DSL that can be used to perform source-to-source translation of Fortran programs by providing a higher level of abstraction for specifying program transformations. The underlying transformations are actually carried out through a metaobject protocol (MOP) and a code generator is responsible for translating a SPOT program to the corresponding MOP code. The design focus of the framework is to automate program transformations through techniques of code generation, so that developers only need to specify desired transformations while being oblivious to the details about how the transformations are performed. The paper provides a general motivation for the approach and explains its design and implementation. In addition, this paper presents case studies that illustrate the potential of our approach to improve code modularity, maintainability, and productivity.


Author(s):  
Alfredo Cristóbal-Salas ◽  
Juan D. Santiago-Domínguez ◽  
Bardo Santiago-Vicente ◽  
Marco Antonio Ramírez-Salinas ◽  
Luis Alfonso Villa-Vargas ◽  
...  

2016 ◽  
Vol 80 ◽  
pp. 108-118 ◽  
Author(s):  
A. Abdelfattah ◽  
M. Baboulin ◽  
V. Dobrev ◽  
J. Dongarra ◽  
C. Earl ◽  
...  

Author(s):  
A. V. Crewe ◽  
M. Isaacson ◽  
D. Johnson

A double focusing magnetic spectrometer has been constructed for use with a field emission electron gun scanning microscope in order to study the electron energy loss mechanism in thin specimens. It is of the uniform field sector type with curved pole pieces. The shape of the pole pieces is determined by requiring that all particles be focused to a point at the image slit (point 1). The resultant shape gives perfect focusing in the median plane (Fig. 1) and first order focusing in the vertical plane (Fig. 2).


Author(s):  
N. Yoshimura ◽  
K. Shirota ◽  
T. Etoh

One of the most important requirements for a high-performance EM, especially an analytical EM using a fine beam probe, is to prevent specimen contamination by providing a clean high vacuum in the vicinity of the specimen. However, in almost all commercial EMs, the pressure in the vicinity of the specimen under observation is usually more than ten times higher than the pressure measured at the punping line. The EM column inevitably requires the use of greased Viton O-rings for fine movement, and specimens and films need to be exchanged frequently and several attachments may also be exchanged. For these reasons, a high speed pumping system, as well as a clean vacuum system, is now required. A newly developed electron microscope, the JEM-100CX features clean high vacuum in the vicinity of the specimen, realized by the use of a CASCADE type diffusion pump system which has been essentially improved over its predeces- sorD employed on the JEM-100C.


Author(s):  
John W. Coleman

In the design engineering of high performance electromagnetic lenses, the direct conversion of electron optical design data into drawings for reliable hardware is oftentimes difficult, especially in terms of how to mount parts to each other, how to tolerance dimensions, and how to specify finishes. An answer to this is in the use of magnetostatic analytics, corresponding to boundary conditions for the optical design. With such models, the magnetostatic force on a test pole along the axis may be examined, and in this way one may obtain priority listings for holding dimensions, relieving stresses, etc..The development of magnetostatic models most easily proceeds from the derivation of scalar potentials of separate geometric elements. These potentials can then be conbined at will because of the superposition characteristic of conservative force fields.


Author(s):  
J W Steeds ◽  
R Vincent

We review the analytical powers which will become more widely available as medium voltage (200-300kV) TEMs with facilities for CBED on a nanometre scale come onto the market. Of course, high performance cold field emission STEMs have now been in operation for about twenty years, but it is only in relatively few laboratories that special modification has permitted the performance of CBED experiments. Most notable amongst these pioneering projects is the work in Arizona by Cowley and Spence and, more recently, that in Cambridge by Rodenburg and McMullan.There are a large number of potential advantages of a high intensity, small diameter, focussed probe. We discuss first the advantages for probes larger than the projected unit cell of the crystal under investigation. In this situation we are able to perform CBED on local regions of good crystallinity. Zone axis patterns often contain information which is very sensitive to thickness changes as small as 5nm. In conventional CBED, with a lOnm source, it is very likely that the information will be degraded by thickness averaging within the illuminated area.


Author(s):  
Klaus-Ruediger Peters

A new generation of high performance field emission scanning electron microscopes (FSEM) is now commercially available (JEOL 890, Hitachi S 900, ISI OS 130-F) characterized by an "in lens" position of the specimen where probe diameters are reduced and signal collection improved. Additionally, low voltage operation is extended to 1 kV. Compared to the first generation of FSEM (JE0L JSM 30, Hitachi S 800), which utilized a specimen position below the final lens, specimen size had to be reduced but useful magnification could be impressively increased in both low (1-4 kV) and high (5-40 kV) voltage operation, i.e. from 50,000 to 200,000 and 250,000 to 1,000,000 x respectively.At high accelerating voltage and magnification, contrasts on biological specimens are well characterized1 and are produced by the entering probe electrons in the outmost surface layer within -vl nm depth. Backscattered electrons produce only a background signal. Under these conditions (FIG. 1) image quality is similar to conventional TEM (FIG. 2) and only limited at magnifications >1,000,000 x by probe size (0.5 nm) or non-localization effects (%0.5 nm).


Sign in / Sign up

Export Citation Format

Share Document