LOMA: Fast Auto-Scheduling on DNN Accelerators through Loop-Order-based Memory Allocation

Author(s):  
Arne Symons ◽  
Linyan Mei ◽  
Marian Verhelst
Keyword(s):  
Author(s):  
Joseph F. Boudreau ◽  
Eric S. Swanson

While there is no such thing as a “typical” C++ class, several common syntactical constructs lend themselves to extremely widespread use and must be mastered by C++ programmers. To motivate the discussion of software design at the level of the C++ class, examples from computer science and optics are introduced. Important syntactical elements such as constructors, destructors, copy constructors, assignment operators, cast operators, and const qualifiers, together with function overloading, operator overloading, and dynamic memory allocation, are discussed. These concepts, illustrated with examples from physics, are presented and explained. Further examples from optical and quantum mechanical problems are left to the exercises. This chapter and its exercises give the reader sufficient information to begin developing his or her own classes and to experiment with class design through trial and error.


2021 ◽  
Vol 2021 (3) ◽  
Author(s):  
Neelima Agarwal ◽  
Lorenzo Magnea ◽  
Sourav Pal ◽  
Anurag Tripathi

Abstract Correlators of Wilson-line operators in non-abelian gauge theories are known to exponentiate, and their logarithms can be organised in terms of collections of Feynman diagrams called webs. In [1] we introduced the concept of Cweb, or correlator web, which is a set of skeleton diagrams built with connected gluon correlators, and we computed the mixing matrices for all Cwebs connecting four or five Wilson lines at four loops. Here we complete the evaluation of four-loop mixing matrices, presenting the results for all Cwebs connecting two and three Wilson lines. We observe that the conjectured column sum rule is obeyed by all the mixing matrices that appear at four loops. We also show how low-dimensional mixing matrices can be uniquely determined from their known combinatorial properties, and provide some all-order results for selected classes of mixing matrices. Our results complete the required colour building blocks for the calculation of the soft anomalous dimension matrix at four-loop order.


2021 ◽  
Vol 2021 (4) ◽  
Author(s):  
Martin Bauer ◽  
Matthias Neubert ◽  
Sophie Renner ◽  
Marvin Schnubel ◽  
Andrea Thamm

Abstract Axions and axion-like particles (ALPs) are well-motivated low-energy relics of high-energy extensions of the Standard Model, which interact with the known particles through higher-dimensional operators suppressed by the mass scale Λ of the new-physics sector. Starting from the most general dimension-5 interactions, we discuss in detail the evolution of the ALP couplings from the new-physics scale to energies at and below the scale of electroweak symmetry breaking. We derive the relevant anomalous dimensions at two-loop order in gauge couplings and one-loop order in Yukawa interactions, carefully considering the treatment of a redundant operator involving an ALP coupling to the Higgs current. We account for one-loop (and partially two-loop) matching contributions at the weak scale, including in particular flavor-changing effects. The relations between different equivalent forms of the effective Lagrangian are discussed in detail. We also construct the effective chiral Lagrangian for an ALP interacting with photons and light pseudoscalar mesons, pointing out important differences with the corresponding Lagrangian for the QCD axion.


2021 ◽  
Vol 17 (2) ◽  
pp. 1-45
Author(s):  
Cheng Pan ◽  
Xiaolin Wang ◽  
Yingwei Luo ◽  
Zhenlin Wang

Due to the large data volumes and low-latency requirements of modern web services, the use of an in-memory key-value (KV) cache (e.g., Redis or Memcached) often becomes an inevitable choice. The in-memory cache holds hot data, reduces request latency, and alleviates the load on background databases. Inheriting from the traditional hardware cache design, many existing KV cache systems still use recency-based cache replacement algorithms, e.g., least recently used or its approximations. However, the diversity of miss penalty distinguishes a KV cache from a hardware cache. Inadequate consideration of penalty can substantially compromise space utilization and request service time. KV accesses also demonstrate locality, which needs to be coordinated with miss penalty to guide cache management. In this article, we first discuss how to enhance the existing cache model, the Average Eviction Time model, so that it can adapt to modeling a KV cache. After that, we apply the model to Redis and propose pRedis, Penalty- and Locality-aware Memory Allocation in Redis, which synthesizes data locality and miss penalty, in a quantitative manner, to guide memory allocation and replacement in Redis. At the same time, we also explore the diurnal behavior of a KV store and exploit long-term reuse. We replace the original passive eviction mechanism with an automatic dump/load mechanism, to smooth the transition between access peaks and valleys. Our evaluation shows that pRedis effectively reduces the average and tail access latency with minimal time and space overhead. For both real-world and synthetic workloads, our approach delivers an average latency reduction of 14.0% to 52.3% over a state-of-the-art penalty-aware cache management scheme, Hyperbolic Caching (HC), and shows more quantitative predictability of performance. Moreover, we can obtain even lower average latency (by 1.1% to 5.5%) when dynamically switching policies between pRedis and HC.
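
To make the contrast between recency-based and penalty-aware replacement concrete, the following minimal sketch scores each cached entry by its miss penalty, hit count, and time since last access, and evicts the lowest-scoring entry. It is a generic illustration only: it is neither the pRedis algorithm nor Hyperbolic Caching, and the class name `PenaltyAwareCache`, the `penalty` parameter, and the scoring formula are all assumptions made for this example.

```python
class PenaltyAwareCache:
    """Toy cache that evicts the entry with the lowest penalty-weighted score.

    Hypothetical illustration; the scoring rule below is an assumption,
    not the policy used by pRedis or Hyperbolic Caching.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}   # key -> [value, penalty, hits, last_access]
        self.clock = 0    # logical time, advanced on every operation

    def _score(self, entry):
        _, penalty, hits, last_access = entry
        age = self.clock - last_access + 1  # always >= 1
        # Expensive-to-refetch, frequently hit, recently used entries score high.
        return penalty * hits / age

    def get(self, key):
        self.clock += 1
        entry = self.store.get(key)
        if entry is None:
            return None  # miss: the caller would pay the miss penalty
        entry[2] += 1
        entry[3] = self.clock
        return entry[0]

    def put(self, key, value, penalty):
        self.clock += 1
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda k: self._score(self.store[k]))
            del self.store[victim]
        # Inserting (or overwriting) resets the hit count to 1.
        self.store[key] = [value, penalty, 1, self.clock]


cache = PenaltyAwareCache(capacity=2)
cache.put("b", "slow-query-result", penalty=100.0)  # costly to recompute
cache.put("a", "cheap-result", penalty=1.0)
cache.put("c", "another-result", penalty=1.0)       # forces one eviction
```

In this usage, a pure least-recently-used policy would evict `"b"` because it is the oldest entry, whereas the penalty-weighted score keeps it and evicts the cheap entry `"a"` instead.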


Symmetry ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 956
Author(s):  
Dafne Carolina Arias-Perdomo ◽  
Adriano Cherchiglia ◽  
Brigitte Hiller ◽  
Marcos Sampaio

Quantum Field Theory, as the keystone of particle physics, has offered great insights into deciphering the core of Nature. Despite its striking success, by adhering to local interactions, Quantum Field Theory suffers from the appearance of divergent quantities in intermediate steps of the calculation, which creates the need for some regularization/renormalization prescription. As an alternative to traditional methods, based on the analytic extension of space–time dimension, frameworks that stay in the physical dimension have emerged; Implicit Regularization is one among them. We briefly review the method, aiming to illustrate how Implicit Regularization complies with the BPHZ theorem, which implies that it respects unitarity and locality to arbitrary loop order. We also pedagogically discuss how the method complies with gauge symmetry using one- and two-loop examples in QED and QCD.


2021 ◽  
Vol 13 (4) ◽  
pp. 559
Author(s):  
Milto Miltiadou ◽  
Neill D. F. Campbell ◽  
Darren Cosker ◽  
Michael G. Grant

In this paper, we investigate the performance of six data structures for managing voxelised full-waveform airborne LiDAR data during 3D polygonal model creation. While full-waveform LiDAR data have been available for over a decade, extraction of peak points remains the most widely used approach to interpreting them. The increased information stored within the waveform data makes interpretation and handling difficult. It is, therefore, important to research which data structures are more appropriate for storing and interpreting the data. The data structures are tested in terms of time efficiency and memory consumption at run-time and are the following: (1) 1D-Array, which guarantees coherent memory allocation; (2) Voxel Hashing, which uses a hash table for storing the intensity values; (3) Octree; (4) Integral Volumes, which allows finding the sum of any cuboid area in constant time; (5) Octree Max/Min, which is an upgraded octree; and (6) Integral Octree, which is proposed here as an attempt to combine the benefits of octrees and Integral Volumes. We show that Integral Volumes is the most time-efficient data structure but requires the most memory. Furthermore, 1D-Array and Integral Volumes require the allocation of coherent space in memory including the empty voxels, while Voxel Hashing and the octree-related data structures do not need to allocate memory for empty voxels. The latter data structures therefore allocate less memory, as the tests conducted confirm. To sum up, there is a need to investigate how LiDAR data are stored in memory. Each tested data structure has different benefits and downsides; therefore, each application should be examined individually.
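
The constant-time cuboid sum attributed to Integral Volumes can be illustrated with a 3D prefix-sum table. The sketch below is a minimal pure-Python version, assuming the voxel grid is a dense nested list `vox[x][y][z]`; the function names `build_integral_volume` and `cuboid_sum` are hypothetical and are not taken from the paper's implementation.

```python
def build_integral_volume(vox):
    """Return a zero-padded prefix-sum table I of shape (X+1, Y+1, Z+1).

    I[x][y][z] holds the sum of vox[i][j][k] for all i < x, j < y, k < z.
    """
    X, Y, Z = len(vox), len(vox[0]), len(vox[0][0])
    I = [[[0] * (Z + 1) for _ in range(Y + 1)] for _ in range(X + 1)]
    for x in range(1, X + 1):
        for y in range(1, Y + 1):
            for z in range(1, Z + 1):
                # 3D inclusion-exclusion over the seven already-filled neighbours.
                I[x][y][z] = (
                    vox[x - 1][y - 1][z - 1]
                    + I[x - 1][y][z] + I[x][y - 1][z] + I[x][y][z - 1]
                    - I[x - 1][y - 1][z] - I[x - 1][y][z - 1] - I[x][y - 1][z - 1]
                    + I[x - 1][y - 1][z - 1]
                )
    return I


def cuboid_sum(I, lo, hi):
    """Sum of voxels in the half-open cuboid lo <= (x, y, z) < hi.

    Uses exactly eight table lookups, independent of the cuboid size.
    """
    x0, y0, z0 = lo
    x1, y1, z1 = hi
    return (
        I[x1][y1][z1]
        - I[x0][y1][z1] - I[x1][y0][z1] - I[x1][y1][z0]
        + I[x0][y0][z1] + I[x0][y1][z0] + I[x1][y0][z0]
        - I[x0][y0][z0]
    )


# Small synthetic grid (values are arbitrary test data, not LiDAR intensities).
vox = [[[x + 2 * y + 3 * z for z in range(5)] for y in range(4)] for x in range(3)]
table = build_integral_volume(vox)
total = cuboid_sum(table, (1, 0, 2), (3, 2, 5))
```

The table has one entry per voxel (plus padding) regardless of how many voxels are empty, which is consistent with the memory cost noted in the abstract.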


2020 ◽  
Vol 98 (Supplement_3) ◽  
pp. 41-42
Author(s):  
B Victor Oribamise ◽  
Lauren L Hulsman Hanna

Abstract Without appropriate relationships present in a given population, identifying dominance effects in the expression of desirable traits is challenging. Including non-additive effects is desirable to increase accuracy of breeding values. There is currently no user-friendly tool package to investigate genetic relatedness in large pedigrees. The objective was to develop and implement efficient algorithms in R to calculate and visualize measures of relatedness (e.g., sibling and family structure, numerator relationship matrices) for large pedigrees. Comparisons to current R packages (Table 1) are also made. Functions to assign animals to families, summarize sibling counts, calculate the numerator relationship matrix (NRM), and summarize the NRM by groups were created, providing a comprehensive toolkit (Sibs package) not found in other packages. Pedigrees of various sizes (n = 20, 4,035, 120,000 and 132,833) were used to test functionality and compare to current packages. All runs were conducted on a Windows-based computer with 8 GB of RAM and a 2.5 GHz Intel Core i7 processor. Other packages had no significant difference in runtime when constructing the NRM for small pedigrees (n = 20) compared to Sibs (0 to 0.05 s difference). However, packages such as ggroups, AGHmatrix, and pedigree were 10 to 15 min slower than Sibs for a 4,035-individual pedigree. Packages nadiv and pedigreemm competed with Sibs (0.30 to 60 s slower than Sibs), but no package besides Sibs was able to complete the 132,833-individual pedigree due to memory allocation issues in R. The nadiv package was closest with a pedigree of 120,000 individuals, but took 37 min to complete (13 min slower than Sibs). The Sibs package also provides easier input of pedigrees and is more encompassing of such relatedness measures than other packages (Table 1). Furthermore, it can provide an option to utilize other packages such as GCA for connectedness calculations when using large pedigrees.
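
For readers unfamiliar with the numerator relationship matrix, the sketch below builds one with the standard tabular (recursive) method. It is written in Python for illustration and is not the Sibs implementation, which is in R; the function name and the pedigree encoding (a list of `(sire, dam)` index pairs, with `None` for an unknown parent and parents listed before their offspring) are assumptions made for this example.

```python
def numerator_relationship_matrix(pedigree):
    """Build the NRM with the tabular method.

    pedigree[j] = (sire, dam), where each is the index of an earlier
    individual or None if unknown. Parents must precede offspring.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for j, (sire, dam) in enumerate(pedigree):
        for i in range(j):
            # Relationship of j with an earlier animal i is the average of
            # i's relationships with j's known parents.
            a = 0.0
            if sire is not None:
                a += 0.5 * A[i][sire]
            if dam is not None:
                a += 0.5 * A[i][dam]
            A[i][j] = A[j][i] = a
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        if sire is not None and dam is not None:
            A[j][j] = 1.0 + 0.5 * A[sire][dam]
        else:
            A[j][j] = 1.0
    return A


# Hypothetical five-animal pedigree: two founders, two full sibs, and
# one offspring of the full-sib mating.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = numerator_relationship_matrix(pedigree)
```

This sketch stores the full dense matrix, so it is suitable only for small pedigrees; memory grows with the square of the number of individuals.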


2015 ◽  
Vol 2015 (11) ◽  
Author(s):  
Ioan Ghisoiu ◽  
Jan Möller ◽  
York Schröder
