Automatic Pipelining and Vectorization of Scientific Code for FPGAs

There is a large body of legacy scientific code in use today that could benefit from execution on accelerator devices like GPUs and FPGAs. Manual translation of such legacy code into device-specific parallel code requires significant effort and is a major obstacle to wider FPGA adoption. We are developing an automated optimizing compiler, TyTra, to overcome this obstacle. The TyTra flow aims to compile legacy Fortran code automatically for FPGA-based acceleration, while applying suitable optimizations. We present the flow with a focus on two key optimizations: automatic pipelining and vectorization. Our compiler frontend extracts patterns from legacy Fortran code that can be pipelined and vectorized. The backend first creates fine- and coarse-grained pipelines and then automatically vectorizes both the memory access and the datapath based on a cost model, generating a working OpenCL-HDL hybrid solution for FPGA targets on the Amazon cloud. Our results show up to 4.2× performance improvement over baseline OpenCL code.
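The abstract does not describe the cost model itself. As a rough illustration of how a cost model can drive the choice of vector width, the following Python sketch picks the widest width whose datapath fits a resource budget and whose memory traffic fits the available bandwidth. Every name and number here (`KernelEstimate`, `luts_per_lane`, `choose_vector_width`, the candidate widths) is a hypothetical assumption for illustration, not part of TyTra.

```python
from dataclasses import dataclass


@dataclass
class KernelEstimate:
    """Hypothetical per-kernel estimates; not taken from the TyTra paper."""
    luts_per_lane: int    # assumed logic cost of one datapath lane
    bytes_per_item: int   # assumed bytes moved per processed item


def choose_vector_width(est, lut_budget, mem_bytes_per_cycle,
                        candidates=(1, 2, 4, 8, 16)):
    """Return the widest candidate width that fits both constraints.

    Falls back to 1 (scalar) when no candidate satisfies the constraints.
    """
    best = 1
    for width in candidates:
        fits_logic = width * est.luts_per_lane <= lut_budget
        fits_memory = width * est.bytes_per_item <= mem_bytes_per_cycle
        if fits_logic and fits_memory and width > best:
            best = width
    return best


if __name__ == "__main__":
    est = KernelEstimate(luts_per_lane=1000, bytes_per_item=8)
    print(choose_vector_width(est, lut_budget=10000, mem_bytes_per_cycle=64))
```

In this toy model the width is limited by whichever resource runs out first, which is why the memory access and the datapath have to be considered together.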


Reducing Memory Access Conflicts with Loop Transformation and Data Reuse on Coarse-grained Reconfigurable Architecture

10.23919/date51398.2021.9473971 ◽

2021 ◽

Author(s):

Yuge Chen ◽

Zhongyuan Zhao ◽

Jianfei Jiang ◽

Guanghui He ◽

Zhigang Mao ◽

...

Keyword(s):

Data Reuse ◽

Reconfigurable Architecture ◽

Memory Access ◽

Coarse Grained ◽

Loop Transformation


A mixed transaction cost model for coarse grained multi-column partitioning in a shared-nothing database machine

Information Systems ◽

10.1016/0306-4379(94)90010-8 ◽

1994 ◽

Vol 19 (2) ◽

pp. 193

Keyword(s):

Transaction Cost ◽

Cost Model ◽

Coarse Grained ◽

Database Machine


Membrane-mediated forces can stabilize tubular assemblies of I-Bar proteins

10.1101/2020.06.10.144527 ◽

2020 ◽

Cited By ~ 1

Author(s):

Z. Jarin ◽

A. J. Pak ◽

P. Bassereau ◽

G. A. Voth

Keyword(s):

Computational Models ◽

Large Body ◽

Membrane Curvature ◽

Coarse Grained ◽

Membrane Remodeling ◽

Membrane Composition ◽

Membrane Structures ◽

Bar Domain ◽

Cellular Environment ◽

Bar Domains

Abstract Collective action by Inverse-BAR (I-BAR) domains drives micron-scale membrane remodeling. The macroscopic curvature sensing and generation behavior of I-BAR domains is well characterized, and computational models have suggested various mechanisms on simplified membrane systems, but there remain missing connections between the complex environment of the cell and the models proposed thus far. Here, we show a connection between the role of protein curvature and lipid clustering in the stabilization of large membrane deformations. We find that lipid clustering provides a directional membrane-mediated interaction between membrane-bound I-BAR domains. Lipid clusters stabilize I-BAR domain aggregates that would not arise through membrane fluctuation-based or curvature-based interactions. Inside membrane protrusions, lipid cluster-mediated interaction draws long side-by-side aggregates together, resulting in more cylindrical protrusions as opposed to bulbous, irregularly shaped protrusions.

Statement of Significance

Membrane remodeling occurs throughout the cell and is crucial to proper cellular function. In the cellular environment, I-BAR proteins are responsible for sensing membrane curvature and initiating the formation of protrusions outward from the cell. Additionally, there is a large body of evidence that I-BAR domains are sufficient to reshape the membrane on scales much larger than any single domain. The mechanism by which I-BAR domains can remodel the membrane is uncertain. However, experiments show that membrane composition, most notably negatively charged lipids like PIP2, plays a role in the onset of tubulation. Using coarse-grained models, we show that I-BAR domains can cluster negatively charged lipids and that clustered PIP2-like membrane structures facilitate a directional membrane-mediated interaction between I-BAR domains.


The physics of Empty Liquids: from Patchy particles to Water

Reports on Progress in Physics ◽

10.1088/1361-6633/ac42d9 ◽

2021 ◽

Author(s):

John Russo ◽

Fabio Leoni ◽

Fausto Martelli ◽

Francesco Sciortino

Keyword(s):

Phase Diagram ◽

Liquid Phase ◽

Random Network ◽

Large Body ◽

The Other ◽

Coarse Grained ◽

Pure Substance ◽

Complex Phase ◽

Precise Control ◽

Patchy Particles

Abstract Empty liquids represent a wide class of materials whose constituents arrange in a random network through reversible bonds. Many key insights on the physical properties of empty liquids have originated almost independently from the study of colloidal patchy particles on one side, and a large body of theoretical and experimental research on water on the other side. Patchy particles represent a family of coarse-grained potentials that allows for a precise control of both the geometric and the energetic aspects of bonding, while water has arguably the most complex phase diagram of any pure substance, and a puzzling amorphous phase behavior. It was only recently that the exchange of ideas from both fields has made it possible to solve long-standing problems and shed new light on the behavior of empty liquids. Here we highlight the connections between patchy particles and water, focusing on the modelling principles that make an empty liquid behave like water, including the factors that control the appearance of thermodynamic and dynamic anomalies, the possibility of liquid-liquid phase transitions, and the crystallization of open crystalline structures.


Cumulate mush hybridization by melt invasion: Evidence from compositionally-diverse amphiboles in ultramafic-mafic arc cumulates within the eastern Gangdese Batholith, southern Tibet

Journal of Petrology ◽

10.1093/petrology/egab073 ◽

2021 ◽

Author(s):

Wei Xu ◽

Di-Cheng Zhu ◽

Qing Wang ◽

Roberto F Weinberg ◽

Rui Wang ◽

...

Keyword(s):

Mineral Assemblage ◽

Large Body ◽

The Body ◽

Coarse Grained ◽

Southern Tibet ◽

Fine Grained ◽

Arc Magmas ◽

Rock Types ◽

Geochemical Study ◽

Host Rocks

Abstract Amphibole plays an important role in the petrogenesis and evolution of arc magmas, but its role is not completely understood yet. Here, a field, petrological, geochronological and geochemical study is carried out on ultramafic-mafic arc cumulates with textural and chemical heterogeneities and on associated host diorites from the eastern Gangdese Batholith, southern Tibet to explore the problem. The cumulates occur as a large body in diorite host-rocks. The core of the body consists of coarse-grained Cpx hornblendite with a porphyritic texture. Towards the contact with the host diorite, the coarse-grained Cpx hornblendite grades to relatively homogeneous fine-grained melagabbro. Zircon U–Pb dating indicates they all crystallized at 200 ± 1 Ma. Textural features and whole-rock and mineral chemical data reveal that both the Cpx hornblendite and the melagabbro are mixtures of two different mineral assemblages that are not in equilibrium: (1) brown amphibole and its clinopyroxene inclusions; (2) matrix clinopyroxene + green amphibole + plagioclase + quartz + accessory phases. Clinopyroxene and brown amphibole from the first assemblage are enriched in middle rare earth elements (MREE) relative to light REE (LREE) and heavy REE (HREE), and are weakly depleted in Ti, whereas clinopyroxene and green amphibole from the second assemblage are characterized by LREE enrichment over MREE-HREE and more marked Sr and Ti depletion. The higher Mg#, MgO and Cr of the late-formed green amphibole than the early-formed brown amphibole suggest that the two assemblages are not on the same liquid line of descent. Given the close relations of the three rock types in the exposed crustal section, the cumulates are interpreted to have formed in an open system, in which an ultramafic cumulate body consisting of the first assemblage reacted with the host dioritic melt to form new clinopyroxene and amphibole of the second assemblage. 
The melt calculated to be in equilibrium with the first mineral assemblage resembles an average continental arc basalt, which is less evolved than the host dioritic melt responsible for the second mineral assemblage. On the basis of the whole-rock Sr–Nd–Hf isotopic similarity of the cumulates and a host diorite sample, we argue that the host diorites were formed through crystal fractionation from the parent melt of the first assemblage. Results of least-squares mass-balance calculations suggest that the quantities of the host dioritic melts involved in the generation of these modified cumulates vary from ~25% to ~44%. The presence of magmatic epidote in the host diorites and Al-in-Hb geobarometry indicate that the reaction that occurred when the dioritic melts percolated through the cumulate body took place at ~6 kbar. Both the brown and green amphiboles are enriched in MREE relative to HREE, and can impart residual melts with a strong geochemical signature of amphibole fractionation (low Dy/Yb). Thus, we conclude that fractional crystallization and melt-rock reaction are two mechanisms by which amphibole controls arc magma petrogenesis and evolution.


Memory access optimization in compilation for coarse-grained reconfigurable architectures

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/2003695.2003702 ◽

2011 ◽

Vol 16 (4) ◽

pp. 1-27 ◽

Cited By ~ 11

Author(s):

Yongjoo Kim ◽

Jongeun Lee ◽

Aviral Shrivastava ◽

Yunheung Paek

Keyword(s):

Memory Access ◽

Coarse Grained ◽

Reconfigurable Architectures


Menhir: An Environment for High Performance Matlab

Scientific Programming ◽

10.1155/1999/525690 ◽

1999 ◽

Vol 7 (3-4) ◽

pp. 303-312 ◽

Cited By ~ 3

Author(s):

Stéphane Chauveau ◽

François Bodin

Keyword(s):

High Performance ◽

Specification Language ◽

Target System ◽

System Description ◽

Fortran Code ◽

Compilation Process ◽

Parallel Code

In this paper we present Menhir, a compiler for generating sequential or parallel code from the Matlab language. The compiler has been designed in the context of using Matlab as a specification language. One of the major features of Menhir is its retargetability, which allows it to generate parallel and sequential C or Fortran code. We present the compilation process and the target system description for Menhir. Preliminary performance results are given and compared with MCC, the MathWorks Matlab compiler.


Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree

Computers ◽

10.3390/computers7040066 ◽

2018 ◽

Vol 7 (4) ◽

pp. 66

Author(s):

Iulia Știrb

Keyword(s):

Compiler Optimization ◽

Memory Access ◽

Task Parallelism ◽

Real Time Control ◽

Shared Resources ◽

Mapping Algorithm ◽

Time Control ◽

Parallel Code ◽

Thread Mapping ◽

Task Level

The paper presents a Non-Uniform Memory Access (NUMA)-aware compiler optimization for task-level parallel code. The optimization is based on the Non-Uniform Memory Access–Balanced Task and Loop Parallelism (NUMA-BTLP) algorithm (Ştirb, 2018). The algorithm determines the type of each thread in the source code through a static analysis of the code. After assigning a type to each thread, NUMA-BTLP (Ştirb, 2018) calls the NUMA-BTDM mapping algorithm (Ştirb, 2016), which uses the PThreads routine pthread_setaffinity_np to set the CPU affinities of the threads (i.e., thread-to-core associations) based on their type. The algorithms perform an improved thread mapping for NUMA systems by mapping threads that share data onto the same core(s), allowing fast access to L1 cache data. The paper shows that PThreads-based task-level parallel code optimized at compile time by NUMA-BTLP (Ştirb, 2018) and NUMA-BTDM (Ştirb, 2016) runs efficiently in both time and energy on NUMA systems. The results show that energy consumption is improved by up to 5% at the same execution time for one of the tested real benchmarks and by up to 15% for another benchmark running in an infinite loop. The algorithms can be used in real-time control systems, such as client/server-based applications, which require efficient access to shared resources. Most often, task parallelism is used in the implementation of the server and loop parallelism is used for the client.
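The idea of placing threads that share data on the same core can be illustrated with a small Python sketch. It is not the NUMA-BTLP or NUMA-BTDM algorithm, whose details the abstract does not give: it simply groups threads connected by data sharing (using a union-find structure) and assigns each group a core index in round-robin order. The function name `map_threads_to_cores` and its inputs are hypothetical.

```python
def map_threads_to_cores(num_threads, sharing_pairs, num_cores):
    """Assign a core index to each thread so that threads which share
    data (directly or transitively) end up on the same core.

    Illustrative only: a simple union-find grouping, not NUMA-BTLP/BTDM.
    """
    parent = list(range(num_threads))

    def find(x):
        # Follow parent links to the group representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair of threads that share data.
    for a, b in sharing_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Give each group one core, wrapping around if groups outnumber cores.
    core_of_group = {}
    mapping = {}
    for thread in range(num_threads):
        root = find(thread)
        if root not in core_of_group:
            core_of_group[root] = len(core_of_group) % num_cores
        mapping[thread] = core_of_group[root]
    return mapping


if __name__ == "__main__":
    # Threads 0, 1, 2 share data; threads 3 and 4 share data.
    print(map_threads_to_cores(5, [(0, 1), (1, 2), (3, 4)], num_cores=4))
```

In a PThreads program, the resulting core index for each thread would be the value passed in the CPU set given to pthread_setaffinity_np.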


A Variable Processor Cache Line Size Architecture

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8427.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 1724-1727

Keyword(s):

Energy Saving ◽

Performance Improvement ◽

Performance Parameter ◽

Memory Access ◽

Access Time ◽

Time Interval ◽

Cache Line ◽

Fixed Line ◽

Processor Caches ◽

A Performance

Processor caches have a fixed line size. A processor cache is defined by the tuple (C, k, L), where C is the capacity, k the associativity, and L the line size, and all three parameters have fixed values. Algorithms for varying the processor cache line size have been proposed in the literature. This paper proposes an algorithm that varies the cache line size based on the miss count of any application: the line size is increased or decreased according to the miss count in each time interval. The algorithm can be used when running any application. The SPEC2000 benchmarks are used to simulate the proposed algorithm for a single-level cache. The average memory access time is chosen as the performance parameter. A performance improvement of 12% is observed, with an energy saving of 18%, for the chosen parameters.
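The abstract does not state the rule used to change the line size, so the following Python sketch only illustrates the general shape of such a policy under assumed thresholds: the line size is doubled when the miss count in an interval is high and halved when it is low. The thresholds, size bounds, and the name `adjust_line_size` are hypothetical. The `amat` helper uses the standard definition of average memory access time for a single-level cache (hit time plus miss rate times miss penalty).

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single-level cache."""
    return hit_time + miss_rate * miss_penalty


def adjust_line_size(line_size, miss_count, high=1000, low=100,
                     min_size=16, max_size=256):
    """Pick the line size (in bytes) for the next time interval.

    Hypothetical policy: double the line size when the interval saw more
    than `high` misses, halve it when it saw fewer than `low`, and keep
    it otherwise. The result is clamped to [min_size, max_size].
    """
    if miss_count > high:
        return min(line_size * 2, max_size)
    if miss_count < low:
        return max(line_size // 2, min_size)
    return line_size


if __name__ == "__main__":
    size = 64
    for misses in (2000, 1500, 500, 50):
        size = adjust_line_size(size, misses)
        print(misses, size)
    print(amat(hit_time=2, miss_rate=0.25, miss_penalty=40))
```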


Optimizing Sample Design for Approximate Query Processing

International Journal of Knowledge-Based Organizations ◽

10.4018/ijkbo.2013100101 ◽

2013 ◽

Vol 3 (4) ◽

pp. 1-21

Author(s):

Philipp Rösch ◽

Wolfgang Lehner

Keyword(s):

Cost Model ◽

Large Body ◽

Management Systems ◽

Sample Design ◽

Approximate Query Processing ◽

Data Management Systems ◽

Optimal Sample ◽

Crucial Component ◽

Approximate Query ◽

The Cost

The rapid increase of data volumes makes sampling a crucial component of modern data management systems. Although there is a large body of work on database sampling, the problem of automatically determining the optimal sample for a given query has remained (almost) unaddressed. To tackle this problem, the authors propose a sample advisor based on a novel cost model. In addition to the basic advisor, which is primarily designed for advising samples for a few queries specified by an expert, the authors propose two extensions. The first extension enhances the applicability by utilizing recorded workload information and taking memory bounds into account. The second extension increases the effectiveness by merging samples in case of overlapping pieces of sample advice. For both extensions, the authors present exact and heuristic solutions. Within their evaluation, the authors analyze the properties of the cost model and demonstrate the effectiveness and the efficiency of the heuristic solutions with a variety of experiments.
