TRANSFORMATION OF NESTED LOOPS WITH MODULO INDEXING TO AFFINE RECURRENCES

1994 ◽  
Vol 04 (03) ◽  
pp. 271-280 ◽  
Author(s):  
FLORIN BALASA ◽  
FRANK H.M. FRANSSEN ◽  
FRANCKY V.M. CATTHOOR ◽  
HUGO J. DE MAN

For multi-dimensional (M-D) signal and data processing systems, transformation of algorithmic specifications is a major instrument both in code optimization and code generation for parallelizing compilers and in control flow optimization as a preprocessor for architecture synthesis. State-of-the-art transformation techniques are limited to affine index expressions, which is not sufficient for many important applications in image, speech and numerical processing. In this paper, a novel transformation method is introduced, aimed at the subclass of algorithm specifications that use modulo expressions of affine functions to index M-D signals. The method relies extensively on the Hermite normal form. The transformation can be carried out in polynomial time using only integer arithmetic.
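
To make the idea concrete, the sketch below contrasts a one-dimensional modulo index with the affine recurrence it can be rewritten into; the parameters a, b and m are hypothetical, and the paper's actual method derives such rewrites systematically (and for the multi-dimensional case) via the Hermite normal form of the index matrix.

# Illustrative sketch only: one-dimensional case with hypothetical
# parameters a, b, m (assumed 0 <= a < m).

def modulo_indexed(n, a, b, m):
    """Original form: each iteration indexes a signal with (a*i + b) mod m."""
    return [(a * i + b) % m for i in range(n)]

def affine_recurrence(n, a, b, m):
    """Rewritten form: the index follows the affine recurrence r_{i+1} = r_i + a,
    with a conditional wrap replacing the modulo operation."""
    out = []
    r = b % m                      # initial index
    for _ in range(n):
        out.append(r)
        r += a                     # affine update, no modulo
        if r >= m:                 # at most one wrap since a < m
            r -= m
    return out

assert modulo_indexed(20, 3, 5, 7) == affine_recurrence(20, 3, 5, 7)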

Energies ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 1589
Author(s):  
Krzysztof Kołek ◽  
Andrzej Firlit ◽  
Krzysztof Piątek ◽  
Krzysztof Chmielowiec

Monitoring power quality (PQ) indicators is an important part of maintaining modern power grids. Among different PQ indicators, the flicker severity coefficients Pst and Plt are measures of voltage fluctuations. In state-of-the-art PQ measuring devices, the flicker measurement channel is usually implemented as a dedicated processor subsystem. Implementation of the IEC 61000-4-15 compliant flicker measurement algorithm requires a significant amount of computational power. In typical PQ analysers, the flicker measurement is part of the metering algorithm executed by the main processor. This paper considers the implementation of the flicker measurement as an FPGA module to offload the processor subsystem or operate as an IP core in FPGA-based system-on-chip units. The measurement algorithm is developed and validated as a Simulink diagram, which is then converted to a fixed-point representation. Parts of the diagram are used for automatic VHDL code generation, and the classifier block is implemented as a local soft-processor system. A simple eight-bit processor operates within the flicker measurement coprocessor and performs statistical operations. Finally, an IP module is created that can be used as a flicker coprocessor. When using the coprocessor, the main processor's only role is to trigger the coprocessor and read the results, while the coprocessor independently calculates the flicker coefficients.
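
As a rough illustration of the float-to-fixed-point step mentioned above, the snippet below quantizes a value to a signed 16-bit Q1.15 format; the word length and fraction length are hypothetical, not the formats chosen for the paper's Simulink model.

# Minimal sketch of float-to-fixed-point quantization (hypothetical Q1.15 format).

def to_fixed(x, frac_bits=15, word_bits=16):
    """Quantize x to a signed fixed-point integer with the given format."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def to_float(q, frac_bits=15):
    return q / (1 << frac_bits)

# Example: quantization error of a hypothetical filter coefficient.
coeff = 0.314159
q = to_fixed(coeff)
print(q, to_float(q), abs(coeff - to_float(q)))   # error below 2**-16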


2007 ◽  
Vol 4 (2) ◽  
pp. 2-26
Author(s):  
Gernot Gebhard ◽  
Philipp Lucas

Retargeting a compiler's back end to a new architecture is a time-consuming process. This is an evident problem in the area of programmable graphics hardware (graphics processing units, GPUs) or embedded processors, where architectural changes happen faster than elsewhere. We propose the object-oriented rewrite system OORS to overcome this problem. Using the OORS language, a compiler developer can express the code generation and optimization phases in terms of cost-annotated rewrite rules supporting complex non-linear matching and replacing patterns. Retargetability is achieved by organizing rules into profiles, one for each supported target architecture. Featuring a rule and profile inheritance mechanism, OORS makes the reuse of existing specifications possible, an improvement over traditional approaches. Altogether, OORS increases the maintainability of the compiler's back end and thus both decreases the complexity and reduces the effort of the retargeting process. To show the potential of this approach, we have implemented a code generation and a code optimization pattern matcher supporting different target architectures using the OORS language and integrated them into a compiler of a programming language for CPUs and GPUs.
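
The sketch below is plain Python, not OORS syntax, and only illustrates the general shape of the approach described above: rewrite rules carry a match predicate, a replacement and a cost, and profiles group rules per target architecture while inheriting rules from a parent profile. All rule and pattern names are made up.

# Conceptual sketch only: cost-annotated rewrite rules organized into
# target-specific profiles with inheritance.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[tuple], bool]     # does this IR node match the pattern?
    rewrite: Callable[[tuple], tuple]    # replacement pattern
    cost: int                            # cost annotation used for selection

class Profile:
    def __init__(self, rules, parent=None):
        self.rules, self.parent = rules, parent

    def all_rules(self):
        return self.rules + (self.parent.all_rules() if self.parent else [])

    def select(self, node):
        """Pick the cheapest applicable rule for an IR node."""
        candidates = [r for r in self.all_rules() if r.matches(node)]
        return min(candidates, key=lambda r: r.cost, default=None)

# Hypothetical example: a generic profile lowers ('mul', x, 2) to an add,
# and a GPU profile inherits it while adding a fused multiply-add rule.
generic = Profile([Rule("mul2_to_add",
                        lambda n: n[0] == "mul" and n[2] == 2,
                        lambda n: ("add", n[1], n[1]), cost=1)])
gpu = Profile([Rule("mad",
                    lambda n: n[0] == "mul_add",
                    lambda n: ("mad", *n[1:]), cost=1)], parent=generic)

node = ("mul", "x", 2)
rule = gpu.select(node)
print(rule.name, rule.rewrite(node))     # mul2_to_add ('add', 'x', 'x')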


2019 ◽  
Vol 7 ◽  
pp. 661-676 ◽  
Author(s):  
Jiatao Gu ◽  
Qi Liu ◽  
Kyunghyun Cho

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam search. Experiments on four real-world tasks, including word order recovery, machine translation, image captioning and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders, while achieving competitive or even better performance compared with conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.
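
The toy function below is not the InDIGO model itself, only an illustration of the insertion operations it builds on: a decoder that emits (token, position) pairs can realize any generation order and still produce the same final sequence.

# Toy illustration of insertion-based generation in an arbitrary order.

def generate_by_insertion(steps):
    """steps: list of (token, position) pairs, e.g. produced by a decoder."""
    seq = []
    for token, pos in steps:
        seq.insert(pos, token)
    return seq

# "a b c d" generated in the order c, a, d, b:
steps = [("c", 0), ("a", 0), ("d", 2), ("b", 1)]
print(generate_by_insertion(steps))   # ['a', 'b', 'c', 'd']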


Author(s):  
Imed Saad Ben Dhaou ◽  
Hannu Tenhunen

This article presents a word-serial retimed architecture for the SHA-256/224 algorithm. The architecture is compliant with dedicated short-range communication requirements for safety message authentication. We elaborate three-operand adder architectures suitable for field-programmable gate array implementation. Several transformation techniques at the data-flow-graph level have been used to derive the architecture. Synthesis results show that the architecture achieves a high throughput/slice value compared with state-of-the-art SHA-256 implementations. The article also presents a comparison between high-level synthesis and RTL design.
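
For context, a common building block for three-operand adders is carry-save reduction, sketched below for the 32-bit words used in the SHA-256 compression function; this is a generic textbook technique, and the adder architectures elaborated in the article may differ in detail.

# Sketch of a carry-save reduction for a three-operand 32-bit addition,
# the kind of operation that dominates the SHA-256 compression step.

MASK = 0xFFFFFFFF

def carry_save_add3(a, b, c):
    """Reduce a + b + c to a (sum, carry) pair with no carry propagation."""
    s = (a ^ b ^ c) & MASK                            # bitwise sum
    k = (((a & b) | (a & c) | (b & c)) << 1) & MASK   # shifted carries
    return s, k

a, b, c = 0x6A09E667, 0xBB67AE85, 0x428A2F98
s, k = carry_save_add3(a, b, c)
assert (s + k) & MASK == (a + b + c) & MASK           # one final carry-propagate add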


1990 ◽  
Vol 30 (1-5) ◽  
pp. 475-481
Author(s):  
E. Accomazzo ◽  
M. Ancona ◽  
R. Bobbio ◽  
C. Cagnassi ◽  
L. Paolin

1996 ◽  
Vol 06 (03) ◽  
pp. 401-414
Author(s):  
Jingling Xue

Although unimodular transformations overcome some limitations of the generate-and-test approach, they are restricted to perfect loop nests. Extending the unimodular approach, this paper describes a framework that enables the use of unimodular transformations to restructure imperfect loop nests. The concepts used previously for perfect loop nests, such as iteration vector, iteration space and lexicographic order, are generalised, and some new concepts such as the preorder tree are introduced. Multiple unimodular transformations are allowed, one for each statement in the loop nest. A code generation algorithm is provided that produces a possibly imperfect loop nest to scan an iteration space given as a union of sets of affine constraints.
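
As a reminder of the underlying mechanism, the sketch below applies a unimodular matrix (here a simple loop interchange) to a two-deep perfect nest; the paper's contribution is generalizing this machinery, with one transformation per statement, to imperfect loop nests.

# Minimal sketch of a unimodular transformation of a 2-D iteration space.

import numpy as np

U = np.array([[0, 1],
              [1, 0]])            # unimodular: |det U| == 1 (loop interchange)
U_inv = np.linalg.inv(U).round().astype(int)

def original_order(N, M):
    return [(i, j) for i in range(N) for j in range(M)]

def transformed_order(N, M):
    """Scan the transformed space; each new point (p, q) maps back via U^-1."""
    points = []
    for p in range(M):            # bounds of the transformed iteration space
        for q in range(N):
            i, j = (U_inv @ np.array([p, q])).tolist()
            points.append((i, j))
    return points

# Both nests enumerate exactly the same iterations, in a different order.
assert set(original_order(3, 4)) == set(transformed_order(3, 4))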


2022 ◽  
Vol 6 (POPL) ◽  
pp. 1-24
Author(s):  
Wenlei He ◽  
Julián Mestre ◽  
Sergey Pupyrev ◽  
Lei Wang ◽  
Hongtao Yu

Profile-guided optimization (PGO) is an important component in modern compilers. By allowing the compiler to leverage the program’s dynamic behavior, it can often generate substantially faster binaries. Sampling-based profiling is the state-of-the-art technique for collecting execution profiles in data-center environments. However, the lowered profile accuracy caused by sampling a fully optimized binary often limits the benefits of PGO; thus, an important problem is to overcome the inaccuracy in a profile after it is collected. In this paper we tackle this problem, which is also known as profile inference and profile rectification. We investigate the classical approach for profile inference, based on computing minimum-cost maximum flows in a control-flow graph, and develop an extended model capturing the desired properties of real-world profiles. Next we provide a solid theoretical foundation for the corresponding optimization problem by studying its algorithmic aspects. We then describe a new efficient algorithm for the problem along with its implementation in an open-source compiler. An extensive evaluation of the algorithm and existing profile inference techniques on a variety of applications, including Facebook production workloads and SPEC CPU benchmarks, indicates that the new method outperforms its competitors by significantly improving the accuracy of profile data and the performance of generated binaries.
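
The snippet below sets up a deliberately simplified version of the classical min-cost-flow view of profile inference: sampled edge counts on a hypothetical CFG violate flow conservation, corrections may only increase counts, every CFG edge has unit correction cost, and a free exit-to-entry back edge closes the circulation. The extended model developed in the paper is considerably richer.

# Toy min-cost-flow repair of an inconsistent sampled profile (hypothetical CFG).

import networkx as nx

measured = {
    ("entry", "A"): 100,
    ("A", "B"): 60, ("A", "C"): 35,
    ("B", "D"): 60, ("C", "D"): 35,
    ("D", "exit"): 90,
}

G = nx.DiGraph()
for (u, v), m in measured.items():
    G.add_edge(u, v, weight=1)          # unit cost per unit of correction
G.add_edge("exit", "entry", weight=0)   # free back edge closes the circulation

# Each node's demand is the conservation violation of the measured counts.
for v in G.nodes:
    inflow = sum(m for (a, b), m in measured.items() if b == v)
    outflow = sum(m for (a, b), m in measured.items() if a == v)
    G.nodes[v]["demand"] = outflow - inflow   # corrections must absorb the imbalance

correction = nx.min_cost_flow(G)
repaired = {e: m + correction[e[0]][e[1]] for e, m in measured.items()}
print(repaired)   # conservation now holds at every interior block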


2022 ◽  
Vol 31 (2) ◽  
pp. 1-43
Author(s):  
Katherine Hough ◽  
Jonathan Bell

Dynamic taint tracking, a technique that traces relationships between values as a program executes, has been used to support a variety of software engineering tasks. Some taint tracking systems only consider data flows and ignore control flows. As a result, relationships between some values are not reflected by the analysis. Many applications of taint tracking either benefit from or rely on these relationships being traced, but past works have found that tracking control flows resulted in over-tainting, dramatically reducing the precision of the taint tracking system. In this article, we introduce Conflux, alternative semantics for propagating taint tags along control flows. Conflux aims to reduce over-tainting by decreasing the scope of control flows and providing a heuristic for reducing loop-related over-tainting. We created a Java implementation of Conflux and performed a case study exploring the effect of Conflux on a concrete application of taint tracking, automated debugging. In addition to this case study, we evaluated Conflux's accuracy using a novel benchmark consisting of popular, real-world programs. We compared Conflux against existing taint propagation policies, including a state-of-the-art approach for reducing control-flow-related over-tainting, finding that Conflux had the highest F1 score on 43 out of the 48 total tests.
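
The fragment below is not Conflux itself; using a hypothetical Tainted wrapper, it only shows why propagating taint along control flows matters: the branch condition influences the result without any direct data flow, so a data-flow-only policy drops the relationship while a control-flow policy carries the condition's tags to the assigned value.

# Minimal illustration of data-flow-only vs. control-flow taint propagation.

class Tainted:
    def __init__(self, value, tags=frozenset()):
        self.value, self.tags = value, frozenset(tags)

def lookup(secret):
    # Data-flow-only tracking: the constants 1 and 0 carry no taint tags.
    data_flow_result = Tainted(1 if secret.value else 0)
    # Control-flow tracking: the branch depends on `secret`, so its tags
    # propagate to whatever is assigned under that branch.
    control_flow_result = Tainted(1 if secret.value else 0, secret.tags)
    return data_flow_result, control_flow_result

d, c = lookup(Tainted(True, {"user-input"}))
print(d.tags)   # frozenset(): the relationship to `secret` is lost
print(c.tags)   # frozenset({'user-input'}): the control flow is reflected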


2018 ◽  
Vol 24 (2) ◽  
pp. 357-383 ◽  
Author(s):  
Wai Yin Mok

Purpose
Concurrency is a desirable property that enhances workflow efficiency. The purpose of this paper is to propose six polynomial-time algorithms that collectively maximize control flow concurrency for Business Process Model and Notation (BPMN) workflow models. The proposed algorithms perform model-level transformations on a BPMN model during its design phase, thereby improving the workflow model’s execution efficiency.

Design/methodology/approach
The approach is similar to source code optimization, which works solely with syntactic means. The first step makes implicit synchronizations of interdependent concurrent control flows explicit by adding parallel gateways. After that, every control flow can proceed asynchronously. The next step then generates an equivalent sequence of execution hierarchies for every control flow such that they collectively provide maximum concurrency for that control flow. As a whole, the proposed algorithms add a valuable feature to a BPMN modeling tool for maximizing control flow concurrency.

Findings
In addition, this paper introduces the concept of control flow independence, which is a user-determined semantic property of BPMN models that cannot be obtained by any syntactic means. However, if control flow independence holds in a BPMN model, the model’s determinism is guaranteed. As a result, the proposed algorithms output a model that can be proved to be equivalent to the original model.

Originality/value
This paper adds value to BPMN modeling tools by providing polynomial-time algorithms that collectively maximize control flow concurrency in a BPMN model during its design phase, thereby increasing the model’s execution efficiency. Similar to source code optimization, these algorithms perform model-level transformations on a BPMN model through syntactic means, and the transformations performed on each control flow are guaranteed to produce a flow equivalent to the original. Furthermore, a case study on a real-life new employee preparation process demonstrates the proposed algorithms’ usefulness in increasing the process’s execution efficiency.
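
As a very rough sketch of the intuition, not the paper's algorithms, the snippet below groups the tasks of a hypothetical onboarding flow into batches whose members have no unmet dependencies; each batch could be modelled as the branches of a parallel gateway.

# Rough sketch: expose concurrency by batching tasks with no unmet dependencies.

from graphlib import TopologicalSorter

# Hypothetical dependency map: task -> set of tasks it must wait for.
deps = {"collect_docs": set(),
        "create_account": set(),
        "assign_desk": {"create_account"},
        "schedule_training": {"collect_docs", "create_account"}}

ts = TopologicalSorter(deps)
ts.prepare()
while ts.is_active():
    batch = list(ts.get_ready())         # these tasks are mutually independent
    print("parallel gateway:", batch)    # model them as parallel branches
    ts.done(*batch)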

