TRANSFORMATION OF NESTED LOOPS WITH MODULO INDEXING TO AFFINE RECURRENCES

1994 ◽  
Vol 04 (03) ◽  
pp. 271-280 ◽  
Author(s):  
FLORIN BALASA ◽  
FRANK H.M. FRANSSEN ◽  
FRANCKY V.M. CATTHOOR ◽  
HUGO J. DE MAN

For multi-dimensional (M-D) signal and data processing systems, transformation of algorithmic specifications is a major instrument both in code optimization and code generation for parallelizing compilers and in control flow optimization as a preprocessor for architecture synthesis. State-of-the-art transformation techniques are limited to affine index expressions, which is not sufficient for many important applications in image, speech and numerical processing. In this paper, a novel transformation method is introduced, aimed at the subclass of algorithm specifications that use modulo expressions of affine functions to index M-D signals. The method relies extensively on the Hermite normal form. The transformation can be carried out in polynomial time using only integer arithmetic.
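
To make the idea concrete, the sketch below contrasts a one-dimensional modulo index with the affine recurrence it can be rewritten into; the parameters a, b and m are hypothetical, and the paper's actual method derives such rewrites systematically (and for the multi-dimensional case) via the Hermite normal form of the index matrix.

# Illustrative sketch only: one-dimensional case with hypothetical
# parameters a, b, m (assumed 0 <= a < m).

def modulo_indexed(n, a, b, m):
    """Original form: each iteration indexes a signal with (a*i + b) mod m."""
    return [(a * i + b) % m for i in range(n)]

def affine_recurrence(n, a, b, m):
    """Rewritten form: the index follows the affine recurrence r_{i+1} = r_i + a,
    with a conditional wrap replacing the modulo operation."""
    out = []
    r = b % m                      # initial index
    for _ in range(n):
        out.append(r)
        r += a                     # affine update, no modulo
        if r >= m:                 # at most one wrap since a < m
            r -= m
    return out

assert modulo_indexed(20, 3, 5, 7) == affine_recurrence(20, 3, 5, 7)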

Energies ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 1589
Author(s):  
Krzysztof Kołek ◽  
Andrzej Firlit ◽  
Krzysztof Piątek ◽  
Krzysztof Chmielowiec

Monitoring power quality (PQ) indicators is an important part of maintaining modern power grids. Among different PQ indicators, the flicker severity coefficients Pst and Plt are measures of voltage fluctuations. In state-of-the-art PQ measuring devices, the flicker measurement channel is usually implemented as a dedicated processor subsystem. Implementation of the IEC 61000-4-15 compliant flicker measurement algorithm requires a significant amount of computational power. In typical PQ analysers, the flicker measurement is part of the metering algorithm executed by the main processor. This paper considers the implementation of the flicker measurement as an FPGA module to offload the processor subsystem or operate as an IP core in FPGA-based system-on-chip units. The measurement algorithm is developed and validated as a Simulink diagram, which is then converted to a fixed-point representation. Parts of the diagram are used for automatic VHDL code generation, and the classifier block is implemented as a local soft-processor system. A simple eight-bit processor operates within the flicker measurement coprocessor and performs statistical operations. Finally, an IP module is created that can be used as a flicker coprocessor. When using the coprocessor, the main processor's only role is to trigger the coprocessor and read the results, while the coprocessor independently calculates the flicker coefficients.
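
As a rough illustration of the float-to-fixed-point step mentioned above, the snippet below quantizes a value to a signed 16-bit Q1.15 format; the word length and fraction length are hypothetical, not the formats chosen for the paper's Simulink model.

# Minimal sketch of float-to-fixed-point quantization (hypothetical Q1.15 format).

def to_fixed(x, frac_bits=15, word_bits=16):
    """Quantize x to a signed fixed-point integer with the given format."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def to_float(q, frac_bits=15):
    return q / (1 << frac_bits)

# Example: quantization error of a hypothetical filter coefficient.
coeff = 0.314159
q = to_fixed(coeff)
print(q, to_float(q), abs(coeff - to_float(q)))   # error below 2**-16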


2007 ◽  
Vol 4 (2) ◽  
pp. 2-26
Author(s):  
Gernot Gebhard ◽  
Philipp Lucas

Retargeting a compiler's back end to a new architecture is a time-consuming process. This is an evident problem in the area of programmable graphics hardware (graphics processing units, GPUs) or embedded processors, where architectural changes happen faster than elsewhere. We propose the object-oriented rewrite system OORS to overcome this problem. Using the OORS language, a compiler developer can express the code generation and optimization phases in terms of cost-annotated rewrite rules supporting complex non-linear matching and replacing patterns. Retargetability is achieved by organizing rules into profiles, one for each supported target architecture. Featuring a rule and profile inheritance mechanism, OORS makes the reuse of existing specifications possible, an improvement over traditional approaches. Altogether, OORS increases the maintainability of the compiler's back end and thus both decreases the complexity and reduces the effort of the retargeting process. To show the potential of this approach, we have implemented a code generation and a code optimization pattern matcher supporting different target architectures using the OORS language and integrated them into a compiler of a programming language for CPUs and GPUs.
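
The sketch below is plain Python, not OORS syntax, and only illustrates the general shape of the approach described above: rewrite rules carry a match predicate, a replacement and a cost, and profiles group rules per target architecture while inheriting rules from a parent profile. All rule and pattern names are made up.

# Conceptual sketch only: cost-annotated rewrite rules organized into
# target-specific profiles with inheritance.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[tuple], bool]     # does this IR node match the pattern?
    rewrite: Callable[[tuple], tuple]    # replacement pattern
    cost: int                            # cost annotation used for selection

class Profile:
    def __init__(self, rules, parent=None):
        self.rules, self.parent = rules, parent

    def all_rules(self):
        return self.rules + (self.parent.all_rules() if self.parent else [])

    def select(self, node):
        """Pick the cheapest applicable rule for an IR node."""
        candidates = [r for r in self.all_rules() if r.matches(node)]
        return min(candidates, key=lambda r: r.cost, default=None)

# Hypothetical example: a generic profile lowers ('mul', x, 2) to an add,
# and a GPU profile inherits it while adding a fused multiply-add rule.
generic = Profile([Rule("mul2_to_add",
                        lambda n: n[0] == "mul" and n[2] == 2,
                        lambda n: ("add", n[1], n[1]), cost=1)])
gpu = Profile([Rule("mad",
                    lambda n: n[0] == "mul_add",
                    lambda n: ("mad", *n[1:]), cost=1)], parent=generic)

node = ("mul", "x", 2)
rule = gpu.select(node)
print(rule.name, rule.rewrite(node))     # mul2_to_add ('add', 'x', 'x')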


2019 ◽  
Vol 7 ◽  
pp. 661-676 ◽  
Author(s):  
Jiatao Gu ◽  
Qi Liu ◽  
Kyunghyun Cho

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm, InDIGO, which supports flexible sequence generation in arbitrary orders through insertion operations. We extend Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam search. Experiments on four real-world tasks, including word order recovery, machine translation, image captioning and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders, while achieving competitive or even better performance compared with conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.
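
The toy function below is not the InDIGO model itself, only an illustration of the insertion operations it builds on: a decoder that emits (token, position) pairs can realize any generation order and still produce the same final sequence.

# Toy illustration of insertion-based generation in an arbitrary order.

def generate_by_insertion(steps):
    """steps: list of (token, position) pairs, e.g. produced by a decoder."""
    seq = []
    for token, pos in steps:
        seq.insert(pos, token)
    return seq

# "a b c d" generated in the order c, a, d, b:
steps = [("c", 0), ("a", 0), ("d", 2), ("b", 1)]
print(generate_by_insertion(steps))   # ['a', 'b', 'c', 'd']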


Author(s):  
Imed Saad Ben Dhaou ◽  
Hannu Tenhunen

This article presents a word-serial retimed architecture for the SHA-256/224 algorithm. The architecture is compliant with dedicated short-range communication requirements for safety message authentication. We elaborate three-operand adder architectures suitable for field-programmable gate array implementation. Several transformation techniques at the data-flow-graph level have been used to derive the architecture. Synthesis results show that the architecture achieves a high throughput/slice value compared with state-of-the-art SHA-256 implementations. The article also presents a comparison between high-level synthesis and RTL design.
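
For context, a common building block for three-operand adders is carry-save reduction, sketched below for the 32-bit words used in the SHA-256 compression function; this is a generic textbook technique, and the adder architectures elaborated in the article may differ in detail.

# Sketch of a carry-save reduction for a three-operand 32-bit addition,
# the kind of operation that dominates the SHA-256 compression step.

MASK = 0xFFFFFFFF

def carry_save_add3(a, b, c):
    """Reduce a + b + c to a (sum, carry) pair with no carry propagation."""
    s = (a ^ b ^ c) & MASK                            # bitwise sum
    k = (((a & b) | (a & c) | (b & c)) << 1) & MASK   # shifted carries
    return s, k

a, b, c = 0x6A09E667, 0xBB67AE85, 0x428A2F98
s, k = carry_save_add3(a, b, c)
assert (s + k) & MASK == (a + b + c) & MASK           # one final carry-propagate add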


1990 ◽  
Vol 30 (1-5) ◽  
pp. 475-481
Author(s):  
E. Accomazzo ◽  
M. Ancona ◽  
R. Bobbio ◽  
C. Cagnassi ◽  
L. Paolin

1996 ◽  
Vol 06 (03) ◽  
pp. 401-414
Author(s):  
Jingling Xue

Although unimodular transformations overcome some limitations of the generate-and-test approach, they are restricted to perfect loop nests. Extending the unimodular approach, this paper describes a framework that enables the use of unimodular transformations to restructure imperfect loop nests. The concepts used previously for perfect loop nests, such as iteration vector, iteration space and lexicographic order, are generalised, and some new concepts such as the preorder tree are introduced. Multiple unimodular transformations are allowed, one for each statement in the loop nest. A code generation algorithm is provided that produces a possibly imperfect loop nest to scan an iteration space given as a union of sets of affine constraints.
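
As a reminder of the underlying mechanism, the sketch below applies a unimodular matrix (here a simple loop interchange) to a two-deep perfect nest; the paper's contribution is generalizing this machinery, with one transformation per statement, to imperfect loop nests.

# Minimal sketch of a unimodular transformation of a 2-D iteration space.

import numpy as np

U = np.array([[0, 1],
              [1, 0]])            # unimodular: |det U| == 1 (loop interchange)
U_inv = np.linalg.inv(U).round().astype(int)

def original_order(N, M):
    return [(i, j) for i in range(N) for j in range(M)]

def transformed_order(N, M):
    """Scan the transformed space; each new point (p, q) maps back via U^-1."""
    points = []
    for p in range(M):            # bounds of the transformed iteration space
        for q in range(N):
            i, j = (U_inv @ np.array([p, q])).tolist()
            points.append((i, j))
    return points

# Both nests enumerate exactly the same iterations, in a different order.
assert set(original_order(3, 4)) == set(transformed_order(3, 4))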


2022 ◽  
Vol 6 (POPL) ◽  
pp. 1-24
Author(s):  
Wenlei He ◽  
Julián Mestre ◽  
Sergey Pupyrev ◽  
Lei Wang ◽  
Hongtao Yu

Profile-guided optimization (PGO) is an important component in modern compilers. By allowing the compiler to leverage the program’s dynamic behavior, it can often generate substantially faster binaries. Sampling-based profiling is the state-of-the-art technique for collecting execution profiles in data-center environments. However, the lowered profile accuracy caused by sampling a fully optimized binary often limits the benefits of PGO; thus, an important problem is to overcome the inaccuracy in a profile after it is collected. In this paper we tackle this problem, which is also known as profile inference and profile rectification. We investigate the classical approach for profile inference, based on computing minimum-cost maximum flows in a control-flow graph, and develop an extended model capturing the desired properties of real-world profiles. Next we provide a solid theoretical foundation for the corresponding optimization problem by studying its algorithmic aspects. We then describe a new efficient algorithm for the problem along with its implementation in an open-source compiler. An extensive evaluation of the algorithm and existing profile inference techniques on a variety of applications, including Facebook production workloads and SPEC CPU benchmarks, indicates that the new method outperforms its competitors by significantly improving the accuracy of profile data and the performance of generated binaries.
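
The snippet below sets up a deliberately simplified version of the classical min-cost-flow view of profile inference: sampled edge counts on a hypothetical CFG violate flow conservation, corrections may only increase counts, every CFG edge has unit correction cost, and a free exit-to-entry back edge closes the circulation. The extended model developed in the paper is considerably richer.

# Toy min-cost-flow repair of an inconsistent sampled profile (hypothetical CFG).

import networkx as nx

measured = {
    ("entry", "A"): 100,
    ("A", "B"): 60, ("A", "C"): 35,
    ("B", "D"): 60, ("C", "D"): 35,
    ("D", "exit"): 90,
}

G = nx.DiGraph()
for (u, v), m in measured.items():
    G.add_edge(u, v, weight=1)          # unit cost per unit of correction
G.add_edge("exit", "entry", weight=0)   # free back edge closes the circulation

# Each node's demand is the conservation violation of the measured counts.
for v in G.nodes:
    inflow = sum(m for (a, b), m in measured.items() if b == v)
    outflow = sum(m for (a, b), m in measured.items() if a == v)
    G.nodes[v]["demand"] = outflow - inflow   # corrections must absorb the imbalance

correction = nx.min_cost_flow(G)
repaired = {e: m + correction[e[0]][e[1]] for e, m in measured.items()}
print(repaired)   # conservation now holds at every interior block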


2022 ◽  
Vol 31 (2) ◽  
pp. 1-43
Author(s):  
Katherine Hough ◽  
Jonathan Bell

Dynamic taint tracking, a technique that traces relationships between values as a program executes, has been used to support a variety of software engineering tasks. Some taint tracking systems only consider data flows and ignore control flows. As a result, relationships between some values are not reflected by the analysis. Many applications of taint tracking either benefit from or rely on these relationships being traced, but past works have found that tracking control flows resulted in over-tainting, dramatically reducing the precision of the taint tracking system. In this article, we introduce Conflux, alternative semantics for propagating taint tags along control flows. Conflux aims to reduce over-tainting by decreasing the scope of control flows and providing a heuristic for reducing loop-related over-tainting. We created a Java implementation of Conflux and performed a case study exploring the effect of Conflux on a concrete application of taint tracking, automated debugging. In addition to this case study, we evaluated Conflux's accuracy using a novel benchmark consisting of popular, real-world programs. We compared Conflux against existing taint propagation policies, including a state-of-the-art approach for reducing control-flow-related over-tainting, finding that Conflux had the highest F1 score on 43 out of the 48 total tests.
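
The fragment below is not Conflux itself; using a hypothetical Tainted wrapper, it only shows why propagating taint along control flows matters: the branch condition influences the result without any direct data flow, so a data-flow-only policy drops the relationship while a control-flow policy carries the condition's tags to the assigned value.

# Minimal illustration of data-flow-only vs. control-flow taint propagation.

class Tainted:
    def __init__(self, value, tags=frozenset()):
        self.value, self.tags = value, frozenset(tags)

def lookup(secret):
    # Data-flow-only tracking: the constants 1 and 0 carry no taint tags.
    data_flow_result = Tainted(1 if secret.value else 0)
    # Control-flow tracking: the branch depends on `secret`, so its tags
    # propagate to whatever is assigned under that branch.
    control_flow_result = Tainted(1 if secret.value else 0, secret.tags)
    return data_flow_result, control_flow_result

d, c = lookup(Tainted(True, {"user-input"}))
print(d.tags)   # frozenset(): the relationship to `secret` is lost
print(c.tags)   # frozenset({'user-input'}): the control flow is reflected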


2018 ◽  
Vol 24 (2) ◽  
pp. 357-383 ◽  
Author(s):  
Wai Yin Mok

Purpose
Concurrency is a desirable property that enhances workflow efficiency. The purpose of this paper is to propose six polynomial-time algorithms that collectively maximize control flow concurrency for Business Process Model and Notation (BPMN) workflow models. The proposed algorithms perform model-level transformations on a BPMN model during its design phase, thereby improving the workflow model’s execution efficiency.

Design/methodology/approach
The approach is similar to source code optimization, which works solely with syntactic means. The first step makes implicit synchronizations of interdependent concurrent control flows explicit by adding parallel gateways. After that, every control flow can proceed asynchronously. The next step then generates an equivalent sequence of execution hierarchies for every control flow such that they collectively provide maximum concurrency for that control flow. As a whole, the proposed algorithms add a valuable feature to a BPMN modeling tool for maximizing control flow concurrency.

Findings
In addition, this paper introduces the concept of control flow independence, which is a user-determined semantic property of BPMN models that cannot be obtained by any syntactic means. However, if control flow independence holds in a BPMN model, the model’s determinism is guaranteed. As a result, the proposed algorithms output a model that can be proved to be equivalent to the original model.

Originality/value
This paper adds value to BPMN modeling tools by providing polynomial-time algorithms that collectively maximize control flow concurrency in a BPMN model during its design phase, thereby increasing the model’s execution efficiency. Similar to source code optimization, these algorithms perform model-level transformations on a BPMN model through syntactic means, and the transformations performed on each control flow are guaranteed to produce a flow equivalent to the original. Furthermore, a case study on a real-life new employee preparation process demonstrates the proposed algorithms’ usefulness in increasing the process’s execution efficiency.
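
As a very rough sketch of the intuition, not the paper's algorithms, the snippet below groups the tasks of a hypothetical onboarding flow into batches whose members have no unmet dependencies; each batch could be modelled as the branches of a parallel gateway.

# Rough sketch: expose concurrency by batching tasks with no unmet dependencies.

from graphlib import TopologicalSorter

# Hypothetical dependency map: task -> set of tasks it must wait for.
deps = {"collect_docs": set(),
        "create_account": set(),
        "assign_desk": {"create_account"},
        "schedule_training": {"collect_docs", "create_account"}}

ts = TopologicalSorter(deps)
ts.prepare()
while ts.is_active():
    batch = list(ts.get_ready())         # these tasks are mutually independent
    print("parallel gateway:", batch)    # model them as parallel branches
    ts.done(*batch)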

