Optimal Basic Block Instruction Scheduling for Multiple-Issue Processors Using Constraint Programming

Instruction scheduling algorithms are used in compilers to reduce run-time delays in the compiled code by reordering or transforming program statements, usually at the intermediate language or assembly code level. Considerable research has been carried out on scheduling code within the scope of basic blocks, i.e., straight-line sections of code, and very effective basic block schedulers are now included in most modern compilers, especially those targeting pipelined processors. In previous work (Golumbic and Rainis, IBM J. Res. Dev., Vol. 34, pp. 93–97, 1990), we presented code replication techniques for scheduling beyond the scope of basic blocks that provide reasonable improvements in the running time of the compiled code, but which still leave room for further improvement. In this article we present a new method for scheduling beyond basic blocks called SHACOOF. This new technique takes advantage of a conventional, high-quality basic block scheduler by first suppressing selected subsequences of instructions and then scheduling the modified sequence of instructions using the basic block scheduler. A candidate subsequence for suppression can be found by identifying a region of a program control flow graph, called an S-region, which has a unique entry and a unique exit and meets predetermined criteria. This enables scheduling of a sequence of instructions beyond basic block boundaries, with only minimal changes to an existing compiler, by identifying beneficial opportunities to cover delays that would otherwise have been beyond its scope.

OPTIMAL BASIC BLOCK INSTRUCTION SCHEDULING FOR MULTIPLE-ISSUE PROCESSORS USING CONSTRAINT PROGRAMMING

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213008003765 ◽

2008 ◽

Vol 17 (01) ◽

pp. 37-54 ◽

Cited By ~ 12

Author(s):

ABID M. MALIK ◽

JIM McINNES ◽

PETER VAN BEEK

Keyword(s):

Constraint Programming ◽

Fundamental Problem ◽

Instruction Scheduling ◽

Minimum Length ◽

Basic Block ◽

Exit Point ◽

Benchmark Suite ◽

Constraint Model ◽

Basic Blocks ◽

Optimal Scheduler

Instruction scheduling is one of the most important steps for improving the performance of object code produced by a compiler. A fundamental problem that arises in instruction scheduling is to find a minimum length schedule for a basic block — a straight-line sequence of code with a single entry point and a single exit point — subject to precedence, latency, and resource constraints. Solving the problem exactly is NP-complete, and heuristic approaches are currently used in most compilers. In contrast, we present a scheduler that finds provably optimal schedules for basic blocks using techniques from constraint programming. In developing our optimal scheduler, the key to scaling up to large, real problems was in the development of preprocessing techniques for improving the constraint model. We experimentally evaluated our optimal scheduler on the SPEC 2000 integer and floating point benchmarks. On this benchmark suite, the optimal scheduler was very robust — all but a handful of the hundreds of thousands of basic blocks in our benchmark suite were solved optimally within a reasonable time limit — and scaled to the largest basic blocks, including basic blocks with up to 2600 instructions. This compares favorably to the best previous exact approaches.
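
The problem statement above can be made concrete with a small sketch. The code below is not the constraint programming scheduler the abstract describes; it is a plain exhaustive search, usable only on tiny blocks, that illustrates what "minimum length subject to precedence, latency, and resource constraints" means. The function name `min_length_schedule`, the edge encoding `(pred, succ, latency)`, and the single `issue_width` resource model are assumptions made for this illustration.

```python
from itertools import count


def min_length_schedule(n, edges, issue_width):
    """Return (length, cycles) for a minimum-length schedule of one basic block.

    n           -- number of instructions, assumed numbered in topological order
    edges       -- (pred, succ, latency) triples: succ must issue at least
                   `latency` cycles after pred (precedence + latency constraint)
    issue_width -- maximum instructions issued per cycle (resource constraint)

    Illustrative brute force only: exponential time in the worst case.
    """
    if n == 0:
        return 0, []

    preds = [[] for _ in range(n)]
    for p, s, lat in edges:
        if not 0 <= p < s < n:
            raise ValueError("edges must respect the topological numbering")
        preds[s].append((p, lat))

    def search(i, cycles, used, horizon):
        # Assign an issue cycle to instruction i, given cycles for 0..i-1.
        if i == n:
            return list(cycles)
        earliest = max((cycles[p] + lat for p, lat in preds[i]), default=0)
        for c in range(earliest, horizon):
            if used[c] < issue_width:
                cycles.append(c)
                used[c] += 1
                found = search(i + 1, cycles, used, horizon)
                if found is not None:
                    return found
                cycles.pop()
                used[c] -= 1
        return None

    # Try schedule lengths 1, 2, 3, ...; the first feasible one is optimal.
    for horizon in count(1):
        found = search(0, [], [0] * horizon, horizon)
        if found is not None:
            return horizon, found


# Hypothetical block: two loads (latency 3) feed an add (latency 1),
# which feeds a store; instruction 4 is independent of the others.
example_edges = [(0, 2, 3), (1, 2, 3), (2, 3, 1)]
single_issue = min_length_schedule(5, example_edges, issue_width=1)
dual_issue = min_length_schedule(5, example_edges, issue_width=2)
```

On this hypothetical block the single-issue schedule needs 6 cycles, because the second load delays the add by one cycle, while the dual-issue schedule needs 5, which equals the critical path. Here the length counts issue cycles, i.e., the last issue cycle plus one.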

Instruction Scheduling Algorithm for Register File Connectivity Clustered VLIW Architecture

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2008.00127 ◽

2009 ◽

Vol 31 (1) ◽

pp. 127-132

Author(s):

Zhi-Xiong ZHOU ◽

Hu HE ◽

Xu YANG ◽

Yan-Jun ZHANG ◽

Yi-He SUN

Keyword(s):

Scheduling Algorithm ◽

Instruction Scheduling ◽

Register File

Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor

35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings. ◽

10.1109/micro.2002.1176244 ◽

2003 ◽

Cited By ~ 6

Author(s):

E. Gibert ◽

J. Sanchez ◽

A. Gonzalez

Keyword(s):

Effective Instruction ◽

Instruction Scheduling ◽

VLIW Processor

The Marion system for retargetable instruction scheduling

Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation - PLDI '91 ◽

10.1145/113445.113465 ◽

1991 ◽

Cited By ~ 22

Author(s):

David G. Bradlee ◽

Robert R. Henry ◽

Susan J. Eggers

Keyword(s):

Instruction Scheduling

Design and implementation of C-MEX S-functions in an Android-based networked control system laboratory

Transactions of the Institute of Measurement and Control ◽

10.1177/01423312211026805 ◽

2021 ◽

pp. 014233122110268

Author(s):

Lei Cao ◽

Guo-Ping Liu ◽

Wenshan Hu ◽

Jahan Zaib Bhatti

Keyword(s):

Control System ◽

Networked Control System ◽

Networked Control ◽

Control Algorithms ◽

Basic Block ◽

Air Bearing ◽

Experiment Validation ◽

Functional Blocks ◽

New Feature ◽

Three Degree Of Freedom

The Android-based networked control system laboratory (NCSLab) is a remote control laboratory that adopts an extensible architecture, mainly comprising Android mobile devices, MATLAB servers, controllers and test rigs. To conduct simulations and experiments more effectively in NCSLab, the first key issue to be solved is enabling users to design their own control algorithms or functional blocks on the Android client, rather than only using the basic block libraries provided by the system. This paper therefore proposes and implements a scheme for Android-based compilation of C-MEX S-functions. With this new feature, users can design personalized algorithms according to their requirements in the form of S-functions, which can be called and executed after being compiled by the MATLAB server. Finally, experimental validation on a three-degree-of-freedom air bearing spacecraft platform shows that the Android-based C-MEX S-function method is reliable and efficient, and that the scheme enhances the functionality and mobility of the Android-based NCSLab.

Integrating register allocation and instruction scheduling for RISCs

ACM SIGPLAN Notices ◽

10.1145/106973.106986 ◽

1991 ◽

Vol 26 (4) ◽

pp. 122-131 ◽

Cited By ~ 3

Author(s):

David G. Bradlee ◽

Susan J. Eggers ◽

Robert R. Henry

Keyword(s):

Register Allocation ◽

Instruction Scheduling

Machine Learning–enabled Scalable Performance Prediction of Scientific Codes

ACM Transactions on Modeling and Computer Simulation ◽

10.1145/3450264 ◽

2021 ◽

Vol 31 (2) ◽

pp. 1-28

Author(s):

Gopinath Chennupati ◽

Nandakishore Santhi ◽

Phill Romero ◽

Stephan Eidenbenz

Keyword(s):

Machine Learning ◽

Performance Prediction ◽

Prediction Models ◽

Radiation Transport ◽

Discrete Event ◽

Basic Block ◽

Distribution Models ◽

Scientific Application ◽

High Level ◽

Access Patterns

Hardware architectures are becoming increasingly complex as compute capabilities grow to exascale. We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform, which is defined by the input parameters. PPT-AMMP transforms the code to an (architecture-independent) intermediate representation, then (i) analyzes the basic block structure of the code, (ii) processes architecture-independent virtual memory access patterns that it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block level simulations to determine hardware pipeline usage. PPT-AMMP uses machine learning and regression techniques to build the prediction models based on small instances of the input code, then integrates them into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and present a use case of hardware parameter sensitivity analysis to identify bottleneck hardware resources on different code inputs. We further extend PPT-AMMP to predict the performance of a scientific application code, namely, the radiation transport mini-app SNAP. To this end, we analyze multi-variate regression models that accurately predict the reuse profiles and the basic block counts. We validate predicted SNAP runtimes against actual measured times.
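
The memory reuse distance distributions mentioned above can be illustrated with a short sketch. It uses the standard definition of reuse distance (the number of distinct addresses touched between two consecutive accesses to the same address) and is not taken from PPT-AMMP itself. The function names `reuse_distances` and `reuse_profile` and the use of `None` for first-time accesses are assumptions made for this illustration.

```python
from collections import Counter


def reuse_distances(trace):
    """Return the reuse distance of every access in a memory trace.

    The reuse distance of an access is the number of distinct addresses
    touched since the previous access to the same address. First-time
    accesses have no previous use and are reported as None.

    Naive O(n^2) version, intended for illustration only.
    """
    last_seen = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1:i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


def reuse_profile(trace):
    """Return the reuse distance distribution as {distance: probability}."""
    distances = reuse_distances(trace)
    total = len(distances)
    return {d: n / total for d, n in Counter(distances).items()}


# Hypothetical trace of symbolic addresses for one basic block.
example_trace = ["a", "b", "c", "a", "b", "b"]
example_distances = reuse_distances(example_trace)
example_profile = reuse_profile(example_trace)
```

For the hypothetical trace, the second access to `a` has two distinct addresses (`b` and `c`) in between, so its distance is 2, and the final back-to-back access to `b` has distance 0. A profile of this kind, combined with a cache size, is what analytical models typically use to estimate hit rates.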
