Multimedia algorithms involve enormous amounts of data transfer and storage,
resulting in huge bandwidth requirements at the off-chip memory and system bus level.
As a result, the related energy consumption becomes critical. Even for execution time, the
bottleneck can shift from the CPU to the external bus. This paper demonstrates
a systematic software approach to reduce this system bus load. It consists of source-to-source
code transformations that have to be applied before conventional instruction-level parallelism (ILP)
compilation. To illustrate this, we use a cavity detection algorithm for medical imaging
that is mapped onto an Intel Pentium® II processor.
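
As a rough illustration of what a source-to-source transformation aimed at memory traffic can look like, the following C sketch fuses two loops so that an intermediate array is never written to or read back from memory. It is not taken from the cavity detection code: the two-stage kernel, the function names detect_unfused and detect_fused, and the threshold parameter are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-stage kernel (an assumption, not the paper's code):
 * stage 1 averages each pixel with its right neighbour,
 * stage 2 thresholds the averaged value.
 * Both versions produce n - 1 output elements. */

/* Before: stage 1 stores a full intermediate array that stage 2 reloads. */
void detect_unfused(const uint8_t *in, uint8_t *tmp, uint8_t *out,
                    size_t n, uint8_t threshold)
{
    assert(n >= 2);
    for (size_t i = 0; i + 1 < n; i++)
        tmp[i] = (uint8_t)((in[i] + in[i + 1]) / 2);
    for (size_t i = 0; i + 1 < n; i++)
        out[i] = tmp[i] > threshold ? 1 : 0;
}

/* After: the loops are fused, so the intermediate value lives in a scalar
 * and the n - 1 stores and n - 1 loads of tmp disappear. */
void detect_fused(const uint8_t *in, uint8_t *out, size_t n,
                  uint8_t threshold)
{
    assert(n >= 2);
    for (size_t i = 0; i + 1 < n; i++) {
        uint8_t smoothed = (uint8_t)((in[i] + in[i + 1]) / 2);
        out[i] = smoothed > threshold ? 1 : 0;
    }
}
```

Both functions compute the same result. Whether the removed array accesses actually translate into less off-chip traffic depends on the array size relative to the cache, which this sketch does not model.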