We present a novel, compile-time method for determining the cache performance of the loop nests in a program. The cache hit-rates are produced by applying the reference string, determined during compilation, to an architecturally parameterized cache simulator. We also describe a heuristic that uses this method for compile-time optimization of loop ranges in iteration-space blocking. The results of the loop program optimizations are presented for different parallel program benchmarks and various processor architectures, such as IBM SP1 RS/6000, the SuperSPARC, and the Intel 1860.