Abstract
The efficiency of D4 Gaussian elimination on a vector computer, the Cray-1/S, is examined. The algorithm used in this work is employed routinely in Phillips Petroleum Co. reservoir simulation models. Comparisons of scalar and vector Cray-1/S times are given for various example cases including multiple unknowns per gridblock. Vectorization of the program on the Cray-1/S is discussed.
Introduction
In reservoir simulation, the solution of large systems of linear equations accounts for a substantial percentage of the computation time. Methods used today consist of both iterative and direct solution algorithms. Because of the theoretical savings in both storage and computing labor, D4 Gaussian elimination is a popular direct solution algorithm and is used widely on conventional scalar computers. In this paper we investigate the efficiency of the D4 algorithm on a computer with vector processing capabilities, the Cray-1/S.
The D4 (or alternate diagonal) algorithm originally was presented by Price and Coats in 1973. Since that time much work has been done on the algorithm, including an investigation by Nolen of the vector performance of D4 on the CDC Star 100 and Cyber 203 for single-unknown-per-gridblock example cases. Levesque has presented a comparison of the Cray-1 and Cyber 205 in reservoir simulation that includes the D4 algorithm. Vector performance of the Cray-1 on linear algebra kernels, both sparse and dense, also has been reported. Vector performance on these kernels typically is expressed in terms of million floating point operations per second (MFLOPS).
Our objective here is to evaluate vector performance on a typical production code written in FORTRAN for a scalar computer. Therefore, performance, or efficiency, will be evaluated in terms of both scalar and vector CPU times on the Cray-1/S. We include vector performance on the original code with automatic vectorization enabled, and vector performance on the same code with minor restructuring, automatic vectorization enabled, and the use of Cray assembly language (CAL) basic linear algebra kernels. Example cases for multiple unknowns per gridblock are presented.
Reservoir Flow Equations
The reservoir flow equations written using a seven-point finite difference formulation can be expressed as
...........................(1)
where the terms A, B, ..., G are matrices of order N, and N is the number of unknowns per gridblock. The unknown in Eq. 1 is the vector of unknowns at cell i, j, k, and H is the vector of residuals of the flow equations at cell i, j, k at a given iteration. Values of N from 1 to 10 typically are encountered, depending on the type of simulator and the degree of implicitness used. For example, N is equal to one for an implicit pressure, explicit saturation (IMPES) black-oil model; three for a fully implicit black-oil model; five for an implicit three-component steamflood model; and usually 10 or fewer for an implicit compositional model.
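For orientation, a seven-point block equation consistent with these definitions can be sketched as follows. This is an assumed form, not a transcription of Eq. 1: the symbol \delta for the vector of unknowns, the superscript \ell for the iteration level, the sign convention on H, and the assignment of D through G to the j and k neighbors are all assumptions. Only the roles of A (coupling to cell i-1), B (the diagonal block), and C (coupling to cell i+1) follow from the no-flow boundary treatment described under Driver Program.

```latex
% Assumed form of a seven-point block equation at cell (i, j, k).
% \delta, \ell, and the D..G neighbor assignments are illustrative assumptions.
\[
A_{i,j,k}\,\delta_{i-1,j,k}
+ B_{i,j,k}\,\delta_{i,j,k}
+ C_{i,j,k}\,\delta_{i+1,j,k}
+ D_{i,j,k}\,\delta_{i,j-1,k}
+ E_{i,j,k}\,\delta_{i,j+1,k}
+ F_{i,j,k}\,\delta_{i,j,k-1}
+ G_{i,j,k}\,\delta_{i,j,k+1}
= H^{\ell}_{i,j,k}
\]
```

Each coefficient is an N x N block and each \delta is a vector of length N, so for N = 1 the expression reduces to the familiar scalar seven-point stencil.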
Driver Program
To facilitate timing studies in this work, a driver program was written to calculate coefficients for the D4 Gaussian elimination routine. Input to the program consists of the grid dimensions and the number of unknowns per gridblock. All elements of the off-diagonal matrices (A, C, D, ..., G) were set equal to 1. To guarantee a nonsingular system, the B matrix was set equal to -5 for one unknown and as given in Eq. 2 for N unknowns.
............................(2)
Right-hand-side coefficients, H, were calculated by assuming a unit solution for the vector of unknowns. No-flow boundary conditions were used, which require specific matrices, such as A for I = 1 and C for I = NX, to be set equal to zero.
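The coefficient setup described above can be sketched for the single-unknown case (N = 1) as follows. This is a minimal illustration in Python, not the FORTRAN driver used in this work. The function names, the dictionary layout, and the assignment of D through G to the j and k neighbors are assumptions; the values 1 and -5, the zeroing of A at I = 1 and C at I = NX, and the unit-solution right-hand side come from the description above. Indices are zero-based, so I = 1 corresponds to i = 0. The N-unknown B matrix of Eq. 2 is not reproduced.

```python
from itertools import product

# Offsets to the six neighbors of cell (i, j, k). A and C follow the text
# (A couples to i-1, C to i+1); the D..G assignments are assumed here.
NEIGHBORS = {
    "A": (-1, 0, 0), "C": (1, 0, 0),
    "D": (0, -1, 0), "E": (0, 1, 0),
    "F": (0, 0, -1), "G": (0, 0, 1),
}


def build_coefficients(nx, ny, nz, diag=-5.0, offdiag=1.0):
    """Return {cell: {name: value}} for the single-unknown (N = 1) case."""
    coeffs = {}
    for i, j, k in product(range(nx), range(ny), range(nz)):
        row = {"B": diag}
        for name, (di, dj, dk) in NEIGHBORS.items():
            inside = (0 <= i + di < nx and
                      0 <= j + dj < ny and
                      0 <= k + dk < nz)
            # No-flow boundary: a coupling that would leave the grid is zero.
            row[name] = offdiag if inside else 0.0
        coeffs[(i, j, k)] = row
    return coeffs


def right_hand_side(coeffs):
    """H for a unit solution: each row's H is the sum of its coefficients."""
    return {cell: sum(row.values()) for cell, row in coeffs.items()}
```

Because every unknown is assumed equal to one, each right-hand-side entry is simply the row sum of the coefficients, which gives a known answer against which the elimination routine can be checked.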
Description of Hardware and Software
All run times reported in this work were obtained on the Cray-1/S, Serial No. 23, at United Computing Systems in Kansas City, MO. Serial No. 23 contains 1 million 64-bit words of central memory interleaved in 16 memory banks and has no input/output (I/O) subsystem. The FORTRAN compiler used was CFT 1.09. CPU times were obtained by calling SECOND, a FORTRAN-callable utility routine that returns the CPU time since the start of the job in floating-point seconds. The CPU overhead incurred for each call to SECOND is approximately 2.5 microseconds. For all reported Cray-1/S times, "vector" refers to the original FORTRAN code run with automatic vectorization enabled, which is the normal operating mode.
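The timing pattern described above, reading a cumulative CPU clock before and after a section of code and allowing for the cost of the clock call itself, can be sketched as follows. This is an analogy in Python rather than the Cray utility: `time.process_time` stands in for SECOND, and the helper names `second`, `call_overhead`, and `timed` are hypothetical. The overhead is estimated at run time instead of using the 2.5-microsecond figure quoted for the Cray-1/S.

```python
import time


def second():
    """CPU seconds used by this process so far (stand-in for SECOND)."""
    return time.process_time()


def call_overhead(samples=1000):
    """Estimate the CPU cost of one call to second()."""
    start = second()
    for _ in range(samples):
        second()
    return (second() - start) / samples


def timed(func, *args):
    """Run func(*args); return (result, CPU seconds net of timer overhead)."""
    overhead = call_overhead()
    t0 = second()
    result = func(*args)
    t1 = second()
    # Clamp at zero: very short sections can measure below the overhead.
    return result, max(0.0, (t1 - t0) - overhead)
```

A call such as `timed(sum, range(1000))` returns the function result together with the net CPU time, so the same wrapper can bracket both a scalar and a vectorized version of a routine.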
SPEJ
p. 121