Discretizing the three-dimensional partial differential equations of many physics problems yields systems of linear equations whose coefficient matrix is block tridiagonal. Depending on the substructure of the blocks, many algorithms can be devised for solving these systems. Plasma physics problems of interest to the authors give rise to several matrix problems that should also be useful in other applications. In one case, where the blocks are dense, a multitasked cyclic reduction procedure reached gigaflop rates on a Cray-2 for the direct solution of these large linear systems. The recently built code PAMS (parallelized matrix solver) embodies this technique and achieves this performance with fast vendor-supplied routines: manipulations within the blocks are carried out by highly optimized linear algebra subroutines that exploit vectorization as well as overlap of the functional units within each CPU. In unitasking mode, speeds well above 340 Mflops have been measured. The cyclic reduction method multitasks well, with overlap factors in the range of three to four; in multitasking mode, average speeds of 1.1 gigaflops have been measured for the complete PAMS algorithm. In addition to presenting the PAMS algorithm, it is shown how related systems with banded blocks can be treated efficiently by multitasked cyclic reduction in the Cray-2 multiprocessor environment. The PAMS method is intended for multiprocessors and would not be a method of choice on a uniprocessor; moreover, its advantage was found to depend critically on the hardware, software, and charging algorithm installed on a given multiprocessor system.
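The abstract does not spell out the reduction itself. As a minimal illustration of the general technique (not the PAMS implementation; all names are hypothetical, and the explicit block inverses stand in for the optimized factorization routines a production solver would use), one step of block cyclic reduction eliminates the odd-indexed block unknowns, leaving a half-size block tridiagonal system over the even-indexed unknowns; recursing and back-substituting solves the full system. The eliminations within a step are mutually independent, which is what makes the method multitask well:

```python
import numpy as np

def solve_block_tridiag(A, B, C, d):
    """Solve a block tridiagonal system by cyclic reduction.

    Row i reads: A[i] x[i-1] + B[i] x[i] + C[i] x[i+1] = d[i],
    with A[0] and C[-1] ignored.  A, B, C are lists of m-by-m dense
    blocks; d is a list of length-m right-hand-side vectors.
    """
    n = len(B)
    if n == 1:
        return [np.linalg.solve(B[0], d[0])]
    # Reduction: eliminate odd-indexed unknowns via their own rows.
    # Each even row's update is independent of the others (parallelizable).
    A2, B2, C2, d2 = [], [], [], []
    for i in range(0, n, 2):
        Bi, di = B[i].copy(), d[i].copy()
        Ai, Ci = np.zeros_like(B[i]), np.zeros_like(B[i])
        if i - 1 >= 0:                       # fold in row i-1
            alpha = A[i] @ np.linalg.inv(B[i - 1])
            Bi -= alpha @ C[i - 1]
            di -= alpha @ d[i - 1]
            if i - 2 >= 0:
                Ai = -alpha @ A[i - 1]       # coupling to previous even unknown
        if i + 1 < n:                        # fold in row i+1
            gamma = C[i] @ np.linalg.inv(B[i + 1])
            Bi -= gamma @ A[i + 1]
            di -= gamma @ d[i + 1]
            if i + 2 < n:
                Ci = -gamma @ C[i + 1]       # coupling to next even unknown
        A2.append(Ai); B2.append(Bi); C2.append(Ci); d2.append(di)
    x_even = solve_block_tridiag(A2, B2, C2, d2)
    # Back-substitution: odd unknowns recovered independently (parallelizable).
    x = [None] * n
    x[0::2] = x_even
    for i in range(1, n, 2):
        rhs = d[i] - A[i] @ x[i - 1]
        if i + 1 < n:
            rhs -= C[i] @ x[i + 1]
        x[i] = np.linalg.solve(B[i], rhs)
    return x
```

In this sketch each reduction level is a loop of independent block updates, so the levels map naturally onto multiple CPUs; the per-block work (matrix products and solves) is where optimized vendor routines would be applied.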
