The eigenvalue/eigenvector and linear solve problems arising in computational quantum dynamics applications (e.g., rovibrational spectroscopy and reaction cross-sections) often involve large sparse matrices that exhibit a certain block structure. In such cases, specialized iterative methods that employ optimal separable basis (OSB) preconditioners (derived from a block Jacobi diagonalization procedure) have been found to be very efficient at reducing the required CPU effort on serial computing platforms. Recently [1,2], a parallel implementation was introduced, based on a nonstandard domain decomposition scheme. Near-perfect parallel scalability was observed for the OSB preconditioner construction routines up to hundreds of nodes; however, the fundamental matrix–vector product operation itself was found not to scale well, in general. In addition, the number of nodes was deliberately chosen so as to ensure perfect load balancing. In this paper, two essential improvements are discussed: (1) a new algorithm for the matrix–vector product operation with greatly improved parallel scalability and (2) a generalization to arbitrary numbers of nodes and basis sizes. These improvements render the resultant parallel quantum dynamics codes suitable for robust application to a wide range of real molecular problems, running on massively parallel computing architectures.