Efficient large-scale replica-exchange simulations on production infrastructure
Replica-exchange (RE) algorithms are used to understand physical phenomena—ranging from protein folding dynamics to binding affinity calculations. They represent a class of algorithms that involve a large number of loosely coupled ensembles, and are thus amenable to using distributed resources. We develop a framework for RE that supports different replica pairing (synchronous versus asynchronous) and exchange coordination mechanisms (centralized versus decentralized) and which can use a range of production cyberinfrastructures concurrently. We characterize the performance of both RE algorithms at an unprecedented number of cores employed—the number of replicas and the typical number of cores per replica—on the production distributed infrastructure. We find that the asynchronous algorithms outperform the synchronous algorithms, even though details of the specific implementations are important determinants of performance.