CHARACTERIZING THE SPACE OF INTERATOMIC DISTANCE DISTRIBUTION FUNCTIONS CONSISTENT WITH SOLUTION SCATTERING DATA
Scattering of neutrons and X-rays from molecules in solution offers alternative approaches to the study of a wide range of macromolecular structures in their solution state, without crystallization. We study one part of the problem of elucidating three-dimensional structure from solution scattering data: determining the distribution of interatomic distances, P(r), where r is the distance between two atoms in the protein molecule. This problem is known to be ill-conditioned: a single observed scattering pattern may be consistent with many distance distribution functions, and there is a risk of overfitting the observed scattering data. We propose a new approach to avoiding this problem: accepting the validity of multiple alternative P(r) curves rather than seeking a single "best" curve. We impose linear constraints that ensure a computed P(r) is consistent with the experimental data. The constraints enforce smoothness in the P(r) curve, ensure that the P(r) curve is a probability distribution, and allow for experimental error. We use these constraints to precisely describe the space of all consistent P(r) curves as a polytope of histogram values or Fourier coefficients. We develop a linear programming approach to sampling the space of consistent, realistic P(r) curves. On both experimental and simulated scattering data, our approach efficiently generates ensembles of such curves that display substantial diversity.
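As a rough illustration of the histogram formulation, the following Python sketch builds one possible set of linear constraints and draws vertices of the resulting polytope by minimizing random linear objectives. It is a minimal sketch under stated assumptions, not the procedure used in this work: the Debye-type kernel sin(qr)/(qr), the intensities normalized so that I(0) = 1, the second-difference smoothness bound delta, the per-point error bound sigma, the random-objective sampling strategy, and the names debye_kernel and sample_pr_curves are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def debye_kernel(q, r):
    """Matrix K with K[i, k] = sin(q_i r_k) / (q_i r_k).

    np.sinc(x) is sin(pi x) / (pi x), hence the division by pi.
    """
    return np.sinc(np.outer(q, r) / np.pi)


def sample_pr_curves(q, i_obs, sigma, r, delta, n_samples, seed=0):
    """Sample histogram P(r) curves from the polytope of consistent curves.

    Assumed constraints (illustrative only):
      * p_k >= 0 and sum_k p_k = 1        (probability distribution)
      * |K p - i_obs| <= sigma            (agreement within experimental error)
      * |p_{k-1} - 2 p_k + p_{k+1}| <= delta   (smoothness)
    Each sample is the optimum of a random linear objective, i.e. a
    vertex of the polytope; this is an assumed sampling strategy.
    """
    rng = np.random.default_rng(seed)
    n = r.size
    K = debye_kernel(q, r)
    # Second-difference operator: (D p)_k = p_k - 2 p_{k+1} + p_{k+2}
    D = np.diff(np.eye(n), n=2, axis=0)

    # Two-sided bounds written as A_ub p <= b_ub
    A_ub = np.vstack([K, -K, D, -D])
    b_ub = np.concatenate([
        i_obs + sigma,
        -(i_obs - sigma),
        np.full(n - 2, delta),
        np.full(n - 2, delta),
    ])
    A_eq = np.ones((1, n))
    b_eq = np.array([1.0])

    curves = []
    for _ in range(n_samples):
        c = rng.standard_normal(n)  # random direction in histogram space
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        if res.status == 0:  # keep only solves that reached an optimum
            curves.append(res.x)
    return np.array(curves)
```

Because each sample is the optimum of a different random objective, the returned curves tend to lie at distinct extremes of the polytope, which gives a crude picture of how much P(r) can vary while still agreeing with the data. Tightening sigma or delta shrinks the polytope and should reduce the spread of the ensemble.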