ABSTRACT

Mapping information from different brains, gathered using different modalities, into a common coordinate space corresponding to a reference brain is an aspirational goal in modern neuroscience, analogous in importance to mapping genomic data to a reference genome. While brain-to-atlas mapping workflows exist for single-modality data (3D MRI or STPT image volumes), data sets generally need to be combined across modalities with different contrast mechanisms and scales, in the presence of missing data as well as signals not present in the reference. This has so far been an unsolved problem. We have solved this problem in its full generality by developing and implementing a rigorous, non-parametric generative framework that learns unknown mappings between contrast mechanisms from data and infers missing data. Our methodology permits rigorous quantification of the local scale changes between different individual brains, which have so far been neglected. We are also able to quantitatively characterize individual variation in shape. Our work establishes a quantitative, scalable and streamlined workflow for unifying a broad spectrum of multi-modal whole-brain light microscopic data volumes into a coordinate-based atlas framework, a step that is a prerequisite for large-scale integration of whole-brain data sets in modern neuroscience.

Summary

A current focus of research in neuroscience is to enumerate, map and annotate neuronal cell types in whole vertebrate brains using different modalities of data acquisition. A key challenge remains: can the large multiplicities of molecular anatomical data sets, from many different modalities and at widely different scales, all be assembled into a common reference space? Solving this problem is as important for modern neuroscience as mapping to reference genomes was for molecular biology. While workable brain-to-atlas mapping workflows exist for single modalities (e.g.
mapping serial two-photon tomography (STPT) brains to STPT references), and largely for clean data, mapping across contrast modalities is generally not a solved problem: data sets can be partial, and often carry signal not present in the reference brain (e.g. tracer injections). Placing these types of anatomical data into a common reference frame for all to use is an aspirational goal for the neuroscience community. However, this goal has so far been elusive due to the difficulties noted above, and real integration is lacking.

We have solved this problem in its full generality by developing and implementing a rigorous, generative framework that learns unknown mappings between contrast mechanisms from data and infers missing data. The key idea in the framework is to minimize the difference between synthetic image volumes and real data over function classes of non-parametric mappings, including a diffeomorphic mapping, the contrast map, and the locations and types of missing data/non-reference signals. The non-parametric mappings are instantiated as regularized but over-parameterized functional forms over spatial grids. A final, manual refinement step is included to ensure scientific quality of the results.

Our framework permits rigorous quantification of the local metric distortions between different individual brains, which is important for quantitative joint analysis of data gathered in multiple animals. Existing methods for atlas mapping do not provide metric quantifications and analyses of the resulting individual variations. We apply this pipeline to data modalities including various combinations of in-vivo and ex-vivo MRI, 3D STPT and fMOST data sets, 2D serial histology sections including a 3D reassembly step, and brains processed for snRNAseq with tissue partially removed.
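In schematic form, such a generative objective can be written as a joint minimization over the diffeomorphism, the contrast map, and a spatial weight field flagging missing or non-reference signal. The notation below is introduced here for illustration and is not taken from the paper:

```latex
% Schematic generative objective: I is the reference (atlas) image,
% J the observed data volume, \varphi the diffeomorphic mapping,
% f the contrast map, and W a spatial weight field down-weighting
% missing data or non-reference signal. The \lambda's weight the
% regularizers. All symbols here are illustrative, not the paper's own.
\min_{\varphi,\, f,\, W} \;
  \int W(x)\,\bigl|\, f\bigl(I(\varphi^{-1}(x))\bigr) - J(x) \,\bigr|^{2}\, dx
  \;+\; \lambda_{\varphi}\,\mathrm{Reg}(\varphi)
  \;+\; \lambda_{f}\,\mathrm{Reg}(f)
  \;+\; \lambda_{W}\,\mathrm{Reg}(W)
```

Here $f(I(\varphi^{-1}(x)))$ is the synthetic image volume compared against the real data $J$, so the optimization simultaneously estimates geometry, contrast transfer, and the locations of missing or extraneous signal.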
The median local linear scale change with respect to a histologically processed Nissl reference brain, as measured using the Jacobian of the diffeomorphic transformations, was found to be 0.93 for STPT-imaged brains (7% shrinkage) and 0.84 for fMOST-imaged brains (16% shrinkage between the reference brain and imaged volumes). The scale change between in-vivo and ex-vivo MRI for a mouse brain was found to be 0.96 (4% shrinkage), and the distortion between the perfused brain and tape-cut digital sections was shown to be minimal (1.02 for Nissl histology sections). We were able to quantitatively characterize individual variation in shape by studying variations in the tangent space of the diffeomorphic transformation around the reference brain. Based on this work we are able to establish co-variation patterns in metric distortions across the entire brain, across a large population. We note that the magnitude of individual variation is often greater than the differences between sample preparation techniques. Our work establishes a quantitative, scalable and streamlined workflow for unifying a broad spectrum of multi-modal whole-brain light microscopic data volumes into a coordinate-based atlas framework, a step that is a prerequisite for large-scale integration of whole-brain data sets in modern neuroscience.
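As a minimal sketch (not the paper's implementation) of how such a Jacobian-based scale statistic can be computed, the local linear scale at each voxel is the cube root of the Jacobian determinant of the transformation, with values below 1 indicating shrinkage. The snippet below assumes the transformation is sampled on a regular voxel grid:

```python
import numpy as np

def median_local_scale(phi, spacing=(1.0, 1.0, 1.0)):
    """Median local linear scale of a sampled 3D transformation.

    phi : array of shape (3, Z, Y, X) giving the mapped coordinates
        (the deformation itself, not a displacement field).
    spacing : voxel spacing along (z, y, x), used for the finite
        differences.

    The local linear scale at each voxel is |det Dphi|**(1/3);
    the median over voxels summarizes global shrinkage/expansion.
    """
    # Jacobian: derivative of each output component along each axis,
    # estimated by finite differences.
    J = np.empty((3, 3) + phi.shape[1:])
    for i in range(3):
        grads = np.gradient(phi[i], *spacing)
        for j in range(3):
            J[i, j] = grads[j]
    # Move the 3x3 matrix dimensions last so det vectorizes per voxel.
    J = np.moveaxis(J, (0, 1), (-2, -1))      # shape (Z, Y, X, 3, 3)
    det = np.linalg.det(J)
    return float(np.median(np.abs(det) ** (1.0 / 3.0)))
```

For a pure uniform scaling by 0.93, for example, this statistic recovers 0.93, matching the convention used above in which values below 1 denote shrinkage relative to the reference.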