Abstract. We theoretically and numerically investigate the problem of assimilating lidar observations of extinction and backscattering coefficients of aerosols into a chemical transport model. More specifically, we consider the inverse problem of determining the chemical composition of aerosols from these observations. The main questions are how much information the observations contain to constrain the particles' chemical composition, and how one can optimise a chemical data assimilation system to make maximum use of the available information. We first quantify the information content of the measurements by computing the singular values of the observation operator. From the singular values we can compute the number of signal degrees of freedom and the reduction in Shannon entropy. For an observation standard deviation of 10 %, we find that simultaneous measurements of extinction and backscattering allow us to constrain twice as many model variables as extinction measurements alone. The same holds for measurements at two wavelengths compared to measurements at a single wavelength. However, when we extend the set of measurements from two to three wavelengths, we observe only a small increase in the number of signal degrees of freedom and a minor change in the Shannon entropy. The information content is strongly sensitive to the observation error; both the number of signal degrees of freedom and the reduction in Shannon entropy decrease steeply as the observation standard deviation increases in the range between 1 and 100 %. The right singular vectors of the observation operator can be employed to transform the model variables into a new basis in which the components of the state vector can be divided into signal-related and noise-related components. We incorporate these results in a chemical data assimilation algorithm by introducing weak constraints that restrict the assimilation algorithm to acting on the signal-related model variables only.
This ensures that the information contained in the measurements is fully exploited, but not over-used. Numerical experiments confirm that the constrained data assimilation algorithm solves the inverse problem in a way that automates the choice of control variables, and that restricts the minimisation of the cost function to the signal-related model variables.
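
As an illustration of the information-content analysis summarised above, the following minimal sketch computes the singular values of an error-scaled observation operator and derives the number of signal degrees of freedom and the reduction in Shannon entropy from them. It assumes a linearised observation operator `H`, a background error covariance `B`, and an observation error covariance `R`; these names, the function `information_content`, and the use of the standard linear-Gaussian formulas are assumptions made for illustration and are not taken from the text above. The threshold of unity for counting signal-related components is likewise a common rule of thumb, not necessarily the criterion used in this work.

```python
import numpy as np


def information_content(H, B, R):
    """Information content of a linear observation operator (illustrative sketch).

    H : (m, n) linearised observation operator (hypothetical input)
    B : (n, n) background error covariance, symmetric positive definite
    R : (m, m) observation error covariance, symmetric positive definite
    """
    # Cholesky factors: B = L_b L_b^T and R = L_r L_r^T.
    L_b = np.linalg.cholesky(B)
    L_r = np.linalg.cholesky(R)

    # Error-scaled operator L_r^{-1} H L_b. It has the same singular values
    # as R^{-1/2} H B^{1/2} built from symmetric square roots.
    H_scaled = np.linalg.solve(L_r, H @ L_b)

    # Rows of Vt are the right singular vectors in the scaled state space.
    _, s, Vt = np.linalg.svd(H_scaled, full_matrices=False)

    # Standard linear-Gaussian expressions in terms of the singular values.
    dof_signal = np.sum(s**2 / (1.0 + s**2))
    entropy_reduction_bits = 0.5 * np.sum(np.log2(1.0 + s**2))

    return s, dof_signal, entropy_reduction_bits, Vt


# Toy example with made-up numbers: two observations, three model variables.
H = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
B = np.eye(3)
R = np.eye(2)

s, dof, entropy_bits, Vt = information_content(H, B, R)
n_signal = int(np.sum(s > 1.0))  # rule-of-thumb count of signal-related components
print(s, dof, entropy_bits, n_signal)
```

Increasing the entries of `R` in this toy setup lowers both `dof` and `entropy_bits`, which mirrors, qualitatively only, the sensitivity to the observation error described above.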