Abstract

Raman spectroscopy can retrieve molecular information from live biological samples non-invasively by optical means. Coupled with machine learning, the large amount of information contained in a Raman spectrum can be used to build models that predict the state of new samples from a statistical analysis of previous measurements. Furthermore, in the case of linear models, the separation coefficients can be used to interpret which bands contribute to the discrimination between experimental conditions, which here correspond to single-cell measurements of macrophages under in vitro immune stimulation. We evaluate a typical linear method that combines PCA with discriminant analysis, and compare it to L1-regularized (Lasso) logistic regression. We find that PCA does not improve classification performance. Moreover, the Lasso approach yields sparse separation vectors, because it suppresses spectral coefficients that do not improve classification, which makes interpretation easier. To further evaluate the approach, we apply the Lasso technique to a well-defined case in which protein synthesis is inhibited, and show that the separating features are consistent with RNA accumulation and protein depletion. Surprisingly, when Raman features are selected purely for their classification power (Lasso), the selected coefficients lie in side bands, while the typical strong Raman peaks are absent from the discrimination vector. We propose that this occurs because large Raman bands represent a wide variety of cellular molecules and are therefore less suited for accurate classification.
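To illustrate why an L1 (Lasso) penalty yields sparse separation vectors, the following is a minimal sketch on hypothetical synthetic "spectra", not the authors' pipeline or data. It assumes 20 channels of Gaussian noise in which only two channels (indices 5 and 12, chosen arbitrarily) differ between the two classes. The model is fitted by proximal gradient descent (ISTA): each gradient step on the logistic loss is followed by soft-thresholding, which sets weakly useful coefficients exactly to zero. The function names and the parameters `lam`, `lr`, and `n_iter` are illustrative assumptions, and the PCA plus discriminant analysis baseline is not reproduced here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero, exactly 0.0 if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def make_spectra(n_per_class=60, n_channels=20, informative=(5, 12), shift=2.0, seed=0):
    """Hypothetical synthetic 'spectra': Gaussian noise in every channel, with
    class 1 shifted upward only in the informative channels."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            spectrum = [rng.gauss(0.0, 1.0) for _ in range(n_channels)]
            if label == 1:
                for j in informative:
                    spectrum[j] += shift
            X.append(spectrum)
            y.append(label)
    return X, y


def fit_l1_logistic(X, y, lam=0.25, lr=0.2, n_iter=500):
    """L1-regularized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum_j |w_j|; the intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual of the predicted probability against the 0/1 label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Gradient step on the smooth log-loss, then the L1 proximal step.
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X, y = make_spectra()
w, b = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
predictions = [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("nonzero channels:", selected)
print("training accuracy:", round(accuracy, 3))
```

Increasing `lam` drives more coefficients exactly to zero, trading some fit for a sparser, easier-to-read separation vector. In practice a library solver would typically be used instead, for example scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`, where `C` is an inverse regularization strength rather than `lam` itself.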