Machine-learning Interpretation of the Correlation between Infrared Emission Features of Interstellar Polycyclic Aromatic Hydrocarbons
Abstract
Supervised machine-learning models are trained on various molecular descriptors to predict the infrared (IR) emission spectra of interstellar polycyclic aromatic hydrocarbons. We demonstrate that a feature importance analysis based on the random forest algorithm can be used to explore the physical correlations between emission features. As a demonstration, we analyze astronomically observed correlations between IR bands by identifying the common molecular fragments responsible for different bands, which improves the current understanding of these long-observed correlations. We propose quantifying the correlation between bands by measuring the similarity of their feature importance arrays, and we use this measure to obtain a correlation map for emission in the out-of-plane bending region. Moreover, a comparison of predictions made with different combinations of descriptors underscores the strong predictive power of the extended-connectivity molecular fingerprint and shows that combining multiple descriptors of other types generally improves predictivity.
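The idea of comparing bands through their feature importance arrays can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so cosine similarity is assumed here purely for illustration. The band labels, the importance values, and the helper names `cosine_similarity` and `correlation_map` are likewise hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length importance arrays."""
    if len(u) != len(v):
        raise ValueError("importance arrays must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # An all-zero array carries no information; treat it as uncorrelated.
        return 0.0
    return dot / (norm_u * norm_v)


def correlation_map(importances):
    """Pairwise similarity between the importance arrays of all bands.

    `importances` maps a band label to its feature importance array, with
    one entry per molecular descriptor in a fixed, shared order.
    """
    bands = list(importances)
    return {
        (p, q): cosine_similarity(importances[p], importances[q])
        for p in bands
        for q in bands
    }


# Hypothetical importance arrays (ASSUMED values, not results from the paper).
# In practice each array would come from a random forest trained to predict
# the intensity of one band from the molecular descriptors.
toy_importances = {
    "band_A": [0.6, 0.3, 0.1, 0.0],
    "band_B": [0.5, 0.4, 0.1, 0.0],
    "band_C": [0.0, 0.1, 0.2, 0.7],
}

band_map = correlation_map(toy_importances)
for (p, q), score in sorted(band_map.items()):
    print(f"{p} vs {q}: {score:.3f}")
```

In this toy input, `band_A` and `band_B` rely on the same descriptors and therefore score close to 1, while `band_C` depends on a different descriptor and scores low against both. Other measures, such as a rank correlation, could be substituted without changing the structure of the map.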