Identification of new particle formation events with deep learning
Abstract. New particle formation (NPF) in the atmosphere is globally an important source of climate-relevant aerosol particles. Researchers typically analyze the occurrence of NPF events manually, day by day, from particle size distribution data. This is time-consuming, and the classification of event types may be inconsistent. To obtain more reliable and consistent results, NPF event analysis should be automated. We have developed an automatic method for NPF event identification based on deep learning, a subarea of machine learning. To our knowledge, this is the first time that NPF events have been successfully classified automatically into different classes from particle size distribution images. The method is based on image analysis of particle size distributions using a pre-trained deep convolutional neural network (CNN), AlexNet, which was transfer learned to recognize six different NPF event classes. In transfer learning, a subset of the particle size distribution images was used to train the CNN, and the remaining images were used to test the success of the training. The method was applied to a 15-year-long dataset measured at San Pietro Capofiume, Italy. We studied the performance of the training with different ratios of training to testing images, as well as with different regions of interest in the images. The results show that clear event days (i.e., Classes 1 and 2) and non-event days can be identified with an accuracy of ca. 80 % when the CNN classification is compared with that of an expert, which is a good first result for automatic NPF event analysis. In event classification, choosing between event classes is not an easy task even for trained researchers; thus, overlap or confusion between classes occurs. Hence, we cross-validated the learning results of the CNN against the expert-made classification. 
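The comparison of CNN and expert classifications described above can be illustrated with a minimal Python sketch. It assumes each day's class is available as a plain string label; the label names, the example data, and the helpers `split_by_ratio`, `confusion_matrix` and `accuracy` are hypothetical illustrations, not the authors' code, and the CNN training itself is not shown. The split mimics choosing a training/testing image ratio, and the confusion matrix (rows: expert class, columns: CNN class) shows where classes overlap.

```python
import random
from collections import Counter


def split_by_ratio(items, train_fraction, seed=0):
    """Shuffle items (e.g. daily size-distribution images) and split them
    into a training subset and a testing subset.

    train_fraction is the share used for training (0.8 means an 80:20
    training/testing ratio); the remainder is kept for testing.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def confusion_matrix(expert, cnn, labels):
    """Count days per (expert class, CNN class) pair.

    Rows follow the expert classes and columns the CNN classes, both in
    the order given by labels.
    """
    if len(expert) != len(cnn):
        raise ValueError("label lists must have equal length")
    counts = Counter(zip(expert, cnn))
    return [[counts[(e, c)] for c in labels] for e in labels]


def accuracy(expert, cnn):
    """Fraction of days on which the CNN class equals the expert class."""
    if not expert or len(expert) != len(cnn):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(e == c for e, c in zip(expert, cnn)) / len(expert)


# Hypothetical example data: one label per day, not measured results.
labels = ["1", "2", "3", "non-event"]
expert = ["1", "1", "2", "3", "non-event", "non-event"]
cnn = ["1", "2", "2", "2", "non-event", "3"]

train, test = split_by_ratio(range(10), 0.8)
print(len(train), len(test))  # 8 2
print(accuracy(expert, cnn))  # 0.5
for label, row in zip(labels, confusion_matrix(expert, cnn, labels)):
    print(label, row)
```

Off-diagonal entries in the matrix identify which class pairs the CNN and the expert confuse; in this toy data, for example, the expert's Class 1 days fall into CNN Classes 1 and 2.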
The results show that overlap typically occurs between adjacent or similar classes; e.g., days manually classified as Class 1 are categorized mainly into Classes 1 and 2 by the CNN, indicating that the manual and CNN classifications are highly consistent for most days. The classification, by both humans and the CNN, would be more consistent if only two classes were used for event days instead of three. Thus, we recommend that in future analyses event days be categorized into the classes Quantifiable (i.e., clear events, Classes 1 and 2) and Non-Quantifiable (i.e., weak events, Class 3). This would better describe the difference between those classes: both formation and growth rates can be determined for Quantifiable days, but not both for Non-Quantifiable days. Furthermore, we investigated in more depth the days that were classified as clear events by experts but recognized as non-events by the CNN, and vice versa. Clear misclassifications seem to occur more commonly in the manual analysis than in the CNN categorization, mostly owing to inconsistency in the human-made classification or to errors in recording the event class. In general, the automatic CNN classifier has better reliability and repeatability in NPF event classification than human-made classification, and thus transfer-learned pre-trained CNNs are powerful tools for analyzing long-term datasets. The developed NPF event classifier can easily be used to analyze any long-term dataset more accurately and consistently, which helps us to understand in detail aerosol-climate interactions and the long-term effects of climate change on NPF in the atmosphere.
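The recommended two-class scheme, and the search for clear disagreements between expert and CNN, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: labels are plain strings, the mapping `TWO_CLASS_SCHEME`, the helpers `to_two_classes`, `agreement` and `clear_disagreements`, and the example days are all hypothetical and not taken from the paper's data or code.

```python
# Hypothetical mapping from the three event classes to the recommended
# Quantifiable / Non-Quantifiable scheme; other labels pass through unchanged.
TWO_CLASS_SCHEME = {"1": "Quantifiable", "2": "Quantifiable", "3": "Non-Quantifiable"}


def to_two_classes(label):
    """Map Classes 1 and 2 to Quantifiable and Class 3 to Non-Quantifiable."""
    return TWO_CLASS_SCHEME.get(label, label)


def agreement(a, b):
    """Fraction of days on which two classifications give the same label."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def clear_disagreements(days, expert, cnn):
    """Return days where one classifier reports a clear (Quantifiable) event
    and the other reports a non-event, in either direction."""
    flagged = []
    for day, e, c in zip(days, expert, cnn):
        if {to_two_classes(e), to_two_classes(c)} == {"Quantifiable", "non-event"}:
            flagged.append(day)
    return flagged


# Hypothetical example data, one label per day.
days = ["day 1", "day 2", "day 3", "day 4", "day 5", "day 6"]
expert = ["1", "1", "2", "3", "non-event", "2"]
cnn = ["2", "1", "1", "3", "non-event", "non-event"]

three = agreement(expert, cnn)
two = agreement([to_two_classes(x) for x in expert], [to_two_classes(x) for x in cnn])
print("three-class agreement:", three)  # 0.5
print("two-class agreement:", two)
print("clear disagreements:", clear_disagreements(days, expert, cnn))  # ['day 6']
```

Because confusion between Classes 1 and 2 disappears once both map to Quantifiable, agreement rises in this toy example, while the remaining flagged days are the clear event/non-event conflicts that merit manual re-inspection.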