Human-efficient labeling of a solar flux emergence video dataset by a deep learning model
Abstract
Machine learning is becoming a critical tool for interrogating large, complex datasets. However, labeling large datasets is time-consuming. Here we show that convolutional neural networks (CNNs), trained on crudely labeled astronomical videos, can be leveraged to improve the quality of data labeling and reduce the need for human intervention. We use videos of the solar photospheric magnetic field, crudely labeled into two classes: emergence or non-emergence of large bipolar magnetic regions (BMRs). We train the CNN on the crude labels, manually verify the videos for which the labels and the CNN predictions disagree, correct the labels where needed, and repeat this process until convergence. This results in a high-quality labeled dataset while requiring manual verification of only ~50% of all videos. Furthermore, by gradually masking the videos and looking for the maximum change in CNN inference, we locate the BMR emergence time without retraining the CNN. This demonstrates the versatility of CNNs for simplifying the challenging task of labeling complex dynamic events.
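The masking-based localization of emergence time can be illustrated with a minimal sketch. The abstract does not specify the masking scheme, so the details here are assumptions: frames are hidden from a given index to the end of the video, masked pixels are set to a constant fill value, and the emergence frame is taken to be the one whose unmasking produces the largest change in the classifier output. The names `mask_from`, `locate_emergence`, and `toy_score` are hypothetical, and `toy_score` is a simple threshold rule standing in for a trained CNN.

```python
from typing import Callable, List

Frame = List[List[float]]   # one 2D magnetogram frame (hypothetical layout)
Video = List[Frame]


def mask_from(video: Video, start: int, fill: float = 0.0) -> Video:
    """Copy of `video` with every frame at index >= start set to `fill`."""
    return [
        frame if i < start else [[fill] * len(row) for row in frame]
        for i, frame in enumerate(video)
    ]


def locate_emergence(video: Video, score_fn: Callable[[Video], float],
                     fill: float = 0.0) -> int:
    """Index of the frame whose unmasking changes the score the most."""
    n = len(video)
    # scores[k]: classifier output when only the first k frames are visible.
    scores = [score_fn(mask_from(video, k, fill)) for k in range(n + 1)]
    # jumps[k]: change in output caused by revealing frame k.
    jumps = [abs(scores[k + 1] - scores[k]) for k in range(n)]
    return max(range(n), key=lambda k: jumps[k])


def toy_score(video: Video) -> float:
    """Stand-in for a trained CNN: 1.0 if any pixel exceeds a threshold."""
    peak = max(abs(v) for frame in video for row in frame for v in row)
    return 1.0 if peak > 1.0 else 0.0


# Six 2x2 frames: quiet field for frames 0-3, strong field from frame 4 on.
toy_video = [[[0.1, -0.1], [0.1, -0.1]]] * 4 + [[[5.0, -5.0], [5.0, -5.0]]] * 2
emergence_frame = locate_emergence(toy_video, toy_score)
print(emergence_frame)
```

For this toy video the score jumps when frame 4 becomes visible, so the function returns 4. The classifier is only evaluated, never updated, which is consistent with locating the emergence time without retraining; with a real CNN, `score_fn` would wrap a forward pass that returns the emergence-class probability.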