Text Mining of Hazard and Operability Analysis Reports Based on Active Learning
In the field of chemical safety, a named entity recognition (NER) model based on deep learning can mine valuable information from hazard and operability analysis (HAZOP) text. This information can guide experts in carrying out a new round of HAZOP analysis and help practitioners mitigate hidden dangers in the system, and it is therefore of great significance for improving the safety of the whole chemical system. However, because chemical safety analysis text is highly standardized and domain-specific, it is difficult to improve the performance of traditional models. To solve this problem, this study proposes an improved method based on active learning and designs three novel sampling algorithms, Variation of Token Entropy (VTE), HAZOP Confusion Entropy (HCE), and Amplification of Least Confidence (ALC), which improve the ability of the model to understand HAZOP text. In this method, a part of the data is first used to establish the initial model. The sampling algorithm is then used to select high-quality samples from the data set. Finally, these high-quality samples are used to retrain the whole model to obtain the final model. The experimental results show that the VTE, HCE, and ALC algorithms all perform better than random sampling. In addition, compared with other methods, the method proposed in this paper effectively improves the performance of the traditional model, which demonstrates that the method is reliable and advanced.
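The sampling step described above can be illustrated with a minimal sketch. The abstract does not define VTE, HCE, or ALC, so the code below does not implement them; it uses two standard uncertainty measures from the active learning literature (mean token entropy and least confidence) as stand-ins. The function names, the `pool` data, and the per-token probability values are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence

# One sentence = a list of per-token probability distributions over NER labels.
Sentence = List[Sequence[float]]


def token_entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def mean_token_entropy(sentence: Sentence) -> float:
    """Average token entropy; higher means the model is less certain."""
    return sum(token_entropy(d) for d in sentence) / len(sentence)


def least_confidence(sentence: Sentence) -> float:
    """1 minus the product of each token's top probability, a simple
    approximation of the probability of the most likely label sequence."""
    log_prob = sum(math.log(max(d)) for d in sentence)
    return 1.0 - math.exp(log_prob)


def select_samples(
    pool: List[Sentence], score_fn: Callable[[Sentence], float], k: int
) -> List[int]:
    """Return the indices of the k highest-scoring (most uncertain) sentences."""
    ranked = sorted(range(len(pool)), key=lambda i: score_fn(pool[i]), reverse=True)
    return ranked[:k]


# Hypothetical predictions of an initial model on three unlabeled sentences,
# each with two tokens and three candidate labels (assumed values).
pool = [
    [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # model is confident
    [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]],  # model is uncertain
    [[0.90, 0.05, 0.05], [0.50, 0.45, 0.05]],  # mixed
]

picked_by_entropy = select_samples(pool, mean_token_entropy, k=2)
picked_by_lc = select_samples(pool, least_confidence, k=2)
print(picked_by_entropy, picked_by_lc)
```

In a full pipeline, the selected sentences would be labeled, added to the training set, and used to retrain the model, with the select-and-retrain step repeated as needed. Random sampling, the baseline named in the abstract, corresponds to replacing the ranking with a random choice of `k` indices.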