Abstract
Due to the problems of unbalanced data sets and distribution differences in long-term rainfall prediction, the current rainfall prediction model had poor generalization performance and could not achieve good prediction results in real scenarios. This study uses multiple atmospheric parameters (such as temperature, humidity, atmospheric pressure, etc.) to establish a TabNet-LightGbm rainfall probability prediction model. This research uses feature engineering (such as generating descriptive statistical features, feature fusion) to improve model accuracy, Borderline Smote algorithm to improve data set imbalance, and confrontation verification to improve distribution differences. The experiment uses 5 years of precipitation data from 26 stations in the Beijing-Tianjin-Hebei region of China to verify the proposed rainfall prediction model. The test set is to predict the rainfall of each station in one month. The experimental results shows that the model has good performance with AUC larger than 92%. The method proposed in this study further improves the accuracy of rainfall prediction, and provides a reference for data mining tasks.