Detecting Financial Statement Fraud with Interpretable Machine Learning
Abstract. In this study, we explored a stable and interpretable model for detecting financial fraud. To handle the imbalanced dataset, we compared the SMOTE, Borderline-SMOTE, and ADASYN oversampling algorithms and selected SMOTE, which achieved the highest AUC value. Using the MCB method, we found that the Adaptive Lasso algorithm was more stable than the SCAD, MCP, Stepwise, and SQRT Lasso algorithms. Applying weight of evidence (WoE) encoding and information value (IV) testing to the features further improved the AUC value. Finally, we ranked the fraud factors by feature importance and used the partial dependence function to make the model interpretable. Measured by AUC and KS values, the ensemble models XGBoost, LightGBM, and RF identified financial fraud better than traditional models such as SVM and LR.
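To make two of the quantities named above concrete, the following is a minimal sketch of how WoE/IV for a single binned feature and the KS statistic for a scored sample can be computed. It is an illustration under stated assumptions, not the implementation used in this study: the function names `woe_iv` and `ks_statistic`, the smoothing constant `eps`, and the sign convention WoE = ln(fraud share / non-fraud share) are all choices made here for the example.

```python
import math
from collections import defaultdict


def woe_iv(bins, labels, eps=0.5):
    """Return ({bin: WoE}, IV) for one binned feature.

    labels are 0/1 with 1 = fraud. eps is an additive smoothing
    constant (an assumption of this sketch) that avoids log(0)
    when a bin contains only one class.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [non-fraud count, fraud count]
    for b, y in zip(bins, labels):
        counts[b][y] += 1

    n_bins = len(counts)
    total_good = sum(c[0] for c in counts.values()) + eps * n_bins
    total_bad = sum(c[1] for c in counts.values()) + eps * n_bins

    woe, iv = {}, 0.0
    for b, (good, bad) in counts.items():
        p_good = (good + eps) / total_good  # smoothed share of non-fraud in this bin
        p_bad = (bad + eps) / total_bad     # smoothed share of fraud in this bin
        woe[b] = math.log(p_bad / p_good)   # assumed sign convention: fraud over non-fraud
        iv += (p_bad - p_good) * woe[b]     # each term is non-negative
    return woe, iv


def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of the two classes."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume all samples tied at this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_bad - fp / n_good))
    return best
```

A feature whose IV is close to zero separates the two classes poorly and is a candidate for removal; a KS value close to 1 indicates that the model scores for fraudulent and non-fraudulent samples barely overlap.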