Research on Credit Risk Identification of Internet Financial Enterprises Based on Big Data

The advent of the big data era has opened a new path of development for Internet financial credit collection. Traditional methods of credit risk identification for Internet financial enterprises cannot capture the characteristics of credit risk region division, which leads to large errors in the identification results. This paper therefore proposes a new big-data-based method of credit risk identification for Internet financial enterprises. From a big data perspective, the credit risk assessment steps of Internet financial enterprises are analyzed, the weights of the assessment indicators are calculated using an improved analytic hierarchy process (AHP), and the linear weighted synthesis method is applied to comprehensively assess the credit of clients. Using the unique characteristics of big data credit risk region division, the big data credit risk is determined by a rule-based matching method. The eXtreme Gradient Boosting (XGBoost) machine learning algorithm is used to establish a credit risk identification model of Internet financial enterprises. The kappa coefficient and ROC curve are used to evaluate the performance of the proposed method. Experimental results show that the proposed method can accurately assess the credit risk of Internet financial enterprises.
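
As a rough illustration of the weighting step described in this abstract, the sketch below derives AHP indicator weights and combines them by linear weighted synthesis. It is a minimal sketch under stated assumptions: the abstract does not specify the "improved" AHP variant, so the standard row geometric-mean approximation is used instead, and the indicator names, pairwise judgments, and client scores are all hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def linear_weighted_score(weights, indicator_scores):
    """Linear weighted synthesis: sum of weight * normalized indicator score."""
    return sum(w * s for w, s in zip(weights, indicator_scores))


# Hypothetical pairwise comparison matrix (Saaty 1-9 scale) for three
# made-up indicators: repayment history, liquidity, online behavior.
pairwise = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical indicator scores for one client, already scaled to [0, 1].
client_scores = [0.9, 0.6, 0.4]
credit_score = linear_weighted_score(weights, client_scores)
print([round(w, 3) for w in weights], round(credit_score, 3))
```

Because the weights sum to 1 and each indicator score lies in [0, 1], the synthesized credit score also lies in [0, 1], which makes clients directly comparable.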

Establishing a Credit Risk Evaluation System for SMEs Using the Soft Voting Fusion Model

Risks ◽

10.3390/risks9110202 ◽

2021 ◽

Vol 9 (11) ◽

pp. 202

Author(s):

Ge Gao ◽

Hongxin Wang ◽

Pengbin Gao

Keyword(s):

Credit Risk ◽

Evaluation System ◽

Predictive Accuracy ◽

Assessment System ◽

Gradient Boosting ◽

Support Vector ◽

Fusion Model ◽

Light Gradient ◽

Extreme Gradient Boosting ◽

The Government

In China, SMEs are facing financing difficulties, and commercial banks and financial institutions are the main financing channels for SMEs. Thus, a reasonable and efficient credit risk assessment system is important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of a single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks to make credit decisions.
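
Soft voting, as named in this abstract, fuses base models by averaging their predicted class probabilities rather than counting their hard votes. The sketch below shows only that fusion step. The per-model probabilities are hypothetical placeholders, not outputs of trained models, and the equal default weights and 0.5 threshold are assumptions rather than the authors' settings.

```python
def soft_vote(prob_by_model, weights=None, threshold=0.5):
    """Average each model's predicted default probability, then threshold it."""
    names = list(prob_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumed: equal weights
    total_weight = sum(weights[name] for name in names)
    n_samples = len(prob_by_model[names[0]])
    fused = []
    for i in range(n_samples):
        weighted_sum = sum(weights[name] * prob_by_model[name][i] for name in names)
        fused.append(weighted_sum / total_weight)
    labels = [1 if p >= threshold else 0 for p in fused]
    return fused, labels


# Hypothetical default probabilities from five base models for three SMEs.
probs = {
    "logistic_regression": [0.20, 0.55, 0.80],
    "svm": [0.10, 0.45, 0.70],
    "random_forest": [0.30, 0.60, 0.90],
    "xgboost": [0.25, 0.40, 0.85],
    "lightgbm": [0.15, 0.35, 0.75],
}
fused_probs, fused_labels = soft_vote(probs)
print(fused_probs, fused_labels)
```

In the second column two models would individually predict default (probability above 0.5), yet the averaged probability stays below the threshold, which is the kind of disagreement soft voting is meant to smooth out.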

An Extreme Gradient Boosting Algorithm for Short-Term Load Forecasting Using Power Grid Big Data

Proceedings of 2018 Chinese Intelligent Systems Conference - Lecture Notes in Electrical Engineering ◽

10.1007/978-981-13-2288-4_46 ◽

2018 ◽

pp. 479-490

Author(s):

Liqiang Ren ◽

Limin Zhang ◽

Haipeng Wang ◽

Qiang Guo

Keyword(s):

Big Data ◽

Power Grid ◽

Load Forecasting ◽

Gradient Boosting ◽

Short Term ◽

Extreme Gradient Boosting ◽

Short Term Load Forecasting ◽

Boosting Algorithm

Detection and Identification of Organic Pollutants in Drinking Water from Fluorescence Spectra Based on Deep Learning Using Convolutional Autoencoder

Water ◽

10.3390/w13192633 ◽

2021 ◽

Vol 13 (19) ◽

pp. 2633

Author(s):

Jie Yu ◽

Yitong Cao ◽

Fei Shi ◽

Jiegen Shi ◽

Dibo Hou ◽

...

Keyword(s):

Drinking Water ◽

Deep Learning ◽

Fluorescence Spectroscopy ◽

Organic Pollutants ◽

Learning Algorithm ◽

Three Dimensional ◽

Gradient Boosting ◽

Spectral Processing ◽

Extreme Gradient Boosting ◽

Convolutional Autoencoder

Three-dimensional fluorescence spectroscopy has become increasingly useful in the detection of organic pollutants. However, this approach is limited by decreased accuracy in identifying low-concentration pollutants. In this research, a new identification method for organic pollutants in drinking water is accordingly proposed using three-dimensional fluorescence spectroscopy data and a deep learning algorithm. A novel application of a convolutional autoencoder was designed to process high-dimensional fluorescence data and extract multi-scale features from the spectrum of drinking water samples containing organic pollutants. Extreme Gradient Boosting (XGBoost), an implementation of gradient-boosted decision trees, was used to identify the organic pollutants based on the obtained features. The identification performance of the method was validated on three typical organic pollutants in different concentrations for the scenario of accidental pollution. Results showed that the proposed method achieved improved accuracy for both high-concentration (>10 μg/L) and low-concentration (≤10 μg/L) pollutant samples. Compared to traditional spectrum processing techniques, the convolutional autoencoder-based approach obtained more detailed features from fluorescence spectral data. Moreover, evidence indicated that the proposed method maintained its detection ability when the background water changes. It can effectively reduce the rate of misjudgments associated with the fluctuation of drinking water quality. This study demonstrates the possibility of using deep learning algorithms for spectral processing and contamination detection in drinking water.

Prediction of West Nile Virus using Ensemble Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9810.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3744-3749

Keyword(s):

West Nile Virus ◽

Random Forest ◽

Learning Algorithm ◽

Traditional Approach ◽

The United States ◽

Gradient Boosting ◽

Ensemble Classifiers ◽

Human Beings ◽

West Nile ◽

Extreme Gradient Boosting

West Nile Virus (WNV) is a mosquito-borne virus that infects human beings through mosquito bites. The disease is considered a serious threat to society, especially in the United States, where it is frequently found in localities with water bodies. The traditional approach is to collect mosquito traps from a locality and check whether the mosquitoes are infected with the virus. If the virus is found, the locality is sprayed with pesticides. This process is very time consuming and requires a lot of financial support. Machine learning methods can provide an efficient approach to predict the presence of the virus in a locality using data related to the location and weather. This paper uses a dataset available on Kaggle, which includes information about the traps found in the locality and about the locality’s weather. The dataset is imbalanced, hence the Synthetic Minority Over-sampling Technique (SMOTE), an upsampling method, is used to balance it. Ensemble learning classifiers, namely random forest, gradient boosting and Extreme Gradient Boosting (XGB), are then applied. The performance of the ensemble classifiers is compared with that of the best supervised learning algorithm, SVM. Among the models, XGB gave the highest F1 score of 92.93, performing marginally better than random forest (92.78) and SVM (91.16).
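
The core idea of SMOTE mentioned in this abstract is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The following is a simplified sketch of that idea only, not the implementation used in the paper; the two-feature rows, the neighbor count `k`, and the seed are hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating toward minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to `base`.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbor = rng.choice(others[:k])  # one of the k nearest neighbors
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class rows (for example, traps where the virus was found).
minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_rows = smote(minority_rows, n_new=4)
print(new_rows)
```

Every synthetic row lies on a line segment between two real minority rows, so the oversampled class stays inside the region already occupied by the minority data.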

Application of eXtreme gradient boosting trees in the construction of credit risk assessment models for financial institutions

Applied Soft Computing ◽

10.1016/j.asoc.2018.09.029 ◽

2018 ◽

Vol 73 ◽

pp. 914-920 ◽

Cited By ~ 29

Author(s):

Yung-Chia Chang ◽

Kuei-Hu Chang ◽

Guan-Jhih Wu

Keyword(s):

Risk Assessment ◽

Credit Risk ◽

Financial Institutions ◽

Gradient Boosting ◽

Credit Risk Assessment ◽

Risk Assessment Models ◽

Extreme Gradient Boosting

The extraction of early warning features for the predicting financial distress based on XGboost model and shap framework

International Journal of Financial Engineering ◽

10.1142/s2424786321410048 ◽

2021 ◽

pp. 2141004

Author(s):

He Yang ◽

Emma Li ◽

Yi Fang Cai ◽

Jiapei Li ◽

George X. Yuan

Keyword(s):

Machine Learning ◽

Early Warning ◽

Financial Distress ◽

Prediction Accuracy ◽

Financial Risk ◽

Learning Algorithm ◽

Listed Companies ◽

Gradient Boosting ◽

Distress Risk ◽

Extreme Gradient Boosting

The purpose of this paper is to establish a framework for extracting early warning risk features for predicting financial distress, based on the XGBoost model and SHAP. How early warning risk features are constructed is well known to be very important for predicting the financial distress of companies. Compared with traditional statistical methods, data-driven machine learning achieves better prediction accuracy in financial early warning modelling, but it also brings the difficulty that the resulting model may not be easy to explain. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient boosting, has become a hot topic in machine learning research due to its strong ability to recognize nonlinear information and its high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting the financial distress of listed companies, with 76 financial risk features from seven categories and 14 non-financial risk features from four categories collected to establish an early warning system for the prediction of financial distress. In the empirical tests with respect to AUC, KS and Kappa, the numerical results show that, compared with the Logistic model, the XGBoost-based method established in this paper has a much better ability to predict the financial distress risk of listed companies. Moreover, under the framework of SHAP (SHapley Additive exPlanations), we are able to give a reasonable, visual explanation of the important risk features and of the ways in which they affect financial distress. The results show that the XGBoost approach to modelling early warning features for financial distress not only achieves better prediction accuracy but is also explainable, which is significant for the early identification of financial distress risk for listed companies in practice.
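
Two of the evaluation measures named in this abstract, Kappa and KS, can be computed in a few lines. The sketch below assumes binary labels (1 = distressed) and model scores where higher means riskier; the labels, scores, and 0.5 cut-off are hypothetical and are not taken from the paper.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance agreement."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    if expected == 1.0:  # degenerate case: only one class present
        return 1.0
    return (observed - expected) / (1.0 - expected)


def ks_statistic(y_true, scores):
    """KS: largest gap between cumulative true-positive and false-positive rates."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), reverse=True)
    tp = fp = 0
    best = 0.0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Only evaluate once all rows sharing this score have been counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


# Hypothetical labels and model scores for eight companies.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.45, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

kappa = cohen_kappa(y_true, y_pred)
ks = ks_statistic(y_true, scores)
print(kappa, ks)
```

Kappa depends on the chosen cut-off because it compares hard labels, whereas KS is computed over all thresholds from the raw scores.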

Using Machine Learning to Predict Invasive Bacterial Infections in Young Febrile Infants Visiting the Emergency Department

Journal of Clinical Medicine ◽

10.3390/jcm10091875 ◽

2021 ◽

Vol 10 (9) ◽

pp. 1875

Author(s):

I-Min Chiu ◽

Chi-Yung Cheng ◽

Wun-Huei Zeng ◽

Ying-Hsien Huang ◽

Chun-Hung Richard Lin

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Bacterial Infections ◽

Clinical Symptoms ◽

Learning Algorithm ◽

Gradient Boosting ◽

P Value ◽

Young Infants ◽

Extreme Gradient Boosting ◽

Sensitivity Level

Background: The aim of this study was to develop and evaluate a machine learning (ML) model to predict invasive bacterial infections (IBIs) in young febrile infants visiting the emergency department (ED). Methods: This retrospective study was conducted in the EDs of three medical centers across Taiwan from 2011 to 2018. We included patients aged 0–60 days who visited the ED with clinical symptoms of fever. We developed three different ML algorithms, namely logistic regression (LR), support vector machine (SVM), and extreme gradient boosting (XGBoost), and compared their performance at predicting IBIs to a previously validated score system (IBI score). Results: During the study period, 4211 patients were included, of whom 126 (3.1%) had IBI. A total of eight, five, and seven features were used in the LR, SVM, and XGBoost models, respectively, after the feature selection process. The ML models achieved a better AUROC value when predicting IBIs in young infants compared with the IBI score (LR: 0.85 vs. SVM: 0.84 vs. XGBoost: 0.85 vs. IBI score: 0.70, p-value < 0.001). Using a cost-sensitive learning algorithm, all ML models showed better specificity in predicting IBIs at a 90% sensitivity level compared to an IBI score > 2 (LR: 0.59 vs. SVM: 0.60 vs. XGBoost: 0.57 vs. IBI score > 2: 0.43, p-value < 0.001). Conclusions: All ML models developed in this study outperformed the traditional scoring system in stratifying low-risk febrile infants after the sensitivity level was standardized.
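
Comparing models "at a 90% sensitivity level" means fixing each model's decision threshold so that about 90% of true cases are flagged, then reporting the specificity at that threshold. The sketch below illustrates only this threshold-selection step, not the cost-sensitive training described in the abstract; all scores are hypothetical.

```python
import math


def specificity_at_sensitivity(y_true, scores, target_sensitivity=0.90):
    """Find the highest threshold reaching the target sensitivity; report specificity."""
    pos_scores = sorted((s for s, y in zip(scores, y_true) if y == 1), reverse=True)
    neg_scores = [s for s, y in zip(scores, y_true) if y == 0]
    # Number of positives that must be flagged to reach the target sensitivity.
    # The small epsilon guards against floating-point round-up in the product.
    needed = max(1, math.ceil(target_sensitivity * len(pos_scores) - 1e-9))
    threshold = pos_scores[needed - 1]  # flag every case with score >= threshold
    sensitivity = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    specificity = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return threshold, sensitivity, specificity


# Hypothetical risk scores: ten infected and ten non-infected patients.
pos = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.35, 0.10]
neg = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 0.55, 0.70]
y_true = [1] * len(pos) + [0] * len(neg)
scores = pos + neg

threshold, sensitivity, specificity = specificity_at_sensitivity(y_true, scores)
print(threshold, sensitivity, specificity)
```

Fixing sensitivity first puts every model on the same footing for missed infections, so the remaining difference in specificity reflects how many low-risk infants each model can safely rule out.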

Extreme Gradient Boosting Machine Learning Algorithm For Safe Auto Insurance Operations

2019 IEEE International Conference on Vehicular Electronics and Safety (ICVES) ◽

10.1109/icves.2019.8906396 ◽

2019 ◽

Cited By ~ 4

Author(s):

Najmeddine Dhieb ◽

Hakim Ghazzai ◽

Hichem Besbes ◽

Yehia Massoud

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Gradient Boosting ◽

Machine Learning Algorithm ◽

Auto Insurance ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting

Detection of Potassium Deficiency and Momentary Transpiration Rate Estimation at Early Growth Stages Using Proximal Hyperspectral Imaging and Extreme Gradient Boosting

Sensors ◽

10.3390/s21030958 ◽

2021 ◽

Vol 21 (3) ◽

pp. 958

Author(s):

Shahar Weksler ◽

Offer Rozenstein ◽

Nadav Haish ◽

Menachem Moshelion ◽

Rony Wallach ◽

...

Keyword(s):

Crop Yield ◽

Transpiration Rate ◽

Learning Algorithm ◽

Irrigation Management ◽

Stress Factors ◽

Spectral Information ◽

Gradient Boosting ◽

Growth Stages ◽

Ambient Conditions ◽

Extreme Gradient Boosting

Potassium is a macro element in plants that is typically supplied to crops in excess throughout the season to avoid a deficit leading to reduced crop yield. Transpiration rate is a momentary physiological attribute that is indicative of soil water content, the plant’s water requirements, and abiotic stress factors. In this study, two systems were combined to create a hyperspectral–physiological plant database for classification of potassium treatments (low, medium, and high) and estimation of momentary transpiration rate from hyperspectral images. PlantArray 3.0 was used to control fertigation, log ambient conditions, and calculate transpiration rates. In addition, a semi-automated platform carrying a hyperspectral camera was triggered every hour to capture images of a large array of pepper plants. The combined attributes and spectral information on an hourly basis were used to classify plants into their given potassium treatments (average accuracy = 80%) and to estimate transpiration rate (RMSE = 0.025 g/min, R² = 0.75) using the advanced ensemble learning algorithm XGBoost (extreme gradient boosting algorithm). Although potassium has no direct spectral absorption features, the classification results demonstrated the ability to label plants according to potassium treatments based on a remotely measured hyperspectral signal. The ability to estimate transpiration rates for different potassium applications using spectral information can aid in irrigation management and crop yield optimization. These combined results are important for decision-making during the growing season, and particularly at the early stages when potassium levels can still be corrected to prevent yield loss.
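
The two regression measures quoted in this abstract, RMSE and the coefficient of determination, follow directly from the residuals between measured and estimated values. The sketch below shows the standard formulas; the transpiration values are hypothetical and unrelated to the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. estimated transpiration rates (g/min).
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]

error = rmse(measured, estimated)
fit = r_squared(measured, estimated)
print(round(error, 4), round(fit, 4))
```

RMSE is expressed in the units of the measured quantity, while the coefficient of determination is unitless and describes the share of variance explained, which is why abstracts often report both.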

Machine learning for predictions of cervical cancer identification – preliminary investigation based on refractive index

10.21203/rs.3.rs-948525/v1 ◽

2021 ◽

Author(s):

Michał Kruczkowski ◽

Anna Drabik-Kruczkowska ◽

Anna Marciniak ◽

Martyna Tarczewska ◽

Monika Kosowska ◽

...

Keyword(s):

Machine Learning ◽

Cervical Cancer ◽

Refractive Index ◽

Early Diagnosis ◽

Learning Algorithm ◽

Optical Measurements ◽

Prediction Algorithm ◽

Gradient Boosting ◽

Extreme Gradient Boosting

Cervical cancer is one of the most common cancers, and its early diagnosis is of the greatest importance. Unfortunately, many diagnoses are based on the subjective opinions of doctors; to date, there is no general measurement method with a calibrated standard. The problem can be addressed with a measurement system that fuses an optoelectronic sensor with a machine learning algorithm to provide reliable assistance for doctors in the early diagnosis of cervical cancer. We present preliminary research on cervical cancer assessment using an optical sensor and a prediction algorithm. Since every material is characterized by its refractive index, measuring its value and detecting changes give information about the state of the tissue. The optical measurements provided datasets for training and validating the analysis software. We present the data preprocessing, the machine learning results obtained with three algorithms (Random Forest, eXtreme Gradient Boosting, Naïve Bayes), and an assessment of their performance in classifying tissue as healthy or sick. All three algorithms achieved high values (>89%) on the performance measures used. Our solution allows for rapid sample measurement and automatic classification of the results, constituting a potential support tool for doctors.