scholarly journals Research on Credit Risk Identification of Internet Financial Enterprises Based on Big Data

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Hua Peng

The advent of the big data era has opened a new path of development for Internet financial credit reporting. Traditional methods of identifying the credit risk of Internet financial enterprises cannot capture the zoning characteristics of credit risk, which leads to large errors in the identification results. This paper therefore proposes a new credit risk identification method for Internet financial enterprises based on big data. From a big data perspective, the credit risk assessment steps of Internet financial enterprises are analyzed, the weights of the assessment indicators are calculated with an improved analytic hierarchy process (AHP), and the linear weighted synthesis method is applied to assess client credit comprehensively. Using the distinctive regional division characteristics of big data credit risk, the big data credit risk is determined by a rule-based matching method. The eXtreme Gradient Boosting (XGBoost) machine learning algorithm is then used to build a credit risk identification model for Internet financial enterprises, and the kappa coefficient and ROC curve are used to evaluate the performance of the proposed method. Experimental results show that the proposed method can accurately assess the credit risk of Internet financial enterprises.
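As a rough illustration of the weighting and scoring steps, the sketch below uses the standard geometric-mean form of AHP with Saaty's consistency ratio and then applies linear weighted synthesis. The abstract does not say how its "improved" AHP differs from the standard one, so this is not the paper's variant; the pairwise judgments, indicator values, and function names (`ahp_weights`, `weighted_score`) are hypothetical.

```python
import math

# Saaty's random consistency index (RI) for matrix orders 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix.

    Weights come from the geometric-mean (row) approximation of the
    principal eigenvector, the classic AHP shortcut (standard AHP, not
    the paper's unspecified "improved" variant).
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]

    if n <= 2:
        # Reciprocal matrices of order 1 or 2 are always consistent.
        return weights, 0.0

    # Estimate the principal eigenvalue lambda_max from A @ w.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return weights, consistency_index / RANDOM_INDEX[n]


def weighted_score(weights, indicators):
    """Linear weighted synthesis: sum of weight * normalized indicator value."""
    return sum(w * x for w, x in zip(weights, indicators))


# Hypothetical example: indicator 0 is judged twice as important as
# indicator 1 and four times as important as indicator 2.
pairwise = [[1, 2, 4],
            [1 / 2, 1, 2],
            [1 / 4, 1 / 2, 1]]
weights, cr = ahp_weights(pairwise)
score = weighted_score(weights, [0.8, 0.6, 0.5])  # client's normalized indicators
print([round(w, 4) for w in weights], round(score, 4))
```

A consistency ratio below 0.1 is the usual rule of thumb for accepting the judgments; this example matrix is perfectly consistent, so its ratio is essentially zero and the weights are 4/7, 2/7, and 1/7.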

Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 202
Author(s):  
Ge Gao ◽  
Hongxin Wang ◽  
Pengbin Gao

In China, small and medium-sized enterprises (SMEs) face financing difficulties, and commercial banks and other financial institutions are their main financing channels. A reasonable and efficient credit risk assessment system is therefore important for credit markets. Combining traditional statistical methods with AI technology, we construct a soft voting fusion model that incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data, including financial information and default records, from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020. The results show that the soft voting fusion model is more accurate than the individual machine learning (ML) algorithms, which provides a theoretical basis for future government control of credit risk and offers important references for banks’ credit decisions.
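Soft voting averages the class probabilities produced by each base model and predicts the class with the highest average, whereas hard voting counts each model's top class. The minimal sketch below, with invented probabilities and hypothetical helper names, shows that the two rules can disagree when confident models are outvoted by several barely-decided ones. In scikit-learn the same idea is available as `VotingClassifier(voting="soft")`; the abstract does not state which implementation the authors used.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-model class-probability vectors and pick the top class.

    prob_lists holds one list per model; each list gives that model's
    probability for every class of the same sample.
    Returns (averaged_probabilities, predicted_class_index).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, predicted


def hard_vote(prob_lists):
    """Majority vote over each model's own top class, for comparison."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in prob_lists]
    return max(set(votes), key=votes.count)


# Hypothetical [P(no default), P(default)] outputs for one SME from the five
# base models (LR, SVM, RF, XGBoost, LightGBM); the numbers are invented.
outputs = [[0.90, 0.10], [0.45, 0.55], [0.45, 0.55], [0.48, 0.52], [0.80, 0.20]]
averaged, soft_label = soft_vote(outputs)
hard_label = hard_vote(outputs)
print(averaged, soft_label, hard_label)
```

Here three models lean slightly toward default, so hard voting predicts default, but the two confident "no default" models pull the averaged probability of default down to 0.384, so soft voting predicts no default.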


Water ◽  
2021 ◽  
Vol 13 (19) ◽  
pp. 2633
Author(s):  
Jie Yu ◽  
Yitong Cao ◽  
Fei Shi ◽  
Jiegen Shi ◽  
Dibo Hou ◽  
...  

Three-dimensional fluorescence spectroscopy has become increasingly useful in the detection of organic pollutants. However, this approach loses accuracy when identifying low-concentration pollutants. In this research, a new method for identifying organic pollutants in drinking water is proposed, using three-dimensional fluorescence spectroscopy data and a deep learning algorithm. A convolutional autoencoder was designed, as a novel application, to process high-dimensional fluorescence data and extract multi-scale features from the spectra of drinking water samples containing organic pollutants. Extreme Gradient Boosting (XGBoost), an implementation of gradient-boosted decision trees, was used to identify the organic pollutants based on the obtained features. The method’s identification performance was validated on three typical organic pollutants at different concentrations in an accidental pollution scenario. The results showed that the proposed method achieved improved accuracy for both high- (>10 μg/L) and low- (≤10 μg/L) concentration pollutant samples. Compared with traditional spectrum processing techniques, the convolutional autoencoder-based approach extracted more detailed features from fluorescence spectral data. Moreover, the results indicated that the proposed method maintained its detection ability when the background water changed, so it can effectively reduce the rate of misjudgments associated with fluctuations in drinking water quality. This study demonstrates the possibility of using deep learning algorithms for spectral processing and contamination detection in drinking water.
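XGBoost builds an additive ensemble of decision trees, each fitted to the gradient of the loss at the current prediction. The sketch below shows only that core idea in its simplest form: first-order gradient boosting with depth-1 trees (stumps) on squared-error loss and toy one-feature data. It omits XGBoost's second-order updates, regularization, and tree-growing machinery, and it is a regression example rather than the paper's pollutant classifier; all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one threshold on one feature)."""
    best = None
    for j in range(len(xs[0])):
        for threshold in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (every feature is constant): predict the mean residual.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature, threshold, left_value, right_value)


def predict_stump(stump, x):
    feature, threshold, left_value, right_value = stump
    return left_value if x[feature] <= threshold else right_value


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is just y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_boosting(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy one-feature data with a step between x = 3 and x = 4.
xs = [[1], [2], [3], [4], [5], [6]]
ys = [1, 1, 1, 5, 5, 5]
model = fit_boosting(xs, ys)
print(round(predict_boosting(model, [2]), 3), round(predict_boosting(model, [5]), 3))
```

The learning rate shrinks each stump's contribution, so on this toy data the residual decays by a factor of 0.9 per round and the predictions approach 1 and 5 after 50 rounds.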


West Nile Virus (WNV) causes a mosquito-borne disease in which humans are infected through the bite of an infected mosquito. It is considered a serious threat to society, especially in the United States, where it is frequently found in localities near bodies of water. The traditional approach is to collect mosquito traps from a locality and check whether they are infected with the virus; if the virus is found, the locality is sprayed with pesticides. This process, however, is very time-consuming and expensive. Machine learning methods can provide an efficient approach to predicting the presence of the virus in a locality from location and weather data. This paper uses a Kaggle dataset that contains information about the traps in each locality and about the locality’s weather. Because the dataset is imbalanced, the Synthetic Minority Oversampling Technique (SMOTE), an upsampling method, is used to balance it. Ensemble learning classifiers, namely random forest, gradient boosting, and Extreme Gradient Boosting (XGB), are then applied, and their performance is compared with that of SVM, the best-performing supervised learning algorithm. Among the models, XGB gave the highest F1 score (92.93), performing marginally better than random forest (92.78) and SVM (91.16).
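SMOTE balances a dataset by creating synthetic minority samples rather than duplicating existing ones: each new point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. A minimal stdlib sketch follows; the feature values and the `smote` helper are hypothetical, and the abstract does not report the value of k. The imbalanced-learn package provides a full implementation as `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(p, base),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical minority class (e.g. traps that tested positive), two features each.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(positives, n_new=20, k=2)
print(len(new_points), new_points[:2])
```

Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.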


Author(s):  
He Yang ◽  
Emma Li ◽  
Yi Fang Cai ◽  
Jiapei Li ◽  
George X. Yuan

The purpose of this paper is to establish a framework, based on the XGBoost model and SHAP, for extracting early warning risk features to predict financial distress. Constructing early warning risk features is well known to be very important for predicting companies’ financial distress. Compared with traditional statistical methods, data-driven machine learning models for financial early warning achieve better prediction accuracy, but they are also harder to explain. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient boosting, has become a hot topic in machine learning research owing to its strong ability to recognize nonlinear information and its high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting the financial distress of listed companies: 76 financial risk features from seven categories and 14 non-financial risk features from four categories are collected to establish an early warning system for predicting financial distress. In empirical tests evaluated by AUC, KS, and Kappa, the numerical results show that the XGBoost-based method established in this paper predicts the financial distress risk of listed companies much better than the logistic model. Moreover, within the SHAP (SHapley Additive exPlanations) framework, we give reasonable, visual explanations of the important risk features and of the ways in which they affect financial distress.
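The three evaluation measures named above can be computed directly from labels and scores. The sketch uses the standard definitions (AUC as the probability that a distressed company outscores a healthy one, KS as the largest gap between the true-positive and false-positive rates, and Cohen's kappa on thresholded predictions); the data, the 0.5 cut-off, and the function names are hypothetical, and the abstract does not specify how the authors computed these values.

```python
def auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels, scores):
    """Kolmogorov-Smirnov statistic: max gap between TPR and FPR over thresholds."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        best = max(best, abs(tpr - fpr))
    return best


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what the label frequencies predict by chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    classes = set(y_true) | set(y_pred)
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


# Hypothetical distress labels (1 = distressed) and model scores.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.35, 0.8, 0.4, 0.7, 0.85, 0.9]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(auc(labels, scores), ks_statistic(labels, scores), cohen_kappa(labels, predictions))
```

For this data the three values are 0.875, 0.75, and 0.5: AUC and KS use the raw scores, while kappa depends on the chosen cut-off.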
The results show that the XGBoost approach to modeling early warning features for financial distress not only achieves better prediction accuracy but is also explainable, which is significant in practice for identifying early warnings of financial distress risk in listed companies.
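SHAP attributes a prediction to its input features using Shapley values from cooperative game theory. For a handful of features these can be computed exactly by enumerating every feature coalition, as in the sketch below. Two assumptions apply: features outside a coalition are replaced by a fixed baseline (one of several conventions), and the scoring function is a hypothetical toy rather than a trained XGBoost model. The `shap` library uses the much faster TreeSHAP algorithm for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A coalition S is evaluated by taking features in S from x and all other
    features from the baseline. Cost grows as 2**n, so only tiny n is feasible.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical scoring function: feature 0 acts alone, features 1 and 2 interact.
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The attributions come out as 2, 3, and 3: feature 0 contributes its independent effect, the interaction term is split evenly between features 1 and 2, and the values sum to the difference between the prediction and the baseline prediction (8).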


2021 ◽  
Vol 10 (9) ◽  
pp. 1875
Author(s):  
I-Min Chiu ◽  
Chi-Yung Cheng ◽  
Wun-Huei Zeng ◽  
Ying-Hsien Huang ◽  
Chun-Hung Richard Lin

Background: The aim of this study was to develop and evaluate a machine learning (ML) model to predict invasive bacterial infections (IBIs) in young febrile infants visiting the emergency department (ED). Methods: This retrospective study was conducted in the EDs of three medical centers across Taiwan from 2011 to 2018. We included patients aged 0–60 days who visited the ED with clinical symptoms of fever. We developed three ML algorithms, logistic regression (LR), support vector machine (SVM), and extreme gradient boosting (XGBoost), and compared their performance in predicting IBIs with that of a previously validated scoring system (IBI score). Results: During the study period, 4211 patients were included, of whom 126 (3.1%) had an IBI. After feature selection, eight, five, and seven features were used in the LR, SVM, and XGBoost models, respectively. The ML models achieved better AUROC values than the IBI score when predicting IBIs in young infants (LR: 0.85 vs. SVM: 0.84 vs. XGBoost: 0.85 vs. IBI score: 0.70, p-value < 0.001). Using a cost-sensitive learning algorithm, all ML models showed better specificity in predicting IBIs at a 90% sensitivity level than an IBI score > 2 (LR: 0.59 vs. SVM: 0.60 vs. XGBoost: 0.57 vs. IBI score > 2: 0.43, p-value < 0.001). Conclusions: All ML models developed in this study outperformed the traditional scoring system in stratifying low-risk febrile infants at a standardized sensitivity level.
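Reporting specificity at a fixed 90% sensitivity amounts to choosing the score threshold that just reaches the target sensitivity and then measuring specificity there. The sketch below shows only that threshold-selection step, not the cost-sensitive learning the study applied; the scores and the function name are hypothetical.

```python
def specificity_at_sensitivity(labels, scores, target_sensitivity=0.9):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity reaches the target; predict positive when score >= threshold."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # Lowering the threshold can only raise sensitivity and lower specificity,
    # so the first threshold (from the top) that meets the target is the best one.
    for t in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= t for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < t for s in negatives) / len(negatives)
            return t, sensitivity, specificity
    return None  # only reachable if target_sensitivity > 1


# Hypothetical risk scores: 10 infants with IBI (1) and 10 without (0).
labels = [1] * 10 + [0] * 10
scores = ([0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.2]
          + [0.05, 0.1, 0.15, 0.25, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8])
print(specificity_at_sensitivity(labels, scores))
```

On this data the threshold is 0.55, where 9 of 10 positives are caught (sensitivity 0.9) and 7 of 10 negatives fall below it (specificity 0.7).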


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 958
Author(s):  
Shahar Weksler ◽  
Offer Rozenstein ◽  
Nadav Haish ◽  
Menachem Moshelion ◽  
Rony Wallach ◽  
...  

Potassium is a macroelement in plants that is typically supplied to crops in excess throughout the season to avoid a deficit that would reduce crop yield. Transpiration rate is a momentary physiological attribute that indicates soil water content, the plant’s water requirements, and abiotic stress factors. In this study, two systems were combined to create a hyperspectral–physiological plant database for classifying potassium treatments (low, medium, and high) and estimating momentary transpiration rate from hyperspectral images. PlantArray 3.0 was used to control fertigation, log ambient conditions, and calculate transpiration rates. In addition, a semi-automated platform carrying a hyperspectral camera was triggered every hour to capture images of a large array of pepper plants. The combined attributes and spectral information, collected hourly, were used to classify plants into their potassium treatments (average accuracy = 80%) and to estimate transpiration rate (RMSE = 0.025 g/min, R² = 0.75) using the advanced ensemble learning algorithm XGBoost (extreme gradient boosting). Although potassium has no direct spectral absorption features, the classification results demonstrated that plants can be labeled according to potassium treatment from a remotely measured hyperspectral signal. The ability to estimate transpiration rates under different potassium applications from spectral information can aid irrigation management and crop yield optimization. These combined results are important for decision-making during the growing season, particularly at early stages when potassium levels can still be corrected to prevent yield loss.
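The two regression measures reported for transpiration can be sketched as below. RMSE is expressed in the units of the target (g/min); R² is computed here as the coefficient of determination, 1 − SS_res/SS_tot, which is an assumption since the abstract does not say whether it instead reports a squared correlation. The measured and estimated values are invented.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (here g/min)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. estimated transpiration rates (g/min).
measured = [0.10, 0.20, 0.30, 0.40]
estimated = [0.12, 0.18, 0.33, 0.37]
print(round(rmse(measured, estimated), 4), round(r_squared(measured, estimated), 4))
```

For these invented values, RMSE is about 0.0255 g/min and R² is 0.948.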


2021 ◽  
Author(s):  
Michał Kruczkowski ◽  
Anna Drabik-Kruczkowska ◽  
Anna Marciniak ◽  
Martyna Tarczewska ◽  
Monika Kosowska ◽  
...  

Cervical cancer is one of the most common cancers, and its early diagnosis is of the greatest importance. Unfortunately, many diagnoses are based on the subjective opinions of doctors; to date, there is no general measurement method with a calibrated standard. This problem could be solved by a measurement system that fuses an optoelectronic sensor with a machine learning algorithm to provide reliable assistance for doctors in the early diagnosis of cervical cancer. We present preliminary research on cervical cancer assessment using an optical sensor and a prediction algorithm. Since every material is characterized by its refractive index, measuring this value and detecting its changes gives information about the state of the tissue. The optical measurements provided datasets for training and validating the analysis software. We present the data preprocessing, the machine learning results obtained with three algorithms (Random Forest, eXtreme Gradient Boosting, and Naïve Bayes), and an assessment of their performance in classifying tissue as healthy or sick. All three algorithms achieved high values (>89%) on the performance measures used to evaluate them. Our solution allows rapid sample measurement and automatic classification of the results, constituting a potential support tool for doctors.
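The abstract does not name the measures behind the ">89%" figure. Assuming they include the usual confusion-matrix measures for a two-class problem, they can be computed as in the sketch below; the labels, predictions, the `binary_measures` helper, and the choice of "sick" as the positive class are hypothetical.

```python
def binary_measures(y_true, y_pred, positive="sick"):
    """Confusion-matrix measures for a two-class (healthy / sick) problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Each ratio below assumes its denominator is non-zero.
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical labels for 10 tissue samples and one classifier's predictions.
truth = ["sick"] * 5 + ["healthy"] * 5
predicted = ["sick"] * 4 + ["healthy"] * 6
print(binary_measures(truth, predicted))
```

Here the classifier misses one sick sample and raises no false alarms, giving accuracy 0.9, sensitivity 0.8, specificity 1.0, and F1 ≈ 0.889.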

