Identification of Accounting Fraud Based on Support Vector Machine and Logistic Regression Model

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Rongyuan Qin

The authenticity of a company’s accounting information is an important guarantee of the effective operation of the capital market. Accounting fraud is the tampering with and distortion of a company’s publicly disclosed information. The continued outbreak of fraud cases has dealt a heavy blow to investor confidence, shaken the credit foundation of the capital market, and hindered its healthy and stable development. Research on the identification and governance of accounting fraud is therefore of great theoretical and practical significance. Traditionally, fraud identification models have mostly been built on linear assumptions. However, a growing number of studies show that fraud has typical nonlinear characteristics, and the multiobjective nature of fraud methods also limits the use of a linear model for identification. Considering that traditional identification methods may suffer from model specification error and insufficient information extraction, this paper constructs a support vector machine and a logistic regression model to identify accounting fraud. The support vector machine is used to improve learning and generalization on previously unseen cases, and the logistic regression model is used to identify the explanatory power of each variable for the model as a whole. The paper relaxes the linearity assumption and explores a model specification better suited to the patterns of corporate fraud, so as to extract fraud identification information more fully and give investors stronger support for identifying fraud effectively.
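As a rough illustration of how a logistic regression can indicate which variables carry explanatory power, the sketch below fits a model by plain gradient descent on a tiny synthetic data set. It is not the paper's model or data: the two features, the labels, the learning rate, and the helper `fit_logistic` are all assumptions made for illustration, and the support vector machine half of the approach is not shown.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=200):
    """Fit weights and bias by batch gradient descent on the mean log loss.

    Hypothetical helper for illustration only.
    """
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += residual * xj
            grad_b += residual
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic toy data (assumption): the first feature separates the classes,
# the second feature is balanced within each class and carries no signal.
X = [(-2.0, 1.0), (-2.0, -1.0), (-1.0, 1.0), (-1.0, -1.0),
     (1.0, 1.0), (1.0, -1.0), (2.0, 1.0), (2.0, -1.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X, y)
print("weights:", weights, "bias:", bias)
```

On this toy data the weight on the first feature grows while the weight on the uninformative second feature stays at zero, which is the sense in which coefficient size hints at a variable's contribution. With real financial indicators the features would need to be put on a common scale before comparing coefficients.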

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Xinyun Liu ◽  
Jicheng Jiang ◽  
Lili Wei ◽  
Wenlu Xing ◽  
Hailong Shang ◽  
...  

Background: Machine learning (ML) can include more diverse and more complex variables to construct models. This study aimed to develop models based on ML methods to predict all-cause mortality in coronary artery disease (CAD) patients with atrial fibrillation (AF). Methods: A total of 2037 CAD patients with AF were included in this study. Three ML methods were used: regularization logistic regression, random forest, and support vector machines. Fivefold cross-validation was used to evaluate model performance. Performance was quantified by calculating the area under the curve (AUC) with 95% confidence intervals (CI), sensitivity, specificity, and accuracy. Results: After univariate analysis, 24 variables with statistical differences were included in the models. The AUC of the regularization logistic regression model, random forest model, and support vector machines model was 0.732 (95% CI 0.649–0.816), 0.728 (95% CI 0.642–0.813), and 0.712 (95% CI 0.630–0.794), respectively. The regularization logistic regression model presented the highest AUC value (0.732 vs 0.728 vs 0.712), specificity (0.699 vs 0.663 vs 0.668), and accuracy (0.936 vs 0.935 vs 0.935) among the three models. However, no statistical differences were observed in the receiver operating characteristic (ROC) curves of the three models (all P > 0.05). Conclusion: Considering all aspects of model performance, the regularization logistic regression model was recommended for use in clinical practice.
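The evaluation procedure described in this abstract (fivefold cross-validation, then AUC, sensitivity, specificity, and accuracy) can be sketched in a few lines of Python. The snippet is a minimal illustration, not the study's code: the function names, the 0.5 decision threshold, and the example labels and scores are assumptions, and the 95% confidence intervals are not computed.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity, accuracy) at the given threshold.

    Assumes both classes are present in labels.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)

# Made-up labels and predicted scores, purely for illustration.
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.3, 0.8]

folds = k_fold_indices(10, k=5)  # each fold serves once as the held-out set
example_auc = auc(labels, scores)
sensitivity, specificity, accuracy = confusion_metrics(labels, scores)
print(example_auc, sensitivity, specificity, accuracy)
```

In a full cross-validation loop, each fold in `folds` would be held out in turn while a model is trained on the remaining four, and the metrics would be computed on the held-out predictions.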


2015 ◽  
Vol 39 (3) ◽  
pp. 71-91 ◽  
Author(s):  
Dorien Herremans ◽  
Kenneth Sörensen ◽  
David Martens

In this article a number of musical features are extracted from a large musical database and these were subsequently used to build four composer-classification models. The first two models, an if–then rule set and a decision tree, result in an understanding of stylistic differences between Bach, Haydn, and Beethoven. The other two models, a logistic regression model and a support vector machine classifier, are more accurate. The probability of a piece being composed by a certain composer given by the logistic regression model is integrated into the objective function of a previously developed variable neighborhood search algorithm that can generate counterpoint. The result is a system that can generate an endless stream of contrapuntal music with composer-specific characteristics that sounds pleasing to the ear. This system is implemented as an Android app called FuX.


2020 ◽  
Author(s):  
Yao Tan ◽  
Ling Huo ◽  
Shu Wang ◽  
Cuizhi Geng ◽  
Yi Li ◽  
...  

Background: The accuracy of breast cancer (BC) screening based on conventional ultrasound imaging examination largely depends on the experience of clinicians. Further, the effectiveness of BC screening and diagnosis in primary hospitals needs to be improved. This study aimed to establish and evaluate the usefulness of a simple, practical, and easy-to-promote machine learning model based on ultrasound imaging features for diagnosing BC. Methods: Logistic regression, random forest, extra trees, support vector machine, multilayer perceptron, and XGBoost models were developed. The modeling data set was divided into a training set and a test set in a 75%:25% ratio, which were used to establish the models and test their performance, respectively. The validation data set from primary hospitals was used for external validation of the model. The area under the receiver operating characteristic curve (AUC) was used as the main evaluation index, and pathological biopsy was used as the gold standard for evaluating each model. Diagnostic capability was also compared with that of clinicians. Results: Among the six models, the logistic model showed superior capability, with AUCs of 0.771 and 0.906 in the test and validation sets, respectively, and Brier scores of 0.18 and 0.165. The AUC of the logistic model in tertiary class A hospitals and primary hospitals was 0.875 and 0.921, respectively. The AUCs of the clinician diagnosis and the logistic model were 0.913 and 0.906, respectively. Their AUCs were 0.890 and 0.875 in the tertiary class A hospitals and 0.924 and 0.921 in primary hospitals, respectively. Conclusions: The logistic regression model has better overall performance in primary hospitals and can be further extended to primary-care settings. A more balanced clinical prediction model can be further established on the premise of improving accuracy to assist clinicians in decision making and improve diagnosis. Trial Registration: http://www.clinicaltrials.gov. ClinicalTrials.gov ID: NCT03080623.
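Two steps named in this abstract, the 75%:25% training/test split and the Brier score, are simple enough to sketch directly. The code below is an illustrative assumption rather than the study's pipeline: the function names, the random seed, and the example labels and probabilities are invented for the demonstration.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and outcome.

    0 is a perfect score; lower is better.
    """
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)

# Made-up case identifiers, outcomes, and predicted probabilities.
rows = list(range(8))
train_rows, test_rows = train_test_split(rows)

labels = [1, 0, 1, 0]
probabilities = [0.9, 0.1, 0.6, 0.4]
score = brier_score(labels, probabilities)
print(len(train_rows), len(test_rows), round(score, 3))
```

Unlike AUC, which depends only on how cases are ranked, the Brier score penalizes probabilities that are far from the observed outcome, which is presumably why the abstract reports both.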


2021 ◽  
Vol 27 ◽  
Author(s):  
Ning Xu ◽  
Hui Guo ◽  
Xurui Li ◽  
Qian Zhao ◽  
Jianguo Li

Background: Acute respiratory distress syndrome (ARDS) is a frequent and serious complication of sepsis without specific and sensitive diagnostic signatures. Methods: The mRNA profiles, including 60 blood samples with sepsis-induced ARDS and 86 blood samples with sepsis alone, were obtained from the Gene Expression Omnibus (GEO). The differentially expressed genes (DEGs) were analyzed with the limma package in R. Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were carried out using the clusterProfiler package in R. Finally, a multivariate logistic regression model was established through the glm function of R, and a support vector machine (SVM) model was constructed via the e1071 package of R. Results: A total of 242 DEGs in GSE32707 and 102 DEGs in GSE66890 were identified. Notably, five genes exhibited significant differences between the two datasets and were considered to be closely associated with the occurrence of ARDS induced by sepsis. Furthermore, functional enrichment analysis based on the DEGs showed that 80 overlapping GO terms and one KEGG pathway were significantly enriched in the two datasets. The logistic regression model and SVM model constructed could efficiently distinguish sepsis patients with or without ARDS. Conclusion: In brief, our study suggested that NKG7, SPTA1, FGL2, RGS2, and IFI27 might be potential diagnostic signatures for sepsis-induced ARDS, which may contribute to future exploration of the mechanisms of ARDS occurrence and development.

