Predicting Public Procurement Irregularity: An Application of Neural Networks

2018 ◽  
Vol 15 (1) ◽  
pp. 141-154 ◽  
Author(s):  
Ting Sun ◽  
Leonardo J. Sales

ABSTRACT Using data on contractor characteristics provided by the Comptroller General of the Union, Brazil (CGU), this paper implements two artificial neural networks, a traditional neural network (TNN) and a deep neural network (DNN), to develop prediction models of public procurement irregularities designed for the initial screening of contractors. This is the first application of DNN in the context of government auditing. To examine the effectiveness of DNN, the authors compare its predictive performance to that of TNN and two other algorithms (logistic regression and discriminant function analysis) and find that DNN significantly outperforms TNN and the other algorithms in terms of accuracy, precision, F-scores, AUC, and other metrics, as suggested by the high Z-scores of the Z-tests. Although TNN has a higher recall than DNN, the difference in recall between TNN and DNN is insignificant. Logistic regression and discriminant function analysis achieve the highest recall scores, but their Z-scores are much lower than those of other metrics. Therefore, DNN generally performs more accurately than other approaches and meets the requirement of the CGU for an early alarm system.
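The abstract does not say which Z-test was used. As a rough illustration only, one common way to compare two classifiers' accuracies is a two-proportion z-test on the counts of correct predictions; the sketch below assumes the two samples are independent, and every count in it is hypothetical rather than taken from the paper.

```python
import math

def two_proportion_z(correct_a, correct_b, n_a, n_b):
    """z statistic for H0: classifiers A and B have the same accuracy."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis of equal proportions.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts (not from the paper): 900 vs. 850 correct out of 1000 each.
z = two_proportion_z(900, 850, 1000, 1000)
p = two_sided_p(z)
```

When both models are scored on the same test cases, a paired test such as McNemar's is usually a better fit than this independent-samples version.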

Author(s):  
Easwaran Iyer ◽  
Vinod Kumar Murti

Logistic Regression is one of the most popular techniques for bankruptcy prediction, and its popularity is attributed to its robustness with respect to data characteristics. Recent work has explored Artificial Neural Networks for bankruptcy prediction. In this study, a paired sample of 174 cases of Indian listed manufacturing companies has been used to build bankruptcy prediction models based on Logistic Regression and Artificial Neural Networks. The study period was 2000 through 2009. Classification accuracies have been compared for the built models and for a hold-out sample of 44 paired cases. In both the analysis and hold-out samples, both models have shown appreciable classification results three years prior to bankruptcy. Thus, both models can be used (by banks, SEBI, etc.) for bankruptcy prediction in the Indian context; however, the Artificial Neural Network has shown a marginal advantage over Logistic Regression.


2020 ◽  
Vol 65 (5) ◽  
pp. 1685-1691
Author(s):  
Bjørn Peare Bartholdy ◽  
Elena Sandoval ◽  
Menno L. P. Hoogland ◽  
Sarah A. Schrader

1996 ◽  
Vol 42 (4) ◽  
pp. 604-612 ◽  
Author(s):  
J S Jørgensen ◽  
J B Pedersen ◽  
S M Pedersen

Abstract We investigated several aspects of using neural networks as a diagnostic tool: the design of an optimal network, the amount of patients' data needed to train the network, the question of training the network optimally while avoiding overfitting, and the influence of redundant variables. The specific clinical problem chosen for illustration was the diagnosis of acute myocardial infarction, given only the electrocardiogram and the concentration of potassium in serum at the time of admission. We found that, in contrast to usual practice, the termination of the training process should be based on the generalization performance and not on the training performance. We also found that a principal component analysis can be used to eliminate redundant variables, thereby reducing the data space. The diagnostic performance of the neural network we used was 78%, superior to that of linear discriminant function analysis but similar to that of quadratic discriminant function analysis.
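Stopping training on generalization performance rather than training performance is commonly implemented as early stopping on a held-out validation loss. The sketch below is a minimal illustration of that idea, not the authors' procedure; the function name, the `patience` parameter, and the loss values are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch with the best validation loss,
    scanning until `patience` consecutive epochs fail to improve on it."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best generalization performance: remember it, reset the counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stalled, stop training here
    return best_epoch

# Hypothetical validation losses, one per epoch.
val = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.50]
stop = early_stopping_epoch(val, patience=3)
```

With these values the scan halts after three non-improving epochs and selects epoch 2, so the later drop to 0.50 is never reached; a larger `patience` trades extra training time for a chance to find such late improvements.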


MAUSAM ◽  
2021 ◽  
Vol 67 (4) ◽  
pp. 913-918
Author(s):  
VANDITA KUMARI ◽  
RANJANA AGRAWAL ◽  
AMRENDER KUMAR

The performance of ordinal logistic regression and discriminant function analysis has been compared in forecasting wheat yield for the Kanpur district of Uttar Pradesh. Crop years were divided into two or three groups based on the detrended yield. Crop yield forecast models have been developed using probabilities obtained through ordinal logistic regression, along with year, as regressors, and validated using subsequent years' data. In the discriminant function approach, two types of models were developed, one using scores and another using posterior probabilities. The performance of the models obtained at different weeks was compared using adjusted R², PRESS (predicted residual error sum of squares), and the number of misclassifications, and forecasts were compared using the RMSE (root mean square error) and MAPE (mean absolute percentage error) of the forecast. The ordinal logistic regression approach was found to be better than the discriminant function analysis approach.
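For reference, the two forecast-error measures named above can be computed as in this minimal sketch, which uses the standard textbook definitions; the yield figures are hypothetical and do not come from the study.

```python
import math

def rmse(actual, forecast):
    """Root mean square error of the forecasts."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)

# Hypothetical observed and forecast yields.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
err_rmse = rmse(actual, forecast)
err_mape = mape(actual, forecast)
```

RMSE is expressed in the units of the yield itself, whereas MAPE is scale-free, which is why the two are often reported together.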


1992 ◽  
Vol 38 (1) ◽  
pp. 34-38 ◽  
Author(s):  
Michael L Astion ◽  
Peter Wilding

Abstract Neural networks are a relatively new method of multivariate analysis. The purpose of this study was to investigate the ability of neural networks to differentiate benign from malignant breast conditions on the basis of the pattern of nine variables: patient age, total cholesterol, high-density lipoprotein cholesterol, triglycerides, apolipoprotein A-I, apolipoprotein B, albumin, the tumor marker CA15-3, and the Fossel index (measurement of methylene and methyl line-widths in proton NMR spectra). The laboratory analyses were made with blood plasma or serum specimens. The neural network was "trained" with 57 patients: 23 patients with breast malignancies and 34 patients with benign breast conditions. A neural network with nine input neurons, 15 hidden neurons, and two output neurons correctly classified all 57 patients. The ability of the network to predict the diagnoses of patients that it had not encountered in training was tested with a separate group (cross-validation group) of 20 patients. The network correctly predicted the diagnoses for 80% of these patients. For comparison we analyzed the same sets of 57 training patients and 20 cross-validation patients by quadratic discriminant function analysis. The quadratic discriminant function, calculated from the same 57 patients used to train the neural network, correctly classified 84% of the 57 patients, and correctly diagnosed 75% of the 20 cross-validation patients. The results suggest that neural networks are a potentially useful multivariate method for optimizing the diagnostic utility of laboratory data.


2000 ◽  
Vol 203 (17) ◽  
pp. 2641-2656 ◽  
Author(s):  
S. Parsons ◽  
G. Jones

We recorded echolocation calls from 14 sympatric species of bat in Britain. Once digitised, one temporal and four spectral features were measured from each call. The frequency-time course of each call was approximated by fitting eight mathematical functions, and the goodness of fit, represented by the mean-squared error, was calculated. Measurements were taken using an automated process that extracted a single call from background noise and measured all variables without intervention. Two species of Rhinolophus were easily identified from call duration and spectral measurements. For the remaining 12 species, discriminant function analysis and multilayer back-propagation perceptrons were used to classify calls to species level. Analyses were carried out with and without the inclusion of curve-fitting data to evaluate its usefulness in distinguishing among species. Discriminant function analysis achieved an overall correct classification rate of 79% with curve-fitting data included, while an artificial neural network achieved 87%. The removal of curve-fitting data improved the performance of the discriminant function analysis by 2%, while the performance of a perceptron decreased by 2%. However, an increase in correct identification rates when curve-fitting information was included was not found for all species. The use of a hierarchical classification system, whereby calls were first classified to genus level and then to species level, had little effect on correct classification rates by discriminant function analysis but did improve rates achieved by perceptrons. This is the first published study to use artificial neural networks to classify the echolocation calls of bats to species level. Our findings are discussed in terms of recent advances in recording and analysis technologies, and are related to factors causing convergence and divergence of echolocation call design in bats.


Author(s):  
Kathleen D. White ◽  
Steven F. Daly

Because breakup ice jams and the related flooding occur suddenly, methods for predicting them are desirable to provide early warning and allow rapid, effective ice jam mitigation. However, prediction models are limited to empirical or stochastic models rather than deterministic models because of the difficulties in using deterministic models to forecast the formation of breakup ice jams. Existing ice jam prediction methods range from empirical single-variable threshold-type analyses to statistical methods such as logistic regression and discriminant function analysis. Empirical methods are highly site-specific and tend to overpredict jam occurrence. In addition, existing models do not provide quantitative information regarding the risk of errors in prediction, which limits their usefulness in emergency situations. In this paper, existing methods are reviewed and a three-step process to predict breakup ice jams is proposed.


2013 ◽  
Vol 41 (1) ◽  
pp. 37-44 ◽  
Author(s):  
HEATHER R. TAFT ◽  
DEREK A. ROFF ◽  
ATTE KOMONEN ◽  
JANNE S. KOTIAHO

SUMMARY The International Union for Conservation of Nature (IUCN) Red List provides a globally-recognized evaluation of the conservation status of species, with the aim of catalysing appropriate conservation action. However, in some parts of the world, species data may be lacking or insufficient to predict risk status. If species with shared ecological or life history characteristics also tend to share their risk of extinction, then ecological or life history characteristics may be used to predict which species may be at risk, although perhaps not yet classified as such by the IUCN. Statistical models may be a means to determine whether there are non-threatened or unclassified species that share the characteristics of threatened species; however, there are no data on which model might be most appropriate or whether multiple models should be used. In this paper, three types of statistical models, namely regression trees, logistic regression and discriminant function analysis, are compared using data on the ecological characteristics of Finnish lepidopterans (butterflies and moths). Overall, logistic regression performed slightly better than discriminant function analysis in predicting species status, and both outperformed regression trees. Uncertainty in species classification suggests that multiple analyses should be performed and particular attention devoted to those species for which the methods disagree. Such standard statistical methods may be a valuable additional tool in assessing the likely threat status of a species where there is a paucity of abundance data.

