The impact of data structure on classification ability of financial failure prediction model

2020 ◽  
Vol 74 ◽  
pp. 05024
Author(s):  
Lucia Svabova ◽  
Lucia Michalkova

The creation of prediction models that reveal the threat of financial difficulties in companies relies on various multivariate statistical methods. In general, prediction models serve either to classify a company as prosperous or non-prosperous or to quantify the probability that the company will face financial difficulties. In many countries, real financial data about companies are used to develop these prediction models. In Slovakia, standard data from the financial statements and annual reports of Slovak companies are used to create corporate failure models. Because these data files are generally large, the data must be pre-processed by selected methods before the prediction model is constructed. The database of companies needs to be prepared for the subsequent application of statistical methods, and it is also highly advisable to detect potential extreme and outlying observations. The article therefore focuses on quantifying how the detected data structure, for example the occurrence of extreme and outlying observations in the data set, affects the overall classification ability of the resulting prediction models.
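The abstract does not say which method the authors used to flag extreme and outlying observations. As a hedged illustration only, the sketch below applies Tukey's fences (values more than 1.5 interquartile ranges beyond the quartiles) to a single financial ratio. The function name `tukey_outliers`, the multiplier `k=1.5`, and the sample ratios are assumptions for illustration, not data or code from the study.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the indices of values lying outside Tukey's fences.

    The fences are [Q1 - k*IQR, Q3 + k*IQR], where IQR = Q3 - Q1.
    k=1.5 is a conventional choice, assumed here for illustration.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical current-ratio values for nine companies (not from the study).
ratios = [0.8, 1.1, 1.2, 0.9, 1.0, 1.3, 1.05, 12.5, -4.0]
flagged = tukey_outliers(ratios)
print(flagged)  # [7, 8]
```

Fitting the same model with and without the flagged rows and comparing classification accuracy is one simple way to quantify the kind of impact the abstract describes.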

Equilibrium ◽  
2017 ◽  
Vol 12 (4) ◽  
pp. 775-791 ◽  
Author(s):  
Maria Kovacova ◽  
Tomas Kliestik

Research background: Bankruptcy prediction has interested researchers and practitioners since the first study dedicated to the topic was published in 1932. Finding a suitable bankruptcy prediction model is a task for economists and analysts all over the world. Despite the large number of models that have been created with different methods in pursuit of the best results, predicting bankruptcy risk remains challenging, as corporations have become more global and more complex. Purpose of the article: The aim of the study is to construct, via an empirical study of the relevant literature and the application of suitably chosen mathematical statistical methods, bankruptcy prediction models for Slovak companies and to compare the overall prediction ability of the two developed models. Methods: The research was conducted on a data set of Slovak corporations covering the year 2015, and two mathematical statistical methods were applied: logit and probit. Both are symmetric binary choice models, also known as conditional probability models; however, they show some significant differences in the model-building process as well as in the results achieved. Findings & Value added: Given that discriminant analysis and logistic regression are the methods most often used to construct bankruptcy prediction models, we focused on developing bankruptcy prediction models for the Slovak Republic via logistic regression and probit. The results suggest that the model based on the logit function slightly outperforms the probit model in classification accuracy. Differences were also found in the most significant predictors of bankruptcy identified by the two types of models constructed for Slovak companies.
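The logit and probit models described above share the same linear index of predictors and differ only in the link function that maps that index to a probability: the logistic CDF versus the standard normal CDF. The sketch below, using hypothetical coefficients and ratios rather than the study's estimates, shows both links. Because both CDFs are symmetric about zero, a 0.5 cutoff classifies by the sign of the index in either model; in practice the fitted coefficients differ, with logit coefficients typically larger in magnitude because the logistic distribution has a larger variance.

```python
import math


def logit_prob(z):
    """Logistic CDF: P(y = 1 | x) under the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_prob(z):
    """Standard normal CDF: P(y = 1 | x) under the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_index(intercept, coefs, x):
    """Linear predictor z = b0 + b1*x1 + ... + bk*xk."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def classify(prob, cutoff=0.5):
    """Label 1 (e.g. 'non-prosperous') when the predicted probability reaches the cutoff."""
    return 1 if prob >= cutoff else 0


# Hypothetical coefficients and financial ratios, for illustration only.
z = linear_index(-0.4, [1.2, -0.8], [0.5, 1.5])
print(round(logit_prob(z), 4), round(probit_prob(z), 4))
```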


Author(s):  
Guizhou Hu ◽  
Martin M. Root

Background No methodology is currently available for combining, in a multivariate fashion, individual risk factor information derived from different longitudinal studies of a chronic disease. This paper introduces such a methodology, named Synthesis Analysis, which is essentially a multivariate meta-analytic technique. Design The construction and validation of statistical models using available data sets. Methods and results Two analyses are presented. (1) With the same data and the same risk variables, Synthesis Analysis produced a prediction model similar to that of the conventional regression approach. Synthesis Analysis produced better prediction models when additional risk variables were added. (2) A four-variable empirical logistic model for death from coronary heart disease was developed with data from the Framingham Heart Study. A synthesized prediction model, adding five new variables to this empirical model, was developed using Synthesis Analysis and literature information. This model was then compared with the four-variable empirical model using the first National Health and Nutrition Examination Survey (NHANES I) Epidemiologic Follow-up Study data set. The synthesized model had significantly improved predictive power (χ² = 43.8, P < 0.00001). Conclusions Synthesis Analysis provides a new means of developing complex disease prediction models from the medical literature.


2020 ◽  
Vol 4 (4) ◽  
pp. 33
Author(s):  
Toni Pano ◽  
Rasha Kashef

During the COVID-19 pandemic, many research studies have been conducted to examine the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, plays a significant role as a meaningful indicator in forecasting Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy for BTC tweets to develop an accurate machine learning prediction model for Bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we find that splitting sentences, removing Twitter-specific tags, or their combination generally improves the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices correlate well with sentiment scores only over shorter timespans. Selecting the optimal preprocessing strategy would help machine learning prediction models achieve better accuracy with respect to the actual prices.
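The abstract names two preprocessing steps that tended to help, splitting sentences and removing Twitter-specific tags, but does not give their implementation. The sketch below is a hedged guess at what such steps might look like: it assumes "Twitter-specific tags" means retweet markers, @mentions, #hashtags and URLs, uses a naive regex sentence splitter, and computes a Pearson correlation such as one might run between a daily sentiment series and prices. Sentiment scoring itself is out of scope here, and all names (`TAG_PATTERN`, `remove_twitter_tags`, `split_sentences`, `pearson`) are hypothetical.

```python
import math
import re

# Assumed definition of "Twitter-specific tags": retweet markers, @mentions,
# #hashtags and URLs. The study's exact rules are not given in the abstract.
TAG_PATTERN = re.compile(r"\bRT\b|@\w+:?|#\w+|https?://\S+")


def remove_twitter_tags(text):
    """Strip Twitter-specific tokens and collapse the leftover whitespace."""
    cleaned = TAG_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


tweet = "RT @whale: #BTC breaks out. Buy now! https://t.co/abc"
print(split_sentences(remove_twitter_tags(tweet)))  # ['breaks out.', 'Buy now!']
```

Whether hashtags should be deleted outright or only stripped of the `#` symbol is a design choice the abstract leaves open; deleting them, as here, also discards topic words such as "BTC".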


Energies ◽  
2019 ◽  
Vol 12 (24) ◽  
pp. 4786
Author(s):  
Yanpeng Hao ◽  
Zhaohong Yao ◽  
Junke Wang ◽  
Hao Li ◽  
Ruihai Li ◽  
...  

Icing forecasting for transmission lines is of great significance for anti-icing strategies in power grids, but existing prediction models have disadvantages such as limited applicability, weak generalization, and a lack of global prediction ability. To overcome these shortcomings, this paper proposes a new concept, a segmental icing prediction model for transmission lines, in which the classification of the icing process plays a crucial role. To obtain this classification, a hierarchical K-means clustering method is used and 11 characteristic parameters are proposed. With this method, 97 icing processes derived from the Icing Monitoring System of China Southern Power Grid are clustered into six categories according to their curve shape, and abstracted icing evolution curves are drawn from the cluster centroids. The results show that icing processes are likely to differ from one another and that an icing process can be considered a combination of several segments and nodes, which supports the proposed concept of a segmental icing prediction model. Because they are based on monitoring data and clustering, the resulting types of icing evolution are more comprehensive and specific; the work lays the foundation for constructing the model and contributes to other fields.
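The abstract relies on a hierarchical K-means method over 11 characteristic parameters, whose details are not given. As a hedged sketch of the underlying building block only, the code below implements plain (non-hierarchical) Lloyd's K-means on hypothetical two-feature vectors. Initialising with the first k points is a simplifying assumption chosen for reproducibility; k-means++ or repeated random restarts are more usual in practice.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance (same ordering as Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means on feature vectors (tuples of floats).

    Initialises centroids with the first k points (a simplifying assumption)
    and alternates assignment and update steps until labels stop changing.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical icing-process features, e.g. (growth rate, duration); not from the paper.
features = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(features, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

A hierarchical variant would, for example, apply a routine like this repeatedly to split clusters further; how the authors did so is not stated in the abstract.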


2020 ◽  
Vol 120 (10) ◽  
pp. 1941-1957
Author(s):  
Futao Zhao ◽  
Zhong Yao

Purpose: The purpose of this paper is to identify the factors that might influence audiences' voluntary donations to content creators on online platforms and to build an effective prediction model that considers both content-related and creator-related features. Design/methodology/approach: This study collected real-world content consumption data from Xueqiu.com and extracted both content and creator characteristics from the data set. The best donation prediction model based on these features was determined by evaluating four prevalent classifiers with various performance metrics. Furthermore, three feature selection methods were applied to validate the robustness of the constructed model, and the predictability of different feature groups was then examined. Finally, we conducted an interpretive analysis to identify relatively important predictors. Findings: The experimental results show that the random forest classifier with all extracted features outperformed the other models and achieved excellent performance, indicating that these factors are useful in predicting donations. Moreover, content features were shown to be relatively more predictive than creator features. Finally, several particularly important predictors were identified, such as the number of modal particles in the article. Originality/value: This study is among the first to investigate which factors might drive customers' voluntary donations to content contributors on social websites. Unlike previous studies focusing on live video streaming, we broaden the research scope by examining donations to user-generated text content, calling for attention to other important topics in this burgeoning industry.


The Bank Marketing data set on Kaggle is mostly used to predict whether bank clients will subscribe to a long-term deposit. We believe that this data set could provide more useful information, such as predicting whether a bank client could be approved for a loan. This is a critical decision that has to be made by decision makers at the bank. Building a prediction model for such a high-stakes decision requires not only high prediction accuracy but also a reasonable interpretation of the predictions. In this research, ensemble machine learning techniques such as Bagging and Boosting have been deployed. Our results showed that the loan approval prediction model has an accuracy of 83.97%, approximately 25% better than most other state-of-the-art loan prediction models found in the literature. In addition, the model interpretation efforts in this research were able to explain a few critical cases that bank decision makers may encounter; therefore, the high accuracy of the designed models was accompanied by trust in their predictions. We believe that the achieved model accuracy, together with the provided interpretation information, is vital for decision makers to understand how to maintain a balance between the security and reliability of their financial lending system while providing fair credit opportunities to their clients.
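The abstract mentions Bagging but not its base learners or settings. As a hedged sketch of the general technique only, the code below bootstraps the training rows, fits one decision stump (a single-split tree) per bootstrap sample, and predicts by majority vote. The stump learner, `n_estimators=25`, the fixed seed, and the one-feature toy data are all assumptions for illustration; a real loan-approval model would use a library implementation and the actual data set features.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split decision stump by exhaustive search over (feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                correct = sum(
                    (1 if polarity * (row[f] - t) >= 0 else 0) == label
                    for row, label in zip(X, y)
                )
                if best is None or correct > best[0]:
                    best = (correct, f, t, polarity)
    _, f, t, polarity = best
    return lambda row: 1 if polarity * (row[f] - t) >= 0 else 0


def bagging(X, y, n_estimators=25, seed=0):
    """Bootstrap aggregating: fit one stump per bootstrap sample, predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical one-feature data: 1 = loan approved, 0 = rejected.
X = [[x] for x in range(10)]
y = [0] * 5 + [1] * 5
predict = bagging(X, y)
print(predict([0]), predict([9]))
```

Averaging many models trained on resampled data mainly reduces variance; Boosting, by contrast, fits learners sequentially and reweights the cases earlier learners got wrong.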


2020 ◽  
Vol 3 (3) ◽  
pp. 138-146
Author(s):  
Camilla Matos Pedreira ◽  
José Alves Barros Filho ◽  
Carolina Pereira ◽  
Thamine Lessa Andrade ◽  
Ricardo Mingarini Terra ◽  
...  

Objectives: This study aims to evaluate the impact of using three predictive models of lung nodule malignancy in a population of patients at high risk for neoplasia according to prior assessment by physicians, and to evaluate the clinical and radiological predictors of malignancy in the images. Material and Methods: This is a retrospective cohort study of 135 patients who underwent surgery between 01/07/2013 and 10/05/2016. The study included nodules with dimensions between 5 mm and 30 mm, excluding multiple nodules, alveolar consolidation, pleural effusion, and lymph node enlargement. The main variables analyzed were age, sex, smoking history, extrathoracic cancer, diameter, location, and presence of spiculation. The accuracy of each prediction model was assessed by calculating the area under the ROC curve. Results: The study analyzed 135 individuals, of whom 96 (71.1%) had malignant nodules. The areas under the ROC curves for each prediction model were: Swensen 0.657; Brock 0.662; and Herder 0.633. In high-risk patients, the positive predictive values of the Swensen, Brock, and Herder models were 83.3%, 81.8%, and 82.9%, respectively. Patients classified as intermediate and low risk also had high malignancy rates, ranging from 69.3% to 72.5% and from 42.8% to 52.6%, respectively. Conclusion: None of the three quantitative models analyzed in this study was considered satisfactory (AUC > 0.7); they should be used with caution and after specialized evaluation to avoid underestimating the risk of neoplasia. The pretest calculations may not account for factors other than those included in the regressions, which could play a role in the clinical decision to resect.
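The study judges each model by its area under the ROC curve (satisfactory if AUC > 0.7) and reports positive predictive values. The sketch below shows one standard way to compute both: AUC via its Mann-Whitney interpretation (the probability that a malignant case outscores a benign one) and PPV as TP / (TP + FP). The scores are hypothetical; this does not reproduce the Swensen, Brock, or Herder formulas.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen malignant case (label 1)
    receives a higher score than a randomly chosen benign case (label 0);
    ties count as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv(predicted, labels):
    """Positive predictive value: TP / (TP + FP) among cases predicted positive."""
    tp = sum(1 for p, lab in zip(predicted, labels) if p == 1 and lab == 1)
    fp = sum(1 for p, lab in zip(predicted, labels) if p == 1 and lab == 0)
    return tp / (tp + fp)


# Hypothetical malignancy scores for four nodules (1 = malignant).
print(auc([0.9, 0.4, 0.8, 0.3], [1, 1, 0, 0]))  # 0.75
# Prevalence reported in the abstract: 96 malignant nodules out of 135 patients.
print(f"{96 / 135:.1%}")  # 71.1%
```

With a malignancy prevalence of about 71%, even a weakly discriminating model can show a high PPV, which is consistent with the abstract's finding of PPVs above 80% alongside AUCs below 0.7.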


2019 ◽  
Author(s):  
Angela M M Kotsopoulos ◽  
Piet Vos ◽  
Nichon E Jansen ◽  
Ewald M Bronkhorst ◽  
Johannes G van der Hoeven ◽  
...  

BACKGROUND Controlled donation after circulatory death (cDCD) is a major source of organs for transplantation. Identifying potential cDCD donors who will die within the predefined warm ischemia time frame, from withdrawal of life-sustaining treatment (WLST) to circulatory arrest, poses considerable challenges. Several attempts have been made to develop models predicting the time between treatment withdrawal and circulatory arrest. This time window determines whether organ donation can occur and influences the quality of the donated organs. However, the patients selected for these models were not always restricted to potential cDCD donors (eg, patients with cancer or severe infections were also included), which severely limits the generalizability of those data. OBJECTIVE The objectives of this study are (1) to develop a model predicting time to death within 60 minutes in potential cDCD patients; (2) to validate and update previous prediction models on time to death after WLST; (3) to determine the timing and patient characteristics associated with prognostication and with the decision-making process that leads to initiating end-of-life care; (4) to evaluate the impact of the timing of the family approach on organ donation approval; and (5) to assess the influence of variation in WLST processes on postmortem organ donor potential and actual postmortem organ donors. METHODS In this multicenter observational prospective cohort study, all patients admitted to the intensive care units of 3 university hospitals and 3 teaching hospitals who met the criteria of the cDCD protocol as defined by the Dutch Transplant Foundation were included. The enrolment target was set at 400 patients. Previously developed models will be refitted to our data set. To further update previous prediction models, we will apply the least absolute shrinkage and selection operator (LASSO) as a tool for efficient variable selection in developing the multivariable logistic regression model. RESULTS This protocol was funded in August 2014 by the Dutch Transplant Foundation. We expect to have the results of this study in July 2020. Patient enrolment was completed in July 2018 and data collection was completed in April 2020. CONCLUSIONS This study will provide a robust multimodal prediction model, based on clinical and physiological parameters, that can predict time to circulatory arrest in cDCD donors. In addition, it will add valuable insight into the process of WLST in cDCD donors and will fill an important knowledge gap in this essential field of health care. CLINICAL TRIAL ClinicalTrials.gov NCT04123275; https://clinicaltrials.gov/ct2/show/NCT04123275 INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/16733
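The protocol states that LASSO will be used for variable selection in a multivariable logistic regression, but it does not name a solver or software. As a hedged sketch of the technique, the code below fits an L1-penalised logistic regression by proximal gradient descent (ISTA), chosen here for brevity; coordinate descent, as in glmnet, is the more common solver. The penalty `lam`, step size `lr`, iteration count, and toy data are assumptions. The point it illustrates is that the soft-thresholding step drives uninformative coefficients to exactly zero, which is what makes LASSO a variable-selection tool.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X, y, lam, lr=0.1, iters=2000):
    """L1-penalised logistic regression via proximal gradient descent (ISTA).

    Minimises mean log-loss + lam * sum(|w_j|); the intercept b is not penalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xij for wj, xij in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth log-loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - lr * g, lr * lam) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical data: feature 0 separates the classes, feature 1 is pure noise.
X = [[-2.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [2.0, 1.0]]
y = [0, 0, 1, 1]
w, b = lasso_logistic(X, y, lam=0.1)
print(w)
```

In a real analysis `lam` would be chosen by cross-validation, and predictors would be standardised first so that the penalty treats them comparably.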


2019 ◽  
Author(s):  
Sungjun Hong ◽  
Sungjoo Lee ◽  
Jeonghoon Lee ◽  
Won Chul Cha ◽  
Kyunga Kim

BACKGROUND The development and application of clinical prediction models using machine learning in clinical decision support systems is attracting increasing attention. OBJECTIVE The aims of this study were to develop a prediction model for cardiac arrest in the emergency department (ED) using machine learning and sequential characteristics and to validate its clinical usefulness. METHODS This retrospective study was conducted with ED patients at a tertiary academic hospital who suffered cardiac arrest. To resolve the class imbalance problem, sampling was performed using propensity score matching. The data set was chronologically allocated to a development cohort (years 2013 to 2016) and a validation cohort (year 2017). We trained three machine learning algorithms with repeated 10-fold cross-validation. RESULTS The main performance measures were the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The random forest algorithm (AUROC 0.97; AUPRC 0.86) outperformed the recurrent neural network (AUROC 0.95; AUPRC 0.82) and the logistic regression algorithm (AUROC 0.92; AUPRC 0.72). The performance of the model was maintained over time, with the AUROC remaining at least 80% across the monitored time points during the 24 hours before event occurrence. CONCLUSIONS We developed a prediction model of cardiac arrest in the ED using machine learning and sequential characteristics. The model was validated for clinical usefulness by chronological visualization focused on clinical usability.
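The study splits its data chronologically (development 2013 to 2016, validation 2017) and reports AUPRC alongside AUROC. The sketch below illustrates both ideas under stated assumptions: records are hypothetical dictionaries with a `year` key, and AUPRC is estimated by average precision (the mean of precision at the rank of each positive case), which is one common step-wise estimator; trapezoidal estimates can give slightly different values, and the abstract does not say which the authors used.

```python
def chronological_split(records, validation_year):
    """Development set: records before validation_year; validation set: that year onward."""
    development = [r for r in records if r["year"] < validation_year]
    validation = [r for r in records if r["year"] >= validation_year]
    return development, validation


def average_precision(scores, labels):
    """Average precision: mean of precision@k taken at the rank of each positive case.

    A common step-wise estimate of the area under the precision-recall curve.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / sum(labels)


# Hypothetical visits; the cut-off mirrors the 2013-2016 / 2017 split in the abstract.
visits = [{"year": yr} for yr in (2013, 2014, 2015, 2016, 2017, 2017)]
dev, val = chronological_split(visits, 2017)
print(len(dev), len(val))  # 4 2
print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 4))  # 0.8333
```

Unlike AUROC, AUPRC depends on the proportion of positive cases, so it is especially informative when events such as cardiac arrest are rare.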


10.2196/16733 ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. e16733
Author(s):  
Angela M M Kotsopoulos ◽  
Piet Vos ◽  
Nichon E Jansen ◽  
Ewald M Bronkhorst ◽  
Johannes G van der Hoeven ◽  
...  

Background Controlled donation after circulatory death (cDCD) is a major source of organs for transplantation. Identifying potential cDCD donors who will die within the predefined warm ischemia time frame, from withdrawal of life-sustaining treatment (WLST) to circulatory arrest, poses considerable challenges. Several attempts have been made to develop models predicting the time between treatment withdrawal and circulatory arrest. This time window determines whether organ donation can occur and influences the quality of the donated organs. However, the patients selected for these models were not always restricted to potential cDCD donors (eg, patients with cancer or severe infections were also included), which severely limits the generalizability of those data. Objective The objectives of this study are (1) to develop a model predicting time to death within 60 minutes in potential cDCD patients; (2) to validate and update previous prediction models on time to death after WLST; (3) to determine the timing and patient characteristics associated with prognostication and with the decision-making process that leads to initiating end-of-life care; (4) to evaluate the impact of the timing of the family approach on organ donation approval; and (5) to assess the influence of variation in WLST processes on postmortem organ donor potential and actual postmortem organ donors. Methods In this multicenter observational prospective cohort study, all patients admitted to the intensive care units of 3 university hospitals and 3 teaching hospitals who met the criteria of the cDCD protocol as defined by the Dutch Transplant Foundation were included. The enrolment target was set at 400 patients. Previously developed models will be refitted to our data set. To further update previous prediction models, we will apply the least absolute shrinkage and selection operator (LASSO) as a tool for efficient variable selection in developing the multivariable logistic regression model. Results This protocol was funded in August 2014 by the Dutch Transplant Foundation. We expect to have the results of this study in July 2020. Patient enrolment was completed in July 2018 and data collection was completed in April 2020. Conclusions This study will provide a robust multimodal prediction model, based on clinical and physiological parameters, that can predict time to circulatory arrest in cDCD donors. In addition, it will add valuable insight into the process of WLST in cDCD donors and will fill an important knowledge gap in this essential field of health care. Trial Registration ClinicalTrials.gov NCT04123275; https://clinicaltrials.gov/ct2/show/NCT04123275 International Registered Report Identifier (IRRID) DERR1-10.2196/16733

