Personal Credit Default Discrimination Model Based on Super Learner Ensemble

Assessing the default of customers is an essential basis for personal credit issuance. This paper considers developing a personal credit default discrimination model based on Super Learner heterogeneous ensemble to improve the accuracy and robustness of default discrimination. First, we select six kinds of single classifiers such as logistic regression, SVM, and three kinds of homogeneous ensemble classifiers such as random forest to build a base classifier candidate library for Super Learner. Then, we use the ten-fold cross-validation method to exercise the base classifier to improve the base classifier’s robustness. We compute the base classifier’s total loss using the difference between the predicted and actual values and establish a base classifier-weighted optimization model to solve for the optimal weight of the base classifier, which minimizes the weighted total loss of all base classifiers. Thus, we obtain the heterogeneous ensembled Super Learner classifier. Finally, we use three real credit datasets in the UCI database regarding Australia, Japanese, and German and the large credit dataset GMSC published by Kaggle platform to test the ensembled Super Learner model’s effectiveness. We also employ four commonly used evaluation indicators, the accuracy rate, type I error rate, type II error rate, and AUC. Compared with the base classifier’s classification results and heterogeneous models such as Stacking and Bstacking, the results show that the ensembled Super Learner model has higher discrimination accuracy and robustness.

Download Full-text

Correction: “Influence of Selection Bias on the Test Decision – A Simulation Study”

Methods of Information in Medicine ◽

10.3414/me11-01-0043e ◽

2014 ◽

Vol 53 (05) ◽

pp. 343-343

Keyword(s):

Selection Bias ◽

Simulation Study ◽

Error Rate ◽

Type I Error ◽

Block Size ◽

Error Rates ◽

Type I ◽

Type I Error Rate ◽

Representation Error ◽

Numeric Representation

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138 –143). In a small number of cases the kind of representation of numeric values in SAS has resulted in wrong categorization due to a numeric representation error of differences. We corrected the simulation by using the round function of SAS in the calculation process with the same seeds as before. For Table 4 the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5 the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6 the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141 “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).” has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. There were only minor changes smaller than 0.03. These changes do not affect the interpretation of the results or our recommendations.

Download Full-text

Controlling type I error rate for fast track drug development programmes

Statistics in Medicine ◽

10.1002/sim.1396 ◽

2003 ◽

Vol 22 (5) ◽

pp. 665-675 ◽

Cited By ~ 6

Author(s):

Weichung J. Shih ◽

Peter Ouyang ◽

Hui Quan ◽

Yong Lin ◽

Bart Michiels ◽

...

Keyword(s):

Drug Development ◽

Error Rate ◽

Fast Track ◽

Type I Error ◽

Type I ◽

Type I Error Rate

Download Full-text

Alternative models and randomization techniques for Bayesian response-adaptive randomization with binary outcomes

Clinical Trials ◽

10.1177/17407745211010139 ◽

2021 ◽

pp. 174077452110101

Author(s):

Jennifer Proper ◽

John Connett ◽

Thomas Murray

Keyword(s):

Logistic Regression ◽

Sample Size ◽

Error Rate ◽

Adaptive Design ◽

Type I Error ◽

Probability Model ◽

Binary Outcomes ◽

Type I ◽

Operating Characteristics ◽

Type I Error Rate

Background: Bayesian response-adaptive designs, which data adaptively alter the allocation ratio in favor of the better performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating type I error rate, and increasing sample size requirements. The implementation of these designs using the Thompson sampling methods has generally assumed a simple beta-binomial probability model in the literature; however, the effect of these choices on the resulting design operating characteristics relative to other reasonable alternatives has not been fully examined. Motivated by the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss up to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2 Eperfusion STrategies for Refractory Cardiac Arrest trial using the Thompson sampling methods based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and power, among other performance metrics. Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design in regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.

Download Full-text

A study on model-based error rate estimation for automatic speech recognition

IEEE Transactions on Speech and Audio Processing ◽

10.1109/tsa.2003.818030 ◽

2003 ◽

Vol 11 (6) ◽

pp. 581-589 ◽

Cited By ~ 7

Author(s):

Chao-Shih Huang ◽

Hsiao-Chuan Wang ◽

Chin-Hui Lee

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Error Rate ◽

Rate Estimation ◽

Model Based

Download Full-text

Anova Tests for Homogeneity of Variance: Nonnormality and Unequal Samples

Journal of Educational Statistics ◽

10.3102/10769986002003187 ◽

1977 ◽

Vol 2 (3) ◽

pp. 187-206 ◽

Cited By ~ 10

Author(s):

Charles G. Martin ◽

Paul A. Games

Keyword(s):

Error Rate ◽

Type I Error ◽

Type I ◽

Empirical Comparison ◽

Jackknife Test ◽

Type I Error Rate ◽

Power And Control ◽

Homogeneity Of Variance ◽

Test Use ◽

And Control

This paper presents an exposition and an empirical comparison of two potentially useful tests for homogeneity of variance. Control of Type I error rate, P(EI), and power are investigated for three forms of the Box test and for two forms of the jackknife test with equal and unequal n's under conditions of normality and nonnormality. The Box test is shown to be robust to violations of the assumption of normality. The jackknife test is shown not to be robust. When n's are unequal, the problem of heterogeneous within-cell variances of the transformed values and unequal n's affects the jackknife and Box tests. Previously reported suggestions for selecting subsample sizes for the Box test are shown to be inappropriate, producing an inflated P(EI). Two procedures which alleviate this problem are presented for the Box test. Use of the jack-knife test with a reduced alpha is shown to provide power and control of P(EI) at approximately the same level as the Box test. Recommendations for the use of these techniques and computational examples of each are provided.

Download Full-text

Non-inferiority test for a continuous variable with a flexible margin in an active controlled trial : An application to the "Stratall ANRS 12110 / ESTHER" trial

10.21203/rs.3.rs-21325/v2 ◽

2020 ◽

Author(s):

Arsene Sandie ◽

Nicholas Molinari ◽

Anthony Wanjoya ◽

Charles Kouanfack ◽

Christian Laurent ◽

...

Keyword(s):

Confidence Interval ◽

Active Control ◽

Error Rate ◽

Type I Error ◽

Clinical Monitoring ◽

Small Scale ◽

Type I ◽

Test Procedures ◽

Power Estimate ◽

Control Intervention

Abstract Background: The non-inferiority trials are becoming increasingly popular in public health and clinical research. The choice of the non-inferiority margin is the cornerstone of the non-inferiority trial. When the eﬀect of active control intervention is unknown, it can be interesting to choose the non-inferiority margin as a function of the active control intervention eﬀect. In this case, the uncertainty surrounding the non-inferiority margin should be accounted for in statistical tests. In this work, we explored how to perform the non-inferiority test with a ﬂexible margin for continuous endpoint.Methods: It was proposed in this study two procedures for the non-inferiority test with a ﬂexible margin for the continuous endpoint. The proposed test procedures are based on test statistic and conﬁdence interval approach. Simulations have been used to assess the performances and properties of the proposed test procedures. An application was done on clinical real data, which the purpose was to assess the eﬃcacy of clinical monitoring alone versus laboratory and clinical monitoring in HIV-infected adult patients.Results: Basically, the two proposed test procedures have good properties. In the test based on a statistic, the actual type 1 error rate estimate is approximatively equal to the nominal value. It has been found that the conﬁdence interval level determines approximately the level of signiﬁcance. The 80%, 90%, and 95%one-sided conﬁdence interval levels led approximately to a type I error of 10%, 5% and 2.5% respectively. The power estimate was almost 100% for two proposed tests, except for the small scale values of the reference treatment where the power was relatively low when the sample sizes were small.Conclusions: Based on type I error rate and power estimates, the proposed non-inferiority hypothesis test procedures have good performance and are applicable in practice.Trial registration: The trial data used in this study was from the ”Stratall ANRS 12110 / ESTHER”, registered with ClinicalTrials.gov, number NCT00301561. Date : March 13, 2006, url : https://clinicaltrials.gov/ct2/show/NCT00301561.

Download Full-text

Comparision Mann-Whitney U Test and Students’ t Test in Terms of Type I Error Rate and Test Power: A Monte Carlo Sımulation Study

Afyon Kocatepe University Journal of Sciences and Engineering ◽

10.5578/fmbd.7380 ◽

2014 ◽

Vol 14 (1) ◽

pp. 5-11

Author(s):

Recep Bindak

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Simulation Study ◽

Error Rate ◽

Type I Error ◽

T Test ◽

Type I ◽

Monte Carlo Simulation Study ◽

Test Power ◽

Type I Error Rate

Download Full-text

Group sequential designs with robust semiparametric recurrent event models

Statistical Methods in Medical Research ◽

10.1177/0962280218780538 ◽

2018 ◽

Vol 28 (8) ◽

pp. 2385-2403 ◽

Cited By ~ 1

Author(s):

Tobias Mütze ◽

Ekkehard Glimm ◽

Heinz Schmidli ◽

Tim Friede

Keyword(s):

Error Rate ◽

Joint Distribution ◽

Recurrent Events ◽

Type I Error ◽

Type I ◽

Sequential Designs ◽

Group Sequential ◽

Type I Error Rate ◽

Sequential Procedures ◽

Group Sequential Designs

Robust semiparametric models for recurrent events have received increasing attention in the analysis of clinical trials in a variety of diseases including chronic heart failure. In comparison to parametric recurrent event models, robust semiparametric models are more flexible in that neither the baseline event rate nor the process inducing between-patient heterogeneity needs to be specified in terms of a specific parametric statistical model. However, implementing group sequential designs in the robust semiparametric model is complicated by the fact that the sequence of Wald statistics does not follow asymptotically the canonical joint distribution. In this manuscript, we propose two types of group sequential procedures for a robust semiparametric analysis of recurrent events. The first group sequential procedure is based on the asymptotic covariance of the sequence of Wald statistics and it guarantees asymptotic control of the type I error rate. The second procedure is based on the canonical joint distribution and does not guarantee asymptotic type I error rate control but is easy to implement and corresponds to the well-known standard approach for group sequential designs. Moreover, we describe how to determine the maximum information when planning a clinical trial with a group sequential design and a robust semiparametric analysis of recurrent events. We contrast the operating characteristics of the proposed group sequential procedures in a simulation study motivated by the ongoing phase 3 PARAGON-HF trial (ClinicalTrials.gov identifier: NCT01920711) in more than 4600 patients with chronic heart failure and a preserved ejection fraction. We found that both group sequential procedures have similar operating characteristics and that for some practically relevant scenarios, the group sequential procedure based on the canonical joint distribution has advantages with respect to the control of the type I error rate. The proposed method for calculating the maximum information results in appropriately powered trials for both procedures.

Download Full-text