Sequential Importance Sampling for Logistic Regression Model

Author(s):  
Ruriko Yoshida ◽  
Hisayuki Hara ◽  
Patrick M. Saluke

Logistic regression is one of the most popular classification models in data science and is generally easy to use. However, asymptotic methods cannot be applied to a goodness-of-fit test when the dataset is sparse. In that case, exact conditional inference must be conducted via a sampler such as Markov Chain Monte Carlo (MCMC) or Sequential Importance Sampling (SIS). In this chapter, the authors investigate the rejection rate of the SIS procedure on multiple logistic regression models with categorical covariates. Using tools from algebra, they show that in general SIS can have a very high rejection rate even when linear integer programming (IP) is applied to compute the support of the marginal distribution for each variable. More specifically, the semigroup generated by the columns of the design matrix for a multiple logistic regression has infinitely many “holes.” They end with an application of a hybrid scheme of MCMC and SIS to the NUN study data on Alzheimer's disease.
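The notion of a "hole" in a semigroup can be illustrated with a small sketch. The generators below are a hypothetical toy example, not the chapter's design matrix, and the brute-force membership test is only an illustration. A hole is a point that lies in both the cone and the lattice spanned by the generators but cannot be written as a nonnegative integer combination of them. If a sequential sampler is steered toward such a point, no valid completion exists and the draw has to be rejected.

```python
from itertools import product


def in_semigroup(columns, target):
    """Return True if target is a nonnegative integer combination of columns.

    Assumes every column is a nonzero vector of nonnegative integers, so no
    coefficient can exceed max(target). Brute force; toy sizes only.
    """
    bound = max(target)
    dim = len(target)
    for coeffs in product(range(bound + 1), repeat=len(columns)):
        combo = tuple(
            sum(c * col[i] for c, col in zip(coeffs, columns))
            for i in range(dim)
        )
        if combo == tuple(target):
            return True
    return False


# Hypothetical toy generators (NOT the chapter's design matrix).
TOY_COLUMNS = [(1, 0), (1, 2), (1, 3)]

# Every point (n, 1) with n >= 1 lies in the cone and the integer lattice
# spanned by the columns, yet none is reachable: the second coordinate
# would require 2a + 3b = 1 with a, b >= 0, which has no solution.
holes = [(n, 1) for n in range(1, 6) if not in_semigroup(TOY_COLUMNS, (n, 1))]
print(holes)
```

In this toy case the holes continue for every n, which mirrors, in a much simpler setting, the "infinitely many holes" phenomenon the abstract describes.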

2020 ◽  
Vol 2 (2) ◽  
pp. 323-336
Author(s):  
Santosh Kumar Shah

Introduction: Banks play an important role in ensuring economic and social stability and the sustainable growth of the economy. Savings and other accounts in financial institutions, including banks, finance companies, microfinance institutions and cooperatives, enable people to carry out important financial functions. Thus, households that have accounts in any financial institution can access various banking services. Objective: The objective of the study is to identify the factors associated with households having bank accounts in Nepal. Methods: The analysis is based on household data extracted from the dataset of the Nepal Demographic and Health Survey, 2016. The dependent variable is dichotomous: households with and without bank accounts in any formal financial channel. In order to identify the factors associated with households receiving financial services in Nepal, multiple logistic regression models were developed and checked with model adequacy tests. Results: The study finds that a total of 66.9% of the households had bank accounts. Several variables were found to be significant at the 1% level. The predictive power of the model is found to be 31.2%, and multicollinearity among the independent variables was absent. The Hosmer-Lemeshow goodness-of-fit test indicated that the data were poorly fitted by the model (p-value=0.056). However, the Osius-Rojek goodness-of-fit test (z=0.11; p-value=0.911), the Stukel test (Z=0.683, p-value=0.494), the likelihood ratio test (χ2=2770; p-value<0.0001) and the area under the receiver operating characteristic curve (79.8%) indicated that the fitted model was good. Conclusion: The multiple logistic regression model revealed that households in mountainous and hilly regions and women-headed households are less likely to be without bank accounts than households in the Terai region and men-headed households. The chances of having a bank account are even lower in Province 2 than in Karnali and other provinces. The odds of not having a bank account gradually decreased with increases in the size of agricultural land, wealth index, family size, and the number of family members who have completed secondary education.
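For readers unfamiliar with the Hosmer-Lemeshow test mentioned above, the statistic can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function name, the equal-size grouping rule, and the toy inputs are all hypothetical, and predicted probabilities are assumed to lie strictly between 0 and 1. The p-value (a chi-square tail probability, conventionally with groups minus 2 degrees of freedom) is omitted to keep the sketch dependency-free.

```python
def hosmer_lemeshow_statistic(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-size risk groups.

    y_true: 0/1 outcomes; y_prob: fitted probabilities in (0, 1).
    Observations are sorted by fitted probability and split into `groups`
    consecutive chunks; observed and expected event counts are compared.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # more groups than observations
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic


# Hypothetical toy data: four observations, all with fitted probability 0.5.
toy_stat_one_group = hosmer_lemeshow_statistic([0, 1, 0, 1], [0.5] * 4, groups=1)
toy_stat_two_groups = hosmer_lemeshow_statistic([0, 1, 0, 1], [0.5] * 4, groups=2)
```

A small statistic (large p-value) is read as no evidence of lack of fit; the result can depend on how the groups are formed.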


2021 ◽  
Author(s):  
Rui Yang ◽  
Wen Ma ◽  
Tao Huang ◽  
Lu-Ming Zhang ◽  
Di-Di Han ◽  
...  

Background: The purpose of this study was to identify the factors influencing the 90-day mortality of acute myocardial infarction (AMI) patients, and to establish a prognostic model for these patients based on the MIMIC-III database. Methods: Retrospective study methods were used to collect AMI patient data that met the inclusion criteria from the MIMIC-III database. Variable importance selection was determined using the random forest algorithm. Multiple logistic regression was used to determine AMI-related risk factors, with the results represented as a nomogram. Results: The baseline scores for the training and validation groups were very flat, and indicators for developing risk-model nomograms were obtained after random forest and multiple logistic regression. The AUC of the risk model was the highest (0.826 and 0.818 in the training and validation groups, respectively). The Hosmer-Lemeshow goodness-of-fit test and standard curve both produced very consistent results. Both the NRI and IDI values indicated that the risk model had significant predictive power, and DCA results indicated that the risk model had good net benefits for clinical application. Conclusions: The results of this study indicated that age, troponinT, VT, VFI, MI_his, APS-III, bypass, and PCI were risk factors for 90-day mortality in AMI patients. Interactive nomograms could provide intuitive and concise personalized 90-day mortality predictions for AMI patients.
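The AUC values quoted above can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. A minimal sketch of that pairwise definition follows; the function name and the toy scores are hypothetical, and this is not the study's code.

```python
def auc_score(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(y_score, y_true) if y == 1]
    negatives = [s for s, y in zip(y_score, y_true) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy scores: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75
```

The quadratic loop is fine for illustration; rank-based formulas give the same value more efficiently on large cohorts.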


1980 ◽  
Vol 19 (01) ◽  
pp. 42-49 ◽  
Author(s):  
B. W. Brown ◽  
C. Engelhard ◽  
J. Halpern ◽  
J. F. Fries ◽  
L. S. Coles

In solving a clinical problem of diagnosis, prognosis, or treatment choice, a physician must select from among a large group of possible tests. In general, an ordering exists specifying which tests are most valuable in providing relevant information concerning the problem at hand. The computer program package to be described (MW) extracts appropriate data from the ARAMIS data banks and then analyzes the data by stepwise logistic regression. A binary outcome (diagnosis, prognostic event, or treatment response) is sequentially associated with possible tests, and the most powerful combination of tests is identified. For example, the most valuable predictor variable of early mortality in SLE is proteinuria, followed sequentially by anemia and absence of arthritis. Experience with these techniques suggests: 1. optimal certainty is usually reached after only three or four tests; 2. several different test sequences may lead to the same level of certainty; 3. diagnosis may usually be ascertained with greater certainty than prognosis; 4. many medical problems contain considerable non-reducible uncertainty; 5. a relatively small group of tests is typically found among the most powerful; 6. results are consistent across several patient populations; 7. results are largely independent of the particular statistic employed. These observations suggest strategies for maximizing information while minimizing risk and expense.


Author(s):  
Paulin Paul ◽  
Noel George ◽  
B. Priestly Shan

Background: The accuracy of Joint British Society calculator 3 (JBS3) cardiovascular risk prediction may vary within the Indian population, and has not yet been studied using data from a south Indian, Kerala-based population. Objectives: To evaluate cardiovascular disease (CVD) risk estimation using the traditional CVD risk factors (TRF) in a Kerala-based population. Methods: This cross-sectional study included 977 subjects aged between 30 and 80 years. The traditional CVD risk markers were recorded from the medical archives of clinical locations in Ernakulam district, Kerala. The 10-year risk categories used are low (<7.5%), intermediate (≥7.5% and <20%), and high (≥20%). The lifetime classifications used are low lifetime (≤39%) and high lifetime (≥40%). The study was evaluated using statistical analysis. A chi-square test was performed to compare dependent and categorical CVD risk variables. Multivariate ordinal logistic regression for the 10-year risk model and odds logistic regression analysis for the lifetime model were used to identify significant risk variables. Results: The mean age of the study population is 52.56±11.43 years. By predicted 10-year risk, 39.1% of subjects were in the low, 25.0% in the intermediate, and 35.9% in the high category. Lifetime risk was low for 41.1% and high for 58.9%. Reclassifications to high lifetime risk were more frequent from the intermediate 10-year risk category. The Hosmer-Lemeshow goodness-of-fit statistic indicates a good model fit. Conclusion: Risk prediction and timely intervention with appropriate therapeutic and lifestyle modification are useful in primary prevention. Avoiding short-term incidents and reclassifications to high lifetime risk can reduce CVD mortality rates.
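The risk categories stated in the Methods can be written down as a small lookup, sketched here with hypothetical function names. One assumption is made explicit: the abstract leaves values between 39% and 40% unassigned, and this sketch treats anything below 40% as low lifetime risk.

```python
def ten_year_category(risk_percent):
    """Map a 10-year CVD risk (in percent) to the study's three categories."""
    if risk_percent < 7.5:
        return "low"
    if risk_percent < 20:
        return "intermediate"
    return "high"


def lifetime_category(risk_percent):
    """Map a lifetime CVD risk (in percent) to low or high.

    Assumption: values strictly between 39 and 40 are treated as low,
    since the abstract only states <=39% (low) and >=40% (high).
    """
    return "high" if risk_percent >= 40 else "low"


# A subject with intermediate 10-year risk can still carry high lifetime risk,
# which is the reclassification pattern the Results describe.
example = (ten_year_category(12.0), lifetime_category(46.0))
print(example)  # ('intermediate', 'high')
```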


Author(s):  
Byunghyun Kang ◽  
Cheol Choi ◽  
Daeun Sung ◽  
Seongho Yoon ◽  
Byoung-Ho Choi

In this study, friction tests are performed, via a custom-built friction tester, on specimens of natural rubber used in automotive suspension bushings. By analyzing problematic suspension bushings, eleven candidate factors that influence squeak noise are selected: surface lubrication, hardness, vulcanization condition, surface texture, additive content, sample thickness, thermal aging, temperature, surface moisture, friction speed, and normal force. Through friction tests, the changes in frictional force and squeak noise occurrence are investigated at various levels of the influencing factors. The degree of correlation of frictional force and squeak noise occurrence with the factors is determined through statistical tests, and the relationship between frictional force and squeak noise occurrence is discussed based on the test results. Squeak noise prediction models are constructed by considering the interactions among the influencing factors through both multiple logistic regression and neural network analysis. The accuracies of the two prediction models are evaluated by comparing predicted and measured results. The accuracies of the multiple logistic regression and neural network models in predicting the occurrence of squeak noise are 88.2% and 87.2%, respectively.


2021 ◽  
pp. 1-10
Author(s):  
Guang Fu ◽  
Hai-chao Zhan ◽  
Hao-li Li ◽  
Jun-fu Lu ◽  
Yan-hong Chen ◽  
...  

Objective: The objective of this study was to assess the relationship between serum procalcitonin (PCT) and acute kidney injury (AKI) induced by bacterial septic shock. Methods: A retrospective study was designed that included patients admitted to the ICU from January 2015 to October 2018. Multiple logistic regression, receiver operating characteristic (ROC) analysis, and smooth curve fitting were used to assess the relationship between the PCT level and AKI. Results: Of the 1,631 patients screened, 157 patients were included in the primary analysis, of whom 84 (53.5%) had AKI. Multiple logistic regression results showed that PCT (odds ratio [OR] = 1.017, 95% confidence interval [CI] 1.009–1.025, p < 0.001) was associated with AKI induced by septic shock. The ROC analysis showed that the cutoff point for PCT to predict AKI development was 14 ng/mL, with a sensitivity of 63% and a specificity of 67%. Specifically, in multivariate piecewise linear regression, the occurrence of AKI decreased with the elevation of PCT when PCT was between 25 ng/mL and 120 ng/mL (OR 0.963, 95% CI 0.929–0.999; p = 0.042). The occurrence of AKI increased with the elevation of PCT when PCT was either <25 ng/mL (OR 1.077, 95% CI 1.022–1.136; p = 0.006) or >120 ng/mL (OR 1.042, 95% CI 1.009–1.076; p = 0.013). Moreover, the PCT level was significantly higher in the AKI group only in female patients aged ≤75 years (p = 0.001). Conclusions: Our data revealed a nonlinear relationship between PCT and AKI in septic shock patients, and PCT could be used as a potential biomarker of AKI in female patients younger than 75 years with bacterial septic shock.
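How a cutoff such as 14 ng/mL translates into sensitivity and specificity can be sketched as below. This is an illustration only: the function name and the toy values are hypothetical, not patient data, and treating a value at or above the cutoff as test-positive is an assumption, since the abstract does not state the rule.

```python
def sensitivity_specificity(values, has_event, cutoff):
    """Score the rule 'value >= cutoff means test-positive' against outcomes.

    Returns (sensitivity, specificity) as fractions between 0 and 1.
    """
    tp = fn = tn = fp = 0
    for value, event in zip(values, has_event):
        positive = value >= cutoff
        if event and positive:
            tp += 1
        elif event:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the event")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy biomarker values (ng/mL) and outcomes, NOT study data.
toy_values = [5, 10, 14, 20, 30, 50]
toy_events = [0, 0, 1, 0, 1, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_values, toy_events, cutoff=14)
```

Sweeping the cutoff over all observed values and recording each (sensitivity, specificity) pair traces out the ROC curve from which a reported cutoff is chosen.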

