A simulation study for evaluating the performance of clustering measures in multilevel logistic regression

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Nicholas Siame Adam ◽  
Halima S. Twabi ◽  
Samuel O.M. Manda

Abstract Background Multilevel logistic regression models are widely used in health sciences research to account for clustering in multilevel data when estimating effects on subject binary outcomes of individual-level and cluster-level covariates. Several measures for quantifying between-cluster heterogeneity have been proposed. This study compared the performance of between-cluster variance based heterogeneity measures (the Intra-class Correlation Coefficient (ICC) and the Median Odds Ratio (MOR)), and cluster-level covariate based heterogeneity measures (the 80% Interval Odds Ratio (IOR-80) and the Sorting Out Index (SOI)). Methods We used several simulation datasets of a two-level logistic regression model to assess the performance of the four clustering measures for a multilevel logistic regression model. We also empirically compared the four measures of cluster variation with an analysis of childhood anemia to investigate the importance of unexplained heterogeneity between communities and community geographic type (rural vs urban) effect in Malawi. Results Our findings showed that the estimates of SOI and ICC were generally unbiased with at least 10 clusters and a cluster size of at least 20. On the other hand, estimates of MOR and IOR-80 were less accurate with 50 or fewer clusters regardless of the cluster size. The performance of the four clustering measures improved with increased clusters and cluster size at all cluster variances. In the analysis of childhood anemia, the estimate of the between-community variance was 0.455, and the effect of community geographic type (rural vs urban) had an odds ratio (OR)=1.21 (95% CI: 0.97, 1.52). The resulting estimates of ICC, MOR, IOR-80 and SOI were 0.122 (indicative of low homogeneity of childhood anemia in the same community); 1.898 (indicative of large unexplained heterogeneity); 0.345-3.978 and 56.7% (implying that the between community heterogeneity was more significant in explaining the variations in childhood anemia than the estimated effect of community geographic type (rural vs urban)), respectively. Conclusion At least 300 clusters with sizes of at least 50 would be adequate to estimate the strength of clustering in multilevel logistic regression with negligible bias. We recommend using the SOI to assess unexplained heterogeneity between clusters when the interest also involves the effect of cluster-level covariates; otherwise, the usual intra-cluster correlation coefficient would suffice in multilevel logistic regression analyses.
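
The four clustering measures named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used latent-variable definitions for a two-level random-intercept logistic model: ICC = σ²/(σ² + π²/3), MOR = exp(√(2σ²)·Φ⁻¹(0.75)), IOR-80 = exp(β ± Φ⁻¹(0.90)·√(2σ²)), and SOI = Φ(|β|/√(2σ²)). The abstract does not state the formulas, so these definitions (the SOI one in particular) are assumptions, and the function name `clustering_measures` is hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for quantiles and the CDF


def clustering_measures(sigma2, beta=None):
    """Clustering measures for a random-intercept logistic model.

    sigma2: between-cluster variance on the logit scale.
    beta:   log odds ratio of a cluster-level covariate (optional).
    The formulas are the commonly cited definitions and are assumed here,
    not taken from the abstract.
    """
    if sigma2 < 0:
        raise ValueError("sigma2 must be non-negative")
    # Standard deviation of the difference between two random intercepts.
    spread = math.sqrt(2.0 * sigma2)
    out = {
        # Latent-variable ICC: the level-1 logistic variance is pi^2 / 3.
        "icc": sigma2 / (sigma2 + math.pi ** 2 / 3.0),
        # Median odds ratio between two randomly chosen clusters.
        "mor": math.exp(spread * _STD_NORMAL.inv_cdf(0.75)),
    }
    if beta is not None:
        z90 = _STD_NORMAL.inv_cdf(0.90)  # about 1.2816
        # Interval covering the middle 80% of pairwise odds ratios.
        out["ior80"] = (math.exp(beta - z90 * spread),
                        math.exp(beta + z90 * spread))
        # Assumed SOI: share of pairwise odds ratios in the direction of beta.
        out["soi"] = _STD_NORMAL.cdf(abs(beta) / spread) if spread > 0 else 1.0
    return out


# Rounded inputs reported in the abstract: variance 0.455, OR = 1.21.
measures = clustering_measures(0.455, beta=math.log(1.21))
for name, value in measures.items():
    print(name, value)
```

Because the inputs are the rounded figures from the abstract and the definitions are assumed, the printed values can differ somewhat from the estimates the authors report.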

Author(s):  
Moza S. Al-Balushi ◽  
Mohammed S. Ahmed ◽  
M. Mazharul Islam

In this paper, multilevel logistic regression models are developed for examining the hierarchical effects of contraceptive use and its selected determinants in Oman using the 2008 Oman National Reproductive Health Survey (ONRHS). Single-level and multilevel logistic regression models are compared to examine the plausibility of multilevel effects of contraceptive use. The multilevel logistic regression analysis found real multilevel variation among contraceptive users in Oman. The results indicate that a multilevel logistic regression model fits better than ordinary multiple logistic regression models. Generally, this study revealed that women’s age, education, number of living children and region of residence are important factors that affect contraceptive use in Oman. The regional variation in the effects of women’s age, women’s education and number of living children further implies that there exist considerable differences in modern contraceptive use among regions, and that a model with a random coefficient or slope is more appropriate to explain the regional variation than a model with fixed coefficients or without random effects. The study suggests that researchers should use multilevel models rather than traditional regression methods when their data structure is hierarchical.


2006 ◽  
Vol 52 (2) ◽  
pp. 325-328 ◽  
Author(s):  
Paul Froom ◽  
Zvi Shimoni

Abstract Background: The aim of this study was to explore whether electronically retrieved laboratory data can predict mortality in internal medicine departments in a regional hospital. Methods: All 10 308 patients hospitalized in internal medicine departments over a 1-year period were included in the cohort. Nearly all patients had a complete blood count and basic clinical chemistries on admission. We used logistic regression analysis to predict the 573 deaths (5.6%), including all variables that added significantly to the model. Results: Eight laboratory variables and age significantly and independently contributed to a logistic regression model (area under the ROC curve, 88.7%). The odds ratio for the final model per quartile of risk was 6.44 (95% confidence interval, 5.42–7.64), whereas for age alone, the odds ratio per quartile was 2.01 (95% confidence interval, 1.84–2.19). Conclusions: A logistic regression model including only age and electronically retrieved laboratory data highly predicted mortality in internal medicine departments in a regional hospital, suggesting that age and routine admission laboratory tests might be used to ensure a fair comparison when using mortality monitoring for hospital quality control.


Author(s):  
Pouya Gholizadeh ◽  
Behzad Esmaeili

The ability to identify factors that influence serious injuries and fatalities would help construction firms triage hazardous situations and direct their resources towards more effective interventions. Therefore, this study used odds ratio analysis and logistic regression modeling on historical accident data to investigate the contributing factors impacting occupational accidents among small electrical contracting enterprises. After conducting a thorough content analysis to ensure the reliability of reports, the authors adopted a purposeful variable selection approach to determine the most significant factors that can explain the fatality rates in different scenarios. Thereafter, this study performed an odds ratio analysis among significant factors to determine which factors increase the likelihood of fatality. For example, it was found that having a fatal accident is 4.4 times more likely when the source is a “vehicle” than when it is a “tool, instrument, or equipment”. After validating the consistency of the model, 105 accident scenarios were developed and assessed using the model. The findings revealed which severe accident scenarios happen commonly to people in this trade, with nine scenarios having fatality rates of 50% or more. The highest fatality rates occurred in “fencing, installing lights, signs, etc.” tasks in “alteration and rehabilitation” projects where the source of injury was “parts and materials”. The proposed analysis/modeling approach can be applied among all specialty contracting companies to identify and prioritize more hazardous situations within specific trades. The proposed model-development process also contributes to the body of knowledge around accident analysis by providing a framework for analyzing accident reports through a multivariate logistic regression model.


2015 ◽  
Vol 32 (1) ◽  
pp. 288 ◽  
Author(s):  
Daniel Lapresa ◽  
Javier Arana ◽  
M.Teresa Anguera ◽  
J.Ignacio Pérez-Castellanos ◽  
Mario Amatria

This study shows how simple and multiple logistic regression can be used in observational methodology and more specifically, in the fields of physical activity and sport. We demonstrate this in a study designed to determine whether three-a-side futsal or five-a-side futsal is more suited to the needs and potential of children aged 6-to-8 years. We constructed a multiple logistic regression model to analyze use of space (depth of play) and three simple logistic regression models to determine which game format is more likely to potentiate effective technical and tactical performance.


2021 ◽  
Vol 84 (2) ◽  
pp. 117-131
Author(s):  
Marta Sternal ◽  
Barbara Kwiatkowska ◽  
Krzysztof Borysławski ◽  
Agnieszka Tomaszewska

Abstract The relationship between maternal age and the occurrence of cerebral palsy (CP) is still highly controversial. The aim of the study was to examine the effect of maternal age on the risk of CP development, taking into account all significant risk factors and the division into single, twin, full-term, and pre-term pregnancies. The survey covered 278 children with CP attending selected educational institutions in Poland. The control group consisted of data collected from the medical records of 435 children born at Limanowa county hospital, Poland. The analyses included socio-economic factors, factors related to pregnancy and childbirth, and factors related to the presence of comorbidities and diseases in the child. Logistic regression models were constructed for the statistical analyses. Of all age categories included in the estimated models (assessing the effect of demographic factors on the development of CP), only the category of ≤24 years of age (in the group of all children) was significant. It was estimated that in this maternal age category, the risk of CP is lower (OR 0.6, 95% CI: 0.3–1.0) in comparison to mothers aged 25-29 (p = 0.03). However, estimation with the use of a complex logistic regression model did not show any significant effect of maternal age on the incidence of CP in groups of different pregnancy types. It became apparent that maternal age is a weak predictor of CP, insignificant in the final logistic regression model. It seems correct to assume that the studies conducted so far, showing a significant effect of maternal age in this respect, may be associated with bias in the estimators used to assess the risk of CP due to the fact that other important risk factors for CP development were not included in the research.


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1517
Author(s):  
Hao Yang Teng ◽  
Zhengjun Zhang

Logistic regression is widely used in the analysis of medical data with binary outcomes to study treatment effects through (absolute) treatment effect parameters in the models. However, logistic regression models do not include parameters that indicate relative treatment effects, which can be a severe problem for modeling treatment effects efficiently and can lead to wrong conclusions about them. This paper introduces a new enhanced logistic regression model that offers a new way of studying treatment effects by measuring the relative changes in the treatment effects and also incorporates the way in which logistic regression models the treatment effects. The new model, called the Absolute and Relative Treatment Effects (AbRelaTEs) model, is viewed as a generalization of logistic regression and an enhanced model with greater flexibility, interpretability, and applicability in real data applications than logistic regression. The AbRelaTEs model is capable of modeling significant treatment effects in absolute terms, relative terms, or both. The new model can be easily implemented using statistical software, with the logistic regression model being treated as a special case. As a result, the classical logistic regression models can be replaced by the AbRelaTEs model to gain greater applicability and have a new benchmark model for more efficiently studying treatment effects in clinical trials, economic developments, and many applied areas. Moreover, the estimators of the coefficients are consistent and asymptotically normal under regularity conditions. In both simulation and real data applications, the model provides both significant and more meaningful results.


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0256725
Author(s):  
Rezwanul Haque ◽  
Syed Afroz Keramat ◽  
Syed Mahbubur Rahman ◽  
Maimun Ur Rashid Mustafa ◽  
Khorshed Alam

Background Obesity prevalence is increasing in many countries in the world, including Asia. Maternal obesity is highly associated with fetal and neonatal deaths. This study investigated whether maternal obesity is a risk factor for fetal death (measured in terms of miscarriage and stillbirth) and neonatal mortality in South and South-East Asian countries. Methods This cross-sectional study pooled the most recent Demographic and Health Surveys (DHS) from eight South and South-East Asian countries (2014–2018). Multivariate logistic regression was deployed to check the relationships between maternal obesity and fetal and neonatal deaths. Finally, a multilevel logistic regression model was employed because the DHS data have a hierarchical structure. Results The pooled logistic regression model illustrated that maternal obesity is associated with higher odds of miscarriage (adjusted odds ratio [aOR]: 1.26, 95% CI: 1.20–1.33) and stillbirths (aOR: 1.46, 95% CI: 1.27–1.67) after adjustment for confounders. Children of obese mothers were at 1.18 times greater risk (aOR: 1.18, 95% CI: 1.08–1.28) of dying during the early neonatal period than children of mothers with a healthy weight. However, whether maternal obesity is a statistically significant risk factor for the offspring’s late neonatal deaths was not confirmed. The significant association of maternal obesity with miscarriage, stillbirth and early neonatal mortality was further confirmed by multilevel logistic regression results. Conclusion Maternal obesity in South and South-East Asian countries is associated with a greater risk of fetal and early neonatal deaths. This finding has substantial public health implications. Strategies to prevent and reduce obesity should be developed before planning pregnancy to reduce the fetal and neonatal death burden. Obese women need to deliver at an institutional facility that can offer obstetric and early neonatal care.
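
Adjusted odds ratios such as those reported above are exponentiated logistic regression coefficients, and their 95% confidence intervals are usually Wald intervals built on the log-odds scale. The sketch below assumes that construction (the abstract does not say how the intervals were computed) and shows the arithmetic in both directions. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def log_or_and_se_from_ci(or_point, ci_low, ci_high, z=Z_95):
    """Recover the log odds ratio and its standard error from a reported
    odds ratio and Wald confidence interval (assumed symmetric on the log scale)."""
    log_or = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return log_or, se


def or_with_ci(log_or, se, z=Z_95):
    """Turn a logistic regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Miscarriage estimate from the abstract: aOR 1.26, 95% CI 1.20 to 1.33.
log_or, se = log_or_and_se_from_ci(1.26, 1.20, 1.33)
odds_ratio, low, high = or_with_ci(log_or, se)
print(log_or, se, odds_ratio, low, high)
```

Because published figures are rounded, the reconstructed interval only approximately matches the reported limits.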


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Jun Muratsu ◽  
Masahiko Hara ◽  
Atsuyuki Morishima ◽  
Katsuhiko Sakaguchi ◽  
Takashi Fujimoto

Abstract Background and Aims Unhealthy lifestyle behaviors such as unhealthy dietary habits, lack of exercise, heavy alcohol consumption and smoking cause obesity, hypertension, diabetes, dyslipidemia and cardiovascular disease (CVD). These are also closely associated with chronic kidney disease (CKD). CKD is characterized by proteinuria and a low glomerular filtration rate (GFR). Independent of GFR, proteinuria is an important predictor of end-stage kidney disease (ESKD). Few studies have assessed which unhealthy lifestyle behavior (skipping breakfast, snacking, late-night dinner, smoking, heavy alcohol intake or lack of exercise) has the greatest clinical impact on proteinuria in patients with normal renal function. Method This cross-sectional study included 29,780 patients with normal renal function (eGFR ≥60 mL/min/1.73 m² and no history of kidney disease) who underwent a health checkup at the Physical Checkup Center of Sumitomo Hospital. The endpoint was defined as dipstick proteinuria of ≥1+. To assess the association between lifestyle behaviors and the presence of proteinuria, odds ratios were calculated in adjusted univariable and multivariable logistic regression models. The multivariable model included the same items as the univariable models, without variable selection. We aimed to identify the unhealthy lifestyle behavior with the greatest impact on proteinuria. Results Among the 29,780 study subjects (male: 60.3%; mean age: 49±11 years), 1,118 (3.75%) had urinary protein of 1+ or above. Skipping breakfast, snacking and late-night dinner were reported by 5,293 (17.3%), 3,899 (13.1%) and 11,231 (37.7%) subjects, respectively. Sleep duration was <6 hours, 6-8 hours and >8 hours in 12,027 (40.4%), 17,236 (57.9%) and 517 (1.7%) subjects. Exercise frequency was over 3 days/week, 1-2 days/week and none in 5,138 (17.3%), 9,375 (31.5%) and 15,237 (51.3%) subjects, suggesting that half of them did not have exercise habits. Current, past and never smokers numbered 6,445 (21.6%), 8,459 (28.4%) and 14,876 (50.0%). Daily alcohol intake was over 60 g, 40-60 g, 20-40 g and 0-20 g in 1,840 (6.18%), 4,504 (15.1%), 6,727 (22.6%) and 16,709 (56.1%) subjects. To investigate the impact of lifestyle behaviors on proteinuria, we obtained odds ratios from the adjusted multivariable logistic regression model. In this model, skipping breakfast, current smoking, heavy alcohol intake (ethanol over 60 g/day), no exercise habit and snacking were strongly associated with the prevalence of proteinuria (skipping breakfast, adjusted odds ratio 1.45 [1.26, 1.68]; current smoking, 1.35 [1.14, 1.59]; ethanol over 60 g/day, 1.35 [1.08, 1.69]; no exercise habit, 1.29 [1.07, 1.57]; snacking, 1.23 [1.04, 1.46]). In addition, among items of medical history, diabetes mellitus, hypertension and dyslipidemia were significantly associated with the prevalence of proteinuria (diabetes mellitus, adjusted odds ratio 2.39 [1.93, 2.96]; hypertension, 1.83 [1.53, 2.17]; dyslipidemia, 1.22 [1.03, 1.45]). Conclusion Among the unhealthy lifestyle behaviors, skipping breakfast had the greatest impact on the presence of proteinuria.


Spinal Cord ◽  
2020 ◽  
Author(s):  
Omar Khan ◽  
Jetan H. Badhiwala ◽  
Michael G. Fehlings

Abstract Study design Retrospective analysis of prospectively collected data. Objectives Recently, logistic regression models were developed to predict independence in bowel function 1 year after spinal cord injury (SCI) on a multicenter European SCI (EMSCI) dataset. Here, we evaluated the external validity of these models against a prospectively accrued North American SCI dataset. Setting Twenty-five SCI centers in the United States and Canada. Methods Two logistic regression models developed by the EMSCI group were applied to data for 277 patients derived from three prospective multicenter SCI studies based in North America. External validation was evaluated for both models by assessing their discrimination, calibration, and clinical utility. Discrimination and calibration were assessed using ROC curves and calibration curves, respectively, while clinical utility was assessed using decision curve analysis. Results The simplified logistic regression model, which used baseline total motor score as the predictor, demonstrated the best performance, with an area under the ROC curve of 0.869 (95% confidence interval: 0.826–0.911), a sensitivity of 75.5%, and a specificity of 88.5%. Moreover, the model was well calibrated across the full range of observed probabilities and displayed superior clinical benefit on the decision curve. Conclusions A logistic regression model using baseline total motor score as a predictor of independent bowel function 1 year after SCI was successfully validated against an external dataset. These findings provide evidence supporting the use of this model to enhance the care for individuals with SCI.


2020 ◽  
Vol 35 (6) ◽  
pp. 933-933
Author(s):  
Rolin S ◽  
Kitchen Andren K ◽  
Mullen C ◽  
Kurniadi N ◽  
Davis J

Abstract Objective Previous research in a Veterans Affairs sample proposed using single items on the Neurobehavioral Symptom Inventory (NSI) to screen for anxiety (item 19) and depression (item 20). This study examined the approach in an outpatient physical medicine and rehabilitation sample. Method Participants (N = 84) underwent outpatient neuropsychological evaluation using the NSI, BDI-II, GAD-7, MMPI-2-RF, and Memory Complaints Inventory (MCI) among other measures. Anxiety and depression were psychometrically determined via cutoffs on the GAD-7 (>4) and MMPI-2-RF ANX (>64 T), and BDI-II (>13) and MMPI-2-RF RC2 (>64 T), respectively. Analyses included receiver operating characteristic analysis (ROC) and logistic regression. Logistic regression models used dichotomous anxiety and depression as outcomes and relevant NSI items and MCI average score as predictors. Results ROC analysis using NSI items to classify cases showed area under the curve (AUC) values of .77 for anxiety and .85 for depression. The logistic regression model predicting anxiety correctly classified 80% of cases with AUC of .86. The logistic regression model predicting depression correctly classified 79% of cases with AUC of .88. Conclusion Findings support the utility of NSI anxiety and depression items as screening measures in a rehabilitation population. Consideration of symptom validity via the MCI improved classification accuracy of the regression models. The approach may be useful in other clinical settings for quick assessment of psychological issues warranting further evaluation.

