Nonlinear logistic regression mixture experiment modeling for binary data using dimensionally reduced components

Student persistence is the ability of students to continue their studies. At Universitas Terbuka (UT), no student is formally dropped; instead, students who stop are classified as non-active, or non-persistent. Length of study among UT students can therefore be coded as a binary variable: persistent (1) or non-persistent (0). Logistic regression is a statistical method suited to binary response data. This article aims to identify the factors that influence length of study among students of the Department of Management, Faculty of Economics at UT, and to determine an appropriate logistic regression model for the relationship between the response variable (length of study) and the explanatory variables. The research is a case study with a sample of 2,936 students. The results show that the factors influencing length of study at the 0.05 significance level are: age, the number of courses taken, employment status, participation in tutorials, first-semester achievement index, and cumulative grade point.
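As a rough illustration of the kind of model this abstract describes, the sketch below fits a logistic regression to a binary persistence outcome. It is not the authors' analysis: the data are simulated, the two predictors (`age`, `tutorial`) and all coefficient values are hypothetical, and the fit uses plain gradient ascent rather than whatever software the study used.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Estimate intercept and slopes by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = yi - sigmoid(eta)  # observed minus fitted probability
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def simulate(n=300, seed=0):
    """Synthetic data only: standardized 'age' and a 0/1 'tutorial' flag (hypothetical)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age = rng.gauss(0.0, 1.0)
        tutorial = 1.0 if rng.random() < 0.5 else 0.0
        p_persist = sigmoid(-0.5 - 0.8 * age + 1.5 * tutorial)  # assumed true model
        X.append([age, tutorial])
        y.append(1 if rng.random() < p_persist else 0)  # 1 = persistent, 0 = non-persistent
    return X, y


X, y = simulate()
beta = fit_logistic(X, y)
# exp(coefficient) is the odds ratio for a one-unit increase in that predictor
odds_ratios = [math.exp(b) for b in beta[1:]]
print("coefficients:", [round(b, 2) for b in beta])
print("odds ratios:", [round(o, 2) for o in odds_ratios])
```

An odds ratio above 1 means the predictor raises the odds of persistence and one below 1 lowers them; a real analysis would also report standard errors and test each coefficient at the chosen significance level.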

Marginal logistic regression for spatially clustered binary data

Journal of the Royal Statistical Society Series C (Applied Statistics) ◽

10.1111/rssc.12270 ◽

2018 ◽

Vol 67 (4) ◽

pp. 939-959 ◽

Cited By ~ 1

Author(s):

Manuela Cattelan ◽

Cristiano Varin

Keyword(s):

Logistic Regression ◽

Binary Data ◽

Clustered Binary Data

Optimal designs for binary data under logistic regression

Journal of Statistical Planning and Inference ◽

10.1016/s0378-3758(00)00173-7 ◽

2001 ◽

Vol 93 (1-2) ◽

pp. 295-307 ◽

Cited By ~ 35

Author(s):

Thomas Mathew ◽

Bikas Kumar Sinha

Keyword(s):

Logistic Regression ◽

Binary Data ◽

Optimal Designs

Statistical analysis of correlated binary data in ophthalmology: a weighted logistic regression approach

Ophthalmic Epidemiology ◽

10.1076/opep.5.3.117.8365 ◽

1998 ◽

Vol 5 (3) ◽

pp. 117-131 ◽

Cited By ~ 6

Author(s):

Maria Léa Corrêa Leite ◽

Alfredo Nicolosi

Keyword(s):

Logistic Regression ◽

Statistical Analysis ◽

Binary Data ◽

Correlated Binary Data ◽

Regression Approach ◽

Weighted Logistic Regression

072009 (M10) Logistic regression for correlated binary data

Insurance Mathematics and Economics ◽

10.1016/0167-6687(95)97076-7 ◽

1995 ◽

Vol 16 (3) ◽

pp. 267

Keyword(s):

Logistic Regression ◽

Binary Data ◽

Correlated Binary Data

A Logistic Regression Mixture Model for Interval Mapping of Genetic Trait Loci Affecting Binary Phenotypes

Genetics ◽

10.1534/genetics.105.047241 ◽

2005 ◽

Vol 172 (2) ◽

pp. 1349-1358 ◽

Cited By ~ 8

Author(s):

Weiping Deng ◽

Hanfeng Chen ◽

Zhaohai Li

Keyword(s):

Logistic Regression ◽

Mixture Model ◽

Interval Mapping ◽

Genetic Trait ◽

Trait Loci ◽

Regression Mixture

Reliability of Pharmacodynamic Analysis by Logistic Regression

Anesthesiology ◽

10.1097/00000542-200312000-00005 ◽

2003 ◽

Vol 99 (6) ◽

pp. 1255-1262 ◽

Cited By ~ 22

Author(s):

Wei Lu ◽

James G. Ramsay ◽

James M. Bailey

Keyword(s):

Logistic Regression ◽

Coefficient Of Variation ◽

Binary Data ◽

Sparse Data ◽

Unbiased Estimation ◽

Data Sets ◽

Data Set ◽

True Value ◽

Data Points ◽

Data Point

Background. Many pharmacologic studies record data as binary (yes-or-no) variables and analyze them using logistic regression. A previous study showed that estimates of C50, the drug concentration associated with a 50% probability of drug effect, were unbiased, whereas estimates of gamma, the term describing the steepness of the concentration-effect relationship, were biased when sparse data were naively pooled for analysis. This study examined whether mixed-effects analysis improved the accuracy of parameter estimation.
Methods. Pharmacodynamic studies with binary (yes-or-no) responses were simulated and analyzed with NONMEM. The bias and coefficient of variation of the C50 and gamma estimates were determined as a function of the number of patients in the simulated study, the number of simulated data points per patient, and the "true" value of gamma. In addition, 100 sparse binary human data sets were generated from an evaluation of midazolam for postoperative sedation of adult patients undergoing cardiac surgery, by randomly selecting a single data point (sedation score vs. midazolam plasma concentration) from each of the 30 patients in the study. C50 and gamma were estimated for each of these data sets using NONMEM and compared with the estimates from the complete data set of 656 observations.
Results. Estimates of C50 were unbiased, even for sparse data (one data point per patient), with coefficients of variation of 30-50%. Estimates of gamma were highly biased for sparse data for all values of gamma greater than 1, and the value of gamma was overestimated. Unbiased estimation of gamma required 10 data points per patient. The coefficient of variation of the gamma estimates was greater than that of the C50 estimates. Clinical data for sedation with midazolam confirmed the simulation results, showing an overestimate of gamma with sparse data.
Conclusion. Although accurate estimates of C50 can be obtained from sparse binary data, estimates of gamma are biased. Data with 10 or more observations per patient are necessary for accurate estimation of gamma.
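To make the roles of C50 and gamma concrete, the sketch below assumes the usual sigmoid concentration-effect form, P = C^gamma / (C50^gamma + C^gamma). The abstract does not state the formula, so this form and the numeric values are assumptions for illustration. Under it, the log-odds of effect are linear in log concentration with slope gamma, which is why the model can be fitted as a logistic regression.

```python
import math


def effect_probability(conc, c50, gamma):
    """Assumed sigmoid model: P(effect) = C^gamma / (C50^gamma + C^gamma)."""
    return conc ** gamma / (c50 ** gamma + conc ** gamma)


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical parameter values, chosen only for illustration
c50, gamma = 100.0, 3.0

for conc in (50.0, 100.0, 200.0):
    p = effect_probability(conc, c50, gamma)
    # Linear predictor of the equivalent logistic regression on log concentration
    linear = gamma * (math.log(conc) - math.log(c50))
    print(f"C={conc:6.1f}  P={p:.3f}  logit(P)={logit(p):+.3f}  gamma*ln(C/C50)={linear:+.3f}")
```

At C = C50 the probability is exactly 0.5 whatever the value of gamma. That fits the abstract's finding that C50 can be estimated from sparse data, while gamma, which governs how quickly the probability changes around C50, needs more observations per patient.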

Developing drought impact functions for drought risk management

Natural Hazards and Earth System Science ◽

10.5194/nhess-17-1947-2017 ◽

2017 ◽

Vol 17 (11) ◽

pp. 1947-1960 ◽

Cited By ~ 16

Author(s):

Sophie Bachmair ◽

Cecilia Svensson ◽

Ilaria Prosdocimi ◽

Jamie Hannaford ◽

Kerstin Stahl

Keyword(s):

Risk Management ◽

Logistic Regression ◽

Random Forest ◽

Binary Data ◽

Hurdle Model ◽

Drought Risk ◽

Drought Impact ◽

Drought Impacts ◽

The Impact ◽

Drought Risk Management

Abstract. Drought management frameworks are dependent on methods for monitoring and prediction, but quantifying the hazard alone is arguably not sufficient; the negative consequences that may arise from a lack of precipitation must also be predicted if droughts are to be better managed. However, the link between drought intensity, expressed by some hydrometeorological indicator, and the occurrence of drought impacts has only recently begun to be addressed. One challenge is the paucity of information on ecological and socioeconomic consequences of drought. This study tests the potential for developing empirical drought impact functions based on drought indicators (Standardized Precipitation and Standardized Precipitation Evaporation Index) as predictors and text-based reports on drought impacts as a surrogate variable for drought damage. While there have been studies exploiting textual evidence of drought impacts, a systematic assessment of the effect of impact quantification method and different functional relationships for modeling drought impacts is missing. Using Southeast England as a case study we tested the potential of three different data-driven models for predicting drought impacts quantified from text-based reports: logistic regression, zero-altered negative binomial regression (hurdle model), and an ensemble regression tree approach (random forest). The logistic regression model can only be applied to a binary impact/no impact time series, whereas the other two models can additionally predict the full counts of impact occurrence at each time point. While modeling binary data results in the lowest prediction uncertainty, modeling the full counts has the advantage of also providing a measure of impact severity, and the counts were found to be reasonably predictable. However, there were noticeable differences in skill between modeling methodologies. 
For binary data the logistic regression and the random forest model performed similarly well based on leave-one-out cross validation. For count data the random forest outperformed the hurdle model. The between-model differences occurred for total drought impacts and for two subsets of impact categories (water supply and freshwater ecosystem impacts). In addition, different ways of defining the impact counts were investigated and were found to have little influence on the prediction skill. For all models we found a positive effect of including impact information of the preceding month as a predictor in addition to the hydrometeorological indicators. We conclude that, although having some limitations, text-based reports on drought impacts can provide useful information for drought risk management, and our study showcases different methodological approaches to developing drought impact functions based on text-based data.
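The leave-one-out cross validation mentioned for the binary impact series can be sketched as follows. This is an illustration on simulated data, not the study's code: the single predictor `indicator` stands in for a drought index, the data-generating slope is hypothetical, and the logistic fit runs a fixed number of gradient-ascent steps without a convergence check.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_1d(xs, ys, lr=1.0, n_iter=300):
    """Intercept and slope from a fixed number of gradient-ascent steps."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def loo_probabilities(xs, ys):
    """For each time point, refit without it and predict its impact probability."""
    probs = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_logistic_1d(train_x, train_y)
        probs.append(sigmoid(b0 + b1 * xs[i]))
    return probs


# Synthetic series: lower (drier) indicator values make an impact more likely
rng = random.Random(1)
indicator = [rng.gauss(0.0, 1.0) for _ in range(50)]
impact = [1 if rng.random() < sigmoid(-3.0 * v) else 0 for v in indicator]

probs = loo_probabilities(indicator, impact)
# Brier score: mean squared difference between predicted probability and outcome
brier = sum((p - y) ** 2 for p, y in zip(probs, impact)) / len(impact)
print(f"leave-one-out Brier score: {brier:.3f}")
```

Each prediction comes from a model that never saw the point being predicted, so the score estimates out-of-sample skill. A Brier score below 0.25 beats a constant forecast of 0.5.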