binary dependent variable
Recently Published Documents


Total documents: 41 (five years: 16)

H-index: 8 (five years: 2)

2022 ◽  
Vol 10 (4) ◽  
pp. 617-623
Author(s):  
Silvia Elsa Suryana ◽  
Budi Warsito ◽  
Suparti Suparti

Telemarketing is a form of marketing conducted by telephone. Banks can use telemarketing to offer products such as term deposits. One of the most important strategies for successful telemarketing is selecting the potential customers who are most likely to respond. Machine learning can be used to predict telemarketing success. Gradient boosting is a machine learning method built on decision trees: it combines many classification trees, each of which improves on the previous one. An optimal classification result depends on optimal hyperparameters. Hyperopt is a Python library that tunes hyperparameters efficiently using Bayesian optimization, drawing on prior distributions over the hyperparameters to find optimal values. The data in this study comprise 20 independent variables and a binary dependent variable with the classes ‘yes’ and ‘no’. The study showed that gradient boosting reached a classification accuracy of up to 90.39%, a precision of 94.91%, and an AUC of 0.939. These values indicate that gradient boosting predicts both the ‘yes’ and ‘no’ classes relatively accurately.
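The three metrics reported in this abstract can be illustrated with a minimal sketch. The labels and scores below are hypothetical toy values, not the study's data, and the 0.5 decision threshold is an assumption; the AUC is computed as the share of (positive, negative) pairs that the score ranks correctly.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Share of predicted positives that are truly positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


def auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 stands for 'yes', 0 for 'no'.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred), precision(y_true, y_pred), auc(y_true, scores))
```

Accuracy and precision depend on the chosen threshold, while AUC depends only on how the scores rank the cases, which is why the three are usually reported together.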


2021 ◽  
pp. 131-232
Author(s):  
Thanh V. Tran ◽  
Keith T. Chan

This chapter reviews the basic ideas of logistic regression, in which a binary dependent variable is regressed on independent variables, along with the assumptions for analysis and the interpretation of results. It provides strategies and practical guides for data analysis using Stata and explains the basic assumptions of logistic regression and its applications for cross-cultural data analysis. The chapter also provides examples of logistic regression models for cross-cultural comparison, and outlines techniques for testing the equivalence of effects across groups. The text includes examples of charts and graphs that can be used to explain differences in effects across cultural groups.
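One common way to test whether an effect is equivalent across two groups is to add a group-by-predictor interaction term to the logistic model. The formulation below is a generic sketch, not necessarily the chapter's own approach; the symbols x (predictor), g (group indicator), and the coefficients are notation assumed here.

```latex
\operatorname{logit} P(Y = 1 \mid x, g)
  = \beta_0 + \beta_1 x + \beta_2 g + \beta_3 \,(x \times g),
\qquad H_0 : \beta_3 = 0
```

Under this parameterization the effect of x is \beta_1 in the reference group and \beta_1 + \beta_3 in the other, so failing to reject H_0 is consistent with an equivalent effect across groups.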


2021 ◽  
Vol 29 (1) ◽  
Author(s):  
Hezlin Aryani Abd Rahman ◽  
Yap Bee Wah ◽  
Ong Seng Huat

Logistic regression is often used for the classification of a binary categorical dependent variable using various types of covariates (continuous or categorical). Imbalanced data lead to biased parameter estimates and biased classification performance of the logistic regression model. Imbalanced data occur when the number of cases in one category of the binary dependent variable is very much smaller than in the other category. This simulation study investigates the effect of imbalanced data, measured by the imbalance ratio, on the parameter estimate of the binary logistic regression with a categorical covariate. Datasets were simulated with controlled imbalance ratios (IR), from 1% to 50%, and for various sample sizes. The simulated datasets were then modeled using binary logistic regression. The bias in the estimates was measured using MSE (Mean Square Error). The simulation results provided evidence that the effect of the imbalance ratio on the parameter estimate of the covariate decreased as sample size increased. The bias of the estimates depended on sample size: for sample sizes of 100, 500, 1000 – 2000, and 2500 – 3500, the estimates were biased for IR below 30%, 10%, 5%, and 2%, respectively. Results also showed that parameter estimates were all biased at IR 1% for all sample sizes. An application using a real dataset supported the simulation results.
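A simulation of this kind can be sketched in a few lines. The version below is a simplified illustration under stated assumptions, not the authors' design: it uses a single binary covariate (for which the maximum likelihood slope is the sample log odds ratio), a true slope of 1.0, a covariate prevalence of 0.5, and it skips replicates with an empty cell because the estimate is not finite there. The function names and defaults are hypothetical.

```python
import math
import random


def solve_intercept(beta1, p_x, target_rate):
    """Bisection for the intercept that gives a marginal P(y=1) of target_rate."""
    def marginal(b0):
        p0 = 1 / (1 + math.exp(-b0))
        p1 = 1 / (1 + math.exp(-(b0 + beta1)))
        return (1 - p_x) * p0 + p_x * p1

    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if marginal(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate_mse(n, ir, beta1=1.0, p_x=0.5, reps=200, seed=0):
    """MSE of the estimated slope over `reps` simulated datasets of size n."""
    rng = random.Random(seed)
    beta0 = solve_intercept(beta1, p_x, ir)
    p_event = {x: 1 / (1 + math.exp(-(beta0 + beta1 * x))) for x in (0, 1)}
    squared_errors = []
    for _ in range(reps):
        counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
        for _ in range(n):
            x = 1 if rng.random() < p_x else 0
            y = 1 if rng.random() < p_event[x] else 0
            counts[(x, y)] += 1
        if min(counts.values()) == 0:
            continue  # assumption: drop replicates where the MLE is not finite
        estimate = math.log(
            counts[(1, 1)] * counts[(0, 0)] / (counts[(1, 0)] * counts[(0, 1)])
        )
        squared_errors.append((estimate - beta1) ** 2)
    return sum(squared_errors) / len(squared_errors) if squared_errors else float("nan")


mse_small = simulate_mse(n=100, ir=0.05)
mse_large = simulate_mse(n=2000, ir=0.05)
print(mse_small, mse_large)
```

Holding the imbalance ratio fixed at 5% and increasing the sample size should shrink the MSE, in line with the pattern the abstract describes.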


Author(s):  
Liam F. Beiser-McGrath

When separation is a problem in binary dependent variable models, many researchers use Firth's penalized maximum likelihood to obtain finite estimates (Firth, 1993; Zorn, 2005; Rainey, 2016). In this paper, I show that this approach can lead to inferences in the opposite direction of the separation when the number of observations is sufficiently large and both the dependent and independent variables are rare events. As large datasets with rare events are frequently used in political science, such as dyadic data measuring interstate relations, a lack of awareness of this problem may lead to inferential issues. Simulations and an empirical illustration show that the use of independent “weakly-informative” prior distributions centered at zero, for example, the Cauchy prior suggested by Gelman et al. (2008), can avoid this issue. More generally, the results caution researchers to be aware of how the choice of prior interacts with the structure of their data when estimating models in the presence of separation.
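The sign reversal described in this abstract can be reproduced by hand in the simplest case. With one binary covariate and an intercept, the model is saturated, and Firth's penalty (the Jeffreys prior) works out to adding 0.5 to every cell of the 2x2 table. The counts below are hypothetical, chosen so that both the outcome and the covariate are rare and no event occurs when x = 1.

```python
import math


def log_odds_ratio(a, b, c, d, add=0.0):
    """Log odds ratio of a 2x2 table, with `add` added to every cell.

    a: x=1, y=1    b: x=1, y=0
    c: x=0, y=1    d: x=0, y=0
    """
    a, b, c, d = (v + add for v in (a, b, c, d))
    return math.log((a * d) / (b * c))


# Hypothetical counts: 100,000 observations, 100 events, 5 cases with x = 1,
# and none of those 5 cases is an event (separation, so the MLE is -infinity).
a, b, c, d = 0, 5, 100, 99_895

firth_estimate = log_odds_ratio(a, b, c, d, add=0.5)
print(round(firth_estimate, 2))  # positive, about 4.5
```

The data suggest a negative effect (no events when x = 1), yet the penalized estimate is strongly positive: the penalty pulls the event probability for x = 1 toward 0.5/6, far above the baseline rate of roughly 0.001.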


2020 ◽  
Author(s):  
Eric Arnaud Diendéré ◽  
Apoline Kognimisson Sondo/Ouédraogo ◽  
Ismael Diallo ◽  
Absetou Ky/Ba ◽  
Toussaint Rouamba ◽  
...  

Background: The factors associated with the severity of dengue remain controversial, particularly the relationship between severe dengue and secondary dengue. More importantly, the severity of dengue infection remains poorly studied in Africa. The objective of this study was to compare severity signs between patients with primary and secondary dengue infection during the 2016 dengue outbreak in Burkina Faso. Methods: This was a cross-sectional study based on a retrospective examination of the medical records of patients managed for dengue fever in Ouagadougou from 1 January 2015 to 31 December 2017. All health facilities with the capacity to perform dengue diagnosis in Ouagadougou were considered in the survey. Primary dengue was defined as the presence of AgNS1 and/or IgM, and secondary dengue as the presence of IgG associated with one of these two markers. Patients with only IgG were excluded. Univariate and multivariable analyses were performed using logistic regression with dengue infection (primary or secondary dengue) as the binary dependent variable. The statistical significance level was set at 0.05. Results: Of the 811 patients managed for dengue fever during the study period, 418 (51.5%) were male. Thirty-five patients (4.3%) had primary dengue infection (AgNS1+ and/or IgM+ with negative IgG) and 776 patients (95.7%) had secondary dengue infection. A total of 245 patients (30.2%) experienced severe signs. Renal failure (13.1%) was the main sign of severity, followed by severe bleeding (10.6%). In univariate analysis, severe bleeding was associated with primary dengue infection (OR = 2.65, 95% CI: 1.16-6.03, p = 0.01). Twenty-four deaths (9.8%) were reported during the period. Conclusion: Signs of severity can occur during primary dengue fever. This study highlights the need to conduct more studies on the severity factors of dengue fever.
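An odds ratio with a 95% confidence interval, as reported in the Results, can be computed from a 2x2 table with the standard Wald formula on the log scale. The sketch below uses hypothetical counts, not the study's data, and the function name is an illustration only; in a univariate logistic regression with a single binary predictor, the exponentiated coefficient matches this table-based odds ratio.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z = 1.96 corresponds to a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_value, ci_lower, ci_upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(round(or_value, 2), round(ci_lower, 2), round(ci_upper, 2))
```

An interval that excludes 1, as in the abstract's 1.16 to 6.03, indicates an association at the 0.05 level; the hypothetical table here gives an interval that includes 1.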


2020 ◽  
Vol 28 (4) ◽  
Author(s):  
Hezlin Aryani Abd Rahman ◽  
Yap Bee Wah ◽  
Ong Seng Huat

Logistic regression is often used for the classification of a binary categorical dependent variable using various types of covariates (continuous or categorical). Imbalanced data lead to biased parameter estimates and biased classification performance of the logistic regression model. Imbalanced data occur when the number of cases in one category of the binary dependent variable is very much smaller than in the other category. This simulation study investigates the effect of imbalanced data, measured by the imbalance ratio, on the parameter estimate of the binary logistic regression with a categorical covariate. Datasets were simulated with controlled imbalance ratios (IR), from 1% to 50%, and for various sample sizes. The simulated datasets were then modeled using binary logistic regression. The bias in the estimates was measured using Mean Square Error (MSE). The simulation results provided evidence that the effect of the imbalance ratio on the parameter estimate of the covariate decreased as sample size increased. The bias of the estimates depended on sample size: for sample sizes of 100, 500, 1000 - 2000, and 2500 - 3500, the estimates were biased for IR below 30%, 10%, 5%, and 2%, respectively. Results also showed that parameter estimates were all biased at IR 1% for all sample sizes. An application using a real dataset supported the simulation results.


2020 ◽  
Vol 231 (11) ◽  
Author(s):  
André Cardoso Mühlig ◽  
Otto Klemm ◽  
Fábio Luiz Teixeira Gonçalves

This study investigates the long-term development of fog occurrences in the Metropolitan Area of São Paulo (MASP). Specifically, it analyzes the roles of meteorological and air quality parameters as potential drivers for fog formation. A dataset reaching back to the year 1933 shows that the overall trends of the annual fog occurrences (AFO) coincide with those of the annual mean temperature. Air quality data have been available since 1998, allowing us to perform a statistical analysis of the contributions of meteorology and air quality to AFO for the period from 1998 to 2018. The logistic regression model shows that the binary dependent variable (daily fog occurrence, FO) is explained by its independent predictors PM10, relative humidity (rH), and daily minimum temperature (Tmin), in that order. FO was not found to be significantly influenced by atmospheric pressure (aP) and nitrogen oxides (NOx). While the influence of SO2 was minor and associated with less confidence, it was negative. Potential causes for these surprising results are discussed. We conclude that the parameters PM10, rH, and Tmin are significant drivers of fog formation in the MASP, whereby the total explanatory power of the drivers for the dichotomous variable FO is 16%.

