Binary Variables
Recently Published Documents

TOTAL DOCUMENTS: 248 (five years: 48)
H-INDEX: 24 (five years: 1)

2022 · Vol 12
Author(s): Yi Sheng, Qing Yi, Miguel-Ángel Gómez-Ruano, Peijie Chen

The purpose of this study was to identify the effects of the technical and context-related variables of last strokes in rallies on the point outcomes of both men’s and women’s players in elite singles badminton matches. A total of 100 matches from the 2018 and 2019 seasons were analyzed, and data from 4,080 men’s rallies and 4,339 women’s rallies were collected. The technical variables (strokes per rally, forehand strokes, overhead strokes, and defensive action) and the context-related variables (game status, result against serve, importance of rally, and importance of set) were entered as predictor variables in probit regression models. The binary variables “winner or not” and “error or not” served as the response variables. The results showed that defensive actions had the greatest impact on the winners and errors of both the men’s and women’s singles players, and that forehand and overhead strokes were negatively associated with the winners and errors of the women’s singles players and the winners of the men’s singles players. Strokes per rally had no significant effect on the winners and errors of the men’s singles players, but had significant effects for the women’s singles players. The context-related variables appeared to have positive effects on the winners and negative effects on the errors of both sexes. These findings can provide important insights for coaches and players seeking to evaluate the performance of last strokes in rallies and to improve training interventions, match tactics, and strategies.
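For readers unfamiliar with the modeling setup, the following is a minimal sketch of a probit regression with a binary response, in the spirit of the analysis described above. The column names and the simulated data are hypothetical, not the study's data.

```python
# Minimal probit sketch with a binary response (simulated, illustrative only).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "strokes_per_rally": rng.integers(1, 30, n),
    "forehand": rng.integers(0, 2, n),    # last stroke played forehand (0/1)
    "overhead": rng.integers(0, 2, n),    # last stroke played overhead (0/1)
    "defensive": rng.integers(0, 2, n),   # last stroke was defensive (0/1)
})
# Binary response "winner or not", simulated from a latent-variable model
latent = -1.0 + 0.8 * df["defensive"] - 0.3 * df["forehand"] + rng.normal(0, 1, n)
df["winner"] = (latent > 0).astype(int)

X = sm.add_constant(df[["strokes_per_rally", "forehand", "overhead", "defensive"]])
model = sm.Probit(df["winner"], X).fit(disp=False)
print(model.summary())
```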


Author(s): John Alasdair Warwicker, Steffen Rebennack

The problem of fitting continuous piecewise linear (PWL) functions to discrete data has applications in pattern recognition and engineering, amongst many other fields. To find an optimal PWL function, the positioning of the breakpoints connecting adjacent linear segments must not be fixed in advance but should be allowed to vary freely. Although the univariate PWL fitting problem has often been approached from a global optimisation perspective, two mixed-integer linear programming approaches have recently been presented that solve for optimal PWL functions. In this paper, we compare the two approaches: the first was presented by Rebennack and Krasko [Rebennack S, Krasko V (2020) Piecewise linear function fitting via mixed-integer linear programming. INFORMS J. Comput. 32(2):507–530] and the second by Kong and Maravelias [Kong L, Maravelias CT (2020) On the derivation of continuous piecewise linear approximating functions. INFORMS J. Comput. 32(3):531–546]. Both formulations are similar in that they use binary variables and logical implications modelled by big-M constructs to ensure the continuity of the PWL function, yet the former model uses fewer binary variables. We present experimental results comparing the time taken to find optimal PWL functions with differing numbers of breakpoints across 10 data sets for three different objective functions. Although neither of the two formulations is superior on all data sets, the presented computational results suggest that the formulation presented by Rebennack and Krasko is faster. This might be explained by the fact that it contains fewer complicating binary variables and sparser constraints. Summary of Contribution: This paper presents a comparison of the mixed-integer linear programming models presented in two recent studies published in the INFORMS Journal on Computing. Because of the similarity of the formulations of the two models, it is not clear which one is preferable. We present a detailed comparison of the two formulations, including a series of comparative experimental results across 10 data sets that appeared across both papers. We hope that our results will allow readers to take an objective view as to which implementation they should use.
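As an illustration of the big-M construct both formulations rely on, consider a generic segment-assignment linearization (this is illustrative, not either paper's exact model): with a binary variable z_ij = 1 assigning sample (x_i, y_i) to segment j with slope a_j and intercept c_j, the absolute residual e_i is enforced only on the active segment.

```latex
% Generic big-M linearization of "z_ij = 1 implies e_i >= |a_j x_i + c_j - y_i|";
% M must bound the largest possible residual so that z_ij = 0 leaves the
% constraints slack.
\begin{align}
  e_i &\ge (a_j x_i + c_j) - y_i - M(1 - z_{ij}) && \forall i, j \\
  e_i &\ge y_i - (a_j x_i + c_j) - M(1 - z_{ij}) && \forall i, j \\
  \sum_j z_{ij} &= 1 \quad \forall i, \qquad z_{ij} \in \{0, 1\}
\end{align}
```

Tighter values of M generally yield stronger linear relaxations, which is one reason formulations with sparser big-M constraints can solve faster.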


2021 · Vol 11 (1)
Author(s): Yasuharu Okamoto

Abstract By using the Ising model formulation for combinatorial optimization with 0–1 binary variables, we investigated the extent to which partisan gerrymandering is possible starting from a random but even distribution of supporters. Assuming that an electoral district consists of square subareas and that each subarea shares at least one edge with another subarea in the district, it was possible to find the most tilted assignment of seats in most cases. However, when the supporters' distribution included many enclaves, the search for the maximally tilted assignment usually failed. We also discuss how the proposed algorithm can be applied to other fields, such as the redistribution of delivery destinations.
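The 0–1 formulation mentioned above maps directly onto Ising spins. Below is a minimal, hypothetical sketch of that mapping and of evaluating a QUBO energy; the paper's actual objective, which encodes district contiguity and seat counts, is more involved.

```python
# Sketch of the 0-1 <-> Ising spin correspondence behind such formulations.
import numpy as np

rng = np.random.default_rng(1)
n = 6
Q = rng.normal(size=(n, n))
Q = (Q + Q.T) / 2                 # symmetric QUBO coupling matrix (toy values)

def qubo_energy(x, Q):
    """Energy of a 0-1 assignment x under E(x) = x^T Q x."""
    return x @ Q @ x

def to_spins(x):
    """Map binary variables x in {0,1} to Ising spins s in {-1,+1}: s = 2x - 1."""
    return 2 * x - 1

x = rng.integers(0, 2, n)
print("binary:", x, "spins:", to_spins(x), "energy:", qubo_energy(x, Q))
```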


2021 · Vol 18 (1)
Author(s): Sonja Hartnack, Malgorzata Roos

Abstract

Background: One of the emerging themes in epidemiology is the use of interval estimates. Currently, three interval estimates are at a researcher's disposal: confidence (CI), prediction (PI), and tolerance (TI) intervals, all accessible within the open-source R framework. These three types of statistical intervals serve different purposes. Confidence intervals are designed to describe a parameter with some uncertainty due to sampling error. Prediction intervals aim to predict future observation(s), including some uncertainty present in the actual and future samples. Tolerance intervals are constructed to capture a specified proportion of a population with a defined confidence. It is well known that interval estimates support a greater knowledge gain than point estimates. Thus, a good understanding and use of CI, PI, and TI underlie good statistical practice. While CIs are taught in introductory statistics classes, PIs and TIs are less familiar.

Results: In this paper, we provide a concise tutorial on two-sided CI, PI, and TI for binary variables. This hands-on tutorial is based on our teaching materials. It contains an overview of the meaning and applicability of the intervals from both a classical and a Bayesian perspective. Based on a worked example from veterinary medicine, we provide guidance and code that can be directly applied in R.

Conclusions: This tutorial can be used by others for teaching, either in a class or for self-instruction of students and senior researchers.
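The tutorial's worked code is in R; as a rough Python analogue (the Wilson method and the uniform prior below are assumptions on my part, not necessarily the tutorial's choices), a CI and a Bayesian PI for binary data might look like this:

```python
# CI and Bayesian PI for a binomial proportion (illustrative method choices).
from scipy import stats
from statsmodels.stats.proportion import proportion_confint

k, n = 30, 100          # observed successes out of n binary trials

# Confidence interval for the underlying proportion (Wilson method).
ci_low, ci_high = proportion_confint(k, n, alpha=0.05, method="wilson")

# Bayesian prediction interval for successes in a future sample of m trials:
# with a uniform Beta(1, 1) prior the posterior is Beta(k + 1, n - k + 1),
# and the posterior predictive distribution is beta-binomial.
m = 50
pred = stats.betabinom(m, k + 1, n - k + 1)
pi_low, pi_high = pred.ppf(0.025), pred.ppf(0.975)

print(f"95% CI for p: ({ci_low:.3f}, {ci_high:.3f})")
print(f"95% PI for successes in {m} future trials: ({pi_low:.0f}, {pi_high:.0f})")
```

A tolerance interval would additionally fix the proportion of the population to be captured, which requires a different construction and is omitted here.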


Author(s): Judith J. M. Rijnhart, Matthew J. Valente, Heather L. Smyth, David P. MacKinnon

Abstract Mediation analysis is an important statistical method in prevention research, as it can be used to determine effective intervention components. Traditional mediation analysis defines direct and indirect effects in terms of linear regression coefficients. It is unclear how these traditional effects are estimated in settings with binary variables. An important recent methodological advancement in the mediation analysis literature is the development of the causal mediation analysis framework. Causal mediation analysis defines causal effects as the difference between two potential outcomes. These definitions can be applied to any mediation model to estimate natural direct and indirect effects, including models with binary variables and an exposure–mediator interaction. This paper aims to clarify the similarities and differences between the causal and traditional effect estimates for mediation models with a binary mediator and a binary outcome. Causal and traditional mediation analyses were applied to an empirical example to demonstrate these similarities and differences. Causal and traditional mediation analysis provided similar controlled direct effect estimates, but different estimates of the natural direct effects, natural indirect effects, and total effect. Traditional mediation analysis methods do not generalize well to mediation models with binary variables, while the natural effect definitions can be applied to any mediation model. Causal mediation analysis is therefore the preferred method for the analysis of mediation models with binary variables.
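For reference, the natural effect definitions invoked above are standard in the potential-outcomes literature. With a binary exposure X, mediator M, and outcome Y, and with Y(x, m) and M(x) denoting potential outcomes, the total effect decomposes as follows:

```latex
% Natural direct and indirect effects in potential-outcome notation.
\begin{align}
  \mathrm{NDE} &= E\left[\,Y\bigl(1,\,M(0)\bigr) - Y\bigl(0,\,M(0)\bigr)\,\right] \\
  \mathrm{NIE} &= E\left[\,Y\bigl(1,\,M(1)\bigr) - Y\bigl(1,\,M(0)\bigr)\,\right] \\
  \mathrm{TE}  &= \mathrm{NDE} + \mathrm{NIE}
\end{align}
```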


2021 · Vol 21 (1)
Author(s): Yingxin Liu, Shiyu Zhou, Hongxia Wei, Shengli An

Abstract

Background: As a popular method in machine learning, the forests approach is an attractive alternative to the Cox model. Random survival forests (RSF) is the most widely used survival forests method, but it has drawbacks such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) reduce this selection bias via a two-step split procedure that implements hypothesis tests and separates variable selection from splitting, but at a high computational cost. Random forests with maximally selected rank statistics (MSR-RF), proposed recently, appear to be a substantial improvement on both RSF and CIF.

Methods: In this paper we used a simulation study and a real data application to compare the prediction and variable selection performances of three survival forests methods: RSF, CIF, and MSR-RF. To evaluate variable selection performance, we combined all simulations and calculated how often the correct variables ranked top in the variable importance measures, where a higher frequency means better selection ability. We used the Integrated Brier Score (IBS) and the c-index to measure the prediction accuracy of all three methods; the smaller the IBS value, the better the prediction.

Results: Simulations show that the three forests methods differ only slightly in prediction performance. MSR-RF and RSF may perform better than CIF when the datasets contain only continuous or only binary variables. For variable selection performance, when the datasets contain multiple categorical variables, the selection frequency of RSF is lowest in most cases; MSR-RF and CIF have higher selection rates, and CIF performs well especially in the presence of an interaction term. The degree of correlation among the variables has little effect on the selection frequency, indicating that all three forest methods can handle correlated data. When the datasets contain only continuous variables, MSR-RF performs better; when they contain only binary variables, RSF and MSR-RF have the advantage over CIF. As the variable dimension increases, MSR-RF and RSF appear more robust than CIF.

Conclusions: All three methods show advantages in prediction and variable selection performance under different situations. The recently proposed MSR-RF methodology has practical value and is well worth popularizing. It is important to select the appropriate method in practice according to the research aim and the nature of the covariates.
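As a minimal illustration of the RSF fitting-and-evaluation loop, here is a sketch assuming the Python scikit-survival package and toy simulated data (the comparisons above also cover CIF and MSR-RF, which are typically run via R packages):

```python
# Fit a random survival forest on toy data and compute the c-index.
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored

rng = np.random.default_rng(42)
n = 300
X = rng.normal(size=(n, 5))
time = np.exp(1.0 + 0.5 * X[:, 0] + rng.normal(0, 0.5, n))  # toy survival times
event = rng.random(n) < 0.7                                 # ~70% observed events
y = np.array(list(zip(event, time)), dtype=[("event", bool), ("time", float)])

rsf = RandomSurvivalForest(n_estimators=200, random_state=0).fit(X, y)
risk = rsf.predict(X)                       # higher score = higher predicted risk
cindex = concordance_index_censored(y["event"], y["time"], risk)[0]
print(f"c-index (in-sample): {cindex:.3f}")
```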


2021 · Vol 17 (8) · pp. e1009275
Author(s): Xiaochuan Zhao, Germán Plata, Purushottam D. Dixit

In modern computational biology, there is great interest in building probabilistic models to describe collections of a large number of co-varying binary variables. However, current approaches to building generative models rely on the modeler's identification of constraints and are computationally expensive to infer when the number of variables is large (N ~ 100). Here, we address both of these issues with the Super-statistical Generative Model for binary Data (SiGMoiD). SiGMoiD is a maximum entropy-based framework in which we imagine the data as arising from a super-statistical system: individual binary variables in a given sample are coupled to the same 'bath', whose intensive variables vary from sample to sample. Importantly, unlike standard maximum entropy approaches, where the modeler specifies the constraints, the SiGMoiD algorithm infers them directly from the data. Owing to this optimal choice of constraints, SiGMoiD can model collections of a very large number (N > 1000) of binary variables. Finally, SiGMoiD offers a reduced dimensional description of the data, allowing us to identify clusters of similar data points as well as of binary variables. We illustrate the versatility of SiGMoiD using several datasets spanning several time and length scales.
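For context, a classical pairwise maximum entropy model fixes the first and second moments of the data as its constraints; the sketch below computes those empirical moments for a binary data matrix. SiGMoiD, by contrast, infers its constraints from the data rather than fixing them a priori, so this only illustrates the classical setup it improves upon.

```python
# Empirical moments matched by a standard pairwise maximum entropy model.
import numpy as np

rng = np.random.default_rng(7)
samples, n_vars = 1000, 20
X = (rng.random((samples, n_vars)) < 0.3).astype(float)  # toy binary data matrix

means = X.mean(axis=0)          # first moments  <x_i>
moments2 = (X.T @ X) / samples  # second moments <x_i x_j>
print("mean activities:", means[:5])
print("pairwise moment <x_0 x_1>:", moments2[0, 1])
```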


Author(s): Alessandro Maria Selvitella, Julio J. Valdés

In this paper, we discuss the problem of estimating the minimum error reachable by a regression model given a dataset, prior to learning. More specifically, we extend the Gamma Test estimates of the variance of the noise from the continuous case to the binary case. We give some heuristics for further possible extensions of the theory in the continuous case with the [Formula: see text]-norm and conclude with some applications and simulations. From the point of view of machine learning, the result is relevant because it gives conditions under which there is no need to learn the model in order to predict the best possible performance.
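As background, the continuous Gamma Test that the paper extends estimates the noise variance from nearest-neighbour statistics: for each neighbour order k it computes the mean squared input gap delta(k) and half the mean squared output gap gamma(k), then reads the noise variance off the intercept of a straight-line fit of gamma on delta. A minimal sketch under these assumptions:

```python
# Continuous Gamma Test sketch: the intercept estimates Var(noise).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(-3, 3, size=(n, 2))
y = np.sin(x[:, 0]) + 0.5 * x[:, 1] + rng.normal(0, 0.3, n)  # true noise var 0.09

K = 10
nn = NearestNeighbors(n_neighbors=K + 1).fit(x)
dist, idx = nn.kneighbors(x)          # column 0 is each point itself

delta = np.array([(dist[:, k] ** 2).mean() for k in range(1, K + 1)])
gamma = np.array([(0.5 * (y[idx[:, k]] - y) ** 2).mean() for k in range(1, K + 1)])

# Straight-line fit gamma ~ A * delta + Gamma; Gamma estimates Var(noise).
A, Gamma = np.polyfit(delta, gamma, 1)
print(f"estimated noise variance: {Gamma:.4f}")  # should be near 0.09
```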


2021 · pp. 1-36
Author(s): Nicola Bulso, Yasser Roudi

We study the type of distributions that restricted Boltzmann machines (RBMs) with different activation functions can express by investigating the effect of the activation function of the hidden nodes on the marginal distribution they impose on observed binary nodes. We report an exact expression for these marginals in the form of a model of interacting binary variables, with the explicit form of the interactions depending on the hidden node activation function. We study the properties of these interactions in detail and evaluate how the accuracy with which the RBM approximates distributions over binary variables depends on the hidden node activation function and on the number of hidden nodes. When the inferred RBM parameters are weak, an intuitive pattern is found for the expression of the interaction terms, which substantially reduces the differences across activation functions. We show that the weak parameter approximation is a good approximation for different RBMs trained on the MNIST data set. Interestingly, in these cases, the mapping reveals that the inferred models are essentially low-order interaction models.
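For the familiar special case of binary hidden units, the marginal over the visible units can be written in closed form; expanding the logarithm of the product below in powers of the couplings generates the effective interactions among the v_i (a standard identity, shown here for orientation):

```latex
% Marginal over visible binary units v for an RBM with m binary hidden
% units h_j, weights W, and biases a (visible), b (hidden).
\begin{align}
  p(v) &\propto \sum_{h \in \{0,1\}^{m}}
      \exp\Bigl(\sum_i a_i v_i
        + \sum_j h_j \bigl(b_j + \textstyle\sum_i W_{ij} v_i\bigr)\Bigr) \\
       &= \exp\Bigl(\sum_i a_i v_i\Bigr)
      \prod_j \Bigl(1 + e^{\,b_j + \sum_i W_{ij} v_i}\Bigr)
\end{align}
```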

