ordinal response
Recently Published Documents


Total documents: 131 (five years: 28)
H-index: 18 (five years: 1)

2021
Author(s): Young-Min Kwon

This thesis illustrates statistical methodology for identifying the effects of explanatory variables on response variables of an ordinal nature. The methodology is applied to a Listening Strategy dataset collected by the Language Learner Strategy Team at the National Institute of Education in Singapore. In this dataset, eight strategies were formed from 38 questions on the basis of linguistic theory. The core objective of this thesis is to validate whether the 38 questions were aggregated appropriately. We use the proportional odds model, the most popular model for ordinal responses, together with the generalised estimating equations (GEE) method to analyse repeated measurements. Although there are several ways to analyse repeated categorical responses, this thesis demonstrates only the marginal approach using the GEE method. By fitting proportional odds models, we evaluate whether the effects of students’ English Language test results on the questions are at the same level within each strategy. Results show that, within each of the Self-initiation, Planning, Monitoring and Evaluating, Prediction, and Utilisation strategies, the English Language test result effects are similar across questions. In contrast, within the Perceptual processing, Inferencing, and Socio-affective strategies, the effects differ significantly across questions. We also use a simulation study to show that when the ordinal response is treated as continuous, ordinary least squares regression can give misleading results.
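
To make the proportional odds model concrete, here is a minimal Python sketch of how it turns a single covariate into probabilities for ordered answer categories. It assumes the common cumulative-logit parameterisation logit P(Y ≤ j | x) = θ_j − βx (the sign convention differs between software packages); the function names, thresholds and slope are hypothetical and are not taken from the thesis, and the GEE fitting step is not shown.

```python
import math


def logistic(z: float) -> float:
    """Numerically stable standard logistic CDF."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def proportional_odds_probs(x: float, thresholds: list[float], beta: float) -> list[float]:
    """Return P(Y = j | x) for j = 0..len(thresholds) under the cumulative-logit
    (proportional odds) model logit P(Y <= j | x) = thresholds[j] - beta * x.

    A single slope beta is shared by every cumulative split; that shared slope
    is the proportional odds assumption.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(Y <= j | x); the last category closes at 1.
    cumulative = [logistic(theta - beta * x) for theta in thresholds] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [cumulative[0]] + [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


# Hypothetical 5-point item: four thresholds, one covariate (e.g. a test score).
for score in (0.0, 1.0):
    probs = proportional_odds_probs(score, [-2.0, -0.5, 0.5, 2.0], beta=0.8)
    print(score, [round(p, 3) for p in probs])
```

Because β is shared, the cumulative log-odds at a score of 1.0 are lower than at 0.0 by exactly β = 0.8 at every split, so higher scores shift probability toward higher categories. Reading the thesis's "same level within each strategy" question as "do the questions share such a slope?" is our interpretation.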


2021
Vol 22 (1)
Author(s): Yiran Zhang, Kellie J. Archer

Abstract Background: Acute myeloid leukemia (AML) is a heterogeneous cancer of the blood, though specific recurring cytogenetic abnormalities in AML are strongly associated with attaining complete response after induction chemotherapy, remission duration, and survival. Therefore, recurring cytogenetic abnormalities have been used to segregate patients into favorable, intermediate, and adverse prognostic risk groups. However, it is unclear how gene expression is associated with these prognostic risk groups. We postulate that genes whose expression is monotonically associated with these prognostic risk groups may yield important insights into leukemogenesis. Therefore, in this paper we propose penalized Bayesian ordinal response models to predict prognostic risk group using gene expression data. To model our high-dimensional ordinal response (prognostic risk group), we consider a double exponential prior, a spike-and-slab normal prior, a spike-and-slab double exponential prior, and a regression-based approach with variable inclusion indicators, and we identify genes through hypothesis tests using Bayes factors. Results: Gene expression was ascertained using Affymetrix HG-U133Plus2.0 GeneChips for 97 favorable, 259 intermediate, and 97 adverse risk AML patients. When we applied our penalized Bayesian ordinal response models, the genes identified for model inclusion were consistent among the four models. Additionally, the genes included in the models were biologically plausible, as most have previously been associated with either AML or other types of cancer. Conclusion: These findings demonstrate that our proposed penalized Bayesian ordinal response models are useful for variable selection in high-dimensional genomic data and have the potential to identify genes meaningfully associated with an ordinal phenotype.
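
As a rough illustration of what a double exponential prior adds to an ordinal model, the Python sketch below evaluates an unnormalized log posterior: a cumulative-logit likelihood for ordered risk groups (coded 0, 1, 2 for favorable, intermediate, adverse) plus an independent Laplace prior on each gene coefficient. This is not the authors' code: the cumulative-logit link, the flat prior on the thresholds, the fixed rate `lam`, the toy data and all names are assumptions, and the spike-and-slab variants and Bayes factor tests are not shown. Maximizing this function over the coefficients is equivalent to lasso-penalized ordinal regression, which is the sense in which such priors are "penalized".

```python
import math
from typing import Sequence


def stable_logistic(z: float) -> float:
    """Logistic CDF that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_posterior(beta: Sequence[float], thresholds: Sequence[float],
                  X: Sequence[Sequence[float]], y: Sequence[int], lam: float) -> float:
    """Unnormalized log posterior of a cumulative-logit ordinal model.

    y[i] is a category index in 0..len(thresholds); X[i] holds the expression
    values of sample i. Each coefficient gets an independent double exponential
    (Laplace) prior with density (lam / 2) * exp(-lam * |b|); the thresholds get
    a flat prior (an assumption made here for brevity).
    """
    n_cat = len(thresholds) + 1
    log_lik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # P(Y = yi) = P(Y <= yi) - P(Y <= yi - 1), with the edges fixed at 1 and 0.
        upper = 1.0 if yi == n_cat - 1 else stable_logistic(thresholds[yi] - eta)
        lower = 0.0 if yi == 0 else stable_logistic(thresholds[yi - 1] - eta)
        log_lik += math.log(upper - lower)
    log_prior = sum(math.log(lam / 2.0) - lam * abs(b) for b in beta)
    return log_lik + log_prior


# Hypothetical toy data: 2 genes, 4 samples, risk groups 0 < 1 < 2.
X_toy = [[0.2, 1.0], [0.5, -0.3], [1.4, 0.1], [2.0, 0.8]]
y_toy = [0, 0, 1, 2]
print(log_posterior([1.0, 0.0], thresholds=[0.5, 2.0], X=X_toy, y=y_toy, lam=1.0))
```

A sampler or optimizer would explore `beta` (and the thresholds) against this target; a larger `lam` pulls more coefficients toward zero.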


PLoS ONE
2021
Vol 16 (7)
pp. e0240948
Author(s): Zhanyou Xu, Andreomar Kurek, Steven B. Cannon, William D. Beavis

In soybean variety development and genetic improvement projects, iron deficiency chlorosis (IDC) is visually assessed as an ordinal response variable. Linear Mixed Models for Genomic Prediction (GP) have been developed, compared, and used to select continuous plant traits such as yield, height, and maturity, but they can be inappropriate for ordinal traits. Generalized Linear Mixed Models have been developed for GP of ordinal response variables. However, neither approach addresses the most important questions for cultivar development and genetic improvement: How frequently are the ‘wrong’ genotypes retained, and how often are the ‘correct’ genotypes discarded? The research objective reported herein was to compare outcomes from four data modeling and six algorithmic modeling GP methods applied to IDC, using decision metrics appropriate for variety development and genetic improvement projects. Appropriate metrics for decision making consist of specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. Data modeling methods for GP included ridge regression, logistic regression, penalized logistic regression, and Bayesian generalized linear regression. Algorithmic modeling methods included Random Forest, Gradient Boosting Machine, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, and Artificial Neural Network. We found that a Support Vector Machine model provided the most specific decisions, correctly discarding IDC-susceptible genotypes, while a Random Forest model resulted in the best decisions for retaining IDC-tolerant genotypes, as well as the best outcomes when all decision metrics were considered. Overall, predictions from algorithmic modeling methods resulted in better decisions than those from data modeling methods applied to soybean IDC.
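
The decision metrics listed in this abstract can all be computed from retain/discard calls. The Python sketch below assumes the ordinal IDC score has already been dichotomized into tolerant (1, retain) versus susceptible (0, discard), so that specificity measures correctly discarding susceptible genotypes, and it reads "decision accuracy" as ordinary classification accuracy. Both choices, the toy labels and all function names are assumptions rather than the paper's definitions. AUC is computed through its rank interpretation: the probability that a randomly chosen tolerant genotype receives a higher score than a randomly chosen susceptible one.

```python
def _ratio(num: float, den: float) -> float:
    """Safe division: NaN when a metric is undefined (e.g. no positives)."""
    return num / den if den else float("nan")


def decision_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Confusion-matrix metrics with 1 = tolerant/retain, 0 = susceptible/discard."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # tolerant, retained
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, discarded
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, wrongly retained
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # tolerant, wrongly discarded
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


def auc(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Hypothetical calls for eight genotypes.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
calls = [1, 0, 1, 0, 0, 1, 1, 1]
print(decision_metrics(truth, calls))
# {'sensitivity': 0.75, 'specificity': 0.5, 'precision': 0.6, 'accuracy': 0.625}
```

The pairwise AUC loop is quadratic in the number of genotypes; for large panels a sort-based rank computation gives the same value faster.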


2021
Author(s): Yiran Zhang, Kellie J. Archer

Abstract Background: Acute myeloid leukemia (AML) is a heterogeneous cancer of the blood, though specific recurring cytogenetic abnormalities in AML are strongly associated with attaining complete response after induction chemotherapy, remission duration, and survival. Therefore, recurring cytogenetic abnormalities have been used to segregate patients into favorable, intermediate, and adverse prognostic risk groups. However, it is unclear how gene expression is associated with these prognostic risk groups. We postulate that genes whose expression is monotonically associated with these prognostic risk groups may yield important insights into leukemogenesis. Therefore, in this paper we propose penalized Bayesian ordinal response models to predict prognostic risk group using gene expression data. To model our high-dimensional ordinal response (prognostic risk group), we consider a double exponential prior, a spike-and-slab normal prior, a spike-and-slab double exponential prior, and a regression-based approach with variable inclusion indicators, and we identify genes through hypothesis tests using Bayes factors. Results: Gene expression was ascertained using Affymetrix HG-U133Plus2.0 GeneChips for 97 favorable, 259 intermediate, and 97 adverse risk AML patients. When we applied our penalized Bayesian ordinal response models, the genes identified for model inclusion were consistent among the four models. Additionally, the genes included in the models were biologically plausible, as most have previously been associated with either AML or other types of cancer. Conclusion: These findings demonstrate that our proposed penalized Bayesian ordinal response models are useful for variable selection in high-dimensional genomic data and have the potential to identify genes meaningfully associated with an ordinal phenotype.


2021
Vol 11 (10)
pp. 4572
Author(s): Lenka Červená, Pavel Kříž, Jan Kohout, Martin Vejvar, Ludmila Verešpejová, ...

This paper focuses on the statistical analysis of mimetic muscle rehabilitation in patients with facial paresis caused by head and neck surgery. Our work deals with the classification problem of evaluating mimetic muscle rehabilitation observed by a Kinect stereo-vision camera. After certain brain surgeries, patients are often affected by facial palsy, and rehabilitation to renew mimetic muscle innervation takes several months. It is important to be able to observe the rehabilitation process objectively. The most commonly used scale, the House–Brackmann (HB) scale, is based on the clinician’s subjective opinion. This paper compares different supervised learning classification methods that should be independent of the clinician’s opinion. We compare a parametric model (based on logistic regression), a non-parametric model (based on random forests), and neural networks. The classification problem we studied combines a limited dataset (only 122 measurements of 93 patients) of complex observations (each measurement consists of a collection of time curves) with an ordinal response variable (four HB grades are considered). To balance the class frequencies in our dataset, we reclassified samples from HB4 to HB3 and from HB5 to HB6, so that only four HB grades are used by the classification algorithms. The parametric statistical model was found to be the most suitable thanks to its stability, tractability, and reasonable performance in terms of both accuracy and precision.
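
The regrouping step in this abstract is a many-to-one relabelling. A minimal Python sketch, assuming grades are stored as integers 1 to 6 for HB I to HB VI (the storage format, the sample grades and the function name are hypothetical); it shows that merging HB4 into HB3 and HB5 into HB6 leaves four grades, and how the class counts change.

```python
from collections import Counter

# Merge rule from the abstract: HB4 -> HB3 and HB5 -> HB6; other grades unchanged.
HB_MERGE = {4: 3, 5: 6}


def regroup_hb(grades: list[int]) -> list[int]:
    """Map House-Brackmann grades (1..6) onto the four classes used for training."""
    for g in grades:
        if not 1 <= g <= 6:
            raise ValueError(f"invalid HB grade: {g}")
    return [HB_MERGE.get(g, g) for g in grades]


# Hypothetical grade list, only to show the effect on class counts.
raw = [1, 1, 2, 3, 4, 4, 5, 6, 2, 1]
merged = regroup_hb(raw)
print(sorted(Counter(raw).items()))
print(sorted(Counter(merged).items()))  # [(1, 3), (2, 2), (3, 3), (6, 2)]
```

Since the merged labels stay ordered (1 < 2 < 3 < 6), they can still be treated as an ordinal response after regrouping.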


2021
Vol 0 (0)
Author(s): Erin M. Schliep, Toryn L. J. Schafer, Matthew Hawkey

Abstract Subjective wellness data can provide important information on the well-being of athletes and can be used to maximize player performance and to detect and prevent injury. Wellness data, which are often ordinal and multivariate, include metrics relating to the physical, mental, and emotional status of the athlete. Training and recovery can have significant short- and long-term effects on athlete wellness, and these effects can vary across individuals. We develop a joint multivariate latent factor model for ordinal response data to investigate the effects of training and recovery on athlete wellness. We use a latent factor distributed lag model to capture the cumulative effects of training and recovery through time. Current efforts using subjective wellness data have averaged over these metrics to create a univariate summary of wellness; however, this approach can mask important information in the data. Our multivariate model leverages each ordinal variable and can be used to identify the relative importance of each in monitoring athlete wellness. The model is applied to daily wellness, training, and recovery data from professional referees, collected across two Major League Soccer seasons.
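
A distributed lag term summarizes past exposures as a weighted sum, z_t = Σ_{ℓ=0}^{L} w_ℓ x_{t−ℓ}, so that today's wellness can depend on several previous days of training load. The Python sketch below shows only this generic building block, with hypothetical geometrically decaying weights and made-up loads; in the paper the lag structure sits inside a latent factor model for multivariate ordinal responses, which is not reproduced here.

```python
def geometric_lag_weights(decay: float, max_lag: int) -> list[float]:
    """Hypothetical weights w_l = decay**l for lags l = 0..max_lag."""
    return [decay ** lag for lag in range(max_lag + 1)]


def distributed_lag(x: list[float], weights: list[float]) -> list[float]:
    """z[t] = sum_l weights[l] * x[t - l], skipping lags before the first day."""
    return [
        sum(w * x[t - lag] for lag, w in enumerate(weights) if t - lag >= 0)
        for t in range(len(x))
    ]


# Hypothetical daily training loads (arbitrary units) for one referee.
loads = [1.0, 0.0, 0.0, 2.0]
print(distributed_lag(loads, geometric_lag_weights(0.5, 2)))  # [1.0, 0.5, 0.25, 2.0]
```

The single load on day 0 keeps contributing, with shrinking weight, on days 1 and 2; that carry-over is what "cumulative effects through time" refers to.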


Author(s): Ioannis Ntzoufras, Vasilis Palaskas, Sotiris Drikos

Abstract We study and develop Bayesian models for the analysis of volleyball match outcomes as recorded by the set-difference. Because the outcome variable (set-difference) takes discrete values from $-3$ to $3$, standard models based on the usual Poisson or binomial assumptions used for other sports such as football/soccer cannot be applied. Hence, the first and foremost challenge was to build models appropriate for the set-difference of each volleyball match. Here we consider two major approaches: (a) an ordered multinomial logistic regression model and (b) a model based on a truncated version of the Skellam distribution. For the first model, we treat the set-difference as an ordinal response variable within the framework of multinomial logistic regression models. For the second model, we adjust the Skellam distribution to account for the volleyball rules. Both models use the same covariate structure as in Karlis & Ntzoufras (2003) and are fitted, illustrated, and compared within a Bayesian framework using data from both the regular season and the play-offs of the 2016/17 season of the Greek national men’s volleyball league A1.
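
For the second approach, one plausible way to adapt the Skellam distribution (the distribution of the difference of two independent Poisson counts) is to keep only the set-differences a best-of-five volleyball match can produce, ±1, ±2 and ±3 (a match cannot end level), and renormalize. The Python sketch below follows that reading; the paper's exact adjustment, the Poisson means `mu_home` and `mu_away`, and all function names are assumptions, and in the paper the means would depend on covariates.

```python
import math


def bessel_i(order: int, x: float, terms: int = 60) -> float:
    """Modified Bessel function of the first kind I_order(x), integer order >= 0,
    from its power series sum_m (x/2)^(2m+order) / (m! (m+order)!)."""
    half = x / 2.0
    return sum(
        half ** (2 * m + order) / (math.factorial(m) * math.factorial(m + order))
        for m in range(terms)
    )


def skellam_pmf(k: int, mu1: float, mu2: float) -> float:
    """P(N1 - N2 = k) for independent N1 ~ Poisson(mu1), N2 ~ Poisson(mu2)."""
    return (
        math.exp(-(mu1 + mu2))
        * (mu1 / mu2) ** (k / 2.0)
        * bessel_i(abs(k), 2.0 * math.sqrt(mu1 * mu2))
    )


# Set-differences possible in a best-of-five match (0 cannot occur).
VOLLEYBALL_SUPPORT = (-3, -2, -1, 1, 2, 3)


def truncated_skellam(mu_home: float, mu_away: float) -> dict[int, float]:
    """Skellam probabilities restricted to VOLLEYBALL_SUPPORT and renormalized."""
    raw = {k: skellam_pmf(k, mu_home, mu_away) for k in VOLLEYBALL_SUPPORT}
    total = sum(raw.values())
    return {k: p / total for k, p in raw.items()}


# Hypothetical means: the home side is slightly stronger.
for k, p in truncated_skellam(1.6, 1.2).items():
    print(f"{k:+d}: {p:.3f}")
```

Because P(k) / P(−k) = (μ_home / μ_away)^k, a larger home mean tilts probability toward positive set-differences, and equal means give a symmetric distribution.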

