model evaluation
Recently Published Documents

TOTAL DOCUMENTS: 1614 (FIVE YEARS: 370)
H-INDEX: 72 (FIVE YEARS: 12)

2022 · Vol 50 (1) · pp. 030006052110676
Author(s): Xing Liu, Abai Xu, Jingwen Huang, Haiyan Shen, Yazhen Liu

Objective: To understand how to prevent deep vein thrombosis (DVT) after an innovative operation termed intracorporeal laparoscopic reconstruction of a detaenial sigmoid neobladder, we explored the factors that influence DVT following surgery, with the aim of constructing a model for predicting DVT occurrence. Methods: This retrospective study included 151 bladder cancer patients who underwent intracorporeal laparoscopic reconstruction of a detaenial sigmoid neobladder. Data describing general clinical characteristics and other common parameters were collected and analyzed. We then generated model evaluation curves and cross-validated the model. Results: Age and body mass index were risk factors for DVT, whereas postoperative use of hemostatic agents and postoperative passive muscle massage were significant protective factors. Model evaluation curves showed that the model had high accuracy and little bias, and cross-validation confirmed its accuracy. Conclusion: The prediction model constructed here was highly accurate and had little bias; it can therefore be used to predict the likelihood of developing DVT after surgery.
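The abstract does not name the model family or publish its data; below is a minimal sketch of how such a DVT prediction model might be fitted and cross-validated, assuming logistic regression on the four reported predictors and using synthetic placeholder data (all variable values are illustrative, not the study's):

```python
# Hypothetical sketch: the paper does not publish its data or model form.
# Assumes a logistic regression on the four predictors named in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 151  # cohort size reported in the abstract

# Synthetic stand-ins for the real clinical variables.
X = np.column_stack([
    rng.normal(65, 10, n),   # age (years)
    rng.normal(24, 3, n),    # body mass index (kg/m^2)
    rng.integers(0, 2, n),   # postoperative hemostatic agents (0/1)
    rng.integers(0, 2, n),   # postoperative passive muscle massage (0/1)
])
y = rng.integers(0, 2, n)    # DVT occurrence (0/1), placeholder labels

model = LogisticRegression(max_iter=1000)
# Cross-validation analogous to the abstract's accuracy check.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"5-fold AUC: {scores.mean():.2f} +/- {scores.std():.2f}")
```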


MAUSAM · 2021 · Vol 66 (3) · pp. 433-444
Author(s): Soma Sen Roy, Subhendu Brata Saha, Ananda Kumar Das, S. K. Roy Bhowmik, P. K. Kundu

2021
Author(s): María Andrea Cruz Blandón, Alejandrina Cristia, Okko Räsänen

Computational models of child language development can help us understand the cognitive underpinnings of the language learning process. One advantage of computational modeling is that it has the potential to address multiple aspects of language learning within a single learning architecture. If successful, such integrated models would help to pave the way for a more comprehensive and mechanistic understanding of language development. However, in order to develop more accurate, holistic, and hence impactful models of infant language learning, research on models also requires model evaluation practices that allow comparison of model behavior to empirical data from infants across a range of language capabilities. Moreover, there is a need for practices that can compare the developmental trajectories of infants to those of models as a function of language experience. The present study aims to take the first steps to address these needs. More specifically, we introduce the concept of comparing models with large-scale cumulative empirical data from infants, as quantified by meta-analyses conducted across a large number of individual behavioral studies. We start by formalizing the connection between measurable model and human behavior, and then present a basic conceptual framework for meta-analytic evaluation of computational models, together with basic guidelines intended as a starting point for later work in this direction. We exemplify the meta-analytic model evaluation approach with two modeling experiments on infant-directed speech preference and native/non-native vowel discrimination. We also discuss the advantages, challenges, and potential future directions of meta-analytic evaluation practices.
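As a rough illustration of the proposed comparison, not the authors' actual pipeline: a model's native/non-native vowel discrimination can be expressed as a standardized effect size and checked against a meta-analytic estimate from infant studies. All numbers below are hypothetical placeholders.

```python
# Hypothetical sketch of meta-analytic model evaluation: compare a model's
# effect size for native vs. non-native vowel discrimination against a
# meta-analytic estimate from infant studies. All numbers are placeholders.
import numpy as np

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
# Simulated model discrimination scores for native and non-native contrasts.
native = rng.normal(0.8, 0.2, 40)
non_native = rng.normal(0.6, 0.2, 40)

d_model = cohens_d(native, non_native)

# Hypothetical meta-analytic estimate and 95% confidence interval.
d_meta, ci_low, ci_high = 0.45, 0.30, 0.60

inside = ci_low <= d_model <= ci_high
print(f"model d = {d_model:.2f}; meta-analytic CI = [{ci_low}, {ci_high}]; consistent: {inside}")
```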


2021 · Vol 14 (12) · pp. 7545-7571
Author(s): Tom Gleeson, Thorsten Wagener, Petra Döll, Samuel C. Zipper, Charles West, ...

Abstract. Continental- to global-scale hydrologic and land surface models increasingly include representations of the groundwater system. Such large-scale models are essential for examining, communicating, and understanding the dynamic interactions between the Earth system above and below the land surface as well as the opportunities and limits of groundwater resources. We argue that both large-scale and regional-scale groundwater models have utility, strengths, and limitations, so continued modeling at both scales is essential and mutually beneficial. A crucial quest is how to evaluate the realism, capabilities, and performance of large-scale groundwater models given their modeling purpose of addressing large-scale science or sustainability questions as well as limitations in data availability and commensurability. Evaluation should identify if, when, or where large-scale models achieve their purpose or where opportunities for improvements exist so that such models better achieve their purpose. We suggest that reproducing the spatiotemporal details of regional-scale models and matching local data are not relevant goals. Instead, it is important to decide on reasonable model expectations regarding when a large-scale model is performing “well enough” in the context of its specific purpose. The decision of reasonable expectations is necessarily subjective even if the evaluation criteria are quantitative. Our objective is to provide recommendations for improving the evaluation of groundwater representation in continental- to global-scale models. We describe current modeling strategies and evaluation practices, and we subsequently discuss the value of three evaluation strategies: (1) comparing model outputs with available observations of groundwater levels or other state or flux variables (observation-based evaluation), (2) comparing several models with each other with or without reference to actual observations (model-based evaluation), and (3) comparing model behavior with expert expectations of hydrologic behaviors in particular regions or at particular times (expert-based evaluation). Based on evolving practices in model evaluation as well as innovations in observations, machine learning, and expert elicitation, we argue that combining observation-, model-, and expert-based model evaluation approaches, while accounting for commensurability issues, may significantly improve the realism of groundwater representation in large-scale models, thus advancing our ability for quantification, understanding, and prediction of crucial Earth science and sustainability problems. We encourage greater community-level communication and cooperation on this quest, including among global hydrology and land surface modelers, local to regional hydrogeologists, and hydrologists focused on model development and evaluation.
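As a concrete, minimal illustration of strategy (1), observation-based evaluation might reduce to a few summary statistics over paired simulated and observed groundwater levels. The data below are synthetic, and a real comparison would need to address the commensurability issues the authors raise (matched locations and times, scale mismatch between grid cells and wells):

```python
# Hypothetical sketch of observation-based evaluation (strategy 1):
# compare simulated and observed water-table depths at monitoring wells.
# Data are synthetic placeholders; real use requires commensurable units,
# matched locations/times, and attention to grid-cell vs. well scale mismatch.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
observed = rng.uniform(1.0, 50.0, 200)            # water-table depth (m) at wells
simulated = observed + rng.normal(0.0, 5.0, 200)  # synthetic model output

bias = np.mean(simulated - observed)              # systematic over/underestimation
rmse = np.sqrt(np.mean((simulated - observed) ** 2))
rho, _ = spearmanr(simulated, observed)           # rank agreement, robust to skew

print(f"bias = {bias:.2f} m, RMSE = {rmse:.2f} m, Spearman rho = {rho:.2f}")
```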


2021
Author(s): Johanna C. Clauser, Judith Maas, Ilona Mager, Frank R. Halfwerk, Jutta Arens

2021
Author(s): Ankit Patel

This doctoral thesis examines the multivariate nature of sporting performances, expressed as performance on context-specific tasks, to develop a novel framework for constructing sport-based rating systems, also referred to as scoring models. The intent of this framework is to produce reliable, robust, intuitive, and transparent ratings, regarded as meaningful, for the performances prevalent in the sport player and team evaluation environment. In this thesis, Bracewell’s (2003) definition of a rating as an elegant form of dimension reduction is extended: ratings are an elegant and excessive form of dimension reduction whereby a single numerical value provides an objective interpretation of performance.

The data, provided by numerous vendors, are a summary of the actions and performances completed by an individual during the evaluation period. A literature review of rating systems for measuring performance revealed a set of common methodologies, which were applied to produce a set of rating systems used as pilot studies to gather learnings about, and limitations of, the current literature.

By reviewing rating methodologies and developing rating systems, a set of limitations and commonalities in the current literature was identified and used to develop a novel framework for constructing sport-based rating systems that output measures of both team- and player-level performance. The proposed framework adopts a multi-objective ensembling strategy and implements five key commonalities present within many rating methodologies: 1) dimension reduction and feature selection techniques, 2) feature engineering tasks, 3) a multi-objective framework, 4) time-based variables, and 5) an ensembling procedure to produce an overall rating.

An ensemble approach is adopted because it is assumed that sporting performances are a function of the significant traits affecting performance; that is, performance = f(trait_1, …, trait_n). Moreover, the framework is a form of model stacking, in which information from multiple models is combined to generate a more informative model. Rating systems built using this approach provide a meaningful quantitative interpretation of performance during a specific time interval, known as the evaluation period.

The framework introduces a methodical approach for constructing rating systems within the sporting domain that produce meaningful ratings. Meaningful ratings must 1) yield good performance when data are drawn from a wide range of probability distributions, remaining largely unaffected by outliers, small departures from model assumptions, and small sample sizes (robust); 2) be accurate and produce highly informative predictions that are well calibrated and sharp (reliable); 3) be interpretable and easy to communicate (transparent); and 4) relate to real-world observable outcomes (intuitive).

The framework is developed to construct meaningful rating systems within the sporting industry for evaluating team and player performances. The approach was tested and validated by constructing both team- and individual player-based rating systems within the cricketing context. The ratings these systems produced were found to be meaningful: reliable, robust, transparent, and intuitive.
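A minimal sketch of the stacking idea described above: hypothetical trait-level sub-model scores are combined by a meta-model into a single rating, a concrete instance of performance = f(trait_1, …, trait_n). The thesis's actual sub-models, traits, and combiner are not specified here; all names and data below are illustrative.

```python
# Hypothetical sketch of rating-as-stacking: trait-level sub-models produce
# scores that a meta-model combines into one rating per player.
# All data are synthetic stand-ins, not the thesis's models or data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_players, n_traits = 100, 5

# Stand-ins for sub-model outputs (e.g. batting, bowling, fielding traits).
trait_scores = rng.normal(0, 1, (n_players, n_traits))
# Stand-in target: an observed match-level performance summary.
outcome = trait_scores @ rng.uniform(0.5, 1.5, n_traits) + rng.normal(0, 0.5, n_players)

# Meta-model: learn how to weight the traits into a single rating.
meta = Ridge(alpha=1.0).fit(trait_scores, outcome)
ratings = meta.predict(trait_scores)  # one number per player: the rating

print("top-rated player index:", int(np.argmax(ratings)))
```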
This rating framework is not restricted to evaluating player and team performances in cricket; it is applicable in any sporting code where a summary of multivariate data is needed to understand performance.

Common model evaluation metrics were found to be limited and to lack applicability when evaluating the effectiveness of meaningful ratings, so a novel evaluation metric was developed. The constructed metric applies distance- and magnitude-based metrics derived from the spherical scoring rule methodology. The distance- and magnitude-based spherical (DMS) metric applies an analytic hierarchy process to assess the effectiveness of meaningful sport-based ratings and accounts for forecasting difficulty on a time basis. The DMS performance metric quantifies elements of the decision-making process by 1) evaluating the distance between the ratings reported by the modeller and the actual outcome or the modeller’s ‘true’ beliefs, 2) providing an indication of “good” ratings, 3) accounting for the context and the forecasting difficulty to which the ratings are applied, and 4) capturing the introduction of any subjective human bias within sport-based rating systems. The DMS metric is shown to outperform conventional model evaluation metrics such as the log loss in specific sporting scenarios of varying difficulty.
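The DMS metric itself is described here only at a high level; as background, a minimal sketch of the spherical scoring rule it builds on, shown alongside the log loss for a hypothetical three-way match-outcome forecast:

```python
# Minimal sketch of the spherical scoring rule that DMS builds on, shown
# next to log loss for a three-way (win/draw/loss) forecast. The forecast
# probabilities below are hypothetical.
import numpy as np

def spherical_score(probs: np.ndarray, outcome: int) -> float:
    """S(p, i) = p_i / ||p||_2; higher is better, maximized by honest forecasts."""
    return probs[outcome] / np.linalg.norm(probs)

def log_loss(probs: np.ndarray, outcome: int) -> float:
    """Negative log-likelihood of the realized outcome; lower is better."""
    return -np.log(probs[outcome])

forecast = np.array([0.6, 0.25, 0.15])  # P(win), P(draw), P(loss)
outcome = 0                             # the win occurred

print(f"spherical: {spherical_score(forecast, outcome):.3f}")
print(f"log loss:  {log_loss(forecast, outcome):.3f}")
```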

