A Comparison of Robust Model Choice Criteria Within a Metalearning Study

Author(s): Petra Vidnerová, Jan Kalina, Yeşim Güney
2018, Vol 18 (3), pp. 86-103

The effect of cultural distance (CD) on entry mode choice (EMC) has been studied intensively, but the empirical results are mixed. This study adopts the strategic fit perspective to examine how firms' strategic motives and technological ownership may influence the EMC in the face of different cultural distances. Analyzing Taiwanese outward FDI cases from 2004 to 2007, this study found that firms entering culturally distant countries chose the wholly owned subsidiary (WOS) mode when they emphasized protecting technological competence over market expansion, and the joint-venture (JV) mode when market expansion was prioritized.


2019, Vol 23 (6), pp. 670-679
Author(s): Krista Greenan, Sandra L. Taylor, Daniel Fulkerson, Kiarash Shahlaie, Clayton Gerndt, ...

OBJECTIVE: A recent retrospective study of severe traumatic brain injury (TBI) in pediatric patients showed similar outcomes in those with a Glasgow Coma Scale (GCS) score of 3 and those with a score of 4 and reported a favorable long-term outcome in 11.9% of patients. Using decision tree analysis, the authors of that study provided criteria to identify patients with a potentially favorable outcome. The authors of the present study sought to validate the previously described decision tree and further inform understanding of the outcomes of children with a GCS score of 3 or 4 by using data from multiple institutions and machine learning methods to identify important predictors of outcome.

METHODS: Clinical, radiographic, and outcome data on pediatric TBI patients (age < 18 years) were prospectively collected as part of an institutional TBI registry. Patients with a GCS score of 3 or 4 were selected, and the previously published prediction model was evaluated using this data set. Next, a combined data set that included data from two institutions was used to create a new, more statistically robust model using binomial recursive partitioning to create a decision tree.

RESULTS: Forty-five patients from the institutional TBI registry were included in the present study, as were 67 patients from the previously published data set, for a total of 112 patients in the combined analysis. The previously published prediction model for survival was externally validated and performed only modestly (AUC 0.68, 95% CI 0.47-0.89). In the combined data set, pupillary response and age were the only predictors retained in the decision tree. Ninety-six percent of patients with bilaterally nonreactive pupils had a poor outcome. If the pupillary response was normal in at least one eye, the outcome subsequently depended on age: 72% of children between 5 months and 6 years old had a favorable outcome, whereas 100% of children younger than 5 months old and 77% of those older than 6 years had poor outcomes. The overall accuracy of the combined prediction model was 90.2%, with a sensitivity of 68.4% and a specificity of 93.6%.

CONCLUSIONS: A previously published survival model for severe TBI in children with a low GCS score was externally validated. With a larger data set, however, a simplified and more robust model was developed, and the variables most predictive of outcome were age and pupillary response.
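
The abstract names the method (binomial recursive partitioning) but includes no code; the sketch below is only an assumed illustration of fitting such a tree on the two retained predictors, pupillary response and age, using scikit-learn's CART trees as a stand-in. The toy data and variable names are invented for illustration and are not the study's registry data.

```python
# Minimal sketch (assumed, not from the study): fitting a small decision tree
# on the two predictors retained in the combined model -- pupillary response
# and age -- with scikit-learn's CART implementation standing in for
# binomial recursive partitioning.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative toy data: [pupils_reactive (0 = bilaterally nonreactive,
# 1 = at least one reactive pupil), age in months]
X = np.array([
    [0, 24], [0, 3], [0, 90], [0, 48],   # nonreactive pupils
    [1, 2],  [1, 4],                     # reactive, younger than 5 months
    [1, 12], [1, 30], [1, 60],           # reactive, 5 months to 6 years
    [1, 96], [1, 140],                   # reactive, older than 6 years
])
# Outcome: 1 = favorable, 0 = poor (invented labels mimicking the reported splits)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned splits; with real registry data one would also report
# accuracy, sensitivity, and specificity on held-out patients.
print(export_text(tree, feature_names=["pupils_reactive", "age_months"]))
```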


2020, Vol 13 (5), pp. 1020-1030
Author(s): Pradeep S., Jagadish S. Kallimani

Background: With the advent of data analysis and machine learning, there is a growing impetus to analyze and build models on historical data. The data comes in numerous forms and shapes, with an abundance of challenges. The most convenient form of data for analysis is numerical data; with the plethora of available algorithms and tools, it is quite manageable. Another form of data is categorical, which is subdivided into ordinal (ordered categories) and nominal (unordered categories). Categorical data can be broadly classified as sequential or non-sequential, and sequential data is easier to preprocess with existing algorithms.

Objective: This paper addresses the challenge of applying machine learning algorithms to categorical data of a non-sequential nature.

Methods: Applying standard data analysis algorithms directly to such data yields biased results, which makes it impossible to build a reliable predictive model. We address this problem by walking through a handful of techniques that, during our research, helped us deal with large categorical data sets of a non-sequential nature. In subsequent sections, we discuss the implementable solutions and the shortfalls of these techniques.

Results: The methods are applied to sample data sets available in the public domain, and the classification accuracy is satisfactory.

Conclusion: The best pre-processing technique we observed in our research is one-hot encoding, which breaks categorical features into binary columns that can be fed into an algorithm to predict the outcome. The example we used is not abstract but a real-time production services data set with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
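
As a minimal sketch of the one-hot encoding step described in the conclusion, assuming a hypothetical non-sequential categorical data set (the column names and values below are illustrative, not the production services data used in the paper):

```python
# Minimal sketch (illustrative data, not the paper's production data set):
# one-hot encoding non-sequential categorical features before fitting a model.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical categorical features with no natural ordering.
df = pd.DataFrame({
    "service_type": ["web", "batch", "stream", "web", "batch"],
    "region":       ["eu", "us", "us", "apac", "eu"],
    "failed":       [0, 1, 0, 1, 0],   # outcome to predict
})

# pandas get_dummies expands each categorical column into binary indicator columns.
X = pd.get_dummies(df[["service_type", "region"]])
y = df["failed"]

model = LogisticRegression().fit(X, y)
print(X.columns.tolist())   # the binary columns created by the encoding
print(model.predict(X))
```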


2019, Vol 23 (10), pp. 4323-4331
Author(s): Wouter J. M. Knoben, Jim E. Freer, Ross A. Woods

Abstract. A traditional metric used in hydrology to summarize model performance is the Nash–Sutcliffe efficiency (NSE). Increasingly, an alternative metric, the Kling–Gupta efficiency (KGE), is used instead. When NSE is used, NSE = 0 corresponds to using the mean flow as a benchmark predictor. The same reasoning is applied in various studies that use KGE as a metric: negative KGE values are viewed as bad model performance, and only positive values are seen as good model performance. Here we show that using the mean flow as a predictor does not result in KGE = 0, but instead KGE = 1 − √2 ≈ −0.41. Thus, KGE values greater than −0.41 indicate that a model improves upon the mean flow benchmark, even if the model's KGE value is negative. NSE and KGE values cannot be directly compared, because their relationship is non-unique and depends in part on the coefficient of variation of the observed time series. Therefore, modellers who use the KGE metric should not let their understanding of NSE values guide their interpretation of KGE values, and should instead develop new understanding based on the constitutive parts of the KGE metric and the explicit use of benchmark values against which to compare KGE scores. More generally, a strong case can be made for moving away from ad hoc use of aggregated efficiency metrics and towards a framework based on purpose-dependent evaluation metrics and benchmarks that allows for more robust model adequacy assessment.
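
As a small numerical illustration of this point, the sketch below computes KGE in its standard form, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), on synthetic flows. It is not code from the paper, and treating the correlation of a constant prediction as r = 0 follows the limiting argument behind the 1 − √2 result.

```python
# Minimal sketch (synthetic data, not from the paper): computing KGE and
# showing that a constant mean-flow prediction scores 1 - sqrt(2) ~ -0.41.
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency (Gupta et al., 2009):
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    with r = correlation, alpha = sigma_sim / sigma_obs, beta = mu_sim / mu_obs."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    # For a constant simulation the correlation is undefined; the limiting
    # case used in the abstract's reasoning corresponds to r = 0.
    r = 0.0 if sim.std() == 0 else np.corrcoef(sim, obs)[0, 1]
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=5.0, size=365)   # synthetic "observed" flows
mean_flow = np.full_like(obs, obs.mean())         # mean-flow benchmark predictor

print(kge(mean_flow, obs))   # ~ -0.41, i.e. 1 - sqrt(2), not 0
print(kge(obs, obs))         # perfect model: 1.0
```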

