Generalising Algorithm Performance in Instance Space: A Timetabling Case Study

Author(s):  
Kate Smith-Miles ◽  
Leo Lopes
2015 ◽  
Vol 54 (7) ◽  
pp. 1637-1662 ◽  
Author(s):  
Jason M. Apke ◽  
Daniel Nietfeld ◽  
Mark R. Anderson

Enhanced temporal and spatial resolution of the Geostationary Operational Environmental Satellite–R Series (GOES-R) will allow for the use of cloud-top-cooling-based convection-initiation (CI) forecasting algorithms. Two such algorithms have been created on the current generation of GOES: the University of Wisconsin cloud-top-cooling algorithm (UWCTC) and the University of Alabama in Huntsville’s satellite convection analysis and tracking algorithm (SATCAST). Preliminary analyses of algorithm products have led to speculation over preconvective environmental influences on algorithm performance. An objective validation approach is developed to separate algorithm products into positive and false indications. Seventeen preconvective environmental variables are examined for the positive and false indications to improve algorithm output. The total dataset consists of two time periods in the late convective season of 2012 and the early convective season of 2013. Data are examined for environmental relationships using principal component analysis (PCA) and quadratic discriminant analysis (QDA). Data fusion by QDA is tested for SATCAST and UWCTC on five separate case-study days to determine whether application of environmental variables improves satellite-based CI forecasting. PCA and significance testing revealed that positive indications favored environments with greater vertically integrated instability (CAPE), less stability (CIN), and more low-level convergence. QDA improved both algorithms on all five case studies using significantly different variables. This study provides an examination of environmental influences on the performance of GOES-R Proving Ground CI forecasting algorithms and shows that integration of QDA in the cloud-top-cooling-based algorithms using environmental variables will ultimately generate a more skillful product.
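The QDA step described above can be illustrated with a minimal univariate sketch. It assumes a single environmental variable (hypothetical CAPE values in J/kg), equal priors, and made-up training data; the study itself used seventeen variables, and none of the names or numbers below come from it.

```python
import math
from statistics import mean, pvariance


def fit_class(values, prior):
    """Fit one class: Gaussian with its own mean and (maximum-likelihood) variance."""
    return {"mean": mean(values), "var": pvariance(values), "prior": prior}


def qda_score(x, c):
    """Quadratic discriminant score for a single variable (log posterior up to a constant)."""
    return (math.log(c["prior"])
            - 0.5 * math.log(c["var"])
            - (x - c["mean"]) ** 2 / (2 * c["var"]))


def classify(x, classes):
    """Return the class label with the highest discriminant score."""
    return max(classes, key=lambda name: qda_score(x, classes[name]))


# Hypothetical CAPE values for validated positive and false indications.
classes = {
    "positive": fit_class([1500, 2000, 2500, 1800, 2200], prior=0.5),
    "false": fit_class([300, 600, 900, 500, 700], prior=0.5),
}

label_high = classify(1900, classes)
label_low = classify(400, classes)
```

Because each class keeps its own variance, the decision boundary is quadratic in the variable rather than linear, which is what distinguishes QDA from linear discriminant analysis.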


Author(s):  
Tania Turrubiates Lopez ◽  
Elisa Schaeffer ◽  
Dalia Domiguez-Diaz ◽  
German Dominguez-Carrillo

2002 ◽  
Vol 34 (3) ◽  
pp. 297-312 ◽  
Author(s):  
MARNE C. CARIO ◽  
JOHN J. CLIFFORD ◽  
RAYMOND R. HILL ◽  
JAEHWAN YANG ◽  
KEJIAN YANG ◽  
...  

Molecules ◽  
2021 ◽  
Vol 26 (16) ◽  
pp. 4757
Author(s):  
William E. Hackett ◽  
Joseph Zaia

Protein glycosylation that mediates interactions among viral proteins, host receptors, and immune molecules is an important consideration for predicting viral antigenicity. Viral spike proteins, the proteins responsible for host cell invasion, are especially important to examine. However, a lack of consensus within the field of glycoproteomics regarding identification strategy and false discovery rate (FDR) calculation impedes such examinations. As a case study in the overlap between software, we examine recently published SARS-CoV-2 glycoprotein datasets with four glycoproteomics identification software packages, each run with its recommended protocol: GlycReSoft, Byonic, pGlyco2, and MSFragger-Glyco. These packages use different forms of Target-Decoy Analysis (TDA) to estimate FDR and have different database-oriented search methods with varying degrees of quantification capabilities. Instead of an ideal overlap between software, we observed different sets of identifications with the intersection. When clustering by glycopeptide identifications, we see higher degrees of relatedness within software than within glycosites. Taking the consensus between results yields a conservative and non-informative conclusion, as we lose identifications in the desire for caution; these non-consensus identifications are often of lower abundance and, therefore, more susceptible to nuanced changes. We conclude that present glycoproteomics software tools are not directly comparable, and that methods are needed to assess their overall results and FDR estimation performance. Once such tools are developed, it will be possible to improve FDR methods and quantify complex glycoproteomes with acceptable confidence, rather than in potentially misleading broad strokes.
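A minimal sketch of the two quantities discussed above: a target-decoy FDR estimate and the overlap between the identification sets of two tools. It assumes the simplest decoys-over-targets form of TDA, which is not necessarily the form used by any of the four tools named; all scores, flags, and glycopeptide labels are hypothetical.

```python
def estimate_fdr(scores, is_decoy, threshold):
    """Simple target-decoy FDR: decoy hits / target hits at or above the threshold."""
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    return decoys / targets if targets else 0.0


def jaccard(a, b):
    """Overlap between two identification sets (intersection over union)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical scored matches and their decoy flags.
scores = [9.1, 8.7, 8.2, 7.9, 7.5, 6.8, 6.1, 5.4]
is_decoy = [False, False, False, True, False, False, True, True]
fdr = estimate_fdr(scores, is_decoy, threshold=7.0)

# Hypothetical glycopeptide identifications reported by two tools.
tool_a = {"N61:HexNAc2Hex5", "N234:HexNAc2Hex9", "N343:HexNAc4Hex5Fuc1"}
tool_b = {"N234:HexNAc2Hex9", "N343:HexNAc4Hex5Fuc1", "N801:HexNAc2Hex6"}
overlap = jaccard(tool_a, tool_b)
consensus = tool_a & tool_b  # identifications kept under a strict consensus rule
```

In this toy data the consensus set drops one identification from each tool, which is the loss of information that a strict consensus rule trades for caution.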


2017 ◽  
Vol 25 (4) ◽  
pp. 529-554 ◽  
Author(s):  
Mario A. Muñoz ◽  
Kate A. Smith-Miles

This article presents a method for the objective assessment of an algorithm’s strengths and weaknesses. Instead of examining the performance of only one or more algorithms on a benchmark set, or generating custom problems that maximize the performance difference between two algorithms, our method quantifies both the nature of the test instances and the algorithm performance. Our aim is to gather information about possible phase transitions in performance, that is, the points at which a small change in problem structure produces algorithm failure. The method is based on the accurate estimation and characterization of the algorithm footprints, that is, the regions of instance space in which good or exceptional performance is expected from an algorithm. A footprint can be estimated for each algorithm and for the overall portfolio. Therefore, we select a set of features to generate a common instance space, which we validate by constructing a sufficiently accurate prediction model. We characterize the footprints by their area and density. Our method identifies complementary performance between algorithms, quantifies the common features of hard problems, and locates regions where a phase transition may lie.
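As a rough illustration of characterizing a footprint by area and density, the sketch below approximates a footprint as the convex hull of the instances on which an algorithm performed well in a two-dimensional instance space. The convex-hull approximation, the function names, and the coordinates are all assumptions made for illustration; the abstract does not specify how footprints are actually constructed.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def footprint(instances, good):
    """Area of the hull of good instances, and good instances per unit area."""
    hull = convex_hull([p for p, g in zip(instances, good) if g])
    area = polygon_area(hull)
    density = sum(good) / area if area > 0 else float("inf")
    return area, density


# Hypothetical instance coordinates and per-instance "good performance" flags.
instances = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (5, 5)]
good = [True, True, True, True, True, False]
area, density = footprint(instances, good)
```

A large area with low density would suggest good performance claimed over a region that is only sparsely supported by test instances.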


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Yasin Kirelli ◽  
Seher Arslankaya

As the use of social media has increased, the volume of shared data has surged, and it has become an important source for research on environmental issues, as it has for other popular topics. Sentiment analysis has been used to determine people's sensitivity and behavior regarding environmental issues. However, the analysis of Turkish texts has not been investigated much in the literature. In this article, the sentiment of Turkish tweets about global warming and climate change is determined by machine learning methods. Supervised algorithms (linear classifiers and probabilistic classifiers) were trained on thirty thousand randomly selected Turkish tweets, sentiment intensity (positive, negative, and neutral) was detected, and algorithm performance ratios were compared. This study also provides benchmarking results for future sentiment analysis studies on Turkish texts.


2014 ◽  
Vol 45 ◽  
pp. 12-24 ◽  
Author(s):  
Kate Smith-Miles ◽  
Davaatseren Baatar ◽  
Brendan Wreford ◽  
Rhyd Lewis

2021 ◽  
Vol 15 (2) ◽  
pp. 1-25
Author(s):  
Mario Andrés Muñoz ◽  
Tao Yan ◽  
Matheus R. Leal ◽  
Kate Smith-Miles ◽  
Ana Carolina Lorena ◽  
...  

The quest for greater insights into algorithm strengths and weaknesses, as revealed when studying algorithm performance on large collections of test problems, is supported by interactive visual analytics tools. A recent advance is Instance Space Analysis, which presents a visualization of the space occupied by the test datasets, and the performance of algorithms across the instance space. The strengths and weaknesses of algorithms can be visually assessed, and the adequacy of the test datasets can be scrutinized through visual analytics. This article presents the first Instance Space Analysis of regression problems in Machine Learning, considering the performance of 14 popular algorithms on 4,855 test datasets from a variety of sources. The two-dimensional instance space is defined by measurable characteristics of regression problems, selected from over 26 candidate features. It enables the similarities and differences between test instances to be visualized, along with the predictive performance of regression algorithms across the entire instance space. The purpose of creating this framework for visual analysis of an instance space is twofold: one may assess the capability and suitability of various regression techniques; meanwhile the bias, diversity, and level of difficulty of the regression problems popularly used by the community can be visually revealed. This article shows the applicability of the created regression instance space to provide insights into the strengths and weaknesses of regression algorithms, and the opportunities to diversify the benchmark test instances to support greater insights.


Author(s):  
Kai Shi ◽  
Huiqun Yu ◽  
Guisheng Fan ◽  
Jianmei Guo ◽  
Liqiong Chen ◽  
...  

An effective method for addressing the configuration optimization problem (COP) in Software Product Lines (SPLs) is to deploy a multi-objective evolutionary algorithm, for example, the state-of-the-art SATIBEA. In this paper, an improved hybrid algorithm, called SATIBEA-LSSF, is proposed to further improve the performance of SATIBEA. It is composed of a multi-children generating strategy, an enhanced mutation strategy with local searching, and an elite inheritance mechanism. Empirical results on the same case studies demonstrate that our algorithm significantly outperforms the state of the art for four out of five SPLs on the Hypervolume quality indicator and on convergence speed. To verify the effectiveness and robustness of our algorithm, a parameter sensitivity analysis is discussed and three observations are reported in detail.
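For readers unfamiliar with the Hypervolume indicator mentioned above, here is a minimal sketch restricted to two minimised objectives. The reference point and the solution values are hypothetical, and the paper's own objectives and hypervolume computation are not described in this abstract.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference` (both objectives minimised)."""
    rx, ry = reference
    # Only points strictly better than the reference in both objectives contribute.
    pts = sorted(p for p in points if p[0] < rx and p[1] < ry)
    volume = 0.0
    best_y = ry
    for x, y in pts:
        if y < best_y:
            # A point that improves the second objective adds a new rectangle.
            volume += (rx - x) * (best_y - y)
            best_y = y
    return volume


# Hypothetical objective vectors; (3, 3) is dominated by (2, 2) and adds nothing.
front = [(1, 3), (2, 2), (3, 1), (3, 3)]
hv = hypervolume_2d(front, reference=(4, 4))
```

A larger hypervolume means the solution set covers more of the objective space relative to the reference point, so it rewards both convergence and spread.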

