Performance Evaluation in Machine Learning: The Good, the Bad, the Ugly, and the Way Forward

Author(s):  
Peter Flach

This paper gives an overview of some ways in which our understanding of performance evaluation measures for machine-learned classifiers has improved over the last twenty years. I also highlight a range of areas where this understanding is still lacking, leading to ill-advised practices in classifier evaluation. This suggests that in order to make further progress we need to develop a proper measurement theory of machine learning. I then demonstrate by example what such a measurement theory might look like and what kinds of new results it would entail. Finally, I argue that key properties such as classification ability and data set difficulty are unlikely to be directly observable, suggesting the need for latent-variable models and causal inference.

2010, Vol. 33 (2-3), pp. 153-155
Author(s):  
David Danks ◽  
Stephen Fancsali ◽  
Clark Glymour ◽  
Richard Scheines

We agree with Cramer et al.'s goal of discovering causal relationships, but we argue that the authors' characterization of latent variable models (as deployed for such purposes) overlooks a wealth of extant possibilities. We provide a preliminary analysis of their data, using existing algorithms for causal inference and for the specification of latent variable models.


Author(s):  
Wilfried Wöber ◽  
Papius Tibihika ◽  
Cristina Olaverri-Monreal ◽  
Lars Mehnen ◽  
Peter Sykacek ◽  
...  

For computer-vision-based approaches such as image classification (Krizhevsky et al. 2012), object detection (Ren et al. 2015) or pixel-wise weed classification (Milioto et al. 2017), machine learning is used for both feature extraction and processing (e.g. classification or regression). Historically, feature extraction (e.g. PCA; Ch. 12.1 in Bishop 2006) and processing were sequential and independent tasks (Wöber et al. 2013). Since the rise of convolutional neural networks (LeCun et al. 1989), a deep machine learning approach optimized for images, in 2012 (Krizhevsky et al. 2012), feature extraction for image analysis has become an automated procedure. A convolutional neural net uses a deep architecture of artificial neurons (Goodfellow 2016) for both feature extraction and processing. Based on prior information such as image classes and supervised learning procedures, the parameters of the neural nets are adjusted; this is known as the learning process. In parallel, geometric morphometrics (Tibihika et al. 2018, Cadrin and Friedland 1999) is used in biodiversity research for association analysis. These approaches use deterministic two-dimensional locations on digital images (landmarks; Mitteroecker et al. 2013), where each position corresponds to a biologically relevant region of interest. Since this methodology is based on scientific results and compresses image content into deterministic landmarks, no uncertainty regarding landmark positions is taken into account, which leads to information loss (Pearl 1988). Both the reduction of this loss and the detection of novel knowledge can be achieved using machine learning. Supervised learning methods (e.g. neural nets or support vector machines; Ch. 5 and 6 in Bishop 2006) map data onto prior information (e.g. labels). This increases classification or regression performance but affects the latent representation of the data itself. Unsupervised learning (e.g. latent variable models) uses assumptions about data structure to extract latent representations without prior information. These representations do not have to be useful for processing tasks such as classification; the use of supervised learning, unsupervised learning, or combinations of both therefore needs to be chosen carefully according to the application and data. In this work, we discuss unsupervised learning algorithms in terms of explainability, performance and theoretical restrictions, in the context of known deep learning limitations (Marcus 2018, Szegedy et al. 2014, Su et al. 2017). We analyse features extracted from multiple image datasets and discuss their shortcomings and processing performance (e.g. reconstruction error or complexity measurement; Pincus 1997) using principal component analysis (Wöber et al. 2013), independent component analysis (Stone 2004), deep neural nets (autoencoders; Ch. 14 in Goodfellow 2016) and Gaussian process latent variable models (Titsias and Lawrence 2010, Lawrence 2005).
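One of the comparison measures named above, reconstruction error, can be made concrete with a minimal numpy sketch (synthetic data, not the authors' pipeline): the error of a linear latent-variable model (PCA), computed via SVD, as the number of retained components grows.

```python
import numpy as np

# Synthetic illustration: reconstruction error of PCA as a function of
# the number of retained components, computed via SVD.
rng = np.random.default_rng(0)

# 200 samples that live mostly in a 3-dimensional subspace of R^20.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 20))

Xc = X - X.mean(axis=0)                      # centre the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

def reconstruction_error(k):
    """Mean squared error after projecting onto the top-k principal axes."""
    Xk = (Xc @ Vt[:k].T) @ Vt[:k]
    return float(np.mean((Xc - Xk) ** 2))

errors = [reconstruction_error(k) for k in (1, 2, 3, 5)]
# The error drops sharply until k reaches the true latent dimension (3),
# then flattens at the noise floor.
```

The same error curve is what distinguishes a useful latent representation from an over- or under-compressed one, which is why the abstract treats it as a basis for comparing PCA, ICA, autoencoders and GPLVMs.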


Author(s):  
Osama Salah ◽  
Maged Georgy ◽  
Atef Ragab

Financial ratio analysis is considered one of the most fundamental ways of evaluating the performance of companies. Analysis of a company's major financial ratios can help decision-makers take early business decisions/actions that could prevent, or at least alleviate, potential hardships in the future. This paper reviews the literature and shows that various financial models have been developed in the past to evaluate an organization's financial performance. The paper further proposes to upgrade and extend the resources used for financial performance evaluation by employing advanced artificial intelligence (AI) techniques, with application to Egyptian construction companies. The research methodology included gathering a large number of financial reports/data items from relevant companies. Six major financial ratios were determined over a number of years, based on the consolidated financial accounts and income statements. These ratios include the Current Ratio, Quick Ratio, Return on Equity, and others. The use of machine learning (ML) techniques is then investigated to analyze those ratios and to develop a financial performance evaluation model. K-means, an unsupervised ML technique, was utilized to cluster the collected data set into three major groups, each with its own unique financial characteristics. Finally, future study measures are discussed, where case studies will be used to verify and explain the findings.
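The clustering step described above can be sketched in a few lines of numpy (Lloyd's k-means with k = 3; the ratio values below are synthetic stand-ins, not the paper's Egyptian-company data):

```python
import numpy as np

# Hypothetical sketch: k-means on standardized financial ratios.
rng = np.random.default_rng(1)

# 30 companies x 6 ratios (current ratio, quick ratio, ROE, ...), drawn
# around three distinct financial profiles.
profiles = np.array([[2.0, 1.5, 0.15, 0.4, 1.1, 0.3],
                     [1.0, 0.7, 0.05, 0.8, 0.6, 0.1],
                     [3.0, 2.5, 0.25, 0.2, 1.8, 0.5]])
X = np.vstack([p + 0.05 * rng.normal(size=(10, 6)) for p in profiles])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each ratio

def kmeans(X, centers, n_iter=50):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean update."""
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.stack([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

# Fixed initial centres (one per block of rows) keep the sketch reproducible.
labels, centers = kmeans(X, X[[0, 10, 20]].copy())
# Each resulting cluster groups companies with a similar financial profile.
```

Standardizing each ratio first matters because k-means uses Euclidean distance: without it, ratios on larger scales (e.g. the Current Ratio) would dominate the clustering.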


2020
Author(s):  
Marc Philipp Bahlke ◽  
Natnael Mogos ◽  
Jonny Proppe ◽  
Carmen Herrmann

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets with potential applications in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression, since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that simple linear ridge regression performs just as well as the kernel-based machine-learning model on our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor and highlighting the interesting question of the role of chemical intuition vs. systematic or automated feature selection for machine learning in chemistry and materials science.
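The method class used here, Gaussian process regression with “error bars”, has a short closed-form sketch in numpy. The 1-D toy function below merely stands in for the angle/distance descriptor; it is not the authors' data or kernel choice.

```python
import numpy as np

# Schematic GP regression with an RBF kernel: predictions come with an
# uncertainty estimate that grows when extrapolating away from the data.
def rbf(A, B, length=0.5):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / length**2)

x_train = np.linspace(0.0, 1.0, 8)
y_train = np.sin(2 * np.pi * x_train)
noise = 1e-4                                  # small jitter/noise term

K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
alpha = np.linalg.solve(K, y_train)           # K^{-1} y

def predict(x_new):
    """Closed-form GP posterior mean and standard deviation at a scalar x."""
    k_star = rbf(np.atleast_1d(x_new), x_train)[0]
    mean = k_star @ alpha
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)   # prior variance is 1
    return mean, np.sqrt(max(float(var), 0.0))

m_in, s_in = predict(0.5)    # inside the training range: small error bar
m_out, s_out = predict(3.0)  # far extrapolation: error bar near the prior
```

The growth of the predictive standard deviation away from the training set is exactly the property that makes GP regression attractive for the small, extrapolation-heavy data sets described in the abstract.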


Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function's ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set using the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to assess the importance of the GARF potential in the RF models: (1) a scrambled probability function set, obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
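One plausible reading of the ‘comparison’ concept (the abstract gives no code, so the construction below is an assumption, not the authors' implementation) is to train on signed feature differences between the native pose and each decoy, which also turns an unbalanced one-native-vs-many-decoys set into a balanced two-class one:

```python
import numpy as np

# Hypothetical pairwise-comparison construction: for each complex, pair
# the native pose with each decoy and label the signed feature difference.
rng = np.random.default_rng(3)

def make_comparison_rows(native, decoys):
    """native: (d,) feature vector; decoys: (n, d) array. Returns rows, labels."""
    rows, labels = [], []
    for decoy in decoys:
        rows.append(native - decoy)   # native listed first -> label +1
        labels.append(+1)
        rows.append(decoy - native)   # decoy listed first -> label -1
        labels.append(-1)
    return np.array(rows), np.array(labels)

# Toy stand-in features for one complex: 1 native pose, 20 decoys, 5 features.
native = rng.normal(size=5)
decoys = rng.normal(size=(20, 5))
rows, labels = make_comparison_rows(native, decoys)
# rows/labels can now be fed to any classifier, e.g. a random forest.
```

Because every positive row has a mirrored negative row, the two classes are exactly balanced regardless of how many decoys each complex contributes.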


2020
Author(s):  
Paul Silvia ◽  
Alexander P. Christensen ◽  
Katherine N. Cotter

Right-wing authoritarianism (RWA) has well-known links with humor appreciation, such as enjoying jokes that target deviant groups, but less is known about RWA and creative humor production—coming up with funny ideas oneself. A sample of 186 young adults completed a measure of RWA, the HEXACO-100, and 3 humor production tasks that involved writing funny cartoon captions, creating humorous definitions for quirky concepts, and completing joke stems with punchlines. The humor responses were scored by 8 raters and analyzed with many-facet Rasch models. Latent variable models found that RWA had a large, significant effect on humor production (β = -.47 [-.65, -.30], p < .001): responses created by people high in RWA were rated as much less funny. RWA’s negative effect on humor was smaller but still significant (β = -.25 [-.49, -.01], p = .044) after controlling for Openness to Experience (β = .39 [.20, .59], p < .001) and Conscientiousness (β = -.21 [-.41, -.02], p = .029). Taken together, the findings suggest that people high in RWA just aren’t very funny.
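The “controlling for” analysis style reported above can be illustrated with a small sketch (synthetic data, not the study's): a standardized coefficient β is simply the slope of a predictor in a multiple regression in which every variable has been z-scored first.

```python
import numpy as np

# Synthetic illustration of standardized multiple-regression coefficients.
rng = np.random.default_rng(4)
n = 186  # same sample size as the abstract, data entirely made up

rwa = rng.normal(size=n)
openness = -0.3 * rwa + rng.normal(size=n)          # correlated covariate
humor = -0.4 * rwa + 0.3 * openness + rng.normal(size=n)

def zscore(v):
    return (v - v.mean()) / v.std()

# Design matrix: intercept + z-scored predictors; target is z-scored too.
X = np.column_stack([np.ones(n), zscore(rwa), zscore(openness)])
betas = np.linalg.lstsq(X, zscore(humor), rcond=None)[0]
beta_rwa, beta_open = betas[1], betas[2]
# beta_rwa estimates RWA's effect on humor net of Openness (negative here).
```

Because all variables are standardized, the intercept is zero and each slope is directly comparable in strength, which is what makes the bracketed β values in the abstract interpretable across predictors.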

