Overtraining in neural networks that interpret clinical data

1993 ◽  
Vol 39 (9) ◽  
pp. 1998-2004 ◽  
Author(s):  
M L Astion ◽  
M H Wener ◽  
R G Thomas ◽  
G G Hunder ◽  
D A Bloch

Abstract Backpropagation neural networks are a computer-based pattern-recognition method that has been applied to the interpretation of clinical data. Unlike rule-based pattern recognition, backpropagation networks learn by being repetitively trained with examples of the patterns to be differentiated. We describe and analyze the phenomenon of overtraining in backpropagation networks. Overtraining refers to the reduction in generalization ability that can occur as networks are trained. The clinical application we used was the differentiation of giant cell arteritis (GCA) from other forms of vasculitis (OTH) based on results for 807 patients (593 OTH, 214 GCA) and eight clinical predictor variables. The 807 cases were randomly assigned to either a training set with 404 cases or to a cross-validation set with the remaining 403 cases. The cross-validation set was used to monitor generalization during training. Results were obtained for eight networks, each derived from a different random assignment of the 807 cases. Training error monotonically decreased during training. In contrast, the cross-validation error usually reached a minimum early in training while the training error was still decreasing. Training beyond the minimum cross-validation error was associated with an increased cross-validation error. The shape of the cross-validation error curve and the point during training corresponding to the minimum cross-validation error varied with the composition of the data sets and the training conditions. The study indicates that training error is not a reliable indicator of a network's ability to generalize. To find the point during training when a network generalizes best, one must monitor cross-validation error separately.
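
The monitoring procedure described above amounts to what is now commonly called early stopping. Below is a minimal sketch, assuming a hypothetical network object with train_one_epoch(), error(), and weights members (these names are illustrative and not from the paper):

    import copy

    def train_with_cv_monitoring(network, train_set, cv_set, max_epochs=500):
        # Track cross-validation error after every epoch and keep the weights
        # observed at its minimum, since training error alone keeps decreasing.
        best_cv_error = float("inf")
        best_weights = copy.deepcopy(network.weights)
        history = []
        for epoch in range(max_epochs):
            network.train_one_epoch(train_set)        # one backpropagation pass
            train_error = network.error(train_set)    # monotonically decreasing
            cv_error = network.error(cv_set)          # typically reaches a minimum early
            history.append((epoch, train_error, cv_error))
            if cv_error < best_cv_error:              # new generalization optimum
                best_cv_error = cv_error
                best_weights = copy.deepcopy(network.weights)
        network.weights = best_weights                # restore the best-generalizing state
        return best_cv_error, history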

2021 ◽  
pp. 1-17
Author(s):  
Luis Sa-Couto ◽  
Andreas Wichert

Abstract Convolutional neural networks (CNNs) evolved from Fukushima's neocognitron model, which is based on the ideas of Hubel and Wiesel about the early stages of the visual cortex. Unlike other branches of neocognitron-based models, the typical CNN is based on end-to-end supervised learning by backpropagation and removes the focus from built-in invariance mechanisms, using pooling not as a way to tolerate small shifts but as a regularization tool that decreases model complexity. These properties of end-to-end supervision and flexibility of structure allow the typical CNN to become highly tuned to the training data, leading to extremely high accuracies on typical visual pattern recognition data sets. However, in this work, we hypothesize that there is a flip side to this capability, a hidden overfitting. More concretely, a supervised, backpropagation-based CNN will outperform a neocognitron/map transformation cascade (MTCCXC) when trained and tested inside the same data set. Yet if we take both trained models and test them on the same task but on another data set (without retraining), the overfitting appears. Other neocognitron descendants, like the What-Where model, go in a different direction. In these models, learning remains unsupervised, but more structure is added to capture invariance to typical changes. Knowing that, we further hypothesize that if we repeat the same experiments with this model, the lack of supervision may make it worse than the typical CNN inside the same data set, but the added structure will make it generalize even better to another one. To put our hypothesis to the test, we choose the simple task of handwritten digit classification and take two well-known data sets for it: MNIST and ETL-1. To try to make the two data sets as similar as possible, we experiment with several types of preprocessing. However, regardless of the type in question, the results align exactly with our expectations.
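
The cross-data-set protocol sketched below illustrates the comparison the authors describe, assuming scikit-learn-style estimators and pre-split data sets; the data-set interface is an assumption, not the authors' code:

    def within_and_across(model, ds_a, ds_b):
        # Train on data set A, then score on A's test split (within-data-set)
        # and on data set B's test split without retraining (across-data-set).
        (Xa_train, ya_train), (Xa_test, ya_test) = ds_a
        (_, _), (Xb_test, yb_test) = ds_b
        model.fit(Xa_train, ya_train)
        return {
            "within": model.score(Xa_test, ya_test),   # e.g. MNIST -> MNIST
            "across": model.score(Xb_test, yb_test),   # e.g. MNIST -> ETL-1, no retraining
        }

A large gap between the two scores is the "hidden overfitting" the abstract refers to.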


Author(s):  
Reza Alizadeh ◽  
Liangyue Jia ◽  
Anand Balu Nellippallil ◽  
Guoxin Wang ◽  
Jia Hao ◽  
...  

Abstract In engineering design, surrogate models are often used instead of costly computer simulations. Typically, a single surrogate model is selected based on previous experience. We observe, based on an analysis of the published literature, that fitting an ensemble of surrogates (EoS) based on cross-validation errors is more accurate but requires more computational time. In this paper, we propose a method to build an EoS that is both accurate and less computationally expensive. In the proposed method, the EoS is a weighted-average surrogate of response surface models, kriging, and radial basis functions based on overall cross-validation error. We demonstrate that the created EoS is more accurate than individual surrogates even when fewer data points are used, and is therefore computationally efficient while yielding relatively insensitive predictions. We demonstrate the use of an EoS using hot rod rolling as an example. Finally, we include a rule-based template that can be used for other problems with similar requirements regarding, for example, computational time, required accuracy, and the size of the data.
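
A minimal sketch of a cross-validation-error-weighted ensemble of surrogates, assuming scikit-learn-style regressors (e.g. a polynomial response surface, a Gaussian process for kriging, and an RBF model); the inverse-RMSE weighting shown is one common choice and not necessarily the exact scheme of the paper:

    import numpy as np
    from sklearn.model_selection import cross_val_score

    def fit_weighted_ensemble(surrogates, X, y, cv=5):
        # Weight each surrogate by the inverse of its overall cross-validation RMSE,
        # so that more accurate surrogates contribute more to the ensemble.
        rmse = []
        for model in surrogates:
            mse = -cross_val_score(model, X, y, cv=cv,
                                   scoring="neg_mean_squared_error").mean()
            rmse.append(np.sqrt(mse))
            model.fit(X, y)                       # refit on all available points
        weights = 1.0 / np.asarray(rmse)
        weights /= weights.sum()

        def predict(X_new):
            preds = np.column_stack([m.predict(X_new) for m in surrogates])
            return preds @ weights                # weighted-average surrogate prediction

        return predict, weights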


Author(s):  
Mehdi Fallahnezhad ◽  
Salman Zaferanlouei

Considering high-order correlations of selected features alongside the raw input features can facilitate target pattern recognition. In artificial intelligence, this is addressed by Higher Order Neural Networks (HONNs). In general, HONN structures offer advantages over traditional neural networks (e.g., resolving the dilemma of choosing the number of neurons and layers, better fitting, faster training, and open-box behavior). This chapter introduces a hybrid structure of higher order neural networks that can be applied generally in various branches of pattern recognition. The structure, learning algorithm, and network configuration are introduced, and the structure is applied either as a classifier (where it is called HHONC) to different benchmark statistical data sets or as a functional behavior approximator (where it is called HHONN) to a heat and mass transfer problem. In each case, the results are compared with previous studies, showing superior performance in addition to the other advantages mentioned.
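
The central idea, feeding higher-order correlations of features to the network alongside the raw inputs, can be illustrated with a simple product expansion; this is a generic sketch of the principle, not the specific HHONC/HHONN architecture of the chapter:

    import numpy as np
    from itertools import combinations_with_replacement

    def higher_order_features(X, order=2):
        # Append products of input features up to the given order to the raw
        # features, so a downstream network can use these correlations directly.
        X = np.asarray(X, dtype=float)
        columns = [X]
        n_features = X.shape[1]
        for k in range(2, order + 1):
            for idx in combinations_with_replacement(range(n_features), k):
                columns.append(np.prod(X[:, idx], axis=1, keepdims=True))
        return np.hstack(columns)

    # Example: two raw features x1, x2 with order=2 yield
    # [x1, x2, x1*x1, x1*x2, x2*x2] as the network input.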


2021 ◽  
Vol 15 (7) ◽  
pp. 3135-3157
Author(s):  
Jan-Hendrik Malles ◽  
Ben Marzeion

Abstract. Negative glacier mass balances in most of Earth's glacierized regions contribute roughly one-quarter to currently observed rates of sea-level rise and have likely contributed an even larger fraction during the 20th century. The distant past and future of glaciers' mass balances, and hence their contribution to sea-level rise, can only be estimated using numerical models. Since, independent of complexity, models always rely on some form of parameterizations and a choice of boundary conditions, a need for optimization arises. In this work, a model for computing monthly mass balances of glaciers on the global scale was forced with nine different data sets of near-surface air temperature and precipitation anomalies, as well as with their mean and median, leading to a total of 11 different forcing data sets. The goal is to better constrain the glaciers' 20th century sea-level budget contribution and its uncertainty. Therefore, five global parameters of the model's mass balance equations were varied systematically, within physically plausible ranges, for each forcing data set. We then identified optimal parameter combinations by cross-validating the model results against in situ annual specific mass balance observations, using three criteria: model bias, temporal correlation, and the ratio between the observed and modeled temporal standard deviation of specific mass balances. These criteria were chosen in order not to trade lower error estimates by means of the root mean squared error (RMSE) for an unrealistic interannual variability. We find that the disagreement between the different optimized model setups (i.e., ensemble members) is often larger than the uncertainties obtained via the leave-one-glacier-out cross-validation, particularly in times and places where few or no validation data are available, such as the first half of the 20th century. We show that the reason for this is that in regions where mass balance observations are abundant, the meteorological data are also better constrained, such that the cross-validation procedure only partly captures the uncertainty of the glacier model. For this reason, ensemble spread is introduced as an additional estimate of reconstruction uncertainty, increasing the total uncertainty compared to the model uncertainty merely obtained by the cross-validation. Our ensemble mean estimate indicates a sea-level contribution by global glaciers (outside of the ice sheets; including the Greenland periphery but excluding the Antarctic periphery) for 1901–2018 of 69.2 ± 24.3 mm sea-level equivalent (SLE), or 0.59 ± 0.21 mm SLE yr−1. While our estimates lie within the uncertainty range of most of the previously published global estimates, they agree less with those derived from GRACE data, which only cover the years 2002–2018.
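
A hedged sketch of the three validation criteria named above, computed for one glacier's annual specific mass-balance series; the variable names are illustrative and not taken from the authors' code:

    import numpy as np

    def validation_criteria(observed, modeled):
        # Compare modeled and observed annual specific mass balances (same years).
        observed = np.asarray(observed, dtype=float)
        modeled = np.asarray(modeled, dtype=float)
        bias = np.mean(modeled - observed)                  # systematic offset
        correlation = np.corrcoef(observed, modeled)[0, 1]  # temporal agreement
        std_ratio = np.std(observed) / np.std(modeled)      # interannual variability
        return bias, correlation, std_ratio

    # An optimal parameter combination keeps the bias near 0, the correlation near 1,
    # and the standard-deviation ratio near 1, rather than minimizing RMSE alone.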


Author(s):  
BINBIN SUN ◽  
WING W. Y. NG ◽  
DANIEL S. YEUNG ◽  
PATRICK P. K. CHAN

Sparse LS-SVM yields better generalization capability and reduces prediction time in comparison to full dense LS-SVM. However, both methods require careful selection of hyper-parameters (HPS) to achieve high generalization capability. Leave-One-Out Cross Validation (LOO-CV) and k-fold Cross Validation (k-CV) are the two most widely used hyper-parameter selection methods for LS-SVMs. However, both fail to select good hyper-parameters for sparse LS-SVM. In this paper, we propose a new hyper-parameter selection method, LGEM-HPS, for LS-SVM via minimization of the Localized Generalization Error (L-GEM). The L-GEM consists of two major components: empirical mean square error and a sensitivity measure. A new sensitivity measure is derived for LS-SVM to enable LGEM-HPS to select hyper-parameters yielding an LS-SVM with smaller training error and minimum sensitivity to minor changes in the inputs. Experiments on eleven UCI data sets show the effectiveness of the proposed method for selecting hyper-parameters for sparse LS-SVM.
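
A heavily simplified sketch of the idea behind L-GEM-based hyper-parameter selection: each candidate setting is scored by its empirical mean square error plus a sensitivity term estimated here from small input perturbations. Both the sensitivity estimator and the combination rule below are illustrative assumptions, not the bound derived in the paper:

    import numpy as np

    def lgem_score(model, X, y, q=0.1, n_perturb=20, seed=0):
        # Empirical MSE plus a perturbation-based sensitivity estimate (simplified).
        rng = np.random.default_rng(seed)
        pred = model.predict(X)
        emp_mse = np.mean((pred - y) ** 2)
        sens = np.mean([
            np.mean((model.predict(X + rng.uniform(-q, q, size=X.shape)) - pred) ** 2)
            for _ in range(n_perturb)
        ])
        return (np.sqrt(emp_mse) + np.sqrt(sens)) ** 2

    def select_hyperparameters(make_model, grid, X, y):
        # Fit one sparse LS-SVM per candidate setting and keep the L-GEM minimizer
        # (make_model is a hypothetical factory returning a fit/predict estimator).
        scored = [(lgem_score(make_model(**params).fit(X, y), X, y), params)
                  for params in grid]
        return min(scored, key=lambda t: t[0])[1]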


2021 ◽  
Author(s):  
Jose Raniery Ferreira ◽  
Diego Armando Cardona Cardenas

Chest radiography (CXR) remains an essential tool for evaluating lung diseases. However, it is crucial nowadays to include computer-based tools to aid physicians in the early detection of chest abnormalities. Therefore, this work proposed deep ensemble models to improve CXR evaluation, interpretability, and reproducibility. Five convolutional neural networks and six different processed image inputs yielded an AUC of 0.982. Furthermore, ensemble learning could produce more reliable outcomes, as it does not rely on the information of a single method. Moreover, the ensemble strategy balanced the most critical factors from each model to perform a more consistent classification. Finally, class activation and gradient propagation maps allowed local visualization of the CXR regions that most strongly activate neurons in the trained models, practically explaining which areas of the CXR correlate with the model output.
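
A minimal sketch of the ensemble step, assuming each trained CNN exposes a predict_proba-style method and is paired with its own image preprocessing function (all names here are placeholders):

    import numpy as np

    def ensemble_predict(models, preprocessors, image):
        # Soft voting: average the class probabilities of several CNNs, each fed
        # its own processed version of the same chest radiograph.
        probs = [model.predict_proba(prep(image))
                 for model, prep in zip(models, preprocessors)]
        return np.mean(probs, axis=0)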


2004 ◽  
Vol 19 (1) ◽  
pp. 63-72 ◽  
Author(s):  
Koji Tsuda ◽  
Shinsuke Uda ◽  
Taishin Kin ◽  
Kiyoshi Asai

Author(s):  
Paolo Massimo Buscema ◽  
William J Tastle

Data sets collected independently using the same variables can be compared using a new artificial neural network called Artificial Neural Network What If Theory (AWIT). Given a data set that is deemed the standard reference for some object, e.g., a flower, industry, disease, or galaxy, other data sets can be compared against it to identify their proximity to the standard. Thus, data that might not lend themselves well to traditional methods of analysis could yield new perspectives or views of the data and, potentially, new perceptions of novel and innovative solutions. This method comes out of the field of artificial intelligence, particularly artificial neural networks, and utilizes both machine learning and pattern recognition to produce an innovative analysis.


2020 ◽  
Author(s):  
Jan-Hendrik Malles ◽  
Ben Marzeion

Abstract. Negative glacier mass balances in most of Earth's glacierized regions contribute roughly one-quarter to currently observed rates of sea-level rise and have likely contributed an even larger fraction during the 20th century. The distant past and future of glaciers' mass balances, and hence their contribution to sea-level rise, can only be calculated using numerical models. Since, independent of complexity, models always rely on some form of parameterizations and a choice of boundary conditions, a need for optimization arises. In this work, a model for computing monthly mass balances of glaciers on the global scale was forced with nine different data sets of near-surface air temperature and precipitation anomalies, as well as with their mean and median, leading to a total of eleven different forcing data sets. Five global parameters of the model's mass balance equations were varied systematically, within physically plausible ranges, for each forcing data set. We then identified optimal parameter combinations by cross-validating the model results against in-situ mass balance observations, using three criteria: model bias, temporal correlation, and the ratio between the observed and modeled temporal standard deviation of specific mass balances. The goal is to better constrain the glaciers' 20th century sea-level budget contribution and its uncertainty. We find that the disagreement between the different ensemble members is often larger than the uncertainties obtained via cross-validation, particularly in times and places where few or no validation data are available, such as the first half of the 20th century. We show that the reason for this is that the availability of mass balance observations often coincides with less uncertainty in the forcing data, such that the cross-validation procedure does not capture the true out-of-sample uncertainty of the glacier model. Therefore, ensemble spread is introduced as an additional estimate of reconstruction uncertainty, increasing the total uncertainty compared to the model uncertainty obtained in the cross-validation. Our ensemble mean estimate indicates a sea-level contribution by global glaciers (excluding Antarctic periphery) for 1901–2018 of 76.2 ± 5.9 mm sea-level equivalent (SLE), or 0.65 ± 0.05 mm SLE yr−1.

