Maximizing Overall Diversity for Improved Uncertainty Estimates in Deep Ensembles

Siddhartha Jain; Ge Liu; Jonas Mueller; David Gifford

doi:10.1609/aaai.v34i04.5849

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 34 (04), pp. 4264-4271, 2020

Keyword(s): Network Models; Predictive Performance; Training Data; Bayesian Optimization; Neural Network Models; Training Techniques; Uncertainty Estimates; Data Density; Adversarial Training; Diverse Ensemble

The inaccuracy of neural network models on inputs that do not stem from the distribution underlying the training data is problematic and at times unrecognized. Uncertainty estimates of model predictions are often based on the variation in predictions produced by a diverse ensemble of models applied to the same input. Here we describe Maximize Overall Diversity (MOD), an approach to improve ensemble-based uncertainty estimates by encouraging larger overall diversity in ensemble predictions across all possible inputs. We apply MOD to regression tasks including 38 protein-DNA binding datasets, 9 UCI datasets, and the IMDB-Wiki image dataset. We also explore variants that use adversarial training techniques and data density estimation. For out-of-distribution test examples, MOD significantly improves predictive performance and uncertainty calibration without sacrificing performance on test data drawn from the same distribution as the training data. We also find that in Bayesian optimization tasks, MOD uncertainty estimates improve the performance of upper confidence bound (UCB) acquisition.
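
A minimal sketch of the ideas in this abstract, assuming an ensemble of regressors whose predictions are already available as arrays of shape (n_models, n_points). The predictive uncertainty is taken as the standard deviation across members; a MOD-style objective subtracts the ensemble variance on auxiliary inputs (assumed to be sampled across the input domain) from a fitting loss; and UCB scores candidates as mean plus `kappa` times standard deviation. The function names, the mean-squared-error fitting term, `gamma`, and `kappa` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ensemble_mean_std(preds):
    """preds: shape (n_models, n_points), one row per ensemble member."""
    preds = np.asarray(preds, dtype=float)
    return preds.mean(axis=0), preds.std(axis=0)

def mod_style_loss(train_preds, y, aux_preds, gamma=0.1):
    """Hypothetical MOD-style objective: MSE of every member on the training
    data, minus gamma times the ensemble variance on auxiliary inputs, so that
    minimizing it rewards disagreement away from the training data."""
    fit = np.mean((np.asarray(train_preds, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    diversity = np.mean(np.var(np.asarray(aux_preds, dtype=float), axis=0))
    return fit - gamma * diversity

def ucb_scores(preds, kappa=2.0):
    """Upper confidence bound acquisition: ensemble mean + kappa * ensemble std."""
    mean, std = ensemble_mean_std(preds)
    return mean + kappa * std

# Three members scoring three candidate inputs.
preds = [[1.0, 2.0, 0.5], [1.0, 2.2, 1.5], [1.0, 1.8, 2.5]]
best = int(np.argmax(ucb_scores(preds)))  # the candidate the members disagree on most wins here
```

Here the third candidate has a lower mean than the second but far larger spread across members, so UCB prefers it; the quality of that choice depends entirely on how well the spread reflects real uncertainty, which is what MOD aims to improve.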

Analysis of Fin-Tube Evaporator Performance With Limited Experimental Data Using Artificial Neural Networks

2000. doi:10.1115/imece2000-1466

Author(s): Arturo Pacheco-Vega; Mihir Sen; Rodney L. McClain

Keyword(s): Neural Network; Heat Rate; Network Models; Activation Function; Operating Conditions; Training Data; Neural Network Models; The Neural Network; Artificial Neural; Fin Tube

In the current study we consider the problem of accuracy in heat rate estimations from artificial neural network models of heat exchangers used in refrigeration applications. The network is of the feedforward type, with a sigmoid activation function, and is trained with the backpropagation algorithm. Limited experimental measurements from a manufacturer are used to show the capability of the neural network technique in modeling heat transfer in these systems. The results show that a well-trained network correlates the data with errors of the same order as the measurement uncertainty. It is also shown that the number and distribution of the training data are linked to the performance of the network when estimating heat rates under different operating conditions, and that networks trained from few tests may give large errors. A methodology based on cross-validation is presented to find regions where not enough data are available to construct a reliable neural network. The results from three tests show that the proposed methodology gives an upper bound on the estimated error in the heat rates.
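
The network described here can be sketched as a single sigmoid hidden layer with a linear output, trained by full-batch backpropagation on mean squared error. This is a minimal illustration under assumed settings: `hidden`, `lr`, `epochs`, and the synthetic target are hypothetical, and the cross-validation methodology, which would repeat such training on held-out folds, is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_mlp(X, y, hidden=8, lr=0.1, epochs=3000, seed=0):
    """Feedforward net (sigmoid hidden layer, linear output) trained by
    full-batch gradient descent with backpropagated MSE gradients."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    W1 = rng.normal(scale=1.0, size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        err = (h @ W2 + b2) - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: chain rule from the MSE back to each parameter.
        g_out = 2.0 * err / n
        gW2, gb2 = h.T @ g_out, g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * h * (1.0 - h)  # sigmoid'(z) = s(z)(1 - s(z))
        gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
        # In-place gradient-descent updates.
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    def predict(X_new):
        return sigmoid(X_new @ W1 + b1) @ W2 + b2

    return predict, losses

# Synthetic stand-in for (operating conditions -> heat rate) data.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = X[:, 0] - 0.5 * X[:, 1]
predict, losses = train_sigmoid_mlp(X, y)
```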

SOM-based aggregation for graph convolutional neural networks

Neural Computing and Applications, 2020. doi:10.1007/s00521-020-05484-4

Author(s): Luca Pasa; Nicolò Navarin; Alessandro Sperduti

Keyword(s): Neural Network; Network Models; Predictive Performance; Aggregation Operator; Graph Representation; Approximation Properties; Neural Network Models; Self Organizing Maps; Node Level; Real World Datasets

Graph property prediction is becoming more and more popular due to the increasing availability of scientific and social data naturally represented in graph form. Because of this, many researchers are focusing on developing improved graph neural network models. One of the main components of a graph neural network is the aggregation operator, needed to generate a graph-level representation from a set of node-level embeddings. The aggregation operator is critical since it should, in principle, provide a representation of the graph that is isomorphism invariant, i.e. the graph representation should be a function of the graph nodes treated as a set. DeepSets (in: Advances in neural information processing systems, pp 3391–3401, 2017) provides a framework to construct a set-aggregation operator with universal approximation properties. In this paper, we propose a DeepSets aggregation operator, based on Self-Organizing Maps (SOM), to transform a set of node-level representations into a single graph-level one. The adoption of SOMs allows us to compute node representations that embed information about their mutual similarity. Experimental results on several real-world datasets show that our proposed approach achieves improved predictive performance compared to the commonly adopted sum aggregation and many state-of-the-art graph neural network architectures in the literature.
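
A minimal sketch of a DeepSets-style readout rho(sum_i phi(h_i)) in which phi is a SOM-like soft assignment of each node embedding to a set of prototype units. The prototypes are assumed to come from an already trained SOM, and the softmax assignment, `temperature`, and one-layer `rho` are illustrative choices rather than the operator proposed in the paper. Because node contributions are summed, the output does not depend on node order.

```python
import numpy as np

def som_soft_assign(H, prototypes, temperature=1.0):
    """phi: soft-assign each node embedding (row of H) to the SOM units
    via a softmax over negative squared distances to the prototypes."""
    d2 = ((H[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)  # (n_nodes, n_units)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(logits)
    return A / A.sum(axis=1, keepdims=True)

def deepsets_som_readout(H, prototypes, W_rho):
    """Graph-level vector rho(sum_i phi(h_i)); the sum makes it permutation invariant."""
    pooled = som_soft_assign(H, prototypes).sum(axis=0)  # (n_units,)
    return np.tanh(pooled @ W_rho)                       # rho: one dense layer

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))   # 5 nodes, 4-dim node embeddings
P = rng.normal(size=(3, 4))   # 3 hypothetical SOM prototypes
W = rng.normal(size=(3, 2))   # rho weights -> 2-dim graph representation
graph_vec = deepsets_som_readout(H, P, W)
```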

Neural Sign Language Translation Based on Human Keypoint Estimation

Applied Sciences, Vol 9 (13), pp. 2683, 2019. doi:10.3390/app9132683. Cited by ~12

Author(s): Sang-Ki Ko; Chang Jo Kim; Hyedong Jung; Choongsang Cho

Keyword(s): Neural Network; Sign Language; Language Translation; Network Models; Training Data; Translation System; Body Parts; Neural Network Models; Translation Model; Starting Point

We propose a sign language translation system based on human keypoint estimation. It is well known that many problems in computer vision require a massive dataset to train deep neural network models. The situation is even worse for sign language translation, since high-quality training data are far more difficult to collect. In this paper, we introduce the KETI (Korea Electronics Technology Institute) sign language dataset, which consists of 14,672 high-resolution, high-quality videos. Since each country has its own unique sign language, the KETI sign language dataset can be a starting point for further research on Korean sign language translation. Using the KETI sign language dataset, we develop a neural network model that translates sign videos into natural language sentences using human keypoints extracted from the face, hands, and body. The resulting keypoint vector is normalized by the mean and standard deviation of the keypoints and used as input to our translation model, which is based on the sequence-to-sequence architecture. We show that our approach is robust even when the training data are limited. Our translation model achieved 93.28% translation accuracy on the validation set and 55.28% on the test set for 105 sentences that can be used in emergency situations. We compared several variants of our neural sign translation model based on different attention mechanisms, using classical metrics for measuring translation performance.
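
The keypoint normalization step can be sketched as standardizing the coordinates of one frame by their mean and standard deviation. Whether the statistics are computed per frame, per video, or per axis is not stated here; this sketch assumes per-frame, per-axis statistics, and `eps` is an added guard against division by zero.

```python
import numpy as np

def normalize_keypoints(kp, eps=1e-8):
    """Standardize one frame's keypoints.

    kp: array of shape (n_keypoints, 2) holding (x, y) coordinates for the
    face, hand, and body keypoints of a single frame. x and y are
    standardized separately (an assumption)."""
    kp = np.asarray(kp, dtype=float)
    mean = kp.mean(axis=0)
    std = kp.std(axis=0)
    return (kp - mean) / (std + eps)

frame = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]])
normed = normalize_keypoints(frame)
```

A side effect of this choice is that the normalized vector no longer depends on where the signer stands or how large they appear in the frame, which is one plausible reason such normalization helps with limited data.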

On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification

Applied Sciences, Vol 10 (22), pp. 8079, 2020. doi:10.3390/app10228079

Author(s): Sanglee Park; Jungmin So

Keyword(s): Data Augmentation; Black Box; Training Data; Model Parameters; Neural Network Models; Practical Applications; Target Network; Adversarial Examples; Adversarial Training; Adversarial Example

State-of-the-art neural network models are actively used in various fields, but they are well known to be vulnerable to adversarial example attacks. Efforts to make the models robust against such attacks have shown this to be a very difficult task. While many defense approaches have been shown to be ineffective, adversarial training remains one of the promising methods. In adversarial training, the training data are augmented with “adversarial” samples generated by an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know which algorithm the attacker may use. A natural question is: can we use adversarial training to train a model that is robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated from multiple attack methods, the network is still vulnerable to white-box attacks, in which the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which can be a more realistic assumption for practical applications. Experiments with the MNIST dataset show that adversarially training a network with one attack method helps defend against that particular attack method but has limited effect against other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker attacks with one of those methods, the network could lose accuracy to the attack if the attacker uses a different data augmentation strategy on the target network. These results show that it is very difficult to make a network robust using adversarial training, even in black-box settings where the attacker has restricted information about the target network.
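
To make the augmentation step concrete, here is a minimal sketch using the Fast Gradient Sign Method (FGSM) against a linear logistic model with labels in {-1, +1}. FGSM is only one of the attack algorithms such studies consider; the model, `eps`, and the function names are illustrative assumptions.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Per-example logistic loss log(1 + exp(-y * w.x)), labels y in {-1, +1}."""
    return np.log1p(np.exp(-y * (X @ w)))

def fgsm_examples(w, X, y, eps):
    """FGSM: x_adv = x + eps * sign(d loss / d x)."""
    margin = y * (X @ w)
    # d/dx log(1 + exp(-margin)) = -y * sigmoid(-margin) * w
    grad = (-y / (1.0 + np.exp(margin)))[:, None] * w[None, :]
    return X + eps * np.sign(grad)

def adversarially_augment(w, X, y, eps):
    """Append FGSM examples, which keep their original labels, to the training set."""
    X_adv = fgsm_examples(w, X, y, eps)
    return np.vstack([X, X_adv]), np.concatenate([y, y])

w = np.array([1.0, -2.0])                 # current model weights
X = np.array([[0.5, 0.1], [-1.0, 0.3]])
y = np.array([1.0, -1.0])
X_aug, y_aug = adversarially_augment(w, X, y, eps=0.1)
```

For a linear model each FGSM step shrinks the margin by exactly `eps` times the L1 norm of `w`, so the attack reliably raises the loss. A network retrained on `X_aug` learns to resist this particular perturbation, which matches the abstract's finding that protection tends not to transfer to other attack methods.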

Insufficient Data Can Also Rock! Learning to Converse Using Smaller Data with Augmentation

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 33, pp. 6698-6705, 2019. doi:10.1609/aaai.v33i01.33016698

Author(s): Juntao Li; Lisong Qiu; Bo Tang; Dongmin Chen; Dongyan Zhao; ...

Keyword(s): Data Augmentation; Network Models; Generative Models; Training Data; Insufficient Data; Dialogue System; Neural Network Models; Augmentation Techniques; Training Pair; Existing Data

Recent successes in open-domain dialogue generation mainly rely on advances in deep neural networks. The effectiveness of deep neural network models depends on the amount of training data. As it is laborious and expensive to acquire a huge amount of data in most scenarios, how to effectively use existing data is the crux of this issue. In this paper, we use data augmentation techniques to improve the performance of neural dialogue models when data are insufficient. Specifically, we propose a novel generative model to augment existing data, in which a conditional variational autoencoder (CVAE) is employed as the generator to output more training data with diversified expressions. To improve the correlation within each augmented training pair, we design a discriminator with adversarial training to supervise the augmentation process. Moreover, we thoroughly investigate various data augmentation schemes for neural dialogue systems with generative models, both GAN and CVAE. Experimental results on two open corpora, Weibo and Twitter, demonstrate the superiority of our proposed data augmentation model.
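
CVAE generators of this kind are typically trained with the reparameterization trick and a KL term between a recognition network and a prior network over the latent variable. The sketch below shows only those two pieces for diagonal Gaussians; the encoders, decoder, and discriminator described in the abstract are not shown, and the function names are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps, eps ~ N(0, I),
    so gradients can flow through mu and log_var."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def gaussian_kl(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions.
    In a dialogue CVAE, q would be the recognition network (context and response)
    and p the prior network (context only)."""
    var_q, var_p = np.exp(log_var_q), np.exp(log_var_p)
    return 0.5 * np.sum(
        log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
        axis=-1,
    )

rng = np.random.default_rng(0)
z = reparameterize(np.zeros(4), np.zeros(4), rng)  # one latent sample to feed the decoder
```

During augmentation, sampling several `z` for the same context is what lets the decoder emit diversified responses.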

High Predictive Performance of Dynamic Neural Network Models for Forecasting Financial Time Series

International Journal of Advanced Computer Science and Applications, Vol 10 (12), 2019. doi:10.14569/ijacsa.2019.0101289

Author(s): Haya Alaskar

Keyword(s): Neural Network; Time Series; Financial Time Series; Network Models; Predictive Performance; Neural Network Models; Dynamic Neural Network; Financial Time

Forecasting variance of NiftyIT index with RNN and DNN

Journal of Physics: Conference Series, Vol 2161 (1), pp. 012005, 2022. doi:10.1088/1742-6596/2161/1/012005

Author(s): C R Karthik; Raghunandan; B Ashwath Rao; N V Subba Reddy

Keyword(s): Neural Network; Time Series; Time Series Analysis; Short Term Memory; Stock Exchange; Network Models; Training Data; Complex Data; Neural Network Models; Series Analysis

A time series is a sequence of observations ordered in time. The prime objective of time series analysis is to build mathematical models that provide reasonable descriptions of the training data, with the goal of forecasting future values of a series based on the history of the same series. Forecasting stock markets is a challenging problem because of the number of possible variables, as well as the volatile noise, that may contribute to stock prices. However, the ability to analyze stock market trends could be vital to investors, traders and researchers, and hence has been of continued interest. Numerous mathematical and machine learning techniques have been developed for stock analysis and forecasting. In this paper, we perform a comparative study of two capable artificial neural network models, i) a Deep Neural Network (DNN) and ii) a Long Short-Term Memory (LSTM) network, a type of recurrent neural network (RNN), in predicting the daily variance of NIFTYIT in the BSE (Bombay Stock Exchange) and NSE (National Stock Exchange) markets. DNN was chosen for its capability to handle complex data with substantial performance and better generalization without saturating. LSTM was chosen because its internal memory can hold historical patterns, so that each prediction depends on the values that preceded it. With both networks, measures were taken to reduce overfitting. Daily predictions of the NIFTYIT index were made to test the generalizability of the models. Both networks performed well at making daily predictions and generalized well to the NiftyIT data. The LSTM-RNN outperformed the DNN in forecasting and thus holds more potential for longer-term estimates.
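
Forecasting a series from its own history is usually set up by slicing it into fixed-length windows of past values, each paired with the value that follows. A minimal sketch, where `lookback` is an assumed hyperparameter: a DNN can take each window as a flat vector, while an LSTM typically expects a trailing feature axis, e.g. `X[..., None]`.

```python
import numpy as np

def make_windows(series, lookback):
    """Turn a 1-D series into supervised pairs: each input row holds
    `lookback` consecutive past values, and the target is the next value."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback
    if n <= 0:
        raise ValueError("series must be longer than lookback")
    X = np.stack([series[i:i + lookback] for i in range(n)])
    y = series[lookback:]
    return X, y

X, y = make_windows([1, 2, 3, 4, 5], lookback=2)
X_lstm = X[..., None]  # shape (samples, timesteps, features) for a recurrent model
```

Because neighboring windows overlap, the train/test split for such data is normally chronological rather than shuffled, so the models are evaluated only on days after those they were trained on.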

Use of Modular Neural Network for Heart Disease

International Journal of Computer and Communication Technology, pp. 196-201, 2010. doi:10.47893/ijcct.2010.1044

Author(s): Harsh Vazirani; Rahul Kala; Anupam Shukla; Ritu Tiwari

Keyword(s): Neural Network; Heart Disease; Network Models; Training Data; Neural Network Models; Modular Neural Network; Medical Disease; Research Areas; Product Method; Testing Accuracy

The medical field is a very versatile field and one of the research areas of interest to scientists. It deals with many problems related to disease, from diagnosis to prevention and treatment. There are various types of diseases and, accordingly, various types of treatment methods. This paper is mainly concerned with the diagnosis of heart disease. Two main types of diagnosis are used: manual diagnosis, and automatic diagnosis, in which the disease is diagnosed with the help of an intelligent expert system. In this paper, a modular neural network is used to diagnose heart disease. The attributes are divided between two neural network models, a Backpropagation Neural Network (BPNN) and a Radial Basis Function Neural Network (RBFNN), for training and testing. Two integration techniques are used to integrate the results and provide the final training and testing accuracy. The modular neural network with the probabilistic product method gave an accuracy of 87.02% on the training data and 85.88% on the testing data, and with the probabilistic product method gave an accuracy of 89.72% on the training data and 84.70% on the testing data. The modular network was experimentally determined to perform better than monolithic neural networks.
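
One reading of the probabilistic product integration is to multiply the class probabilities produced by each module elementwise and renormalize. This interpretation, the function name, and the example probabilities are assumptions for illustration; the module outputs would come from the BPNN and RBFNN trained on their respective attribute subsets.

```python
import numpy as np

def probabilistic_product(*module_probs):
    """Combine per-module class-probability arrays of shape (n_samples, n_classes)
    by elementwise product, then renormalize each row to sum to 1."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in module_probs])
    combined = np.prod(stacked, axis=0)
    return combined / combined.sum(axis=-1, keepdims=True)

# Hypothetical outputs for two patients, classes = (no disease, disease).
p_bpnn = [[0.7, 0.3], [0.4, 0.6]]
p_rbfnn = [[0.6, 0.4], [0.45, 0.55]]
combined = probabilistic_product(p_bpnn, p_rbfnn)
diagnosis = combined.argmax(axis=1)
```

A product rewards classes that both modules support and lets one confident module outweigh a hesitant one, which differs from simple averaging when the modules disagree.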

Black Box Models and Sociological Explanations: Predicting GPA Using Neural Networks

2017. doi:10.31235/osf.io/7nsrf

Author(s): Thomas Davidson

Keyword(s): Network Models; Predictive Performance; Black Box; Neural Network Models; Grade Point; Predictive Variables; Box Models; Model Finding; Basic Network; Predicting GPA

The Fragile Families Challenge provided an opportunity to empirically assess the applicability of black box machine learning models to sociological questions and the extent to which interpretable explanations can be extracted from these models. In this paper I use neural network models to predict high school grade-point average and examine how variations of basic network parameters affect predictive performance. Using a recently proposed technique, I identify the most important predictive variables used by the best-performing model, finding that they relate to parenting and the child’s cognitive and behavioral development, consistent with prior work. I conclude by discussing the implications of these findings for the relationship between prediction and explanation in sociological analyses.
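
The abstract does not name the variable-importance technique it uses. As a stand-in, permutation importance is a common model-agnostic way to rank predictors: shuffle one input column and measure how much the prediction error grows. This is not necessarily the technique used in the paper; `predict` stands for any fitted model's prediction function.

```python
import numpy as np

def permutation_importance(predict, X, y, seed=0):
    """Increase in mean squared error when each column of X is shuffled,
    which breaks that column's link to the target while keeping its distribution."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        scores.append(np.mean((predict(Xp) - y) ** 2) - base)
    return np.array(scores)

# Toy check: the target depends strongly on column 0, weakly on 1, not at all on 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + 0.5 * X[:, 1]

def predict(A):
    return 3 * A[:, 0] + 0.5 * A[:, 1]

importance = permutation_importance(predict, X, y)
```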

Combining the Performance Strengths of the Logistic Regression and Neural Network Models: A Medical Outcomes Approach

The Scientific World JOURNAL, Vol 3, pp. 455-476, 2003. doi:10.1100/tsw.2003.35. Cited by ~5

Author(s): Wun Wong; Peter J. Fos; Frederick E. Petry

Keyword(s): Neural Network; Logistic Regression; Disease Process; Network Models; Predictive Performance; Medical Outcomes; Neural Network Models; The Neural Network; Combined Use; Logistic Regression Method

The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well as, if not better than, the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method can provide an explanation of the relationship(s) between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression) and a machine learning (i.e., neural network) technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.
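
The explanatory strength of logistic regression comes from its coefficients: exp(beta_j) is the odds ratio for a one-unit increase in predictor j, holding the others fixed. A minimal sketch fits the model by gradient descent on synthetic data and reports odds ratios. The `blend` function, a weighted average of the two models' predicted probabilities, is a hypothetical way to combine them and is not the approach described in the study.

```python
import numpy as np

def fit_logistic_regression(X, y, lr=0.1, epochs=5000):
    """Logistic regression with intercept, fitted by batch gradient descent on mean log-loss."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
        beta -= lr * Xb.T @ (p - y) / len(y)
    return beta

def odds_ratios(beta):
    """exp(coefficient) for each predictor (intercept excluded)."""
    return np.exp(beta[1:])

def blend(p_lr, p_nn, weight=0.5):
    """Hypothetical combination: weighted average of two models' probabilities."""
    return weight * np.asarray(p_lr) + (1.0 - weight) * np.asarray(p_nn)

# Synthetic outcomes: predictor 0 raises the risk, predictor 1 lowers it.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
p_true = 1.0 / (1.0 + np.exp(-(X[:, 0] - X[:, 1])))
y = (rng.uniform(size=500) < p_true).astype(float)
beta = fit_logistic_regression(X, y)
ors = odds_ratios(beta)
```

An odds ratio above 1 means the predictor is associated with higher odds of the outcome; this is the kind of variable-level explanation a neural network does not provide directly.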
