Accelerating and Improving AlphaZero Using Population Based Training

AlphaZero has been very successful in many games. Unfortunately, it still consumes a huge amount of computing resources, the majority of which is spent in self-play. Hyperparameter tuning exacerbates the training cost, since each hyperparameter configuration requires its own training run, during which it generates its own self-play records. As a result, multiple runs are usually needed for different hyperparameter configurations. This paper proposes using population based training (PBT) to tune hyperparameters dynamically and improve strength during training. Another significant advantage is that this method requires only a single run while incurring a small additional time cost: the time for generating self-play records remains unchanged, although the time for optimization increases, following the AlphaZero training algorithm. In our experiments on 9x9 Go, the PBT method achieves a higher win rate than the baselines, each of which has its own hyperparameter configuration and is trained individually. For 19x19 Go, PBT also yields improvements in playing strength. Specifically, the PBT agent obtains up to a 74% win rate against ELF OpenGo, an open-source state-of-the-art AlphaZero program using a neural network of comparable capacity. By comparison, a saturated non-PBT agent achieves a win rate of 47% against ELF OpenGo under the same circumstances.
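
The exploit-and-explore step commonly associated with population based training can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the `Member` structure, the truncation fraction of 0.25, and the perturbation factors 0.8 and 1.2 are all hypothetical choices, and the fraction is assumed to stay at or below 0.5 so the top and bottom groups do not overlap.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    """One agent in the population (hypothetical structure)."""
    weights: list        # stand-in for the network parameters
    hyperparams: dict    # e.g. {"learning_rate": 0.01}
    score: float = 0.0   # e.g. win rate from evaluation games


def exploit_and_explore(population, rng, fraction=0.25, factors=(0.8, 1.2)):
    """Replace the weakest members with perturbed copies of the strongest."""
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    cut = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:cut], ranked[-cut:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: copy the parameters of a strong member.
        loser.weights = list(winner.weights)
        # Explore: perturb each copied hyperparameter by a random factor.
        loser.hyperparams = {
            name: value * rng.choice(factors)
            for name, value in winner.hyperparams.items()
        }
    return population
```

In a training loop, a step like this would run between optimization rounds, so that all members share one pool of self-play records while their hyperparameters drift toward better-performing values.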

Knowledge Transfer for Out-of-Knowledge-Base Entities : A Graph Neural Network Approach

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/250 ◽

2017 ◽

Cited By ~ 26

Author(s):

Takuo Hamaguchi ◽

Hidekazu Oiwa ◽

Masashi Shimbo ◽

Yuji Matsumoto

Keyword(s):

Neural Network ◽

Knowledge Base ◽

State Of The Art ◽

Test Time ◽

Network Approach ◽

Missing Information ◽

Neural Network Approach ◽

Training Time ◽

Proposed Model ◽

Graph Neural Networks

Knowledge base completion (KBC) aims to predict missing information in a knowledge base. In this paper, we address the out-of-knowledge-base (OOKB) entity problem in KBC: how to answer queries concerning test entities not observed at training time. Existing embedding-based KBC models assume that all test entities are available at training time, making it unclear how to obtain embeddings for new entities without costly retraining. To solve the OOKB entity problem without retraining, we use graph neural networks (Graph-NNs) to compute the embeddings of OOKB entities, exploiting the limited auxiliary knowledge provided at test time. The experimental results show the effectiveness of our proposed model in the OOKB setting. Additionally, in the standard KBC setting in which OOKB entities are not involved, our model achieves state-of-the-art performance on the WordNet dataset.
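
The idea of computing an embedding for an unseen entity from its known neighbours can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name, the per-dimension scaling used as a relation-specific transform, and the mean pooling are simplifications, not the architecture described in the paper.

```python
def embed_unseen_entity(neighbors, entity_vecs, relation_scales):
    """Mean-pool relation-transformed embeddings of known neighbours.

    neighbors: list of (relation, known_entity) pairs given at test time.
    entity_vecs: dict mapping known entities to trained embedding vectors.
    relation_scales: dict mapping relations to per-dimension scale vectors
        (a hypothetical, deliberately simple relation-specific transform).
    """
    if not neighbors:
        raise ValueError("an unseen entity needs at least one known neighbour")
    dim = len(next(iter(entity_vecs.values())))
    total = [0.0] * dim
    for relation, entity in neighbors:
        vec = entity_vecs[entity]
        scale = relation_scales[relation]
        for i in range(dim):
            total[i] += scale[i] * vec[i]
    # Pooling step: average the transformed neighbour vectors.
    return [x / len(neighbors) for x in total]
```

Because the result depends only on already-trained vectors and the auxiliary triples supplied at test time, no retraining is needed to obtain it.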

RADIAL BASIS PROBABILISTIC NEURAL NETWORKS: MODEL AND APPLICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001499000604 ◽

1999 ◽

Vol 13 (07) ◽

pp. 1083-1101 ◽

Cited By ~ 299

Author(s):

DE-SHUANG HUANG

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Model ◽

Probabilistic Neural Network ◽

Radial Basis Function Networks ◽

Probabilistic Neural Networks ◽

Huge Amount ◽

Training Time ◽

One Dimensional ◽

Radial Basis

This paper investigates the capabilities of radial basis function networks (RBFN) and kernel neural networks (KNN), i.e. a specific type of probabilistic neural network (PNN), and studies their similarities and differences. In order to avoid the huge number of hidden units in KNNs (or PNNs) and to reduce the training time of RBFNs, this paper proposes a new feedforward neural network model referred to as the radial basis probabilistic neural network (RBPNN). This new network model inherits the merits of the two older models to a great extent and avoids some of their defects. Finally, we apply the new RBPNN to the recognition of one-dimensional cross-images of radar targets (five kinds of aircraft), and the experimental results are given and discussed.
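
To show why a plain PNN needs so many hidden units, here is a minimal sketch of a Gaussian-kernel PNN classifier in which every stored training sample acts as one hidden unit. It illustrates the standard PNN idea only, not the RBPNN proposed in the paper; the function name and the default `sigma` are assumptions.

```python
import math


def pnn_classify(x, samples, labels, sigma=1.0):
    """Classify x with a Gaussian-kernel probabilistic neural network."""
    scores = {}
    for sample, label in zip(samples, labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, sample))
        # One hidden unit per stored training sample.
        activation = math.exp(-sq_dist / (2.0 * sigma ** 2))
        scores.setdefault(label, []).append(activation)
    # Average per class so that class sizes do not bias the decision.
    averaged = {label: sum(v) / len(v) for label, v in scores.items()}
    return max(averaged, key=averaged.get)
```

The hidden layer grows linearly with the training set, which is the cost the abstract refers to when it mentions the huge number of hidden units.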

Gated Graph Attention Network for Cancer Prediction

Sensors ◽

10.3390/s21061938 ◽

2021 ◽

Vol 21 (6) ◽

pp. 1938

Author(s):

Linling Qiu ◽

Han Li ◽

Meihong Wang ◽

Xiaoli Wang

Keyword(s):

Neural Network ◽

Prediction Accuracy ◽

State Of The Art ◽

Network Models ◽

The State ◽

Neural Network Models ◽

Attention Network ◽

Training Time ◽

Cancer Prediction ◽

Gating Mechanism

With its increasing incidence, cancer has become one of the main causes of worldwide mortality. In this work, we propose a novel attention-based neural network model named Gated Graph ATtention network (GGAT) for cancer prediction, in which a gating mechanism (GM) is introduced to work with the attention mechanism (AM) and overcome previous work's limitation to 1-hop neighbourhood reasoning. In this way, GGAT can fully mine the potential correlations between related samples, which helps improve cancer prediction accuracy. Additionally, to simplify the datasets, we propose a hybrid feature selection algorithm that strictly selects gene features, which significantly reduces training time without affecting prediction accuracy. To the best of our knowledge, our proposed GGAT achieves state-of-the-art results on the cancer prediction task on LIHC, LUAD, and KIRC compared to other traditional machine learning methods and neural network models, and improves accuracy by 1% to 2% on the Cora dataset compared to state-of-the-art graph neural network methods.

Deep Recurrent Quantization for Generating Sequential Binary Codes

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/128 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jingkuan Song ◽

Xiaosu Zhu ◽

Lianli Gao ◽

Xin-Shun Xu ◽

Wu Liu ◽

...

Keyword(s):

State Of The Art ◽

Binary Codes ◽

Time Cost ◽

Code Length ◽

Training Time ◽

Retrieval Accuracy ◽

Effective Technology ◽

Benchmark Datasets ◽

Search Speed ◽

And Training

Quantization has been an effective technology in ANN (approximate nearest neighbour) search due to its high accuracy and fast search speed. To meet the requirements of different applications, there is always a trade-off between retrieval accuracy and speed, reflected by variable code lengths. However, to encode the dataset into different code lengths, existing methods need to train several models, where each model can only produce a specific code length. This incurs a considerable training time cost and largely reduces the flexibility of quantization methods to be deployed in real applications. To address this issue, we propose a Deep Recurrent Quantization (DRQ) architecture which can generate sequential binary codes. To this end, once the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations. A shared codebook and a scalar factor are designed to be the learnable weights in the deep recurrent quantization block, and the whole framework can be trained in an end-to-end manner. As far as we know, this is the first quantization method that can be trained once and generate sequential binary codes. Experimental results on the benchmark datasets show that our model achieves comparable or even better performance compared with the state of the art for image retrieval, while requiring significantly fewer parameters and less training time. Our code is published online: https://github.com/cfm-uestc/DRQ.
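
One way to picture how the number of iterations controls code length is greedy residual quantization with a single shared codebook that is rescaled at each step. The sketch below is an illustrative stand-in, not the DRQ architecture: the codebook and the scale factor are fixed here rather than learned, and the `encode` and `decode` names are hypothetical.

```python
def encode(vector, codebook, scale, iterations):
    """Greedy residual quantization with one shared, rescaled codebook."""
    residual = list(vector)
    codes = []
    for step in range(iterations):
        factor = scale ** step
        # Pick the scaled codeword closest to the current residual.
        best = min(
            range(len(codebook)),
            key=lambda k: sum(
                (r - factor * c) ** 2 for r, c in zip(residual, codebook[k])
            ),
        )
        codes.append(best)
        residual = [r - factor * c for r, c in zip(residual, codebook[best])]
    return codes


def decode(codes, codebook, scale):
    """Rebuild an approximation from any prefix of the code sequence."""
    out = [0.0] * len(codebook[0])
    for step, k in enumerate(codes):
        factor = scale ** step
        out = [o + factor * c for o, c in zip(out, codebook[k])]
    return out
```

Because each step only refines the previous residual, a shorter code is simply a prefix of a longer one, so a single encoder can serve several code lengths.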

ARTIFICIAL METAPLASTICITY NEURAL NETWORK APPLIED TO CREDIT SCORING

International Journal of Neural Systems ◽

10.1142/s0129065711002857 ◽

2011 ◽

Vol 21 (04) ◽

pp. 311-317 ◽

Cited By ~ 28

Author(s):

ALEXIS MARCANO-CEDEÑO ◽

A. MARIN-DE-LA-BARCENA ◽

J. JIMENEZ-TRILLO ◽

J. A. PIÑUELA ◽

D. ANDINA

Keyword(s):

Neural Network ◽

Financial Institutions ◽

State Of The Art ◽

Credit Scoring ◽

Error Rates ◽

Classification Algorithms ◽

Data Sets ◽

Training Algorithm ◽

Low Probability ◽

German Data

The assessment of the risk of default on credit is important for financial institutions. Different Artificial Neural Networks (ANN) have been suggested to tackle the credit scoring problem; however, the obtained error rates are often high. In the search for the best ANN algorithm for credit scoring, this paper contributes the application of an ANN training algorithm inspired by the neurons' biological property of metaplasticity. This algorithm is especially efficient when few patterns of a class are available, or when information inherent to low-probability events is crucial for a successful application, as weight updating is emphasized more for the less frequent activations than for the more frequent ones. Two well-known and readily available data sets, the Australian and German data sets, have been used to test the algorithm. The results obtained by AMMLP are superior to those of state-of-the-art classification algorithms in credit scoring.

LightOCT: Exploring the depth for Retinal disease detection

10.1101/2021.11.16.21266390 ◽

2021 ◽

Author(s):

Amandeep Kaur ◽

Vinayak Singh ◽

Gargi Chakraverty

Keyword(s):

Neural Network ◽

Network Architecture ◽

State Of The Art ◽

Computational Cost ◽

Retinal Disease ◽

Retinal Damage ◽

Training Time ◽

Deep Model ◽

Precise Diagnosis ◽

Future Work

With advances in technology and computational capabilities, identifying retinal damage through state-of-the-art CNN architectures has led to speedy and precise diagnosis, thus inhibiting further disease development. In this study, we focus on the classification of retinal damage by detecting choroidal neovascularization (CNV), diabetic macular edema (DME), DRUSEN, and NORMAL in optical coherence tomography (OCT) images. The emphasis of our experiment is to investigate the component of depth in the neural network architecture. We introduce a shallow convolutional neural network, LightOCT, which outperforms the other deep model configurations, with the lowest value of LVCEL and the highest accuracy (+98% in each class). Next, we experimented to find the best-fit optimizer for LightOCT. The results showed that the combination of LightOCT and Adam gave the best results. Finally, we compare our approach with transfer learning models: LightOCT outperforms the state-of-the-art models in terms of computational cost and training time, and gives comparable accuracy. We will direct our future work toward improving the accuracy metrics of shallow models so that the trade-off between training time and accuracy is reduced.

AN EFFICIENT FUZZY NEURAL NETWORK TRAINING MODEL FOR SUPERVISED PATTERN CLASSIFICATION SYSTEM

JOURNAL OF ADVANCES IN CHEMISTRY ◽

10.24297/jac.v12i11.822 ◽

2016 ◽

Vol 12 (11) ◽

pp. 4488-4499

Author(s):

Manjula Devi ◽

S.J. Suji Prasad ◽

Sagana C

Keyword(s):

Neural Network ◽

Training Model ◽

Classification Problem ◽

Training Algorithm ◽

Accuracy Rate ◽

Training Time ◽

Fast Training ◽

Fuzzy Neural ◽

Benchmark Datasets ◽

Hidden Layer

Among the existing NN architectures, the Multilayer Feedforward Neural Network (MFNN) with a single hidden layer has been studied thoroughly and is regarded as the best for solving nonlinear classification problems. Training an MFNN on very large datasets, however, consumes a great deal of time. In order to reduce the training time, a simple and fast training algorithm called the Exponential Adaptive Skipping Training (EAST) algorithm was presented, which improves the training speed by significantly reducing the total number of training input samples consumed by the MFNN at every epoch. Although EAST trains faster, its accuracy rate suffers because of the high skipping factor. In order to improve the accuracy rate of the training algorithm, a hybrid system has been suggested in which the neural network is trained with fuzzified data. In this paper, a z-Score Fuzzy Exponential Adaptive Skipping Training (z-FEAST) algorithm is proposed, which is based on the fuzzification of EAST. The proposed z-FEAST algorithm is evaluated on the benchmark datasets Iris, Waveform, Heart Disease, and Breast Cancer for different learning rates. The simulation study showed that the z-FEAST training algorithm improves the accuracy rate.

UNIQ

ACM Transactions on Computer Systems ◽

10.1145/3444943 ◽

2021 ◽

Vol 37 (1-4) ◽

pp. 1-15

Author(s):

Chaim Baskin ◽

Natan Liss ◽

Eli Schwartz ◽

Evgenii Zheltonozhskii ◽

Raja Giryes ◽

...

Keyword(s):

Neural Network ◽

State Of The Art ◽

Low Complexity ◽

High Accuracy ◽

Trade Off ◽

Training Time ◽

Uniform Quantization ◽

Novel Method ◽

A Minor ◽

Prior State

We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise to the weights at training time. As a by-product of injecting noise to weights, we find that activations can also be quantized to as low as 8-bit with only a minor accuracy degradation. Our non-uniform quantization approach provides a novel alternative to the existing uniform quantization techniques for neural networks. We further propose a novel complexity metric of number of bit operations performed (BOPs), and we show that this metric has a linear relation with logic utilization and power. We suggest evaluating the trade-off of accuracy vs. complexity (BOPs). The proposed method, when evaluated on ResNet18/34/50 and MobileNet on ImageNet, outperforms the prior state of the art both in the low-complexity regime and the high-accuracy regime. We demonstrate the practical applicability of this approach by implementing our non-uniformly quantized CNN on FPGA.
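
A k-quantile quantizer places its bin boundaries at quantiles of the weight distribution, so that each bin holds roughly the same number of weights. The sketch below illustrates only that idea using the Python standard library; it does not reproduce UNIQ's noise-injection training, and the function name and the choice of the bin mean as the representative level are assumptions.

```python
import bisect
import statistics


def k_quantile_quantize(weights, k):
    """Quantize weights into k levels whose bins hold roughly equal counts."""
    # k - 1 cut points that split the empirical distribution into k bins.
    cuts = statistics.quantiles(weights, n=k)
    bins = [[] for _ in range(k)]
    for w in weights:
        bins[bisect.bisect_right(cuts, w)].append(w)
    # Represent each non-empty bin by the mean of its members.
    levels = [statistics.fmean(b) if b else None for b in bins]
    return [levels[bisect.bisect_right(cuts, w)] for w in weights]
```

Unlike a uniform quantizer, the spacing between levels adapts to where the weights are concentrated, which is what makes the scheme non-uniform.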

Effect of Architectural Composition of MLP ANN in Neural Network Learning for Signal Power Loss Prediction

Journal of Communications ◽

10.12720/jcm.16.1.20-29 ◽

2021 ◽

pp. 20-29

Author(s):

Virginia C. Ebhota ◽

Viranjay M. Srivastava

Keyword(s):

Neural Network ◽

Power Loss ◽

Prediction Errors ◽

Signal Power ◽

Training Algorithm ◽

Ann Model ◽

Training Time ◽

Mlp Neural Network ◽

Network Training ◽

Target Values

This work analyzes the architectural complexity of a Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN) model suitable for modeling and predicting signal power loss in micro-cellular environments. MLP neural network models with one, two, and three hidden layers were trained using measurement datasets, used as the target values, collected from a micro-cell environment suitable for describing different propagation paths and conditions. The neural network training was performed by applying different training techniques to ensure a well-trained network with good generalization and to avoid over-fitting during network training. The Bayesian regularization algorithm (which updates weights and biases during network training), following the Levenberg-Marquardt optimization training algorithm, was used as the training algorithm. A comparative analysis of training results from MLP neural networks with one, two, and three hidden layers shows that the best prediction of the signal power loss is obtained with one hidden layer. A more complex architectural composition of the MLP neural network involved very high training time and higher prediction errors.

Pruned Cascade Neural Network Image Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f2929.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 6454-6457

Keyword(s):

Neural Network ◽

Image Classification ◽

Learning Algorithm ◽

Classification Problem ◽

Training Algorithm ◽

Training Time ◽

Neural Network Learning ◽

Filter Size ◽

Long Time ◽

Cascade Neural Network

In this paper we propose a new deep neural network model for building deeper networks. The convolutional neural network is one of the leading approaches to the image classification problem. The vanishing gradient problem requires us to use a small learning rate with gradient descent, which needs many small steps to converge and takes a long time to proceed. By using a GPU we can process more than one dataset (CIFAR-100) in a particular session. We overcome the vanishing gradient problem by using the pruned cascade correlation neural network learning algorithm, compared with deep cascade learning in a CNN architecture. We improve the filter size to reduce the problem, using a training algorithm that trains the network with a bottom-to-top approach and attains better image classification in GoogLeNet. We reduce the time complexity (training time), and storage capacity can be reduced by using a pre-training algorithm.
