Plug-in, Trainable Gate for Streamlining Arbitrary Neural Networks

Architecture optimization, a technique for finding an efficient neural network that meets given requirements, generally reduces to a set of multiple-choice selection problems among alternative sub-structures or parameters. The discrete nature of these selection problems, however, makes the optimization difficult. To tackle this problem, we introduce the novel concept of a trainable gate function. The trainable gate function, which confers a differentiable property on discrete-valued variables, allows us to directly optimize loss functions that include non-differentiable discrete values such as 0-1 selection. The proposed trainable gate can be applied to pruning: it suffices to append a trainable gate function to each intermediate output tensor and then fine-tune the overall model with any gradient-based training method. The proposed method can thus optimize the selection of pruned channels jointly with the weights of the pruned model. Our experimental results demonstrate that the proposed method efficiently optimizes arbitrary neural networks in various tasks such as image classification, style transfer, optical flow estimation, and neural machine translation.
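The idea of training a 0-1 gate with gradients can be illustrated with a small sketch. This is not the paper's gate function, which the abstract does not specify; it substitutes a straight-through estimator, where the hard step is used in the forward pass and its derivative is treated as 1 in the backward pass. The names `hard_gate` and `train_gates`, the toy loss, and the sparsity weight are all illustrative assumptions, and only the gates are trained here (the abstract describes fine-tuning the weights jointly).

```python
def hard_gate(w):
    """Forward pass: discrete 0-1 selection from a real-valued gate parameter."""
    return 1.0 if w > 0.0 else 0.0


def train_gates(channels, target, sparsity=0.05, lr=0.1, steps=200):
    """Learn which channels to keep so that the gated sum approximates target.

    Toy loss (an assumption for this sketch):
        loss = (sum_i g_i * x_i - target)^2 + sparsity * sum_i g_i
    """
    w = [0.5] * len(channels)  # every gate starts open
    for _ in range(steps):
        g = [hard_gate(wi) for wi in w]
        err = sum(gi * xi for gi, xi in zip(g, channels)) - target
        for i, xi in enumerate(channels):
            # d loss / d g_i, passed straight through to w_i (dg/dw treated as 1).
            grad = 2.0 * err * xi + sparsity
            w[i] -= lr * grad
    return [hard_gate(wi) for wi in w]


# The second channel carries no signal, so only the sparsity term acts on it.
gates = train_gates([1.5, 0.0, 0.6], target=2.2)
print(gates)
```

In this toy setting the gate on the uninformative channel is driven below zero by the sparsity penalty, while the data term keeps the two informative gates open.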

Towards Hallucinating Machines - Designing with Computational Vision

International Journal of Architectural Computing ◽

10.1177/1478077120963366 ◽

2020 ◽

pp. 147807712096336

Author(s):

Matias del Campo ◽

Alexandra Carlson ◽

Sandra Manninger

Keyword(s):

Neural Network ◽

Neural Networks ◽

Architectural Design ◽

Common Denominator ◽

Training Methods ◽

Style Transfer ◽

Network Algorithms ◽

Architectural Styles ◽

Large Databases ◽

The Common

There are particular similarities between how machines learn about the nature of their environment and how humans learn to process visual stimuli. Machine Learning (ML) algorithms, and more specifically Deep Neural Network algorithms, rely on expansive image databases and various training methods (supervised, unsupervised) to “make sense” of the content of an image. Take, for example, how students of architecture learn to differentiate various architectural styles. Whether the task is to differentiate between Gothic, Baroque or Modern Architecture, students are exposed to hundreds, or even thousands, of images of the respective styles while being trained by faculty to tell those styles apart. A reversal of the process, striving to produce imagery instead of reading it and understanding its content, allows machine vision techniques to be utilized as a design methodology that profoundly interrogates aspects of agency and authorship in the presence of Artificial Intelligence in architecture design. This notion forms part of a larger conversation on the nature of human ingenuity operating within a posthuman design ecology. The inherent ability of Neural Networks to process large databases opens up the opportunity to sift through the enormous repositories of imagery generated by the architecture discipline through the ages in order to find novel and bespoke solutions to architectural problems. This article strives to demystify the romantic idea of individual artistic design choices in architecture by providing a glimpse under the hood of the inner workings of Neural Network processes, and thus the extent of their ability to inform architectural design. The approach takes cues from the language and methods employed by experts in Deep Learning, such as Hallucinations, Dreaming, Style Transfer and Vision. The presented approach is the basis for an in-depth exploration of its meaning as a cultural technique within the discipline. Culture, within the scope of this article, pertains to ideas such as the differentiation between symbolic and material cultures, in which symbols are defined as the common denominator of a specific group of people.1 The understanding and exchange of symbolic values is inherently connected to language and code, which ultimately form the ingrained texture of any form of coded environment, including the coded structure of Neural Networks. A first proof-of-concept project was devised by the authors in the form of the Robot Garden. What makes the Robot Garden a distinctively novel project is the move from a purely two-dimensional approach to designing with the aid of Neural Networks to the exploration of 2D to 3D Neural Style Transfer methods in the design process.

Optimizing connection weights in neural networks using hybrid metaheuristics algorithm

International Journal of Information Retrieval Research ◽

10.4018/ijirr.289569 ◽

2022 ◽

Vol 12 (1) ◽

pp. 0-0

Keyword(s):

Neural Networks ◽

Adaptive Learning ◽

Solution Space ◽

Feedforward Neural Networks ◽

Fine Tuning ◽

Training Methods ◽

Control Parameters ◽

Hybrid Metaheuristics ◽

Hybrid Particle Swarm Optimization ◽

The Comparative Study

The learning process of artificial neural networks is an important and complex task in the supervised learning field. The main difficulty in training a neural network is fine-tuning the best set of control parameters, namely the weights and biases. This paper presents a new training method, based on hybrid particle swarm optimization with Multi-Verse Optimization (PMVO), for training feedforward neural networks. The hybrid algorithm is used to search the solution space more effectively and proves efficient at reducing the problem of getting trapped in local minima. The performance of the proposed approach was compared with five evolutionary techniques and with standard backpropagation with momentum and an adaptive learning rate. The comparison was benchmarked and evaluated using six bio-medical datasets. The results of the comparative study show that PMVO outperformed the other training methods on most datasets and can be an alternative to them.
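Training network weights with a population-based search instead of backpropagation can be sketched as follows. This is a minimal sketch using plain particle swarm optimization only; the Multi-Verse Optimization component and the hybridization scheme of PMVO are not described in the abstract and are not reproduced. The network size, the sine-fitting task, and all names and hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def forward(weights, x):
    """Tiny 1-2-1 feedforward net: two tanh hidden units, linear output."""
    w1, b1, w2, b2, v1, v2, c = weights
    h1 = math.tanh(w1 * x + b1)
    h2 = math.tanh(w2 * x + b2)
    return v1 * h1 + v2 * h2 + c


def mse(weights, data):
    """Mean squared error of the network over (x, y) pairs."""
    return sum((forward(weights, x) - y) ** 2 for x, y in data) / len(data)


def pso_train(data, n_particles=20, iters=100, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search the 7-dimensional weight space with a basic particle swarm."""
    rng = random.Random(seed)
    dim = 7
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal bests
    best_val = [mse(p, data) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best
    initial = g_val
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = mse(pos[i], data)
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, initial, g_val


# Toy regression target (an assumption): fit sin(x) on [-3, 3].
data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-30, 31)]
weights, initial_loss, final_loss = pso_train(data)
print(initial_loss, final_loss)
```

Because the global best is only ever replaced by a strictly better candidate, the reported final loss can never exceed the best loss of the initial swarm; how close it gets to a good fit depends on the hyperparameters.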

Monolingual Transfer Learning via Bilingual Translators for Style-Sensitive Paraphrase Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6314 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8042-8049

Author(s):

Tomoyuki Kajiwara ◽

Biwa Miura ◽

Yuki Arase

Keyword(s):

Machine Translation ◽

Transfer Learning ◽

State Of The Art ◽

Experimental Results ◽

Fine Tuning ◽

Training Methods ◽

High Quality ◽

Style Transfer ◽

Parallel Corpus ◽

Paraphrase Generation

We tackle the low-resource problem in style transfer by employing transfer learning that utilizes abundantly available raw corpora. Our method consists of two steps: pre-training learns to generate a grammatical sentence that is semantically equivalent to the input, and fine-tuning learns to add a desired style. Pre-training has two options: auto-encoding and machine-translation-based methods. Pre-training based on an autoencoder is a simple way to learn these properties from a raw corpus. If machine translators are available, the model can learn more diverse paraphrasing via roundtrip translation. After pre-training, fine-tuning achieves high-quality paraphrase generation even when only 1k sentence pairs of parallel corpus for style transfer are available. Experimental results on formality style transfer indicate that both pre-training methods are effective and that the method based on roundtrip translation achieves state-of-the-art performance.

TLBO-FLN: Teaching-Learning Based Optimization of Functional Link Neural Networks for Stock Closing Price Prediction

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327909666191202113015 ◽

2020 ◽

Vol 10 (4) ◽

pp. 522-532 ◽

Cited By ~ 1

Author(s):

Sarat Chandra Nayak ◽

Subhranginee Das ◽

Mohammad Dilsad Ansari

Keyword(s):

Neural Networks ◽

Computational Cost ◽

Optimization Techniques ◽

Fine Tuning ◽

Functional Link ◽

Price Prediction ◽

Closing Price ◽

Teaching Learning Based Optimization ◽

Artificial Neural ◽

Teaching Learning

Background and Objective: Stock closing price prediction is enormously complicated. Artificial Neural Networks (ANN) are excellent approximation algorithms applied to this area. Several nature-inspired evolutionary optimization techniques have been proposed and used in the literature to search for the optimum parameters of ANN-based forecasting models. However, most of them need fine-tuning of several control parameters as well as algorithm-specific parameters to achieve optimal performance. Improper tuning of such parameters leads either to additional computational cost or to local optima. Methods: Teaching Learning Based Optimization (TLBO) is a newly proposed algorithm that does not require any algorithm-specific parameters. The intrinsic capability of the Functional Link Artificial Neural Network (FLANN) to recognize the multifaceted nonlinear relationships present in historical stock data has made it popular and widely applied in stock market prediction. This article presents a hybrid model, termed Teaching Learning Based Optimization of Functional Link Neural Networks (TLBO-FLN), that combines the advantages of both TLBO and FLANN. Results and Conclusion: The model is evaluated by predicting the short-, medium-, and long-term closing prices of four emerging stock markets. The performance of the TLBO-FLN model is measured through Mean Absolute Percentage of Error (MAPE), Average Relative Variance (ARV), and coefficient of determination (R2); it is compared with that of a few other similarly trained state-of-the-art models and found to be superior.
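The three evaluation measures named in the abstract can be sketched as below. The abstract does not give the formulas, so this uses commonly used definitions as an assumption: MAPE as the mean absolute relative error in percent, ARV as the residual sum of squares divided by the sum of squared deviations of the actual values from their mean, and R2 as one minus that ratio. Some papers normalize ARV differently, so the function names and definitions here are illustrative only.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def arv(actual, predicted):
    """Average relative variance: residual error relative to always predicting the mean."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return ss_res / ss_tot


def r2(actual, predicted):
    """Coefficient of determination; under the definitions above it equals 1 - ARV."""
    return 1.0 - arv(actual, predicted)


# Hypothetical closing prices and forecasts, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(mape(actual, predicted), arv(actual, predicted), r2(actual, predicted))
```

Lower MAPE and ARV indicate a better forecast, while R2 closer to 1 indicates a better fit.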

Transformers-sklearn: a toolkit for medical language understanding with transformer-based models

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01459-0 ◽

2021 ◽

Vol 21 (S2) ◽

Author(s):

Feihong Yang ◽

Xuwen Wang ◽

Hetong Ma ◽

Jiao Li

Keyword(s):

Language Processing ◽

Pearson Correlation ◽

Fine Tuning ◽

Entity Recognition ◽

Training Dataset ◽

Training Methods ◽

Code Size ◽

Model Framework ◽

Language Understanding ◽

Medical Language

Background: The Transformer is an attention-based architecture that has proven to be the state-of-the-art model in natural language processing (NLP). To lower the barrier to using transformer-based models in medical language understanding and to extend the scikit-learn toolkit to deep learning, we propose an easy-to-learn Python toolkit named transformers-sklearn. By wrapping the interfaces of transformers in only three functions (i.e., fit, score, and predict), transformers-sklearn combines the advantages of the transformers and scikit-learn toolkits. Methods: In transformers-sklearn, three Python classes were implemented, namely, BERTologyClassifier for the classification task, BERTologyNERClassifier for the named entity recognition (NER) task, and BERTologyRegressor for the regression task. Each class contains three methods, i.e., fit for fine-tuning transformer-based models with the training dataset, score for evaluating the performance of the fine-tuned model, and predict for predicting the labels of the test dataset. transformers-sklearn is a user-friendly toolkit that (1) is customizable via a few parameters (e.g., model_name_or_path and model_type), (2) supports multilingual NLP tasks, and (3) requires less coding. The input data format is automatically generated by transformers-sklearn from the annotated corpus. Newcomers only need to prepare the dataset. The model framework and training methods are predefined in transformers-sklearn. Results: We collected four open-source medical language datasets: TrialClassification for Chinese medical trial text multi-label classification, BC5CDR for English biomedical text named entity recognition, DiabetesNER for Chinese diabetes entity recognition, and BIOSSES for English biomedical sentence similarity estimation. Across the four medical NLP tasks, the average code size of our scripts is 45 lines/task, which is one-sixth the size of transformers’ script. The experimental results show that transformers-sklearn based on pretrained BERT models achieved macro F1 scores of 0.8225, 0.8703 and 0.6908, respectively, on the TrialClassification, BC5CDR and DiabetesNER tasks and a Pearson correlation of 0.8260 on the BIOSSES task, which is consistent with the results of transformers. Conclusions: The proposed toolkit could help newcomers address medical language understanding tasks easily using the scikit-learn coding style. The code and tutorials of transformers-sklearn are available at https://doi.org/10.5281/zenodo.4453803. In future, more medical language understanding tasks will be supported to broaden the applications of transformers-sklearn.

Improving Adversarial Attacks on Deep Neural Networks via Constricted Gradient-based Perturbations

Information Sciences ◽

10.1016/j.ins.2021.04.033 ◽

2021 ◽

Author(s):

Yatie Xiao ◽

Chi-Man Pun

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Gradient Based

Towards pixel-to-pixel deep nucleus detection in microscopy images

BMC Bioinformatics ◽

10.1186/s12859-019-3037-5 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 3

Author(s):

Fuyong Xing ◽

Yuanpu Xie ◽

Xiaoshuang Shi ◽

Pingjun Chen ◽

Zizhao Zhang ◽

...

Keyword(s):

Neural Networks ◽

Large Scale ◽

Deep Neural Networks ◽

Image Data ◽

Fine Tuning ◽

Cell Detection ◽

Imaging Protocol ◽

Microscopy Image ◽

Microscopy Images ◽

Target Data

Background: Nucleus or cell detection is a fundamental task in microscopy image analysis and supports many other quantitative studies such as object counting, segmentation, and tracking. Deep neural networks are emerging as a powerful tool for biomedical image computing; in particular, convolutional neural networks have been widely applied to nucleus/cell detection in microscopy images. However, almost all models are tailored for specific datasets, and their applicability to other microscopy image data remains unknown. Some existing studies casually learn and evaluate deep neural networks on multiple microscopy datasets, but there are still several critical, open questions to be addressed. Results: We analyze the applicability of deep models specifically for nucleus detection across a wide variety of microscopy image data. More specifically, we present a fully convolutional network-based regression model and extensively evaluate it on large-scale digital pathology and microscopy image datasets, which cover 23 organs (or cancer diseases) and come from multiple institutions. We demonstrate that, for a specific target dataset, training with images from the same types of organs is usually necessary for nucleus detection. Although the images can be visually similar due to the same staining technique and imaging protocol, deep models learned with images from different organs might not deliver desirable results and would require fine-tuning to be on a par with those trained with target data. We also observe that training with a mixture of target and other/non-target data does not always yield higher nucleus detection accuracy, and proper data manipulation during model training might be required to achieve good performance. Conclusions: We conduct a systematic case study on deep models for nucleus detection in a wide variety of microscopy images, aiming to address several important but previously understudied questions. We present and extensively evaluate an end-to-end, pixel-to-pixel fully convolutional regression network and report a few significant findings, some of which might not have been reported in previous studies. The model performance analysis and observations should be helpful for nucleus detection in microscopy images.

Comparing gradient based learning methods for optimizing predictive neural networks

2014 Recent Advances in Engineering and Computational Sciences (RAECS) ◽

10.1109/raecs.2014.6799573 ◽

2014 ◽

Cited By ~ 1

Author(s):

Dharminder Kumar ◽

Sangeeta Gupta ◽

Parveen Sehgal

Keyword(s):

Neural Networks ◽

Learning Methods ◽

Gradient Based

Building hydrological single-model ensembles using artificial neural networks and a combinatorial optimization approach

10.5194/egusphere-egu21-8256 ◽

2021 ◽

Author(s):

Juan F. Farfán-Durán ◽

Luis Cea

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Goodness Of Fit ◽

Hydrological Model ◽

Hill Climbing ◽

Single Model ◽

Pearson Coefficient ◽

Gradient Based ◽

Artificial Neural ◽

Model Ensembles

In recent years, the application of model ensembles has received increasing attention in the hydrological modelling community due to the interesting results reported in several studies carried out in different parts of the world. The main idea of these approaches is to combine the results of the same hydrological model or of a number of different hydrological models in order to obtain more robust, better-fitting models while reducing the uncertainty in the predictions. The techniques for combining models range from simple approaches, such as averaging different simulations, to more complex techniques such as least squares, genetic algorithms and, more recently, artificial intelligence techniques such as Artificial Neural Networks (ANN). Despite the good results that model ensembles are able to provide, the models selected to build the ensemble have a direct influence on the results. Contrary to intuition, it has been reported that the best-fitting single models do not necessarily produce the best ensemble. Instead, better results can be obtained with ensembles that incorporate models with moderate goodness of fit. This implies that the selection of the single models might need a random component in order to maximize the results that ensemble approaches can provide. The present study is carried out using hydrological data on an hourly scale between 2008 and 2015 corresponding to the Mandeo basin, located in the northwest of Spain. In order to obtain 1000 single models, a hydrological model was run using 1000 sets of parameters sampled randomly in their feasible space. We then classified the models into 3 groups with the following characteristics: 1) the 25 single models with the highest Nash-Sutcliffe coefficient, 2) the 25 single models with the highest Pearson coefficient, and 3) the complete group of 1000 single models. The ensemble models are built with 5 models as the input of an ANN and the observed series as the output. Then, we applied the Random-Restart Hill-Climbing (RRHC) algorithm, choosing 5 random models in each iteration to re-train the ANN in order to identify a better ensemble. The algorithm is applied to build 50 ensembles in each group of models. Finally, the results are compared to those obtained by optimizing the model using a gradient-based method, by means of the following goodness-of-fit measures: Nash-Sutcliffe (NSE) coefficient, Nash-Sutcliffe adapted for high flows (HF−NSE), Nash-Sutcliffe adapted for low flows (LF−W NSE) and coefficient of determination (R2). The results show that the RRHC algorithm can identify adequate ensembles. The ensembles built using the group of models selected based on the NSE outperformed the model optimized by the gradient method in 64 % of the cases in at least 3 of 4 coefficients, both in the calibration and validation stages. They were followed by the ensembles built with the group of models selected based on the Pearson coefficient, with 56 %. In the case of the third group, no ensembles were identified that outperformed the gradient-based method. However, most of the ensembles outperformed the 1000 individual models. Keywords: Multi-model ensemble; Single-model ensemble; Artificial Neural Networks; Hydrological Model; Random-restart Hill-climbing
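The ensemble search described above can be sketched in simplified form. This is a sketch under stated assumptions, not the study's procedure: the ANN combiner is replaced by a plain average of the selected model outputs, the neighbourhood move (swapping one member for a model outside the ensemble) is an assumed design, and the data are synthetic. The Nash-Sutcliffe efficiency follows its standard definition; the names `ensemble_mean` and `rrhc` are illustrative.

```python
import random


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def ensemble_mean(models, members):
    """Combine the selected model series by plain averaging (stand-in for the ANN)."""
    n = len(models[0])
    return [sum(models[m][t] for m in members) / len(members) for t in range(n)]


def rrhc(models, observed, size=5, restarts=10, steps=50, seed=0):
    """Random-restart hill climbing over subsets of `size` models."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = rng.sample(range(len(models)), size)  # random restart
        score = nse(observed, ensemble_mean(models, current))
        for _ in range(steps):
            # Neighbour: swap one member for a model outside the ensemble.
            candidate = current[:]
            outside = [m for m in range(len(models)) if m not in current]
            candidate[rng.randrange(size)] = rng.choice(outside)
            cand_score = nse(observed, ensemble_mean(models, candidate))
            if cand_score > score:  # accept only improvements
                current, score = candidate, cand_score
        if score > best_score:
            best, best_score = sorted(current), score
    return best, best_score


# Synthetic data (an assumption): a daily-cycle series and 40 noisy "single models".
rng = random.Random(1)
observed = [10 + 5 * ((t % 24) / 24.0) for t in range(96)]
models = [[o * rng.uniform(0.7, 1.3) + rng.gauss(0, 1) for o in observed] for _ in range(40)]
members, score = rrhc(models, observed)
print(members, score)
```

Restarting from several random subsets reduces the chance that a single greedy climb stalls in a poor local optimum, which is the motivation for the random-restart variant.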

True Gradient-Based Training of Deep Binary Activated Neural Networks Via Continuous Binarization

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8461456 ◽

2018 ◽

Cited By ~ 3

Author(s):

Charbel Sakr ◽

Jungwook Choi ◽

Zhuo Wang ◽

Kailash Gopalakrishnan ◽

Naresh Shanbhag

Keyword(s):

Neural Networks ◽

Gradient Based
