Regularized Instance Embedding for Deep Multi-Instance Learning

2019 ◽  
Vol 10 (1) ◽  
pp. 64
Author(s):  
Yi Lin ◽  
Honggang Zhang

In the era of Big Data, multi-instance learning, as a weakly supervised learning framework, has various applications since it helps reduce the cost of the data-labeling process. Due to this weakly supervised setting, learning effective instance representations/embeddings is challenging. To address this issue, we propose an instance-embedding regularizer that can boost the performance of both instance- and bag-embedding learning in a unified fashion. Specifically, the crux of the instance-embedding regularizer is to maximize the correlation between instance-embedding similarities and the underlying instance-label similarities. The embedding-learning framework was implemented using a neural network and optimized in an end-to-end manner using stochastic gradient descent. In experiments, various applications were studied, and the results show that the proposed instance-embedding-regularization method is highly effective, achieving state-of-the-art performance.
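
A minimal sketch of such a regularizer, assuming a correlation-based formulation: it computes the Pearson correlation between pairwise embedding similarities and pairwise label agreement, returned negated so that minimizing it maximizes the correlation. The exact loss in the paper may differ; `embeddings` and `labels` are illustrative names.

```python
import torch
import torch.nn.functional as F

def instance_embedding_regularizer(embeddings: torch.Tensor,
                                   labels: torch.Tensor) -> torch.Tensor:
    """embeddings: (n, d) instance embeddings; labels: (n,) instance labels."""
    z = F.normalize(embeddings, dim=1)
    emb_sim = z @ z.t()                                     # cosine similarities, (n, n)
    lab_sim = (labels[:, None] == labels[None, :]).float()  # label agreement, (n, n)
    # Pearson correlation between the two flattened similarity maps.
    x = emb_sim.flatten() - emb_sim.mean()
    y = lab_sim.flatten() - lab_sim.mean()
    corr = (x * y).sum() / (x.norm() * y.norm() + 1e-8)
    return -corr  # add to the task loss; minimizing maximizes the correlation
```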

Author(s):  
Jiajie Wang ◽  
Jiangchao Yao ◽  
Ya Zhang ◽  
Rui Zhang

Weakly supervised object detection has recently received much attention, since it requires only image-level labels instead of the bounding-box labels consumed in strongly supervised learning. Nevertheless, the savings in labeling expense usually come at the cost of model accuracy. In this paper, we propose a simple but effective weakly supervised collaborative learning framework to resolve this problem, which trains a weakly supervised learner and a strongly supervised learner jointly by enforcing partial feature sharing and prediction consistency. For object detection, taking a WSDDN-like architecture as the weakly supervised detector sub-network and a Faster-RCNN-like architecture as the strongly supervised detector sub-network, we propose an end-to-end Weakly Supervised Collaborative Detection Network. As no strong supervision is available to train the Faster-RCNN-like sub-network, a new prediction consistency loss is defined to enforce consistency of predictions between the two sub-networks as well as within the Faster-RCNN-like sub-network. At the same time, the two detectors are designed to partially share features to further guarantee model consistency at the perceptual level. Extensive experiments on the PASCAL VOC 2007 and 2012 datasets demonstrate the effectiveness of the proposed framework.
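
A sketch of the cross-branch prediction-consistency term, under the simplifying assumption that both sub-networks score the same set of proposals; the paper's loss additionally enforces consistency within the Faster-RCNN-like branch, which is omitted here.

```python
import torch
import torch.nn.functional as F

def prediction_consistency_loss(weak_logits: torch.Tensor,
                                strong_logits: torch.Tensor) -> torch.Tensor:
    """weak_logits, strong_logits: (num_proposals, num_classes) scores
    from the WSDDN-like and Faster-RCNN-like sub-networks, respectively."""
    log_p_weak = F.log_softmax(weak_logits, dim=1)
    p_strong = F.softmax(strong_logits, dim=1)
    # KL(strong || weak): push the two branches toward consistent predictions.
    return F.kl_div(log_p_weak, p_strong, reduction="batchmean")
```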


2020 ◽  
Vol 8 (5) ◽  
pp. 1277-1284

Cardiovascular disease is the leading cause of death in the world. Arrhythmia is a type of cardiovascular disease that is hard to detect except through routine electrocardiogram (ECG) recording. Because of the variety and noise in ECG signals, detection by experts using the naked eye alone is very time consuming. Building on previous research in order to help the experts, this study develops an 11-layer two-dimensional Convolutional Neural Network (2D CNN) using the MIT-BIH Arrhythmia Dataset. The dataset is first preprocessed with a wavelet transform and then segmented using the R-peak method. The challenge is to overcome the class imbalance and the small amount of data while still achieving optimal accuracy. This research can help doctors determine a patient's type of arrhythmia. To that end, this study compares various optimizers attached to the 2D CNN, namely AdaBound, Adadelta, Adagrad, AMSBound, Adam, and Stochastic Gradient Descent (SGD). AdaBound achieved the highest performance, with 91% accuracy, and trained about 1 s faster per epoch than Adam, which took approximately 18 s per epoch.
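
A sketch of the optimizer-comparison loop, assuming a PyTorch setup: the same CNN is rebuilt and retrained once per optimizer while validation accuracy and per-epoch time are recorded. `build_cnn`, `train_one_epoch`, and `evaluate` are assumed helpers standing in for the paper's 11-layer network and training code; AdaBound/AMSBound come from the third-party `adabound` package.

```python
import time
import torch
import adabound  # pip install adabound

def compare_optimizers(build_cnn, train_one_epoch, evaluate, epochs=30):
    results = {}
    for name in ["AdaBound", "AMSBound", "Adadelta", "Adagrad", "Adam", "SGD"]:
        model = build_cnn()
        if name == "AdaBound":
            opt = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)
        elif name == "AMSBound":
            opt = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1,
                                    amsbound=True)
        elif name == "Adadelta":
            opt = torch.optim.Adadelta(model.parameters())
        elif name == "Adagrad":
            opt = torch.optim.Adagrad(model.parameters())
        elif name == "Adam":
            opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        else:
            opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
        start = time.time()
        for _ in range(epochs):
            train_one_epoch(model, opt)
        results[name] = {"accuracy": evaluate(model),
                         "sec_per_epoch": (time.time() - start) / epochs}
    return results
```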


Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 213 ◽  
Author(s):  
Yiğit Uğur ◽  
George Arvanitakis ◽  
Abdellatif Zaidi

In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck and the Gaussian mixture model. Specifically, in our approach, we use the variational information bottleneck method and model the latent space as a mixture of Gaussians. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO) and provide a variational-inference-type algorithm for computing it. In the algorithm, the coders' mappings are parametrized using neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets are provided to demonstrate the effectiveness of our method.
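
A minimal sketch of such a bound, assuming a Gaussian encoder q(z|x) and a Gaussian-mixture latent prior: the KL term has no closed form against a mixture, so it is approximated with a single reparameterized sample, mirroring the sampling-based approximation described above. `encoder` and `decoder` are assumed networks; the paper's exact parametrization may differ.

```python
import torch
import torch.nn.functional as F
from torch.distributions import (Categorical, Independent, MixtureSameFamily,
                                 Normal)

def vib_gmm_bound(x, encoder, decoder, mix_logits, mix_mu, mix_logvar):
    """mix_logits: (K,); mix_mu, mix_logvar: (K, d) mixture parameters."""
    mu, logvar = encoder(x)                       # q(z|x) parameters, (B, d)
    q = Independent(Normal(mu, (0.5 * logvar).exp()), 1)
    z = q.rsample()                               # reparameterized sample
    rec = -F.mse_loss(decoder(z), x, reduction="none").flatten(1).sum(1)
    prior = MixtureSameFamily(
        Categorical(logits=mix_logits),
        Independent(Normal(mix_mu, (0.5 * mix_logvar).exp()), 1))
    kl = q.log_prob(z) - prior.log_prob(z)        # one-sample MC estimate
    return (rec - kl).mean()                      # maximize this bound
```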


Author(s):  
Soniya ◽  
Sandeep Paul ◽  
Lotika Singh

This paper applies a hybrid evolutionary approach to convolutional neural networks (CNNs), determining the number of layers and filters based on the application and user need. It integrates a compact genetic algorithm with stochastic gradient descent (SGD) to simultaneously evolve the structure and parameters of the CNN. It defines an effective string representation for combining the structure and parameters of the CNN. The compact genetic algorithm drives the evolution of the network structure by optimizing the number of convolutional layers and the number of filters in each convolutional layer. At the same time, an optimal set of network weights is obtained using the SGD update rule. This approach amalgamates exploration of the network space by the compact genetic algorithm with exploitation of the weight space by SGD in an effective manner. The proposed approach also incorporates user-defined parameters in the cost function in an elegant manner, which controls the network structure and hence the performance of the network based on the user's need. The effectiveness of the proposed approach has been demonstrated on four benchmark datasets, namely MNIST, COIL-100, CIFAR-10 and CIFAR-100. The obtained results clearly demonstrate the potential of the proposed approach by evolving architectures based on the nature of the application and the need of the user.
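
A sketch of the hybrid loop, assuming the standard compact-genetic-algorithm update: a probability vector over a binary string (encoding layer and filter counts) is sampled twice per generation, each candidate is decoded into a CNN and briefly trained with SGD, and the vector is nudged toward the fitter candidate. `decode_and_train` is an assumed helper that builds the network from the bits, runs SGD, and returns a fitness score.

```python
import numpy as np

def compact_ga(string_len, decode_and_train, generations=50, pop_size=100):
    p = np.full(string_len, 0.5)                  # probability vector over bits
    rng = np.random.default_rng(0)
    for _ in range(generations):
        a = (rng.random(string_len) < p).astype(int)
        b = (rng.random(string_len) < p).astype(int)
        winner, loser = ((a, b) if decode_and_train(a) >= decode_and_train(b)
                         else (b, a))
        # Shift probabilities toward the winner where the strings disagree.
        p = np.clip(p + (winner - loser) / pop_size,
                    1.0 / string_len, 1.0 - 1.0 / string_len)
    return (p > 0.5).astype(int)                  # most probable structure
```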


2020 ◽  
pp. 1-41 ◽  
Author(s):  
Benny Avelin ◽  
Kaj Nyström

In this paper, we prove that, in the deep limit, the stochastic gradient descent on a ResNet type deep neural network, where each layer shares the same weight matrix, converges to the stochastic gradient descent for a Neural ODE and that the corresponding value/loss functions converge. Our result gives, in the context of minimization by stochastic gradient descent, a theoretical foundation for considering Neural ODEs as the deep limit of ResNets. Our proof is based on certain decay estimates for associated Fokker–Planck equations.
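
Schematically (in illustrative notation, not the paper's), the setting is a weight-tied ResNet viewed as the forward-Euler discretization of a Neural ODE:

```latex
\[
  x_{k+1} = x_k + \tfrac{1}{N}\, f(x_k,\theta), \quad k = 0,\dots,N-1,
  \qquad \xrightarrow{\;N\to\infty\;} \qquad
  \dot{x}(t) = f\bigl(x(t),\theta\bigr), \quad t \in [0,1],
\]
```

and the result states that, as the depth N grows, SGD on the parameters of the left-hand network converges to SGD on the parameters of the right-hand ODE, together with convergence of the corresponding value/loss functions.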


2014 ◽  
Vol 571-572 ◽  
pp. 717-720
Author(s):  
De Kun Hu ◽  
Yong Hong Liu ◽  
Li Zhang ◽  
Gui Duo Duan

A deep neural network model was trained to classify facial expressions in unconstrained images. The model comprises nine layers, including an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. To optimize the model, rectified linear units for the nonlinear transformation, weight sharing for reducing complexity, "mean" and "max" pooling for subsampling, and "dropout" for sparsity are applied in the forward processing. With large amounts of hard training faces, the model was trained via the backpropagation method with stochastic gradient descent. The results show that the proposed model achieves excellent performance.
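
A minimal sketch of such a nine-layer architecture in PyTorch; layer sizes, filter counts, and the input resolution are illustrative, since the abstract does not specify them.

```python
import torch.nn as nn

model = nn.Sequential(                            # input: e.g. 1x48x48 face crops
    nn.Conv2d(1, 32, kernel_size=5), nn.ReLU(),   # convolutional layer + ReLU
    nn.MaxPool2d(2),                              # "max" pooling for subsampling
    nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(),  # convolutional layer + ReLU
    nn.AvgPool2d(2),                              # "mean" pooling for subsampling
    nn.Flatten(),
    nn.Linear(64 * 9 * 9, 256), nn.ReLU(),        # fully connected layer
    nn.Dropout(0.5),                              # "dropout" for sparsity
    nn.Linear(256, 7),                            # output layer: 7 expression classes
)
```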


Author(s):  
Xu Chu ◽  
Yang Lin ◽  
Yasha Wang ◽  
Leye Wang ◽  
Jiangtao Wang ◽  
...  

Drug-drug interactions (DDIs) are a major cause of preventable hospitalizations and deaths. Recently, researchers in the AI community have tried to improve DDI prediction in two directions: incorporating multiple drug features to better model the pharmacodynamics, and adopting multi-task learning to exploit associations among DDI types. However, these two directions are challenging to reconcile due to the sparse nature of the DDI labels, which inflates the risk of overfitting in multi-task learning models when incorporating multiple drug features. In this paper, we propose MLRDA, a multi-task semi-supervised learning framework for DDI prediction. MLRDA effectively exploits information that is beneficial for DDI prediction in unlabeled drug data by leveraging a novel unsupervised disentangling loss, CuXCov. The CuXCov loss cooperates with the classification loss to disentangle the DDI-prediction-relevant part from the irrelevant part of a representation learnt by an autoencoder, which helps to ease the difficulty of mining useful information for DDI prediction in both labeled and unlabeled drug data. Moreover, MLRDA adopts a multi-task learning framework to exploit associations among DDI types. Experimental results on real-world datasets demonstrate that MLRDA significantly outperforms state-of-the-art DDI prediction methods by up to 10.3% in AUPR.
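
The abstract does not give the exact form of CuXCov; the sketch below shows a generic cross-covariance penalty in the same spirit, decorrelating the DDI-relevant part of an autoencoder representation from the irrelevant part so that the two disentangle. Both names and the split are illustrative.

```python
import torch

def cross_covariance_loss(relevant: torch.Tensor,
                          irrelevant: torch.Tensor) -> torch.Tensor:
    """relevant: (n, d1) and irrelevant: (n, d2) parts of the representation."""
    r = relevant - relevant.mean(dim=0, keepdim=True)
    s = irrelevant - irrelevant.mean(dim=0, keepdim=True)
    cov = r.t() @ s / relevant.shape[0]           # (d1, d2) cross-covariance
    return (cov ** 2).sum()                       # drive every entry to zero
```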


2021 ◽  
Author(s):  
Ruthvik Vaila

Spiking neural networks are biologically plausible counterparts of artificial neural networks. Artificial neural networks are usually trained with stochastic gradient descent (SGD), while spiking neural networks are trained with bio-inspired spike-timing-dependent plasticity (STDP). Spiking networks could potentially help reduce power usage owing to their binary activations. In this work, we use unsupervised STDP in the feature extraction layers of a neural network with instantaneous neurons to extract meaningful features. The extracted binary feature vectors are then classified using classification layers containing neurons with binary activations. Gradient descent (backpropagation) is used only on the output layer to perform training for classification. Surrogate gradients are proposed to perform backpropagation through the binary activations. The accuracies obtained for MNIST and the balanced EMNIST dataset compare favorably with other approaches. The effect of SGD approximations on the learning capabilities of our network is also explored. We also study catastrophic forgetting and its effect on spiking neural networks (SNNs). For the catastrophic-forgetting experiments, in the classification sections of the network we use a modified synaptic-intelligence regularizer, which we refer to as the cost-per-synapse metric, to immunize the network against catastrophic forgetting in a Single-Incremental-Task (SIT) scenario. In these experiments, we use the MNIST and EMNIST handwritten-digit datasets, divided into five and ten incremental subtasks, respectively. We also examine the behavior of the spiking neural network and empirically study the effect of various hyperparameters on its learning capabilities using SPYKEFLOW, a software tool that we developed. We employ the MNIST, EMNIST, and NMNIST datasets to produce our results.
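
A sketch of a surrogate gradient for binary activations: the forward pass thresholds to {0, 1}, and the backward pass substitutes a boxcar window around the threshold (a straight-through-style estimator). The exact surrogate used in the work may differ.

```python
import torch

class BinaryActivation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()                    # binary spike / no-spike

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients only near the threshold; zero elsewhere.
        return grad_output * (x.abs() < 1.0).float()

# Usage: y = BinaryActivation.apply(pre_activations)
```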


Author(s):  
Vikas Verma ◽  
Alex Lamb ◽  
Juho Kannala ◽  
Yoshua Bengio ◽  
David Lopez-Paz

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training deep neural networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
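
A minimal sketch of the ICT consistency term: the prediction at a mixup of two unlabeled batches is pushed toward the same mixup of the targets computed at the original points. The paper obtains targets from a mean-teacher copy of the model, represented here by `teacher`.

```python
import torch
import torch.nn.functional as F

def ict_loss(model, teacher, u1, u2, alpha=1.0):
    """u1, u2: two batches of unlabeled inputs with the same shape."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    with torch.no_grad():                         # targets carry no gradient
        t1 = F.softmax(teacher(u1), dim=1)
        t2 = F.softmax(teacher(u2), dim=1)
    mixed_input = lam * u1 + (1 - lam) * u2
    mixed_target = lam * t1 + (1 - lam) * t2
    return F.mse_loss(F.softmax(model(mixed_input), dim=1), mixed_target)
```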

