HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN

Spatiotemporal representations learned using 3D convolutional neural networks (CNNs) are currently used in state-of-the-art approaches for action-related tasks. However, 3D-CNNs are notoriously memory- and compute-intensive compared with simpler 2D-CNN architectures. We propose to hallucinate spatiotemporal representations from a 3D-CNN teacher with a 2D-CNN student. By requiring the 2D-CNN to predict the future and intuit upcoming activity, it is encouraged to gain a deeper understanding of actions and how they evolve. The hallucination task is treated as an auxiliary task, which can be used with any other action-related task in a multitask learning setting. Through thorough experimental evaluation, it is shown that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition tasks. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as with limited computing power and/or lower bandwidth. We also observed that our hallucination task has utility not only during the training phase, but also during the pre-training phase.
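
The multitask setup described above can be sketched as a main task loss plus a weighted feature-matching term. This is only an illustration: the abstract does not state the exact hallucination loss, so the mean-squared-error term, the `weight` parameter, and all function names here are assumptions.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multitask_loss(task_loss, student_features, teacher_features, weight=1.0):
    """Main task loss plus a weighted auxiliary hallucination term.

    Assumption: the auxiliary term is the MSE between the 2D-CNN student's
    features and the 3D-CNN teacher's features.
    """
    return task_loss + weight * mse(student_features, teacher_features)


# Toy feature vectors standing in for real network activations.
student = [0.2, 0.9, -0.1]
teacher = [0.0, 1.0, 0.0]
total = multitask_loss(0.7, student, teacher, weight=0.5)
print(round(total, 4))
```

In such a scheme the teacher is only needed while training; at inference the student runs alone, which is the source of the savings the abstract mentions.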

An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5364 ◽

2020 ◽

Vol 34 (01) ◽

pp. 303-311 ◽

Cited By ~ 3

Author(s):

Sicheng Zhao ◽

Yunsheng Ma ◽

Yang Gu ◽

Jufeng Yang ◽

Tengfei Xing ◽

...

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

State Of The Art ◽

Source Code ◽

Cross Entropy ◽

Attention Network ◽

Audio Features ◽

End To End ◽

3D Cnn ◽

And Training

Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and then training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e., a polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
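
One plausible reading of a polarity-consistent cross-entropy loss is ordinary cross-entropy that is scaled up whenever the predicted emotion has the opposite polarity to the ground truth. The sketch below follows that reading only; the abstract gives no formula, so the `penalty` factor, the emotion-to-polarity table, and the function names are assumptions rather than the authors' implementation.

```python
import math

# Hypothetical emotion-to-polarity table, for illustration only.
POLARITY = {
    "joy": "positive",
    "trust": "positive",
    "anger": "negative",
    "fear": "negative",
}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def polarity_consistent_ce(logits, labels, target, penalty=0.5):
    """Cross-entropy, increased by `penalty` when the predicted polarity is wrong."""
    probs = softmax(logits)
    predicted = labels[max(range(len(probs)), key=probs.__getitem__)]
    ce = -math.log(probs[labels.index(target)])
    if POLARITY[predicted] != POLARITY[target]:
        return ce * (1.0 + penalty)
    return ce


labels = ["joy", "trust", "anger", "fear"]
same_polarity = polarity_consistent_ce([1.0, 2.0, 0.0, 0.0], labels, "joy")
wrong_polarity = polarity_consistent_ce([1.0, 0.0, 2.0, 0.0], labels, "joy")
print(same_polarity < wrong_polarity)
```

Both calls give the true class the same probability, so the only difference in loss comes from the polarity mismatch in the second one.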

DRHNet: A Deep Residual Network Based on Heterogeneous Kernel for Steganalysis

Security and Communication Networks ◽

10.1155/2020/8847741 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Yang Xu ◽

Zixi Fu ◽

Guiyong Xu ◽

Sicong Zhang ◽

Xiaoyao Xie

Keyword(s):

Neural Networks ◽

Heterogeneous Network ◽

State Of The Art ◽

Training Phase ◽

Image Size ◽

Residual Network ◽

Training Time ◽

Learning Framework ◽

Residual Learning ◽

The Rich

Convolutional neural networks used for steganalysis suffer from problems such as poor versatility, long training time, and limited image size. To address these problems, we present a heterogeneous kernel residual learning framework called DRHNet (Dual Residual Heterogeneous Network) that reduces the time spent in the training phase. Instead of using the image as the input of the network, we extract features from the images with the rich model, merge them into a feature matrix, and use the generated feature matrix as the actual input of the network. The proposed architecture has good versatility and can reduce the computation and the number of parameters while still achieving higher accuracy. On BOSSbase 1.01, we evaluate the performance of DRHNet in both the spatial domain and the frequency domain. The preliminary experimental results show that DRHNet achieves excellent steganalysis performance against state-of-the-art steganographic algorithms.

AAR-CNNs: Auto Adaptive Regularized Convolutional Neural Networks

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/348 ◽

2018 ◽

Author(s):

Yao Lu ◽

Guangming Lu ◽

Yuanrong Xu ◽

Bob Zhang

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Experimental Results ◽

Training Phase ◽

Low Resolution ◽

Adaptive Regularization ◽

End To End ◽

Overfitting Problem

To address the overfitting problem caused by small or simple training datasets and large model size in Convolutional Neural Networks (CNNs), a novel Auto Adaptive Regularization (AAR) method is proposed in this paper. The resulting networks are called AAR-CNNs. AAR is the first method to use the “abstraction extent” (predicted by AE net) and a tiny learnable module (SE net) to adaptively predict more accurate and individualized regularization information. The AAR module can be directly inserted into every stage of any popular network and trained end to end to improve the network’s flexibility. This method can not only regularize the network in both the forward and the backward passes of the training phase, but also regularize the network at a finer level (channel or pixel level), depending on the form of the abstraction extent. Comparative experiments are performed on low-resolution ImageNet, CIFAR and SVHN datasets. Experimental results show that AAR-CNNs can achieve state-of-the-art performance on these datasets.

Effective Plug-Ins for Reducing Inference-Latency of Spiking Convolutional Neural Networks During Inference Phase

Frontiers in Computational Neuroscience ◽

10.3389/fncom.2021.697469 ◽

2021 ◽

Vol 15 ◽

Author(s):

Xuan Chen ◽

Xiaopeng Yuan ◽

Gaoming Fu ◽

Yuanyong Luo ◽

Tao Yue ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Network Architecture ◽

State Of The Art ◽

Low Cost ◽

Training Phase ◽

Working Mechanism ◽

Simulation Results ◽

New Perspective ◽

Fully Connected

Convolutional Neural Networks (CNNs) are effective and mature in the field of classification, while Spiking Neural Networks (SNNs) are energy-saving thanks to the sparsity of their data flow and their event-driven working mechanism. Previous work demonstrated that CNNs can be converted into equivalent Spiking Convolutional Neural Networks (SCNNs) without obvious accuracy loss, including different functional layers such as Convolutional (Conv), Fully Connected (FC), Avg-pooling, Max-pooling, and Batch-Normalization (BN) layers. To reduce inference latency, existing research has mainly concentrated on normalizing weights to increase the firing rate of neurons. Some approaches also act during the training phase or alter the network architecture. However, little attention has been paid to the end of the inference phase. From this new perspective, this paper presents four stopping criteria as low-cost plug-ins to reduce the inference latency of SCNNs. The proposed methods are validated on the MATLAB and PyTorch platforms with Spiking-AlexNet for the CIFAR-10 dataset and Spiking-LeNet-5 for the MNIST dataset. Simulation results reveal that, compared to the state-of-the-art methods, the proposed method can shorten the average inference latency of Spiking-AlexNet from 892 to 267 time steps (almost 3.34 times faster) with an accuracy decline from 87.95 to 87.72%. With our methods, 4 types of Spiking-LeNet-5 need only 24–70 time steps per image with an accuracy decline of no more than 0.1%, while models without our methods require 52–138 time steps, almost 1.92 to 3.21 times slower than ours.
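
The Spiking-AlexNet figures quoted above can be checked with a few lines of arithmetic. The numbers are taken directly from the abstract; the variable names are only illustrative.

```python
# Average inference latency of Spiking-AlexNet, in time steps (from the abstract).
baseline_steps = 892
reduced_steps = 267

# Reported accuracies in percent, before and after applying the stopping criteria.
baseline_accuracy = 87.95
reduced_accuracy = 87.72

speedup = baseline_steps / reduced_steps
accuracy_drop = baseline_accuracy - reduced_accuracy

print(f"speedup: {speedup:.2f}x, accuracy drop: {accuracy_drop:.2f} percentage points")
```

The ratio rounds to 3.34, matching the stated speedup, at a cost of 0.23 percentage points of accuracy.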

When the state of the art is ahead of the state of understanding: Unintuitive properties of deep neural networks

Mètode Revista de difusió de la investigació ◽

10.7203/metode.9.11035 ◽

2018 ◽

Author(s):

Joan Serrà

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Recent Work ◽

Deep Neural Networks ◽

State Of The Art ◽

The State ◽

Computing Power ◽

The Media ◽

Insight Into ◽

Empirical Means

Deep learning is an undeniably hot topic, not only within academia and industry, but also in society and the media. The reasons for its rise in popularity are manifold: unprecedented availability of data and computing power, some innovative methodologies, minor but significant technical tricks, etc. However, interestingly, the current success and practice of deep learning seem to be uncorrelated with its theoretical, more formal understanding. As a result, deep learning’s state of the art presents a number of unintuitive properties or situations. In this note, I highlight some of these unintuitive properties, trying to show relevant recent work, and expose the need to gain insight into them, either by formal or more empirical means.

Testing the Ability of Convolutional Neural Networks to Learn Radiomic Features

10.1101/2020.09.19.20198077 ◽

2020 ◽

Author(s):

Ivan S. Klyuzhin ◽

Yixi Xu ◽

Anthony Ortiz ◽

Juan M. Lavista Ferres ◽

Ghassan Hamarneh ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Tumor Diameter ◽

Prediction Errors ◽

Additional Input ◽

Shape Irregularity ◽

Synthetic Images ◽

3D Cnn ◽

Tumor Shape

Purpose: To test the ability of convolutional neural networks (CNNs) to effectively capture the intensity, shape, and texture properties of tumors as defined by standardized radiomic features. Methods: Standard 2D and 3D CNN architectures with an increasing number of convolutional layers (up to 9) were trained to predict the values of 16 standardized radiomic features from synthetic images of tumors, and tested. In addition, several ImageNet-pretrained state-of-the-art networks were tested. The synthetic images replicated the quality of real PET images. A total of 4000 images were used for training, 500 for validation, and 500 for testing. Results: Radiomic features quantifying tumor size and intensity were predicted with high accuracy, while shape irregularity features had very high prediction errors and generalized poorly between training and test sets. For example, the mean normalized prediction error of tumor diameter (mean intensity) with a 5-layer 2D CNN was 4.23 ± 0.25 (1.88 ± 0.07), while the error for tumor sphericity was 15.64 ± 0.93. Similarly high error values were found with other shape irregularity and heterogeneity features, both with standard and state-of-the-art networks. Conclusions: Standard CNN architectures and ImageNet-pretrained advanced networks have a significantly lower capacity to capture tumor shape and heterogeneity properties compared to other features. Our findings imply that CNNs trained end-to-end for clinical outcome prediction and other tasks may under-utilize tumor shape and texture information. We hypothesize that, to improve CNN performance, these radiomic features can be computed explicitly and added as auxiliary variables to the dense layers in the networks, or as additional input channels.

Multi-Label Classification Neural Networks with Hard Logical Constraints

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12850 ◽

2021 ◽

Vol 72 ◽

pp. 759-818

Author(s):

Eleonora Giunchiglia ◽

Thomas Lukasiewicz

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Experimental Analysis ◽

State Of The Art ◽

Superior Performance ◽

Learning Problem ◽

Improve Performance ◽

Hard Constraints ◽

Novel Approach ◽

Logical Constraints

Multi-label classification (MC) is a standard machine learning problem in which a data point can be associated with a set of classes. A more challenging scenario is given by hierarchical multi-label classification (HMC) problems, in which every prediction must satisfy a given set of hard constraints expressing subclass relationships between classes. In this article, we propose C-HMCNN(h), a novel approach for solving HMC problems, which, given a network h for the underlying MC problem, exploits the hierarchy information in order to produce predictions coherent with the constraints and to improve performance. Furthermore, we extend the logic used to express HMC constraints in order to be able to specify more complex relations among the classes and propose a new model CCN(h), which extends C-HMCNN(h) and is again able to satisfy and exploit the constraints to improve performance. We conduct an extensive experimental analysis showing the superior performance of both C-HMCNN(h) and CCN(h) when compared to state-of-the-art models in both the HMC and the general MC setting with hard logical constraints.
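
A minimal sketch of how subclass constraints can be enforced on top of an arbitrary multi-label network: each class takes the maximum score over itself and all of its descendants, so a subclass can never score higher than its superclass. This is an illustration of the general idea, not a reproduction of C-HMCNN(h); the function name, the toy hierarchy, and the scores are assumptions.

```python
def coherent_scores(scores, children):
    """Return scores in which every class is at least as likely as its subclasses.

    `scores` maps class name -> raw score from the underlying network.
    `children` maps class name -> list of direct subclasses (assumed acyclic).
    """
    cache = {}

    def resolve(cls):
        # A class's coherent score is the max of its own raw score and the
        # coherent scores of all its direct subclasses.
        if cls not in cache:
            below = [resolve(child) for child in children.get(cls, [])]
            cache[cls] = max([scores[cls]] + below)
        return cache[cls]

    return {cls: resolve(cls) for cls in scores}


# Hypothetical hierarchy: puppy is a subclass of dog; dog and cat are subclasses of animal.
raw = {"animal": 0.3, "dog": 0.8, "puppy": 0.9, "cat": 0.1}
hierarchy = {"animal": ["dog", "cat"], "dog": ["puppy"]}
fixed = coherent_scores(raw, hierarchy)
print(fixed)
```

With any single threshold applied to `fixed`, predicting a subclass implies predicting its superclasses, which is exactly what the hard constraints require.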

AutoLinker: Automatic Fragment Linking with Deep Conditional Transformer Neural Networks

10.26434/chemrxiv.12271508.v2 ◽

2020 ◽

Author(s):

Yuyao Yang ◽

Shuangjia Zheng ◽

Shimin Su ◽

Jun Xu ◽

Hongming Chen

Keyword(s):

Neural Networks ◽

Drug Discovery ◽

Drug Design ◽

State Of The Art ◽

The State ◽

Generative Model ◽

Lead Optimization ◽

Scaffold Hopping ◽

Reference Models ◽

Lead Generation

Fragment-based drug design represents a promising drug discovery paradigm complementary to the traditional HTS-based lead generation strategy. How to link fragment structures to increase compound affinity remains a challenging task in this paradigm. Here, a novel deep generative model (AutoLinker) for linking fragments is developed, with the potential for application in the fragment-based lead generation scenario. The state-of-the-art transformer architecture was employed to learn the linker grammar and generate novel linkers. Our results show that, given starting fragments and user-customized linker constraints, our AutoLinker model can design abundant drug-like molecules fulfilling these constraints, and its performance was superior to other reference models. Moreover, several examples showcase that AutoLinker can be a useful tool for carrying out drug design tasks such as fragment linking, lead optimization and scaffold hopping.

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation demonstrated increased performance over non-augmented data and over conventional SMILES-randomization-augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the pattern recognition capabilities of the underlying network with respect to molecular motifs.
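
To make the underlying similarity measure concrete, the sketch below computes the standard Levenshtein (edit) distance and uses it to pick, from several equivalent SMILES strings for a reactant, the one closest to the product string. It is a simplification: it compares whole strings character by character, whereas the abstract refers to local sub-sequence similarity, and the `closest_variant` helper and the toy ethanol-to-acetaldehyde example are assumptions, not the authors' pipeline.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_variant(variants, product):
    """Pick the reactant SMILES variant with the smallest distance to the product."""
    return min(variants, key=lambda smiles: levenshtein(smiles, product))


# Three equivalent SMILES strings for ethanol, and acetaldehyde as the product.
ethanol_variants = ["OCC", "C(C)O", "CCO"]
acetaldehyde = "CC=O"
best = closest_variant(ethanol_variants, acetaldehyde)
print(best, levenshtein(best, acetaldehyde))
```

Pairing the product with the most similar reactant string keeps shared substructures in matching positions, which is the intuition behind building training pairs this way.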

Performance vs. hardware requirements in state-of-the-art automatic speech recognition

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00217-4 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Alexandru-Lucian Georgescu ◽

Alessandro Pappalardo ◽

Horia Cucu ◽

Michaela Blott

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

State Of The Art ◽

Decision Makers ◽

Computing Power ◽

Trade Off ◽

Speech Features ◽

Commercial Applications ◽

Guided Tour ◽

Embedded Applications

The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, which modeled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, which translate the raw waveform directly into words using one deep neural network (DNN). The transcription accuracy greatly increased, leading to ASR technology being integrated into many commercial applications. However, few of the existing ASR technologies are suitable for integration in embedded applications, due to their hard constraints on computing power and memory usage. This overview paper serves as a guided tour through the recent literature on speech recognition and compares the most popular ASR implementations. The comparison emphasizes the trade-off between ASR performance and hardware requirements, to help decision makers choose the system that best fits their embedded application. To the best of our knowledge, this is the first study to provide this kind of trade-off analysis for state-of-the-art ASR systems.
