Encouraging an appropriate representation simplifies training of neural networks

2020 ◽  
Vol 12 (1) ◽  
pp. 102-111
Author(s):  
Krisztian Buza

Abstract: A common assumption about neural networks is that they can learn an appropriate internal representation on their own; see, e.g., end-to-end learning. In this work we challenge this assumption. We consider two simple tasks and show that the state-of-the-art training algorithm fails, although the model itself is able to represent an appropriate solution. We demonstrate that encouraging an appropriate internal representation allows the same model to solve these tasks. While we do not claim that it is impossible to solve these tasks by other means (such as neural networks with more layers), our results illustrate that integrating domain knowledge in the form of a desired internal representation may improve the generalization ability of neural networks.
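For concreteness, a minimal PyTorch sketch of one way to encourage a desired internal representation: the usual task loss is augmented with a penalty pulling a hidden layer toward a target representation supplied as domain knowledge. The architecture, the target_repr tensor, and the weight alpha are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.head = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h = torch.tanh(self.encoder(x))  # internal representation
        return self.head(h), h

model = TwoLayerNet(10, 4, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
task_loss = nn.CrossEntropyLoss()

def training_step(x, y, target_repr, alpha=0.5):
    logits, h = model(x)
    # Task loss plus a penalty pulling the hidden activations toward the
    # representation suggested by domain knowledge (assumed given).
    loss = task_loss(logits, y) + alpha * ((h - target_repr) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```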

2020 ◽  
Author(s):  
Yuyao Yang ◽  
Shuangjia Zheng ◽  
Shimin Su ◽  
Jun Xu ◽  
Hongming Chen

Fragment-based drug design represents a promising drug discovery paradigm complementary to the traditional HTS-based lead generation strategy. How to link fragment structures to increase compound affinity remains a challenging task in this paradigm. Here, a novel deep generative model (AutoLinker) for linking fragments is developed, with the potential to be applied in the fragment-based lead generation scenario. The state-of-the-art transformer architecture was employed to learn the linker grammar and generate novel linkers. Our results show that, given starting fragments and user-customized linker constraints, our AutoLinker model can design abundant drug-like molecules fulfilling these constraints, and its performance is superior to that of other reference models. Moreover, several examples showcase that AutoLinker can be a useful tool for carrying out drug design tasks such as fragment linking, lead optimization, and scaffold hopping.
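A minimal sketch of fragment linking framed as sequence-to-sequence generation with a transformer, in the spirit of the description above: the two fragment SMILES plus a constraint token form the source sequence, and the model is trained to emit linker tokens. The vocabulary, special tokens, and model sizes are illustrative assumptions rather than AutoLinker's actual configuration.

```python
import torch
import torch.nn as nn

VOCAB = 64  # assumed SMILES-token vocabulary size
EMB = 128

class LinkerModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.transformer = nn.Transformer(
            d_model=EMB, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True)
        self.out = nn.Linear(EMB, VOCAB)

    def forward(self, src_tokens, tgt_tokens):
        # src: [frag1] <SEP> [frag2] <constraint>; tgt: linker tokens (shifted)
        src = self.embed(src_tokens)
        tgt = self.embed(tgt_tokens)
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        return self.out(self.transformer(src, tgt, tgt_mask=mask))
```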


2013 ◽  
Vol 2013 ◽  
pp. 1-7
Author(s):  
Guo-Rong Cai ◽  
Shui-Li Chen

This paper presents an image parsing algorithm based on Particle Swarm Optimization (PSO) and Recursive Neural Networks (RNNs). State-of-the-art methods such as the traditional RNN-based parsing strategy use L-BFGS over the complete data to learn the parameters. However, this can cause problems due to the nondifferentiable objective function. To solve this problem, the PSO algorithm is employed to tune the weights of the RNN so as to minimize the objective. Experimental results obtained on the Stanford background dataset show that our PSO-based training algorithm outperforms traditional RNN, Pixel CRF, region-based energy, simultaneous MRF, and superpixel MRF.
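As a sketch of the training idea, the following generic particle swarm optimisation loop searches over a flattened weight vector and needs only objective evaluations, which is why it suits a nondifferentiable objective. The hyperparameters and the objective stub are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def pso_minimise(objective, dim, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    pos = rng.normal(size=(n_particles, dim))   # candidate weight vectors
    vel = np.zeros_like(pos)
    pbest = pos.copy()                          # per-particle best positions
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()    # swarm-wide best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity update: inertia + pull toward personal and global bests
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Usage: weights = pso_minimise(lambda v: parsing_objective(v), dim=n_weights)
# where parsing_objective evaluates the (nondifferentiable) RNN objective.
```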


Author(s):  
Chia-Hu Chang ◽  
Ja-Ling Wu

With the aid of content-based multimedia analysis, virtual product placement opens up new opportunities for advertisers to effectively and efficiently monetize existing videos. At the same time, a number of significant and challenging issues arise, such as how to unobtrusively insert a contextually relevant advertising message (what) at the right place (where) and the right time (when) with an attractive representation (how) in the videos. In this chapter, domain knowledge in support of delivering and receiving the advertising message is introduced, including advertising theory, psychology, and computational aesthetics. We briefly review state-of-the-art techniques for assisting virtual product placement in videos. In addition, we present a framework serving virtual spotlighted advertising (ViSA) for virtual product placement and give an explorative study of it. Moreover, observations about new trends and possible extensions in the design space of virtual product placement are stated and discussed. We believe this will inspire researchers to develop more interesting and applicable multimedia advertising systems for virtual product placement.


2020 ◽  
Vol 34 (01) ◽  
pp. 303-311 ◽  
Author(s):  
Sicheng Zhao ◽  
Yunsheng Ma ◽  
Yang Gu ◽  
Jufeng Yang ◽  
Tengfei Xing ◽  
...  

Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e., polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint, to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
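A sketch of one differentiable reading of a polarity-consistent cross-entropy: standard cross-entropy plus a penalty on the probability mass assigned to emotions of the wrong polarity. The emotion-to-polarity mapping and the weight lam are illustrative assumptions, not VAANet's exact formulation.

```python
import torch
import torch.nn.functional as F

# Assumed VideoEmotion-8-style mapping: indices 0-3 positive, 4-7 negative.
POLARITY = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])

def polarity_consistent_ce(logits, labels, lam=0.5):
    ce = F.cross_entropy(logits, labels)
    probs = F.softmax(logits, dim=1)
    # Mask of classes whose polarity differs from each label's polarity.
    wrong_mask = (POLARITY.unsqueeze(0) != POLARITY[labels].unsqueeze(1)).float()
    # Penalise probability mass placed on the opposite-polarity emotions.
    return ce + lam * (probs * wrong_mask).sum(dim=1).mean()
```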


2020 ◽  
Vol 32 (18) ◽  
pp. 15249-15262
Author(s):  
Sid Ghoshal ◽  
Stephen Roberts

Abstract: Much of modern practice in financial forecasting relies on technicals, an umbrella term for several heuristics applying visual pattern recognition to price charts. Despite their ubiquity in financial media, technical signals remain a contentious and highly subjective form of 'domain knowledge'. We investigate the predictive value of patterns in financial time series, applying machine learning and signal processing techniques to 22 years of US equity data. By reframing technical analysis as a poorly specified, arbitrarily preset feature-extractive layer in a deep neural network, we show that better convolutional filters can be learned directly from the data, and provide visual representations of the features being identified. We find that an ensemble of shallow, thresholded convolutional neural networks optimised over different resolutions achieves state-of-the-art performance on this domain, outperforming technical methods while retaining some of their interpretability.
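A sketch of one shallow, thresholded convolutional pattern detector over a price series, in the spirit of replacing hand-specified technical patterns with learned filters; the window length, filter count, and threshold value are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ShallowPatternNet(nn.Module):
    def __init__(self, kernel=16, n_filters=8, threshold=0.6):
        super().__init__()
        self.conv = nn.Conv1d(1, n_filters, kernel_size=kernel)  # learned "chart patterns"
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.head = nn.Linear(n_filters, 1)
        self.threshold = threshold

    def forward(self, prices):          # prices: (batch, 1, T)
        z = torch.relu(self.conv(prices))
        p = torch.sigmoid(self.head(self.pool(z).squeeze(-1)))
        return p                        # act only when p exceeds self.threshold

# An ensemble over resolutions could run copies of this net on down-sampled
# versions of the same series and combine their thresholded outputs.
```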


2021 ◽  
Vol 3 ◽  
Author(s):  
Marieke van Erp ◽  
Christian Reynolds ◽  
Diana Maynard ◽  
Alain Starke ◽  
Rebeca Ibáñez Martín ◽  
...  

In this paper, we discuss the use of natural language processing and artificial intelligence to analyze nutritional and sustainability aspects of recipes and food. We present the state of the art and some use cases, followed by a discussion of challenges. Our perspective is that while these challenges are typically technical in nature, addressing them requires an interdisciplinary approach that combines natural language processing and artificial intelligence with expert domain knowledge to create practical tools and comprehensive analyses for the food domain.


2021 ◽  
Vol 8 (2) ◽  
pp. 273-287
Author(s):  
Xuewei Bian ◽  
Chaoqun Wang ◽  
Weize Quan ◽  
Juntao Ye ◽  
Xiaopeng Zhang ◽  
...  

Abstract: Recent learning-based approaches show promising performance improvements for the scene text removal task, but usually leave several remnants of text and produce visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined into a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state of the art in locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic and cannot properly measure the performance of different methods.
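A structural sketch of the cascaded design described above: a stroke detector produces a per-pixel mask, a generator inpaints the masked strokes, and the detector-remover pair is repeated as a unit. The tiny stand-in networks are assumptions; the real subnetworks would be full segmentation and generative models.

```python
import torch
import torch.nn as nn

class TextRemovalUnit(nn.Module):
    def __init__(self, detector, remover):
        super().__init__()
        self.detector = detector  # image -> per-pixel stroke mask
        self.remover = remover    # (image, mask) -> inpainted image

    def forward(self, img):
        mask = torch.sigmoid(self.detector(img))
        out = self.remover(torch.cat([img, mask], dim=1))
        return out, mask

class CascadedRemover(nn.Module):
    def __init__(self, units):
        super().__init__()
        self.units = nn.ModuleList(units)

    def forward(self, img):
        masks = []
        for unit in self.units:        # each unit refines the previous output
            img, m = unit(img)
            masks.append(m)
        return img, masks

def tiny_net(in_ch, out_ch):
    # stand-in for a real segmentation / generative subnetwork
    return nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, out_ch, 3, padding=1))

model = CascadedRemover([TextRemovalUnit(tiny_net(3, 1), tiny_net(4, 3))
                         for _ in range(2)])
```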


Author(s):  
Aydin Ayanzadeh ◽  
Sahand Vahidnia

In this paper, we leverage state-of-the-art models pre-trained on the ImageNet dataset. We use the pre-trained models and their learned weights to extract features from the dog breed identification dataset. Afterwards, we apply fine-tuning and data augmentation to increase test accuracy in the classification of dog breeds. The performance of the proposed approach is evaluated with state-of-the-art ImageNet models, namely ResNet-50, DenseNet-121, DenseNet-169, and GoogleNet, achieving 89.66%, 85.37%, 84.01%, and 82.08% test accuracy, respectively, which shows the superior performance of the proposed method over previous work on the Stanford Dogs dataset.
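A standard transfer-learning sketch consistent with this description, using torchvision: load an ImageNet-pretrained backbone, replace the classifier head for the 120 Stanford Dogs breeds, and fine-tune on augmented data. The specific augmentations are assumptions, not the authors' exact pipeline.

```python
import torch.nn as nn
from torchvision import models, transforms

# ImageNet-pretrained backbone with a new head for 120 dog breeds.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 120)

# Assumed augmentation pipeline for fine-tuning.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],   # ImageNet statistics
                         [0.229, 0.224, 0.225]),
])
```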


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hongwei Luo ◽  
Yijie Shen ◽  
Feng Lin ◽  
Guoai Xu

Speaker verification systems have gained great popularity in recent years, especially with the development of deep neural networks and the Internet of Things. However, the security of speaker verification systems based on deep neural networks has not been well investigated. In this paper, we propose an attack to spoof the state-of-the-art speaker verification system based on the generalized end-to-end (GE2E) loss function, causing illegitimate users to be misclassified as the authentic user. Specifically, we design a novel loss function to train a generator that produces effective adversarial examples with slight perturbation, and then spoof the system with these adversarial examples. The success rate of our attack reaches 82% when cosine similarity is adopted in the deep-learning-based speaker verification system. Beyond that, our experiments report a signal-to-noise ratio of 76 dB, which shows that our attack has higher imperceptibility than previous works. In summary, the results show that our attack not only spoofs the state-of-the-art neural-network-based speaker verification system but, more importantly, can hide from human hearing and machine discrimination.
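A sketch of the attack objective as described: push the verification model's embedding of the perturbed utterance toward the target speaker's enrolled embedding, while an L2 term keeps the perturbation slight. The embed_fn stand-in and the weight beta are assumptions; in the paper the perturbation is produced by a trained generator rather than optimised directly.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(embed_fn, wav, delta, target_emb, beta=0.1):
    # Embedding of the perturbed utterance under the verification model.
    adv_emb = embed_fn(wav + delta)
    # Spoofing term: raise cosine similarity to the enrolled target speaker.
    similarity = F.cosine_similarity(adv_emb, target_emb, dim=-1)
    # Imperceptibility term: keep the perturbation energy small.
    return -similarity.mean() + beta * delta.norm(p=2)

# delta (or a generator producing it) would be optimised by gradient descent
# on this loss against the cosine-similarity verification backend.
```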


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Md Zahangir Alom ◽  
Paheding Sidike ◽  
Mahmudul Hasan ◽  
Tarek M. Taha ◽  
Vijayan K. Asari

In spite of advances in object recognition technology, handwritten Bangla character recognition (HBCR) remains largely unsolved due to the presence of many ambiguous handwritten characters and excessively cursive Bangla handwriting. Even many advanced existing methods do not achieve satisfactory performance on HBCR in practice. In this paper, a set of state-of-the-art deep convolutional neural networks (DCNNs) is discussed and their performance on HBCR is systematically evaluated. The main advantage of DCNN approaches is that they can extract discriminative features from raw data and represent them with a high degree of invariance to object distortions. The experimental results show the superior performance of DCNN models compared with other popular object recognition approaches, which implies that DCNNs can be a good candidate for building an automatic HBCR system for practical applications.
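For reference, a minimal DCNN of the kind such evaluations cover; the depth, filter counts, and the 50-class output (Bangla basic characters) are illustrative assumptions rather than any of the architectures the paper evaluates.

```python
import torch.nn as nn

# Assumes 28x28 grayscale inputs; two 2x2 poolings yield 7x7 feature maps.
hbcr_cnn = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
    nn.Linear(128, 50),   # assumed number of character classes
)
```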

