Image Error Concealment Based on Deep Neural Network

Algorithms ◽  
2019 ◽  
Vol 12 (4) ◽  
pp. 82
Author(s):  
Zhiqiang Zhang ◽  
Rong Huang ◽  
Fang Han ◽  
Zhijie Wang

In this paper, we propose a novel spatial image error concealment (EC) method based on a deep neural network. Considering that natural images exhibit local correlation and non-local self-similarity, we use local information to predict the missing pixels and non-local information to correct the predictions. The deep neural network we utilize can be divided into two parts: a prediction part and an auto-encoder (AE) part. The first part exploits the local correlation among pixels to predict the missing ones. The second part extracts image features, which are used to collect similar samples from the whole image. In addition, a novel adaptive scan order based on the joint credibility of the support area and the reconstruction is proposed to alleviate the error propagation problem. The experimental results show that the proposed method reconstructs corrupted images effectively and outperforms the compared state-of-the-art methods in terms of both objective and perceptual metrics.
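
A minimal NumPy sketch of the adaptive-scan-order idea: at each step, the missing pixel whose support area contains the largest fraction of known pixels is filled first. A mean over known neighbors stands in for the paper's prediction network, and the non-local AE correction step is omitted; the function name and patch size are illustrative.

```python
import numpy as np

def conceal(image, mask, patch=3):
    """Fill missing pixels (mask == 0) in an adaptive order: the pixel whose
    support area holds the largest fraction of known pixels goes first.
    Assumes the mask leaves at least one pixel known."""
    img = image.astype(float).copy()
    known = mask.astype(bool).copy()
    h, w = img.shape
    r = patch // 2
    while not known.all():
        best, best_score = None, -1.0
        for y, x in zip(*np.where(~known)):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            score = known[y0:y1, x0:x1].mean()     # credibility of support area
            if score > best_score:
                best, best_score = (y, x, y0, y1, x0, x1), score
        y, x, y0, y1, x0, x1 = best
        region = img[y0:y1, x0:x1]
        img[y, x] = region[known[y0:y1, x0:x1]].mean()  # local prediction
        known[y, x] = True
    return img

img = np.arange(25.0).reshape(5, 5)
mask = np.ones((5, 5), dtype=int)
mask[2, 2] = 0
print(conceal(img, mask)[2, 2])  # 12.0, recovered from its 8 known neighbors
```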

Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1351
Author(s):  
Tomasz Hachaj ◽  
Justyna Miazga

Hashtag-based image descriptions are a popular approach for labeling images on social media platforms. In practice, images are often described by more than one hashtag. Due to the rapid development of deep neural networks specialized in image embedding and classification, it is now possible to generate those descriptions automatically. In this paper we propose a novel Voting Deep Neural Network with Associative Rules Mining (VDNN-ARM) algorithm that can be used to solve multi-label hashtag recommendation problems. VDNN-ARM is a machine learning approach that utilizes an ensemble of deep neural networks to generate image features, which are then classified into potential hashtag sets. The proposed hashtags are then filtered by a voting schema. The remaining hashtags may be added to the final recommended hashtag set by association rules mining, which exploits dependencies within certain hashtag groups. Our approach is evaluated on the HARRISON benchmark dataset as a multi-label classification problem. The highest values of our evaluation metrics, including precision, recall, and accuracy, were obtained for VDNN-ARM with a confidence threshold of 0.95. VDNN-ARM outperforms state-of-the-art algorithms, exceeding VGG-Object + VGG-Scene in precision by 17.91%, and ensemble–FFNN (intersection) in recall by 32.33% and accuracy by 27.00%. Both the dataset and all source code we implemented for this research are available for download, so our results can be reproduced.
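
A toy sketch of the voting-plus-rules pipeline, under the assumption that each ensemble member outputs a hashtag set: tags kept by the vote are then expanded by high-confidence association rules. The name `recommend_hashtags`, the `vote_ratio` parameter, and the example rule are all illustrative, not the authors' API.

```python
from collections import Counter

def recommend_hashtags(member_predictions, rules, vote_ratio=0.5, confidence=0.95):
    """member_predictions: one hashtag set per ensemble member.
    rules: {frozenset(antecedent): (consequent, rule_confidence)}."""
    votes = Counter(tag for preds in member_predictions for tag in set(preds))
    needed = vote_ratio * len(member_predictions)
    selected = {tag for tag, n in votes.items() if n >= needed}  # voting schema
    # association-rule expansion: add consequents of trusted rules whose
    # antecedents are already in the recommended set
    for antecedent, (consequent, conf) in rules.items():
        if conf >= confidence and antecedent <= selected:
            selected.add(consequent)
    return selected

print(recommend_hashtags(
    [{"#beach", "#sun"}, {"#beach", "#sea"}, {"#beach", "#sun"}],
    {frozenset({"#beach", "#sun"}): ("#summer", 0.97)},
))  # {'#beach', '#sun', '#summer'}
```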


2020 ◽  
Vol 20 (1) ◽  
pp. 346-356
Author(s):  
Lei Guo ◽  
Yongpei Wang ◽  
Xiangnan Xu ◽  
Kian-Kai Cheng ◽  
Yichi Long ◽  
...  

Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1128
Author(s):  
Lin ◽  
Jhang ◽  
Lee ◽  
Lin ◽  
Young

This study proposed a reinforcement Q-learning-based deep neural network (RQDNN) that combines a deep principal component analysis network (DPCANet) with Q-learning to determine a strategy for playing video games. Video game images were used as the inputs. The proposed DPCANet was used to initialize the parameters of the convolution kernels and capture image features automatically. It performs as a deep neural network with lower computational complexity than traditional convolutional neural networks. A reinforcement Q-learning method was used to implement the game-playing strategy. Both the Flappy Bird and Atari Breakout games were implemented to verify the proposed method. Experimental results showed that the scores of the proposed RQDNN were better than those of human players and of other methods. In addition, the training time of the proposed RQDNN was far shorter than that of other methods.
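
A minimal sketch of the Q-learning half of such a pipeline. In RQDNN the state would come from DPCANet features of a game frame; here states are assumed to be already discretized into table indices, and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_action(Q, s, eps=0.1):
    """Epsilon-greedy action selection over the Q-values of state s."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(Q[s].argmax())

def q_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# 100 discretized feature states (DPCANet features in the paper), 2 actions
# (e.g. flap / don't flap in Flappy Bird).
Q = np.zeros((100, 2))
q_step(Q, s=3, a=choose_action(Q, 3), r=1.0, s_next=4)
```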


2019 ◽  
Author(s):  
Marek A. Pedziwiatr ◽  
Matthias Kümmerer ◽  
Thomas S.A. Wallis ◽  
Matthias Bethge ◽  
Christoph Teufel

Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic importance across an image, have recently been proposed in support of the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in predicting fixations to that of saliency models, showing that DeepGaze II, a deep neural network trained to predict fixations based on high-level features rather than meaning, outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
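
For context, one common way such fixation-prediction comparisons are scored is normalized scanpath saliency (NSS), the mean z-scored map value at fixated pixels. This is a generic sketch of the metric, not the paper's evaluation code; the map and fixation coordinates are synthetic.

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized scanpath saliency: mean z-scored map value at the fixated
    pixels. fixations is a list of (row, col) coordinates."""
    z = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(np.mean([z[r, c] for r, c in fixations]))

smap = np.random.rand(480, 640)             # a toy predicted fixation map
print(nss(smap, [(100, 200), (240, 320)]))  # higher = better prediction
```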


2021 ◽  
Author(s):  
Quoc Vuong

Images are extremely effective at eliciting emotional responses in observers and have frequently been used to investigate the neural correlates of emotion. However, the image features producing this emotional response remain unclear. This study used biologically inspired computational models of the brain to test the hypothesis that these emotional responses can be attributed to the estimation of the arousal and valence of objects, scenes and facial expressions in the images. Convolutional neural networks were used to extract all, or various combinations, of high-level image features related to objects, scenes and facial expressions. Subsequent deep feedforward neural networks predicted the images' arousal and valence values. The model was provided with thousands of pre-annotated images to learn the relationship between the high-level features and the images' arousal and valence values. The relationship between arousal and valence was assessed by comparing models that learnt the constructs either separately or together. The results confirmed the effectiveness of using these features to predict human emotion, alongside their ability to augment each other. When utilising the object, scene and facial expression information together, the model classified arousal and valence with accuracies of 88% and 87%, respectively. The effectiveness of our deep neural network of emotion perception strongly suggests that these same high-level features play a critical role in producing humans' emotional responses. Moreover, performance increased across all models when arousal and valence were learnt together, suggesting a dependent relationship between these affective dimensions. These results open up numerous avenues for future work, whilst also bridging the gap between affective neuroscience and computer vision.
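
A minimal PyTorch sketch of the "learnt together" setting: one shared trunk over concatenated object, scene and face features with separate arousal and valence outputs. The feature dimensions and layer sizes are assumptions, not the study's actual architecture.

```python
import torch
import torch.nn as nn

class AffectHead(nn.Module):
    """Feedforward head over pre-extracted CNN features (object, scene and
    face descriptors concatenated). A single shared trunk with two outputs
    mirrors the 'learnt together' setting; all sizes are illustrative."""
    def __init__(self, feat_dim=3 * 512, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.arousal = nn.Linear(hidden, 1)
        self.valence = nn.Linear(hidden, 1)

    def forward(self, feats):
        h = self.trunk(feats)
        return self.arousal(h), self.valence(h)

feats = torch.randn(8, 3 * 512)        # a batch of concatenated CNN features
arousal, valence = AffectHead()(feats)
```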


Author(s):  
Hilman F. Pardede ◽  
Asri R. Yuliani ◽  
Rika Sustika

In many applications, speech recognition must operate in conditions where there is some distance between the speakers and the microphones. This is called distant speech recognition (DSR). In this condition, speech recognition must deal with reverberation. Nowadays, deep learning technologies are becoming the main technologies for speech recognition. A Deep Neural Network (DNN) in a hybrid with a Hidden Markov Model (HMM) is the most commonly used architecture. However, this system is still not robust against reverberation. Previous studies have used Convolutional Neural Networks (CNN), a variant of the neural network, to improve the robustness of speech recognition against noise. A CNN has a pooling property that captures local correlations between neighboring dimensions of the features. With this property, a CNN can be used as a feature learner that emphasizes information in neighboring frames. In this study we use a CNN to deal with reverberation. We also propose applying feature transformation techniques, linear discriminant analysis (LDA) and maximum likelihood linear transformation (MLLT), to mel-frequency cepstral coefficients (MFCC) before feeding them to the CNN. We argue that transforming the features produces more discriminative features for the CNN and hence improves the robustness of speech recognition against reverberation. Our evaluations on the Meeting Recorder Digits (MRD) subset of the Aurora-5 database confirm that the LDA and MLLT transformations improve robustness, yielding a 20% relative error reduction compared to a standard DNN-based speech recognizer with the same number of hidden layers.
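
A toy scikit-learn sketch of the LDA step on synthetic data, assuming MFCC frames have been spliced with their neighbors and aligned to HMM-state labels, as is typical in such pipelines; MLLT and the CNN itself are omitted, and all shapes are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in: 13-dim MFCC frames spliced with 4 neighbors on each
# side (9 frames total), labeled with HMM-state alignments.
n_frames, n_mfcc, n_states = 1000, 13, 10
rng = np.random.default_rng(0)
mfcc_spliced = rng.standard_normal((n_frames, n_mfcc * 9))
states = rng.integers(0, n_states, n_frames)

# LDA projects onto at most n_states - 1 class-discriminative directions;
# the transformed features would then be fed to the CNN front end.
lda = LinearDiscriminantAnalysis(n_components=n_states - 1)
feats = lda.fit_transform(mfcc_spliced, states)
print(feats.shape)  # (1000, 9)
```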


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Jin Fan ◽  
Yipan Huang ◽  
Ke Zhang ◽  
Sen Wang ◽  
Jinhua Chen ◽  
...  

Multivariate time series prediction is an important task that plays a major role in climate, economics, and other fields. Attention-based Encoder-Decoder networks are commonly used for multivariate time series prediction because the attention mechanism makes it easier for the model to focus on the attributes that really matter. However, the Encoder-Decoder network suffers from the problem that prediction accuracy degrades as the sequence length grows, which means it cannot process long series and therefore cannot capture detailed historical information. In this paper, we propose a dual-window deep neural network (DWNet) for time series prediction. The dual-window mechanism allows the model to mine multigranularity dependencies in a time series, such as local information obtained from a short sequence and global information obtained from a long sequence. Our model outperforms nine baseline methods on four different datasets.
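
A small sketch of the dual-window idea: each training sample pairs a long window (global context) with a short window (local context) for the same target step. The window lengths 168 and 24 are illustrative choices, not the paper's settings, and the batching helper is hypothetical.

```python
import numpy as np

def dual_windows(series, short=24, long=168):
    """Build (long-window, short-window, target) triples from a multivariate
    series of shape (T, d)."""
    X_long, X_short, y = [], [], []
    for t in range(long, len(series)):
        X_long.append(series[t - long:t])    # global context
        X_short.append(series[t - short:t])  # local context
        y.append(series[t])                  # next-step target
    return np.stack(X_long), np.stack(X_short), np.stack(y)

series = np.random.randn(500, 4)             # toy multivariate series
Xl, Xs, y = dual_windows(series)
print(Xl.shape, Xs.shape, y.shape)  # (332, 168, 4) (332, 24, 4) (332, 4)
```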

