DSTnet: Deformable Spatio-Temporal Convolutional Residual Network for Video Super-Resolution

Mathematics ◽  
2021 ◽  
Vol 9 (22) ◽  
pp. 2873
Author(s):  
Anusha Khan ◽  
Allah Bux Sargano ◽  
Zulfiqar Habib

Video super-resolution (VSR) aims at generating high-resolution (HR) video frames with plausible and temporally consistent details from their low-resolution (LR) counterparts and neighboring frames. The key challenge for VSR lies in the effective exploitation of intra-frame spatial relations and the temporal dependency between consecutive frames. Many existing techniques utilize spatial and temporal information separately and compensate motion via alignment; such methods cannot fully exploit the spatio-temporal information that significantly affects the quality of the resultant HR videos. In this work, a novel deformable spatio-temporal convolutional residual network (DSTnet) is proposed to overcome the issues of separate motion estimation and compensation methods for VSR. The proposed framework consists of 3D convolutional residual blocks decomposed into spatial and temporal (2+1)D streams. This decomposition simultaneously utilizes the input video's spatial and temporal features without a separate motion estimation and compensation module. Furthermore, deformable convolution layers are used in the proposed model to enhance its motion-awareness capability. Our contribution is twofold: first, the proposed approach overcomes the challenges in modeling complex motions by efficiently using spatio-temporal information; second, the proposed model has fewer parameters to learn than state-of-the-art methods, making it a computationally lean and efficient framework for VSR. Experiments are conducted on the benchmark Vid4 dataset to evaluate the efficacy of the proposed approach. The results demonstrate that the proposed approach achieves superior quantitative and qualitative performance compared to state-of-the-art methods.
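To make the (2+1)D decomposition concrete, here is a minimal PyTorch sketch of a residual block that factorizes a 3D convolution into a 2D spatial and a 1D temporal convolution. Layer widths and the omission of the deformable layers are assumptions for illustration, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class R2Plus1DBlock(nn.Module):
    """Residual block that factorizes a 3x3x3 convolution into a
    2D spatial (1x3x3) and a 1D temporal (3x1x1) convolution."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels,
                                 kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(channels, channels,
                                  kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):            # x: (N, C, T, H, W)
        out = self.relu(self.spatial(x))
        out = self.temporal(out)
        return self.relu(out + x)    # residual connection

frames = torch.randn(1, 64, 5, 32, 32)   # batch, channels, time, H, W
print(R2Plus1DBlock(64)(frames).shape)   # torch.Size([1, 64, 5, 32, 32])
```

In DSTnet the deformable convolution layers would be interleaved with such blocks to add motion awareness; the sketch shows only the spatio-temporal factorization itself.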

2018 ◽  
Vol 4 (9) ◽  
pp. 107 ◽  
Author(s):  
Mohib Ullah ◽  
Ahmed Mohammed ◽  
Faouzi Alaya Cheikh

Articulation modeling, feature extraction, and classification are the important components of pedestrian segmentation. Usually, these components are modeled independently from each other and then combined in a sequential way. However, this approach is prone to poor segmentation if any individual component is weakly designed. To cope with this problem, we propose a spatio-temporal convolutional neural network named PedNet which exploits temporal information for spatial segmentation. The backbone of PedNet consists of an encoder–decoder network for downsampling and upsampling the feature maps, respectively. The input to the network is a set of three frames, and the output is a binary mask of the segmented regions in the middle frame. Unlike classical deep models, where the convolution layers are followed by a fully connected layer for classification, PedNet is a fully convolutional network (FCN). It is trained end-to-end, and segmentation is achieved without the need for any pre- or post-processing. The main characteristic of PedNet is its unique design: it performs segmentation on a frame-by-frame basis but uses the temporal information from the previous and the future frame to segment the pedestrians in the current frame. Moreover, to combine the low-level features with the high-level semantic information learned by the deeper layers, we use long skip connections from the encoder to the decoder network and concatenate the output of the low-level layers with that of the higher-level layers. This approach helps to obtain segmentation maps with sharp boundaries. To show the potential benefits of temporal information, we also visualized different layers of the network. The visualization showed that the network learned different information from the consecutive frames and then combined it optimally to segment the middle frame. We evaluated our approach on eight challenging datasets in which humans are involved in different activities with severe articulation (football, road crossing, surveillance). On the widely used CamVid dataset, PedNet is evaluated against seven state-of-the-art methods. Performance is reported in terms of precision/recall, F1, F2, and mIoU. The qualitative and quantitative results show that PedNet achieves promising results against state-of-the-art methods, with substantial improvement in terms of all the performance metrics.
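A toy sketch of the two ideas that define PedNet's design, stacking three frames at the input and fusing encoder and decoder features with a long skip concatenation, might look as follows (a minimal illustration; the real PedNet is much deeper, and all layer sizes here are assumptions):

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Toy encoder-decoder in the spirit of PedNet: three RGB frames
    stacked along the channel axis, one downsampling stage, and a
    long skip connection that concatenates (rather than adds) the
    low-level encoder features to the decoder features."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(9, 16, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1),
                                  nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.head = nn.Conv2d(32, 1, 1)   # 16 upsampled + 16 skip channels

    def forward(self, f_prev, f_curr, f_next):
        x = torch.cat([f_prev, f_curr, f_next], dim=1)  # (N, 9, H, W)
        low = self.enc1(x)
        deep = self.down(low)
        up = self.up(deep)
        fused = torch.cat([up, low], dim=1)       # long skip: concatenation
        return torch.sigmoid(self.head(fused))   # mask for the middle frame

frames = [torch.randn(1, 3, 64, 64) for _ in range(3)]
print(TinyEncoderDecoder()(*frames).shape)  # torch.Size([1, 1, 64, 64])
```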


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2085
Author(s):  
Lei Han ◽  
Cien Fan ◽  
Ye Yang ◽  
Lian Zou

Recently, convolutional neural networks have achieved remarkable performance in video super-resolution. However, how to exploit the spatial and temporal information of video efficiently and effectively remains challenging. In this work, we design a bidirectional temporal-recurrent propagation unit, which allows temporal information to flow from frame to frame in an RNN-like manner and thus avoids complex motion estimation modeling and motion compensation. To better fuse the information of the two temporal-recurrent propagation units, we use a channel attention mechanism. Additionally, we recommend a progressive up-sampling method instead of one-step up-sampling; we find that progressive up-sampling yields better experimental results. Extensive experiments show that our algorithm outperforms several recent state-of-the-art video super-resolution (VSR) methods with a smaller model size.
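The progressive up-sampling recommendation can be sketched as two x2 PixelShuffle stages in place of a single x4 step (a minimal PyTorch sketch; the channel width and layer layout are assumptions):

```python
import torch
import torch.nn as nn

class ProgressiveUpsampler(nn.Module):
    """x4 upscaling performed as two x2 sub-pixel (PixelShuffle) stages
    rather than one x4 step, as the abstract recommends."""
    def __init__(self, channels=64):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(channels, channels * 4, 3, padding=1),
            nn.PixelShuffle(2), nn.ReLU())
        self.stage2 = nn.Sequential(
            nn.Conv2d(channels, channels * 4, 3, padding=1),
            nn.PixelShuffle(2), nn.ReLU())

    def forward(self, x):
        return self.stage2(self.stage1(x))   # x2 then x2 -> x4 overall

feat = torch.randn(1, 64, 32, 32)
print(ProgressiveUpsampler()(feat).shape)  # torch.Size([1, 64, 128, 128])
```

The intuition behind the design choice is that each x2 stage only has to hallucinate a modest amount of detail, and the second stage can refine the first stage's output.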


2021 ◽  
Vol 11 (8) ◽  
pp. 3636
Author(s):  
Faria Zarin Subah ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Takeshi Koshiba

Autism spectrum disorder (ASD) is a complex neuro-developmental disorder. Most of the existing methods utilize functional magnetic resonance imaging (fMRI) to detect ASD with very limited datasets, which yields high accuracy but poor generalization. To overcome this limitation and to enhance the performance of the automated autism diagnosis model, in this paper we propose an ASD detection model using functional connectivity features of resting-state fMRI data. Our proposed model utilizes two commonly used brain atlases, Craddock 200 (CC200) and Automated Anatomical Labelling (AAL), and two rarely used atlases, Bootstrap Analysis of Stable Clusters (BASC) and Power. A deep neural network (DNN) classifier is used to perform the classification task. Simulation results indicate that the proposed model outperforms state-of-the-art methods in terms of accuracy. The mean accuracy of the proposed model was 88%, whereas the mean accuracy of the state-of-the-art methods ranged from 67% to 85%. The sensitivity, F1-score, and area under the receiver operating characteristic curve (AUC) score of the proposed model were 90%, 87%, and 96%, respectively. Comparative analysis of various scoring strategies shows the superiority of the BASC atlas over the other aforementioned atlases in classifying ASD and control subjects.
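For readers unfamiliar with functional-connectivity features, a common construction (and plausibly the one used here, though the exact preprocessing is not specified in the abstract) is to vectorize the upper triangle of the ROI-by-ROI correlation matrix of the atlas time series:

```python
import numpy as np

def connectivity_features(timeseries):
    """Upper triangle of the ROI-by-ROI Pearson correlation matrix,
    the standard functional-connectivity feature for atlases such as
    CC200 or AAL. timeseries: (n_timepoints, n_rois) for one subject."""
    corr = np.corrcoef(timeseries.T)        # (n_rois, n_rois)
    iu = np.triu_indices_from(corr, k=1)    # skip the diagonal
    return corr[iu]                         # 1-D feature vector for the DNN

# e.g. the 200 CC200 ROIs give 200*199/2 = 19900 features per subject
fake_scan = np.random.randn(150, 200)       # 150 timepoints, 200 ROIs
print(connectivity_features(fake_scan).shape)   # (19900,)
```

This feature vector would then be the input layer of the DNN classifier described above.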


Author(s):  
Guoan Cheng ◽  
Ai Matsune ◽  
Huaijuan Zang ◽  
Toru Kurihara ◽  
Shu Zhan

In this paper, we propose an enhanced dual path attention network (EDPAN) for image super-resolution. ResNet is good at implicitly reusing extracted features, while DenseNet is good at exploring new features. The Dual Path Network (DPN) combines ResNet and DenseNet to create a more accurate architecture than either alone. We experimentally show that the residual network performs best when each block consists of two convolutions, and the dense network performs best when each micro-block consists of one convolution. Following these observations, our EDPAN exploits the advantages of both the residual structure and the dense structure. Besides, to allocate computation across features more effectively, we introduce an attention mechanism into EDPAN. Moreover, to reduce the parameter burden, we utilize recursive learning to derive a lightweight model. In the experiments, we demonstrate the effectiveness and robustness of the proposed EDPAN under different degradation settings. The quantitative results and visual comparisons indicate that EDPAN achieves favorable performance over state-of-the-art frameworks.
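The dual path idea, part of each block's output added back to the input (residual re-use) and part concatenated to it (dense growth), can be sketched as follows. This is a generic DPN-style block under assumed channel sizes, not EDPAN's exact block:

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Toy dual path block: the first `channels` output channels form
    the residual path (added to the input, re-using features); the
    remaining `growth` channels form the dense path (concatenated,
    exploring new features)."""
    def __init__(self, channels=64, growth=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels + growth, 3, padding=1))

    def forward(self, x):
        out = self.conv(x)
        c = x.shape[1]
        res, dense = out[:, :c], out[:, c:]
        return torch.cat([x + res, dense], dim=1)  # add + concat

x = torch.randn(1, 64, 32, 32)
print(DualPathBlock(64, 16)(x).shape)  # torch.Size([1, 80, 32, 32])
```

Stacking such blocks grows the dense channels progressively, which is why the paper pairs the structure with attention and recursive learning to keep the parameter count in check.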



Author(s):  
Liangchen Luo ◽  
Wenhao Huang ◽  
Qi Zeng ◽  
Zaiqing Nie ◽  
Xu Sun

Most existing work on dialog systems considers only the conversation content while neglecting the personality of the user the bot is interacting with, which begets several unsolved issues. In this paper, we present a personalized end-to-end model in an attempt to leverage personalization in goal-oriented dialogs. We first introduce a Profile Model, which encodes user profiles into distributed embeddings and refers to the conversation history of other, similar users. A Preference Model then captures user preferences over knowledge-base entities to handle ambiguity in user requests. The two models are combined into the Personalized MemN2N. Experiments show that the proposed model achieves qualitative performance improvements over state-of-the-art methods. In human evaluation, it also outperforms other approaches in terms of task-completion rate and user satisfaction.
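One way to picture the Profile Model's role is a profile embedding that conditions the attention over dialog memory. The sketch below is an illustrative guess at the mechanism, not the paper's exact MemN2N equations; all names and dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class ProfileConditionedAttention(nn.Module):
    """Toy sketch: a learned user-profile embedding is added to the
    query before attending over dialog-memory vectors, so users with
    different profiles retrieve different history."""
    def __init__(self, n_profiles, dim):
        super().__init__()
        self.profile_emb = nn.Embedding(n_profiles, dim)

    def forward(self, query, memory, profile_id):
        # query: (N, d), memory: (N, M, d), profile_id: (N,)
        q = query + self.profile_emb(profile_id)       # personalize query
        scores = torch.softmax(memory @ q.unsqueeze(-1), dim=1)  # (N, M, 1)
        return (memory * scores).sum(dim=1)            # attended memory

attn = ProfileConditionedAttention(n_profiles=10, dim=32)
out = attn(torch.randn(2, 32), torch.randn(2, 5, 32), torch.tensor([3, 7]))
print(out.shape)  # torch.Size([2, 32])
```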



2020 ◽  
Author(s):  
Jawad Khan

Activity recognition is a topic undergoing massive research in the field of computer vision. Applications of activity recognition include sports summaries, human-computer interaction, violence detection, and surveillance. In this paper, we propose a modification of the standard local binary pattern (LBP) descriptor to obtain a concatenated histogram of lower dimensionality. This helps to encode the spatial and temporal information of the various actions happening in a frame and overcomes the dimensionality problem that occurs with LBP. The results show that the proposed method performs comparably to state-of-the-art methods.
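For orientation, plain 8-neighbour LBP followed by per-frame histograms that are concatenated over time looks like the sketch below. This is the baseline the paper modifies, not its lower-dimensional variant, and the 59-bin quantization here is merely an assumption echoing the common uniform-LBP dimensionality:

```python
import numpy as np

def lbp_histogram(frame, bins=59):
    """Basic 8-neighbour LBP code image, reduced to a normalized
    histogram. frame: 2-D grayscale array."""
    f = frame.astype(np.int32)
    c = f[1:-1, 1:-1]                        # center pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = f[1 + dy:f.shape[0] - 1 + dy, 1 + dx:f.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit   # one bit per neighbour
    hist, _ = np.histogram(code, bins=bins, range=(0, 256))
    return hist / hist.sum()

# concatenating per-frame histograms encodes spatial + temporal structure
video = np.random.randint(0, 256, size=(3, 64, 64))   # 3 toy grey frames
descriptor = np.concatenate([lbp_histogram(fr) for fr in video])
print(descriptor.shape)  # (177,) = 3 frames x 59 bins
```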


2020 ◽  
Vol 34 (07) ◽  
pp. 11966-11973
Author(s):  
Hao Shao ◽  
Shengju Qian ◽  
Yu Liu

For a long time, the vision community has tried to learn spatio-temporal representations by combining convolutional neural networks with various temporal models, such as Markov chains, optical flow, RNNs, and temporal convolutions. However, these pipelines consume enormous computing resources due to the alternating learning process for spatial and temporal information. One natural question is whether we can embed the temporal information into the spatial information so that the two domains can be jointly learned in a single pass. In this work, we answer this question by presenting a simple yet powerful operator: the temporal interlacing network (TIN). Instead of learning temporal features separately, TIN fuses the two kinds of information by interlacing spatial representations from the past to the future, and vice versa. A differentiable interlacing target can be learned to control the interlacing process. In this way, a heavy temporal model is replaced by a simple interlacing operator. We theoretically prove that, with a learnable interlacing target, TIN performs equivalently to the regularized temporal convolution network (r-TCN), but gains 4% more accuracy with 6x less latency on 6 challenging benchmarks. These results push the state-of-the-art performance of video understanding by a considerable margin. Not surprisingly, the ensemble model of the proposed TIN won first place in the ICCV19 Multi Moments in Time challenge. Code is made available to facilitate further research.
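The interlacing idea is easiest to see in its fixed-offset form, where one channel group is shifted one step toward the future and another toward the past, mixing temporal information at essentially zero cost. In TIN the offsets and blending are learned end-to-end; the hard shift below is only a simplified sketch of the underlying operation:

```python
import torch

def temporal_shift(x, fold_div=8):
    """Fixed-offset interlacing sketch. x: (N, T, C, H, W).
    The first C/fold_div channels are shifted forward in time, the
    next C/fold_div backward, and the rest are left in place."""
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                  # past -> future
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # future -> past
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # untouched
    return out

clip = torch.randn(2, 8, 64, 14, 14)   # batch, time, channels, H, W
print(temporal_shift(clip).shape)      # torch.Size([2, 8, 64, 14, 14])
```

Replacing this hard shift with a differentiable, learnable interlacing target is what lets TIN stand in for a heavy temporal model.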


2020 ◽  
Author(s):  
Zhou Hang ◽  
Quan Tingwei ◽  
Huang Qing ◽  
Liu Tian ◽  
Cao Tingting ◽  
...  

Neuron reconstruction can provide the quantitative data required for measuring neuronal morphology and is crucial in the field of brain research. However, the difficulty of reconstructing densely packed neurites, for which massive labor is required to achieve accurate reconstruction in most cases, has not been resolved. In this work, we provide a fundamental pathway toward solving this challenge by proposing the super-resolution segmentation network (SRSNet), which builds a mapping from the neurites in the original neuronal images to their segmentation in a higher-resolution space. SRSNet focuses on enlarging the distances between the boundaries of packed neurites, producing high-resolution segmentation images. Thus, for the construction of the training datasets, only the traced skeletons of neurites are required, which vastly increases the usability of SRSNet. The experiments conducted in this work show that SRSNet achieves accurate reconstruction of packed neurites where other state-of-the-art methods fail.
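A rough sketch of how a higher-resolution training target could be built from a traced skeleton alone: skeleton coordinates are mapped into a larger grid, where the gaps between adjacent neurites naturally widen. This is an illustrative guess at the data-preparation step, not the authors' exact pipeline, and all names and parameters are hypothetical:

```python
import numpy as np

def highres_target(skeleton_points, image_shape, scale=2, radius=1):
    """Rasterize a traced neurite skeleton into a (scale x) larger
    binary mask; in the upscaled grid, boundaries of packed neurites
    sit further apart, which is the property SRSNet exploits."""
    h, w = image_shape
    mask = np.zeros((h * scale, w * scale), dtype=np.uint8)
    for y, x in skeleton_points:
        ys, xs = int(round(y * scale)), int(round(x * scale))
        mask[max(0, ys - radius):ys + radius + 1,
             max(0, xs - radius):xs + radius + 1] = 1   # small dilation
    return mask

trace = [(10.0, 12.0), (10.5, 13.0), (11.0, 14.0)]  # hypothetical trace
print(highres_target(trace, (64, 64)).shape)        # (128, 128)
```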

