Multi-Scale 3D Convolution Network for Video Based Person Re-Identification

Author(s):  
Jianing Li ◽  
Shiliang Zhang ◽  
Tiejun Huang

This paper proposes a two-stream convolution network to extract spatial and temporal cues for video-based person Re-Identification (ReID). The temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN. The resulting M3D convolution network adds only a small fraction of extra parameters to the 2D CNN, yet gains the ability to learn multi-scale temporal features. With this compact architecture, the M3D convolution network is also more efficient and easier to optimize than existing 3D convolution networks. The temporal stream further incorporates Residual Attention Layers (RAL) to refine the temporal features. By jointly learning spatial-temporal attention masks in a residual manner, RAL identifies discriminative spatial regions and temporal cues. The other stream in our network is implemented with a 2D CNN for spatial feature extraction. The spatial and temporal features from the two streams are finally fused for video-based person ReID. Evaluations on three widely used benchmark datasets, i.e., MARS, PRID2011, and iLIDS-VID, demonstrate the substantial advantages of our method over existing 3D convolution networks and state-of-the-art methods.
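
A minimal PyTorch sketch of the idea behind an M3D-style layer: a spatial (1 x k x k) convolution augmented with parallel temporal (k x 1 x 1) convolutions at different dilation rates, fused as a lightweight residual. The layer name, kernel sizes, and dilation rates here are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class M3DLayer(nn.Module):
    """Illustrative multi-scale 3D convolution layer (assumed configuration).

    A 2D-style spatial convolution is expressed as a (1, 3, 3) 3D kernel, and
    multi-scale temporal cues are added through parallel (3, 1, 1) convolutions
    with different temporal dilations, combined as a residual.
    """

    def __init__(self, channels, temporal_dilations=(1, 2, 3)):
        super().__init__()
        # Spatial convolution: behaves like a 2D conv applied frame by frame.
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        # Parallel temporal convolutions at several dilation rates.
        self.temporal = nn.ModuleList([
            nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                      padding=(d, 0, 0), dilation=(d, 1, 1), bias=False)
            for d in temporal_dilations
        ])
        self.bn = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        out = self.spatial(x)
        # Multi-scale temporal residual: only a small number of extra parameters.
        for conv in self.temporal:
            out = out + conv(x)
        return self.relu(self.bn(out))


# Example: a clip of 8 frames with 64-channel feature maps of size 32x16.
clip = torch.randn(2, 64, 8, 32, 16)
print(M3DLayer(64)(clip).shape)  # torch.Size([2, 64, 8, 32, 16])
```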

2015 ◽  
Vol 4 (4) ◽  
pp. 1870-1893 ◽  
Author(s):  
Wen Luo ◽  
Zhao-Yuan Yu ◽  
Sheng-Jun Xiao ◽  
A-Xing Zhu ◽  
Lin-Wang Yuan

Electronics ◽  
2019 ◽  
Vol 8 (11) ◽  
pp. 1208 ◽  
Author(s):  
Kang Yue ◽  
Danli Wang

Visual fatigue evaluation plays an important role in applications such as virtual reality, since visual fatigue symptoms seriously affect the user experience. Existing visual fatigue evaluation methods require hand-crafted features for classification and perform feature extraction and classification separately. In this paper, we conduct a designed experiment to collect electroencephalogram (EEG) signals at various visual fatigue levels, and present a multi-scale convolutional neural network (CNN) architecture named MorletInceptionNet that detects visual fatigue from EEG input by exploiting the spatial-temporal structure of multichannel EEG signals. MorletInceptionNet adopts a joint space-time-frequency feature extraction scheme in which Morlet wavelet-like kernels are used for time-frequency raw feature extraction and an inception architecture is further used to extract multi-scale temporal features. The multi-scale temporal features are then concatenated and fed to a fully connected layer that classifies visual fatigue levels. In the experimental evaluation, we compare our method with five state-of-the-art methods, and the results demonstrate that our model achieves the best overall performance on two widely used evaluation metrics, i.e., classification accuracy and kappa value. Furthermore, we use input-perturbation network-prediction correlation maps to analyze in depth why the proposed method outperforms the others. The results suggest that our model is sensitive to perturbation of the β (14–30 Hz) and γ (30–40 Hz) bands, and that their spatial patterns correlate highly with those of the corresponding power spectral densities, which are traditionally used as evaluation features. This finding provides evidence for the hypothesis that the proposed model can learn joint time-frequency-space features to distinguish fatigue levels automatically.
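
As a rough sketch of the kind of multi-scale temporal feature extractor the abstract describes, the block below applies parallel temporal convolutions of different kernel lengths to multichannel EEG and concatenates the results, inception style. The channel counts and kernel lengths are assumptions for illustration, and the Morlet-wavelet-like kernel initialization is omitted for brevity.

```python
import torch
import torch.nn as nn

class TemporalInceptionBlock(nn.Module):
    """Inception-style multi-scale temporal block for EEG (illustrative only)."""

    def __init__(self, in_channels, branch_channels=8, kernel_sizes=(7, 15, 31)):
        super().__init__()
        # Each branch looks at a different temporal window length.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_channels, branch_channels, kernel_size=k,
                          padding=k // 2, bias=False),
                nn.BatchNorm1d(branch_channels),
                nn.ELU(),
            )
            for k in kernel_sizes
        ])

    def forward(self, x):  # x: (batch, eeg_channels, time_samples)
        # Concatenate multi-scale temporal features along the channel axis.
        return torch.cat([branch(x) for branch in self.branches], dim=1)


# Example: a batch of 4 EEG trials, 32 electrodes, 512 time samples.
eeg = torch.randn(4, 32, 512)
features = TemporalInceptionBlock(32)(eeg)
print(features.shape)  # torch.Size([4, 24, 512]) with 3 branches of 8 channels
```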


Author(s):  
Jung-Hoon Cho ◽  
Seung Woo Ham ◽  
Dong-Kyu Kim

With the growth of bike-sharing systems, demand forecasting has become an important problem for their operation. This study aims to develop a novel prediction model that enhances the accuracy of peak hourly demand. A spatiotemporal graph convolutional network (STGCN) is constructed to consider both spatial and temporal features. One essential step of the model is determining the main components of the adjacency matrix and the node feature matrix. To achieve this, 131 days of data from the bike-sharing system in Seoul are used, and experiments are conducted on models with various adjacency matrices and node feature matrices, including public transit usage. The results indicate that the STGCN models whose adjacency matrix reflects the previous demand pattern show outstanding performance in predicting demand compared with the other models. The results also show that the model including bus boarding and alighting records is more accurate than the model containing subway records, suggesting that buses are more closely connected to bike-sharing than the subway. The proposed STGCN with public transit data helps alleviate unmet demand by improving the accuracy of peak demand prediction.
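
A minimal NumPy sketch of the step the abstract highlights: deriving the adjacency matrix from previous demand patterns by correlating the historical demand series of station pairs and keeping only strongly correlated pairs. The correlation measure, threshold, and symmetric normalization below are conventional assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def demand_pattern_adjacency(demand, threshold=0.5):
    """Build a station adjacency matrix from historical demand correlation.

    demand: array of shape (num_timesteps, num_stations) with hourly rentals.
    Returns a symmetrically normalized adjacency matrix usable in a GCN layer.
    """
    # Pearson correlation between the demand series of every station pair.
    corr = np.corrcoef(demand.T)                  # (num_stations, num_stations)
    adj = np.where(corr >= threshold, corr, 0.0)  # keep only strong links
    np.fill_diagonal(adj, 1.0)                    # self loops

    # Symmetric normalization D^{-1/2} A D^{-1/2}, as in standard GCNs.
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ adj @ d_inv_sqrt


# Example: 131 days of hourly demand for 100 stations (synthetic data).
demand = np.random.poisson(lam=3.0, size=(131 * 24, 100)).astype(float)
A = demand_pattern_adjacency(demand)
print(A.shape)  # (100, 100)
```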


2021 ◽  
pp. 0309524X2199826
Author(s):  
Guowei Cai ◽  
Yuqing Yang ◽  
Chao Pan ◽  
Dian Wang ◽  
Fengjiao Yu ◽  
...  

Multi-step real-time prediction based on the spatial correlation of wind speed is a research hotspot for large-scale wind power grid integration. This paper proposes a multi-location multi-step wind speed combination prediction method based on the spatial correlation of wind speed. Correlation coefficients are determined by gray relational analysis for each turbine in the wind farm. Based on these coefficients, timing-control spatial association optimization is applied for optimization and scheduling, obtaining spatial information on the typical turbine and its neighborhood. This spatial information is reconstructed to improve the efficiency of spatial feature extraction. The reconstructed spatio-temporal information is fed into a convolutional neural network with memory cells, where spatial feature extraction and multi-step real-time prediction are carried out, avoiding the problem of missing information degrading prediction accuracy. The method is innovative in terms of both efficiency and accuracy, and its prediction accuracy and generalization ability are verified by predicting wind speed and wind power for different wind farms.
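
The gray relational analysis step mentioned above can be sketched as follows: each neighboring turbine's wind speed series is compared against a reference turbine's series and summarized by a gray relational grade. The resolution coefficient of 0.5 and the min-max normalization are conventional choices assumed here, not necessarily the paper's exact settings.

```python
import numpy as np

def gray_relational_grade(reference, candidates, rho=0.5):
    """Gray relational grades between a reference series and candidate series.

    reference:  array of shape (num_timesteps,), e.g. the target turbine's wind speed.
    candidates: array of shape (num_series, num_timesteps), neighboring turbines.
    rho:        resolution (distinguishing) coefficient, conventionally 0.5.
    """
    # Min-max normalize every series to remove scale differences.
    def normalize(x):
        return (x - x.min(axis=-1, keepdims=True)) / (
            x.max(axis=-1, keepdims=True) - x.min(axis=-1, keepdims=True) + 1e-12)

    ref = normalize(reference)
    cand = normalize(candidates)

    # Absolute differences and their global extrema over all series and times.
    diff = np.abs(cand - ref)
    d_min, d_max = diff.min(), diff.max()

    # Gray relational coefficients, averaged over time to get the grade per series.
    xi = (d_min + rho * d_max) / (diff + rho * d_max)
    return xi.mean(axis=1)


# Example: rank 5 neighboring turbines against a target turbine (synthetic data).
target = np.random.rand(288)       # one day of 5-minute wind speed samples
neighbors = np.random.rand(5, 288)
print(gray_relational_grade(target, neighbors))
```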


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 319
Author(s):  
Yi Wang ◽  
Xiao Song ◽  
Guanghong Gong ◽  
Ni Li

With the rapid development of deep learning and artificial intelligence techniques, denoising via neural networks has drawn great attention for its flexibility and excellent performance. However, in most convolutional network denoising methods the convolution kernel is only one layer deep, and features of distinct scales are neglected. Moreover, in the convolution operation all channels are treated equally, and the relationships between channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a multi-scale feature extraction block to extract and combine features at distinct scales of the noisy image. In addition, we propose a normalized attention network (NAN) to learn the relationships between channels, which smooths the optimization landscape and speeds up convergence when training an attention model. We introduce the NAN into convolutional network denoising, in which each channel receives its own gain so that channels can play different roles in the subsequent convolution. To verify the effectiveness of the proposed MFENANN, we conducted experiments on both grayscale and color image sets with noise levels ranging from 0 to 75. The experimental results show that, compared with several state-of-the-art denoising methods, the restored images of MFENANN achieve higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) values and better overall appearance.
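
The sketch below illustrates the two ingredients named in the abstract in simplified form: a multi-scale feature extraction block that runs parallel convolutions with different kernel sizes, followed by a per-channel gain learned from a global descriptor (a squeeze-and-excitation-style stand-in, not the authors' normalized attention network). All layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel convolutions at distinct kernel sizes, concatenated (illustrative)."""

    def __init__(self, in_channels, out_channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_channels = out_channels // len(kernel_sizes)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, branch_channels, k, padding=k // 2)
            for k in kernel_sizes
        ])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)


class ChannelGain(nn.Module):
    """Simple channel attention: each channel gets its own learned gain."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):  # x: (batch, channels, height, width)
        gain = self.fc(x.mean(dim=(2, 3)))   # global average pooling per channel
        return x * gain[:, :, None, None]    # rescale each channel

# Example: features for a batch of 2 noisy 64x64 grayscale patches.
noisy = torch.randn(2, 1, 64, 64)
feats = MultiScaleBlock(1, 48)(noisy)
print(ChannelGain(48)(feats).shape)  # torch.Size([2, 48, 64, 64])
```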

