Multi-source Spatio-temporal Hybrid Dilated Graph Convolutional Network for Traffic Speed Forecasting

Author(s): Lei Zhang, Quansheng Guo, Dong Li, Jiaxing Pan, Chuyuan Wei, ...
Author(s): Sophia Bano, Francisco Vasconcelos, Emmanuel Vander Poorten, Tom Vercauteren, Sebastien Ourselin, ...

Abstract
Purpose: Fetoscopic laser photocoagulation is a minimally invasive surgery for the treatment of twin-to-twin transfusion syndrome (TTTS). Using a lens/fibre-optic scope inserted into the amniotic cavity, the abnormal placental vascular anastomoses are identified and ablated to regulate blood flow to both fetuses. A limited field of view, occlusions due to the presence of the fetus and low visibility make it difficult to identify all vascular anastomoses. Automatic computer-assisted techniques may provide a better understanding of the anatomical structure during surgery for risk-free laser photocoagulation and may help improve mosaicking of fetoscopic videos.
Methods: We propose FetNet, a combined convolutional neural network (CNN) and long short-term memory (LSTM) recurrent neural network architecture for the spatio-temporal identification of fetoscopic events. We adapt an existing CNN architecture for spatial feature extraction and integrate it with the LSTM network for end-to-end spatio-temporal inference. We introduce differential learning rates during model training to effectively utilise the pre-trained CNN weights. This may support computer-assisted interventions (CAI) during fetoscopic laser photocoagulation.
Results: We perform a quantitative evaluation of our method using 7 in vivo fetoscopic videos captured from different human TTTS cases. The total duration of these videos was 5551 s (138,780 frames). To test the robustness of the proposed approach, we perform 7-fold cross-validation in which each video in turn serves as the hold-out (test) set and training is performed on the remaining videos.
Conclusion: FetNet achieved superior performance compared to existing CNN-based methods and provided improved inference thanks to its spatio-temporal modelling. Online testing of FetNet on a Tesla V100-DGXS-32GB GPU achieved a frame rate of 114 fps. These results show that our method could potentially provide a real-time solution for CAI and for automating occlusion and photocoagulation identification during fetoscopic procedures.
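
As a rough illustration of this kind of CNN+LSTM pipeline, the PyTorch sketch below wires a ResNet-18 feature extractor into an LSTM and gives the pre-trained backbone a smaller learning rate than the newly added layers, mirroring the differential-learning-rate idea. The backbone choice, layer sizes and event count (num_events) are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
from torchvision import models

class FetNetSketch(nn.Module):
    # Hypothetical CNN+LSTM event classifier; not the authors' FetNet code.
    def __init__(self, num_events=4, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)  # pre-trained weights would be loaded in practice
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the classification head
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_events)

    def forward(self, clip):                     # clip: (B, T, 3, H, W) frame sequence
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))     # (B*T, 512, 1, 1) per-frame spatial features
        feats = feats.flatten(1).view(b, t, -1)  # (B, T, 512) feature sequence
        out, _ = self.lstm(feats)                # temporal modelling across frames
        return self.head(out[:, -1])             # classify from the last time step

model = FetNetSketch()
# Differential learning rates: small for the pre-trained CNN, larger for new layers.
optimizer = torch.optim.Adam([
    {"params": model.cnn.parameters(), "lr": 1e-5},
    {"params": list(model.lstm.parameters()) + list(model.head.parameters()), "lr": 1e-3},
])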


Author(s): Yinong Zhang, Shanshan Guan, Cheng Xu, Hongzhe Liu

In the era of intelligent education, human behavior recognition based on computer vision is an important branch of pattern recognition. Human behavior recognition is a basic technology in intelligent monitoring and human-computer interaction in education. The dynamic changes of the human skeleton provide important information for the recognition of educational behavior. Traditional methods usually rely on manually labelled information or hand-crafted traversal rules alone, resulting in limited representation capability and poor generalization performance. In this paper, a dynamic skeleton model with residual connections is adopted: a spatio-temporal graph convolutional network based on residual connections, which not only overcomes the limitations of previous methods but can also learn the spatio-temporal model directly from the skeleton data. On the large-scale NTU-RGB+D dataset, the network model improved both the representation ability of human behavior characteristics and the generalization ability, and achieved better recognition results than existing models. In addition, this paper compares behavior recognition results on subsets of different joint points, and finds that spatial-structure partitioning has a better effect.
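
To make the residual spatio-temporal graph convolution concrete, the minimal PyTorch sketch below applies a graph convolution over skeleton joints, a temporal convolution over frames, and a skip connection that adds the input back. The adjacency handling, channel counts and kernel size are illustrative assumptions rather than the paper's exact network.

import torch
import torch.nn as nn

class ResidualSTGCNBlock(nn.Module):
    # Minimal residual spatio-temporal graph conv block; illustrative only.
    def __init__(self, channels, A, t_kernel=9):
        super().__init__()
        self.register_buffer("A", A)                     # (V, V) normalized skeleton adjacency
        self.spatial = nn.Conv2d(channels, channels, 1)  # 1x1 conv mixes channels per joint
        self.temporal = nn.Conv2d(channels, channels, (t_kernel, 1), padding=(t_kernel // 2, 0))
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):                                # x: (B, C, T, V) joint features
        res = x                                          # residual (skip) connection
        y = torch.einsum("bctv,vw->bctw", self.spatial(x), self.A)  # graph conv over joints
        y = self.bn(self.temporal(y))                    # temporal conv over frames
        return self.relu(y + res)

# Trivial usage with an identity adjacency standing in for the skeleton graph:
block = ResidualSTGCNBlock(channels=64, A=torch.eye(25))
out = block(torch.randn(2, 64, 30, 25))  # batch of 2, 64 channels, 30 frames, 25 joints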


Author(s): Xiaobin Zhu, Zhuangzi Li, Xiao-Yu Zhang, Changsheng Li, Yaqi Liu, ...

Video super-resolution is a challenging task which has attracted great attention in the research and industry communities. In this paper, we propose a novel end-to-end architecture, called the Residual Invertible Spatio-Temporal Network (RISTN), for video super-resolution. RISTN sufficiently exploits spatial information in mapping from low resolution to high resolution, and effectively models the temporal consistency across consecutive video frames. Compared with existing recurrent convolutional network based approaches, RISTN is much deeper yet more efficient. It consists of three major components. In the spatial component, a lightweight residual invertible block is designed to reduce information loss during feature transformation and provide robust feature representations. In the temporal component, a novel recurrent convolutional model with residual dense connections is proposed to construct a deeper network and avoid feature degradation. In the reconstruction component, a new fusion method based on a sparse strategy is proposed to integrate the spatial and temporal features. Experiments on public benchmark datasets demonstrate that RISTN outperforms the state-of-the-art methods.
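
One standard way to obtain such an invertible residual transform is additive coupling, sketched below in PyTorch: the input channels are split in half and each half is updated from the other, so the mapping is exactly reversible and loses no information during feature transformation. This is a generic coupling-layer construction under assumed 3x3 convolutions; RISTN's actual block design may differ.

import torch
import torch.nn as nn

class InvertibleResidualBlock(nn.Module):
    # Additive-coupling block: forward() is exactly undone by inverse().
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.f = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(half, half, 3, padding=1))
        self.g = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
                               nn.Conv2d(half, half, 3, padding=1))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)   # split along the channel axis
        y1 = x1 + self.f(x2)         # each half is updated from the other,
        y2 = x2 + self.g(y1)         # which keeps the mapping bijective
        return torch.cat([y1, y2], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        x2 = y2 - self.g(y1)         # subtract in reverse order to recover the input
        x1 = y1 - self.f(x2)
        return torch.cat([x1, x2], dim=1)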


2019, Vol 155, pp. 551-558
Author(s): Kuldeep Kurte, Srinath Ravulaparthy, Anne Berres, Melissa Allen, Jibonananda Sanyal

2019, Vol 11 (2), pp. 42
Author(s): Sheeraz Arif, Jing Wang, Tehseen Ul Hassan, Zesong Fei

Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNN) have received increasing attention in multimedia studies and have yielded state-of-the-art results. In this research work, we propose a new framework which intelligently combines 3D-CNN and LSTM networks. First, we integrate the discriminative information from a video into a map called a 'motion map' by using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and this technique can be trained by iteratively increasing the training video length; the final network can then be used to generate the motion map of the whole video. Next, a linear weighted fusion scheme is used to fuse the network feature maps into spatio-temporal features. Finally, we use a long short-term memory (LSTM) encoder-decoder for the final predictions. This method is simple to implement and retains discriminative and dynamic information. Improved results on public benchmark datasets demonstrate the effectiveness and practicality of the proposed method.
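
As a minimal sketch of the linear weighted fusion step, the PyTorch snippet below blends per-frame spatial and temporal feature vectors with a single learnable weight before an LSTM produces the prediction. The equal feature dimensions, the scalar weight and the class count are assumptions for illustration, not the paper's configuration.

import torch
import torch.nn as nn

class WeightedFusionLSTM(nn.Module):
    # Illustrative linear weighted fusion of two feature streams plus an LSTM.
    def __init__(self, feat_dim=512, hidden=256, num_classes=10):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learnable fusion weight
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, spatial_feats, temporal_feats):  # both: (B, T, feat_dim)
        fused = self.alpha * spatial_feats + (1 - self.alpha) * temporal_feats
        out, _ = self.lstm(fused)                      # temporal encoding of fused features
        return self.head(out[:, -1])                   # predict from the final state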


2020, Vol 10 (4), pp. 1509
Author(s): Liang Ge, Siyu Li, Yaqian Wang, Feng Chang, Kunyan Wu

Traffic speed prediction plays a significant role in intelligent transportation systems (ITS). However, due to the complex spatial-temporal correlations of traffic data, it is very challenging to predict traffic speed promptly and accurately. Traffic speed exhibits not only short-term neighboring and multiple long-term periodic dependencies in the temporal dimension, but also local and global dependencies in the spatial dimension. To address this problem, we propose a novel deep-learning-based model, the Global Spatial-Temporal Graph Convolutional Network (GSTGCN), for urban traffic speed prediction. The model consists of three spatial-temporal components with the same structure and an external component. The three spatial-temporal components model the recent, daily-periodic, and weekly-periodic spatial-temporal correlations of the traffic data, respectively. More specifically, each spatial-temporal component consists of a dynamic temporal module and a globally correlated spatial module. The former contains multiple residual blocks stacked from dilated causal convolutions, while the latter contains a localized graph convolution and a global correlation mechanism. The external component extracts the effect of external factors, such as holidays and weather conditions, on the traffic speed. Experimental results on two real-world traffic datasets demonstrate that the proposed GSTGCN outperforms state-of-the-art baselines.
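
The dilated causal convolutions in the dynamic temporal module can be sketched as follows in PyTorch: each block left-pads its input so that no future time step leaks into the output, and stacking blocks with dilations 1, 2, 4, 8 grows the temporal receptive field exponentially. Kernel size, channel count and the dilation schedule are illustrative assumptions, not GSTGCN's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalBlock(nn.Module):
    # One residual block of dilated causal 1-D convolution; illustrative only.
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = dilation          # (kernel_size - 1) * dilation for kernel_size = 2
        self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

    def forward(self, x):            # x: (B, C, T) traffic-speed sequence features
        y = F.pad(x, (self.pad, 0))  # left-only padding keeps the convolution causal
        return x + torch.relu(self.conv(y))  # residual connection around the conv

# Stacking with growing dilations covers long histories with few layers:
net = nn.Sequential(*[DilatedCausalBlock(64, d) for d in (1, 2, 4, 8)])
out = net(torch.randn(2, 64, 48))    # e.g. 48 past time steps of speed features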

