S3-Net: A Fast Scene Understanding Network by Single-Shot Segmentation for Autonomous Driving

2021 ◽  
Vol 12 (5) ◽  
pp. 1-19
Author(s):  
Yuan Cheng ◽  
Yuchao Yang ◽  
Hai-Bao Chen ◽  
Ngai Wong ◽  
Hao Yu

Real-time segmentation and understanding of driving scenes are crucial in autonomous driving. Traditional pixel-wise approaches extract scene information by segmenting all pixels in a frame, and hence are inefficient and slow. Proposal-wise approaches learn only from the proposed object candidates, but still require multiple steps of expensive proposal methods. Instead, this work presents a fast single-shot segmentation strategy for video scene understanding. The proposed net, called S3-Net, quickly locates and segments target sub-scenes, and meanwhile extracts attention-aware time-series sub-scene features (ats-features) as inputs to an attention-aware spatio-temporal model (ASM). Utilizing tensorization and quantization techniques, S3-Net is intended to be lightweight for edge computing. Experimental results on the CityScapes, UCF11, HMDB51, and MOMENTS datasets demonstrate that the proposed S3-Net achieves an accuracy improvement of 8.1% versus the 3D-CNN based approach on UCF11, a storage reduction of 6.9×, and an inference speed of 22.8 FPS on CityScapes with a GTX1080Ti GPU.
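
As a rough illustration of the quantization idea mentioned in the abstract, the sketch below maps floating-point weights to signed 8-bit integers with a single per-tensor scale. This is a generic uniform quantizer written for this note, not the scheme used in S3-Net; the function names `quantize_uniform` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_uniform(weights, num_bits=8):
    """Map floats to signed integers using one shared scale (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


weights = [0.4, -1.0, 0.25, 0.0]                # hypothetical layer weights
q, scale = quantize_uniform(weights)
restored = dequantize(q, scale)
```

Storing 8-bit codes instead of 32-bit floats cuts weight storage by about 4× in this toy setting, at the cost of a rounding error of at most half a quantization step per weight. The figures reported in the abstract come from the authors' own combination of tensorization and quantization, which this sketch does not reproduce.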

2020 ◽  
Vol 34 (07) ◽  
pp. 11982-11989
Author(s):  
Xiaodan Shi ◽  
Xiaowei Shao ◽  
Zipei Fan ◽  
Renhe Jiang ◽  
Haoran Zhang ◽  
...  

Accurate human path forecasting in complex and crowded scenarios is critical for collision avoidance in autonomous driving and social robot navigation. It remains a challenging problem because of dynamic human interaction and the intrinsic multimodality of human motion. Given the observation, there is a rich set of plausible ways for an agent to walk through the scene. To address these issues, we propose a spatio-temporal model that can aggregate the information from socially interacting agents and capture the multimodality of the motion patterns. We use mixture density functions to describe the human path and predict the distribution of future paths with explicit density. To integrate more factors to model interacting people, we further introduce a coordinate transformation to represent the relative motion between people. Extensive experiments over several trajectory prediction benchmarks demonstrate that our method is able to forecast various plausible futures in complex scenarios and achieves state-of-the-art performance.
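
To make the notion of an explicit multimodal density concrete, here is a minimal sketch of a two-dimensional Gaussian mixture over a pedestrian's next position. It assumes isotropic components and hand-picked parameters; the names `gaussian_2d`, `mixture_density`, and `negative_log_likelihood` are hypothetical, and the paper's actual parameterization may differ.

```python
import math


def gaussian_2d(x, y, mean, sigma):
    """Isotropic 2D Gaussian density with standard deviation sigma."""
    dx, dy = x - mean[0], y - mean[1]
    norm = 2.0 * math.pi * sigma ** 2
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / norm


def mixture_density(x, y, components):
    """components: list of (weight, mean, sigma); weights should sum to 1."""
    return sum(w * gaussian_2d(x, y, m, s) for w, m, s in components)


def negative_log_likelihood(point, components):
    """Training-style loss: low when the observed point lies near a mode."""
    return -math.log(mixture_density(point[0], point[1], components))


# Two equally likely futures (assumed values): veer left or veer right.
components = [(0.5, (-1.0, 1.0), 0.3), (0.5, (1.0, 1.0), 0.3)]
loss_on_mode = negative_log_likelihood((1.0, 1.0), components)
loss_between = negative_log_likelihood((0.0, 1.0), components)
```

A single Gaussian fitted to the same two futures would place its mean between them, at a position neither future visits. The mixture instead assigns low loss to either mode and high loss to the midpoint, which is the behavior the abstract describes as capturing multimodality.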


2020 ◽  
Vol 2020 (14) ◽  
pp. 306-1-306-6
Author(s):  
Florian Schiffers ◽  
Lionel Fiske ◽  
Pablo Ruiz ◽  
Aggelos K. Katsaggelos ◽  
Oliver Cossairt

Imaging through scattering media finds applications in diverse fields from biomedicine to autonomous driving. However, interpreting the resulting images is difficult due to blur caused by the scattering of photons within the medium. Transient information, captured with fast temporal sensors, can be used to significantly improve the quality of images acquired in scattering conditions. Photon scattering within a highly scattering medium is well modeled by the diffusion approximation of the Radiative Transport Equation (RTE). Its solution is easily derived and can be interpreted as a Spatio-Temporal Point Spread Function (STPSF). In this paper, we first discuss the properties of the STPSF and subsequently use this knowledge to simulate transient imaging through highly scattering media. We then propose a framework to invert the forward model, which assumes Poisson noise, to recover a noise-free, unblurred image by solving an optimization problem.
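
For orientation, the sketch below evaluates the standard textbook solution of the time-domain diffusion equation for a point source in an infinite homogeneous medium, which is one common form of such a point spread function. The geometry, the units (mm and ps), the default refractive index, and the name `diffusion_stpsf` are assumptions made for this note; the paper's exact formulation may differ.

```python
import math

C_VACUUM_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def diffusion_stpsf(r_mm, t_ps, mu_a, mu_s_prime, n=1.33):
    """Fluence at distance r and time t after an impulse at the origin.

    mu_a and mu_s_prime are the absorption and reduced scattering
    coefficients in 1/mm; n is the refractive index of the medium.
    """
    if t_ps <= 0:
        return 0.0                              # no light before the pulse
    c = C_VACUUM_MM_PER_PS / n                  # speed of light in the medium
    d = 1.0 / (3.0 * (mu_a + mu_s_prime))       # diffusion coefficient, mm
    four_dct = 4.0 * d * c * t_ps
    spread = (math.pi * four_dct) ** -1.5       # (4*pi*D*c*t)^(-3/2)
    decay = math.exp(-r_mm ** 2 / four_dct - mu_a * c * t_ps)
    return c * spread * decay


# Assumed tissue-like parameters: mu_a = 0.01 /mm, mu_s' = 1.0 /mm.
early = diffusion_stpsf(10.0, 10.0, 0.01, 1.0)
later = diffusion_stpsf(10.0, 300.0, 0.01, 1.0)
```

At a fixed distance the response is negligible at early times, rises as diffused photons arrive, and then decays through absorption and spreading. That temporal profile is the transient information a fast sensor can exploit.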


Author(s):  
Álvaro Briz-Redón ◽  
Adina Iftimi ◽  
Juan Francisco Correcher ◽  
Jose De Andrés ◽  
Manuel Lozano ◽  
...  

2016 ◽  
Vol 12 (6) ◽  
pp. e1004969 ◽  
Author(s):  
Zhihui Wang ◽  
Romica Kerketta ◽  
Yao-Li Chuang ◽  
Prashant Dogra ◽  
Joseph D. Butner ◽  
...  

Author(s):  
Yinong Zhang ◽  
Shanshan Guan ◽  
Cheng Xu ◽  
Hongzhe Liu

In the era of intelligent education, human behavior recognition based on computer vision is an important branch of pattern recognition. Human behavior recognition is a basic technology in the fields of intelligent monitoring and human-computer interaction in education. The dynamic changes of the human skeleton provide important information for the recognition of educational behavior. Traditional methods usually rely only on hand-crafted features or traversal rules, resulting in limited representation capabilities and poor generalization performance of the model. In this paper, a dynamic skeleton model with residuals is adopted: a spatio-temporal graph convolutional network based on residual connections, which not only overcomes the limitations of previous methods, but can also learn the spatio-temporal model from the skeleton data. On the large-scale skeleton dataset NTU-RGB+D, the network model not only improved the representation ability of human behavior characteristics, but also improved the generalization ability, and achieved a better recognition effect than the existing model. In addition, this paper compares the results of behavior recognition on subsets of different joint points, and finds that spatial structure division has a better effect.
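
The following is a minimal sketch of what a residual spatio-temporal graph convolution block can look like on skeleton data. It is an illustration under simplifying assumptions, not the authors' network: the adjacency is row-normalized, the temporal step is a plain three-frame average standing in for a learned temporal convolution, and all function names are hypothetical.

```python
def normalize_adjacency(adj):
    """Row-normalize A + I so each joint averages itself and its neighbors."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return [[v / sum(row) for v in row] for row in a_hat]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def residual_graph_conv(x, a_norm, weight):
    """x is joints x channels; returns relu(A_norm @ x @ W) + x."""
    out = matmul(matmul(a_norm, x), weight)
    return [[max(0.0, o) + xi for o, xi in zip(o_row, x_row)]
            for o_row, x_row in zip(out, x)]


def st_block(frames, a_norm, weight):
    """Spatial residual graph conv per frame, then a 3-frame temporal average."""
    spatial = [residual_graph_conv(x, a_norm, weight) for x in frames]
    last = len(spatial) - 1
    result = []
    for t in range(len(spatial)):
        # Clamp indices at the sequence ends instead of zero padding.
        window = [spatial[max(t - 1, 0)], spatial[t], spatial[min(t + 1, last)]]
        result.append([[sum(f[j][c] for f in window) / 3.0
                        for c in range(len(spatial[t][0]))]
                       for j in range(len(spatial[t]))])
    return result


# Hypothetical 3-joint chain (shoulder - elbow - wrist), 2 channels, 4 frames.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_norm = normalize_adjacency(adjacency)
identity = [[1.0, 0.0], [0.0, 1.0]]
frames = [[[1.0, 1.0] for _ in range(3)] for _ in range(4)]
features = st_block(frames, a_norm, identity)
```

The residual term `+ x` lets the input features pass through unchanged when the learned transform contributes nothing, which is the usual motivation for residual connections in deeper graph networks.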

