stream architecture
Recently Published Documents

TOTAL DOCUMENTS: 41 (five years: 11)
H-INDEX: 5 (five years: 1)

2021 ◽  
Vol 11 (1) ◽  
pp. 23
Author(s):  
Ozgun Akcay ◽  
Ahmet Cumhur Kinaci ◽  
Emin Ozgur Avsar ◽  
Umut Aydar

In geospatial applications such as urban planning and land use management, the automatic detection and classification of earth objects are essential and primary tasks. Among prominent semantic segmentation algorithms, DeepLabV3+ stands out as a state-of-the-art CNN. Although the DeepLabV3+ model can extract multi-scale contextual information, there is still a need for multi-stream architectural approaches and training strategies that can leverage multi-modal geographic datasets. In this study, a new end-to-end dual-stream architecture for geospatial imagery was developed based on the DeepLabV3+ architecture. Spectral datasets other than RGB improved semantic segmentation accuracy when used as additional channels alongside height information. Furthermore, both the proposed data augmentation and the Tversky loss function, which is sensitive to imbalanced data, yielded better overall accuracy. The new dual-stream architecture produced overall semantic segmentation accuracies of 88.87% and 87.39% on the Potsdam and Vaihingen datasets, respectively. Overall, enhancing established semantic segmentation networks shows great potential for higher model performance, and the contribution of geospatial data, fed as a second stream alongside RGB, to segmentation was explicitly demonstrated.
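The Tversky loss used to handle imbalanced data can be sketched in plain Python. This is a minimal illustration of the formula, not the study's implementation; the alpha/beta weights below are common illustrative defaults, not values reported in the paper:

```python
def tversky_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-6):
    """Tversky loss for a binary mask, given flattened 1-D lists of
    predicted probabilities (pred) and {0,1} labels (target).

    alpha weights false negatives, beta weights false positives;
    setting alpha > beta penalizes missed pixels more, which helps
    with class imbalance."""
    tp = sum(p * t for p, t in zip(pred, target))        # true positives
    fn = sum((1 - p) * t for p, t in zip(pred, target))  # false negatives
    fp = sum(p * (1 - t) for p, t in zip(pred, target))  # false positives
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return 1.0 - tversky

# A perfect prediction drives the loss to 0:
print(tversky_loss([1.0, 0.0, 1.0], [1, 0, 1]))  # 0.0
```

Note that with alpha = beta = 0.5 the Tversky index reduces to the Dice coefficient, so this loss generalizes Dice loss with tunable imbalance sensitivity.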


Author(s):  
Sarah Almeida Carneiro ◽  
Silvio Jamil Ferzoli Guimarães ◽  
Hélio Pedrini

The need for accurate video classification has been growing, and for detecting endangering situations in particular, a quick response is crucial to avoid more serious consequences. In this work, we target video classification for fall detection. Our study focuses on high-level descriptors able to correctly characterize the event; these descriptor results serve as inputs to a multi-stream architecture of VGG-16 networks. Our proposal is therefore based on analyzing the best combination of high-level extracted features for the binary classification of videos. The approach was tested on three well-known datasets and has proven to yield results comparable to more computationally expensive methods in the literature.
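The fusion step implied by a multi-stream setup can be sketched as a weighted average of per-stream class scores before picking a label. This is a generic late-fusion sketch under assumed uniform weights, not the paper's actual combination scheme, and the score values are hypothetical:

```python
def fuse_streams(stream_scores, weights=None):
    """Fuse per-class scores from several descriptor streams by
    weighted averaging, then return (predicted class, fused scores).

    stream_scores: list of per-stream score lists, one score per class.
    weights: optional per-stream weights (uniform if omitted)."""
    n = len(stream_scores)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weighting by default
    num_classes = len(stream_scores[0])
    fused = [sum(w * s[c] for w, s in zip(weights, stream_scores))
             for c in range(num_classes)]
    return fused.index(max(fused)), fused

# Two hypothetical streams voting on classes [fall, no-fall]:
label, fused = fuse_streams([[0.8, 0.2], [0.6, 0.4]])
print(label, fused)  # 0 [0.7, 0.3]
```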


Author(s):  
Guoyong Cai ◽  
Yumeng Cai

Short-video action recognition based on deep learning has made significant progress; most proposed methods are based on 3D convolutional neural networks (3D CNNs) and two-stream architectures. However, 3D CNNs have a large number of parameters, and two-stream networks do not learn features well enough. This work aims to build a network that learns better features with fewer parameters. A Hierarchy Spatial-Temporal Transformer model is proposed, based on the two-stream architecture and hierarchical inference. The model comprises three modules: a Hierarchy Residual Reformer, a Spatial Attention Module, and a Temporal-Spatial Attention Module. Each frame is first transformed into a spatial visual feature map. Spatial feature learning is then performed by spatial attention to generate attention spatial feature maps. Finally, the generated attention spatial feature map is combined with temporal feature vectors to produce the final representation for classification. Experimental results on the HMDB51 and UCF101 datasets show that the proposed model achieves better accuracy than state-of-the-art baseline models.
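The softmax weighting at the heart of attention modules like these can be illustrated with a toy scalar version: scores are converted to weights that sum to 1, and the output is the weighted sum. This is a simplified sketch of the attention-weighting idea, not the paper's actual module:

```python
import math

def attention_pool(scores):
    """Softmax-weight a sequence of scalar feature scores and return
    (attended value, attention weights). Larger scores receive larger
    weights, so the attended value is pulled toward salient entries."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    attended = sum(w * s for w, s in zip(weights, scores))
    return attended, weights
```

In the real module the same idea operates over feature maps (vectors per spatial position) rather than scalars, but the normalize-then-weight structure is identical.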


2020 ◽  
Vol 34 (07) ◽  
pp. 11865-11873 ◽  
Author(s):  
Yongri Piao ◽  
Zhengkun Rong ◽  
Miao Zhang ◽  
Huchuan Lu

Light field saliency detection has attracted increasing interest in recent years due to the significant improvements it brings in challenging scenes by using abundant light field cues. However, the high dimensionality of light field data poses computation-intensive and memory-intensive challenges, and light field data access is far less ubiquitous than RGB data. These issues may severely impede practical applications of light field saliency detection. In this paper, we introduce an asymmetrical two-stream architecture inspired by knowledge distillation to confront these challenges. First, we design a teacher network that learns to exploit focal slices for higher-requirement settings on desktop computers while transferring comprehensive focusness knowledge to the student network. The teacher network relies on two tailor-made modules, namely a multi-focusness recruiting module (MFRM) and a multi-focusness screening module (MFSM). Second, we propose two distillation schemes to train a student network toward memory and computation efficiency while preserving performance. The proposed schemes ensure better absorption of focusness knowledge and enable the student to replace the focal slices with a single RGB image in a user-friendly way. Experiments on three benchmark datasets demonstrate that our teacher network achieves state-of-the-art performance and the student network (ResNet18) achieves Top-1 accuracy on the HFUT-LFSD dataset and Top-4 on DUT-LFSD, while reducing model size by 56% and boosting frames per second (FPS) by 159% compared with the best-performing method.
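The knowledge-transfer objective underlying teacher-student training can be sketched with the standard temperature-scaled KL divergence between softened teacher and student distributions. This is a generic Hinton-style distillation sketch, not the paper's two tailor-made schemes, and the temperature T=4 is an illustrative choice:

```python
import math

def softmax_t(logits, T):
    """Temperature-scaled softmax: larger T softens the distribution."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_kl(teacher_logits, student_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T*T so gradients stay comparable across temperatures.
    Minimizing this pushes the student's soft predictions toward the
    teacher's, transferring 'dark knowledge' between classes."""
    p = softmax_t(teacher_logits, T)
    q = softmax_t(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * (T * T)

# Identical logits give zero divergence:
print(distillation_kl([2.0, 0.5], [2.0, 0.5]))  # 0.0
```

In practice this term is combined with the ordinary supervised loss on ground-truth labels, weighted by a mixing coefficient.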


Author(s):  
Miao Zhang ◽  
Sun Xiao Fei ◽  
Jie Liu ◽  
Shuang Xu ◽  
Yongri Piao ◽  
...  

Author(s):  
Piyumal RANAWAKA ◽  
Mongkol EKPANYAPONG ◽  
Adriano TAVARES ◽  
Mathew DAILEY ◽  
Krit ATHIKULWONGSE ◽  
...  
