An End-to-End Trainable Multi-Column CNN for Scene Recognition in Extremely Changing Environment

Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1556 ◽  
Author(s):  
Zhenyu Li ◽  
Aiguo Zhou ◽  
Yong Shen

Scene recognition is an essential part of the vision-based robot navigation domain. The successful application of deep learning technology has triggered more extensive preliminary studies on scene recognition, all of which use features extracted from networks trained for recognition tasks. In this paper, we interpret scene recognition as a region-based image retrieval problem and present a novel approach to scene recognition with an end-to-end trainable multi-column convolutional neural network (MCNN) architecture. The proposed MCNN utilizes filters with receptive fields of different sizes to achieve multi-level and multi-layer image perception, and consists of three components: front-end, middle-end, and back-end. The first seven layers of VGG16 are taken as the front-end for two-dimensional feature extraction, Inception-A is taken as the middle-end for deeper feature representation learning, and the Large-Margin Softmax Loss (L-Softmax) is taken as the back-end for enhancing intra-class compactness and inter-class separability. Extensive experiments have been conducted to evaluate the performance by comparing the proposed network to existing state-of-the-art methods. Experimental results on three popular datasets demonstrate the robustness and accuracy of our approach. To the best of our knowledge, the presented approach has not previously been applied to scene recognition in the literature.
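The abstract describes a three-part pipeline but no implementation details. Below is a minimal PyTorch sketch of that layout, assuming illustrative layer counts and channel widths, with plain logits standing in for the L-Softmax back-end; none of these choices are the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16  # torchvision >= 0.13 API

class InceptionA(nn.Module):
    """Simplified Inception-A style block: parallel receptive fields."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 96, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, 48, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(48, 64, kernel_size=5, padding=2), nn.ReLU(inplace=True))

    def forward(self, x):
        # Concatenate features seen at 1x1, 3x3, and 5x5 receptive fields.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

class MCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Front-end: early VGG16 conv layers (here, up to pool3 = 256 ch;
        # the exact cut-off in the paper is assumed, not confirmed).
        self.front = vgg16(weights=None).features[:17]
        # Middle-end: Inception-A block for multi-scale representation.
        self.middle = InceptionA(256)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Back-end: logits that would feed an L-Softmax loss in the paper;
        # plain cross-entropy over these is used here as a stand-in.
        self.fc = nn.Linear(64 + 96 + 64, num_classes)

    def forward(self, x):
        x = self.middle(self.front(x))
        return self.fc(self.pool(x).flatten(1))
```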

Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1234 ◽
Author(s):  
Lei Zha ◽  
Yu Yang ◽  
Zicheng Lai ◽  
Ziwei Zhang ◽  
Juan Wen

In recent years, neural networks for single image super-resolution (SISR) have adopted deeper network structures to extract additional image details, which brings difficulties in model training. To deal with deep model training problems, researchers utilize dense skip connections to promote the model’s feature representation ability by reusing deep features of different receptive fields. Benefiting from the dense connection block, SRDenseNet has achieved excellent performance in SISR. Although the densely connected structure can provide rich information, it also introduces redundant and useless information. To tackle this problem, in this paper we propose a Lightweight Dense Connected Approach with Attention for Single Image Super-Resolution (LDCASR), which employs the attention mechanism to extract useful information along the channel dimension. In particular, we propose the recursive dense group (RDG), consisting of Dense Attention Blocks (DABs), which obtains more significant representations by extracting deep features with the aid of both dense connections and the attention module, enabling the whole network to focus on learning more advanced feature information. Additionally, we introduce group convolution in the DABs, which reduces the number of parameters to 0.6 M. Extensive experiments on benchmark datasets demonstrate the superiority of our proposed method over five chosen SISR methods.
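As a concrete illustration of the DAB idea, the sketch below combines dense connections, squeeze-and-excitation style channel attention, and group convolution in PyTorch. Channel counts, growth rate, layer count, and squeeze ratio are all assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel reweighting."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)

class DenseAttentionBlock(nn.Module):
    def __init__(self, ch=64, growth=32, layers=4, groups=4):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(layers):
            # Group convolution cuts parameters by a factor of `groups`.
            self.convs.append(nn.Sequential(
                nn.Conv2d(ch + i * growth, growth, 3, padding=1, groups=groups),
                nn.ReLU(inplace=True)))
        self.attn = ChannelAttention(ch + layers * growth)
        # 1x1 fusion back to the block's input width.
        self.fuse = nn.Conv2d(ch + layers * growth, ch, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            # Dense connections: each layer sees all earlier features.
            feats.append(conv(torch.cat(feats, dim=1)))
        # Attention selects the useful channels among the dense features.
        return x + self.fuse(self.attn(torch.cat(feats, dim=1)))
```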


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3669 ◽  
Author(s):  
Rui Sun ◽  
Qiheng Huang ◽  
Miaomiao Xia ◽  
Jun Zhang

Video-based person re-identification is an important task that faces the challenges of lighting variation, low-resolution images, background clutter, occlusion, and human appearance similarity in multi-camera visual sensor networks. In this paper, we propose a video-based person re-identification method built on an end-to-end learning architecture with hybrid deep appearance-temporal features. It can learn the appearance features of pivotal frames, the temporal features, and an independent distance metric for the different features. The architecture consists of a two-stream deep feature structure and two Siamese networks. For the first stream, we propose the Two-branch Appearance Feature (TAF) sub-structure to obtain the appearance information of persons, and use one of the two Siamese networks to learn the similarity of the appearance features of a pairwise person. To utilize the temporal information, we design the second stream, consisting of the Optical flow Temporal Feature (OTF) sub-structure and the other Siamese network, to learn the person’s temporal features and the distances of pairwise features. In addition, we select the pivotal frames of a video as inputs to the Inception-V3 network in the Two-branch Appearance Feature sub-structure, and employ a salience-learning fusion layer to fuse the learned global and local appearance features. Extensive experimental results on the PRID2011, iLIDS-VID, and Motion Analysis and Re-identification Set (MARS) datasets show that the proposed architecture reaches Rank-1 accuracies of 79%, 59%, and 72%, respectively, and has advantages over state-of-the-art algorithms. It also improves the feature representation ability for persons.
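A minimal sketch of one Siamese branch as described, in PyTorch: a shared embedding network compares a pair of person features under a contrastive loss. The feature dimensions and margin are placeholders; the paper pairs such heads with Inception-V3 appearance features in one stream and optical-flow temporal features in the other.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseHead(nn.Module):
    def __init__(self, in_dim=2048, emb_dim=256):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(in_dim, emb_dim), nn.ReLU(inplace=True),
            nn.Linear(emb_dim, emb_dim))

    def forward(self, a, b):
        # Shared weights: both inputs pass through the same embedding.
        return self.embed(a), self.embed(b)

def contrastive_loss(za, zb, same_person, margin=1.0):
    """Pull matched pairs together, push mismatched pairs apart."""
    d = F.pairwise_distance(za, zb)
    return torch.mean(same_person * d.pow(2) +
                      (1 - same_person) * F.relu(margin - d).pow(2))

# Usage with placeholder features, e.g. pooled backbone outputs of the
# pivotal frames of two tracklets.
fa, fb = torch.randn(8, 2048), torch.randn(8, 2048)
labels = torch.randint(0, 2, (8,)).float()  # 1 = same identity
head = SiameseHead()
loss = contrastive_loss(*head(fa, fb), labels)
```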


2021 ◽  
pp. 454-470 ◽
Author(s):  
Dat Nguyen Van ◽  
Son Nguyen Trung ◽  
Anh Pham Thi Hong ◽  
Thao Thu Hoang ◽  
Ta Minh Thanh

2019 ◽  
Vol 11 (19) ◽  
pp. 2220 ◽  
Author(s):  
Ximin Cui ◽  
Ke Zheng ◽  
Lianru Gao ◽  
Bing Zhang ◽  
Dong Yang ◽  
...  

Jointly using spatial and spectral information has been widely applied to hyperspectral image (HSI) classification. In particular, convolutional neural networks (CNN) have gained attention in recent years due to their detailed representation of features. However, most CNN-based HSI classification methods use image patches as classifier input. This limits the use of spatial neighborhood information and reduces processing efficiency in training and testing. To overcome this problem, we propose an image-based classification framework that is efficient and straightforward. Based on this framework, we propose a multiscale spatial-spectral CNN for HSIs (HyMSCN) to integrate both features fused from multiple receptive fields and multiscale spatial features at different levels. The fused features are exploited using a lightweight block called the multiple receptive field feature block (MRFF), which contains various types of dilated convolution. By fusing multiple receptive field features and multiscale spatial features, HyMSCN achieves a comprehensive feature representation for classification. Experimental results on three real hyperspectral images demonstrate the efficiency of the proposed framework, and the proposed method achieves superior performance for HSI classification.
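The MRFF block is described as a lightweight combination of dilated convolutions. A sketch of that idea in PyTorch follows, with channel widths and dilation rates chosen purely for illustration.

```python
import torch
import torch.nn as nn

class MRFF(nn.Module):
    """Parallel dilated convolutions fused into one feature map, so a
    single block sees several neighborhood sizes at once."""
    def __init__(self, in_ch=128, branch_ch=32, dilations=(1, 2, 3, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding = dilation keeps spatial size for a 3x3 kernel.
                nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True))
            for d in dilations])
        # 1x1 fusion of the multi-receptive-field features.
        self.fuse = nn.Conv2d(branch_ch * len(dilations), in_ch, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# Image-based (not patch-based) use: the whole feature map goes through
# at once, e.g. a cube with 128 projected spectral channels.
x = torch.randn(1, 128, 64, 64)
print(MRFF()(x).shape)  # torch.Size([1, 128, 64, 64])
```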


2020 ◽  
Vol 29 (07n08) ◽  
pp. 2040005 ◽
Author(s):  
Zhen Li ◽  
Dan Qu ◽  
Yanxia Li ◽  
Chaojie Xie ◽  
Qi Chen

Deep learning technology promotes the development of neural machine translation (NMT). End-to-end (E2E) architectures have become the mainstream in NMT. They use word vectors as the initial values of the input layer, so the quality of the word vector model directly affects the accuracy of E2E-NMT. Researchers have proposed many approaches to learning word representations and have achieved significant results. However, the drawbacks of these methods still limit the performance of E2E-NMT systems. This paper focuses on word embedding technology and proposes the PW-CBOW word vector model, which can present better semantic information. We apply these word vector models to the IWSLT14 German-English, WMT14 English-German, and WMT14 English-French corpora. The results demonstrate the performance of the PW-CBOW model: in the latest E2E-NMT systems, the PW-CBOW word vector model can improve performance.
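The abstract does not detail how PW-CBOW differs from standard CBOW, so the sketch below shows only the baseline CBOW objective it builds on: predicting a center word from the mean of its context word vectors. Vocabulary size and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size=10000, dim=300):
        super().__init__()
        self.ctx = nn.Embedding(vocab_size, dim)   # input word vectors
        self.out = nn.Linear(dim, vocab_size)      # output projection

    def forward(self, context_ids):
        # context_ids: (batch, window) indices of surrounding words.
        h = self.ctx(context_ids).mean(dim=1)      # average the context
        return self.out(h)                         # logits over vocab

model = CBOW()
context = torch.randint(0, 10000, (4, 6))   # 6-word context windows
center = torch.randint(0, 10000, (4,))      # center words to predict
loss = nn.functional.cross_entropy(model(context), center)
```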


Author(s):  
A. Kasthuri ◽  
A. Suruliandi ◽  
S. P. Raja

Face annotation, a modern research topic in the area of image processing, has useful real-life applications. Annotating the correct names of people to the corresponding faces is a difficult task because of variations in facial appearance. Hence, there is still a need for a robust feature to improve the performance of the face annotation process. In this work, a novel feature representation approach called Deep Gabor-Oriented Local Order Features (DGOLOF) is proposed, which extracts deep texture features from face images. Seven recently proposed face annotation methods are considered to evaluate the proposed deep texture feature under uncontrolled situations such as occlusion, expression changes, illumination, and pose variations. Experimental results on the LFW, IMFDB, Yahoo, and PubFig databases show that the proposed deep texture feature provides efficient results with Name Semantic Network (NSN)-based face annotation. Moreover, it is observed that the proposed deep texture feature improves the performance of face annotation regardless of all the challenges involved.
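The DGOLOF descriptor itself is not specified in the abstract. The sketch below shows only the generic first stage that Gabor-oriented texture features share: filtering a face image with a bank of oriented Gabor kernels in OpenCV. Kernel size, scale, and orientation count are illustrative, and the image path is a placeholder.

```python
import cv2
import numpy as np

def gabor_responses(gray, orientations=8, ksize=31, sigma=4.0, lambd=10.0):
    """Stack of magnitude responses, one per filter orientation."""
    responses = []
    for k in range(orientations):
        theta = k * np.pi / orientations  # evenly spaced orientations
        kern = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                  lambd, gamma=0.5, psi=0)
        responses.append(np.abs(cv2.filter2D(gray, cv2.CV_32F, kern)))
    return np.stack(responses, axis=0)

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
feats = gabor_responses(img)   # (8, H, W) orientation responses
# Local-order / ranking statistics over `feats` would follow here.
```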


2017 ◽  
Vol 37 (4-5) ◽  
pp. 492-512 ◽  
Author(s):  
Julie Dequaire ◽  
Peter Ondrúška ◽  
Dushyant Rao ◽  
Dominic Wang ◽  
Ingmar Posner

This paper presents a novel approach for tracking static and dynamic objects for an autonomous vehicle operating in complex urban environments. Whereas traditional approaches for tracking often feature numerous hand-engineered stages, this method is learned end-to-end and can directly predict a fully unoccluded occupancy grid from raw laser input. We employ a recurrent neural network to capture the state and evolution of the environment, and train the model in an entirely unsupervised manner. In doing so, our approach is comparable to model-free multi-object tracking, although we do not explicitly perform the underlying data-association process. Further, we demonstrate that the underlying representation learned for the tracking task can be leveraged via inductive transfer to train an object detector in a data-efficient manner. We motivate a number of architectural features and show the positive contribution of dilated convolutions and dynamic and static memory units to the task of tracking and classifying complex dynamic scenes through full occlusion. Our experimental results illustrate the ability of the model to track cars, buses, pedestrians, and cyclists from both moving and stationary platforms. Further, we compare and contrast the approach with a more traditional model-free multi-object tracking pipeline, demonstrating that it can more accurately predict the future states of objects from current inputs.
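A minimal sketch of the recurrent occupancy-prediction idea in PyTorch: a convolutional GRU with dilated convolutions keeps a spatial memory of the scene and maps each gridded laser input to an unoccluded occupancy estimate. Grid size, channel counts, and dilation rate are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch=1, hid_ch=16, dilation=2):
        super().__init__()
        p = dilation  # 'same' padding for a dilated 3x3 conv
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, 3,
                               padding=p, dilation=dilation)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, 3,
                              padding=p, dilation=dilation)

    def forward(self, x, h):
        # Update (z) and reset (r) gates over the spatial memory.
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new

cell = ConvGRUCell()
head = nn.Conv2d(16, 1, 1)              # per-cell occupancy logit
h = torch.zeros(1, 16, 64, 64)          # hidden state = scene memory
for t in range(10):                     # sequence of gridded laser scans
    scan = torch.randn(1, 1, 64, 64)    # placeholder for real input
    h = cell(scan, h)
occupancy = torch.sigmoid(head(h))      # unoccluded occupancy estimate
```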

