Generalized Zero-Shot Vehicle Detection in Remote Sensing Imagery via Coarse-to-Fine Framework

Author(s):  
Hong Chen ◽  
Yongtan Luo ◽  
Liujuan Cao ◽  
Baochang Zhang ◽  
Guodong Guo ◽  
...  

Vehicle detection and recognition in remote sensing images are challenging, especially when only limited training data are available to cover the various target categories. In this paper, we introduce a novel coarse-to-fine framework, which decomposes vehicle detection into segmentation-based vehicle localization and generalized zero-shot vehicle classification. In particular, the proposed framework handles the problem of generalized zero-shot vehicle detection, which is challenging because it requires recognizing vehicles that were never seen during training. Specifically, a hierarchical DeepLab v3 model is proposed in the framework, which fully exploits fine-grained features to locate the target on a pixel-wise level, then recognizes vehicles in a coarse-grained manner. Additionally, the hierarchical DeepLab v3 model is readily combined with generalized zero-shot recognition. To the best of our knowledge, there is no publicly available dataset for testing comparative methods; we therefore construct a new dataset to fill this evaluation gap. The experimental results show that the proposed framework yields promising results on the important yet difficult task of zero-shot vehicle detection and recognition.

2021 ◽  
Vol 9 ◽  
pp. 929-944
Author(s):  
Omar Khattab ◽  
Christopher Potts ◽  
Matei Zaharia

Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
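The "fine-grained interactions" mentioned above can be illustrated with a token-level late-interaction score of the kind ColBERT is known for: each query token is matched to its most similar passage token, and the per-token maxima are summed. The sketch below is a minimal pure-Python illustration under that assumption; the function names and the toy 2-D embeddings are hypothetical, and it is not the ColBERT-QA implementation.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products act as cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def late_interaction_score(query_vecs, passage_vecs):
    """Sum, over query tokens, of the best-matching passage-token similarity."""
    q = [normalize(v) for v in query_vecs]
    p = [normalize(v) for v in passage_vecs]
    return sum(max(dot(qv, pv) for pv in p) for qv in q)

# Toy token embeddings, made up purely for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
passage_b = [[1.0, 1.0]]                          # one token, halfway between both

score_a = late_interaction_score(query, passage_a)
score_b = late_interaction_score(query, passage_b)
print(score_a, score_b)
```

A single-vector retriever would compress each passage into one embedding before comparison; scoring per token is what lets `passage_a` outrank `passage_b` here.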


2019 ◽  
Vol 8 (9) ◽  
pp. 390 ◽  
Author(s):  
Kun Zheng ◽  
Mengfei Wei ◽  
Guangmin Sun ◽  
Bilal Anas ◽  
Yu Li

Vehicle detection based on very high-resolution (VHR) remote sensing images is beneficial in many fields such as military surveillance, traffic control, and social/economic studies. However, intricate details about the vehicle and the surrounding background provided by VHR images require sophisticated analysis based on massive data samples, though the amount of reliable labeled training data is limited. In practice, data augmentation is often leveraged to resolve this conflict. The traditional data augmentation strategy uses a combination of transformations such as rotation, scaling, and flipping, and has limited capability in capturing the essence of the feature distribution and providing data diversity. In this study, we propose a learning method named Vehicle Synthesis Generative Adversarial Networks (VS-GANs) to generate annotated vehicles from remote sensing images. The proposed framework has one generator and two discriminators, which try to synthesize realistic vehicles and learn the background context simultaneously. The method can quickly generate high-quality annotated vehicle data samples and greatly helps in the training of vehicle detectors. Experimental results show that the proposed framework can synthesize vehicles and their background images with variations and different levels of detail. Compared with traditional data augmentation methods, the proposed method significantly improves the generalization capability of vehicle detectors. Finally, the contribution of VS-GANs to vehicle detection in VHR remote sensing images was demonstrated in experiments conducted on the UCAS-AOD and NWPU VHR-10 datasets using up-to-date target detection frameworks.


Author(s):  
N. Mo ◽  
L. Yan

Vehicles in high-resolution remote sensing images are small and lack detailed information, which makes detectors difficult to train on them. In addition, vehicles comprise multiple fine-grained categories that differ only slightly and are randomly located and oriented. Therefore, it is difficult to locate and identify these fine categories of vehicles. Considering the above problems in high-resolution remote sensing images, this paper proposes an oriented vehicle detection approach. First, we propose an oversampling and stitching method that augments the training dataset by increasing the frequency of objects with fewer training samples, in order to balance the number of objects in each fine-grained vehicle category. Then, considering the effect of pooling operations on the representation of small objects, we propose to increase the resolution of the feature maps so that the detailed information hidden in them is enriched and they can better distinguish the fine-grained vehicle categories. Finally, we design a joint training loss function for horizontal and oriented bounding boxes with center loss, to decrease the impact of small between-class diversity on vehicle detection. Experimental verification is performed on the VEDAI dataset, which consists of 9 fine-grained vehicle categories, to evaluate the proposed framework. The experimental results show that the proposed framework outperforms most competing approaches, with a mean average precision of 60.7% and 60.4% in detecting horizontal and oriented bounding boxes, respectively.
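The class-balancing idea in this abstract can be sketched as plain oversampling: categories with fewer samples are resampled with replacement until every category matches the largest one. This is only an illustrative assumption about the balancing step; the stitching part of the authors' method is not modeled, and the function name, category names, and file names below are hypothetical.

```python
import random
from collections import Counter, defaultdict

def oversample_to_balance(samples, seed=0):
    """Resample minority categories with replacement up to the largest category size.

    samples: list of (item, category) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_category = defaultdict(list)
    for item, category in samples:
        by_category[category].append(item)

    target = max(len(items) for items in by_category.values())
    balanced = []
    for category, items in by_category.items():
        # Keep every original sample, then top up with random repeats.
        balanced.extend((item, category) for item in items)
        extra = target - len(items)
        balanced.extend((rng.choice(items), category) for _ in range(extra))
    return balanced

# Hypothetical, deliberately imbalanced toy dataset.
samples = (
    [(f"car_{i}.png", "car") for i in range(4)]
    + [(f"truck_{i}.png", "truck") for i in range(2)]
    + [("tractor_0.png", "tractor")]
)

balanced = oversample_to_balance(samples)
counts = Counter(category for _, category in balanced)
print(counts)
```

Repeating samples verbatim risks overfitting to the duplicated objects, which is presumably one reason to combine oversampling with a further step such as stitching.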


Author(s):  
Zheng Li ◽  
Ying Wei ◽  
Yu Zhang ◽  
Xiang Zhang ◽  
Xin Li

Aspect-level sentiment classification (ASC) aims at identifying sentiment polarities towards aspects in a sentence, where the aspect can behave as a general Aspect Category (AC) or a specific Aspect Term (AT). However, because labeling is especially expensive and labor-intensive, existing public corpora at the AT level are all relatively small. Meanwhile, most previous methods rely on complicated structures given scarce data, which largely limits the efficacy of the neural models. In this paper, we explore a new direction named coarse-to-fine task transfer, which aims to leverage knowledge learned from a rich-resource source domain of the coarse-grained AC task, which is more easily accessible, to improve the learning in a low-resource target domain of the fine-grained AT task. To resolve both the aspect granularity inconsistency and the feature mismatch between domains, we propose a Multi-Granularity Alignment Network (MGAN). In MGAN, a novel Coarse2Fine attention guided by an auxiliary task helps model the AC task at the same fine-grained level as the AT task. To alleviate false feature alignment, a contrastive feature alignment method is adopted to align aspect-specific feature representations semantically. In addition, a large-scale multi-domain dataset for the AC task is provided. Empirically, extensive experiments demonstrate the effectiveness of the MGAN.


2019 ◽  
Vol 2019 ◽  
pp. 1-40
Author(s):  
Ngoc Q. Ly ◽  
Tuong K. Do ◽  
Binh X. Nguyen

Object retrieval plays an increasingly important role in video surveillance, digital marketing, e-commerce, etc. It faces challenges such as large-scale datasets, imbalanced data, viewpoint, cluttered background, and fine-grained details (attributes). This paper proposes a model that integrates object ontology, a local multitask deep neural network (local MDNN), and an imbalanced data solver to take advantage of the strengths and overcome the shortcomings of deep learning network models, improving the performance of a large-scale object retrieval system from the coarse-grained level (categories) to the fine-grained level (attributes). Our proposed coarse-to-fine object retrieval (CFOR) system can be robust and resistant to the challenges listed above. To the best of our knowledge, the main novelty of our CFOR system is the mutual support of object ontology, a local MDNN, and an imbalanced data solver in a unified system. Object ontology supports the exploitation of the inner-group correlations to improve the system performance in category classification and attribute classification, and guides the training flow and retrieval flow to save computational costs in the training stage and retrieval stage on large-scale datasets, respectively. A local MDNN supports linking object ontology to the raw data, and an imbalanced data solver based on Matthews’ correlation coefficient (MCC) addresses the data imbalance, contributing effectively to the quality of object ontology realization without adjusting the network architecture or resorting to data augmentation. To evaluate the performance of the CFOR system, we experimented on the DeepFashion dataset. This paper shows that our local MDNN framework based on the pretrained NASNet architecture achieves better performance (14.2% higher in recall rate) compared to single-task learning (STL) in the attribute learning task; it also shows that our model with an imbalanced data solver achieves better performance (5.14% higher in recall rate for attributes with fewer data) compared to models that do not take this into account. Moreover, MAP@30 hovers around 0.815 in retrieval on an average of 35 imbalanced fashion attributes.
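For readers unfamiliar with the metric named above, the following sketch computes the standard Matthews correlation coefficient from binary confusion counts and contrasts it with accuracy on a rare attribute. How the authors build their imbalanced data solver on top of MCC is not described in the abstract, so this shows only the metric itself; the confusion counts are made up.

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """Standard MCC from binary confusion counts; returns 0.0 if undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical rare attribute: 10 positives among 1000 samples, half of them missed.
rare = dict(tp=5, fp=0, fn=5, tn=990)
acc_rare = accuracy(**rare)
mcc_rare = matthews_corrcoef(**rare)

# A perfect classifier on the same data, for comparison.
mcc_perfect = matthews_corrcoef(tp=10, fp=0, fn=0, tn=990)

print(acc_rare, mcc_rare, mcc_perfect)
```

Accuracy stays near 1 even though half of the positives are missed, while MCC drops noticeably, which is why MCC is a common choice for imbalanced attributes.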


Symmetry ◽  
2018 ◽  
Vol 10 (11) ◽  
pp. 626 ◽  
Author(s):  
Zhibin Guan ◽  
Kang Liu ◽  
Yan Ma ◽  
Xu Qian ◽  
Tongkai Ji

Image caption generation is a fundamental task that bridges an image and its textual description, and it is drawing increasing interest in artificial intelligence. Images and textual sentences are viewed as two different carriers of information, which are symmetric and unified in the same content of a visual scene. Existing image captioning methods rarely consider generating the final description sentence in a coarse-grained to fine-grained way, which is how humans understand the surrounding scenes, and the generated sentence sometimes describes only coarse-grained image content. Therefore, we propose a coarse-to-fine-grained hierarchical generation method for image captioning, named SDA-CFGHG, to address the two problems above. The core of our SDA-CFGHG method is a sequential dual attention that sequentially fuses visual information of different granularities. The advantage of our SDA-CFGHG method is that it can achieve image captioning in a coarse-to-fine-grained way and the generated textual sentence can capture details of the raw image to some degree. Moreover, we validate the impressive performance of our method on benchmark datasets (MS COCO, Flickr) with several popular evaluation metrics (CIDEr, SPICE, METEOR, ROUGE-L, and BLEU).


2020 ◽  
Vol 9 (6) ◽  
pp. 354
Author(s):  
Naoko Nitta ◽  
Kazuaki Nakamura ◽  
Noboru Babaguchi

While visual appearances play a main role in recognizing the concepts captured in images, additional information can provide complementary information for fine-grained image recognition, where concepts with similar visual appearances such as species of birds need to be distinguished. Especially for recognizing geospatial concepts, which are observed only at specific places, geographical locations of the images can improve the recognition accuracy. However, such geo-aware fine-grained image recognition requires prior information about the visual and geospatial features of each concept or the training data composed of high-quality images for each concept associated with correct geographical locations. By using a large number of images photographed in various places and described with textual tags which can be collected from image sharing services such as Flickr, this paper proposes a method for constructing a geospatial concept graph which contains the necessary prior information for realizing the geo-aware fine-grained image recognition, such as a set of visually recognizable fine-grained geospatial concepts, their visual and geospatial features, and the coarse-grained representative visual concepts whose visual features can be transferred to several fine-grained geospatial concepts. Leveraging the information from the images captured by many people can automatically extract diverse types of geospatial concepts with proper features for realizing efficient and effective geo-aware fine-grained image recognition.


Author(s):  
Yang Zhao ◽  
Jiajun Zhang ◽  
Chengqing Zong ◽  
Zhongjun He ◽  
Hua Wu

Neural Machine Translation (NMT) has drawn much attention in recent years due to its promising translation performance. However, the under-translation problem remains a big challenge. In this paper, we focus on the under-translation problem and attempt to find out what kinds of source words are more likely to be ignored. Through analysis, we observe that a source word with a large translation entropy is more likely to be dropped. To address this problem, we propose a coarse-to-fine framework. In the coarse-grained phase, we introduce a simple strategy to reduce the entropy of high-entropy words by constructing pseudo target sentences. In the fine-grained phase, we propose three methods, namely a pre-training method, a multitask method, and a two-pass method, to encourage the neural model to correctly translate these high-entropy words. Experimental results on various translation tasks show that our method can significantly improve the translation quality and substantially reduce the under-translation cases of high-entropy words.
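The notion of translation entropy can be made concrete with a small sketch. It assumes the entropy is the Shannon entropy of a source word's distribution over candidate target translations, estimated from counts; the abstract does not give the exact definition, and the words and counts below are hypothetical.

```python
import math

def translation_entropy(translation_counts):
    """Shannon entropy (in nats) of a source word's translation distribution.

    translation_counts: dict mapping each candidate target word to how often
    the source word was translated to it (for example, from word alignments).
    """
    total = sum(translation_counts.values())
    entropy = 0.0
    for count in translation_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy

# Hypothetical counts: one word with a single dominant translation,
# and one word spread evenly over four translations.
low_entropy_word = {"target_a": 100}
high_entropy_word = {"target_a": 25, "target_b": 25, "target_c": 25, "target_d": 25}

h_low = translation_entropy(low_entropy_word)
h_high = translation_entropy(high_entropy_word)
print(h_low, h_high)
```

Under this reading, the second word is the kind the abstract describes as more likely to be dropped during translation.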


2021 ◽  
Vol 13 (24) ◽  
pp. 5050
Author(s):  
Sheng He ◽  
Ruqin Zhou ◽  
Shenhong Li ◽  
San Jiang ◽  
Wanshou Jiang

As an essential task in remote sensing, disparity estimation of high-resolution stereo images is still confronted with intractable problems due to extremely complex scenes and dynamically changing disparities. Especially in areas containing texture-less regions, repetitive patterns, disparity discontinuities, and occlusions, stereo matching is difficult. Recently, convolutional neural networks have provided a new paradigm for disparity estimation, but it is difficult for current models to consider both accuracy and speed. This paper proposes a novel end-to-end network to overcome the aforementioned obstacles. The proposed network learns stereo matching at dual scales, in which the low one captures coarse-grained information while the high one captures fine-grained information, helpful for matching structures of different scales. Moreover, we construct cost volumes from negative to positive values to make the network work well for both negative and nonnegative disparities since the disparity varies dramatically in remote sensing stereo images. A 3D encoder-decoder module formed by factorized 3D convolutions is introduced to adaptively learn cost aggregation, which is of high efficiency and able to alleviate the edge-fattening issue at disparity discontinuities and approximate the matching of occlusions. Besides, we use a refinement module that brings in shallow features as guidance to attain high-quality full-resolution disparity maps. The proposed network is compared with several typical models. Experimental results on a challenging dataset demonstrate that our network shows powerful learning and generalization abilities. It achieves convincing performance on both accuracy and efficiency, and improvements of stereo matching in these challenging areas are noteworthy.


Minerals ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1222
Author(s):  
Mauro Ciarniello ◽  
Lyuba V. Moroz ◽  
Olivier Poch ◽  
Vassilissa Vinogradoff ◽  
Pierre Beck ◽  
...  

Visual-to-infrared (VIS-IR) remote sensing observations of different classes of outer solar system objects indicate the presence of water ice and organics. Here, we present laboratory reflectance spectra in the 0.5–4.2 μm spectral range of binary particulate mixtures of water ice, organics analogue (kerite), and an opaque iron sulphide phase (pyrrhotite) to investigate the spectral effects of varying mixing ratios, endmember grain size, and mixing modality. The laboratory spectra are also compared to different implementations of the Hapke reflectance model (Hapke, 2012). We find that minor amounts (≲1 wt%) of kerite (investigated grain sizes of 45–63 μm and <25 μm) can remain undetected when mixed in coarse-grained (67 ± 31 μm) water ice, suggesting that organics similar to meteoritic insoluble organic matter (IOM) might be characterized by larger detectability thresholds. Additionally, our measurements indicate that the VIS absolute reflectance of water ice-containing mixtures is not necessarily monotonically linked to water ice abundance. The latter is better constrained by spectral indicators such as the band depths of water ice VIS-IR diagnostic absorptions and spectral slopes. Simulation of laboratory spectra of intimate mixtures with a semi-empirical formulation of the Hapke model suggests that simplistic assumptions on the endmember grain size distribution and shape may lead to estimated mixing ratios considerably offset from the nominal values. Finally, laboratory spectra of water ice grains with fine-grained pyrrhotite inclusions (intraparticle mixture) have been positively compared with a modified version of the Hapke model from Lucey and Riner (2011).

