discriminative parts
Recently Published Documents

TOTAL DOCUMENTS: 19 (FIVE YEARS: 8)
H-INDEX: 5 (FIVE YEARS: 3)

2021
Author(s): Hyun-Tae Choi, Nahyun Lee, Jewon No, Sangil Han, Jaeho Tak, ...

Humans can recognize objects well even when only the shape of an object is shown or when an object is composed of several components. However, most classifiers in deep learning frameworks are trained on original images without removing complex elements inside the object, and without removing content other than the object to be classified. As a result, these classifiers do not match human-level object classification, because they are trained on original images containing many objects that the classifier is not intended to classify. In this respect, we determine which pre-processing most improves classifier performance by comparing results obtained with different pre-processing methods. In this paper, we try to limit the amount of information in the object to a minimum. To restrict the information, we use anisotropic diffusion and isotropic diffusion, which are normally used for removing noise in images. By applying anisotropic and isotropic diffusion as pre-processing, only the shapes of objects are passed to the classifier. With these diffusion processes, we obtain classification accuracy similar to that achieved with the original images, and we find that even when the original images are heavily diffused, the classifier can still classify the objects by focusing on their discriminative parts.
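The diffusion pre-processing described above can be sketched with a standard Perona-Malik scheme (a minimal numpy illustration; the iteration count, `kappa`, and `step` are assumed hyper-parameters, not values from the paper):

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=30.0, step=0.15):
    """Perona-Malik anisotropic diffusion: smooths flat regions while
    preserving edges, so mainly object shape survives the pre-processing.
    (Isotropic diffusion is the special case where the conductance is 1
    everywhere, i.e. plain heat-equation blurring.)"""
    u = img.astype(np.float64).copy()
    for _ in range(n_iter):
        # finite differences toward the 4 neighbours (wrap-around borders)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # edge-stopping conductance g(|grad u|) = exp(-(|grad u|/kappa)^2)
        cn = np.exp(-(dn / kappa) ** 2)
        cs = np.exp(-(ds / kappa) ** 2)
        ce = np.exp(-(de / kappa) ** 2)
        cw = np.exp(-(dw / kappa) ** 2)
        u += step * (cn * dn + cs * ds + ce * de + cw * dw)
    return u
```

Because the conductance collapses near strong gradients, noise inside the object is smoothed away while the object outline survives, which is exactly the "shape only" input the abstract describes.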


Author(s): Yaohui Zhu, Chenlong Liu, Shuqiang Jiang

The goal of few-shot image recognition is to distinguish different categories with only one or a few training samples. Previous work on few-shot learning has mainly addressed general object images, and current solutions usually learn a global image representation from training tasks and adapt it to novel tasks. However, fine-grained categories are distinguished by subtle and local parts, which global representations cannot capture effectively. This may hinder existing few-shot learning approaches from dealing with fine-grained categories well. In this work, we propose a multi-attention meta-learning (MattML) method for few-shot fine-grained image recognition (FSFGIR). Instead of using only a base learner for general feature learning, the proposed meta-learning method uses the attention mechanisms of both the base learner and a task learner to capture discriminative parts of images. The base learner is equipped with two convolutional block attention modules (CBAM) and a classifier. The two CBAMs learn diverse and informative parts, and the initial weights of the classifier are attended by the task learner, which gives the classifier a task-sensitive initialization. For adaptation, a gradient-based meta-learning approach is employed that updates the parameters of the two CBAMs and the attended classifier, allowing the updated base learner to adaptively focus on discriminative parts. We experimentally analyze the different components of our method, and experimental results on four benchmark datasets demonstrate its effectiveness and superiority.
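CBAM's two-stage attention can be sketched roughly as follows (a simplified numpy illustration: the shared-MLP channel attention follows the usual CBAM formulation, while the spatial step here replaces CBAM's 7x7 convolution with a simple per-pixel gate for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feat, w1, w2):
    """Simplified CBAM sketch on one feature map of shape (C, H, W).
    Channel attention: avg- and max-pooled descriptors pass through a
    shared 2-layer MLP (weights w1, w2) and are summed. Spatial attention
    is reduced here to a gate built from channel-wise avg and max maps."""
    avg = feat.mean(axis=(1, 2))                       # (C,) avg-pooled
    mx = feat.max(axis=(1, 2))                         # (C,) max-pooled
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)       # shared MLP, ReLU
    ch_att = sigmoid(mlp(avg) + mlp(mx))               # (C,) channel gate
    feat = feat * ch_att[:, None, None]
    sp = sigmoid(feat.mean(axis=0) + feat.max(axis=0)) # (H, W) spatial gate
    return feat * sp[None, :, :]
```

Both gates lie in (0, 1), so the module re-weights (never amplifies) feature responses, letting two stacked CBAMs emphasise different informative parts.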


2020
Vol 12 (4), pp. 681
Author(s): Yunsheng Xiong, Xin Niu, Yong Dou, Hang Qie, Kang Wang

Aircraft recognition has great application value, but aircraft in remote sensing images suffer from low resolution, poor contrast, poor sharpness, and a lack of detail caused by the vertical view, which makes aircraft recognition very difficult. Especially when there are many kinds of aircraft and the differences between them are subtle, fine-grained recognition of aircraft is even more challenging. In this paper, we propose a non-locally enhanced feature fusion network (NLFFNet) and attempt to make full use of the features from discriminative parts of aircraft. First, exploiting the long-distance self-correlation in aircraft images, we adopt a non-locally enhanced operation to guide the network to pay more attention to the discriminative areas and enhance the features beneficial to classification. Second, we propose a part-level feature fusion mechanism (PFF), which crops five parts of the aircraft on the shared feature maps, extracts the subtle features inside the parts through a part fully connected layer (PFC), and fuses the features of these parts through a combined fully connected layer (CFC). In addition, by adopting an improved loss function, we enhance the weight of hard examples in the loss while reducing the weight of excessively hard examples, which improves the overall recognition ability of the network. The dataset includes 47 categories of aircraft, including many aircraft of the same family with slight differences in appearance, and our method achieves 89.12% accuracy on the test set, which proves its effectiveness.
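The described hard-example weighting could take a shape like the following (an illustrative assumption in the spirit of focal loss with a taper, not the paper's actual formula; `gamma` and `floor` are hypothetical hyper-parameters):

```python
import numpy as np

def hard_example_weight(p_true, gamma=2.0, floor=0.05):
    """Illustrative weighting in the spirit of the 'improved loss':
    up-weight hard examples (low true-class probability) as focal loss
    does, but taper the weight back down for excessively hard examples
    whose probability falls below `floor` (likely outliers or noise).
    NOTE: gamma and floor are assumed values, not from the paper."""
    p = np.asarray(p_true, dtype=np.float64)
    focal = (1.0 - p) ** gamma            # grows as examples get harder
    taper = np.minimum(p / floor, 1.0)    # fades out below the floor
    return focal * taper

def weighted_ce(p_true, gamma=2.0, floor=0.05):
    """Cross-entropy scaled by the weighting above."""
    p = np.clip(np.asarray(p_true, dtype=np.float64), 1e-12, 1.0)
    return hard_example_weight(p, gamma, floor) * (-np.log(p))
```

The key property is non-monotonicity: a moderately hard example (p = 0.3) gets more weight than an easy one (p = 0.9), but an extreme outlier (p = 0.001) is down-weighted again.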


Author(s): C. Chahla, H. Snoussi, F. Abdallah, F. Dornaika

Person re-identification is one of the indispensable elements of visual surveillance. It assigns a consistent label to the same person within the field of view of a single camera or even across multiple cameras. While handcrafted feature extraction is certainly one way of approaching this problem, in many cases these features are becoming more and more complex. Besides, training a deep convolutional neural network (CNN) from scratch is difficult because it requires a large amount of labeled training data and a great deal of expertise to ensure proper convergence. This paper explores three main strategies for solving the person re-identification problem: (i) using handcrafted features, (ii) using transfer learning based on a pre-trained deep CNN (trained for object categorization) and (iii) training a deep CNN from scratch. Our experiments consistently demonstrated that: (1) handcrafted features may still have favorable characteristics and benefits, especially when the learning database is not sufficient to train a deep network; (2) a fully trained Siamese CNN outperforms handcrafted approaches and the combination of a pre-trained CNN with different re-identification processes; and (3) pre-trained features and handcrafted features perform equally well. These experiments also revealed the most discriminative parts of the human body.


Author(s): Tianxiang Pan, Bin Wang, Guiguang Ding, Jungong Han, Junhai Yong

Weakly supervised object detection (WSOD) has been widely studied, but the accuracy of state-of-the-art methods remains far below that of strongly supervised methods. One major reason for this huge gap is the incomplete-box detection problem, which arises because most previous WSOD models are built on classification networks and therefore tend to recognize the most discriminative parts instead of complete bounding boxes. To solve this problem, we define a low-shot weakly supervised object detection task and propose a novel low-shot box correction network to address it. The proposed task makes it possible to train object detectors on a large dataset in which all images have image-level annotations but only a small portion, or a few shots, have box annotations. Given the low-shot box annotations, we use a novel box correction network to turn the incomplete boxes into complete ones. Extensive empirical evidence shows that our proposed method yields state-of-the-art detection accuracy under various settings on the PASCAL VOC benchmark.


Author(s): Yang Fu, Xiaoyang Wang, Yunchao Wei, Thomas Huang

In this work, we propose a novel Spatial-Temporal Attention (STA) approach to tackle the large-scale person re-identification task in videos. Unlike most existing methods, which simply compute representations of video clips using frame-level aggregation (e.g., average pooling), the proposed STA adopts a more effective way of producing a robust clip-level feature representation. Concretely, STA fully exploits the discriminative parts of a target person in both the spatial and temporal dimensions, producing a 2-D attention score matrix via inter-frame regularization to measure the importance of spatial parts across different frames. A more robust clip-level feature representation can then be generated by a weighted-sum operation guided by the mined 2-D attention score matrix. In this way, challenging cases for video-based person re-identification, such as pose variation and partial occlusion, can be well handled by STA. We conduct extensive experiments on two large-scale benchmarks, i.e., MARS and DukeMTMC-VideoReID. In particular, the mAP reaches 87.7% on MARS, significantly outperforming the state of the art by a large margin of more than 11.6%.
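The weighted-sum aggregation guided by a 2-D attention score matrix can be sketched as follows (an illustrative reconstruction using feature-norm scores with per-part inter-frame normalization; the paper's exact scoring and regularization may differ):

```python
import numpy as np

def sta_aggregate(part_feats):
    """Sketch of STA-style clip-level aggregation. Input: part_feats of
    shape (T, P, D) -- T frames, P spatial parts, D feature dims.
    A 2-D attention score matrix S (T x P) scores each part in each
    frame by its feature L2 norm, normalised over frames per part; the
    clip feature is the attention-weighted sum over frames, with the
    per-part results concatenated."""
    T, P, D = part_feats.shape
    scores = np.linalg.norm(part_feats, axis=2)                   # (T, P)
    scores = scores / (scores.sum(axis=0, keepdims=True) + 1e-8)  # inter-frame norm
    clip = np.einsum('tp,tpd->pd', scores, part_feats)            # (P, D)
    return clip.reshape(-1), scores
```

Because each part's weights sum to one across frames, a part that is occluded in some frames (low feature response) contributes mostly from the frames where it is clearly visible.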


2019
Vol 11 (5), pp. 544
Author(s): Kun Fu, Wei Dai, Yue Zhang, Zhirui Wang, Menglong Yan, ...

Aircraft recognition in remote sensing images has long been a meaningful topic. Most related methods treat entire images as a whole and do not concentrate on the features of parts. In fact, many aircraft types have small interclass variance, and the main evidence for classifying subcategories lies in a few discriminative object parts. In this paper, we introduce the idea of fine-grained visual classification (FGVC) and attempt to make full use of the features from discriminative object parts. First, multiple class activation mapping (MultiCAM) is proposed to extract the discriminative parts of aircraft of different categories. Second, we present a mask filter (MF) strategy to enhance the discriminative object parts and filter out background interference from the original images. Third, a selective connected feature fusion method is proposed to fuse the features extracted by two networks, focusing on the original images and the results of MF, respectively. Compared with the single prediction category used in class activation mapping (CAM), MultiCAM makes full use of the predictions of all categories to overcome the wrong discriminative parts produced by a wrong single prediction category. Additionally, the designed MF preserves object scale information and helps the network concentrate on the object itself rather than the interfering background. Experiments on a challenging dataset show that our method achieves state-of-the-art performance.
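One plausible reading of MultiCAM, a probability-weighted combination of per-class CAMs, can be sketched as follows (the paper's exact weighting may differ):

```python
import numpy as np

def multicam(feat, fc_w, probs):
    """Illustrative MultiCAM sketch. feat: (C, H, W) conv features;
    fc_w: (K, C) classifier weights for K classes; probs: (K,) softmax
    predictions. Standard CAM for class k is sum_c fc_w[k, c] * feat[c];
    this sketch pools evidence from all classes, weighted by their
    predicted probabilities, instead of trusting the (possibly wrong)
    top-1 class alone."""
    cams = np.einsum('kc,chw->khw', fc_w, feat)        # (K, H, W) per-class CAMs
    cams = np.maximum(cams, 0.0)                       # keep positive evidence
    m = np.einsum('k,khw->hw', probs, cams)            # probability-weighted sum
    return (m - m.min()) / (m.max() - m.min() + 1e-8)  # normalise to [0, 1]
```

The resulting map can then be thresholded into the mask used by the MF strategy to suppress background while keeping the object at its original scale.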


2018
Vol 15 (1), pp. 41-54
Author(s): Mohsen Biglari, Ali Soleimani, Hamid Hassanpour, ...
