PASCAL VOC
Recently Published Documents


TOTAL DOCUMENTS

24
(FIVE YEARS 20)

H-INDEX

3
(FIVE YEARS 2)

2021 ◽  
pp. 1-10
Author(s):  
Mona M. Moussa ◽  
Rasha Shoitan ◽  
Mohamed S. Abdallah

Finding the common objects in a set of images is one of the recent challenges in computer vision. Most conventional approaches rely on unsupervised or weakly supervised co-localization to find the common objects; however, these methods require generating a huge number of region proposals. This paper tackles the problem by exploiting the benefits of supervised learning to localize the common object in a set of unlabeled images that may contain multiple objects or no common object at all. Two stages are proposed to localize the common objects: a candidate box generation stage and a matching and clustering stage. In the candidate box generation stage, objects are localized and enclosed in bounding boxes. The matching and clustering stage is applied to the generated bounding boxes and builds a distance matrix, based on a trained Siamese network, that reflects the matching percentage. Hierarchical clustering then uses this distance matrix to find the common objects and create a cluster for each one. The proposed method is trained on the PASCAL VOC 2007 dataset and evaluated in experiments on the PASCAL VOC 2007 6×2 and Object Discovery datasets. The results show that the proposed method outperforms the conventional methods by 8% to 40% in terms of the CorLoc metric.
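
The matching and clustering stage described above can be illustrated with a small sketch. It assumes a precomputed symmetric distance matrix (standing in for the Siamese network's output) and uses average linkage with a stopping threshold; the abstract specifies neither the linkage rule nor the stopping criterion, so both, along with the names `cluster_boxes` and `toy`, are hypothetical choices made for illustration.

```python
def cluster_boxes(dist, threshold):
    """Average-linkage agglomerative clustering over a symmetric distance matrix.

    dist[i][j] is an assumed pairwise distance between candidate boxes i and j
    (for example, one minus a Siamese matching score). Merging stops once the
    closest pair of clusters is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy distances: boxes 0, 1, 2 look alike; boxes 3 and 4 look alike.
toy = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.15, 0.85, 0.9],
    [0.2, 0.15, 0.0, 0.95, 0.9],
    [0.9, 0.85, 0.95, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]
groups = cluster_boxes(toy, threshold=0.5)
print(groups)
```

Each resulting group would correspond to one candidate common object; how the paper selects the final cluster is not stated in the abstract.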


Author(s):  
Xinzhe Zhou ◽  
Wenhao Jiang ◽  
Sheng Qi ◽  
Yadong Mu

Visual backdoor attack is a recently emerging task that aims to implant trojans in a deep neural model. A trojaned model responds to a trojan-invoking trigger in a fully predictable manner while functioning normally otherwise. A key motivating fact for this work is that most triggers adopted in existing methods, such as a learned patterned block that overlays a benign image, can be easily noticed by humans. In this work, we take image recognition and detection as the demonstration tasks, building trojaned networks that are significantly less human-perceptible and can simultaneously attack multiple targets in an image. The main technical contributions are twofold. First, under a relaxed attack mode, we formulate trigger embedding as an image steganography-and-steganalysis problem that conceals a secret image in another image in a decipherable and almost invisible way. Specifically, a variable number of different triggers can be encoded into the same secret image and fed to an encoder module that performs steganography. Second, we propose a generic split-and-merge scheme for training a trojaned model. Neurons are split into two sets, trained either for normal image recognition / detection or for trojaning the model. To merge them, we propose hiding the trojan neurons within the nullspace of the normal ones, such that the two sets do not interfere with each other and the resultant model exhibits parameter statistics similar to those of a clean model. Comprehensive experiments are conducted on the PASCAL VOC and Microsoft COCO datasets (for detection) and a subset of ImageNet (for recognition). All results clearly demonstrate the effectiveness of the proposed visual trojan method.


2021 ◽  
Author(s):  
ADENISIMI DANIEL

This paper compares state-of-the-art methods in object and instance detection and examines why YOLO (You Only Look Once) outperforms top detection methods. Different PASCAL VOC datasets are used as benchmarks to compare mean average precision (mAP). YOLO is twice as accurate as prior work on real-time detection. Merging YOLO with Fast R-CNN increases mean average precision (mAP), which results in a performance boost. Hence, YOLO is an enhanced model of top detection methods.
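
Since mAP is the yardstick in this and most of the other abstracts on this page, a short sketch of how it can be computed may help. It follows the 11-point interpolated average precision used by the PASCAL VOC 2007 protocol; later VOC editions integrate the full precision-recall curve instead. The function names and the toy precision-recall values are illustrative assumptions, not taken from any of the listed papers.

```python
def average_precision_11pt(recalls, precisions):
    """11-point interpolated AP (PASCAL VOC 2007 style).

    `recalls` and `precisions` are the cumulative values measured after each
    detection, with detections sorted by descending confidence.
    """
    total = 0.0
    for k in range(11):
        t = k / 10  # recall thresholds 0.0, 0.1, ..., 1.0
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11


def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Toy class with 2 ground-truth objects and 3 ranked detections: TP, FP, TP.
recalls = [0.5, 0.5, 1.0]
precisions = [1.0, 0.5, 2 / 3]
ap = average_precision_11pt(recalls, precisions)

# Hypothetical per-class APs, only to show the averaging step.
m_ap = mean_average_precision({"cat": ap, "dog": 0.5})
print(round(ap, 4), round(m_ap, 4))
```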


Author(s):  
Ruyi Ji ◽  
Zeyu Liu ◽  
Libo Zhang ◽  
Jianwei Liu ◽  
Xin Zuo ◽  
...  

Weakly supervised object detection (WSOD), which aims to detect objects with only image-level annotations, has become one of the research hotspots over the past few years. Recently, much effort has been devoted to WSOD for its simple yet effective architecture, and remarkable improvements have been achieved. Existing approaches using multiple-instance learning (MIL) usually attend to the proposals individually, ignoring relation information between proposals. Besides, to obtain pseudo-ground-truth boxes for WSOD, MIL-based methods tend to select the region with the highest confidence score and regard those with small overlap as the background category, which leads to mislabeled instances. As a result, these methods suffer from mislabeled instances and a lack of relations between proposals, degrading the performance of WSOD. To tackle these issues, this article introduces a multi-peak graph-based model for WSOD. Specifically, we use an instance graph to model the relations between proposals, which reinforces the multiple-instance learning process. In addition, a multi-peak discovery strategy is designed to avert mislabeling instances. The proposed model is trained end-to-end with a stochastic gradient descent optimizer using back-propagation. Extensive quantitative and qualitative evaluations on two challenging public benchmarks, PASCAL VOC 2007 and PASCAL VOC 2012, demonstrate the superiority and effectiveness of the proposed approach.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Yu-Cheng Fan ◽  
Chitra Meghala Yelamandala ◽  
Ting-Wei Chen ◽  
Chun-Ju Huang

Recently, self-driving cars have become a big challenge in the automobile industry. The DARPA challenge, which introduced designs for self-driving systems that can be classified as SAE Level 3 or higher, drove more attention to self-driving cars. Later on, many companies started to design self-driving cars using these design models. Various sensors, such as radar, high-resolution cameras, and LiDAR, are important for self-driving cars to sense their surroundings. LiDAR acts as the eye of a self-driving vehicle, offering 64 scanning channels, a 26.9° vertical field of view, and a high-precision 360° horizontal field of view in real time. The LiDAR sensor can provide 360° environmental depth information with a detection range of up to 120 meters. In addition, the left and right cameras can further assist in obtaining front image information. In this way, a model of the environment surrounding the self-driving car can be obtained accurately, which helps the self-driving algorithm perform route planning. Avoiding collisions is very important for self-driving, and LiDAR helps by providing both horizontal and vertical fields of view. Publicly available online datasets provide different kinds of data, such as point clouds and color images, that can be used for object recognition. In this paper, we used two publicly available datasets, namely KITTI and PASCAL VOC. First, the KITTI dataset provides in-depth data for the LiDAR segmentation (LS) of objects obtained through LiDAR point clouds. The result of object segmentation on the LiDAR point clouds is used to find the region of interest (ROI) on images. We then trained the YOLOv4 neural network for object detection with the PASCAL VOC dataset. For evaluation, we used the region-of-interest image as input to YOLOv4. By combining these technologies, we can segment and detect objects. Our algorithm ultimately constructs a LiDAR point cloud and, at the same time, detects objects in the image in real time.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Parvinder Kaur ◽  
Baljit Singh Khehra ◽  
Amar Partap Singh Pharwaha

Object detection is widely used in many fields, and the demand for more accurate and faster object detection methods is therefore increasing. In this paper, we propose a more accurate and faster method for object detection in digital images. The proposed model is based on the Single-Stage Multibox Detector (SSD) architecture. This method creates many anchor boxes of various aspect ratios based on the backbone network and multiscale feature network and calculates the class scores and box offsets of the anchor boxes to detect objects at various scales. Instead of the VGG16-based deep transfer learning model in SSD, we have used a more efficient base network, i.e., EfficientNet. Detecting objects of different sizes is still a challenging task. We have used a Multiway Feature Pyramid Network (MFPN) to solve this problem. The input to the base network is given to MFPN, and the fused features are then given to the bounding box prediction and class prediction networks. Softer-NMS is applied instead of NMS in SSD to reduce the number of bounding boxes gently. The proposed method is validated on the MSCOCO 2017, PASCAL VOC 2007, and PASCAL VOC 2012 datasets and compared to existing state-of-the-art techniques. Our method shows better detection quality in terms of mean Average Precision (mAP).
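
For context on the post-processing step mentioned above, here is a minimal sketch of the standard greedy NMS baseline that Softer-NMS is said to replace. It is not an implementation of Softer-NMS, which according to the abstract prunes boxes more gently; the `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the function names are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy hard NMS: keep the best-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # A box survives only if it does not overlap any already kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this toy input the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the distant third box is kept.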


2021 ◽  
Vol 104 (2) ◽  
pp. 003685042110113
Author(s):  
Xianghua Ma ◽  
Zhenkun Yang

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. It is widely recognized that although lightweight object detectors have a high detection speed, their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: their different aspect ratios can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features at different scales by allocating learnable weight parameters to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into a state-of-the-art backbone such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO 2014 datasets. Experimental results show that detection precision can be greatly improved while a fast detection speed is maintained.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Jie Xu ◽  
Hanyuan Wang ◽  
Mingzhu Xu ◽  
Fan Yang ◽  
Yifei Zhou ◽  
...  

Object detection is widely used in smart cities, including safety monitoring, traffic control, and car driving. However, in the smart city scenario, many objects are occluded, and most popular object detectors are sensitive to various real-world occlusions. This paper proposes a feature-enhanced occlusion perception object detector that simultaneously detects occluded objects and fully utilizes spatial information. To generate hard examples with occlusions, a mask generator localizes and masks discriminative regions with weakly supervised methods. To obtain an enriched feature representation, we design a multiscale representation fusion module that combines hierarchical feature maps. Moreover, the method exploits contextual information by stacking representations from different regions of the feature maps. The model is trained end-to-end by minimizing the multitask loss. Our model obtains superior performance compared to previous object detectors: 77.4% mAP and 74.3% mAP on PASCAL VOC 2007 and PASCAL VOC 2012, respectively. It also achieves 24.6% mAP on MS COCO. Experiments demonstrate that the proposed method improves the effectiveness of object detection, making it highly suitable for smart city applications that need to discover key objects with occlusions.


2021 ◽  
pp. 1-11
Author(s):  
Di Xu ◽  
Zhili Wang

This paper proposes an improved semi-supervised semantic segmentation network based on an improved generative adversarial network. It is important for the pixel-level discriminator to know whether it correctly distinguishes the predicted probability map. However, there is currently no correlation between the actual credibility and the confidence map generated by the pixel-level discriminator. We study this problem and propose a new network that includes one generator and two discriminators. One discriminator outputs more reliable confidence maps at the pixel level, and the other is trained to generate a probability at the image level, which is used as the dynamic threshold in the semi-supervised module instead of being set manually. In addition, the trusted region shared by the two discriminators is used to provide the semi-supervised reference. Experiments on the PASCAL VOC 2012 and Cityscapes datasets show that the proposed network brings better gains, demonstrating its effectiveness.
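
One way to picture the dynamic threshold described above is as a per-image mask over the pixel-level confidence map. The sketch below is only a reading of the abstract: the comparison rule (`>=`), the plain nested-list representation, and the name `trusted_pixels` are assumptions, and the paper's actual selection of the trusted region may differ.

```python
def trusted_pixels(confidence_map, image_level_prob):
    """Mark pixels whose confidence reaches the image-level probability.

    `confidence_map` stands in for the pixel-level discriminator output and
    `image_level_prob` for the image-level discriminator output, which acts
    as a threshold that changes from image to image.
    """
    return [[c >= image_level_prob for c in row] for row in confidence_map]


# Toy 2x2 confidence map and a hypothetical image-level probability.
conf_map = [[0.9, 0.3], [0.6, 0.7]]
mask = trusted_pixels(conf_map, image_level_prob=0.65)
print(mask)
```

Under this reading, only the pixels marked `True` would contribute pseudo-labels to the semi-supervised loss for an unlabeled image.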


2021 ◽  
Vol 13 (2) ◽  
pp. 200
Author(s):  
S. N. Shivappriya ◽  
M. Jasmine Pemeena Priyadarsini ◽  
Andrzej Stateczny ◽  
C. Puttamadappa ◽  
B. D. Parameshachari

Object detection is an important process for locating objects in surveillance systems and is considered a major application of computer vision. Many researchers have developed Convolutional Neural Network (CNN) based models for object detection to achieve higher performance. However, existing models have limitations such as overfitting and low efficiency in small object detection. Object detection in remote sensing likewise suffers from low efficiency in detecting small objects, and existing methods have poor localization. Cascade object detection methods have been applied to improve the learning process of the detection model. In this research, the Additive Activation Function (AAF) is applied in a Faster Region-based CNN (RCNN) for object detection. The proposed AAF-Faster RCNN method has the advantage of better convergence and clear bounding variance. A Fourier series and a linear combination of activation functions are used to update the loss function. The Microsoft (MS) COCO and Pascal VOC 2007/2012 datasets are used to evaluate the performance of the AAF-Faster RCNN model. The proposed AAF-Faster RCNN is also analyzed for small object detection on the benchmark dataset. The analysis shows that the proposed AAF-Faster RCNN model has higher efficiency than the state-of-the-art Pay Attention to Them (PAT) model in object detection. To evaluate the performance of the AAF-Faster RCNN method for object detection in remote sensing, the NWPU VHR-10 remote sensing dataset is used. On the Pascal VOC 2007 dataset, the AAF-Faster RCNN model has a mean Average Precision (mAP) of 83.1%, while the existing PAT-SSD512 method has 81.7% mAP.

