Automatic Carotid Artery Detection Using Attention Layer Region-Based Convolution Neural Network

2019 ◽  
Vol 16 (04) ◽  
pp. 1950015
Author(s):  
Xiaoyan Wang ◽  
Xingyu Zhong ◽  
Ming Xia ◽  
Weiwei Jiang ◽  
Xiaojie Huang ◽  
...  

Localization of the vessel Region of Interest (ROI) in medical images provides an interactive approach that can assist doctors in evaluating carotid artery diseases. Accurate vessel detection is a prerequisite for subsequent procedures such as wall segmentation, plaque identification, and 3D reconstruction. Deep learning models such as CNNs have been widely used in medical image processing and achieve state-of-the-art performance. Faster R-CNN is one of the most representative and successful methods for object detection. Using the outputs of feature maps from different layers has proved a useful way to improve detection performance; however, the common method is to ensemble the outputs of different layers directly, without considering the particular characteristics and differing importance of each layer. In this work, we introduce a new network named Attention Layer R-CNN (AL R-CNN) and use it for automatic carotid artery detection, integrating a new module named Attention Layer Part (ALP) into a basic Faster R-CNN system for better assembling feature maps of different layers. Experimental results on a carotid dataset show that our method surpasses other state-of-the-art object detection systems.
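
A minimal sketch of the idea described here, weighting each layer's feature map by a learned attention score before fusion rather than ensembling layers with equal importance. This is an illustrative reading of the ALP module, not the authors' code; the gating design and all names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLayerPart(nn.Module):
    """Hypothetical sketch: learn a scalar attention weight per backbone
    layer, then fuse the resized feature maps as a weighted sum."""

    def __init__(self, num_layers, channels):
        super().__init__()
        # One small gating branch per layer: global pool -> FC -> scalar score
        self.gates = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 1))
            for _ in range(num_layers)
        )

    def forward(self, feats):
        # feats: list of [N, C, Hi, Wi] maps from different backbone stages
        target = feats[-1].shape[-2:]
        resized = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
                   for f in feats]
        scores = torch.cat([g(f) for g, f in zip(self.gates, feats)], dim=1)  # [N, L]
        weights = torch.softmax(scores, dim=1)                                # per-layer attention
        fused = sum(w.view(-1, 1, 1, 1) * f for w, f in zip(weights.unbind(1), resized))
        return fused  # fed to the RPN/detection head in place of a single layer's map
```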

2019 ◽  
Vol 9 (3) ◽  
pp. 565 ◽  
Author(s):  
Hao Qu ◽  
Lilian Zhang ◽  
Xuesong Wu ◽  
Xiaofeng He ◽  
Xiaoping Hu ◽  
...  

Object detection in infrared images has attracted increasing attention in recent years. However, there are few studies on multi-scale object detection in infrared street scene images, and the lack of high-quality infrared datasets hinders research into such algorithms. To address these issues, we make a series of modifications to the Faster Region-based Convolutional Neural Network (Faster R-CNN). First, a double-layer region proposal network (RPN) is proposed to predict proposals of different scales on both fine and coarse feature maps. Second, a multi-scale pooling module is introduced into the backbone of the network to explore the response of objects at different scales; furthermore, the inception4 module and the position-sensitive region of interest (ROI) align (PSalign) pooling layer are utilized to explore richer object features. Third, this paper proposes instance-level data augmentation, which takes the imbalance between categories into account while enlarging the dataset. In the training stage, online hard example mining is utilized to further improve the robustness of the algorithm in complex environments. The experimental results show that, compared with the baseline, our detection method achieves state-of-the-art performance.
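
An illustrative sketch of the double-layer RPN idea (not the authors' implementation): one RPN head on a fine, high-resolution map for small objects and one on a coarse map for large objects, each with its own anchor set. Channel sizes and anchor counts are assumptions.

```python
import torch.nn as nn

class DoubleLayerRPN(nn.Module):
    """Two RPN heads predicting proposals at different scales on
    fine and coarse feature maps, merged before ROI pooling."""

    def __init__(self, fine_ch, coarse_ch, anchors_fine=3, anchors_coarse=3):
        super().__init__()
        self.head_fine = self._make_head(fine_ch, anchors_fine)
        self.head_coarse = self._make_head(coarse_ch, anchors_coarse)

    @staticmethod
    def _make_head(channels, num_anchors):
        return nn.ModuleDict({
            "conv": nn.Conv2d(channels, 256, 3, padding=1),
            "cls":  nn.Conv2d(256, num_anchors * 2, 1),   # object / background
            "reg":  nn.Conv2d(256, num_anchors * 4, 1),   # box deltas
        })

    def _run(self, head, x):
        t = head["conv"](x).relu()
        return head["cls"](t), head["reg"](t)

    def forward(self, fine_map, coarse_map):
        # Small-scale proposals from the fine map, large-scale from the coarse map
        return self._run(self.head_fine, fine_map), self._run(self.head_coarse, coarse_map)
```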


2020 ◽  
Vol 17 (4) ◽  
pp. 172988142093606
Author(s):  
Xiaoguo Zhang ◽  
Ye Gao ◽  
Huiqing Wang ◽  
Qing Wang

Effectively and efficiently recognizing multi-scale objects is one of the key challenges in applying deep convolutional neural networks to object detection. YOLOv3 (You Only Look Once v3) is a state-of-the-art object detector with good performance in both accuracy and speed; however, scale variation remains a challenging problem that needs to be addressed. Considering that detection performance on multi-scale objects is related to the receptive fields of the network, in this work we propose a novel dilated spatial pyramid module that integrates multi-scale information to deal effectively with the scale variation problem. First, the input of the dilated spatial pyramid is fed into multiple parallel branches with different dilation rates to generate feature maps with different receptive fields. Then, the input of the dilated spatial pyramid and the outputs of the different branches are concatenated to integrate multi-scale information. Moreover, the dilated spatial pyramid is integrated with YOLOv3 in front of the first detection header to form the dilated spatial pyramid-YOLO model. Experimental results on PASCAL VOC2007 demonstrate that the dilated spatial pyramid-YOLO model outperforms other state-of-the-art methods in mean average precision while still keeping a satisfying real-time detection speed. For 416 × 416 input, it achieves 82.2% mean average precision at 56 frames per second, 3.9% higher than YOLOv3 with only a slight speed drop.
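
A minimal sketch of the module as described: parallel 3×3 convolutions with increasing dilation rates enlarge the receptive field, and the module's input is concatenated with all branch outputs before a 1×1 fusion. The specific rates (1, 2, 4, 8) and channel widths are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class DilatedSpatialPyramid(nn.Module):
    """Parallel dilated branches + concatenation of input and outputs."""

    def __init__(self, in_ch, branch_ch=256, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # padding = dilation keeps the 3x3 conv's spatial size unchanged
                nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.LeakyReLU(0.1),
            )
            for r in rates
        )
        # 1x1 conv fuses [input + all branches] back to the input width
        self.fuse = nn.Conv2d(in_ch + branch_ch * len(rates), in_ch, 1)

    def forward(self, x):
        outs = [x] + [b(x) for b in self.branches]
        return self.fuse(torch.cat(outs, dim=1))
```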


Electronics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 90
Author(s):  
Donghyeon Lee ◽  
Joonyoung Kim ◽  
Kyomin Jung

Fully convolutional structures provide feature maps that capture the local contexts of an image simply by stacking numerous convolutional layers. These structures are known to be effective in modern state-of-the-art object detectors such as Faster R-CNN and SSD, which find objects from local contexts. However, the quality of object detectors can be further improved by incorporating global contexts when ambiguous objects must be identified from surrounding objects or background. In this paper, we introduce a self-attention module for object detectors to incorporate global contexts. More specifically, our self-attention module allows the feature extractor to compute feature maps with global contexts via the self-attention mechanism: it computes relationships among all elements in the feature maps and then blends the feature maps according to the computed relationships. The module can therefore capture long-range relationships among objects or backgrounds, which is difficult for fully convolutional structures. Furthermore, the proposed module is not limited to any specific object detector and can be applied to any CNN-based model for any computer vision task. In experiments on the object detection task, our method shows remarkable gains in average precision (AP) compared to popular models with fully convolutional structures. In particular, compared to Faster R-CNN with a ResNet-50 backbone, our module applied to the same backbone achieved +4.0 AP gains without bells and whistles. In image semantic segmentation and panoptic segmentation tasks, our module improved performance on all metrics used for each task.
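
A sketch of a self-attention block over a 2D feature map in the spirit described above (close to a non-local block): all-pairs relationships are computed among spatial positions, then blended back into the map. The channel-reduction factor and the learned residual blend are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Global-context self-attention over a [N, C, H, W] feature map."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        inner = channels // reduction
        self.query = nn.Conv2d(channels, inner, 1)
        self.key = nn.Conv2d(channels, inner, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend, starts as identity

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)           # [N, HW, C']
        k = self.key(x).flatten(2)                             # [N, C', HW]
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), -1)  # all-pairs relationships
        v = self.value(x).flatten(2).transpose(1, 2)           # [N, HW, C]
        out = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
        return x + self.gamma * out  # blend global context into local features
```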


2021 ◽  
Author(s):  
Da-Ren Chen ◽  
Wei-Min Chiu

Machine learning techniques have been used to increase the detection accuracy of cracks in road surfaces. Most studies fail to consider variable illumination conditions on the target of interest (ToI) and focus only on detecting the presence or absence of road cracks. This paper proposes a new road crack detection method, IlumiCrack, which integrates Gaussian mixture models (GMMs) and object detection CNN models. This work provides the following contributions: 1) For the first time, a large-scale road crack image dataset covering a range of illumination conditions (e.g., day and night) is prepared using a dashcam. 2) Based on the GMM, experimental evaluations on 2 to 4 levels of brightness are conducted to find the optimal classification. 3) The IlumiCrack framework integrates state-of-the-art object detection methods with a CNN to classify road crack images into eight types with high accuracy. Experimental results show that IlumiCrack outperforms state-of-the-art R-CNN object detection frameworks.
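
An illustrative sketch of the GMM step described above: cluster frames by overall brightness so each cluster can be handled by a detector tuned for that illumination level. Using mean pixel intensity as the feature and 2 to 4 components mirrors the paper's stated evaluation range; everything else (function names, routing) is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_brightness_gmm(gray_images, n_levels=3):
    # One feature per image: its mean pixel intensity in [0, 255]
    feats = np.array([img.mean() for img in gray_images]).reshape(-1, 1)
    return GaussianMixture(n_components=n_levels, random_state=0).fit(feats)

def brightness_level(gmm, gray_image):
    # Map a new frame to one of the learned illumination clusters,
    # so it can be routed to the detector trained for that level
    return int(gmm.predict(np.array([[gray_image.mean()]]))[0])
```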


2020 ◽  
Vol 34 (07) ◽  
pp. 12460-12467
Author(s):  
Liang Xie ◽  
Chao Xiang ◽  
Zhengxu Yu ◽  
Guodong Xu ◽  
Zheng Yang ◽  
...  

LiDAR point clouds and RGB images are both essential for 3D object detection, so many state-of-the-art 3D detection algorithms are dedicated to fusing these two types of data effectively. However, fusion methods based on the Bird's Eye View (BEV) or voxel format are not accurate. In this paper, we propose a novel fusion approach named the Point-based Attentive Cont-conv Fusion (PACF) module, which fuses multi-sensor features directly on 3D points. In addition to continuous convolution, we add Point-Pooling and Attentive Aggregation to make the fused features more expressive. Moreover, based on the PACF module, we propose a 3D multi-sensor multi-task network called Pointcloud-Image RCNN (PI-RCNN for short), which handles the image segmentation and 3D object detection tasks. PI-RCNN employs a segmentation sub-network to extract full-resolution semantic feature maps from images and then fuses the multi-sensor features via the PACF module. Benefiting from the effectiveness of the PACF module and the expressive semantic features from the segmentation module, PI-RCNN achieves large improvements in 3D object detection. We demonstrate the effectiveness of the PACF module and PI-RCNN on the KITTI 3D Detection benchmark, where our method achieves state-of-the-art results on the 3D AP metric.
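
A loose sketch of the attentive-aggregation idea described above: for each 3D point, the image semantic feature (sampled at the point's projection into the image) is weighted by learned attention before being fused with the point-cloud feature. This is a simplification for illustration, not the authors' PACF implementation; all names and the gating form are assumptions.

```python
import torch
import torch.nn as nn

class AttentiveAggregation(nn.Module):
    """Per-point attentive fusion of LiDAR and image semantic features."""

    def __init__(self, point_ch, img_ch, out_ch):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(point_ch + img_ch, out_ch), nn.Sigmoid())
        self.proj_img = nn.Linear(img_ch, out_ch)
        self.proj_pts = nn.Linear(point_ch, out_ch)

    def forward(self, point_feats, img_feats):
        # point_feats: [P, Cp] LiDAR features; img_feats: [P, Ci] semantic
        # features sampled at each point's projection into the image
        w = self.attn(torch.cat([point_feats, img_feats], dim=-1))  # [P, Co] gates
        return self.proj_pts(point_feats) + w * self.proj_img(img_feats)
```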


2020 ◽  
Vol 2020 ◽  
pp. 1-18 ◽  
Author(s):  
Nhat-Duy Nguyen ◽  
Tien Do ◽  
Thanh Duc Ngo ◽  
Duy-Dinh Le

Small object detection is an interesting topic in computer vision. With the rapid development of deep learning, it has drawn the attention of many researchers, whose innovations include region proposals, divided grid cells, multiscale feature maps, and new loss functions. As a result, object detection performance has recently seen significant improvements. However, most state-of-the-art detectors, in both the one-stage and two-stage approaches, struggle with detecting small objects. In this study, we evaluate current state-of-the-art deep learning models from both approaches, namely Fast R-CNN, Faster R-CNN, RetinaNet, and YOLOv3, and provide a thorough assessment of their advantages and limitations. Specifically, we run the models with different backbones on different datasets with multiscale objects to find out which types of objects are suitable for each model and backbone. Extensive empirical evaluation was conducted on two standard datasets, namely a small object dataset and a dataset filtered from PASCAL VOC 2007. Finally, comparative results and analyses are presented.


2020 ◽  
Vol 11 ◽  
Author(s):  
Hao Lu ◽  
Zhiguo Cao

Plant counting runs through almost every stage of agricultural production, from seed breeding, germination, cultivation, fertilization, and pollination to yield estimation and harvesting. With the prevalence of digital cameras, graphics processing units, and deep learning-based computer vision technology, plant counting has gradually shifted from traditional manual observation to vision-based automated solutions. One popular solution is a state-of-the-art object detection technique called Faster R-CNN, where plant counts can be estimated from the number of detected bounding boxes; it has become a standard configuration for many plant counting systems in plant phenotyping. Faster R-CNN, however, is computationally expensive, particularly when dealing with high-resolution images. Unfortunately, high-resolution imagery is frequently used in modern plant phenotyping platforms such as unmanned aerial vehicles, engendering inefficient image analysis. Such inefficiency largely limits the throughput of a phenotyping system. The goal of this work is hence to provide an effective and efficient tool for high-throughput plant counting from high-resolution RGB imagery. In contrast to conventional object detection, we advocate another promising paradigm termed object counting, where plant counts are directly regressed from images without detecting bounding boxes. In this work, by profiling the computational bottleneck, we implement a fast version of the state-of-the-art plant counting model TasselNetV2 with several minor yet effective modifications, and we provide insights into why these modifications make sense. This fast version, TasselNetV2+, runs an order of magnitude faster than TasselNetV2, achieving around 30 fps at an image resolution of 1980 × 1080, while retaining the same level of counting accuracy. We validate its effectiveness on three plant counting tasks: wheat ear counting, maize tassel counting, and sorghum head counting. To encourage the use of this tool, our implementation has been made available online at https://tinyurl.com/TasselNetV2plus.
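
A toy sketch of the counting-by-regression paradigm described above (in the spirit of TasselNet-style models, not the released TasselNetV2+ code): a small fully convolutional network maps an image to a grid of local counts, whose sum is the image-level plant count. The layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LocalCountRegressor(nn.Module):
    """Regress a grid of local counts directly, with no bounding boxes."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.counter = nn.Conv2d(128, 1, 1)  # one local count per 8x8 region

    def forward(self, x):
        local_counts = self.counter(self.features(x)).relu()  # [N, 1, H/8, W/8]
        return local_counts.sum(dim=(1, 2, 3))                # image-level count
```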


2019 ◽  
Vol 2019 ◽  
pp. 1-16
Author(s):  
Jiangfan Feng ◽  
Fanjie Wang ◽  
Siqin Feng ◽  
Yongrong Peng

Convolutional neural network- (CNN-) based object detection has achieved incredible success. However, existing CNN-based algorithms suffer from the problem that small-scale objects are difficult to detect, because their responses may be lost once the feature map reaches a certain depth, and it is common for the scale of objects (such as cars, buses, and pedestrians) in traffic images and videos to vary greatly. In this paper, we present a 32-layer multibranch convolutional neural network named MBNet for fast object detection in traffic scenes. Our model utilizes three detection branches, in which feature maps with sizes of 16 × 16, 32 × 32, and 64 × 64 are used, respectively, to optimize detection for large-, medium-, and small-scale objects. By means of a multitask loss function, our model can be trained end-to-end. The experimental results show that our model achieves state-of-the-art performance in terms of precision and recall, and its detection speed (up to 33 fps) is fast enough to meet the real-time requirements of industry.
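
A schematic of the three-branch idea above: attach a detection head to feature maps of three resolutions so large, medium, and small objects each get a branch matched to their scale. Channel widths, anchor counts, and class count are assumptions, not MBNet's actual configuration.

```python
import torch.nn as nn

class MultiBranchHeads(nn.Module):
    """One detection head per feature-map scale."""

    def __init__(self, chs=(512, 256, 128), num_anchors=3, num_classes=20):
        super().__init__()
        out_ch = num_anchors * (4 + 1 + num_classes)  # box, objectness, classes
        # heads[0] on the 16x16 map (large objects), heads[1] on 32x32,
        # heads[2] on the 64x64 map (small objects)
        self.heads = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in chs)

    def forward(self, f16, f32, f64):
        return [h(f) for h, f in zip(self.heads, (f16, f32, f64))]
```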


2020 ◽  
Vol 34 (07) ◽  
pp. 11653-11660 ◽  
Author(s):  
Yudong Liu ◽  
Yongtao Wang ◽  
Siwei Wang ◽  
Tingting Liang ◽  
Qijie Zhao ◽  
...  

In existing CNN-based detectors, the backbone network is a very important component for basic feature extraction, and the performance of a detector highly depends on it. In this paper, we aim to achieve better detection performance by building a more powerful backbone from existing ones such as ResNet and ResNeXt. Specifically, we propose a novel strategy for assembling multiple identical backbones through composite connections between adjacent backbones to form a more powerful backbone named Composite Backbone Network (CBNet). In this way, CBNet iteratively feeds the output features of the previous backbone, namely high-level features, as part of the input features to the succeeding backbone in a stage-by-stage fashion, and finally the feature maps of the last backbone (named the Lead Backbone) are used for object detection. We show that CBNet can be very easily integrated into most state-of-the-art detectors and significantly improves their performance. For example, it boosts the mAP of FPN, Mask R-CNN, and Cascade R-CNN on the COCO dataset by about 1.5 to 3.0 points. Moreover, experimental results show that instance segmentation results can be improved as well. Specifically, by simply integrating the proposed CBNet into the baseline detector Cascade Mask R-CNN, we achieve a new state-of-the-art result on the COCO dataset (mAP of 53.3) with a single model, which demonstrates the great effectiveness of the proposed CBNet architecture. Code will be made available at https://github.com/PKUbahuangliuhe/CBNet.
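
A simplified sketch of the composite-connection scheme described above, with two identical backbones: each stage output of the assisting backbone is projected by a 1×1 conv and added into the lead backbone's same-stage features, which then feed its next stage. The `make_stages()` factory standing in for a real ResNet, and the same-level addition, are illustrative assumptions rather than the exact CBNet composition.

```python
import torch.nn as nn

class CompositeBackbone(nn.Module):
    """Assisting backbone feeds its stage outputs into the lead backbone."""

    def __init__(self, make_stages, stage_channels):
        super().__init__()
        self.assist = nn.ModuleList(make_stages())  # assisting backbone
        self.lead = nn.ModuleList(make_stages())    # lead backbone (used for detection)
        self.links = nn.ModuleList(nn.Conv2d(c, c, 1) for c in stage_channels)

    def forward(self, x):
        # First pass: run the assisting backbone, keeping every stage output
        assist_outs, h = [], x
        for stage in self.assist:
            h = stage(h)
            assist_outs.append(h)
        # Second pass: each lead stage's output is enriched with the assistant's
        # same-stage features before flowing into the succeeding stage
        lead_outs, h = [], x
        for stage, link, prev in zip(self.lead, self.links, assist_outs):
            h = stage(h) + link(prev)  # composite connection
            lead_outs.append(h)
        return lead_outs  # lead-backbone maps go to the FPN / detection head
```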


2020 ◽  
Vol 34 (10) ◽  
pp. 13789-13790 ◽  
Author(s):  
Anurag Garg ◽  
Niket Tandon ◽  
Aparna S. Varde

Can we automatically predict the failures of an object detection model on images from a target domain? We characterize the errors of a state-of-the-art object detection model on the currently popular smart mobility domain and find that a large number of errors can be identified using spatial commonsense. We propose a system that automatically identifies a large number of such errors based on commonsense knowledge. Our system does not require any new annotations and can still find object detection errors with high accuracy (more than 80% when measured by humans). This work lays the foundation for answering exciting research questions on domain adaptation, including the ability to automatically create adversarial datasets for the target domain.
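
A hypothetical illustration of the kind of spatial-commonsense check the paper describes: flag detections that violate simple size priors, here a pedestrian reported wider than every bus in view. The rule, the labels, and the data layout are invented for illustration only.

```python
def flag_suspicious(dets):
    """dets: list of (label, (x1, y1, x2, y2)) boxes from one image.
    Returns detections that break a simple commonsense size prior."""
    flags = []
    buses = [box for label, box in dets if label == "bus"]
    for label, (x1, y1, x2, y2) in dets:
        if label == "person" and buses:
            person_w = x2 - x1
            # Commonsense: a person should not be wider than every bus in view
            if all(person_w > (bx2 - bx1) for bx1, _, bx2, _ in buses):
                flags.append((label, (x1, y1, x2, y2)))
    return flags
```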

