An Efficient Module for Instance Segmentation Based on Multi-Level Features and Attention Mechanisms

2021, Vol. 11 (3), pp. 968
Author(s): Yingchun Sun, Wang Gao, Shuguo Pan, Tao Zhao, Yahui Peng

Recently, multi-level feature networks have been extensively used in instance segmentation. However, because not all features are beneficial to instance segmentation tasks, the performance of networks cannot be adequately improved by synthesizing multi-level convolutional features indiscriminately. To solve this problem, an attention-based feature pyramid module (AFPM) is proposed, which integrates the attention mechanism into a multi-level feature pyramid network to efficiently and pertinently extract the high-level semantic features and low-level spatial structure features needed for instance segmentation. Firstly, we adopt a convolutional block attention module (CBAM) in feature extraction, sequentially generating attention maps that focus on instance-related features along the channel and spatial dimensions. Secondly, we build inter-dimensional dependencies through a convolutional triplet attention module (CTAM) in the lateral attention connections, which propagates helpful semantic feature maps and filters out redundant features irrelevant to instance objects. Finally, we construct feature enhancement branches that strengthen detailed information to boost the entire feature hierarchy of the network. Experimental results on the Cityscapes dataset show that the proposed module outperforms other strong methods under different evaluation metrics and effectively improves the performance of the instance segmentation method.
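
As a rough illustration of the CBAM step described above, the PyTorch sketch below applies channel attention followed by spatial attention to a feature map. The reduction ratio and kernel size are generic CBAM-style choices, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Sequential channel and spatial attention, in the spirit of CBAM."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: shared MLP over global avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))          # spatial attention
```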

Sensors, 2021, Vol. 21 (9), pp. 3251
Author(s): Shuqin Tu, Weijun Yuan, Yun Liang, Fan Wang, Hua Wan

Instance segmentation is an accurate and reliable method for segmenting images of adhesive (touching or overlapping) pigs, and is critical for providing health and welfare information on individual pigs, such as body condition score, live weight, and activity behaviors in group-housed pig environments. In this paper, a PigMS R-CNN framework based on mask scoring R-CNN (MS R-CNN) is explored to segment adhesive pig areas in group-pig images, in order to identify and locate individual group-housed pigs. The PigMS R-CNN consists of three processes. First, a 101-layer residual network combined with a feature pyramid network (FPN) is used as the feature extraction network to obtain feature maps for input images. Then, based on these feature maps, the region proposal network generates regions of interest (RoIs). Finally, for each RoI, the location, classification, and segmentation results of detected pigs are obtained through the regression, category, and mask branches of the PigMS R-CNN head network. To avoid missed and erroneous detections in overlapping or stuck areas of group-housed pigs, the PigMS R-CNN framework replaces traditional non-maximum suppression (NMS) with soft-NMS in the post-processing selection of detected pigs. The MS R-CNN framework with traditional NMS obtains an F1 of 0.9228. By setting the soft-NMS threshold to 0.7, PigMS R-CNN achieves an F1 of 0.9374 on detection of the target pigs. The work explores a new instance segmentation method for images of adhesive group-housed pigs, which provides valuable groundwork for vision-based, real-time automatic pig monitoring and welfare evaluation.
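
To make the soft-NMS post-processing concrete, here is a minimal NumPy sketch of the linear-decay variant: instead of discarding boxes that overlap a higher-scoring detection, their scores are decayed, which helps retain pigs that touch or occlude one another. The IoU helper and thresholds are illustrative assumptions.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter)

def soft_nms(boxes, scores, iou_thresh=0.7, score_thresh=0.001):
    """Linear soft-NMS: overlapping boxes have their scores decayed, not removed."""
    boxes, scores = boxes.copy(), scores.copy()
    keep = []
    while len(boxes) > 0:
        i = np.argmax(scores)
        keep.append((boxes[i], scores[i]))
        box = boxes[i]
        boxes = np.delete(boxes, i, axis=0)
        scores = np.delete(scores, i)
        if len(boxes) == 0:
            break
        ov = iou(box, boxes)
        decay = np.where(ov > iou_thresh, 1.0 - ov, 1.0)  # decay only strong overlaps
        scores *= decay
        mask = scores > score_thresh  # drop boxes whose score fell too low
        boxes, scores = boxes[mask], scores[mask]
    return keep
```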


2021, Vol. 13 (16), pp. 3305
Author(s): Zixian Ge, Guo Cao, Hao Shi, Youqiang Zhang, Xuesong Li, ...

Recently, hyperspectral image (HSI) classification has become a popular research direction in remote sensing. The emergence of convolutional neural networks (CNNs) has greatly promoted the development of this field and demonstrated excellent classification performance. However, due to the particularity of HSIs, redundant information and limited samples pose huge challenges for extracting strongly discriminative features. In addition, fully mining the internal correlations of the data or features within an existing model is also crucial for improving classification performance. To overcome these limitations, this work presents a strong feature extraction neural network with an attention mechanism. Firstly, the original HSI is weighted by a hybrid spectral-spatial attention mechanism. Then, the data are fed into a spectral feature extraction branch and a spatial feature extraction branch, composed of multiscale feature extraction modules and weak dense feature extraction modules, to extract high-level semantic features. The two branch outputs are compressed and fused using global average pooling and concatenation. Finally, the classification results are obtained with two fully connected layers and one softmax layer. A performance comparison on three public datasets shows that the proposed model outperforms the current state of the art.
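
A sketch of the fusion step described above, assuming PyTorch and invented layer widths: the outputs of the spectral and spatial branches are compressed by global average pooling, concatenated, and passed through two fully connected layers (softmax is left to the loss function).

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Compress two branch outputs with GAP, concatenate, then classify."""
    def __init__(self, spec_channels, spat_channels, num_classes, hidden=128):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.classifier = nn.Sequential(
            nn.Linear(spec_channels + spat_channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),  # softmax applied in the loss
        )

    def forward(self, spec_feat, spat_feat):
        spec = self.gap(spec_feat).flatten(1)   # (B, spec_channels)
        spat = self.gap(spat_feat).flatten(1)   # (B, spat_channels)
        return self.classifier(torch.cat([spec, spat], dim=1))
```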


Author(s): Julie Jupp, John S. Gero

Style is an ordering principle by which to structure artifacts in a design domain. The application of a visual order entails some explicit grouping property that is both cognitively plausible and contextually dependent. Central to cognitive-contextual notions are the type of representation used in analysis and the flexibility to allow semantic interpretation. We present a model of visual style based on the concept of similarity as a qualitative, context-dependent categorization. The two core components of the model are semantic feature extraction and self-organizing maps (SOMs). The model proposes a method of categorizing two-dimensional unannotated design diagrams using both low-level geometric and high-level semantic features that are automatically derived from the pictorial content of the design. The operation of the initial model, called Q-SOM, is then extended to include relevance feedback (Q-SOM:RF). The extended model can be seen as a series of sequential processing stages, in which qualitative encoding and feature extraction are followed by iterative recategorization. Categorization is achieved using an unsupervised SOM, and contextual dependencies are integrated via cluster relevance determined by the observer's feedback. The following stages are presented: initial per-feature detection and extraction, selection of feature sets corresponding to different spatial ontologies, unsupervised categorization of design diagrams based on appropriate feature subsets, and integration of design context via relevance feedback. In our experiments, we compare outcomes from consecutive stages of the model. The results show that the model provides a cognitively plausible and context-dependent method for characterizing visual style in design.
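
To make the unsupervised SOM stage concrete, the following NumPy sketch performs one training step of a 2D map over extracted feature vectors; the Gaussian neighborhood, learning rate, and map layout are generic SOM choices, not the specifics of Q-SOM.

```python
import numpy as np

def som_step(weights, x, lr=0.1, sigma=1.5):
    """One SOM update: find the best-matching unit, pull its neighborhood toward x.

    weights: (rows, cols, dim) map of prototype vectors; x: (dim,) feature vector.
    """
    rows, cols, _ = weights.shape
    # Best-matching unit = grid cell whose prototype is closest to x.
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), (rows, cols))
    # Gaussian neighborhood over grid coordinates.
    gy, gx = np.mgrid[0:rows, 0:cols]
    grid_d2 = (gy - bmu[0]) ** 2 + (gx - bmu[1]) ** 2
    h = np.exp(-grid_d2 / (2 * sigma ** 2))
    # Move prototypes toward x, weighted by neighborhood strength.
    weights += lr * h[:, :, None] * (x - weights)
    return weights, bmu
```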


2020, Vol. 12 (3), pp. 560
Author(s): Lifu Chen, Siyu Tan, Zhouhao Pan, Jin Xing, Zhihui Yuan, ...

The detection of airports from Synthetic Aperture Radar (SAR) images is of great significance in various research fields. However, it is challenging to distinguish airports from surrounding objects in SAR images. In this paper, a new framework, the multi-level and densely dual attention (MDDA) network, is proposed to extract airport runway areas (runways, taxiways, and parking lots) from SAR images and thereby achieve automatic airport detection. The framework consists of three parts: down-sampling of the original SAR images, the MDDA network for feature extraction and classification, and up-sampling of the airport extraction results. First, down-sampling is employed to obtain medium-resolution SAR images from the high-resolution originals, ensuring that the samples (500 × 500 pixels) contain adequate information about airports. The dataset is then input to the MDDA network, which contains an encoder and a decoder. The encoder uses ResNet_101 to extract four levels of features with different resolutions, and the decoder performs fusion and further feature extraction on these features. The decoder integrates the chained residual pooling network (CRP_Net) and the dual attention fusion and extraction (DAFE) module. The CRP_Net module mainly uses chained residual pooling and multi-feature fusion to extract advanced semantic features. In the DAFE module, a position attention module (PAM) and a channel attention module (CAM) are combined with weighted filtering. The entire decoding network is constructed in a densely connected manner to enhance gradient transmission among features and take full advantage of them. Finally, the airport results extracted by the decoding network are up-sampled by bilinear interpolation to accomplish airport extraction from high-resolution SAR images. To verify the proposed framework, experiments were performed using Gaofen-3 SAR images with 1 m resolution, and three different airports were selected for accuracy evaluation. The results showed that the mean pixel accuracy (MPA) and mean intersection over union (MIoU) of the MDDA network were 0.98 and 0.97, respectively, much higher than those of RefineNet and DeepLabV3. Therefore, MDDA can achieve automatic airport extraction from high-resolution SAR images with satisfactory accuracy.
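
The internals of the PAM in DAFE are not spelled out above; a common form of position attention, self-attention across spatial locations with a learned residual weight, might look like this PyTorch sketch (the reduction factor is an assumption):

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Self-attention over spatial positions, in the spirit of a PAM block."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)        # (B, HW, C/r)
        k = self.key(x).flatten(2)                          # (B, C/r, HW)
        attn = torch.softmax(q @ k, dim=-1)                 # (B, HW, HW)
        v = self.value(x).flatten(2)                        # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)   # aggregate positions
        return self.gamma * out + x                         # residual connection
```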


2021, pp. 157-166
Author(s): Lei Zhao, Jia Su, Zhiping Shi, Yong Guan

This paper focuses on combining traditional image processing algorithms with apparent-to-semantic features to improve detection accuracy. Building on the Faster R-CNN algorithm, a mainstream framework in current object detection, multi-channel features are obtained by combining traditional image feature algorithms (such as Integral Channel Features (ICF), Histogram of Oriented Gradients (HOG), and Local Binary Patterns (LBP)) with advanced semantic feature algorithms (such as segmentation and heatmaps). To realize joint training of the original image with the outputs of these feature extraction algorithms, a network called the Multi-Channel Feature Network (MCFN) is proposed to increase object detection accuracy while minimizing system weight. MCFN provides a multi-channel input interface that is limited neither to the RGB components of a single picture nor to a fixed number of input channels. The experimental results show the relationship between the number of additional channels, model performance, and accuracy. With two additional channels, the mean Average Precision (mAP) improves by 2-3% compared with the basic Faster R-CNN structure. As the number of extra channels increases further, accuracy does not increase linearly; in fact, system performance starts to fluctuate within a range after the number of additional channels reaches six.
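
The multi-channel interface can be pictured as stacking the extra feature maps (e.g., a HOG map and a heatmap, each at image resolution) with the RGB channels before the first convolution. A hypothetical PyTorch sketch, with all layer sizes invented:

```python
import torch
import torch.nn as nn

def build_multichannel_stem(extra_channels, out_channels=64):
    """First conv layer accepting RGB plus any number of extra feature channels."""
    return nn.Conv2d(3 + extra_channels, out_channels,
                     kernel_size=7, stride=2, padding=3)

# Example: an RGB image plus two extra channels (say, a HOG map and a heatmap),
# each resized to the image resolution and stacked along the channel axis.
rgb = torch.rand(1, 3, 224, 224)
hog = torch.rand(1, 1, 224, 224)
heat = torch.rand(1, 1, 224, 224)
stem = build_multichannel_stem(extra_channels=2)
features = stem(torch.cat([rgb, hog, heat], dim=1))  # (1, 64, 112, 112)
```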


2021, pp. 235-246
Author(s): Fan Xu, Shuihua Sun, Shiao Xu, Zhiyuan Zhang, Kuo-Chi Chang

2019, Vol. 9 (20), pp. 4363
Author(s): Yutian Wu, Shuming Tang, Shuwei Zhang, Harutoshi Ogai

Feature Pyramid Network (FPN) builds a high-level semantic feature pyramid and detects objects of different scales at the corresponding pyramid levels. Usually, features within the same pyramid level have the same weight for subsequent object detection, which ignores the feature requirements of objects at different scales. Moreover, for most detection networks, small and occluded objects are hard to detect because there is little information to exploit. To solve these problems, we propose an Enhanced Feature Pyramid Object Detection Network (EFPN), which constructs an enhanced feature extraction subnet and an adaptive parallel detection subnet. The enhanced feature extraction subnet introduces a Feature Weight Module (FWM) to enhance pyramid features by weighting the fused feature map. The adaptive parallel detection subnet introduces an Adaptive Context Expansion (ACE) module and a Parallel Detection Branch (PDB). ACE generates features for both an adaptively enlarged object context region and the original region, and PDB predicts classification and regression results separately from the two sets of features. Experiments showed that EFPN outperforms FPN in detection accuracy on the Pascal VOC and KITTI datasets, and its performance meets the real-time requirements of autonomous driving systems.
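
The abstract does not specify FWM's internals; one plausible reading, sketched below in PyTorch purely as an assumption, is a squeeze-and-excitation-style gate that reweights the channels of a fused pyramid feature map.

```python
import torch
import torch.nn as nn

class FeatureWeight(nn.Module):
    """Channel-wise reweighting of a fused pyramid feature map (SE-style gate)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, fused):
        return fused * self.gate(fused)  # emphasize informative channels
```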


Electronics, 2020, Vol. 9 (6), pp. 886
Author(s): Zhixian Yang, Ruixia Dong, Hao Xu, Jinan Gu

Object-detection methods based on deep learning play an important role in achieving machine automation. To achieve fast and accurate autonomous detection of stacked electronic components, an instance segmentation method based on an improved Mask R-CNN algorithm is proposed. By optimizing the feature extraction network, the performance of Mask R-CNN was improved. A dataset of electronic components containing 1200 images (992 × 744 pixels) covering four types of components was developed. Experiments on the dataset showed that the model is faster, more lightweight, and more accurate than the baseline: it runs at twice the speed of Mask R-CNN, is 0.35 times its size, and improves average precision (AP) by about two points.
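
As a hedged sketch of one way to lighten the feature extraction network (not necessarily the paper's optimization), torchvision lets you assemble Mask R-CNN on a MobileNetV2 backbone along these lines; the anchor and RoI-pooling settings follow the common torchvision tutorial pattern, and num_classes=5 assumes the four component types plus background.

```python
import torchvision
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# Lightweight backbone: MobileNetV2 feature extractor with declared output channels.
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
backbone.out_channels = 1280

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
box_pool = MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2)
mask_pool = MultiScaleRoIAlign(featmap_names=["0"], output_size=14, sampling_ratio=2)

# Four component classes plus background (an assumption about the dataset).
model = MaskRCNN(backbone, num_classes=5,
                 rpn_anchor_generator=anchor_generator,
                 box_roi_pool=box_pool, mask_roi_pool=mask_pool)
```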


Author(s): Silvester Tena, Rudy Hartanto, Igi Ardiyanto

In recent years, a great deal of research has been conducted in the area of fabric image retrieval, especially the identification and classification of visual features. One of the challenges in content-based image retrieval (CBIR) is the semantic gap between low-level visual features and high-level human perception. Generally, CBIR comprises two main components: feature extraction and similarity measurement. This research therefore examines content-based image retrieval for fabric using feature extraction techniques grouped into traditional methods and convolutional neural networks (CNNs). Traditional descriptors deal with low-level features and have the advantage of shorter computation time and reduced system requirements. CNN descriptors, which handle the high-level (semantic) features tailored to human perception, deal with large amounts of data and require considerable computation time. In general, the features of a CNN's fully connected layers are used for matching query and database images, although several studies have instead used features extracted from the CNN's convolutional layers for image retrieval. At the end of the CNN pipeline, hash codes can be added to reduce search time.
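
A minimal sketch of the descriptor-matching idea, assuming an off-the-shelf ResNet-50 as the CNN and cosine similarity as the similarity measure; preprocessing, the fabric dataset, and the hashing step are omitted.

```python
import torch
import torchvision

# Pretrained CNN with the classification head removed: outputs a 2048-d descriptor.
backbone = torchvision.models.resnet50(weights="DEFAULT")
extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

@torch.no_grad()
def describe(images):                      # images: (N, 3, 224, 224), normalized
    feats = extractor(images).flatten(1)   # (N, 2048)
    return torch.nn.functional.normalize(feats, dim=1)

# Retrieval: rank database images by cosine similarity to the query descriptor.
database = describe(torch.rand(100, 3, 224, 224))  # stand-in for fabric images
query = describe(torch.rand(1, 3, 224, 224))
ranking = (database @ query.T).squeeze(1).argsort(descending=True)
```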


2021, Vol. 13 (8), pp. 1602
Author(s): Qiaoqiao Sun, Xuefeng Liu, Salah Bourennane

Deep learning models have strong abilities in learning features and have been successfully applied to hyperspectral images (HSIs). However, training most deep learning models requires labeled samples, and collecting labeled samples for HSI is labor-intensive. In addition, single-level features from a single layer are usually considered, which may result in the loss of some important information. Using multiple networks to obtain multi-level features is a solution, but at the cost of longer training time and greater computational complexity. To solve these problems, a novel unsupervised multi-level feature extraction framework based on a three-dimensional convolutional autoencoder (3D-CAE) is proposed in this paper. The designed 3D-CAE is built entirely from 3D convolutional and 3D deconvolutional layers, which allows the spectral-spatial information of targets to be mined simultaneously, and it can be trained in an unsupervised way without labeled samples. Moreover, the multi-level features are obtained directly from the encoder layers at different scales and resolutions, which is more efficient than using multiple networks to obtain them. The effectiveness of the proposed multi-level features is verified on two hyperspectral data sets. The results demonstrate that the proposed method holds great promise for unsupervised feature learning and can further improve hyperspectral classification compared with single-level features.
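
A minimal PyTorch sketch of a 3D convolutional autoencoder in this spirit: the encoder layers yield multi-level features at different resolutions, and training minimizes reconstruction error without labels. All layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class CAE3D(nn.Module):
    """Fully 3D convolutional autoencoder over (band, height, width) HSI patches."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(16, 8, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(8, 1, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, x):                 # x: (B, 1, bands, H, W)
        f1 = self.enc1(x)                 # level-1 features (half resolution)
        f2 = self.enc2(f1)                # level-2 features (quarter resolution)
        recon = self.dec(f2)              # reconstruction for the unsupervised loss
        return recon, (f1, f2)            # multi-level features from encoder layers

# Unsupervised training signal: minimize reconstruction error, no labels involved.
model = CAE3D()
x = torch.rand(4, 1, 16, 32, 32)
recon, feats = model(x)
loss = torch.nn.functional.mse_loss(recon, x)
```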

