Feature fusion network based on strip pooling

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gaihua Wang ◽  
Qianyu Zhai

Abstract Contextual information is a key factor affecting semantic segmentation. Recently, many methods have tried to use the self-attention mechanism to capture more contextual information; however, self-attention requires heavy computation. To solve this problem, a novel self-attention network, called FFANet, is designed to capture contextual information efficiently, reducing the amount of computation through strip pooling and linear layers. A feature fusion (FF) module is proposed to calculate the affinity matrix, which captures the relationships between pixels. The affinity matrix is then multiplied with the feature map, selectively increasing the weight of the region of interest. Extensive experiments conducted on the public datasets PASCAL VOC2012 and CityScapes and the remote sensing dataset DLRSD achieved Mean IoU scores of 74.5%, 70.3%, and 63.9%, respectively. Compared with current typical algorithms, the proposed method achieves excellent performance.
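The cost saving FFANet targets can be made concrete: pooling one spatial axis into strips shrinks the key/value set of the affinity computation from H·W pixels to H+W strips, so the affinity matrix is (H·W)×(H+W) rather than (H·W)×(H·W). A minimal NumPy sketch of this idea, with the shapes, scaling, and residual formulation assumed for illustration rather than taken from the paper:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def strip_pool_attention(x):
    """Toy strip-pooling attention over a (C, H, W) feature map.

    Keys/values are the H row-strips and W column-strips, so the
    affinity matrix has shape (H*W, H+W) instead of (H*W, H*W).
    """
    C, H, W = x.shape
    h_strips = x.mean(axis=2)                           # (C, H): average each row
    w_strips = x.mean(axis=1)                           # (C, W): average each column
    strips = np.concatenate([h_strips, w_strips], 1)    # (C, H+W)
    queries = x.reshape(C, H * W)                       # every pixel queries the strips
    affinity = softmax(queries.T @ strips / np.sqrt(C)) # (H*W, H+W)
    attended = (affinity @ strips.T).T.reshape(C, H, W)
    return x + attended                                 # residual re-weighting
```

The affinity row for each pixel sums to one, so the output stays on the scale of the input; the residual connection preserves the original features.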

Author(s):  
Taye Girma Debelee ◽  
Abrham Gebreselasie ◽  
Friedhelm Schwenker ◽  
Mohammadreza Amirian ◽  
Dereje Yohannes

In this paper, a modified adaptive K-means (MAKM) method is proposed to extract the region of interest (ROI) from local and public datasets. The local image datasets were collected from Bethezata General Hospital (BGH), and the public datasets are from the Mammographic Image Analysis Society (MIAS). The same number of images is used for both datasets: 112 abnormal and 208 normal. Two texture features (GLCM and Gabor) extracted from the ROIs and one set of CNN-based features are considered in the experiment. The CNN features are extracted with an Inception-V3 pre-trained model after simple preprocessing and cropping. The quality of the features is evaluated individually and after fusing features with one another, and five classifiers (SVM, KNN, MLP, RF, and NB) are used to measure the descriptive power of the features using cross-validation. The proposed approach was first evaluated on the local dataset and then applied to the public dataset. The results of the classifiers are measured using accuracy, sensitivity, specificity, kappa, computation time, and AUC. The experimental analysis using GLCM features from the two datasets indicates that GLCM features from the BGH dataset outperformed those from the MIAS dataset with all five classifiers, whereas Gabor features from the two datasets scored the best result with two classifiers (SVM and MLP). For BGH and MIAS, SVM scored accuracies of 99% and 97.46%, sensitivities of 99.48% and 96.26%, and specificities of 98.16% and 100%, respectively; MLP achieved accuracies of 97% and 87.64%, sensitivities of 97.40% and 96.65%, and specificities of 96.26% and 75.73%, respectively. Relatively maximum performance is achieved by fusing Gabor and CNN-based features using the MLP classifier. In addition, the KNN, MLP, RF, and NB classifiers achieved almost 100% performance for GLCM texture features, and SVM scored an accuracy of 96.88%, a sensitivity of 97.14%, and a specificity of 96.36%.
Compared to the other classifiers, NB scored the least computation time in all experiments.
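The pairwise feature fusion the paper evaluates amounts to concatenating feature matrices before classification. A hedged sketch of that step (the per-column z-scoring and the dimensions in the usage note are illustrative assumptions, not details from the paper):

```python
import numpy as np

def fuse_features(*feature_sets):
    """Feature-level fusion by concatenation.

    Each argument is an (n_samples, n_features) matrix, e.g. GLCM,
    Gabor, or CNN features; columns are z-scored so no feature set
    dominates the fused representation by scale alone.
    """
    blocks = []
    for f in feature_sets:
        f = np.asarray(f, dtype=float)
        blocks.append((f - f.mean(axis=0)) / (f.std(axis=0) + 1e-8))
    return np.concatenate(blocks, axis=1)
```

Any of the five classifiers (SVM, KNN, MLP, RF, NB) would then be cross-validated on the fused matrix, e.g. fusing a (320, 12) GLCM matrix with a (320, 40) Gabor matrix yields a (320, 52) input.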


2020 ◽  
Vol 133 ◽  
pp. 327-333 ◽  
Author(s):  
Heng Zhou ◽  
Zhijun Fang ◽  
Yongbin Gao ◽  
Bo Huang ◽  
Cengsi Zhong ◽  
...  

2021 ◽  
Vol 11 (8) ◽  
pp. 2231-2242
Author(s):  
Fei Gao ◽  
Kai Qiao ◽  
Jinjin Hai ◽  
Bin Yan ◽  
Minghui Wu ◽  
...  

The goal of this research is to achieve accurate segmentation of liver tumors in noncontrast T2-weighted magnetic resonance imaging. As liver tumors and adjacent organs are represented by pixels of very similar gray intensity, segmentation is challenging, and the presence of liver tumors of different sizes makes segmentation more difficult. Differing from previous work that captures contextual information using multiscale feature fusion with concatenation, an attention mechanism is added to our segmentation model to extract precise global contextual information for pixel labeling without requiring complex dilated convolution. This study describes a liver lesion segmentation model derived from FC-DenseNet with an attention mechanism. Specifically, a global attention module (GAM) is added to the up-sampling path, and high-level features are processed by the GAM to generate weighting information that guides the recovery of high-resolution detail features. High-level features are very effective for accurate category classification but relatively weak at pixel classification and at restoring the original resolution, so fusing high-level semantic features with low-level detail features can improve segmentation accuracy. A weighted focal loss function is used to address the lesion area occupying a relatively small proportion of the whole image and to deal with the imbalance of foreground and background in the training liver lesion images. Experimental results show that our segmentation model can automatically segment liver tumors from complete MRI images and that adding the GAM effectively improves liver tumor segmentation. Our algorithms have obvious advantages over other CNN algorithms and traditional manual methods of feature extraction.
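Two ingredients of the abstract can be sketched concretely. The channel-gating form of the GAM and the focal-loss parameters below are assumptions chosen for illustration, not the paper's exact definitions:

```python
import numpy as np

def gam_fuse(low, high):
    """GAM-style fusion sketch: global context pooled from the
    high-level map gates the low-level detail map channel-wise."""
    context = high.mean(axis=(1, 2))           # (C,) global average pooling
    gate = 1.0 / (1.0 + np.exp(-context))      # sigmoid channel weights
    return low * gate[:, None, None]           # re-weighted detail features

def weighted_focal_loss(p, y, alpha=0.75, gamma=2.0):
    """Weighted focal loss for binary lesion masks: down-weights easy,
    abundant background pixels and up-weights the rare foreground."""
    pt = np.where(y == 1, p, 1.0 - p)          # probability of the true class
    w = np.where(y == 1, alpha, 1.0 - alpha)   # foreground/background weight
    return float(-(w * (1.0 - pt) ** gamma * np.log(pt + 1e-8)).mean())
```

With gamma = 2, a confident correct prediction (pt near 1) contributes almost nothing to the loss, so training focuses on the hard lesion pixels.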


2021 ◽  
Vol 11 (3) ◽  
pp. 1096
Author(s):  
Qing Li ◽  
Yingcheng Lin ◽  
Wei He

The high requirements for computing and memory are the biggest challenges in deploying existing object detection networks to embedded devices. Existing lightweight object detectors directly use lightweight neural network architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which results in poor network structure flexibility and is not suitable for some specific scenarios. In this paper, we propose a lightweight object detection network, SSD7-FFAM (Single-Shot MultiBox Detector 7 with Feature Fusion and Attention Mechanism), which saves storage space and reduces the amount of calculation by reducing the number of convolutional layers. We offer a novel Feature Fusion and Attention Mechanism (FFAM) method to improve detection accuracy. Firstly, the FFAM method fuses feature maps rich in high-level semantic information with low-level feature maps to improve the detection accuracy of small objects. A lightweight attention mechanism, cascading channel and spatial attention modules, is then employed to enhance the target’s contextual information and guide the network to focus on its easy-to-recognize features. SSD7-FFAM achieves 83.7% mean Average Precision (mAP) with 1.66 MB of parameters and an average running time of 0.033 s on the NWPU VHR-10 dataset. The results indicate that the proposed SSD7-FFAM is more suitable for deployment to embedded devices for real-time object detection.
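The cascaded attention in FFAM can be sketched as a channel-then-spatial re-weighting in the style of CBAM-like modules; the pooling choices below are typical of such designs and assumed here, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_attention(x):
    """Cascade over a (C, H, W) map: channel weights first, then spatial."""
    ch = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))  # (C,) per-channel weights
    x = x * ch[:, None, None]
    sp = sigmoid(x.mean(axis=0) + x.max(axis=0))            # (H, W) per-pixel weights
    return x * sp[None, :, :]
```

Both weight maps lie in (0, 1), so the module can only attenuate features, steering the detector toward the responses that survive both gates.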


2021 ◽  
Author(s):  
Wei Bai

Abstract Image semantic segmentation is one of the core tasks of computer vision. It is widely used in fields such as autonomous driving, medical image processing, geographic information systems, and intelligent robots. To address the problems that existing semantic segmentation algorithms ignore the distinct channel and location features of the feature map and fuse feature maps in an overly simple way, this paper designs a semantic segmentation algorithm that incorporates an attention mechanism. Firstly, dilated convolution with a smaller downsampling factor is used to maintain the resolution of the image and obtain its detailed information. Secondly, an attention mechanism module is introduced to assign weights to different parts of the feature map, which reduces the accuracy loss. The designed feature fusion module assigns weights to the feature maps of different receptive fields obtained by the two paths and merges them to obtain the final segmentation result. Finally, the method was verified through experiments on the CamVid, Cityscapes, and PASCAL VOC2012 datasets, with mean intersection over union (MIoU) and mean pixel accuracy (MPA) as metrics. The proposed method can compensate for the accuracy loss caused by downsampling while ensuring the receptive field and improving the resolution, which better guides model learning, and the proposed feature fusion module better integrates the features of different receptive fields. Therefore, the proposed method significantly improves segmentation performance compared to traditional methods.
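The two reported metrics are standard and worth pinning down. A minimal implementation of both, class-averaged in the usual way (classes absent from both prediction and ground truth are skipped):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection over union across classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

def mean_pixel_accuracy(pred, gt, num_classes):
    """Per-class pixel accuracy, averaged over classes present in gt."""
    accs = []
    for c in range(num_classes):
        mask = gt == c
        if mask.any():
            accs.append((pred[mask] == c).mean())
    return float(np.mean(accs))
```

For example, on a 2×2 label map with one misclassified pixel out of four, MIoU averages per-class IoU while MPA averages per-class recall, so the two can diverge.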


2021 ◽  
Vol 13 (18) ◽  
pp. 3651
Author(s):  
Weiqi Wang ◽  
Xiong You ◽  
Xin Zhang ◽  
Lingyu Chen ◽  
Lantian Zhang ◽  
...  

Facing the realistic demands of robot application environments, simultaneous localisation and mapping (SLAM) has gradually moved from static environments to complex dynamic environments, while traditional SLAM methods usually suffer pose estimation deviations caused by data-association errors due to the interference of dynamic elements in the environment. The present study effectively addresses this problem by proposing a SLAM approach based on light detection and ranging (LiDAR) under semantic constraints in dynamic environments. Four main modules handle the projection of point cloud data, semantic segmentation, dynamic element screening, and semantic map construction. A LiDAR point cloud semantic segmentation network, SANet, based on a spatial attention mechanism is proposed, which significantly improves the real-time performance and accuracy of point cloud semantic segmentation. A dynamic element selection algorithm is designed and used with prior knowledge to significantly reduce the pose estimation deviations caused by dynamic elements. Experiments conducted on the public datasets SemanticKITTI, KITTI, and SemanticPOSS show that the accuracy and robustness of the proposed approach are significantly improved.
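The first module, point cloud projection, is commonly realised as a spherical (range-image) projection so that a 2D segmentation network such as SANet can process the scan. A sketch of that standard step; the resolution and field-of-view values are typical for a 64-beam sensor and are assumptions, not the paper's settings:

```python
import numpy as np

def spherical_projection(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
    """Project (N, 3) LiDAR points into an (H, W) range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each point
    yaw = np.arctan2(y, x)                        # horizontal angle in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # vertical angle
    up, down = np.radians(fov_up), np.radians(fov_down)
    u = np.clip((0.5 * (1.0 - yaw / np.pi) * W).astype(int), 0, W - 1)  # column
    v = np.clip(((up - pitch) / (up - down) * H).astype(int), 0, H - 1)  # row
    img = np.zeros((H, W))
    img[v, u] = r                                 # later points overwrite earlier ones
    return img
```

Per-pixel labels predicted on this image can be mapped back to the 3D points through the same (v, u) indices, which is what lets 2D semantic segmentation constrain the 3D map.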


2020 ◽  
Vol 34 (07) ◽  
pp. 12168-12175 ◽  
Author(s):  
Jingwen Wang ◽  
Lin Ma ◽  
Wenhao Jiang

The task of temporally grounding language queries in videos is to temporally localize the best matched video segment corresponding to a given language query (sentence). It requires a model to simultaneously perform visual and linguistic understanding. Previous work predominantly ignores the precision of segment localization: sliding window based methods use predefined search window sizes, which suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. The most confident segments are subsequently selected based on both anchor and boundary predictions at the testing stage. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors with a clear margin on three public datasets.
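The selected segments are ultimately scored against ground truth with temporal IoU, the standard metric for this task. A minimal implementation:

```python
def temporal_iou(a, b):
    """IoU between two (start, end) time segments, e.g. in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

Benchmarks typically report "Recall@1, IoU >= t" for thresholds such as t = 0.5 or 0.7, i.e. the fraction of queries whose top-ranked segment reaches that overlap.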


Author(s):  
Dongsei Kim ◽  

This paper examines what the public, architects, urban designers, and city officials can learn about significant public spaces from emergent technologies and the data generated by growing social media. Interrogating this analytical method helps us recognize social media’s potential, such as gaining a deeper understanding of the relationship between how public spaces are “represented” and how they are “physically experienced” through the means of technology. This investigation combines an emerging image-recognition technique, semantic segmentation, with location-tagged images from Instagram to investigate the newly opened Seoullo 7017 walkway in Seoul. It argues that we should recognize this newly generated “big data” as a form of “collective intelligence” that can stimulate proactive engagement with our everyday interactions with public space. Equally, the findings of this investigation reveal how society can cautiously engage this “collective intelligence” with counterbalancing values.


2021 ◽  
Author(s):  
Xiaoxiao Hu ◽  
Yaoyao Gu ◽  
Kai Chen ◽  
Haipeng Dai ◽  
Jie Huang ◽  
...  

Abstract Background: Aldolase B (ALDOB) is a member of the aldolase family and the fourth enzyme in the glycolysis process. In recent years, non-enzymatic effects of some glycolytic enzymes have been reported to promote the formation of several human tumors, but the non-enzymatic action of ALDOB in neuroblastoma (NB) remains unclear. This study aims to explore the non-enzymatic effect of ALDOB in neuroblastoma. Methods: We used immunohistochemistry to examine tissue microarray samples from 63 patients and 3 pairs of lymph node metastases with their primary tissue samples, and evaluated the relationship between ALDOB expression level and clinical characteristics. We then analyzed microarray-based public datasets of NB to verify the immunohistochemistry results. In addition, we conducted in vitro experiments on the SK-N-BE(2) and SH-SY5Y cell lines to explore the molecular mechanism. Results: Immunohistochemistry indicated that ALDOB is significantly associated with INSS stage and tumor metastasis in NB, and public dataset analysis showed that ALDOB is remarkably related to NB patient survival. In vitro experiments showed that silencing ALDOB may inhibit cell migration through the epithelial-mesenchymal transition (EMT) pathway. Conclusions: Our findings demonstrate that ALDOB can affect the metastasis of NB through the EMT pathway and may be a potential target for neuroblastoma therapy in the future.


2021 ◽  
Vol 13 (16) ◽  
pp. 3149
Author(s):  
Xiaochen Wei ◽  
Xikai Fu ◽  
Ye Yun ◽  
Xiaolei Lv

Road detection from images has emerged as an important way to obtain road information, thereby gaining much attention in recent years. However, most existing methods focus only on extracting road information from single-temporal intensity images, which may suffer a decrease in image resolution due to the spatial filtering used to suppress coherent speckle noise. Some newly developed methods take multi-temporal information into account in the preprocessing stage to filter the coherent speckle noise in SAR imagery, but they ignore temporal characteristics of road objects, such as the temporal consistency of roads across multitemporal SAR images that cover the same area and are taken at adjacent times, which limits detection performance. In this paper, we propose a multiscale and multitemporal network (MSMTHRNet) for road detection from SAR imagery, which contains a temporal consistency enhancement module (TCEM) and a multiscale fusion module (MSFM), both based on an attention mechanism. In particular, the TCEM makes full use of multitemporal information through a temporal attention submodule that captures temporal contextual information. We enforce a temporal consistency constraint through the TCEM to obtain enhanced feature representations of SAR imagery that help distinguish real roads. Since road widths vary, incorporating multiscale features is a promising way to improve road detection; the MSFM applies learned weights to combine predictions from features at different scales. Since there is no public dataset, we built a multitemporal road detection dataset to evaluate our methods. The state-of-the-art semantic segmentation network HRNetV2 is used as a baseline against MSHRNet, which has only the MSFM, and against the full MSMTHRNet.
MSHRNet(TAF), whose input is the SAR image after temporal filtering, is also compared with the proposed MSMTHRNet. On our test dataset, MSHRNet and MSMTHRNet improve over HRNetV2 by 2.1% and 14.19%, respectively, in the IoU metric, and by 3.25% and 17.08%, respectively, in the APLS metric. MSMTHRNet improves over MSHRNet(TAF) by 8.23% in the IoU metric and 8.81% in the APLS metric.
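The MSFM's weighted combination of per-scale predictions can be sketched as a softmax-normalized weighted sum; representing the learned weights as one scalar per scale is an assumption for illustration, not the paper's parameterization:

```python
import numpy as np

def multiscale_fusion(preds, weights):
    """Combine same-shape prediction maps with softmax-normalized weights.

    preds:   list of (H, W) prediction maps, one per scale
    weights: array of per-scale scalars (stand-ins for learned weights)
    """
    w = np.exp(weights - np.max(weights))  # numerically stable softmax
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, preds))
```

Because the weights form a convex combination, the fused map stays within the range of the per-scale predictions, and scales that help thin versus wide roads can be traded off smoothly.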

