Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation

Author(s):  
Tao Hu ◽  
Pengwan Yang ◽  
Chiliang Zhang ◽  
Gang Yu ◽  
Yadong Mu ◽  
...  

Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data. The scarcity of annotated data becomes even more challenging in semantic segmentation, since pixel-level annotation is particularly labor-intensive to acquire. To tackle this issue, we propose an Attention-based Multi-Context Guiding (A-MCG) network, which consists of three branches: the support branch, the query branch, and the feature fusion branch. A key differentiator of A-MCG is the integration of multi-scale context features between the support and query branches, enforcing better guidance from the support set. In addition, we adopt spatial attention along the fusion branch to highlight context information from several scales, enhancing self-supervision in one-shot learning. To address the fusion problem in multi-shot learning, a Conv-LSTM is adopted to collaboratively integrate the sequential support features and elevate the final accuracy. Our architecture obtains state-of-the-art results on unseen classes in a variant of the PASCAL VOC12 dataset and performs favorably against previous work, with gains of 1.1% and 1.4% mIoU in the 1-shot and 5-shot settings, respectively.
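The spatial attention along the fusion branch can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned scoring convolution is replaced here, as an assumption, by a fixed pooling-based score, and `spatial_attention` is a hypothetical helper name.

```python
import numpy as np

def spatial_attention(feat):
    """Weight a fused feature map by an (H, W) spatial attention mask."""
    # feat: (C, H, W) fused multi-scale features
    avg = feat.mean(axis=0)               # channel-average pooling -> (H, W)
    mx = feat.max(axis=0)                 # channel-max pooling     -> (H, W)
    score = avg + mx                      # stand-in for a learned 1x1 conv
    attn = 1.0 / (1.0 + np.exp(-score))   # sigmoid gate in (0, 1)
    return feat * attn[None, :, :]        # broadcast the gate over channels
```

Because the gate lies in (0, 1), salient spatial positions are kept while low-scoring regions are suppressed, without changing the feature map's shape.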

Author(s):  
Ningyu Zhang ◽  
Xiang Chen ◽  
Xin Xie ◽  
Shumin Deng ◽  
Chuanqi Tan ◽  
...  

Document-level relation extraction aims to extract relations among multiple entity pairs from a document. Previously proposed graph-based or transformer-based models treat entities independently, ignoring global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information, in parallel to the semantic segmentation task in computer vision. Herein, we propose a Document U-shaped Network for document-level relation extraction. Specifically, we leverage an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples. Experimental results show that our approach obtains state-of-the-art performance on three benchmark datasets: DocRED, CDR, and GDA.
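The entity-level relation matrix treated as an image-style feature map can be sketched as follows. The element-wise product used as the pair feature is an assumption for illustration; models of this kind typically compute richer pair representations before the U-shaped segmentation module.

```python
import numpy as np

def entity_pair_map(ent_emb):
    """Build a (d, N, N) image-style feature map from N entity embeddings."""
    # ent_emb: (N, d) entity representations from the encoder
    # channel c at position (i, j) compares entity i with entity j
    fmap = ent_emb[:, None, :] * ent_emb[None, :, :]   # (N, N, d)
    return np.transpose(fmap, (2, 0, 1))               # (d, N, N)
```

A segmentation-style network can then predict a relation label per (i, j) cell, so every entity pair is classified jointly rather than independently.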


2021 ◽  
Author(s):  
Chao Lu ◽  
Fansheng Chen ◽  
Xiaofeng Su ◽  
Dan Zeng

Abstract Infrared technology is widely used in precision guidance and mine detection, since it captures the heat radiated outward from the target object. We use infrared (IR) thermography to obtain infrared images of buried objects. Compared to visible images, infrared images present poor resolution, low contrast, and fuzzy visual effects, which make it difficult to segment the target object, especially against complex backgrounds. Under these conditions, traditional segmentation methods do not perform well on infrared images, since they are easily disturbed by noise and non-target objects. With the advance of deep convolutional neural networks (CNNs), deep learning-based methods have made significant improvements in the semantic segmentation task. However, few of them address infrared image semantic segmentation, which is a more challenging scenario than visible images. Moreover, the lack of infrared image datasets is a further problem for current deep learning-based methods. To solve these problems, we propose a multi-scale attentional feature fusion (MS-AFF) module for infrared image semantic segmentation. Specifically, we integrate a series of feature maps from different levels through an atrous spatial pyramid structure. In this way, the model obtains a rich representation of the infrared images. Besides, a global spatial information attention module is employed to let the model focus on the target region and reduce disturbance from the background of infrared images. In addition, we propose an infrared segmentation dataset based on an infrared thermal imaging system. Extensive experiments on this dataset show the superiority of our method.
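The atrous spatial pyramid idea behind the fusion can be sketched in plain NumPy: the same 3x3 kernel is applied at several dilation rates and the responses are fused. This is a simplification under stated assumptions; the actual MS-AFF module uses learned convolutions and attention-weighted fusion rather than a plain average.

```python
import numpy as np

def dilated_conv3x3(x, w, rate):
    """3x3 convolution with dilation `rate`; zero padding keeps output size."""
    # x: (H, W) feature map, w: (3, 3) kernel
    H, W = x.shape
    xp = np.pad(x, rate)
    out = np.zeros((H, W))
    for di in range(3):
        for dj in range(3):
            out += w[di, dj] * xp[di * rate:di * rate + H, dj * rate:dj * rate + W]
    return out

def aspp_fuse(x, w, rates=(1, 2, 4)):
    """Run the kernel at several dilation rates and average the responses."""
    return np.mean([dilated_conv3x3(x, w, r) for r in rates], axis=0)
```

Larger rates enlarge the receptive field without extra parameters, which is what lets the pyramid capture context at multiple scales from the same feature map.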


2020 ◽  
Author(s):  
Bingyan Liu ◽  
Daru Pan ◽  
Hui Song

Abstract Background: Glaucoma is an eye disease that causes vision loss and even blindness. The cup to disc ratio (CDR) is an important indicator for glaucoma screening and diagnosis, and accurate segmentation of the optic disc and cup helps obtain the CDR. Although many deep learning-based methods have been proposed to segment the disc and cup in fundus images, achieving highly accurate segmentation remains a great challenge due to the heavy overlap between the optic disc and cup. Methods: In this paper, we propose a two-stage method in which the optic disc is first located and then the optic disc and cup are segmented jointly within the region of interest. We treat the joint optic disc and cup segmentation task as a multi-category semantic segmentation task, for which we propose a deep learning-based model named DDSC-Net (densely connected depthwise separable convolution network). Specifically, we employ depthwise separable convolutional layers and image pyramid input to form a deeper and wider network and improve segmentation performance. Finally, we evaluate our method on two publicly available datasets, Drishti-GS and REFUGE. Results: The experimental results show that the proposed method outperforms state-of-the-art methods such as pOSAL, GL-Net, M-Net and Stack-U-Net in terms of dice coefficients, with scores of 0.9780 (optic disc) and 0.9123 (optic cup) on the Drishti-GS dataset, and 0.9601 (optic disc) and 0.8903 (optic cup) on the REFUGE dataset. In particular, in the more challenging optic cup segmentation task, our method outperforms GL-Net by 0.7% in dice coefficient on the Drishti-GS dataset and pOSAL by 0.79% on the REFUGE dataset. Conclusions: The promising segmentation performance reveals that our method has the potential to assist the screening and diagnosis of glaucoma.
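Once disc and cup masks are segmented, the CDR the screening relies on follows directly. A minimal sketch using the vertical-diameter convention (one common definition; the paper may use a different one, and `vertical_cdr` is a hypothetical helper name):

```python
import numpy as np

def vertical_cdr(disc_mask, cup_mask):
    """Vertical cup-to-disc ratio from boolean (H, W) segmentation masks."""
    def height(mask):
        # span of rows that contain at least one foreground pixel
        rows = np.where(mask.any(axis=1))[0]
        return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)
    disc_h = height(disc_mask)
    return height(cup_mask) / disc_h if disc_h else 0.0
```

This makes clear why segmentation accuracy matters clinically: errors on either boundary propagate directly into the ratio used for screening.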


Energies ◽  
2021 ◽  
Vol 14 (13) ◽  
pp. 3800
Author(s):  
Sebastian Krapf ◽  
Nils Kemmerzell ◽  
Syed Khawaja Haseeb Uddin ◽  
Manuel Hack Vázquez ◽  
Fabian Netzler ◽  
...  

Roof-mounted photovoltaic systems play a critical role in the global transition to renewable energy generation. An analysis of roof photovoltaic potential is an important tool for supporting decision-making and accelerating new installations. The state of the art uses 3D data to conduct potential analyses with high spatial resolution, which limits the study area to places where 3D data are available. Recent advances in deep learning allow the required roof information to be extracted from aerial images. Furthermore, most publications consider the technical photovoltaic potential, and only a few determine the economic photovoltaic potential. Therefore, this paper extends the state of the art by proposing and applying a methodology for scalable economic photovoltaic potential analysis using aerial images and deep learning. Two convolutional neural networks are trained for semantic segmentation of roof segments and superstructures and achieve Intersection over Union values of 0.84 and 0.64, respectively. We calculated the internal rate of return of each roof segment for 71 buildings in a small study area. A comparison of this paper's methodology with a 3D-based analysis discusses its benefits and disadvantages. The proposed methodology uses only publicly available data and is potentially scalable to the global level. However, this poses a variety of research challenges and opportunities, which are summarized with a focus on the application of deep learning, economic photovoltaic potential analysis, and energy system analysis.
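The internal rate of return per roof segment is the discount rate at which the segment's net present value is zero. A minimal pure-Python sketch (bisection on NPV, assuming a single sign change in the bracket; the paper's exact cash-flow model is not specified here):

```python
def npv(rate, cashflows):
    """Net present value; cashflows[0] is the upfront investment (negative)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=1.0, tol=1e-9):
    """Find the rate where NPV crosses zero, by bisection on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if npv(lo, cashflows) * npv(mid, cashflows) <= 0:
            hi = mid          # sign change in the lower half
        else:
            lo = mid          # sign change in the upper half
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0
```

For example, an investment of 1000 returning 1100 one year later has an IRR of 10%. In a roof-segment analysis, the cash flows would be the installation cost followed by yearly electricity savings and feed-in revenue.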


Author(s):  
Rohit Mohan ◽  
Abhinav Valada

Abstract Understanding the scene in which an autonomous robot operates is critical for its competent functioning. Such scene comprehension necessitates recognizing instances of traffic participants along with general scene semantics, which can be effectively addressed by the panoptic segmentation task. In this paper, we introduce the Efficient Panoptic Segmentation (EfficientPS) architecture, which consists of a shared backbone that efficiently encodes and fuses semantically rich multi-scale features. We incorporate a new semantic head that aggregates fine and contextual features coherently, and a new variant of Mask R-CNN as the instance head. We also propose a novel panoptic fusion module that congruously integrates the output logits from both heads of our EfficientPS architecture to yield the final panoptic segmentation output. Additionally, we introduce the KITTI panoptic segmentation dataset, which contains panoptic annotations for the popular and challenging KITTI benchmark. Extensive evaluations on Cityscapes, KITTI, Mapillary Vistas and the Indian Driving Dataset demonstrate that our proposed architecture consistently sets a new state of the art on all four benchmarks while being the most efficient and fast panoptic segmentation architecture to date.
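The general shape of panoptic fusion can be sketched as follows. Note this is a simplified greedy merge, not EfficientPS's congruous logit fusion: "stuff" labels come from the semantic prediction, and instance masks are pasted in descending score order; names like `panoptic_merge` are hypothetical.

```python
import numpy as np

def panoptic_merge(sem_logits, inst_masks, inst_scores, stuff_ids):
    """Combine semantic logits and instance masks into one panoptic map."""
    # sem_logits: (C, H, W); inst_masks: list of boolean (H, W) masks
    C, H, W = sem_logits.shape
    pan = sem_logits.argmax(axis=0)                    # per-pixel semantic class
    pan = np.where(np.isin(pan, stuff_ids), pan, -1)   # keep stuff, void the rest
    taken = np.zeros((H, W), bool)
    next_id = 1000                                     # instance ids beyond stuff ids
    for mask, _score in sorted(zip(inst_masks, inst_scores),
                               key=lambda t: -t[1]):   # highest score pastes first
        pan[mask & ~taken] = next_id                   # only onto unclaimed pixels
        taken |= mask
        next_id += 1
    return pan
```

The greedy overlap resolution is the key design point: once a high-confidence instance claims a pixel, lower-scoring instances cannot overwrite it.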


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dominik Jens Elias Waibel ◽  
Sayedali Shetab Boushehri ◽  
Carsten Marr

Abstract Background Deep learning contributes to uncovering molecular and cellular processes with highly performant algorithms. Convolutional neural networks have become the state-of-the-art tool to provide accurate and fast image data processing. However, published algorithms mostly solve only one specific problem and they typically require a considerable coding effort and machine learning background for their application. Results We have thus developed InstantDL, a deep learning pipeline for four common image processing tasks: semantic segmentation, instance segmentation, pixel-wise regression and classification. InstantDL enables researchers with a basic computational background to apply debugged and benchmarked state-of-the-art deep learning algorithms to their own data with minimal effort. To make the pipeline robust, we have automated and standardized workflows and extensively tested it in different scenarios. Moreover, it allows assessing the uncertainty of predictions. We have benchmarked InstantDL on seven publicly available datasets achieving competitive performance without any parameter tuning. For customization of the pipeline to specific tasks, all code is easily accessible and well documented. Conclusions With InstantDL, we hope to empower biomedical researchers to conduct reproducible image processing with a convenient and easy-to-use pipeline.

