Effects of Spatial Transformer Location on Segmentation Performance of a Dense Transformer Network

Author(s):  
David Abou-Chacra ◽  
John Zelek

Semantic segmentation solves the task of labelling every pixel in an image with its class label, and remains an important unsolved problem. While significant work has gone into using deep learning to solve this problem, almost all the existing research uses methods that do not modify the spatial context considered for the pixel being labelled. Spatial information is an important cue in tasks such as segmentation, and reusing the same spatial span for every pixel and every label may not be the best approach. Spatial Transformer Networks have shown promising results in improving classification performance of existing networks by allowing networks to actively manipulate their input data to achieve better performance. Our work shows the benefit of incorporating Spatial Transformer Networks and their corresponding decoders into networks tailored to semantic segmentation. Our experiments show an improvement in performance over baseline networks when using networks augmented with Spatial Transformers.

2020 ◽  
Vol 9 (4) ◽  
pp. 256 ◽  
Author(s):  
Liguo Weng ◽  
Yiming Xu ◽  
Min Xia ◽  
Yonghong Zhang ◽  
Jia Liu ◽  
...  

Changes in lakes and rivers are of great significance for the study of global climate change. Accurate segmentation of lakes and rivers is critical to the study of their changes. However, traditional water area segmentation methods almost all share the following deficiencies: high computational requirements, poor generalization performance, and low extraction accuracy. In recent years, semantic segmentation algorithms based on deep learning have been emerging. To address the problems of a very large number of parameters, low accuracy, and network degradation during the training process, this paper proposes a separable residual SegNet (SR-SegNet) to perform water area segmentation using remote sensing images. On the one hand, without compromising the ability of feature extraction, the problem of network degradation is alleviated by adding modified residual blocks into the encoder, the number of parameters is limited by introducing depthwise separable convolutions, and the ability of feature extraction is improved by using dilated convolutions to expand the receptive field. On the other hand, SR-SegNet removes the convolution layers with relatively more convolution kernels in the encoding stage, and uses the cascading method to fuse the low-level and high-level features of the image. As a result, the whole network can obtain more spatial information. Experimental results show that the proposed method exhibits significant improvements over several traditional methods, including FCN, DeconvNet, and SegNet.
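
To illustrate why depthwise separable convolutions limit the number of parameters, the following sketch compares the weight count of a standard convolution with that of a depthwise separable one. The layer sizes (3 x 3 kernel, 64 input channels, 128 output channels) are illustrative assumptions, not values taken from SR-SegNet, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # hypothetical layer sizes
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For these assumed sizes the standard layer holds 73,728 weights and the separable one 8,768, roughly an eightfold reduction.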


2020 ◽  
Vol 13 (1) ◽  
pp. 71
Author(s):  
Zhiyong Xu ◽  
Weicun Zhang ◽  
Tianxiang Zhang ◽  
Jiangyun Li

Semantic segmentation is a significant method in remote sensing image (RSI) processing and has been widely used in various applications. Conventional convolutional neural network (CNN)-based semantic segmentation methods are likely to lose spatial information in the feature extraction stage and usually pay little attention to global context information. Moreover, imbalanced category scales and uncertain boundary information coexist in RSIs, which also makes the semantic segmentation task challenging. To overcome these problems, a high-resolution context extraction network (HRCNet) based on a high-resolution network (HRNet) is proposed in this paper. In this approach, the HRNet structure is adopted to keep the spatial information. Moreover, the light-weight dual attention (LDA) module is designed to obtain global context information in the feature extraction stage and the feature enhancement feature pyramid (FEFP) structure is promoted and employed to fuse the contextual information of different scales. In addition, to capture boundary information, we design the boundary aware (BA) module combined with the boundary aware loss (BAloss) function. The experimental results evaluated on the Potsdam and Vaihingen datasets show that the proposed approach can significantly improve the boundary and segmentation performance, up to 92.0% and 92.3% on overall accuracy scores, respectively. As a consequence, it is envisaged that the proposed HRCNet model will be advantageous for remote sensing image segmentation.


2021 ◽  
Vol 10 (3) ◽  
pp. 125
Author(s):  
Junqing Huang ◽  
Liguo Weng ◽  
Bingyu Chen ◽  
Min Xia

Analyzing land cover using remote sensing images has broad prospects, and the precise segmentation of land cover is the key to the application of this technology. Nowadays, the Convolutional Neural Network (CNN) is widely used in many image semantic segmentation tasks. However, existing CNN models often exhibit poor generalization ability and low segmentation accuracy when dealing with land cover segmentation tasks. To solve this problem, this paper proposes the Dual Function Feature Aggregation Network (DFFAN). This method combines image context information, gathers image spatial information, and extracts and fuses features. DFFAN uses residual neural networks as a backbone to obtain different dimensional feature information of remote sensing images through multiple downsamplings. This work designs the Affinity Matrix Module (AMM) to obtain the context of each feature map and proposes the Boundary Feature Fusion Module (BFF) to fuse the context information and spatial information of an image to determine the location distribution of each image’s category. Compared with existing methods, the proposed method is significantly improved in accuracy. Its mean intersection over union (MIoU) on the LandCover dataset reaches 84.81%.
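
As a reminder of how the mean intersection over union metric is commonly computed, here is a minimal sketch that derives it from a confusion matrix. The function name and the tiny two-class matrix are hypothetical; the evaluation code and data behind the figure reported above are not part of this listing.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    Classes that appear in neither prediction nor ground truth are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp    # pixels wrongly labelled c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical 2-class confusion matrix (rows: truth, columns: prediction).
    print(mean_iou([[8, 2], [1, 9]]))
```

Per-class IoU is the number of correctly labelled pixels divided by the union of predicted and true pixels for that class; the mean is taken over classes, so small classes weigh as much as large ones.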


Author(s):  
J. Kang ◽  
I. Lee

Sophisticated indoor design and growing development in urban architecture make indoor spaces more complex, and indoor spaces are easily connected to public transportation such as subway and train stations. These phenomena allow outdoor activities to move into indoor spaces. The constant development of technology has a significant impact on people's knowledge about services such as location awareness services in indoor spaces. Thus, a low-cost system is required to create 3D models of indoor spaces for services based on indoor models. In this paper, we introduce a rotating stereo frame camera system that has two cameras, and we generate an indoor 3D model using the system. First, we selected a test site and acquired images eight times during one day with different positions and heights of the system. Measurements were complemented by object control points obtained from a total station. As the data were obtained from different positions and heights of the system, it was possible to make various combinations of data and choose several suitable combinations as input data. Next, we generated the 3D model of the test site using commercial software with the previously chosen input data. The last part of the process is to evaluate the accuracy of the indoor model generated from the selected input data. In summary, this paper introduces a low-cost system to acquire indoor spatial data and generate a 3D model using images acquired by the system. Through these experiments, we confirm that the introduced system is suitable for generating indoor spatial information. The proposed low-cost system will be applied to indoor services based on indoor spatial information.


2020 ◽  
Vol 10 (3) ◽  
pp. 820 ◽  
Author(s):  
Marcela Bindzárová Gergeľová ◽  
Žofia Kuzevičová ◽  
Slavomír Labant ◽  
Juraj Gašinec ◽  
Štefan Kuzevič ◽  
...  

Weather-related disasters represent a major threat to the sustainable development of society. This study focuses directly on the assessment of the state of spatial information quality for the needs of hydrodynamic modeling. Based on the selected procedures and methods designed for the collection and processing of spatial information, the aim of this study was to assess their qualitative level of suitability for 3D flood event modeling in accordance with the Infrastructure for Spatial Information in the European Community (INSPIRE) Directive. In the evaluation process we entered geodetic measurements and the digital relief model 3.5 (DMR 3.5) available for the territory of the Slovak Republic. The result of this study is an assessment of the qualitative analysis on three levels: (i) main channel and surrounding topography data from geodetic measurements; (ii) digital relief model; and (iii) hydrodynamic/hydraulic modeling. The qualitative aspect of the input data shows the sensitivity of a given model to changes in the input data quality condition. The average spatial error in the determination of a point’s position was calculated as 0.017 m of all measured points along a watercourse and its slope foot and slope edge. Although the declared accuracy of DMR 3.5 is assumed to be ±2.50 m, in some of the sections in the selected area there were differences in elevation up to 4.79 m. For this reason, we needed a combination of DMR 3.5 and geodetic measurements to refine the input model for the process of hydrodynamic modeling. The quality of the hydrological data for the monitored N annual flow levels was of fourth-class reliability for the selected area.


2020 ◽  
Vol 9 (10) ◽  
pp. 571
Author(s):  
Jinglun Li ◽  
Jiapeng Xiu ◽  
Zhengqiu Yang ◽  
Chen Liu

Semantic segmentation plays an important role in being able to understand the content of remote sensing images. In recent years, deep learning methods based on Fully Convolutional Networks (FCNs) have proved to be effective for the semantic segmentation of remote sensing images. However, the rich information and complex content make the training of networks for segmentation challenging, and the datasets are necessarily constrained. In this paper, we propose a Convolutional Neural Network (CNN) model called Dual Path Attention Network (DPA-Net) that has a simple modular structure and can be added to any segmentation model to enhance its ability to learn features. Two types of attention module are appended to the segmentation model, one focusing on spatial information and the other focusing on the channel. Then, the outputs of these two attention modules are fused to further improve the network’s ability to extract features, thus contributing to more precise segmentation results. Finally, data pre-processing and augmentation strategies are used to compensate for the small number of datasets and uneven distribution. The proposed network was tested on the Gaofen Image Dataset (GID). The results show that the network outperformed U-Net, PSP-Net, and DeepLab V3+ in terms of the mean IoU by 0.84%, 2.54%, and 1.32%, respectively.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images and maintains spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database. Losses can be reduced by applying backpropagation to the modified dilated residual network to update the weights. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. The proposed method achieved accuracies of 82.8 and 89.8 mean intersection over union (mIOU) and frame rates of 61 and 64.3 frames per second (fps) for the Cityscapes and PASCAL VOC 2012 databases, respectively. These results prove the applicability of the proposed method for implementing natural AR applications at actual speeds because the frame rate is greater than 60 fps.
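
The way dilated (atrous) convolutions widen the area a network sees without adding weights can be sketched with two standard formulas. The kernel size and dilation rates below are illustrative assumptions and are not taken from the network described above; all layers are assumed to use stride 1.

```python
def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel whose taps are `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


def stacked_receptive_field(k, dilations):
    """Receptive field of stride-1 conv layers applied one after another."""
    rf = 1
    for d in dilations:
        rf += (k - 1) * d  # each layer extends the field by (k - 1) * dilation
    return rf


if __name__ == "__main__":
    # Hypothetical 3 x 3 kernels with dilation rates 1, 2 and 4.
    for d in (1, 2, 4):
        print(d, effective_kernel_size(3, d))
    print(stacked_receptive_field(3, [1, 2, 4]))
    print(stacked_receptive_field(3, [1, 1, 1]))
```

Under these assumptions, three stacked 3 x 3 layers cover 15 pixels per side with dilations 1, 2 and 4, compared with 7 pixels without dilation, while the number of weights stays the same.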


2020 ◽  
Vol 9 (3) ◽  
pp. 147 ◽  
Author(s):  
Xi Kuai ◽  
Renzhong Guo ◽  
Zhijun Zhang ◽  
Biao He ◽  
Zhigang Zhao ◽  
...  

Georeferencing by place names (known as toponyms) is the most common way of associating textual information with geographic locations. While computers use numeric coordinates (such as longitude-latitude pairs) to represent places, people generally refer to places via their toponyms. Query by toponym is an effective way to find information about a geographic area. However, segmenting and parsing textual addresses to extract local toponyms is a difficult task in the geocoding field, especially in China. In this paper, a local spatial context-based framework is proposed to extract local toponyms and segment Chinese textual addresses. We collect urban points of interest (POIs) as an input data source; in this dataset, the textual address and geospatial position coordinates correspond on a one-to-one basis and can be easily used to explore the spatial distribution of local toponyms. The proposed framework involves two steps: address element identification and local toponym extraction. The first step identifies as many address element candidates as possible from a continuous string of textual addresses for each urban POI. The second step focuses on merging neighboring candidate pairs into local toponyms. A series of experiments is conducted to determine the thresholds for local toponym extraction based on precision-recall curves. Finally, we evaluate our framework by comparing its performance with three well-known Chinese word segmentation models. The comparative experimental results demonstrate that our framework achieves better performance than the other models.


2001 ◽  
Vol 356 (1413) ◽  
pp. 1493-1503 ◽  
Author(s):  
Neil Burgess ◽  
Suzanna Becker ◽  
John A. King ◽  
John O'Keefe

The computational role of the hippocampus in memory has been characterized as: (i) an index to disparate neocortical storage sites; (ii) a time-limited store supporting neocortical long-term memory; and (iii) a content-addressable associative memory. These ideas are reviewed and related to several general aspects of episodic memory, including the differences between episodic, recognition and semantic memory, and whether hippocampal lesions differentially affect recent or remote memories. Some outstanding questions remain, such as: what characterizes episodic retrieval as opposed to other forms of read-out from memory; what triggers the storage of an event memory; and what are the neural mechanisms involved? To address these questions a neural-level model of the medial temporal and parietal roles in retrieval of the spatial context of an event is presented. This model combines the idea that retrieval of the rich context of real-life events is a central characteristic of episodic memory, and the idea that medial temporal allocentric representations are used in long-term storage while parietal egocentric representations are used to imagine, manipulate and re-experience the products of retrieval. The model is consistent with the known neural representation of spatial information in the brain, and provides an explanation for the involvement of Papez's circuit in both the representation of heading direction and in the recollection of episodic information. Two experiments relating to the model are briefly described. A functional neuroimaging study of memory for the spatial context of life-like events in virtual reality provides support for the model's functional localization. A neuropsychological experiment suggests that the hippocampus does store an allocentric representation of spatial locations.

