Patch Proposal Network for Fast Semantic Segmentation of High-Resolution Images

2020 ◽  
Vol 34 (07) ◽  
pp. 12402-12409 ◽  
Author(s):  
Tong Wu ◽  
Zhenzhen Lei ◽  
Bingqian Lin ◽  
Cuihua Li ◽  
Yanyun Qu ◽  
...  

Despite recent progress on the segmentation of high-resolution images, an unsolved problem remains: the trade-off among segmentation accuracy, memory consumption, and inference speed. GLNet was introduced for high- and ultra-resolution image segmentation and reduces the computational memory of the segmentation network. However, it ignores the differing importance of the cropped patches and treats all tiled patches equally when fusing them with the whole image, which results in high computational cost. To solve this problem, we introduce a patch proposal network (PPN), which adaptively distinguishes critical patches from trivial ones so that only the critical patches are fused with the whole image to refine segmentation. PPN is a classification network that alleviates the network training burden and improves segmentation accuracy. We further embed PPN in a global-local segmentation network, instructing the global branch and the refinement branch to work collaboratively. We evaluate our method on four image datasets: DeepGlobe, ISIC, CRAG, and Cityscapes; the first two are ultra-resolution image datasets and the last two are high-resolution image datasets. The experimental results show that our method achieves nearly the best segmentation performance compared with state-of-the-art segmentation methods, with inference speeds of 12.9 fps on DeepGlobe and 10 fps on ISIC. Moreover, we embed PPN in a general semantic segmentation network, and the experimental results on Cityscapes, which contains more object classes, demonstrate its generalization ability for general semantic segmentation.
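
The patch-proposal idea can be illustrated with a small NumPy sketch: tile the image, score each tile, and refine only the tiles whose score passes a threshold. This is not the authors' PPN. In the paper the score comes from a trained classification network and the refined patches are fused with the global branch; here `score_fn`, `refine_fn`, the threshold, and the simple overwrite used in place of fusion are all hypothetical stand-ins.

```python
import numpy as np

def tile(image, patch):
    """Split an (H, W) image into non-overlapping patch x patch tiles.
    Returns the tiles and their (row, col) top-left offsets."""
    h, w = image.shape[:2]
    tiles, offsets = [], []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            tiles.append(image[r:r + patch, c:c + patch])
            offsets.append((r, c))
    return tiles, offsets

def propose_patches(tiles, score_fn, threshold):
    """Return indices of tiles whose score exceeds `threshold`.
    Hypothetical: `score_fn` stands in for PPN's learned classifier."""
    return [i for i, t in enumerate(tiles) if score_fn(t) > threshold]

def refine(global_pred, tiles, offsets, keep, refine_fn, patch):
    """Replace the coarse global prediction only at the proposed patches.
    Hypothetical: a plain overwrite stands in for the paper's fusion step."""
    out = global_pred.copy()
    for i in keep:
        r, c = offsets[i]
        out[r:r + patch, c:c + patch] = refine_fn(tiles[i])
    return out

# Usage: only the textured top-left tile is treated as "critical".
image = np.zeros((8, 8))
image[:4, :4] = np.tile([0.0, 1.0], (4, 2))
tiles, offsets = tile(image, 4)
keep = propose_patches(tiles, np.std, threshold=0.1)  # variance as a toy score
refined = refine(np.zeros((8, 8)), tiles, offsets, keep,
                 lambda t: (t > 0.5).astype(float), 4)
```

The saving comes from calling the expensive refinement on `keep` only, rather than on every tile as in GLNet.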

Author(s):  
Xiongxiong Xue ◽  
Zhenqi Han ◽  
Weiqin Tong ◽  
Mingqi Li ◽  
Lizhuang Liu

Video super-resolution, which uses information from several low-resolution frames to generate high-resolution images, is a challenging task. One common approach, the sliding-window method, divides the generation of high-resolution video sequences into independent sub-tasks, using only adjacent low-resolution frames to estimate the high-resolution version of the central low-resolution frame. Another popular approach, the recurrent method, uses not only the low-resolution frames but also the high-resolution outputs already generated for previous frames. However, both methods have unavoidable drawbacks. The former usually leads to poor temporal consistency and requires higher computational cost, while the latter often cannot make full use of the information contained in optical flow or other computed features. Further investigation is therefore needed to balance these two approaches. In this work, we propose a bidirectional frame recurrent video super-resolution method. Specifically, we propose reverse training, in which the generated high-resolution frame is also used to help estimate the high-resolution version of the preceding frame. By combining reverse training with forward training, the bidirectional recurrent method not only guarantees temporal consistency but also makes full use of adjacent information, while keeping the computational cost acceptable. Experimental results demonstrate that the bidirectional super-resolution framework performs remarkably well: it resolves the time-related problems while producing high-resolution images that compare favorably with those of recurrent video super-resolution methods.
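
The data flow of a bidirectional recurrence can be sketched in a few lines: a forward pass feeds each frame's HR estimate into frame t+1, a reverse pass feeds it into frame t-1, and the two estimates are merged. This is only a structural illustration at inference time, not the authors' training scheme. The `sr_step` function (nearest-neighbour upsampling plus averaging) and the final averaging of the two directions are hypothetical placeholders for learned networks.

```python
import numpy as np

def upsample(frame, scale):
    """Nearest-neighbour upsampling; a placeholder for a learned SR network."""
    return np.repeat(np.repeat(frame, scale, axis=0), scale, axis=1)

def sr_step(lr, neighbour_hr, scale):
    """Hypothetical recurrent step: fuse the upsampled current LR frame with
    the HR estimate of a neighbouring frame (plain averaging here)."""
    up = upsample(lr, scale)
    return up if neighbour_hr is None else 0.5 * (up + neighbour_hr)

def bidirectional_sr(frames, scale=2):
    n = len(frames)
    forward = [None] * n   # frame t conditioned on the HR estimate of t-1
    backward = [None] * n  # frame t conditioned on the HR estimate of t+1
    prev = None
    for t in range(n):
        prev = forward[t] = sr_step(frames[t], prev, scale)
    nxt = None
    for t in reversed(range(n)):
        nxt = backward[t] = sr_step(frames[t], nxt, scale)
    # Hypothetical merge of the two directions.
    return [0.5 * (f + b) for f, b in zip(forward, backward)]

# Usage: only the last frame carries signal, yet it reaches frame 0
# through the reverse pass.
frames = [np.zeros((2, 2)), np.zeros((2, 2)), np.ones((2, 2))]
outputs = bidirectional_sr(frames)
```

A purely forward recurrence would leave `outputs[0]` at zero here; the reverse pass is what lets later frames inform earlier ones.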


2020 ◽  
Vol 10 (12) ◽  
pp. 4282
Author(s):  
Ghada Zamzmi ◽  
Sivaramakrishnan Rajaraman ◽  
Sameer Antani

Medical images are acquired at different resolutions depending on clinical goals or available technology. In general, however, high-resolution images with fine structural details are preferred for visual task analysis. Recognizing this, researchers have proposed several deep learning networks to enhance medical images for reliable automated interpretation. These deep networks are often computationally complex and require a massive number of parameters, which restricts them to highly capable computing platforms with large memory banks. In this paper, we propose an efficient deep learning approach, called Hydra, which simultaneously reduces computational complexity and improves performance. Hydra consists of a trunk and several computing heads. The trunk is a super-resolution model that learns the mapping from low-resolution to high-resolution images. It has a simple architecture that is trained on multiple scales at once to minimize a proposed learning-loss function. We also propose appending multiple task-specific heads to the trained Hydra trunk for simultaneous learning of multiple visual tasks in medical images. Hydra is evaluated on publicly available chest X-ray image collections for image enhancement, lung segmentation, and abnormality classification. Our experimental results support our claims and demonstrate that the proposed approach can improve the performance of super-resolution and visual task analysis in medical images at a remarkably reduced computational cost.
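
The trunk-and-heads layout can be sketched as a shared function whose output is computed once and passed to several task-specific callables. This shows only the structure. The trunk (2x nearest-neighbour upsampling via `np.kron`), the head functions, and the `Hydra` class interface below are hypothetical stand-ins for the trained super-resolution trunk and learned heads, and multi-scale training is not modelled.

```python
import numpy as np

class Hydra:
    """Shared trunk with multiple task-specific heads (structural sketch only)."""

    def __init__(self, trunk, heads):
        self.trunk = trunk  # callable: LR image -> HR image
        self.heads = heads  # dict: task name -> callable on the trunk output

    def __call__(self, image, tasks=None):
        features = self.trunk(image)  # computed once, shared by all heads
        names = self.heads if tasks is None else tasks
        return {name: self.heads[name](features) for name in names}

# Usage with hypothetical stand-ins for the learned components.
model = Hydra(
    trunk=lambda x: np.kron(x, np.ones((2, 2))),  # toy 2x "super-resolution"
    heads={
        "enhance": lambda f: f,                          # HR image itself
        "segment": lambda f: f > 0.5,                    # toy lung mask
        "classify": lambda f: bool(f.mean() > 0.5),      # toy abnormality flag
    },
)
results = model(np.array([[0.0, 1.0], [1.0, 1.0]]))
```

Sharing the trunk is what keeps the cost low: adding a task adds one head, not another full network.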


Electronics ◽  
2019 ◽  
Vol 8 (11) ◽  
pp. 1370 ◽  
Author(s):  
Tingzhu Sun ◽  
Weidong Fang ◽  
Wei Chen ◽  
Yanxin Yao ◽  
Fangming Bi ◽  
...  

Although image inpainting based on generative adversarial networks (GANs) has made great breakthroughs in accuracy and speed in recent years, these methods can only process low-resolution images because of memory limitations and training difficulty. For high-resolution images, the inpainted regions become blurred and unpleasant boundaries become visible. Building on current advanced image generation networks, we propose a novel high-resolution image inpainting method based on a multi-scale neural network. The method is a two-stage network consisting of content reconstruction and texture detail restoration. After recovering a visually plausible but blurry texture, we further restore finer details to produce a smoother, clearer, and more coherent inpainting result. We then propose a specific application scenario for image inpainting: removing redundant pedestrians from an image while keeping the restored background realistic. This involves detecting pedestrians, identifying the redundant ones, and filling their regions with plausible content. To improve inpainting accuracy in this scenario, we propose a new mask dataset that uses the human figures in the COCO dataset as masks. Finally, we evaluate our method on the COCO and VOC datasets. The experimental results show that our method produces clearer and more coherent inpainting results, especially for high-resolution images, and that the proposed mask dataset yields better inpainting results in this application scenario.
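
The two-stage, coarse-to-fine pipeline can be outlined as follows: a first stage fills the hole with smooth content, the known pixels are composited back, and a second stage refines texture inside the hole. The sketch swaps in classical diffusion (harmonic) inpainting for the paper's content-reconstruction network, and `refine_fn` is a hypothetical placeholder for the texture-restoration network; neither is the authors' model.

```python
import numpy as np

def diffuse_fill(image, mask, iters=200):
    """Stage-1 stand-in: fill masked pixels by repeatedly replacing them with
    the mean of their 4-neighbours (diffusion inpainting). Smooth but blurry."""
    out = image.copy()
    out[mask] = image[~mask].mean()  # initial guess inside the hole
    for _ in range(iters):
        p = np.pad(out, 1, mode="edge")
        avg = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4
        out[mask] = avg[mask]
    return out

def composite(pred, image, mask):
    """Keep known pixels from the input; take only the hole from the prediction."""
    return np.where(mask, pred, image)

def two_stage_inpaint(image, mask, refine_fn):
    coarse = composite(diffuse_fill(image, mask), image, mask)  # content
    fine = refine_fn(coarse, mask)  # hypothetical texture-detail stage
    return composite(fine, image, mask)

# Usage: a 3x3 hole in a horizontal ramp; identity used as the refiner.
truth = np.tile(np.linspace(0.0, 1.0, 9), (9, 1))
mask = np.zeros((9, 9), dtype=bool)
mask[3:6, 3:6] = True
damaged = np.where(mask, 0.0, truth)
restored = two_stage_inpaint(damaged, mask, lambda img, m: img)
```

Compositing after each stage guarantees that pixels outside the mask are never altered, so any boundary artifacts are confined to the filled region.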


2019 ◽  
Vol 16 (9) ◽  
pp. 870-874 ◽  
Author(s):  
David Hörl ◽  
Fabio Rojas Rusak ◽  
Friedrich Preusser ◽  
Paul Tillberg ◽  
Nadine Randel ◽  
...  

2017 ◽  
Vol 9 (5) ◽  
pp. 500 ◽  
Author(s):  
Mi Zhang ◽  
Xiangyun Hu ◽  
Like Zhao ◽  
Ye Lv ◽  
Min Luo ◽  
...  

Sensors ◽  
2019 ◽  
Vol 19 (9) ◽  
pp. 1985
Author(s):  
Qi Wang ◽  
Meihan Wu ◽  
Fei Yu ◽  
Chen Feng ◽  
Kaige Li ◽  
...  

Real-time processing of high-resolution sonar images is of great significance for the autonomy and intelligence of autonomous underwater vehicles (AUVs) in complex marine environments. In this paper, we propose a real-time semantic segmentation network, termed RT-Seg, for side-scan sonar (SSS) images. The proposed architecture is based on a novel encoder-decoder structure in which the encoder blocks use depth-wise separable convolution and a two-way branch to improve performance, and a corresponding decoder network restores the details of the targets, followed by a pixel-wise classification layer. Moreover, we use a patch-wise strategy that splits the high-resolution image into local patches for network training. The trained model is used to test high-resolution SSS images produced by the sonar sensor on an onboard graphics processing unit (GPU). The experimental results show that RT-Seg greatly reduces the number of parameters and floating-point operations compared with other networks. It runs at 25.67 frames per second on an NVIDIA Jetson AGX Xavier with 500×500 inputs while achieving excellent segmentation results. Further insights into the speed-accuracy trade-off are discussed in this paper.
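
Depth-wise separable convolution, which the RT-Seg encoder relies on, factors a standard convolution into a per-channel k×k filter followed by a 1×1 channel-mixing convolution. The NumPy sketch below is a generic textbook version of that operation (stride 1, "same" zero padding, no biases), not RT-Seg's actual block, and the layer sizes in the parameter comparison are illustrative assumptions.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """x: (C_in, H, W); dw_kernels: (C_in, k, k) with odd k; pw_weights: (C_out, C_in).
    Stride 1, 'same' zero padding, cross-correlation as in DL frameworks."""
    c_in, h, w = x.shape
    k = dw_kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    dw = np.zeros((c_in, h, w))
    # Depthwise step: each channel is filtered by its own k x k kernel.
    for c in range(c_in):
        for i in range(k):
            for j in range(k):
                dw[c] += dw_kernels[c, i, j] * xp[c, i:i + h, j:j + w]
    # Pointwise step: a 1x1 convolution mixes the channels.
    return np.einsum("oc,chw->ohw", pw_weights, dw)

def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out

# A 3x3 layer mapping 64 -> 128 channels (bias terms ignored):
print(standard_conv_params(3, 64, 128), separable_conv_params(3, 64, 128))  # 73728 8768
```

For this illustrative layer the factorization needs roughly 8.4 times fewer weights, which is the kind of saving that makes real-time inference on an embedded GPU feasible.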


2014 ◽  
Vol 981 ◽  
pp. 352-355 ◽  
Author(s):  
Ji Zhou Wei ◽  
Shu Chun Yu ◽  
Wen Fei Dong ◽  
Chao Feng ◽  
Bing Xie

A stereo matching algorithm was proposed based on a pyramid algorithm and dynamic programming. High- and low-resolution images were computed by the pyramid algorithm; candidate control points were then selected on the low-resolution image, and the final control points were selected on the high-resolution image. Finally, the final control points were used to guide stereo matching based on dynamic programming. Because the candidate control points are selected on the low-resolution image, the computation time is greatly reduced. Experiments show that the proposed method achieves high matching precision.
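
A coarse-to-fine, dynamic-programming scanline matcher illustrates the general idea: disparities computed on a downsampled scanline restrict the search at full resolution. This is a generic sketch, not the authors' algorithm. As an assumption, every coarse pixel's disparity acts as a guide (rather than sparse control points), the pyramid uses 2-pixel averaging, and the cost is absolute intensity difference plus a smoothness penalty `lam * |d_x - d_{x-1}|`.

```python
import numpy as np

def downsample(row):
    """One pyramid level: halve a 1-D scanline by averaging pixel pairs."""
    n = len(row) // 2 * 2
    return row[:n].reshape(-1, 2).mean(axis=1)

def dp_scanline(left, right, max_disp, lam=0.1, allowed=None):
    """Minimise sum_x |left[x] - right[x - d_x]| + lam * |d_x - d_{x-1}|.
    `allowed`, if given, is an (n, max_disp + 1) mask of permitted disparities."""
    n = len(left)
    disps = np.arange(max_disp + 1)
    data = np.full((n, max_disp + 1), np.inf)
    for d in disps:
        data[d:, d] = np.abs(left[d:] - right[:n - d])
    if allowed is not None:
        data[~allowed] = np.inf
    trans = lam * np.abs(disps[:, None] - disps[None, :])  # [d_prev, d]
    cost = data[0].copy()
    back = np.zeros((n, max_disp + 1), dtype=int)
    for x in range(1, n):
        total = cost[:, None] + trans
        back[x] = np.argmin(total, axis=0)  # best previous disparity per d
        cost = total[back[x], disps] + data[x]
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmin(cost))
    for x in range(n - 1, 0, -1):  # backtrack the optimal path
        path[x - 1] = back[x, path[x]]
    return path

def coarse_to_fine(left, right, max_disp, lam=0.1):
    """Match at half resolution, then search only near 2 * coarse disparity."""
    coarse = dp_scanline(downsample(left), downsample(right), max_disp // 2, lam)
    n = len(left)
    guide = np.repeat(2 * coarse, 2)
    guide = np.pad(guide, (0, n - len(guide)), mode="edge")
    allowed = np.abs(np.arange(max_disp + 1)[None, :] - guide[:, None]) <= 1
    return dp_scanline(left, right, max_disp, lam, allowed)

# Usage: the right scanline is the left one shifted by 2 pixels.
left = np.arange(20.0)
right = np.arange(2.0, 22.0)
disparity = coarse_to_fine(left, right, max_disp=4)
```

The fine pass evaluates at most three disparities per pixel instead of the full range, which mirrors the time saving the abstract attributes to working on the low-resolution image first.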


2005 ◽  
Vol 11 (S02) ◽  
Author(s):  
G Fried ◽  
B Grosser ◽  
T Gaskins ◽  
A Klementiev

Author(s):  
Sergey Belyaev ◽  
Igor Popov ◽  
Vladislav Shubnikov ◽  
Pavel Popov ◽  
Ekaterina Boltenkova ◽  
...  
