Semantic segmentation of tea geometrid in natural scene images using discriminative pyramid network

Semantic segmentation is a fundamental task in remote sensing image analysis (RSIA). Fully convolutional networks (FCNs) have achieved state-of-the-art performance in the task of semantic segmentation of natural scene images. However, due to distinctive differences between natural scene images and remotely-sensed (RS) images, FCN-based semantic segmentation methods from the field of computer vision cannot achieve promising performances on RS images without modifications. In previous work, we proposed an RS image semantic segmentation framework SDFCNv1, combined with a majority voting postprocessing method. Nevertheless, it still has some drawbacks, such as small receptive field and large number of parameters. In this paper, we propose an improved semantic segmentation framework SDFCNv2 based on SDFCNv1, to conduct optimal semantic segmentation on RS images. We first construct a novel FCN model with hybrid basic convolutional (HBC) blocks and spatial-channel-fusion squeeze-and-excitation (SCFSE) modules, which occupies a larger receptive field and fewer network model parameters. We also put forward a data augmentation method based on spectral-specific stochastic-gamma-transform-based (SSSGT-based) during the model training process to improve generalizability of our model. Besides, we design a mask-weighted voting decision fusion postprocessing algorithm for image segmentation on overlarge RS images. We conducted several comparative experiments on two public datasets and a real surveying and mapping dataset. Extensive experimental results demonstrate that compared with the SDFCNv1 framework, our SDFCNv2 framework can increase the mIoU metric by up to 5.22% while only using about half of parameters.

Download Full-text

A NOVEL METHOD FOR EXTRACTING TEXT FROM NATURAL SCENE IMAGES AND TTS

European Science Review ◽

10.29013/esr-19-11.12.1-30-33 ◽

2019 ◽

pp. 30-33

Author(s):

U. R. Khamdamov ◽

M. N. Mukhiddinov ◽

A. O. Mukhamedaminov ◽

O. N. Djuraev

Keyword(s):

Natural Scene ◽

Novel Method ◽

Natural Scene Images

Download Full-text

Ensemble Visual Content based Search and Retrieval for Natural Scene Images

Recent Patents on Computer Science ◽

10.2174/2213275912666190327175712 ◽

2019 ◽

Vol 12 ◽

Author(s):

Pushpendra Singh ◽

P.N. Hrisheekesha ◽

Vinai Kumar Singh

Keyword(s):

Template Matching ◽

Feature Descriptor ◽

Natural Scene ◽

Visual Content ◽

Other Information ◽

Similar Images ◽

State Of Art ◽

Natural Scene Images ◽

Hybrid Methodology ◽

Search And Retrieval

Content based image retrieval (CBIR) is one of the field for information retrieval where similar images are retrieved from database based on the various image descriptive parameters. The image descriptor vector is used by machine learning based systems to store, learn and template matching. These feature descriptor vectors locally or globally demonstrate the visual content present in an image using texture, color, shape, and other information. In past, several algorithms were proposed to fetch the variety of contents from an image based on which the image is retrieved from database. But, the literature suggests that the precision and recall for the gained results using single content descriptor is not significant. The main vision of this paper is to categorize and evaluate those algorithms, which were proposed in the interval of last 10 years. In addition, experiment is performed using a hybrid content descriptors methodology that helps to gain the significant results as compared with state-of-art algorithms. The hybrid methodology decreases the error rate and improves the precision and recall for large natural scene images dataset having more than 20 classes.

Download Full-text

Convolutional Neural Network for the Semantic Segmentation of Remote Sensing Images

Mobile Networks and Applications ◽

10.1007/s11036-020-01703-3 ◽

2021 ◽

Vol 26 (1) ◽

pp. 200-215

Author(s):

Muhammad Alam ◽

Jian-Feng Wang ◽

Cong Guangpei ◽

LV Yunrong ◽

Yuanfang Chen

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Neural Networks ◽

Image Processing ◽

Deep Learning ◽

Semantic Segmentation ◽

Natural Scene ◽

Remote Sensing Images ◽

Advantages And Disadvantages ◽

Target Segmentation

AbstractIn recent years, the success of deep learning in natural scene image processing boosted its application in the analysis of remote sensing images. In this paper, we applied Convolutional Neural Networks (CNN) on the semantic segmentation of remote sensing images. We improve the Encoder- Decoder CNN structure SegNet with index pooling and U-net to make them suitable for multi-targets semantic segmentation of remote sensing images. The results show that these two models have their own advantages and disadvantages on the segmentation of different objects. In addition, we propose an integrated algorithm that integrates these two models. Experimental results show that the presented integrated algorithm can exploite the advantages of both the models for multi-target segmentation and achieve a better segmentation compared to these two models.

Download Full-text

Mining discriminative patches for script identification in natural scene images

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200260 ◽

2021 ◽

Vol 40 (1) ◽

pp. 551-563

Author(s):

Liqiong Lu ◽

Dong Wu ◽

Ziwei Tang ◽

Yaohua Yi ◽

Faliang Huang

Keyword(s):

Neural Networks ◽

Experimental Results ◽

The Other ◽

Natural Scene ◽

Fixed Size ◽

Script Identification ◽

Aspect Ratios ◽

Novel Approach ◽

Public Datasets ◽

Natural Scene Images

This paper focuses on script identification in natural scene images. Traditional CNNs (Convolution Neural Networks) cannot solve this problem perfectly for two reasons: one is the arbitrary aspect ratios of scene images which bring much difficulty to traditional CNNs with a fixed size image as the input. And the other is that some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combing Score CNN, Attention CNN and patches. Attention CNN is utilized to determine whether a patch is a discriminative patch and calculate the contribution weight of the discriminative patch to script identification of the whole image. Score CNN uses a discriminative patch as input and predict the score of each script type. Firstly patches with the same size are extracted from the scene images. Secondly these patches are used as inputs to Score CNN and Attention CNN to train two patch-level classifiers. Finally, the results of multiple discriminative patches extracted from the same image via the above two classifiers are fused to obtain the script type of this image. Using patches with the same size as inputs to CNN can avoid the problems caused by arbitrary aspect ratios of scene images. The trained classifiers can mine discriminative patches to accurately identify some confusing scripts. The experimental results show the good performance of our approach on four public datasets.

Download Full-text

Devanagari Text Detection From Natural Scene Images

International Journal of Computer Vision and Image Processing ◽

10.4018/ijcvip.2020070104 ◽

2020 ◽

Vol 10 (3) ◽

pp. 44-59

Author(s):

Sankirti Sandeep Shiravale ◽

R. Jayadevan ◽

Sanjeev S. Sannakki

Keyword(s):

Edge Detection ◽

Image Understanding ◽

Text Detection ◽

Experimental Results ◽

Combined Approach ◽

Natural Scene ◽

Light Conditions ◽

The Individual ◽

Natural Scene Images ◽

Better Than

Text present in a camera captured scene images is semantically rich and can be used for image understanding. Automatic detection, extraction, and recognition of text are crucial in image understanding applications. Text detection from natural scene images is a tedious task due to complex background, uneven light conditions, multi-coloured and multi-sized font. Two techniques, namely ‘edge detection' and ‘colour-based clustering', are combined in this paper to detect text in scene images. Region properties are used for elimination of falsely generated annotations. A dataset of 1250 images is created and used for experimentation. Experimental results show that the combined approach performs better than the individual approaches.

Download Full-text