Handwritten Annotation Spotting in Printed Documents Using Top-Down Visual Saliency Models

Author(s):  
Shilpa Pandey ◽  
Gaurav Harit

In this article, we address the problem of localizing text and symbolic annotations on the scanned image of a printed document. Previous approaches have considered the task of annotation extraction as binary classification into printed and handwritten text. In this work, we further subcategorize the annotations as underlines, encirclements, inline text, and marginal text. We have collected a new dataset of 300 documents containing all classes of annotations marked around or in between printed text. Using the dataset as a benchmark, we report the results of two saliency formulations, CRF Saliency and Discriminant Saliency, for predicting salient patches that can correspond to different types of annotations. We also compare our work with recent semantic segmentation techniques using deep models. Our analysis shows that Discriminant Saliency can be considered the preferred approach for fast localization of patches containing different types of annotations. The saliency models were learned on a small dataset but still give performance comparable to deep networks for pixel-level semantic segmentation. We show that saliency-based methods give better outcomes with limited annotated data compared to more sophisticated segmentation techniques that require a large training set to learn the model.
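
As a rough illustration of how a saliency map can drive patch-level annotation localization, the sketch below divides a page into a grid of patches and flags those whose mean saliency exceeds a threshold. The patch size, threshold, and the saliency map itself are illustrative assumptions, not the paper's CRF Saliency or Discriminant Saliency formulations.

```python
# Minimal sketch: patch-level localization from a per-pixel saliency map.
# Patch size, threshold and the toy saliency map are illustrative assumptions.
import numpy as np

def salient_patches(saliency_map, patch=64, thresh=0.5):
    """Return (row, col) grid indices of patches whose mean saliency exceeds thresh."""
    h, w = saliency_map.shape
    hits = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            if saliency_map[y:y + patch, x:x + patch].mean() > thresh:
                hits.append((y // patch, x // patch))
    return hits

# Toy saliency map for a scanned page, values in [0, 1].
rng = np.random.default_rng(0)
page_saliency = rng.random((512, 384)).astype(np.float32)
print(salient_patches(page_saliency, patch=64, thresh=0.55)[:5])
```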

2019 ◽  
Vol 9 (24) ◽  
pp. 5378 ◽  
Author(s):  
Maria Wahid ◽  
Asim Waris ◽  
Syed Omer Gilani ◽  
Ramanathan Subramanian

Saliency is the quality of an object that makes it stand out from neighbouring items and grabs the viewer's attention. In image processing, it refers to the pixel or group of pixels that stand out in an image or a video clip and capture the attention of the viewer. Our eye movements are usually guided by saliency while inspecting a scene. Rapid detection of emotive stimuli is an ability possessed by humans, and visual objects in a scene can also be emotionally salient. As different images and clips can elicit different emotional responses in a viewer, such as happiness or sadness, there is a need to measure these emotions along with visual saliency. This study was conducted to determine whether existing visual saliency models can also measure emotional saliency. A classical Graph-Based Visual Saliency (GBVS) model is used in the study. Results show that sad videos contain fewer salient features, with a significant difference between happy and sad videos at the 0.05 level and a large difference in mean saliency (76.57 versus 57.0), making these videos less emotionally salient. However, overall visual content does not capture emotional salience: the applied Graph-Based Visual Saliency model identified happy emotions well but could not analyze sad emotions.
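
A minimal sketch of the comparison idea, using OpenCV's spectral-residual saliency (from opencv-contrib-python) as a stand-in for GBVS; the clip file names, frame sampling step, and emotion labels are illustrative assumptions.

```python
# Sketch: compare mean frame saliency between a "happy" and a "sad" clip,
# using OpenCV's spectral-residual saliency as a stand-in for GBVS.
import cv2
import numpy as np

saliency = cv2.saliency.StaticSaliencySpectralResidual_create()

def mean_clip_saliency(video_path, frame_step=10):
    """Average saliency over every frame_step-th frame of one clip."""
    cap = cv2.VideoCapture(video_path)
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            ok, sal_map = saliency.computeSaliency(frame)
            if ok:
                scores.append(float(np.mean(sal_map)))
        idx += 1
    cap.release()
    return float(np.mean(scores)) if scores else float("nan")

happy = mean_clip_saliency("happy_clip.mp4")   # hypothetical file names
sad = mean_clip_saliency("sad_clip.mp4")
print(f"mean saliency  happy: {happy:.3f}  sad: {sad:.3f}")
```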


Author(s):  
Y. Cao ◽  
M. Previtali ◽  
M. Scaioni

Abstract. In the wake of the success of Deep Learning Networks (DLN) for image recognition, object detection, shape classification and semantic segmentation, this approach has proven to be both a major breakthrough and an excellent tool in point cloud classification. However, an understanding of how different types of DLN achieve their results is still lacking. In several studies the output of the segmentation/classification process is compared against benchmarks, but the network is treated as a “black box” and intermediate steps are not deeply analysed. Specifically, the following questions are discussed here: (1) what exactly did the DLN learn from a point cloud? (2) On the basis of what information do DLN make decisions? To conduct a quantitative investigation of DLN applied to point clouds, this paper investigates the visual interpretability of the decision-making process. Firstly, we introduce a reconstruction network able to reconstruct and visualise the learned features, in order to address question (1). Then, we propose 3DCAM to indicate the discriminative point cloud regions used by these networks to identify a given category, thus addressing question (2). By answering these two questions, the paper offers some initial insights towards a better understanding of the application of DLN to point clouds.
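
As a rough illustration of a CAM-style score for point clouds, in the spirit of the paper's 3DCAM, the sketch below projects per-point features from a toy PointNet-like encoder onto the classifier weights of the predicted class, giving one "discriminativeness" score per point. The toy encoder and its sizes are assumptions, not the authors' network.

```python
# Sketch: class-activation-style scoring of individual points.
import torch
import torch.nn as nn

class ToyPointNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 128), nn.ReLU())
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, pts):                      # pts: (B, N, 3)
        feats = self.point_mlp(pts)              # per-point features (B, N, 128)
        global_feat = feats.max(dim=1).values    # global max pooling (B, 128)
        return self.classifier(global_feat), feats

model = ToyPointNet()
pts = torch.randn(1, 1024, 3)                    # one cloud with 1024 points
logits, point_feats = model(pts)
cls = logits.argmax(dim=1)                       # predicted class

# CAM: project each point's feature onto the predicted class's weight vector.
w = model.classifier.weight[cls]                 # (1, 128)
cam = (point_feats * w.unsqueeze(1)).sum(-1)     # (1, 1024): one score per point
print(cam.shape)
```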


Author(s):  
Jingtan Li ◽  
Maolin Xu ◽  
Hongling Xiu

As the resolution of remote sensing images keeps increasing, high-resolution remote sensing images are widely used in many areas. Among their uses, image information extraction is one of the basic applications of remote sensing images. Faced with massive high-resolution remote sensing image data, traditional target-recognition methods are difficult to apply. Therefore, this paper proposes a remote sensing image extraction method based on the U-net network. First, the U-net semantic segmentation network is trained on the training set while the validation set is used to monitor the training, and finally the test set is used for testing. The experimental results show that U-net can be applied to the extraction of buildings.
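
A minimal sketch of a small U-Net for binary building extraction, assuming 3-channel image tiles and 1-channel building masks; the depths, channel counts, and tile size are illustrative, not necessarily those used in the paper.

```python
# Sketch: a two-level U-Net with skip connections for building/background masks.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.head = nn.Conv2d(32, 1, 1)   # one logit per pixel: building vs. background

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

model = SmallUNet()
logits = model(torch.randn(1, 3, 256, 256))   # one 256x256 RGB tile
print(logits.shape)                           # torch.Size([1, 1, 256, 256])
```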


2020 ◽  
Vol 10 (18) ◽  
pp. 6386
Author(s):  
Xing Bai ◽  
Jun Zhou

Benefiting from the boom in deep learning, state-of-the-art models have achieved great progress, but they are huge in terms of parameters and floating-point operations, which makes them hard to apply to real-time applications. In this paper, we propose a novel deep neural network architecture, named MPDNet, for fast and efficient semantic segmentation under resource constraints. First, we use a light-weight classification model pretrained on ImageNet as the encoder. Second, we use a cost-effective upsampling datapath to restore prediction resolution and convert features for classification into features for segmentation. Finally, we propose to use a multi-path decoder to extract different types of features, which are not ideally processed inside only one convolutional neural network. Our model outperforms other models aimed at real-time semantic segmentation on Cityscapes. The proposed MPDNet achieves 76.7% mean IoU on the Cityscapes test set with only 118.84 GFLOPs and runs at 37.6 Hz on 768 × 1536 images on a standard GPU.
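
A minimal sketch of the "light-weight encoder plus multi-path decoder" idea: an ImageNet-pretrained MobileNetV2 backbone as encoder and two parallel decoder paths fused before the per-pixel classifier. The path designs and channel sizes are illustrative assumptions, not the MPDNet architecture.

```python
# Sketch: light-weight encoder, two parallel decoder paths, fused segmentation head.
import torch
import torch.nn as nn
from torchvision import models

class TwoPathSegNet(nn.Module):
    def __init__(self, num_classes=19):          # 19 classes as in Cityscapes
        super().__init__()
        backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
        self.encoder = backbone.features          # downsamples by 32, 1280 channels
        self.path_a = nn.Sequential(nn.Conv2d(1280, 128, 1), nn.ReLU(inplace=True))
        self.path_b = nn.Sequential(nn.Conv2d(1280, 128, 3, padding=2, dilation=2),
                                    nn.ReLU(inplace=True))
        self.head = nn.Conv2d(256, num_classes, 1)

    def forward(self, x):
        f = self.encoder(x)
        fused = torch.cat([self.path_a(f), self.path_b(f)], dim=1)   # fuse both paths
        logits = self.head(fused)
        return nn.functional.interpolate(logits, size=x.shape[2:],
                                         mode="bilinear", align_corners=False)

model = TwoPathSegNet().eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 256, 512))
print(out.shape)   # torch.Size([1, 19, 256, 512])
```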


2019 ◽  
Vol 10 (1) ◽  
pp. 13 ◽  
Author(s):  
Shichao Zhang ◽  
Zhe Zhang ◽  
Libo Sun ◽  
Wenhu Qin

Generally, most approaches obtain more data to train models by using methods such as cropping, rotating, and flipping, in order to improve the accuracy of detection and segmentation. However, because such data, especially semantic segmentation data, are difficult to label, these traditional data augmentation methodologies do not help much when the training set is really limited. In this paper, a model named OFA-Net (One For All Network) is proposed to combine object detection and semantic segmentation tasks. A strategy called “1-N Alternation” is used to train the OFA-Net model, which fuses features from detection and segmentation data. The results show that object detection data can be recruited to improve segmentation accuracy, and, furthermore, segmentation data help considerably to enhance the confidence of predictions for object detection. Finally, the OFA-Net model is trained without traditional data augmentation methodologies and tested on the KITTI test server. The model works well on the KITTI Road Segmentation challenge and performs well on the object detection task.
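
A minimal sketch of a "1-N alternation"-style training loop: one detection step followed by N segmentation steps on a shared backbone. The toy heads, losses, and value of N are assumptions, not the OFA-Net design.

```python
# Sketch: alternate 1 detection step with N segmentation steps on a shared backbone.
import itertools
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
det_head = nn.Conv2d(16, 4, 1)    # toy "detection" head (e.g. a box-regression map)
seg_head = nn.Conv2d(16, 2, 1)    # toy segmentation head (2 classes)
opt = torch.optim.Adam(itertools.chain(backbone.parameters(),
                                        det_head.parameters(),
                                        seg_head.parameters()), lr=1e-3)

def det_batch():   # stand-ins for real data loaders
    return torch.randn(2, 3, 64, 64), torch.randn(2, 4, 64, 64)

def seg_batch():
    return torch.randn(2, 3, 64, 64), torch.randint(0, 2, (2, 64, 64))

N = 3
for step in range(10):
    if step % (N + 1) == 0:                # one detection step ...
        x, t = det_batch()
        loss = nn.functional.mse_loss(det_head(backbone(x)), t)
    else:                                   # ... then N segmentation steps
        x, t = seg_batch()
        loss = nn.functional.cross_entropy(seg_head(backbone(x)), t)
    opt.zero_grad()
    loss.backward()
    opt.step()
```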


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5318
Author(s):  
Dongnian Li ◽  
Changming Li ◽  
Chengjun Chen ◽  
Zhengxu Zhao

Locating and identifying the components mounted on a printed circuit board (PCB) based on machine vision is an important and challenging problem for automated PCB inspection and automated PCB recycling. In this paper, we propose a PCB semantic segmentation method based on depth images that segments and recognizes components on the PCB through pixel classification. The image training set for the PCB was automatically synthesized with graphic rendering. Based on a series of concentric circles centered at the given depth pixel, we extracted depth difference features from the depth images in the training set to train a random forest pixel classifier. The constructed random forest pixel classifier was then used to perform semantic segmentation of the PCB, segmenting and recognizing its components through pixel classification. Experiments on both synthetic and real test sets were conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that our method can segment and recognize most of the components from a real depth image of the PCB. Our method is immune to illumination changes and can be implemented in parallel on a GPU.
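
A minimal sketch of the depth-difference feature idea: for each labelled pixel, depth values are sampled at offsets arranged on concentric circles around it, and the differences to the centre depth become the features for a random forest. The radii, sample counts, and synthetic data are illustrative assumptions.

```python
# Sketch: concentric-circle depth-difference features + random forest pixel classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def circle_offsets(radii=(2, 4, 8), points_per_circle=8):
    """(dy, dx) offsets on concentric circles around the centre pixel."""
    offs = []
    for r in radii:
        for k in range(points_per_circle):
            a = 2 * np.pi * k / points_per_circle
            offs.append((int(round(r * np.sin(a))), int(round(r * np.cos(a)))))
    return offs

OFFSETS = circle_offsets()

def pixel_features(depth, y, x):
    """Depth differences between the centre pixel and its circle samples."""
    h, w = depth.shape
    c = depth[y, x]
    return [c - depth[min(max(y + dy, 0), h - 1), min(max(x + dx, 0), w - 1)]
            for dy, dx in OFFSETS]

# Toy synthetic depth image and per-pixel labels (0 = board, 1 = component).
rng = np.random.default_rng(0)
depth = rng.random((128, 128)).astype(np.float32)
labels = (depth > 0.5).astype(int)

ys, xs = rng.integers(0, 128, 2000), rng.integers(0, 128, 2000)
X = np.array([pixel_features(depth, y, x) for y, x in zip(ys, xs)])
y = labels[ys, xs]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```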


Information ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 257 ◽  
Author(s):  
Bashir Ghariba ◽  
Mohamed S. Shehata ◽  
Peter McGuire

Human eye movement is one of the most important functions for understanding our surroundings. When a human eye processes a scene, it quickly focuses on the dominant parts of the scene, a behaviour commonly known as visual saliency detection or visual attention prediction. Recently, neural networks have been used to predict visual saliency. This paper proposes a deep learning encoder-decoder architecture, based on a transfer learning technique, to predict visual saliency. In the proposed model, visual features are extracted through convolutional layers from raw images to predict visual saliency. In addition, the proposed model uses the VGG-16 network for semantic segmentation, with a pixel classification layer that predicts the categorical label for every pixel in an input image. The proposed model is applied to several datasets, including TORONTO, MIT300, MIT1003, and DUT-OMRON, to illustrate its efficiency. The results of the proposed model are quantitatively and qualitatively compared to classic and state-of-the-art deep learning models. Using the proposed deep learning model, a global accuracy of up to 96.22% is achieved for the prediction of visual saliency.
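
A minimal sketch of a transfer-learning encoder-decoder for saliency prediction: an ImageNet-pretrained VGG-16 backbone as the encoder and a small upsampling decoder producing one saliency value per pixel. The decoder layout is an illustrative assumption, not the paper's exact architecture.

```python
# Sketch: pretrained VGG-16 encoder + small decoder producing a dense saliency map.
import torch
import torch.nn as nn
from torchvision import models

class SaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.encoder = vgg.features              # downsamples by 32, 512 channels
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(512, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1), nn.Sigmoid(),    # saliency map in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SaliencyNet().eval()
with torch.no_grad():
    sal = model(torch.randn(1, 3, 224, 224))
print(sal.shape)   # torch.Size([1, 1, 224, 224])
```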

