Weighted combination of per-frame recognition results for text recognition in a video stream

2021, Vol. 45 (1), pp. 77-89
Author(s): O. Petrova, K. Bulatov, V.V. Arlazarov, V.L. Arlazarov

The scope of applications of automated document recognition has extended and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as an input, thus obtaining several images of the recognized object captured with various characteristics. In this case, a problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that weighted combination can improve the quality of the text recognition result in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
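The per-character weighted combination this abstract describes can be sketched as follows. This is an illustrative assumption of how such a scheme might look, not the authors' implementation: per-frame recognition results are represented as per-character class-probability matrices, and a focus estimate serves as the frame weight.

```python
import numpy as np

def combine_recognitions(per_frame_probs, focus_scores):
    """Weighted per-character combination (illustrative sketch).

    per_frame_probs: list of arrays, each (num_chars, alphabet_size) --
        per-character class probabilities produced for one frame.
    focus_scores: one non-negative weight per frame, e.g. a focus estimate.
    Returns the index of the winning class for each character position.
    """
    w = np.asarray(focus_scores, dtype=float)
    w = w / w.sum()                       # normalize weights over frames
    stacked = np.stack(per_frame_probs)   # (num_frames, num_chars, alphabet)
    combined = np.tensordot(w, stacked, axes=1)  # weighted average per char
    return combined.argmax(axis=1)
```

With two frames disagreeing on a character, the frame with the higher focus score dominates the combined decision.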

2021, Vol. 45 (1), pp. 101-109
Author(s): M.A. Aliev, I.A. Kunina, A.V. Kazbekov, V.L. Arlazarov

During document recognition in a video stream using a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes the recognition system is required not only to recognize all the specified attributes of the document, but also to select the final document image of the best quality. This is necessary, for example, for archiving or providing various services; in some countries it can be required by law. In this case, the recognition system needs to assess the quality of frames in the video stream and choose the “best” frame. In this paper we consider a solution to this problem, where the “best” frame is one in which all specified attributes are present in a readable form in the document image. The method was tuned on a private dataset and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.
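The frame-selection criterion described here — every specified attribute must be readable, and the best frame is chosen among those that qualify — can be sketched as a small selection rule. The score representation, function name, and the fixed readability threshold are assumptions for illustration, not the paper's method.

```python
def select_best_frame(frame_scores, threshold=0.5):
    """Pick the 'best' frame: every attribute must be readable (score at
    or above a threshold); among qualifying frames, take the one whose
    worst-scoring attribute is highest.

    frame_scores: list of dicts {attribute_name: readability_score}.
    Returns the index of the chosen frame, or None if no frame qualifies.
    """
    best_idx, best_min = None, threshold
    for i, scores in enumerate(frame_scores):
        worst = min(scores.values())       # readability of weakest attribute
        if worst >= best_min:
            best_idx, best_min = i, worst
    return best_idx
```

Maximizing the worst attribute score reflects the requirement that all attributes, not just most, be readable in the selected frame.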


2020, Vol. 2020 (9), pp. 323-1-323-8
Author(s): Litao Hu, Zhenhua Hu, Peter Bauer, Todd J. Harris, Jan P. Allebach

Image quality assessment has been a very active research area in the field of image processing, and numerous methods have been proposed. However, most existing methods focus on digital images that only or mainly contain pictures or photos taken by digital cameras. Traditional approaches evaluate an input image as a whole and try to estimate a quality score for it, in order to give viewers an idea of how “good” the image looks. In this paper, we focus on the quality evaluation of symbolic content such as text, barcodes, QR codes, lines, and handwriting in target images. Estimating a quality score for this kind of information can be based on whether or not it is readable by a human, or recognizable by a decoder. Moreover, we specifically study the viewing quality of the scanned document of a printed image. For this purpose, we propose a novel image quality assessment algorithm that is able to determine the readability of a scanned document or of regions in a scanned document. Experimental results on a set of test images demonstrate the effectiveness of our method.
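A region-level readability score of the kind discussed here could, in the simplest case, be driven by local gradient energy: sharp, high-contrast text produces strong gradients, while washed-out print does not. The sketch below is a crude proxy under that assumption, not the algorithm the paper proposes.

```python
import numpy as np

def region_readability(gray, block=32):
    """Score each block of a grayscale scan by its mean gradient
    magnitude -- a crude readability proxy for text regions.

    gray: 2-D array of pixel intensities in [0, 255].
    Returns a 2-D array of per-block scores.
    """
    g = gray.astype(float) / 255.0
    gy, gx = np.gradient(g)                       # finite differences
    energy = np.hypot(gx, gy)                     # gradient magnitude
    h, w = energy.shape
    h, w = h - h % block, w - w % block           # crop to whole blocks
    tiles = energy[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))                # mean gradient per block
```

A real readability estimator would go further (e.g. decode success for barcodes, OCR confidence for text), but the per-block structure shown is the part that lets a single scan receive different verdicts in different regions.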


2019, Vol. 9 (2), pp. 236
Author(s): Saad Ahmed, Saeeda Naz, Muhammad Razzak, Rubiyah Yusof

This paper presents a comprehensive survey of Arabic cursive scene text recognition. Publications in this field in recent years have witnessed a shift of interest among document image analysis researchers from the recognition of optical characters to the recognition of characters appearing in natural images. Scene text recognition is a challenging problem because the text varies in font style, size, alignment, orientation, reflection, illumination, blurriness and background complexity. Among cursive scripts, Arabic scene text recognition is considered an even more challenging problem due to joined writing, variant forms of the same character, a large number of ligatures, multiple baselines, etc. Surveys of Latin and Chinese script-based scene text recognition systems can be found, but the scene text recognition problem for Arabic-like scripts is yet to be addressed in detail. In this manuscript, a description is provided to highlight some of the latest techniques presented for text classification. The presented techniques, which follow deep learning architectures, are equally suitable for the development of Arabic cursive scene text recognition systems. The issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide researchers with insight into cursive scene text.


2017, Vol. 865, pp. 547-553
Author(s): Ji Hun Park

This paper presents a new computation method for human joint angles. A human structure is modelled as articulated rigid-body kinematics in a single video stream. Every input image contains a rotating articulated segment at a different 3D angle. Angle computation for a human joint is achieved in several steps. First, we compute the internal and external parameters of the camera from feature points of the fixed environment using nonlinear programming. We set one image as the reference frame for 3D scene analysis of the rotating articulated segment. Then we compute the angle of rotation and the center of rotation of the segment for each input frame, using corresponding feature points together with the computed camera parameters, again via nonlinear programming. With the computed angles and center of rotation, we can perform volumetric reconstruction of an articulated human body in 3D. The basic idea of the volumetric reconstruction is to reconstruct each articulated body segment separately. 3D volume reconstruction of a rotating segment is done by modifying the world-to-camera transformation to compensate for the segment's rotation, as if the segment had not rotated. Our experimental results for a single rotating segment show that the method works well.


2022, pp. 811-822
Author(s): B.V. Dhandra, Satishkumar Mallappa, Gururaj Mukarambi

In this article, an exhaustive experiment is carried out to test the performance of the Segmentation-based Fractal Texture Analysis (SFTA) features with nt = 4 pairs and nt = 8 pairs, geometric features, and their combinations. A unified algorithm is designed to identify the scripts of a camera-captured bilingual document image containing the international language English together with one of the Hindi, Kannada, Telugu, Malayalam, Bengali, Oriya, Punjabi, and Urdu scripts. The SFTA algorithm decomposes the input image into a set of binary images, from which the fractal dimension of the resulting regions is computed in order to describe the segmented texture patterns. This motivates the use of the SFTA features as texture features to identify the scripts of the camera-based document image, which is affected by non-homogeneous illumination (resolution). An experiment is carried out on eleven scripts, each with 1000 sample images at block sizes of 128 × 128, 256 × 256, 512 × 512 and 1024 × 1024. It is observed that the block size 512 × 512 gives the maximum accuracy of 86.45% for the Gujarati and English script combination and is the optimal size. The novelty of this article is that a unified algorithm is developed for the script identification of bilingual document images.
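The descriptor at the heart of SFTA is the fractal dimension of each thresholded binary image, typically estimated by box counting. The sketch below shows that estimate in its minimal form (a log–log regression of box counts against box size); it is an assumption-laden illustration of the measure, not the authors' implementation.

```python
import numpy as np

def box_counting_dimension(binary):
    """Box-counting estimate of fractal dimension for a binary image --
    the per-region descriptor that SFTA computes on each of its
    thresholded binary decompositions.

    binary: square 2-D boolean array whose side is a power of two.
    """
    n = binary.shape[0]
    sizes, counts = [], []
    s = n
    while s >= 1:
        # Count boxes of side s containing at least one foreground pixel.
        tiles = binary.reshape(n // s, s, n // s, s)
        occupied = int(tiles.any(axis=(1, 3)).sum())
        if occupied:
            sizes.append(s)
            counts.append(occupied)
        s //= 2
    # Slope of log(count) vs log(1/size) estimates the dimension.
    coeffs = np.polyfit(np.log(1.0 / np.asarray(sizes)),
                        np.log(np.asarray(counts)), 1)
    return coeffs[0]
```

A filled region yields a dimension near 2, an isolated point near 0, and text-like strokes fall in between — which is what makes the measure discriminative across scripts.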


1997, Vol. 29 (5), pp. 397-414
Author(s): Anders Tehler, José M. Egea

The genus Lecanactis, with 24 species, has been phylogenetically analysed using cladistic parsimony methods and support tests. Morphological, anatomical and chemical data were used, comprising 38 characters. Twelve equally most parsimonious trees were obtained. The successive approximations character weighting method gave one most parsimonious tree. The ingroup, Lecanactis, is supported as monophyletic. Although parsimony jackknifing and Bremer support indicate that the trees are poorly supported, some groups are wholly or partly distinguished in the strict consensus tree, the successive weighting tree and the Jac tree.


2019, Vol. 43 (5), pp. 818-824
Author(s): V.V. Arlazarov, K. Bulatov, T. Chernov, V.L. Arlazarov

A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets which are useful for associated subtasks, but in order to facilitate a more comprehensive scientific and technical approach to identity document recognition, more specialized datasets are required. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips for 50 different identity document types with ground truth, which allows research to be performed on a wide scope of document analysis problems. The main goal of this paper is to present the dataset; in addition, and as a baseline, we present evaluation results for existing methods of face detection, text line recognition, and document field data extraction using the presented dataset. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses.


2020, Vol. 17 (9), pp. 4398-4403
Author(s): H. C. Vinod, S. K. Niranjan

De-warping is an elementary step in the analysis of camera-based document images. Processing of warped images is a challenging task; therefore, to make document images OCR-readable, de-warping has become a major task. In this paper, we present effective pre-processing, de-warping and de-skewing techniques for camera-captured document image processing. In pre-processing, we split the input color document image into R, G and B bands, convert the R, G and B bands to the C, M and Y color spaces respectively, and convert the averaged CMY grey-scale image to binary by determining the threshold value automatically. In de-warping and de-skewing, we present an effective technique to remove geometric and perspective distortion in camera-captured documents by combining the centroid and the mid-point of the bounding-box height, where bounding boxes are fitted to text blocks using connected-component analysis. The introduced work is robust in correcting geometric distortion and skew for a standard printed dataset as well as for Kannada handwritten documents.
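The pre-processing pipeline described above — split into R, G, B bands, complement into C, M, Y, average to grey, then binarize with an automatically determined threshold — can be sketched as below. The abstract does not say which automatic thresholding method is used; Otsu's method is assumed here as a standard choice, and the function name is illustrative.

```python
import numpy as np

def binarize_document(rgb):
    """Pre-processing sketch: RGB -> complementary CMY -> averaged grey
    -> binary via an automatically chosen (Otsu) threshold.

    rgb: (H, W, 3) uint8 image. Returns a boolean ink mask
    (True where the original pixel was dark).
    """
    cmy = 255 - rgb.astype(np.int32)          # C=255-R, M=255-G, Y=255-B
    grey = cmy.mean(axis=2)                   # average CMY grey-scale image
    # Otsu's method: pick the threshold maximizing between-class variance.
    hist, _ = np.histogram(grey, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability
    mu = np.cumsum(p * np.arange(256))        # class-0 mean times omega
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    t = int(np.nanargmax(sigma_b))
    return grey > t                           # ink is bright in CMY
```

Note that in the complemented CMY space dark ink maps to high grey values, so the foreground test is `grey > t` rather than the usual `grey < t` used on an RGB-derived grey image.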

