Weighted combination of per-frame recognition results for text recognition in a video stream

2021, Vol. 45 (1), pp. 77-89
Author(s): O. Petrova, K. Bulatov, V.V. Arlazarov, V.L. Arlazarov

The scope of applications of automated document recognition has extended and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as an input, thus obtaining several images of the recognized object captured with various characteristics. In this case, a problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that weighted combination can improve the quality of the text recognition result in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
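The per-character weighted combination this abstract describes can be sketched as follows. This is an illustrative assumption of how such a scheme might look, not the authors' implementation: per-frame recognition results are represented as per-character class-probability matrices, and a focus estimate serves as the frame weight.

```python
import numpy as np

def combine_recognitions(per_frame_probs, focus_scores):
    """Weighted per-character combination (illustrative sketch).

    per_frame_probs: list of arrays, each (num_chars, alphabet_size) --
        per-character class probabilities produced for one frame.
    focus_scores: one non-negative weight per frame, e.g. a focus estimate.
    Returns the index of the winning class for each character position.
    """
    w = np.asarray(focus_scores, dtype=float)
    w = w / w.sum()                       # normalize weights over frames
    stacked = np.stack(per_frame_probs)   # (num_frames, num_chars, alphabet)
    combined = np.tensordot(w, stacked, axes=1)  # weighted average per char
    return combined.argmax(axis=1)
```

With two frames disagreeing on a character, the frame with the higher focus score dominates the combined decision.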

2021, Vol. 45 (1), pp. 101-109
Author(s): M.A. Aliev, I.A. Kunina, A.V. Kazbekov, V.L. Arlazarov

During document recognition in a video stream using a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes the recognition system is required not only to recognize all the specified attributes of the document, but also to select the final document image of the best quality. This is necessary, for example, for archiving or providing various services; in some countries it can be required by law. In this case, the recognition system needs to assess the quality of frames in the video stream and choose the “best” frame. In this paper we consider a solution to this problem, where the “best” frame is one in which all specified attributes are present in a readable form in the document image. The method was tuned on a private dataset and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.
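The frame-selection criterion described here — every specified attribute must be readable, and the best frame is chosen among those that qualify — can be sketched as a small selection rule. The score representation, function name, and the fixed readability threshold are assumptions for illustration, not the paper's method.

```python
def select_best_frame(frame_scores, threshold=0.5):
    """Pick the 'best' frame: every attribute must be readable (score at
    or above a threshold); among qualifying frames, take the one whose
    worst-scoring attribute is highest.

    frame_scores: list of dicts {attribute_name: readability_score}.
    Returns the index of the chosen frame, or None if no frame qualifies.
    """
    best_idx, best_min = None, threshold
    for i, scores in enumerate(frame_scores):
        worst = min(scores.values())       # readability of weakest attribute
        if worst >= best_min:
            best_idx, best_min = i, worst
    return best_idx
```

Maximizing the worst attribute score reflects the requirement that all attributes, not just most, be readable in the selected frame.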


2020, Vol. 2020 (9), pp. 323-1-323-8
Author(s): Litao Hu, Zhenhua Hu, Peter Bauer, Todd J. Harris, Jan P. Allebach

Image quality assessment has been a very active research area in the field of image processing, and numerous methods have been proposed. However, most existing methods focus on digital images that only or mainly contain pictures or photos taken by digital cameras. Traditional approaches evaluate an input image as a whole and try to estimate a quality score for it, in order to give viewers an idea of how “good” the image looks. In this paper, we focus on the quality evaluation of symbolic content such as text, barcodes, QR codes, lines, and handwriting in target images. Estimating a quality score for this kind of information can be based on whether or not it is readable by a human, or recognizable by a decoder. Moreover, we specifically study the viewing quality of the scanned document of a printed image. For this purpose, we propose a novel image quality assessment algorithm that is able to determine the readability of a scanned document or of regions in a scanned document. Experimental results on a set of test images demonstrate the effectiveness of our method.
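A region-level readability score of the kind discussed here could, in the simplest case, be driven by local gradient energy: sharp, high-contrast text produces strong gradients, while washed-out print does not. The sketch below is a crude proxy under that assumption, not the algorithm the paper proposes.

```python
import numpy as np

def region_readability(gray, block=32):
    """Score each block of a grayscale scan by its mean gradient
    magnitude -- a crude readability proxy for text regions.

    gray: 2-D array of pixel intensities in [0, 255].
    Returns a 2-D array of per-block scores.
    """
    g = gray.astype(float) / 255.0
    gy, gx = np.gradient(g)                       # finite differences
    energy = np.hypot(gx, gy)                     # gradient magnitude
    h, w = energy.shape
    h, w = h - h % block, w - w % block           # crop to whole blocks
    tiles = energy[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))                # mean gradient per block
```

A real readability estimator would go further (e.g. decode success for barcodes, OCR confidence for text), but the per-block structure shown is the part that lets a single scan receive different verdicts in different regions.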


2019, Vol. 9 (2), pp. 236
Author(s): Saad Ahmed, Saeeda Naz, Muhammad Razzak, Rubiyah Yusof

This paper presents a comprehensive survey of Arabic cursive scene text recognition. Publications in this field in recent years have witnessed a shift of interest among document image analysis researchers from the recognition of optical characters to the recognition of characters appearing in natural images. Scene text recognition is a challenging problem because the text varies in font style, size, alignment, orientation, reflection, illumination, blurriness and background complexity. Among cursive scripts, Arabic scene text recognition is considered an even more challenging problem due to joined writing, variant forms of the same character, a large number of ligatures, multiple baselines, etc. Surveys of Latin and Chinese script-based scene text recognition systems can be found, but the scene text recognition problem for Arabic-like scripts is yet to be addressed in detail. In this manuscript, a description is provided to highlight some of the latest techniques presented for text classification. The presented techniques, which follow deep learning architectures, are equally suitable for the development of Arabic cursive scene text recognition systems. The issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide researchers with insight into cursive scene text.


2017, Vol. 865, pp. 547-553
Author(s): Ji Hun Park

This paper presents a new computation method for human joint angles. A human structure is modelled as articulated rigid-body kinematics in a single video stream. Every input image contains a rotating articulated segment at a different 3D angle. Angle computation for a human joint is achieved in several steps. First, we compute the internal and external parameters of the camera from feature points of the fixed environment using nonlinear programming. We set one image as the reference frame for 3D scene analysis of the rotating articulated segment. Then we compute the angle of rotation and the center of rotation of the segment for each input frame, using corresponding feature points together with the computed camera parameters, again via nonlinear programming. With the computed angles and center of rotation, we can perform volumetric reconstruction of an articulated human body in 3D. The basic idea of the volumetric reconstruction is to reconstruct each articulated body segment separately. 3D volume reconstruction of a rotating segment is done by modifying the world-to-camera transformation to compensate for the segment's rotation, as if the segment had not rotated. Our experimental results for a single rotating segment show that the method works well.


2022, pp. 811-822
Author(s): B.V. Dhandra, Satishkumar Mallappa, Gururaj Mukarambi

In this article, an exhaustive experiment is carried out to test the performance of the Segmentation-based Fractal Texture Analysis (SFTA) features with nt = 4 pairs and nt = 8 pairs, geometric features, and their combinations. A unified algorithm is designed to identify the scripts of a camera-captured bilingual document image containing the international language English together with one of the Hindi, Kannada, Telugu, Malayalam, Bengali, Oriya, Punjabi, and Urdu scripts. The SFTA algorithm decomposes the input image into a set of binary images, from which the fractal dimension of the resulting regions is computed in order to describe the segmented texture patterns. This motivates the use of the SFTA features as texture features to identify the scripts of the camera-based document image, which is affected by non-homogeneous illumination (resolution). An experiment is carried out on eleven scripts, each with 1000 sample images at block sizes of 128 × 128, 256 × 256, 512 × 512 and 1024 × 1024. It is observed that the block size 512 × 512 gives the maximum accuracy of 86.45% for the Gujarati and English script combination and is the optimal size. The novelty of this article is that a unified algorithm is developed for the script identification of bilingual document images.
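The descriptor at the heart of SFTA is the fractal dimension of each thresholded binary image, typically estimated by box counting. The sketch below shows that estimate in its minimal form (a log–log regression of box counts against box size); it is an assumption-laden illustration of the measure, not the authors' implementation.

```python
import numpy as np

def box_counting_dimension(binary):
    """Box-counting estimate of fractal dimension for a binary image --
    the per-region descriptor that SFTA computes on each of its
    thresholded binary decompositions.

    binary: square 2-D boolean array whose side is a power of two.
    """
    n = binary.shape[0]
    sizes, counts = [], []
    s = n
    while s >= 1:
        # Count boxes of side s containing at least one foreground pixel.
        tiles = binary.reshape(n // s, s, n // s, s)
        occupied = int(tiles.any(axis=(1, 3)).sum())
        if occupied:
            sizes.append(s)
            counts.append(occupied)
        s //= 2
    # Slope of log(count) vs log(1/size) estimates the dimension.
    coeffs = np.polyfit(np.log(1.0 / np.asarray(sizes)),
                        np.log(np.asarray(counts)), 1)
    return coeffs[0]
```

A filled region yields a dimension near 2, an isolated point near 0, and text-like strokes fall in between — which is what makes the measure discriminative across scripts.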


1997, Vol. 29 (5), pp. 397-414
Author(s): Anders Tehler, José M. Egea

The genus Lecanactis, with 24 species, has been phylogenetically analysed using cladistic parsimony methods and support tests. Morphological, anatomical and chemical data were used, comprising 38 characters. Twelve equally most parsimonious trees were obtained. The successive approximations character weighting method gave one most parsimonious tree. The ingroup, Lecanactis, is supported as monophyletic. Although parsimony jackknifing and Bremer support indicate that the trees are poorly supported, some groups are wholly or partly distinguished in the strict consensus tree, the successive weighting tree and the Jac tree.


2019, Vol. 43 (5), pp. 818-824
Author(s): V.V. Arlazarov, K. Bulatov, T. Chernov, V.L. Arlazarov

A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets which are useful for associated subtasks, but in order to facilitate a more comprehensive scientific and technical approach to identity document recognition, more specialized datasets are required. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips for 50 different identity document types with ground truth, which allows research to be performed on a wide scope of document analysis problems. The main goal of this paper is to present the dataset; in addition, and as a baseline, we present evaluation results for existing methods of face detection, text line recognition, and document field data extraction using the presented dataset. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses.


2020, Vol. 17 (9), pp. 4398-4403
Author(s): H. C. Vinod, S. K. Niranjan

De-warping is an elementary step in the analysis of camera-based document images. Processing of warped images is a challenging task; therefore, to make document images OCR-readable, de-warping has become a major task. In this paper, we present effective pre-processing, de-warping and de-skewing techniques for camera-captured document image processing. In pre-processing, we split the input color document image into R, G and B bands, convert the R, G and B bands to the C, M and Y color spaces respectively, and convert the averaged CMY grey-scale image to binary by determining the threshold value automatically. In de-warping and de-skewing, we present an effective technique to remove geometric and perspective distortion in camera-captured documents by combining the centroid and the mid-point of the bounding-box height, where bounding boxes are fitted to text blocks using connected-component analysis. The introduced work is robust in correcting geometric distortion and skew for a standard printed dataset as well as for Kannada handwritten documents.
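The pre-processing pipeline described above — split into R, G, B bands, complement into C, M, Y, average to grey, then binarize with an automatically determined threshold — can be sketched as below. The abstract does not say which automatic thresholding method is used; Otsu's method is assumed here as a standard choice, and the function name is illustrative.

```python
import numpy as np

def binarize_document(rgb):
    """Pre-processing sketch: RGB -> complementary CMY -> averaged grey
    -> binary via an automatically chosen (Otsu) threshold.

    rgb: (H, W, 3) uint8 image. Returns a boolean ink mask
    (True where the original pixel was dark).
    """
    cmy = 255 - rgb.astype(np.int32)          # C=255-R, M=255-G, Y=255-B
    grey = cmy.mean(axis=2)                   # average CMY grey-scale image
    # Otsu's method: pick the threshold maximizing between-class variance.
    hist, _ = np.histogram(grey, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability
    mu = np.cumsum(p * np.arange(256))        # class-0 mean times omega
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    t = int(np.nanargmax(sigma_b))
    return grey > t                           # ink is bright in CMY
```

Note that in the complemented CMY space dark ink maps to high grey values, so the foreground test is `grey > t` rather than the usual `grey < t` used on an RGB-derived grey image.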

