Algorithm for choosing the best frame in a video stream in the task of identity document recognition

2021 ◽  
Vol 45 (1) ◽  
pp. 101-109
Author(s):  
M.A. Aliev ◽  
I.A. Kunina ◽  
A.V. Kazbekov ◽  
V.L. Arlazarov

During document recognition in a video stream captured with a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes the recognition system is required not only to recognize all the specified attributes of the document, but also to select a final document image of the best quality. This is necessary, for example, for archiving or providing various services; in some countries it can be required by law. In this case, the recognition system needs to assess the quality of frames in the video stream and choose the “best” frame. In this paper we consider a solution to this problem in which the “best” frame is one where all specified attributes are present in the document image in readable form. The method was tuned on a private dataset and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.
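The selection criterion described in this abstract (every specified attribute must be readable) can be illustrated with a minimal sketch. It is not the authors' algorithm: the per-field readability scores, the field names, and the "worst field decides" rule are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical per-frame input: field name -> readability score in [0, 1].
FrameScores = Dict[str, float]


def frame_quality(scores: FrameScores, required: List[str]) -> float:
    """Score a frame by its least readable required field (0.0 if a field is missing)."""
    # With no required fields nothing can be unreadable, so the score defaults to 1.0.
    return min((scores.get(name, 0.0) for name in required), default=1.0)


def choose_best_frame(frames: List[FrameScores], required: List[str]) -> Optional[int]:
    """Return the index of the frame with the highest worst-field score, or None if empty."""
    if not frames:
        return None
    return max(range(len(frames)), key=lambda i: frame_quality(frames[i], required))


# Invented example values, not data from the paper.
required_fields = ["name", "number", "birth_date"]
frames = [
    {"name": 0.9, "number": 0.2, "birth_date": 0.8},  # number is blurred
    {"name": 0.7, "number": 0.6, "birth_date": 0.7},  # every field moderately readable
    {"name": 0.95, "birth_date": 0.9},                # number missing entirely
]
best = choose_best_frame(frames, required_fields)
```

Taking the minimum over fields means a frame with one unreadable attribute loses to a frame where every attribute is merely adequate, which matches the stated definition of "best" more closely than an average would.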

Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1308 ◽  
Author(s):  
Mohsen Jenadeleh ◽  
Marius Pedersen ◽  
Dietmar Saupe

Image quality is a key issue affecting the performance of biometric systems. Ensuring the quality of iris images acquired in unconstrained imaging conditions in visible light poses many challenges to iris recognition systems. Poor-quality iris images increase the false rejection rate and decrease the performance of the systems by quality filtering. Methods that can accurately predict iris image quality can improve the efficiency of quality-control protocols in iris recognition systems. We propose a fast blind/no-reference metric for predicting iris image quality. The proposed metric is based on statistical features of the sign and the magnitude of local image intensities. The experiments, conducted with a reference iris recognition system and three datasets of iris images acquired in visible light, showed that the quality of iris images strongly affects the recognition performance and is highly correlated with the iris matching scores. Rejecting poor-quality iris images improved the performance of the iris recognition system. In addition, we analyzed the effect of iris image quality on the accuracy of the iris segmentation module in the iris recognition system.
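The effect of rejecting poor-quality images on the false rejection rate can be sketched as follows. The quality scores, match scores, and both thresholds are invented for illustration; the sketch does not implement the proposed quality metric itself, only the filtering step that would consume its output.

```python
from typing import List, Tuple

# Hypothetical sample: (predicted quality, genuine-pair match score), both in [0, 1].
Sample = Tuple[float, float]


def false_rejection_rate(samples: List[Sample], match_threshold: float) -> float:
    """Fraction of genuine samples whose match score falls below the match threshold."""
    if not samples:
        return 0.0
    rejected = sum(1 for _, score in samples if score < match_threshold)
    return rejected / len(samples)


def filter_by_quality(samples: List[Sample], quality_threshold: float) -> List[Sample]:
    """Keep only samples whose predicted quality reaches the quality threshold."""
    return [s for s in samples if s[0] >= quality_threshold]


# Invented toy data, not results from the paper.
samples = [(0.9, 0.95), (0.8, 0.85), (0.3, 0.40), (0.2, 0.35), (0.7, 0.45), (0.85, 0.90)]
frr_all = false_rejection_rate(samples, match_threshold=0.5)
kept = filter_by_quality(samples, quality_threshold=0.5)
frr_kept = false_rejection_rate(kept, match_threshold=0.5)
```

In this toy data the two lowest-quality samples are also failed matches, so discarding them before matching lowers the false rejection rate among the images that remain.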


2019 ◽  
Vol 9 (2) ◽  
pp. 236 ◽  
Author(s):  
Saad Ahmed ◽  
Saeeda Naz ◽  
Muhammad Razzak ◽  
Rubiyah Yusof

This paper presents a comprehensive survey on Arabic cursive scene text recognition. Recent publications in this field show a shift in the interest of document image analysis researchers from recognition of optical characters to recognition of characters appearing in natural images. Scene text recognition is a challenging problem because the text varies in font style, size, alignment, orientation, reflection, illumination, blurriness and background complexity. Among cursive scripts, Arabic scene text recognition is considered a more challenging problem due to joined writing, variations of the same character, a large number of ligatures, the number of baselines, etc. Surveys on Latin and Chinese script-based scene text recognition systems can be found, but the Arabic-like scene text recognition problem is yet to be addressed in detail. In this manuscript, a description is provided to highlight some of the latest techniques presented for text classification. The presented techniques following a deep learning architecture are equally suitable for the development of Arabic cursive scene text recognition systems. The issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide insight about cursive scene text to researchers.


2021 ◽  
Vol 45 (1) ◽  
pp. 77-89
Author(s):  
O. Petrova ◽  
K. Bulatov ◽  
V.V. Arlazarov ◽  
V.L. Arlazarov

The scope of uses of automated document recognition has extended and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as an input, thus obtaining several images of the recognized object, captured with various characteristics. In this case, a problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting conditions. The experimental results show that the weighted combination can improve the text recognition result quality in the video stream, and the per-character weighting method with input image focus estimation as a base criterion allows one to achieve the best results on the datasets analyzed.
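A per-character weighted combination of per-frame results can be sketched as a weighted vote at each character position. This is an illustration under strong assumptions, not the authors' method: the per-frame strings are taken to be already aligned and of equal length, and the weights stand in for a hypothetical focus estimate of each frame.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def combine_per_character(results: List[Tuple[str, float]]) -> str:
    """Combine equal-length per-frame strings by weighted per-character voting."""
    if not results:
        return ""
    length = len(results[0][0])
    if any(len(text) != length for text, _ in results):
        raise ValueError("this sketch assumes pre-aligned strings of equal length")
    combined = []
    for pos in range(length):
        # Sum the frame weights behind each candidate character at this position.
        votes: Dict[str, float] = defaultdict(float)
        for text, weight in results:
            votes[text[pos]] += weight
        combined.append(max(votes, key=votes.get))
    return "".join(combined)


# Invented example: (recognized text, hypothetical focus-based weight) per frame.
per_frame = [
    ("SM1TH", 0.2),   # blurred frame
    ("SM1TH", 0.25),  # blurred frame
    ("SMITH", 0.9),   # sharp frame
]
combined = combine_per_character(per_frame)
```

Here an unweighted majority vote would keep the misread character shared by the two blurred frames, while the weighted vote follows the single sharp frame.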


2019 ◽  
Vol 43 (5) ◽  
pp. 818-824 ◽  
Author(s):  
V.V. Arlazarov ◽  
K. Bulatov ◽  
T. Chernov ◽  
V.L. Arlazarov

A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. There are a few datasets which are useful for associated subtasks, but in order to facilitate a more comprehensive scientific and technical approach to identity document recognition, more specialized datasets are required. In this paper we present a Mobile Identity Document Video dataset (MIDV-500) consisting of 500 video clips for 50 different identity document types with ground truth, which enables research in a wide scope of document analysis problems. The paper presents characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document fields data extraction. Since an important feature of identity documents is their sensitivity, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses. The main goal of this paper is to present a dataset. However, in addition and as a baseline, we present evaluation results for existing methods for face detection, text line recognition, and document data extraction, using the presented dataset.


2021 ◽  
Vol 8 (1) ◽  
pp. 164-170
Author(s):  
Mohammad Husam Alhumsi ◽  
Saleh Belhassen

Phonetic dictionaries are regarded as pivotal components of speech recognition systems. The function of speech recognition research is to generate a machine which will accurately identify and distinguish normal human speech from any other speaker. The literature affirms that Arabic phonetics is one of the major problems in Arabic speech recognition. Therefore, this paper reviews previous studies tackling the challenges faced in initiating an Arabic phonetic dictionary with respect to Arabic speech recognition. It has been found that speech recognition systems have investigated areas of difference concerning Arabic phonetics. In addition, an Arabic phonetic dictionary should be initiated where the Arabic vowels’ phonemes are considered as a component of the consonants’ phonemes. Thus, the incorporation of developed machine translation systems may enhance the quality of the system. The current paper concludes with the existing challenges facing Arabic phonetic dictionaries.


Author(s):  
Nikita Razumnuy ◽  
Alexander Kozharinov ◽  
Vladimir Arlazarov ◽  
Dmitry P. Nikolaev ◽  
Timofey Chernov

Telecom IT ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 94-101
Author(s):  
E. Kalyashov

Research subject. The article reviews ways of constructing face recognition systems based on standard modules. Method. The study is based on a comparison of the performance and recognition quality of various pipelines. Core results. The recognition quality achieved and its dependence on the type of source data are presented. Practical relevance. The results could be used when implementing various face recognition system pipelines.


2017 ◽  
Vol 3 (2) ◽  
pp. 231-233
Author(s):  
Axel Boese ◽  
Akhil Karthasseril Sivankutty ◽  
Michael Friebe

Medical applications like vascular endoscopy can provide additional diagnostic value for future procedures [1]. For these applications an optimal compromise between image quality and fibre flexibility has to be identified. Image quality of endoscopes is normally estimated using flat test objects or charts. For the application of vascular endoscopy a tubular test setup seems to be more beneficial. We compare three fibre endoscopes with different diameters and numbers of integrated fibres according to image quality and flexibility. Based on the results a recommendation for possible applications in vascular endoscopy is given.


Author(s):  
K. Shibatomi ◽  
T. Yamanoto ◽  
H. Koike

In the observation of a thick specimen by means of a transmission electron microscope, the intensity of electrons passing through the objective lens aperture is greatly reduced, so that the image is almost invisible. In addition, it has been reported that chromatic aberration causes deterioration of the image contrast rather than of the resolution. The scanning electron microscope, however, is capable of electrically amplifying a signal of decreasing intensity and is also free from chromatic aberration, so that the deterioration of the image contrast due to the aberration can be prevented. The electrical improvement of the image quality can be carried out by using the fascinating features of the SEM, that is, the amplification of a weak input signal forming the image and the discrimination of the high-level background signal. This paper reports some of the experimental results about the thickness dependence of the observability and quality of the image in the case of the transmission SEM.

