perspective distortion
Recently Published Documents


TOTAL DOCUMENTS: 77 (five years: 25)

H-INDEX: 7 (five years: 2)

2021
Author(s): Junkang Zhang, Yiqian Wang, Dirk-Uwe G. Bartsch, William R. Freeman, Truong Q. Nguyen, ...

Sensors, 2021, Vol 21 (17), pp. 5752
Author(s): Milan Ondrašovič, Peter Tarábek

Homography mapping is often exploited to remove perspective distortion in images and can be estimated using point correspondences of a known object (marker). We focus on scenarios with multiple markers placed on the same plane whose relative positions in the world are unknown, making the overall point correspondence indeterminate. Existing approaches can estimate only an isolated homography for each marker and cannot determine which homography achieves the best reprojection over the entire image. We thus propose a method to rank the isolated homographies obtained from multiple distinct markers and select the best one. This method extends existing approaches in the post-processing stage, provided that the point correspondences are available and that the markers differ only by a similarity transformation after rectification. We demonstrate the robustness of our method using a synthetic dataset and show an approximately 60% relative improvement over a random selection strategy based on the homography estimation from the OpenCV library.
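To make the ranking idea concrete, here is a minimal NumPy sketch (not the authors' implementation): each marker yields a candidate homography via the direct linear transform (in practice one would use OpenCV's `cv2.findHomography`), and candidates are ranked by their total reprojection error over all available point correspondences.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography from >= 4 point correspondences via the DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)          # null vector = homography up to scale
    return H / H[2, 2]

def reprojection_error(H, src, dst):
    """Mean Euclidean distance between projected src points and dst."""
    proj = np.c_[src, np.ones(len(src))] @ H.T
    proj = proj[:, :2] / proj[:, 2:3]
    return float(np.linalg.norm(proj - dst, axis=1).mean())

def rank_homographies(candidates, all_src, all_dst):
    """Pick the candidate with the lowest total error over all markers' points."""
    scores = [sum(reprojection_error(H, s, d) for s, d in zip(all_src, all_dst))
              for H in candidates]
    return int(np.argmin(scores))
```

The scoring step is the essence of the ranking: every isolated homography is evaluated against the pooled correspondences, not only against its own marker.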


2021, Vol 11 (14), pp. 6292
Author(s): Tae-Gu Kim, Byoung-Ju Yun, Tae-Hun Kim, Jae-Young Lee, Kil-Houm Park, ...

In this study, we propose an algorithm that solves the problems occurring when recognizing vehicle license plates through closed-circuit television (CCTV) with a deep learning model trained on a general database. Commonly used deep learning models suffer from low recognition rates on tilted and low-resolution images, as they are trained on images captured from the front of the license plate. Furthermore, vehicle images acquired through CCTV have limited resolution and perspective distortion, which makes it difficult to apply such models directly. To improve the recognition rate, this paper proposes an algorithm combining a super-resolution generative adversarial network (SRGAN) model with a perspective distortion correction algorithm. The accuracy of the proposed algorithm was verified with the YOLO v2 character recognition algorithm, and the recognition rate on vehicle license plate images improved by 8.8% over the original images.
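The perspective-correction step of such a pipeline can be sketched as follows; this is an illustrative NumPy-only solve, not the paper's implementation (in practice `cv2.getPerspectiveTransform` and `cv2.warpPerspective` would warp the pixels). The four detected plate corners are mapped onto a frontal rectangle of an assumed plate size.

```python
import numpy as np

def four_point_homography(quad, w, h):
    """Homography mapping a plate quadrilateral (TL, TR, BR, BL order)
    onto a frontal w x h rectangle, via the direct linear transform."""
    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], float)
    A = []
    for (x, y), (u, v) in zip(quad, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_points(H, pts):
    """Apply homography H to an array of (x, y) points."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]
```

After rectification, the frontal plate crop is what would be fed to the SRGAN upscaler and then to the character recognizer.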


2021, Vol 45 (1), pp. 77-89
Author(s): O. Petrova, K. Bulatov, V.V. Arlazarov, V.L. Arlazarov

The scope of uses of automated document recognition has extended, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as an input, thus obtaining several images of the recognized object captured with various characteristics. In this case, the problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that the weighted combination can improve the quality of the text recognition result in the video stream, and that the per-character weighting method with input image focus estimation as a base criterion achieves the best results on the datasets analyzed.
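A per-character weighted combination of the kind described can be sketched as follows. The data layout and the `combine_frames` helper are illustrative assumptions, not the authors' code: each frame contributes per-character candidate confidences, weighted by a per-frame criterion such as a focus estimate.

```python
def combine_frames(frame_results, frame_weights):
    """Per-character weighted combination of per-frame recognition results.

    frame_results: one entry per frame; each entry is a list (one item per
        character position) of {candidate_char: confidence} dicts.
    frame_weights: one weight per frame, e.g. an image-focus estimate.
    Returns the combined string: the candidate with the highest weighted
    confidence at each position.
    """
    n_chars = len(frame_results[0])
    combined = []
    for pos in range(n_chars):
        scores = {}
        for result, w in zip(frame_results, frame_weights):
            for ch, conf in result[pos].items():
                scores[ch] = scores.get(ch, 0.0) + w * conf
        combined.append(max(scores, key=scores.get))
    return "".join(combined)
```

A sharp frame (high weight) can thus overrule a blurry frame even when the blurry frame is individually more confident.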


2021, Vol 2021, pp. 1-10
Author(s): Jingfan Tang, Meijia Zhou, Pengfei Li, Min Zhang, Ming Jiang

Current crowd counting tasks rely on a fully convolutional network to generate a density map, which can achieve good performance. However, due to crowd occlusion and perspective distortion in the image, the directly generated density map usually neglects scale information and spatial contact information. To address this, we propose MDPDNet (Multiresolution Density maps and Parallel Dilated convolutions' Network) to reduce the influence of occlusion and distortion on crowd estimation. This network is composed of two modules: (1) the parallel dilated convolution module (PDM), which combines three dilated convolutions in parallel to obtain deep features on a larger receptive field with fewer parameters while reducing the loss of multiscale information; and (2) the multiresolution density map module (MDM), which contains three branch networks for extracting spatial contact information on three different low-resolution density maps as the feature input of the final crowd density map. Experiments show that MDPDNet achieves excellent results on three mainstream datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF).
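A single-channel NumPy sketch of the parallel dilated convolution idea (illustrative only, not the MDPDNet implementation): the same 3x3 kernel cost buys a growing receptive field as the dilation rate increases, and the parallel branches are merged by summation here for simplicity.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded 2D convolution of a single-channel map with a dilated
    k x k kernel: taps are spaced `rate` pixels apart."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernel[i, j] * xp[i * rate:i * rate + x.shape[0],
                                     j * rate:j * rate + x.shape[1]]
    return out

def parallel_dilated_block(x, kernels, rates=(1, 2, 3)):
    """Three dilated convolutions in parallel, merged by summation: each
    branch has the same parameter count but a different receptive field."""
    return sum(dilated_conv2d(k_x[0], k_x[1], r)
               for k_x, r in zip(((x, k) for k in kernels), rates))
```

With rates (1, 2, 3), a 3x3 kernel covers 3x3, 5x5, and 7x7 neighborhoods respectively, which is the mechanism the PDM uses to capture multiscale context cheaply.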


2021, Vol 6 (1), pp. 1-5
Author(s): Zobeir Raisi, Mohamed A. Naiel, Paul Fieguth, Steven Wardell, John Zelek

The reported accuracy of recent state-of-the-art text detection methods, mostly deep learning approaches, is on the order of 80% to 90% on standard benchmark datasets. These methods have relaxed some of the restrictions on structured text and environment (i.e., "in the wild") that classical OCR usually requires to function properly. Even with this relaxation, there are still circumstances where these state-of-the-art methods fail. Several remaining challenges in wild images, such as in-plane rotation, illumination reflection, partial occlusion, complex font styles, and perspective distortion, cause existing methods to perform poorly. In order to evaluate current approaches in a formal way, we standardize the datasets and metrics for comparison, whose inconsistency had made comparison between these methods difficult in the past. We use three benchmark datasets for our evaluations: ICDAR13, ICDAR15, and COCO-Text V2.0. The objective of the paper is to quantify the current shortcomings and to identify the challenges for future text detection research.
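Detection comparisons on these benchmarks typically rest on IoU-based matching of predicted boxes to ground truth. The sketch below uses a simple greedy one-to-one matching at a fixed IoU threshold; the real ICDAR protocols are more involved (e.g., one-to-many matches), so treat this as an illustrative simplification rather than the paper's evaluation code.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, thr=0.5):
    """Greedy one-to-one matching at an IoU threshold; each ground-truth
    box may be claimed by at most one detection."""
    matched, tp = set(), 0
    for det in detections:
        best, best_iou = None, thr
        for gi, gt in enumerate(ground_truth):
            if gi in matched:
                continue
            v = iou(det, gt)
            if v >= best_iou:
                best, best_iou = gi, v
        if best is not None:
            matched.add(best)
            tp += 1
    p = tp / len(detections) if detections else 0.0
    r = tp / len(ground_truth) if ground_truth else 0.0
    return p, r
```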


Complexity, 2020, Vol 2020, pp. 1-14
Author(s): Lina Li

In this paper, we analyze and estimate crowd density in a tourist area using dynamic video-surveillance information, dividing the crowd counting and density estimation task into three stages. Novel scale perception and inverse scale perception modules are designed to help the counting model mine multiscale information. The third stage generates the crowd distribution density map: it consists of three columns of dilated convolutions with different dilation rates and produces the final density map from the feature maps regressed by the different branches. The algorithm also uses skip connections between the top convolution layers and the bottom dilated convolution layers to reduce the risk of vanishing or exploding gradients, and optimizes the network parameters with an intermediate supervision strategy. The hierarchical density estimator mines semantic features and multiscale information in a coarse-to-fine manner, which addresses scale variation and perspective distortion. Since background noise degrades the quality of the generated density map, a soft attention mechanism is integrated into the model to widen the gap between foreground and background and further improve density-map quality. Finally, inspired by multitask learning, an auxiliary count classifier is embedded in the counting model to perform a count-classification auxiliary task and strengthen the model's ability to express semantic information. Extensive experimental results demonstrate the effectiveness and feasibility of the proposed algorithm in handling scale variation and perspective distortion.
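The soft attention step and the count readout can be sketched in a few lines of NumPy (illustrative assumptions, not the paper's model): a learned mask in [0, 1] suppresses background responses, and the crowd count is the integral of the resulting density map.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_attention(features, attention_logits):
    """Weight a density feature map by a foreground mask in [0, 1],
    suppressing background responses before the final density regression."""
    return features * sigmoid(attention_logits)

def predicted_count(density_map):
    """The crowd count is the integral (sum) of the density map."""
    return float(density_map.sum())
```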


Sensors, 2020, Vol 20 (23), pp. 6888
Author(s): Quoc-Bao Ta, Jeong-Tae Kim

In this study, a region-based convolutional neural network (RCNN) deep learning model and a Hough line transform (HLT) algorithm are applied to monitor corroded and loosened bolts in steel structures. The monitoring goals are to detect rusted bolts, distinguishing them from non-corroded ones, and to estimate the bolt-loosening angles of the identified bolts. The following approaches are performed to achieve these goals. Firstly, an RCNN-based autonomous bolt detection scheme is designed to identify corroded and clean bolts in a captured image. Secondly, an HLT-based image processing algorithm is designed to estimate the rotational angles (i.e., bolt-loosening) of cropped bolts. Finally, the accuracy of the proposed framework is experimentally evaluated under various capture distances, perspective distortions, and light intensities. The lab-scale monitoring results indicate that the suggested method accurately detects rusted bolts in images captured under perspective distortion angles of less than 15° and light intensities larger than 63 lux.
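The HLT-based angle estimation can be illustrated with a minimal NumPy Hough transform (not the authors' implementation; in practice `cv2.HoughLines` would be run on an edge image): edge points vote in (rho, theta) space, and the dominant theta gives the line orientation, whose change between two captures indicates the bolt-loosening angle.

```python
import numpy as np

def hough_dominant_angle(points, n_theta=180):
    """Minimal Hough line transform over edge points.

    Each point votes for every (rho, theta) line passing through it, with
    rho = x*cos(theta) + y*sin(theta); the most-voted cell gives the
    dominant line's normal direction theta, in integer degrees."""
    thetas = np.deg2rad(np.arange(n_theta))
    pts = np.asarray(points, float)
    rhos = pts[:, 0, None] * np.cos(thetas) + pts[:, 1, None] * np.sin(thetas)
    rho_bins = np.round(rhos).astype(int)      # 1-pixel rho resolution
    acc = {}
    for pi in range(len(pts)):
        for ti in range(n_theta):
            key = (rho_bins[pi, ti], ti)
            acc[key] = acc.get(key, 0) + 1
    (_, theta_idx), _ = max(acc.items(), key=lambda kv: kv[1])
    return float(theta_idx)
```

Running this on edge points from a bolt-head image before and after loosening, and differencing the two dominant angles, is the essence of the rotational-angle estimate described above.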

