Author(s):  
Abhishek Das ◽  
Mihir Narayan Mohanty

In this chapter, the authors have given a detailed review on optical character recognition. Various methods are used in this field with different accuracy levels. Still there are some difficulties in recognizing handwritten characters because of different writing styles of different individuals even in a particular language. A comparative study is given to understand different types of optical character recognition along with different methods used in each type. Implementation of neural network in different forms is found in most of the works. Different image processing techniques like OCR with CNN, RNN, combination of CNN and RNN, etc. are observed in recent research works.


2021 ◽  
Vol 15 ◽  
Author(s):  
Pooja Jain ◽  
Kavita Taneja ◽  
Harmunish Taneja

Background: Instant access to desired information is crucial for building an intelligent environment that creates value for people and steering towards society 5.0. Online newspapers are one such example that provides instant access to information anywhere and anytime on our mobiles, tablets, laptops, desktops, etc. However, when it comes to searching for a specific advertisement in newspapers, online newspapers do not provide easy advertisement search options. In addition, there are no specialized search portals for keyword-based advertisement searches across multiple online newspapers. As a result, to find a specific advertisement in multiple newspapers, a sequential manual search is required across a range of online newspapers. Objective: This research paper proposes a keyword-based advertisement search framework to provide an instant access to the relevant advertisements from online English newspapers in a category of reader’s choice. Method: First, an image extraction algorithm is proposed to identify and extract the images from online newspapers without using any rules on advertisement placement and size. It is followed by a proposed deep learning Convolutional Neural Network (CNN) model named ‘Adv_Recognizer’ to separate the advertisement images from non-advertisement images. Another CNN Model, ‘Adv_Classifier’, is proposed, classifying the advertisement images into four pre-defined categories. Finally, the Optical Character Recognition (OCR) technique performs keyword-based advertisement searches in various types across multiple newspapers. Results: The proposed image extraction algorithm can easily extract all types of well-bounded images from different online newspapers. This algorithm is used to create an ‘English newspaper image dataset’ of 11,000 images, including advertisements and non-advertisements. The proposed ‘Adv_Recognizer’ model separates advertising and non-advertisement pictures with an accuracy of around 97.8%. In addition, the proposed ‘Adv_Classifier’ model classifies the advertisements in four pre-defined categories exhibiting an accuracy of approximately 73.5%. Conclusion: The proposed framework will help newspaper readers perform exhaustive advertisement searches across various online English newspapers in a category of their interest. It will also help in carrying out advertisement analysis and studies.


Sign in / Sign up

Export Citation Format

Share Document