Data Extraction from Images through OCR

Anurag Tiwari

doi:10.22214/ijraset.2021.37377

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37377 ◽

2021 ◽

Vol 9 (VIII) ◽

pp. 435-437

Author(s):

Anurag Tiwari

Keyword(s):

Small Businesses ◽

Character Recognition ◽

Optical Character Recognition ◽

Human Error ◽

Data Extraction ◽

Data Accessibility ◽

Daily Lives ◽

Whole Process ◽

Optical Character ◽

User Data

Maintaining everyday documents on paper is tiresome and inefficient: it consumes a lot of time, and the documents are difficult to keep track of. This project addresses these problems with Optical Character Recognition (OCR) technology running on the Tesseract OCR Engine. It specifically aims to increase data accessibility and usability and to improve customer experience by reducing the time spent processing, saving, and maintaining user data. Another objective is to eliminate human error, which is considerable in the manual handling of data records; the software uses certain techniques to minimize these errors. OCR extracts text and characters from an image, which helps in maintaining records and data digitally and securely. The project uses the Tesseract OCR Engine, which has high accuracy rates for clean images. We have implemented a web version of OCR that runs on TesseractJS; other JavaScript frameworks are also used. The project successfully extracts text and characters from a provided image using the Tesseract OCR Engine, and for high-resolution images the observed accuracy is above 90%. The web-based application is useful for small businesses because they do not have to install any extra software: all it needs is a file uploaded to an online interface, which can be accessed remotely. It will also help students save notes and documents online, making their important documents easily accessible on the web. The whole process is time- and memory-efficient.
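The abstract reports accuracy above 90% for high-resolution images but does not say how accuracy is measured. One common convention is character-level accuracy: one minus the edit distance between the OCR output and a reference transcription, divided by the reference length. The sketch below assumes that definition; the function names `edit_distance` and `char_accuracy` are hypothetical and are not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy in [0, 1] (assumed definition, not the paper's)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    # Clamp at 0 because a very long wrong output can exceed the reference length.
    return max(0.0, 1.0 - errors / len(reference))
```

For instance, an output that confuses one digit for a letter in a 12-character reference scores 11/12, just above the 90% threshold mentioned in the abstract.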

Optical Character Recognition based Webapp

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.34926 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 385-389

Author(s):

Akshay Gharde

Keyword(s):

Character Recognition ◽

Web Application ◽

Optical Character Recognition ◽

Digital Conversion ◽

Daily Lives ◽

Storage And Retrieval ◽

Optical Character ◽

Textual Data ◽

Cross Platform ◽

Natural Way

As the use of computers in our daily lives increases, so does the need for a natural way to interact with them. The ultimate aim of human-computer interaction is to ensure that there is always a natural way of interacting with computers, coupled with ease and flexibility. Printed and textual media such as prescriptions, invoices, and receipts occupy a large part of our day-to-day activities. Given their volume, managing them physically is inefficient, and there is always a risk of fading, damage, or misplacement; hence a medium is required for their digital conversion. In this project, we have developed a robust, cross-platform web application that processes images using PyTesseract-based algorithms to efficiently extract the textual data and facilitate its storage and retrieval. The extracted text can be downloaded as a text file and can also be translated into the desired language. This is an active field of research, so this paper also discusses various current implementations of the concept. The Optical Character Recognition framework finds applications in a variety of fields, such as business process activities, number plate recognition, KYC, and banking processes, to name a few.

Application of optical character recognition with natural language processing for large-scale quality metric data extraction in colonoscopy reports

Gastrointestinal Endoscopy ◽

10.1016/j.gie.2020.08.038 ◽

2020 ◽

Author(s):

Sobia Nasir Laique ◽

Umar Hayat ◽

Shashank Sarvepalli ◽

Byron Vaughn ◽

Mounir Ibrahim ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Large Scale ◽

Data Extraction ◽

Quality Metric ◽

Optical Character

Mo1076 Validation of a Hybrid Natural Language Processing Tool Utilizing Optical Character Recognition for Data Extraction From Scanned Colonoscopy Reports

Gastrointestinal Endoscopy ◽

10.1016/j.gie.2017.03.968 ◽

2017 ◽

Vol 85 (5) ◽

pp. AB417-AB418 ◽

Cited By ~ 2

Author(s):

Umar Hayat ◽

Mahmoud Isseh ◽

Nazih Isseh ◽

Mounir Ibrahim ◽

John McMichael ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Data Extraction ◽

Optical Character ◽

Natural Language Processing Tool

SELECTION TECHNIQUE FOR MULTIPLE OUTPUTS OF OPTICAL CHARACTER RECOGNITION

Eurasian Journal of Mathematical and Computer Applications ◽

10.32523/2306-6172-2020-8-2-41-51 ◽

2020 ◽

Vol 8 (2) ◽

pp. 41-51

Author(s):

I.Q. Habeeb ◽

Z.Q. Al-Zaydi ◽

H.N. Abdulkhudhur

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Selection Technique ◽

Multiple Outputs ◽

Optical Character

A Structured Method for the Recognition of Complex Historical Tables

History and Computing ◽

10.3366/hac.1997.9.1-3.58 ◽

1997 ◽

Vol 9 (1-3) ◽

pp. 58-77

Author(s):

Vitaly Kliatskine ◽

Eugene Shchepin ◽

Gunnar Thorvaldsen ◽

Konstantin Zingerman ◽

Valery Lazarev

Keyword(s):

Nineteenth Century ◽

Character Recognition ◽

Optical Character Recognition ◽

Complex Structure ◽

Source Material ◽

Historical Sources ◽

Tax Assessment ◽

Optical Character ◽

Algorithmic Model ◽

Machine Readable

In principle, printed source material should be made machine-readable with Optical Character Recognition systems rather than being typed again. Off-the-shelf commercial OCR programs, however, tend to be inadequate for lists with a complex layout. The tax assessment lists covering most nineteenth-century farms in Norway are one example among a series of valuable sources that can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in recognizing material with a complex table structure and outlines a new algorithmic model based on ‘linked hierarchies’. Within the scope of this model, a variety of tables and layouts can be described and recognized. The ‘linked hierarchies’ model has been implemented in the ‘CRIPT’ OCR software system, which successfully reads tables with a complex structure from several different historical sources.

CNN-based Rain Reduction in Street View Images

London Imaging Meeting ◽

10.2352/issn.2694-118x.2020.lim-12 ◽

2020 ◽

Vol 2020 (1) ◽

pp. 78-81

Author(s):

Simone Zini ◽

Simone Bianco ◽

Raimondo Schettini

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

State Of The Art ◽

Weather Conditions ◽

Specific Interest ◽

Optical Character ◽

Street View ◽

In The Wild ◽

Bad Weather ◽

Detection And Recognition

Rain removal from pictures taken in bad weather is a challenging task that aims to improve the overall quality and visibility of a scene. The enhanced images usually serve as input for subsequent computer vision tasks such as detection and classification. In this paper, we present a Convolutional Neural Network, based on the Pix2Pix model, for removing rain streaks from images, with specific interest in evaluating the results of the processing with respect to the Optical Character Recognition (OCR) task. In particular, we present a way to generate a rainy version of the Street View Text Dataset (R-SVTD) for evaluating "text detection and recognition" in bad weather conditions. Experimental results on this dataset show that our model outperforms the state of the art on two commonly used image quality metrics, and that it improves the performance of an OCR model in detecting and recognising text in the wild.

ANALYZING DIFFERENT ALGORITHMS AND TECHNIQUES TO FIND OPTICAL CHARACTER RECOGNITION FOR TAMIL SCRIPTS

JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES ◽

10.26782/jmcms.2020.02.00029 ◽

2020 ◽

Vol 15 (2) ◽

Author(s):

Rajkumar N

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Optical Character

Automated Toll Booth using Optical Character Recognition and RFID System

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2019/11842019 ◽

2019 ◽

Vol 8 (1.4) ◽

pp. 1056-1061

Author(s):

Saurabh Ravindra Kulkarn ◽

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Optical Character ◽

Rfid System ◽

Toll Booth

Pengantar dan Survey Tentang Optical Music Recognition (Introduction to and Survey of Optical Music Recognition)

Jurnal ULTIMATICS ◽

10.31937/ti.v6i1.331 ◽

2014 ◽

Vol 6 (1) ◽

pp. 36-39

Author(s):

Kevin Purwito

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Optical Music Recognition ◽

Optical Character ◽

Digital Format ◽

Music Recognition ◽

Index Terms ◽

The Many ◽

Further Development ◽

Music Symbol

This paper describes one of the many extensions of Optical Character Recognition (OCR): Optical Music Recognition (OMR). OMR is used to convert sheet music into a digital format such as MIDI or MusicXML. Many musical symbols are commonly used in sheet music and therefore need to be recognized by OMR: staff; treble, bass, alto and tenor clefs; sharp, flat and natural; beams, staccato, staccatissimo, dynamic, tenuto, marcato, stopped note, harmonic and fermata; notes; rests; ties and slurs; and mordent and turn. OMR usually has four main processes: Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction and Final Representation Construction. Each of these processes uses different methods and algorithms, and each still needs further development and research. Many applications already use OMR, but none gives a perfect result. Therefore, besides development and research on each OMR process, there is also a need for development and research on a combined recognizer that combines the results from different OMR applications to increase the accuracy of the final result. Index Terms: music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer
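The abstract calls for a combined recognizer that merges the results of several OMR applications but does not specify how the results are combined. One simple possibility is a per-position majority vote. The sketch below assumes that each application's output has already been aligned into a symbol sequence of the same length; the function name `combine_by_majority` and the symbol labels are hypothetical, and real OMR outputs would need an alignment step first.

```python
from collections import Counter
from typing import List, Sequence


def combine_by_majority(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine aligned symbol sequences by per-position majority vote.

    Assumption: every sequence has the same length and position i refers to
    the same symbol on the page in every sequence.
    """
    if not outputs:
        return []
    length = len(outputs[0])
    if any(len(seq) != length for seq in outputs):
        raise ValueError("all recognizer outputs must be aligned to the same length")

    combined: List[str] = []
    for candidates in zip(*outputs):
        counts = Counter(candidates)
        top = max(counts.values())
        # On a tie, the label from the earliest-listed recognizer wins.
        winner = next(label for label in candidates if counts[label] == top)
        combined.append(winner)
    return combined
```

With three recognizers that each misread a different symbol, the vote recovers the sequence on which any two agree. Listing the most trusted recognizer first makes it the tie-breaker.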

Optical character recognition

10.3403/00116074u ◽

2015 ◽

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Optical Character
