CRF-based authors' name tagging for scanned documents

Author(s):  
Manabu Ohta ◽  
Atsuhiro Takasu

Detection and reorganization of text can save considerable time when reproducing the text and chapters of old books. This is a challenging research topic because different books may use different font types and styles. The habit of reading digital books and eBooks is growing day by day, and new documents are produced every day, so digital image processing techniques can be used to speed up text reorganization. This research work uses hybrid and morphological algorithms. As a sample, we took a letter pad in which the text and images were separated algorithmically. Another objective of this research is to increase the accuracy of the recognized text and produce accurate results. The work is based on two different concepts: the first is pixel-level thresholding and the other is Otsu method thresholding.
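
The two binarization schemes named in the abstract are standard ones; below is a minimal sketch of both, plus a morphological clean-up step, assuming a grayscale scan loaded with OpenCV (the file name page.png is an illustrative placeholder, not from the paper).

import cv2

# Load the scanned page as a grayscale image (placeholder file name).
img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

# Pixel-level thresholding: every pixel is compared with one fixed value.
_, fixed_bw = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)

# Otsu's method: the threshold is chosen automatically by maximizing the
# between-class variance of the grayscale histogram.
otsu_t, otsu_bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Morphological opening to suppress speckle noise, in the spirit of the
# morphological algorithms the abstract mentions.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
cleaned = cv2.morphologyEx(otsu_bw, cv2.MORPH_OPEN, kernel)

print("Otsu threshold chosen:", otsu_t)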


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Xi-Yan Li ◽  
Xia-Bing Zhou ◽  
Qing-Lei Zhou ◽  
Shi-Jing Han ◽  
Zheng Liu

With the development of cloud computing, high-capacity reversible data hiding in encrypted images (RDHEI) has attracted increasing attention. The main idea of RDHEI is that an image owner encrypts a cover image, and a data hider then embeds secret information in the encrypted image. With the information-hiding key, a receiver can extract the embedded data from the image containing the hidden data; with the encryption key, the receiver reconstructs the original image. In this paper, the embedded data can take the form of random bits or scanned documents. The proposed method takes full advantage of the spatial correlation in the original image to vacate room for embedding information before image encryption. By jointly using Sudoku and Arnold chaos encryption, the encrypted images retain the vacated room. Before the data-hiding phase, the secret information is preprocessed by halftone, quadtree, and S-BOX transformations. The experimental results show that the proposed method not only achieves high-capacity reversible data hiding in encrypted images but also reconstructs the original image completely.
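
The encryption step combines Sudoku-based operations with Arnold chaos scrambling; as a point of reference, below is a minimal sketch of the classical Arnold cat map permutation on a square image, assuming a NumPy array (the iteration count is an illustrative parameter, not a value from the paper).

import numpy as np

def arnold_cat_map(image, iterations=1):
    # Scramble a square image with the Arnold cat map: pixel (x, y)
    # moves to ((x + y) mod n, (x + 2*y) mod n) on each iteration.
    n = image.shape[0]
    assert image.shape[0] == image.shape[1], "the Arnold map needs a square image"
    scrambled = image.copy()
    for _ in range(iterations):
        result = np.empty_like(scrambled)
        for x in range(n):
            for y in range(n):
                result[(x + y) % n, (x + 2 * y) % n] = scrambled[x, y]
        scrambled = result
    return scrambled

Because the map is a bijection on the pixel grid, the scrambling is reversible, which is what allows a receiver holding the encryption key to undo it and recover the cover image exactly.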


Author(s):  
Rafael D. Lins ◽  
Daniel M. Oliveira ◽  
Gabriel Torreão ◽  
Jian Fan ◽  
Marcelo Thielo

2016 ◽  
Vol 28 (2) ◽  
pp. 241-251 ◽  
Author(s):  
Luciane Lena Pessanha Monteiro ◽  
Mark Douglas de Azevedo Jacyntho

The study addresses the use of the Semantic Web and Linked Data principles proposed by the World Wide Web Consortium for the development of a Web application for the semantic management of scanned documents. The main goal is to record scanned documents, describing them in a way that machines can understand and process, filtering content and assisting in the search for such documents when a decision-making process is under way. To this end, machine-understandable metadata, created with reference Linked Data ontologies, are associated with the documents, creating a knowledge base. To further enrich the process, a (semi)automatic mashup of these metadata with data from the new Web of Linked Data is carried out, considerably increasing the scope of the knowledge base and making it possible to extract new data related to the content of the stored documents from the Web and combine them, without the user making any effort or perceiving the complexity of the whole process.
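
As a concrete illustration of attaching machine-understandable metadata to a scanned document, below is a minimal sketch using rdflib with Dublin Core and FOAF terms; the document URI and property values are hypothetical examples, not data from the study.

from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCTERMS, FOAF, RDF

# Hypothetical URI for one scanned document in the knowledge base.
doc = URIRef("http://example.org/documents/scan-001")

g = Graph()
g.add((doc, RDF.type, FOAF.Document))
g.add((doc, DCTERMS.title, Literal("Scanned supply contract")))
g.add((doc, DCTERMS.creator, Literal("Records department")))
g.add((doc, DCTERMS.subject, Literal("procurement")))

# Serialize the knowledge base as Turtle so other Linked Data tools
# (and later mashup steps) can consume and combine it.
print(g.serialize(format="turtle"))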


Author(s):  
Rifiana Arief ◽  
Achmad Benny Mutiara ◽  
Tubagus Maulana Kusuma ◽  
Hustinawaty Hustinawaty

This research proposed automated hierarchical classification of scanned documents whose content contains unstructured text and special patterns (specific, short strings), using a convolutional neural network (CNN) and the regular expression method (REM). The research data are digital correspondence documents in PDF image format from the pusat data teknologi dan informasi (technology and information data center). The document hierarchy covers the type of letter, type of manuscript letter, origin of letter, and subject of letter. The research method consists of preprocessing, classification, and storage to a database. Preprocessing covers text extraction using Tesseract optical character recognition (OCR) and the formation of word document vectors with Word2Vec. Hierarchical classification uses the CNN to classify 5 types of letter and regular expressions to classify 4 types of manuscript letter, 15 origins of letter, and 25 subjects of letter. The classified documents are stored in the Hive database within a Hadoop big data architecture. The data set comprises 5,200 documents: 4,000 for training, 1,000 for testing, and 200 for classification prediction. In the trial on the 200 new documents, 188 were classified correctly and 12 incorrectly, giving an accuracy of 94% for the automated hierarchical classification. As future work, content-based search of the classified scanned documents can be developed.
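
Below is a minimal sketch of the OCR-plus-regular-expression stage described above, assuming pytesseract and pdf2image are installed; the patterns, category names, and file name are illustrative assumptions, not the rules used in the paper.

import re
import pytesseract
from pdf2image import convert_from_path

# Illustrative patterns only; the paper's actual rules for manuscript type,
# origin, and subject of letters are not reproduced here.
MANUSCRIPT_PATTERNS = {
    "decree": re.compile(r"\bdecree\b", re.IGNORECASE),
    "memorandum": re.compile(r"\bmemo(randum)?\b", re.IGNORECASE),
}

def ocr_pdf(path):
    # Render each PDF page to an image and extract its text with Tesseract.
    pages = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

def classify_manuscript(text):
    # Return the first category whose pattern matches the OCR text.
    for label, pattern in MANUSCRIPT_PATTERNS.items():
        if pattern.search(text):
            return label
    return "unknown"

text = ocr_pdf("letter.pdf")  # placeholder file name
print(classify_manuscript(text))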

