CRF-based authors' name tagging for scanned documents

Author(s):  
Manabu Ohta ◽  
Atsuhiro Takasu

Detection and reorganization of text can save considerable time when reproducing the text and chapters of old books. This is a challenging research topic because different books may use different font types and styles. The habit of reading digital books and eBooks is growing day by day, and new documents are produced every day, so digital image processing techniques can be used to speed up text reorganization. This research work uses hybrid and morphological algorithms. As a sample, we took a letter pad in which the text and images were separated algorithmically. Another objective of this research is to increase the accuracy of the recognized text and produce accurate results. The work is based on two different concepts: the first is pixel-level thresholding and the other is Otsu method thresholding.
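
The two binarization schemes named in the abstract are standard ones; below is a minimal sketch of both, plus a morphological clean-up step, assuming a grayscale scan loaded with OpenCV (the file name page.png is an illustrative placeholder, not from the paper).

import cv2

# Load the scanned page as a grayscale image (placeholder file name).
img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)

# Pixel-level thresholding: every pixel is compared with one fixed value.
_, fixed_bw = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)

# Otsu's method: the threshold is chosen automatically by maximizing the
# between-class variance of the grayscale histogram.
otsu_t, otsu_bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Morphological opening to suppress speckle noise, in the spirit of the
# morphological algorithms the abstract mentions.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
cleaned = cv2.morphologyEx(otsu_bw, cv2.MORPH_OPEN, kernel)

print("Otsu threshold chosen:", otsu_t)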


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Xi-Yan Li ◽  
Xia-Bing Zhou ◽  
Qing-Lei Zhou ◽  
Shi-Jing Han ◽  
Zheng Liu

With the development of cloud computing, high-capacity reversible data hiding in encrypted images (RDHEI) has attracted increasing attention. The main idea of RDHEI is that an image owner encrypts a cover image, and a data hider then embeds secret information in the encrypted image. With the information-hiding key, a receiver can extract the embedded data from the image containing the hidden data; with the encryption key, the receiver reconstructs the original image. In this paper, the embedded data can take the form of random bits or scanned documents. The proposed method takes full advantage of the spatial correlation in the original image to vacate room for embedding information before image encryption. By jointly using Sudoku and Arnold chaos encryption, the encrypted images retain the vacated room. Before the data-hiding phase, the secret information is preprocessed by halftone, quadtree, and S-BOX transformations. The experimental results show that the proposed method not only achieves high-capacity reversible data hiding in encrypted images but also reconstructs the original image completely.
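
The encryption step combines Sudoku-based operations with Arnold chaos scrambling; as a point of reference, below is a minimal sketch of the classical Arnold cat map permutation on a square image, assuming a NumPy array (the iteration count is an illustrative parameter, not a value from the paper).

import numpy as np

def arnold_cat_map(image, iterations=1):
    # Scramble a square image with the Arnold cat map: pixel (x, y)
    # moves to ((x + y) mod n, (x + 2*y) mod n) on each iteration.
    n = image.shape[0]
    assert image.shape[0] == image.shape[1], "the Arnold map needs a square image"
    scrambled = image.copy()
    for _ in range(iterations):
        result = np.empty_like(scrambled)
        for x in range(n):
            for y in range(n):
                result[(x + y) % n, (x + 2 * y) % n] = scrambled[x, y]
        scrambled = result
    return scrambled

Because the map is a bijection on the pixel grid, the scrambling is reversible, which is what allows a receiver holding the encryption key to undo it and recover the cover image exactly.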


Author(s):  
Rafael D. Lins ◽  
Daniel M. Oliveira ◽  
Gabriel Torreão ◽  
Jian Fan ◽  
Marcelo Thielo

2016 ◽  
Vol 28 (2) ◽  
pp. 241-251 ◽  
Author(s):  
Luciane Lena Pessanha Monteiro ◽  
Mark Douglas de Azevedo Jacyntho

The study addresses the use of the Semantic Web and Linked Data principles proposed by the World Wide Web Consortium for the development of a Web application for the semantic management of scanned documents. The main goal is to record scanned documents, describing them in a way that machines can understand and process, filtering content and assisting in the search for such documents when a decision-making process is under way. To this end, machine-understandable metadata, created with reference Linked Data ontologies, are associated with the documents, creating a knowledge base. To further enrich the process, a (semi)automatic mashup of these metadata with data from the new Web of Linked Data is carried out, considerably increasing the scope of the knowledge base and making it possible to extract new data related to the content of the stored documents from the Web and combine them, without the user making any effort or perceiving the complexity of the whole process.
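
As a concrete illustration of attaching machine-understandable metadata to a scanned document, below is a minimal sketch using rdflib with Dublin Core and FOAF terms; the document URI and property values are hypothetical examples, not data from the study.

from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCTERMS, FOAF, RDF

# Hypothetical URI for one scanned document in the knowledge base.
doc = URIRef("http://example.org/documents/scan-001")

g = Graph()
g.add((doc, RDF.type, FOAF.Document))
g.add((doc, DCTERMS.title, Literal("Scanned supply contract")))
g.add((doc, DCTERMS.creator, Literal("Records department")))
g.add((doc, DCTERMS.subject, Literal("procurement")))

# Serialize the knowledge base as Turtle so other Linked Data tools
# (and later mashup steps) can consume and combine it.
print(g.serialize(format="turtle"))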


Author(s):  
Rifiana Arief ◽  
Achmad Benny Mutiara ◽  
Tubagus Maulana Kusuma ◽  
Hustinawaty Hustinawaty

This research proposed automated hierarchical classification of scanned documents whose content contains unstructured text and special patterns (specific, short strings), using a convolutional neural network (CNN) and the regular expression method (REM). The research data are digital correspondence documents in PDF image format from the pusat data teknologi dan informasi (technology and information data center). The document hierarchy covers the type of letter, type of manuscript letter, origin of letter, and subject of letter. The research method consists of preprocessing, classification, and storage to a database. Preprocessing covers text extraction using Tesseract optical character recognition (OCR) and the formation of word document vectors with Word2Vec. Hierarchical classification uses the CNN to classify 5 types of letter and regular expressions to classify 4 types of manuscript letter, 15 origins of letter, and 25 subjects of letter. The classified documents are stored in the Hive database within a Hadoop big data architecture. The data set comprises 5,200 documents: 4,000 for training, 1,000 for testing, and 200 for classification prediction. In the trial on the 200 new documents, 188 were classified correctly and 12 incorrectly, giving an accuracy of 94% for the automated hierarchical classification. As future work, content-based search of the classified scanned documents can be developed.
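
Below is a minimal sketch of the OCR-plus-regular-expression stage described above, assuming pytesseract and pdf2image are installed; the patterns, category names, and file name are illustrative assumptions, not the rules used in the paper.

import re
import pytesseract
from pdf2image import convert_from_path

# Illustrative patterns only; the paper's actual rules for manuscript type,
# origin, and subject of letters are not reproduced here.
MANUSCRIPT_PATTERNS = {
    "decree": re.compile(r"\bdecree\b", re.IGNORECASE),
    "memorandum": re.compile(r"\bmemo(randum)?\b", re.IGNORECASE),
}

def ocr_pdf(path):
    # Render each PDF page to an image and extract its text with Tesseract.
    pages = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

def classify_manuscript(text):
    # Return the first category whose pattern matches the OCR text.
    for label, pattern in MANUSCRIPT_PATTERNS.items():
        if pattern.search(text):
            return label
    return "unknown"

text = ocr_pdf("letter.pdf")  # placeholder file name
print(classify_manuscript(text))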

