printed text
Recently Published Documents


TOTAL DOCUMENTS: 330 (FIVE YEARS: 80)
H-INDEX: 17 (FIVE YEARS: 2)

Author(s):  
Shilpa Pandey ◽  
Gaurav Harit

In this article, we address the problem of localizing text and symbolic annotations on the scanned image of a printed document. Previous approaches have treated annotation extraction as a binary classification into printed and handwritten text. In this work, we further subcategorize the annotations as underlines, encirclements, inline text, and marginal text. We have collected a new dataset of 300 documents comprising all classes of annotations marked around or in between printed text. Using the dataset as a benchmark, we report the results of two saliency formulations, CRF Saliency and Discriminant Saliency, for predicting salient patches, which can correspond to different types of annotations. We also compare our work with recent semantic segmentation techniques using deep models. Our analysis shows that Discriminant Saliency can be considered the preferred approach for fast localization of patches containing different types of annotations. The saliency models were learned on a small dataset, but still give performance comparable to the deep networks for pixel-level semantic segmentation. We show that saliency-based methods give better outcomes with limited annotated data than more sophisticated segmentation techniques that require a large training set to learn the model.
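As a rough illustration of patch-level annotation localization (not the paper's CRF Saliency or Discriminant Saliency formulation), the sketch below tiles a grayscale page into fixed-size patches, computes simple intensity and gradient features, and scores each patch with a logistic-regression classifier. The patch size, features, and thresholds are all illustrative assumptions.

```python
# Hypothetical sketch of patch-level annotation localization: split a scanned
# page into fixed-size patches, compute simple intensity/edge features, and
# score each patch with a trained classifier. This is a generic stand-in, not
# the CRF/Discriminant Saliency models used in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

PATCH = 32  # assumed patch size in pixels


def patch_features(page: np.ndarray):
    """Yield (row, col, feature_vector) for each PATCH x PATCH tile of a grayscale page."""
    h, w = page.shape
    for r in range(0, h - PATCH + 1, PATCH):
        for c in range(0, w - PATCH + 1, PATCH):
            tile = page[r:r + PATCH, c:c + PATCH].astype(np.float32) / 255.0
            gy, gx = np.gradient(tile)
            feats = [tile.mean(), tile.std(), np.abs(gx).mean(), np.abs(gy).mean()]
            yield r, c, feats


def train_patch_classifier(pages, masks):
    """pages: grayscale arrays; masks: same-shaped binary arrays marking annotated pixels."""
    X, y = [], []
    for page, mask in zip(pages, masks):
        for r, c, feats in patch_features(page):
            X.append(feats)
            # A patch counts as annotated if enough of its pixels are marked (threshold assumed).
            y.append(int(mask[r:r + PATCH, c:c + PATCH].mean() > 0.1))
    return LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))


def salient_patches(clf, page, threshold=0.5):
    """Return top-left corners of patches predicted to contain annotations."""
    return [(r, c) for r, c, feats in patch_features(page)
            if clf.predict_proba([feats])[0, 1] > threshold]
```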


Author(s):  
Gulfeshan Parween

Abstract: In this paper, we present a scheme to develop a complete OCR system for printed English uppercase letters of different fonts and sizes, so that the system can be used in banking, corporate, legal, and other industries. The OCR system consists of different modules: preprocessing, segmentation, feature extraction, and recognition. The preprocessing step includes image gray-level conversion and binarization. After the features of the segmented characters are extracted, an artificial neural network can be used for character recognition. Efforts have been made to improve character recognition performance using artificial neural network techniques. The proposed OCR system is capable of accepting printed document images from a file and is implemented in MATLAB R2014a. Key words: OCR, Printed text, Barcode recognition
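The pipeline described above (gray-level conversion, binarization, character segmentation, feature extraction, and neural-network recognition) can be sketched as follows. OpenCV and scikit-learn stand in for the MATLAB R2014a implementation, and the thresholds, crop size, and MLP configuration are assumptions for illustration, not the author's settings.

```python
# Minimal sketch of a printed-character OCR pipeline: grayscale conversion,
# Otsu binarization, connected-component character segmentation, pixel
# features, and a small neural-network classifier.
import cv2
import numpy as np
from sklearn.neural_network import MLPClassifier


def segment_characters(image_bgr, size=(16, 16)):
    """Binarize a printed-text image and return flattened, resized character crops."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    crops = []
    for i in range(1, n):          # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < 20:              # drop speckle noise (threshold assumed)
            continue
        crop = cv2.resize(binary[y:y + h, x:x + w], size)
        crops.append(crop.flatten() / 255.0)  # raw pixel features for the classifier
    return np.array(crops)


# Training on labelled character crops (labels would be the uppercase letters A-Z).
# X_train / y_train are assumed to come from a prepared dataset:
# clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X_train, y_train)
# predictions = clf.predict(segment_characters(cv2.imread("scan.png")))
```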


2021 ◽  
Vol 4 (5) ◽  
pp. 1183-1198
Author(s):  
Sergey S. Sidorovich

The Institute of Oriental Manuscripts of the Russian Academy of Sciences possesses a xylographed fragment in classical Mongolian script with a handwritten text on the reverse side (call mark G 110 recto), which was obtained in 1909 during P. K. Kozlov’s expedition to Khara-Khoto. The printed text in classical Mongolian script, with several interlinear glosses in Chinese and a page footer (a transcription of the Chinese name of the chapter and the page number), was read by the Soviet Orientalist N. Ts. Munkuyev more than 50 years ago. Munkuyev dated it to the 14th century based on its paleographic peculiarities. Moreover, based on the official history Yuan shi, he supposed that the text might be a Mongolian translation of the legislative code Da Yuan tong-zhi and suggested two possible versions of the original Chinese name of the chapter, of which an incorrect one was unfortunately chosen. Since Da Yuan tong-zhi has not been preserved in full and the major part of the written monument, including the chapters of interest, was lost, the corresponding text could not be located, and the mistake in the reconstruction of the chapter name also went undetected. However, in 2002 a part of the Zhi-zheng tiao-ge code, promulgated in 1346 and intended to replace the outdated Da Yuan tong-zhi, was found in South Korea. In one of his previous articles, the author has shown that both codes were built according to a general pattern elaborated as far back as the Tang epoch (618–907). This enabled reconstruction of the name of the chapter mentioned in the fragment. Fortunately, the surviving part of the Zhi-zheng tiao-ge code contains the required chapters, and the Chinese glosses in the fragment allowed us to find the original Chinese text, which turned out to be a document dated 1303 that, judging by the date, was evidently included in both codes. The article also contains the Chinese text of the document and its annotated translation.


2021 ◽  
Vol 11 (6) ◽  
pp. 7968-7973
Author(s):  
M. Kazmi ◽  
F. Yasir ◽  
S. Habib ◽  
M. S. Hayat ◽  
S. A. Qazi

Urdu Optical Character Recognition (OCR) based on character-level recognition (the analytical approach) is less popular than ligature-level recognition (the holistic approach) due to its added complexity and the overlapping of characters and strokes. This paper presents a holistic Urdu ligature extraction technique. The proposed Photometric Ligature Extraction (PLE) technique is independent of font size and column layout and is capable of handling non-overlapping as well as all inter- and intra-overlapping ligatures. It uses a customized photometric filter along with X-shearing, padding, and connected component analysis to extract complete ligatures instead of extracting primary and secondary ligatures separately. A total of ~267,800 ligatures were extracted from scanned Urdu Nastaliq printed text images with an accuracy of 99.4%. Thus, the proposed framework outperforms existing Urdu Nastaliq text extraction and segmentation algorithms. The proposed PLE framework can also be applied to other languages that use the Nastaliq script style, such as Arabic, Persian, Pashto, and Sindhi.
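A minimal sketch of two of the building blocks named above, X-shearing and connected component analysis, is given below. It does not reproduce the customized photometric filter, and the shear factor, padding, and area threshold are illustrative assumptions rather than the paper's parameters.

```python
# Hedged sketch: apply an X-shear (a horizontal shear that reduces the diagonal
# overlap of Nastaliq ligatures), then extract ligature candidates with
# connected-component analysis. Merging detached diacritics with their base
# strokes (which the paper handles via its photometric filter) is not shown.
import cv2
import numpy as np


def extract_ligatures(binary_page: np.ndarray, shear: float = 0.25, pad: int = 10):
    """binary_page: 0/255 image with ink as 255. Returns ligature crops from the sheared page."""
    padded = cv2.copyMakeBorder(binary_page, pad, pad, pad, pad,
                                cv2.BORDER_CONSTANT, value=0)
    # X-shear: x' = x + shear * y, implemented as an affine warp.
    M = np.float32([[1, shear, 0],
                    [0, 1,     0]])
    new_w = padded.shape[1] + int(shear * padded.shape[0])
    sheared = cv2.warpAffine(padded, M, (new_w, padded.shape[0]))

    n, _, stats, _ = cv2.connectedComponentsWithStats(sheared)
    crops = []
    for i in range(1, n):                 # skip background label 0
        x, y, w, h, area = stats[i]
        if area < 30:                     # ignore dots/noise (threshold assumed)
            continue
        crops.append(sheared[y:y + h, x:x + w])
    return crops
```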


2021 ◽  
Author(s):  
◽  
Elspeth Jane Simms

Victor Hugo’s character, Claude Frollo, expressed Hugo’s linguistic analogy for architecture in his novel of 1831, Notre-Dame de Paris. Frollo directs the eyes of his companions from the book resting on his desk to the shadow of the nearby Notre-Dame cathedral, stating: ‘This will kill that’. Hugo expressed the belief that, prior to the printing press, the communication of mankind occurred through architecture. His concern was for the fate of architecture following the invention of a new form of communication: the printed text. This thesis questions the concern that print will ‘kill’ architecture through an exploration of architectural research and design led by text. The validity of print as an experimental tool for architectural design is established through a range of outputs: visual and physical expression, creative writing, and formal writing. These design modes reveal unique architecture from within Hugo’s Notre-Dame de Paris. The outcomes of this research draw attention to the imaginative possibilities that text provides for architecture. It finds that architecture exists within text and allows for interpretation and conversion into both real and imagined space. It provides a framework through which this can occur within other texts, not just Notre-Dame de Paris. The conclusion is reached that text is a design tool which offers significant opportunities for the experimentation and design of architecture.



2021 ◽  
Author(s):  
Samundeswari S ◽  
Jeshoorin G ◽  
Vasanth M

Insurance companies are regularly provided with health check reports by buyers of insurance. Different forms of printed lab reports/health check reports have to be digitized for each captured parameter value. Optical Character Recognition (OCR) is used to convert images of handwritten, typed, or printed text and scanned documents into machine-encoded text in order to digitize the values from the report. Conversion to this standard set of digital values helps automate much of the backend approval process. We collect the reports from the user, read the values from each report, and scrutinize them. The values are checked against the company’s standard set, and the result is visualized using a visualization tool. The result is presented to the user so that the user can see whether he/she is eligible for an insurance claim. The foremost objective of this paper is to make the insurance backend approval process much easier and to provide a quick response to buyers.
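A hedged sketch of the described workflow follows: OCR a scanned report, extract numeric values for known parameters, and scrutinize them against reference ranges. The parameter names, reference ranges, and the use of pytesseract are assumptions for illustration only, not the paper's implementation.

```python
# Illustrative sketch: OCR a scanned lab report, pull out numeric values for
# known parameters with a regex, and flag values outside the insurer's ranges.
import re
import pytesseract
from PIL import Image

# Hypothetical reference ranges (parameter -> (low, high)) set by the insurer.
REFERENCE = {"glucose": (70, 140), "cholesterol": (0, 200), "hemoglobin": (12, 17)}


def scrutinize_report(image_path: str):
    """Return {parameter: (status, value)} for each reference parameter found in the report."""
    text = pytesseract.image_to_string(Image.open(image_path)).lower()
    results = {}
    for param, (low, high) in REFERENCE.items():
        match = re.search(rf"{param}\D*(\d+(?:\.\d+)?)", text)
        if not match:
            results[param] = ("missing", None)
            continue
        value = float(match.group(1))
        results[param] = ("ok" if low <= value <= high else "out of range", value)
    return results


# A report would qualify for automated approval only if every parameter is "ok":
# print(scrutinize_report("lab_report.png"))
```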


2021 ◽  
Vol 14 (2) ◽  
Author(s):  
Alicia Feis ◽  
Amanda Lallensack ◽  
Elizabeth Pallante ◽  
Melanie Nielsen ◽  
Nicole Demarco ◽  
...  

This study investigated reading comprehension, reading speed, and the quality of eye movements while reading on an iPad compared to printed text. Thirty-one visually normal subjects were enrolled. Two passages from the Visagraph standardized text were read, one on the iPad and one in print. Eye movement characteristics and comprehension were evaluated. Mean (SD) fixation duration was significantly longer with the iPad at 270 ms (40) than with printed text at 260 ms (40) (p=0.04). Subjects’ mean reading rate was significantly lower on the iPad at 294 words per minute (wpm) than with printed text at 318 wpm (p=0.03). Mean (SD) overall reading duration was significantly longer (p=0.02) on the iPad at 31 s (9.3) than with printed text at 28 s (8.0). Overall reading performance is lower with an iPad than with printed text in normal individuals. These findings might be more consequential for children and slower adult readers when they read using iPads.
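For readers who want to reproduce this kind of paired comparison, the sketch below runs a paired t-test on per-subject reading rates for the two media. The arrays are randomly generated placeholders shaped to resemble the reported group means, not the study's data; they only illustrate the analysis.

```python
# Illustrative paired comparison of per-subject reading rates (iPad vs. print).
# The data below are synthetic placeholders, not the study's measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
print_wpm = rng.normal(318, 40, size=31)              # hypothetical print reading rates
ipad_wpm = print_wpm - rng.normal(24, 15, size=31)    # hypothetically slower iPad rates

t_stat, p_value = stats.ttest_rel(ipad_wpm, print_wpm)
print(f"mean print = {print_wpm.mean():.0f} wpm, mean iPad = {ipad_wpm.mean():.0f} wpm")
print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}")
```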

