PWDB_13: A Corpus of Word-Level Printed Document Images from Thirteen Official Indic Scripts

Author(s):  
Sk. Md. Obaidullah ◽  
Chayan Halder ◽  
Nibaran Das ◽  
Kaushik Roy
Author(s):  
Sk Md Obaidullah ◽  
Chitrita Goswami ◽  
K. C. Santosh ◽  
Nibaran Das ◽  
Chayan Halder ◽  
...  

We present a novel approach for separating Indic scripts with ‘matra’, which is used as a precursor to advance and/or ease subsequent handwritten script identification in multi-script documents. In our study, among state-of-the-art features and classifiers, an optimized fractal geometry analysis and random forest are found to be the best performers for distinguishing scripts with ‘matra’ from their counterparts. For validation, a total of 1204 document images are used, where two scripts with ‘matra’, Bangla and Devanagari, are considered positive samples and two other scripts, Roman and Urdu, are considered negative samples. With this precursor, overall script identification improves by more than 5.13% in accuracy and runs 1.17 times faster than a conventional system.
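The abstract does not say which fractal features are used or how they are optimized. As a rough illustration only, the sketch below estimates a box-counting dimension for a binarized image, which is one common fractal measure; the function names, the box sizes, and the choice of box counting itself are assumptions, not the authors' method.

```python
import math


def box_count(pixels, size):
    """Count boxes of side `size` that contain at least one foreground pixel."""
    return len({(r // size, c // size) for r, c in pixels})


def box_counting_dimension(binary, sizes=(1, 2, 4, 8)):
    """Estimate the box-counting dimension of a binary image (list of rows).

    The estimate is the least-squares slope of log N(s) against log(1/s),
    where N(s) is the number of occupied boxes of side s.
    Assumption: `sizes` is an illustrative choice, not taken from the paper.
    """
    pixels = [(r, c)
              for r, row in enumerate(binary)
              for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("image has no foreground pixels")

    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]

    # Ordinary least-squares slope of ys on xs.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    filled = [[1] * 16 for _ in range(16)]           # solid square
    line = [[1] * 16] + [[0] * 16 for _ in range(15)]  # single stroke
    print(box_counting_dimension(filled))
    print(box_counting_dimension(line))
```

In a pipeline like the one described, a value of this kind would be one entry in a feature vector passed to a classifier such as a random forest; how the actual features are built is not stated in the abstract.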


2012 ◽  
Vol 3 (2) ◽  
pp. 263-265
Author(s):  
Rajesh Kumar ◽  
Dr. Srinivasan K.S

Document digitization with a scanner yields text document images with distortions that deteriorate the quality of the document. We propose a goal-oriented rectification methodology to recover the document from a distorted document image. Our approach relies on a coarse-to-fine strategy. First, a coarse rectification is accomplished by projecting the curved surface onto the plane, guided by the appearance of the textual content in the document image, while incorporating a transformation that does not depend on specific model primitives or scanner setup parameters. Second, normalization is applied at the word level, aiming to correct all the local distortions of the document image. Experimental results on various document images with a variety of distortions demonstrate the robustness and effectiveness of the proposed rectification methodology, which improves OCR accuracy. It is widely applicable to de-warping document images, including images captured from sculptures, cursive handwritten text, and text from palm leaves.


SIFT and LBP are two popular techniques for obtaining a “feature description” of an object. SIFT identifies key points, which are locations with distinct image information that are robust to scaling and rotation, whereas LBP transforms an image into an array of integer labels describing the small-scale appearance of the image. In this paper, we present an efficient method wherein the “feature description” of handwritten document images at word level is computed using SIFT and LBP. Identification of script type is done using KNN and SVM classifiers. Experimental results show that SVM performs better than KNN. Further, the proposed method is compared with other methods in the literature to demonstrate its efficacy.
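To make the "array of integer labels" concrete, here is a minimal sketch of the basic 3x3 LBP operator in plain Python. The neighbor ordering, the `>=` comparison, and the names `lbp_image` and `lbp_histogram` are illustrative assumptions; the abstract does not state which LBP variant the authors use.

```python
def lbp_image(img):
    """Return basic 3x3 LBP codes for the interior pixels of a grayscale image.

    `img` is a list of equal-length rows of intensities. Each interior pixel
    gets an 8-bit label: bit k is set when the k-th neighbor is greater than
    or equal to the center pixel.
    """
    # Neighbor offsets, clockwise starting at the top-left (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(img), len(img[0])
    codes = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes):
    """Return the normalized 256-bin histogram of LBP codes."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist


if __name__ == "__main__":
    patch = [[9, 0, 0],
             [0, 5, 0],
             [0, 0, 0]]
    print(lbp_image(patch))
```

The normalized histogram is the kind of fixed-length vector that can be handed to a KNN or SVM classifier, regardless of the size of the word image.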


2020 ◽  
Vol 51 (3) ◽  
pp. 544-560 ◽  
Author(s):  
Kimberly A. Murphy ◽  
Emily A. Diehm

Purpose Morphological interventions promote gains in morphological knowledge and in other oral and written language skills (e.g., phonological awareness, vocabulary, reading, and spelling), yet we have a limited understanding of critical intervention features. In this clinical focus article, we describe a relatively novel approach to teaching morphology that considers its role as the key organizing principle of English orthography. We also present a clinical example of such an intervention delivered during a summer camp at a university speech and hearing clinic. Method Graduate speech-language pathology students provided a 6-week morphology-focused orthographic intervention to children in first through fourth grade (n = 10) who demonstrated word-level reading and spelling difficulties. The intervention focused children's attention on morphological families, teaching how morphology is interrelated with phonology and etymology in English orthography. Results Comparing pre- and posttest scores, children demonstrated improvement in reading and/or spelling abilities, with the largest gains observed in spelling affixes within polymorphemic words. Children and their caregivers reacted positively to the intervention. Therefore, data from the camp offer preliminary support for teaching morphology within the context of written words, and the intervention appears to be a feasible approach for simultaneously increasing morphological knowledge, reading, and spelling. Conclusion Children with word-level reading and spelling difficulties may benefit from a morphology-focused orthographic intervention, such as the one described here. Research on the approach is warranted, and clinicians are encouraged to explore its possible effectiveness in their practice.
Supplemental Material https://doi.org/10.23641/asha.12290687


2020 ◽  
Vol 29 (4) ◽  
pp. 2170-2188
Author(s):  
Lindsey R. Squires ◽  
Sara J. Ohlfest ◽  
Kristen E. Santoro ◽  
Jennifer L. Roberts

Purpose The purpose of this systematic review was to determine evidence of a cognate effect for young multilingual children (ages 3;0–8;11 [years;months], preschool to second grade) in terms of task-level and child-level factors that may influence cognate performance. Cognates are pairs of vocabulary words that share meaning with similar phonology and/or orthography in more than one language, such as rose – rosa (English–Spanish) or carrot – carotte (English–French). Despite the cognate advantage noted with older bilingual children and bilingual adults, there has been no systematic examination of the cognate research in young multilingual children. Method We conducted searches of multiple electronic databases and hand-searched article bibliographies for studies that examined young multilingual children's performance with cognates based on study inclusion criteria aligned to the research questions. Results The review yielded 16 articles. The majority of the studies (12/16, 75%) demonstrated a positive cognate effect for young multilingual children (measured in higher accuracy, faster reaction times, and doublet translation equivalents on cognates as compared to noncognates). However, not all bilingual children demonstrated a cognate effect. Both task-level factors (cognate definition, type of cognate task, word characteristics) and child-level factors (level of bilingualism, age) appear to influence young bilingual children's performance on cognates. Conclusions Contrary to early 1990s research, current researchers suggest that even young multilingual children may demonstrate sensitivity to cognate vocabulary words. Given the limits in study quality, more high-quality research is needed, particularly to address test validity in cognate assessments, to develop appropriate cognate definitions for children, and to refine word-level features. Only one study included a brief instruction prior to assessment, warranting cognate treatment studies as an area of future need. 
Supplemental Material https://doi.org/10.23641/asha.12753179

