EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data

Author(s):  
Shrinidhi Kanchi ◽  
Alain Pagani ◽  
Hamam Mokayed ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  
...  

Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches to document classification: image-based and multimodal. Image-based approaches rely solely on the inherent visual cues of the document images. In contrast, multimodal approaches co-learn visual and textual features and have proved more effective. Nonetheless, these approaches require a large amount of data. This paper presents a novel approach for document classification that works with a small amount of data and outperforms other approaches. The proposed approach incorporates a hierarchical attention network (HAN) for the textual stream and EfficientNet-B0 for the image stream. The HAN in the textual stream uses dynamic word embeddings from a fine-tuned BERT model and incorporates both word-level and sentence-level features. While earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small amount of data (Tobacco-3482). To this end, we trained the neural network on Tobacco-3482 from scratch. We thereby outperform the state of the art, obtaining an accuracy of 90.3%, which corresponds to a relative error reduction of 7.9%.
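
The two-level attention in the textual stream can be illustrated with a minimal sketch. It assumes plain attention pooling: word vectors (stand-ins for the fine-tuned BERT embeddings) are pooled into sentence vectors with a word-level context vector, and sentence vectors are pooled into a document vector with a sentence-level context vector. The recurrent encoders and learned projections of a full HAN, the EfficientNet-B0 image stream, and the fusion step are all omitted; the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Score each vector against a context vector and return the weighted sum.

    Simplification (assumption): the learned projection tanh(W h + b) of a HAN
    is replaced by an element-wise tanh(h), so no weight matrix is needed.
    """
    scores = [dot([math.tanh(x) for x in v], context) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def han_document_vector(
    document: List[List[Vector]], word_context: Vector, sentence_context: Vector
) -> Tuple[Vector, List[float]]:
    """Word-level attention builds sentence vectors; sentence-level attention builds the document vector."""
    sentence_vectors = [attention_pool(sentence, word_context)[0] for sentence in document]
    return attention_pool(sentence_vectors, sentence_context)


# Toy document: two sentences of 2-d word vectors (hypothetical values).
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]],
]
doc_vec, sentence_weights = han_document_vector(doc, word_context=[1.0, 0.0], sentence_context=[0.0, 1.0])
print(doc_vec, sentence_weights)
```

The sentence weights returned alongside the document vector show which sentences the pooling emphasized, which is the interpretability benefit usually cited for hierarchical attention.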

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ju Fan ◽  
Yuanchun Jiang ◽  
Yezheng Liu ◽  
Yonghang Zhou

Purpose: Course recommendations are important for improving learner satisfaction and reducing dropout rates on massive open online course (MOOC) platforms. This study aims to propose an interpretable method of analyzing students' learning behaviors and recommending MOOCs by integrating multiple data sources.

Design/methodology/approach: The study proposes a deep learning method of recommending MOOCs to students based on a multi-attention mechanism comprising learning records attention, word-level review attention, sentence-level review attention and course description attention. The proposed model is validated using real-world data consisting of the learning records of 6,628 students for 1,789 courses and 65,155 reviews.

Findings: The main contribution of this study is its exploration of multiple sources of unstructured information using the proposed multi-attention network model. It provides an interpretable strategy for analyzing students' learning behaviors and conducting personalized MOOC recommendations.

Practical implications: The findings suggest that MOOC platforms must fully utilize the information implied in course reviews to extract personalized learning preferences.

Originality/value: This study is the first attempt to recommend MOOCs by exploring students' preferences in course reviews. The proposed multi-attention mechanism improves the interpretability of MOOC recommendations.
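
The learning records attention can be sketched in a generic form. This is an illustration under assumptions, not the study's model: each course a student has taken is weighted by its similarity to a candidate course, the weighted sum forms a candidate-specific user vector, and candidates are ranked by how well they match that vector. The review and description attentions would follow the same pattern over word and sentence vectors. The course names and 2-d embeddings below are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_user_vector(history: List[Vector], candidate: Vector) -> Vector:
    """Weight each course in the learning record by its relevance to the candidate course."""
    weights = softmax([dot(course, candidate) for course in history])
    return [sum(w * course[d] for w, course in zip(weights, history)) for d in range(len(candidate))]


def score(history: List[Vector], candidate: Vector) -> float:
    """Preference score: match between the attended user vector and the candidate."""
    return dot(attentive_user_vector(history, candidate), candidate)


def recommend(history: List[Vector], catalog: Dict[str, Vector], k: int) -> List[str]:
    """Return the k highest-scoring course names from a catalog."""
    return sorted(catalog, key=lambda name: score(history, catalog[name]), reverse=True)[:k]


# Hypothetical 2-d course embeddings: [quantitative, humanities].
history = [[1.0, 0.0], [0.9, 0.1]]
catalog = {"ml-intro": [1.0, 0.0], "poetry": [0.0, 1.0], "stats": [0.7, 0.3]}
print(recommend(history, catalog, k=2))
```

The attention weights themselves are what make such a model interpretable: they indicate which past courses drove each recommendation.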


2020 ◽  
Vol 51 (3) ◽  
pp. 544-560 ◽  
Author(s):  
Kimberly A. Murphy ◽  
Emily A. Diehm

Purpose: Morphological interventions promote gains in morphological knowledge and in other oral and written language skills (e.g., phonological awareness, vocabulary, reading, and spelling), yet we have a limited understanding of critical intervention features. In this clinical focus article, we describe a relatively novel approach to teaching morphology that considers its role as the key organizing principle of English orthography. We also present a clinical example of such an intervention delivered during a summer camp at a university speech and hearing clinic.

Method: Graduate speech-language pathology students provided a 6-week morphology-focused orthographic intervention to children in first through fourth grade (n = 10) who demonstrated word-level reading and spelling difficulties. The intervention focused children's attention on morphological families, teaching how morphology is interrelated with phonology and etymology in English orthography.

Results: Comparing pre- and posttest scores, children demonstrated improvement in reading and/or spelling abilities, with the largest gains observed in spelling affixes within polymorphemic words. Children and their caregivers reacted positively to the intervention. Therefore, data from the camp offer preliminary support for teaching morphology within the context of written words, and the intervention appears to be a feasible approach for simultaneously increasing morphological knowledge, reading, and spelling.

Conclusion: Children with word-level reading and spelling difficulties may benefit from a morphology-focused orthographic intervention, such as the one described here. Research on the approach is warranted, and clinicians are encouraged to explore its possible effectiveness in their practice.

Supplemental Material: https://doi.org/10.23641/asha.12290687


Author(s):  
Dang Van Thin ◽  
Ngan Luu-Thuy Nguyen ◽  
Tri Minh Truong ◽  
Lac Si Le ◽  
Duy Tin Vo

Aspect-based sentiment analysis has been studied in both research and industrial communities in recent years. For low-resource languages, standard benchmark corpora play an important role in the development of methods. In this article, we introduce two sentence-level benchmark corpora, the largest of their kind, for two tasks in Vietnamese: Aspect Category Detection and Aspect Polarity Classification. Our corpora cover the restaurant and hotel domains and are annotated with high inter-annotator agreement. The release of our corpora should help advance the low-resource language processing community. In addition, we deploy and compare supervised learning methods using single-task and multi-task approaches based on deep learning architectures. Experimental results on our corpora show that the multi-task approach based on the BERT architecture outperforms the other neural network architectures and the single-task approach. Our corpora and source code are published at the site given in the article's footnote.
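
One common way to set up the multi-task variant is to train a shared encoder with a joint loss: a multi-label binary cross-entropy for aspect category detection plus a categorical cross-entropy for polarity classification. The sketch below computes such a loss for one sentence, assuming the model has already produced probabilities; the single polarity label, the weighting parameter `polarity_weight`, and all function names are hypothetical, and the article's exact loss is not stated in the abstract.

```python
import math
from typing import List


def binary_cross_entropy(probs: List[float], labels: List[int], eps: float = 1e-12) -> float:
    """Mean BCE over aspect categories (multi-label detection)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def categorical_cross_entropy(probs: List[float], label: int, eps: float = 1e-12) -> float:
    """Cross-entropy for a single-label polarity prediction."""
    return -math.log(max(probs[label], eps))


def multitask_loss(
    category_probs: List[float],
    category_labels: List[int],
    polarity_probs: List[float],
    polarity_label: int,
    polarity_weight: float = 1.0,
) -> float:
    """Joint objective: both task losses backpropagate into one shared encoder."""
    return binary_cross_entropy(category_probs, category_labels) + polarity_weight * categorical_cross_entropy(
        polarity_probs, polarity_label
    )


# Hypothetical: 3 aspect categories, polarity classes [negative, neutral, positive].
print(multitask_loss([0.9, 0.2, 0.1], [1, 0, 0], [0.1, 0.2, 0.7], 2))
```

In a single-task setup each loss would be minimized by a separate model; sharing the encoder lets signal from one task regularize the other, which is the usual motivation for the multi-task design.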


2021 ◽  
Vol 14 (4) ◽  
pp. 1-24
Author(s):  
Sushant Kafle ◽  
Becca Dingman ◽  
Matt Huenerfauth

There are style guidelines for authors who highlight important words in static text, e.g., bolded words in student textbooks, yet little research has investigated highlighting in dynamic texts, e.g., captions during educational videos for Deaf or Hard of Hearing (DHH) users. In our experimental study, DHH participants subjectively compared design parameters for caption highlighting, including decoration (underlining vs. italicizing vs. boldfacing), granularity (sentence level vs. word level), and whether to highlight only the first occurrence of a repeating keyword. In partial contrast to recommendations in prior research, which had not been based on experimental studies with DHH users, we found that DHH participants preferred boldface, word-level highlighting in captions. Our empirical results provide guidance for the design of keyword highlighting during captioned videos for DHH users, especially in educational video genres.
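
The design parameters compared in the study can be made concrete with a small caption renderer. This is a hypothetical sketch, not software from the study: it applies word-level highlighting with a chosen decoration and can restrict highlighting to the first occurrence of each keyword across a video's captions. HTML tags are assumed as the output markup, and sentence-level granularity is not shown.

```python
from typing import Iterable, List

# Opening/closing markup for each decoration (HTML tags are an assumption).
DECORATIONS = {
    "bold": ("<b>", "</b>"),
    "italic": ("<i>", "</i>"),
    "underline": ("<u>", "</u>"),
}


def highlight_captions(
    captions: List[str],
    keywords: Iterable[str],
    decoration: str = "bold",
    first_occurrence_only: bool = False,
) -> List[str]:
    """Word-level keyword highlighting across a sequence of caption lines."""
    open_tag, close_tag = DECORATIONS[decoration]
    targets = {k.lower() for k in keywords}
    seen = set()  # keywords already highlighted earlier in the video
    result = []
    for caption in captions:
        marked = []
        for word in caption.split():
            key = word.strip(".,!?;:").lower()  # ignore surrounding punctuation when matching
            if key in targets and not (first_occurrence_only and key in seen):
                marked.append(f"{open_tag}{word}{close_tag}")
                seen.add(key)
            else:
                marked.append(word)
        result.append(" ".join(marked))
    return result


captions = ["Photosynthesis needs light.", "Photosynthesis makes sugar."]
print(highlight_captions(captions, ["photosynthesis"], first_occurrence_only=True))
```

The default arguments correspond to the configuration participants preferred (boldface, word level).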


2019 ◽  
Author(s):  
Roberta Rocca ◽  
Kenny R. Coventry ◽  
Kristian Tylén ◽  
Marlene Staib ◽  
Torben E. Lund ◽  
...  

Spatial demonstratives are powerful linguistic tools used to establish joint attention. Identifying the meaning of semantically underspecified expressions like “this one” hinges on the integration of linguistic and visual cues, attentional orienting and pragmatic inference. This synergy between language and extralinguistic cognition is pivotal to language comprehension in general, but it is especially prominent in demonstratives.

In this study, we aimed to elucidate which neural architectures enable this intertwining of language and extralinguistic cognition, using a naturalistic fMRI paradigm. In our experiment, 28 participants listened to a specially crafted dialogical narrative with a controlled number of spatial demonstratives. A fast multiband EPI acquisition sequence (TR = 388 ms) combined with finite impulse response (FIR) modelling of the hemodynamic response was used to capture signal changes at word-level resolution.

We found that spatial demonstratives bilaterally engage a network of parietal areas, including the supramarginal gyrus, the angular gyrus, and the precuneus, implicated in information integration and visuospatial processing. Moreover, demonstratives recruit frontal regions, including the right frontal eye field (FEF), implicated in attentional orienting and reference-frame shifts. Finally, using multivariate similarity analyses, we provide evidence for a general involvement of the dorsal (“where”) stream in the processing of spatial expressions, as opposed to ventral pathways encoding object semantics.

Overall, our results suggest that language processing relies on a distributed architecture, recruiting neural resources for perception, attention, and extralinguistic aspects of cognition in a dynamic and context-dependent fashion.
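
FIR modelling replaces a fixed hemodynamic response shape with one regressor per post-onset delay. A minimal sketch of building such a design matrix follows, assuming events are assigned to the scan during which they start; the TR of 0.388 s comes from the abstract, while the onsets, scan count and number of lags are hypothetical, and the model fitting step is omitted.

```python
from typing import List


def fir_design_matrix(onsets_s: List[float], n_scans: int, tr_s: float, n_lags: int) -> List[List[int]]:
    """Return an n_scans x n_lags FIR design matrix.

    Column k is an indicator for "k scans after an event onset", so the fitted
    weight for column k estimates the mean response at that delay without
    assuming a canonical hemodynamic response shape.
    """
    matrix = [[0] * n_lags for _ in range(n_scans)]
    for onset in onsets_s:
        onset_scan = int(onset // tr_s)  # scan during which the event starts
        for lag in range(n_lags):
            scan = onset_scan + lag
            if scan < n_scans:
                matrix[scan][lag] += 1  # overlapping events at the same delay add up
    return matrix


# TR of 0.388 s as in the abstract; onsets and number of lags are hypothetical.
X = fir_design_matrix([0.0, 1.0], n_scans=8, tr_s=0.388, n_lags=3)
for row in X:
    print(row)
```

A short TR matters here: each FIR column spans one TR, so a 388 ms TR lets the estimated response be sampled finely enough to separate responses to individual words.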


Author(s):  
Yazan Shaker Almahameed ◽  
May Al-Shaikhli

The current study aimed to investigate the salient syntactic and semantic errors made by Jordanian learners of English as a foreign language when writing in English. Writing poses a great challenge for both native and non-native speakers of English, since it involves employing most language sub-systems, such as grammar, vocabulary, spelling and punctuation. A total of 30 Jordanian learners of English as a foreign language participated in the study. The participants were instructed to write a composition of no more than 150 words on a selected topic. Essays were collected and analyzed statistically. The results showed that the participants' syntactic errors were varied: eleven types of syntactic errors were committed, namely verb tense, agreement, auxiliary, conjunctions, word order, resumptive pronouns, null subject, double subject, superlative, comparative and possessive pronouns. Among syntactic errors, verb tense errors were the most frequent (33%). The results additionally revealed two types of semantic errors: errors at the sentence level and errors at the word level. Word-level errors far outnumbered sentence-level errors, accounting for 82% and 18% of semantic errors, respectively. It can be concluded that the syntactic and semantic knowledge of Jordanian learners of English is still insufficient.


2019 ◽  
Vol 16 (2) ◽  
pp. 359-380
Author(s):  
Zhehua Piao ◽  
Sang-Min Park ◽  
Byung-Won On ◽  
Gyu Choi ◽  
Myong-Soon Park

Product reputation mining systems can help customers make buying decisions about a product of interest. They can also help enterprises investigate consumer preferences for recently released products. Unlike conventional manual surveys, such systems give quick survey results at low cost. In this article, we propose a novel product reputation mining approach based on three levels of analysis: word, sentence, and aspect. Given a target product, the aspect-level method assigns the sentences of a review document to the desired aspects. The sentence-level method is a graph-based model for quantifying the importance of sentences. The word-level method computes both the importance and the sentiment orientation of words. Aggregating these scores, the proposed approach measures the reputation tendency and preferred intensity and selects the top-k informative review documents about the product. To validate the proposed method, we experimented with review documents about the Kia K5 from Kia Motors. Our experimental results show that our method is more helpful than the existing lexicon-based approach in both empirical and statistical studies.
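
The abstract describes the sentence-level method only as "a graph-based model". A common way to score sentence importance on a graph is a TextRank-style PageRank over a sentence-similarity matrix, sketched below under that assumption; how the similarities are computed is left out, and the matrices shown are hypothetical.

```python
from typing import List


def pagerank_scores(
    similarity: List[List[float]], damping: float = 0.85, iterations: int = 200, tol: float = 1e-10
) -> List[float]:
    """Power iteration for weighted PageRank over a sentence-similarity graph.

    similarity[j][i] is the edge weight from sentence j to sentence i; self-loops are ignored.
    """
    n = len(similarity)
    out_weight = [sum(row[j] for j in range(n) if j != i) for i, row in enumerate(similarity)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            incoming = sum(
                similarity[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            new.append((1 - damping) / n + damping * incoming)
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Hypothetical star graph: sentence 0 resembles both others, which do not resemble each other.
star = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(pagerank_scores(star))
```

Sentences similar to many others receive high scores, which matches the intuition that a central sentence summarizes what many reviewers say.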


2011 ◽  
Vol 328-330 ◽  
pp. 1763-1767
Author(s):  
Jian Qiang Shen ◽  
Xuan Zou

A novel approach is proposed for measuring fabric texture orientations and recognizing weave patterns. The wavelet transform is suited to fabric image decomposition, and the Radon transform is suited to line detection in fabric texture. Since different weave patterns have their own regular orientations in the original image and in the sub-band images produced by wavelet decomposition, these orientation features are extracted and used as inputs to self-organizing map (SOM) and learning vector quantization (LVQ) neural networks to achieve automatic recognition of fabric weaves. The experimental results show that the LVQ network is more effective than the SOM. The contribution of this study is that the approach can identify not only fundamental fabric weaves but also double-layer weaves and some derivative twill weaves, such as angular twill and pointed twill.
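
For readers unfamiliar with LVQ, the sketch below implements the standard LVQ1 update rule: the prototype nearest to a training sample moves toward it if their class labels match and away from it otherwise. The paper's LVQ variant, feature vectors and parameters are not given in the abstract; the 2-d features, class names and learning schedule below are hypothetical stand-ins for the wavelet and Radon orientation features.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Prototype = Tuple[Vector, str]


def nearest(prototypes: List[Prototype], x: Vector) -> int:
    """Index of the prototype closest to x in Euclidean distance."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i][0], x))


def train_lvq1(
    samples: List[Tuple[Vector, str]],
    prototypes: List[Prototype],
    learning_rate: float = 0.1,
    epochs: int = 20,
    seed: int = 0,
) -> List[Prototype]:
    rng = random.Random(seed)
    protos = [(list(v), label) for v, label in prototypes]  # copy so inputs are not mutated
    data = list(samples)
    for epoch in range(epochs):
        rng.shuffle(data)
        rate = learning_rate * (1 - epoch / epochs)  # linearly decaying step size
        for x, y in data:
            vec, label = protos[nearest(protos, x)]
            sign = 1.0 if label == y else -1.0  # attract if correct class, repel otherwise
            for d in range(len(vec)):
                vec[d] += sign * rate * (x[d] - vec[d])
    return protos


def classify(protos: List[Prototype], x: Vector) -> str:
    return protos[nearest(protos, x)][1]


# Hypothetical 2-d orientation features for two weave classes.
samples = [([0.1, 0.9], "plain"), ([0.2, 0.8], "plain"), ([0.0, 1.0], "plain"),
           ([0.9, 0.1], "twill"), ([0.8, 0.3], "twill"), ([1.0, 0.0], "twill")]
protos = train_lvq1(samples, [([0.5, 0.6], "plain"), ([0.6, 0.4], "twill")])
print(classify(protos, [0.1, 0.95]), classify(protos, [0.95, 0.05]))
```

Unlike an SOM, which is trained without labels and needs a separate labeling step, LVQ uses the class labels during training, which is one plausible reason for the reported advantage of LVQ.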


2015 ◽  
Vol 4 (2) ◽  
pp. 74-94
Author(s):  
Pawan Kumar Singh ◽  
Ram Sarkar ◽  
Mita Nasipuri

Script identification has been an active research topic in document image analysis over the last few decades. Accurate script recognition is paramount to many post-processing steps, such as automated document sorting, machine translation, and searching for text written in a particular script in a multilingual environment. For automatic processing of such documents through Optical Character Recognition (OCR) software, the script of each word must be identified before the word is fed to the OCR engine for that script. In this paper, a robust word-level handwritten script identification technique is proposed that uses texture-based features to identify words written in any of seven popular scripts, namely Bangla, Devanagari, Gurumukhi, Malayalam, Oriya, Telugu, and Roman. The texture-based features combine Histograms of Oriented Gradients (HOG) and moment invariants. The technique has been tested on 7,000 handwritten words, with each script contributing 1,000 words. Based on the identification accuracies and statistical significance testing of seven well-known classifiers, a Multi-Layer Perceptron (MLP) was chosen as the final classifier, which was then tested comprehensively using different numbers of folds and training epochs. The overall accuracy of the system is 94.7% using a 5-fold cross-validation scheme, which is notable considering the complexity and shape variation of these scripts. This is an extended version of the paper described in (Singh et al., 2014).
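
The HOG component of the feature set can be illustrated with a simplified single-cell version: gradients from central differences, a magnitude-weighted histogram of unsigned orientations (0 to 180 degrees), and L2 normalization. This is a sketch under assumptions; full HOG uses a grid of cells, block normalization and bin interpolation, the paper's parameters are not given in the abstract, and the moment invariants are not shown.

```python
import math
from typing import List


def hog_cell_histogram(image: List[List[float]], n_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations over the
    interior pixels of a grayscale image treated as a single cell, L2-normalized."""
    height, width = len(image), len(image[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal central difference
            gy = image[r + 1][c] - image[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


# A vertical stroke edge: dark on the left, bright on the right.
edge = [[0, 0, 1, 1, 1] for _ in range(5)]
print(hog_cell_histogram(edge))
```

Because a vertical edge produces purely horizontal gradients, all of the histogram mass falls into the first orientation bin; handwritten strokes in different scripts produce different orientation distributions, which is what makes such texture features discriminative.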

