document classification Latest Research Papers

EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data

10.20944/preprints202201.0061.v1 ◽

2022 ◽

Author(s):

Shrinidhi Kanchi ◽

Alain Pagani ◽

Hamam Mokayed ◽

Marcus Liwicki ◽

Didier Stricker ◽

...

Keyword(s):

Visual Cues ◽

Reduction Rate ◽

Document Classification ◽

Document Image ◽

Analysis Pipeline ◽

Attention Network ◽

Word Level ◽

Novel Approach ◽

The Neural Network ◽

Sentence Level

Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches to document classification: image-based and multimodal. Image-based approaches rely solely on the inherent visual cues of the document images. In contrast, multimodal approaches co-learn the visual and textual features, and they have proved to be more effective. Nonetheless, these approaches require a huge amount of data. This paper presents a novel approach for document classification that works with a small amount of data and outperforms other approaches. The proposed approach incorporates a hierarchical attention network (HAN) for the textual stream and EfficientNet-B0 for the image stream. The hierarchical attention network in the textual stream uses dynamic word embeddings from a fine-tuned BERT. HAN incorporates both word-level and sentence-level features. While earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small amount of data (Tobacco-3482). To this end, we trained the neural network on Tobacco-3482 from scratch. Thereby, we outperform the state of the art by obtaining an accuracy of 90.3%. This results in a relative error reduction rate of 7.9%.
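The two figures at the end of the abstract (90.3% accuracy, 7.9% relative error reduction) can be related with a short calculation. The sketch below assumes the usual definition of relative error reduction, (baseline error - new error) / baseline error; the abstract does not state which definition the authors used, nor the baseline accuracy, so the function names and the back-computed baseline are illustrative only. Under that assumption, the two reported numbers imply a previous best accuracy of roughly 89.5%.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline error removed by the new model.

    Assumes the common definition (baseline_err - new_err) / baseline_err;
    the abstract does not confirm that this is the definition used.
    """
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    if baseline_err <= 0.0:
        raise ValueError("baseline accuracy must be below 1.0")
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Invert the formula above: which baseline accuracy would give
    this relative error reduction for the reported new accuracy?"""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 - (1.0 - new_acc) / (1.0 - reduction)


if __name__ == "__main__":
    # Values taken from the abstract: 90.3% accuracy, 7.9% reduction.
    baseline = implied_baseline_accuracy(0.903, 0.079)
    print(f"implied baseline accuracy: {baseline:.4f}")
    print(f"round-trip reduction: {relative_error_reduction(baseline, 0.903):.4f}")
```

The back-computed baseline is a consistency check on the reported figures, not a number taken from the paper.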

XML document classification effectively using improved high-performance factor

International Journal of Engineering Systems Modelling and Simulation ◽

10.1504/ijesms.2022.10044365 ◽

2022 ◽

Vol 1 (1) ◽

pp. 1

Author(s):

Latha Parthiban ◽

S. Sahunthala ◽

Angelina Geetha

Keyword(s):

High Performance ◽

Document Classification ◽

Performance Factor ◽

Xml Document

Advanced Applications on Bilingual Document Analysis and Processing Systems

10.4018/978-1-6684-3690-5.ch032 ◽

2022 ◽

pp. 625-674

Author(s):

Shalini Puri ◽

Satya Prakash Singh

Keyword(s):

Feature Extraction ◽

Real World ◽

Classification System ◽

Real Life ◽

Document Analysis ◽

Document Classification ◽

Classification Systems ◽

Post Processing ◽

Dual Class ◽

Advanced Applications

Today, rapid digitization requires efficient bilingual classification systems for both non-image and image documents. Although many bilingual NLP and image-based systems provide solutions for real-world problems, they primarily focus on text extraction, identification, and recognition tasks with limited document types. This article traces the development of these systems and provides an overview of their methods, feature extraction techniques, document sets, classifiers, and accuracy for English-Hindi and other language pairs. The gaps found lead toward the idea of a generic and integrated bilingual English-Hindi document classification system, which classifies heterogeneous documents using a dual class feeder and two character corpora. Its non-image and image modules include pre- and post-processing stages and pre- and post-segmentation stages to classify documents into predefined classes. This article also discusses many real-life applications to societal and commercial issues. The analytical results show important findings of existing and proposed systems.

Document Classification by Order of Context, Concept and Semantic Relations: OCCSR

10.9734/bpi/mono/978-93-5547-265-6/ch7 ◽

2021 ◽

pp. 75-86

Author(s):

A. Venkata Ramana ◽

E. Kesavulu Reddy

Keyword(s):

Document Classification ◽

Semantic Relations

Hierarchical Document Classification by Conceptual and Semantic Similarities: CSS-HDC

10.9734/bpi/mono/978-93-5547-265-6/ch5 ◽

2021 ◽

pp. 55-64

Author(s):

A. Venkata Ramana ◽

E. Kesavulu Reddy

Keyword(s):

Document Classification

An Automated Knowledge Mining and Document Classification System with Multi-model Transfer Learning

Journal of System and Management Sciences ◽

10.33168/jsms.2021.0408 ◽

2021 ◽

Keyword(s):

Transfer Learning ◽

Classification System ◽

Document Classification ◽

Knowledge Mining ◽

Model Transfer ◽

Automated Knowledge

Movie Subtitle Document Classification Using Unsupervised Machine Learning Approach

10.1109/iccca52192.2021.9666391 ◽

2021 ◽

Author(s):

Md. Mehedi Hasan ◽

Sadia Tamim Dip ◽

T. M. Kamruzzaman ◽

Sonia Akter ◽

Imrus Salehin

Keyword(s):

Machine Learning ◽

Document Classification ◽

Learning Approach ◽

Unsupervised Machine Learning ◽

Machine Learning Approach

Comparison of Deep Learning Technologies in Legal Document Classification

10.1109/bigdata52589.2021.9671486 ◽

2021 ◽

Author(s):

Qian Han ◽

Derek Snaidauf

Keyword(s):

Deep Learning ◽

Document Classification ◽

Learning Technologies ◽

Legal Document

Hierarchical BERT with an adaptive fine-tuning strategy for document classification

Knowledge-Based Systems ◽

10.1016/j.knosys.2021.107872 ◽

2021 ◽

pp. 107872

Author(s):

Jun Kong ◽

Jin Wang ◽

Xuejie Zhang

Keyword(s):

Document Classification ◽

Fine Tuning ◽

Tuning Strategy

Towards an Intelligent Fuzzy-fusion Model for Identity Document Classification

10.1145/3487664.3487738 ◽

2021 ◽

Author(s):

Nouna Khandan ◽

Amin Beheshti ◽

Helia Farhood ◽

Matineh Pooshideh ◽

Mike Simpson ◽

...

Keyword(s):

Document Classification ◽

Fusion Model ◽

Identity Document ◽

Fuzzy Fusion
