DCGAN-Based Data Augmentation for Document Classification

Many organizations store their documents in digital form. Printing this volume of data and filing it in folders, rather than storing it digitally, is impractical, uneconomical, and environmentally harmful. Digitally stored documents also require an efficient way of retrieving their data. This article presents a real-time supervised learning technique for document classification based on a deep convolutional neural network (DCNN), which aims to reduce the impact of adverse document image elements such as signatures, marks, logos, and handwritten notes. The major steps of the proposed technique include data augmentation, feature extraction using pre-trained neural network models, feature fusion, and feature selection. We propose a novel data augmentation technique that balances the imbalanced dataset using the secondary dataset RVL-CDIP. The DCNN features are extracted using the VGG19 and AlexNet networks. The extracted features are fused, and a Pearson correlation coefficient-based technique is applied to the fused feature vector to select an optimized subset of features while removing redundant ones. The proposed technique is tested on the Tobacco3482 dataset, where a cubic support vector machine (SVM) classifier achieves a classification accuracy of 93.1%, demonstrating the validity of the proposed technique.
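The fusion and correlation-based selection steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes serial concatenation as the fusion rule and a greedy filter that drops any feature whose absolute Pearson correlation with an already-kept feature reaches a hypothetical threshold of 0.9. Random matrices stand in for the VGG19 and AlexNet features, and the helper names `fuse_features` and `select_uncorrelated` are hypothetical.

```python
import numpy as np


def fuse_features(feats_a: np.ndarray, feats_b: np.ndarray) -> np.ndarray:
    """Serial fusion (assumed): concatenate two (n_samples x d) feature matrices column-wise."""
    if feats_a.shape[0] != feats_b.shape[0]:
        raise ValueError("feature matrices must have the same number of samples")
    return np.concatenate([feats_a, feats_b], axis=1)


def select_uncorrelated(features: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Greedily keep columns whose |Pearson r| with every already-kept column is
    below `threshold` (hypothetical value); return the indices of the kept columns."""
    # Constant columns have undefined Pearson correlation, so skip them.
    candidates = np.flatnonzero(features.std(axis=0) > 0)
    corr = np.atleast_2d(np.abs(np.corrcoef(features[:, candidates], rowvar=False)))
    kept: list[int] = []
    for i in range(len(candidates)):
        if all(corr[i, j] < threshold for j in kept):
            kept.append(i)
    return candidates[kept]


rng = np.random.default_rng(0)
n = 200
vgg_like = rng.normal(size=(n, 6))   # stand-in for VGG19 features (toy width)
alex_like = rng.normal(size=(n, 4))  # stand-in for AlexNet features (toy width)
alex_like[:, 0] = vgg_like[:, 2] + 0.01 * rng.normal(size=n)  # inject a redundant feature

fused = fuse_features(vgg_like, alex_like)
idx = select_uncorrelated(fused, threshold=0.9)
print(fused.shape, idx.tolist())
```

Here fused column 6 (a near-copy of column 2) is discarded and the other nine columns are kept. In a full pipeline the selected columns would then be passed to a classifier; "cubic SVM" usually denotes an SVM with a degree-3 polynomial kernel.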

Ontology-Guided Data Augmentation for Medical Document Classification

Artificial Intelligence in Medicine - Lecture Notes in Computer Science ◽

10.1007/978-3-030-59137-3_8 ◽

2020 ◽

pp. 78-88

Author(s):

Mahdi Abdollahi ◽

Xiaoying Gao ◽

Yi Mei ◽

Shameek Ghosh ◽

Jinyan Li

Keyword(s):

Data Augmentation ◽

Document Classification ◽

Medical Document

Ontology-Guided Data Augmentation for Medical Document Classification

10.26686/wgtn.13151078 ◽

2020 ◽

Author(s):

Mahdi Abdollahi ◽

Xiaoying Gao ◽

Yi Mei ◽

Shameek Ghosh ◽

Jinyan Li

Keyword(s):

Input Data ◽

Data Augmentation ◽

Substantial Improvement ◽

Document Classification ◽

Suggested Approach ◽

Domain Specific ◽

Unstructured Text ◽

Challenging Tasks ◽

Medical Documents ◽

Medical Document

Extracting meaningful features from unstructured text is one of the most challenging tasks in medical document classification. The many domain-specific expressions and synonyms in clinical discharge notes make them even harder to analyse. The problem is worse for short texts such as abstracts. These challenges can lead to poor classification accuracy. Because real-world medical input data are often insufficient, this work proposes a novel ontology-guided data augmentation method to enrich the input data. Three different deep learning methods are then used to evaluate the classification performance of the suggested approach. The experimental results show that the suggested approach substantially improves the classification of the targeted medical documents.
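The abstract does not describe how the ontology guides augmentation. One common pattern, shown here purely as an assumption, is to generate new training texts by substituting each recognised medical term with other surface forms listed for the same concept. The two-concept `TOY_ONTOLOGY` and the `augment` function are hypothetical; a real system would draw its surface forms from a curated medical ontology.

```python
import itertools
import re

# Hypothetical toy ontology: concept id -> interchangeable surface forms.
TOY_ONTOLOGY = {
    "myocardial_infarction": ["heart attack", "myocardial infarction", "MI"],
    "hypertension": ["hypertension", "high blood pressure", "HTN"],
}


def augment(text: str, ontology: dict[str, list[str]], max_variants: int = 10) -> list[str]:
    """Return up to `max_variants` copies of `text` in which every recognised
    term is replaced by a surface form of the same ontology concept."""
    index = {form.lower(): concept for concept, forms in ontology.items() for form in forms}
    # Try longer forms first so multi-word terms win over shorter overlapping ones.
    alternation = "|".join(re.escape(f) for f in sorted(index, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    matches = list(pattern.finditer(text))
    choices = [ontology[index[m.group(0).lower()]] for m in matches]

    variants: list[str] = []
    for combo in itertools.product(*choices):
        # Rebuild the text, swapping each matched span for the chosen surface form.
        pieces, last = [], 0
        for m, replacement in zip(matches, combo):
            pieces.append(text[last:m.start()])
            pieces.append(replacement)
            last = m.end()
        pieces.append(text[last:])
        candidate = "".join(pieces)
        if candidate != text and candidate not in variants:
            variants.append(candidate)
            if len(variants) >= max_variants:
                break
    return variants


for v in augment("History of hypertension, admitted with heart attack.", TOY_ONTOLOGY):
    print(v)
```

For the sample sentence, the two recognised terms each have three surface forms, giving 3 × 3 = 9 combinations; dropping the unchanged original leaves 8 augmented variants. A text with no recognised terms yields an empty list.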

Ontology-Guided Data Augmentation for Medical Document Classification

10.26686/wgtn.13151078.v1 ◽

2020 ◽

Author(s):

Mahdi Abdollahi ◽

Xiaoying Gao ◽

Yi Mei ◽

Shameek Ghosh ◽

Jinyan Li

Keyword(s):

Input Data ◽

Data Augmentation ◽

Substantial Improvement ◽

Document Classification ◽

Suggested Approach ◽

Domain Specific ◽

Unstructured Text ◽

Challenging Tasks ◽

Medical Documents ◽

Medical Document

Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences ◽

10.1017/s0140525x1900311x ◽

2020 ◽

Vol 43 ◽

Author(s):

Myrthe Faber

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Mental Content ◽

Mind Wandering ◽

Theoretical Framework ◽

Important Addition

Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides the variability in mental content and context that is necessary for abstraction.

Deep neural networks trained with heavier data augmentation learn features closer to representations in hIT

10.32470/ccn.2018.1046-0 ◽

2018 ◽

Cited By ~ 1

Author(s):

Alex Hernández-García ◽

Johannes Mehrer ◽

Nikolaus Kriegeskorte ◽

Peter König ◽

Tim C. Kietzmann

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Data Augmentation

Nonnegative Matrix Factorization and Document Classification

10.15368/theses.2015.110 ◽

2015 ◽

Author(s):

Stephen Calabrese

Keyword(s):

Matrix Factorization ◽

Nonnegative Matrix Factorization ◽

Nonnegative Matrix ◽

Document Classification

Comparison of Nonlinear Spatial Correlation Models by the Influence of the Data Augmentation to the Classification Risk

Nonlinear Analysis Modelling and Control ◽

10.15388/na.2002.7.1.15200 ◽

2002 ◽

Vol 7 (1) ◽

pp. 31-42

Author(s):

J. Šaltytė ◽

K. Dučinskas

Keyword(s):

Spatial Correlation ◽

Random Fields ◽

Data Augmentation ◽

Gaussian Random Fields ◽

Classification Rule ◽

Numerical Comparison ◽

First Order ◽

Bayesian Risk ◽

Correlation Models

This paper investigates the Bayesian classification rule for observations of (second-order) stationary Gaussian random fields with different means and common factorised covariance matrices. The influence of augmenting the observed data on the Bayesian risk is examined for three widely applicable nonlinear spatial correlation models. An explicit expression for the Bayesian risk of classifying augmented data is derived. The models are then compared numerically by the variability of the Bayesian risk under a first-order neighbourhood scheme.
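As background for the Bayesian risk discussed above, the sketch below computes the textbook Bayes risk (0-1 loss) for assigning a vector of observations to one of two Gaussian classes with a common covariance matrix: R = π1 Φ(-Δ/2 + γ/Δ) + π2 Φ(-Δ/2 - γ/Δ), where γ = ln(π2/π1) and Δ is the Mahalanobis distance between the class means. It is not the paper's derivation: the exponential correlation function, the one-dimensional equally spaced sites, and the helper names are assumptions for illustration, and adding observation sites merely plays the role of augmenting the observed data.

```python
import math

import numpy as np


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def field_cov(sites, sigma2: float = 1.0, range_param: float = 1.0) -> np.ndarray:
    """Covariance of a stationary field observed at 1-D `sites`, assuming the
    exponential correlation rho(h) = exp(-h / range_param) (illustrative choice)."""
    sites = np.asarray(sites, dtype=float)
    h = np.abs(sites[:, None] - sites[None, :])  # pairwise distances between sites
    return sigma2 * np.exp(-h / range_param)


def bayes_risk(mu1, mu2, cov, prior1: float = 0.5) -> float:
    """Bayes risk under 0-1 loss for N(mu1, cov) vs N(mu2, cov) with priors prior1, 1 - prior1."""
    prior2 = 1.0 - prior1
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    delta = math.sqrt(float(diff @ np.linalg.solve(cov, diff)))  # Mahalanobis distance
    if delta == 0.0:
        raise ValueError("class means coincide; the classes are indistinguishable")
    gamma = math.log(prior2 / prior1)
    return (prior1 * std_normal_cdf(-delta / 2 + gamma / delta)
            + prior2 * std_normal_cdf(-delta / 2 - gamma / delta))


# Class means 0 and 1 at every site; "augmenting" the data = observing more sites.
risks = [bayes_risk(np.zeros(n), np.ones(n), field_cov(np.arange(n))) for n in (1, 2, 3)]
print(risks)
```

With equal priors and a single site, Δ = 1 and R = Φ(-0.5) ≈ 0.309. Each additional site lowers the risk, but for these positively correlated sites it lowers it by less than an independent observation would.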