DCGAN-Based Data Augmentation for Document Classification

Author(s):  
Aissam JADLI ◽  
Mustapha HAIN ◽  
Adil CHERGUI ◽  
Abderrahman JAIZE
Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6793
Author(s):  
Inzamam Mashood Nasir ◽  
Muhammad Attique Khan ◽  
Mussarat Yasmin ◽  
Jamal Hussain Shah ◽  
Marcin Gabryel ◽  
...  

Documents are stored in digital form across many organizations. Printing this volume of data and filing it in folders instead of storing it digitally is impractical, uneconomical, and ecologically unsound. An efficient way of retrieving data from digitally stored documents is also required. This article presents a real-time supervised learning technique for document classification based on a deep convolutional neural network (DCNN), which aims to reduce the impact of adverse document image issues such as signatures, marks, logos, and handwritten notes. The proposed technique's major steps are data augmentation, feature extraction using pre-trained neural network models, feature fusion, and feature selection. We propose a novel data augmentation technique, which normalizes the imbalanced dataset using the secondary dataset RVL-CDIP. The DCNN features are extracted using the VGG19 and AlexNet networks. The extracted features are fused, and the fused feature vector is optimized with a Pearson correlation coefficient-based technique that selects the most useful features and removes redundant ones. The proposed technique is tested on the Tobacco3482 dataset, where it achieves a classification accuracy of 93.1% using a cubic support vector machine classifier, supporting the validity of the proposed technique.
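The fusion and selection steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the fusion rule or the selection criterion in detail, so the sketch assumes fusion by column-wise concatenation and a greedy filter that drops any feature whose absolute Pearson correlation with an already kept feature reaches a threshold. The function names, the 0.9 threshold, and the random stand-in feature matrices are all hypothetical.

```python
import numpy as np


def fuse_features(feats_a, feats_b):
    """Fuse two (n_samples, n_features) matrices by concatenating columns."""
    return np.concatenate([feats_a, feats_b], axis=1)


def select_uncorrelated(features, threshold=0.9):
    """Return indices of columns kept by a greedy Pearson-correlation filter.

    A column is kept only if its absolute correlation with every column
    kept so far is below `threshold`. Assumes no column is constant.
    """
    corr = np.corrcoef(features, rowvar=False)  # feature-by-feature matrix
    keep = []
    for j in range(features.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in keep):
            keep.append(j)
    return keep


# Stand-ins for features from two pre-trained networks (hypothetical data).
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(50, 3))
# First column of feats_b is a linear copy of feats_a[:, 0], so it is redundant.
feats_b = np.hstack([feats_a[:, :1] * 2.0 + 1.0, rng.normal(size=(50, 2))])

fused = fuse_features(feats_a, feats_b)
kept = select_uncorrelated(fused, threshold=0.9)
reduced = fused[:, kept]
```

The reduced matrix could then be passed to a classifier; a cubic SVM like the one named in the abstract would correspond to a polynomial kernel of degree 3 in most libraries, though that mapping is also an assumption here.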


2020 ◽  
Author(s):  
Mahdi Abdollahi ◽  
Gao Xiaoying ◽  
Mei Yi ◽  
Ghosh Shameek ◽  
Li Jinyan

Extracting meaningful features from unstructured text is one of the most challenging tasks in medical document classification. The many domain-specific expressions and synonyms in clinical discharge notes make them harder to analyse, and the problem is worse for short texts such as abstracts. These challenges can lead to poor classification accuracy. Because medical input data is often scarce in the real world, this work proposes a novel ontology-guided data augmentation method to enrich the input data. Three different deep learning methods are then employed to evaluate the suggested approach for classification. The experimental results show that the suggested approach achieved substantial improvement in the targeted medical document classification.
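One simple way to picture ontology-guided augmentation is synonym substitution driven by a term dictionary. The sketch below is not the authors' method, which the abstract does not specify; it assumes a toy two-entry ontology (`TOY_ONTOLOGY`, hypothetical) and a helper `augment` (also hypothetical) that produces one new sentence per matching synonym.

```python
import re

# Hypothetical miniature ontology: lay term -> equivalent clinical terms.
TOY_ONTOLOGY = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
}


def augment(text, ontology):
    """Return variants of `text`, each with one ontology term replaced."""
    variants = []
    for term, synonyms in ontology.items():
        # Match the term as a whole phrase, ignoring case.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        if pattern.search(text):
            for synonym in synonyms:
                variants.append(pattern.sub(synonym, text))
    return variants


note = "Patient had a heart attack and high blood pressure."
augmented = augment(note, TOY_ONTOLOGY)
```

Each variant keeps the label of the original note, so a small labelled set can be expanded before training a classifier.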


2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


Author(s):  
Alex Hernández-García ◽  
Johannes Mehrer ◽  
Nikolaus Kriegeskorte ◽  
Peter König ◽  
Tim C. Kietzmann

2002 ◽  
Vol 7 (1) ◽  
pp. 31-42
Author(s):  
J. Šaltytė ◽  
K. Dučinskas

The Bayesian classification rule for observations of (second-order) stationary Gaussian random fields with different means and common factorised covariance matrices is investigated. The influence of augmenting the observed data on the Bayesian risk is examined for three widely applicable nonlinear spatial correlation models. An explicit expression for the Bayesian risk in the classification of augmented data is derived. The models are compared numerically by the variability of the Bayesian risk under the first-order neighbourhood scheme.
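For orientation, the textbook two-class case behind this kind of analysis can be computed directly. The sketch below does not reproduce the paper's expression for augmented spatial data; it assumes two Gaussian classes with different means, a shared covariance matrix, and 0-1 loss, for which the Bayes risk depends only on the Mahalanobis distance between the means and on the priors. The function names are hypothetical.

```python
import math

import numpy as np


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_risk(mu1, mu2, cov, prior1=0.5):
    """Bayes error for two Gaussian classes with a common covariance."""
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = mu1 - mu2
    # Mahalanobis distance between the class means.
    delta = math.sqrt(float(diff @ np.linalg.solve(cov, diff)))
    prior2 = 1.0 - prior1
    gamma = math.log(prior2 / prior1)  # decision threshold shift from priors
    return (prior1 * normal_cdf(-delta / 2.0 + gamma / delta)
            + prior2 * normal_cdf(-delta / 2.0 - gamma / delta))


risk_1d = bayes_risk([0.0], [2.0], [[1.0]])
risk_2d = bayes_risk([0.0, 0.0], [3.0, 4.0], np.eye(2))
```

With equal priors the expression reduces to the standard normal CDF evaluated at minus half the Mahalanobis distance, so better-separated means give a lower risk.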


2011 ◽  
Vol 131 (8) ◽  
pp. 1459-1466
Author(s):  
Yasunari Maeda ◽  
Hideki Yoshida ◽  
Masakiyo Suzuki ◽  
Toshiyasu Matsushima
