Robust recognition technique for handwritten Kannada character recognition using capsule networks

<div>There are millions of scanned documents worldwide in around 4 thousand languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparation of a text layer requires recognition of character and sub-region patterns and associating with a human interpretation. Developing an optical character recognition (OCR) system for each and every language is a very difficult task if not impossible. There is a strong need for systems that add on top of the existing OCR technologies by learning from them and unifying disparate multitude of many a system. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from across diverse domains and language settings. We observe that the text regions have consistent bounding box sizes and any large font or tiny font scenarios can be handled in preprocessing or postprocessing phases. The image subregions are smaller in size in scanned text documents compared to subregions formed by common objects in general purpose images. We propose and validate the hypothesis that a much simpler convolution neural network (CNN) having very few layers and less number of filters can be used for detecting individual subregion classes. For detection of several hundreds of classes, multiple such simpler models can be pooled to operate simultaneously on a document. The advantage of going by pools of subregion specific models is the ability to deal with incremental addition of hundreds of newer classes over time, without disturbing the previous models in the continual learning scenario. Such an approach has distinctive advantage over using a single monolithic model where subregions classes share and interfere via a bulky common neural network. We report here an efficient algorithm for building a subregion specific lightweight CNN models. The training data for the CNN proposed, requires engineering synthetic data points that consider both pattern of interest and non-patterns as well. We propose and validate the hypothesis that an image canvas in which optimal amount of pattern and non-pattern can be formulated using a means squared error loss function to influence filter for training from the data. The CNN hence trained has the capability to identify the character-object in presence of several other objects on a generalized test image of a scanned document. In this setting some of the key observations are in a CNN, learning a filter depends not only on the abundance of patterns of interest but also on the presence of a non-pattern context. Our experiments have led to some of the key observations - (i) a pattern cannot be over-expressed in isolation, (ii) a pattern cannot be under-xpressed as well, (iii) a non-pattern can be of salt and pepper type noise and finally (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to result in strong individual sub-region class models. We have carried out studies and reported \textit{mean average precision} scores on various data sets including (1) MNIST digits(95.77), (2) E-MNIST capital alphabet(81.26), (3) EMNIST small alphabet(73.32) (4) Kannada digits(95.77), (5) Kannada letters(90.34), (6) Devanagari letters(100) (7) Telugu words(93.20) (8) Devanagari words(93.20) and also on medical prescriptions and observed high-performance metrics of mean average precision over 90%. The algorithm serves as a kernel in the automatic annotation of digital documents in diverse scenarios such as annotation of ancient manuscripts and hand-written health records.</div>

Download Full-text

Vehicle images reconstruction using SRCNN for improving the recognition accuracy of vehicle license plate number

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.2020.13726 ◽

2020 ◽

Vol 8 (4) ◽

pp. 304-310

Author(s):

Windra Swastika ◽

Ekky Rino Fajar Sakti ◽

Mochamad Subianto

Keyword(s):

High Resolution ◽

Character Recognition ◽

Recognition Accuracy ◽

Super Resolution ◽

Training Data ◽

License Plate ◽

Plate Number ◽

Average Accuracy ◽

Number Recognition ◽

High Resolution Images

Low-resolution images can be reconstructed into high-resolution images using the Super-resolution Convolution Neural Network (SRCNN) algorithm. This study aims to improve the vehicle license plate number's recognition accuracy by generating a high-resolution vehicle image using the SRCNN. The recognition is carried out by two types of character recognition methods: Tesseract OCR and SPNet. The training data for SRCNN uses the DIV2K dataset consisting of 900 images, while the training data for character recognition uses the Chars74 dataset. The high-resolution images constructed using SRCNN can increase the average accuracy of vehicle license plate number recognition by 16.9 % using Tesseract and 13.8 % with SPNet.

Download Full-text

Traditional Chinese Medicine (TCM) Diagnosis Model Building Based on Multi-label Classification

MATEC Web of Conferences ◽

10.1051/matecconf/201823202026 ◽

2018 ◽

Vol 232 ◽

pp. 02026

Author(s):

Lu Zhou ◽

Guang-geng Li ◽

Yu-mei Zhou ◽

Dan Yin ◽

Yan Sun ◽

...

Keyword(s):

Cross Validation ◽

Model Building ◽

Learning Algorithm ◽

Training Data ◽

Average Precision ◽

Test Dataset ◽

Average Accuracy ◽

Good Potential ◽

Diagnosis Model ◽

Fold Cross Validation

In the study, we propose a TCM diagnosis model that can be used for multi-label classification and give clear diagnosis, as well as the basis for diagnosis and differentiation when the symptoms correspond to multiple diseases or syndromes. The implementation of the model is divided into three steps. Firstly, choose the machine learning algorithm to train the TCM diagnosis model. The features of the training data are symptoms and the labels are diseases or syndromes. Secondly, give the number α (α>1, α∈Z+), the model will output the diagnoses with the top α highest probability according to the input symptoms as candidate diagnoses. Finally, the rules of differential diagnosis are designed to determine which candidate diagnoses should be reserved, thereby complete the multi-label classification. In our test dataset, by 10-fold cross-validation, the average accuracy of the single label classification was 0.882; the average precision was 0.974; the average recall was 1.000; the average f1 score was 0.967; the average accuracy of the multi-label classification was 0.706; the average micro precision was 0.934; the average micro recall was 0.941 and the average hamming loss was 0.060. Through the test we can know that this model had a good potential for auxiliary decision making in clinical diagnosis and treatment.

Download Full-text

Klasifikasi Citra Histopatologi Kanker Payudara menggunakan Data Resampling Random dan Residual Network

JURNAL SISTEM INFORMASI BISNIS ◽

10.21456/vol11iss1pp70-79 ◽

2021 ◽

Vol 11 (1) ◽

pp. 70-77

Author(s):

Wahyudi Setiawan

Keyword(s):

Breast Cancer ◽

Image Classification ◽

Training Data ◽

Unbalanced Data ◽

Residual Network ◽

Average Precision ◽

Average Recall ◽

Average Accuracy ◽

Optimal Classification

Data imbalance between classes is one of the problems on image classification. The data in each class is not equal and has a relatively large difference generated in less than optimal classification results. Ideally, the data in each class is equal or have a slight difference. This article discusses the classification of the histopathology breast cancer image. The data consist of 8 classes with unbalanced data. The method for balancing the data in each class uses random resampling which is applied to training data only. The data used from BreakHist with magnifications of 40x, 100x, 200x, and 400x . The classification uses Residual Network (ResNet) 18 and 50. The best results are obtained on images with a magnification of 400x. Classification results using ResNet18 has an average accuracy of 79.82%, an average precision of 71.39%, and an average recall of 82.37%. Meanwhile using ResNet50 showed an average accuracy of 81.67%, average precision of 78.41%, and an average recall of 82.91%.

Download Full-text

Algorithm for auto annotation of scanned documents based on subregion tiling and shallow networks

10.36227/techrxiv.14795592 ◽

2021 ◽

Author(s):

Komuravelli Prashanth ◽

Kalidas Yeturu

Keyword(s):

Neural Network ◽

Character Recognition ◽

Optical Character Recognition ◽

High Performance ◽

Performance Metrics ◽

Training Data ◽

Mean Average Precision ◽

Average Precision ◽

Squared Error Loss Function ◽

Scanned Documents

<div>There are millions of scanned documents worldwide in around 4 thousand languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparation of a text layer requires recognition of character and sub-region patterns and associating with a human interpretation. Developing an optical character recognition (OCR) system for each and every language is a very difficult task if not impossible. There is a strong need for systems that add on top of the existing OCR technologies by learning from them and unifying disparate multitude of many a system. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from across diverse domains and language settings. We observe that the text regions have consistent bounding box sizes and any large font or tiny font scenarios can be handled in preprocessing or postprocessing phases. The image subregions are smaller in size in scanned text documents compared to subregions formed by common objects in general purpose images. We propose and validate the hypothesis that a much simpler convolution neural network (CNN) having very few layers and less number of filters can be used for detecting individual subregion classes. For detection of several hundreds of classes, multiple such simpler models can be pooled to operate simultaneously on a document. The advantage of going by pools of subregion specific models is the ability to deal with incremental addition of hundreds of newer classes over time, without disturbing the previous models in the continual learning scenario. Such an approach has distinctive advantage over using a single monolithic model where subregions classes share and interfere via a bulky common neural network. We report here an efficient algorithm for building a subregion specific lightweight CNN models. The training data for the CNN proposed, requires engineering synthetic data points that consider both pattern of interest and non-patterns as well. We propose and validate the hypothesis that an image canvas in which optimal amount of pattern and non-pattern can be formulated using a means squared error loss function to influence filter for training from the data. The CNN hence trained has the capability to identify the character-object in presence of several other objects on a generalized test image of a scanned document. In this setting some of the key observations are in a CNN, learning a filter depends not only on the abundance of patterns of interest but also on the presence of a non-pattern context. Our experiments have led to some of the key observations - (i) a pattern cannot be over-expressed in isolation, (ii) a pattern cannot be under-xpressed as well, (iii) a non-pattern can be of salt and pepper type noise and finally (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to result in strong individual sub-region class models. We have carried out studies and reported \textit{mean average precision} scores on various data sets including (1) MNIST digits(95.77), (2) E-MNIST capital alphabet(81.26), (3) EMNIST small alphabet(73.32) (4) Kannada digits(95.77), (5) Kannada letters(90.34), (6) Devanagari letters(100) (7) Telugu words(93.20) (8) Devanagari words(93.20) and also on medical prescriptions and observed high-performance metrics of mean average precision over 90%. The algorithm serves as a kernel in the automatic annotation of digital documents in diverse scenarios such as annotation of ancient manuscripts and hand-written health records.</div>

Download Full-text

Algorithm for auto annotation of scanned documents based on subregion tiling and shallow networks

10.36227/techrxiv.14795592.v1 ◽

2021 ◽

Author(s):

Komuravelli Prashanth ◽

Kalidas Yeturu

Keyword(s):

Neural Network ◽

Character Recognition ◽

Optical Character Recognition ◽

High Performance ◽

Performance Metrics ◽

Training Data ◽

Mean Average Precision ◽

Average Precision ◽

Squared Error Loss Function ◽

Scanned Documents

<div>There are millions of scanned documents worldwide in around 4 thousand languages. Searching for information in a scanned document requires a text layer to be available and indexed. Preparation of a text layer requires recognition of character and sub-region patterns and associating with a human interpretation. Developing an optical character recognition (OCR) system for each and every language is a very difficult task if not impossible. There is a strong need for systems that add on top of the existing OCR technologies by learning from them and unifying disparate multitude of many a system. In this regard, we propose an algorithm that leverages the fact that we are dealing with scanned documents of handwritten text regions from across diverse domains and language settings. We observe that the text regions have consistent bounding box sizes and any large font or tiny font scenarios can be handled in preprocessing or postprocessing phases. The image subregions are smaller in size in scanned text documents compared to subregions formed by common objects in general purpose images. We propose and validate the hypothesis that a much simpler convolution neural network (CNN) having very few layers and less number of filters can be used for detecting individual subregion classes. For detection of several hundreds of classes, multiple such simpler models can be pooled to operate simultaneously on a document. The advantage of going by pools of subregion specific models is the ability to deal with incremental addition of hundreds of newer classes over time, without disturbing the previous models in the continual learning scenario. Such an approach has distinctive advantage over using a single monolithic model where subregions classes share and interfere via a bulky common neural network. We report here an efficient algorithm for building a subregion specific lightweight CNN models. The training data for the CNN proposed, requires engineering synthetic data points that consider both pattern of interest and non-patterns as well. We propose and validate the hypothesis that an image canvas in which optimal amount of pattern and non-pattern can be formulated using a means squared error loss function to influence filter for training from the data. The CNN hence trained has the capability to identify the character-object in presence of several other objects on a generalized test image of a scanned document. In this setting some of the key observations are in a CNN, learning a filter depends not only on the abundance of patterns of interest but also on the presence of a non-pattern context. Our experiments have led to some of the key observations - (i) a pattern cannot be over-expressed in isolation, (ii) a pattern cannot be under-xpressed as well, (iii) a non-pattern can be of salt and pepper type noise and finally (iv) it is sufficient to provide a non-pattern context to a modest representation of a pattern to result in strong individual sub-region class models. We have carried out studies and reported \textit{mean average precision} scores on various data sets including (1) MNIST digits(95.77), (2) E-MNIST capital alphabet(81.26), (3) EMNIST small alphabet(73.32) (4) Kannada digits(95.77), (5) Kannada letters(90.34), (6) Devanagari letters(100) (7) Telugu words(93.20) (8) Devanagari words(93.20) and also on medical prescriptions and observed high-performance metrics of mean average precision over 90%. The algorithm serves as a kernel in the automatic annotation of digital documents in diverse scenarios such as annotation of ancient manuscripts and hand-written health records.</div>

Download Full-text

Concise CNN model for face expression recognition

Intelligent Decision Technologies ◽

10.3233/idt-190181 ◽

2021 ◽

pp. 1-9

Author(s):

Harshadkumar B. Prajapati ◽

Ankit S. Vyas ◽

Vipul K. Dabhi

Keyword(s):

Network Architecture ◽

Expression Recognition ◽

Human Machine Interaction ◽

Face Expression ◽

Computational Overhead ◽

Average Accuracy ◽

Proposed Model ◽

Face Expression Recognition ◽

Expression Classification ◽

Machine Interaction

Face expression recognition (FER) has gained very much attraction to researchers in the field of computer vision because of its major usefulness in security, robotics, and HMI (Human-Machine Interaction) systems. We propose a CNN (Convolutional Neural Network) architecture to address FER. To show the effectiveness of the proposed model, we evaluate the performance of the model on JAFFE dataset. We derive a concise CNN architecture to address the issue of expression classification. Objective of various experiments is to achieve convincing performance by reducing computational overhead. The proposed CNN model is very compact as compared to other state-of-the-art models. We could achieve highest accuracy of 97.10% and average accuracy of 90.43% for top 10 best runs without any pre-processing methods applied, which justifies the effectiveness of our model. Furthermore, we have also included visualization of CNN layers to observe the learning of CNN.

Download Full-text

Robust Approach to Supervised Deep Neural Network Training for Real-Time Object Classification in Cluttered Indoor Environment

Applied Sciences ◽

10.3390/app11157148 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7148

Author(s):

Bedada Endale ◽

Abera Tullu ◽

Hayoung Shi ◽

Beom-Soo Kang

Keyword(s):

Neural Network ◽

Deep Learning ◽

Real Time ◽

Network Architecture ◽

Input Data ◽

Deep Neural Network ◽

Data Augmentation ◽

Object Classification ◽

Training Data ◽

Gradient Descent Algorithm

Unmanned aerial vehicles (UAVs) are being widely utilized for various missions: in both civilian and military sectors. Many of these missions demand UAVs to acquire artificial intelligence about the environments they are navigating in. This perception can be realized by training a computing machine to classify objects in the environment. One of the well known machine training approaches is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes with huge sacrifice in terms of time and computational resources. Collecting big input data, pre-training processes, such as labeling training data, and the need for a high performance computer for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission specific input data augmentation techniques and the design of light-weight deep neural network architecture that is capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images in each class were used as input data where 80% were for training the network and the remaining 20% were used for network validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented. This algorithm has the advantage of handling redundancy in the data more efficiently than other algorithms.

Download Full-text

Gravity Control-Based Data Augmentation Technique for Improving VR User Activity Recognition

Symmetry ◽

10.3390/sym13050845 ◽

2021 ◽

Vol 13 (5) ◽

pp. 845

Author(s):

Dongheun Han ◽

Chulwoo Lee ◽

Hyeongyeop Kang

Keyword(s):

Activity Recognition ◽

Large Scale ◽

Data Augmentation ◽

Training Data ◽

Measurement Unit ◽

Gravitational Acceleration ◽

The Neural Network ◽

Typical Data ◽

Robust Recognition ◽

Gravity Acceleration

The neural-network-based human activity recognition (HAR) technique is being increasingly used for activity recognition in virtual reality (VR) users. The major issue of a such technique is the collection large-scale training datasets which are key for deriving a robust recognition model. However, collecting large-scale data is a costly and time-consuming process. Furthermore, increasing the number of activities to be classified will require a much larger number of training datasets. Since training the model with a sparse dataset can only provide limited features to recognition models, it can cause problems such as overfitting and suboptimal results. In this paper, we present a data augmentation technique named gravity control-based augmentation (GCDA) to alleviate the sparse data problem by generating new training data based on the existing data. The benefits of the symmetrical structure of the data are that it increased the number of data while preserving the properties of the data. The core concept of GCDA is two-fold: (1) decomposing the acceleration data obtained from the inertial measurement unit (IMU) into zero-gravity acceleration and gravitational acceleration, and augmenting them separately, and (2) exploiting gravity as a directional feature and controlling it to augment training datasets. Through the comparative evaluations, we validated that the application of GCDA to training datasets showed a larger improvement in classification accuracy (96.39%) compared to the typical data augmentation methods (92.29%) applied and those that did not apply the augmentation method (85.21%).

Download Full-text

Enhanced Convolutional-Neural-Network Architecture for Crop Classification

Applied Sciences ◽

10.3390/app11094292 ◽

2021 ◽

Vol 11 (9) ◽

pp. 4292

Author(s):

Mónica Y. Moreno-Revelo ◽

Lorena Guachi-Guachi ◽

Juan Bernardo Gómez-Mendoza ◽

Javier Revelo-Fuelagán ◽

Diego H. Peluffo-Ordóñez

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Architecture ◽

Classification Model ◽

Classification Error ◽

Small Scale ◽

Post Processing ◽

Average Accuracy ◽

Processing Step ◽

Crop Classification

Automatic crop identification and monitoring is a key element in enhancing food production processes as well as diminishing the related environmental impact. Although several efficient deep learning techniques have emerged in the field of multispectral imagery analysis, the crop classification problem still needs more accurate solutions. This work introduces a competitive methodology for crop classification from multispectral satellite imagery mainly using an enhanced 2D convolutional neural network (2D-CNN) designed at a smaller-scale architecture, as well as a novel post-processing step. The proposed methodology contains four steps: image stacking, patch extraction, classification model design (based on a 2D-CNN architecture), and post-processing. First, the images are stacked to increase the number of features. Second, the input images are split into patches and fed into the 2D-CNN model. Then, the 2D-CNN model is constructed within a small-scale framework, and properly trained to recognize 10 different types of crops. Finally, a post-processing step is performed in order to reduce the classification error caused by lower-spatial-resolution images. Experiments were carried over the so-named Campo Verde database, which consists of a set of satellite images captured by Landsat and Sentinel satellites from the municipality of Campo Verde, Brazil. In contrast to the maximum accuracy values reached by remarkable works reported in the literature (amounting to an overall accuracy of about 81%, a f1 score of 75.89%, and average accuracy of 73.35%), the proposed methodology achieves a competitive overall accuracy of 81.20%, a f1 score of 75.89%, and an average accuracy of 88.72% when classifying 10 different crops, while ensuring an adequate trade-off between the number of multiply-accumulate operations (MACs) and accuracy. Furthermore, given its ability to effectively classify patches from two image sequences, this methodology may result appealing for other real-world applications, such as the classification of urban materials.

Download Full-text