DevNet: An Efficient CNN Architecture for Handwritten Devanagari Character Recognition

Author(s):  
Riya Guha ◽  
Nibaran Das ◽  
Mahantapas Kundu ◽  
Mita Nasipuri ◽  
K. C. Santosh

Writing style is a unique characteristic of a human being, as it varies from one person to another. Due to this diversity in writing styles, handwritten character recognition (HCR), within the purview of pattern recognition, is not trivial. Conventional methods used handcrafted features that require a priori domain knowledge, which is not always feasible. In such cases, extracting features automatically becomes more attractive. For this, convolutional neural networks (CNNs) have been a popular approach in the literature for extracting features from image data. However, state-of-the-art works do not provide a generic CNN model for character recognition in, for instance, the Devanagari script. Therefore, in this work, we first study several different CNN models on publicly available handwritten Devanagari character and numeral datasets, focusing primarily on a comparative study that takes trainable parameters, training time, and memory consumption into account. We then propose and design DevNet, a modified CNN architecture that produces promising results, since computational complexity and memory footprint are our primary design concerns.
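Trainable parameter count, one of the abstract's three comparison criteria, can be computed directly from layer shapes. The sketch below is purely illustrative: the layer sizes, 32x32 input, and 46-class output are assumptions for a toy stack, not the actual DevNet architecture.

```python
# Hypothetical sketch: counting trainable parameters of a small CNN,
# one of the comparison criteria (parameters, training time, memory).
# Layer shapes are illustrative, not the actual DevNet architecture.

def conv2d_params(in_ch, out_ch, k):
    # weights (k*k*in_ch per filter) plus one bias per filter
    return out_ch * (k * k * in_ch + 1)

def dense_params(n_in, n_out):
    # fully connected layer: weight matrix plus biases
    return n_out * (n_in + 1)

# A toy stack for 32x32 grayscale glyph images (assumed input size)
layers = [
    ("conv1", conv2d_params(1, 32, 3)),
    ("conv2", conv2d_params(32, 64, 3)),
    ("fc1",   dense_params(64 * 8 * 8, 128)),   # after two 2x2 poolings
    ("fc2",   dense_params(128, 46)),           # assumed 46 output classes
]

total = sum(p for _, p in layers)
for name, p in layers:
    print(f"{name}: {p:,} parameters")
print(f"total: {total:,}")
```

Note how the fully connected layers dominate the count, which is why parameter-frugal architectures shrink or replace them first.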

2022 ◽  
Vol 13 (1) ◽  
pp. 1-20
Author(s):  
Wen-Cheng Chen ◽  
Wan-Lun Tsai ◽  
Huan-Hua Chang ◽  
Min-Chun Hu ◽  
Wei-Ta Chu

Tactic learning in virtual reality (VR) has been proven effective for basketball training. Endowed with the ability to generate virtual defenders in real time according to the movement of virtual offenders controlled by the user, a VR basketball training system can offer the trainee a more immersive and realistic experience. In this article, an autoregressive generative model for instantly producing basketball defensive trajectories is introduced. We further focus on preserving the diversity of the generated trajectories. A differentiable sampling mechanism is adopted to learn a continuous Gaussian distribution over player positions. Moreover, several heuristic loss functions based on basketball domain knowledge are designed to make the generated trajectories resemble real game situations. We compare the proposed method with state-of-the-art works using both objective and subjective evaluations. The objective evaluation compares the average position, velocity, and acceleration of the generated defensive trajectories with real ones to assess the fidelity of the results; higher-level aspects such as the empty space left to the offender and the defensive pressure of the generated trajectory are also considered. For the subjective evaluation, visual comparison questionnaires on the proposed and competing methods were thoroughly conducted. The experimental results show that the proposed method outperforms previous basketball defensive trajectory generation works across the different evaluation metrics.
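The "differentiable sampling mechanism" for a Gaussian over player positions is commonly realized with the reparameterization trick: the network predicts a mean and log-variance, and the sample mu + sigma * eps keeps gradients flowing through mu and sigma. A minimal numpy sketch under that assumption (the values and shapes are illustrative, not from the paper):

```python
import numpy as np

# Sketch of reparameterized Gaussian sampling of a player position:
# the model predicts (mu, log_var); noise eps is drawn outside the
# computation graph so the sample stays differentiable w.r.t. mu, sigma.
rng = np.random.default_rng(0)

def sample_position(mu, log_var, rng):
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)   # stochastic, gradient-free part
    return mu + sigma * eps               # deterministic, differentiable part

mu = np.array([7.5, 3.0])        # hypothetical (x, y) court position, metres
log_var = np.array([-2.0, -2.0]) # small predicted variance
pos = sample_position(mu, log_var, rng)
print(pos.shape)  # (2,)
```

Sampling rather than taking the mean is what preserves trajectory diversity: repeated rollouts from the same state produce different but plausible defensive motions.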


1998 ◽  
Vol 10 (8) ◽  
pp. 2175-2200 ◽  
Author(s):  
Holger Schwenk

We present a new classification architecture based on autoassociative neural networks that are used to learn discriminant models of each class. The proposed architecture has several interesting properties with respect to other model-based classifiers like nearest-neighbors or radial basis functions: it has a low computational complexity and uses a compact distributed representation of the models. The classifier is also well suited for the incorporation of a priori knowledge by means of a problem-specific distance measure. In particular, we will show that tangent distance (Simard, Le Cun, & Denker, 1993) can be used to achieve transformation invariance during learning and recognition. We demonstrate the application of this classifier to optical character recognition, where it has achieved state-of-the-art results on several reference databases. Relations to other models, in particular those based on principal component analysis, are also discussed.
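The core idea of classifying with per-class autoassociative models can be sketched with one linear autoassociator (a PCA subspace) per class: a test point is assigned to the class whose subspace reconstructs it with the smallest error. This toy 2-D version uses plain Euclidean reconstruction error, not the paper's tangent-distance variant, and the data is synthetic:

```python
import numpy as np

# One linear autoassociator (PCA subspace) per class; classify a point
# by the smallest reconstruction error across the class models.
def fit_subspace(X, k):
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]            # mean and k principal directions

def recon_error(x, model):
    mean, V = model
    z = (x - mean) @ V.T           # encode into the subspace
    x_hat = mean + z @ V           # decode back to input space
    return np.sum((x - x_hat) ** 2)

rng = np.random.default_rng(1)
class_a = rng.normal([0, 0], [3.0, 0.1], size=(50, 2))   # elongated along x
class_b = rng.normal([0, 0], [0.1, 3.0], size=(50, 2))   # elongated along y
models = [fit_subspace(class_a, 1), fit_subspace(class_b, 1)]

x = np.array([2.0, 0.0])           # lies along class A's principal axis
pred = int(np.argmin([recon_error(x, m) for m in models]))
print(pred)
```

The compactness claim in the abstract follows from this structure: each class is stored as a mean plus a few directions, instead of the full training set a nearest-neighbour classifier would need.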


Author(s):  
Chandra Kusuma Dewa ◽  
Amanda Lailatul Fadhilah ◽  
A Afiahayati

The convolutional neural network (CNN) is a state-of-the-art method for object recognition tasks. Specialized for spatially structured input, a CNN has convolutional and pooling layers that enable hierarchical feature learning from the input space. For offline handwritten character recognition problems, such as classifying characters in the MNIST database, CNNs show better classification results than other methods. Leveraging these advantages, in this paper we developed software that combines digital image processing methods with a CNN module for offline handwritten Javanese character recognition. The software segments a captured handwritten Javanese character image using contour detection and Canny edge detection with the OpenCV library, and the CNN classifies each segmented image into one of 20 classes of Javanese letters. For evaluation, we compared the CNN to a multilayer perceptron (MLP) on classification accuracy and training time. Experimental results show that the CNN's test accuracy outperforms the MLP's, although the CNN needs more training time than the MLP.
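The paper's segmentation step uses OpenCV contour and Canny edge detection; as a dependency-free stand-in, the sketch below shows the simpler column-projection idea behind character segmentation: split a binarized line image into character boxes wherever a column contains no ink. It is an illustration of the segmentation stage, not the authors' pipeline.

```python
import numpy as np

# Column-projection segmentation sketch: a character box spans every
# maximal run of columns that contain at least one ink pixel.
def segment_columns(binary):                 # binary: 2-D array of 0/1 ink
    ink = binary.sum(axis=0) > 0             # which columns contain ink
    boxes, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                        # a character run begins
        elif not has_ink and start is not None:
            boxes.append((start, x))         # the run just ended
            start = None
    if start is not None:
        boxes.append((start, len(ink)))      # run reaches the right edge
    return boxes

img = np.zeros((8, 12), dtype=int)
img[2:6, 1:4] = 1    # first "character"
img[1:7, 6:10] = 1   # second "character"
print(segment_columns(img))  # [(1, 4), (6, 10)]
```

Contour-based segmentation, as used in the paper, handles touching and vertically stacked glyphs better than this projection heuristic, which is why real systems prefer it.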


Author(s):  
Mohamed Elleuch ◽  
Monji Kherallah

In recent years, deep learning (DL) based systems have become very popular for constructing hierarchical representations from unlabeled data. Moreover, DL approaches have been shown to exceed previous state-of-the-art machine learning models in various areas, with pattern recognition being one of the most important cases. This paper applies Convolutional Deep Belief Networks (CDBN) to textual image data containing Arabic handwritten script (AHS) and evaluates them on two databases characterized by low and high dimensionality, respectively. In addition to the benefits provided by deep networks, the system is protected against over-fitting. Experimentally, the authors demonstrate that the extracted features are effective for handwritten character recognition and show very good performance, comparable to the state of the art in handwritten text recognition. Using Dropout, the proposed CDBN architectures achieved promising accuracy rates of 91.55% and 98.86% on the IFN/ENIT and HACDB databases, respectively.
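Dropout, the regularizer credited above with protecting the CDBN against over-fitting, is simple to state: at training time each activation is kept with probability p and rescaled so its expected value is unchanged. A minimal numpy sketch of the standard inverted-dropout formulation (not tied to the authors' implementation):

```python
import numpy as np

# Inverted dropout: zero each activation with probability (1 - p_keep)
# and divide the survivors by p_keep so E[output] == input.
def dropout(x, p_keep, rng):
    mask = rng.random(x.shape) < p_keep
    return x * mask / p_keep

rng = np.random.default_rng(0)
acts = np.ones((4, 5))
out = dropout(acts, p_keep=0.8, rng=rng)
print(sorted(set(np.round(out.ravel(), 2))))  # entries are 0.0 or 1.25
```

At test time the layer is simply an identity, which is what makes the inverted form convenient: no rescaling is needed at inference.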


Author(s):  
K. Suzuki ◽  
M. Claesen ◽  
H. Takeda ◽  
B. De Moor

Deep learning has recently been in the spotlight owing to its victories at major competitions, which has undeservedly pushed ‘shallow’ machine learning methods (relatively simple, handy algorithms commonly used by industrial engineers) into the background, despite advantages such as the small amounts of time and data they require for training. Taking a practical point of view, we used shallow learning algorithms to construct a learning pipeline that lets operators apply machine learning without specialist knowledge, an expensive computing environment, or a large amount of labelled data. The proposed pipeline automates the whole classification process: feature selection, feature weighting, and the selection of the most suitable classifier with optimized hyperparameters. The configuration uses particle swarm optimization, a well-known metaheuristic algorithm chosen for its generally fast and fine-grained optimization, which enables us not only to optimize (hyper)parameters but also to determine appropriate features and classifiers for the problem; these choices have conventionally been made a priori from domain knowledge or handled with naive algorithms such as grid search. Through experiments on the MNIST and CIFAR-10 datasets, common computer vision benchmarks for character recognition and object recognition respectively, our automated learning approach delivers high performance considering its simple, non-specialized setting, small amount of training data, and practical training time. Moreover, compared to deep learning, its performance remains robust with almost no modification even on a remote sensing object recognition problem, which suggests that our approach can contribute to general classification problems.
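Particle swarm optimization, the metaheuristic the pipeline builds on, is compact enough to sketch in full. Here it tunes a single dummy "hyperparameter" by minimizing a toy loss with a known minimum; the inertia and attraction coefficients are common textbook defaults, not values from the paper.

```python
import numpy as np

# Minimal 1-D particle swarm optimization: each particle is pulled toward
# its own best position (pbest) and the swarm's best position (gbest).
def pso(loss, lo, hi, n_particles=20, iters=60, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, n_particles)
    vel = np.zeros(n_particles)
    pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
    gbest = pbest[pbest_val.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        val = np.array([loss(p) for p in pos])
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], val[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest

# Toy loss with its minimum at 3.0, standing in for a validation error
best = pso(lambda x: (x - 3.0) ** 2, lo=0.0, hi=10.0)
print(round(float(best), 2))
```

In the pipeline's setting, the particle would be a vector encoding feature choices, classifier choice, and hyperparameters, and the loss would be a cross-validated error; the update rule is unchanged.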


2020 ◽  
Vol 2020 (1) ◽  
pp. 78-81
Author(s):  
Simone Zini ◽  
Simone Bianco ◽  
Raimondo Schettini

Rain removal from pictures taken under bad weather conditions is a challenging task that aims to improve the overall quality and visibility of a scene. The enhanced images usually constitute the input for subsequent computer vision tasks such as detection and classification. In this paper, we present a convolutional neural network, based on the Pix2Pix model, for removing rain streaks from images, with a specific interest in evaluating the results of the processing with respect to the optical character recognition (OCR) task. In particular, we present a way to generate a rainy version of the Street View Text Dataset (R-SVTD) for evaluating text detection and recognition in bad weather conditions. Experimental results on this dataset show that our model outperforms the state of the art in terms of two commonly used image quality metrics, and that it improves the performance of an OCR model detecting and recognising text in the wild.


Author(s):  
Michael Withnall ◽  
Edvard Lindelöf ◽  
Ola Engkvist ◽  
Hongming Chen

We introduce Attention and Edge Memory schemes to the existing Message Passing Neural Network framework for graph convolution, and benchmark our approaches against eight different physical-chemical and bioactivity datasets from the literature. We remove the need to introduce a priori knowledge of the task and chemical descriptor calculation by using only fundamental graph-derived properties. Our results consistently perform on-par with other state-of-the-art machine learning approaches, and set a new standard on sparse multi-task virtual screening targets. We also investigate model performance as a function of dataset preprocessing, and make some suggestions regarding hyperparameter selection.
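The attention idea in a message-passing step can be illustrated in a few lines: each node aggregates its neighbours' features, weighted by softmax attention scores. The dot-product scoring and the shapes below are illustrative conventions, not the paper's specific Attention or Edge Memory scheme.

```python
import numpy as np

# Toy attention-weighted message-passing step on a small graph:
# node i's new feature is a softmax-weighted sum of its neighbours.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_step(h, adj):
    h_new = np.zeros_like(h)
    for i in range(len(h)):
        nbrs = np.where(adj[i])[0]
        scores = softmax(h[nbrs] @ h[i])       # dot-product attention
        h_new[i] = scores @ h[nbrs]            # weighted message sum
    return h_new

h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # node features
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=bool)  # triangle
out = attention_step(h, adj)
print(out.shape)  # (3, 2)
```

Because the weights sum to one, each output row is a convex combination of neighbour features; stacking several such steps propagates information over longer graph distances.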


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dominik Jens Elias Waibel ◽  
Sayedali Shetab Boushehri ◽  
Carsten Marr

Background: Deep learning contributes to uncovering molecular and cellular processes with highly performant algorithms. Convolutional neural networks have become the state-of-the-art tool to provide accurate and fast image data processing. However, published algorithms mostly solve only one specific problem and they typically require a considerable coding effort and machine learning background for their application.
Results: We have thus developed InstantDL, a deep learning pipeline for four common image processing tasks: semantic segmentation, instance segmentation, pixel-wise regression and classification. InstantDL enables researchers with a basic computational background to apply debugged and benchmarked state-of-the-art deep learning algorithms to their own data with minimal effort. To make the pipeline robust, we have automated and standardized workflows and extensively tested it in different scenarios. Moreover, it allows assessing the uncertainty of predictions. We have benchmarked InstantDL on seven publicly available datasets achieving competitive performance without any parameter tuning. For customization of the pipeline to specific tasks, all code is easily accessible and well documented.
Conclusions: With InstantDL, we hope to empower biomedical researchers to conduct reproducible image processing with a convenient and easy-to-use pipeline.


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 624
Author(s):  
Stefan Rohrmanstorfer ◽  
Mikhail Komarov ◽  
Felix Mödritscher

With the ever-increasing amount of image data, it has become necessary to automatically find and process the information in these images. As fashion is captured in images, the fashion sector provides the perfect foundation for a service or application built on an image classification model. In this article, the state of the art for image classification is analyzed and discussed. Based on this knowledge, four different approaches are implemented to extract features from fashion data. For this purpose, a human-worn fashion dataset with 2567 images was created and then significantly enlarged through image augmentation operations. The results show that convolutional neural networks are the undisputed standard for classifying images, and that TensorFlow is the best library to build them with. Moreover, through the introduction of dropout layers, data augmentation and transfer learning, model overfitting was successfully prevented, and the validation accuracy on the created dataset was incrementally improved from an initial 69% to a final 84%. More distinctive apparel such as trousers, shoes and hats was classified better than other upper-body clothes.
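The dataset enlargement mentioned above follows the standard augmentation recipe: each source image yields several transformed copies. The sketch below uses flips and a 90-degree rotation as examples; the article's exact set of operations is not specified in this abstract.

```python
import numpy as np

# Each source image produces multiple training variants; labels carry over
# unchanged, multiplying the effective dataset size.
def augment(img):
    return [
        img,
        np.fliplr(img),            # horizontal mirror
        np.flipud(img),            # vertical mirror
        np.rot90(img),             # 90-degree counter-clockwise rotation
    ]

img = np.arange(9).reshape(3, 3)   # stand-in for an image array
batch = augment(img)
print(len(batch))  # 4 variants per source image
```

For clothing images, horizontal flips are usually safe while vertical flips and large rotations may not be, so in practice the operation set is chosen per domain.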

