A New Combined Feature Extraction Method for Persian Handwritten Digit Recognition

2017 ◽  
Vol 17 (02) ◽  
pp. 1750012 ◽  
Author(s):  
Mohammad Javad Parseh ◽  
Mojtaba Meftahi

Feature extraction is one of the most important steps in Optical Character Recognition (OCR) systems, and it has a strong effect on recognition accuracy. In this paper, a suitable combination of different features, such as zoning, hole size and crossing counts, is proposed for Persian handwritten digit recognition. Because of the large number of features, the feature vector is high-dimensional, which increases training time exponentially. To solve this problem, Principal Component Analysis (PCA) is employed to reduce the feature vector dimensions. Finally, the data are classified with a Support Vector Machine (SVM). The proposed method has been evaluated on the HODA dataset, one of the largest standard datasets of Persian handwritten digits, which includes 60,000 training and 20,000 test samples. The proposed method reaches 99.07% accuracy on this dataset, and the experimental results show a significant improvement in the accuracy of Persian handwritten OCR compared to previous methods.
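Three of the feature groups named above (zoning, crossing counts and hole size) can be illustrated on a small binary image. The following is a minimal pure-Python sketch, assuming the digit is stored as a list of rows of 0/1 pixels; the 4x4 zoning grid, scanning only the middle row and column for crossings, measuring hole size as enclosed background area, and all function names are hypothetical choices for illustration, not the authors' exact definitions. In the described pipeline, vectors like these would then be reduced with PCA and classified with an SVM.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # binary digit image: 1 = ink, 0 = background


def zoning_features(img: Image, rows: int = 4, cols: int = 4) -> List[float]:
    """Ink density of each cell in a rows x cols grid laid over the image."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(rows):
        r0, r1 = zr * h // rows, (zr + 1) * h // rows
        for zc in range(cols):
            c0, c1 = zc * w // cols, (zc + 1) * w // cols
            area = (r1 - r0) * (c1 - c0)
            ink = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
            feats.append(ink / area if area else 0.0)
    return feats


def crossings(line: List[int]) -> int:
    """Number of background-to-ink transitions along one scan line."""
    count, prev = 0, 0
    for px in line:
        if px == 1 and prev == 0:
            count += 1
        prev = px
    return count


def crossing_features(img: Image) -> List[int]:
    """Crossing counts along the middle row and the middle column (hypothetical choice)."""
    h, w = len(img), len(img[0])
    return [crossings(img[h // 2]), crossings([img[r][w // 2] for r in range(h)])]


def hole_area(img: Image) -> int:
    """Background pixels not 4-connected to the border, i.e. the total hole size."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed a flood fill with every background pixel on the image border.
    for r in range(h):
        for c in range(w):
            if (r in (0, h - 1) or c in (0, w - 1)) and img[r][c] == 0:
                seen[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] and img[nr][nc] == 0:
                seen[nr][nc] = True
                queue.append((nr, nc))
    background = sum(row.count(0) for row in img)
    outside = sum(row.count(True) for row in seen)
    return background - outside


def feature_vector(img: Image) -> List[float]:
    """Concatenate the feature groups into one vector (input to PCA, then SVM)."""
    return (zoning_features(img)
            + [float(x) for x in crossing_features(img)]
            + [float(hole_area(img))])


if __name__ == "__main__":
    ring = [  # toy 8x8 ring-shaped glyph
        [0, 0, 1, 1, 1, 1, 0, 0],
        [0, 1, 0, 0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 1, 1, 0, 0],
    ]
    print(crossing_features(ring))  # → [2, 2]
    print(hole_area(ring))  # → 32
    print(len(feature_vector(ring)))  # → 19
```

Even this toy vector has 19 entries; richer zoning grids and more feature groups grow it quickly, which is the dimensionality problem PCA is meant to address.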

Author(s):  
Htwe Pa Pa Win ◽  
Phyo Thu Thu Khine ◽  
Khin Nwe Ni Tun

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the choice of feature extraction method. Existing OCR systems use a variety of feature extraction methods because scripts differ in nature. One major contribution of this paper is the design of logically rigorous coding-based features. To show the effectiveness of the proposed method, the paper assumes that the documents have already been segmented into characters and extracts features from these isolated Myanmar characters. The features are extracted through structural analysis of the Myanmar script. Experiments were carried out using a Support Vector Machine (SVM) classifier, and the results are compared with those of a previously proposed feature extraction method.


2014 ◽  
Vol 568-570 ◽  
pp. 668-671
Author(s):  
Yi Long ◽  
Fu Rong Liu ◽  
Guo Qing Qiu

To address two problems, namely that the feature vector extracted by Local Binary Pattern (LBP) for face recognition has too high a dimension and that the features extracted by Principal Component Analysis (PCA) are not the best features for classification, this paper introduces an efficient feature extraction method combining LBP, PCA and Maximum Scatter Difference (MSD). The original face image is first divided into sub-images, and the LBP operator is applied to extract histogram features. The feature dimensions are then further reduced using PCA. Finally, MSD is performed on the reduced PCA-based features. Experimental results on the ORL and Yale databases demonstrate that the proposed method classifies more effectively and achieves a higher recognition rate than traditional recognition methods.
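The LBP stage described above can be sketched in pure Python, assuming a grayscale image stored as rows of integers and the basic 3x3, 8-neighbour LBP operator with 256-bin histograms per sub-image; the paper may use a different LBP variant, neighbourhood or grid size, and the function names are hypothetical. The sketch also makes the dimensionality problem concrete: a g x g grid of sub-images yields 256·g² features (12,544 for a 7x7 grid) before PCA and MSD are applied.

```python
from typing import List

Gray = List[List[int]]  # grayscale image as rows of pixel intensities

# Clockwise 8-neighbourhood offsets, starting at the top-left neighbour.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Gray, r: int, c: int) -> int:
    """Basic 3x3 LBP code: bit k is set if neighbour k >= the centre pixel."""
    centre = img[r][c]
    code = 0
    for k, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << k
    return code


def lbp_histogram(img: Gray, r0: int, r1: int, c0: int, c1: int) -> List[int]:
    """256-bin histogram of LBP codes over interior pixels in [r0, r1) x [c0, c1)."""
    h, w = len(img), len(img[0])
    hist = [0] * 256
    # Skip the outermost image border, where the 3x3 window would fall outside.
    for r in range(max(r0, 1), min(r1, h - 1)):
        for c in range(max(c0, 1), min(c1, w - 1)):
            hist[lbp_code(img, r, c)] += 1
    return hist


def lbp_face_descriptor(img: Gray, grid: int = 2) -> List[int]:
    """Split the image into grid x grid sub-images and concatenate their histograms."""
    h, w = len(img), len(img[0])
    feats: List[int] = []
    for gr in range(grid):
        for gc in range(grid):
            feats += lbp_histogram(img, gr * h // grid, (gr + 1) * h // grid,
                                   gc * w // grid, (gc + 1) * w // grid)
    return feats
```

The concatenated histogram is the high-dimensional vector that the method then compresses with PCA before the MSD step.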


Author(s):  
Soumia Kerrache ◽  
Beladgham Mohammed ◽  
Hamza Aymen ◽  
Kadri Ibrahim

Feature extraction is an essential process in biometric person identification because the effectiveness of the system depends on it. Multiresolution analysis can be used successfully in person identification and pattern recognition systems. In this paper, we present a feature extraction method for two-dimensional face and iris authentication. Our approach combines principal component analysis (PCA) and the curvelet transform as an improved fusion approach for feature extraction. The proposed fusion approach involves image denoising using the 2D curvelet transform to achieve compact representations of curve singularities. This is followed by the application of PCA as a fusion rule to improve the spatial resolution. Used alone, the PCA algorithm suffers from poor recognition speed and a heavy computational load; to reduce these limitations, we apply the curvelet transform.
To assess the performance of the presented method, we employed three classification techniques: Neural Networks (NN), K-Nearest Neighbor (KNN) and Support Vector Machines (SVM).
The results reveal that image feature extraction is more efficient using Curvelet/PCA.
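The curvelet stage is beyond a short example, but the PCA stage that this abstract (and several others on this page) relies on can be sketched in pure Python: mean-centre the feature vectors, form the sample covariance matrix, and extract the leading eigenvectors by power iteration with deflation. This is a generic textbook PCA, assuming each image has already been turned into a flat feature vector (for instance, curvelet coefficients); it is not the authors' fusion rule, power iteration assumes the leading eigenvalues are distinct, and a real system would use an optimized linear-algebra library. Function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_centre(data: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Return the feature-wise mean and the mean-centred samples."""
    n, d = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    return mean, [[x[j] - mean[j] for j in range(d)] for x in data]


def covariance(centred: List[Vector]) -> Matrix:
    """Sample covariance matrix (d x d) of already-centred data."""
    n, d = len(centred), len(centred[0])
    return [[sum(x[i] * x[j] for x in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def power_iteration(mat: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, Vector]:
    """Dominant eigenpair of a symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in any direction
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca(data: List[Vector], k: int) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Project samples onto their k leading principal components."""
    mean, centred = mean_centre(data)
    cov = covariance(centred)
    d = len(cov)
    components: List[Vector] = []
    for _ in range(k):
        lam, v = power_iteration(cov)
        components.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(x[j] * comp[j] for j in range(d)) for comp in components]
                 for x in centred]
    return mean, components, projected
```

For points lying on the line y = 2x, `pca(data, 1)` recovers the direction (1, 2)/√5 and reduces each 2-D sample to a single coordinate.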


2019 ◽  
Vol 892 ◽  
pp. 200-209
Author(s):  
Rayner Pailus ◽  
Rayner Alfred

The AdaBoost-based Viola-Jones method is an influential technique for detecting face images, mainly because it is fast, lightweight and one of the simplest face detection techniques. Viola-Jones uses Haar wavelet filters to detect face images and achieves almost 80% face detection accuracy. This paper discusses a proposed methodology and algorithms that use a larger library of filters to create more discriminative features among images: 15 Haar rectangular features (an extension of the 4 Haar wavelet filters of Viola-Jones) are processed and used in a multiple adaptive ensemble process for detecting face images. After face detection, the process continues with normalization, applying feature extraction such as PCA combined with LDA or LPP to extract the weak learners' wavelets as additional classification features. After feature extraction, a feature selection step is proposed to index the extracted data. The extracted vectors are used to train MADBoost (Multiple Adaptive Diversified Boost), an improvement of AdaBoost that combines multiple feature extraction methods with multiple classifiers and is able to capture, recognize and distinguish face images faster. MADBoost applies an ensemble approach with better classification weights to produce better face recognition results. Three experiments were conducted to compare the performance of the proposed MADBoost with three other classifiers, Neural Network (NN), Support Vector Machines (SVM) and AdaBoost, using Principal Component Analysis (PCA) as the feature extraction method. These experiments were tested against POIES obstacles (Pose, Obstruction, Illumination, Expression, Sizes).
Based on the results obtained, MADBoost is found to improve recognition performance in terms of matching failures, incorrect matches and matching success percentage, while taking an acceptable time to perform the classification task.
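The Viola-Jones rectangle features that this work extends can be computed in constant time from an integral image. The sketch below shows only the four classic feature types (two-rectangle horizontal and vertical, three-rectangle, four-rectangle); the paper's 15 extended features and the MADBoost ensemble are not described in enough detail to reproduce, so they are not attempted. The sign conventions and function names are assumptions.

```python
from typing import List

Grid = List[List[int]]


def integral_image(img: Grid) -> Grid:
    """ii[r][c] holds the sum of img over rows < r and columns < c (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii: Grid, r: int, c: int, h: int, w: int) -> int:
    """Sum of the h x w rectangle whose top-left pixel is (r, c): four lookups."""
    return ii[r + h][c + w] - ii[r][c + w] - ii[r + h][c] + ii[r][c]


def edge_horizontal(ii: Grid, r: int, c: int, h: int, w: int) -> int:
    """Two-rectangle feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)


def edge_vertical(ii: Grid, r: int, c: int, h: int, w: int) -> int:
    """Two-rectangle feature: top half minus bottom half (h even)."""
    half = h // 2
    return rect_sum(ii, r, c, half, w) - rect_sum(ii, r + half, c, half, w)


def line_horizontal(ii: Grid, r: int, c: int, h: int, w: int) -> int:
    """Three-rectangle feature: outer thirds minus middle third (w divisible by 3)."""
    t = w // 3
    return (rect_sum(ii, r, c, h, t) + rect_sum(ii, r, c + 2 * t, h, t)
            - rect_sum(ii, r, c + t, h, t))


def diagonal(ii: Grid, r: int, c: int, h: int, w: int) -> int:
    """Four-rectangle feature: main-diagonal quadrants minus off-diagonal ones."""
    hh, hw = h // 2, w // 2
    return (rect_sum(ii, r, c, hh, hw) + rect_sum(ii, r + hh, c + hw, hh, hw)
            - rect_sum(ii, r, c + hw, hh, hw) - rect_sum(ii, r + hh, c, hh, hw))
```

Because every feature reduces to a handful of `rect_sum` lookups, a detector can evaluate thousands of them per window, which is why Viola-Jones is fast enough to serve as the front end of a recognition pipeline.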


2017 ◽  
Vol 5 (1) ◽  
pp. 154-169 ◽  
Author(s):  
Galih Hendra Wibowo ◽  
Riyanto Sigit ◽  
Aliridho Barakbah

The Javanese script is part of Indonesia's noble cultural heritage, especially in Java. However, the number of Javanese people who can read the script has decreased, so conservation efforts are needed, for example a system that can recognize the characters. One solution to this problem lies in Optical Character Recognition (OCR), in which one of the most critical steps is feature extraction, which distinguishes each character. Shape Energy is a feature extraction method based on the idea that a character can be distinguished simply through its skeleton. Building on this idea, the feature extraction is extended based on the skeleton's components to produce an angular histogram with various angle multiples. The performance of this method and of the basic Shape Energy method is then tested on a Javanese character dataset of 240 samples with 19 labels, obtained from various images, using K-Nearest Neighbors as the classifier. Under cross-validation, the angular histogram with a 20-degree angle achieved an accuracy of 80.83%, 23% better than Shape Energy. Other test results show that this method can recognize rotated characters, with the lowest accuracy of 86% at 180-degree rotation and the highest of 96.97% at 90-degree rotation. It can be concluded that this method improves on Shape Energy for the recognition of Javanese characters and is robust to rotation.
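The abstract does not define the angular histogram precisely, so the following is only a hedged sketch under one plausible reading: the angle of each skeleton pixel around the skeleton's centroid is binned in 20-degree steps (18 bins) and normalized, and a K-Nearest Neighbors classifier votes over such vectors. The binning rule, the centroid reference point, and the function names are assumptions, and no claim is made that this sketch reproduces the paper's rotation robustness.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Image = List[List[int]]  # binary skeleton image: 1 = skeleton pixel


def angular_histogram(skel: Image, step_deg: int = 20) -> List[float]:
    """Normalized histogram of skeleton-pixel angles around the skeleton centroid."""
    pts = [(r, c) for r, row in enumerate(skel) for c, v in enumerate(row) if v]
    n_bins = 360 // step_deg
    hist = [0.0] * n_bins
    if not pts:
        return hist
    cr = sum(r for r, _ in pts) / len(pts)
    cc = sum(c for _, c in pts) / len(pts)
    for r, c in pts:
        if r == cr and c == cc:
            continue  # a pixel exactly at the centroid has no defined angle
        # Image rows grow downward, so use (cr - r) as the upward y component.
        angle = math.degrees(math.atan2(cr - r, c - cc)) % 360.0
        hist[int(angle // step_deg) % n_bins] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def knn_predict(train: Sequence[Tuple[List[float], str]], query: List[float], k: int = 3) -> str:
    """Majority label among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

A vertical three-pixel stroke, for example, puts half its mass in the 80 to 100 degree bin and half in the 260 to 280 degree bin; a k-NN classifier over such vectors corresponds to the classification setup the abstract reports.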


Author(s):  
Million Meshesha ◽  
C V Jawahar

In Africa, around 2,500 languages are spoken. Some of these languages have their own indigenous scripts. Accordingly, a large body of printed documents is available in libraries, information centers, museums and offices. Digitizing these documents makes it possible to apply already available information technologies to local information needs and development. This paper presents an Optical Character Recognition (OCR) system for converting digitized documents in local languages. An extensive literature survey reveals that this is the first attempt to report the challenges in recognizing indigenous African scripts and a possible solution for the Amharic script. Research on the recognition of indigenous African scripts faces major challenges due to (i) the large number of characters used in writing and (ii) the existence of a large set of visually similar characters. In this paper, we propose a novel feature extraction scheme using principal component and linear discriminant analysis, followed by a decision directed acyclic graph (DDAG) based support vector machine classifier. Recognition results are presented on real-life degraded documents such as books, magazines and newspapers to demonstrate the performance of the recognizer.
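The decision DAG mentioned above combines binary classifiers into a multi-class decision by testing the first and last remaining candidate classes and eliminating the loser, so only K-1 pairwise evaluations are needed for K classes, which matters for a script with many characters. The sketch below shows only that evaluation procedure; the paper's PCA/LDA features and trained SVMs are replaced by a hypothetical nearest-prototype stand-in (`prototype_pairwise`) so the example runs on its own.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise classifier maps a feature vector to one of its two class labels.
PairwiseClassifier = Callable[[List[float]], str]


def ddag_predict(x: List[float], classes: Sequence[str],
                 pairwise: Dict[Tuple[str, str], PairwiseClassifier]) -> str:
    """Evaluate a decision DAG: each pairwise test eliminates one candidate class."""
    candidates = list(classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        key = (first, last) if (first, last) in pairwise else (last, first)
        winner = pairwise[key](x)
        # Remove whichever of the two tested classes lost.
        if winner == first:
            candidates.pop()
        else:
            candidates.pop(0)
    return candidates[0]


def prototype_pairwise(protos: Dict[str, List[float]]) -> Dict[Tuple[str, str], PairwiseClassifier]:
    """Hypothetical stand-in for trained binary SVMs: pick the nearer class prototype."""
    def make(a: str, b: str) -> PairwiseClassifier:
        def clf(x: List[float]) -> str:
            da = sum((xi - pi) ** 2 for xi, pi in zip(x, protos[a]))
            db = sum((xi - pi) ** 2 for xi, pi in zip(x, protos[b]))
            return a if da <= db else b
        return clf
    names = sorted(protos)
    return {(a, b): make(a, b) for i, a in enumerate(names) for b in names[i + 1:]}
```

With three classes, a query is resolved after exactly two pairwise tests, regardless of which path through the DAG it takes.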


Optical Character Recognition has become one of the most active fields of pattern recognition and machine learning in the last decade. In this article, suitable techniques are identified for better recognition of characters in documents and their conversion into machine-readable form. The work is related to Content-Based Image Retrieval (CBIR) systems, which address the problem of searching for images in huge datasets. Handwritten character recognition techniques have not yet been developed to be fully efficient because of variations in the size, shape, style, slant, etc. of human handwriting. To overcome these problems, this work concentrates on feature extraction and on algorithms that take such variation into account. In this paper, independent component analysis (ICA) is used to extract features, and particle swarm optimization (PSO) and firefly algorithms are applied to select the feature vector. It is observed that PSO gives better recognition rates, an effect attributed to the distributed neighborhood pixels of an image.
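Feature selection with PSO is commonly done with the binary PSO variant, in which each particle is a 0/1 mask over features and a sigmoid of the velocity gives the probability that a bit is set. The sketch below assumes that variant and typical parameter values (inertia 0.7, acceleration 1.5, velocity clamp 4); the paper's actual PSO settings, its ICA features, and the firefly algorithm are not described here and are not reproduced. In practice the fitness would be recognition accuracy on a validation set; the demo uses a toy fitness instead.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = keep the feature, 0 = drop it


def binary_pso(fitness: Callable[[Mask], float], n_features: int,
               n_particles: int = 10, iters: int = 50, w: float = 0.7,
               c1: float = 1.5, c2: float = 1.5, v_max: float = 4.0,
               seed: int = 0) -> Tuple[Mask, float]:
    """Binary PSO with a sigmoid transfer function; maximizes fitness over masks."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp so bits can still flip
                # Sigmoid of the velocity is the probability that this bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy fitness: reward masks that keep features 0, 2 and 4 and drop the rest.
    target = [1, 0, 1, 0, 1, 0]
    best, score = binary_pso(lambda m: float(sum(a == b for a, b in zip(m, target))),
                             len(target))
    print(best, score)
```

The personal-best and global-best bookkeeping is the standard PSO scheme; only the position update differs from continuous PSO.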


Optical Character Recognition (OCR) is the automatic reading of optically sensed text components, translating human-readable characters into machine-readable codes. In handwriting, the writing style varies from person to person, so segmenting and recognizing characters is a very challenging task. In this paper, we propose segmentation and feature extraction techniques to recognize camera-captured handwritten Kannada documents. Segmentation is done using the projection profile technique and Connected Component Analysis (CCA). For pre-processing, we propose our own technique for detecting the edges of Kannada characters by combining Sobel and Canny edge detection. Feature selection and extraction are done at two levels: global and local features. Global features are extracted from the entire image. For local feature extraction, we divide the input character image into four quadrants based on the centroid of the character and extract local features from all quadrants rather than from the whole image. We use a Support Vector Machine (SVM) to classify the handwritten Kannada characters. To evaluate the efficiency of the proposed system, we used the KHDD dataset, our own document and character dataset. The experimental results show that the proposed feature selection and extraction achieve 96.31% accuracy, and the results are encouraging.
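Two of the steps above can be sketched compactly: splitting a binary page into text lines with a horizontal projection profile, and dividing a character at its ink centroid into four quadrants. The abstract does not say which local features are computed per quadrant, so ink density is used here as a hypothetical placeholder; CCA and the Sobel/Canny edge step are not shown, and the function names are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def row_segments(img: Image) -> List[Tuple[int, int]]:
    """Split an image into text lines with its horizontal projection profile.

    Returns (start_row, end_row) pairs, end exclusive, for each run of rows
    that contains at least one ink pixel.
    """
    profile = [sum(row) for row in img]
    segments, start = [], None
    for r, count in enumerate(profile):
        if count and start is None:
            start = r
        elif not count and start is not None:
            segments.append((start, r))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def quadrant_densities(img: Image) -> List[float]:
    """Split a character at its ink centroid; return ink density of TL, TR, BL, BR."""
    pts = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    if not pts:
        return [0.0] * 4
    h, w = len(img), len(img[0])
    cr = round(sum(r for r, _ in pts) / len(pts))
    cc = round(sum(c for _, c in pts) / len(pts))
    bounds = [(0, cr, 0, cc), (0, cr, cc, w), (cr, h, 0, cc), (cr, h, cc, w)]
    feats = []
    for r0, r1, c0, c1 in bounds:
        area = (r1 - r0) * (c1 - c0)
        ink = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
        feats.append(ink / area if area else 0.0)
    return feats
```

Splitting at the centroid rather than at the geometric centre of the bounding box keeps the quadrants aligned with where the ink actually is, which is presumably the motivation for the centroid-based division described in the abstract.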

