Gurmukhi script
Recently Published Documents


Total documents: 58 (five years: 11)
H-index: 8 (five years: 1)

Author(s):  
Sandhya Sharma ◽  
Sheifali Gupta ◽  
Neeraj Kumar ◽  
Tanvi Arora

In the current era of automation, postal automation is a major research area. Developing a postal automation system for a nation like India is more difficult than for other nations because of India’s multi-script and multi-lingual nature. The proposed work supports postal automation for the district names of Punjab, a state in North India, written in Gurmukhi, the script of Punjabi, which is the state's official language. For this, a holistic approach, i.e. a segmentation-free technique, is used with a Convolutional Neural Network (CNN), a deep learning (DL) model. For recognition, a database of 22,000 handwritten Gurmukhi image samples covering all 22 districts of Punjab was prepared. Each district name was written twice by 500 different writers, giving 1000 samples per district name. Two CNN models, named ConvNetGuru and ConvNetGuruMod, are proposed for recognition. The maximum validation accuracy achieved is 90% for ConvNetGuru and 98% for ConvNetGuruMod.
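The sample counts quoted in this abstract can be cross-checked with simple arithmetic. The short sketch below uses only the figures stated in the abstract (22 districts, 500 writers, each name written twice); the variable names are illustrative.

```python
# Dataset size implied by the figures in the abstract.
DISTRICTS = 22          # district names of Punjab
WRITERS = 500           # different writers
COPIES_PER_WRITER = 2   # each name is written twice by every writer

samples_per_district = WRITERS * COPIES_PER_WRITER
total_samples = DISTRICTS * samples_per_district

print(samples_per_district)  # 1000
print(total_samples)         # 22000
```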


2020 ◽  
Vol 17 (6) ◽  
pp. 2674-2677
Author(s):  
Sandhya Sharma ◽  
Sheifali Gupta ◽  
Neeraj Kumar

Nowadays, we process all the important information of our lives electronically. Because computers are involved in every sphere, efficient and fast techniques are needed so that records can be easily transferred between people and computer systems. Offline text recognition provides an interface between humans and computers. Many researchers are working on recognizing the text of Indian scripts such as Bangla, Devanagari and Gurmukhi, but exchanging data between people and computers remains a challenge because of differences in individual writing styles, and very little work has been done for Gurmukhi. This article reviews the accuracy results achieved by different researchers using different classification techniques. The classifiers used for character recognition include Support Vector Machine (SVM) based classifiers (an upper zone classifier and a lower zone classifier), a Hidden Markov Model (HMM) using a set of features of the normalized x–y traces of the stroke, a linear SVM classifier with the DCT2 feature set, a polynomial SVM with iDCT2 features, a multi-layered perceptron (MLP) neural network, and k-nearest neighbor (KNN).


Document clustering plays a central role in knowledge discovery and data mining by organizing large data sets into a certain number of groups of data objects called clusters. Each cluster consists of data objects such that objects in the same cluster are similar to one another and dissimilar to the objects in other clusters. The document clustering technique for Gurmukhi script consists of two phases: 1) a pre-processing phase and 2) a processing phase. This paper concentrates on the pre-processing phase, whose purpose is to convert unstructured text into a structured text format. Its sub-phases are segmentation, tokenization, removal of stop words, stemming, and normalization. The purpose of this paper is to present the significant role of the pre-processing phase in the overall performance of the document clustering technique for Gurmukhi script. The experimental results show that the pre-processing phase plays a significant role in the assignment of data objects to the relevant clusters as well as in the creation of a meaningful cluster title list.
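The pre-processing sub-phases listed in this abstract can be sketched as a small pipeline. This is a minimal illustration, not the paper's implementation: the stop-word set, the single plural suffix used for stemming, the use of Unicode NFC for normalization, and all function names are assumptions made for the example.

```python
import re
import unicodedata

def normalize(text):
    # Assumption: normalization here only means Unicode NFC normalization.
    return unicodedata.normalize("NFC", text)

# Illustrative stop words (common Punjabi function words), not the paper's list.
STOP_WORDS = {normalize(w) for w in ("ਦਾ", "ਦੇ", "ਦੀ", "ਹੈ", "ਹਨ", "ਅਤੇ")}

# Hypothetical suffix list for stemming: kanna + bindi, a common plural ending.
SUFFIXES = ("\u0A3E\u0A02",)

def segment(text):
    # Split into sentences on the danda (U+0964), '?' and '!'.
    return [s.strip() for s in re.split(r"[\u0964?!]", text) if s.strip()]

def tokenize(sentence):
    # Keep runs of characters from the Gurmukhi Unicode block.
    return re.findall(r"[\u0A00-\u0A7F]+", sentence)

def stem(token):
    # Strip a known suffix, but leave very short tokens untouched.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Return one list of stemmed, stop-word-free tokens per sentence."""
    result = []
    for sentence in segment(normalize(text)):
        tokens = [t for t in tokenize(sentence) if t not in STOP_WORDS]
        result.append([stem(t) for t in tokens])
    return result

sample = "ਕਿਤਾਬਾਂ ਦਾ ਘਰ ਹੈ\u0964 ਕਿਤਾਬ ਅਤੇ ਕਲਮ"
print(preprocess(sample))
```

In this sketch the plural and singular forms of the same noun reduce to one token, which is the property a clustering step relies on when it compares documents by shared terms.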


Author(s):  
Harjeet Singh ◽  
R. K. Sharma ◽  
Rajesh Kumar ◽  
Karun Verma ◽  
Ravinder Kumar ◽  
...  

A Romanization system converts text in a source script to the Roman script through word-by-word mapping. The phonological characteristics of the source word are not lost: only the writing script changes, not the spoken language. This paper presents a rule-based approach for the Romanization of proper nouns in Gurmukhi script. The aim is to develop a lightweight Romanization system that may produce multiple possible results for the same input word. The algorithm uses a list of Gurmukhi script characters along with their equivalent character combinations in Roman script. Direct mapping of Gurmukhi characters to their equivalent Roman character combinations does not produce satisfactory results, so rules are applied to get the correct mappings. The rules essentially place or remove the letter ‘a’ between the mapped consonants. Three different sets of rules are applied to get three different Romanized outputs. All these outputs are acceptable for information extraction using pattern matching. In Gurmukhi, some words are written differently from how they are pronounced. To handle such words, the words or parts of them are stored in a database table, with their Romanized form in a second column, and their Romanization is picked directly from this table. The result of this Romanization system is a set of possible words that can be generated from the source script word. It enables an application to pattern match those output words against some text or database to get the required information.
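The idea of mapping characters and then placing the letter ‘a’ between mapped consonants can be sketched as follows. This is a minimal illustration under stated assumptions: the tiny mapping tables, the empty exception table, and the three rule variants are hypothetical stand-ins, since the abstract does not list the paper's actual character table or rule sets.

```python
# Hypothetical, deliberately tiny mapping tables (not the paper's full list).
CONSONANTS = {"ਕ": "k", "ਮ": "m", "ਲ": "l", "ਰ": "r", "ਨ": "n", "ਸ": "s"}
VOWEL_SIGNS = {
    "\u0A3E": "aa",  # kanna
    "\u0A3F": "i",   # sihari
    "\u0A40": "ee",  # bihari
}

# Placeholder for the lookup table of words that are written differently
# from how they are pronounced (Gurmukhi word -> Romanized form).
EXCEPTIONS = {}

def map_units(word):
    """Map each character to (roman, kind), kind being 'C' or 'V'."""
    units = []
    for ch in word:
        if ch in CONSONANTS:
            units.append((CONSONANTS[ch], "C"))
        elif ch in VOWEL_SIGNS:
            units.append((VOWEL_SIGNS[ch], "V"))
        else:
            raise ValueError(f"no mapping for {ch!r}")
    return units

def join_units(units, insert_a, final_a):
    """Join mapped units, optionally adding 'a' after bare consonants."""
    out = []
    for i, (roman, kind) in enumerate(units):
        out.append(roman)
        next_kind = units[i + 1][1] if i + 1 < len(units) else None
        if kind == "C" and next_kind == "C" and insert_a:
            out.append("a")
        if kind == "C" and next_kind is None and final_a:
            out.append("a")
    return "".join(out)

def romanize(word):
    """Return candidate Romanizations, most literal first."""
    if word in EXCEPTIONS:
        return [EXCEPTIONS[word]]
    units = map_units(word)
    variants = [
        join_units(units, insert_a=False, final_a=False),  # direct mapping
        join_units(units, insert_a=True, final_a=False),   # 'a' between consonants
        join_units(units, insert_a=True, final_a=True),    # plus a trailing 'a'
    ]
    return list(dict.fromkeys(variants))  # drop duplicates, keep order

print(romanize("ਕਮਲ"))  # ['kml', 'kamal', 'kamala']
```

Returning every variant rather than a single best guess matches the stated goal of matching the output words against existing text or database entries.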


2019 ◽  
Vol 8 (2S3) ◽  
pp. 1484-1494

Segmentation is an important step in designing an Optical Character Recognition (OCR) system for any script. In this paper, we focus on line and word segmentation in typewritten Gurmukhi script documents. To perform this task, we consider an OCR-based methodology in which several processing steps are implemented. Typewritten documents suffer from several issues such as noise, skew, and poor document quality. In this work, we present a combined pre-processing scheme comprising document thresholding and skew detection and correction: image thresholding is performed using Niblack’s method, and skew correction is carried out using a gradient histogram algorithm so that a uniform orientation is obtained. Next, a line segmentation scheme is applied in which a probability density function is used to generate the text distribution in a probability map. Because identifying which line a piece of text belongs to is a challenging task, we present a 2D Gaussian model that helps identify the text boundaries in the x and y directions. The proposed methodology is applied to typewritten Gurmukhi documents, and an experimental study is carried out to show that the proposed approach achieves better performance than existing techniques.
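Niblack's method, named in this abstract for the thresholding step, computes a local threshold T = m + k·s for each pixel, where m and s are the mean and standard deviation of the gray values in a window around it. The sketch below is a plain-Python illustration only: the window size, the value k = -0.2 (a commonly used choice), and the function name are assumptions, as the abstract does not state the parameters used.

```python
from statistics import mean, pstdev

def niblack_binarize(image, window=3, k=-0.2):
    """Binarize a grayscale image given as a list of rows (0 = black, 255 = white).

    Returns a grid of the same shape with 1 for dark (text) pixels and
    0 for background, using the local threshold T = mean + k * std.
    """
    height, width = len(image), len(image[0])
    radius = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the window around (x, y), clipped at the image border.
            neighbours = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            threshold = mean(neighbours) + k * pstdev(neighbours)
            result[y][x] = 1 if image[y][x] < threshold else 0
    return result

page = [
    [200, 200, 200, 200, 200],
    [200,  30,  30,  30, 200],
    [200, 200, 200, 200, 200],
]
for row in niblack_binarize(page):
    print(row)
```

Because the threshold adapts to each neighbourhood, the dark stroke in the middle row is separated from the bright background even though no global cut-off is given.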


2019 ◽  
Vol 8 (2) ◽  
pp. 1646-1653

Document clustering is an unsupervised machine learning technique that creates classes of similar objects without prior knowledge of the data sets. These classes of similar objects are known as clusters; each cluster consists of unlabeled data objects such that objects within the same cluster have maximum similarity to one another and are dissimilar to the objects of other clusters. The purpose of this research work is to develop a domain-independent clustering technique for Gurmukhi script. It is the first such effort, as no prior work has developed a domain-independent clustering technique for Gurmukhi script. In this paper, a hybrid algorithm for document clustering of Gurmukhi script is developed. The experimental results show that the proposed hybrid technique performs better in defining the number of clusters, creating meaningful cluster titles, and assigning real-time unlabeled data sets to the relevant cluster. This is a result of pre-processing steps such as segmentation, stemming, and normalization, as well as the extraction of named/noun entities, the creation of cluster titles, and the placement of text documents into relevant clusters using fuzzy term weight.

