Lexical classes based stop words categorization for Gujarati language

Author(s):  
Rajnish M. Rakholia ◽  
Jatinderkumar R. Saini
Author(s):  
Manish M. Kayasth ◽  
Bharat C. Patel

A character recognition system is logically divided into stages such as scanning, pre-processing, classification, processing, and post-processing. In the targeted system, the scanned image first passes through the pre-processing modules and then through feature extraction and classification in order to achieve a high recognition rate. This paper mainly describes feature extraction and classification techniques, which play an important role in identifying offline handwritten characters, specifically in the Gujarati language. Feature extraction provides methods by which characters can be identified uniquely and with a high degree of accuracy, and it helps to find the shape contained in the pattern. Several techniques are available for feature extraction and classification; however, selecting a technique appropriate to the input determines the accuracy of recognition.


Author(s):  
Vishal A. Naik ◽  
Apurva A. Desai

In this article, an online handwritten word recognition system for the Gujarati language is presented that combines strokes, characters, punctuation marks, and diacritics. The authors use a support vector machine classification algorithm with a radial basis function kernel. The hybrid feature set consists of directional features with curvature data; normalized chain code and zoning-based chain code features are used. Because words are a combination of characters and diacritics, recognized strokes require post-processing to form a word, for which the authors use location-based and mapping rule-based methods. The reported accuracy is 95.3% for individual characters, 91.5% for individual words, and 83.3% for sentences. The average processing time for individual characters is 0.071 seconds.
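To make the chain code idea concrete, the following is a minimal sketch of an 8-direction chain code computed over the sampled points of one pen stroke, followed by a length-normalized direction histogram. It is not the authors' implementation: the function names, the 8-direction quantization, the y-axis-up convention, and the use of a histogram as the normalized form are all assumptions made for illustration.

```python
import math
from collections import Counter


def chain_code(points):
    """Quantize each segment of a stroke into one of 8 direction codes.

    Assumed convention: 0 = east, codes increase counter-clockwise in
    45-degree steps, with the y axis pointing up.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # skip repeated samples, they carry no direction
        angle = math.atan2(dy, dx)  # radians in (-pi, pi]
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes


def normalized_histogram(codes):
    """Return the relative frequency of each of the 8 direction codes."""
    counts = Counter(codes)
    total = len(codes)
    return [counts.get(d, 0) / total if total else 0.0 for d in range(8)]


# Hypothetical stroke: two steps right, one step up, one step up-left.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 2)]
codes = chain_code(stroke)
features = normalized_histogram(codes)
```

A fixed-length vector such as `features` is the kind of input a support vector machine with a radial basis function kernel could consume; a zoning-based variant would compute one such histogram per zone of the character's bounding box and concatenate them.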


2021 ◽  
Vol 13 (3) ◽  
pp. 23-34
Author(s):  
Chandrakant D. Patel ◽  
Jayesh M. Patel

With the large quantity of information available online, it is essential to retrieve correct information for a user query. A large amount of data is available in digital form in multiple languages. Various approaches aim to increase the effectiveness of online information retrieval, but the standard approach is to search the documents in the corpus word by word for the given query. This approach is very time-consuming and may miss several related documents that are equally important. To avoid these issues, stemming has been used extensively in numerous Information Retrieval Systems (IRS) to increase retrieval accuracy across languages. This paper addresses the problem of stemming for Web Page Categorization in the Gujarati language, deriving stem words with the GUJSTER algorithm [1]. The GUJSTER algorithm is based on morphological rules and is used to derive the root or stem word from inflected words of the same class. In particular, we consider the influence of the extracted stem or root word on web page classification using supervised machine learning algorithms. This research focuses on the analysis of Web Page Categorization (WPC) for the Gujarati language and on verifying the influence of a stemming algorithm in a WPC application, with improved accuracy between 63% and 98% using supervised machine learning models and a standard split of 80% for training and 20% for testing.
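As a rough illustration of rule-based stemming, the sketch below strips the longest matching suffix from a Gujarati word. It is not the GUJSTER algorithm, whose morphological rules are not given here: the suffix list, the function name `stem`, and the minimum stem length are hypothetical choices made only to show the general shape of the technique.

```python
# Illustrative suffixes only (assumed, not taken from GUJSTER):
# "માં" (in), "ઓ" (plural), "ને" (to), "થી" (from).
SUFFIXES = ["માં", "ઓ", "ને", "થી"]


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len code points."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied, return the word unchanged


# Inflected forms of the same word map to a shared stem.
stems = [stem(w) for w in ["ઘરમાં", "ઘરથી", "ઘર"]]
```

In a categorization pipeline, each token of a page would pass through such a function before feature counting, so that inflected variants contribute to the same feature.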

