Gujarati Language
Recently Published Documents

TOTAL DOCUMENTS: 65 (FIVE YEARS: 35)
H-INDEX: 4 (FIVE YEARS: 1)

Author(s): Parita Shah, Priya Swaminarayan, Maitri Patel

Opinion analysis (sentiment analysis) is among the most important areas of natural language processing. It deals with representing text so that the intent of its source can be determined: the intent may be appreciation (positive) or criticism (negative). This paper compares the results obtained by applying classification using different classifiers, namely K-nearest neighbors and multinomial naive Bayes. These techniques are used to label a given review as either positive or negative. The data considered are the polarity movie-review datasets, and a comparison with previously published results is provided for a thorough evaluation. The paper investigates the influence of a word-level count vectorizer and of term frequency-inverse document frequency (TF-IDF) on movie sentiment analysis. We conclude that the multinomial naive Bayes (MNB) classifier generates more accurate results with the TF-IDF vectorizer than with the count vectorizer, while the K-nearest neighbors (KNN) classifier achieves the same accuracy with both.
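The comparison the abstract describes can be sketched with scikit-learn: cross the two vectorizers with the two classifiers and score each pipeline. This is a minimal sketch on invented toy reviews, not the paper's polarity movie dataset or its actual settings.

```python
# Cross CountVectorizer / TfidfVectorizer with MultinomialNB / KNN and
# score each pipeline. The toy reviews below are illustrative stand-ins
# for the polarity movie-review datasets used in the paper.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

train_texts = [
    "a wonderful, moving film", "brilliant acting and a great story",
    "an awful, boring mess", "terrible plot and dull characters",
    "great direction and a moving score", "boring, dull and awful pacing",
]
train_labels = [1, 1, 0, 0, 1, 0]          # 1 = positive, 0 = negative
test_texts = ["a great and moving story", "a dull and terrible film"]
test_labels = [1, 0]

results = {}
for vec_name, make_vec in [("count", CountVectorizer), ("tfidf", TfidfVectorizer)]:
    for clf_name, make_clf in [("mnb", MultinomialNB),
                               ("knn", lambda: KNeighborsClassifier(n_neighbors=3))]:
        model = make_pipeline(make_vec(), make_clf()).fit(train_texts, train_labels)
        results[(vec_name, clf_name)] = model.score(test_texts, test_labels)

print(results)  # accuracy per (vectorizer, classifier) pair
```

On the real dataset, this is the grid whose accuracies the paper compares.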


2021, Vol 6 (4), pp. 52-56
Author(s): Hemang Jani, Gauravi Dhruva, Dinesh Sorani

Background: The Short Form 36 Item Survey is the most commonly used instrument for assessing health-related quality of life [1]. Two identical versions of the original instrument are currently available: the public-domain, license-free RAND-36 and the commercial SF-36 [2]. The RAND-36 is not available in the Gujarati language. The aim of this study was to translate and culturally adapt the RAND-36 into Gujarati and to measure its reliability and validity.
Methods: Following the guidelines of the International Quality of Life Assessment project, a sequence of translation, pilot testing, and validation, including a test of item-scale correlation, was implemented for the Gujarati version of the RAND-36. After pilot testing, the English and Gujarati versions of the RAND-36 were administered to a random sample of 120 apparently healthy individuals to test validity, and 96 respondents completed the Gujarati RAND-36 again after two weeks to test reliability. Data were analyzed using one-way analysis of variance, multi-trait scaling analysis, Pearson's product-moment correlation analysis, and the Intra-Class Correlation coefficient (ICC) at p < 0.05.
Results: The median Cronbach's alphas for the Gujarati RAND-36 in multiple subgroups exceeded 0.70 for every scale except one. Two of the English RAND-36 scales had median Cronbach's alphas that exceeded 0.70; the rest exceeded 0.50. Test-retest correlations were statistically significant for both versions. Product-moment correlations testing the equivalence of the corresponding Gujarati and English versions ranged from 0.73 to 0.92. The Gujarati RAND-36 showed high internal consistency (Cronbach's α = 0.809) and test-retest reliability (ICC = 0.746, 95% CI: 0.58, 0.94).
Conclusions: The Gujarati version of the RAND-36 performed well, and the findings suggest that it is a reliable and valid measure of health-related quality of life in the general Gujarati population.
Keywords: RAND-36, cross-cultural translation, quality of life, health status assessment, Gujarati.
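The internal-consistency statistic reported above, Cronbach's alpha, is straightforward to compute: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch on a made-up 4x3 score matrix (not the study's RAND-36 responses):

```python
# Cronbach's alpha from a respondents-by-items score matrix.
# The toy matrix is invented illustrative data.
from statistics import variance

def cronbach_alpha(scores):
    """scores: list of respondents, each a list of item scores."""
    k = len(scores[0])                          # number of items
    items = list(zip(*scores))                  # one tuple of scores per item
    item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(resp) for resp in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

toy = [[3, 4, 3], [4, 5, 4], [2, 3, 2], [5, 5, 5]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))   # 0.98 for this toy matrix
```

Values above 0.70, as reported for most Gujarati RAND-36 scales, are conventionally taken to indicate acceptable internal consistency.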


2021
Author(s): Charmi Jobanputra, Nihit Parikh, Vishwa Vora, Santosh Kumar Bharti

Author(s): Nasrin Aasofwala, Shanti Verma, Kalyani Patel

2021, Vol 13 (3), pp. 23-34
Author(s): Chandrakant D. Patel, Jayesh M. Patel

With the large quantity of information available online, it is essential to retrieve accurate information for a user query. A large amount of data is available in digital form in multiple languages. Various approaches try to increase the effectiveness of online information retrieval, but the standard approach to retrieving information for a user query is to search the documents in the corpus word by word against the query. This approach is very time-consuming, and it may miss many related documents that are equally important. To avoid these issues, stemming has been used extensively in Information Retrieval Systems (IRS) to increase retrieval accuracy across languages. This paper addresses the problem of stemming for web page categorization in the Gujarati language, deriving stem words using the GUJSTER algorithm [1]. The GUJSTER algorithm is based on morphological rules and derives the root or stem word from inflected words of the same class. In particular, we consider the influence of the extracted stem or root words on the integrity of web page classification using supervised machine learning algorithms. This research work focuses on the analysis of Web Page Categorization (WPC) for the Gujarati language and verifies the influence of a stemming algorithm in a WPC application, improving accuracy from 63% to 98% with supervised machine learning models and a standard 80/20 train/test split.
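A rule-based suffix stripper of the kind the abstract describes can be sketched in a few lines. The suffix list below is a small illustrative sample, not the paper's actual GUJSTER rule set, and the gloss on the example word is an assumption:

```python
# Longest-match suffix stripping, in the spirit of rule-based Gujarati
# stemmers. SUFFIXES is an illustrative sample, NOT the GUJSTER rules.
SUFFIXES = sorted(["ઓ", "ે", "ના", "ની", "નો", "માં", "થી"], key=len, reverse=True)

def stem(word, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len chars."""
    for suf in SUFFIXES:                       # longest suffixes tried first
        if word.endswith(suf) and len(word) - len(suf) >= min_stem_len:
            return word[: -len(suf)]
    return word                                # no rule applies: word is its own stem

print(stem("છોકરાઓ"))   # છોકરા  (plural marker stripped)
```

In the WPC pipeline, such a stemmer runs over every token before feature extraction, so that inflected variants of a word map to one feature.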


Author(s): Stuti Mehta, Suman K. Mitra

Text classification is an extremely important area of Natural Language Processing (NLP). This paper studies various methods of embedding and classification for the Gujarati language. The dataset consists of Gujarati news headlines classified into various categories. Different embedding methods for Gujarati and various classifiers are used to classify the headlines into the given categories. Gujarati is a low-resource language that is not commonly worked on. This paper deals with one of the most important NLP tasks, classification, and alongside it gives an overview of embedding techniques for Gujarati, since these provide the feature extraction for classification. The paper first performs embedding to obtain a valid representation of the textual data and then applies existing robust classifiers to the embedded data. Additionally, the paper provides insight into how various NLP tasks can be performed on a low-resource language like Gujarati. Finally, it carries out a comparative analysis of the performance of existing embedding and classification methods to determine which combination gives the better outcome.
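The embedding-times-classifier comparison described above amounts to a grid of pipelines, each scored on the same headline data. This sketch uses invented English placeholder headlines and two simple embeddings (word-level and character-level TF-IDF, the latter a common choice for low-resource, morphologically rich languages); the paper's actual Gujarati dataset and embedding methods differ.

```python
# Cross several embeddings with several classifiers and collect scores.
# Headlines/categories are invented placeholders for the Gujarati news data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

headlines = [
    "team wins the cricket final", "captain scores a century",
    "new budget cuts taxes", "parliament passes finance bill",
    "star announces new film", "director wins film award",
]
labels = ["sports", "sports", "politics", "politics", "film", "film"]

embeddings = {
    "word-tfidf": lambda: TfidfVectorizer(),
    "char-tfidf": lambda: TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
}
classifiers = {"logreg": lambda: LogisticRegression(max_iter=1000),
               "svm": lambda: LinearSVC()}

scores = {}
for e_name, make_emb in embeddings.items():
    for c_name, make_clf in classifiers.items():
        model = make_pipeline(make_emb(), make_clf()).fit(headlines, labels)
        # training accuracy only, for illustration; a real study holds out a test set
        scores[(e_name, c_name)] = model.score(headlines, labels)
print(scores)
```

The cell with the highest held-out score identifies the winning embedding-classifier combination, which is exactly the comparative question the paper asks.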


Author(s): Uttam Chauhan, Apurva Shah

A topic model is one of the best stochastic models for summarizing an extensive collection of text and has achieved considerable success in text analysis and text summarization. It can be applied to a set of documents represented as bags of words, without considering grammar or word order. We modeled the topics of a Gujarati news-article corpus. Because Gujarati has a diverse morphological structure and is inflectionally rich, processing Gujarati text is more complex. The size of the vocabulary plays an important role in the inference process and in the quality of topics: as the vocabulary grows, inference becomes slower and topic semantic coherence decreases. If the vocabulary is reduced, the topic inference process can be accelerated, and the quality of topics may also improve. In this work, a list was prepared of suffixes that occur very frequently at the ends of words in Gujarati text, and inflectional forms were reduced to their root words according to the suffixes in this list. Moreover, Gujarati single-letter words were eliminated for faster inference and better topic quality. Experiments showed that reducing inflectional forms to their root words shrinks the vocabulary to a significant extent and makes the topic-formation process quicker. Furthermore, the reduction of inflectional forms together with single-letter word removal enhanced the interpretability of topics, which was assessed in terms of semantic coherence, word length, and topic size. The experimental results showed improvements in the topical semantic coherence score; topic size also grew notably as the number of tokens assigned to the topics increased.
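The preprocessing step the abstract credits with shrinking the vocabulary can be sketched directly: map inflected forms to roots via a suffix list, drop single-letter words, and compare vocabulary sizes before and after. The suffixes and tokens below are illustrative inventions, not the paper's rule list or corpus.

```python
# Vocabulary reduction before topic modeling: suffix-based root reduction
# plus single-letter word removal. Suffixes and tokens are toy examples.
SUFFIXES = sorted(["ઓ", "ના", "ની", "માં", "થી"], key=len, reverse=True)

def reduce_token(tok, min_stem_len=2):
    for suf in SUFFIXES:
        if tok.endswith(suf) and len(tok) - len(suf) >= min_stem_len:
            return tok[: -len(suf)]
    return tok

tokens = ["છોકરાઓ", "છોકરા", "શહેરમાં", "શહેર", "ન", "ઘરથી", "ઘર"]
vocab_before = set(tokens)
vocab_after = {reduce_token(t) for t in tokens if len(t) > 1}  # drop 1-letter words

print(len(vocab_before), len(vocab_after))   # 7 3 on this toy list
```

A smaller vocabulary means fewer word types for the topic sampler to assign, which is the mechanism behind the faster inference and higher coherence reported above.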

