Text Classification of Gujarati Newspaper Headlines
Text classification is an important area of Natural Language Processing (NLP). This paper studies methods for embedding and classifying text in Gujarati, a low-resource language that has received relatively little attention in NLP research. The dataset comprises Gujarati news headlines labeled with categories. Several embedding methods for Gujarati are combined with several classifiers to assign each headline to its category. Embeddings supply the features on which classification depends, so the paper also gives an overview of embedding techniques for Gujarati. The pipeline first embeds the text to obtain a numerical representation of each headline and then applies established, robust classifiers to the embedded data. The paper also offers insight into how NLP tasks can be performed on a low-resource language like Gujarati. Finally, it compares the performance of existing embedding and classification methods to identify which combination gives the best results.
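The embed-then-classify pipeline and the comparison of combinations can be illustrated with a minimal sketch. The abstract does not name the specific embeddings or classifiers, so this example swaps in two simple stand-ins: a word bag-of-words and a character n-gram bag (both assumptions, not the paper's methods), each fed to a multinomial Naive Bayes classifier written from scratch. The headlines and the `sports`/`politics` labels are hypothetical toy examples, not taken from the paper's dataset, and the names `word_features`, `char_ngram_features`, `NaiveBayes`, and `accuracy` are illustrative.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy headlines and labels (NOT from the paper's dataset).
TRAIN = [
    ("ભારતે ક્રિકેટ મેચ જીતી", "sports"),
    ("ક્રિકેટ ટીમની જાહેરાત", "sports"),
    ("ચૂંટણી માટે સરકારની તૈયારી", "politics"),
    ("સરકારે નવી યોજના જાહેર કરી", "politics"),
]
TEST = [
    ("ક્રિકેટ મેચમાં ભારતની જીત", "sports"),
    ("સરકારની નવી નીતિ", "politics"),
]


def word_features(text):
    """Embedding 1 (assumed): counts of whitespace-separated tokens."""
    return Counter(text.split())


def char_ngram_features(text, n=3):
    """Embedding 2 (assumed): counts of Unicode code-point n-grams.

    Needs no tokenizer or stemmer; the text is padded with spaces so
    word boundaries appear inside the n-grams.
    """
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_counts, labels):
        self.class_docs = Counter(labels)           # documents per class
        self.class_feats = defaultdict(Counter)     # feature counts per class
        for feats, label in zip(feature_counts, labels):
            self.class_feats[label].update(feats)
        self.vocab = {f for feats in feature_counts for f in feats}
        self.class_totals = {c: sum(fc.values()) for c, fc in self.class_feats.items()}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_docs:
            score = math.log(self.class_docs[c] / n_docs)  # log prior
            for f, count in feats.items():
                if f not in self.vocab:
                    continue  # skip features never seen in training
                p = (self.class_feats[c][f] + 1) / (self.class_totals[c] + v)
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def accuracy(embed, train, test):
    """Embed the training headlines, fit the classifier, score on test."""
    X = [embed(text) for text, _ in train]
    y = [label for _, label in train]
    model = NaiveBayes().fit(X, y)
    correct = sum(model.predict(embed(text)) == label for text, label in test)
    return correct / len(test)


# Compare each embedding/classifier combination, as the abstract describes.
for name, embed in [("words", word_features), ("char-3grams", char_ngram_features)]:
    print(name, accuracy(embed, TRAIN, TEST))
```

Character n-grams are included because they need no Gujarati-specific tokenizer and can match inflected forms such as `મેચ` and `મેચમાં` that whole-word features treat as unrelated. With real data, a held-out split or cross-validation would replace the two toy test headlines.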