A Lemmatizer Tool for Assamese Language

Author(s):  
Arindam Roy ◽  
Sunita Sarkar ◽  
Hsubhas Borkakoty
Author(s):  
Hemanta Konch

North-East India is a hub of many ethnic languages. The region comprises eight states: Assam, Arunachal Pradesh, Nagaland, Manipur, Mizoram, Tripura, Meghalaya and Sikkim. The Tutsa are a minor tribe of Arunachal Pradesh. They migrated from a place called ‘RangkhanSanchik’ in South-East Asia to Arunachal Pradesh by way of ‘Hakmen-Haksan’. The Tutsa community lives mainly in Tirap district and the southern part of Changlang district, and a few members also live in Tinsukia district of Assam. The Tutsa language belongs to the Naga group of the Sino-Tibetan language family. According to a UNESCO report, the Tutsa language is endangered, and it is placed at EGIDS Level 6B. The language has no written literature; songs, folk tales and stories exist only in colloquial (spoken) form. Its speakers use the Roman script. The influence of other languages has led to a lack of commitment to using the language in a unified form. Nowadays the younger generation is drawn towards English, Hindi and Assamese. No scientific study of the language has been found so far. For this reason, the topic Nominal Inflection of the Tutsa Language has been selected for study. The study will help to preserve the language and will also support the preparation of a dictionary, a grammar and a language guide book.


To overcome the language barrier faced by people living in the northeastern region of India, a machine translation system is a necessity. A large number of people in this region cannot access many services because they cannot understand the language in which those services are offered. Among the several languages spoken, Assamese is one of the major languages of northeast India. Machine translation for Assamese is limited compared to other languages. As a result, many Assamese speakers cannot avail themselves of the benefits associated with it. This paper focuses on the development of an English-to-Assamese translation system using the n-gram model. Compared to other models, the n-gram model works very well for language pairs whose syntax is highly dissimilar. The value of n plays a major role in the quality and efficiency of the system. The Bilingual Evaluation Understudy (BLEU) score differs significantly as the value of n changes. The model uses tuples to reduce memory consumption and to accelerate the translation process. A parallel corpus has been used to train the n-gram based decoder called MARIE. The number of translation units extracted using the n-gram model is much smaller than the number extracted using a phrase-based model. This has a high impact on system efficiency.
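The role of n in a tuple-based n-gram model can be illustrated with a minimal Python sketch. It is not the MARIE decoder or the authors' implementation: it assumes each sentence pair has already been segmented into bilingual (source, target) tuples, and the function names `tuple_ngrams` and `count_ngrams`, the toy corpus and the placeholder target strings are all hypothetical. It only shows how the number of n-grams over tuples changes as n grows.

```python
from collections import Counter


def tuple_ngrams(units, n):
    """Return contiguous n-grams over a sequence of bilingual tuples."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def count_ngrams(segmented_corpus, n):
    """Count n-grams of bilingual tuples across all segmented sentence pairs."""
    counts = Counter()
    for units in segmented_corpus:
        counts.update(tuple_ngrams(units, n))
    return counts


# Toy, made-up data (assumption): every sentence pair is already segmented
# into (source, target) tuples; the target strings are placeholders, not
# real Assamese translations.
corpus = [
    [("I", "tgt_I"), ("eat", "tgt_eat"), ("rice", "tgt_rice")],
    [("I", "tgt_I"), ("eat", "tgt_eat"), ("fish", "tgt_fish")],
]

for n in (1, 2, 3):
    counts = count_ngrams(corpus, n)
    # Print n, the number of distinct n-grams, and the total n-gram count.
    print(n, len(counts), sum(counts.values()))
```

A larger n captures more context per unit but yields sparser counts, which is one plausible reason the choice of n affects both translation quality and efficiency.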


Assam is inhabited by multilingual and multi-ethnic people. Among the languages spoken there, Assamese serves as the state language as well as the common means of communication in Assam. Moreover, since Assamese is the main medium of instruction in government schools, every student receives formal education in Assamese, whatever other languages they speak. Each tribal language of Assam is used only within its own tribe. These languages lack a written heritage because they have no script of their own. Some of them are written with the help of the scripts of other languages. Even so, these languages have not developed a rich written literature. They also face many adversities, which pose arduous challenges on the path of their development. The current unprecedented development of science and technology, the expansion of transportation and communication, and educational development have made these challenges more forceful. The restricted use of the languages has also led to their endangerment. This study discusses the current situation of these languages of Assam, the problems of the tribal languages, their development, and the obligations towards nurturing them. The paper also describes the degree of endangerment of the tribal languages of Assam and assesses their vitality with reference to the factors proposed by UNESCO. The study concludes that, to keep these languages alive, the government as well as the tribes concerned should undertake proper language planning and take all necessary steps. Only a language that is appropriate, useful and reliable for the present and the future will encourage and attract future generations to use it.

