End to End Parts of Speech Tagging and Named Entity Recognition in Bangla Language

Author(s):  
Jillur Rahman Saurav ◽  
Summit Haque ◽  
Farida Chowdhury
2019 ◽  
Vol 8 (2S3) ◽  
pp. 1028-1036

This paper presents a full abstraction for Indian languages, specifically Kannada, in the context of guided summarization. The proposed process generates the abstractive summary by focusing on a unified presentation model with aspect based Information Extraction (IE) rules and scheme based Templates. TF/IDF rules are used for classification into categories. Lexical analysis (like Parts Of Speech tagging and Named Entity Recognition) reduces prolixity, which leads to robust IE rules. Usage of Templates for sentence generation makes the summaries succinct and information intensive. The IE rules are designed to accommodate the complexities of the considered languages. Later, the system aims to produce a guided summary of domain specific documents. An abstraction scheme is a collection of aspects and associated IE rules. Each abstraction scheme is designed based on a theme or subcategory. An extensive statistical and qualitative evaluation of the summaries generated by the system has been conducted and the results are found to be very promising.


2019 ◽  
Vol 8 (2S8) ◽  
pp. 1225-1233

This paper presents a full abstraction for Indian languages, specifically Kannada, in the context of guided summarization. The proposed process generates the abstractive summary by focusing on a unified presentation model with aspect based Information Extraction (IE) rules and scheme based Templates. TF/IDF rules are used for classification into categories. Lexical analysis (like Parts Of Speech tagging and Named Entity Recognition) reduces prolixity, which leads to robust IE rules. Usage of Templates for sentence generation makes the summaries succinct and information intensive. The IE rules are designed to accommodate the complexities of the considered languages. Later, the system aims to produce a guided summary of domain specific documents. An abstraction scheme is a collection of aspects and associated IE rules. Each abstraction scheme is designed based on a theme or subcategory. An extensive statistical and qualitative evaluation of the summaries generated by the system has been conducted and the results are found to be very promising.
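The abstract says only that TF/IDF rules are used for classification into categories and does not describe those rules. As a rough illustration of the general idea, the sketch below weights terms by TF-IDF and assigns each document to the category whose keyword list carries the most weight. The function names, the keyword lists, and the toy English tokens are all hypothetical placeholders, not the authors' rules or data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def classify(doc_weights, category_terms):
    """Pick the category whose keywords carry the most TF-IDF weight."""
    scores = {
        category: sum(doc_weights.get(term, 0.0) for term in terms)
        for category, terms in category_terms.items()
    }
    return max(scores, key=scores.get)


# Toy corpus (hypothetical): each document is already tokenized.
docs = [
    ["flood", "rescue", "river", "rain"],
    ["election", "vote", "party", "rain"],
    ["flood", "relief", "camp", "river"],
]
# Hypothetical keyword lists standing in for category rules.
category_terms = {
    "disaster": ["flood", "rescue", "relief", "river"],
    "politics": ["election", "vote", "party"],
}

weights = tf_idf(docs)
labels = [classify(w, category_terms) for w in weights]
print(labels)
```

A real system for Kannada would need language-specific tokenization and a much larger corpus before such weights become meaningful.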


2021 ◽  
Vol 11 (4) ◽  
pp. 1-13
Author(s):  
Arpitha Swamy ◽  
Srinath S.

Parts-of-speech (POS) tagging assigns a POS tag to every word in a text, and named entity recognition (NER) identifies the proper nouns in a text and classifies them into predefined categories. A POS tagger and an NER system for Kannada text are proposed, both based on conditional random fields (CRFs). The POS tagging dataset consists of 147K tokens, of which 103K are used for training and the remainder for testing. The proposed CRF model for POS tagging of Kannada text obtained 91.3% precision, 91.6% recall, and a 91.4% F-score. For the Kannada NER system, the required data was created manually using a modified tag set containing 40 labels. The NER dataset consists of 16.5K tokens, of which 70% are used for training and the remaining 30% for testing. The NER model obtained 94% precision, 93.9% recall, and a 93.9% F1-measure.
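The reported scores and split sizes can be sanity-checked with a few lines of arithmetic. The sketch below assumes the F-score is the usual harmonic mean of precision and recall and treats the rounded token counts (147K, 103K, 16.5K) as exact. If the paper averages per-class scores, the harmonic mean of the averaged precision and recall need not match the reported F-score exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores quoted in the abstract, in percent.
pos_f1 = f1_score(91.3, 91.6)   # reported F-score: 91.4
ner_f1 = f1_score(94.0, 93.9)   # reported F1-measure: 93.9

# POS dataset: 147K tokens in total, 103K used for training.
pos_test_tokens = 147_000 - 103_000
pos_train_share = 103_000 / 147_000

# NER dataset: 16.5K tokens with a 70/30 train/test split.
ner_train_tokens = 16_500 * 70 // 100
ner_test_tokens = 16_500 - ner_train_tokens

print(f"POS F1 = {pos_f1:.2f}, NER F1 = {ner_f1:.2f}")
print(f"POS test tokens = {pos_test_tokens}, train share = {pos_train_share:.1%}")
print(f"NER train/test tokens = {ner_train_tokens}/{ner_test_tokens}")
```

Under these assumptions both computed F-scores land within 0.1 of the reported values, and the POS split works out to roughly the same 70/30 ratio used for NER.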


2020 ◽  
Author(s):  
Hemant Yadav ◽  
Sreyan Ghosh ◽  
Yi Yu ◽  
Rajiv Ratn Shah
