An Integrated Process Based Natural Language Processing System

2020, Vol 17 (4), pp. 1842-1846
Author(s): Praveen Edward James, Mun Hou Kit, Chockalingam Aravind Vaithilingam, Alan Tan Wee Chiat

Natural Language Processing (NLP) systems comprise Natural Language Understanding (NLU), Dialogue Management (DM) and Natural Language Generation (NLG). This work integrates learning from examples with rule-based processing to design an NLP system. The design is a three-stage processing framework that combines syntactic generation, semantic extraction and strong rule-based control. The syntactic generator produces syntax by aligning sentences with Part-of-Speech (POS) tags, limited by the number of words in the lexicon. The semantic extractor extracts meaningful keywords from the queries raised. Both modules are governed by generalized rules in the rule-based controller module. The system is evaluated across different domains. The results show an average accuracy of 92.33%. The design process is simple, and the processing time is 2.12 seconds, which is low compared to similar statistical models. The performance of an NLP tool on a given task can be estimated by the quality of its predictions when classifying unseen data. The results show performance similar to that of existing systems, indicating that the system can be used for similar tasks. The system supports a vocabulary of about 700 words and can be used as an NLP module in a spoken dialogue system for various domains or task areas.
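
The three-stage flow described in the abstract (POS alignment against a fixed lexicon, keyword extraction, rule-based control) can be illustrated with a small Python sketch. This is not the authors' implementation: the lexicon entries, tag names, rules, intent labels and function names (`tag_sentence`, `extract_keywords`, `control`) are all hypothetical and chosen only to show how the stages might fit together.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon (word -> POS tag). The paper reports a lexicon of
# about 700 words; these few entries are illustrative assumptions.
LEXICON: Dict[str, str] = {
    "what": "WH", "is": "VERB", "the": "DET", "price": "NOUN",
    "of": "PREP", "a": "DET", "ticket": "NOUN", "book": "VERB",
    "flight": "NOUN", "to": "PREP", "london": "NOUN",
}

# Assumption: nouns and verbs carry the meaning; "is" is treated as a filler.
CONTENT_TAGS: Set[str] = {"NOUN", "VERB"}
FILLER_WORDS: Set[str] = {"is"}

# Hypothetical generalized rules: (keywords that must all be present, intent).
RULES: List[Tuple[Set[str], str]] = [
    ({"price", "ticket"}, "price_query"),
    ({"book", "flight"}, "booking_request"),
]


def tag_sentence(sentence: str) -> List[Tuple[str, str]]:
    """Stage 1: align each word with a POS tag; unknown words get 'UNK'."""
    words = sentence.lower().strip("?.! ").split()
    return [(word, LEXICON.get(word, "UNK")) for word in words]


def extract_keywords(tagged: List[Tuple[str, str]]) -> List[str]:
    """Stage 2: keep content-bearing words as keywords."""
    return [w for w, t in tagged if t in CONTENT_TAGS and w not in FILLER_WORDS]


def control(sentence: str) -> str:
    """Stage 3: rule-based controller that drives stages 1 and 2."""
    tagged = tag_sentence(sentence)
    if any(tag == "UNK" for _, tag in tagged):
        return "out_of_vocabulary"  # the lexicon bounds what can be processed
    keywords = set(extract_keywords(tagged))
    for required, intent in RULES:
        if required <= keywords:  # every required keyword is present
            return intent
    return "no_rule_matched"


if __name__ == "__main__":
    print(control("What is the price of a ticket?"))
    print(control("Book a flight to London"))
```

In this sketch the controller rejects any query containing a word outside the lexicon, which mirrors the abstract's remark that syntax generation is limited by the number of words in the lexicon. How the actual system handles such words is not stated in the abstract.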

2015, Vol 7 (1)
Author(s): Paula Carvalho, Mário J. Silva

This paper describes the main characteristics of SentiLex-PT, a sentiment lexicon designed for extracting sentiment and opinion about human entities in Portuguese texts. The potential of this resource is illustrated through its application to two types of corpora: SentiCorpus-PT, a social media corpus consisting of user comments on news articles, and a literary work from the early twentieth century, The Poor (Os Pobres), by Raul Brandão. The data were processed with UNITEX, a natural language processing system based on dictionaries and grammars.

