Comparative Analysis of N-gram Text Representation on Igbo Text Document Similarity

2017 ◽  
Vol 12 (9) ◽  
pp. 1-7 ◽  
Author(s):  
Ifeanyi-Reuben Nkechi J. ◽  
Ugwu Chidiebere ◽  
Nwachukwu E. O.
Author(s):  
Adam Piotr Idczak

It is estimated that approximately 80% of all data gathered by companies are text documents. This article is devoted to one of the most common problems in text mining, i.e. text classification in sentiment analysis, which focuses on determining a document’s sentiment. The lack of a defined structure in text makes this problem more challenging and has led to the development of various techniques for determining a document’s sentiment. This paper presents a comparative analysis of two methods of sentiment classification: the naive Bayes classifier and logistic regression. The analysed texts are written in Polish and come from banks. Classification was conducted by means of the bag-of-n-grams approach, in which a text document is represented as a set of terms and each term consists of n words. The results show that logistic regression performed better.
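The bag-of-n-grams representation described in this abstract can be illustrated with a minimal sketch. The function name `bag_of_ngrams`, the whitespace tokenization, the lowercasing, and the sample sentence are assumptions made for illustration; the abstract does not state which preprocessing the authors used.

```python
from collections import Counter


def bag_of_ngrams(text, n):
    """Return a Counter mapping each word n-gram (a tuple of n words) to its count."""
    # Assumption: lowercase and split on whitespace. A real pipeline would
    # normally use language-specific tokenization.
    words = text.lower().split()
    # Slide a window of n words over the token list; texts shorter than n
    # words yield an empty bag.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


# Hypothetical example sentence, not taken from the analysed corpus.
bigrams = bag_of_ngrams("the bank was very helpful and the bank was fast", 2)
```

Each distinct n-gram becomes one term of the document representation, and the counts can then serve as features for a classifier such as naive Bayes or logistic regression.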


2015 ◽  
Vol 104 (1) ◽  
pp. 63-74 ◽  
Author(s):  
Ondřej Klejch ◽  
Eleftherios Avramidis ◽  
Aljoscha Burchardt ◽  
Martin Popel

The tool described in this article has been designed to help MT developers by implementing a web-based graphical user interface that allows them to systematically compare and evaluate various MT engines/experiments using comparative analysis via automatic measures and statistics. The evaluation panel provides graphs, tests for statistical significance and n-gram statistics. We also present a demo server http://wmt.ufal.cz with WMT14 and WMT15 translations.


Author(s):  
ThanhThuong T. Huynh ◽  
TruongAn Phamnguyen ◽  
Nhon V. Do

To represent a text document more expressively, a graph-based semantic model is proposed that incorporates more semantic information among keyphrases as well as the structural information of the text. The method produces structured representations of texts by utilizing common, popular knowledge bases (e.g. DBpedia, Wikipedia) to acquire fine-grained information about concepts, entities, and their semantic relations, thus resulting in a knowledge-rich interpretation. We demonstrate the benefits of these representations in the task of document similarity measurement. Relevance evaluation between two documents is done by calculating the semantic similarity between the two keyphrase graphs that represent them. Experimental results show that our approach outperforms standard baselines based on traditional document representations, and is able to come close in performance to the specialized methods particularly tuned to this task on the specific dataset.
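The idea of comparing two documents through their graphs can be sketched roughly as follows. This is not the semantic similarity measure of the paper, which the abstract does not specify; it substitutes a plain Jaccard overlap of labelled edges as a stand-in. The function name `graph_jaccard`, the triple format, and the sample graphs are all hypothetical.

```python
def graph_jaccard(edges_a, edges_b):
    """Jaccard overlap of two graphs given as sets of (source, relation, target) triples."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        # Convention chosen here: two empty graphs count as identical.
        return 1.0
    # Shared edges divided by all distinct edges across both graphs.
    return len(a & b) / len(a | b)


# Hypothetical keyphrase graphs for two documents.
doc1 = {("igbo", "is_a", "language"), ("n-gram", "used_in", "text mining")}
doc2 = {("igbo", "is_a", "language"), ("graph", "used_in", "text mining")}
score = graph_jaccard(doc1, doc2)
```

An exact-match overlap like this ignores relatedness between different concepts, which is the kind of information a knowledge-base-driven measure would add.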

