Comparative Analysis of N-gram Text Representation on Igbo Text Document Similarity

It is estimated that approximately 80% of all data gathered by companies are text documents. This article addresses one of the most common problems in text mining: text classification for sentiment analysis, which aims to determine a document's sentiment. The lack of a defined structure in text makes this problem more challenging and has led to the development of various techniques for determining a document's sentiment. This paper presents a comparative analysis of two sentiment classification methods: the naive Bayes classifier and logistic regression. The analysed texts are written in Polish and come from banks. Classification used the bag-of-n-grams approach, in which a text document is represented as a set of terms and each term consists of n words. The results show that logistic regression performed better.
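The bag-of-n-grams representation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the function names `word_ngrams` and `bag_of_ngrams`, the lowercasing, and the whitespace tokenization are all assumptions, and the sketch keeps term counts (a multiset) rather than a plain set.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return every run of n consecutive words in text as one term.

    Assumption: tokens are lowercased and split on whitespace; a real
    pipeline would likely use a language-aware tokenizer.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n):
    """Map each n-word term to the number of times it occurs in text."""
    return Counter(word_ngrams(text, n))


# Hypothetical example sentence, not taken from the paper's data.
example = "the service was not good"
bigram_bag = bag_of_ngrams(example, 2)
print(bigram_bag)
```

With n = 2 the term "not good" is kept as one feature, which a single-word representation would split into "not" and "good". Counts of this kind could then be fed to a classifier such as naive Bayes or logistic regression.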


MT-ComparEval: Graphical evaluation interface for Machine Translation development

Prague Bulletin of Mathematical Linguistics ◽ 10.1515/pralin-2015-0014 ◽ 2015 ◽ Vol 104 (1) ◽ pp. 63-74 ◽ Cited By ~ 3

Author(s): Ondřej Klejch ◽ Eleftherios Avramidis ◽ Aljoscha Burchardt ◽ Martin Popel

Keyword(s): Comparative Analysis ◽ User Interface ◽ Machine Translation ◽ Graphical User Interface ◽ Statistical Significance ◽ Web Based ◽ Link Type ◽ Evaluation Panel ◽ N Gram

The tool described in this article is designed to help MT developers through a web-based graphical user interface that allows them to systematically compare and evaluate various MT engines and experiments using automatic measures and statistics. The evaluation panel provides graphs, tests for statistical significance, and n-gram statistics. We also present a demo server, http://wmt.ufal.cz, with WMT14 and WMT15 translations.


Keyphrase Graph in Text Representation for Document Similarity Measurement

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽ 10.3233/faia200590 ◽ 2020

Author(s): ThanhThuong T. Huynh ◽ TruongAn Phamnguyen ◽ Nhon V. Do

Keyword(s): Structural Information ◽ Knowledge Bases ◽ Similarity Measurement ◽ Document Similarity ◽ Fine Grained ◽ Text Document ◽ Structured Representations ◽ Popular Knowledge ◽ Relevance Evaluation ◽ To Come

To represent a text document more expressively, a graph-based semantic model is proposed that incorporates richer semantic information among keyphrases as well as the structural information of the text. The method produces structured representations of texts by using common, popular knowledge bases (e.g. DBpedia, Wikipedia) to acquire fine-grained information about concepts, entities, and their semantic relations, resulting in a knowledge-rich interpretation. We demonstrate the benefits of these representations in the task of document similarity measurement. The relevance between two documents is evaluated by calculating the semantic similarity between the two keyphrase graphs that represent them. Experimental results show that our approach outperforms standard baselines based on traditional document representations and comes close in performance to specialized methods tuned to this task on the specific dataset.
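The idea of comparing two documents through their keyphrase graphs can be illustrated with a deliberately simple stand-in. The abstract does not specify the authors' similarity measure, so the sketch below swaps in a weighted Jaccard overlap of nodes and labelled edges; the graph encoding, the function names, and the `node_weight` parameter are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def graph_similarity(g1, g2, node_weight=0.5):
    """Blend node overlap and edge overlap of two keyphrase graphs.

    Assumption: each graph is a pair (nodes, edges), where nodes is a set
    of keyphrases and edges is a set of (source, relation, target) triples.
    This is a placeholder measure, not the one used in the paper.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    node_score = jaccard(nodes1, nodes2)
    edge_score = jaccard(edges1, edges2)
    return node_weight * node_score + (1 - node_weight) * edge_score


# Hypothetical toy graphs for two documents.
doc_a = (
    {"text mining", "sentiment analysis", "n-gram"},
    {("text mining", "includes", "sentiment analysis")},
)
doc_b = (
    {"text mining", "n-gram", "machine translation"},
    {("machine translation", "uses", "n-gram")},
)
print(graph_similarity(doc_a, doc_b))
```

An exact-match overlap like this ignores the semantic relatedness of different keyphrases, which is what a knowledge-base-backed measure would add; it only shows where the graph structure enters the comparison.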
