Improving Polarity Classification for Financial News Using Semantic Similarity Techniques

2018, Vol 14 (4), pp. 39-54
Author(s): Tan Li Im, Phang Wai San, Patricia Anthony, Chin Kim On

This article discusses polarity classification for financial news articles. The proposed Semantic Sentiment Analyser (SSA) uses semantic similarity techniques, sentiment composition rules, and the Positivity/Negativity (P/N) ratio to perform polarity classification. An experiment was conducted to compare three semantic similarity metrics, namely HSO, LESK, and LIN, on the task of finding the word most semantically similar to an input word. The best-performing technique (HSO) was incorporated into the sentiment analyser to identify possible polarity carriers in the analysed text before polarity classification is performed. The proposed SSA was evaluated on a set of manually annotated financial news articles, and the results showed that it achieved an F-score of 90.89% when classifying all cases.
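The abstract does not spell out how the P/N ratio or the polarity-carrier lookup is computed, so the following is only a minimal sketch under stated assumptions. It assumes that a word missing from a seed lexicon inherits the polarity of its most similar seed word (if the similarity clears a threshold), and that a text is labelled positive when positive carriers outnumber negative ones (P/N ratio above 1). `SEED_LEXICON`, `TOY_SCORES`, `toy_similarity` and the 0.5 threshold are all hypothetical. `toy_similarity` merely stands in for a real metric such as HSO, and the sentiment composition rules (for example, negation handling) are not modelled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical seed lexicon of polarity carriers: +1 positive, -1 negative.
SEED_LEXICON: Dict[str, int] = {"gain": 1, "profit": 1, "loss": -1, "decline": -1}

# Hypothetical precomputed scores standing in for a WordNet metric such as HSO.
TOY_SCORES: Dict[Tuple[str, str], float] = {("surge", "gain"): 0.8, ("slump", "decline"): 0.9}


def toy_similarity(a: str, b: str) -> float:
    """Look up a precomputed similarity score; unknown pairs score 0.0."""
    return TOY_SCORES.get((a, b), 0.0)


def find_polarity_carrier(word: str, similarity: Callable[[str, str], float],
                          threshold: float = 0.5) -> int:
    """Return the polarity of `word`, falling back to its most similar seed word.

    Returns 0 when the word is unknown and its best similarity score is below `threshold`.
    """
    if word in SEED_LEXICON:
        return SEED_LEXICON[word]
    best_seed: Optional[str] = None
    best_score = float("-inf")
    for seed in SEED_LEXICON:
        score = similarity(word, seed)
        if score > best_score:
            best_seed, best_score = seed, score
    if best_seed is not None and best_score >= threshold:
        return SEED_LEXICON[best_seed]
    return 0


def pn_ratio_polarity(tokens: List[str], similarity: Callable[[str, str], float]) -> str:
    """Label a text by the ratio of positive to negative polarity carriers."""
    polarities = [find_polarity_carrier(t, similarity) for t in tokens]
    pos = sum(1 for p in polarities if p > 0)
    neg = sum(1 for p in polarities if p < 0)
    if pos == neg:  # also covers texts with no carriers at all
        return "neutral"
    if neg == 0:  # ratio is unbounded: only positive carriers were found
        return "positive"
    return "positive" if pos / neg > 1 else "negative"
```

For example, `pn_ratio_polarity(["shares", "surge", "despite", "loss", "profit"], toy_similarity)` maps "surge" to the seed word "gain", counts two positive carriers against one negative one, and returns `"positive"`.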

2018
Author(s): Prashanti Manda, Todd Vision

Semantic similarity has been used to compare genes, proteins, phenotypes, diseases and other entities in a variety of biological applications. The rise of ontology-based data representation in biology has also led to the development of several semantic similarity metrics that use different statistics to estimate similarity. Although semantic similarity has become a crucial computational tool in several applications, there has been no formal evaluation of the statistical sensitivity of these metrics or of their ability to recognize similarity between distantly related biological objects. Here, we present a statistical sensitivity comparison of five semantic similarity metrics (Jaccard, Resnik, Lin, Jiang & Conrath, and Hybrid Relative Specificity Similarity) representing three kinds of metrics (edge-based, node-based, and hybrid), and we explore key parameter choices that can affect sensitivity. Furthermore, we compare four methods of aggregating individual annotation similarities to estimate the similarity between two biological objects: All Pairs, Best Pairs, Best Pairs Symmetric, and Groupwise. To evaluate sensitivity in a controlled fashion, we explore two models for simulating data with varying levels of similarity and compare the results to a noise distribution obtained by resampling. Source data are derived from the Phenoscape Knowledgebase of evolutionary phenotypes. Our results indicate that the choice of similarity metric, together with its parameter settings, can substantially affect sensitivity. Among the five metrics evaluated, we find that Resnik similarity shows the greatest sensitivity to weak semantic similarity. Among the ways to combine pairwise statistics, the Groupwise approach provides the greatest discrimination among values above the sensitivity threshold, while the Best Pairs statistic can be parametrically tuned to provide the highest sensitivity. Our findings serve as a guideline for the appropriate choice and parameterization of semantic similarity metrics, and point to the need for improved reporting of the statistical significance of semantic similarity matches in cases where weak similarity is of interest.
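To make the distinction between node-based and edge-based metrics concrete, here is a minimal sketch of the textbook definitions of Resnik, Lin, Jiang & Conrath and an ancestor-set Jaccard on a tiny ontology. The ontology, the annotation counts and every class name are hypothetical, information content is taken as IC(c) = log(1 / p(c)) with p(c) counted over annotations to c or its descendants, and the Hybrid Relative Specificity Similarity metric is omitted. The parameter choices examined in the study (and the Phenoscape implementation) may differ from this simplified version.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology: each class maps to its direct parents (a DAG rooted at "root").
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "fin": {"root"},
    "limb": {"root"},
    "pectoral_fin": {"fin"},
    "pelvic_fin": {"fin"},
    "forelimb": {"limb"},
}

# Hypothetical number of annotations made directly to each class.
DIRECT_COUNTS: Dict[str, int] = {
    "root": 0, "fin": 2, "limb": 2, "pectoral_fin": 3, "pelvic_fin": 1, "forelimb": 2,
}
TOTAL = sum(DIRECT_COUNTS.values())


def ancestors(term: str) -> Set[str]:
    """Return `term` plus all of its ancestors."""
    found, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def information_content(term: str) -> float:
    """IC(c) = log(1 / p(c)), where p(c) counts annotations to c or any descendant."""
    count = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return math.log(TOTAL / count)


def resnik(a: str, b: str) -> float:
    """Node-based: IC of the most informative common ancestor."""
    return max(information_content(c) for c in ancestors(a) & ancestors(b))


def lin(a: str, b: str) -> float:
    """Resnik similarity normalised by the IC of both classes."""
    denom = information_content(a) + information_content(b)
    return 2 * resnik(a, b) / denom if denom > 0 else 1.0


def jiang_conrath_distance(a: str, b: str) -> float:
    """Jiang & Conrath distance; 0.0 for identical classes."""
    return information_content(a) + information_content(b) - 2 * resnik(a, b)


def jaccard(a: str, b: str) -> float:
    """Edge-based: overlap of the two ancestor sets."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)
```

In this toy data, `pectoral_fin` and `pelvic_fin` share the ancestor `fin`, so `resnik` returns IC(fin) = log(10/6) ≈ 0.51. By contrast, `pectoral_fin` and `forelimb` share only `root` and score 0.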


2016, Vol 43 (4), pp. 458-479
Author(s): María del Pilar Salas-Zárate, Rafael Valencia-García, Antonio Ruiz-Martínez, Ricardo Colomo-Palacios

Financial news plays a significant role in predicting the behaviour of financial markets. However, the exponential growth of financial news on the Web has created a need for new technologies that automatically collect and categorise large volumes of information quickly and easily. Sentiment analysis, or opinion mining, is the field of study that analyses people’s opinions, moods and evaluations as expressed in written text on Web platforms. Recent research has made a substantial effort to develop sophisticated methods for classifying sentiment in the financial domain. However, there is a lack of approaches that analyse the positive or negative orientation of each aspect contained in a document. To address this gap, we propose a new sentiment analysis method for feature and news polarity classification. The method is based on an ontology-driven approach that makes it possible to semantically describe relations between concepts in the financial news domain. The polarity of each feature in a document is calculated by taking into account the words surrounding the linguistic expression of the feature. These words are obtained using the ‘N_GRAM After’, ‘N_GRAM Before’, ‘N_GRAM Around’ and ‘All_Phrase’ methods. The effectiveness of our method was demonstrated through a set of experiments on a corpus of 1000 financial news items. Our proposal obtained encouraging results, with an accuracy of 66.7% and an F-measure of 64.9% for feature polarity classification, and an accuracy of 89.8% and an F-measure of 89.7% for news polarity classification. The experimental results also show that the N_GRAM Around method provides the best average results.
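The four context-selection methods differ only in which neighbouring words are used to score a feature. The minimal sketch below illustrates that idea and is not the authors' implementation. It assumes a window size `n = 2`, treats the whole token list as the phrase for `All_Phrase`, and scores the feature by summing the values in a hypothetical polarity lexicon (`LEXICON`). The ontology-driven feature detection described in the abstract is not modelled; the feature position is simply passed in.

```python
from typing import Dict, List

# Hypothetical polarity lexicon (+1 / -1); the paper's own resources are not reproduced.
LEXICON: Dict[str, int] = {"strong": 1, "rose": 1, "weak": -1, "fell": -1, "losses": -1}


def context_window(tokens: List[str], feature_idx: int, method: str, n: int = 2) -> List[str]:
    """Select the words used to score the feature at tokens[feature_idx]."""
    before = tokens[max(0, feature_idx - n):feature_idx]
    after = tokens[feature_idx + 1:feature_idx + 1 + n]
    if method == "N_GRAM Before":
        return before
    if method == "N_GRAM After":
        return after
    if method == "N_GRAM Around":
        return before + after
    if method == "All_Phrase":
        # Assumption: `tokens` is exactly the phrase containing the feature.
        return tokens[:feature_idx] + tokens[feature_idx + 1:]
    raise ValueError(f"unknown method: {method}")


def feature_polarity(tokens: List[str], feature_idx: int, method: str, n: int = 2) -> str:
    """Sum lexicon scores over the selected context and map the sign to a label."""
    score = sum(LEXICON.get(w, 0) for w in context_window(tokens, feature_idx, method, n))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For the tokens of "despite weak demand revenue rose sharply" with the feature "revenue" at index 3, the sketch returns `"negative"` for `N_GRAM Before`, `"positive"` for `N_GRAM After` and `"neutral"` for `N_GRAM Around`. This shows how strongly the choice of window can change the feature's label.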


2016
Author(s): Prashanti Manda, James P Balhoff, Todd J Vision

In phenotype annotations curated from the biological and medical literature, considerable human effort must be invested to select ontological classes that capture the expressivity of the original natural language descriptions, and finer annotation granularity can also entail higher computational costs for particular reasoning tasks. Do coarse annotations suffice for certain applications? Here, we measure how annotation granularity affects the statistical behavior of semantic similarity metrics. We use a randomized dataset of phenotype profiles drawn from 57,051 taxon-phenotype annotations in the Phenoscape Knowledgebase. We compared query profiles having variable proportions of matching phenotypes to subject database profiles using both pairwise and groupwise Jaccard (edge-based) and Resnik (node-based) semantic similarity metrics, and compared statistical performance for three different levels of annotation granularity: entities alone, entities plus attributes, and entities plus qualities (with implicit attributes). All four metrics examined showed more extreme values than expected by chance when approximately half the annotations matched between the query and subject profiles, with a more sudden decline for pairwise statistics and a more gradual one for the groupwise statistics. Annotation granularity had a negligible effect on the position of the threshold at which matches could be discriminated from noise. These results suggest that coarse annotations of phenotypes, at the level of entities with or without attributes, may be sufficient to identify phenotype profiles with statistically significant semantic similarity.
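The contrast between pairwise and groupwise statistics can be sketched for Jaccard similarity between two phenotype profiles. The pairwise aggregations below follow the names used in the Manda and Vision (2018) entry (All Pairs, Best Pairs, Best Pairs Symmetric), and the groupwise variant compares the unions of the profiles' ancestor sets. This is a minimal sketch under assumptions. The ontology is hypothetical and given directly as ancestor closures, and Best Pairs is averaged with a plain mean, whereas the study notes that Best Pairs can be parametrically tuned.

```python
from itertools import product
from statistics import mean
from typing import Dict, FrozenSet, List

# Hypothetical toy ontology given directly as reflexive ancestor closures.
ANCESTORS: Dict[str, FrozenSet[str]] = {
    "pectoral_fin": frozenset({"pectoral_fin", "fin", "root"}),
    "pelvic_fin": frozenset({"pelvic_fin", "fin", "root"}),
    "forelimb": frozenset({"forelimb", "limb", "root"}),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def pair_sim(q: str, s: str) -> float:
    """Pairwise Jaccard similarity of two annotations via their ancestor sets."""
    return jaccard(ANCESTORS[q], ANCESTORS[s])


def all_pairs(query: List[str], subject: List[str]) -> float:
    """Mean similarity over every query-subject annotation pair."""
    return mean(pair_sim(q, s) for q, s in product(query, subject))


def best_pairs(query: List[str], subject: List[str]) -> float:
    """Mean of each query annotation's best match in the subject profile."""
    return mean(max(pair_sim(q, s) for s in subject) for q in query)


def best_pairs_symmetric(query: List[str], subject: List[str]) -> float:
    """Average of Best Pairs computed in both directions."""
    return (best_pairs(query, subject) + best_pairs(subject, query)) / 2


def groupwise(query: List[str], subject: List[str]) -> float:
    """Jaccard of the unions of all ancestors in each profile."""
    q = frozenset().union(*(ANCESTORS[t] for t in query))
    s = frozenset().union(*(ANCESTORS[t] for t in subject))
    return jaccard(q, s)
```

With `query = ["pectoral_fin", "forelimb"]` and `subject = ["pelvic_fin", "forelimb"]`, All Pairs averages in the weak cross-matches and gives 0.475, while Best Pairs keeps only the best match per annotation and gives 0.75. Groupwise scores the two pooled ancestor sets directly and gives 4/6.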

