The impact of term-weighting schemes and similarity measures on extractive multi-document text summarization

Purpose: The study aims to report empirical evidence on the impact of mandatory adoption of International Financial Reporting Standards (IFRS) in India on voluntary intellectual capital reporting (ICR) and its value relevance. The study also tests the effect of term-weighting schemes, as used in information retrieval studies, in the domain of ICR.

Design/methodology/approach: The study uses computational linguistics tools to measure ICR by Indian firms in the period 2014–2019. The study developed term frequencies for 23 ICR attributes from the annual reports using a bag-of-words methodology. The word counts were used to construct two distinct measures of ICR, quantity and quality, deploying different term-weighting schemes: equal weighting and term frequency-inverse document frequency (TF-IDF) weighting, respectively. A combination of parametric and non-parametric tests was employed to examine the different hypotheses.

Findings: The quantity of ICR was found to have increased post-IFRS adoption. However, the quality of ICR fell significantly, which resulted in a loss of value relevance of ICR. Firms making higher disclosures but of inferior quality experienced suboptimal market returns. Inter-firm variation in ICR has reduced. The size effect and sector effect persist but have attenuated. The study acknowledges the enormous impact of term-weighting schemes, as used in information retrieval studies, in the domain of ICR.

Practical implications: The study strongly adds to the momentum in favour of a formal ICR standard to improve its quality, restore its value relevance and facilitate more effective decision-making where the valuation of a firm is a critical input. The study cautions firms against making poor-quality disclosures, so as to avoid suboptimal stock performance.

Originality/value: The study sheds light on the impact of IFRS adoption on ICR in India. The study establishes the effect of term-weighting schemes, as used in linguistic studies, in the domain of ICR and adds to the literature by explaining one of the critical reasons for the dichotomy in ICR trends.
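
The contrast between the two weighting schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the study's formulas, so this assumes the textbook TF-IDF variant (raw count times log(N/df)); the firm names, report texts and the three single-word attributes are hypothetical stand-ins for the 23 ICR attributes.

```python
import math
from collections import Counter

# Hypothetical toy corpus: one short "annual report" per firm.
reports = {
    "firm_a": "patent patent brand training",
    "firm_b": "brand brand brand brand",
    "firm_c": "brand training",
}
# Hypothetical single-word attributes (the study uses 23 ICR attributes).
attributes = ["patent", "brand", "training"]


def term_counts(text, attrs):
    """Bag-of-words counts for each attribute (whitespace tokenization)."""
    counts = Counter(text.lower().split())
    return {a: counts[a] for a in attrs}


def inverse_document_frequency(all_counts, attrs):
    """Textbook IDF: log(N / df). An assumption, not the study's formula."""
    n_docs = len(all_counts)
    idf = {}
    for a in attrs:
        df = sum(1 for c in all_counts if c[a] > 0)
        idf[a] = math.log(n_docs / df) if df else 0.0
    return idf


counts = {firm: term_counts(text, attributes) for firm, text in reports.items()}
idf = inverse_document_frequency(list(counts.values()), attributes)

# Equal weighting: every mention counts the same (a "quantity" style measure).
equal_scores = {firm: sum(c.values()) for firm, c in counts.items()}

# TF-IDF weighting: mentions of terms that every firm uses contribute nothing.
tfidf_scores = {
    firm: sum(c[a] * idf[a] for a in attributes) for firm, c in counts.items()
}

for firm in reports:
    print(firm, equal_scores[firm], round(tfidf_scores[firm], 3))
```

In this toy corpus firm_a and firm_b have the same equal-weighted score, but firm_b repeats only a term that appears in every report, so its TF-IDF score is zero. This is one way a quantity measure and a TF-IDF-based measure can diverge for the same disclosures.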

Evaluating the effect of compressing algorithms for trajectory similarity and classification problems

GeoInformatica ◽

10.1007/s10707-021-00434-1 ◽

2021 ◽

Author(s):

Antonios Makris ◽

Camila Leite da Silva ◽

Vania Bogorny ◽

Luis Otavio Alvares ◽

Jose Antonio Macedo ◽

...

Keyword(s):

Trajectory Analysis ◽

Similarity Measures ◽

Classification Problems ◽

Trajectory Data ◽

Compression Algorithms ◽

Time Ratio ◽

Ratio Speed ◽

Trajectory Similarity ◽

Real World Datasets ◽

The Impact

Abstract: During the last few years, the volumes of data that synthesize trajectories have expanded to unparalleled quantities. This growth is challenging traditional trajectory analysis approaches, and solutions are sought in other domains. In this work, we focus on data compression techniques with the aim of minimizing the size of trajectory data while, at the same time, minimizing the impact on trajectory analysis methods. To this end, we evaluate five lossy compression algorithms: Douglas-Peucker (DP), Time Ratio (TR), Speed Based (SP), Time Ratio Speed Based (TR_SP) and Speed Based Time Ratio (SP_TR). The comparison is performed using four distinct real-world datasets against six different dynamically assigned thresholds. The effectiveness of the compression is evaluated using classification techniques and similarity measures. The results showed that there is a trade-off between the compression rate and the achieved quality. There is no “best algorithm” for every case, and the choice of the proper compression algorithm is an application-dependent process.
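
Of the five algorithms listed, Douglas-Peucker is the classical one, and a minimal Python sketch of it is given here for orientation. It assumes planar (x, y) points and a fixed distance threshold named epsilon, whereas the paper assigns its thresholds dynamically; the sample track is hypothetical. The Time Ratio and Speed Based variants are not sketched because the abstract does not define them.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (planar coordinates)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Keep the endpoints, plus any point farther than epsilon from the
    line joining them, recursing on both halves around the farthest point."""
    if len(points) < 3:
        return list(points)
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > epsilon:
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [points[0], points[-1]]


# Hypothetical track; epsilon values are illustrative, not from the paper.
track = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
compressed = douglas_peucker(track, 0.5)
coarse = douglas_peucker(track, 10)
print(compressed)
print(coarse)
```

A larger epsilon discards more points, which is the compression-rate versus quality trade-off the abstract reports: the coarse result keeps only the two endpoints and loses the spike in the track.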

Improving Term Weighting Schemes for Short Text Classification in Vector Space Model

IEEE Access ◽

10.1109/access.2019.2953918 ◽

2019 ◽

Vol 7 ◽

pp. 166578-166592

Author(s):

Surender Singh Samant ◽

N. L. Bhanu Murthy ◽

Aruna Malapati

Keyword(s):

Vector Space ◽

Text Classification ◽

Vector Space Model ◽

Term Weighting ◽

Weighting Schemes ◽

Short Text ◽

Space Model

An interpretation of index term weighting schemes based on document components

Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '86 ◽

10.1145/253168.253226 ◽

1986 ◽

Author(s):

K. L. Kwok

Keyword(s):

Term Weighting ◽

Weighting Schemes ◽

Index Term

Information-theoretic term weighting schemes for document clustering and classification

International Journal on Digital Libraries ◽

10.1007/s00799-014-0121-3 ◽

2014 ◽

Vol 16 (2) ◽

pp. 145-159 ◽

Cited By ~ 5

Author(s):

Weimao Ke

Keyword(s):

Document Clustering ◽

Term Weighting ◽

Weighting Schemes ◽

Information Theoretic ◽

Theoretic Term ◽

Clustering And Classification

Co-operation of Biology Related Algorithms for Solving Opinion Mining Problems by Using Different Term Weighting Schemes

Informatics in Control, Automation and Robotics - Lecture Notes in Electrical Engineering ◽

10.1007/978-3-319-55011-4_4 ◽

2017 ◽

pp. 73-90 ◽

Cited By ~ 1

Author(s):

Shakhnaz Akhmedova ◽

Eugene Semenkin ◽

Vladimir Stanovov

Keyword(s):

Opinion Mining ◽

Term Weighting ◽

Weighting Schemes

An evaluation of evolved term-weighting schemes in information retrieval

Proceedings of the 14th ACM international conference on Information and knowledge management - CIKM '05 ◽

10.1145/1099554.1099639 ◽

2005 ◽

Cited By ~ 2

Author(s):

Ronan Cummins ◽

Colm O'Riordan

Keyword(s):

Information Retrieval ◽

Term Weighting ◽

Weighting Schemes

A Comparison of Vector and Network-Based Measures for Assessing Design Similarity

Volume 8: 32nd International Conference on Design Theory and Methodology (DTM) ◽

10.1115/detc2020-22424 ◽

2020 ◽

Author(s):

Ananya Nandy ◽

Andy Dong ◽

Kosa Goucher-Lambert

Keyword(s):

Functional Model ◽

Similarity Measures ◽

Current Function ◽

Surface Level ◽

Graph Similarity ◽

Functional Models ◽

Design By Analogy ◽

Computational Systems ◽

The Impact ◽

Target Design

Abstract: To retrieve analogous designs for design-by-analogy, computational systems require the calculation of similarity between the target design and a repository of source designs. Representing designs as functional abstractions can support designers in practicing design-by-analogy by minimizing fixation on surface-level similarities. In addition, when a design is represented by a functional model using a function-flow format, many measures are available to determine functional similarity. In most current function-based design-by-analogy systems, the functions are represented as vectors and measures like cosine similarity are used to retrieve analogous designs. However, it is hypothesized that changing the similarity measure can significantly change the examples that are retrieved. In this paper, several similarity measures are empirically tested across a set of functional models of energy harvesting products. In addition, the paper explores representing the functional models as networks to find functionally similar designs using graph similarity measures. Surprisingly, the types of designs that are considered similar by vector-based and one of the graph similarity measures are found to vary significantly. Even among a set of functional models that share known similar technology, the different measures find inconsistent degrees of similarity: some measures find the set of models to be very similar and some find them to be very dissimilar. The findings have implications for the choice of similarity metric and its effect on finding analogous designs that, in this case, have similar pairs of functions and flows in their functional models. Since the literature has shown that the types of designs presented can impact their effectiveness in aiding the design process, this work intends to spur further consideration of the impact of using different similarity measures when assessing design similarity computationally.
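
The hypothesis that the choice of similarity measure changes which designs are retrieved can be illustrated with a small Python sketch. It assumes each design is a vector of function counts. Cosine similarity is named in the abstract, but the second measure (Jaccard similarity over the set of functions present) is chosen here only for contrast and is not claimed to be one of the paper's measures. The vectors and design names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(u, v):
    """Overlap of the sets of functions present, ignoring how often they occur.
    Assumed here as a contrasting measure; not taken from the paper."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    return len(present_u & present_v) / len(union) if union else 0.0


def rank_sources(target, sources, measure):
    """Source names ordered from most to least similar to the target."""
    return sorted(sources, key=lambda name: measure(target, sources[name]), reverse=True)


# Hypothetical function-count vectors (one entry per function type).
target = [3, 1, 0, 0]
sources = {
    "design_a": [3, 0, 0, 0],  # same dominant function, missing the second one
    "design_b": [1, 3, 0, 0],  # same functions, different proportions
}

cosine_ranking = rank_sources(target, sources, cosine_similarity)
jaccard_ranking = rank_sources(target, sources, jaccard_similarity)
print(cosine_ranking)
print(jaccard_ranking)
```

On these toy vectors the two measures return opposite rankings: cosine similarity rewards matching proportions and prefers design_a, while the set-based measure rewards matching function types and prefers design_b.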
