The impact of term-weighting schemes and similarity measures on extractive multi-document text summarization

2021 ◽  
Vol 169 ◽  
pp. 114510
Author(s):  
Jesus M. Sanchez-Gomez ◽  
Miguel A. Vega-Rodríguez ◽  
Carlos J. Pérez
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ankur Kulshrestha ◽  
Archana Patro

Purpose: The study aims to report empirical evidence on the impact of mandatory adoption of International Financial Reporting Standards (IFRS) in India on voluntary intellectual capital reporting (ICR) and its value relevance. The study also tests the effect of term-weighting schemes, as used in information retrieval studies, in the domain of ICR.

Design/methodology/approach: The study uses computational linguistics tools to measure ICR by Indian firms in the period 2014–2019. The study developed term frequencies for 23 ICR attributes from the annual reports using a bag-of-words methodology. The word counts were used to construct two distinct measures of ICR, quantity and quality, by deploying different term-weighting schemes: equal weighting and term frequency-inverse document frequency (TF-IDF) weighting, respectively. A combination of parametric and non-parametric tests was employed to examine the different hypotheses.

Findings: The quantity of ICR was found to have increased post-IFRS adoption. However, the quality of ICR had fallen significantly, which resulted in the loss of value relevance of ICR. Firms making higher disclosures but of inferior quality experienced suboptimal market returns. Inter-firm variation in ICR has reduced. The size effect and sector effect continue but have attenuated. The study acknowledges the enormous impact of term-weighting schemes, as used in information retrieval studies, in the domain of ICR.

Practical implications: The study adds to the momentum in favour of a formal ICR standard to improve its quality, restore its value relevance and facilitate more effective decision-making where the valuation of a firm is a critical input. The study cautions firms against making poor-quality disclosures, so as to avoid suboptimal stock performance.

Originality/value: The study sheds light on the impact of IFRS adoption on ICR in India. The study establishes the effect of term-weighting schemes, as used in linguistic studies, in the domain of ICR and adds to the literature by explaining one of the critical reasons for the dichotomy in ICR trends.
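
The contrast between the two weighting schemes can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact TF-IDF formula, so the common form tf × log(N/df) is assumed here, and the reports, vocabulary and function names are all hypothetical.

```python
import math
from collections import Counter

def term_counts(text, vocabulary):
    """Bag-of-words counts of the vocabulary terms in one report."""
    counts = Counter(text.lower().split())
    return {term: counts[term] for term in vocabulary}

def equal_weight_score(counts):
    """Quantity-style measure: every term occurrence counts equally."""
    return sum(counts.values())

def tfidf_scores(all_counts):
    """Quality-style measure: tf * log(N / df), summed per report.

    Assumption: plain log(N / df) with no smoothing. Terms that appear
    in every report get weight 0.
    """
    n_docs = len(all_counts)
    vocabulary = list(all_counts[0].keys())
    doc_freq = {t: sum(1 for c in all_counts if c[t] > 0) for t in vocabulary}
    scores = []
    for counts in all_counts:
        score = 0.0
        for t in vocabulary:
            if doc_freq[t] > 0:
                score += counts[t] * math.log(n_docs / doc_freq[t])
        scores.append(score)
    return scores

# Hypothetical toy "annual reports" and ICR attribute terms.
reports = [
    "patent brand patent training",
    "brand brand brand brand",
    "brand training",
]
vocab = ["patent", "brand", "training"]

counts = [term_counts(r, vocab) for r in reports]
quantity = [equal_weight_score(c) for c in counts]
quality = tfidf_scores(counts)

print(quantity)
print(quality)
```

In this toy data the first two reports have the same equal-weighted score, but the second report repeats only a term that every report uses, so its TF-IDF score is zero. This is one way the two measures can diverge, as the findings describe for quantity and quality.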


2021 ◽  
Author(s):  
Antonios Makris ◽  
Camila Leite da Silva ◽  
Vania Bogorny ◽  
Luis Otavio Alvares ◽  
Jose Antonio Macedo ◽  
...  

Abstract: During the last few years, the volume of data describing trajectories has expanded to unparalleled quantities. This growth is challenging traditional trajectory analysis approaches, and solutions are sought in other domains. In this work, we focus on data compression techniques with the intention of minimizing the size of trajectory data while, at the same time, minimizing the impact on trajectory analysis methods. To this end, we evaluate five lossy compression algorithms: Douglas-Peucker (DP), Time Ratio (TR), Speed Based (SP), Time Ratio Speed Based (TR_SP) and Speed Based Time Ratio (SP_TR). The comparison is performed using four distinct real-world datasets against six different dynamically assigned thresholds. The effectiveness of the compression is evaluated using classification techniques and similarity measures. The results showed that there is a trade-off between the compression rate and the achieved quality. There is no “best algorithm” for every case, and the choice of the proper compression algorithm is an application-dependent process.
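
As an illustration of the first of these algorithms, the following is a minimal sketch of classic Douglas-Peucker simplification on 2D points. It is an assumption-laden toy, not the paper's code: it uses a fixed threshold `epsilon` rather than the dynamically assigned thresholds mentioned above, ignores timestamps, and the sample trajectory is hypothetical.

```python
import math

def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Keep endpoints; recursively keep the farthest point if it is
    more than epsilon away from the start-end line."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        return [start, end]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right

# Hypothetical trajectory (x, y) and thresholds.
trajectory = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0)]
simplified = douglas_peucker(trajectory, epsilon=1.0)
coarse = douglas_peucker(trajectory, epsilon=10.0)
compression_rate = 1 - len(simplified) / len(trajectory)

print(simplified, compression_rate)
print(coarse)
```

A larger `epsilon` removes more points, which shows the trade-off in miniature: the coarse result keeps only the two endpoints and loses the spike at (3, 5).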


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 166578-166592
Author(s):  
Surender Singh Samant ◽  
N. L. Bhanu Murthy ◽  
Aruna Malapati

Author(s):  
Ananya Nandy ◽  
Andy Dong ◽  
Kosa Goucher-Lambert

Abstract: In order to retrieve analogous designs for design-by-analogy, computational systems require the calculation of similarity between the target design and a repository of source designs. Representing designs as functional abstractions can support designers in practicing design-by-analogy by minimizing fixation on surface-level similarities. In addition, when a design is represented by a functional model using a function-flow format, many measures are available to determine functional similarity. In most current function-based design-by-analogy systems, the functions are represented as vectors and measures like cosine similarity are used to retrieve analogous designs. However, it is hypothesized that changing the similarity measure can significantly change the examples that are retrieved. In this paper, several similarity measures are empirically tested across a set of functional models of energy harvesting products. In addition, the paper explores representing the functional models as networks to find functionally similar designs using graph similarity measures. Surprisingly, the types of designs that are considered similar by vector-based measures and by one of the graph similarity measures are found to vary significantly. Even among a set of functional models that share known similar technology, the different measures find inconsistent degrees of similarity: some measures find the set of models to be very similar and some find them to be very dissimilar. The findings have implications for the choice of similarity metric and its effect on finding analogous designs that, in this case, have similar pairs of functions and flows in their functional models. Since literature has shown that the types of designs presented can impact their effectiveness in aiding the design process, this work intends to spur further consideration of the impact of using different similarity measures when assessing design similarity computationally.
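
The claim that the choice of measure can change which designs are retrieved can be shown with a small sketch. It compares cosine similarity with Jaccard similarity, which is chosen here only as one possible alternative; the abstract does not list the measures actually tested. The function-count vectors and design names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(a, b):
    """Overlap of the sets of functions present (counts ignored)."""
    set_a = {i for i, x in enumerate(a) if x > 0}
    set_b = {i for i, y in enumerate(b) if y > 0}
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)

def rank(target, sources, measure):
    """Source design names, most similar to the target first."""
    return sorted(sources, key=lambda name: measure(target, sources[name]),
                  reverse=True)

# Hypothetical vectors: each position counts one function-flow pair.
target = [4, 1, 0, 0]
sources = {
    "design_a": [4, 0, 0, 0],
    "design_b": [1, 1, 0, 0],
}

cosine_rank = rank(target, sources, cosine_similarity)
jaccard_rank = rank(target, sources, jaccard_similarity)

print(cosine_rank)
print(jaccard_rank)
```

Cosine similarity favours `design_a` because it matches the target's dominant function, while Jaccard favours `design_b` because it contains the same set of functions. The two measures therefore retrieve a different top result from the same repository.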

