Document visualization: an overview of current research

2013 ◽  
Vol 6 (1) ◽  
pp. 19-36 ◽  
Author(s):  
Qihong Gan ◽  
Min Zhu ◽  
Mingzhao Li ◽  
Ting Liang ◽  
Yu Cao ◽  
...  
Author(s):  
N. I. Tikhonov

Collections of scientific publications are growing rapidly, and scientists have access to portals containing large numbers of documents. Such volumes of data are difficult to investigate. Document visualization methods are used to reduce labor costs, search for relevant and similar documents, evaluate the scientific contribution of particular publications, and reveal hidden links between documents. These methods can be based on various models of document representation. In recent years, word embedding methods for natural language processing have become extremely popular, and methods for analyzing text collections that produce vector representations of documents have followed. Although many document analysis systems exist, new methods can offer new insight into collections, perform better on large collections of documents, or find new relationships between documents. This article discusses two methods, Paper2vec and Cite2vec, that obtain vector representations of documents using citation information. It briefly describes both methods for analyzing collections of scientific publications and reports experiments with them, including visualizations of their results and a description of the problems that arise.


Author(s):  
Ka Kit Hoi ◽  
Dik Lun Lee ◽  
Jianliang Xu

2009 ◽  
Author(s):  
Delia Rusu ◽  
Blaž Fortuna ◽  
Dunja Mladenic ◽  
Marko Grobelnik ◽  
Ruben Sipoš

2014 ◽  
Vol 22 (1) ◽  
pp. 73-95 ◽  
Author(s):  
Gábor Berend

Abstract: Keyphrases are the most important phrases of documents, which makes them suitable for improving natural language processing tasks, including information retrieval, document classification, document visualization, summarization and categorization. Here, we propose a supervised framework augmented by novel extra-textual information derived primarily from Wikipedia. Unlike most other methods relying on Wikipedia, our approach does not require a full textual index of all Wikipedia articles, as we only exploit the category hierarchy and a list of multiword expressions derived from Wikipedia. This approach is not only less resource intensive, but also produces comparable or superior results compared to previous similar works. Our thorough evaluations also suggest that the proposed framework performs consistently well on multiple datasets, being competitive with or even outperforming other state-of-the-art methods. Besides introducing features that incorporate extra-textual information, we also experimented with a novel way of representing features that are derived from the POS tagging of the keyphrase candidates.


2014 ◽  
Vol 03 (02) ◽  
pp. 65-70
Author(s):  
Luiz Cláudio Santos Silva ◽  
Renelson Ribeiro Sampaio

2020 ◽  
Author(s):  
Francisco Carlos Paletta ◽  
Armando Manuel Barreiros da Silva

In this work, we focus on document visualization as a strategy to support access to information in the digital era. First, we discuss the dynamics of the document visualization approach and its ability to generate innovations with a direct impact on the competitive landscape of digital transformation. Second, we discuss visualization and computational intelligence methods, such as data mining and knowledge discovery, as important tools for improving the decision-making process. Then we present the concept of knowledge organization systems and the main challenges related to the document visualization strategy. Finally, we discuss how digital and visual literacies have become central to how we read and view information and communicate with others in order to meet the demands of the transformations of the digital age.

