document visualization Latest Research Papers

Patterns of thematic progression, strongly influenced by culture-specific thinking patterns and language-specific features, reflect the integration of form and meaning in the flow of information in discourse. Economic discourse has distinct linguistic characteristics and important communicative purposes. Research on thematic progression in economic discourse therefore helps us understand language use in economic and financial activities, and has implications for language learning and teaching, translation, and automated language information processing. This study first uses CiteSpace, a document visualization tool, to review existing studies of economic discourse in China, and then analyzes the major patterns of thematic progression in economic discourse based on English and Chinese reports from The Economist. We aim to identify the universals and peculiarities of thematic progression in English-Chinese economic discourse and to discuss the reasons behind the major differences between the two languages in this respect.

Paper2vec and Cite2vec Methods for Analyzing Collections of Scientific Publications

Vestnik komp'iuternykh i informatsionnykh tekhnologii ◽

10.14489/vkit.2021.10.pp.032-039 ◽

2021 ◽

pp. 32-39

Author(s):

N. I. Tikhonov

Keyword(s):

Language Processing ◽

Document Representation ◽

Labor Costs ◽

Scientific Contribution ◽

Scientific Publications ◽

New Methods ◽

Text Collections ◽

Vector Representations ◽

Embedding Methods ◽

Document Visualization

Collections of scientific publications are growing rapidly, and scientists have access to portals containing very large numbers of documents. Data at this scale is difficult to investigate. Document visualization methods are used to reduce labor costs, to search for relevant and similar documents, to evaluate the scientific contribution of particular publications, and to reveal hidden links between documents. These methods can be based on various models of document representation. In recent years, word embedding methods for natural language processing have become extremely popular, and methods that derive vector representations of whole documents from text collections have followed. Although many document analysis systems already exist, new methods can offer new insights into collections, perform better on large collections, or find new relationships between documents. This article discusses two methods, Paper2vec and Cite2vec, that obtain vector representations of documents using citation information. It briefly describes both methods, reports experiments with them, including visualizations of their results, and describes the problems that arise.
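As a rough illustration of what a citation-based document vector can look like, the sketch below builds binary "cites this reference" vectors and compares them with cosine similarity. This is a deliberately simple stand-in and is not the Paper2vec or Cite2vec algorithm; the paper identifiers, the `citations` data, and all function names are hypothetical.

```python
import math

# Hypothetical toy data: each paper maps to the set of references it cites.
citations = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "C": {"W"},
}

# Fixed ordering of all cited references; one vector dimension per reference.
vocabulary = sorted(set().union(*citations.values()))


def citation_vector(paper):
    """Binary vector: 1.0 where `paper` cites the reference, else 0.0."""
    cited = citations[paper]
    return [1.0 if ref in cited else 0.0 for ref in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = {paper: citation_vector(paper) for paper in citations}


def most_similar(paper):
    """Return the other paper whose citation vector is closest to `paper`'s."""
    others = [p for p in vectors if p != paper]
    return max(others, key=lambda p: cosine(vectors[paper], vectors[p]))
```

In this toy setup, papers that share references end up close together, which is the kind of structure a 2D projection of document vectors would then make visible.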

Information Access in the Digital Era - Document Visualization Strategy

10.31219/osf.io/wyjs7 ◽

2020 ◽

Author(s):

Francisco Carlos Paletta ◽

Armando Manuel Barreiros da Silva

Keyword(s):

Computational Intelligence ◽

Information Access ◽

Digital Transformation ◽

Decision Making Process ◽

Access To Information ◽

Digital Era ◽

Knowledge Organization Systems ◽

Computational Intelligence Methods ◽

Visualization Strategy ◽

Document Visualization

In this work, we focus on document visualization strategy as a means of supporting access to information in the digital era. First, we discuss the dynamics of the document visualization approach and its ability to generate innovations with a direct impact on the competitive landscape of digital transformation. Second, we discuss visualization and computational intelligence methods, such as data mining and knowledge discovery, as important tools for improving the decision-making process. We then present the concept of knowledge organization systems and the main challenges related to document visualization strategy. Finally, we discuss how digital and visual literacies have become central to the way we read and view information and communicate with others in order to meet the demands of the digital age.

Mining Inter-Relationships in Online Scientific Articles and its Visualization: Natural Language Processing for Systems Biology Modeling

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v15i02.9432 ◽

2019 ◽

Vol 15 (02) ◽

pp. 39

Author(s):

Nidheesh Melethadathil ◽

Jaap Heringa ◽

Bipin Nair ◽

Shyam Diwakar

Keyword(s):

Natural Language Processing ◽

Systems Biology ◽

Natural Language ◽

Language Processing ◽

Clustering Algorithms ◽

Scientific Publications ◽

User Query ◽

Online Databases ◽

Clustering Quality ◽

Document Visualization

With the rapid growth in the number of scientific publications in domains such as neuroscience and medicine, visually interlinking documents in online databases such as PubMed to indicate the context of a query's results can improve the multi-disciplinary relevance of those results. Translational medicine and systems biology rely on studies relating basic sciences to applications, often spanning multiple disciplinary domains. This paper focuses on the design and development of a new scientific document visualization platform, which allows translational aspects in the biosciences to be inferred from published articles using machine learning and natural language processing (NLP) methods. Working from online databases, the platform extracted relationships between multiple sub-domains of neuroscience from abstracts related to a user query. In the current implementation, the platform employs two clustering algorithms, Suffix Tree Clustering (STC) and LINGO. Clustering quality was improved by mapping top-ranked cluster labels derived from the UMLS Metathesaurus using a scoring function. To avoid non-clustered documents, an iterative scheme called auto-clustering was developed, which maps documents left uncategorized by the initial grouping to relevant clusters. The efficacy of the document clustering and visualization platform was evaluated by expert-based validation of clustering results obtained with unique search terms. Compared to normal clustering, auto-clustering demonstrated better efficacy by generating larger numbers of unique and relevant cluster labels. Using this implementation, a Parkinson’s disease systems theory model was developed, and studies based on user queries related to neuroscience and oncology are showcased as applications.
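The abstract does not say how auto-clustering assigns leftover documents, so the following is only one possible reading, sketched under stated assumptions: documents are sets of terms, each cluster is summarized by the union of its members' terms, and a leftover document joins the best-matching cluster when its Jaccard similarity reaches a threshold. The function names, the threshold, and the toy data are all hypothetical; STC, LINGO, and the UMLS-based label scoring are not modeled here.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def auto_cluster(clusters, leftovers, threshold=0.2, max_rounds=10):
    """Iteratively assign leftover documents to existing clusters.

    clusters:  dict mapping a label to a list of documents (sets of terms)
    leftovers: list of documents not placed by the initial grouping
    Returns (updated clusters, documents that still match no cluster).
    """
    clusters = {label: list(docs) for label, docs in clusters.items()}
    remaining = list(leftovers)
    for _ in range(max_rounds):
        unassigned = []
        moved = False
        for doc in remaining:
            best_label, best_score = None, 0.0
            for label, docs in clusters.items():
                # Cluster profile: every term seen in any member document.
                profile = set().union(*docs) if docs else set()
                score = jaccard(doc, profile)
                if score > best_score:
                    best_label, best_score = label, score
            if best_label is not None and best_score >= threshold:
                clusters[best_label].append(doc)
                moved = True
            else:
                unassigned.append(doc)
        remaining = unassigned
        # Stop once a full pass assigns nothing or nothing is left.
        if not moved or not remaining:
            break
    return clusters, remaining


# Hypothetical toy input: two initial clusters and three leftover documents.
initial = {
    "dopamine": [{"dopamine", "neuron"}],
    "tumor": [{"tumor", "cell"}],
}
leftover_docs = [{"receptor", "binding"}, {"dopamine", "receptor"}, {"galaxy"}]
final_clusters, still_unclustered = auto_cluster(initial, leftover_docs)
```

The iteration matters in this toy run: the first leftover document matches nothing at first, but it does match the "dopamine" cluster once the second document has been added and widened that cluster's term profile.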

A Document Visualization Strategy Based on Semantic Multimedia Big Data

Pervasive Systems, Algorithms and Networks - Communications in Computer and Information Science ◽

10.1007/978-3-030-30143-9_4 ◽

2019 ◽

pp. 43-57 ◽

Cited By ~ 1

Author(s):

Antonio M. Rinaldi

Keyword(s):

Big Data ◽

Multimedia Big Data ◽

Visualization Strategy ◽

Document Visualization

Information access in the digital era: document visualization strategy

Challenges and Opportunities for Knowledge Organization in the Digital Age ◽

10.5771/9783956504211-597 ◽

2018 ◽

pp. 597-605

Author(s):

Francisco Carlos Paletta ◽

Armando Malheiro da Silva

Keyword(s):

Information Access ◽

Digital Era ◽

Visualization Strategy ◽

Document Visualization

Text and Document Visualization

Interactive Data Visualization ◽

10.1201/b18379-14 ◽

2015 ◽

pp. 362-385

Keyword(s):

Document Visualization

Exploiting extra-textual and linguistic information in keyphrase extraction

Natural Language Engineering ◽

10.1017/s1351324914000126 ◽

2014 ◽

Vol 22 (1) ◽

pp. 73-95 ◽

Cited By ~ 6

Author(s):

Gábor Berend

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Language Processing ◽

State Of The Art ◽

Keyphrase Extraction ◽

Textual Information ◽

Multiword Expressions ◽

Pos Tagging ◽

Multiple Datasets ◽

Document Visualization

Keyphrases are the most important phrases of documents, which makes them suitable for improving natural language processing tasks, including information retrieval, document classification, document visualization, summarization and categorization. Here, we propose a supervised framework augmented by novel extra-textual information derived primarily from Wikipedia. Unlike most other methods relying on Wikipedia, our approach does not require a full textual index of all Wikipedia articles, as we only exploit the category hierarchy and a list of multiword expressions derived from Wikipedia. This approach is not only less resource intensive, but also produces comparable or superior results compared to previous similar works. Our thorough evaluations also suggest that the proposed framework performs consistently well on multiple datasets, being competitive with, or even outperforming, other state-of-the-art methods. Besides introducing features that incorporate extra-textual information, we also experimented with a novel way of representing features that are derived from the POS tagging of the keyphrase candidates.
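To make the idea of candidate features concrete, here is a minimal sketch of the kind of input a supervised keyphrase classifier might receive: n-gram candidates, a flag for membership in a list of known multiword expressions, and a POS pattern. It is an illustration under assumptions, not the paper's feature set; the function names, the tag labels, and the tiny hand-written `pos_tags` and `known_mwes` data are hypothetical.

```python
def candidates(tokens, max_len=3):
    """All contiguous n-grams of 1..max_len tokens, as tuples."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def features(candidate, pos_tags, known_mwes):
    """Feature dict for one candidate phrase.

    pos_tags:   dict mapping a token to its part-of-speech tag (toy lookup)
    known_mwes: set of multiword expressions, e.g. taken from an external list
    """
    return {
        "length": len(candidate),
        "is_known_mwe": " ".join(candidate) in known_mwes,
        "pos_pattern": "-".join(pos_tags[token] for token in candidate),
    }


# Hypothetical toy input.
tokens = ["natural", "language", "processing", "helps"]
pos_tags = {"natural": "ADJ", "language": "NOUN", "processing": "NOUN", "helps": "VERB"}
known_mwes = {"natural language processing"}

all_candidates = candidates(tokens)
example = features(("natural", "language", "processing"), pos_tags, known_mwes)
```

A real system would take its tags from a POS tagger and its expression list from an external resource, then pass the resulting feature dictionaries to a trained classifier.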
