cluster visualization
Recently Published Documents


Total documents: 38 (five years: 7)
H-index: 9 (five years: 0)

Author(s):  
Yasufumi Takama ◽  
Yuna Tanaka ◽  
Yoshiyuki Mori ◽  
Hiroki Shibata

This paper proposes a Treemap-based visualization for supporting cluster analysis of multi-dimensional data. Grasping the data distribution in a target dataset is important for tasks such as machine learning and cluster analysis. When dealing with multi-dimensional data such as statistical data and document datasets, dimensionality reduction algorithms are usually applied to project the original data onto a lower-dimensional space. However, dimensionality reduction tends to lose the characteristics of the data in the original space. In particular, the borders between different data groups may not be represented correctly in the lower-dimensional space. To overcome this problem, the proposed visualization method applies Fuzzy c-Means to the target data and visualizes the result with a Treemap on the basis of the highest and second-highest membership values. Visualizing information about not only the closest cluster but also the second-closest one is expected to be useful for identifying objects near the border between clusters, as well as for understanding the relationships between clusters. A prototype interface is implemented, and its effectiveness is investigated through a user experiment on a news article dataset. As another kind of text data, a case study applying the method to a word embedding space is also shown.
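The idea of using the highest and second-highest membership values can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain NumPy version of fuzzy c-means with the common fuzzifier m = 2, and the function names (`fuzzy_c_means`, `top_two_memberships`, `group_by_top_two`) are hypothetical. The grouping step only shows how objects could be bucketed by (closest, second-closest) cluster before being laid out as nested Treemap cells.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means. Returns (centers, U), U has shape (n_samples, c)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def top_two_memberships(U):
    """Indices and values of the highest and second-highest memberships."""
    order = np.argsort(-U, axis=1)
    first, second = order[:, 0], order[:, 1]
    rows = np.arange(U.shape[0])
    return first, second, U[rows, first], U[rows, second]


def group_by_top_two(U):
    """Bucket sample indices by (closest cluster, second-closest cluster)."""
    first, second, _, _ = top_two_memberships(U)
    groups = {}
    for i, key in enumerate(zip(first.tolist(), second.tolist())):
        groups.setdefault(key, []).append(i)
    return groups
```

In such a sketch, an object whose two top membership values are close to each other lies near the border between the two clusters, which is the kind of object the visualization is meant to expose.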


2021 ◽  
Author(s):  
Gabriel Araujo ◽  
Richard Francis ◽  
Cristina Ferreira ◽  
Alba Rangel

Background and Objectives: The dissimilarity matrix (DM) is an important component of phylogenetic analysis, and many software packages exist to build and show DMs. However, as the common input for this type of software is a set of sequences in FASTA file format, the process of extracting and aligning each set of sequences to produce a large number of matrices can be laborious. Additionally, existing software does not facilitate the comparison of similarity clusters across several DMs built for the same group of individuals using different genomic regions. To meet our need for such a tool, we designed Straintables to extract sequences of specific genomic regions from a group of intraspecies genomic assemblies and to build dissimilarity matrices from the extracted sequences. Methods: A Python module with executable scripts was developed for a study on genetic diversity across strains of Toxoplasma gondii; it is a general-purpose system for DM calculation and visualization in preliminary phylogenetic studies. For automatic region sequence extraction from genomic assemblies, we assembled a system that designs virtual primers using reference sequences located through genomic annotations, then matches those primers against genome files using regex patterns. Extracted sequences are then aligned using Clustal Omega and compared to generate matrices. Results: Using this software saves the user from manual preparation and alignment of the sequences, a process that can be laborious when a large number of assemblies or regions are involved. The automatic sequence extraction process can be checked against BLAST results using the extracted sequences as queries; correct results were observed for same-species pools of various organisms. The package also contains a matrix visualization tool focused on cluster visualization, capable of drawing matrices into image files with custom settings, and it features methods for reordering matrices to facilitate the comparison of clustering patterns across two or more matrices. Conclusion: Straintables may replace and extend the functionality of existing matrix-oriented phylogenetic software, featuring automatic region extraction from genomic assemblies and enhanced matrix visualization capabilities that emphasize cluster identification. This module is open source, available on GitHub (https://github.com/Gab0/straintables) under an MIT license and also as a PyPI package.
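The pipeline of primer matching followed by matrix construction can be sketched roughly as follows. This is not the Straintables API: the helper names are hypothetical, the reverse primer is treated as a literal site on the same strand (no reverse complement), and the sequences passed to `dissimilarity_matrix` are assumed to be already aligned to equal length, whereas the actual tool aligns them with Clustal Omega.

```python
import re

import numpy as np


def extract_region(genome, fwd_primer, rev_primer_site):
    """Return the shortest span from fwd_primer to rev_primer_site, or None.

    Assumption: both primers are given as they appear on the same strand.
    """
    pattern = re.compile(
        re.escape(fwd_primer) + r"[ACGTN]*?" + re.escape(rev_primer_site)
    )
    match = pattern.search(genome)
    return match.group(0) if match else None


def dissimilarity_matrix(aligned):
    """Pairwise proportion of differing columns among aligned sequences.

    Assumption: all sequences in `aligned` have the same length.
    """
    n = len(aligned)
    dm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            diffs = sum(a != b for a, b in zip(aligned[i], aligned[j]))
            dm[i, j] = dm[j, i] = diffs / len(aligned[i])
    return dm
```

Building one such matrix per genomic region for the same set of strains gives the collection of DMs whose clustering patterns the abstract proposes to compare.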


Author(s):  
Tetsuro Kawano-Sugaya ◽  
Koji Yatsu ◽  
Tsuyoshi Sekizuka ◽  
Kentaro Itokawa ◽  
Masanori Hashino ◽  
...  

Summary: Many software packages for network visualization are available, but existing software has not been optimized for infection cluster visualization, especially for the worldwide spread of COVID-19 since 2019. To reach a spatiotemporal understanding of epidemics, we have developed Haplotype Explorer. In Haplotype Explorer, users can explore the network interactively with metadata such as accession numbers, locations, and collection dates. The time-dependent transition of the network can be exported as continuous sections for making a movie. Here, we introduce the features and products of Haplotype Explorer, demonstrating time-dependent snapshots and a movie of haplotype networks inferred from a total of 4,282 SARS-CoV-2 genomes.

Abstract: The worldwide eruption of COVID-19 that began in Wuhan, China in late 2019 reached 10 million cases by late June 2020. In order to understand the epidemiological landscape of the COVID-19 pandemic, many studies have attempted to elucidate phylogenetic relationships between collected viral genome sequences using haplotype networks. However, currently available applications for network visualization are not suited to understanding the COVID-19 epidemic spatiotemporally because of functional limitations. This motivated us to develop Haplotype Explorer, an intuitive tool for visualizing and exploring haplotype networks. Haplotype Explorer enables users to dissect epidemiological consequences via interactive node filters and provides a perspective on infectious disease dynamics depending on region and time, such as introduction, outbreak, expansion, and containment. Here, we demonstrate the effectiveness of Haplotype Explorer by showing its features and an example visualization. The demo using SARS-CoV-2 genomes is available at https://github.com/TKSjp/HaplotypeExplorer/blob/master/Example/. Several examples are provided, using SARS-CoV-2 genomes and Dengue virus serotype 1 E-gene sequences.
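The notion of time-dependent snapshots of a network can be illustrated with a small sketch. It is not taken from Haplotype Explorer: the data layout (a dict of node metadata with a hypothetical `collected` field, and a list of edge pairs) and the function names are assumptions made only for illustration.

```python
from datetime import date


def snapshot(nodes, edges, cutoff):
    """Keep nodes collected on or before `cutoff` and edges between them."""
    kept = {name for name, meta in nodes.items() if meta["collected"] <= cutoff}
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges


def snapshots(nodes, edges, cutoffs):
    """One (nodes, edges) snapshot per cutoff date, in the order given."""
    return [snapshot(nodes, edges, c) for c in cutoffs]


# Hypothetical toy network: three haplotypes with collection dates.
example_nodes = {
    "H1": {"collected": date(2020, 1, 5), "location": "A"},
    "H2": {"collected": date(2020, 2, 10), "location": "B"},
    "H3": {"collected": date(2020, 3, 20), "location": "A"},
}
example_edges = [("H1", "H2"), ("H2", "H3")]
```

Rendering each snapshot in sequence gives the frames of a movie like the one the summary mentions; filtering on other metadata such as location would follow the same pattern.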


2020 ◽  
Author(s):  
Tetsuro Kawano-Sugaya ◽  
Koji Yatsu ◽  
Tsuyoshi Sekizuka ◽  
Kentaro Itokawa ◽  
Masanori Hashino ◽  
...  

Abstract: The worldwide eruption of COVID-19 that began in Wuhan, China in late 2019 reached 10 million cases by late June 2020. In order to understand the epidemiological landscape of the COVID-19 pandemic, many studies have attempted to elucidate phylogenetic relationships between collected viral genome sequences using haplotype networks. However, currently available applications for network visualization are not suited to understanding the COVID-19 epidemic spatiotemporally because of functional limitations. That motivated us to develop Haplotype Explorer, an intuitive tool for visualizing and exploring haplotype networks. Haplotype Explorer enables people to dissect epidemiological consequences via interactive node filters to provide spatiotemporal perspectives on multimodal spectra of infectious diseases, including introduction, outbreak, expansion, and containment, for given regions and time spans. Here, we demonstrate the effectiveness of Haplotype Explorer by showing an example of its visualization and features. The demo using SARS-CoV-2 genome sequences is available at https://github.com/TKSjp/HaplotypeExplorer

Summary: A lot of software for network visualization is available, but existing software has not been optimized for infection cluster visualization in the context of the worldwide spread of COVID-19 since 2019. To reach a spatiotemporal understanding of the epidemic, we developed Haplotype Explorer. Its advantage over other applications is that it generates distributable HTML files with metadata search that interactively reflects GISAID IDs, locations, and collection dates. Here, we introduce the features and products of Haplotype Explorer, demonstrating time-dependent snapshots of haplotype networks inferred from a total of 4,282 SARS-CoV-2 genomes.


Purpose. Modelling environmentally safe bioenergy trends based on national and international patent databases and scientific databases. Methods. Bibliometric analysis using the Scopus database and patent databases, and modelling methods using a dedicated visualization software package. Results. An analytical diagram based on the review of patent databases was developed, as well as a model for visualizing the interrelationships between clusters of bioenergy development trends as a complex solution for environmental protection. Four clusters were formed based on data from the Scopus database using VOSviewer software: 1) the red cluster reveals the environmental problems of changing the direction of implementation of stationary energy sources with the development of bioenergy potential, and the creation of strategies for this development at the regional level; 2) the yellow cluster covers the restoration of ecological systems, in particular forests, and the reduction of CO2 emissions from bioenergy; 3) the green cluster covers the production and use of different types of fuel and energy produced by introducing and improving bioenergy technologies; 4) the blue cluster covers the impact of bioenergy technologies on environmental restoration and purification and on the reduction of damage from anthropogenic impact. Conclusions. The analysis of patent databases with cluster visualization based on a bibliometric approach made it possible to identify the most promising areas of research in the development of bioenergy solutions. Further research will focus on the development of a lab bench for biogenic gas production with the possibility of complex processing of secondary raw materials and obtaining environmentally safe digestates.


Author(s):  
Rodrigo Santos do Amor Divino Lima ◽  
Carlos Gustavo Resque dos Santos ◽  
Sandro de Paula Mendonça ◽  
Jefferson Magalhães de Morais ◽  
Bianchi Serique Meiguins
