A study on high dimensional large-scale data visualization

Emergent Trend Detection (ETD), a principal application of text mining, detects and characterizes new trends in a corpus. This article discusses the influence of Particle Swarm Optimization (PSO) on Dynamic Adaptive Self Organizing Maps (DASOM) in the design of an efficient ETD scheme, in which PSO optimizes the neural parameters of the network. This hybrid machine learning scheme is designed to maximize accuracy while minimizing computation time. The efficiency and scalability of the proposed scheme are analyzed and compared with standard algorithms such as SOM, DASOM, and linear regression analysis. The system is trained and tested on the DBLP database (University of Trier, Germany). The article establishes the superiority of the hybrid DASOM algorithm over these well-known algorithms in handling high-dimensional, large-scale data to detect emergent trends in the corpus.
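The abstract does not say which network parameters PSO tunes or what objective it minimizes, so the following is only a minimal sketch of a generic particle swarm optimizer, not the authors' scheme. The two tuned parameters (a learning rate and a neighborhood radius), their bounds, and the `placeholder_map_error` objective are all assumptions made for illustration; in a real setting the objective would train a map and return a measured error.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic PSO: returns (best_position, best_value) within the given bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # per-particle best position
    best_val = [objective(p) for p in pos]      # per-particle best value
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp the coordinate back into its bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def placeholder_map_error(params):
    """Hypothetical stand-in for a map's error; minimum at (0.3, 2.0) by construction."""
    learning_rate, radius = params
    return (learning_rate - 0.3) ** 2 + (radius - 2.0) ** 2


best_params, best_error = pso_minimize(
    placeholder_map_error, bounds=[(0.01, 1.0), (0.5, 5.0)])
print(best_params, best_error)
```

The fixed seed only makes the run repeatable. The inertia and acceleration coefficients are common textbook defaults rather than values taken from the article.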


Visualization of Large-Scale Distributed Data

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Data Intensive Distributed Computing ◽ 10.4018/978-1-61520-971-2.ch011 ◽ 2012 ◽ pp. 242-274

Author(s): Jason Leigh ◽ Andrew Johnson ◽ Luc Renambot ◽ Venkatram Vishwanath ◽ Tom Peterka ◽ ...

Keyword(s): Distributed Computing ◽ Data Visualization ◽ High Speed ◽ Large Scale ◽ Distributed Data ◽ Large Scale Data ◽ Distribution Of Resources ◽ Integrated Facility ◽ Effective Visualization ◽ Scale Data

An effective visualization is best achieved through the creation of a proper representation of data and the interactive manipulation and querying of the visualization. Large-scale data visualization is particularly challenging because the size of the data is several orders of magnitude larger than what can be managed on an average desktop computer. Large-scale data visualization therefore requires the use of distributed computing. By leveraging the widespread expansion of the Internet and other national and international high-speed network infrastructure such as the National LambdaRail, Internet-2, and the Global Lambda Integrated Facility, data and service providers began to migrate toward a model of widespread distribution of resources. This chapter introduces different instantiations of the visualization pipeline and the historic motivation for their creation. The authors examine individual components of the pipeline in detail to understand the technical challenges that must be solved to ensure continued scalability. They discuss distributed data management issues that are specifically relevant to large-scale visualization. They also introduce key data rendering techniques and explain, through case studies, approaches for scaling them by leveraging distributed computing. Lastly, they describe advanced display technologies that are now considered the “lenses” for examining large-scale data.


Issues and Architectures in Large-Scale Data Visualization

Visualization Handbook ◽ 10.1016/b978-012387582-2/50030-7 ◽ 2005 ◽ pp. 551-567

Author(s): Constantine Pavlakos ◽ Philip D. Heermann

Keyword(s): Data Visualization ◽ Large Scale ◽ Large Scale Data ◽ Scale Data


High-Resolution Interactive and Collaborative Data Visualization Framework for Large-Scale Data Analysis

2016 International Conference on Collaboration Technologies and Systems (CTS) ◽ 10.1109/cts.2016.0059 ◽ 2016 ◽ Cited By ~ 4

Author(s): Simon Su ◽ Vincent Perry ◽ Nicholas Cantner ◽ Dylan Kobayashi ◽ Jason Leigh

Keyword(s): Data Analysis ◽ High Resolution ◽ Data Visualization ◽ Large Scale ◽ Large Scale Data ◽ Scale Data


Mercator: a pipeline for multi-method, unsupervised visualization and distance generation

Bioinformatics ◽ 10.1093/bioinformatics/btab037 ◽ 2021

Author(s): Zachary B Abrams ◽ Caitlin E Coombes ◽ Suli Li ◽ Kevin R Coombes

Keyword(s): Large Scale ◽ R Package ◽ High Dimensional ◽ Vast Number ◽ Large Scale Data ◽ User Friendly ◽ Exploratory Pattern ◽ Scale Data ◽ Selection Of ◽ Publication Quality

Summary: Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with their own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics.

Availability and implementation: Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).
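To illustrate why the choice of distance metric matters, the sketch below computes pairwise distance matrices for the same observations under three common metrics. It is written in plain Python and does not use or imitate the Mercator R package's API. The abstract does not name the ten metrics Mercator supports, so Jaccard, Manhattan, and Euclidean are assumptions chosen for illustration, as are the toy `samples` and all function names.

```python
import math


def jaccard(a, b):
    """Jaccard distance for binary vectors: 1 - |both set| / |either set|."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    both = sum(1 for x, y in zip(a, b) if x and y)
    return 0.0 if either == 0 else 1.0 - both / either


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


METRICS = {"jaccard": jaccard, "manhattan": manhattan, "euclidean": euclidean}


def distance_matrix(rows, metric):
    """Full symmetric matrix of pairwise distances under the named metric."""
    fn = METRICS[metric]
    return [[fn(a, b) for b in rows] for a in rows]


def nearest_neighbor(matrix, i):
    """Index of the observation closest to observation i (excluding itself)."""
    return min((j for j in range(len(matrix)) if j != i),
               key=lambda j: matrix[i][j])


# Toy binary observations (hypothetical presence/absence features).
samples = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

for name in METRICS:
    print(name, distance_matrix(samples, name))
```

Any of these matrices could then be passed to a clustering or embedding method. The metrics scale differences differently (Jaccard ignores shared absences, for instance), which is one reason a pipeline might let the analyst compare several before settling on one.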
