Multi-aspect visual analytics on large-scale high-dimensional cyber security data

2013 ◽  
Vol 14 (1) ◽  
pp. 62-75 ◽  
Author(s):  
Victor Y Chen ◽  
Ahmad M Razip ◽  
Sungahn Ko ◽  
Cheryl Z Qian ◽  
David S Ebert

In this article, we present a visual analytics system, SemanticPrism, which aims to analyze large-scale, high-dimensional cyber security datasets containing logs from a million computers. SemanticPrism visualizes the data from three different perspectives: spatiotemporal distribution, overall temporal trends, and pixel-based IP (Internet Protocol) address blocks. Within each perspective, we use semantic zooming to present more detailed information. The interlinked visualizations and multiple levels of detail allow us to detect unexpected changes taking place in different dimensions of the data and to identify potential anomalies in the network. After comparing our approach to other submissions, we outline potential paths for future improvement.
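
A minimal sketch of the pixel-based IP-block idea may help make it concrete: each /16 prefix of an IPv4 address maps to one cell of a 256 x 256 grid, and cell intensity encodes how many log events fall into that block. The event list below is a hypothetical stand-in for the logs, not the authors' data or implementation.

    import numpy as np
    import matplotlib.pyplot as plt

    events = ["10.0.3.7", "10.0.9.1", "172.16.4.2", "172.16.4.9"]  # placeholder log IPs

    grid = np.zeros((256, 256))
    for ip in events:
        o1, o2, _, _ = (int(x) for x in ip.split("."))
        grid[o1, o2] += 1  # the first two octets select the /16 block

    plt.imshow(grid, cmap="viridis", origin="lower")
    plt.xlabel("second octet")
    plt.ylabel("first octet")
    plt.title("event count per /16 block")
    plt.show()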

2019 ◽  
Author(s):  
Robert Krueger ◽  
Johanna Beyer ◽  
Won-Dong Jang ◽  
Nam Wook Kim ◽  
Artem Sokolov ◽  
...  

Abstract Facetto is a scalable visual analytics application that is used to discover single-cell phenotypes in high-dimensional, multi-channel microscopy images of human tumors and tissues. Such images represent the cutting edge of digital histology and promise to revolutionize how diseases such as cancer are studied, diagnosed, and treated. Highly multiplexed tissue images are complex, comprising 10⁹ or more pixels, 60-plus channels, and millions of individual cells. This makes manual analysis challenging and error-prone. Existing automated approaches are also inadequate, in large part because they are unable to effectively exploit the deep knowledge of human tissue biology available to anatomic pathologists. To overcome these challenges, Facetto enables a semi-automated analysis of cell types and states. It integrates unsupervised and supervised learning into the image and feature exploration process and offers tools for analytical provenance. Experts can cluster the data to discover new types of cancer and immune cells and use clustering results to train a convolutional neural network that classifies new cells accordingly. Likewise, the output of classifiers can be clustered to discover aggregate patterns and phenotype subsets. We also introduce a new hierarchical approach to keep track of analysis steps and data subsets created by users; this assists in the identification of cell types. Users can build phenotype trees and interact with the resulting hierarchical structures of both high-dimensional feature and image spaces. We report on use cases in which domain scientists explore various large-scale fluorescence imaging datasets. We demonstrate how Facetto assists users in steering the clustering and classification process, inspecting analysis results, and gaining new scientific insights into cancer biology.
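
As a rough illustration of the cluster-then-classify loop described above, the sketch below clusters synthetic per-cell feature vectors and trains a classifier on the resulting labels. Facetto trains a convolutional neural network on image data; a random forest on tabular features stands in here so the example stays self-contained, and all data are placeholders.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    cells = rng.normal(size=(5000, 60))           # 5000 cells x 60 channel features

    # 1) Unsupervised step: clustering proposes phenotype labels.
    phenotypes = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(cells)

    # 2) Supervised step: train a classifier on the cluster labels ...
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(cells, phenotypes)

    # 3) ... and apply it to newly imaged cells.
    new_cells = rng.normal(size=(100, 60))
    predicted = clf.predict(new_cells)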


2019 ◽  
Author(s):  
Junghoon Chae ◽  
Debsindhu Bhowmik ◽  
Heng Ma ◽  
Arvind Ramanathan ◽  
Chad Steed

Abstract Molecular Dynamics (MD) simulations have emerged as an excellent candidate for understanding the complex atomic- and molecular-scale mechanisms of biomolecules that control essential biophysical phenomena in living organisms. However, the MD technique produces long-timescale datasets that are inherently high-dimensional and occupy many terabytes of storage. Processing this immense amount of data in a meaningful way is becoming increasingly difficult. Therefore, a dimensionality reduction algorithm based on deep learning has been employed here to embed the high-dimensional data in a lower-dimensional latent space that still preserves the inherent molecular characteristics, i.e., retains biologically meaningful information. Subsequently, the results of the embedding models are visualized for model evaluation and analysis of the extracted underlying features. However, most existing visualizations for embeddings are limited in their ability to evaluate embedding models and to illuminate complex simulation data. We propose an interactive visual analytics system for embeddings of MD simulations that not only evaluates and explains an embedding model but also analyzes various characteristics of the simulations. Our system enables exploration and discovery of meaningful and semantic embedding results and supports the understanding and evaluation of results through quantitatively described features of the MD simulations (even without specific labels).
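
A minimal sketch of the embedding step, assuming a simple fully connected autoencoder and random stand-in data; the authors' actual architecture and MD inputs are not specified here.

    import torch
    import torch.nn as nn

    n_features, latent_dim = 3000, 2          # e.g. flattened atomic coordinates
    frames = torch.randn(1024, n_features)    # placeholder for simulation frames

    encoder = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                            nn.Linear(128, latent_dim))
    decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                            nn.Linear(128, n_features))
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                           lr=1e-3)

    for epoch in range(50):
        z = encoder(frames)                   # latent embedding to be visualized
        loss = nn.functional.mse_loss(decoder(z), frames)
        opt.zero_grad()
        loss.backward()
        opt.step()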


Author(s):  
Amitabh Varshney ◽  
Jihad El-Sana ◽  
Francine Evans ◽  
Lucia Darsa ◽  
Bruno Costa ◽  
...  

Abstract Reconciling scene realism with interactivity has emerged as one of the most important areas in making virtual reality feasible for large-scale mechanical CAD datasets consisting of several million primitives. This paper surveys our research and related work on achieving interactivity without sacrificing realism in virtual reality walkthroughs and flythroughs of polygonal CAD datasets. We outline our recent work on efficient generation of triangle strips from polygonal models, which takes advantage of compression of connectivity information. This results in substantial savings in rendering, transmission, and storage. We outline our work on genus-reducing simplifications as well as real-time view-dependent simplifications that allow on-the-fly selection among multiple levels of detail, based upon lighting and viewing parameters. Our method allows multiple levels of detail to coexist on the same object at different regions and to merge seamlessly without any cracks or shading artifacts. We also present an overview of our work on hardware-assisted image-based rendering that allows interactive exploration of computer-generated scenes.
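
The triangle-strip savings mentioned above are easy to see in a sketch: a strip of n triangles needs only n + 2 vertex indices instead of 3n. The greedy stripifier below is illustrative only (winding-order bookkeeping, which real renderers need, is omitted) and is not the authors' algorithm.

    def build_strips(triangles):
        unused = [frozenset(t) for t in triangles]
        strips = []
        while unused:
            a, b, c = tuple(unused.pop())
            # seed with an ordering whose trailing edge another triangle shares
            strip = next(
                (list(o) for o in ((a, b, c), (b, c, a), (c, a, b))
                 if any({o[1], o[2]} < t for t in unused)),
                [a, b, c],
            )
            while True:
                edge = set(strip[-2:])
                shared = next((t for t in unused if edge < t), None)
                if shared is None:
                    break
                strip.append((shared - edge).pop())  # third vertex extends the strip
                unused.remove(shared)
            strips.append(strip)
        return strips

    quad = [(0, 1, 2), (1, 2, 3)]  # two triangles sharing edge (1, 2)
    print(build_strips(quad))      # one strip of 4 indices instead of 2 x 3 = 6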


2013 ◽  
Vol 13 (2) ◽  
pp. 159-183 ◽  
Author(s):  
Weijia Xu ◽  
Maria Esteva ◽  
Suyog D Jain ◽  
Varun Jain

To make decisions about the long-term preservation of and access to large digital collections, digital curators use information such as the collections’ digital object types, their contents and preservation risks, and how they are organized. To date, the process of analyzing a collection—from data gathering to exploratory analysis and final conclusions—has largely been conducted using linear review and pen-and-paper methods. To help curators analyze large-scale digital collections, we developed an interactive visual analytics application. We have put methods in place to summarize large and diverse information about the collection and to present it as integrated views. Multiple views can be linked or unlinked on demand to enable curators to identify trends and particularities at different levels of detail and to compare and contrast views. We describe two analysis workflows to illustrate how the application can be used to triage digital collections, facilitate collection-management decision making, and provide access. After conducting a focus group study with domain specialists, we introduced features to address their concerns and needs.
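
A minimal sketch of the link/unlink-on-demand idea, assuming views subscribe to a shared selection and can detach to explore independently; the Selection and View classes are hypothetical, not the application's actual API.

    class Selection:
        def __init__(self):
            self._views = set()

        def link(self, view):
            self._views.add(view)

        def unlink(self, view):
            self._views.discard(view)

        def select(self, items):
            for view in self._views:     # broadcast to all linked views
                view.refresh(items)

    class View:
        def __init__(self, name):
            self.name = name

        def refresh(self, items):
            print(f"{self.name}: highlighting {sorted(items)}")

    selection = Selection()
    treemap, timeline = View("treemap"), View("timeline")
    selection.link(treemap)
    selection.link(timeline)
    selection.select({"file-42", "file-7"})   # both views update
    selection.unlink(timeline)                # timeline now explores on its own
    selection.select({"file-3"})              # only the treemap refreshes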


2009 ◽  
Vol 35 (7) ◽  
pp. 859-866
Author(s):  
Ming LIU ◽  
Xiao-Long WANG ◽  
Yuan-Chao LIU

Author(s):  
Katherine L. Bryant ◽  
Dirk Jan Ardesch ◽  
Lea Roumazeilles ◽  
Lianne H. Scholtens ◽  
Alexandre A. Khrapitchev ◽  
...  

Abstract Large-scale comparative neuroscience requires data from many species and, ideally, at multiple levels of description. Here, we contribute to this endeavor by presenting diffusion and structural MRI data from eight primate species that have rarely or never been described in the literature. The selected samples from the Primate Brain Bank cover a prosimian, New and Old World monkeys, and a great ape. We present preliminary labelling of the cortical sulci and tractography of the optic radiation, the dorsal part of the cingulum bundle, and the dorsal parietal-frontal and ventral temporal-frontal longitudinal white matter tracts. Both dorsal and ventral association fiber systems could be observed in all samples, with the dorsal tracts occupying much less relative volume in the prosimian than in the other species. We discuss the results in the context of known primate specializations and present hypotheses for further research. All data and results presented here are available online as a resource for the scientific community.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xin Chen ◽  
Wei Hou ◽  
Sina Rashidian ◽  
Yu Wang ◽  
Xia Zhao ◽  
...  

Abstract Opioid overdose-related deaths have increased dramatically in recent years. Combating the opioid epidemic requires a better understanding of the epidemiology of opioid poisoning (OP). To discover trends and patterns of opioid poisoning and the associated demographic and regional disparities, we analyzed large-scale patient visit data in New York State (NYS). Demographic, spatial, temporal, and correlation analyses were performed for all OP patients extracted from the claims data in the New York Statewide Planning and Research Cooperative System (SPARCS) from 2010 to 2016, along with Decennial US Census and American Community Survey zip-code-level data. In total, 58,481 patients with at least one OP diagnosis and a valid NYS zip code address were included. Main outcomes and measures include OP patient counts and rates per 100,000 population, patient-level factors (gender, age, race and ethnicity, residential zip code), and zip-code-level social demographic factors. The results showed that the OP rate increased by 364.6% overall, and by 741.5% for the age group > 65 years. There were wide disparities among racial and ethnic groups in the rates and age distributions of OP. Heroin-based and non-heroin-based OP rates demonstrated distinct temporal trends as well as major geospatial variation. The findings highlight strong demographic disparities among OP patients, evolving patterns, and substantial geospatial variation.
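
As a sketch of the rate computation, the snippet below derives zip-code-level OP rates per 100,000 population with pandas. Both data frames are tiny placeholders for the SPARCS claims and Census population tables, which cannot be reproduced here.

    import pandas as pd

    op_visits = pd.DataFrame({"zip": ["10001", "10001", "11201"],
                              "patient_id": [1, 2, 3]})
    population = pd.DataFrame({"zip": ["10001", "11201"],
                               "pop": [21102, 62840]})

    # count distinct OP patients per zip code, then normalize by population
    counts = op_visits.groupby("zip")["patient_id"].nunique().rename("patients")
    rates = population.set_index("zip").join(counts).fillna(0)
    rates["per_100k"] = 100_000 * rates["patients"] / rates["pop"]
    print(rates)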


Energies ◽  
2021 ◽  
Vol 14 (10) ◽  
pp. 2772
Author(s):  
Vishwas Powar ◽  
Rajendra Singh

Plummeting reserves and increasing demand for freshwater resources have culminated in a global water crisis. Desalination is a potential solution to mitigate the freshwater shortage. However, the process of desalination is expensive and energy-intensive. Due to the water-energy-climate nexus, there is an urgent need to provide sustainable, low-cost electrical power for desalination with the lowest impact on climate and related ecosystem challenges. For a large-scale reverse osmosis desalination plant, we propose the design and analysis of a photovoltaics- and battery-based stand-alone direct current power network. The design methodology focuses on appropriate sizing, optimum tilt, and temperature compensation techniques based on 10 years of irradiation data for the Carlsbad Desalination Plant in California, USA. A decision-tree approach is employed to ensure hourly load-generation balance. The power flow analysis verifies self-sufficient generation even during cloud cover contingencies. The primary goal of the proposed system is to maximize the utilization of generated photovoltaic power and battery energy storage with minimal conversion and transmission losses. The direct-current-based topology includes high-voltage transmission, on-the-spot local inversion, situational awareness, and cyber security features. Lastly, an economic feasibility analysis of the proposed system is carried out for a plant lifetime of 30 years. The variable effect of utility-scale battery storage costs for 16–18 h of operation is studied. Our results show that the proposed design will provide low electricity costs ranging from 3.79 to 6.43 ¢/kWh depending on the debt rate. Without employing the concept of baseload electric power, photovoltaics- and battery-based direct current power networks for large-scale desalination plants can achieve tremendous energy savings and cost reduction with a negligible carbon footprint, thereby providing affordable water for all.
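
A back-of-the-envelope sketch of the kind of levelized-cost calculation such an economic analysis rests on, discounting costs and delivered energy over the 30-year lifetime at the debt rate. All inputs below are invented placeholders, not the paper's figures (the paper reports 3.79 to 6.43 ¢/kWh).

    def lcoe(capex, annual_opex, annual_kwh, rate, years=30):
        """Levelized cost of electricity in $/kWh over the plant lifetime."""
        disc = [(1 + rate) ** -t for t in range(1, years + 1)]
        costs = capex + sum(annual_opex * d for d in disc)
        energy = sum(annual_kwh * d for d in disc)
        return costs / energy

    # e.g. $200M PV+battery capex, $5M/yr O&M, 250 GWh/yr delivered
    for rate in (0.03, 0.07):
        cents = 100 * lcoe(2e8, 5e6, 2.5e8, rate)
        print(f"debt rate {rate:.0%}: {cents:.2f} cents/kWh")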


2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has been proven effective in various application areas, such as object and speech recognition on mobile systems. Since the availability of large training datasets is critical to machine learning success, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially for large-scale, high-dimensional data such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping, with statistical benefits on large-scale, high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale, high-dimensional datasets.
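
A sketch of how random projections and bootstrapping make a separability estimate cheap on high-dimensional data. The Fisher-style between/within distance ratio below is illustrative, not the authors' exact measure, and the data are random stand-ins.

    import numpy as np
    from sklearn.random_projection import GaussianRandomProjection

    def separability(X, y):
        """Between-class vs. within-class distance ratio (higher = more separable)."""
        classes = np.unique(y)
        centroids = np.array([X[y == c].mean(axis=0) for c in classes])
        within = np.mean([np.linalg.norm(X[y == c] - centroids[i], axis=1).mean()
                          for i, c in enumerate(classes)])
        between = np.mean([np.linalg.norm(ci - cj)
                           for i, ci in enumerate(centroids)
                           for cj in centroids[i + 1:]])
        return between / within

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 4096))   # stand-in for image feature vectors
    y = rng.integers(0, 10, size=10_000)

    # project to a low dimension once, then bootstrap cheap subsample estimates
    X_low = GaussianRandomProjection(n_components=128, random_state=0).fit_transform(X)
    scores = []
    for _ in range(20):
        idx = rng.choice(len(X_low), size=2000, replace=True)
        scores.append(separability(X_low[idx], y[idx]))
    print(f"separability: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")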

