Graphs from Features: Tree-Based Graph Layout for Feature Analysis

Feature analysis has become a critical task in data analysis and visualization. Graph structures are flexible in what they can represent and may encode important information about features, but laying them out adequately for analysis tasks is challenging. In this study, we propose and develop similarity-based graph layouts for locating relevant patterns in sets of features, thus supporting feature analysis and selection. In the first step of the strategy, we apply a tree layout to place nodes and provide an overview based on feature similarity. Drawing the remaining graph edges on demand then reveals further groupings and relationships among features. We evaluate those groups and relationships in terms of their effectiveness in exploring feature sets for data analysis. Correlation of features with a target categorical attribute and feature ranking are added to support the task. Multidimensional projections are employed to plot the dataset based on selected attributes, revealing the effectiveness of the feature set. Our results show that the tree-graph layout framework allows for a number of observations that are important in user-centric feature selection and not easy to make with any other available tool. The framework provides a way of finding relevant and irrelevant features, spurious sets of noisy features, groups of similar features, and opposite features, all of which are essential in different data analysis scenarios. Case studies in application areas centered on document, image, and sound data demonstrate the ability of the framework to quickly reach a satisfactory compact representation from a larger feature set.
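The first step described in the abstract above, building a tree over features from their pairwise similarity, could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which similarity measure or tree construction the authors use, so the sketch swaps in absolute Pearson correlation as the similarity and a minimum spanning tree (Prim's algorithm) as the tree. The feature names and values are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length value lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def similarity_tree(features):
    """Build a minimum spanning tree over features with Prim's algorithm.

    features: dict mapping feature name -> list of values.
    Distance is 1 - |r| (an assumption), so strongly correlated and strongly
    anti-correlated ("opposite") features both end up close in the tree.
    Returns a list of edges (parent, child, distance).
    """
    names = list(features)

    def dist(a, b):
        return 1.0 - abs(pearson(features[a], features[b]))

    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects the tree to a feature not yet in it.
        a, b = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda e: dist(*e),
        )
        edges.append((a, b, dist(a, b)))
        in_tree.add(b)
    return edges

# Hypothetical feature columns, for illustration only.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.1, 9.8],  # nearly a multiple of f1
    "f3": [5.0, 4.1, 2.9, 2.0, 1.1],  # nearly the opposite of f1
    "f4": [3.0, 1.0, 4.0, 1.0, 5.0],  # weakly related to the others
}
tree_edges = similarity_tree(features)
for parent, child, d in tree_edges:
    print(f"{parent} -- {child}  (distance {d:.3f})")
```

In this toy input the weakly related feature f4 attaches last and through the longest edge, which is the kind of outlying position a tree layout would make visible. The remaining non-tree edges could then be drawn on demand from the same distance function.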


Are You Hooked on Paid Music Streaming?

International Journal of E-Business Research ◽ 10.4018/ijebr.2018010101 ◽ 2018 ◽ Vol 14 (1) ◽ pp. 1-20 ◽ Cited By ~ 4

Author(s): Charlie C. Chen ◽ Steven Leon ◽ Makoto Nakayama

Keyword(s): Data Analysis ◽ Music Industry ◽ Clear Understanding ◽ On Demand ◽ Behavioral Dynamics ◽ Revenue Sources ◽ Streaming Services ◽ Facilitating Conditions ◽ Communication Control ◽ Music Streaming

The proliferation of free on-demand music streaming services (e.g., Spotify) is offsetting the traditional revenue sources (e.g., purchases of downloads or CDs) of the music industry. To increase revenue and sustain business, the music industry is directing its efforts toward increasing paid subscriptions by converting free listeners into paying subscribers. However, most companies are struggling with these attempts because they lack a clear understanding of the psychological and social purchase motivations of consumers. This study compares and contrasts two phases of Millennial generation consumer behavior: the alluring phase and the hooking phase. A survey was conducted with 73 paying users and 163 non-paying users of on-demand music streaming services. The authors' data analysis shows two distinct behavioral dynamics across these groups of users. While social influence and attitude are primary drivers for the non-paying users in the alluring phase, facilitating conditions and communication control capacity play critical roles for the paying users in the hooking phase. These results imply that the music industry should apply different approaches to prospective and current customers of music streaming services.


The economics of inpatient on-demand treatment for haemophilia with high-responding inhibitors: a US retrospective data analysis

Haemophilia ◽ 10.1111/j.1365-2516.2011.02623.x ◽ 2011 ◽ Vol 18 (2) ◽ pp. 284-290 ◽ Cited By ~ 8

Author(s): S. M. POKRAS ◽ A. A. PETRILLA ◽ J. WEATHERALL ◽ W. C. LEE

Keyword(s): Data Analysis ◽ Retrospective Data ◽ On Demand ◽ Retrospective Data Analysis


Use of reorderable matrices and heatmaps to support data analysis of students transcripts

10.5753/sibgrapi.est.2019.8334 ◽ 2019

Author(s): Rafael Tavares Carvalho Barros ◽ Thiago Gonçalves Mendes ◽ Celmar Guimarães Da Silva

Keyword(s): Data Analysis ◽ Visual Representations ◽ Numerical Approach ◽ Specific Subject ◽ On Demand ◽ Color Scale ◽ Student Grades ◽ Order Algorithm

For a course coordinator, analyzing several students' transcripts to identify the situation of subjects or students is often an old-fashioned process carried out through a textual and numerical approach. This work is part of a larger project aimed at choosing appropriate visual representations to help course coordinators analyze sets of student transcripts. In this work, we developed a system that visualizes student transcripts through a heatmap of student grades per subject. The heatmap represents grades based on a user-defined color scale. To assist in the analysis, subjects and students can be reordered using the optimal leaf ordering algorithm, or according to the grades of a specific subject or student. In addition, some features have been developed to meet visual guidelines, such as overview, zoom, filter, and details-on-demand.
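Two of the operations mentioned in the abstract above, mapping grades to a user-defined color scale and reordering students by the grades of a specific subject, can be sketched in a few lines. This is a minimal illustration, not the authors' system: the 0 to 10 grade range, the two-color linear scale, and all names and grades are assumptions, and the optimal leaf ordering algorithm is not implemented here.

```python
def grade_to_rgb(grade, low=0.0, high=10.0,
                 low_color=(255, 255, 255), high_color=(0, 90, 200)):
    """Linearly interpolate between two RGB colors (assumed two-stop scale)."""
    t = (min(max(grade, low), high) - low) / (high - low)
    return tuple(round(a + t * (b - a)) for a, b in zip(low_color, high_color))

def reorder_students_by_subject(grades, subject, descending=True):
    """Return student names sorted by their grade in one subject."""
    return sorted(grades, key=lambda s: grades[s][subject], reverse=descending)

def heatmap_rows(grades, student_order, subject_order):
    """Build the heatmap as rows of RGB cells, one row per student."""
    return [[grade_to_rgb(grades[s][subj]) for subj in subject_order]
            for s in student_order]

# Hypothetical transcripts: student -> subject -> grade.
grades = {
    "student_a": {"Calculus": 7.5, "Physics": 6.0, "Programming": 9.0},
    "student_b": {"Calculus": 4.0, "Physics": 5.5, "Programming": 8.0},
    "student_c": {"Calculus": 9.5, "Physics": 8.0, "Programming": 6.5},
}
subjects = ["Calculus", "Physics", "Programming"]

order = reorder_students_by_subject(grades, "Calculus")
rows = heatmap_rows(grades, order, subjects)
```

Keeping the ordering separate from the color mapping means any other ordering, including one produced by a clustering-based method, can be passed to `heatmap_rows` unchanged.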


It Pays to Be Lazy: Reusing Force Approximations to Compute Better Graph Layouts Faster

10.31219/osf.io/wgzn5 ◽ 2018 ◽ Cited By ~ 1

Author(s): Robert Gove

Keyword(s): Approximation Algorithms ◽ Approximation Algorithm ◽ Fast Multipole Method ◽ Quality Metrics ◽ Graph Layout ◽ Fast Multipole ◽ Improve Performance ◽ Running Time ◽ Multipole Method ◽ Graph Layouts

N-body simulations are common in applications ranging from physics simulations to computing graph layouts. The simulations are slow, but tree-based approximation algorithms like Barnes-Hut or the Fast Multipole Method dramatically improve performance. This paper proposes two new update schedules, uniform and dynamic, to make this type of approximation algorithm even faster by updating the approximation less often. An evaluation of these new schedules on computing graph layouts finds that the schedules typically decrease the running time by 9% to 18% for Barnes-Hut and 88% to 92% for the Fast Multipole Method. An experiment using 4 layout quality metrics on 50 graphs shows that the uniform schedule has similar or better graph layout quality compared to the standard Barnes-Hut or Fast Multipole Method algorithms.
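The idea of updating the approximation less often can be illustrated with a small scheduling sketch. The abstract does not define the uniform or dynamic schedules, so the code below assumes that "uniform" means rebuilding the tree at evenly spaced iterations and reusing the stale tree in between. The `build_tree` and `apply_forces` callables are hypothetical stand-ins for a real Barnes-Hut or Fast Multipole Method implementation.

```python
def uniform_schedule(n_iterations, n_updates):
    """Iterations at which to rebuild the approximation, evenly spaced (assumed)."""
    n_updates = max(1, min(n_updates, n_iterations))
    return {(k * n_iterations) // n_updates for k in range(n_updates)}

def run_layout(n_iterations, rebuild_at, build_tree, apply_forces):
    """Run the layout loop, rebuilding the tree only on scheduled iterations.

    Returns the number of rebuilds, which is the cost being saved.
    """
    tree = None
    rebuilds = 0
    for i in range(n_iterations):
        if tree is None or i in rebuild_at:
            tree = build_tree()   # expensive step: construct the spatial tree
            rebuilds += 1
        apply_forces(tree)        # reuses the possibly stale approximation
    return rebuilds

# Placeholder callables so the sketch runs without a real layout engine.
def build_tree():
    return object()

def apply_forces(tree):
    pass

every_iteration = uniform_schedule(300, 300)  # standard behavior
sparse = uniform_schedule(300, 30)            # rebuild on 1 in 10 iterations

standard_rebuilds = run_layout(300, every_iteration, build_tree, apply_forces)
lazy_rebuilds = run_layout(300, sparse, build_tree, apply_forces)
```

How much running time this saves depends on how expensive tree construction is relative to force evaluation, which would be consistent with the different savings the abstract reports for the two algorithms.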


Force-Directed Graph Layouts by Edge Sampling

10.31219/osf.io/6q7ck ◽ 2019

Author(s): Robert Gove

Keyword(s): Directed Graph ◽ Graph Visualization ◽ Graph Layout ◽ Spring Force ◽ Graph Layouts ◽ Speed Up ◽ Sampling Algorithms ◽ Edge Sampling ◽ Force Calculation ◽ Comparable Quality

Recent work shows that sampling algorithms can be an effective tool for graph visualization. This paper extends prior work by applying edge sampling algorithms to speed up the spring force calculation in force-directed graph layout algorithms. An experiment on 72 graphs finds that some sampling algorithms achieve quality comparable to no sampling. This result is confirmed with visualizations of the graph layout results. However, runtime improvements are small, especially for graphs with 10,000 vertices or fewer, indicating that the runtime savings might not be worth the risk to layout quality. Therefore, this paper suggests that accurate spring forces may be more important to force-directed graph layout algorithms than accurate electric forces. A copy of this paper, plus the code and data to reproduce the results, is available at https://osf.io/4ja29/
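As a rough illustration of what edge sampling in the spring force step might look like, the sketch below evaluates Hooke's-law spring forces on only a random subset of edges per iteration. The abstract does not name the sampling algorithms evaluated, so uniform random sampling is an assumption, as are the function name, parameters, and toy graph.

```python
import math
import random

def sampled_spring_forces(pos, edges, sample_size,
                          rest_length=1.0, stiffness=0.1, rng=random):
    """Accumulate spring forces over a uniform random sample of edges.

    pos: dict vertex -> (x, y); edges: list of (u, v) pairs.
    Returns dict vertex -> [fx, fy].
    """
    forces = {v: [0.0, 0.0] for v in pos}
    sample = rng.sample(edges, min(sample_size, len(edges)))
    for u, v in sample:
        dx = pos[v][0] - pos[u][0]
        dy = pos[v][1] - pos[u][1]
        d = math.hypot(dx, dy) or 1e-9          # avoid division by zero
        magnitude = stiffness * (d - rest_length)  # Hooke's law
        fx, fy = magnitude * dx / d, magnitude * dy / d
        # Equal and opposite forces pull (or push) the two endpoints.
        forces[u][0] += fx
        forces[u][1] += fy
        forces[v][0] -= fx
        forces[v][1] -= fy
    return forces

# Toy triangle graph, for illustration only.
pos = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 4.0)}
edges = [("a", "b"), ("b", "c"), ("a", "c")]

sampled = sampled_spring_forces(pos, edges, sample_size=2, rng=random.Random(0))
single = sampled_spring_forces(pos, [("a", "b")], sample_size=5)
```

Because each sampled edge contributes equal and opposite forces, the net force stays zero whichever edges are drawn. What changes is which vertices move in a given iteration, which is where the risk to layout quality noted in the abstract would come from.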


Analysis of Factors Affecting the Demand for Salak Pondoh in the Fruit Market of Lhokseumawe City

Jurnal Ekonomi Pertanian Unimal ◽ 10.29103/jepu.v2i2.1693 ◽ 2019 ◽ Vol 2 (2) ◽ pp. 93

Author(s): Ita Puspita Sari ◽ Cut Putri Mellita Sari

Keyword(s): Data Analysis ◽ Linear Regression ◽ Multiple Linear Regression ◽ Income Level ◽ Primary Data ◽ Analysis Method ◽ Income Levels ◽ On Demand ◽ Negative Effect ◽ Positive Effect

The purpose of this study was to determine the effect of Salak Pondoh prices, Salak Medan prices, and income levels on the demand for Salak Pondoh in the fruit market of Lhokseumawe City. The study used primary data collected from 100 respondents. The data were analyzed using multiple linear regression in EViews 10. The results showed that Salak Pondoh prices had a positive effect on demand, Salak Medan prices had a negative effect, and income levels had a positive effect. Taken together (simultaneously), Salak Pondoh prices, Salak Medan prices, and income levels had a positive effect on the demand for Salak Pondoh in the fruit market of Lhokseumawe City, with a coefficient of determination (R²) of 0.5340 (53.40%).
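For readers unfamiliar with the method named above, the following sketch fits a multiple linear regression with three predictors by ordinary least squares and computes R² as 1 − SS_res / SS_tot. It is a generic illustration and not the study's analysis: the study used EViews 10, and the numbers below are invented, noise-free data chosen so that the fit is exact. Real survey data would give R² below 1, such as the 0.5340 reported.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... by the normal equations; return (coefficients, R^2)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

# Invented data: columns stand for own price, substitute price, and income.
X = [(10, 8, 3), (12, 9, 4), (11, 7, 5), (14, 10, 4), (9, 8, 6), (13, 6, 5)]
y = [2.0 + 0.5 * a - 0.3 * b + 0.1 * c for a, b, c in X]

beta, r_squared = fit_ols(X, y)
```

The sign of each fitted coefficient is what statements such as "had a positive effect on demand" refer to, while R² summarizes how much of the variation in demand the three predictors explain jointly.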


Graph layout techniques and multidimensional data analysis

Institute of Mathematical Statistics Lecture Notes - Monograph Series - Game theory, optimal stopping, probability and statistics ◽ 10.1214/lnms/1215089755 ◽ 2000 ◽ pp. 219-248 ◽ Cited By ~ 9

Author(s): Jan de Leeuw ◽ George Michailidis

Keyword(s): Data Analysis ◽ Multidimensional Data ◽ Graph Layout ◽ Multidimensional Data Analysis


Dynamic and on demand data streams

EPJ Web of Conferences ◽ 10.1051/epjconf/201921404030 ◽ 2019 ◽ Vol 214 ◽ pp. 04030

Author(s): Matteo Duranti ◽ Valerio Formato ◽ Valerio Vagelli

Keyword(s): Data Analysis ◽ Data Streams ◽ Original Data ◽ End Users ◽ Experiment Data ◽ On Demand ◽ Analysis Strategy ◽ Computing Centers ◽ Definition Of ◽ High Level

Replicability and efficiency of data processing on the same data samples are major challenges for the analysis of data produced by HEP experiments. High-level data analyzed by end users are typically produced as subsets (streams) of the whole experiment data sample, selected to study data of interest. For standard applications, streams may eventually be copied from servers and analyzed at local computing centers or on user machines. Creating streams as copies of a subset of the original data stores redundant information in filesystems and may be inefficient: if the definition of a stream changes, the low-level files may have to be reprocessed, which reduces the efficiency of the data analysis. We propose an approach based on a database of lookup tables intended for dynamic and on-demand definition of data streams. This enables end users, as the data analysis strategy evolves, to explore different definitions of streams at minimal cost in computing resources. We also present a prototype application of this database for the analysis of the AMS-02 experiment data.
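The lookup-table idea can be sketched with an in-memory SQLite database: a stream is stored as a list of event identifiers pointing into the original files, so redefining it rewrites only the lookup rows and copies no event data. This is an illustrative toy, not the authors' prototype. The schema, the `energy` column, the file names, and the helper functions are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per event, pointing at its location in the original files.
    CREATE TABLE events (
        event_id  INTEGER PRIMARY KEY,
        file_name TEXT,
        entry     INTEGER,
        energy    REAL
    );
    -- Lookup table: which events belong to which stream.
    CREATE TABLE stream_members (
        stream   TEXT,
        event_id INTEGER REFERENCES events(event_id),
        PRIMARY KEY (stream, event_id)
    );
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [(1, "run1.root", 0, 12.5), (2, "run1.root", 1, 80.0), (3, "run2.root", 0, 150.0)],
)

def define_stream(conn, name, where_clause, params=()):
    """(Re)define a stream as the events matching a selection.

    where_clause is interpolated into the SQL and must be trusted input.
    """
    conn.execute("DELETE FROM stream_members WHERE stream = ?", (name,))
    conn.execute(
        f"INSERT INTO stream_members SELECT ?, event_id FROM events WHERE {where_clause}",
        (name, *params),
    )

def read_stream(conn, name):
    """Return (file_name, entry) pairs telling a client where to read each event."""
    return conn.execute(
        "SELECT e.file_name, e.entry FROM stream_members s "
        "JOIN events e USING (event_id) WHERE s.stream = ? ORDER BY e.event_id",
        (name,),
    ).fetchall()

define_stream(conn, "high_energy", "energy > ?", (50.0,))
first_definition = read_stream(conn, "high_energy")

# The analysis evolves: tighten the cut without touching the event files.
define_stream(conn, "high_energy", "energy > ?", (100.0,))
second_definition = read_stream(conn, "high_energy")
```

Changing the selection touches only the small `stream_members` table, which is the sense in which stream definitions can be explored at low computing cost.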
