A Taxonomy of Glyph Placement Strategies for Multidimensional Data Visualization

2002 ◽  
Vol 1 (3-4) ◽  
pp. 194-210 ◽  
Author(s):  
Matthew O Ward

Glyphs are graphical entities that convey one or more data values via attributes such as shape, size, color, and position. They have been widely used in the visualization of data and information, and are especially well suited for displaying complex, multivariate data sets. The placement or layout of glyphs on a display can communicate significant information regarding the data values themselves as well as relationships between data points, and a wide assortment of placement strategies have been developed to date. Methods range from simply using data dimensions as positional attributes to basing placement on implicit or explicit structure within the data set. This paper presents an overview of multivariate glyphs, a list of issues regarding the layout of glyphs, and a comprehensive taxonomy of placement strategies to assist the visualization designer in selecting the technique most suitable to his or her data and task. Examples, strengths, weaknesses, and design considerations are given for each category of technique. We conclude with some general guidelines for selecting a placement strategy, along with a brief description of some of our future research directions.
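
As a minimal illustration of the simplest placement strategy mentioned above, mapping two data dimensions directly to position and the remaining dimensions to glyph size and color, the following Python sketch uses matplotlib on made-up data; the column assignments are assumptions for the example, not part of the paper.

```python
# Minimal sketch: data-driven glyph placement, assuming a NumPy array
# where columns 0-1 give position and columns 2-3 drive size and color.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.random((100, 4))          # hypothetical 4-dimensional data set

x, y = data[:, 0], data[:, 1]        # dimensions used as positional attributes
size = 20 + 200 * data[:, 2]         # third dimension -> glyph size
color = data[:, 3]                   # fourth dimension -> glyph color

plt.scatter(x, y, s=size, c=color, cmap="viridis", alpha=0.7)
plt.xlabel("dimension 1")
plt.ylabel("dimension 2")
plt.title("Data-driven glyph placement (sketch)")
plt.show()
```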


2009 ◽  
pp. 2963-2977
Author(s):  
Stefan Koch

In this chapter, we propose for the first time a method to compare the efficiency of free and open source projects, based on the data envelopment analysis (DEA) methodology. DEA offers several advantages in this context, as it is a non-parametric optimization method without any need for the user to define any relations between different factors or a production function, can account for economies or diseconomies of scale, and is able to deal with multi-input, multi-output systems in which the factors have different scales. Using a data set of 43 large F/OS projects retrieved from SourceForge.net, we demonstrate the application of DEA, and show that DEA indeed is usable for comparing the efficiency of projects. We will also show additional analyses based on the results, exploring whether the inequality in work distribution within the projects, the licensing scheme, or the intended audience has an effect on their efficiency. As this is a first attempt at using this method for F/OS projects, several future research directions are possible. These include additional work on determining input and output factors, comparisons within application areas, and comparison to commercial or mixed-mode development projects.
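
For readers unfamiliar with DEA, the sketch below solves the input-oriented CCR envelopment model as a linear program with SciPy; the project inputs and outputs are invented placeholders, and the formulation is a textbook variant rather than the exact model used in the chapter.

```python
# Sketch of an input-oriented CCR DEA model solved as a linear program
# with SciPy; the project data (inputs/outputs) below are made up.
import numpy as np
from scipy.optimize import linprog

# rows = projects (DMUs); inputs might be e.g. active developers and
# project age, outputs e.g. commits and releases (all hypothetical)
X = np.array([[5.0, 2.0], [8.0, 3.0], [4.0, 1.0]])       # inputs, shape (n, m)
Y = np.array([[120.0, 3.0], [150.0, 2.0], [90.0, 4.0]])  # outputs, shape (n, s)

def ccr_efficiency(o, X, Y):
    n, m = X.shape
    s = Y.shape[1]
    # decision variables: [theta, lambda_1 ... lambda_n]; minimize theta
    c = np.concatenate(([1.0], np.zeros(n)))
    A_ub, b_ub = [], []
    for i in range(m):   # sum_j lambda_j * x_ij <= theta * x_io
        A_ub.append(np.concatenate(([-X[o, i]], X[:, i])))
        b_ub.append(0.0)
    for r in range(s):   # sum_j lambda_j * y_rj >= y_ro
        A_ub.append(np.concatenate(([0.0], -Y[:, r])))
        b_ub.append(-Y[o, r])
    bounds = [(0.0, None)] * (n + 1)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds, method="highs")
    return res.fun       # efficiency score theta* in (0, 1]

for o in range(X.shape[0]):
    print(f"project {o}: efficiency = {ccr_efficiency(o, X, Y):.3f}")
```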


2020 ◽  
Vol 19 (4) ◽  
pp. 318-338 ◽  
Author(s):  
Elio Ventocilla ◽  
Maria Riveiro

This article presents an empirical user study that compares eight multidimensional projection techniques for supporting the estimation of the number of clusters, k, embedded in six multidimensional data sets. The selection of the techniques was based on their intended design, or use, for visually encoding data structures, that is, neighborhood relations between data points or groups of data points in a data set. Concretely, we study: the difference between the estimates of k as given by participants when using different multidimensional projections; the accuracy of user estimations with respect to the number of labels in the data sets; the perceived usability of each multidimensional projection; whether user estimates disagree with k values given by a set of cluster quality measures; and whether there is a difference between experienced and novice users in terms of estimates and perceived usability. The results show that: dendrograms (from Ward’s hierarchical clustering) are likely to lead to estimates of k that are different from those given with other multidimensional projections, while Star Coordinates and Radial Visualizations are likely to lead to similar estimates; t-Stochastic Neighbor Embedding is likely to lead to estimates which are closer to the number of labels in a data set; cluster quality measures are likely to produce estimates which are different from those given by users using Ward and t-Stochastic Neighbor Embedding; U-Matrices and reachability plots will likely have a low perceived usability; and there is no statistically significant difference between the answers of experienced and novice users. Moreover, as data dimensionality increases, cluster quality measures are likely to produce estimates which are different from those perceived by users using any of the assessed multidimensional projections. It is also apparent that the inherent complexity of a data set, as well as the capability of each visual technique to disclose such complexity, has an influence on the perceived usability.
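
The kind of automatic estimate that user estimates are compared against can be illustrated with a cluster quality measure; the sketch below picks k by maximizing the silhouette score over candidate values, using scikit-learn on synthetic data (the study's actual measures and data sets are not reproduced here).

```python
# Sketch: estimating the number of clusters k with a cluster quality
# measure (silhouette score) over a range of candidate values.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic multidimensional data set with a known number of labels
X, _ = make_blobs(n_samples=500, centers=4, n_features=10, random_state=0)

scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"silhouette-based estimate of k: {best_k}")
```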


2011 ◽  
Vol 16 (1) ◽  
pp. 273-285 ◽  
Author(s):  
Gintautas Dzemyda ◽  
Virginijus Marcinkevičius ◽  
Viktor Medvedev

In this paper, we present an approach to providing data mining as a web application (a service), oriented toward multidimensional data visualization. The paper focuses on visualization methods as a tool for the visual presentation of large-scale multidimensional data sets. The proposed implementation of such a web application takes a multidimensional data set as input and produces a visualization of this data set as the result. It also supports different configuration parameters of the data mining methods used. Parallel computation has been used in the proposed implementation to run the algorithms simultaneously on different computers.
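
A minimal sketch of the service pattern described, assuming Flask as the web framework and PCA as a stand-in for the paper's visualization methods; the endpoint name and payload format are invented for illustration.

```python
# Minimal sketch: a web service that accepts a multidimensional data set
# and returns 2-D coordinates ready for visualization. Flask and PCA are
# placeholders for the paper's actual framework and methods.
from flask import Flask, request, jsonify
import numpy as np
from sklearn.decomposition import PCA

app = Flask(__name__)

@app.route("/project", methods=["POST"])
def project():
    payload = request.get_json()                        # {"data": [[...], ...], "components": 2}
    data = np.array(payload["data"], dtype=float)
    n_components = int(payload.get("components", 2))    # configurable method parameter
    coords = PCA(n_components=n_components).fit_transform(data)
    return jsonify({"coordinates": coords.tolist()})

if __name__ == "__main__":
    app.run(port=5000)
```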


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but the values of categorical data are unordered, so these methods are not applicable to a categorical data set. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support and then integrates these weights along the rows to get the support of every row. Further, the data object having the largest support is chosen as the initial center, followed by finding other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
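
A simplified reading of the selection procedure described above, not the authors' reference implementation: attribute-value supports are summed per row, the highest-support object becomes the first center, and further centers are the objects farthest (in Hamming distance) from the centers already chosen.

```python
# Sketch of support-based initial center selection for categorical data.
import numpy as np
from collections import Counter

def select_initial_centers(data, k):
    n, m = data.shape
    # support (frequency) of each value within its attribute (column)
    col_counts = [Counter(data[:, j]) for j in range(m)]
    # row support = sum of the supports of the row's attribute values
    row_support = np.array([sum(col_counts[j][data[i, j]] for j in range(m))
                            for i in range(n)])
    centers = [int(np.argmax(row_support))]      # first center: largest support
    while len(centers) < k:
        # Hamming distance from each object to its nearest chosen center
        dist = np.array([min(np.sum(data[i] != data[c]) for c in centers)
                         for i in range(n)])
        centers.append(int(np.argmax(dist)))     # farthest object becomes next center
    return centers

cats = np.array([["a", "x"], ["a", "y"], ["b", "x"], ["c", "z"]])
print(select_initial_centers(cats, k=2))
```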


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second best performance with a support vector machine.
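
The landmark idea can be sketched as follows: fit an embedding on a small landmark subset and map the remaining points onto it. Random sampling and Isomap stand in here for the paper's local curvature variation criterion and its manifold skeleton construction, and the data are synthetic rather than AVIRIS imagery.

```python
# Sketch of landmark-based manifold learning: embed a small landmark
# subset, then map all points through the fitted model.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=3000, random_state=0)

rng = np.random.default_rng(0)
landmark_idx = rng.choice(len(X), size=300, replace=False)   # landmark subset

iso = Isomap(n_neighbors=10, n_components=2).fit(X[landmark_idx])
embedding = iso.transform(X)      # out-of-sample mapping of all points
print(embedding.shape)            # (3000, 2)
```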


2019 ◽  
Vol 32 (2) ◽  
pp. 28-51 ◽  
Author(s):  
Nan Wang ◽  
Evangelos Katsamakas

The best companies compete with people analytics. They maximize the business value of their people to gain competitive advantage. This article proposes a network data science approach to people analytics. Using data from a software development organization, the article models developer contributions to project repositories as a bipartite weighted graph. This graph is projected into a weighted one-mode developer network to model collaboration. Techniques applied include centrality metrics, power-law estimation, community detection, and complex network dynamics. Among other results, the authors validate the existence of power-law relationships on project sizes (number of developers). As a methodological contribution, the article demonstrates how network data science can be used to derive a broad spectrum of insights about employee effort and collaboration in organizations. The authors discuss implications for managers and future research directions.
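
The bipartite-projection step can be sketched with networkx as below; the developers, repositories, and contribution weights are made up, and greedy modularity maximization stands in for whichever community detection method the article applies.

```python
# Sketch: developer-repository contributions as a weighted bipartite
# graph, projected onto a one-mode developer collaboration network.
import networkx as nx
from networkx.algorithms import bipartite
from networkx.algorithms.community import greedy_modularity_communities

B = nx.Graph()
developers = ["dev_a", "dev_b", "dev_c"]
repos = ["repo_1", "repo_2"]
B.add_nodes_from(developers, bipartite=0)
B.add_nodes_from(repos, bipartite=1)
# edge weight = contributions of a developer to a repository (hypothetical)
B.add_weighted_edges_from([("dev_a", "repo_1", 12), ("dev_b", "repo_1", 5),
                           ("dev_b", "repo_2", 7), ("dev_c", "repo_2", 3)])

# project onto developers: an edge means two developers share a repository
collab = bipartite.weighted_projected_graph(B, developers)

print(nx.degree_centrality(collab))
print([sorted(c) for c in greedy_modularity_communities(collab)])
```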


Fractals ◽  
2001 ◽  
Vol 09 (01) ◽  
pp. 105-128 ◽  
Author(s):  
TAYFUN BABADAGLI ◽  
KAYHAN DEVELI

This paper presents an evaluation of the methods applied to calculate the fractal dimension of fracture surfaces. Variogram (applicable to 1D self-affine sets) and power spectral density analyses (applicable to 2D self-affine sets) are selected to calculate the fractal dimension of synthetic 2D data sets generated using fractional Brownian motion (fBm). Then, the calculated values are compared with the actual fractal dimensions assigned in the generation of the synthetic surfaces. The main factor considered is the size of the 2D data set (number of data points). The critical sample size that yields the best agreement between the calculated and actual values is defined for each method. Limitations and the proper use of each method are clarified after an extensive analysis. The two methods are also applied to synthetically and naturally developed fracture surfaces of different types of rocks. The methods yield inconsistent fractal dimensions for natural fracture surfaces, and the reasons for this are discussed. The anisotropic character of the fractal dimension, which may allow the fracturing mechanism to be correlated with the multifractality of the fracture surfaces, is also addressed.
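
For the 1D case, the variogram method reduces to fitting the scaling law gamma(h) ~ h^(2H) and taking D = 2 - H; the sketch below applies it to a plain Brownian-motion trace (H = 0.5, so D = 1.5) as a stand-in for the paper's fBm surfaces.

```python
# Sketch of the 1-D variogram method for a self-affine profile:
# gamma(h) scales as h^(2H), and the fractal dimension is D = 2 - H.
import numpy as np

rng = np.random.default_rng(0)
profile = np.cumsum(rng.standard_normal(10_000))   # Brownian trace, H = 0.5

lags = np.arange(1, 100)
gamma = np.array([0.5 * np.mean((profile[h:] - profile[:-h]) ** 2) for h in lags])

slope, _ = np.polyfit(np.log(lags), np.log(gamma), 1)   # slope is roughly 2H
H = slope / 2
print(f"estimated H = {H:.2f}, fractal dimension D = {2 - H:.2f}")
```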


Author(s):  
Yating Zhao ◽  
Jingjing Guo ◽  
Chao Bao ◽  
Changyong Liang ◽  
Hemant K Jain

In order to explore the development status, knowledge base, research hotspots, and future research directions related to the impacts of climate change on human health, a systematic bibliometric analysis of 6719 articles published from 2003 to 2018 and indexed in the Web of Science was performed. Using data analytics tools such as HistCite and CiteSpace, the time distribution, spatial distribution, citations, and research hotspots were analyzed and visualized. The analysis revealed the development status of research on the impacts of climate change on human health and identified the research hotspots and future development trends, providing important knowledge support for researchers in this field.


2013 ◽  
Vol 1 (1) ◽  
pp. 7 ◽  
Author(s):  
Casimiro S. Munita ◽  
Lúcia P. Barroso ◽  
Paulo M.S. Oliveira

Several analytical techniques are often used in archaeometric studies, and when used in combination, they can assess 30 or more elements. Multivariate statistical methods are frequently used to interpret archaeometric data, but their application can be problematic or difficult to interpret due to the large number of variables. In general, the analyst first measures several variables, many of which may turn out to be uninformative; this is naturally very time consuming and expensive. In subsequent studies the analyst may wish to measure fewer variables while attempting to minimize the loss of essential information. Such multidimensional data sets must be closely examined to draw useful information. This paper aims to describe and illustrate a stopping rule for the identification of redundant variables and the selection of variable subsets that preserve the multivariate data structure, using Procrustes analysis to select those variables that are in some sense adequate for discrimination purposes. We provide an illustrative example of the procedure using a data set of 40 archaeological ceramic samples in which the concentrations of As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, and U were determined via instrumental neutron activation analysis (INAA). The results showed that, for this data set, only eight variables (As, Cr, Fe, Hf, La, Nd, Sm, and Th) are required to interpret the data without substantial loss of information.
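
The idea behind such a stopping rule can be sketched by comparing the low-dimensional configuration of a candidate variable subset with that of the full data set via Procrustes analysis, where a small disparity indicates little loss of structure; the data and subset below are synthetic, not the INAA measurements.

```python
# Sketch: assess how well a variable subset preserves multivariate
# structure by Procrustes-matching its PCA configuration to the full one.
import numpy as np
from scipy.spatial import procrustes
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
full = rng.standard_normal((40, 13))        # 40 samples x 13 elements (synthetic)
subset_cols = [0, 2, 5, 6, 7, 9, 11, 12]    # hypothetical 8-variable subset

config_full = PCA(n_components=2).fit_transform(full)
config_sub = PCA(n_components=2).fit_transform(full[:, subset_cols])

_, _, disparity = procrustes(config_full, config_sub)
print(f"Procrustes disparity (M^2): {disparity:.3f}")   # small = little structure lost
```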

