Using Penalized Regression with Parallel Coordinates for Visualization of Significance in High Dimensional Data

Abstract: The Parallel Coordinates Plot (PCP) is a popular technique for exploring high-dimensional data, and researchers often use it to analyze and mine data. However, as data volumes grow, visual clutter and loss of clarity have become two of the main challenges for parallel coordinates plots. Although the Arc Coordinates Plot (ACP) is a popular approach to addressing these challenges, few optimizations and improvements have been made to it. In this paper, we make three main contributions to state-of-the-art PCP methods. The first improves the visual method itself. The other two improve perceptual scalability when the scale or the dimensionality of the data becomes large in some practical mobile and wireless applications. 1) We present an improved visualization method based on ACP, termed the double arc coordinates plot (DACP). It not only reduces the visual clutter in ACP but also uses a further-optimized dimension-based bundling method to deal with the issues of the conventional PCP. 2) To reduce the clutter caused by the order of the axes and to reveal patterns hidden in the data sets, we propose our first dimension reordering method, a contribution-based method in DACP built on the singular value decomposition (SVD) algorithm. The approach uses SVD to compute an importance score for each attribute (dimension) of the data and arranges the dimensions from left to right in DACP according to that score. 3) We also propose a similarity-based method that combines a nonlinear correlation coefficient with the SVD algorithm. To measure the correlation between two dimensions and explain how they interact, this reordering method relies on nonlinear correlation information measures. We mainly use mutual information to calculate the partial similarity of dimensions in high-dimensional data visualization, and SVD to measure the global data. Lastly, we use five case scenarios to evaluate the effectiveness of DACP; the results show that our approaches not only visualize multivariate datasets well but also effectively alleviate the visual clutter of the conventional PCP, giving users a better visual experience.
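
The similarity-based reordering idea in the abstract above can be illustrated with a rough sketch. It covers only the mutual-information part, not the SVD part. The equal-width binning, the greedy chaining rule, and the names `discretize`, `mutual_information`, and `order_axes` are all assumptions made for illustration; the abstract does not specify how DACP actually does this.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map numeric values to equal-width bin indices (assumed preprocessing)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information, in nats, between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def order_axes(columns, bins=4):
    """Greedy ordering (hypothetical rule): start from the first axis, then
    repeatedly append the unplaced axis that shares the most mutual
    information with the axis placed last."""
    binned = {name: discretize(col, bins) for name, col in columns.items()}
    names = list(columns)
    order, remaining = [names[0]], names[1:]
    while remaining:
        last = order[-1]
        best = max(remaining, key=lambda k: mutual_information(binned[last], binned[k]))
        order.append(best)
        remaining.remove(best)
    return order


columns = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [5, 1, 7, 3, 8, 2, 6, 4],
    "c": [2, 4, 6, 8, 10, 12, 14, 16],  # a scaled copy of "a"
}
print(order_axes(columns))
```

Here axis "c" is placed next to "a" because the two carry the same binned information, while the shuffled axis "b" is pushed to the end.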

Targeted Inference Involving High-Dimensional Data Using Nuisance Penalized Regression

Journal of the American Statistical Association ◽

10.1080/01621459.2020.1737079 ◽

2020 ◽

pp. 1-15

Author(s):

Qiang Sun ◽

Heping Zhang

Keyword(s):

High Dimensional Data ◽

Penalized Regression ◽

High Dimensional

A new metric on parallel coordinates and its application for high-dimensional data visualization

2015 International Conference on Advanced Technologies for Communications (ATC) ◽

10.1109/atc.2015.7388338 ◽

2015 ◽

Cited By ~ 1

Author(s):

Tran Van Long

Keyword(s):

Data Visualization ◽

High Dimensional Data ◽

High Dimensional ◽

Parallel Coordinates

An Alternating Direction Method of Multipliers for MCP-penalized Regression with High-dimensional Data

Acta Mathematica Sinica English Series ◽

10.1007/s10114-018-7096-8 ◽

2018 ◽

Vol 34 (12) ◽

pp. 1892-1906 ◽

Cited By ~ 2

Author(s):

Yue Yong Shi ◽

Yu Ling Jiao ◽

Yong Xiu Cao ◽

Yan Yan Liu

Keyword(s):

High Dimensional Data ◽

Penalized Regression ◽

Alternating Direction Method ◽

High Dimensional ◽

Method Of Multipliers ◽

Alternating Direction

Parallel Coordinates Based Visualization for High-Dimensional Data

Proceedings of the 2019 3rd International Conference on Big Data Research ◽

10.1145/3372454.3372460 ◽

2019 ◽

Author(s):

Weiyu Li ◽

Jiying Lang ◽

He Zhang ◽

Fei Yang ◽

Lei Zhang ◽

...

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Parallel Coordinates

Interactive local clustering operations for high dimensional data in parallel coordinates

2010 IEEE Pacific Visualization Symposium (PacificVis) ◽

10.1109/pacificvis.2010.5429608 ◽

2010 ◽

Cited By ~ 17

Author(s):

Peihong Guo ◽

He Xiao ◽

Zuchao Wang ◽

Xiaoru Yuan

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Parallel Coordinates ◽

Local Clustering

Penalized regression calibration: A method for the prediction of survival outcomes using complex longitudinal and high‐dimensional data

Statistics in Medicine ◽

10.1002/sim.9178 ◽

2021 ◽

Author(s):

Mirko Signorelli ◽

Pietro Spitali ◽

Cristina Al‐Khalili Szigyarto ◽

Roula Tsonaka ◽

Keyword(s):

High Dimensional Data ◽

Penalized Regression ◽

High Dimensional ◽

Survival Outcomes ◽

Regression Calibration ◽

Prediction Of Survival

Proposing Robust LAD-Atan Penalty of Regression Model Estimation for High Dimensional Data

Baghdad Science Journal ◽

10.21123/bsj.2020.17.2.0550 ◽

2020 ◽

Vol 17 (2) ◽

pp. 0550

Author(s):

Ali Hameed Yousef ◽

Omar Abdulmohsin Ali

Keyword(s):

Variable Selection ◽

Regression Model ◽

High Dimensional Data ◽

Real Data ◽

Penalized Regression ◽

Good Method ◽

Superior Performance ◽

High Dimensional ◽

Absolute Deviation ◽

Heavy Tailed

Penalized regression models have received considerable attention for variable selection and play an essential role in handling high-dimensional data. The arctangent penalty, denoted Atan, has recently been used as an efficient method for both estimation and variable selection. However, the Atan penalty is very sensitive to outliers in the response variable and to heavy-tailed error distributions, whereas least absolute deviation (LAD) is a good method for achieving robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator that combines these two ideas. Simulation experiments and real data applications show that the proposed LAD-Atan estimator has superior performance compared with other estimators.
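
To make the combination of an LAD loss with an Atan penalty concrete, here is a minimal sketch of such an objective. It assumes the Atan penalty takes the commonly cited form λ(γ + 2/π)·arctan(|β|/γ) and that the LAD loss is averaged over the observations. Neither detail is stated in the abstract, and the function names are hypothetical.

```python
import math


def atan_penalty(beta_j, lam, gamma):
    """Assumed Atan penalty for one coefficient:
    lam * (gamma + 2/pi) * arctan(|beta_j| / gamma)."""
    return lam * (gamma + 2.0 / math.pi) * math.atan(abs(beta_j) / gamma)


def lad_atan_objective(X, y, beta, lam, gamma):
    """Mean absolute residual (LAD loss) plus the summed Atan penalty.
    The 1/n scaling of the loss is an assumption of this sketch."""
    n = len(y)
    lad = sum(
        abs(yi - sum(xij * bj for xij, bj in zip(xi, beta)))
        for xi, yi in zip(X, y)
    ) / n
    penalty = sum(atan_penalty(bj, lam, gamma) for bj in beta)
    return lad + penalty


X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.0]
# A perfect fit leaves only the penalty on the single non-zero coefficient.
print(lad_atan_objective(X, y, beta=[2.0, 0.0], lam=1.0, gamma=0.5))
```

Because the loss uses absolute rather than squared residuals, a single large outlier in `y` raises the objective only linearly, which is the robustness property the abstract refers to.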

A Comparison of Multifactor Dimensionality Reduction and L1-Penalized Regression to Identify Gene-Gene Interactions in Genetic Association Studies

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1613 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 6

Author(s):

Stacey Winham ◽

Chong Wang ◽

Alison A Motsinger-Reif

Keyword(s):

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Human Genetics ◽

Association Studies ◽

High Dimensional Data ◽

Penalized Regression ◽

High Dimensional ◽

Gene Interactions ◽

Advantages And Disadvantages ◽

Main Effects

Recently, the amount of high-dimensional data has exploded, creating new analytical challenges for human genetics. Furthermore, much evidence suggests that common complex diseases may be due to complex etiologies such as gene-gene interactions, which are difficult to identify in high-dimensional data using traditional statistical approaches. Data-mining approaches are gaining popularity for variable selection in association studies, and one of the most commonly used methods to evaluate potential gene-gene interactions is Multifactor Dimensionality Reduction (MDR). Additionally, a number of penalized regression techniques, such as Lasso, are gaining popularity within the statistical community and are now being applied to association studies, including extensions for interactions. In this study, we compare the performance of MDR, the traditional lasso with L1 penalty (TL1), and the group lasso for categorical data with group-wise L1 penalty (GL1) to detect gene-gene interactions through a broad range of simulations. We find that each method has both advantages and disadvantages, and relative performance is context dependent. TL1 frequently over-fits, identifying false positive as well as true positive loci. MDR has higher power for epistatic models that exhibit independent main effects; for both Lasso methods, main effects tend to dominate. For purely epistatic models, GL1 has the best performance for lower minor allele frequencies, but MDR performs best for higher frequencies. These results provide guidance on when each approach might be best suited for detecting and characterizing interactions with different mechanisms.
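
The difference between a coefficient-wise and a group-wise penalty can be sketched with their shrinkage operators. This is the textbook soft-thresholding step for the lasso and the block soft-thresholding step for the standard group lasso. It is not necessarily the exact GL1 formulation evaluated in the study, and the function names are illustrative.

```python
import math


def soft_threshold(z, lam):
    """Lasso shrinkage: move z toward zero by lam, clipping at zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def group_soft_threshold(group, lam):
    """Standard group-lasso shrinkage: scale the whole coefficient group
    toward zero, dropping it entirely if its Euclidean norm is <= lam."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * v for v in group]


# Coefficient-wise: each entry is kept or zeroed on its own.
print([soft_threshold(z, 1.0) for z in (3.0, -0.5)])
# Group-wise: all dummy variables of one categorical factor survive or vanish together.
print(group_soft_threshold([3.0, 4.0], 1.0))
print(group_soft_threshold([0.3, 0.4], 1.0))
```

With a group-wise operator, the indicator variables that encode one categorical predictor, such as a genotype, enter or leave the model as a unit rather than one level at a time.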

Parallel Coordinates: Visualization, Exploration and Classification of High-Dimensional Data

Springer Handbooks Comp.Statistics - Handbook of Data Visualization ◽

10.1007/978-3-540-33037-0_25 ◽

2007 ◽

pp. 643-680 ◽

Cited By ~ 6

Author(s):

Alfred Inselberg

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Parallel Coordinates
