Random Forest Similarity Maps: A Scalable Visual Representation for Global and Local Interpretation

Electronics
2021
Vol 10 (22)
pp. 2862
Author(s):
Dipankar Mazumdar
Mário Popolin Neto
Fernando V. Paulovich

Machine Learning prediction algorithms have made significant contributions in today's world, leading to increased usage across various domains. However, as the adoption of ML algorithms surges, the need for transparent and interpretable models becomes essential. Visual representations have been shown to be instrumental in addressing this issue, allowing users to grasp a model's inner workings. Despite their popularity, visualization techniques still present visual scalability limitations, mainly when applied to analyze popular and complex models such as Random Forests (RF). In this work, we propose Random Forest Similarity Map (RFMap), a scalable interactive visual analytics tool designed to analyze RF ensemble models. RFMap focuses on explaining the inner working mechanism of models through different views describing individual data instance predictions, providing an overview of the entire forest of trees, and highlighting instance input feature values. The interactive nature of RFMap allows users to visually interpret model errors and decisions, establishing the necessary confidence and user trust in RF models and improving performance.
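
A forest-level similarity map of the kind described can be roughly sketched as follows: two instances are considered similar when many trees route them to the same leaf, and the resulting similarity matrix is projected to 2D. The sketch below, using scikit-learn and a placeholder dataset, is one common formulation of random-forest proximity rather than the authors' RFMap implementation:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

X, y = load_iris(return_X_y=True)  # placeholder dataset
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Leaf index of every instance in every tree: shape (n_samples, n_trees).
leaves = rf.apply(X)

# Similarity = fraction of trees in which two instances land in the same leaf.
similarity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Project the dissimilarity matrix to 2D; the coordinates can be plotted
# as a similarity map of the whole forest.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(1.0 - similarity)
print(coords[:5])
```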

2019
Vol 19 (1)
pp. 3-23
Author(s):
Aurea Soriano-Vargas
Bernd Hamann
Maria Cristina F de Oliveira

We present an integrated interactive framework for the visual analysis of time-varying multivariate data sets. As part of our research, we performed in-depth studies concerning the applicability of visualization techniques to obtain valuable insights. We consolidated the considered analysis and visualization methods in one framework, called TV-MV Analytics. TV-MV Analytics effectively combines visualization and data mining algorithms, providing the following capabilities: (1) visual exploration of multivariate data at different temporal scales, and (2) a hierarchical small multiples visualization combined with interactive clustering and multidimensional projection to detect temporal relationships in the data. We demonstrate the value of our framework for specific scenarios by studying three use cases that were validated and discussed with domain experts.
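
Capability (2), clustering combined with multidimensional projection over temporal windows, can be approximated in a few lines. The sketch below is a toy illustration under assumed windowed mean features; it is not the TV-MV Analytics code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy time-varying multivariate data: 30 entities, 200 time steps, 4 variables.
data = rng.normal(size=(30, 200, 4)).cumsum(axis=1)

# Summarize each entity per temporal window (mean of each variable),
# yielding one feature vector per entity at a chosen temporal scale.
window = 50
features = data.reshape(30, -1, window, 4).mean(axis=2).reshape(30, -1)

# Cluster entities with similar temporal behaviour ...
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
# ... and project them to 2D; each cluster could drive one small multiple.
coords = PCA(n_components=2).fit_transform(features)
print(labels, coords.shape)
```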


Themes and examples examined in this chapter concern the fast-growing field of visualization. First, the basic terms data, information, knowledge, dimensions, and variables are discussed before visualization issues are addressed. The next part of the text reviews some of the basics of visualization techniques: data, information, and knowledge visualization, and describes tools and techniques used in visualization such as data mining, clustering and biclustering, concept mapping, knowledge maps, network visualization, Web-search result visualization, open source intelligence, visualization of the Semantic Web, visual analytics, and tag cloud visualization. This is followed by some remarks on music visualization. The next part of the chapter covers the meaning and the role of visualization in various kinds of presentations. The discussion relates to concept visualization in visual learning, visualization in education, collaborative visualization, professions that employ visualization skills, and well-known examples of visualization that have advanced science. Comments on cultural heritage knowledge visualization conclude the chapter.


Author(s):  
Sumit Arun Hirve
Pradeep Reddy C. H.

Traditional data visualization techniques suffer from several challenges and lack the ability to handle huge amounts of data, particularly at the scale of gigabytes and terabytes. In this research, we propose an R-tool and data analytics framework for handling large volumes of stored commercial market data and discovering knowledge patterns in the dataset to convey the derived conclusions. In this chapter, we elaborate on pre-processing a commercial market dataset using the R tool and its packages for information and visual analytics. We suggest a recommendation system based on the data that identifies whether a food entry inserted into the database is hygienic or non-hygienic based on quality-preserving attributes. For a precise recommendation system with strong predictive accuracy, we emphasize algorithms such as J48 and Naive Bayes and utilize whichever outclasses the other in an accuracy-based comparison. Such a system, when combined with the R language, can potentially be used for enhanced decision making.
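
The accuracy-based selection between the two classifiers can be sketched as follows. The chapter works in R with J48 (a C4.5 implementation); as an illustrative stand-in, this Python sketch compares a CART decision tree with Gaussian Naive Bayes on a placeholder dataset and keeps whichever scores higher:

```python
from sklearn.datasets import load_wine  # placeholder for the market dataset
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier  # CART, standing in for J48/C4.5
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {"decision_tree": DecisionTreeClassifier(random_state=0),
              "naive_bayes": GaussianNB()}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in candidates.items()}

# Deploy whichever classifier outclasses the other on held-out accuracy.
best = max(scores, key=scores.get)
print(scores, "->", best)
```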


2016
pp. 620-642
Author(s):
Erdem Kaya
Mustafa Tolga Eren
Candemir Doger
Selim Saffet Balcisoy

Conventional visualization techniques and tools may need to be modified and tailored for analysis purposes when the data is spatio-temporal. However, designing such analysis tools entirely around well-known techniques, with their well-known limitations, can introduce a number of pitfalls, partly due to the multidimensionality of spatio-temporal data. In this chapter, an experimental study is presented that empirically tests whether the widely accepted advantages and limitations of 2D and 3D representations remain valid for spatio-temporal data visualization. The authors implemented two simple representations, namely a density map and a density cube, and conducted a laboratory experiment to compare these techniques in terms of task completion time and correctness. The results of the experiment revealed that the validity of the generally accepted properties of 2D and 3D visualization needs to be reconsidered when designing analytical tools for spatio-temporal data.
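
The two representations can be approximated with simple histograms: a density map aggregates events over space only, while a density cube keeps time as a third axis. A rough sketch on synthetic events (the binning choices are illustrative assumptions, not the authors' settings):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy spatio-temporal events: (x, y, t) triples.
x, y, t = rng.random((3, 10_000))

# Density map (2D): event counts per spatial cell, aggregated over all time.
density_map, _, _ = np.histogram2d(x, y, bins=32)

# Density cube (3D): event counts per (x, y, t) cell, preserving time.
density_cube, _ = np.histogramdd(np.column_stack([x, y, t]), bins=(32, 32, 16))

# The map is exactly the cube collapsed along its time axis.
assert np.allclose(density_map, density_cube.sum(axis=2))
print(density_map.shape, density_cube.shape)
```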


Information
2021
Vol 13 (1)
pp. 7
Author(s):
Milena Vuckovic
Johanna Schmidt
Thomas Ortner
Daniel Cornel

The application potential of Visual Analytics (VA), with its supporting interactive 2D and 3D visualization techniques, is unparalleled in the environmental domain. Such advanced systems may enable in-depth interactive exploration of multifaceted geospatial and temporal changes in very large and complex datasets. This is facilitated by a unique synergy of modules for simulation, analysis, and visualization, offering instantaneous visual feedback on transformative changes in the underlying data. However, even though the resulting knowledge holds great potential for supporting decision-making in the environmental domain, such techniques have yet to find their way into daily practice. To advance these developments, we demonstrate four case studies that portray different opportunities for data visualization and VA in the context of climate research and natural disaster management. First, we focus on 2D data visualization and explorative analysis for climate change detection and urban microclimate development through comprehensive time series analysis. Second, we focus on the combination of 2D and 3D representations and investigations for flood and storm water management through comprehensive flood and heavy rain simulations. These examples are by no means exhaustive, but serve to demonstrate how a VA framework may apply to practical research.
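
The time-series analysis in the first case study can be sketched minimally as a trend-plus-anomaly decomposition; the data, window size, and threshold below are invented placeholders rather than the case study's actual pipeline:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy monthly temperature series with a slow warming trend plus noise.
idx = pd.date_range("1990-01", periods=360, freq="MS")
temp = pd.Series(10 + 0.003 * np.arange(360) + rng.normal(0, 1.5, 360), index=idx)

# Long-term trend via a 10-year rolling mean; anomalies are the residuals.
trend = temp.rolling(window=120, center=True, min_periods=60).mean()
anomaly = temp - trend

# Flag months deviating more than two standard deviations from the trend.
flagged = anomaly[anomaly.abs() > 2 * anomaly.std()]
print(trend.iloc[-1] - trend.iloc[0], len(flagged))
```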


2007
Vol 38 (4-5)
pp. 451-476
Author(s):
R.-S. Blasone
H. Madsen
Dan Rosbjerg

Much research effort has been spent over the last three decades on developing more effective and efficient automatic calibration procedures and on demonstrating their applicability to hydrological problems. Several problems have emerged when applying these procedures to the calibration of conceptual rainfall–runoff and groundwater (GW) models, such as computational time, the large number of calibration parameters, parameter identifiability, model response surface complexity, the handling of multiple objectives, and parameter equifinality. All of these issues are expected to be much more severe for more complex models, for which comprehensive calibration studies have not so far been conducted. The scope of this paper is to investigate the performance of a global and a local optimisation technique, respectively the Shuffled Complex Evolution algorithm and the gradient-based Gauss–Marquardt–Levenberg algorithm, in the calibration of physically based distributed models of different complexity. The models considered are a steady-state GW model, a transient GW model and a fully integrated model of the same catchment. The calibration is conducted in a multi-objective framework in which two different aspects of the model response, the simulated runoff and the groundwater elevation, are aggregated and simultaneously optimised. Different aggregated objective functions are used to give different weights to the calibration criteria. The results of the calibration procedures are compared in terms of effectiveness and efficiency and demonstrate the different performance of the methods. Moreover, a combination of the global and local techniques is investigated as an attempt to exploit the advantages of both procedures while overcoming their drawbacks.
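
The aggregated objective and the global-then-local strategy can be sketched schematically. Here SciPy's differential evolution stands in for the Shuffled Complex Evolution algorithm and L-BFGS-B for the gradient-based Gauss–Marquardt–Levenberg algorithm; the toy model, weights, and data are invented:

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
# Synthetic "observations" from a toy 2-parameter model plus noise.
true = np.array([1.5, 0.8])
obs_runoff = true[0] * np.exp(-true[1] * t) + rng.normal(0, 0.02, t.size)
obs_head = true[0] * t + rng.normal(0, 0.02, t.size)

def objective(p, w_runoff=0.5, w_head=0.5):
    """Weighted aggregate of the two calibration criteria (one RMSE each)."""
    rmse_runoff = np.sqrt(np.mean((p[0] * np.exp(-p[1] * t) - obs_runoff) ** 2))
    rmse_head = np.sqrt(np.mean((p[0] * t - obs_head) ** 2))
    return w_runoff * rmse_runoff + w_head * rmse_head

bounds = [(0.1, 5.0), (0.1, 5.0)]
p_global = differential_evolution(objective, bounds, seed=0).x  # global stage
p_local = minimize(objective, p_global, method="L-BFGS-B",      # local refinement
                   bounds=bounds).x
print(p_global, p_local)
```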


2021
Vol 13 (2)
Author(s):
Joan Jonathan
Camilius Sanga
Magesa Mwita
Georgies Mgode

The diagnosis of tuberculosis (TB) remains a global challenge, and the need for innovative diagnostic approaches is inevitable. Trained African giant pouched rats are a scent-based TB detection technology used in operational research. The adoption of this technology is beneficial to countries with a high TB burden due to its cost-effectiveness and greater speed compared to microscopy. However, rats with certain characteristics perform better, so insight into the factors that may affect performance is important for increasing rats' TB detection performance. This paper provides an understanding of the factors that influence rats' TB detection performance using a visual analytics approach. Visual analytics provides insight into data through the combination of computational predictive models and interactive visualizations. Three algorithms, namely decision tree, random forest, and naive Bayes, were used to predict the factors that influence rats' TB detection performance. Our study found that age is the most significant factor, with rats aged between 3.1 and 6 years showing the greatest potential. The algorithms were validated using the same test data to check their prediction accuracy. The accuracy check showed that the random forest outperformed the other two with an accuracy of 78.82%, although the difference in their accuracies is small. The study findings may help rat TB trainers, researchers in rat-based TB detection and information systems, and decision makers to improve detection performance. This study recommends further research that incorporates gender factors and a larger sample size.
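
The analysis pattern described, training three classifiers on the same data, checking their accuracy on a common test set, and reading off the most influential factor, can be sketched as follows; the feature names and data are invented placeholders, not the study's dataset:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Placeholder data: detection performance driven mostly by age.
X = pd.DataFrame({"age_years": rng.uniform(1, 8, 500),
                  "training_hours": rng.uniform(50, 300, 500),
                  "weight_kg": rng.uniform(0.8, 1.6, 500)})
y = (X["age_years"].between(3.1, 6) & (rng.random(500) > 0.2)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
models = {"decision_tree": DecisionTreeClassifier(random_state=0),
          "random_forest": RandomForestClassifier(random_state=0),
          "naive_bayes": GaussianNB()}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, m.predict(X_te)))

# Most significant factor according to the random forest.
importances = dict(zip(X.columns, models["random_forest"].feature_importances_))
print(max(importances, key=importances.get))
```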


2018
Vol 18 (3-4)
pp. 274-298
Author(s):  
Torsten Hothorn

Simple models are preferred over complex models, but over-simplistic models could lead to erroneous interpretations. The classical approach is to start with a simple model, whose shortcomings are assessed in residual-based model diagnostics. Eventually, one increases the complexity of this initial overly simple model and obtains a better-fitting model. I illustrate how transformation analysis can be used as an alternative approach to model choice. Instead of adding complexity to simple models, step-wise complexity reduction is used to help identify simpler and more interpretable models. As an example, body mass index (BMI) distributions in Switzerland are modelled by means of transformation models to understand the impact of sex, age, smoking and other lifestyle factors on a person's BMI. In this process, I searched for a compromise between model fit and model interpretability. Special emphasis is given to the understanding of the connections between transformation models of increasing complexity. The models used in this analysis ranged from evergreens, such as the normal linear regression model with constant variance, to novel models with extremely flexible conditional distribution functions, such as transformation trees and transformation forests.
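
The contrast between an overly simple constant-variance model and a flexible conditional-distribution model can be illustrated crudely. The sketch below uses quantile regression as a stand-in for the paper's transformation models (which are implemented in R, e.g. in the tram and trtf packages), on invented BMI-like data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Invented BMI-like data whose spread grows with age: a constant-variance
# normal linear model cannot capture the widening distribution.
age = rng.uniform(20, 70, 1000)
bmi = 22 + 0.05 * age + rng.normal(0, 0.5 + 0.05 * age)
df = pd.DataFrame({"age": age, "bmi": bmi})

# Simple model: normal linear regression with constant variance.
ols = smf.ols("bmi ~ age", data=df).fit()

# More flexible conditional distribution: separate quantile regressions.
quantiles = {q: smf.quantreg("bmi ~ age", data=df).fit(q=q)
             for q in (0.1, 0.5, 0.9)}

# Diverging upper and lower quantile slopes reveal the age-dependent
# spread that the constant-variance model misses.
print(ols.params["age"],
      {q: m.params["age"] for q, m in quantiles.items()})
```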

