Random Forest Similarity Maps: A Scalable Visual Representation for Global and Local Interpretation

Electronics
2021
Vol 10 (22)
pp. 2862
Author(s):
Dipankar Mazumdar
Mário Popolin Neto
Fernando V. Paulovich

Machine Learning prediction algorithms have made significant contributions in today's world, leading to increased usage across various domains. However, as the adoption of ML algorithms surges, the need for transparent and interpretable models becomes essential. Visual representations have been shown to be instrumental in addressing this issue, allowing users to grasp a model's inner workings. Despite their popularity, visualization techniques still present visual scalability limitations, mainly when applied to analyze popular and complex models such as Random Forests (RF). In this work, we propose Random Forest Similarity Map (RFMap), a scalable interactive visual analytics tool designed to analyze RF ensemble models. RFMap focuses on explaining the inner working mechanism of models through different views describing individual data instance predictions, providing an overview of the entire forest of trees, and highlighting instance input feature values. The interactive nature of RFMap allows users to visually interpret model errors and decisions, establishing the necessary confidence and user trust in RF models and improving performance.
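
A forest-level similarity map of the kind described can be roughly sketched as follows: two instances are considered similar when many trees route them to the same leaf, and the resulting similarity matrix is projected to 2D. The sketch below, using scikit-learn and a placeholder dataset, is one common formulation of random-forest proximity rather than the authors' RFMap implementation:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

X, y = load_iris(return_X_y=True)  # placeholder dataset
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Leaf index of every instance in every tree: shape (n_samples, n_trees).
leaves = rf.apply(X)

# Similarity = fraction of trees in which two instances land in the same leaf.
similarity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Project the dissimilarity matrix to 2D; the coordinates can be plotted
# as a similarity map of the whole forest.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(1.0 - similarity)
print(coords[:5])
```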

2019
Vol 19 (1)
pp. 3-23
Author(s):
Aurea Soriano-Vargas
Bernd Hamann
Maria Cristina F de Oliveira

We present an integrated interactive framework for the visual analysis of time-varying multivariate data sets. As part of our research, we performed in-depth studies concerning the applicability of visualization techniques to obtain valuable insights. We consolidated the considered analysis and visualization methods in one framework, called TV-MV Analytics. TV-MV Analytics effectively combines visualization and data mining algorithms, providing the following capabilities: (1) visual exploration of multivariate data at different temporal scales, and (2) a hierarchical small multiples visualization combined with interactive clustering and multidimensional projection to detect temporal relationships in the data. We demonstrate the value of our framework for specific scenarios by studying three use cases that were validated and discussed with domain experts.
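
Capability (2), clustering combined with multidimensional projection over temporal windows, can be approximated in a few lines. The sketch below is a toy illustration under assumed windowed mean features; it is not the TV-MV Analytics code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy time-varying multivariate data: 30 entities, 200 time steps, 4 variables.
data = rng.normal(size=(30, 200, 4)).cumsum(axis=1)

# Summarize each entity per temporal window (mean of each variable),
# yielding one feature vector per entity at a chosen temporal scale.
window = 50
features = data.reshape(30, -1, window, 4).mean(axis=2).reshape(30, -1)

# Cluster entities with similar temporal behaviour ...
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
# ... and project them to 2D; each cluster could drive one small multiple.
coords = PCA(n_components=2).fit_transform(features)
print(labels, coords.shape)
```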


Themes and examples examined in this chapter concern the fast-growing field of visualization. First, the basic terms data, information, knowledge, dimensions, and variables are discussed before visualization issues are addressed. The next part of the text reviews some of the basics of visualization techniques: data, information, and knowledge visualization, and describes tools and techniques used in visualization such as data mining, clustering and biclustering, concept mapping, knowledge maps, network visualization, Web-search result visualization, open source intelligence, visualization of the Semantic Web, visual analytics, and tag cloud visualization. This is followed by some remarks on music visualization. The next part of the chapter covers the meaning and the role of visualization in various kinds of presentations. The discussion relates to concept visualization in visual learning, visualization in education, collaborative visualization, professions that employ visualization skills, and well-known examples of visualization that have advanced science. Comments on cultural heritage knowledge visualization conclude the chapter.


Author(s):  
Sumit Arun Hirve
Pradeep Reddy C. H.

Traditional data visualization techniques suffer from several challenges and lack the ability to handle huge amounts of data, particularly at the scale of gigabytes and terabytes. In this research, we propose an R-tool and data analytics framework for handling large volumes of stored commercial market data and discovering knowledge patterns in the dataset to convey the derived conclusions. In this chapter, we elaborate on pre-processing a commercial market dataset using the R tool and its packages for information and visual analytics. We suggest a recommendation system based on the data that identifies whether a food entry inserted into the database is hygienic or non-hygienic based on quality-preserving attributes. For a precise recommendation system with strong predictive accuracy, we emphasize algorithms such as J48 and Naive Bayes and utilize whichever outclasses the other in an accuracy-based comparison. Such a system, when combined with the R language, can potentially be used for enhanced decision making.
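
The accuracy-based selection between the two classifiers can be sketched as follows. The chapter works in R with J48 (a C4.5 implementation); as an illustrative stand-in, this Python sketch compares a CART decision tree with Gaussian Naive Bayes on a placeholder dataset and keeps whichever scores higher:

```python
from sklearn.datasets import load_wine  # placeholder for the market dataset
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier  # CART, standing in for J48/C4.5
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {"decision_tree": DecisionTreeClassifier(random_state=0),
              "naive_bayes": GaussianNB()}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in candidates.items()}

# Deploy whichever classifier outclasses the other on held-out accuracy.
best = max(scores, key=scores.get)
print(scores, "->", best)
```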


2016
pp. 620-642
Author(s):
Erdem Kaya
Mustafa Tolga Eren
Candemir Doger
Selim Saffet Balcisoy

Conventional visualization techniques and tools may need to be modified and tailored for analysis purposes when the data is spatio-temporal. However, designing such analysis tools entirely around well-known techniques, with their well-known limitations, can introduce a number of pitfalls, partly due to the multidimensionality of spatio-temporal data. In this chapter, an experimental study is presented that empirically tests whether the widely accepted advantages and limitations of 2D and 3D representations remain valid for spatio-temporal data visualization. The authors implemented two simple representations, namely a density map and a density cube, and conducted a laboratory experiment to compare these techniques in terms of task completion time and correctness. The results of the experiment revealed that the validity of the generally accepted properties of 2D and 3D visualization needs to be reconsidered when designing analytical tools for spatio-temporal data.
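
The two representations can be approximated with simple histograms: a density map aggregates events over space only, while a density cube keeps time as a third axis. A rough sketch on synthetic events (the binning choices are illustrative assumptions, not the authors' settings):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy spatio-temporal events: (x, y, t) triples.
x, y, t = rng.random((3, 10_000))

# Density map (2D): event counts per spatial cell, aggregated over all time.
density_map, _, _ = np.histogram2d(x, y, bins=32)

# Density cube (3D): event counts per (x, y, t) cell, preserving time.
density_cube, _ = np.histogramdd(np.column_stack([x, y, t]), bins=(32, 32, 16))

# The map is exactly the cube collapsed along its time axis.
assert np.allclose(density_map, density_cube.sum(axis=2))
print(density_map.shape, density_cube.shape)
```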


Information
2021
Vol 13 (1)
pp. 7
Author(s):
Milena Vuckovic
Johanna Schmidt
Thomas Ortner
Daniel Cornel

The application potential of Visual Analytics (VA), with its supporting interactive 2D and 3D visualization techniques, is unparalleled in the environmental domain. Such advanced systems may enable in-depth interactive exploration of multifaceted geospatial and temporal changes in very large and complex datasets. This is facilitated by a unique synergy of modules for simulation, analysis, and visualization, offering instantaneous visual feedback on transformative changes in the underlying data. However, even though the resulting knowledge holds great potential for supporting decision-making in the environmental domain, such techniques have yet to find their way into daily practice. To advance these developments, we demonstrate four case studies that portray different opportunities for data visualization and VA in the context of climate research and natural disaster management. First, we focus on 2D data visualization and explorative analysis for climate change detection and urban microclimate development through comprehensive time series analysis. Second, we focus on the combination of 2D and 3D representations and investigations for flood and storm water management through comprehensive flood and heavy rain simulations. These examples are by no means exhaustive, but serve to demonstrate how a VA framework may apply to practical research.
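
The time-series analysis in the first case study can be sketched minimally as a trend-plus-anomaly decomposition; the data, window size, and threshold below are invented placeholders rather than the case study's actual pipeline:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy monthly temperature series with a slow warming trend plus noise.
idx = pd.date_range("1990-01", periods=360, freq="MS")
temp = pd.Series(10 + 0.003 * np.arange(360) + rng.normal(0, 1.5, 360), index=idx)

# Long-term trend via a 10-year rolling mean; anomalies are the residuals.
trend = temp.rolling(window=120, center=True, min_periods=60).mean()
anomaly = temp - trend

# Flag months deviating more than two standard deviations from the trend.
flagged = anomaly[anomaly.abs() > 2 * anomaly.std()]
print(trend.iloc[-1] - trend.iloc[0], len(flagged))
```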


2007
Vol 38 (4-5)
pp. 451-476
Author(s):
R.-S. Blasone
H. Madsen
Dan Rosbjerg

Much research effort has been spent over the last three decades on developing more effective and efficient automatic calibration procedures and on demonstrating their applicability to hydrological problems. Several problems have emerged when applying these procedures to the calibration of conceptual rainfall–runoff and groundwater (GW) models, such as computational time, the large number of calibration parameters, parameter identifiability, model response surface complexity, the handling of multiple objectives, and parameter equifinality. All of these issues are expected to be much more severe for more complex models, for which comprehensive calibration studies have not so far been conducted. The scope of this paper is to investigate the performance of a global and a local optimisation technique, respectively the Shuffled Complex Evolution algorithm and the gradient-based Gauss–Marquardt–Levenberg algorithm, in the calibration of physically based distributed models of different complexity. The models considered are a steady-state GW model, a transient GW model and a fully integrated model of the same catchment. The calibration is conducted in a multi-objective framework in which two different aspects of the model response, the simulated runoff and the groundwater elevation, are aggregated and simultaneously optimised. Different aggregated objective functions are used to give different weights to the calibration criteria. The results of the calibration procedures are compared in terms of effectiveness and efficiency and demonstrate the different performance of the methods. Moreover, a combination of the global and local techniques is investigated as an attempt to exploit the advantages of both procedures while overcoming their drawbacks.
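
The aggregated objective and the global-then-local strategy can be sketched schematically. Here SciPy's differential evolution stands in for the Shuffled Complex Evolution algorithm and L-BFGS-B for the gradient-based Gauss–Marquardt–Levenberg algorithm; the toy model, weights, and data are invented:

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
# Synthetic "observations" from a toy 2-parameter model plus noise.
true = np.array([1.5, 0.8])
obs_runoff = true[0] * np.exp(-true[1] * t) + rng.normal(0, 0.02, t.size)
obs_head = true[0] * t + rng.normal(0, 0.02, t.size)

def objective(p, w_runoff=0.5, w_head=0.5):
    """Weighted aggregate of the two calibration criteria (one RMSE each)."""
    rmse_runoff = np.sqrt(np.mean((p[0] * np.exp(-p[1] * t) - obs_runoff) ** 2))
    rmse_head = np.sqrt(np.mean((p[0] * t - obs_head) ** 2))
    return w_runoff * rmse_runoff + w_head * rmse_head

bounds = [(0.1, 5.0), (0.1, 5.0)]
p_global = differential_evolution(objective, bounds, seed=0).x  # global stage
p_local = minimize(objective, p_global, method="L-BFGS-B",      # local refinement
                   bounds=bounds).x
print(p_global, p_local)
```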


2021
Vol 13 (2)
Author(s):
Joan Jonathan
Camilius Sanga
Magesa Mwita
Georgies Mgode

The diagnosis of tuberculosis (TB) remains a global challenge, and the need for innovative diagnostic approaches is inevitable. Trained African giant pouched rats are a scent-based TB detection technology used in operational research. The adoption of this technology is beneficial to countries with a high TB burden due to its cost-effectiveness and greater speed compared to microscopy. However, rats with certain characteristics perform better, so insight into the factors that may affect performance is important for increasing rats' TB detection performance. This paper provides an understanding of the factors that influence rats' TB detection performance using a visual analytics approach. Visual analytics provides insight into data through the combination of computational predictive models and interactive visualizations. Three algorithms, namely decision tree, random forest, and naive Bayes, were used to predict the factors that influence rats' TB detection performance. Our study found that age is the most significant factor, with rats aged between 3.1 and 6 years showing the greatest potential. The algorithms were validated using the same test data to check their prediction accuracy. The accuracy check showed that the random forest outperformed the other two with an accuracy of 78.82%, although the difference in their accuracies is small. The study findings may help rat TB trainers, researchers in rat-based TB detection and information systems, and decision makers to improve detection performance. This study recommends further research that incorporates gender factors and a larger sample size.
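
The analysis pattern described, training three classifiers on the same data, checking their accuracy on a common test set, and reading off the most influential factor, can be sketched as follows; the feature names and data are invented placeholders, not the study's dataset:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Placeholder data: detection performance driven mostly by age.
X = pd.DataFrame({"age_years": rng.uniform(1, 8, 500),
                  "training_hours": rng.uniform(50, 300, 500),
                  "weight_kg": rng.uniform(0.8, 1.6, 500)})
y = (X["age_years"].between(3.1, 6) & (rng.random(500) > 0.2)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
models = {"decision_tree": DecisionTreeClassifier(random_state=0),
          "random_forest": RandomForestClassifier(random_state=0),
          "naive_bayes": GaussianNB()}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, m.predict(X_te)))

# Most significant factor according to the random forest.
importances = dict(zip(X.columns, models["random_forest"].feature_importances_))
print(max(importances, key=importances.get))
```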


2018
Vol 18 (3-4)
pp. 274-298
Author(s):  
Torsten Hothorn

Simple models are preferred over complex models, but over-simplistic models could lead to erroneous interpretations. The classical approach is to start with a simple model, whose shortcomings are assessed in residual-based model diagnostics. Eventually, one increases the complexity of this initial overly simple model and obtains a better-fitting model. I illustrate how transformation analysis can be used as an alternative approach to model choice. Instead of adding complexity to simple models, step-wise complexity reduction is used to help identify simpler and more interpretable models. As an example, body mass index (BMI) distributions in Switzerland are modelled by means of transformation models to understand the impact of sex, age, smoking and other lifestyle factors on a person's BMI. In this process, I searched for a compromise between model fit and model interpretability. Special emphasis is given to the understanding of the connections between transformation models of increasing complexity. The models used in this analysis ranged from evergreens, such as the normal linear regression model with constant variance, to novel models with extremely flexible conditional distribution functions, such as transformation trees and transformation forests.
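
The contrast between an overly simple constant-variance model and a flexible conditional-distribution model can be illustrated crudely. The sketch below uses quantile regression as a stand-in for the paper's transformation models (which are implemented in R, e.g. in the tram and trtf packages), on invented BMI-like data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Invented BMI-like data whose spread grows with age: a constant-variance
# normal linear model cannot capture the widening distribution.
age = rng.uniform(20, 70, 1000)
bmi = 22 + 0.05 * age + rng.normal(0, 0.5 + 0.05 * age)
df = pd.DataFrame({"age": age, "bmi": bmi})

# Simple model: normal linear regression with constant variance.
ols = smf.ols("bmi ~ age", data=df).fit()

# More flexible conditional distribution: separate quantile regressions.
quantiles = {q: smf.quantreg("bmi ~ age", data=df).fit(q=q)
             for q in (0.1, 0.5, 0.9)}

# Diverging upper and lower quantile slopes reveal the age-dependent
# spread that the constant-variance model misses.
print(ols.params["age"],
      {q: m.params["age"] for q, m in quantiles.items()})
```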

