Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature Semantics

Amirata Ghorbani; Dina Berenbaum; Maor Ivgi; Yuval Dafna; James Y. Zou

doi:10.3390/info13010015

Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature Semantics

Information ◽

10.3390/info13010015 ◽

2021 ◽

Vol 13 (1) ◽

pp. 15

Author(s):

Amirata Ghorbani ◽

Dina Berenbaum ◽

Maor Ivgi ◽

Yuval Dafna ◽

James Y. Zou

Keyword(s):

Research Topic ◽

Semantic Relationship ◽

Visualization Technique ◽

Tabular Data ◽

Feature Vectors ◽

Feature Importance ◽

Active Research ◽

Real World Datasets ◽

Python Package ◽

Feature Visualization

Interpretability is becoming an active research topic as machine learning (ML) models are more widely used to make critical decisions. Tabular data are one of the most commonly used modes of data in diverse applications such as healthcare and finance. Many of the existing interpretability methods for tabular data only report feature-importance scores, either locally (per example) or globally (per model), but they do not provide interpretation or visualization of how the features interact. We address this limitation by introducing Feature Vectors, a new global interpretability method designed for tabular datasets. In addition to providing feature importance, Feature Vectors discovers the inherent semantic relationship among features via an intuitive feature visualization technique. Our systematic experiments demonstrate the empirical utility of this new method by applying it to several real-world datasets. We further provide an easy-to-use Python package for Feature Vectors.

An approach for designing a developable and minimal ruled surfaces using the curvature theory

International Journal of Geometric Methods in Modern Physics ◽

10.1142/s0219887821500158 ◽

2020 ◽

Vol 18 (01) ◽

pp. 2150015

Author(s):

Fatma Güler

Keyword(s):

Computer Graphics ◽

Computer Aided Design ◽

Research Topic ◽

Ruled Surfaces ◽

Developable Surfaces ◽

Computer Aided ◽

Curvature Theory ◽

Active Research ◽

Aided Design

Developable surfaces are defined to be locally isometric to a plane. These surfaces can be formed by bending thin flat sheets of material, which makes them an active research topic in computer graphics, computer-aided design, computational origami, and manufacturing architecture. We obtain conditions for developable and minimal ruled surfaces using a rotation frame. The validity of the theorems is illustrated with examples.

Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12312 ◽

2020 ◽

Vol 69 ◽

pp. 1255-1285

Author(s):

Ricardo Cardoso Pereira ◽

Miriam Seoane Santos ◽

Pedro Pereira Rodrigues ◽

Pedro Henriques Abreu

Keyword(s):

Missing Data ◽

Missing Values ◽

State Of The Art ◽

Data Imputation ◽

Tabular Data ◽

Missing Data Imputation ◽

Learning Techniques ◽

Real World Datasets ◽

And Training ◽

Machine Learning Models

Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.

Data Visualization

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch013 ◽

2014 ◽

pp. 322-351 ◽

Cited By ~ 1

Author(s):

Ravishankar Palaniappan

Keyword(s):

Data Structure ◽

Data Visualization ◽

Real World ◽

Research Field ◽

Use Case ◽

Visualization Method ◽

Visualization Process ◽

Active Research ◽

Real World Datasets ◽

Qualitative Exploration

Data visualization has the potential to aid humanity not only in exploring and analyzing large-volume datasets but also in identifying and predicting trends and anomalies/outliers in a “simple and consumable” way. These are vital to good and timely decisions for business advantage. Data visualization is an active research field, focusing on the different techniques and tools for qualitative exploration in conjunction with quantitative analysis of data. However, increases in the volume, dimensionality, frequency, and interrelationships of data make the data visualization process notoriously difficult. This necessitates “innovative and iterative” display techniques. Either overlooking dimensions or relationships in the data structure or choosing an unfitting visualization method will quickly lead to a “junk chart” that humans cannot interpret, which in turn leads to incorrect inferences or conclusions. The purpose of this chapter is to introduce the different phases of data visualization and various techniques that help to connect and empower data to mine insights. It illustrates how data visualization helps to uncover important, meaningful, and useful insights, including trends and outliers, from real-world datasets that might otherwise go unnoticed. The use case in this chapter uses both simulated and real-world datasets to illustrate the effectiveness of data visualization.

Detection of Free-Form Copy-Move Forgery on Digital Images

Security and Communication Networks ◽

10.1155/2019/8124521 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14

Author(s):

Emre Gürbüz ◽

Guzin Ulutas ◽

Mustafa Ulutas

Keyword(s):

Success Rate ◽

Digital Images ◽

Research Topic ◽

High Rate ◽

High Success Rate ◽

Free Form ◽

Gaussian Filtering ◽

Human Eye ◽

Production And Distribution ◽

Active Research

Nowadays, the production and distribution of digital images have become part of our lives. Since digital images are important carriers of information, are considered concrete proof of facts in many fields, and can be used as evidence in courts of law, the development of techniques to ensure image authenticity is an active research topic. Copy-move forgery is one of the most common manipulation techniques applied to digital images, and various techniques have been developed to detect this kind of forgery. The JPEG format, which allows high-rate compression without causing remarkable changes in the meaning of the image, is the most commonly used format for digital images. This study covers the detection of free-form copy-move forgeries on digital images. The developed technique was observed to detect, with a high success rate, professional forgeries in which the copied region is selected in free form and which are almost impossible to detect by the human eye. It also gives successful results when the image has undergone postprocessing such as JPEG compression and Gaussian filtering, which makes forgery detection harder.

Integrated and Configurable Voice Activation and Speaker Verification System for a Robotic Exoskeleton Glove

Volume 10: 44th Mechanisms and Robotics Conference (MR) ◽

10.1115/detc2020-22365 ◽

2020 ◽

Author(s):

Yunfei Guo ◽

Wenda Xu ◽

Sarthak Pradhan ◽

Cesar Bravo ◽

Pinhas Ben-Tzvi

Keyword(s):

Deep Learning ◽

Speaker Verification ◽

Voice Recognition ◽

Research Topic ◽

Sufficient Accuracy ◽

Computing Power ◽

Robotic Exoskeleton ◽

Verification System ◽

Machine Interface ◽

Active Research

An efficient human-machine interface (HMI) for exoskeletons remains an active research topic; the methods proposed so far include computer vision, EEG (electroencephalogram), and voice recognition. However, some of these methods lack sufficient accuracy, security, and portability. This paper proposes an HMI referred to as an integrated trigger-word configurable voice activation and speaker verification system (CVASV). The CVASV system is designed for embedded systems with limited computing power and can be applied to any exoskeleton platform. It consists of two main sections: an API-based voice activation section and a deep-learning-based text-independent voice verification section. These two sections are combined into a system that allows the user to configure the activation trigger-word and that verifies the user’s command in real time.

HANDWRITING-BASED PERSONAL IDENTIFICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001406004612 ◽

2006 ◽

Vol 20 (02) ◽

pp. 209-225 ◽

Cited By ~ 11

Author(s):

ZHENYU HE ◽

XINGE YOU ◽

YUAN YAN TANG ◽

BIN FANG ◽

JIANWEI DU

Keyword(s):

Good Method ◽

Research Topic ◽

Personal Identification ◽

Identification Accuracy ◽

Writer Identification ◽

Challenging Problem ◽

Elapsed Time ◽

Gaussian Density ◽

Generalized Gaussian Density ◽

Active Research

Handwriting-based personal identification, also called handwriting-based writer identification, is an active research topic in pattern recognition. Despite continuous effort, offline handwriting-based writer identification remains a challenging problem because writing features can only be extracted from the handwriting image. As a result, much dynamic writing information, which is very valuable for writer identification, is unavailable in the offline setting. In this paper, we present a novel wavelet-based Generalized Gaussian Density (GGD) method for offline writer identification. Compared with the 2-D Gabor model, which is currently widely acknowledged as a good method for offline handwriting identification, the GGD method not only achieves better identification accuracy but also greatly reduces the elapsed computation time in our experiments.

A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions.

10.21203/rs.3.rs-45186/v3 ◽

2021 ◽

Author(s):

Alena Orlenko ◽

Jason H Moore

Keyword(s):

Random Forest ◽

Genetic Association ◽

Simulated Data ◽

Machine Learning Algorithms ◽

Theory Approach ◽

Rank Estimation ◽

Feature Importance ◽

Random Forest Models ◽

Additive Interactions ◽

Real World Datasets

Background: Non-additive interactions among genes are frequently associated with a number of phenotypes, including known complex diseases such as Alzheimer’s, diabetes, and cardiovascular disease. Detecting interactions requires careful selection of analytical methods, and some machine learning algorithms are unable or underpowered to detect or model feature interactions that exhibit non-additivity. The Random Forest method is often employed in these efforts due to its ability to detect and model non-additive interactions. In addition, Random Forest has the built-in ability to estimate feature importance scores, a characteristic that allows the model to be interpreted in terms of the order and effect size of each feature's association with the outcome. This characteristic is very important for epidemiological and clinical studies, where the results of predictive modeling could be used to define the future direction of research efforts. Alternative ways to interpret the model are the permutation feature importance metric, which uses a permutation approach to calculate a feature contribution coefficient in units of the decrease in the model’s performance, and Shapley additive explanations, which use a cooperative game theory approach. Currently, it is unclear which Random Forest feature importance metric provides a superior estimate of the true informative contribution of features in genetic association analysis.
Results: To address this issue, and to improve the interpretability of Random Forest predictions, we compared different methods for feature importance estimation in real and simulated datasets with non-additive interactions. As a result, we detected a discrepancy between the metrics for the real-world datasets and further established that the permutation feature importance metric provides more precise feature importance rank estimation for the simulated datasets with non-additive interactions.
Conclusions: By analyzing both real and simulated data, we established that the permutation feature importance metric provides more precise feature importance rank estimation in the presence of non-additive interactions.
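
The permutation feature importance idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the data, the `xor_model` stand-in (used here in place of a trained Random Forest), and all function names are hypothetical, chosen only to show how shuffling one feature column and measuring the drop in accuracy yields an importance score, including for a purely non-additive (XOR) interaction.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: the label is the XOR of features 0 and 1
# (a non-additive interaction); feature 2 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(3)] for _ in range(400)]
y = [row[0] ^ row[1] for row in X]


def xor_model(row):
    """Stand-in for a fitted model; assumed to have learned the XOR rule."""
    return row[0] ^ row[1]


scores = permutation_importance(xor_model, X, y)
print(scores)
```

In this toy setting the two interacting features receive clearly positive scores, while the noise feature scores exactly zero because the stand-in model never reads it.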

Chiral Catalysts Utilized in the Nucleophilic Addition of Dialkyl-zinc Reagents to Carbonyl Compounds

Letters in Organic Chemistry ◽

10.2174/1570178617666191220145038 ◽

2020 ◽

Vol 17 (8) ◽

pp. 571-585

Author(s):

Adnan Cetin

Keyword(s):

Carbonyl Compounds ◽

Nucleophilic Addition ◽

Bond Formation ◽

Research Topic ◽

Chiral Ligands ◽

Addition Reactions ◽

Chiral Catalyst ◽

Carbon Carbon Bond ◽

Drug Synthesis ◽

Active Research

This review gives an overview of diverse addition reactions of dialkylzinc reagents to prochiral aldehydes. One way of conducting asymmetric reactions is through the use of a chiral catalyst, so new chiral ligands have attracted considerable attention in organic chemistry. Carbon-carbon bond formation reactions are an active research topic, and the addition of dialkylzinc to prochiral aldehydes is one of the most popular of these reactions. Chiral amino alcohols are also important substrates for drug synthesis. These chiral ligands can be prepared by simple synthetic routes from easily accessible starting materials. This article reviews current catalytic reactions involving the addition of dialkylzinc to carbonyl compounds.

Mathematical morphology based on stochastic permutation orderings

Mathematical Morphology - Theory and Applications ◽

10.1515/mathm-2021-0101 ◽

2021 ◽

Vol 5 (1) ◽

pp. 43-69

Author(s):

Olivier Lézoray

Keyword(s):

Mathematical Morphology ◽

Shortest Path ◽

Multivariate Data ◽

Shortest Paths ◽

Research Topic ◽

Spectral Information ◽

Crucial Importance ◽

Active Research

The extension of mathematical morphology to multivariate data has been an active research topic in recent years. In this paper we propose an approach that relies on the consensus combination of several stochastic permutation orderings. These orderings are obtained by searching for a smooth shortest path on a graph representing an image. The path is obtained with a randomized version of the nearest-neighbor heuristic on a graph. The construction of the graph is of crucial importance and can be based on both spatial and spectral information to obtain smoother shortest paths. Because the starting vertex of a path is chosen at random, many different permutation orderings can be obtained, and we propose to build a consensus ordering from several of them. We demonstrate the value of the approach with both quantitative and qualitative results.
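
As a rough illustration of how a randomized nearest-neighbor walk can produce a permutation ordering of multivariate values, consider the sketch below. It is an assumption-laden toy, not the paper's algorithm: the complete graph over a handful of RGB-like vectors, the choice among the `k` closest unvisited vertices, and the function name `randomized_nn_ordering` are all hypothetical, and the consensus step mentioned in the abstract is not reproduced.

```python
import random


def randomized_nn_ordering(points, k=2, seed=None):
    """Visit every point once, starting from a random vertex and repeatedly
    jumping to one of the k nearest unvisited points (chosen at random).
    Returns the visiting order as a permutation of point indices."""
    rng = random.Random(seed)
    unvisited = set(range(len(points)))
    current = rng.choice(sorted(unvisited))
    unvisited.remove(current)
    order = [current]
    while unvisited:
        here = points[current]

        def sq_dist(i):
            # Squared Euclidean distance from the current point to point i.
            return sum((a - b) ** 2 for a, b in zip(here, points[i]))

        # Sort by (distance, index) so ties are broken deterministically.
        candidates = sorted(unvisited, key=lambda i: (sq_dist(i), i))[:k]
        current = rng.choice(candidates)
        unvisited.remove(current)
        order.append(current)
    return order


# Hypothetical multivariate "pixel" values (e.g. RGB triples).
pixels = [(0, 0, 0), (10, 10, 10), (255, 0, 0), (250, 5, 5),
          (0, 0, 255), (5, 5, 250), (128, 128, 128)]

# Different seeds give different stochastic orderings of the same data.
orderings = [randomized_nn_ordering(pixels, k=2, seed=s) for s in range(5)]
for ordering in orderings:
    print(ordering)
```

Each run yields a total order over the vectors, which is what a morphological operator needs in order to define minima and maxima. How several such orderings are merged into a consensus is left to the paper.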

Graph Based Learning for Hybrid Algorithm to Image Annotation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.179-180.685 ◽

2011 ◽

Vol 179-180 ◽

pp. 685-690

Author(s):

Xiao Hong Hu ◽

Xiao Lei Wang ◽

Xiu Ran Wei

Keyword(s):

Machine Learning ◽

Learning Community ◽

Hybrid Algorithm ◽

Image Annotation ◽

Research Topic ◽

Annotation Method ◽

Active Research ◽

Graph Based Learning ◽

New Framework

Graph-based learning has recently been an active research topic in the machine learning community as well as in many application areas, including image annotation. To exploit the correlation between keywords and images, we propose a novel image annotation method that uses graph-based learning and semantic fusion to estimate the probability of a keyword being the caption of an image, and we present a new framework to solve the problem. Experiments on Corel images show that this approach outperforms other methods and is effective for image annotation.
