Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature Semantics

Information ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 15
Author(s):  
Amirata Ghorbani ◽  
Dina Berenbaum ◽  
Maor Ivgi ◽  
Yuval Dafna ◽  
James Y. Zou

Interpretability is becoming an active research topic as machine learning (ML) models are more widely used to make critical decisions. Tabular data are one of the most commonly used modes of data in diverse applications such as healthcare and finance. Many of the existing interpretability methods used for tabular data only report feature-importance scores—either locally (per example) or globally (per model)—but they do not provide interpretation or visualization of how the features interact. We address this limitation by introducing Feature Vectors, a new global interpretability method designed for tabular datasets. In addition to providing feature importance, Feature Vectors discovers the inherent semantic relationship among features via an intuitive feature visualization technique. Our systematic experiments demonstrate the empirical utility of this new method by applying it to several real-world datasets. We further provide an easy-to-use Python package for Feature Vectors.

2020 ◽  
Vol 18 (01) ◽  
pp. 2150015
Author(s):  
Fatma Güler

Developable surfaces are defined to be locally isometric to a plane. These surfaces can be formed by bending thin flat sheets of material, which makes them an active research topic in computer graphics, computer aided design, computational origami and manufacturing architecture. We obtain conditions for developable and minimal ruled surfaces using a rotation frame. The validity of the theorems is also illustrated with examples.
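For orientation, the classical textbook condition for a ruled surface to be developable can be written as below. This is the standard formulation in terms of a base curve and a unit director field, not necessarily the rotation-frame form derived in the paper; the symbols X, α, e, s and v are chosen here for illustration and do not come from the abstract.

```latex
% Ruled surface with base curve \alpha(s) and unit director field e(s)
X(s, v) = \alpha(s) + v\, e(s), \qquad \lVert e(s) \rVert = 1 .

% Classical developability criterion (vanishing Gaussian curvature)
X \text{ is developable}
\iff
\det\bigl(\alpha'(s),\; e(s),\; e'(s)\bigr) = 0 \quad \text{for all } s .
```

Geometrically, the determinant vanishes exactly when the tangent plane stays constant along each ruling, which is what allows the surface to be flattened without stretching.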


2020 ◽  
Vol 69 ◽  
pp. 1255-1285
Author(s):  
Ricardo Cardoso Pereira ◽  
Miriam Seoane Santos ◽  
Pedro Pereira Rodrigues ◽  
Pedro Henriques Abreu

Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.


Author(s):  
Ravishankar Palaniappan

Data visualization has the potential to aid humanity not only in exploring and analyzing large-volume datasets but also in identifying and predicting trends and anomalies/outliers in a “simple and consumable” way. These capabilities are vital to good and timely decisions for business advantage. Data visualization is an active research field, focusing on the different techniques and tools for qualitative exploration in conjunction with quantitative analysis of data. However, increases in the volume, dimensionality, frequency, and interrelationships of data make the data visualization process notoriously difficult. This necessitates “innovative and iterative” display techniques. Either overlooking dimensions or relationships in the data structure or choosing an unfitting visualization method will quickly lead to a “junk chart” that humans cannot interpret, which leads to incorrect inferences or conclusions. The purpose of this chapter is to introduce the different phases of data visualization and various techniques that help to connect and empower data to mine insights. It illustrates how data visualization helps to unravel important, meaningful, and useful insights, including trends and outliers, from real-world datasets that might otherwise go unnoticed. The use case in this chapter uses both simulated and real-world datasets to illustrate the effectiveness of data visualization.


2019 ◽  
Vol 2019 ◽  
pp. 1-14
Author(s):  
Emre Gürbüz ◽  
Guzin Ulutas ◽  
Mustafa Ulutas

Nowadays, the production and distribution of digital images have become part of our lives. Since digital images, which are important carriers of information, are considered concrete proof of facts in many fields and can be used as evidence in courts of law, the development of techniques to ensure image authenticity is an active research topic. Copy-move forgery is one of the most common manipulation techniques applied to digital images, and various techniques have been developed to detect this kind of forgery. The JPEG format, which offers high compression rates without causing noticeable changes in the meaning of the image, is the most commonly used format for digital images. This study addresses the detection of free-form copy-move forgeries in digital images. The developed technique detects, with a high success rate, professional forgeries in which the copied region is selected in free form and which are almost impossible to detect with the human eye. It gives successful results even if the image is subjected to postprocessing such as JPEG compression and Gaussian filtering, which makes forgery detection harder.


Author(s):  
Yunfei Guo ◽  
Wenda Xu ◽  
Sarthak Pradhan ◽  
Cesar Bravo ◽  
Pinhas Ben-Tzvi

Efficient human-machine interfaces (HMIs) for exoskeletons remain an active research topic, and several methods have been proposed, including computer vision, EEG (electroencephalogram), and voice recognition. However, some of these methods lack sufficient accuracy, security, and portability. This paper proposes an HMI referred to as an integrated trigger-word configurable voice activation and speaker verification system (CVASV). The CVASV system is designed for embedded systems with limited computing power and can be applied to any exoskeleton platform. The CVASV system consists of two main sections: an API-based voice activation section and a deep-learning-based text-independent voice verification section. These two sections are combined into a system that allows the user to configure the activation trigger word and verifies the user’s command in real time.


Author(s):  
ZHENYU HE ◽  
XINGE YOU ◽  
YUAN YAN TANG ◽  
BIN FANG ◽  
JIANWEI DU

Handwriting-based personal identification, also called handwriting-based writer identification, is an active research topic in pattern recognition. Despite continuous effort, offline handwriting-based writer identification remains a challenging problem because writing features can only be extracted from the handwriting image. As a result, much of the dynamic writing information, which is very valuable for writer identification, is unavailable in the offline setting. In this paper, we present a novel wavelet-based Generalized Gaussian Density (GGD) method for offline writer identification. Compared with the 2-D Gabor model, which is currently widely acknowledged as a good method for offline handwriting identification, the GGD method not only achieves better identification accuracy but also greatly reduces computation time in our experiments.
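For reference, the generalized Gaussian density is usually written in the following standard form. The abstract does not state the paper's exact parameterization or the similarity measure used to compare writers, so the symbols below (scale α, shape β) are the common textbook choice and an assumption here.

```latex
% Generalized Gaussian density with scale \alpha > 0 and shape \beta > 0
p(x;\, \alpha, \beta)
  = \frac{\beta}{2\,\alpha\,\Gamma(1/\beta)}
    \exp\!\left( -\left( \frac{|x|}{\alpha} \right)^{\beta} \right),
\qquad
\Gamma(z) = \int_{0}^{\infty} t^{\,z-1} e^{-t}\, dt .
```

Setting β = 2 recovers a Gaussian and β = 1 a Laplacian, so two parameters per wavelet subband can summarize the shape of that subband's coefficient histogram.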


2021 ◽  
Author(s):  
Alena Orlenko ◽  
Jason H Moore

Background: Non-additive interactions among genes are frequently associated with a number of phenotypes, including known complex diseases such as Alzheimer’s, diabetes, and cardiovascular disease. Detecting interactions requires careful selection of analytical methods, and some machine learning algorithms are unable or underpowered to detect or model feature interactions that exhibit non-additivity. The Random Forest method is often employed in these efforts due to its ability to detect and model non-additive interactions. In addition, Random Forest has the built-in ability to estimate feature importance scores, a characteristic that allows the model to be interpreted through the order and effect size of each feature’s association with the outcome. This characteristic is very important for epidemiological and clinical studies, where results of predictive modeling could be used to define the future direction of research efforts. Alternative ways to interpret the model are the permutation feature importance metric, which uses a permutation approach to calculate a feature contribution coefficient in units of the decrease in the model’s performance, and Shapley additive explanations, which use a cooperative game theory approach. Currently, it is unclear which Random Forest feature importance metric provides a superior estimation of the true informative contribution of features in genetic association analysis.
Results: To address this issue, and to improve interpretability of Random Forest predictions, we compared different methods for feature importance estimation in real and simulated datasets with non-additive interactions. As a result, we detected a discrepancy between the metrics for the real-world datasets and further established that the permutation feature importance metric provides more precise feature importance rank estimation for the simulated datasets with non-additive interactions.
Conclusions: By analyzing both real and simulated data, we established that the permutation feature importance metric provides more precise feature importance rank estimation in the presence of non-additive interactions.
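The permutation feature importance idea described above (shuffle one feature and measure the drop in model performance) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy XOR dataset, the hand-written `xor_model` standing in for a trained Random Forest, and the helper names `accuracy` and `permutation_importance` are all hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (assumption): the label is the XOR of features 0 and 1, a purely
# non-additive interaction; feature 2 is uninformative noise.
data_rng = random.Random(42)
X = [[data_rng.randint(0, 1) for _ in range(3)] for _ in range(200)]
y = [row[0] ^ row[1] for row in X]


def xor_model(row):
    """Stand-in for a fitted model that has learned the interaction."""
    return row[0] ^ row[1]


scores = permutation_importance(xor_model, X, y)
print(scores)  # features 0 and 1 show a clear drop; feature 2 shows none
```

Because the score is measured in units of lost accuracy, the two interacting features both receive high importance even though neither is predictive on its own, which is the non-additive setting the study examines.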


2020 ◽  
Vol 17 (8) ◽  
pp. 571-585
Author(s):  
Adnan Cetin

This review gives an overview of diverse addition reactions of dialkyl zinc to pro-chiral aldehydes. One way of conducting asymmetric reactions is through the use of a chiral catalyst. Therefore, new chiral ligands have attracted considerable attention in organic chemistry. Carbon-carbon bond formation reactions are an active research topic, and the addition of dialkyl zinc to pro-chiral aldehydes is one of the most popular of these reactions. Chiral amino alcohols are also important substrates for drug synthesis. These chiral ligands can be prepared by simple synthetic routes from easily accessible starting materials. This article reviews current catalytic reactions involving the addition of dialkyl zinc to carbonyl compounds.


2021 ◽  
Vol 5 (1) ◽  
pp. 43-69
Author(s):  
Olivier Lézoray

The extension of mathematical morphology to multivariate data has been an active research topic in recent years. In this paper we propose an approach that relies on the consensus combination of several stochastic permutation orderings. The latter are obtained by searching for a smooth shortest path on a graph representing an image. This path is obtained with a randomized version of the nearest-neighbor heuristic on a graph. The construction of the graph is of crucial importance and can be based on both spatial and spectral information to obtain smoother shortest paths. Because the starting vertex of a path is chosen at random, many different permutation orderings can be obtained, and we propose to build a consensus ordering from several of them. We demonstrate the value of the approach with both quantitative and qualitative results.


2011 ◽  
Vol 179-180 ◽  
pp. 685-690
Author(s):  
Xiao Hong Hu ◽  
Xiao Lei Wang ◽  
Xiu Ran Wei

Graph-based learning has recently been an active research topic in the machine learning community as well as in many application areas, including image annotation. To exploit the correlation between keywords and images, we propose a novel image annotation method that uses graph-based learning and semantic fusion to estimate the probability of keywords being the caption of an image, and we present a new framework to solve the problem. Experiments on Corel images show that this approach outperforms other methods and is effective for image annotation.

