Certifai: A Toolkit for Building Trust in AI Systems

Author(s):  
Jette Henderson ◽  
Shubham Sharma ◽  
Alan Gee ◽  
Valeri Alexiev ◽  
Steve Draper ◽  
...  

As more companies and governments build and use machine learning models to automate decisions, there is an ever-growing need to monitor and evaluate these models' behavior once they are deployed. Our team at CognitiveScale has developed a toolkit called Cortex Certifai to address this need. Cortex Certifai is a framework that assesses aspects of robustness, fairness, and interpretability of any classification or regression model trained on tabular data, without requiring access to its internal workings. Additionally, Cortex Certifai allows users to compare models along these different axes and only requires 1) query access to the model and 2) an “evaluation” dataset. At its foundation, Cortex Certifai generates counterfactual explanations, which are synthetic data points close to input data points but differing in terms of model prediction. The tool then harnesses characteristics of these counterfactual explanations to analyze different aspects of the supplied model and delivers evaluations relevant to a variety of stakeholders (e.g., model developers, risk analysts, compliance officers). Cortex Certifai can be configured and executed using a command-line interface (CLI), within Jupyter notebooks, or on the cloud, and the results are recorded in JSON files that can be visualized in an interactive console. Using these reports, stakeholders can understand, monitor, and build trust in their AI systems. In this paper, we provide a brief overview of a demonstration of Cortex Certifai's capabilities.
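To make the core idea concrete, here is a minimal sketch of model-agnostic counterfactual search by random perturbation. This is not Certifai's actual algorithm (the toolkit's internals are not described here); the stand-in model, the `counterfactual` helper, and all parameters are illustrative assumptions.

```python
# Hedged sketch (not Certifai's algorithm): randomly perturb a query point
# until the black-box model's prediction flips, then keep the closest such
# point as a counterfactual explanation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)   # stand-in for any query-access model

def counterfactual(model, x, n_trials=5000, scale=1.0, seed=0):
    """Return the perturbed point closest to x whose prediction differs."""
    rng = np.random.default_rng(seed)
    original = model.predict(x.reshape(1, -1))[0]
    candidates = x + rng.normal(0.0, scale, size=(n_trials, x.size))
    flipped = candidates[model.predict(candidates) != original]
    if flipped.size == 0:
        return None                       # no counterfactual found in budget
    distances = np.linalg.norm(flipped - x, axis=1)
    return flipped[np.argmin(distances)]

cf = counterfactual(model, X[0])
print("query point:", X[0], "\ncounterfactual:", cf)
```

The distance between a point and its nearest counterfactual is the kind of characteristic a tool can aggregate into robustness and fairness scores.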

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
C J Battey ◽  
Gabrielle C Coffing ◽  
Andrew D Kern

Abstract
Dimensionality reduction is a common tool for visualization and inference of population structure from genotypes, but popular methods either return too many dimensions for easy plotting (PCA) or fail to preserve global geometry (t-SNE and UMAP). Here we explore the utility of variational autoencoders (VAEs)—generative machine learning models in which a pair of neural networks seek to first compress and then recreate the input data—for visualizing population genetic variation. VAEs incorporate nonlinear relationships, allow users to define the dimensionality of the latent space, and in our tests preserve global geometry better than t-SNE and UMAP. Our implementation, which we call popvae, is available as a command-line python program at github.com/kr-colab/popvae. The approach yields latent embeddings that capture subtle aspects of population structure in humans and Anopheles mosquitoes, and can generate artificial genotypes characteristic of a given sample or population.
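The following is a minimal VAE sketch for genotype-like data, illustrating the encoder/decoder pair and the reparameterization trick the abstract alludes to. It is not popvae's code; the `GenotypeVAE` class, layer sizes, and toy data are assumptions for illustration.

```python
# Hedged sketch of a VAE for genotype matrices (not popvae itself): the
# encoder maps genotypes to a 2-D latent space suitable for plotting, and
# the decoder reconstructs them, enabling generation of artificial genotypes.
import torch
import torch.nn as nn

class GenotypeVAE(nn.Module):
    def __init__(self, n_snps, latent_dim=2, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_snps, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_snps), nn.Sigmoid())  # outputs in [0, 1]

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.decoder(z), mu, logvar

vae = GenotypeVAE(n_snps=1000)
x = torch.rand(64, 1000)                  # toy batch of rescaled genotypes
recon, mu, logvar = vae(x)
# Loss = reconstruction error + KL divergence to the standard normal prior.
loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum") \
       - 0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
```

After training, plotting `mu` for each sample gives the 2-D embedding; sampling `z` from the prior and decoding yields artificial genotypes.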


Author(s):  
Shuyuan Lin ◽  
Guobao Xiao ◽  
Yan Yan ◽  
David Suter ◽  
Hanzi Wang

Recently, some hypergraph-based methods have been proposed to deal with the problem of model fitting in computer vision, mainly due to the superior capability of hypergraphs to represent complex relationships between data points. However, a hypergraph becomes extremely complicated when the input includes a large number of data points (usually contaminated with noise and outliers), which significantly increases the computational burden. To overcome this problem, we propose a novel hypergraph optimization-based model fitting (HOMF) method to construct a simple but effective hypergraph. Specifically, HOMF includes two main parts: an adaptive inlier estimation algorithm for vertex optimization and an iterative hyperedge optimization algorithm for hyperedge optimization. The proposed method is highly efficient, and it can obtain accurate model fitting results within a few iterations. Moreover, HOMF can directly apply spectral clustering to achieve good fitting performance. Extensive experimental results show that HOMF outperforms several state-of-the-art model fitting methods on both synthetic data and real images, especially in sampling efficiency and in handling data with severe outliers.
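For readers unfamiliar with robust model fitting, here is a hedged sketch of the generic hypothesize-and-verify loop with adaptive inlier estimation that methods like HOMF build on. It is not the authors' algorithm (no hypergraph construction is shown); the line-fitting task, MAD-based threshold, and all constants are illustrative assumptions.

```python
# Hedged sketch of hypothesize-and-verify robust fitting (not HOMF itself):
# sample minimal subsets, fit a line, and estimate inliers adaptively from
# the residual distribution instead of using a fixed threshold.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
inliers = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.1, 100)])
outliers = rng.uniform(0, 20, (40, 2))        # gross outliers / clutter
data = np.vstack([inliers, outliers])

best_model, best_count = None, 0
for _ in range(200):                          # hypothesis generation
    p, q = data[rng.choice(len(data), size=2, replace=False)]
    if abs(q[0] - p[0]) < 1e-9:
        continue                              # skip degenerate vertical pairs
    a = (q[1] - p[1]) / (q[0] - p[0])
    b = p[1] - a * p[0]
    residuals = np.abs(data[:, 1] - (a * data[:, 0] + b))
    scale = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))  # MAD
    count = int((residuals < 2.5 * scale).sum())
    if count > best_count:                    # keep best-supported hypothesis
        best_model, best_count = (a, b), count

print("estimated line: y = %.2f x + %.2f (%d inliers)" % (*best_model, best_count))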



1994 ◽  
Vol 05 (05) ◽  
pp. 805-809 ◽  
Author(s):  
SALIM G. ANSARI ◽  
PAOLO GIOMMI ◽  
ALBERTO MICOL

On 3 November 1993, ESIS announced its homepage on the World Wide Web (WWW) to the user community. Since then, ESIS has steadily increased its Web support to the astronomical community to include a bibliographic service, the ESIS catalogue documentation, and the ESIS Data Browser; more functionality will be added in the near future. All these services share a common ESIS structure that is also used by other ESIS user paradigms, such as the ESIS Graphical User Interface (Giommi and Ansari, 1993) and the ESIS Command Line Interface. Following a forms-based paradigm, each ESIS Web application interfaces to the hypertext transfer protocol (HTTP), translating queries to and from the hypertext markup language (HTML) format understood by the NCSA Mosaic interface. In this paper, we discuss the ESIS system and show how each ESIS service works on the World Wide Web client.


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Lev V. Utkin

A fuzzy classification model is studied in this paper. It is based on the contaminated (robust) model, which produces fuzzy expected risk measures characterizing classification errors. Optimal classification parameters of the model are derived by minimizing the fuzzy expected risk. It is shown that the algorithm for computing the classification parameters reduces to a set of standard support vector machine tasks with weighted data points. Experimental results with synthetic data illustrate the proposed fuzzy model.
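To illustrate the building block the reduction targets, here is a hedged sketch of a standard SVM solved with per-sample weights. It shows only how weights enter a single subproblem (via scikit-learn's `sample_weight`); the weights themselves are hypothetical, not derived from the paper's fuzzy risk model.

```python
# Hedged sketch: one weighted-SVM subproblem of the kind the fuzzy model
# reduces to. The per-sample weights here are arbitrary placeholders, not
# the paper's fuzzy expected-risk weights.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
weights = np.where(y == 1, 2.0, 0.5)   # hypothetical data-point weights
clf = SVC(kernel="rbf").fit(X, y, sample_weight=weights)
print("training accuracy:", clf.score(X, y))
```

Solving a set of such weighted tasks and combining their solutions is what makes the fuzzy model computationally tractable.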


Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 227
Author(s):  
Eckart Michaelsen ◽  
Stéphane Vujasinovic

Representative input data are a necessary requirement for the assessment of machine-vision systems. For symmetry-seeing machines in particular, such imagery should provide symmetries as well as asymmetric clutter. Moreover, reliable ground truth must accompany the data, and it should be possible to estimate recognition performance and computational effort by providing different grades of difficulty and complexity. Recent competitions used real imagery labeled by human subjects with appropriate ground truth. The paper at hand proposes to use synthetic data instead. Such data contain symmetry, clutter, and nothing else. This is preferable because interference with other perceptive capabilities, such as object recognition or prior knowledge, can be avoided. The data are given sparsely, i.e., as sets of primitive objects. However, images can be generated from them, so that the same data can also be fed into machines requiring dense input, such as multilayered perceptrons. Sparse representations are preferred because the author's own system requires such data and because any influence of the primitive-extraction method is thereby excluded. The presented format allows hierarchies of symmetries. This is important because hierarchy constitutes a natural and dominant part of symmetry-seeing. The paper reports some experiments using the author's Gestalt algebra system as the symmetry-seeing machine, and additionally includes a comparative test run with the state-of-the-art symmetry-seeing deep-learning convolutional perceptron of the PSU. The computational efforts and recognition performance are assessed.
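The following is a hedged sketch of how such sparse synthetic data might be generated: primitives mirrored about a known axis, mixed with clutter, with ground-truth labels kept alongside. The record layout is hypothetical, not the paper's published format.

```python
# Hedged sketch of sparse synthetic symmetry data (hypothetical format):
# primitive objects mirrored about a known vertical axis plus random
# asymmetric clutter, with ground-truth membership labels.
import numpy as np

rng = np.random.default_rng(42)
axis_x = 0.5                                  # mirror axis (ground truth)

halves = rng.uniform(0, 0.5, size=(10, 2))    # primitives left of the axis
mirrored = np.column_stack([2 * axis_x - halves[:, 0], halves[:, 1]])
clutter = rng.uniform(0, 1, size=(15, 2))     # asymmetric distractors

primitives = np.vstack([halves, mirrored, clutter])
labels = np.array([1] * 20 + [0] * 15)        # 1 = belongs to the symmetry
print(primitives.shape, "primitives;", labels.sum(), "belong to the symmetry")
```

Because the primitives are explicit, the same scene can be rasterized into an image for machines that require dense input, while the labels provide exact ground truth for scoring.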


Processes ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 158
Author(s):  
Ain Cheon ◽  
Jwakyung Sung ◽  
Hangbae Jun ◽  
Heewon Jang ◽  
Minji Kim ◽  
...  

The application of a machine learning (ML) model to bio-electrochemical anaerobic digestion (BEAD) is a future-oriented approach for improving process stability by predicting performances that have nonlinear relationships with various operational parameters. Five ML models, including tree-, regression-, and neural-network-based algorithms, were applied to predict the methane yield in a BEAD reactor. The results showed that various 1-step-ahead ML models, which utilize prior BEAD performance data, could enhance prediction accuracy. In addition, the 1-step-ahead algorithm with retraining improved prediction accuracy by 37.3% compared with the conventional multi-step-ahead algorithm. The improvement was particularly noteworthy in tree- and regression-based ML models. Moreover, the 1-step-ahead algorithm with retraining showed high potential for efficient prediction using pH as a single input, which is an easier parameter to monitor than the other parameters required by bioprocess models.
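The sketch below illustrates the 1-step-ahead-with-retraining scheme on a toy series: at each step the model is refit on all data seen so far and predicts only the next value. It is a generic walk-forward loop, not the paper's models or data; the lag count, forest size, and synthetic series are assumptions.

```python
# Hedged sketch of 1-step-ahead prediction with retraining (the scheme,
# not the paper's models): refit on all observed data at every step and
# predict only one step into the future.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 200)) + rng.normal(0, 0.1, 200)  # toy yield
lags = 3
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

predictions = []
for t in range(150, len(y)):                  # walk forward through time
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[:t], y[:t])                   # retrain on everything seen so far
    predictions.append(model.predict(X[t:t + 1])[0])

rmse = np.sqrt(np.mean((np.array(predictions) - y[150:]) ** 2))
print(f"walk-forward RMSE: {rmse:.3f}")
```

A multi-step-ahead baseline would instead train once at t=150 and forecast the whole remainder, which is why retraining tends to track regime changes better.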


2021 ◽  
Vol 9 ◽  
Author(s):  
Caio Ribeiro ◽  
Lucas Oliveira ◽  
Romina Batista ◽  
Marcos De Sousa

The use of Ultraconserved Elements (UCEs) as genetic markers in phylogenomics has become popular and has provided promising results. Although UCE data can be easily obtained from targeted enriched sequencing, the protocol for in silico analysis of UCEs consists of the execution of heterogeneous and complex tools, a challenge for scientists without training in bioinformatics. Developing tools that adopt best practices in research software can lessen this problem by improving the execution of computational experiments, thus promoting better reproducibility. We present UCEasy, an easy-to-install and easy-to-use software package with a simple command-line interface that facilitates the computational analysis of UCEs from sequencing samples, following the best practices of research software. UCEasy is a wrapper that standardises, automates, and simplifies the quality control of raw reads, the assembly, and the extraction and alignment of UCEs, finally generating a data matrix with different levels of completeness that can be used to infer phylogenetic trees. We demonstrate the functionalities of UCEasy by reproducing the published results of phylogenomic studies of the bird genus Turdus (Aves) and of Adephaga families (Coleoptera), efficiently extracting UCEs from their genomic datasets.
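As a hedged illustration of the wrapper pattern such tools rely on, the sketch below runs a sequence of pipeline stages as external commands with uniform logging and fail-fast error handling. The stage names and placeholder commands are hypothetical; this is not UCEasy's actual CLI or code.

```python
# Hedged sketch of a pipeline-wrapper pattern (hypothetical stages and
# commands, not UCEasy's real interface): each stage is an external tool
# run with a checked exit status and uniform logging.
import subprocess
import sys

STAGES = [
    ("quality_control", ["echo", "trimming raw reads"]),      # placeholders
    ("assembly",        ["echo", "assembling contigs"]),
    ("uce_extraction",  ["echo", "extracting and aligning UCEs"]),
]

def run_pipeline():
    for name, cmd in STAGES:
        print(f"[pipeline] running stage: {name}", file=sys.stderr)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:            # fail fast with a uniform error
            sys.exit(f"[pipeline] stage {name} failed:\n{result.stderr}")
        print(result.stdout, end="")

if __name__ == "__main__":
    run_pipeline()
```

Wrapping heterogeneous tools behind one entry point like this is what lets a single command reproduce an entire analysis.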


Author(s):  
Dr. C. K. Gomathy

Abstract: Apache Sqoop is mainly used to efficiently transfer large volumes of data between Apache Hadoop and relational databases. It helps offload certain tasks, such as ETL (extract, transform, load) processing, from an enterprise data warehouse to Hadoop, where they can be executed efficiently at much lower cost. Here, we first import a table residing in a MySQL database with the help of the command-line interface application called Sqoop. Because new rows may be added and existing rows updated, the query would otherwise have to be executed again after every change. Our project removes the need to re-execute queries manually: we define a Sqoop job that holds all the commands for the import, and after the import we retrieve the data from Hive using Java JDBC and convert it to JSON format, an organized and easily accessible representation, using the GSON library. Keywords: Sqoop, JSON, GSON, Maven, JDBC
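The sketch below drives the incremental-import part of this workflow from Python. The `sqoop job` and `sqoop import` flags shown (`--connect`, `--incremental append`, `--check-column`, `--last-value`, `--hive-import`) are standard Sqoop syntax, but the database name, table, and credentials are hypothetical placeholders.

```python
# Hedged sketch of the import workflow (connection details hypothetical):
# a saved Sqoop job with incremental append picks up only new rows on each
# run, so the full import never has to be repeated by hand.
import shlex
import subprocess

CREATE_JOB = (
    "sqoop job --create customer_import -- import "
    "--connect jdbc:mysql://localhost/shop --table customers "
    "--username dbuser -P --hive-import "          # -P prompts for password
    "--incremental append --check-column id --last-value 0"
)
RUN_JOB = "sqoop job --exec customer_import"       # rerun to fetch new rows

for cmd in (CREATE_JOB, RUN_JOB):
    subprocess.run(shlex.split(cmd), check=True)   # fail fast on errors
```

Sqoop stores the job's `--last-value` after each execution, which is what makes repeated runs incremental rather than full re-imports.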


2020 ◽  
Vol 69 ◽  
pp. 1255-1285
Author(s):  
Ricardo Cardoso Pereira ◽  
Miriam Seoane Santos ◽  
Pedro Pereira Rodrigues ◽  
Pedro Henriques Abreu

Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.
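The sketch below shows the generic denoising-autoencoder imputation pattern the survey covers: train the network to reconstruct complete rows from artificially corrupted ones, then fill missing entries with its outputs. It is not any surveyed paper's architecture; the layer sizes, corruption rate, and toy data are assumptions.

```python
# Hedged sketch of denoising-autoencoder imputation (generic pattern, not a
# specific surveyed architecture): learn to reconstruct clean rows from
# corrupted ones, then replace missing entries with the reconstruction.
import torch
import torch.nn as nn

n_features = 20
dae = nn.Sequential(
    nn.Linear(n_features, 10), nn.ReLU(),
    nn.Linear(10, n_features))
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)

X = torch.rand(256, n_features)                  # toy complete training data
for _ in range(200):                             # train with artificial corruption
    mask = (torch.rand_like(X) > 0.2).float()    # randomly drop ~20% of entries
    loss = nn.functional.mse_loss(dae(X * mask), X)
    opt.zero_grad(); loss.backward(); opt.step()

x_row = torch.rand(1, n_features)                # row with missing values
miss = torch.zeros(1, n_features); miss[0, :5] = 1   # first 5 entries missing
x_input = x_row * (1 - miss)                     # zero out the missing entries
with torch.no_grad():
    x_imputed = torch.where(miss.bool(), dae(x_input), x_row)
```

Variational variants replace the deterministic bottleneck with a latent distribution, but the corrupt-reconstruct-replace loop stays the same.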

