A new information theoretic clustering algorithm using k-nn

Author(s):  
Vidar Vikjord ◽  
Robert Jenssen


Entropy ◽  
2018 ◽  
Vol 20 (7) ◽  
pp. 540

Author(s):  
Subhashis Hazarika ◽  
Ayan Biswas ◽  
Soumya Dutta ◽  
Han-Wei Shen

Uncertainty in the scalar values of an ensemble dataset is often represented by the collection of their corresponding isocontours. Various techniques, such as contour boxplots, contour variability plots, glyphs, and probabilistic marching cubes, have been proposed to analyze and visualize ensemble isocontours. All of these techniques assume that the scalar value of interest is already known to the user; little work has been done on guiding users to select scalar values for such uncertainty analysis. Moreover, analyzing and visualizing a large collection of ensemble isocontours for a selected scalar value poses its own challenges, and interpreting the resulting visualizations is also difficult. In this work, we propose a new information-theoretic approach to address these issues. Using specific information measures that estimate the predictability and surprise of individual scalar values, we evaluate the overall uncertainty associated with all the scalar values in an ensemble system. This helps scientists understand the effects of uncertainty on different data features. To understand in finer detail how individual members contribute to the uncertainty of the ensemble isocontours at a selected scalar value, we propose a conditional-entropy-based algorithm that quantifies these individual contributions. By identifying the members that contribute most to the overall uncertainty, this simplifies analysis and visualization for systems with many members. We demonstrate the efficacy of our method by applying it to real-world datasets from materials science, weather forecasting, and ocean simulation experiments.
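To make the idea concrete, here is a minimal, self-contained Python sketch, not the authors' implementation. It scores each candidate isovalue by the total Shannon entropy of per-grid-point level crossing across members (a simple proxy for the paper's specific-information measures of predictability and surprise) and estimates a member's contribution by a drop-one entropy difference rather than the paper's conditional-entropy algorithm. All function names and the synthetic radial-field data are illustrative assumptions.

```python
import numpy as np

def crossing_probability(ensemble, isovalue):
    """Per-point probability that a member's field exceeds the isovalue.

    ensemble: array of shape (n_members, H, W). Thresholding per grid
    point is a crude stand-in for cell-wise level-crossing tests.
    """
    above = ensemble >= isovalue          # boolean (n_members, H, W)
    return above.mean(axis=0)             # fraction of members above

def isocontour_entropy(ensemble, isovalue, eps=1e-12):
    """Total binary entropy of level crossing over the grid: a simple
    proxy for how uncertain the ensemble isocontour is at this isovalue."""
    p = crossing_probability(ensemble, isovalue)
    h = -(p * np.log2(p + eps) + (1 - p) * np.log2(1 - p + eps))
    return h.sum()

def member_contribution(ensemble, isovalue, member):
    """Drop-one estimate of one member's contribution: the change in
    total entropy when that member is removed from the ensemble."""
    full = isocontour_entropy(ensemble, isovalue)
    rest = np.delete(ensemble, member, axis=0)
    return full - isocontour_entropy(rest, isovalue)

# Toy data: 30 noisy realizations of a radial field on a 64x64 grid.
rng = np.random.default_rng(0)
y, x = np.mgrid[-1:1:64j, -1:1:64j]
base = np.sqrt(x**2 + y**2)
ensemble = base + 0.05 * rng.standard_normal((30, 64, 64))

# Scan isovalues, report the most uncertain one and one member's share.
isovalues = np.linspace(0.2, 1.2, 11)
scores = [isocontour_entropy(ensemble, v) for v in isovalues]
print("most uncertain isovalue:", isovalues[int(np.argmax(scores))])
print("member 0 contribution:", round(member_contribution(ensemble, isovalues[5], 0), 3))
```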


Author(s):  
Ryotaro Kamimura

In this paper, we propose new information-theoretic methods to stabilize feature detection. We have previously introduced information-theoretic methods to realize competitive learning, where it turned out that mutual information maximization corresponds to a process of competition among neurons; mutual information is therefore effective in describing competitive processes. Building on this, we introduced information loss to interpret internal representations: by relaxing competitive units, that is, by softly deleting components such as units and connection weights, a neural network's information is decreased, and if the information loss is sufficiently large, the deleted components play important roles. However, information loss has suffered from problems such as the instability of final representations, meaning that final outputs depend significantly on the chosen parameters. To stabilize final representations, we introduce two computational methods: relative relaxation and weighted information loss. Relative relaxation is introduced because mutual information depends on the Gaussian width; it lets us relax competitive units, or softly delete components, relative to a predetermined base state. Weighted information loss, in turn, takes into account information on related components. We applied the methods to the well-known Iris problem and to a problem concerning the extinction of animals and plants. In the Iris problem, experimental results confirmed that final representations were significantly more stable when the parameter for the base state was chosen appropriately. In the extinction problem, weighted information loss performed better, yielding final outputs that were significantly more stable than those of the other methods.
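As an illustration of the information-loss idea, the following toy sketch (not Kamimura's exact formulation) models competitive units with normalized Gaussian activations, measures mutual information between input patterns and units, and defines a unit's information loss as the drop in mutual information when that unit is deleted. The Gaussian width sigma plays the role of the relaxation parameter, chosen relative to a base value in the spirit of relative relaxation; the weighted information loss variant is omitted. All names and the toy data are assumptions.

```python
import numpy as np

def unit_posteriors(X, W, sigma):
    """p(j|s): normalized Gaussian activations of competitive units.

    X: (n_patterns, d) inputs; W: (n_units, d) connection weights;
    sigma: Gaussian width controlling how soft the competition is.
    """
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # squared distances
    a = np.exp(-d2 / (2 * sigma**2))
    return a / a.sum(axis=1, keepdims=True)

def mutual_information(X, W, sigma, eps=1e-12):
    """I(units; patterns) under uniform p(s); high MI means one unit
    wins clearly per pattern, i.e. strong competition."""
    pj_s = unit_posteriors(X, W, sigma)
    pj = pj_s.mean(axis=0)                                   # p(j)
    return (pj_s * np.log((pj_s + eps) / (pj + eps))).sum() / len(X)

def information_loss(X, W, sigma, unit):
    """Information loss for one unit: the MI drop when it is removed.
    A large loss marks the unit as important for the representation."""
    full = mutual_information(X, W, sigma)
    reduced = mutual_information(X, np.delete(W, unit, axis=0), sigma)
    return full - reduced

# Toy check on three Gaussian clusters with one weight per cluster.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.concatenate([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])
W = centers.copy()

base_sigma = 1.0  # the predetermined "base state" width
for j in range(len(W)):
    print(f"unit {j}: loss = {information_loss(X, W, base_sigma, j):.3f}")
```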


2017 ◽  
Author(s):  
Alexander C. Reis ◽  
Howard M. Salis

ABSTRACT
Gene expression models greatly accelerate the engineering of synthetic metabolic pathways and genetic circuits by predicting sequence-function relationships and reducing trial-and-error experimentation. However, developing models with more accurate predictions remains a significant challenge, even though such models are essential to engineering complex genetic systems. Here we present a model test system that combines advanced statistics, machine learning, and a database of 9862 characterized genetic systems to automatically quantify model accuracies, accept or reject mechanistic hypotheses, and identify areas for model improvement. We also introduce Model Capacity, a new information-theoretic metric that enables correct model comparisons across datasets. We demonstrate the model test system by comparing six models of translation initiation rate, evaluating 100 mechanistic hypotheses, and uncovering new sequence determinants that control protein expression levels. We applied these results to develop a biophysical model of translation initiation rate with significantly improved accuracy. Automated model test systems will dramatically accelerate the development of gene expression models and thereby transition synthetic biology into a mature engineering discipline.
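The abstract does not spell out the Model Capacity formula, so the sketch below uses an assumed, simplified stand-in: capacity as the log2 of how many expression levels a model can resolve, i.e. the dataset's dynamic range divided by the model's RMSE on a log2 scale. This is an illustrative assumption for intuition about cross-dataset model comparison, not the metric defined by Reis and Salis.

```python
import numpy as np

def model_capacity(y_true, y_pred):
    """Toy capacity in bits: log2 of how many expression levels the
    model can distinguish (dataset span divided by typical error).

    y_true, y_pred: measured vs predicted values on a log2 scale.
    NOTE: an illustrative stand-in, not the paper's exact definition.
    """
    span = y_true.max() - y_true.min()
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(np.log2(max(span / rmse, 1.0)))

# Two hypothetical models scored on the same synthetic dataset.
rng = np.random.default_rng(2)
y = rng.uniform(0.0, 16.0, size=500)            # log2 expression, ~16-bit span
model_a = y + rng.normal(0.0, 0.5, size=500)    # tighter predictions
model_b = y + rng.normal(0.0, 2.0, size=500)    # looser predictions

print("capacity A:", round(model_capacity(y, model_a), 2), "bits")
print("capacity B:", round(model_capacity(y, model_b), 2), "bits")
```

Because the score depends on error relative to each dataset's own dynamic range, it stays comparable across datasets with different spans, which is the role the abstract ascribes to Model Capacity.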

