STATISTICAL MEASURES AS MEASURES OF DIVERSITY

2010 ◽  
Vol 03 (02) ◽  
pp. 173-185 ◽  
Author(s):  
OM PARKASH ◽  
A. K. THUKRAL

Two fields of research have found wide applicability in the analysis of biological data: statistics and information theory. Statistics is used extensively to measure central tendency, dispersion, comparison and covariation. Measures of information are used to study diversity and equitability. These two fields have been used independently of each other for data analysis. In this communication, we develop the link between the two and prove that statistical measures can be used as information measures. Our study opens a new interdisciplinary field of research in which the information content of a system can be described from its statistics.

2021 ◽  
Author(s):  
Uwe Ehret

In this contribution, I will, with examples from hydrology, make the case for information theory as a general language and framework for i) characterizing systems, ii) quantifying the information content in data, iii) evaluating how well models can learn from data, and iv) measuring how well models do in prediction. In particular, I will discuss how information measures can be used to characterize systems by the state space volume they occupy, their dynamical complexity, and their distance from equilibrium. Likewise, I will discuss how we can measure the information content of data through systematic perturbations, and how much information a model absorbs (or ignores) from data during learning. This can help build hybrid models that optimally combine information in data and general knowledge from physical and other laws, which is currently among the key challenges in machine learning applied to earth science problems. While I will try my best to convince everybody to take an information perspective henceforth, I will also name the related challenges: data demands, binning choices, estimation of probability distributions from limited data, and issues with excessive data dimensionality.


1977 ◽  
Vol 9 (4) ◽  
pp. 395-417 ◽  
Author(s):  
J A Walsh ◽  
M J Webber

The concepts of entropy and of information are increasingly used in spatial analysis. This paper analyses these ideas in order to show how measures of spatial distributions may be constructed from them. First, the information content of messages is examined and related to the notion of uncertainty. Then three information measures, due to Shannon, Brillouin, and Good, are derived and shown to be appropriate in analysing different spatial problems; in particular, the Shannon and Brillouin measures are extensively compared and the effects of sample size on them are investigated. The paper also develops appropriate multivariate analogues of the information measures. Finally, some comments are made on the relations between the concepts of entropy, information, and order.
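
The comparison of the Shannon and Brillouin measures and their dependence on sample size can be illustrated with a minimal sketch. It assumes the standard textbook forms of the two measures (natural logarithms, category counts n_i summing to N). The abstract does not give the paper's own definitions, so the function names and the example counts below are illustrative assumptions, and Good's measure is not covered.

```python
import math

def shannon(counts):
    """Shannon measure H = -sum(p_i * ln p_i), with p_i = n_i / N."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def brillouin(counts):
    """Brillouin measure HB = (ln N! - sum(ln n_i!)) / N.

    math.lgamma(n + 1) equals ln(n!), which avoids huge factorials.
    """
    total = sum(counts)
    log_multinomial = math.lgamma(total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return log_multinomial / total

# Hypothetical samples with the same proportions but different sizes.
for counts in ([5, 5], [50, 50], [500, 500]):
    print(sum(counts), round(shannon(counts), 4), round(brillouin(counts), 4))
```

With the proportions held fixed, the Shannon value stays the same while the Brillouin value rises toward it as the sample grows, which is one simple way to see the sample-size effect the abstract refers to.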


Author(s):  
David J. Galas ◽  
Nikita A. Sakhanenko

Information-related measures are useful tools for multi-variable data analysis, as measures of dependence among variables, and as descriptions of order and disorder in biological and physical systems. Measures such as marginal entropies and mutual, interaction, and multi-information have long been used in a number of fields, including descriptions of systems complexity and biological data analysis. The mathematical relationships among these measures are therefore of significant inherent interest. Relations between common information measures include the duality relations based on Möbius inversion on lattices. These are the direct consequence of the symmetries of the lattices of the sets of variables (subsets ordered by inclusion). While these relationships are of significant interest, there has been, to our knowledge, no systematic examination of the full range of relationships of this diverse range of functions into a unifying formalism as we do here. In this paper we define operators on functions on these lattices based on the Möbius inversions that map functions into one another (Möbius operators). We show that these operators form a simple group isomorphic to the symmetric group S3. Relations among the set of functions on the lattice are transparently expressed in terms of the operator algebra and, applied to the information measures, can be used to derive a wide range of relationships among diverse information measures. The Möbius operator algebra generalizes naturally, which yields extensive new relationships. This formalism now provides a fundamental unification of information-related measures, and the isomorphism of all distributive lattices with the subset lattice implies an even broader application of these results.
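
As a rough illustration of Möbius inversion on the subset lattice, the sketch below implements the generic sum-over-subsets transform and its inverse, then recovers two-variable mutual information as an alternating sum of joint entropies. This is the standard textbook inversion, not the operator algebra defined in the paper. The helper names and the joint distribution are hypothetical, and sign conventions for three or more variables differ between authors.

```python
from itertools import combinations
from math import log2

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def zeta(f, universe):
    """g(S) = sum of f(T) over all T contained in S."""
    return {S: sum(f[T] for T in subsets(S)) for S in subsets(universe)}

def mobius(g, universe):
    """f(S) = sum over T contained in S of (-1)^(|S| - |T|) * g(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * g[T] for T in subsets(S))
        for S in subsets(universe)
    }

def joint_entropy(pmf, names, subset):
    """Entropy (bits) of the marginal distribution over the variables in subset."""
    idx = [i for i, name in enumerate(names) if name in subset]
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

names = ("X", "Y")
# Hypothetical joint distribution: Y agrees with X 90% of the time.
pmf = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

H = {S: joint_entropy(pmf, names, S) for S in subsets(names)}

# For two variables the alternating sum is H(empty) - H(X) - H(Y) + H(X,Y),
# so its negative is the mutual information I(X;Y).
mi = -mobius(H, names)[frozenset(names)]
print(round(mi, 4))
```

Applying `zeta` to the output of `mobius` (or the reverse) returns the original function, which is the inversion property the duality relations rest on.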


2020 ◽  
Vol 27 (38) ◽  
pp. 6523-6535 ◽  
Author(s):  
Antreas Afantitis ◽  
Andreas Tsoumanis ◽  
Georgia Melagraki

Drug discovery as well as (nano)material design projects demand the in silico analysis of large datasets of compounds with their corresponding properties/activities, as well as the retrieval and virtual screening of more structures in an effort to identify new potent hits. This is a demanding procedure for which various tools with different input and output formats must be combined. To automate the required data analysis, we have developed tools that facilitate a variety of important tasks and allow the construction of workflows that simplify the handling, processing and modeling of cheminformatics data and provide time- and cost-efficient solutions that are reproducible and easier to maintain. We therefore develop and present a toolbox of >25 processing modules, Enalos+ nodes, that provide very useful operations within the KNIME platform for users interested in the nanoinformatics and cheminformatics analysis of chemical and biological data. With a user-friendly interface, Enalos+ nodes provide a broad range of important functionalities, including data mining and retrieval from large available databases and tools for robust and predictive model development and validation. Enalos+ nodes are available through KNIME as add-ins and offer valuable tools for extracting useful information and analyzing experimental and virtual screening results in a chem- or nanoinformatics framework. On top of that, in an effort to (i) allow big data analysis through Enalos+ KNIME nodes, (ii) accelerate time-demanding computations performed within Enalos+ KNIME nodes and (iii) propose new time- and cost-efficient nodes integrated within the Enalos+ toolbox, we have investigated and verified the advantage of GPU calculations within the Enalos+ nodes. Demonstration data sets, tutorials and educational videos allow the user to easily understand the functions of the nodes that can be applied for in silico analysis of data.


2013 ◽  
Vol 2013 ◽  
pp. 1-3 ◽  
Author(s):  
Pantelimon-George Popescu ◽  
Florin Pop ◽  
Alexandru Herişanu ◽  
Nicolae Ţăpuş

We refine a classical logarithmic inequality using a discrete case of the Bernoulli inequality, and then we further refine two information inequalities between information measures for graphs, based on information functionals, presented by Dehmer and Mowshowitz (2010) as Theorems 4.7 and 4.8. The inequalities refer to entropy-based measures of network information content and have a great impact on information processing in complex networks (a subarea of research in modeling of complex systems).


Author(s):  
Mária Ždímalová ◽  
Tomáš Bohumel ◽  
Katarína Plachá-Gregorovská ◽  
Peter Weismann ◽  
Hisham El Falougy

2005 ◽  
Vol 20 (2) ◽  
pp. 117-125 ◽  
Author(s):  
MICHAEL LUCK ◽  
EMANUELA MERELLI

The scope of the Technical Forum Group (TFG) on Agents in Bioinformatics (BIOAGENTS) was to inspire collaboration between the agent and bioinformatics communities with the aim of creating an opportunity to propose a different (agent-based) approach to the development of computational frameworks both for data analysis in bioinformatics and for system modelling in computational biology. During the day, the participants examined the future of research on agents in bioinformatics primarily through 12 invited talks selected to cover the most relevant topics. From the discussions, it became clear that there are many perspectives on the field, ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages for use by information agents, and to the use of Grid agents, each of which requires further exploration. The interactions between participants encouraged the development of applications that describe a way of creating agent-based simulation models of biological systems, starting from a hypothesis and inferring new knowledge (or relations) by mining and analysing the huge amount of public biological data. In this report we summarize and reflect on the presentations and discussions.


2017 ◽  
Vol 28 (7) ◽  
pp. 954-966 ◽  
Author(s):  
Colin Bannard ◽  
Marla Rosner ◽  
Danielle Matthews

Of all the things a person could say in a given situation, what determines what is worth saying? Greenfield’s principle of informativeness states that right from the onset of language, humans selectively comment on whatever they find unexpected. In this article, we quantify this tendency using information-theoretic measures and report on a study in which we tested the counterintuitive prediction that children will produce words that have a low frequency given the context, because these will be most informative. Using corpora of child-directed speech, we identified adjectives that varied in how informative (i.e., unexpected) they were given the noun they modified. In an initial experiment (N = 31) and in a replication (N = 13), 3-year-olds heard an experimenter use these adjectives to describe pictures. The children’s task was then to describe the pictures to another person. As the information content of the experimenter’s adjective increased, so did children’s tendency to comment on the feature that adjective had encoded. Furthermore, our analyses suggest that children balance informativeness with a competing drive to ease production.
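
One common way to quantify how unexpected an adjective is given its noun is surprisal, -log2 P(adjective | noun), estimated from co-occurrence counts. The sketch below assumes that measure. The abstract does not state the exact formula the authors used, and the function name and the counts are hypothetical rather than taken from any corpus.

```python
from collections import Counter
from math import log2

def surprisal(pair_counts, adjective, noun):
    """Return -log2 P(adjective | noun) in bits, from raw (adjective, noun) counts.

    No smoothing is applied, so unseen pairs are rejected rather than guessed.
    """
    noun_total = sum(c for (_, n), c in pair_counts.items() if n == noun)
    pair_total = pair_counts[(adjective, noun)]
    if noun_total == 0 or pair_total == 0:
        raise ValueError("pair or noun not observed in the counts")
    return -log2(pair_total / noun_total)

# Hypothetical adjective-noun counts (not real corpus data).
pairs = Counter({
    ("red", "apple"): 6,
    ("green", "apple"): 1,
    ("purple", "apple"): 1,
})

# "purple" occurs with "apple" 1 time in 8, so it carries 3 bits.
print(surprisal(pairs, "red", "apple"))
print(surprisal(pairs, "purple", "apple"))
```

Under this measure, the rarer the adjective is for a given noun, the more bits it carries, which matches the abstract's use of "informative" to mean "unexpected".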

