Distributivity, a General Information Theoretic Network Measure, or Why the Whole is More than the Sum of Its Parts

Author(s):  
Roland Somogyi ◽  
Stefanie Fuhrman

2021 ◽  
Vol 118 (49) ◽  
pp. e2025993118
Author(s):  
Francis Mollica ◽  
Geoff Bacon ◽  
Noga Zaslavsky ◽  
Yang Xu ◽  
Terry Regier ◽  
...  

Functionalist accounts of language suggest that forms are paired with meanings in ways that support efficient communication. Previous work on grammatical marking suggests that word forms have lengths that enable efficient production, and work on the semantic typology of the lexicon suggests that word meanings represent efficient partitions of semantic space. Here we establish a theoretical link between these two lines of work and present an information-theoretic analysis that captures how communicative pressures influence both form and meaning. We apply our approach to the grammatical features of number, tense, and evidentiality and show that the approach explains both which systems of feature values are attested across languages and the relative lengths of the forms for those feature values. Our approach shows that general information-theoretic principles can capture variation in both form and meaning across languages.
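
The link between form length and efficient production can be made concrete with a toy calculation (ours, not the paper's actual model): under an information-theoretically efficient code, a form's length tracks the surprisal of its meaning, so frequent feature values receive short forms. A minimal sketch, using hypothetical probabilities for a three-way grammatical number system:

```python
import math

# Toy illustration: under an efficient code, the length of a form tracks the
# surprisal -log2 p of the meaning it encodes (frequent values -> short forms).
# The feature values and probabilities below are hypothetical, not from the paper.
number_system = {"singular": 0.70, "plural": 0.25, "dual": 0.05}

for value, p in number_system.items():
    surprisal = -math.log2(p)  # information content in bits
    print(f"{value:>8}: p={p:.2f}  optimal form length ~ {surprisal:.2f} bits")

# The expected code length (the quantity an efficient system minimizes) is
# bounded below by the entropy of the distribution over feature values.
entropy = -sum(p * math.log2(p) for p in number_system.values())
print(f"entropy (lower bound on mean form length): {entropy:.2f} bits")
```

On these made-up numbers, the dual receives a form roughly eight times as informative (and hence longer, under an efficient code) than the singular, mirroring the attested tendency for rarer feature values to carry longer markers.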


2014 ◽  
Vol 26 (8) ◽  
pp. 1717-1762 ◽  
Author(s):  
Gang Niu ◽  
Bo Dai ◽  
Makoto Yamada ◽  
Masashi Sugiyama

We propose a general information-theoretic approach to semi-supervised metric learning called SERAPH (SEmi-supervised metRic leArning Paradigm with Hypersparsity) that does not rely on the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize its entropy on labeled data and minimize its entropy on unlabeled data, following entropy regularization. For metric learning, entropy regularization improves on manifold regularization by considering the dissimilarity information of unlabeled data in the unsupervised part, and hence it allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Moreover, we regularize SERAPH with trace-norm regularization to encourage low-dimensional projections associated with the distance metric. The nonconvex optimization problem of SERAPH can be solved efficiently and stably by either a gradient projection algorithm or an EM-like iterative algorithm whose M-step is convex. Experiments demonstrate that SERAPH compares favorably with many well-known metric learning methods, and the learned Mahalanobis distance possesses high discriminability even in noisy environments.
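
A minimal sketch of the ingredients named in the abstract, under assumed parameterizations (the sigmoid link, the objective weights `gamma` and `lam`, and all helper names are ours, not the authors'): a pairwise similarity probability governed by a Mahalanobis distance, entropy and trace-norm regularizers, and the PSD projection that forms the feasibility step of a gradient projection method.

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance (xi - xj)^T M (xi - xj)."""
    d = xi - xj
    return float(d @ M @ d)

def pair_prob(xi, xj, M, eta=1.0):
    """Assumed sigmoid link: probability that a pair is 'similar',
    decreasing in the Mahalanobis distance (our choice, for illustration)."""
    return 1.0 / (1.0 + np.exp(mahalanobis_sq(xi, xj, M) - eta))

def binary_entropy(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def project_psd(M):
    """Feasibility step of the gradient projection algorithm: clip negative
    eigenvalues so M remains a valid (positive semidefinite) metric."""
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    return V @ np.diag(np.maximum(w, 0.0)) @ V.T

def objective(M, labeled, unlabeled, gamma=0.5, lam=0.1):
    """SERAPH-style objective sketch: fit on labeled pairs, entropy
    regularization on unlabeled pairs, and a trace penalty (the trace norm
    of a PSD matrix) to encourage a low-rank metric. The labeled-data term
    here is a likelihood fit; the paper's exact labeled term differs."""
    ll = sum(np.log(pair_prob(xi, xj, M) if y == 1 else 1.0 - pair_prob(xi, xj, M))
             for xi, xj, y in labeled)
    ent = sum(binary_entropy(pair_prob(xi, xj, M)) for xi, xj in unlabeled)
    return ll - gamma * ent - lam * np.trace(M)
```

A gradient projection loop would alternate a gradient ascent step on `objective` with `project_psd` to keep the metric valid; the trace penalty is what drives the low-dimensional projections the abstract mentions.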


Entropy ◽  
2020 ◽  
Vol 22 (5) ◽  
pp. 577
Author(s):  
Serena Di Santo ◽  
Vanni De Luca ◽  
Alessio Isaja ◽  
Sara Andreetta

Recently, there has been increasing interest in techniques for enhancing working memory (WM), casting new light on the classical picture of a rigid system. One reason is that WM performance has been associated with intelligence and reasoning, while its impairment correlates with cognitive deficits; the possibility of training it is therefore highly appealing. However, results on WM changes following training are controversial, leaving it unclear whether WM can really be potentiated. This study assesses changes in WM performance by comparing performance with and without training by a professional mnemonist. Two groups, experimental and control, participated in the study, which was organized in two phases. In the morning, both groups were familiarized with the stimuli through an N-back task and then attended a 2-hour lecture. For the experimental group, the lecture, given by the mnemonist, introduced memory encoding techniques; for the control group, it was a standard academic lecture about memory systems. In the afternoon, both groups were administered five tests in which they had to recall the positions of 16 items, probed in random order. The results show much better performance in the trained subjects, indicating that this possibility of enhancement should be considered, alongside general information-theoretic constraints, when theorizing about WM span.
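
The closing reference to information-theoretic constraints can be illustrated with a back-of-the-envelope calculation (ours, not the study's): perfectly reporting the positions of 16 items distinguishes among 16! possible arrangements, which is a surprisingly large information load.

```python
import math

# Back-of-the-envelope information load of the recall task described above:
# reporting the positions of 16 items perfectly distinguishes among 16!
# arrangements. (This calculation is illustrative, not from the paper.)
n_items = 16
bits = math.log2(math.factorial(n_items))
print(f"log2({n_items}!) = {bits:.1f} bits")  # ~44.3 bits

# Compared with classical WM span estimates of only a few items, this gap
# is what mnemonic encoding techniques must somehow bridge.
```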


2006 ◽  
Vol 45 (02) ◽  
pp. 173-179
Author(s):  
I. Vajda ◽  
J. Zvárová

Summary
Objectives: General information-theoretic concepts such as f-divergence, f-information, and f-entropy are applied to genetic models in which genes are characterized by randomly distributed alleles. The paper thus presents an information-theoretic background for measuring genetic distances between populations, genetic information in various observations on individuals about their alleles, and, finally, genetic diversities in various populations.
Methods: Genetic distances were derived as divergences between the frequencies of alleles representing a gene in two different populations. Genetic information was derived as a measure of statistical association between the observations taken on individuals and the alleles of these individuals. Genetic diversities were derived from divergences and information.
Results: The concept of genetic f-information introduced in the paper appears to be new. We show that the measures of genetic distance and diversity used in the previous literature are special cases of the genetic f-divergence and f-diversity introduced in the paper and illustrated by examples. We also display intimate connections between the genetic f-information and the genetic f-divergence on the one hand and genetic f-diversity on the other. The examples also illustrate practical computations and applications of the concepts of quantitative genetics introduced in the paper.
Conclusions: We discussed a general class of f-divergence measures that are suitable measures of genetic distance between populations characterized by concrete allele frequencies. We have shown that a wide class of genetic information measures, called f-information, can be obtained from f-divergences, and that a wide class of measures of genetic diversity, called f-diversities, can be obtained from the f-divergences and f-information.
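
A minimal sketch of the central quantity, the f-divergence D_f(P‖Q) = Σ_a q(a) f(p(a)/q(a)) between allele-frequency distributions. The generators and frequencies below are standard examples and hypothetical data, not values from the paper.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_a q(a) * f(p(a) / q(a)) over alleles a.
    p, q: allele frequency vectors for one gene in two populations."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

# Two standard choices of the convex generator f (examples only, not an
# exhaustive list of the generators treated in the paper):
kl = lambda t: t * np.log(t)                   # f(t) = t log t -> KL divergence
hellinger = lambda t: (np.sqrt(t) - 1.0) ** 2  # -> squared Hellinger distance
                                               #    (up to a factor of 2)

# Hypothetical frequencies of three alleles of one gene in two populations.
pop1 = [0.6, 0.3, 0.1]
pop2 = [0.4, 0.4, 0.2]

print("KL:       ", f_divergence(pop1, pop2, kl))
print("Hellinger:", f_divergence(pop1, pop2, hellinger))
```

Swapping the generator f changes which classical genetic-distance measure is recovered, which is the sense in which the measures from the previous literature arise as special cases.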


2004 ◽  
Vol 16 (12) ◽  
pp. 2483-2506 ◽  
Author(s):  
Susanne Still ◽  
William Bialek

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for the analysis of large data sets in many fields. A natural question is how many clusters are appropriate for the description of a given system. Traditional approaches to this problem are based on either a framework in which clusters of a particular shape are assumed as a model of the system or on a two-step procedure in which a clustering criterion determines the optimal assignments for a given number of clusters and a separate criterion measures the goodness of the classification to determine the number of clusters. In a statistical mechanics approach, clustering can be seen as a trade-off between energy- and entropy-like terms, with lower temperature driving the proliferation of clusters to provide a more detailed description of the data. For finite data sets, we expect that there is a limit to the meaningful structure that can be resolved and therefore a minimum temperature beyond which we will capture sampling noise. This suggests that correcting the clustering criterion for the bias that arises due to sampling errors will allow us to find a clustering solution at a temperature that is optimal in the sense that we capture maximal meaningful structure—without having to define an external criterion for the goodness or stability of the clustering. We show that in a general information-theoretic framework, the finite size of a data set determines an optimal temperature, and we introduce a method for finding the maximal number of clusters that can be resolved from the data in the hard clustering limit.
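
A minimal sketch of the statistical-mechanics picture described here, written as generic deterministic annealing (the paper's finite-sample correction for the optimal temperature is not implemented, and all names and parameter values are ours): soft assignments trade off distortion (energy) against entropy, and lowering the temperature T resolves progressively finer structure.

```python
import numpy as np

def soft_assignments(X, centers, T):
    """Boltzmann assignments p(c|x) ~ exp(-||x - c||^2 / T): the
    energy/entropy trade-off described above. High T -> near-uniform
    assignments (few effective clusters); low T -> hard assignments
    (many clusters, eventually fitting sampling noise)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k) distortions
    logits = -d2 / T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def anneal(X, k, T_start=10.0, T_min=0.01, cool=0.8, iters=20, seed=0):
    """Generic deterministic annealing: alternate soft assignment and
    centroid update while cooling. The paper's contribution, a data-size-
    dependent bound on how low T may meaningfully go, is not implemented."""
    rng = np.random.default_rng(seed)
    centers = (X[rng.choice(len(X), k, replace=False)]
               + 1e-3 * rng.standard_normal((k, X.shape[1])))
    T = T_start
    while T > T_min:
        for _ in range(iters):
            p = soft_assignments(X, centers, T)           # E-like step
            centers = (p.T @ X) / p.sum(axis=0)[:, None]  # M-like step
        T *= cool                                         # lower the temperature
    return centers, soft_assignments(X, centers, T)

# Example: two well-separated blobs. At high T the centers coincide; they
# split apart as T drops, illustrating the proliferation of clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centers, p = anneal(X, k=2)
print(centers)
```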

