Information-theoretic measures of uncertainty for interval-set decision tables

Author(s):  
Yimeng Zhang ◽  
Xiuyi Jia ◽  
Zhenmin Tang
Author(s):  
Ryan Ka Yau Lai ◽  
Youngah Do

This article explores a method of creating confidence bounds for information-theoretic measures in linguistics, such as entropy, Kullback-Leibler Divergence (KLD), and mutual information. We show that a useful measure of uncertainty can be derived from simple statistical principles, namely the asymptotic distribution of the maximum likelihood estimator (MLE) and the delta method. Three case studies from phonology and corpus linguistics are used to demonstrate how to apply it and examine its robustness against common violations of its assumptions in linguistics, such as insufficient sample size and non-independence of data points.
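As an illustration of the idea in this abstract, the sketch below computes a large-sample confidence interval for plug-in Shannon entropy. It is not the authors' code. It assumes independent multinomial counts and applies the standard delta-method variance for the MLE of entropy, Var(H) ≈ (Σ p_i (ln p_i)^2 − H^2) / n. The function name `entropy_with_ci` and the default `z=1.96` are illustrative choices.

```python
import math


def entropy_with_ci(counts, z=1.96):
    """Plug-in entropy (in nats) with a delta-method confidence interval.

    Assumes the counts come from n independent multinomial draws.
    """
    n = sum(counts)
    # MLE of the category probabilities; empty categories contribute nothing.
    probs = [c / n for c in counts if c > 0]
    h = -sum(p * math.log(p) for p in probs)
    # Delta-method variance: (sum p_i (ln p_i)^2 - H^2) / n.
    second_moment = sum(p * math.log(p) ** 2 for p in probs)
    variance = max(second_moment - h ** 2, 0.0) / n  # guard against rounding
    half_width = z * math.sqrt(variance)
    return h, h - half_width, h + half_width


h, lower, upper = entropy_with_ci([90, 10])
print(f"H = {h:.3f} nats, approx. 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The interval collapses to a point when the estimated distribution is exactly uniform, which is one of the situations where the asymptotic approximation is least trustworthy, along with small samples.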


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
James M. Kunert-Graf ◽  
Nikita A. Sakhanenko ◽  
David J. Galas

Background: Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large. Results: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP–SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with a large number of samples. Conclusions: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
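For orientation, the sketch below shows the naive form of permutation testing that this abstract describes as expensive: the phenotype labels are shuffled and the count table is rebuilt on every iteration. It does not implement the count-table transformation the authors propose, and it is not taken from their repository. Mutual information between a single genotype vector and the phenotype stands in for the multivariable measures, and the names `mutual_information` and `permutation_p_value` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) estimated from a frequency count table."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # the count table: the costly step
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def permutation_p_value(genotypes, phenotypes, n_permutations=1000, seed=0):
    """Naive permutation test: rebuild the count table after every shuffle."""
    rng = random.Random(seed)
    observed = mutual_information(genotypes, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # permute the phenotype labels
        if mutual_information(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


genotypes = [0] * 20 + [1] * 20
phenotypes = [0] * 20 + [1] * 20
print(permutation_p_value(genotypes, phenotypes))
```

Each iteration here costs time proportional to the number of samples, because the table is recounted from scratch. That per-iteration recount is the bottleneck the abstract reports eliminating.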


Author(s):  
Laurie Beth Feldman ◽  
Vidhushini Srinivasan ◽  
Rachel B. Fernandes ◽  
Samira Shaikh

Twitter data from a crisis that impacted many English–Spanish bilinguals show that the direction of codeswitches is associated with the statistically documented tendency of single speakers to prefer one language over another in their tweets, as gleaned from their tweeting history. Further, lexical diversity, a measure of vocabulary richness derived from information-theoretic measures of uncertainty in communication, is greater in proximity to a codeswitch than in productions remote from a switch. The prospects of a role for lexical diversity in characterizing the conditions for a language switch suggest that communicative precision may induce conditions that attenuate constraints against language mixing.


Risks ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 89
Author(s):  
Muhammad Sheraz ◽  
Imran Nasir

The volatility analysis of stock returns data is paramount in financial studies. We investigate the dynamics of volatility and randomness of the Pakistan Stock Exchange (PSX-100) and obtain insights into the behavior of investors before and during the coronavirus disease (COVID-19) pandemic. The paper aims to present volatility estimates and a quantification of the randomness of PSX-100. The methodology includes two approaches: (i) the implementation of EGARCH, GJR-GARCH, and TGARCH models to estimate the volatilities; and (ii) analysis of randomness in the volatility series, return series, and PSX-100 closing prices for the pre-pandemic and pandemic periods using Shannon, Tsallis, approximate, and sample entropies. Volatility modeling suggests the existence of the leverage effect in both periods of study. The results obtained using GARCH modeling reveal that stock market volatility increased during the pandemic period. However, information-theoretic results based on Shannon and Tsallis entropies do not suggest notable variation in the estimated volatility series and closing prices. We have examined regularity and randomness based on approximate entropy and sample entropy, and find that both entropies are extremely sensitive to the choice of parameters.
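To make two of the measures named in this abstract concrete, the sketch below computes Shannon and Tsallis entropy for a discrete probability vector. It uses the textbook definitions only. How the paper bins returns or volatilities into probabilities is not stated here, so the input vector and the function names are assumptions for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in nats: -sum p_i ln p_i."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tsallis_entropy(probs, q):
    """Tsallis entropy: (1 - sum p_i^q) / (q - 1), for q != 1."""
    if q == 1:
        # The q -> 1 limit of Tsallis entropy is Shannon entropy.
        return shannon_entropy(probs)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


uniform = [0.25, 0.25, 0.25, 0.25]
print(shannon_entropy(uniform))       # ln 4
print(tsallis_entropy(uniform, q=2))  # 1 - 4 * (1/16) = 0.75
```

The entropic index `q` is a free parameter, so any comparison across periods depends on the value chosen for it.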


Author(s):  
Ardeshir Raihanian Mashhadi ◽  
Sara Behdad

Complexity has been one of the focal points of attention in the supply chain management domain, as it deteriorates the performance of the supply chain and makes controlling it problematic. The complexity of supply chains has increased significantly over the past couple of decades. Meanwhile, Additive Manufacturing (AM) not only revolutionizes the way products are made, but also brings a paradigm shift to the whole production system. The influence of AM extends to product design and the supply chain as well. The unique capabilities of AM suggest that this manufacturing method can significantly affect supply chain complexity. Greater product complexity and demand heterogeneity, faster production cycles, higher levels of automation, and shorter supply paths are among the features of additive manufacturing that can directly influence supply chain complexity. Comparing the complexity of an additive manufacturing supply chain with that of its traditional counterpart requires a profound comprehension of the transformative effects of AM on the supply chain. This paper first extracts the possible effects of AM on the supply chain and then tries to connect these effects to the drivers of complexity under three main categories: 1) market, 2) manufacturing technology, and 3) supply, planning and infrastructure. Possible impacts of additive manufacturing adoption on supply chain complexity have been studied using information-theoretic measures. An Agent-based Simulation (ABS) model has been developed to study and compare two different supply chain configurations. The findings of this study suggest that the adoption of AM can decrease supply chain complexity, particularly when product customization is considered.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Renita Murimi

Cities are microcosms representing a diversity of human experience. The complexity of urban systems arises from this diversity, where the services that cities offer to their inhabitants have to be tailored for their unique requirements. This paper studies the complexity of urban environments in terms of the assimilation of its communities. We examine the urban assimilation complexity with respect to the foreignness between communities and formalize the level of complexity using information-theoretic measures. Our findings contribute to a sociological perspective of the relationship between urban complex systems and the diversity of communities that make up urban systems.


2017 ◽  
Vol 28 (7) ◽  
pp. 954-966 ◽  
Author(s):  
Colin Bannard ◽  
Marla Rosner ◽  
Danielle Matthews

Of all the things a person could say in a given situation, what determines what is worth saying? Greenfield’s principle of informativeness states that right from the onset of language, humans selectively comment on whatever they find unexpected. In this article, we quantify this tendency using information-theoretic measures and report on a study in which we tested the counterintuitive prediction that children will produce words that have a low frequency given the context, because these will be most informative. Using corpora of child-directed speech, we identified adjectives that varied in how informative (i.e., unexpected) they were given the noun they modified. In an initial experiment (N = 31) and in a replication (N = 13), 3-year-olds heard an experimenter use these adjectives to describe pictures. The children’s task was then to describe the pictures to another person. As the information content of the experimenter’s adjective increased, so did children’s tendency to comment on the feature that adjective had encoded. Furthermore, our analyses suggest that children balance informativeness with a competing drive to ease production.
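One common way to quantify how unexpected an adjective is given its noun is surprisal, −log2 P(adjective | noun), estimated from corpus counts. The sketch below illustrates that general measure and is not necessarily the exact one used in the study. The adjective-noun pairs are toy data invented for the example, and the function name `surprisal_bits` is hypothetical.

```python
import math
from collections import Counter


def surprisal_bits(pairs, adjective, noun):
    """Surprisal of an adjective given a noun, in bits: -log2 P(adj | noun).

    `pairs` is a list of (adjective, noun) tuples from a corpus.
    Raises ValueError if the pair never occurs (probability zero).
    """
    pair_counts = Counter(pairs)
    noun_counts = Counter(n for _, n in pairs)
    joint = pair_counts[(adjective, noun)]
    if joint == 0:
        raise ValueError("pair not observed; surprisal is unbounded")
    return -math.log2(joint / noun_counts[noun])


# Toy counts, invented for illustration only.
toy_pairs = [("red", "apple")] * 3 + [("green", "apple")]
print(surprisal_bits(toy_pairs, "red", "apple"))    # frequent pairing, low surprisal
print(surprisal_bits(toy_pairs, "green", "apple"))  # rarer pairing, 2 bits
```

Under this measure, the less frequent an adjective is in the context of a noun, the more information it carries, which matches the direction of the prediction tested in the abstract.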

