Topological Information Data Analysis

Entropy, 2019, Vol 21 (9), pp. 869
Author(s): Pierre Baudot, Monica Tapia, Daniel Bennequin, Jean-Marc Goaillard

This paper presents methods that quantify the structure of statistical interactions within a given data set and that were applied in a previous article. It establishes new results on the k-multivariate mutual information (I_k), inspired by the topological formulation of information introduced in a series of studies. In particular, we show that the vanishing of all I_k for 2 ≤ k ≤ n of n random variables is equivalent to their statistical independence. Pursuing the work of Hu Kuo Ting and Te Sun Han, we show that information functions provide coordinates for binary variables, and that they are analytically independent from the probability simplex for any set of finite variables. The maximal positive I_k identifies the variables that co-vary the most in the population, whereas the minimal negative I_k identifies synergistic clusters and the variables that differentiate-segregate the most in the population. Finite data size effects and estimation biases severely constrain the effective computation of the information topology on data, and we provide simple statistical tests for the undersampling bias and the k-dependences. We give an example of application of these methods to genetic expression and unsupervised cell-type classification. The methods unravel biologically relevant subtypes, with a sample size of 41 genes and with few errors. The paper establishes generic basic methods to quantify epigenetic information storage and a unified epigenetic unsupervised learning formalism. We propose that higher-order statistical interactions and non-identically distributed variables are constitutive characteristics of biological systems that should be estimated in order to unravel their significant statistical structure and diversity. The topological information data analysis presented here allows precise estimation of this higher-order structure characteristic of biological systems.
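As an illustration of the quantity discussed in the abstract above, the following is a minimal sketch of a plug-in estimate of I_k, assuming the standard alternating-sum expression of multivariate mutual information in terms of joint entropies. The function names `entropy` and `mutual_information_k` and the toy data sets are hypothetical and chosen for this example; this is not the authors' implementation, and it ignores the undersampling bias that the abstract warns about.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(samples, idx):
    """Plug-in Shannon entropy (in bits) of the joint variable given by columns idx."""
    counts = Counter(tuple(row[i] for i in idx) for row in samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information_k(samples, idx):
    """I_k over the columns in idx, as an alternating sum of subset entropies.

    Assumed formula: I_k = sum_{i=1..k} (-1)^(i-1) * sum_{|J|=i} H(X_J).
    """
    total = 0.0
    for size in range(1, len(idx) + 1):
        sign = (-1) ** (size - 1)
        for subset in combinations(idx, size):
            total += sign * entropy(samples, subset)
    return total


# Toy data (hypothetical): every row is an equally likely joint outcome.
independent = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
xor_triple = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
copied_bit = [(a, a) for a in (0, 1)]

i3_independent = mutual_information_k(independent, (0, 1, 2))
i2_xor_pair = mutual_information_k(xor_triple, (0, 1))
i3_xor = mutual_information_k(xor_triple, (0, 1, 2))
i2_copy = mutual_information_k(copied_bit, (0, 1))
```

On these toy inputs the three independent bits give vanishing I_2 and I_3, the copied bit gives a positive I_2 (the variables co-vary), and the XOR triple gives a negative I_3 although each pair is independent, which is the kind of synergistic cluster the abstract associates with negative I_k.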

Author(s): Pierre Baudot, Monica Tapia, Jean-Marc Goaillard

This paper establishes methods that quantify the structure of statistical interactions within a given data set using the characterization of information theory in cohomology by finite methods, and provides their expression in terms of statistical physics and machine learning. Following [1–3], we show directly that k-multivariate mutual informations (I_k) are k-coboundaries. The k-cocycles are given by I_k = 0, which generalizes statistical independence to arbitrary dimension k. The topological approach makes it possible to investigate Shannon's information in the multivariate case without the assumption of independent, identically distributed variables. We develop the computationally tractable subcase of simplicial information cohomology represented by entropy H_k and information I_k landscapes. The I_1 component defines a self-internal energy functional U_k, and the (−1)^k I_k, k ≥ 2, components define the contribution of the k-body interactions to a free energy functional G_k. The set of information paths in simplicial structures is in bijection with the symmetric group and random processes; it provides a topological expression of the second law and points toward a discrete Noether theorem (first law). The local minima of free energy, related to conditional information negativity and the non-Shannonian cone of Yeung [4], characterize a minimum free energy complex. This complex formalizes the minimum free energy principle in topology, provides a definition of a complex system, and characterizes a multiplicity of local minima that quantifies the diversity observed in biology. Finite data size effects and estimation bias severely constrain the effective computation of the information topology on data, and we provide simple statistical tests for the undersampling bias and for the k-dependences following [5]. We give an example of application of these methods to genetic expression and cell-type classification. The maximal positive I_k identifies the variables that co-vary the most in the population, whereas the minimal negative I_k identifies clusters and the variables that differentiate-segregate the most. The methods unravel biologically relevant I_10 with a sample size of 41. The paper establishes generic methods to quantify epigenetic information storage and a unified epigenetic unsupervised learning formalism.
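The energy decomposition mentioned in this abstract can be written out as follows. This is a sketch of one reading consistent with the abstract, assuming the standard alternating-sum relations between joint entropies and multivariate mutual informations; the index set J and the exact sign conventions are assumptions of this note, since the abstract only states that I_1 defines U_k and that the (−1)^k I_k terms with k ≥ 2 contribute to G_k.

```latex
\begin{align}
I_k(X_1;\dots;X_k) &= \sum_{i=1}^{k} (-1)^{i-1}
    \sum_{\substack{J \subseteq \{1,\dots,k\} \\ |J| = i}} H_i(X_J), \\
H_k(X_1,\dots,X_k) &= \sum_{i=1}^{k} (-1)^{i-1}
    \sum_{\substack{J \subseteq \{1,\dots,k\} \\ |J| = i}} I_i(X_J), \\
U_k &= \sum_{j=1}^{k} I_1(X_j), \qquad
G_k = \sum_{i=2}^{k} (-1)^{i}
    \sum_{\substack{J \subseteq \{1,\dots,k\} \\ |J| = i}} I_i(X_J), \\
H_k &= U_k - G_k .
\end{align}
```

Under these assumptions, G_k equals the sum of the marginal entropies minus the joint entropy, so it vanishes exactly when the k variables are statistically independent.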


2021, Vol 4 (1)
Author(s): Ann S. Blevins, Jason Z. Kim, Dani S. Bassett

Abstract: The complex behavior of many real-world systems depends on a network of both strong and weak edges. Distinguishing between true weak edges and low-weight edges caused by noise is a common problem in data analysis, and solutions tend to either remove noise or study noise in the absence of data. In this work, we instead study how noise and data coexist, by examining the structure of noisy, weak edges that have been synthetically added to model networks. We find that the structure of low-weight, noisy edges varies according to the topology of the model network to which it is added, that at least three qualitative classes of noise structure emerge, and that these noisy edges can be used to classify the model networks. Our results demonstrate that noise does not present as a monolithic nuisance, but rather as a nuanced, topology-dependent, and even useful entity in characterizing higher-order network interactions.
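The setup described in this abstract, weak noisy edges added synthetically to a model network, can be sketched as below. Everything specific here is an assumption made for illustration: the ring-lattice model, the uniform noise distribution, the weight values, and the function names `ring_lattice` and `add_weak_noise` are hypothetical and are not taken from the article.

```python
import random


def ring_lattice(n, k, weight=1.0):
    """Symmetric weighted adjacency matrix; each node links to k neighbours per side."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i][j] = adj[j][i] = weight
    return adj


def add_weak_noise(adj, max_noise=0.1, seed=0):
    """Return a copy in which every absent edge gets a small random weight.

    Strong edges of the model network are left untouched, so the noisy,
    low-weight edges coexist with the original structure.
    """
    rng = random.Random(seed)
    n = len(adj)
    noisy = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] == 0.0:
                noisy[i][j] = noisy[j][i] = rng.uniform(0.0, max_noise)
    return noisy


model = ring_lattice(n=10, k=2)
noisy = add_weak_noise(model, max_noise=0.1)
```

Swapping `ring_lattice` for another generator would give a different model topology to which the same noise procedure can be applied, which is the comparison the abstract describes.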


Author(s): Intan Permata Sari and Indra Hartoyo

This study aimed to (1) analyze the reading exercises in the grade VIII textbook English on Sky on the basis of Bloom's taxonomy, (2) find the distribution of lower- and higher-order thinking skills in the reading exercises, and (3) explain the reasons for the levels of the reading exercises. The data analysis shows that the six levels of Bloom's taxonomy were not fully applied in the reading exercises. The creating level does not appear in the reading exercises at all, and the remembering and understanding levels are more dominant than the other levels. The higher-order thinking levels were less frequent than the lower-order thinking levels, and the six levels do not match the proportion appropriate to each level of education under Bloom's taxonomy. In particular, the absence of any question belonging to the creating level is a concern. It was concluded that the reading exercises in the English on Sky textbook cannot improve the critical thinking skills of grade VIII students.


2019
Author(s): Zacharias Kinney, Viraj Kirinda, Scott Hartley

Higher-order structure in abiotic foldamer systems represents an important but largely unrealized goal. As one approach to this challenge, covalent assembly can be used to assemble macrocycles with foldamer subunits in well-defined spatial relationships. Such systems have previously been shown to exhibit self-sorting, new folding motifs, and dynamic stereoisomerism, yet there remain important questions about the interplay between folding and macrocyclization and the effect of structural confinement on folding behavior. Here, we explore the dynamic covalent assembly of extended ortho-phenylenes (hexamer and decamer) with rod-shaped linkers. Characteristic ¹H chemical shift differences between cyclic and acyclic systems can be compared with computational conformer libraries to determine the folding states of the macrocycles. We show that the bite angle provides a measure of the fit of an o-phenylene conformer within a shape-persistent macrocycle, affecting both assembly and ultimate folding behavior. For the o-phenylene hexamer, the bite angle and conformer stability work synergistically to direct assembly toward triangular [3+3] macrocycles of well-folded oligomers. For the decamer, the energetic accessibility of conformers with small bite angles allows [2+2] macrocycles to be formed as the predominant species. In these systems, the o-phenylenes are forced into unusual folding states, preferentially adopting a backbone geometry with distinct helical blocks of opposite handedness. The results show that simple geometric restrictions can be used to direct foldamers toward increasingly complex geometries.


2019, Vol 26 (1), pp. 35-43
Author(s): Natalie K. Garcia, Galahad Deperalta, Aaron T. Wecksler

Background: Biotherapeutics, particularly monoclonal antibodies (mAbs), are a maturing class of drugs capable of treating a wide range of diseases. Therapeutic function and solution stability are linked to the proper three-dimensional organization of the primary sequence into Higher Order Structure (HOS) as well as the timescales of protein motions (dynamics). Methods that directly monitor protein HOS and dynamics are important for mapping therapeutically relevant protein-protein interactions and assessing properly folded structures. Irreversible covalent protein footprinting Mass Spectrometry (MS) tools, such as site-specific amino acid labeling and hydroxyl radical footprinting, are analytical techniques capable of monitoring the side chain solvent accessibility influenced by tertiary and quaternary structure. Here we discuss the methodology, examples of biotherapeutic applications, and the future directions of irreversible covalent protein footprinting MS in biotherapeutic research and development. Conclusion: Bottom-up mass spectrometry using irreversible labeling techniques provides valuable information for characterizing solution-phase protein structure. Examples range from epitope mapping and protein-ligand interactions to probing challenging structures of membrane proteins. By pairing these techniques with hydrogen-deuterium exchange, spectroscopic analysis, or static-phase structural data such as crystallography or electron microscopy, a comprehensive understanding of protein structure can be obtained.

