On the Resolution of Compositional Datasets into Convex Combinations of Extreme Vectors

2021
Author(s): Ross Martyn Renner

<p>Large compositional datasets of the kind assembled in the geosciences are often of remarkably low approximate rank. That is, within a tolerable error, the data points representing the rows of such an array can be located, to a good approximation, in a subspace of the row space of relatively small dimension. A physical mixing process that would account for this phenomenon implies that each observation vector of the array can be estimated by a convex combination of a small number of fixed source or 'endmember' vectors. In practice, neither the compositions of the endmembers nor the coefficients of the convex combinations are known. Traditional methods for attempting to estimate some or all of these quantities have included Q-mode 'factor' analysis and linear programming. In general, neither method is successful. This thesis examines some of the more important mathematical properties of a convex representation of compositional data, as well as the background to the development of algorithms for statistically assessing the number of endmembers, locating endmembers, and partitioning geological samples into specified endmembers. Keywords and Phrases: Compositional data, convex sets, endmembers, partitioning by least squares, iteration, logratios.</p>
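When the endmember compositions are known, the convex-combination coefficients for a single sample can be recovered by constrained least squares. A minimal sketch (not the thesis's partitioning algorithm; the endmember matrix below is invented for illustration), using non-negative least squares with a heavily weighted extra row enforcing the sum-to-one constraint:

```python
import numpy as np
from scipy.optimize import nnls

def unmix(sample, endmembers, w=1e3):
    # Solve min ||E^T a - x|| subject to a >= 0, with an appended,
    # heavily weighted row pushing sum(a) toward 1 (simplex constraint).
    A = np.vstack([endmembers.T, w * np.ones(endmembers.shape[0])])
    b = np.concatenate([sample, [w]])
    coeffs, _ = nnls(A, b)
    return coeffs

# Two hypothetical endmember compositions (rows are fractions summing to 1).
E = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
x = 0.25 * E[0] + 0.75 * E[1]   # a synthetic mixture of the two endmembers
a = unmix(x, E)
print(np.round(a, 3))           # approximately [0.25, 0.75]
```

Because the synthetic mixture lies exactly in the convex hull of the endmembers, the recovered coefficients match the mixing proportions; with noisy field data the fit is approximate.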


Author(s): Samuel Melton, Sharad Ramanathan

Abstract
Motivation: Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions.
Results: We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace.
Availability and implementation: https://github.com/smelton/SMD.
Supplementary information: Supplementary data are available at Bioinformatics online.
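The core idea, averaging linear discriminators over an ensemble of proposed cluster configurations to score features, can be sketched roughly as follows. This is a toy illustration, not the authors' SMD implementation (see the linked repository for the real method); the data and parameters are invented:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def feature_scores(X, n_rounds=20, k=2):
    """Average |coefficients| of linear discriminators trained on an
    ensemble of proposed cluster configurations (rough sketch only)."""
    scores = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        # Propose a cluster configuration from a random feature subset.
        subset = rng.choice(X.shape[1], size=max(2, X.shape[1] // 2),
                            replace=False)
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=0).fit_predict(X[:, subset])
        # Train a discriminator on all features; accumulate its weights.
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        scores += np.abs(clf.coef_).sum(axis=0)
    return scores / n_rounds

# Toy data: feature 0 separates two clusters, feature 1 is pure noise.
X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)),
               rng.normal([4, 0], 0.3, (50, 2))])
s = feature_scores(X)
print(s.argmax())  # feature 0 should score highest
```

Features that consistently carry discriminative weight across many proposed clusterings are the candidates for the informative subspace.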


2012
Vol 19 (4)
pp. 649-664
Author(s): Agnieszka Dołhańczuk-Śródka, Zbigniew Ziembik, Jan Kříž, Lidmila Hyšplerova, Maria Wacławek

Abstract The fruiting bodies of fungi, which sprout from mycelium, are capable of accumulating significant amounts of trace elements, both metals and metalloids. The content of these elements in fruiting bodies may exceed their concentration in the substrate in which the fungi develop. Radioactive nuclides are also among these elements. In this work, the health risk posed by the increased radioactivity dose absorbed through consumption of Xerocomus badius (bay bolete) was estimated. The analysis considered the concentrations of the radioactive isotopes 137Cs and 40K. It was found that moderate ingestion of bay bolete does not create a health risk from increased intake of radioactive substances: consuming a quantity of mushrooms large enough to deliver a dose exceeding the safe limit is improbable in real life. Possible relationships between the concentrations of radioactive isotopes and those of common alkali and alkaline earth metals were investigated using methods designed for compositional data analysis. No clear relationships between 137Cs, Ca, K and Mg concentrations in samples of bay bolete were found, and a significant influence of outlying data points on the statistical inference was noted.


Author(s): Sylvain Billiard, Maxime Derex, Ludovic Maisonneuve, Thomas Rey

Understanding how knowledge emerges and propagates within groups is crucial to explain the evolution of human populations. In this work, we introduce a mathematically oriented model that draws on individual-based approaches, inhomogeneous Markov chains and learning algorithms, such as those introduced in [F. Cucker and S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc. 39 (2002) 1–49; F. Cucker, S. Smale and D. X. Zhou, Modeling language evolution, Found. Comput. Math. 4 (2004) 315–343]. After deriving the model, we study some of its mathematical properties, and establish theoretical and quantitative results in a simplified case. Finally, we run numerical simulations to illustrate some properties of the model. Our main result is that, as time goes to infinity, individuals’ knowledge can converge to a common shared knowledge that was not present in the convex combination of initial individuals’ knowledge.
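For contrast, the classical DeGroot model of repeated convex averaging always converges to a consensus inside the convex hull of the initial states; the result above is notable precisely because the authors' learning dynamics can produce shared knowledge outside that hull. A minimal sketch of the classical baseline (the weight matrix is invented for illustration):

```python
import numpy as np

def degroot(states, W, steps=100):
    # Each step replaces every individual's state by a convex
    # combination of all states (rows of W are convex weights).
    for _ in range(steps):
        states = W @ states
    return states

W = np.array([[0.6, 0.4],
              [0.3, 0.7]])   # row-stochastic influence weights
x0 = np.array([0.0, 1.0])    # initial "knowledge" of two individuals
x = degroot(x0, W)
print(x)  # both entries converge to a common value in [0, 1]
```

The consensus value is a convex combination of the initial states (here weighted by the stationary distribution of W), which is exactly the limitation the paper's model overcomes.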


1992
Vol 56 (385)
pp. 469-475
Author(s): H. R. Rollinson

Abstract
Compositional data—that is, data where concentrations are expressed as proportions of a whole, such as percentages or parts per million—have a number of peculiar mathematical properties which make standard statistical tests unworkable. In particular, correlation analysis can produce geologically meaningless results. Aitchison (1986) proposed a log-ratio transformation of compositional data which allows inter-element relationships to be investigated. This method was applied to two sets of geochemical data—basalts from Kilauea Iki lava lake and granitic gneisses from the Limpopo Belt—and geologically 'sensible' results were obtained. Geochemists are encouraged to adopt the Aitchison method of data analysis in preference to the traditional but invalid approach based on raw compositional data.
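The log-ratio transformation referred to above is, in its centred form (clr), simple to compute: take the log of each part divided by the geometric mean of all parts. A minimal sketch (the sample composition is invented for illustration):

```python
import numpy as np

def clr(composition):
    """Centred log-ratio transform (Aitchison 1986). Working in clr
    coordinates avoids the spurious correlations induced by the
    constant-sum constraint of raw percentages."""
    x = np.asarray(composition, dtype=float)
    g = np.exp(np.mean(np.log(x), axis=-1, keepdims=True))
    return np.log(x / g)

# A composition in percent (parts sum to 100).
sample = np.array([49.0, 15.0, 11.0, 10.0, 15.0])
z = clr(sample)
print(z.sum())  # clr coordinates sum to zero (up to float error)
```

Correlations and other standard statistics are then computed on the clr coordinates rather than on the raw proportions.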


1975
Vol 7 (4)
pp. 818-829
Author(s): R. E. Miles

A simple direct proof is given of Minkowski's result that the mean length of the orthogonal projection of a convex set in E^3 onto an isotropic random line is (2π)^(-1) times the integral of mean curvature over its surface. This proof is generalised to a correspondingly direct derivation of an analogous formula for the mean projection of a convex set in E^n onto an isotropic random s-dimensional subspace of E^n. (The standard derivation of this, and a companion formula, to be found in Bonnesen and Fenchel's classic book on convex sets, is most indirect.) Finally, an alternative short inductive derivation (due to Matheron) of both formulae, by way of Steiner's formula, is presented.
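In modern notation, Minkowski's result reads as follows (a sketch; normalisation conventions for the integral of mean curvature vary between texts, and this follows the (2π)^(-1) form stated above):

```latex
% Mean length of the orthogonal projection of a convex body K in E^3
% onto an isotropic random line \ell, where M is the integral of mean
% curvature over the boundary surface \partial K:
\mathbb{E}\bigl[\,\mathrm{length}(K \mid \ell)\,\bigr]
  \;=\; \frac{M}{2\pi},
\qquad
M \;=\; \int_{\partial K} \tfrac{1}{2}\,(\kappa_1 + \kappa_2)\,\mathrm{d}S,
```

where κ₁ and κ₂ are the principal curvatures of the boundary surface.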


Author(s): Shan-Wen Zhang, Xianfeng Wang, Chuanlei Zhang

A novel supervised dimensionality reduction method called orthogonal maximum margin discriminant projection (OMMDP) is proposed to cope with high-dimensional, complex, varied, irregularly shaped plant leaf image data. OMMDP aims at learning a linear transformation: after projecting the original data into a low-dimensional subspace by OMMDP, data points of the same class lie as near to each other as possible while data points of different classes lie as far apart as possible, thus enhancing classification ability. The main differences from linear discriminant analysis (LDA), discriminant locality preserving projections (DLPP) and other supervised manifold-learning-based methods are as follows: (1) In OMMDP, the Warshall algorithm is first applied to construct both the must-link and class-class scatter matrices, a process that is implemented easily and quickly without judging whether any pair of points belongs to the same class. (2) The neighborhood density is defined to construct the objective function of OMMDP, which makes OMMDP robust to noise and outliers. Experimental results on two public plant leaf databases clearly demonstrate the effectiveness of the proposed method for classifying leaf images.
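OMMDP itself does not appear to be available as a packaged implementation, but the shared goal of supervised linear projection (same-class points near, different classes far) can be illustrated with classical LDA, which the abstract uses as its main point of comparison. A baseline sketch on invented toy data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Toy stand-in for high-dimensional leaf features: 3 classes in 10-D,
# separable along the first two axes.
X = np.vstack([rng.normal(m, 0.5, (30, 10))
               for m in ([0] * 10, [3] + [0] * 9, [0, 3] + [0] * 8)])
y = np.repeat([0, 1, 2], 30)

# LDA learns a linear map into an (n_classes - 1)-dimensional subspace
# where classes separate; OMMDP pursues the same goal with added
# orthogonality and neighborhood-density robustness constraints.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
Z = lda.transform(X)
print(Z.shape)  # (90, 2)
```

In the projected 2-D space a simple classifier separates the three toy classes almost perfectly, which is the property OMMDP strengthens for noisy, irregular leaf data.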


2019
Vol 62 (5)
pp. 1961-2009
Author(s): Mieczysław A. Kłopotek

Abstract The widely discussed and applied Johnson–Lindenstrauss (JL) Lemma has an existential form saying that for each set of data points Q in n-dimensional space, there exists a transformation f into an n′-dimensional space (n′ < n) such that for each pair u, v ∈ Q, (1 − δ)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + δ)‖u − v‖² for a user-defined error parameter δ. Furthermore, it is asserted that with some finite probability the transformation f may be found as a random projection (with scaling) onto the n′-dimensional subspace, so that after sufficiently many repetitions of random projection, f will be found with user-defined success rate 1 − ε. In this paper, we make a novel use of the JL Lemma. We prove a theorem stating that we can choose the target dimensionality in a random-projection-type JL linear transformation in such a way that, with probability 1 − ε, all data points from Q fall into the predefined error range δ for any user-predefined failure probability ε when performing a single random projection. This result is important for applications such as data clustering, where we want an a priori dimensionality-reducing transformation instead of attempting a (large) number of them, as with the traditional Johnson–Lindenstrauss Lemma. Furthermore, we investigate the important issue of whether the projection according to the JL Lemma is really useful when conducting data processing, that is, whether the solutions to the clustering problem in the projected space apply to the original space. In particular, we take a closer look at the k-means algorithm and prove that a good solution in the projected space is also a good solution in the original space.
Furthermore, under proper assumptions, local optima in the original space are also local optima in the projected space. We also investigate the broader issue of preserving clusterability under JL Lemma projection. We define the conditions under which the clusterability property of the original space is transmitted to the projected space, so that a broad class of clustering algorithms for the original space is applicable in the projected space.
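A single Gaussian random projection with JL scaling can be checked empirically. This sketch uses arbitrary sizes for illustration; the paper's theorem concerns how to choose the target dimension n′ so that one projection suffices with the desired probability:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(X, n_target):
    # Gaussian random projection with the standard JL scaling
    # 1/sqrt(n_target), so squared distances are preserved in expectation.
    n = X.shape[1]
    R = rng.normal(size=(n, n_target)) / np.sqrt(n_target)
    return X @ R

# Measure distance distortion over all pairs of random points.
X = rng.normal(size=(50, 1000))
Y = random_projection(X, 200)

orig = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
proj = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
mask = orig > 0
ratio = proj[mask] / orig[mask]
print(ratio.min(), ratio.max())  # distortions concentrate near 1
```

Shrinking n_target widens the spread of the distortion ratios, which is the trade-off the paper's choice of target dimensionality makes precise.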


2018
Vol 225
pp. 06023
Author(s): Samsul Ariffin Bin Abdul Karim, Azizan Saaban

Scattered data techniques are important for visualizing geometrical images of surface data, especially for terrain, earthquake, geochemical distribution, rainfall, etc. The main objective of this study is to visualize terrain data by using cubic Ball triangular patches. In the first step, the terrain data are triangulated using Delaunay triangulation. Then partial derivatives are estimated at the data points. A sufficient condition for C1 continuity is derived for each triangle. Finally, a convex combination of three rational local schemes is used to construct the surface. The scheme is tested by visualizing terrain data collected in the central region of Malaysia.
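The first two steps (triangulate the scattered sites, then interpolate over the triangles) can be sketched with SciPy. The C1 cubic Ball patches and the rational convex-combination blend are beyond this snippet, so a piecewise-linear interpolant stands in; the height data are invented, not the Malaysian survey data:

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(2)

# Toy scattered terrain sites (x, y) with heights z.
xy = rng.random((30, 2))
z = np.sin(3 * xy[:, 0]) * np.cos(3 * xy[:, 1])

# Step 1: Delaunay-triangulate the sites.
tri = Delaunay(xy)

# Baseline surface: piecewise-linear (C^0) interpolation over the
# triangulation; the paper replaces this with C^1 rational cubic Ball
# patches blended by a convex combination.
surf = LinearNDInterpolator(tri, z)
print(bool(np.isclose(surf(xy[0, 0], xy[0, 1]), z[0])))  # reproduces data
```

Any interpolant built this way reproduces the heights at the data sites; the paper's contribution is making the surface C1-smooth across triangle edges.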


