Dimension-Free Error Bounds from Random Projections

Author(s):  
Ata Kabán

Learning from high-dimensional data is challenging in general; however, the data is often not truly high-dimensional, in the sense that it may have hidden low-complexity geometry. We give new, user-friendly PAC bounds that are able to take advantage of such benign geometry to reduce the dimension-dependence of error guarantees in settings where such dependence is known to be essential in general. This is achieved by employing random projection as an analytic tool and exploiting its structure-preserving compression ability. We introduce an auxiliary function class that operates on reduced-dimensional inputs, together with a new complexity term: the distortion of the loss under random projections. The latter is a hypothesis-dependent measure of data complexity whose analytic estimates turn out to recover various regularisation schemes in parametric models, and a notion of intrinsic dimension, quantified by the Gaussian width of the input support in the case of the nearest-neighbour rule. If benign geometry is present, the bounds become tighter; otherwise they recover the original dimension-dependent bounds.
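As a rough illustration of the kind of quantity the new complexity term measures, the following sketch (assuming a squared loss, a fixed linear hypothesis, and a Gaussian projection matrix; all names and sizes are illustrative and not taken from the paper) empirically estimates how much one draw of a random projection distorts the loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional data with hidden low-complexity geometry:
# the inputs lie near a 5-dimensional subspace of R^500.
n, D, d_true, k = 200, 500, 5, 20
basis = rng.standard_normal((d_true, D))
X = rng.standard_normal((n, d_true)) @ basis + 0.01 * rng.standard_normal((n, D))
w = rng.standard_normal(D)                   # a fixed linear hypothesis
y = X @ w + 0.1 * rng.standard_normal(n)     # regression targets

# Gaussian random projection R: R^D -> R^k, scaled so that E||Rx||^2 = ||x||^2.
R = rng.standard_normal((k, D)) / np.sqrt(k)

def squared_loss(pred, y):
    return np.mean((pred - y) ** 2)

loss_original  = squared_loss(X @ w, y)
loss_projected = squared_loss((X @ R.T) @ (R @ w), y)   # data and hypothesis both projected

# Empirical distortion of the loss under this particular random projection.
print(abs(loss_projected - loss_original))
```

When the data has benign geometry, as in this synthetic example, the distortion stays small even though k is much smaller than D.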

2013
Vol 23 (2)
pp. 447-461
Author(s):  
Ewa Skubalska-Rafajłowicz

A method of change (or anomaly) detection in high-dimensional discrete-time processes using a multivariate Hotelling chart is presented. We use normal random projections as a method of dimensionality reduction, and we indicate diagnostic properties of the Hotelling control chart applied to data projected onto a random subspace of R^n. We examine the random projection method using artificial noisy image sequences as examples.
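A minimal sketch of the basic scheme, monitoring a Hotelling T^2 statistic in a randomly projected subspace, with illustrative dimensions and the usual chi-square control limit; it is not the authors' exact diagnostic procedure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

D, k = 4096, 25                                  # ambient and projected dimensions (illustrative)
R = rng.standard_normal((k, D)) / np.sqrt(k)     # normal random projection onto a subspace of R^D

# In-control reference data, e.g. vectorised noisy image frames.
X_ref = rng.standard_normal((500, D))
Z_ref = X_ref @ R.T                              # reference frames in the random subspace

mu = Z_ref.mean(axis=0)
S_inv = np.linalg.inv(np.cov(Z_ref, rowvar=False))

def hotelling_t2(x):
    """Hotelling T^2 of a new frame, computed in the projected space."""
    z = R @ x - mu
    return z @ S_inv @ z

# Approximate control limit from the chi-square distribution with k degrees of freedom.
limit = stats.chi2.ppf(0.995, df=k)

x_new = rng.standard_normal(D) + 0.2             # frame with a mean shift (a change to detect)
print(hotelling_t2(x_new), limit)                # a statistic above the limit signals a change
```

The covariance estimate and the T^2 statistic are computed entirely in the k-dimensional subspace, which is what keeps the chart tractable for high-dimensional frames.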


Author(s):  
Shreya Arya ◽  
Jean-Daniel Boissonnat ◽  
Kunal Dutta ◽  
Martin Lotz

Abstract Given a set P of n points and a constant k, we are interested in computing the persistent homology of the Čech filtration of P for the k-distance, and we investigate the effectiveness of dimensionality reduction for this problem, answering an open question of Sheehy (The persistent homology of distance functions under random projection. In Cheng, Devillers (eds), 30th Annual Symposium on Computational Geometry, SOCG'14, Kyoto, Japan, June 08–11, p 328, ACM, 2014). We show that any linear transformation that preserves pairwise distances up to a $(1\pm\varepsilon)$ multiplicative factor must preserve the persistent homology of the Čech filtration up to a factor of $(1-\varepsilon)^{-1}$. Our results also show that the Vietoris-Rips and Delaunay filtrations for the k-distance, as well as the Čech filtration for the approximate k-distance of Buchet et al. [J Comput Geom, 58:70–96, 2016], are preserved up to a $(1\pm\varepsilon)$ factor. We also prove extensions of our main theorem for point sets (i) lying in a region of bounded Gaussian width or (ii) on a low-dimensional submanifold, obtaining embeddings with the dimension bounds of Lotz (Proc R Soc A Math Phys Eng Sci, 475(2230):20190081, 2019) and Clarkson (Tighter bounds for random projections of manifolds. In Teillaud (ed), Proceedings of the 24th ACM Symposium on Computational Geometry, College Park, MD, USA, June 9–11, pp 39–48, ACM, 2008), respectively. Our results also hold in the terminal dimensionality reduction setting, where the distance from any point in the original ambient space to any point in P needs to be approximately preserved.
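The theorem applies to any $(1\pm\varepsilon)$ distance-preserving map, so a Johnson-Lindenstrauss style Gaussian projection is a natural instance. A small sketch (using scikit-learn; the point set and ε are illustrative) checks the pairwise-distance distortion which, by the result above, controls the distortion of the Čech persistence values:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.random_projection import (GaussianRandomProjection,
                                        johnson_lindenstrauss_min_dim)

rng = np.random.default_rng(2)

# Point set P: a noisy circle embedded in a high-dimensional ambient space.
n, D = 300, 5000
theta = rng.uniform(0.0, 2.0 * np.pi, n)
P = np.zeros((n, D))
P[:, 0], P[:, 1] = np.cos(theta), np.sin(theta)
P += 0.01 * rng.standard_normal((n, D))

eps = 0.25
k = johnson_lindenstrauss_min_dim(n_samples=n, eps=eps)     # JL target dimension
Q = GaussianRandomProjection(n_components=k, random_state=0).fit_transform(P)

# Pairwise distances are preserved up to (1 +/- eps) with high probability, hence the
# persistent homology of the Cech filtration is preserved up to a (1 - eps)^(-1) factor.
ratios = pdist(Q) / pdist(P)
print(k, ratios.min(), ratios.max())
```

Computing the persistence diagrams themselves would require a topology library; the point here is only that the distance distortion, which is all the theorem needs, is cheap to verify after projection.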


2021
Vol 11 (15)
pp. 6963
Author(s):  
Jan Y. K. Chan ◽  
Alex Po Leung ◽  
Yunbo Xie

Using random projection, we propose a method to speed up both kernel k-means and centroid initialization with k-means++. Motivated by upper error bounds, we approximate the kernel matrix and the distances in a lower-dimensional space R^d before kernel k-means clustering. Previous work on bounds for dot products under random projections, together with an improved bound for kernel methods, is applied to kernel k-means. The complexities of kernel k-means with Lloyd's algorithm and of centroid initialization with k-means++ are known to be O(nkD) and Θ(nkD), respectively, where n is the number of data points, D the dimensionality of the input feature vectors, and k the number of clusters. The proposed method reduces the computational complexity of the kernel computation in kernel k-means from O(n^2 D) to O(n^2 d), and that of the subsequent k-means steps (Lloyd's algorithm and centroid initialization) from O(nkD) to O(nkd). Our experiments demonstrate that, with reduced dimensionality d = 200, the clustering is sped up by a factor of 2 to 26 with very little performance degradation (less than one percent) in general.
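A minimal sketch of the projection step with scikit-learn (illustrative sizes; it shows the reduced-dimension kernel matrix and a k-means++ / Lloyd run on the projected features, not the authors' full kernel k-means implementation):

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

n, D, d, k = 2000, 4096, 200, 10          # d = 200 matches the reduced dimensionality used above
X = rng.standard_normal((n, D))

# Project before any kernel computation: the kernel matrix now costs O(n^2 d), not O(n^2 D).
Z = GaussianRandomProjection(n_components=d, random_state=0).fit_transform(X)
K_approx = rbf_kernel(Z)                  # approximate kernel matrix in the reduced space

# k-means++ initialization and Lloyd's algorithm on projected features: O(nkd) per pass.
labels = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit_predict(Z)
print(K_approx.shape, np.bincount(labels))
```

The saving comes entirely from replacing D by d in every inner product; the clustering logic itself is unchanged.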


Electronics
2020
Vol 9 (6)
pp. 1046
Author(s):  
Abeer D. Algarni ◽  
Ghada M. El Banby ◽  
Naglaa F. Soliman ◽  
Fathi E. Abd El-Samie ◽  
Abdullah M. Iliyasu

To circumvent problems associated with traditional security systems' dependence on passwords, Personal Identification Numbers (PINs), and tokens, modern security systems adopt biometric traits that are inimitable to each individual for identification and verification. This study presents two different frameworks for secure person identification using cancellable face recognition (CFR) schemes. Both frameworks utilise Random Projection (RP) to encrypt the biometric traits, exploiting its ability to guarantee irrevocability and rich diversity. In the first framework, a hybrid structure combining Intuitionistic Fuzzy Logic (IFL) with RP is used to accomplish full distortion and encryption of the original biometric traits to be saved in the database, which helps to prevent unauthorised access to the biometric data. The framework involves transformation of spatial-domain greyscale pixel information to a fuzzy domain, where the original biometric images are disfigured and further distorted via random projections that generate the final cancellable traits. In the second framework, cancellable biometric traits are similarly generated via homomorphic transforms that use random projections to encrypt the reflectance components of the biometric traits. Here, the use of reflectance properties is motivated by their ability to retain most image details, while the guarantee of non-invertibility of the cancellable biometric traits supports the rationale for a second RP stage in both frameworks, since the outputs of the IFL stage and of the reflectance component of the homomorphic transform are not, on their own, enough to recover the original biometric trait. Our CFR schemes are validated on different datasets that exhibit properties expected in actual application settings, such as varying backgrounds, lighting, and motion. Outcomes in terms of standard metrics, including the structural similarity index metric (SSIM) and the area under the receiver operating characteristic curve (AROC), suggest the efficacy of our proposed schemes across many applications that require person identification and verification.
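A minimal sketch of the RP stage alone (numpy only; the key handling, dimensions, and function name are illustrative, and the IFL and homomorphic stages of the paper are not shown):

```python
import numpy as np

def cancellable_template(face_img, user_key, out_dim=256):
    """Generate a cancellable biometric template by key-dependent random projection.

    A compromised template is revoked simply by issuing a new user_key, which
    yields a completely different projection matrix and hence a new template.
    """
    x = face_img.astype(np.float64).ravel()
    rng = np.random.default_rng(user_key)                 # key-dependent projection matrix
    R = rng.standard_normal((out_dim, x.size)) / np.sqrt(out_dim)
    return R @ x

# Matching is performed between templates, never between raw face images.
img = np.random.rand(64, 64)                              # stand-in for a preprocessed face image
t_enrol = cancellable_template(img, user_key=1234)
t_probe = cancellable_template(img, user_key=1234)        # same key -> matching template
t_renew = cancellable_template(img, user_key=9999)        # new key -> revoked/renewed template
```

Because the projection is non-invertible (out_dim is much smaller than the image size) and key-dependent, the stored template reveals neither the face image nor the templates issued under other keys.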


2012
Vol 24 (11)
pp. 2994-3024
Author(s):  
Varun Raj Kompella ◽  
Matthew Luciw ◽  
Jürgen Schmidhuber

We introduce an incremental version of slow feature analysis (IncSFA), which combines candid covariance-free incremental principal component analysis (CCIPCA) and covariance-free incremental minor component analysis (CIMCA). IncSFA's feature-updating complexity is linear in the input dimensionality, while batch SFA's (BSFA) updating complexity is cubic. IncSFA does not need to store, or even compute, any covariance matrices. The drawback of IncSFA is data efficiency: it does not use each data point as effectively as BSFA. But IncSFA allows SFA to be tractably applied, with just a few parameters, directly to high-dimensional input streams (e.g., the visual input of an autonomous agent), whereas BSFA has to resort to hierarchical receptive-field-based architectures when the input dimension is too high. Further, IncSFA's updates have simple Hebbian and anti-Hebbian forms, extending the biological plausibility of SFA. Experimental results show that IncSFA learns the same set of features as BSFA and can handle a few cases where BSFA fails.
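For flavour, a minimal sketch of the covariance-free, Hebbian-style update that CCIPCA uses for a leading principal component; this is only one ingredient of IncSFA, and the learning rate and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

D = 100                                   # input dimensionality (illustrative)
v = rng.standard_normal(D)                # running estimate of eigenvalue * leading eigenvector

def ccipca_step(v, x, eta=0.01):
    """Covariance-free CCIPCA update: no D x D matrix is formed, so each step is O(D)."""
    return (1.0 - eta) * v + eta * x * (x @ v) / np.linalg.norm(v)

# Stream of inputs whose first coordinate has the largest variance.
for t in range(20000):
    x = rng.standard_normal(D)
    x[0] *= 3.0
    v = ccipca_step(v, x)

print(np.argmax(np.abs(v / np.linalg.norm(v))))   # converges towards coordinate 0
```

The update has the Hebbian form x (x^T v) plus a decay term, which is the structural reason the per-step cost stays linear in D rather than cubic.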


2018
Vol 55 (4)
pp. 1060-1077
Author(s):  
Steven S. Kim ◽  
Kavita Ramanan

Abstract The study of high-dimensional distributions is of interest in probability theory, statistics, and asymptotic convex geometry, where the object of interest is the uniform distribution on a convex set in high dimensions. The ℓp-spaces and norms are of particular interest in this setting. In this paper we establish a limit theorem for distributions on ℓp-spheres, conditioned on a rare event, in a high-dimensional geometric setting. As part of our proof, we establish a certain large deviation principle that is also relevant to the study of the tail behavior of random projections of ℓp-balls in a high-dimensional Euclidean space.
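For intuition, a small Monte Carlo sketch (illustrative scaling and parameters) that samples uniformly from a high-dimensional ℓp-sphere via the standard representation with p-generalised Gaussian variables and looks at the empirical tail of a one-dimensional random projection; naive sampling like this becomes useless for genuinely rare events, which is where a large deviation principle is needed:

```python
import numpy as np

rng = np.random.default_rng(5)

def uniform_lp_sphere(n, dim, p):
    """Sample n points on the unit l_p sphere in R^dim (cone measure).

    Uses the classical representation: if g_i are i.i.d. with density proportional
    to exp(-|t|^p), then g / ||g||_p is distributed on the l_p sphere.
    """
    g = rng.gamma(shape=1.0 / p, scale=1.0, size=(n, dim)) ** (1.0 / p)
    g *= rng.choice([-1.0, 1.0], size=(n, dim))
    return g / np.linalg.norm(g, ord=p, axis=1, keepdims=True)

dim, p = 500, 1.5
X = uniform_lp_sphere(20000, dim, p)

# One-dimensional random projection onto a uniformly random direction.
theta = rng.standard_normal(dim)
theta /= np.linalg.norm(theta)
proj = dim ** (1.0 / p) * (X @ theta)      # rescaled so the projections are of constant order

# Empirical frequency of exceeding three standard deviations (a moderately rare event).
print(proj.std(), np.mean(proj > 3.0 * proj.std()))
```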


Entropy
2020
Vol 22 (7)
pp. 727
Author(s):  
Hlynur Jónsson ◽  
Giovanni Cherubini ◽  
Evangelos Eleftheriou

Information theory concepts are leveraged with the goal of better understanding and improving Deep Neural Networks (DNNs). The information plane of a neural network describes how the mutual information between the input/output and the hidden-layer variables at various depths behaves during training. Previous analyses revealed that, in networks where finiteness of the mutual information can be established, most of the training epochs are spent compressing the input. However, estimating mutual information is nontrivial for high-dimensional continuous random variables, so the computation of mutual information for DNNs and its visualization on the information plane have mostly been restricted to low-complexity fully connected networks. In fact, even the existence of the compression phase in complex DNNs has been questioned and is viewed as an open problem. In this paper, we present the convergence of mutual information on the information plane for a high-dimensional VGG-16 Convolutional Neural Network (CNN) by resorting to Mutual Information Neural Estimation (MINE), thus confirming and extending the results obtained with low-dimensional fully connected networks. Furthermore, we demonstrate the benefits of regularizing a network, especially for a large number of training epochs, by adopting mutual information estimates as additional terms in the network's loss function. Experimental results show that the regularization stabilizes the test accuracy and significantly reduces its variance.
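A minimal sketch of the Donsker-Varadhan bound that MINE optimises, on toy data with a small statistics network in PyTorch (sizes, architecture, and training schedule are illustrative; the paper applies MINE to VGG-16 activations, not to this toy setup):

```python
import torch
import torch.nn as nn

# MINE estimates I(X; Z) via the Donsker-Varadhan lower bound:
# I(X; Z) >= E_joint[T(x, z)] - log E_marginals[exp(T(x, z))].
class StatisticsNetwork(nn.Module):
    def __init__(self, dim_x, dim_z, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_z, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def mine_lower_bound(T, x, z):
    joint = T(x, z).mean()
    # Shuffling z breaks the pairing, giving samples from the product of the marginals.
    marginal = torch.exp(T(x, z[torch.randperm(z.size(0))])).mean()
    return joint - torch.log(marginal)

# Toy data: z is a noisy copy of x, so the true mutual information is positive.
x = torch.randn(512, 5)
z = x + 0.5 * torch.randn(512, 5)

T = StatisticsNetwork(5, 5)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    loss = -mine_lower_bound(T, x, z)   # maximise the bound by gradient ascent
    loss.backward()
    opt.step()

print(float(mine_lower_bound(T, x, z)))  # estimated mutual information in nats
```

In an information-plane analysis, an estimator of this kind is evaluated between the input (or output) and the activations of each layer at successive training epochs; the same estimate can also be added to the training loss as a regulariser, as the paper does.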


2018
Vol 5 (2)
pp. 39
Author(s):  
Syifaul Fuada ◽  
Trio Adiono

Educational kits should be developed with a view to preparing undergraduate engineering students to follow new global trends, and to offer an alternative to technical education systems based on a practical approach. The primary motivation for this research is to design and implement a visible light communication (VLC) educational toolkit, focusing on the analog front-end. The toolkit consists of six modules (including a transimpedance amplifier, pre-amplifier, DC-offset remover, analog filter, and AGC), each with its own practicum task, plus one project task that integrates the modules. Undergraduate students can use the kit to investigate the physical layer of a VLC system. The kit is low-complexity, making it a suitable supplement to courses offered in this field, and it has a simple, user-friendly design.


2018
Vol 116 (3)
pp. 950-959
Author(s):  
Patrick Maffucci ◽  
Benedetta Bigio ◽  
Franck Rapaport ◽  
Aurélie Cobat ◽  
Alessandro Borghesi ◽  
...  

Computational analyses of human patient exomes aim to filter out as many nonpathogenic genetic variants (NPVs) as possible, without removing the true disease-causing mutations. This involves comparing the patient's exome with public databases to remove reported variants inconsistent with disease prevalence, mode of inheritance, or clinical penetrance. However, variants frequent in a given exome cohort, but absent or rare in public databases, have also been reported and treated as NPVs, without rigorous exploration. We report the generation of a blacklist of variants frequent within an in-house cohort of 3,104 exomes. This blacklist did not remove known pathogenic mutations from the exomes of 129 patients and decreased the number of NPVs remaining in the 3,104 individual exomes by a median of 62%. We validated this approach by testing three other independent cohorts of 400, 902, and 3,869 exomes. The blacklist generated from any given cohort removed a substantial proportion of NPVs (11–65%). We analyzed the blacklisted variants computationally and experimentally. Most of the blacklisted variants corresponded to false signals generated by incomplete reference genome assembly, location in low-complexity regions, bioinformatic misprocessing, or limitations inherent to cohort-specific private alleles (e.g., due to sequencing kits and genetic ancestries). Finally, we provide our precalculated blacklists, together with ReFiNE, a program for generating customized blacklists from any medium-sized or large in-house cohort of exome (or other next-generation sequencing) data via a user-friendly public web server. This work demonstrates the power of extracting variant blacklists from private databases as a specific in-house but broadly applicable tool for optimizing exome analysis.
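A schematic of the blacklist idea in pandas (the file names, column names, and thresholds are hypothetical placeholders chosen for illustration; this is not ReFiNE's actual procedure or parameterisation):

```python
import pandas as pd

# Flag variants that are frequent in the in-house cohort yet absent or rare in a
# public database; such variants are candidate cohort-specific false signals.
cohort = pd.read_csv("inhouse_variant_counts.tsv", sep="\t")   # columns: variant_id, carriers
public = pd.read_csv("public_allele_freqs.tsv", sep="\t")      # columns: variant_id, public_af

COHORT_SIZE = 3104            # size of the in-house cohort
MIN_COHORT_FREQ = 0.01        # "frequent" within the in-house cohort (illustrative)
MAX_PUBLIC_AF = 0.0001        # "absent or rare" in the public database (illustrative)

merged = cohort.merge(public, on="variant_id", how="left")
merged["public_af"] = merged["public_af"].fillna(0.0)          # absent from the public database
merged["cohort_freq"] = merged["carriers"] / COHORT_SIZE

blacklist = merged.loc[
    (merged["cohort_freq"] >= MIN_COHORT_FREQ) &
    (merged["public_af"] <= MAX_PUBLIC_AF),
    "variant_id",
]
blacklist.to_csv("blacklist.txt", index=False, header=False)
```

The resulting list is then applied as an additional filter during exome analysis, after the usual frequency, inheritance, and penetrance filters based on public databases.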

