intrinsic dimensionality
Recently Published Documents

TOTAL DOCUMENTS: 127 (five years: 24)
H-INDEX: 20 (five years: 1)

2022, Vol 8, pp. e790
Author(s): Zsigmond Benkő, Marcell Stippinger, Roberta Rehus, Attila Bencze, Dániel Fabó, ...

Data dimensionality informs us about data complexity and sets limits on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesvári-Audibert (FSA) dimension estimator, making it one of the best nearest-neighbor-based dimension estimators available. We compute the probability density function of the local FSA estimates under the assumption of uniform local manifold density. Based on this density, we propose the median of the local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics, the mode and the mean. From the same density we also derive a maximum likelihood formula for global intrinsic dimensionality under the i.i.d. assumption. We tackle edge and finite-sample effects with an exponential correction formula calibrated on hypercube datasets. We compare the performance of the corrected median-FSA estimator with kNN estimators: maximum likelihood (Levina-Bickel), 2NN, and two implementations of DANCo (R and MATLAB). The corrected median-FSA estimator beats the maximum likelihood estimator and is on an equal footing with DANCo on standard synthetic benchmarks, as measured by mean percentage error and error rate. Finally, using the median-FSA algorithm we reveal diverse changes in neural dynamics during resting state and epileptic seizures, and identify brain areas with lower-dimensional dynamics that are possible causal sources and candidate seizure onset zones.
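For readers who want to experiment, here is a minimal sketch of the local FSA statistic and its median aggregation as described above. The neighborhood size k and the dataset are placeholder assumptions, and the paper's calibrated exponential correction is omitted; this is not the authors' reference code.

    # Median-of-local-FSA sketch: local estimate ln 2 / ln(r_2k / r_k),
    # aggregated by the median (the paper's calibrated correction is omitted).
    import numpy as np
    from scipy.spatial import cKDTree

    def median_fsa(X, k=10):
        tree = cKDTree(X)
        # query 2k+1 neighbours; column 0 is the point itself (distance 0)
        dist, _ = tree.query(X, k=2 * k + 1)
        r_k = dist[:, k]        # distance to the k-th nearest neighbour
        r_2k = dist[:, 2 * k]   # distance to the 2k-th nearest neighbour
        local = np.log(2.0) / np.log(r_2k / r_k)
        return np.median(local)

    # sanity check on a 5-dimensional uniform hypercube
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(2000, 5))
    print(median_fsa(X))   # close to 5; biased slightly low at finite n

The uncorrected median underestimates on bounded supports such as hypercubes, which is precisely the finite-sample/edge effect the paper's exponential correction targets.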


2021, Vol 17 (11), pp. e1008591
Author(s): Ege Altan, Sara A. Solla, Lee E. Miller, Eric J. Perreault

It is generally accepted that the number of neurons in a given brain area far exceeds the number needed to carry out any specific function controlled by that area. For example, motor areas of the human brain contain tens of millions of neurons that control the activation of tens or, at most, hundreds of muscles. This massive redundancy implies the covariation of many neurons, which constrains the population activity to a low-dimensional manifold within the space of all possible patterns of neural activity. To gain a conceptual understanding of the complexity of the neural activity within a manifold, it is useful to estimate its dimensionality: the number of degrees of freedom required to describe the observed population activity without significant information loss. While there are many algorithms for dimensionality estimation, it is not known which are well suited to analyzing neural activity. The objective of this study was to evaluate the efficacy of several representative algorithms for estimating the dimensionality of linearly and nonlinearly embedded data. We generated synthetic neural recordings with known intrinsic dimensionality and used them to test the algorithms’ accuracy and robustness. We emulated some of the important challenges associated with experimental data by adding noise, altering the nature of the embedding of the low-dimensional manifold within the high-dimensional recordings, varying the dimensionality of the manifold, and limiting the amount of available data. We demonstrated that linear algorithms overestimate the dimensionality of nonlinear, noise-free data, and that in cases of high noise most algorithms overestimated the dimensionality. We thus developed a denoising algorithm based on deep learning, the “Joint Autoencoder”, which significantly improved subsequent dimensionality estimation. Critically, we found that all algorithms failed when the intrinsic dimensionality was high (above 20) or when the amount of data used for estimation was low. Based on the challenges we observed, we formulated a pipeline for estimating the dimensionality of experimental neural data.
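To make the first failure mode concrete, the sketch below builds a synthetic recording with known intrinsic dimensionality and applies a simple linear (PCA) estimate. The latent dimension, population size, tanh embedding, and 95% variance cutoff are illustrative assumptions, not the study's exact protocol.

    # Linear estimators overestimate on nonlinear embeddings: a 3-D latent
    # signal is embedded linearly and nonlinearly into 100 "neurons", then
    # a PCA dimension estimate (95% variance criterion) is applied to each.
    import numpy as np

    rng = np.random.default_rng(1)
    d, N, T = 3, 100, 5000
    latent = rng.standard_normal((T, d))       # intrinsic dimensionality d
    W = rng.standard_normal((d, N))
    linear = latent @ W                        # linear embedding
    nonlinear = np.tanh(latent @ W) ** 2       # nonlinear embedding

    def pca_dim(X, var=0.95):
        # number of principal components explaining `var` of the variance
        Xc = X - X.mean(axis=0)
        s = np.linalg.svd(Xc, compute_uv=False) ** 2
        return int(np.searchsorted(np.cumsum(s) / s.sum(), var) + 1)

    print(pca_dim(linear))     # recovers ~3
    print(pca_dim(nonlinear))  # substantially larger than 3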


Author(s): Sandamal Weerasinghe, Tamas Abraham, Tansu Alpcan, Sarah M. Erfani, Christopher Leckie, ...

Nonlinear regression, although widely used for automated decision making in engineering, financial, and security applications, is known to be vulnerable to training data poisoning: targeted poisoning attacks can cause learning algorithms to fit decision functions with poor predictive performance. This paper presents a new analysis of the local intrinsic dimensionality (LID) of nonlinear regression under such poisoning attacks, framed as a Stackelberg game, leading to a practical defense. We first adapt to nonlinear settings a gradient-based attack on linear regression that significantly impairs prediction capabilities, and then consider a multi-step unsupervised black-box defense. The first step identifies the samples that have the greatest influence on the learner's validation error; we then use the theory of local intrinsic dimensionality, which quantifies the degree to which a data sample is an outlier, to iteratively identify poisoned samples via a generative probabilistic model and suppress their influence on the prediction function. Empirical validation demonstrates superior performance compared to a range of recent defenses.
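As a rough illustration of the LID signal such a defense relies on, the sketch below computes the standard per-sample maximum likelihood LID estimate (in the form popularized by Amsaleg et al.). The paper's attack model and generative filtering step are not reproduced here, and the neighborhood size k is an assumed choice.

    # Maximum likelihood LID per sample: -1 / mean(log(r_i / r_k)) over the
    # k nearest-neighbour distances r_1 <= ... <= r_k (self excluded).
    import numpy as np
    from scipy.spatial import cKDTree

    def lid_mle(X, k=20):
        tree = cKDTree(X)
        dist, _ = tree.query(X, k=k + 1)
        r = dist[:, 1:]                          # drop self-distance
        return -1.0 / np.mean(np.log(r / r[:, -1:]), axis=1)

    rng = np.random.default_rng(0)
    clean = rng.standard_normal((1000, 10))
    print(lid_mle(clean).mean())   # roughly reflects the ambient dimension

In a defense of the kind described above, samples whose LID scores are atypical for the clean data distribution become candidates for suppression.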


2021, pp. 31-44
Author(s): Sylvain Lespinats, Benoit Colange, Denys Dutykh

2021, Vol 11 (1)
Author(s): Shuo Zhou, Antoinette Tordesillas, Mehdi Pouragha, James Bailey, Howard Bondell

We propose a new metric called s-LID, based on the concept of local intrinsic dimensionality, to identify and quantify hierarchies of kinematic patterns in heterogeneous media. s-LID measures how outlying a grain’s motion is relative to its s nearest neighbors in displacement state space. To demonstrate the merits of s-LID over the conventional measure of strain, we apply it to data on individual grain motions in a set of deforming granular materials. Several new insights into the evolution of failure are uncovered. First, s-LID reveals a hierarchy of concurrent deformation bands that prevails throughout the loading history. These structures vary not only in relative dominance but also in spatial and kinematic scale. Second, in the nascent stages of the pre-failure regime, s-LID uncovers a set of system-spanning, criss-crossing bands: microbands for small s and embryonic shearbands at large s, with the former being dominant. At the opposite extreme, in the failure regime, fully formed shearbands at large s dominate over the microbands. The patterns uncovered by s-LID contradict the common belief in a causal sequence where a subset of microbands coalesce and/or grow to form shearbands. Instead, s-LID suggests that the deformation of the sample in the lead-up to failure is governed by a complex symbiosis among these different coexisting structures, which amplifies and promotes the progressive dominance of the embryonic shearbands over the microbands. Third, we probed the transition from the microband-dominated regime to the shearband-dominated regime by systematically suppressing grain rotations, and found particle rotation to be an essential enabler of this transition: when grain rotations are completely suppressed, the transition is prevented and microbands and shearbands coexist in relative parity.
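A schematic reading of the s-LID computation, for readers unfamiliar with LID, might look as follows: the maximum likelihood local intrinsic dimensionality of each grain's displacement vector, evaluated against its s nearest neighbors in displacement state space, at several values of s. The displacement field and the choices of s here are illustrative assumptions, not the paper's data or reference implementation.

    # s-LID sketch: per-grain ML LID in displacement state space, with the
    # neighbourhood size s swept to probe fine vs coarse kinematic structure.
    import numpy as np
    from scipy.spatial import cKDTree

    def s_lid(displacements, s):
        tree = cKDTree(displacements)
        dist, _ = tree.query(displacements, k=s + 1)
        r = dist[:, 1:]                          # s neighbour distances
        return -1.0 / np.mean(np.log(r / r[:, -1:]), axis=1)

    # toy displacement field: 2-D motions of 10,000 grains
    rng = np.random.default_rng(2)
    U = rng.standard_normal((10_000, 2))
    for s in (20, 200):                          # small s vs large s
        score = s_lid(U, s)
        print(s, np.percentile(score, 95))       # high scores flag outlying motion

Sweeping s plays the role described in the abstract: small neighborhoods pick out fine structures (microbands), large ones coarse structures (shearbands).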


2021
Author(s): Elise Claire Amblard, Jonathan Bac, Alexander Chervov, Vassili Soumelis, Andrei Zinovyev

Background. Single-cell RNA-seq datasets are characterized by large ambient dimensionality, and their analyses, such as clustering or cell trajectory inference, can be affected by various manifestations of the dimensionality curse. One such manifestation is the hubness phenomenon, i.e. the existence of data points with surprisingly large incoming connectivity degree in the neighbourhood graph. The conventional approach to damping the unwanted effects of high dimensionality is to apply drastic dimensionality reduction, a step considered especially critical in scRNA-seq analysis. Whether this step can be avoided, by directly correcting an effect of high dimensionality and thereby retaining more information than the low-dimensional projections contain, remains unexplored.

Results. We investigate the phenomenon of hubness in scRNA-seq data and its manifestation in spaces of increasing dimensionality, and link increasing hubness to increased levels of dropout in sequencing data. We show that hub cells do not represent any visible technical or biological bias. We investigate the effect of various hubness reduction methods on the visualisation, clustering, and trajectory inference tasks in scRNA-seq datasets, and show that hubness reduction generates neighbourhood graphs with properties more suitable for applying machine learning methods, outperforming other state-of-the-art methods for improving neighbourhood graphs. As a consequence, clustering, trajectory inference, and visualisation perform better, especially for datasets characterized by large intrinsic dimensionality.

Conclusion. Hubness is an important phenomenon in sequencing data. Reducing hubness can be beneficial for the analysis of scRNA-seq data with large intrinsic dimensionality, in which case it can serve as an alternative to drastic dimensionality reduction.
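The hubness phenomenon itself is easy to reproduce. The short sketch below measures the skewness of the k-occurrence (kNN in-degree) distribution as the ambient dimension grows; the sizes and k are arbitrary illustrative choices, and dedicated hubness reduction methods (e.g. mutual proximity or local scaling) are not shown.

    # Hubness grows with dimension: the in-degree distribution of the kNN
    # graph becomes increasingly right-skewed as ambient dimension rises.
    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.stats import skew

    def k_occurrence_skewness(X, k=10):
        tree = cKDTree(X)
        _, idx = tree.query(X, k=k + 1)
        counts = np.bincount(idx[:, 1:].ravel(), minlength=len(X))
        return skew(counts)

    rng = np.random.default_rng(3)
    for dim in (5, 50, 500):
        X = rng.standard_normal((2000, dim))
        print(dim, round(k_occurrence_skewness(X), 2))  # skew rises with dim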

