Integrated random projection and dimensionality reduction by propagating light in photonic lattices

2021 ◽ Vol 46 (19) ◽ pp. 4936
Author(s):  
Mohammad-Ali Miri


Author(s):  
Shreya Arya ◽  
Jean-Daniel Boissonnat ◽  
Kunal Dutta ◽  
Martin Lotz

Abstract
Given a set P of n points and a constant k, we are interested in computing the persistent homology of the Čech filtration of P for the k-distance, and investigate the effectiveness of dimensionality reduction for this problem, answering an open question of Sheehy (The persistent homology of distance functions under random projection. In Cheng, Devillers (eds), 30th Annual Symposium on Computational Geometry, SOCG’14, Kyoto, Japan, June 08–11, p 328, ACM, 2014). We show that any linear transformation that preserves pairwise distances up to a $$(1\pm {\varepsilon })$$ multiplicative factor must preserve the persistent homology of the Čech filtration up to a factor of $$(1-{\varepsilon })^{-1}$$. Our results also show that the Vietoris–Rips and Delaunay filtrations for the k-distance, as well as the Čech filtration for the approximate k-distance of Buchet et al. [J Comput Geom, 58:70–96, 2016], are preserved up to a $$(1\pm {\varepsilon })$$ factor. We also prove extensions of our main theorem for point sets (i) lying in a region of bounded Gaussian width or (ii) on a low-dimensional submanifold, obtaining embeddings with the dimension bounds of Lotz (Proc R Soc A Math Phys Eng Sci, 475(2230):20190081, 2019) and Clarkson (Tighter bounds for random projections of manifolds. In Teillaud (ed) Proceedings of the 24th ACM Symposium on Computational Geometry, College Park, MD, USA, June 9–11, pp 39–48, ACM, 2008), respectively. Our results also work in the terminal dimensionality reduction setting, where the distance of any point in the original ambient space to any point in P needs to be approximately preserved.
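The theorem above rests on linear maps that preserve pairwise distances up to a $$(1\pm {\varepsilon })$$ factor, which a Gaussian random projection provides with high probability. The following NumPy sketch checks that property empirically; the target-dimension formula and its constant are illustrative (the usual Johnson–Lindenstrauss bounds differ in constants), not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 1000   # n points in ambient dimension d
eps = 0.25         # allowed multiplicative distortion
# Illustrative Johnson-Lindenstrauss target dimension (constants vary by source)
m = int(np.ceil(8 * np.log(n) / eps**2))

P = rng.normal(size=(n, d))
# Gaussian random projection, scaled so distances are preserved in expectation
A = rng.normal(size=(m, d)) / np.sqrt(m)
Q = P @ A.T

def pairwise_dists(X):
    # all pairwise Euclidean distances via the squared-norms identity
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(D2, 0.0))

D, Dp = pairwise_dists(P), pairwise_dists(Q)
mask = ~np.eye(n, dtype=bool)          # ignore the zero diagonal
ratio = Dp[mask] / D[mask]
print(ratio.min(), ratio.max())        # empirically inside the (1 - eps, 1 + eps) band
```

Any such distance-preserving map, not only this Gaussian one, falls under the theorem's hypothesis, so the persistent homology of the Čech filtration of the projected points is preserved up to the stated factor.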


Author(s):  
Stanley R.M. Oliveira ◽  
Osmar R. Zaiane

While the sharing of data is known to be beneficial in data mining applications and widely acknowledged as advantageous in business, such information sharing can become controversial and be thwarted by privacy regulations and other privacy concerns. Data clustering, for instance, could be more accurate if more information were available, hence the incentive to share data. Any solution needs to balance the clustering requirements against the privacy issues. Rather than simply preventing data owners from sharing information for data analysis, a solution can be designed to meet privacy requirements while guaranteeing valid data clustering results. To achieve this dual goal, this chapter introduces a method for privacy-preserving clustering, called Dimensionality Reduction-Based Transformation (DRBT). This method relies on the intuition behind random projection to protect the underlying attribute values subjected to cluster analysis. It is shown analytically and empirically that, by transforming a dataset using DRBT, a data owner can achieve privacy preservation and obtain accurate clustering with little communication overhead. The method presents the following advantages: it is independent of distance-based clustering algorithms; it has a sound mathematical foundation; and it does not require CPU-intensive operations.
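The core intuition can be sketched in a few lines: multiplying the private data by a random matrix hides the original attribute values while approximately preserving the distances that a clustering algorithm uses. This is only an illustration of the random-projection idea behind DRBT, not the chapter's exact transformation; the data, dimensions, and nearest-centroid check are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two Gaussian clusters in the original attribute space (the private data)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(5, 1, (50, 8))])

# DRBT-style step, sketched: project to a lower dimension with a random matrix.
# A recipient sees only Y; the original attribute values in X are not disclosed.
k = 4
R = rng.normal(size=(8, k)) / np.sqrt(k)
Y = X @ R

# Sanity check (uses the known grouping): a distance-based rule on the
# projected data still separates the two clusters cleanly.
c0, c1 = Y[:50].mean(axis=0), Y[50:].mean(axis=0)
labels = (np.linalg.norm(Y - c0, axis=1) > np.linalg.norm(Y - c1, axis=1)).astype(int)
true = np.array([0] * 50 + [1] * 50)
accuracy = max((labels == true).mean(), (labels != true).mean())
print(accuracy)  # the projected data clusters essentially as well as the original
```

Because only pairwise distances matter to distance-based clusterers, the same projected dataset works with any of them, which is the independence property the chapter claims.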


2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Mohammad Amin Shayegan ◽  
Saeed Aghabozorgi ◽  
Ram Gopal Raj

Dimensionality reduction (feature selection) is an important step in pattern recognition systems. Although there are various conventional approaches to feature selection, such as Principal Component Analysis, Random Projection, and Linear Discriminant Analysis, selecting optimal, effective, and robust features is usually a difficult task. In this paper, a new two-stage approach for dimensionality reduction is proposed. The method is based on one-dimensional and two-dimensional spectrum diagrams of the standard deviation and minimum-to-maximum distributions of the initial feature vector elements. The proposed algorithm is validated in an OCR application using two large standard handwritten OCR benchmark datasets, MNIST and Hoda. Initially, a 133-element feature vector was assembled from the most commonly used features proposed in the literature. The size of the initial feature vector was then reduced from 100% to 59.40% (79 elements) for the MNIST dataset and to 43.61% (58 elements) for the Hoda dataset, respectively. Meanwhile, the accuracy of the OCR system improved by 2.95% for the MNIST dataset and by 4.71% for the Hoda dataset. The results show an improvement in the precision of the system compared with the rival approaches, Principal Component Analysis and Random Projection. The proposed technique can also be useful for generating decision rules in a pattern recognition system using rule-based classifiers.
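The first stage of the approach inspects the standard-deviation spectrum of the feature vector elements. A minimal sketch of that idea follows; the toy feature matrix, the number of uninformative features, and the cut-off threshold are all hypothetical, chosen only to illustrate how a standard-deviation spectrum can prune a feature vector.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy stand-in for an OCR feature matrix: 500 samples x 133 features
# (hypothetical data, not MNIST or Hoda)
X = rng.normal(size=(500, 133))
X[:, 100:] *= 0.01           # simulate near-constant, uninformative features

# Sketch of the first-stage idea: compute the standard-deviation spectrum of
# the feature vector elements and keep only those above an assumed cut-off.
stds = X.std(axis=0)
threshold = 0.1              # illustrative threshold, not from the paper
keep = stds > threshold
X_reduced = X[:, keep]
print(X_reduced.shape[1])    # 100 of the 133 features survive
```

The paper's full method combines this with minimum-to-maximum distribution diagrams in a second stage; the sketch above covers only the variance-based pruning step.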


2018 ◽  
Author(s):  
Benjamin Schmidt

Digital libraries today distribute their contents in a way that limits the sort of work that can be done with them. Modern libraries are so large, often containing millions of books or articles, that the technical resources needed to work with them can be immense. Beginning researchers and students often cannot practically obtain more than a few thousand books at a time. Advanced researchers must use (often incomplete) metadata to decide which books are of interest for their projects; and libraries themselves lack ways to make their full-text holdings easily discoverable by researchers or integrated with other collections.

