A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent

2021 ◽  
Vol 2021 (12) ◽  
pp. 124006
Author(s):  
Zhenyu Liao ◽  
Romain Couillet ◽  
Michael W Mahoney

This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples n, their dimension p, and the dimension of feature space N are all large and comparable. In this regime, the random RFF Gram matrix no longer converges to the well-known limiting Gaussian kernel matrix (as it does when N → ∞ alone), but it still has a tractable behavior that is captured by our analysis. This analysis also provides accurate estimates of training and test regression errors for large n, p, N. Based on these estimates, a precise characterization of two qualitatively different phases of learning, including the phase transition between them, is provided; and the corresponding double descent test error curve is derived from this phase transition behavior. These results do not depend on strong assumptions on the data distribution, and they perfectly match empirical results on real-world data sets.
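The classical limit mentioned in this abstract (N → ∞ with n and p fixed) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the common RFF construction with features [cos(Wx); sin(Wx)]/√N, standard Gaussian frequencies W, and the Gaussian kernel exp(−‖x − y‖²/2). The function names and the sizes n, p, N are illustrative choices.

```python
import numpy as np

def rff_features(X, W):
    """Map rows of X (n x p) to 2N random Fourier features, given frequencies W (N x p)."""
    Z = X @ W.T                                   # n x N random projections
    N = W.shape[0]
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(N)

def gaussian_kernel(X):
    """Gaussian kernel matrix exp(-||x_i - x_j||^2 / 2) (assumed bandwidth)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / 2)

rng = np.random.default_rng(0)
n, p = 20, 5                                      # illustrative sizes
X = rng.standard_normal((n, p)) / np.sqrt(p)
K = gaussian_kernel(X)

def gram_error(N):
    """Largest entrywise gap between the RFF Gram matrix and the kernel matrix."""
    W = rng.standard_normal((N, p))
    F = rff_features(X, W)
    return np.max(np.abs(F @ F.T - K))

err_small = gram_error(10)       # N comparable to n and p: Gram matrix far from K
err_large = gram_error(20000)    # N much larger than n and p: Gram matrix close to K
```

With n and p held fixed, the gap shrinks as N grows; the regime analyzed in the article, where n, p, and N grow together, is precisely the one in which this approximation can no longer be relied on.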

2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Yu Wang

Feature space heterogeneity is common in real-world data sets: the importance of a feature for classification can differ across subsets of the data. Moreover, the pattern of feature space heterogeneity may change dynamically over time as more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH), to address this problem. In our approach, supervised clustering is used to obtain a number of clusters such that the samples in each cluster are from the same class. After outliers are removed, the relevance of the features in each cluster is calculated based on their variations within that cluster. The feature relevance is then incorporated into the distance calculation for classification. The main advantage of SCCFSH is that it can solve a classification problem with feature space heterogeneity incrementally, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and an application to a database marketing problem show the efficiency and effectiveness of the proposed approach.


2021 ◽  
Author(s):  
Alberto Vera ◽  
Siddhartha Banerjee ◽  
Samitha Samaranayake

Motivated by the needs of modern transportation service platforms, we study the problem of computing constrained shortest paths (CSP) at scale via preprocessing techniques. Our work makes two contributions in this regard: 1) We propose a scalable algorithm for CSP queries and show how its performance can be parametrized in terms of a new network primitive, the constrained highway dimension. This development extends recent work that established the highway dimension as the appropriate primitive for characterizing the performance of unconstrained shortest-path (SP) algorithms. Our main theoretical contribution is deriving conditions relating the two notions, thereby providing a characterization of networks where CSP and SP queries are of comparable hardness. 2) We develop practical algorithms for scalable CSP computation, augmenting our theory with additional network clustering heuristics. We evaluate these algorithms on real-world data sets to validate our theoretical findings. Our techniques are orders of magnitude faster than existing approaches while requiring only limited additional storage and preprocessing.
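For readers unfamiliar with the CSP problem itself, the sketch below shows a basic label-setting search, not the preprocessing-based algorithm proposed in this work. It assumes each edge carries a non-negative cost and a non-negative resource weight, and it finds the cheapest path whose total weight stays within a budget. The function name and the toy graph are hypothetical.

```python
import heapq

def constrained_shortest_path(graph, source, target, budget):
    """Return the minimum cost of a source-target path with total weight <= budget.

    graph maps a node to a list of (neighbor, cost, weight) edges; costs and
    weights are assumed non-negative. Returns None if no feasible path exists.
    """
    best_weight = {}                    # smallest weight among labels already expanded per node
    heap = [(0, 0, source)]             # labels (cost, weight, node), popped by increasing cost
    while heap:
        cost, weight, u = heapq.heappop(heap)
        if u == target:
            return cost
        # A label expanded earlier at u has cost <= this one; if its weight is
        # also <= this one, the current label is dominated and can be dropped.
        if best_weight.get(u, float("inf")) <= weight:
            continue
        best_weight[u] = weight
        for v, c, w in graph.get(u, []):
            if weight + w <= budget:    # prune extensions that exceed the budget
                heapq.heappush(heap, (cost + c, weight + w, v))
    return None

# Hypothetical toy network: the cheapest route s-a-t uses the most resource.
toy_graph = {
    "s": [("a", 1, 5), ("t", 10, 1), ("b", 2, 1)],
    "a": [("t", 1, 5)],
    "b": [("t", 3, 2)],
}
```

On this toy network, tightening the budget forces the search onto progressively more expensive routes, which is the trade-off that makes CSP queries harder than unconstrained SP queries.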


2018 ◽  
Vol 9 (21) ◽  
pp. 2887-2896 ◽  
Author(s):  
Shoumin Chen ◽  
Xuezhen Lin ◽  
Zhenghao Zhai ◽  
Ruyue Lan ◽  
Jin Li ◽  
...  

A class of poly(ionic liquid) microgels exhibiting CO2-switchable temperature-responsive volume phase transition behavior has been synthesized and used for CO2 fixation.


Author(s):  
Hoda Heidari ◽  
Andreas Krause

We study fairness in sequential decision-making environments, where at each time step a learning algorithm receives data corresponding to a new individual (e.g. a new job application) and must make an irrevocable decision about him/her (e.g. whether to hire the applicant) based on observations made so far. In order to prevent cases of disparate treatment, our time-dependent notion of fairness requires algorithmic decisions to be consistent: if two individuals are similar in the feature space and arrive during the same time epoch, the algorithm must assign them similar outcomes. We propose a general framework for post-processing predictions made by a black-box learning model that guarantees the resulting sequence of outcomes is consistent. We show theoretically that imposing consistency will not significantly slow down learning. Our experiments on two real-world data sets illustrate and confirm this finding in practice.


2016 ◽  
Vol 28 (4) ◽  
pp. 716-742 ◽  
Author(s):  
Saurabh Paul ◽  
Petros Drineas

We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets, namely a subset of the TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
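The leverage-score sampling mentioned above can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the authors' exact procedure: the data matrix X is taken to be n samples by d features, the leverage score of feature j is the squared norm of its row in the top-k right singular vectors, and features are drawn with replacement and rescaled. The function names and the parameters k and r are illustrative.

```python
import numpy as np

def feature_leverage_scores(X, k):
    """Rank-k leverage score of each column (feature) of X; the scores sum to k."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.sum(Vt[:k] ** 2, axis=0)

def sample_features(X, k, r, rng):
    """Draw r features with probability proportional to leverage score, then rescale."""
    scores = feature_leverage_scores(X, k)
    probs = scores / scores.sum()
    idx = rng.choice(X.shape[1], size=r, replace=True, p=probs)
    scale = 1.0 / np.sqrt(r * probs[idx])   # standard rescaling for sampled columns
    return idx, X[:, idx] * scale

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 50))           # synthetic data: 30 samples, 50 features
scores = feature_leverage_scores(X, k=5)
idx, X_sampled = sample_features(X, k=5, r=10, rng=rng)
```

Ridge regression can then be fitted on `X_sampled` instead of `X`; the risk bounds described in the abstract concern how close the two fits are.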


2000 ◽  
Vol 11 (1) ◽  
pp. 81-88 ◽  
Author(s):  
Fumitoshi Hirayama ◽  
Masatoshi Honjo ◽  
Hidetoshi Arima ◽  
Kazuto Okimoto ◽  
Kaneto Uekama

2020 ◽  
Vol 8 (1) ◽  
Author(s):  
Mehmet Şimşek ◽  
Henning Meyerhenke

Influence Maximization (IM) aims at finding the most influential users in a social network, that is, users who maximize the spread of an opinion within a certain propagation model. Previous work investigated the correlation between influence spread and nodal centrality measures to bypass more expensive IM simulations. The results were promising but incomplete, since these studies investigated the performance (i.e. the ability to identify influential users) of centrality measures only in restricted settings, for example, in undirected/unweighted networks and/or within a propagation model less common for IM. In this article, we first show that good results within the Susceptible-Infected-Removed propagation model for unweighted and undirected networks do not necessarily transfer to directed or weighted networks under the popular Independent Cascade (IC) propagation model. Then, we identify a set of centrality measures with good performance for weighted and directed networks within the IC model. Our main contribution is a new way to combine the centrality measures in a closed formula to yield even better results. Additionally, we extend gravitational centrality (GC) with the proposed combined centrality measures. Our experiments on 50 real-world data sets show that our proposed centrality measures significantly outperform well-known centrality measures and the state-of-the-art GC measure.
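The "more expensive IM simulations" that centrality measures are meant to bypass are typically Monte Carlo runs of the propagation model. The sketch below simulates the Independent Cascade model on a directed, weighted graph; it is an illustration of the model only and does not reproduce the combined centrality formula of this article. The function names and the toy graph are hypothetical.

```python
import random

def independent_cascade(graph, seeds, rng):
    """Run one IC simulation and return the set of activated nodes.

    graph maps a node to a list of (neighbor, probability) edges. Each newly
    activated node gets exactly one chance to activate each inactive neighbor.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average number of activated nodes over repeated simulations."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, rng)) for _ in range(runs))
    return total / runs

# Hypothetical toy graphs: edges with probability 1 always fire, probability 0 never.
chain = {"a": [("b", 1.0), ("d", 0.0)], "b": [("c", 1.0)]}
coin = {"a": [("b", 0.5)]}
```

Because a spread estimate needs many runs per candidate seed set, a centrality score that ranks nodes without any simulation is considerably cheaper to compute.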


2007 ◽  
Vol 19 (7) ◽  
pp. 1919-1938 ◽  
Author(s):  
Jooyoung Park ◽  
Daesung Kang ◽  
Jongho Kim ◽  
James T. Kwok ◽  
Ivor W. Tsang

The support vector data description (SVDD) is one of the best-known one-class support vector learning methods; it uses balls defined in the feature space to distinguish a set of normal data from all other possible abnormal objects. The major concern of this letter is to extend the main idea of SVDD to pattern denoising. Combining the geodesic projection to the spherical decision boundary resulting from the SVDD with a solution of the preimage problem, we propose a new method for pattern denoising. We first solve SVDD for the training data and then, for each noisy test pattern, obtain its denoised feature by moving its feature vector along the geodesic on the manifold to the nearest decision boundary of the SVDD ball. Finally, we find the location of the denoised pattern by obtaining the preimage of the denoised feature. The applicability of the proposed method is illustrated on a number of toy and real-world data sets.


2011 ◽  
Vol 8 (4) ◽  
pp. 1143-1157 ◽  
Author(s):  
Xinyue Liu ◽  
Xing Yong ◽  
Hongfei Lin

The similarity matrix is critical to the performance of spectral clustering. Mercer kernels have become popular largely due to their success in kernel methods such as kernel PCA. A novel spectral clustering method is proposed based on local neighborhood in kernel space (SC-LNK), which assumes that each data point can be linearly reconstructed from its neighbors. The SC-LNK algorithm projects the data to a feature space using the Mercer kernel and then learns a sparse matrix by linear reconstruction, which serves as the similarity graph for spectral clustering. Experiments on synthetic and real-world data sets show that spectral clustering based on linear reconstruction in kernel space outperforms conventional spectral clustering and two other algorithms, especially on real-world data sets.

