curse of dimensionality
Recently Published Documents

TOTAL DOCUMENTS: 320 (FIVE YEARS: 87)
H-INDEX: 32 (FIVE YEARS: 6)

Entropy, 2021, Vol 23 (12), pp. 1690
Author(s): Charline Le Lan, Laurent Dinh

Thanks to the tractability of their likelihood, several deep generative models show promise for seemingly straightforward but important applications like anomaly detection, uncertainty estimation, and active learning. However, the likelihood values empirically attributed to anomalies conflict with the expectations these proposed applications suggest. In this paper, we take a closer look at the behavior of distribution densities through the lens of reparametrization and show that these quantities carry less meaningful information than previously thought, beyond estimation issues or the curse of dimensionality. We conclude that the use of these likelihoods for anomaly detection relies on strong and implicit hypotheses, and highlight the necessity of explicitly formulating these assumptions for reliable anomaly detection.
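The reparametrization argument rests on the standard change-of-variables identity (a textbook fact, noted here only as a reading aid): for an invertible map $f$ with $y = f(x)$,

$p_Y(y) = p_X\bigl(f^{-1}(y)\bigr)\,\bigl|\det J_{f^{-1}}(y)\bigr|,$

so density values, and any anomaly ranking or threshold derived from them, are not invariant under a change of representation, even though $f$ itself carries no information about which points are anomalous.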


2021, Vol 2021 (12), pp. 124009
Author(s): Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Abstract For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS. This is true even in the wide network limit, for a different scaling of the initialization. How can we reconcile the above claims? For which tasks do NNs outperform RKHS? If covariates are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality becomes milder if the covariates display the same low-dimensional structure as the target function, and we precisely characterize this tradeoff. Building on these results, we present the spiked covariates model, which can capture in a unified framework both behaviors observed in earlier work. We hypothesize that such a latent low-dimensional structure is present in image classification. We test this hypothesis numerically by showing that specific perturbations of the training distribution degrade the performance of RKHS methods much more significantly than that of NNs.
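For intuition, a minimal sketch of a spiked-covariates-style data-generating process (our own simplified construction, not the exact model analysed in the paper) could look as follows: most of the covariate variance lies in a low-dimensional subspace, and the target depends only on the coordinates in that subspace.

import numpy as np

rng = np.random.default_rng(0)
n, d, k, spike = 1000, 200, 5, 10.0            # samples, ambient dim, latent dim, spike strength

U = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal basis of the spiked subspace
z = rng.normal(size=(n, k))                    # latent low-dimensional coordinates
X = spike * z @ U.T + rng.normal(size=(n, d))  # covariates: spiked part plus isotropic bulk
y = np.sign(z[:, 0] * z[:, 1])                 # target depends only on the latent coordinates

When spike is large the covariates share the low-dimensional structure of the target, the regime in which the abstract says the curse of dimensionality for RKHS methods becomes milder; when spike is near zero the covariates are nearly isotropic.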


2021
Author(s): Liang Chen

Abstract In this paper, we theoretically propose a new hashing scheme to establish the sparse Fourier transform in high-dimensional space. An estimate of the algorithm's complexity shows that this sparse Fourier transform can overcome the curse of dimensionality. To the best of our knowledge, this is the first polynomial-time algorithm to recover high-dimensional continuous frequencies.


2021, Vol ahead-of-print (ahead-of-print)
Author(s): Sandeep Kumar Hegde, Monica R. Mundada

Purpose: Chronic diseases are considered one of the most serious concerns and threats to public health across the globe. Diseases such as chronic diabetes mellitus (CDM), cardiovascular disease (CVD) and chronic kidney disease (CKD) are major chronic diseases responsible for millions of deaths. Each of these diseases is considered a risk factor for the other two. Therefore, noteworthy attention is being paid to reducing the risk of these diseases. A gigantic amount of medical data is generated in digital form from smart healthcare appliances in the current era. Although numerous machine learning (ML) algorithms have been proposed for the early prediction of chronic diseases, these algorithmic models are neither generalized nor adaptive when imposed on new disease datasets. Hence, these algorithms have to process a huge amount of disease data iteratively until the model converges. This limitation may make ML models difficult to fit and lead to imprecise results. A single algorithm may not yield accurate results. Nonetheless, an ensemble of classifiers built from multiple models, working on a voting principle, has been successfully applied to solve many classification tasks. The purpose of this paper is the early prediction of chronic diseases using a hybrid generative regression-based deep intelligence network (HGRDIN) model.

Design/methodology/approach: In the proposed paper, a generative regression (GR) model is used in combination with a deep neural network (DNN) for the early prediction of chronic disease. The GR model obtains prior knowledge about the labelled data by analyzing the correlation between features and class labels. Hence, the weight assignment process of the DNN is guided by the relationship between attributes rather than by random assignment. The knowledge obtained through this process is passed as input to the DNN for further prediction. Since inference about the input data instances is drawn at the DNN through the GR model, the model is named the hybrid generative regression-based deep intelligence network (HGRDIN).

Findings: The credibility of the implemented approach is rigorously validated using various parameters such as accuracy, precision, recall, F score and area under the curve (AUC) score. During the training phase, the proposed algorithm is constantly regularized using the elastic net regularization technique and also hyper-tuned using parameters such as momentum and learning rate to minimize the misprediction rate. The experimental results illustrate that the proposed approach predicts chronic disease with minimal error by avoiding possible overfitting and local minima problems. The results obtained with the proposed approach are also compared with various traditional approaches.

Research limitations/implications: Diagnostic data are usually multi-dimensional in nature, and the performance of an ML algorithm can degrade due to overfitting and curse of dimensionality issues. The experiments achieved an average accuracy of 95%; hence, further analysis can be carried out to improve predictive accuracy by overcoming the curse of dimensionality issues.

Practical implications: The proposed ML model can mimic the behavior of a doctor's decision making. Such algorithms have the capability to take over routine clinical tasks. The accurate results obtained through the innovative algorithms can free the physician from mundane care and practices so that the physician can focus more on the complex issues.

Social implications: Utilizing the proposed predictive model at the decision-making level for the early prediction of disease is a promising change for the healthcare sector. The global burden of chronic disease can be reduced substantially through these approaches.

Originality/value: In the proposed HGRDIN model, a transfer learning approach is used: the knowledge acquired through the GR process, which identifies the possible relationship between the independent and dependent feature variables by mapping the chronic data instances to their corresponding target classes, is applied to the DNN before the data are passed as input to the network. The experiments illustrate that the proposed approach obtains superior performance on the various validation parameters compared with existing conventional techniques.
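As a loose sketch of this idea (our reading of the abstract, not the authors' implementation; all names and settings below are illustrative), one could first fit a regularized regression to capture the feature-label relationship and use its coefficients to scale the first-layer weights of a DNN, which is then trained with SGD with momentum and an elastic-net-style penalty:

import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import ElasticNet

X = np.random.randn(500, 30).astype(np.float32)       # synthetic stand-in for patient features
y = (X[:, :3].sum(axis=1) > 0).astype(np.float32)     # synthetic binary disease label

prior = ElasticNet(alpha=0.01).fit(X, y)               # "GR"-style fit of the feature-label relationship
w0 = torch.tensor(prior.coef_, dtype=torch.float32)    # prior knowledge about each feature

net = nn.Sequential(nn.Linear(30, 16), nn.ReLU(), nn.Linear(16, 1))
with torch.no_grad():
    net[0].weight.mul_(1.0 + w0.abs())                 # weights guided by feature relevance, not purely random

opt = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.BCEWithLogitsLoss()
Xt, yt = torch.from_numpy(X), torch.from_numpy(y).unsqueeze(1)
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(net(Xt), yt)
    l1 = sum(p.abs().sum() for p in net.parameters())  # elastic-net-style penalty on the weights
    l2 = sum((p ** 2).sum() for p in net.parameters())
    (loss + 1e-4 * l1 + 1e-4 * l2).backward()
    opt.step()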


2021, Vol 4 (1)
Author(s): Visar Berisha, Chelsea Krantsevich, P. Richard Hahn, Shira Hahn, Gautam Dasarathy, ...

Abstract Digital health data are multimodal and high-dimensional. A patient’s health state can be characterized by a multitude of signals, including medical imaging, clinical variables, genome sequencing, conversations between clinicians and patients, and continuous signals from wearables, among others. This high-volume, personalized data stream aggregated over patients’ lives has spurred interest in developing new artificial intelligence (AI) models for higher-precision diagnosis, prognosis, and tracking. While the promise of these algorithms is undeniable, their dissemination and adoption have been slow, owing partially to unpredictable AI model performance once deployed in the real world. We posit that one of the rate-limiting factors in developing algorithms that generalize to real-world scenarios is the very attribute that makes the data exciting—their high-dimensional nature. This paper considers how the large number of features in vast digital health data can challenge the development of robust AI models—a phenomenon known as “the curse of dimensionality” in statistical learning theory. We provide an overview of the curse of dimensionality in the context of digital health, demonstrate how it can negatively impact out-of-sample performance, and highlight important considerations for researchers and algorithm designers.
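A minimal illustration of this effect (ours, not from the paper; purely synthetic data): with a fixed number of patients, adding ever more uninformative features lets a model fit the training set well while its out-of-sample accuracy decays.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 120                                      # small, clinical-scale sample size
signal = rng.normal(size=(n, 2))             # two genuinely informative features
y = (signal.sum(axis=1) > 0).astype(int)

for d in (2, 10, 100, 1000):                 # total number of features
    X = np.hstack([signal, rng.normal(size=(n, d - 2))])   # pad with uninformative noise features
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)
    clf = LogisticRegression(max_iter=5000).fit(Xtr, ytr)
    print(d, "train acc:", clf.score(Xtr, ytr), "test acc:", clf.score(Xte, yte))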


2021, Vol 80 (3), pp. 34-48
Author(s): Ksenia Mayorova, Nikita Nikita

In this paper, we apply a set of machine learning and econometric models, namely Elastic Net, Random Forest, XGBoost, and SSVS, to nowcast (estimate for the current period) the dollar volumes of Russian exports and imports by commodity group. We use lags of the export and import volumes by commodity group, exchange prices for some goods, and other variables, so the curse of dimensionality becomes quite acute. The models we use have proven themselves in forecasting in the presence of the curse of dimensionality, when the number of model parameters exceeds the number of observations. The best-performing model appears to be the weighted machine learning model, which outperforms the ARIMA benchmark model in nowcasting the volumes of both exports and imports. According to the Diebold–Mariano test, for the largest commodity groups our model often obtains significantly more accurate nowcasts than the ARIMA model. The resulting estimates turn out to be quite close to the Bank of Russia’s historical forecasts built under comparable conditions.
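A rough sketch of this p > n setup (ours, with synthetic data; not the authors' code) uses one of the penalised models from the list above, Elastic Net, fitted on an expanding window to produce pseudo-out-of-sample nowcasts:

import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
T, p = 60, 150                                    # months of data, candidate predictors (p > T)
X = rng.normal(size=(T, p))                       # stand-ins for lagged volumes, exchange prices, etc.
beta = np.zeros(p)
beta[:5] = [1.5, -1.0, 0.8, 0.5, -0.4]            # only a few predictors actually matter
y = X @ beta + rng.normal(scale=0.5, size=T)      # stand-in for an export (or import) volume series

errors = []
for t in range(40, T):                            # expanding-window pseudo-out-of-sample exercise
    model = ElasticNetCV(cv=5).fit(X[:t], y[:t])
    errors.append(y[t] - model.predict(X[t:t + 1])[0])
print("nowcast RMSE:", np.sqrt(np.mean(np.square(errors))))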


Author(s): Lukas Gonon, Christoph Schwab

Abstract We study the expression rates of deep neural networks (DNNs for short) for option prices written on baskets of $d$ risky assets whose log-returns are modelled by a multivariate Lévy process with general correlation structure of jumps. We establish sufficient conditions on the characteristic triplet of the Lévy process $X$ that ensure an $\varepsilon$ error of DNN-expressed option prices with DNNs of size that grows polynomially with respect to $\mathcal{O}(\varepsilon^{-1})$, and with constants implied in $\mathcal{O}(\,\cdot\,)$ which grow polynomially in $d$, thereby overcoming the curse of dimensionality (CoD) and justifying the use of DNNs in financial modelling of large baskets in markets with jumps.
In addition, we exploit parabolic smoothing of Kolmogorov partial integro-differential equations for certain multivariate Lévy processes to present alternative architectures of ReLU (“rectified linear unit”) DNNs that provide $\varepsilon$ expression error in DNN size $\mathcal{O}(|\log(\varepsilon)|^{a})$ with exponent $a$ proportional to $d$, but with constants implied in $\mathcal{O}(\,\cdot\,)$ growing exponentially with respect to $d$. Under stronger, dimension-uniform non-degeneracy conditions on the Lévy symbol, we obtain algebraic expression rates of option prices in exponential Lévy models which are free from the curse of dimensionality. In this case, the ReLU DNN expression rates of prices depend on certain sparsity conditions on the characteristic Lévy triplet. We indicate several consequences and possible extensions of the presented results.
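Stated compactly (our notation, with $\Phi_\varepsilon$ denoting the approximating DNN), the two regimes described above are: (i) $\mathrm{size}(\Phi_\varepsilon) = \mathcal{O}(\varepsilon^{-p})$ for some fixed $p > 0$, with constants growing polynomially in $d$; and (ii) $\mathrm{size}(\Phi_\varepsilon) = \mathcal{O}(|\log \varepsilon|^{a})$ with $a$ proportional to $d$, but with constants growing exponentially in $d$.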


2021
Author(s): Jue Wang

In multiclass classification, one faces greater uncertainty when the data fall near the decision boundary. To reduce the uncertainty, one can wait and collect more data, but this invariably delays the decision. How can one make an accurate classification as quickly as possible? The solution requires a multiclass generalization of Wald’s sequential hypothesis testing, but the standard formulation is intractable because of the curse of dimensionality in dynamic programming. In “Optimal Sequential Multiclass Diagnosis,” Wang shows that, in a broad class of practical problems, the reachable state space is often restricted to, or near, a set of low-dimensional, time-dependent manifolds. After understanding the key drivers of this sparsity, the author develops a new solution framework that uses a low-dimensional statistic to reconstruct the high-dimensional state. This framework circumvents the curse of dimensionality, allowing efficient computation of optimal or near-optimal policies for quickest classification with large numbers of classes.
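A generic illustration of sequential multiclass testing (ours, not the policy developed in the paper) updates a posterior over classes after every observation and stops once one class is sufficiently likely; the posterior vector plays the role of the low-dimensional statistic:

import numpy as np

rng = np.random.default_rng(0)
K = 4                                         # number of classes
means = np.linspace(-1.5, 1.5, K)             # class-conditional means of a Gaussian signal
true_class, threshold = 2, 0.95

posterior = np.full(K, 1.0 / K)               # uniform prior over the K classes
t = 0
while posterior.max() < threshold:
    x = rng.normal(loc=means[true_class])     # collect one more (noisy) observation
    lik = np.exp(-0.5 * (x - means) ** 2)     # Gaussian likelihood of x under each class
    posterior = posterior * lik / np.sum(posterior * lik)   # Bayes update of the belief state
    t += 1
print("decided class", posterior.argmax(), "after", t, "observations")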


2021, Vol 11 (16), pp. 7766
Author(s): Dewang Chen, Jijie Cai, Yunhu Huang, Yisheng Lv

Fuzzy systems (FSs) are popular and interpretable machine learning methods, represented by the adaptive neuro-fuzzy inference system (ANFIS). However, they have difficulty dealing with high-dimensional data due to the curse of dimensionality. To handle high-dimensional data effectively and ensure optimal performance, this paper presents a deep neural fuzzy system (DNFS) based on the subtractive clustering-based ANFIS (SC-ANFIS). Inspired by deep learning, the SC-ANFIS is proposed and adopted as a submodule to construct the DNFS in a bottom-up way. Through ensemble learning and hierarchical learning of submodules, the DNFS can not only achieve faster convergence, but also complete the computation in a reasonable time with high accuracy and interpretability. By adjusting the deep structure and the parameters of the DNFS, the performance can be improved further. This paper also presents a detailed study of the structure and the combination of the submodule inputs for the DNFS. Experimental results on five regression datasets of varying dimensionality demonstrate that the proposed DNFS can not only mitigate the curse of dimensionality, but also achieve higher accuracy, lower complexity, and better interpretability than previous FSs. The superiority of the DNFS over other recent algorithms is also validated, especially when the dimensionality of the data is higher. Furthermore, the DNFS built with five inputs for each submodule and two inputs shared between adjacent submodules had the best performance. The performance of the DNFS can be improved by distributing features highly correlated with the output across the submodules. Given these results, the DNFS is expected to be useful for solving general high-dimensional regression problems efficiently, with high accuracy and better interpretability.
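The input-grouping rule that worked best, five inputs per submodule with two shared between adjacent submodules, can be written down directly (a small sketch of ours; the index-based assignment of features to submodules is assumed):

def submodule_inputs(n_features, size=5, overlap=2):
    """Return feature-index groups of the given size, stepping by size - overlap."""
    step = size - overlap
    groups, start = [], 0
    while start + size <= n_features:
        groups.append(list(range(start, start + size)))
        start += step
    return groups

print(submodule_inputs(14))
# [[0, 1, 2, 3, 4], [3, 4, 5, 6, 7], [6, 7, 8, 9, 10], [9, 10, 11, 12, 13]]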

