scholarly journals Robustness meets algorithms

2021 ◽  
Vol 64 (5) ◽  
pp. 107-115
Author(s):  
Ilias Diakonikolas ◽  
Gautam Kamath ◽  
Daniel M. Kane ◽  
Jerry Li ◽  
Ankur Moitra ◽  
...  

In every corner of machine learning and statistics, there is a need for estimators that work not just in an idealized model, but even when their assumptions are violated. Unfortunately, in high dimensions, being provably robust and being efficiently computable are often at odds with each other. We give the first efficient algorithm for estimating the parameters of a high-dimensional Gaussian that is able to tolerate a constant fraction of corruptions that is independent of the dimension. Prior to our work, all known estimators either needed time exponential in the dimension to compute or could tolerate only an inverse-polynomial fraction of corruptions. Not only does our algorithm bridge the gap between robustness and algorithms, but also it turns out to be highly practical in a variety of settings.

PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0246039
Author(s):  
Shilan S. Hameed ◽  
Rohayanti Hassan ◽  
Wan Haslina Hassan ◽  
Fahmi F. Muhammadsharif ◽  
Liza Abdul Latiff

The selection and classification of genes is essential for the identification of related genes to a specific disease. Developing a user-friendly application with combined statistical rigor and machine learning functionality to help the biomedical researchers and end users is of great importance. In this work, a novel stand-alone application, which is based on graphical user interface (GUI), is developed to perform the full functionality of gene selection and classification in high dimensional datasets. The so-called HDG-select application is validated on eleven high dimensional datasets of the format CSV and GEO soft. The proposed tool uses the efficient algorithm of combined filter-GBPSO-SVM and it was made freely available to users. It was found that the proposed HDG-select outperformed other tools reported in literature and presented a competitive performance, accessibility, and functionality.


Entropy ◽  
2020 ◽  
Vol 22 (1) ◽  
pp. 82 ◽  
Author(s):  
Alexander N. Gorban ◽  
Valery A. Makarov ◽  
Ivan Y. Tyukin

High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the “curse of dimensionality” states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the “blessing of dimensionality”, has attracted much attention. It turns out that generic high-dimensional datasets exhibit fairly simple geometric properties. Thus, there is a fundamental tradeoff between complexity and simplicity in high dimensional spaces. Here we present a brief explanatory review of recent ideas, results and hypotheses about the blessing of dimensionality and related simplifying effects relevant to machine learning and neuroscience.


2011 ◽  
Vol 11 (3) ◽  
pp. 272
Author(s):  
Ivan Gavrilyuk ◽  
Boris Khoromskij ◽  
Eugene Tyrtyshnikov

Abstract In the recent years, multidimensional numerical simulations with tensor-structured data formats have been recognized as the basic concept for breaking the "curse of dimensionality". Modern applications of tensor methods include the challenging high-dimensional problems of material sciences, bio-science, stochastic modeling, signal processing, machine learning, and data mining, financial mathematics, etc. The guiding principle of the tensor methods is an approximation of multivariate functions and operators with some separation of variables to keep the computational process in a low parametric tensor-structured manifold. Tensors structures had been wildly used as models of data and discussed in the contexts of differential geometry, mechanics, algebraic geometry, data analysis etc. before tensor methods recently have penetrated into numerical computations. On the one hand, the existing tensor representation formats remained to be of a limited use in many high-dimensional problems because of lack of sufficiently reliable and fast software. On the other hand, for moderate dimensional problems (e.g. in "ab-initio" quantum chemistry) as well as for selected model problems of very high dimensions, the application of traditional canonical and Tucker formats in combination with the ideas of multilevel methods has led to the new efficient algorithms. The recent progress in tensor numerical methods is achieved with new representation formats now known as "tensor-train representations" and "hierarchical Tucker representations". Note that the formats themselves could have been picked up earlier in the literature on the modeling of quantum systems. Until 2009 they lived in a closed world of those quantum theory publications and never trespassed the territory of numerical analysis. The tremendous progress during the very recent years shows the new tensor tools in various applications and in the development of these tools and study of their approximation and algebraic properties. This special issue treats tensors as a base for efficient numerical algorithms in various modern applications and with special emphases on the new representation formats.


2020 ◽  
Vol 10 (5) ◽  
pp. 1797 ◽  
Author(s):  
Mera Kartika Delimayanti ◽  
Bedy Purnama ◽  
Ngoc Giang Nguyen ◽  
Mohammad Reza Faisal ◽  
Kunti Robiatul Mahmudah ◽  
...  

Manual classification of sleep stage is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. The previous works have shown that low dimensional fast Fourier transform (FFT) features and many machine learning algorithms have been applied. In this paper, we demonstrate utilization of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the art methods. This result indicates that high dimensional FFT features in combination with a simple feature selection is effective for the improvement of automated sleep stage classification.


2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.


Sign in / Sign up

Export Citation Format

Share Document