Robustness meets algorithms

Ilias Diakonikolas; Gautam Kamath; Daniel M. Kane; Jerry Li; Ankur Moitra; Alistair Stewart

doi:10.1145/3453935

Robustness meets algorithms

Communications of the ACM ◽

10.1145/3453935 ◽

2021 ◽

Vol 64 (5) ◽

pp. 107-115

Author(s):

Ilias Diakonikolas ◽

Gautam Kamath ◽

Daniel M. Kane ◽

Jerry Li ◽

Ankur Moitra ◽

...

Keyword(s):

Machine Learning ◽

Efficient Algorithm ◽

High Dimensional ◽

High Dimensions ◽

Idealized Model

In every corner of machine learning and statistics, there is a need for estimators that work not just in an idealized model, but even when their assumptions are violated. Unfortunately, in high dimensions, being provably robust and being efficiently computable are often at odds with each other. We give the first efficient algorithm for estimating the parameters of a high-dimensional Gaussian that is able to tolerate a constant fraction of corruptions that is independent of the dimension. Prior to our work, all known estimators either needed time exponential in the dimension to compute or could tolerate only an inverse-polynomial fraction of corruptions. Not only does our algorithm bridge the gap between robustness and algorithms, but also it turns out to be highly practical in a variety of settings.
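To make the idea of tolerating a constant fraction of corruptions concrete, here is a minimal sketch of a filtering-style robust mean estimator. It is an illustration only, not the algorithm from the paper: it assumes the clean data has identity covariance, and the eigenvalue threshold, the fraction of points dropped per round, and all function names are hypothetical choices made for this example.

```python
import math
import random


def mean_vec(points):
    """Coordinate-wise sample mean."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def covariance(points, mu):
    """Empirical covariance matrix around mu."""
    n, d = len(points), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        c = [p[j] - mu[j] for j in range(d)]
        for a in range(d):
            row, ca = cov[a], c[a]
            for b in range(d):
                row[b] += ca * c[b]
    return [[x / n for x in row] for row in cov]


def top_eigenpair(cov, rng, iters=200):
    """Power iteration: approximate largest eigenvalue and its unit eigenvector."""
    d = len(cov)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return lam, v


def filtered_mean(points, rng, threshold=1.5, drop_frac=0.01, max_rounds=50):
    """While some direction has suspiciously large variance, drop the points
    that deviate most along it; then return the mean of what is left.
    threshold and drop_frac are assumed values, tuned for identity covariance."""
    pts = list(points)
    for _ in range(max_rounds):
        mu = mean_vec(pts)
        lam, v = top_eigenpair(covariance(pts, mu), rng)
        if lam <= threshold:
            break
        d = len(mu)
        scores = [abs(sum((p[j] - mu[j]) * v[j] for j in range(d))) for p in pts]
        k = max(1, int(drop_frac * len(pts)))
        keep = sorted(range(len(pts)), key=lambda i: scores[i])[: len(pts) - k]
        pts = [pts[i] for i in keep]
    return mean_vec(pts)


def norm(vec):
    return math.sqrt(sum(x * x for x in vec))


# Synthetic demo: 500 standard Gaussian points (true mean 0) plus 50 planted outliers.
rng = random.Random(0)
dim = 10
good = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(500)]
bad = [[2.0] * dim for _ in range(50)]
data = good + bad

naive_error = norm(mean_vec(data))
robust_error = norm(filtered_mean(data, rng))
print("naive mean error: ", naive_error)
print("robust mean error:", robust_error)
```

The design choice worth noting is that outliers which shift the mean noticeably must also inflate the variance in some direction, so the top eigenvector of the empirical covariance points to where the filtering should happen.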


HDG-select: A novel GUI based application for gene selection and classification in high dimensional datasets

PLoS ONE ◽

10.1371/journal.pone.0246039 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0246039

Author(s):

Shilan S. Hameed ◽

Rohayanti Hassan ◽

Wan Haslina Hassan ◽

Fahmi F. Muhammadsharif ◽

Liza Abdul Latiff

Keyword(s):

Machine Learning ◽

User Interface ◽

Graphical User Interface ◽

Efficient Algorithm ◽

Gene Selection ◽

High Dimensional ◽

Competitive Performance ◽

User Friendly ◽

High Dimensional Datasets

The selection and classification of genes is essential for identifying the genes related to a specific disease. Developing a user-friendly application that combines statistical rigor with machine learning functionality to help biomedical researchers and end users is of great importance. In this work, a novel stand-alone application with a graphical user interface (GUI) is developed to perform the full functionality of gene selection and classification in high dimensional datasets. The application, called HDG-select, is validated on eleven high dimensional datasets in CSV and GEO SOFT formats. The proposed tool uses an efficient combined filter-GBPSO-SVM algorithm and has been made freely available to users. It was found that HDG-select outperformed other tools reported in the literature and offered competitive performance, accessibility, and functionality.


High-Dimensional Brain in a High-Dimensional World: Blessing of Dimensionality

Entropy ◽

10.3390/e22010082 ◽

2020 ◽

Vol 22 (1) ◽

pp. 82 ◽

Cited By ~ 6

Author(s):

Alexander N. Gorban ◽

Valery A. Makarov ◽

Ivan Y. Tyukin

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

High Dimensional Data ◽

Curse Of Dimensionality ◽

The Other ◽

High Dimensional ◽

High Dimensions ◽

Applications Of Machine Learning ◽

High Dimensional Datasets ◽

Artificial Intelligence Systems

High-dimensional data and high-dimensional representations of reality are inherent features of modern Artificial Intelligence systems and applications of machine learning. The well-known phenomenon of the “curse of dimensionality” states: many problems become exponentially difficult in high dimensions. Recently, the other side of the coin, the “blessing of dimensionality”, has attracted much attention. It turns out that generic high-dimensional datasets exhibit fairly simple geometric properties. Thus, there is a fundamental tradeoff between complexity and simplicity in high dimensional spaces. Here we present a brief explanatory review of recent ideas, results and hypotheses about the blessing of dimensionality and related simplifying effects relevant to machine learning and neuroscience.


Preface to the special issue, CMAM 2011, no. 3.

Computational Methods in Applied Mathematics ◽

10.2478/cmam-2011-0014 ◽

2011 ◽

Vol 11 (3) ◽

pp. 272

Author(s):

Ivan Gavrilyuk ◽

Boris Khoromskij ◽

Eugene Tyrtyshnikov

Keyword(s):

Separation Of Variables ◽

Numerical Algorithms ◽

Multilevel Methods ◽

High Dimensional ◽

Special Issue ◽

High Dimensions ◽

Guiding Principle ◽

Tensor Methods ◽

Closed World ◽

The One

In recent years, multidimensional numerical simulations with tensor-structured data formats have been recognized as the basic concept for breaking the "curse of dimensionality". Modern applications of tensor methods include challenging high-dimensional problems in material sciences, bio-science, stochastic modeling, signal processing, machine learning, data mining, financial mathematics, etc. The guiding principle of tensor methods is the approximation of multivariate functions and operators with some separation of variables, which keeps the computational process in a low-parametric tensor-structured manifold. Tensor structures had been widely used as models of data and discussed in the contexts of differential geometry, mechanics, algebraic geometry, data analysis, etc., before tensor methods recently penetrated into numerical computations. On the one hand, the existing tensor representation formats remained of limited use in many high-dimensional problems because of a lack of sufficiently reliable and fast software. On the other hand, for moderate-dimensional problems (e.g. in "ab-initio" quantum chemistry) as well as for selected model problems of very high dimensions, the application of traditional canonical and Tucker formats in combination with the ideas of multilevel methods has led to new efficient algorithms. The recent progress in tensor numerical methods has been achieved with new representation formats now known as "tensor-train representations" and "hierarchical Tucker representations". Note that the formats themselves could have been picked up earlier from the literature on the modeling of quantum systems. Until 2009 they lived in a closed world of quantum theory publications and never trespassed into the territory of numerical analysis.
The tremendous progress of recent years is visible both in the applications of the new tensor tools and in the development of these tools and the study of their approximation and algebraic properties. This special issue treats tensors as a basis for efficient numerical algorithms in various modern applications, with special emphasis on the new representation formats.


Classification of Brainwaves for Sleep Stages by High-Dimensional FFT Features from EEG Signals

Applied Sciences ◽

10.3390/app10051797 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1797 ◽

Cited By ~ 2

Author(s):

Mera Kartika Delimayanti ◽

Bedy Purnama ◽

Ngoc Giang Nguyen ◽

Mohammad Reza Faisal ◽

Kunti Robiatul Mahmudah ◽

...

Keyword(s):

Machine Learning ◽

Sleep Stage ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Sleep Stages ◽

Eeg Signals ◽

Stage Classification ◽

Sleep Stage Classification ◽

Low Dimensional

Manual classification of sleep stages is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous works have applied low-dimensional fast Fourier transform (FFT) features together with many machine learning algorithms. In this paper, we demonstrate the use of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features in combination with a simple feature selection are effective for improving automated sleep stage classification.
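As a rough illustration of what frequency-domain features of a signal window look like, the sketch below computes the magnitude spectrum of a synthetic signal. It is not the paper's pipeline: the sampling rate, the one-second window, the 10 Hz test tone, and the function name are assumptions, and a naive discrete Fourier transform is used for self-containment (an FFT produces the same values faster).

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the DFT bins 0..n/2 of a real-valued signal.
    Naive O(n^2) transform; an FFT routine would return the same values."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


# Assumed setup: 100 Hz sampling, one-second window, pure 10 Hz sine.
fs = 100
window = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]

features = dft_magnitudes(window)  # one feature per frequency bin
peak_bin = max(range(len(features)), key=features.__getitem__)
peak_hz = peak_bin * fs / len(window)  # bin spacing is fs / n Hz
print("number of features:", len(features))
print("dominant frequency (Hz):", peak_hz)
```

With longer windows or several channels, the number of such bins grows quickly, which is one way a feature vector can reach thousands of dimensions before feature selection.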


Time Series-Analysis Based Engineering of High-Dimensional Wide-Area Stability Indices for Machine Learning

IEEE Access ◽

10.1109/access.2021.3099459 ◽

2021 ◽

pp. 1-1

Author(s):

Raoult Teukam Dabou ◽

Innocent Kamwa ◽

C. Y. Chung ◽

C. F. Mugombozi

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Analysis ◽

Wide Area ◽

High Dimensional ◽

Series Analysis ◽

Stability Indices


Scalable Machine Learning on High-Dimensional Vectors

Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics ◽

10.1145/3405962.3405989 ◽

2020 ◽

Author(s):

Karima Echihabi ◽

Kostas Zoumpatianos ◽

Themis Palpanas

Keyword(s):

Machine Learning ◽

High Dimensional


Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

Applied Sciences ◽

10.3390/app11020472 ◽

2021 ◽

Vol 11 (2) ◽

pp. 472

Author(s):

Hyeongmin Cho ◽

Sangkyun Lee

Keyword(s):

Machine Learning ◽

Data Quality ◽

Large Scale ◽

High Dimensional Data ◽

Quality Measures ◽

Training Data ◽

Measure Data ◽

High Dimensional ◽

Small Scale ◽

Class Separability

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
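The general idea of estimating class separability through random projections can be sketched as follows. This is a hypothetical illustration, not the measures proposed in the paper: it averages a one-dimensional Fisher-style ratio over random unit directions, and the function names, the number of directions, and the synthetic data are all assumptions.

```python
import math
import random


def random_unit_vector(d, rng):
    """Direction drawn uniformly from the unit sphere in d dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_var(values):
    m = sum(values) / len(values)
    return m, sum((x - m) ** 2 for x in values) / len(values)


def projected_separability(class_a, class_b, rng, n_dirs=50):
    """Average over random directions of
    (difference of projected means)^2 / (sum of projected variances)."""
    d = len(class_a[0])
    total = 0.0
    for _ in range(n_dirs):
        w = random_unit_vector(d, rng)
        pa = [sum(x * y for x, y in zip(p, w)) for p in class_a]
        pb = [sum(x * y for x, y in zip(p, w)) for p in class_b]
        ma, va = mean_var(pa)
        mb, vb = mean_var(pb)
        total += (ma - mb) ** 2 / (va + vb)
    return total / n_dirs


def gaussian_cloud(n, d, shift, rng):
    """n points from a unit-variance Gaussian centered at (shift, ..., shift)."""
    return [[rng.gauss(shift, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
dim = 20
base = gaussian_cloud(100, dim, 0.0, rng)
far = gaussian_cloud(100, dim, 3.0, rng)   # well-separated second class
near = gaussian_cloud(100, dim, 0.0, rng)  # overlapping second class

score_separated = projected_separability(base, far, rng)
score_overlapping = projected_separability(base, near, rng)
print("separated classes:  ", score_separated)
print("overlapping classes:", score_overlapping)
```

Each projection costs time linear in the dimension, which is why projection-based scores stay cheap on high-dimensional data compared with measures that need full pairwise distances or covariance matrices.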
