Robust subspace methods for outlier detection in genomic data circumvent the curse of dimensionality

2020
Vol 7 (2)
pp. 190714
Author(s):
Omar Shetta
Mahesan Niranjan

The application of machine learning to inference problems in biology is dominated by supervised learning problems of regression and classification, and unsupervised learning problems of clustering and variants of low-dimensional projections for visualization. A class of problems that has not gained much attention is detecting outliers in datasets, arising from causes such as gross experimental, reporting or labelling errors. These could also be small parts of a dataset that are functionally distinct from the majority of a population. Outlier data are often identified by considering the probability density of normal data and comparing data likelihoods against some threshold. This classical approach suffers from the curse of dimensionality, a serious problem for omics data, which are often very high dimensional. We develop an outlier detection method based on structured low-rank approximation. The objective function includes a regularizer based on neighbourhood information captured in the graph Laplacian. Results on publicly available genomic data show that our method robustly detects outliers whereas a density-based method fails even at moderate dimensions. Moreover, we show that our method achieves better clustering and visualization performance on the recovered low-dimensional projection than popular dimensionality reduction techniques.
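As a toy illustration of the abstract's central contrast (not the authors' exact algorithm, which additionally regularizes with a graph Laplacian), the sketch below plants off-subspace outliers in high-dimensional data and compares a kernel-density score against a subspace reconstruction-error score; all dimensions, bandwidths and ranks are illustrative assumptions.

```python
# Toy comparison: density-based vs. subspace-based outlier scoring in high dimensions.
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
d, n, k = 500, 200, 5                                # ambient dim, samples, true rank
basis = rng.normal(size=(d, k))
inliers = rng.normal(size=(n, k)) @ basis.T          # points near a k-dim subspace
outliers = rng.normal(scale=np.sqrt(k), size=(10, d))  # matched scale, but off-subspace
X = np.vstack([inliers, outliers])

# Density-based score: negative log-likelihood under a kernel density estimate.
kde = KernelDensity(bandwidth=5.0).fit(X)
density_score = -kde.score_samples(X)                # high = suspicious

# Subspace score: residual after projecting onto a rank-k approximation.
svd = TruncatedSVD(n_components=k).fit(X)
X_hat = svd.inverse_transform(svd.transform(X))
subspace_score = np.linalg.norm(X - X_hat, axis=1)   # high = suspicious

# The last 10 rows are the planted outliers; count how many each score ranks on top.
for name, score in [("density", density_score), ("subspace", subspace_score)]:
    top = np.argsort(score)[-10:]
    print(f"outliers recovered by {name} score:", np.sum(top >= n))
```

Because pairwise distances concentrate in high dimensions, the density score struggles to separate the planted outliers, while the reconstruction error does so cleanly, mirroring the abstract's claim.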

2021
Vol 12 (1)
Author(s):
Joshua T. Vogelstein
Eric W. Bridgeford
Minh Tang
Da Zheng
Christopher Douville
...

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees. We introduce an approach that extends principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest version, Linear Optimal Low-Rank Projection, incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that Linear Optimal Low-Rank Projection and its generalizations lead to improved data representations for subsequent classification, while maintaining computational efficiency and scalability. Using multiple brain imaging datasets consisting of more than 150 million features, and several genomics datasets with more than 500,000 features, Linear Optimal Low-Rank Projection outperforms other scalable linear dimensionality reduction techniques in terms of accuracy, while requiring only a few minutes on a standard desktop computer.
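A minimal sketch of the idea under our reading, assuming a two-class problem: augment the top principal directions of the class-centered data with the class-conditional mean difference, then orthonormalize. Function and parameter names are ours, not the authors' reference implementation.

```python
# Sketch of a LOL-style supervised projection for two classes.
import numpy as np

def lol_project(X, y, n_components=10):
    classes = np.unique(y)                           # sorted class labels
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    delta = (means[1] - means[0])[:, None]           # class-conditional mean difference
    Xc = X - means[np.searchsorted(classes, y)]      # center each sample by its class mean
    # Top right-singular vectors of the class-centered data (PCA directions).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = np.hstack([delta, Vt[: n_components - 1].T])  # mean difference first, then PCA
    Q, _ = np.linalg.qr(A)                           # orthonormalize the combined basis
    return X @ Q                                     # low-dimensional representation

X = np.random.default_rng(1).normal(size=(100, 1000))  # n << d regime
y = np.repeat([0, 1], 50)
Z = lol_project(X, y, n_components=5)
print(Z.shape)   # (100, 5)
```

Placing the mean-difference direction ahead of the PCA directions is what distinguishes this from plain PCA: the discriminating information survives even when it lies in a low-variance direction.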


2021
Vol 3 (1)
Author(s):
Gaweł I. Kuś
Sybrand van der Zwaag
Miguel A. Bessa

Gaussian processes are well-established Bayesian machine learning algorithms with significant merits, despite a strong limitation: lack of scalability. Clever solutions address this issue by inducing sparsity through low-rank approximations, often based on the Nyström method. Here, we propose a different method to achieve better scalability and higher accuracy using quantum computing, significantly outperforming classical Bayesian neural networks for large datasets. Unlike other approaches to quantum machine learning, the computationally expensive linear algebra operations are not simply replaced with their quantum counterparts. Instead, we start from a recent study that proposed a quantum circuit for implementing quantum Gaussian processes, and we then use quantum phase estimation to induce a low-rank approximation analogous to that in classical sparse Gaussian processes. We provide evidence through numerical tests, mathematical error bound estimation, and complexity analysis that the method can address the “curse of dimensionality,” where each additional input parameter no longer leads to an exponential growth of the computational cost. This is also demonstrated by applying the algorithm in a practical setting and using it in the data-driven design of a recently proposed metamaterial. The algorithm, however, requires significant quantum computing hardware improvements before quantum advantage can be achieved.
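For reference, below is a sketch of the classical Nyström-type low-rank kernel approximation that underlies sparse Gaussian processes and that the quantum method mirrors; the quantum circuit itself is beyond a short snippet. The inducing-point count, RBF kernel, and jitter value are illustrative choices.

```python
# Classical Nystrom low-rank kernel approximation, stored in factored form.
import numpy as np
from scipy.spatial.distance import cdist

def rbf(A, B, ls=1.0):
    # Squared-exponential (RBF) kernel with length scale ls.
    return np.exp(-cdist(A, B, "sqeuclidean") / (2 * ls**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
m = 50                                     # number of inducing points (m << n)
Z = X[rng.choice(len(X), m, replace=False)]

K_nm = rbf(X, Z)                           # n x m cross-kernel
K_mm = rbf(Z, Z) + 1e-8 * np.eye(m)        # m x m inducing kernel, jittered for stability
# Nystrom approximation: K ~ K_nm K_mm^{-1} K_nm^T, kept factored as F F^T.
L = np.linalg.cholesky(K_mm)
F = np.linalg.solve(L, K_nm.T).T           # n x m factor, so K ~ F F^T
print(F.shape)                             # downstream work scales O(n m^2), not O(n^3)
```

Keeping the factor F rather than the full n-by-n kernel is what breaks the cubic scaling; the quantum phase estimation step in the paper plays the analogous rank-reducing role.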


Electronics
2019
Vol 8 (6)
pp. 634
Author(s):
Mandar Bivalkar
Dharmendra Singh
Hirokazu Kobayashi

In through-wall imaging, clutter plays an important role in the detection of objects behind the wall. Extensive studies in the literature have addressed clutter reduction for targets sharing the same dielectric constant. Existing clutter reduction techniques, such as the sub-space approach, the differential approach and entropy-based time gating, are able to detect a single target, or two targets with the same dielectric, behind the wall. In a realistic scenario, however, the targets behind a wall need not share the same dielectric. Very few studies address the detection of targets with different dielectrics in the same scene, which we term “contrast target detection”. Recently, low-rank approximation (LRA) was proposed to reduce random noise in the data. In this paper, a novel method based on entropy thresholding for low-rank approximation is introduced for contrast target detection. The proposed method gives satisfactory results.
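One plausible reading of entropy-guided low-rank denoising, not necessarily the paper's exact rule, is to pick the truncation rank from the entropy of the normalized singular-value spectrum, as in this hedged sketch (matrix sizes and the noise level are illustrative):

```python
# Entropy-guided rank selection for truncated-SVD denoising of a radar data matrix.
import numpy as np

def entropy_rank_denoise(D):
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    p = s**2 / np.sum(s**2)                 # normalized singular-value energies
    H = -np.sum(p * np.log(p + 1e-12))      # spectral entropy of the matrix
    r = max(1, int(round(np.exp(H))))       # effective-rank heuristic from the entropy
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r], r

# Synthetic B-scan-like matrix: rank-one wall clutter plus random noise.
rng = np.random.default_rng(0)
clutter = np.outer(rng.normal(size=128), rng.normal(size=64))
noisy = clutter + 0.1 * rng.normal(size=(128, 64))
denoised, r = entropy_rank_denoise(noisy)
print("chosen rank:", r)
```

The entropy is low when energy concentrates in a few singular values, so a strongly low-rank signal yields a small truncation rank and the remaining components, dominated by noise, are discarded.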


Author(s):  
Dominik Alfke
Martin Stoll

Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised classification on graph-based datasets. We propose a new GCN variant whose three-part filter space is targeted at dense graphs. Our examples include graphs generated from 3D point clouds with an increased focus on non-local information, as well as hypergraphs based on categorical data from real-world problems. These graphs differ from the common sparse benchmark graphs in terms of the spectral properties of their graph Laplacian. Most notably, we observe large eigengaps, which are unfavorable for popular existing GCN architectures. Our method overcomes these issues by utilizing the pseudoinverse of the Laplacian. Another key ingredient is a low-rank approximation of the convolutional matrix, ensuring computational efficiency and increased accuracy at the same time. We outline how the necessary eigeninformation can be computed efficiently in each application and discuss the appropriate choice of the only metaparameter, the approximation rank. We finally showcase our method’s performance in terms of runtime and accuracy in various experiments with real-world datasets.
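A sketch of one key ingredient under our reading: a rank-r approximation of the graph Laplacian's pseudoinverse built from its r smallest non-zero eigenpairs. Variable names are ours, and the paper's full three-part filter space is not reproduced here.

```python
# Rank-r approximation of the Laplacian pseudoinverse from its low eigenpairs.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def lowrank_pinv(L, r):
    # Request r+1 smallest eigenpairs, then drop the zero eigenvalue of a
    # connected graph's Laplacian. For large graphs, shift-invert or LOBPCG
    # solvers would replace the plain which="SM" call used here.
    vals, vecs = eigsh(L.asfptype(), k=r + 1, which="SM")
    nz = vals > 1e-9                       # keep only non-zero eigenvalues
    return vecs[:, nz] @ np.diag(1.0 / vals[nz]) @ vecs[:, nz].T

# Tiny example: Laplacian of a 6-node path graph.
A = sp.diags([np.ones(5), np.ones(5)], [-1, 1], format="csr")
L = laplacian(A)
P = lowrank_pinv(L, r=3)
print(P.shape, np.allclose(P, P.T))
```

Because only the few smallest eigenpairs enter the filter, the approximation is cheap to form, and it is exactly this eigeninformation whose efficient computation the paper discusses per application.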


2020
Vol 14 (12)
pp. 2791-2798
Author(s):
Xiaoqun Qiu
Zhen Chen
Saifullah Adnan
Hongwei He

2020
Vol 6
pp. 922-933
Author(s):
M. Amine Hadj-Youcef
Francois Orieux
Alain Abergel
Aurelia Fraysse
