An Analysis and Application of Fast Nonnegative Orthogonal Matching Pursuit for Image Categorization in Deep Networks

2015 ◽  
Vol 2015 ◽  
pp. 1-9
Author(s):  
Bo Wang ◽  
Jichang Guo ◽  
Yan Zhang

Nonnegative orthogonal matching pursuit (NOMP) has been proven to be a more stable encoder for unsupervised sparse representation learning. However, previous research has shown that NOMP is suboptimal in terms of computational cost, because coefficient selection and refinement with nonnegative least squares (NNLS) are performed as two separate steps. This problem severely reduces the efficiency of encoding large-scale image patches. In this work, we study fast nonnegative OMP (FNOMP) as an efficient encoder, accelerated by QR factorization and iterative coefficient updates, in deep networks for the full-size image categorization task. We analyze and demonstrate that, using relatively simple gain-shape vector quantization to train the dictionary, FNOMP not only encodes more efficiently than NOMP but also significantly improves classification accuracy compared to the OMP-based algorithm. In addition, the FNOMP-based algorithm is superior to other state-of-the-art methods on several publicly available benchmarks, namely Oxford Flowers, UIUC-Sports, and Caltech101.
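As a minimal illustration of the two-step scheme the abstract contrasts against — greedy atom selection followed by a separate NNLS refinement — the sketch below encodes a synthetic signal against a random dictionary. `nomp_encode`, the dictionary, and the signal are all illustrative assumptions, not the paper's implementation; FNOMP would fuse the selection and refinement steps via QR factorization.

```python
import numpy as np
from scipy.optimize import nnls

def nomp_encode(D, x, k):
    """NOMP-style encoding: greedily pick the atom with the largest positive
    correlation, then refine the coefficients with NNLS (the separate step)."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        corr = D.T @ residual
        j = int(np.argmax(corr))
        if corr[j] <= 0:          # no atom improves the fit any further
            break
        if j not in support:
            support.append(j)
        c, _ = nnls(D[:, support], x)   # nonnegative refinement over the support
        coef[:] = 0
        coef[support] = c
        residual = x - D @ coef
    return coef

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
x = D[:, 3] * 2.0 + D[:, 7] * 0.5         # sparse nonnegative combination
c = nomp_encode(D, x, k=4)
print(np.all(c >= 0))                     # True: codes are nonnegative
```

The NNLS call inside the loop is exactly the per-iteration refinement whose cost FNOMP is designed to avoid.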

2015 ◽  
Vol 24 (1) ◽  
pp. 135-143 ◽  
Author(s):  
Omer F. Alcin ◽  
Abdulkadir Sengur ◽  
Jiang Qian ◽  
Melih C. Ince

Extreme learning machine (ELM) is a recent scheme for single-hidden-layer feedforward networks (SLFNs). It has attracted much interest in the machine intelligence and pattern recognition fields, with numerous real-world applications. The ELM structure has several advantages, such as its adaptability to various problems, a rapid learning rate, and low computational cost. However, it has shortcomings in the following respects. First, it suffers from irrelevant variables in the input data set. Second, choosing the optimal number of hidden-layer neurons is not well defined. If the number of hidden nodes exceeds the number of training samples, the ELM may encounter the singularity problem, and its solution may become unstable. To overcome these limitations, several methods have been proposed within the regularization framework. In this article, we consider a greedy method for sparse approximation of the output weight vector of the ELM network. More specifically, the orthogonal matching pursuit (OMP) algorithm is embedded into the ELM. This new technique is named OMP-ELM. OMP-ELM has several advantages over regularized ELM methods, such as lower complexity and immunity to the singularity problem. Experiments on nine commonly used regression problems confirm these advantages. Moreover, OMP-ELM is compared with the ELM method, the regularized ELM scheme, and artificial neural networks.
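One way to see the idea: build a random (untrained) ELM hidden layer, then replace the usual pseudo-inverse output solution with a sparse OMP fit over the hidden activations. The sketch below uses scikit-learn's `OrthogonalMatchingPursuit` on a toy regression task; the layer size, data, and sparsity level are assumptions for illustration, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
# toy regression data
X = rng.uniform(-1, 1, size=(200, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

# ELM hidden layer: random weights and biases, never trained
n_hidden = 64
W = rng.standard_normal((4, n_hidden))
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W + b)

# OMP-ELM idea: sparse output weights instead of the pseudo-inverse solution
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10).fit(H, y)
beta = omp.coef_
print(np.count_nonzero(beta))  # at most 10 active hidden neurons
```

Because only a handful of hidden neurons receive nonzero output weights, the singular (underdetermined) regime of a very wide hidden layer is avoided by construction.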


Author(s):  
Dahun Kim ◽  
Donghyeon Cho ◽  
In So Kweon

Self-supervised tasks such as colorization, inpainting, and jigsaw puzzles have been utilized for visual representation learning on still images when labeled images are scarce or absent. Recently, this worthwhile line of study has extended to the video domain, where the cost of human labeling is even higher. However, most existing methods are still based on 2D CNN architectures that cannot directly capture spatio-temporal information for video applications. In this paper, we introduce a new self-supervised task called Space-Time Cubic Puzzles to train 3D CNNs on large-scale video datasets. This task requires a network to arrange permuted 3D spatio-temporal crops. By completing Space-Time Cubic Puzzles, the network learns both the spatial appearance and the temporal relations of video frames, which is our final goal. In experiments, we demonstrate that our learned 3D representation transfers well to action recognition tasks and outperforms state-of-the-art 2D CNN-based competitors on the UCF101 and HMDB51 datasets.
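The pretext task can be sketched as pure data preparation: cut spatio-temporal crops from a clip, shuffle them by a known permutation, and use the permutation index as the classification label the 3D CNN must predict. The grid layout below (a 2x2 spatial grid of full-length temporal crops) is one illustrative choice, not necessarily the paper's exact cropping scheme.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
video = rng.standard_normal((16, 64, 64))   # (time, height, width), single channel

# cut a 2x2 spatial grid of spatio-temporal crops
crops = [video[:, i*32:(i+1)*32, j*32:(j+1)*32] for i in (0, 1) for j in (0, 1)]

perms = list(itertools.permutations(range(4)))   # 24 possible arrangements
label = rng.integers(len(perms))                  # the class the network must predict
puzzle = np.stack([crops[k] for k in perms[label]])
print(puzzle.shape)  # (4, 16, 32, 32)
```

Solving the puzzle requires recognizing both what each crop depicts (spatial appearance) and how the crops relate across space and time, which is exactly the supervision-free signal the abstract describes.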


2021 ◽  
Author(s):  
Kenneth Atz ◽  
Clemens Isert ◽  
Markus N. A. Böcker ◽  
José Jiménez-Luna ◽  
Gisbert Schneider

Certain molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like compounds currently makes large-scale applications of quantum chemistry challenging. In order to mitigate this problem, we developed DelFTa, an open-source toolbox for predicting small-molecule electronic properties at the density functional theory (DFT) level, using the Δ-machine learning principle. DelFTa employs state-of-the-art E(3)-equivariant graph neural networks that were trained on the QMugs dataset of QM properties. It provides access to a wide array of quantum observables by predicting approximations to ωB97X-D/def2-SVP values from a GFN2-xTB semiempirical baseline. Δ-learning with DelFTa was shown to outperform direct DFT learning for most of the considered QM endpoints. The software is provided as open-source code with fully documented command-line and Python APIs.
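The Δ-learning principle itself is simple: instead of regressing the expensive DFT value directly, train a model on the difference between the DFT target and a cheap baseline (here, GFN2-xTB), then predict baseline + correction. The sketch below demonstrates this on fully synthetic data, with a ridge regressor standing in for DelFTa's equivariant GNN; every variable is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge  # stand-in for the equivariant GNN

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 8))                 # toy molecular descriptors
y_dft = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(300)
y_xtb = y_dft + 0.3 * X[:, 0] + 0.05 * rng.standard_normal(300)  # cheap, biased baseline

# Δ-learning: regress the DFT-baseline difference, not the DFT value itself
delta_model = Ridge().fit(X, y_dft - y_xtb)
y_pred = y_xtb + delta_model.predict(X)

print(np.mean((y_pred - y_dft) ** 2) < np.mean((y_xtb - y_dft) ** 2))  # True
```

The correction is usually a smoother, smaller-magnitude target than the raw property, which is why Δ-learning tends to outperform direct learning for the same model capacity.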


Author(s):  
Zhizhong Han ◽  
Mingyang Shang ◽  
Xiyang Wang ◽  
Yu-Shen Liu ◽  
Matthias Zwicker

Jointly learning representations of 3D shapes and text is crucial to support tasks such as cross-modal retrieval or shape captioning. A recent method employs 3D voxels to represent 3D shapes, but this limits the approach to low resolutions due to the computational cost caused by the cubic complexity of 3D voxels. Hence the method suffers from a lack of detailed geometry. To resolve this issue, we propose Y2Seq2Seq, a view-based model, to learn cross-modal representations by joint reconstruction and prediction of view and word sequences. Specifically, the network architecture of Y2Seq2Seq bridges the semantic meaning embedded in the two modalities by two coupled "Y"-like sequence-to-sequence (Seq2Seq) structures. In addition, our novel hierarchical constraints further increase the discriminability of the cross-modal representations by employing more detailed discriminative information. Experimental results on cross-modal retrieval and 3D shape captioning show that Y2Seq2Seq outperforms the state-of-the-art methods.


2018 ◽  
Vol 45 (3) ◽  
pp. 304-321
Author(s):  
Nazanin Dehghani ◽  
Masoud Asadpour

Twitter is a popular microblogging service that has become a great medium for exploring emerging events and breaking news. Unfortunately, the explosive rate of information entering Twitter makes the users experience information overload. Since a great many tweets revolve around news events, summarising the storyline of these events can be advantageous to users, allowing them to conveniently access relevant and key information scattered over numerous tweets and, consequently, draw concise conclusions. A storyline shows the evolution of a story through time and sketches the correlations among its significant events. In this article, we propose a novel framework for generating a storyline of news events from a social point of view. Utilising powerful concepts from graph theory, we identify the significant events, summarise them and generate a coherent storyline of their evolution with reasonable computational cost for large datasets. Our approach models a storyline as a directed tree of socially salient events evolving over time, in which nodes represent main events and edges capture the semantic relations between related events. We evaluate our proposed method against human-generated storylines, as well as the previous state-of-the-art storyline generation algorithm, on two large-scale datasets, one consisting of English tweets and the other of Persian tweets. We find that the results of our method are superior to the previous best algorithm and are comparable with human-generated storylines.


2019 ◽  
Vol 38 (2) ◽  
pp. 441-456 ◽  
Author(s):  
Baokang Yan ◽  
Bin Wang ◽  
Fengxing Zhou ◽  
Weigang Li ◽  
Bo Xu

To extract fault impulse features of large-scale rotating machinery from strong background noise, this paper proposes a sparse feature extraction method based on sparse decomposition combined with the multiresolution generalized S transform. In this method, the multiresolution generalized S transform is employed to find the optimal atom at every iteration: it first takes into account the generalized S transform with discretized adjustment factors, then builds an atom corresponding to the maximum energy. The multiresolution generalized S transform selects the optimal atom more accurately than the generalized S transform and searches faster than the orthogonal matching pursuit method. The orthogonal matching pursuit method is then used to decompose the signal into several optimal atoms. The proposed method is applied to a simulated signal and to vibration signals collected from experimentally failed rolling bearings. The results show that the proposed method achieves higher precision and faster decomposition than the traditional orthogonal matching pursuit method and the local mean decomposition method.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Aryan Mobiny ◽  
Pengyu Yuan ◽  
Supratik K. Moulik ◽  
Naveen Garg ◽  
Carol C. Wu ◽  
...  

Deep neural networks (DNNs) have achieved state-of-the-art performance in many important domains, including medical diagnosis, security, and autonomous driving. In domains where safety is highly critical, an erroneous decision can result in serious consequences. While perfect prediction accuracy is not always achievable, recent work on Bayesian deep networks shows that it is possible to know when DNNs are more likely to make mistakes. Knowing what DNNs do not know is desirable for increasing the safety of deep learning technology in sensitive applications; Bayesian neural networks attempt to address this challenge. Traditional approaches are computationally intractable and do not scale well to large, complex neural network architectures. In this paper, we develop a theoretical framework to approximate Bayesian inference for DNNs by imposing a Bernoulli distribution on the model weights. This method, called Monte Carlo DropConnect (MC-DropConnect), gives us a tool to represent model uncertainty with little change to the overall model structure or computational cost. We extensively validate the proposed algorithm on multiple network architectures and datasets for classification and semantic segmentation tasks. We also propose new metrics to quantify uncertainty estimates. This enables an objective comparison between MC-DropConnect and prior approaches. Our empirical results demonstrate that the proposed framework yields significant improvement in both prediction accuracy and uncertainty estimation quality compared to the state of the art.
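The mechanism can be sketched in a few lines: keep the Bernoulli masks on the *weights* (DropConnect, as opposed to dropout's masks on activations) active at test time, run several stochastic forward passes, and read the sample mean as the prediction and the sample variance as the uncertainty. The two-layer network and all sizes below are toy assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# toy two-layer network with fixed weights
W1 = rng.standard_normal((2, 32))
W2 = rng.standard_normal((32, 1))

def forward(x, p=0.5):
    """One stochastic pass: each weight is kept with probability p (DropConnect)."""
    m1 = rng.random(W1.shape) < p
    m2 = rng.random(W2.shape) < p
    h = np.maximum(x @ (W1 * m1) / p, 0)   # rescale to keep expectations unchanged
    return (h @ (W2 * m2) / p).ravel()

x = np.array([[0.5, -0.2]])
samples = np.stack([forward(x) for _ in range(200)])  # MC sampling at test time
mean, var = samples.mean(0), samples.var(0)           # prediction and uncertainty
print(samples.shape)  # (200, 1)
```

Because the only change is leaving the masks on at inference, the architecture and per-pass cost are untouched; uncertainty costs one extra forward pass per Monte Carlo sample.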


2020 ◽  
Vol 34 (04) ◽  
pp. 6688-6695
Author(s):  
Ming Yin ◽  
Weitian Huang ◽  
Junbin Gao

Clustering multi-view data has been a fundamental research topic in the computer vision community. It has been shown that better accuracy can be achieved by integrating information from all the views than by using any one view individually. However, existing methods often struggle with large-scale datasets and perform poorly in reconstructing samples. This paper proposes a novel multi-view clustering method that learns a shared generative latent representation obeying a mixture of Gaussian distributions. The motivation is that multi-view data share a common latent embedding despite the diversity among the various views. Specifically, benefiting from the success of deep generative learning, the proposed model not only extracts nonlinear features from the views but also renders a powerful ability to capture the correlations among all the views. Extensive experimental results on several datasets of different scales demonstrate that the proposed method outperforms state-of-the-art methods under a range of performance criteria.
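The core assumption — multiple views generated from one shared latent space whose structure is a Gaussian mixture — can be demonstrated with a shallow stand-in. Below, two synthetic linear "views" of the same clustered data are mapped to a common embedding (PCA standing in for the deep generative encoder) and clustered with a Gaussian mixture; all data and the choice of encoder are illustrative assumptions, not the paper's model.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
# two synthetic "views" generated from the same 3-cluster latent data
z = np.repeat(np.arange(3), 100)                 # true cluster of each sample
centers = rng.standard_normal((3, 2)) * 5
latent_true = centers[z] + rng.standard_normal((300, 2))
view1 = latent_true @ rng.standard_normal((2, 6))
view2 = latent_true @ rng.standard_normal((2, 8))

# shared latent embedding (PCA stands in for the deep generative encoder)
shared = PCA(n_components=2).fit_transform(np.hstack([view1, view2]))
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(shared)
print(labels.shape)  # (300,)
```

The Gaussian-mixture prior on the shared embedding is what turns the learned representation directly into cluster assignments, rather than requiring a separate clustering step on raw features.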


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4902
Author(s):  
Fanhua Shang ◽  
Bingkun Wei ◽  
Yuanyuan Liu ◽  
Hongying Liu ◽  
Shuang Wang ◽  
...  

In recent years, a series of matching pursuit and hard thresholding algorithms have been proposed to solve the sparse representation problem with an ℓ0-norm constraint. In addition, some stochastic hard thresholding methods have also been proposed, such as stochastic gradient hard thresholding (SG-HT) and stochastic variance reduced gradient hard thresholding (SVRGHT). However, each iteration of all these algorithms requires one hard thresholding operation, which leads to high per-iteration complexity and slow convergence, especially for high-dimensional problems. To address this issue, we propose a new stochastic recursive gradient support pursuit (SRGSP) algorithm, in which only one hard thresholding operation is required in each outer iteration. Thus, SRGSP has a significantly lower computational complexity than existing methods such as SG-HT and SVRGHT. Moreover, we also provide a convergence analysis of SRGSP, which shows that SRGSP attains a linear convergence rate. Our experimental results on large-scale synthetic and real-world datasets verify that SRGSP outperforms state-of-the-art related methods for tackling various sparse representation problems. Moreover, we conduct many experiments on two real-world sparse representation applications, image denoising and face recognition, and all the results also validate that our SRGSP algorithm obtains much better performance than other sparse representation learning optimization methods in terms of PSNR and recognition rates.
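The hard thresholding operation whose per-iteration cost is at issue is the projection onto the ℓ0 ball: keep the k largest-magnitude entries and zero out the rest. SG-HT and SVRGHT apply it after every stochastic gradient step, whereas SRGSP applies it once per outer iteration. A minimal sketch of the operator itself:

```python
import numpy as np

def hard_threshold(w, k):
    """Project w onto the set of k-sparse vectors: keep the k
    largest-magnitude entries, zero the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]   # indices of the k largest magnitudes
    out[idx] = w[idx]
    return out

w = np.array([0.1, -3.0, 0.7, 2.5, -0.2])
print(hard_threshold(w, 2))  # keeps -3.0 and 2.5, zeros the rest
```

In a hard thresholding method, a step looks like `w = hard_threshold(w - lr * grad, k)`; deferring the projection to the outer loop is what gives SRGSP its lower per-iteration complexity.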


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3780 ◽  
Author(s):  
Mustansar Fiaz ◽  
Arif Mahmood ◽  
Ki Yeol Baek ◽  
Sehar Shahzad Farooq ◽  
Soon Ki Jung

CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still a challenging task. In the current study, we introduce input noise as a regularizer in the training data to improve the generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker, which exhibits improved generalization compared to the current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate the overfitting problem. We propose feature fusion from noisy and clean input channels, which improves target localization. Channel attention integrated with our framework helps find more useful target features, resulting in further performance improvement. Our proposed IRCA-Siam enhances the discrimination between tracker and background and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets, including OTB2013, OTB2015, TC128, UAV123, VOT2016, and VOT2017, demonstrates the superior performance of the proposed IRCA-Siam tracker compared to 30 existing state-of-the-art trackers.
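The input-regularization step the abstract describes amounts to perturbing training crops with additive noise and pairing them with the clean crops for fusion. The sketch below shows this as data preparation only; the noise level, batch shapes, and the simple averaging fusion are illustrative assumptions, not the tracker's actual fusion module.

```python
import numpy as np

rng = np.random.default_rng(6)

def add_input_noise(batch, sigma=0.05):
    """Additive-noise augmentation: a simple input regularizer applied
    offline to training crops."""
    return batch + sigma * rng.standard_normal(batch.shape)

clean = rng.random((8, 3, 127, 127))   # a toy batch of template crops
noisy = add_input_noise(clean)
# one simple (hypothetical) fusion of the clean and noisy channels: averaging
fused = 0.5 * (clean + noisy)
print(fused.shape)  # (8, 3, 127, 127)
```

Training on perturbed inputs discourages the network from memorizing dataset-specific pixel statistics, which is the overfitting the abstract targets.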

