Orthogonality-Promoting Dictionary Learning via Bayesian Inference

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014472 ◽

2019 ◽

Vol 33 ◽

pp. 4472-4479

Author(s):

Lei Luo ◽

Jie Xu ◽

Cheng Deng ◽

Heng Huang

Keyword(s):

Bayesian Inference ◽

Dictionary Learning ◽

Expectation Maximization Algorithm ◽

Training Data ◽

Training Set ◽

Learning Framework ◽

Learning Tasks ◽

Sparsity Level ◽

Better Than ◽

Non Parametric

Dictionary Learning (DL) plays a crucial role in numerous machine learning tasks. It aims to find a dictionary over which the training set admits a maximally sparse representation. Most existing DL algorithms solve an optimization problem in which the noise variance and sparsity level must be known a priori. In practical applications, however, this knowledge is difficult to obtain. Non-parametric Bayesian DL has therefore received much attention from researchers because of its adaptability and effectiveness. Although many hierarchical priors have been used to promote the sparsity of the representation in non-parametric Bayesian DL, the redundancy of the dictionary is still overlooked, which greatly degrades the performance of sparse coding. To address this problem, this paper presents a novel robust dictionary learning framework via Bayesian inference. In particular, we employ orthogonality-promoting regularization to mitigate correlations among dictionary atoms. By encouraging the dictionary atoms to be close to orthogonal, such a regularization can alleviate overfitting to the training data and improve the discriminative ability of the model. Moreover, we impose a scale mixture of vector-variate Gaussian (SMVG) distribution on the noise to capture its structure. A regularized expectation maximization algorithm is developed to estimate the posterior distribution of the representation and the dictionary under the orthogonality-promoting regularization. Numerical results show that our method learns the dictionary more accurately than existing methods, especially when the number of training signals is limited.
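
The next sketch illustrates what "encouraging the dictionary atoms to be close to orthogonal" can mean. It computes one common orthogonality-promoting penalty, ||D^T D - I||_F^2, over unit-norm atoms. This particular penalty and the function names are assumptions for illustration. The abstract does not state the exact form of the regularizer used inside the regularized EM algorithm.

```python
import math


def normalize_columns(D):
    """Scale each column (atom) of D, stored as a list of rows, to unit L2 norm."""
    n_rows, n_cols = len(D), len(D[0])
    norms = [math.sqrt(sum(D[i][j] ** 2 for i in range(n_rows))) for j in range(n_cols)]
    return [[D[i][j] / norms[j] for j in range(n_cols)] for i in range(n_rows)]


def gram(D):
    """Return D^T D: entry (i, j) is the inner product of atoms i and j."""
    n_rows, n_cols = len(D), len(D[0])
    return [[sum(D[k][i] * D[k][j] for k in range(n_rows)) for j in range(n_cols)]
            for i in range(n_cols)]


def orthogonality_penalty(D):
    """Hypothetical regularizer ||D^T D - I||_F^2 on unit-norm atoms.

    It is zero exactly when the atoms are mutually orthogonal and grows with
    their pairwise correlations, i.e. with the redundancy of the dictionary.
    """
    G = gram(normalize_columns(D))
    m = len(G)
    return sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(m) for j in range(m))


# Orthogonal atoms give no penalty; correlated atoms give a positive penalty.
print(orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
print(orthogonality_penalty([[1.0, 1.0], [0.0, 1.0]]))  # about 1.0 (atoms 45 degrees apart)
```

In a learning loop, a term like this would typically be weighted and added to the data-fit objective. How the paper folds it into its Bayesian formulation is not reproduced here.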

Multi-generation genomic prediction of maize yield using parametric and non-parametric sparse selection indices

Heredity ◽

10.1038/s41437-021-00474-1 ◽

2021 ◽

Author(s):

Marco Lopez-Cruz ◽

Yoseph Beyene ◽

Manje Gowda ◽

Jose Crossa ◽

Paulino Pérez-Rodríguez ◽

...

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Prediction Models ◽

Selection Index ◽

Maize Yield ◽

Additive Models ◽

Training Data ◽

Training Set ◽

Gaussian Kernels ◽

Non Parametric

Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulate, training data sets become increasingly heterogeneous. Differences in allele frequencies and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This raises the question of whether all available data or only a subset of it should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not consider the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, we studied here whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models on multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that, when predicting grain yield, the KBLUP outperformed the GBLUP, and that using the SSI with additive relationships (GSSI) led to 5–17% increases in accuracy relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.
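
The two relationship matrices contrasted above can be sketched as follows.

- The additive matrix uses VanRaden-style scaling, G = Z Z^T / (2 * sum p(1 - p)), on 0/1/2 marker codes, where p is an allele frequency.
- The Gaussian kernel uses K_ij = exp(-h * d_ij^2 / mean d^2).

Both parameterizations, including the bandwidth h and the distance scaling, are assumptions, because the abstract does not give the study's exact formulas. The SSI itself, a penalized selection index built on top of such a matrix, is not shown.

```python
import math


def additive_relationship(markers):
    """VanRaden-style additive relationships, G = Z Z^T / (2 * sum_k p_k (1 - p_k)).

    `markers` holds one row per individual with genotypes coded 0/1/2;
    p_k is the estimated allele frequency of marker k.
    """
    n, m = len(markers), len(markers[0])
    p = [sum(row[k] for row in markers) / (2 * n) for k in range(m)]
    denom = 2 * sum(pk * (1 - pk) for pk in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    Z = [[row[k] - 2 * p[k] for k in range(m)] for row in markers]  # centered genotypes
    return [[sum(Z[i][k] * Z[j][k] for k in range(m)) / denom for j in range(n)]
            for i in range(n)]


def gaussian_kernel(markers, h=1.0):
    """Gaussian kernel K_ij = exp(-h * d_ij^2 / mean off-diagonal d^2), d = Euclidean distance."""
    n = len(markers)
    if n < 2:
        raise ValueError("need at least two individuals")
    d2 = [[sum((a - b) ** 2 for a, b in zip(markers[i], markers[j])) for j in range(n)]
          for i in range(n)]
    off_diag = [d2[i][j] for i in range(n) for j in range(n) if i != j]
    scale = sum(off_diag) / len(off_diag)
    if scale == 0:
        raise ValueError("all genotypes are identical")
    return [[math.exp(-h * d2[i][j] / scale) for j in range(n)] for i in range(n)]


markers = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]  # three hypothetical individuals, three markers
G = additive_relationship(markers)
K = gaussian_kernel(markers)
```

Individuals 0 and 1 carry opposite genotypes in this toy example. Their additive relationship is therefore negative, and their kernel similarity is the smallest entry in K.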

Self-training and learning the waveform features of microseismic data using an adaptive dictionary

Geophysics ◽

10.1190/geo2019-0213.1 ◽

2020 ◽

Vol 85 (3) ◽

pp. KS51-KS61 ◽

Cited By ~ 1

Author(s):

Hang Wang ◽

Quan Zhang ◽

Guoyin Zhang ◽

Jinwei Fang ◽

Yangkang Chen

Keyword(s):

Dictionary Learning ◽

Single Channel ◽

Signal To Noise Ratio ◽

Random Noise ◽

Microseismic Monitoring ◽

Training Data ◽

Learning Framework ◽

Microseismic Data ◽

Useful Signal ◽

Fracturing Process

Microseismic monitoring is an indispensable technique for characterizing the physical processes caused by the extraction or injection of fluids during hydraulic fracturing. Microseismic data, however, are often contaminated with strong random noise and have a low signal-to-noise ratio (S/N). The low S/N of most microseismic data severely affects the accuracy and reliability of source localization and source-mechanism inversion. We have developed a new denoising framework to enhance the quality of microseismic data. We use adaptive sparse dictionaries to learn the waveform features of the microseismic data, iteratively updating the dictionary atoms and sparse coefficients in an unsupervised way. Unlike most existing dictionary learning applications in the seismic community, we learn the features from 1D microseismic data, thereby learning 1D features of the waveforms. We develop a sparse dictionary learning framework, prepare the training patches, and implement the algorithm to obtain favorable denoising performance. We use extensive numerical examples and real microseismic data to demonstrate the validity of our method. Results show that the features of microseismic waveforms can be learned well enough to distinguish signal patches from noise patches, even from a single channel of microseismic data. However, more training data make the learned features smoother and better at representing the useful signal components.
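
Preparing training patches from a single 1D trace can be sketched as follows. Overlapping windows are cut from the trace. Each window is then processed, for example denoised by sparse coding over the learned dictionary (not shown here). Finally, the trace is rebuilt by averaging the overlapping samples. The patch length, the step, and the function names are hypothetical; the abstract does not give the paper's patching scheme.

```python
def extract_patches(trace, size, step):
    """Cut overlapping windows of length `size` from a 1D trace, one every `step` samples."""
    if not 0 < step <= size <= len(trace):
        raise ValueError("need 0 < step <= size <= len(trace)")
    starts = list(range(0, len(trace) - size + 1, step))
    if starts[-1] != len(trace) - size:
        starts.append(len(trace) - size)  # add a final window so the tail is covered
    return starts, [trace[s:s + size] for s in starts]


def assemble_patches(starts, patches, length):
    """Rebuild a trace of `length` samples by averaging overlapping (e.g. denoised) patches."""
    total = [0.0] * length
    count = [0] * length
    for s, patch in zip(starts, patches):
        for k, value in enumerate(patch):
            total[s + k] += value
            count[s + k] += 1
    return [t / c for t, c in zip(total, count)]


trace = [0.0, 0.5, 1.0, -0.3, -1.2, 0.4, 0.9, 0.1, -0.2, 0.0]
starts, patches = extract_patches(trace, size=4, step=2)
# A real denoiser would replace each patch with its sparse approximation at this point.
rebuilt = assemble_patches(starts, patches, len(trace))
```

Requiring `step <= size` guarantees that every sample is covered by at least one window, so the averaging step never divides by zero.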

CAWA: An Attention-Network for Credit Attribution

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6367 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8472-8479

Author(s):

Saurav Manchanda ◽

George Karypis

Keyword(s):

State Of The Art ◽

Text Summarization ◽

Training Data ◽

Learning Framework ◽

Distant Supervision ◽

Sentence Level ◽

Class Labels ◽

The Individual ◽

Traditional Approaches ◽

Better Than

Credit attribution is the task of associating individual parts of a document with their most appropriate class labels. It is an important task with applications in information retrieval and text summarization. When labeled training data are available, traditional approaches for sequence tagging can be used for credit attribution. However, generating such labeled datasets is expensive and time-consuming. In this paper, we present Credit Attribution With Attention (CAWA), a neural-network-based approach that, instead of using sentence-level labeled data, uses the set of class labels associated with an entire document as a source of distant supervision. CAWA combines an attention mechanism with a multilabel classifier in an end-to-end learning framework to perform credit attribution. CAWA labels the individual sentences of the input document using the resulting attention weights. CAWA improves upon the state-of-the-art credit attribution approach by not constraining a sentence to belong to just one class, instead modeling each sentence as a distribution over all classes, which leads to better modeling of semantically similar classes. Experiments on the credit attribution task on a variety of datasets show that the sentence class labels generated by CAWA outperform those of competing approaches. Additionally, on the multilabel text classification task, CAWA performs better than the competing credit attribution approaches.

Efficient Convolutional Dictionary Learning Using Preconditioned ADMM

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421510095 ◽

2021 ◽

Vol 35 (09) ◽

pp. 2151009

Author(s):

Xuesong Zhang ◽

Baoping Li ◽

Jing Jiang

Keyword(s):

Dictionary Learning ◽

Matrix Multiplication ◽

Matrix Inversion ◽

Training Data ◽

Optimization Strategy ◽

Training Set ◽

Translation Invariant ◽

Moderate Sample Size ◽

The Arts ◽

Alternating Direction

Given training data, convolutional dictionary learning (CDL) seeks a translation-invariant sparse representation characterized by a set of convolutional kernels. However, even a small training set with a moderate sample size can make the optimization process both computationally challenging and memory-intensive. Under a biconvex optimization strategy for CDL, we propose to diagonally precondition the system matrices in the filter learning subproblem, which can be solved by the alternating direction method of multipliers (ADMM). This replaces the matrix inversion ([Formula: see text]) and matrix multiplication ([Formula: see text]) involved in ADMM with an element-wise operation ([Formula: see text]), which significantly reduces both the computational complexity and the memory requirement. Numerical experiments validate the performance advantage of the proposed method over state-of-the-art methods. Code is available at https://github.com/baopingli/Efficient-Convolutional-Dictionary-Learning-using-PADMM .
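
A diagonal preconditioner can be applied by element-wise division instead of a dense matrix inverse. That general idea can be illustrated on a small linear system. The sketch contrasts two solvers:

- a dense solve by Gaussian elimination, which costs O(n^3) work and O(n^2) memory;
- a Richardson iteration preconditioned by diag(A), where applying the preconditioner is a per-entry division.

This is a generic, textbook-style (Jacobi) illustration chosen as an assumption. It is not the paper's PADMM filter update, and it still uses a matrix-vector product to form the residual.

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting: O(n^3) work on a dense copy of A."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def jacobi_preconditioned_solve(A, b, iters=100):
    """Richardson iteration x <- x + D^{-1} (b - A x) with D = diag(A).

    Applying D^{-1} is an element-wise division; convergence is guaranteed
    for, e.g., strictly diagonally dominant A.
    """
    n = len(A)
    d = [A[i][i] for i in range(n)]
    x = [0.0] * n
    for _ in range(iters):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + r[i] / d[i] for i in range(n)]
    return x


A = [[4.0, 1.0], [2.0, 5.0]]  # strictly diagonally dominant toy system
b = [1.0, 2.0]
x_dense = solve_dense(A, b)
x_jacobi = jacobi_preconditioned_solve(A, b)
```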

MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs

Bioinformatics ◽

10.1093/bioinformatics/btaa045 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2690-2696

Author(s):

Jarkko Toivonen ◽

Pratyush K Das ◽

Jussi Taipale ◽

Esko Ukkonen

Keyword(s):

Markov Models ◽

Expectation Maximization Algorithm ◽

Software Tool ◽

Specific Weight ◽

Training Data ◽

Supplementary Information ◽

Markov Modeling ◽

Binding Motifs ◽

The Difference ◽

Probability Matrices

Motivation: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominant model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence that higher-order models perform better, such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). Unlike PPMs, ADMs can model dependencies between adjacent nucleotides. A modeling technique and software tool that estimates such models simultaneously for both monomers and their dimers has been missing.

Results: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm, MODER2, for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the cooperative effect of dimerization is made explicit by evaluating the difference between the expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparison with earlier PPM and ADM techniques. The ADM models explain the data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), with the ADM mixture models by MODER2 being the best on average.

Availability and implementation: Software implementation is available from https://github.com/jttoivon/moder2.

Supplementary information: Supplementary data are available at Bioinformatics online.
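
The difference between the two motif models becomes concrete when a sequence is scored. A PPM multiplies independent per-position probabilities. An ADM, a first-order Markov model, conditions each base on its predecessor. The sketch below makes these assumptions:

- the ADM uses the conditional form P(x_i | x_{i-1});
- all probabilities are made up for illustration.

MODER2's own parameterization, which may, for example, store dinucleotide frequencies per adjacent position pair, and its EM fitting are not reproduced.

```python
import math


def ppm_log_likelihood(seq, ppm):
    """log P(seq) under a PPM: `ppm[i]` maps each base to its probability at position i."""
    return sum(math.log(ppm[i][base]) for i, base in enumerate(seq))


def adm_log_likelihood(seq, first, adm):
    """log P(seq) under a first-order Markov (adjacent dinucleotide) model.

    `first` maps each base to its probability at position 0;
    `adm[i - 1]` maps (previous base, current base) to P(current | previous) at position i.
    """
    ll = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        ll += math.log(adm[i - 1][(seq[i - 1], seq[i])])
    return ll


BASES = "ACGT"
ppm = [{base: 0.25 for base in BASES} for _ in range(2)]
first = {base: 0.25 for base in BASES}
# Hypothetical dependency that a PPM cannot express: a base tends to repeat its predecessor.
adm = [{(prev, cur): (0.7 if prev == cur else 0.1) for prev in BASES for cur in BASES}]
```

Under this toy ADM, "AA" scores higher and "AC" scores lower than under the uniform PPM. The PPM assigns both sequences the same probability, because it cannot see the adjacent-base dependency.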

SCOUR: a stepwise machine learning framework for predicting metabolite-dependent regulatory interactions

BMC Bioinformatics ◽

10.1186/s12859-021-04281-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Justin Y. Lee ◽

Britney Nguyen ◽

Carlos Orosco ◽

Mark P. Styczynski

Keyword(s):

Machine Learning ◽

Metabolic Networks ◽

Sampling Frequency ◽

Low Noise ◽

Training Data ◽

High Noise ◽

Regulatory Interactions ◽

Learning Framework ◽

Metabolic Systems ◽

Noise Data

Background: The topology of metabolic networks is both well studied and remarkably well conserved across many species. The regulation of these networks, however, is much more poorly characterized, though it is known to be divergent across organisms; together, these two characteristics make it difficult to model metabolic networks accurately. While many computational methods have been built to unravel transcriptional regulation, few approaches have been developed for the systems-scale analysis and study of metabolic regulation. Here, we present a stepwise machine learning framework that applies established algorithms to identify regulatory interactions in metabolic systems based on metabolic data: stepwise classification of unknown regulation, or SCOUR.

Results: We evaluated our framework on both noiseless and noisy data, using several models of varying sizes and topologies to show that our approach is generalizable. We found that, when testing on data under the most realistic conditions (low sampling frequency and high noise), SCOUR could identify with high accuracy the reaction fluxes controlled only by the concentration of a single metabolite (its primary substrate). The positive predictive value (PPV) for identifying reactions controlled by the concentrations of two metabolites ranged from 32 to 88% for noiseless data, from 9.2 to 49% for either low sampling frequency/low noise or high sampling frequency/high noise data, and from 6.6 to 27% for low sampling frequency/high noise data. These results are typically high enough to make lab validation a practical endeavor. While the PPVs for reactions controlled by three metabolites were lower, they were still in most cases significantly better than random classification.

Conclusions: SCOUR uses a novel approach to synthetically generate the training data needed to identify regulators of reaction fluxes in a given metabolic system, enabling metabolomics and fluxomics data to be leveraged for regulatory structure inference. By identifying and triaging the most likely candidate regulatory interactions, SCOUR can drastically reduce the time needed to identify and experimentally validate metabolic regulatory interactions. As high-throughput experimental methods for testing these interactions are further developed, SCOUR will have a critical impact on the development of predictive metabolic models in new organisms and pathways.
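
The positive predictive value quoted above is TP / (TP + FP), the fraction of predicted interactions that are real. A classifier that picks a fixed number of candidates uniformly at random has an expected PPV equal to the fraction of true interactions among all candidates. That is the "random classification" baseline. The reaction and metabolite names and the candidate count below are hypothetical.

```python
def positive_predictive_value(predicted, actual):
    """PPV = TP / (TP + FP) for sets of predicted and true (reaction, regulator) pairs."""
    if not predicted:
        raise ValueError("PPV is undefined when nothing is predicted")
    true_positives = len(predicted & actual)
    return true_positives / len(predicted)


def random_baseline_ppv(n_true, n_candidates):
    """Expected PPV when a fixed number of candidates is picked uniformly at random."""
    return n_true / n_candidates


predicted = {("r1", "m1"), ("r2", "m2"), ("r3", "m1"), ("r4", "m3")}
actual = {("r1", "m1"), ("r3", "m1"), ("r5", "m2")}
print(positive_predictive_value(predicted, actual))  # 0.5
print(random_baseline_ppv(len(actual), 20))          # 0.15, assuming 20 candidate pairs
```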

NLOS Multipath Classification of GNSS Signal Correlation Output Using Machine Learning

Sensors ◽

10.3390/s21072503 ◽

2021 ◽

Vol 21 (7) ◽

pp. 2503

Author(s):

Taro Suzuki ◽

Yoshiharu Amano

Keyword(s):

Machine Learning ◽

Satellite System ◽

Training Data ◽

Support Vector ◽

Positioning Errors ◽

Automated Method ◽

Global Navigation Satellite ◽

Better Than ◽

Signal Correlation

This paper proposes a method for detecting non-line-of-sight (NLOS) multipath, which causes large positioning errors in global navigation satellite systems (GNSS). We use the GNSS signal correlation output, the most primitive GNSS signal processing output, to detect NLOS multipath with machine learning. NLOS multipath distorts the shape of the multi-correlator outputs, and features of this shape are used to discriminate NLOS multipath. We implement two supervised learning methods, a support vector machine (SVM) and a neural network (NN), and compare their performance. In addition, we propose an automated method of collecting LOS and NLOS training data for machine learning. An evaluation of the proposed NLOS detection method in an urban environment confirmed that the NN performed better than the SVM and that 97.7% of NLOS signals were correctly discriminated.

MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451392 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-24

Author(s):

Yaojin Lin ◽

Qinghua Hu ◽

Jinghua Liu ◽

Xingquan Zhu ◽

Xindong Wu

Keyword(s):

Empirical Studies ◽

Feature Space ◽

Training Data ◽

Data Sets ◽

Learning Framework ◽

Feature Spaces ◽

Public Data ◽

Margin Distribution ◽

Label Correlations ◽

Label Correlation

In multi-label learning, label correlations commonly exist in the data. Such correlations not only provide useful information but also pose significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features in the training data and to learn with features highly customized to the multi-label set. While such feature embedding methods have demonstrated good performance, each feature embedding space is created from a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. MULFE then uses the label correlation to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the goal of maximum-margin multi-label classification through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
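
The first step described above builds features customized to one label by clustering that label's positive and negative instances. It can be sketched in the style of LIFT, where each instance is re-described by its distances to the cluster centers. The clustering method (k-means), the `ratio` parameter, and the function names are assumptions. MULFE's correlation-based weighting and margin optimization are not shown.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic start (the first k points)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        for c, group in enumerate(groups):
            if group:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def label_specific_features(X, y, ratio=0.5):
    """Distances from each instance to k centers of one label's positives and k of its negatives.

    k = ceil(ratio * min(#positives, #negatives)), following a LIFT-style recipe.
    """
    pos = [x for x, t in zip(X, y) if t]
    neg = [x for x, t in zip(X, y) if not t]
    if not pos or not neg:
        raise ValueError("the label needs both positive and negative instances")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    k = max(1, math.ceil(ratio * min(len(pos), len(neg))))
    centers = kmeans(pos, k) + kmeans(neg, k)
    return [[math.dist(x, c) for c in centers] for x in X]


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [1, 1, 0, 0]  # membership of each instance in one hypothetical label
F = label_specific_features(X, y)
```

Each label gets its own feature space in this scheme. An ensemble over labels, as MULFE proposes, would train one base classifier per such space.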

Sensitivity-informed Bayesian Inference for Home PLC Network Models with Unknown Parameters

Energies ◽

10.3390/en14092402 ◽

2021 ◽

Vol 14 (9) ◽

pp. 2402

Author(s):

David S. Ching ◽

Cosmin Safta ◽

Thomas A. Reichardt

Keyword(s):

Bayesian Inference ◽

Transfer Function ◽

Network Topology ◽

Random Search ◽

Network Models ◽

Training Data ◽

Unknown Parameters ◽

Network Parameter ◽

Dimensional Parameter ◽

Discrete Random Variables

Bayesian inference is used to calibrate a bottom-up home PLC network model with unknown loads and wires at frequencies up to 30 MHz. A network topology with over 50 parameters is calibrated using global sensitivity analysis and transitional Markov chain Monte Carlo (TMCMC). The sensitivity-informed Bayesian inference computes Sobol indices for each network parameter and applies TMCMC to calibrate the most sensitive parameters for a given network topology. A greedy random search with TMCMC is used to refine the discrete random variables of the network. The result is a model that accurately computes the transfer function despite noisy training data and a high-dimensional parameter space. The model infers some parameters of the network used to produce the training data and accurately computes the transfer function under extrapolative scenarios.
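
First-order Sobol indices rank parameters by the share of output variance each one explains on its own, which is how the most sensitive parameters can be singled out for TMCMC. The sketch estimates these indices for a toy model with independent U(0, 1) inputs. It uses a pick-freeze estimator in the style of Saltelli (2010), with outputs centered to reduce variance. The estimator, sample size, toy model, and function name are assumptions. The paper's network model and TMCMC calibration are not shown.

```python
import random


def first_order_sobol(model, n_inputs, n_samples=20000, seed=0):
    """Estimate first-order Sobol indices S_i for independent inputs ~ U(0, 1).

    V_i ~ mean((f(B) - mean f(B)) * (f(A_B^i) - f(A))), where A_B^i is the
    sample matrix A with column i taken from B; S_i = V_i / Var(Y).
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [model(a) for a in A]
    fB = [model(b) for b in B]
    pooled = fA + fB
    mean_y = sum(pooled) / len(pooled)
    var_y = sum((v - mean_y) ** 2 for v in pooled) / len(pooled)
    mean_b = sum(fB) / n_samples
    indices = []
    for i in range(n_inputs):
        # Rows of A with column i swapped in from B ("pick-freeze").
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        v_i = sum((fb - mean_b) * (fab - fa)
                  for fa, fb, fab in zip(fA, fB, fABi)) / n_samples
        indices.append(v_i / var_y)
    return indices


indices = first_order_sobol(lambda x: x[0] + 2.0 * x[1], n_inputs=2)
# Analytic values for this linear model: S_0 = 0.2 and S_1 = 0.8.
```

For a linear model y = sum a_i x_i with independent inputs, S_i = a_i^2 Var(x_i) / Var(y). That is where the analytic values used as a check come from.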

Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data

Remote Sensing ◽

10.3390/rs13030368 ◽

2021 ◽

Vol 13 (3) ◽

pp. 368

Author(s):

Christopher A. Ramezan ◽

Timothy A. Warner ◽

Aaron E. Maxwell ◽

Bradley S. Price

Keyword(s):

Machine Learning ◽

Sample Size ◽

Remotely Sensed ◽

Training Data ◽

Supervised Machine Learning ◽

Sample Sizes ◽

Remotely Sensed Data ◽

Large Area ◽

Training Set ◽

Set Size

The size of the training data set is a major determinant of classification accuracy. Nevertheless, collecting a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA groups adjacent similar pixels into image-objects that form the unit of classification. It offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size. NEU classifications generally produced overall accuracies that were, on average, slightly higher than SVM classifications for larger sample sizes but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time.
The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically less than RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, and minimal variations in overall accuracy between very large and small sample sets, as well as relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
