A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits

We investigate the piecewise-stationary combinatorial semi-bandit problem. Unlike the original combinatorial semi-bandit problem, our setting allows the reward distributions of base arms to change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which combines an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O(√(NKT log T)), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of Ω(√(NKT)), for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is Ω(√T). Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB over other state-of-the-art algorithms.
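To make the change-point detection idea concrete, the following is a minimal sketch of a generalized likelihood ratio statistic for a single base arm, assuming rewards in {0, 1} (a Bernoulli model). The function names and the `threshold` parameter are hypothetical choices for illustration; the abstract does not specify how the detection threshold is set, and this is not the authors' implementation.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(xs):
    """Largest GLR statistic over all split points s of the sample xs.

    For each split s, the sample is compared against the hypothesis of a
    single mean: s * kl(mean_left, mean_all) + (n - s) * kl(mean_right, mean_all).
    """
    n = len(xs)
    if n < 2:
        return 0.0
    total = sum(xs)
    mean_all = total / n
    best = 0.0
    prefix = 0.0
    for s in range(1, n):
        prefix += xs[s - 1]
        mean_left = prefix / s
        mean_right = (total - prefix) / (n - s)
        stat = (s * bernoulli_kl(mean_left, mean_all)
                + (n - s) * bernoulli_kl(mean_right, mean_all))
        best = max(best, stat)
    return best


def change_detected(xs, threshold):
    """Flag a change when the statistic reaches the (assumed, user-chosen) threshold."""
    return glr_statistic(xs) >= threshold
```

In a bandit loop, one such detector would typically be run per base arm on that arm's observed rewards, and the learner's statistics would be reset when a change is flagged; the reset policy is likewise an assumption here.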

Density Guarantee on Finding Multiple Subgraphs and Subtensors

ACM Transactions on Knowledge Discovery from Data, DOI 10.1145/3446668, 2021, Vol 15 (5), pp. 1-32

Author(s): Quang-huy Duong, Heri Ramampiaro, Kjetil Nørvåg, Thu-lan Dam

Keyword(s): Lower Bound, State Of The Art, The State, The Other, Exact Methods, Practical Solution, Novel Approach, Wide Range, Real World Datasets, Tensor Data

Dense subregion (subgraph and subtensor) detection is a well-studied area with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions because exact methods are computationally expensive. Existing algorithms are generally efficient for dense subtensor and subgraph detection and perform well in many applications. However, most existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can estimate multiple subtensors, they give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data and giving a higher lower bound on the density. In particular, we prove a higher lower bound on the density of the estimated subgraphs and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
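For context, the greedy 2-approximation mentioned above is commonly realized for graphs as a peeling procedure: repeatedly remove a minimum-degree vertex and keep the densest intermediate subgraph. The sketch below assumes an undirected simple graph and the density measure |E| / |V|; it illustrates the baseline idea only, not the multi-subtensor method proposed in the paper, and the function name is hypothetical.

```python
def greedy_densest_subgraph(edges):
    """Greedy peeling for the densest subgraph (density = edges / vertices).

    Returns (vertex_set, density) of the densest subgraph seen while peeling.
    """
    # Build adjacency sets, ignoring self-loops and duplicate edges.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    remaining = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes, best_density = set(), 0.0

    while remaining:
        density = num_edges / len(remaining)
        if density > best_density:
            best_nodes, best_density = set(remaining), density
        # Peel off a vertex of minimum degree in the current subgraph.
        u = min(remaining, key=lambda x: len(adj[x]))
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        adj[u] = set()
        remaining.remove(u)

    return best_nodes, best_density
```

This version selects the minimum-degree vertex by a linear scan, which keeps the code short; a bucket queue or heap would be the usual choice for large graphs.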

Structural Change Point Detection Method of Time Series Using Sequential Probability Ratio Test

IEEJ Transactions on Electronics Information and Systems, DOI 10.1541/ieejeiss.128.583, 2008, Vol 128 (4), pp. 583-592, Cited By ~ 4

Author(s): Hiromichi Kawano, Tetsuo Hattori, Ken Nishimatsu

Keyword(s): Time Series, Structural Change, Change Point, Detection Method, Sequential Probability Ratio Test, Change Point Detection, Ratio Test, Probability Ratio, Sequential Probability, Point Detection

Cooperative Spectrum Sensing Based on Generalized Likelihood Ratio Test for Cognitive Radio Channels with Unknown Primary User’s Power and Colored Noise

International Journal of Sensors Wireless Communications and Control, DOI 10.2174/2210327908666180730092433, 2018, Vol 8 (3), pp. 217-227, Cited By ~ 2

Author(s): Shahriar Shirvani Moghaddam, Ameneh Habibzadeh

Keyword(s): Cognitive Radio, Likelihood Ratio, Likelihood Ratio Test, Spectrum Sensing, Colored Noise, Cooperative Spectrum Sensing, Unknown Primary, Generalized Likelihood Ratio Test, Ratio Test, Generalized Likelihood Ratio

G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics, DOI 10.1186/s12859-020-03925-4, 2021, Vol 22 (1)

Author(s): João Lobo, Rui Henriques, Sara C. Madeira

Keyword(s): State Of The Art, Synthetic Data, Ground Truth, Real Data, Three Dimensions, Additional Advantage, Urban Dynamics, Data Generator, Real World Datasets, Synthetic Datasets

Background: Three-way data have gained popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data without a known ground truth, which limits the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of providing the ground truth (triclustering solution) as output. Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions: Triclustering evaluation using G-Tric makes it possible to combine both intrinsic and extrinsic metrics to compare solutions, producing more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the evaluation of new triclustering approaches.

TransET: Knowledge Graph Embedding with Entity Types

Electronics, DOI 10.3390/electronics10121407, 2021, Vol 10 (12), pp. 1407

Author(s): Peng Wang, Jing Zhou, Yuzhang Liu, Xingchen Zhou

Keyword(s): Link Prediction, State Of The Art, Score Function, Graph Embedding, Vector Spaces, Knowledge Graph, Semantic Features, Knowledge Graphs, Real World Datasets, Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is used to map the head entity and tail entity to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
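The two ingredients named in this abstract, a convolution that mixes an entity embedding with a type embedding and a translation-based score, can be sketched as follows. This is an illustrative assumption about how the pieces might fit together (circular convolution followed by a TransE-style distance), not the exact TransET formulation; all function and parameter names are hypothetical.

```python
import math


def circular_convolution(a, b):
    """Circular convolution: c[k] = sum_i a[i] * b[(k - i) mod n]."""
    n = len(a)
    if len(b) != n:
        raise ValueError("embeddings must have the same dimension")
    return [sum(a[i] * b[(k - i) % n] for i in range(n)) for k in range(n)]


def translation_score(head, head_type, relation, tail, tail_type):
    """Distance ||h' + r - t'|| where h' and t' are type-specific projections.

    Lower scores indicate more plausible triples under this assumed scoring.
    """
    h = circular_convolution(head_type, head)
    t = circular_convolution(tail_type, tail)
    return math.sqrt(sum((hi + ri - ti) ** 2
                         for hi, ri, ti in zip(h, relation, t)))
```

With a type embedding equal to the unit impulse `[1, 0, ..., 0]`, the convolution leaves the entity embedding unchanged and the score reduces to the plain translation distance.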

Generalized Likelihood Ratio Test for GNSS Spoofing Detection in Devices With IMU

IEEE Transactions on Information Forensics and Security, DOI 10.1109/tifs.2021.3083414, 2021, pp. 1-1

Author(s): Marco Ceccato, Francesco Formaggio, Nicola Laurenti, Stefano Tomasin

Keyword(s): Likelihood Ratio, Likelihood Ratio Test, Generalized Likelihood Ratio Test, Ratio Test, Generalized Likelihood Ratio, Spoofing Detection

Adaptive Waveform Design with Multipath Exploitation Radar in Heterogeneous Environments

Remote Sensing, DOI 10.3390/rs13091628, 2021, Vol 13 (9), pp. 1628

Author(s): Seden Hazal Gulen Yilmaz, Chiara Zarro, Harun Taha Hayvaci, Silvia Liberata Ullo

Keyword(s): Secondary Data, Waveform Design, Second Step, Point Estimate, Generalized Likelihood Ratio Test, Final Decision, Ratio Test, Heterogeneous Environments, Range Cell, Extensive Performance

The problem of detecting point-like targets over a glistening surface is investigated in this manuscript, and the design of an optimal waveform through a two-step process for a multipath exploitation radar is proposed. In the first step, a non-adaptive waveform is transmitted and a constrained Generalized Likelihood Ratio Test (GLRT) detector is derived at reception; it exploits multipath returns in the range cell under test by modelling the target echo as a superposition of the direct and multipath returns. Under the hypothesis of heterogeneous environments, and thus assuming a compound-Gaussian distribution for the clutter return, the clutter is estimated in the range cell under test through the secondary data, which are collected from the out-of-bin cells. The Fixed Point Estimate (FPE) algorithm is applied in the clutter estimation, and the resulting estimate is then used to design the adaptive waveform for transmission in the second step of the algorithm, in order to suppress the clutter coming from the adjacent cells. The proposed GLRT is also used at the end of the second transmission for the final decision. An extensive performance evaluation of the proposed detector and adaptive waveform for various multipath scenarios is presented. The performance analysis shows that the proposed method improves the Signal-to-Clutter Ratio (SCR) of the received signal and the detection performance with multipath exploitation.

Chi-Squared Distance Metric Learning for Histogram Data

Mathematical Problems in Engineering, DOI 10.1155/2015/352849, 2015, Vol 2015, pp. 1-12, Cited By ~ 2

Author(s): Wei Yang, Luhui Xu, Xiaopan Chen, Fengbin Zheng, Yang Liu

Keyword(s): Nearest Neighbor, State Of The Art, Metric Learning, Nearest Neighbors, Distance Metric Learning, Distance Metric, Projected Gradient Method, Proper Distance, Chi Squared, Real World Datasets

Learning a proper distance metric for histogram data plays a crucial role in many computer vision tasks. The chi-squared distance is a nonlinear metric and is widely used to compare histograms. In this paper, we show how to learn a general form of chi-squared distance based on the nearest neighbor model. In our method, the margin of a sample is first defined with respect to the nearest hits (nearest neighbors from the same class) and the nearest misses (nearest neighbors from different classes), and then a simplex-preserving linear transformation is trained by maximizing the margin while minimizing the distance between each sample and its nearest hits. With the iterative projected gradient method for optimization, we naturally introduce the ℓ2,1-norm regularization into the proposed method for sparse metric learning. Comparative studies with state-of-the-art approaches on five real-world datasets verify the effectiveness of the proposed method.
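As a point of reference, the plain (unlearned) chi-squared distance between two histograms and the effect of a simplex-preserving linear map can be sketched as below. The factor 1/2 is one common convention and is an assumption here, as are the function names; a matrix with nonnegative entries whose columns each sum to 1 is one standard way to keep a histogram on the probability simplex. The learning procedure itself is not shown.

```python
def chi_squared_distance(x, y):
    """Chi-squared distance: 0.5 * sum_i (x_i - y_i)^2 / (x_i + y_i).

    Bins where both histograms are zero contribute nothing.
    """
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(x, y):
        s = a + b
        if s > 0:
            total += (a - b) ** 2 / s
    return 0.5 * total


def apply_transform(matrix, x):
    """Apply a linear map (given as a list of rows) to histogram x.

    If the matrix is nonnegative with columns summing to 1, the output
    is again a histogram on the probability simplex.
    """
    return [sum(row[j] * x[j] for j in range(len(x))) for row in matrix]
```

A learned metric in this spirit would compare `apply_transform(L, x)` and `apply_transform(L, y)` with `chi_squared_distance`, where `L` is the trained matrix.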

Encoding Two-Dimensional Range Top-k Queries

Algorithmica, DOI 10.1007/s00453-021-00856-1, 2021

Author(s): Seungbum Jo, Rahul Lingala, Srinivasa Rao Satti

Keyword(s): Lower Bound, Lower Bounds, Upper Bound, Total Order, Two Dimensional, Information Theoretic, Cartesian Tree, Dimensional Range

We consider the problem of encoding two-dimensional arrays, whose elements come from a total order, for answering Top-$k$ queries. The aim is to obtain encodings that use space close to the information-theoretic lower bound and that can be constructed efficiently. For an $m \times n$ array, with $m \le n$, we first propose an encoding for answering 1-sided Top-$k$ queries, whose query range is restricted to $[1 \dots m][1 \dots a]$, for $1 \le a \le n$. Next, we propose an encoding for answering the general (4-sided) Top-$k$ queries that takes $\left(m\lg\binom{(k+1)n}{n} + 2nm(m-1) + o(n)\right)$ bits, which generalizes the joint Cartesian tree of Golin et al. [TCS 2016]. Compared with the trivial $O(nm\lg n)$-bit encoding, our encoding takes less space when $m = o(\lg n)$. In addition to the upper bound results for the encodings, we also give lower bounds on encodings for answering 1-sided and 4-sided Top-$k$ queries, which show that our upper bound results are almost optimal.
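To get a feel for the stated space bound, the short sketch below evaluates the leading terms $m\lg\binom{(k+1)n}{n} + 2nm(m-1)$ and compares them with $nm\lg n$. Two assumptions are made for illustration: the $o(n)$ term is dropped, and the constant hidden in the trivial $O(nm\lg n)$ bound is taken to be 1. The function names are hypothetical.

```python
import math


def four_sided_leading_bits(m, n, k):
    """Leading terms of the 4-sided bound: m * lg C((k+1)n, n) + 2nm(m-1)."""
    return m * math.log2(math.comb((k + 1) * n, n)) + 2 * n * m * (m - 1)


def trivial_bits(m, n):
    """n * m * lg n, taking the hidden constant of the trivial bound as 1."""
    return n * m * math.log2(n)


# Example: a 2 x 1024 array with k = 1, where m is small relative to lg n.
compact = four_sided_leading_bits(2, 1024, 1)
baseline = trivial_bits(2, 1024)
```

For this example the leading terms come to roughly 8,200 bits against 20,480 for the baseline, which is consistent with the claim that the encoding is smaller when $m$ grows more slowly than $\lg n$.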
