Accelerating truss decomposition on heterogeneous processors

Truss decomposition is to divide a graph into a hierarchy of subgraphs, or trusses. A subgraph is a k -truss ( k ≥ 2) if each edge is in at least k --- 2 triangles in the subgraph. Existing algorithms work by first counting the number of triangles each edge is in and then iteratively incrementing k to peel off the edges that will not appear in ( k + 1)-truss. Due to the data and computation intensity, truss decomposition on billion-edge graphs takes hours to complete on a commodity computer. We propose to accelerate in-memory truss decomposition by (1) compacting intermediate results to optimize memory access, (2) dynamically adjusting the computation based on data characteristics, and (3) parallelizing the algorithm on both the multicore CPU and the GPU. In particular, we optimize the triangle enumeration with data skew handling, and determine at runtime whether to pursue peeling or direct triangle counting to obtain a certain k -truss. We further develop a CPU-GPU co-processing strategy in which the CPU first computes intermediate results and sends the compacted results to the GPU for further computation. Our experiments on real-world datasets show that our implementations outperform the state of the art by up to an order of magnitude. Our source code is publicly available at https://github.com/RapidsAtHKUST/AccTrussDecomposition.

Download Full-text

G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics ◽

10.1186/s12859-020-03925-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

João Lobo ◽

Rui Henriques ◽

Sara C. Madeira

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Ground Truth ◽

Real Data ◽

Three Dimensions ◽

Additional Advantage ◽

Urban Dynamics ◽

Data Generator ◽

Real World Datasets ◽

Synthetic Datasets

Abstract Background Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations $$\times$$ × features $$\times$$ × contexts). With increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground-truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real 3-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (triclustering solution) as output. Results G-Tric can replicate real-world datasets and create new ones that match researchers needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled, by defining the amount of missing, noise or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties was generated and made available, highlighting G-Tric’s potential to advance triclustering state-of-the-art by easing the process of evaluating the quality of new triclustering approaches.

Download Full-text

Image Restoration by Learning Morphological Opening-Closing Network

Mathematical Morphology - Theory and Applications ◽

10.1515/mathm-2020-0103 ◽

2020 ◽

Vol 4 (1) ◽

pp. 87-107

Author(s):

Ranjan Mondal ◽

Moni Shankar Dey ◽

Bhabatosh Chanda

Keyword(s):

Neural Network ◽

Image Restoration ◽

State Of The Art ◽

Source Code ◽

Back Propagation ◽

Image Features ◽

Main Difficulty ◽

The Right ◽

Right Order ◽

Morphological Opening

AbstractMathematical morphology is a powerful tool for image processing tasks. The main difficulty in designing mathematical morphological algorithm is deciding the order of operators/filters and the corresponding structuring elements (SEs). In this work, we develop morphological network composed of alternate sequences of dilation and erosion layers, which depending on learned SEs, may form opening or closing layers. These layers in the right order along with linear combination (of their outputs) are useful in extracting image features and processing them. Structuring elements in the network are learned by back-propagation method guided by minimization of the loss function. Efficacy of the proposed network is established by applying it to two interesting image restoration problems, namely de-raining and de-hazing. Results are comparable to that of many state-of-the-art algorithms for most of the images. It is also worth mentioning that the number of network parameters to handle is much less than that of popular convolutional neural network for similar tasks. The source code can be found here https://github.com/ranjanZ/Mophological-Opening-Closing-Net

Download Full-text

TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods only focus on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entity and entity types is utilized to map head entity and tail entity to type-specific representations, then translation-based score function is used to learn the presentation triples. We evaluated our model on real-world datasets with two benchmark tasks of link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.

Download Full-text

Density Guarantee on Finding Multiple Subgraphs and Subtensors

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3446668 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-32

Author(s):

Quang-huy Duong ◽

Heri Ramampiaro ◽

Kjetil Nørvåg ◽

Thu-lan Dam

Keyword(s):

Lower Bound ◽

State Of The Art ◽

The State ◽

The Other ◽

Exact Methods ◽

Practical Solution ◽

Novel Approach ◽

Wide Range ◽

Real World Datasets ◽

Tensor Data

Dense subregion (subgraph & subtensor) detection is a well-studied area, with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most of the existing works utilize the state-or-the-art greedy 2-approximation algorithm to capably provide solutions with a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can, on the other hand, estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subsensor only. We address these drawbacks by providing both theoretical and practical solution for estimating multiple dense subtensors in tensor data and giving a higher lower bound of the density. In particular, we guarantee and prove a higher bound of the lower-bound density of the estimated subgraph and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on its density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrates its efficiency and feasibility.

Download Full-text

Small footprint optoelectrodes using ring resonators for passive light localization

Microsystems & Nanoengineering ◽

10.1038/s41378-021-00263-0 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Vittorino Lanzio ◽

Gregory Telian ◽

Alexander Koshelev ◽

Paolo Micheletti ◽

Gianni Presti ◽

...

Keyword(s):

State Of The Art ◽

Three Dimensional ◽

Network Activity ◽

Ring Resonators ◽

Spatiotemporal Resolution ◽

Optical Ring Resonators ◽

Light Localization ◽

Order Of Magnitude ◽

Small Footprint

AbstractThe combination of electrophysiology and optogenetics enables the exploration of how the brain operates down to a single neuron and its network activity. Neural probes are in vivo invasive devices that integrate sensors and stimulation sites to record and manipulate neuronal activity with high spatiotemporal resolution. State-of-the-art probes are limited by tradeoffs involving their lateral dimension, number of sensors, and ability to access independent stimulation sites. Here, we realize a highly scalable probe that features three-dimensional integration of small-footprint arrays of sensors and nanophotonic circuits to scale the density of sensors per cross-section by one order of magnitude with respect to state-of-the-art devices. For the first time, we overcome the spatial limit of the nanophotonic circuit by coupling only one waveguide to numerous optical ring resonators as passive nanophotonic switches. With this strategy, we achieve accurate on-demand light localization while avoiding spatially demanding bundles of waveguides and demonstrate the feasibility with a proof-of-concept device and its scalability towards high-resolution and low-damage neural optoelectrodes.

Download Full-text

Chi-Squared Distance Metric Learning for Histogram Data

Mathematical Problems in Engineering ◽

10.1155/2015/352849 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Wei Yang ◽

Luhui Xu ◽

Xiaopan Chen ◽

Fengbin Zheng ◽

Yang Liu

Keyword(s):

Nearest Neighbor ◽

State Of The Art ◽

Metric Learning ◽

Nearest Neighbors ◽

Distance Metric Learning ◽

Distance Metric ◽

Projected Gradient Method ◽

Proper Distance ◽

Chi Squared ◽

Real World Datasets

Learning a proper distance metric for histogram data plays a crucial role in many computer vision tasks. The chi-squared distance is a nonlinear metric and is widely used to compare histograms. In this paper, we show how to learn a general form of chi-squared distance based on the nearest neighbor model. In our method, the margin of sample is first defined with respect to the nearest hits (nearest neighbors from the same class) and the nearest misses (nearest neighbors from the different classes), and then the simplex-preserving linear transformation is trained by maximizing the margin while minimizing the distance between each sample and its nearest hits. With the iterative projected gradient method for optimization, we naturally introduce thel2,1norm regularization into the proposed method for sparse metric learning. Comparative studies with the state-of-the-art approaches on five real-world datasets verify the effectiveness of the proposed method.

Download Full-text

The smallest extraction problem

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476293 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2445-2458

Author(s):

Valerio Cetorelli ◽

Paolo Atzeni ◽

Valter Crescenzi ◽

Franco Milicchio

Keyword(s):

Unsupervised Learning ◽

Optimization Problem ◽

Learning Algorithm ◽

State Of The Art ◽

Data Extraction ◽

Source Code ◽

Web Data ◽

Web Data Extraction ◽

New Family ◽

Context Free

We introduce landmark grammars , a new family of context-free grammars aimed at describing the HTML source code of pages published by large and templated websites and therefore at effectively tackling Web data extraction problems. Indeed, they address the inherent ambiguity of HTML, one of the main challenges of Web data extraction, which, despite over twenty years of research, has been largely neglected by the approaches presented in literature. We then formalize the Smallest Extraction Problem (SEP), an optimization problem for finding the grammar of a family that best describes a set of pages and contextually extract their data. Finally, we present an unsupervised learning algorithm to induce a landmark grammar from a set of pages sharing a common HTML template, and we present an automatic Web data extraction system. The experiments on consolidated benchmarks show that the approach can substantially contribute to improve the state-of-the-art.

Download Full-text

Two Orders of Magnitude Improvement in the Detection Limit of Droplet-Based Micro-Magnetofluidics with Planar Hall Effect Sensors

Engineering Proceedings ◽

10.3390/i3s2021dresden-10105 ◽

2021 ◽

Vol 6 (1) ◽

pp. 47

Author(s):

Julian Schütt ◽

Rico Illing ◽

Oleksii Volkov ◽

Tobias Kosub ◽

Pablo Nicolás Granell ◽

...

Keyword(s):

Detection Limit ◽

Hall Effect ◽

State Of The Art ◽

Limit Of Detection ◽

Research Field ◽

High Throughput Analysis ◽

Planar Hall Effect ◽

Biologically Relevant ◽

Sensing Platform ◽

Order Of Magnitude

The detection, manipulation, and tracking of magnetic nanoparticles is of major importance in the fields of biology, biotechnology, and biomedical applications as labels as well as in drug delivery, (bio-)detection, and tissue engineering. In this regard, the trend goes towards improvements of existing state-of-the-art methodologies in the spirit of timesaving, high-throughput analysis at ultra-low volumes. Here, microfluidics offers vast advantages to address these requirements, as it deals with the control and manipulation of liquids in confined microchannels. This conjunction of microfluidics and magnetism, namely micro-magnetofluidics, is a dynamic research field, which requires novel sensor solutions to boost the detection limit of tiny quantities of magnetized objects. We present a sensing strategy relying on planar Hall effect (PHE) sensors in droplet-based micro-magnetofluidics for the detection of a multiphase liquid flow, i.e., superparamagnetic aqueous droplets in an oil carrier phase. The high resolution of the sensor allows the detection of nanoliter-sized superparamagnetic droplets with a concentration of 0.58 mg cm−3, even when they are only biased in a geomagnetic field. The limit of detection can be boosted another order of magnitude, reaching 0.04 mg cm−³ (1.4 million particles in a single 100 nL droplet) when a magnetic field of 5 mT is applied to bias the droplets. With this performance, our sensing platform outperforms the state-of-the-art solutions in droplet-based micro-magnetofluidics by a factor of 100. This allows us to detect ferrofluid droplets in clinically and biologically relevant concentrations, and even in lower concentrations, without the need of externally applied magnetic fields.

Download Full-text

RON-Gauss: Enhancing Utility in Non-Interactive Private Data Release

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0003 ◽

2019 ◽

Vol 2019 (1) ◽

pp. 26-46 ◽

Cited By ~ 2

Author(s):

Thee Chanyaswad ◽

Changchang Liu ◽

Prateek Mittal

Keyword(s):

Machine Learning ◽

Real World ◽

Differential Privacy ◽

Real Data ◽

The Novel ◽

Private Data ◽

Data Release ◽

Machine Learning Applications ◽

Order Of Magnitude ◽

Real World Datasets

Abstract A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.

Download Full-text

SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6077 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6127-6136

Author(s):

Chao Wang ◽

Hengshu Zhu ◽

Chen Zhu ◽

Chuan Qin ◽

Hui Xiong

Keyword(s):

Bayesian Approach ◽

Posterior Probability ◽

State Of The Art ◽

User Preferences ◽

Implicit Feedback ◽

Pairwise Preference ◽

Entire List ◽

Collaborative Ranking ◽

Real World Datasets ◽

Made In

The recent development of online recommender systems has a focus on collaborative ranking from implicit feedback, such as user clicks and purchases. Different from explicit ratings, which reflect graded user preferences, the implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches have still been limited by various challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preference is not always held in practice. Also, the listwise approaches cannot efficiently accommodate “ties” due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender system. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to √M/N, where M and N are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.

Download Full-text