ABSTRACT

Mapping information from different brains, gathered using different modalities, into a common coordinate space corresponding to a reference brain is an aspirational goal in modern neuroscience, analogous in importance to mapping genomic data to a reference genome. While brain-to-atlas mapping workflows exist for single-modality data (3D MRI or STPT image volumes), data sets generally need to be combined across modalities with different contrast mechanisms and scales, in the presence of missing data as well as signals not present in the reference. This has so far been an unsolved problem. We have solved this problem in its full generality by developing and implementing a rigorous, non-parametric generative framework that learns unknown mappings between contrast mechanisms from data and infers missing data. Our methodology permits rigorous quantification of the local scale changes between different individual brains, which have so far been neglected. We are also able to quantitatively characterize individual variation in shape. Our work establishes a quantitative, scalable and streamlined workflow for unifying a broad spectrum of multi-modal whole-brain light microscopic data volumes into a coordinate-based atlas framework, a step that is a prerequisite for large-scale integration of whole-brain data sets in modern neuroscience.

Summary

A current focus of research in neuroscience is to enumerate, map and annotate neuronal cell types in whole vertebrate brains using different modalities of data acquisition. A key challenge remains: can the large multiplicities of molecular anatomical data sets, from many different modalities and at widely different scales, all be assembled into a common reference space? Solving this problem is as important for modern neuroscience as mapping to reference genomes was for molecular biology. While workable brain-to-atlas mapping workflows exist for single modalities (e.g.
mapping serial two-photon tomography (STPT) brains to STPT references), and largely for clean data, mapping across contrast modalities is generally not a solved problem: data sets can be partial, and often carry signal not present in the reference brain (e.g. tracer injections). Placing these types of anatomical data into a common reference frame for all to use is an aspirational goal for the neuroscience community. However, this goal has so far been elusive due to the difficulties noted above, and real integration is lacking.

We have solved this problem in its full generality by developing and implementing a rigorous, generative framework that learns unknown mappings between contrast mechanisms from data and infers missing data. The key idea in the framework is to minimize the difference between synthetic image volumes and real data over function classes of non-parametric mappings, including a diffeomorphic mapping, the contrast map, and the locations and types of missing data/non-reference signals. The non-parametric mappings are instantiated as regularized but over-parameterized functional forms over spatial grids. A final, manual refinement step is included to ensure scientific quality of the results.

Our framework permits rigorous quantification of the local metric distortions between different individual brains, which is important for quantitative joint analysis of data gathered in multiple animals. Existing methods for atlas mapping do not provide metric quantifications and analyses of the resulting individual variations. We apply this pipeline to data modalities including various combinations of in-vivo and ex-vivo MRI, 3D STPT and fMOST data sets, 2D serial histology sections including a 3D reassembly step, and brains processed for snRNAseq with tissue partially removed.
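In schematic form, such a generative objective can be written as a joint minimization over the diffeomorphism, the contrast map, and a spatial weight field flagging missing or non-reference signal. The notation below is introduced here for illustration and is not taken from the paper:

```latex
% Schematic generative objective: I is the reference (atlas) image,
% J the observed data volume, \varphi the diffeomorphic mapping,
% f the contrast map, and W a spatial weight field down-weighting
% missing data or non-reference signal. The \lambda's weight the
% regularizers. All symbols here are illustrative, not the paper's own.
\min_{\varphi,\, f,\, W} \;
  \int W(x)\,\bigl|\, f\bigl(I(\varphi^{-1}(x))\bigr) - J(x) \,\bigr|^{2}\, dx
  \;+\; \lambda_{\varphi}\,\mathrm{Reg}(\varphi)
  \;+\; \lambda_{f}\,\mathrm{Reg}(f)
  \;+\; \lambda_{W}\,\mathrm{Reg}(W)
```

Here $f(I(\varphi^{-1}(x)))$ is the synthetic image volume compared against the real data $J$, so the optimization simultaneously estimates geometry, contrast transfer, and the locations of missing or extraneous signal.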
The median local linear scale change with respect to a histologically processed Nissl reference brain, as measured using the Jacobian of the diffeomorphic transformations, was found to be 0.93 for STPT-imaged brains (7% shrinkage) and 0.84 for fMOST-imaged brains (16% shrinkage between the reference brain and imaged volumes). The scale change between in-vivo and ex-vivo MRI for a mouse brain was found to be 0.96 (4% shrinkage), and the distortion between the perfused brain and tape-cut digital sections was shown to be minimal (1.02 for Nissl histology sections). We were able to quantitatively characterize individual variation in shape by studying variations in the tangent space of the diffeomorphic transformation around the reference brain. Based on this work we are able to establish co-variation patterns in metric distortions across the entire brain, across a large population. We note that the magnitude of individual variation is often greater than the differences between sample preparation techniques. Our work establishes a quantitative, scalable and streamlined workflow for unifying a broad spectrum of multi-modal whole-brain light microscopic data volumes into a coordinate-based atlas framework, a step that is a prerequisite for large-scale integration of whole-brain data sets in modern neuroscience.
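As a minimal sketch (not the paper's implementation) of how such a Jacobian-based scale statistic can be computed, the local linear scale at each voxel is the cube root of the Jacobian determinant of the transformation, with values below 1 indicating shrinkage. The snippet below assumes the transformation is sampled on a regular voxel grid:

```python
import numpy as np

def median_local_scale(phi, spacing=(1.0, 1.0, 1.0)):
    """Median local linear scale of a sampled 3D transformation.

    phi : array of shape (3, Z, Y, X) giving the mapped coordinates
        (the deformation itself, not a displacement field).
    spacing : voxel spacing along (z, y, x), used for the finite
        differences.

    The local linear scale at each voxel is |det Dphi|**(1/3);
    the median over voxels summarizes global shrinkage/expansion.
    """
    # Jacobian: derivative of each output component along each axis,
    # estimated by finite differences.
    J = np.empty((3, 3) + phi.shape[1:])
    for i in range(3):
        grads = np.gradient(phi[i], *spacing)
        for j in range(3):
            J[i, j] = grads[j]
    # Move the 3x3 matrix dimensions last so det vectorizes per voxel.
    J = np.moveaxis(J, (0, 1), (-2, -1))      # shape (Z, Y, X, 3, 3)
    det = np.linalg.det(J)
    return float(np.median(np.abs(det) ** (1.0 / 3.0)))
```

For a pure uniform scaling by 0.93, for example, this statistic recovers 0.93, matching the convention used above in which values below 1 denote shrinkage relative to the reference.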