Microbloggers’ interest inference using a subgraph stream

2021 ◽  
Vol 25 (2) ◽  
pp. 397-417
Author(s):  
Xiaoling Huang ◽  
Hao Wang ◽  
Lei Li ◽  
Yi Zhu ◽  
Chengxiang Hu

Inferring user interest over large-scale microblogs has attracted much attention in recent years. However, the emergence of massive data, the dynamic change of information, and the persistence of microblogs pose challenges to interest inference. Most existing approaches rarely take the combination of these microbloggers’ characteristics into account within the model, which may incur nontrivial information loss in real-time extraction of user interest and in massive social data processing. To address these problems, in this paper we propose a novel User-Networked Interest Topic Extraction in the form of a Subgraph Stream (UNITE_SS) for microbloggers’ interest inference. To be specific, we develop several strategies for constructing the subgraph stream and select the best-performing strategy for user interest inference. Moreover, the information of microblogs in each subgraph is utilized to obtain real-time, effective interests for microbloggers. The experimental evaluation on a large dataset from Sina Weibo, one of the most popular microblogging services in China, demonstrates that the proposed approach outperforms state-of-the-art baselines in terms of precision, mean reciprocal rank (MRR), and runtime, i.e., from both the effectiveness and efficiency perspectives.
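The abstract does not spell out the construction, but a minimal sketch of the general idea, partitioning time-stamped microblog activity into window-based subgraphs and down-weighting older windows when aggregating a user’s topic interest, might look as follows. All names, the windowing, and the decay scheme are illustrative assumptions, not the paper’s actual UNITE_SS algorithm:

```python
from collections import Counter, defaultdict

# Hypothetical illustration: partition time-stamped microblog records into a
# subgraph stream by fixed time windows, then aggregate topic counts per
# window with exponential decay so recent windows dominate the profile.
def build_subgraph_stream(records, window):
    """records: iterable of (timestamp, user, topic); window: seconds."""
    stream = defaultdict(list)
    for ts, user, topic in records:
        stream[ts // window].append((user, topic))
    return [stream[k] for k in sorted(stream)]

def infer_interest(stream, user, decay=0.8):
    profile = Counter()
    for age, subgraph in enumerate(reversed(stream)):  # newest window first
        weight = decay ** age
        for u, topic in subgraph:
            if u == user:
                profile[topic] += weight
    return profile.most_common()

records = [(0, "alice", "sports"), (90, "alice", "music"), (200, "alice", "music")]
print(infer_interest(build_subgraph_stream(records, window=100), "alice"))
```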

Author(s):  
Chen Liu ◽  
Bo Li ◽  
Jun Zhao ◽  
Ming Su ◽  
Xu-Dong Liu

Detecting newly emerging malware variants in real time is crucial for mitigating cyber risks and proactively blocking intrusions. In this paper, we propose MG-DVD, a novel detection framework based on dynamic heterogeneous graph learning, to detect malware variants in real time. In particular, MG-DVD first models the fine-grained execution event streams of malware variants into dynamic heterogeneous graphs and investigates real-world meta-graphs between malware objects, which can effectively characterize more discriminative malicious evolutionary patterns between malware and their variants. Then, MG-DVD presents two dynamic walk-based heterogeneous graph learning methods to learn more comprehensive representations of malware variants, which significantly reduces the cost of retraining the entire graph. As a result, MG-DVD is equipped with the ability to detect malware variants in real time, and it presents better interpretability by introducing meaningful meta-graphs. Comprehensive experiments on large-scale samples demonstrate that our proposed MG-DVD outperforms state-of-the-art methods in detecting malware variants in terms of effectiveness and efficiency.
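As an illustration of the modeling step only, the hedged sketch below builds a typed (heterogeneous) edge list from execution events and enumerates instances of a simple process–file–process meta-path; the event schema and relation names are invented for the example and stand in for the richer meta-graphs MG-DVD actually mines:

```python
from collections import defaultdict

# Hypothetical execution events as typed (source, relation, target) triples.
events = [
    ("proc:mal.exe", "write", "file:a.dll"),
    ("proc:dropper.exe", "read", "file:a.dll"),
    ("proc:mal.exe", "connect", "ip:10.0.0.5"),
]

out_edges = defaultdict(list)
in_edges = defaultdict(list)
for src, rel, dst in events:
    out_edges[src].append((rel, dst))
    in_edges[dst].append((rel, src))

def meta_path_instances():
    """Yield (writer, file, reader) triples matching process -write-> file <-read- process."""
    for src, edges in out_edges.items():
        for rel, dst in edges:
            if rel == "write" and dst.startswith("file:"):
                for rel2, other in in_edges[dst]:
                    if rel2 == "read" and other != src:
                        yield (src, dst, other)

print(list(meta_path_instances()))
```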


2020 ◽  
Vol 34 (04) ◽  
pp. 4412-4419 ◽  
Author(s):  
Zhao Kang ◽  
Wangtao Zhou ◽  
Zhitong Zhao ◽  
Junming Shao ◽  
Meng Han ◽  
...  

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years, with researchers boosting clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, which typically have quadratic or even cubic complexity, are inefficient and inherently difficult to apply at large scales. In the era of big data, this computational issue becomes critical. To fill the gap, we propose a large-scale MVSC (LMVSC) algorithm with linear-order complexity. Inspired by the idea of the anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario. Extensive experiments on various large-scale benchmark datasets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.
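A rough sketch of the anchor-graph pipeline the abstract describes, anchors per view, an n-by-m similarity matrix per view, and spectral embedding of the small concatenated matrix instead of an n-by-n graph, could look like the following; all parameter choices and the specific similarity are assumptions, not the authors’ exact optimization:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def lmvsc_sketch(views, n_clusters, n_anchors=50):
    zs = []
    for X in views:
        # m anchor points per view via k-means, then an n x m anchor graph.
        anchors = KMeans(n_clusters=n_anchors, n_init=10).fit(X).cluster_centers_
        Z = rbf_kernel(X, anchors)
        Z /= Z.sum(axis=1, keepdims=True)        # row-normalize similarities
        zs.append(Z)
    Z_all = np.hstack(zs)                        # n x (m * n_views), still small
    U, _, _ = np.linalg.svd(Z_all, full_matrices=False)
    emb = U[:, :n_clusters]                      # spectral embedding
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)

rng = np.random.default_rng(0)
X1 = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(5, 1, (50, 5))])
X2 = X1 @ rng.normal(size=(5, 8))                # a second, transformed "view"
print(lmvsc_sketch([X1, X2], n_clusters=2)[:10])
```

The cost stays linear in n because every step operates on n-by-m matrices with m fixed and much smaller than n.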


Author(s):  
William Prescott

This paper investigates the use of large-scale multibody dynamics (MBD) models for real-time vehicle simulation. The current state of the art in real-time vehicle simulation uses 15-degree-of-freedom models, but there is a need for higher-fidelity systems. To increase model fidelity, this paper proposes the use of the following techniques: implicit integration, parallel processing, and co-simulation in a real-time environment.
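To make the first of these techniques concrete, here is a minimal sketch of fixed-step implicit (backward) Euler on a stiff spring-damper; the appeal for real-time use is that each step reduces to one linear solve at a fixed, predictable cost. The constants are assumed toy values, not from the paper:

```python
import numpy as np

# For the linear system x' = A x, backward Euler solves (I - h*A) x_{n+1} = x_n,
# so each step is one linear solve and the step size can match the frame rate,
# even when the spring stiffness would force an explicit method to tiny steps.
k, c, m = 1e4, 50.0, 1.0        # stiff spring, damper, mass (assumed values)
A = np.array([[0.0, 1.0],
              [-k / m, -c / m]])
h = 1e-3                        # 1 ms fixed step, as in a real-time loop

x = np.array([0.1, 0.0])        # initial displacement and velocity
I = np.eye(2)
for step in range(1000):        # simulate 1 s of motion
    x = np.linalg.solve(I - h * A, x)
print("state after 1 s:", x)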


2020 ◽  
Vol 39 (5) ◽  
pp. 7281-7292
Author(s):  
Tongze He ◽  
Caili Guo ◽  
Yunfei Chu ◽  
Yang Yang ◽  
Yanjun Wang

Community Question Answering (CQA) websites have become an important channel for people to acquire knowledge. In CQA, one key issue is to recommend users with high expertise and willingness to answer the given questions, i.e., expert recommendation. However, many existing methods consider the expert recommendation problem in a static context, ignoring that real-world CQA websites are dynamic, with users’ interests and expertise changing over time. Although some methods that utilize time information have been proposed, their performance improvement can be limited due to the fact that they fail to consider the dynamic change of both user interests and expertise. To solve these problems, we propose a deep-learning-based framework for expert recommendation that exploits user interest and expertise in a dynamic environment. For user interest, we leverage Long Short-Term Memory (LSTM) to model users’ short-term interest so as to capture its dynamic change. For user expertise, we design a user expertise network, which leverages feedback on users’ historical behavior to estimate their expertise on new questions. We propose two methods in the user expertise network according to whether the dynamic property of expertise is considered. The experimental results on a large-scale dataset from a real-world CQA site demonstrate the superior performance of our method.
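A hedged sketch of the short-term interest component might pair an LSTM over a user’s recent history with a dot-product match against a new question, as below; the architecture, dimensions, and names are illustrative assumptions, not the authors’ exact network:

```python
import torch
import torch.nn as nn

class InterestScorer(nn.Module):
    """Toy model: LSTM over past-question ids -> interest vector -> match score."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, history_ids, question_ids):
        hist = self.embed(history_ids)                   # (batch, seq, dim)
        _, (h_n, _) = self.lstm(hist)                    # last hidden state
        interest = h_n.squeeze(0)                        # (batch, dim)
        question = self.embed(question_ids).mean(dim=1)  # mean-pooled question
        return (interest * question).sum(dim=-1)         # match score per user

model = InterestScorer()
history = torch.randint(0, 1000, (4, 10))   # 4 users, 10 past questions each
question = torch.randint(0, 1000, (4, 6))   # the new question's token ids
print(model(history, question))             # higher = better interest match
```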


2018 ◽  
Author(s):  
Florian Ganglberger ◽  
Joanna Kaczanowska ◽  
Wulf Haubensak ◽  
Katja Bühler

Recent advances in neuroimaging have allowed big brain initiatives and consortia to create vast resources of brain data that can be mined by researchers for their individual projects. Exploring the relationship between genes, brain circuitry, and behavior is one of the key elements of neuroscience research. This requires the fusion of spatial connectivity data at varying scales, such as whole-brain correlated gene expression and structural and functional connectivity. With ever-increasing resolution, these data exceed the past state of the art by several orders of magnitude in size and complexity. Current analytical workflows in neuroscience involve time-consuming manual aggregation of the data and only sparsely incorporate spatial context to operate continuously on multiple scales. Incorporating techniques for handling big connectivity data is therefore a necessity.

We propose a data structure to explore heterogeneous neurobiological connectivity data for integrated visual analytics workflows. Aggregation queries, i.e., the aggregated connectivity from, to, or between brain areas, allow experts to compare multimodal networks residing at different scales, or at different levels of hierarchically organized anatomical atlases. Executed on demand on volumetric gene expression and connectivity data, they enable an interactive dissection of networks with billions of edges in real time, based on their spatial context. The data structure is optimized to be accessed directly from the hard disk, since the connectivity of large-scale networks typically exceeds the memory size of current consumer-level PCs. This allows experts to embed and explore their own experimental data in the framework of public data resources without large-scale infrastructure.

Our novel data structure outperforms state-of-the-art graph engines in experiments retrieving the connectivity of local brain areas. We demonstrate the application of our approach to neuroscience by analyzing fear-related functional neuroanatomy in mice. Further, we show its versatility by comparing multimodal brain networks linked to autism. Importantly, we achieve cross-species congruence in retrieving human psychiatric trait networks, which facilitates the selection of neural substrates to be further studied in mouse models.
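As a toy illustration of on-demand aggregation queries over disk-resident connectivity, the following sketch uses numpy’s memmap as a stand-in for the paper’s custom disk-backed data structure; the matrix size is tiny and the brain “areas” are simply index sets from a hypothetical atlas:

```python
import numpy as np

# Disk-backed connectivity matrix; only the rows a query touches are paged in,
# so the full matrix never has to fit in RAM.
N = 2_000
conn = np.memmap("connectivity.dat", dtype=np.float32, mode="w+", shape=(N, N))
conn[:100, :100] = np.random.rand(100, 100)      # populate a corner for the demo
conn.flush()

def aggregate(src_idx, dst_idx):
    """Aggregated connectivity from one brain area (row set) to another (column set)."""
    total = 0.0
    for i in src_idx:                            # row-wise access keeps reads sequential
        total += float(conn[i, dst_idx].sum())
    return total

area_a = range(0, 50)                            # hypothetical atlas regions
area_b = list(range(50, 100))
print("A -> B aggregated connectivity:", aggregate(area_a, area_b))
```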


2009 ◽  
Vol 24 (2) ◽  
pp. 137-157 ◽  
Author(s):  
Fausto Giunchiglia ◽  
Mikalai Yatskevich ◽  
Paolo Avesani ◽  
Pavel Shvaiko

Recently, the number of ontology matching techniques and systems has increased significantly. This makes the issue of their evaluation and comparison more pressing. One of the challenges of ontology matching evaluation lies in building large-scale evaluation datasets. In fact, the number of possible correspondences between two ontologies grows quadratically with respect to the number of entities in these ontologies. This often makes the manual construction of evaluation datasets demanding to the point of being infeasible for large-scale matching tasks. In this paper, we present an ontology matching evaluation dataset composed of thousands of matching tasks, called TaxME2. It was built semi-automatically out of the Google, Yahoo, and Looksmart web directories. We evaluated TaxME2 by exploiting the results of almost two dozen state-of-the-art ontology matching systems. The experiments indicate that the dataset possesses the desired key properties, namely it is error-free, incremental, discriminative, monotonic, and hard for state-of-the-art ontology matching systems.
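For context, matching systems are conventionally scored against a reference alignment such as TaxME2 by set-based precision and recall over correspondences; a toy example with invented correspondences:

```python
# Reference alignment (ground truth) and a system's output, as sets of
# (source entity, target entity) correspondences. All pairs are made up.
reference = {("Sports", "Sport"), ("Autos", "Cars"), ("Arts", "Art")}
system = {("Sports", "Sport"), ("Autos", "Vehicles"), ("Arts", "Art")}

tp = len(reference & system)                 # correct correspondences found
precision = tp / len(system)
recall = tp / len(reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```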


1999 ◽  
Vol 09 (06) ◽  
pp. 1041-1074 ◽  
Author(s):  
TAO YANG ◽  
LEON O. CHUA

In a programmable (multistage) cellular neural network (CNN) structure, the CPU is a CNN universal chip which supports massively parallel computations on patterns and images, including videos. In this paper, we decompose the structure of a class of simultaneous recurrent networks (SRN) into a CNN program and run it on a von Neumann-like stored-program CNN structure. To train the SRN, we map the back-propagation-through-time (BTT) learning algorithm into a sequence of CNN subroutines to achieve real-time performance via a CNN universal chip. By computing in parallel, the CNN universal chip can be programmed to implement in real time the BTT learning algorithm, which has a very high time complexity. An estimate of the time complexity of the BTT learning algorithm based on the CNN universal chip is presented. For small-scale problems, our simulation results show that a CNN implementation of the BTT learning algorithm for a two-dimensional SRN is at least 10,000 times faster than one based on state-of-the-art sequential workstations. For the few large-scale problems that we have simulated so far, the CNN-implemented BTT learning algorithm maintained virtually the same time complexity with a learning time of a few seconds, while implementations on state-of-the-art sequential workstations dramatically increased their time complexity, often requiring several days of running time. Several examples are presented to demonstrate how efficiently a CNN universal chip can speed up the learning algorithm for both off-line and on-line applications.


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0253829
Author(s):  
Karthik V. Sarma ◽  
Alex G. Raman ◽  
Nikhil J. Dhinagar ◽  
Alan M. Priester ◽  
Stephanie Harmon ◽  
...  

Purpose: Developing large-scale datasets with research-quality annotations is challenging due to the high cost of refining clinically generated markup into high-precision annotations. We evaluated the direct use of a large dataset with only clinically generated annotations in the development of high-performance segmentation models for small research-quality challenge datasets.
Materials and methods: We used a large retrospective dataset from our institution comprising 1,620 clinically generated segmentations, and two challenge datasets (PROMISE12: 50 patients; ProstateX-2: 99 patients). We trained a 3D U-Net convolutional neural network (CNN) segmentation model using our entire dataset and used that model as a template to train models on the challenge datasets. We also trained versions of the template model using ablated proportions of our dataset and evaluated the relative benefit of those templates for the final models. Finally, we trained a version of the template model using an out-of-domain brain cancer dataset and evaluated the relative benefit of that template for the final models. We used five-fold cross-validation (CV) for all training and evaluation across our entire dataset.
Results: Our model achieves state-of-the-art performance on our large dataset (mean overall Dice 0.916, average Hausdorff distance 0.135 across CV folds). Using this model as a pre-trained template for refining on two external datasets significantly enhanced performance (30% and 49% improvement in Dice scores, respectively). Mean overall Dice and mean average Hausdorff distance were 0.912 and 0.15 for the ProstateX-2 dataset, and 0.852 and 0.581 for the PROMISE12 dataset. Using even small quantities of data to train the template enhanced performance, with significant improvements using 5% or more of the data.
Conclusion: We trained a state-of-the-art model using unrefined clinical prostate annotations and found that its use as a template model significantly improved performance on other prostate segmentation tasks, even when trained with only 5% of the original dataset.
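For reference, the overall Dice coefficient reported above measures volumetric overlap between a predicted binary segmentation and the ground truth; a minimal implementation on toy masks:

```python
import numpy as np

def dice(pred, truth, eps=1e-7):
    """Dice = 2|P ∩ T| / (|P| + |T|), on binary masks; eps guards empty masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)

truth = np.zeros((64, 64, 32), dtype=bool)
truth[20:40, 20:40, 10:20] = True                # toy 3D organ mask
pred = np.roll(truth, shift=2, axis=0)           # slightly shifted prediction
print("Dice:", round(dice(pred, truth), 3))
```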


2020 ◽  
Vol 34 (04) ◽  
pp. 4296-4303
Author(s):  
Yonghyun Jeong ◽  
Hyunjin Choi ◽  
Byoungjip Kim ◽  
Youngjune Gwon

We propose DefogGAN, a generative approach to the problem of inferring state information hidden in the fog of war for real-time strategy (RTS) games. Given a partially observed state, DefogGAN generates defogged images of a game as predictive information. Such information can be used to create a strategic agent for the game. DefogGAN is a conditional GAN variant featuring a pyramidal reconstruction loss to optimize over multiple feature resolution scales. We have validated DefogGAN empirically using a large dataset of professional StarCraft replays. Our results indicate that DefogGAN can predict enemy buildings and combat units as accurately as professional players do and achieves superior performance among state-of-the-art defoggers.
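A pyramidal reconstruction loss of this kind can be sketched as an L2 error accumulated over successively average-pooled resolutions, so that both coarse layout and fine placement are penalized; the level count and weights below are assumptions, not the paper’s values:

```python
import torch
import torch.nn.functional as F

def pyramidal_loss(fake, real, levels=4, weights=None):
    """Sum MSE between generated and true state grids at several pooled scales."""
    weights = weights or [1.0] * levels
    loss = 0.0
    for lvl in range(levels):
        loss = loss + weights[lvl] * F.mse_loss(fake, real)
        fake = F.avg_pool2d(fake, 2)             # halve resolution each level
        real = F.avg_pool2d(real, 2)
    return loss

fake = torch.rand(1, 8, 64, 64)                  # generated unit/building feature maps
real = torch.rand(1, 8, 64, 64)                  # fully observed ("defogged") target
print(pyramidal_loss(fake, real))
```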


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5073
Author(s):  
Khalil Khan ◽  
Waleed Albattah ◽  
Rehan Ullah Khan ◽  
Ali Mustafa Qamar ◽  
Durre Nayab

Real-time crowd analysis represents an active area of research within the computer vision community in general and scene analysis in particular. Over the last 10 years, various methods for crowd management in real-time scenarios have received immense attention due to large-scale applications in people counting, public event management, disaster management, safety monitoring, and so on. Although many sophisticated algorithms have been developed to address the task, crowd management in real-time conditions is still a challenging problem that is far from completely solved, particularly in wild and unconstrained conditions. In this paper, we present a detailed review of crowd analysis and management, focusing on state-of-the-art methods for both controlled and unconstrained conditions. The paper illustrates both the advantages and disadvantages of state-of-the-art methods. The methods presented range from the seminal research works on crowd management and monitoring to the newly introduced state-of-the-art deep learning methods. A comparison of the previous methods is presented, along with a detailed discussion of directions for future research. We believe this review article will contribute to various application domains and will also augment the knowledge of crowd analysis within the research community.

