Augmented Intention Model for Next-Location Prediction from Graphical Trajectory Context

Human trajectory prediction is an essential task for various applications such as travel recommendation, location-sensitive advertisement, and traffic planning. Most existing approaches are based on sequential models and produce a prediction by mining behavior patterns. However, pattern-based methods are less effective than expected under real-life conditions such as sparse or missing data. Moreover, due to the technical limitations of sensors or the traffic situation at a given time, people going to the same place may produce different trajectories. Even for people traveling along the same route, the observed transit records are not exactly the same. Trajectories are therefore always diverse, and extracting user intention from them is difficult. In this paper, we propose an augmented-intention recurrent neural network (AI-RNN) model to predict locations in diverse trajectories. We first propose three strategies for generating graph structures that represent travel context, and then leverage graph convolutional networks to augment user travel intentions under the graph view. Finally, we use gated recurrent units with the augmented node vectors to predict human trajectories. We experiment with two representative real-life datasets and evaluate the proposed model by comparing its results with those of other state-of-the-art models. The results demonstrate that the AI-RNN model outperforms other methods in terms of top-k accuracy, especially in scenarios with low trajectory similarity.
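
The abstract reports results in terms of top-k accuracy without defining it. A minimal sketch of the usual definition follows, assuming each sample comes with one score per candidate location; the function name and the toy scores are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of samples whose true location is among the k highest-scored candidates."""
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores for this sample.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in ranked
    return hits / len(targets)


# Toy data (illustrative only): three samples, three candidate locations each.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
targets = [1, 2, 1]

acc1 = top_k_accuracy(scores, targets, 1)
acc2 = top_k_accuracy(scores, targets, 2)
```

Raising k can only keep or increase the score, which is why results are typically reported for several values of k.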

Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018658 ◽

2019 ◽

Vol 33 ◽

pp. 8658-8665 ◽

Cited By ~ 10

Author(s):

Xiangpeng Li ◽

Jingkuan Song ◽

Lianli Gao ◽

Xianglong Liu ◽

Wenbing Huang ◽

...

Keyword(s):

Question Answering ◽

State Of The Art ◽

Computation Time ◽

Comparable Result ◽

Video Encoding ◽

Visual Question Answering ◽

Proposed Model ◽

Ablation Study ◽

The Given ◽

Video Question Answering

Most of the recent progress on visual question answering is based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often time-consuming and have difficulty modeling long-range dependencies due to the sequential nature of RNNs. We propose a new architecture, Positional Self-Attention with Co-attention (PSAC), which does not require RNNs for video question answering. Specifically, inspired by the success of self-attention in machine translation, we propose a Positional Self-Attention that calculates the response at each position by attending to all positions within the same sequence, and then adds representations of absolute positions. PSAC can therefore exploit the global dependencies of the question and the temporal information in the video, and allows question and video encoding to be executed in parallel. Furthermore, in addition to attending to the video features relevant to the given questions (i.e., video attention), we utilize the co-attention mechanism by simultaneously modeling “what words to listen to” (question attention). To the best of our knowledge, this is the first work to replace RNNs with self-attention for the task of visual question answering. Experimental results on four tasks of the benchmark dataset show that our model significantly outperforms the state of the art on three tasks and attains a comparable result on the Count task. Our model requires less computation time and achieves better performance than RNN-based methods. An additional ablation study demonstrates the effect of each component of our proposed model.
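
The positional self-attention step described above (attend to all positions of the same sequence, then add a representation of the absolute position) can be sketched as follows. This is only an illustration under stated assumptions: it uses plain scaled dot-product attention without learned projections, and a Transformer-style sinusoidal encoding, neither of which the abstract specifies. All function names are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sinusoidal_position(pos, dim):
    # Assumed encoding: the abstract only says "absolute positions".
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def positional_self_attention(seq):
    """Attend from every position to all positions, then add the position encoding."""
    dim = len(seq[0])
    out = []
    for pos, query in enumerate(seq):
        # Scaled dot-product scores against every element of the same sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq]
        weights = softmax(scores)
        attended = [sum(w * v[j] for w, v in zip(weights, seq)) for j in range(dim)]
        pe = sinusoidal_position(pos, dim)
        out.append([a + p for a, p in zip(attended, pe)])
    return out


# Three identical toy "frame" vectors: attention returns the same vector,
# so the outputs differ only by the added position encoding.
frames = [[1.0, 2.0]] * 3
out = positional_self_attention(frames)
```

Because each output position depends only on the full input sequence and not on a previous hidden state, all positions can be computed independently, which is the parallelism the abstract contrasts with RNNs.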

Topology Optimization based Graph Convolutional Network

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/563 ◽

2019 ◽

Cited By ~ 2

Author(s):

Liang Yang ◽

Zesheng Kang ◽

Xiaochun Cao ◽

Di Jin ◽

Bo Yang ◽

...

Keyword(s):

Topology Optimization ◽

Network Topology ◽

State Of The Art ◽

Convolutional Network ◽

Topological Information ◽

Convolutional Networks ◽

The Past ◽

Attributed Network ◽

Fully Connected ◽

The Given

In the past few years, semi-supervised node classification in attributed networks has developed rapidly. Inspired by the success of deep learning, researchers adapted the convolutional neural network to develop Graph Convolutional Networks (GCN), which have achieved surprising classification accuracy by considering topological information and employing a fully connected network (FCN). However, the given network topology may also degrade performance if it is employed directly in classification, because it may be highly sparse and contain noise. In addition, the lack of learnable filters in GCN limits performance. In this paper, we propose a novel Topology Optimization based Graph Convolutional Network (TO-GCN) that fully utilizes the potential information by jointly refining the network topology and learning the parameters of the FCN. According to our derivations, TO-GCN is more flexible than GCN, in which the filters are fixed and only the classifier can be updated during the learning process. Extensive experiments on real attributed networks demonstrate the superiority of the proposed TO-GCN over state-of-the-art approaches.
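
To make the "fixed filter" remark concrete, here is a minimal sketch of a standard GCN layer, assuming the widely used propagation rule ReLU(Â H W) with Â the symmetrically normalized adjacency matrix with self-loops. It illustrates the baseline GCN only, not TO-GCN's topology refinement, and the helper names are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    # Plain dense matrix product on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    # The filter (normalized adjacency) is fixed by the graph; only `weights` is learned.
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Two connected nodes with scalar features: each node averages itself and its neighbor.
adj = [[0, 1], [1, 0]]
features = [[1.0], [3.0]]
weights = [[1.0]]
hidden = gcn_layer(adj, features, weights)
```

In this sketch the graph enters only through `normalize_adjacency`, so a noisy or sparse input topology propagates directly into every layer, which is the limitation the abstract sets out to address.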

Learning with Adaptive Neighbors for Image Clustering

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/344 ◽

2018 ◽

Cited By ~ 2

Author(s):

Yang Liu ◽

Quanxue Gao ◽

Zhaohua Yang ◽

Shujian Wang

Keyword(s):

State Of The Art ◽

Clustering Algorithms ◽

Original Data ◽

Image Clustering ◽

Complex Structures ◽

Clustering Methods ◽

Proposed Model ◽

Data Graph ◽

The Given ◽

Optimal Graph

Due to the importance and efficiency of learning complex structures hidden in data, graph-based methods have been widely studied and have proved successful in unsupervised learning. Most existing graph-based clustering methods require post-processing on the original data graph to extract the clustering indicators. These methods have two drawbacks: (1) the cluster structures are not explicit in the clustering results; (2) the final clustering performance is sensitive to the construction of the original data graph. To solve these problems, this paper proposes a novel model that learns a graph from the given data graph such that the newly obtained optimal graph is more suitable for the clustering task. We also propose an efficient algorithm to solve the model. Extensive experimental results show that the proposed model outperforms other state-of-the-art clustering algorithms.

Audio Captioning with Composition of Acoustic and Semantic Information

International Journal of Semantic Computing ◽

10.1142/s1793351x21400018 ◽

2021 ◽

Vol 15 (02) ◽

pp. 143-160

Author(s):

Ayşegül Özkaya Eren ◽

Mustafa Sert

Keyword(s):

Language Processing ◽

Semantic Information ◽

State Of The Art ◽

Research Area ◽

Audio Features ◽

Audio Clip ◽

Proposed Model ◽

Decoder Architecture ◽

Gated Recurrent Units ◽

New Research

Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. To address this problem, previous studies mostly use encoder–decoder-based models without considering semantic information. To fill this gap, we present a novel encoder–decoder architecture using bi-directional Gated Recurrent Units (BiGRU) with audio and semantic embeddings. We extract semantic embeddings by obtaining subjects and verbs from the audio clip captions and combine these embeddings with audio embeddings to feed the BiGRU-based encoder–decoder model. To enable semantic embeddings for the test audio clips, we introduce a Multilayer Perceptron classifier to predict the semantic embeddings of those clips. We also present exhaustive experiments to show the efficiency of different features and datasets for our proposed model on the audio captioning task. To extract audio features, we use log Mel energy features, VGGish embeddings, and pretrained audio neural network (PANN) embeddings. Extensive experiments on two audio captioning datasets, Clotho and AudioCaps, show that our proposed model outperforms state-of-the-art audio captioning models across different evaluation metrics and that using the semantic information improves captioning performance.

Automatic Manipulator Tracking Control Based on Moving Target Trajectory Prediction

Scientific Programming ◽

10.1155/2021/7944300 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Haifeng Luo

Keyword(s):

Target Tracking ◽

Tracking Control ◽

Moving Target ◽

Trajectory Prediction ◽

Target Trajectory ◽

Proposed Model ◽

Core Issue ◽

Testing Algorithm ◽

The Given ◽

Automatic Manipulator

The core issue of automatic manipulator tracking control is how to ensure the given moving target follows the expected trajectory and adapts to various uncertain factors. However, existing moving target trajectory prediction methods rely on highly complex and accurate models and lack the ability to generalize to different automatic manipulator tracking scenarios. Therefore, this study seeks a way to realize automatic manipulator tracking control based on moving target trajectory prediction. In particular, a moving target trajectory prediction model was established, and its parameters were optimized. Next, a tracking-training-testing algorithm was proposed for the manipulator’s automatic moving target tracking, and the operating flows were detailed for the training module, the target detection module, and the target tracking module. The proposed model and algorithm proved effective in experiments.

Knowledge-Grounded Chatbot Based on Dual Wasserstein Generative Adversarial Networks with Effective Attention Mechanisms

Applied Sciences ◽

10.3390/app10093335 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3335 ◽

Cited By ~ 1

Author(s):

Sihyung Kim ◽

Oh-Woog Kwon ◽

Harksoo Kim

Keyword(s):

State Of The Art ◽

The State ◽

Generative Adversarial Networks ◽

External Knowledge ◽

Adversarial Networks ◽

Proposed Model ◽

Context Knowledge ◽

The Given ◽

Internal Knowledge

A conversation is based on internal knowledge that the participants already know or external knowledge that they have gained during the conversation. A chatbot that communicates with humans by using its internal and external knowledge is called a knowledge-grounded chatbot. Although previous studies on knowledge-grounded chatbots have achieved reasonable performance, they may still generate unsuitable responses that are not associated with the given knowledge. To address this problem, we propose a knowledge-grounded chatbot model that effectively reflects the dialogue context and given knowledge by using well-designed attention mechanisms. The proposed model uses three kinds of attention: query-context attention, query-knowledge attention, and context-knowledge attention. In our experiments with the Wizard-of-Wikipedia dataset, the proposed model showed better performance than the state-of-the-art model on a variety of measures.

Multiple-Attribute Decision-Making Using Fermatean Fuzzy Hamacher Interactive Geometric Operators

Mathematical Problems in Engineering ◽

10.1155/2021/5150933 ◽

2021 ◽

Vol 2021 ◽

pp. 1-20

Author(s):

Gulfam Shahzadi ◽

Fariha Zafar ◽

Maha Abdullah Alghamdi

Keyword(s):

Decision Making ◽

Real Life ◽

Salient Feature ◽

Multiple Attribute Decision Making ◽

Aggregation Operators ◽

Proposed Model ◽

Multiple Attribute ◽

The Impact ◽

The Given ◽

The Relationship

The Fermatean fuzzy set (FFS) is a more efficient, flexible, and generalized model for dealing with uncertainty than intuitionistic and Pythagorean fuzzy models. This research article presents a novel multiple-attribute decision-making (MADM) technique based on FFS. Aggregation operators (AOs), for example, Dombi, Einstein, and Hamacher, are frequently used in the MADM process and are considered useful tools for evaluating the given alternatives. Among these, one of the most effective is the Hamacher operator. The salient feature of this operator is that it reduces the impact of negative information and provides more accurate results. Motivated by the primary characteristics of the Hamacher operator, we apply Hamacher interactive aggregation operators based on FFSs to determine the best alternative. Using Hamacher’s norm operations, we introduce some new geometric operators, namely, the Fermatean fuzzy Hamacher interactive weighted geometric (FFHIWG) operator, the Fermatean fuzzy Hamacher interactive ordered weighted geometric (FFHIOWG) operator, and the Fermatean fuzzy Hamacher interactive hybrid weighted geometric (FFHIHWG) operator. Some important results and properties of the proposed AOs are discussed, and to achieve the optimal alternative, the proposed MADM technique is carried out in a real-life application in the medical field. An algorithm for the proposed technique is also developed. The significance of the proposed method is that Fermatean fuzzy Hamacher interactive geometric (FFHIG) operators deal with the relationship between the belongingness degree (BD) and nonbelongingness degree (NBD) of the objects, which plays a crucial role in decision-making (DM). Finally, to show the exactness and validity of the proposed work, a comparative analysis of the proposed model and the existing models is presented.
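
As background for the operators named above, the sketch below shows the Fermatean membership constraint (the cubes of BD and NBD sum to at most 1) and the classical Hamacher t-norm and t-conorm with parameter gamma > 0. It does not reproduce the paper's FFHIWG, FFHIOWG, or FFHIHWG operators, whose definitions are not given in the abstract; the function names are hypothetical.

```python
def is_fermatean(bd, nbd):
    """Check the Fermatean constraint: bd^3 + nbd^3 <= 1, with both degrees in [0, 1]."""
    return 0.0 <= bd <= 1.0 and 0.0 <= nbd <= 1.0 and bd ** 3 + nbd ** 3 <= 1.0


def hamacher_product(a, b, gamma):
    """Hamacher t-norm; gamma = 1 gives the algebraic product, gamma = 2 the Einstein product."""
    return (a * b) / (gamma + (1.0 - gamma) * (a + b - a * b))


def hamacher_sum(a, b, gamma):
    """Hamacher t-conorm; gamma = 1 gives the algebraic sum, gamma = 2 the Einstein sum."""
    return (a + b - a * b - (1.0 - gamma) * a * b) / (1.0 - (1.0 - gamma) * a * b)


# (0.9, 0.6) violates the Pythagorean constraint (0.81 + 0.36 > 1)
# but satisfies the Fermatean one (0.729 + 0.216 <= 1).
fermatean_ok = is_fermatean(0.9, 0.6)
pythagorean_ok = 0.9 ** 2 + 0.6 ** 2 <= 1.0

algebraic = hamacher_product(0.5, 0.5, 1.0)
einstein = hamacher_product(0.5, 0.5, 2.0)
einstein_sum = hamacher_sum(0.5, 0.5, 2.0)
```

The pair (0.9, 0.6) illustrates why the abstract calls FFS more general: it admits evaluations that the Pythagorean model rejects.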

Localizing Natural Language in Videos

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018175 ◽

2019 ◽

Vol 33 ◽

pp. 8175-8182 ◽

Cited By ~ 13

Author(s):

Jingyuan Chen ◽

Lin Ma ◽

Xinpeng Chen ◽

Zequn Jie ◽

Jiebo Luo

Keyword(s):

Natural Language ◽

Video Sequence ◽

State Of The Art ◽

Recurrent Networks ◽

The Public ◽

Fine Grained ◽

Proposed Model ◽

Boundary Model ◽

The Given ◽

Language Description

In this paper, we consider the task of natural language video localization (NLVL): given an untrimmed video and a natural language description, the goal is to localize a segment in the video which semantically corresponds to the given natural language description. We propose a localizing network (LNet), working in an end-to-end fashion, to tackle the NLVL task. We first match the natural sentence and video sequence by cross-gated attended recurrent networks to exploit their fine-grained interactions and generate a sentence-aware video representation. A self-interactor is proposed to perform cross-frame matching, which dynamically encodes and aggregates the matching evidence. Finally, a boundary model is proposed to locate the positions of video segments corresponding to the natural sentence description by predicting the starting and ending points of the segment. Extensive experiments conducted on the public TACoS and DiDeMo datasets demonstrate that our proposed model performs effectively and efficiently against the state-of-the-art approaches.

Risk-Based Robust Bidding Strategies for EVs’ Aggregators in Day-ahead Markets with Uncertainty

10.36227/techrxiv.12771257 ◽

2020 ◽

Author(s):

Ahmed Abdelmoaty ◽

Wessam Mesbah ◽

Mohammad A. M. Abdel-Aal ◽

Ali T. Alawami

Keyword(s):

Robust Optimization ◽

Electricity Market ◽

Real Life ◽

Optimization Approach ◽

Case Scenario ◽

Worst Case ◽

Bidding Strategies ◽

Wind Generators ◽

Proposed Model ◽

Optimal Bidding

In the recent electricity market framework, the profit of generation companies depends on the operator's decision on the schedule of its units, the energy price, and the optimal bidding strategies. Due to the expanded integration of uncertain, highly intermittent renewable generators such as wind plants, coordination with other facilities to mitigate the risks of imbalances is mandatory. Accordingly, coordination of wind generators with the evolutionary Electric Vehicles (EVs) is expected to boost the performance of the grid. In this paper, we propose a robust optimization approach for the coordination between the wind-thermal generators and the EVs in a virtual power plant (VPP) environment. The objective of maximizing the profit of the VPP Operator (VPPO) is studied. The optimal bidding strategy of the VPPO in the day-ahead market under uncertainties of wind power, energy prices, imbalance prices, and demand is obtained for the worst-case scenario. A case study is conducted to assess the effectiveness of the proposed model in terms of the VPPO's profit. A comparison between the proposed model and scenario-based optimization is presented. Our results confirm that, despite the conservative behavior of the worst-case robust optimization model, it protects the decision maker against fluctuations of the uncertain parameters involved in the production and bidding processes. In addition, robust optimization is a more tractable problem and does not suffer from the high computation burden associated with scenario-based stochastic programming. This makes it more practical for real-life scenarios.

Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences ◽

10.3390/app11156975 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6975

Author(s):

Tao Zhang ◽

Lun He ◽

Xudong Li ◽

Guoqing Feng

Keyword(s):

Performance Improvement ◽

State Of The Art ◽

Error Rates ◽

Convolutional Network ◽

Convolutional Networks ◽

Sentence Level ◽

End To End ◽

High Level ◽

Improved Accuracy ◽

Talking Face

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the gradient vanishing and insufficient performance of RNNs (LSTM, GRU), which yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
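
For readers unfamiliar with CTC, the sketch below shows the standard greedy (best-path) way of turning per-frame CTC outputs into a sentence: take the most likely symbol per frame, collapse consecutive repeats, then drop blanks. This is a generic illustration, not the paper's decoder; the alphabet, the blank index, and the toy probabilities are assumptions.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    best = [max(range(len(p)), key=lambda i: p[i]) for p in frame_probs]
    decoded = []
    prev = None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


# Hypothetical alphabet: index 0 is the CTC blank.
alphabet = ["-", "a", "b"]
frame_probs = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat, merged)
]
text = ctc_greedy_decode(frame_probs, alphabet)
```

The blank between the two runs of `a` is what allows a doubled letter to survive the repeat-merging step.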
