scholarly journals Augmented Intention Model for Next-Location Prediction from Graphical Trajectory Context

2019 ◽  
Vol 2019 ◽  
pp. 1-12
Author(s):  
Canghong Jin ◽  
Zhiwei Lin ◽  
Minghui Wu

Human trajectory prediction is an essential task for various applications such as travel recommendation, location-sensitive advertisement, and traffic planning. Most existing approaches are sequential-model based and produce a prediction by mining behavior patterns. However, the effectiveness of pattern-based methods is not as good as expected in real-life conditions, such as data sparse or data missing. Moreover, due to the technical limitations of sensors or the traffic situation at the given time, people going to the same place may produce different trajectories. Even for people traveling along the same route, the observed transit records are not exactly the same. Therefore trajectories are always diverse, and extracting user intention from trajectories is difficult. In this paper, we propose an augmented-intention recurrent neural network (AI-RNN) model to predict locations in diverse trajectories. We first propose three strategies to generate graph structures to demonstrate travel context and then leverage graph convolutional networks to augment user travel intentions under graph view. Finally, we use gated recurrent units with augmented node vectors to predict human trajectories. We experiment with two representative real-life datasets and evaluate the performance of the proposed model by comparing its results with those of other state-of-the-art models. The results demonstrate that the AI-RNN model outperforms other methods in terms of top-k accuracy, especially in scenarios with low similarity.

Author(s):  
Xiangpeng Li ◽  
Jingkuan Song ◽  
Lianli Gao ◽  
Xianglong Liu ◽  
Wenbing Huang ◽  
...  

Most of the recent progresses on visual question answering are based on recurrent neural networks (RNNs) with attention. Despite the success, these models are often timeconsuming and having difficulties in modeling long range dependencies due to the sequential nature of RNNs. We propose a new architecture, Positional Self-Attention with Coattention (PSAC), which does not require RNNs for video question answering. Specifically, inspired by the success of self-attention in machine translation task, we propose a Positional Self-Attention to calculate the response at each position by attending to all positions within the same sequence, and then add representations of absolute positions. Therefore, PSAC can exploit the global dependencies of question and temporal information in the video, and make the process of question and video encoding executed in parallel. Furthermore, in addition to attending to the video features relevant to the given questions (i.e., video attention), we utilize the co-attention mechanism by simultaneously modeling “what words to listen to” (question attention). To the best of our knowledge, this is the first work of replacing RNNs with selfattention for the task of visual question answering. Experimental results of four tasks on the benchmark dataset show that our model significantly outperforms the state-of-the-art on three tasks and attains comparable result on the Count task. Our model requires less computation time and achieves better performance compared with the RNNs-based methods. Additional ablation study demonstrates the effect of each component of our proposed model.


Author(s):  
Liang Yang ◽  
Zesheng Kang ◽  
Xiaochun Cao ◽  
Di Jin ◽  
Bo Yang ◽  
...  

In the past few years, semi-supervised node classification in attributed network has been developed rapidly. Inspired by the success of deep learning, researchers adopt the convolutional neural network to develop the Graph Convolutional Networks (GCN), and they have achieved surprising classification accuracy by considering the topological information and employing the fully connected network (FCN). However, the given network topology may also induce a performance degradation if it is directly employed in classification, because it may possess high sparsity and certain noises. Besides, the lack of learnable filters in GCN also limits the performance. In this paper, we propose a novel Topology Optimization based Graph Convolutional Networks (TO-GCN) to fully utilize the potential information by jointly refining the network topology and learning the parameters of the FCN. According to our derivations, TO-GCN is more flexible than GCN, in which the filters are fixed and only the classifier can be updated during the learning process. Extensive experiments on real attributed networks demonstrate the superiority of the proposed TO-GCN against the state-of-the-art approaches.


Author(s):  
Yang Liu ◽  
Quanxue Gao ◽  
Zhaohua Yang ◽  
Shujian Wang

Due to the importance and efficiency of learning complex structures hidden in data, graph-based methods have been widely studied and get successful in unsupervised learning. Generally, most existing graph-based clustering methods require post-processing on the original data graph to extract the clustering indicators. However, there are two drawbacks with these methods: (1) the cluster structures are not explicit in the clustering results; (2) the final clustering performance is sensitive to the construction of the original data graph. To solve these problems, in this paper, a novel learning model is proposed to learn a graph based on the given data graph such that the new obtained optimal graph is more suitable for the clustering task. We also propose an efficient algorithm to solve the model. Extensive experimental results illustrate that the proposed model outperforms other state-of-the-art clustering algorithms.


2021 ◽  
Vol 15 (02) ◽  
pp. 143-160
Author(s):  
Ayşegül Özkaya Eren ◽  
Mustafa Sert

Generating audio captions is a new research area that combines audio and natural language processing to create meaningful textual descriptions for audio clips. To address this problem, previous studies mostly use the encoder–decoder-based models without considering semantic information. To fill this gap, we present a novel encoder–decoder architecture using bi-directional Gated Recurrent Units (BiGRU) with audio and semantic embeddings. We extract semantic embedding by obtaining subjects and verbs from the audio clip captions and combine these embedding with audio embedding to feed the BiGRU-based encoder–decoder model. To enable semantic embeddings for the test audios, we introduce a Multilayer Perceptron classifier to predict the semantic embeddings of those clips. We also present exhaustive experiments to show the efficiency of different features and datasets for our proposed model the audio captioning task. To extract audio features, we use the log Mel energy features, VGGish embeddings, and a pretrained audio neural network (PANN) embeddings. Extensive experiments on two audio captioning datasets Clotho and AudioCaps show that our proposed model outperforms state-of-the-art audio captioning models across different evaluation metrics and using the semantic information improves the captioning performance.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Haifeng Luo

The core issue of automatic manipulator tracking control is how to ensure the given moving target follows the expected trajectory and adapts to various uncertain factors. However, the existing moving target trajectory prediction methods rely on highly complex and accurate models, lacking the ability to generalize different automatic manipulator tracking scenarios. Therefore, this study tries to find a way to realize automatic manipulator tracking control based on moving target trajectory prediction. In particular, a moving target trajectory prediction model was established, and its parameters were optimized. Next, a tracking-training-testing algorithm was proposed for manipulator’s automatic moving target tracking, and the operating flows were detailed for training module, target detection module, and target tracking module. The proposed model and algorithm were proved effective through experiments.


2020 ◽  
Vol 10 (9) ◽  
pp. 3335 ◽  
Author(s):  
Sihyung Kim ◽  
Oh-Woog Kwon ◽  
Harksoo Kim

A conversation is based on internal knowledge that the participants already know or external knowledge that they have gained during the conversation. A chatbot that communicates with humans by using its internal and external knowledge is called a knowledge-grounded chatbot. Although previous studies on knowledge-grounded chatbots have achieved reasonable performance, they may still generate unsuitable responses that are not associated with the given knowledge. To address this problem, we propose a knowledge-grounded chatbot model that effectively reflects the dialogue context and given knowledge by using well-designed attention mechanisms. The proposed model uses three kinds of attention: Query-context attention, query-knowledge attention, and context-knowledge attention. In our experiments with the Wizard-of-Wikipedia dataset, the proposed model showed better performances than the state-of-the-art model in a variety of measures.


2021 ◽  
Vol 2021 ◽  
pp. 1-20
Author(s):  
Gulfam Shahzadi ◽  
Fariha Zafar ◽  
Maha Abdullah Alghamdi

Fermatean fuzzy set (FFS) is a more efficient, flexible, and generalized model to deal with uncertainty, as compared to intuitionistic and Pythagorean fuzzy models. This research article presents a novel multiple-attribute decision-making (MADM) technique based on FFS. Aggregation operators (AOs), for example, Dombi, Einstein, and Hamacher, are frequently being used in the MADM process and are considered useful tools for evaluating the given alternatives. Among these, one of the most effective is the Hamacher operator. The salient feature of this operator is that it reduces the impact of negative information and provides more accurate results. Motivated by the primary characteristics of the Hamacher operator, we apply Hamacher interactive aggregation operators based on FFSs to determine the best alternative. Using Hamacher’s norm operations, we introduce some new geometric operators, namely, Fermatean fuzzy Hamacher interactive weighted geometric (FFHIWG) operator, Fermatean fuzzy Hamacher interactive ordered weighted geometric (FFHIOWG) operator, and Fermatean fuzzy Hamacher interactive hybrid weighted geometric (FFHIHWG) operator. Some important results and properties of the proposed AOs are discussed, and to achieve the optimal alternative, the proposed MADM technique is carried out in a real-life application of the medical field. An algorithm of the proposed technique is also developed. The significance of the proposed method is that Fermatean fuzzy Hamacher interactive geometric (FFHIG) operators deal with the relationship among belongingness degree (BD) and nonbelongingness degree (NBD) of the objects, which perform a crucial role in decision-making (DM). At last, to show the exactness and validity of the proposed work, a comparative analysis of the proposed model and the existing models is presented.


Author(s):  
Jingyuan Chen ◽  
Lin Ma ◽  
Xinpeng Chen ◽  
Zequn Jie ◽  
Jiebo Luo

In this paper, we consider the task of natural language video localization (NLVL): given an untrimmed video and a natural language description, the goal is to localize a segment in the video which semantically corresponds to the given natural language description. We propose a localizing network (LNet), working in an end-to-end fashion, to tackle the NLVL task. We first match the natural sentence and video sequence by cross-gated attended recurrent networks to exploit their fine-grained interactions and generate a sentence-aware video representation. A self interactor is proposed to perform crossframe matching, which dynamically encodes and aggregates the matching evidences. Finally, a boundary model is proposed to locate the positions of video segments corresponding to the natural sentence description by predicting the starting and ending points of the segment. Extensive experiments conducted on the public TACoS and DiDeMo datasets demonstrate that our proposed model performs effectively and efficiently against the state-of-the-art approaches.


2020 ◽  
Author(s):  
Ahmed Abdelmoaty ◽  
Wessam Mesbah ◽  
Mohammad A. M. Abdel-Aal ◽  
Ali T. Alawami

In the recent electricity market framework, the profit of the generation companies depends on the decision of the operator on the schedule of its units, the energy price, and the optimal bidding strategies. Due to the expanded integration of uncertain renewable generators which is highly intermittent such as wind plants, the coordination with other facilities to mitigate the risks of imbalances is mandatory. Accordingly, coordination of wind generators with the evolutionary Electric Vehicles (EVs) is expected to boost the performance of the grid. In this paper, we propose a robust optimization approach for the coordination between the wind-thermal generators and the EVs in a virtual<br>power plant (VPP) environment. The objective of maximizing the profit of the VPP Operator (VPPO) is studied. The optimal bidding strategy of the VPPO in the day-ahead market under uncertainties of wind power, energy<br>prices, imbalance prices, and demand is obtained for the worst case scenario. A case study is conducted to assess the e?effectiveness of the proposed model in terms of the VPPO's profit. A comparison between the proposed model and the scenario-based optimization was introduced. Our results confirmed that, although the conservative behavior of the worst-case robust optimization model, it helps the decision maker from the fluctuations of the uncertain parameters involved in the production and bidding processes. In addition, robust optimization is a more tractable problem and does not suffer from<br>the high computation burden associated with scenario-based stochastic programming. This makes it more practical for real-life scenarios.<br>


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, the lipreading method has achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved, and existing methods tend to have high error rates on the wild data and have the defects of disappearing training gradient and slow convergence. To overcome these problems, we proposed an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, Temporal Convolutional Network (TCN), and a CTC objective function as the decoder. More importantly, the proposed architecture incorporates TCN as a feature learner to decode feature. It can partly eliminate the defects of RNN (LSTM, GRU) gradient disappearance and insufficient performance, and this yields notable performance improvement as well as faster convergence. Experiments show that the training and convergence speed are 50% faster than the state-of-the-art method, and improved accuracy by 2.4% on the GRID dataset.


Sign in / Sign up

Export Citation Format

Share Document