Attention Mechanism based Real Time Gaze Tracking in Natural Scenes with Residual Blocks

With the rapid growth of e-commerce in recent years, e-commerce platforms are becoming a primary place for people to find, compare, and ultimately purchase products. To improve the online shopping experience for consumers and increase sales for sellers, it is important to understand user intent accurately and to be notified promptly when it changes. In this way, the right information can be offered to the right person at the right time. To achieve this goal, we propose a unified deep intent prediction network, named EdgeDIPN, which is deployed at the edge, i.e., on the mobile device, and is able to monitor multiple user intents at different granularities simultaneously in real time. We train EdgeDIPN with multi-task learning, so that it can share representations between tasks for better performance while saving edge resources. In particular, we propose a novel task-specific attention mechanism that enables different tasks to pick out the most relevant features from different data sources. To extract the shared representations more effectively, we use two kinds of attention mechanisms: the multi-level attention mechanism identifies the important actions within each data source, and the inter-view attention mechanism learns the interactions between different data sources. In experiments conducted on a large-scale industrial dataset, EdgeDIPN significantly outperforms the baseline solutions. Moreover, EdgeDIPN has been deployed in the operational system of Alibaba. Online A/B testing results in several business scenarios reveal the potential of monitoring user intent in real time. To the best of our knowledge, EdgeDIPN is the first full-fledged real-time user intent understanding center deployed at the edge and serving hundreds of millions of users on a large-scale e-commerce platform.

Real-time gaze estimation via pupil center tracking

Paladyn Journal of Behavioral Robotics, 2018, Vol 9 (1), pp. 6-18

DOI: 10.1515/pjbr-2018-0002

Cited By ~ 2

Author(s): Dario Cazzato, Fabio Dominio, Roberto Manduchi, Silvia M. Castro

Keyword(s): Real Time, Learning Algorithm, Natural Environments, Gaze Estimation, Head Pose, Data Set, Gaze Tracking, Illumination Changes, Wide Range, Estimation System

Abstract: Automatic gaze estimation that does not rely on expensive commercial eye-tracking hardware can enable several applications in the fields of human-computer interaction (HCI) and human behavior analysis. It is therefore not surprising that several related techniques and methods have been investigated in recent years. However, very few camera-based systems proposed in the literature are both real-time and robust. In this work, we propose a real-time gaze estimation system that does not need person-dependent calibration, can deal with illumination changes and head pose variations, and can work over a wide range of distances from the camera. Our solution is based on a 3-D appearance-based method that processes the images from a built-in laptop camera. Real-time performance is obtained by combining head pose information with geometrical eye features to train a machine learning algorithm. Our method has been validated on a data set of images of users in natural environments, and shows promising results. The possibility of a real-time implementation, combined with the good quality of gaze tracking, makes this system suitable for various HCI applications.

Deterministic and Stochastic Methods for Gaze Tracking in Real-Time

Computer Analysis of Images and Patterns - Lecture Notes in Computer Science, 2007, pp. 45-52

DOI: 10.1007/978-3-540-74272-2_6

Cited By ~ 1

Author(s): Javier Orozco, F. Xavier Roca, Jordi Gonzàlez

Keyword(s): Real Time, Stochastic Methods, Gaze Tracking

Real-time pedestrian tracking in natural scenes

Computer Analysis of Images and Patterns - Lecture Notes in Computer Science, 1997, pp. 42-49

DOI: 10.1007/3-540-63460-6_98

Cited By ~ 8

Author(s): J. Denzler, H. Niemann

Keyword(s): Real Time, Natural Scenes, Pedestrian Tracking

Real-Time Ultrasound Image Despeckling Using Mixed-Attention Mechanism Based Residual UNet

IEEE Access, 2020, Vol 8, pp. 195327-195340

DOI: 10.1109/access.2020.3034230

Author(s): Yancheng Lan, Xuming Zhang

Keyword(s): Real Time, Ultrasound Image, Attention Mechanism, Image Despeckling

Real Time Learning Evaluation Based on Gaze Tracking

2015 14th International Conference on Computer-Aided Design and Computer Graphics (CAD/Graphics), 2015

DOI: 10.1109/cadgraphics.2015.13

Cited By ~ 3

Author(s): Jiayue Yi, Bin Sheng, Ruimin Shen, Weiyao Lin, Enhua Wu

Keyword(s): Real Time, Gaze Tracking, Learning Evaluation

A New Multi-Scale Convolutional Model Based on Multiple Attention for Image Classification

Applied Sciences, 2019, Vol 10 (1), pp. 101

DOI: 10.3390/app10010101

Cited By ~ 3

Author(s): Yadong Yang, Chengji Xu, Feng Dong, Xiaofeng Wang

Keyword(s): Image Classification, Attention Mechanism, Natural Scenes, Significant Information, Convolutional Networks, Multi Scale, Model Based, Convolution Model, Grouping Method, Feature Expression

Computer vision systems are insensitive to the scale of objects in natural scenes, so it is important to study the multi-scale representation of features. Res2Net implements hierarchical multi-scale convolution in residual blocks, but its random grouping method affects the robustness and intuitive interpretability of the network. We propose a new multi-scale convolution model based on multiple attention. It introduces the attention mechanism into the structure of a Res2-block to better guide feature expression. First, we adopt channel attention to score channels and sort them in descending order of feature importance (Channels-Sort). The sorted residual blocks are grouped and hierarchically convolved within each block to form a single attention and multi-scale block (AMS-block). Then, we apply channel attention to the small residual blocks to form a dual attention and multi-scale block (DAMS-block). Finally, we introduce spatial attention before sorting the channels to form a multi-attention multi-scale block (MAMS-block). A MAMS-convolutional neural network (CNN) is a series of multiple MAMS-blocks. It enables significant information to be expressed at more levels, and can also be easily grafted into different convolutional structures. Limited by hardware conditions, we only demonstrate the validity of the proposed ideas with convolutional networks of the same magnitude. The experimental results show that the convolution model with an attention mechanism and multi-scale features is superior in image classification.
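
The Channels-Sort step described in this abstract (score the channels, sort them by importance, then group them) can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring function here (sigmoid of the global average of each channel) is an assumed stand-in for a learned channel-attention module, the function names are hypothetical, and the hierarchical convolution inside each group is omitted.

```python
import math


def channel_scores(feature_map):
    """Score each channel: global average pooling followed by a sigmoid.

    Assumption: a real channel-attention module would learn its weights;
    this fixed scoring is only for illustration.
    """
    scores = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        scores.append(1.0 / (1.0 + math.exp(-mean)))
    return scores


def channels_sort(feature_map):
    """Reorder channels in descending order of their attention score."""
    scores = channel_scores(feature_map)
    order = sorted(range(len(feature_map)), key=lambda c: scores[c], reverse=True)
    return [feature_map[c] for c in order], order


def group_channels(sorted_map, num_groups):
    """Split the sorted channels into equally sized consecutive groups."""
    if len(sorted_map) % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = len(sorted_map) // num_groups
    return [sorted_map[i * size:(i + 1) * size] for i in range(num_groups)]


# Four 2x2 channels with means 0.0, 2.0, -1.0 and 1.0.
fmap = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
sorted_map, order = channels_sort(fmap)
groups = group_channels(sorted_map, num_groups=2)
```

Because the grouping follows the sorted order rather than the original channel order, the highest-scoring channels end up together in the first group, which is the contrast with random grouping that the abstract draws.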

RITnet: Real-time Semantic Segmentation of the Eye for Gaze Tracking

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019

DOI: 10.1109/iccvw.2019.00568

Cited By ~ 6

Author(s): Aayush K. Chaudhary, Rakshit Kothari, Manoj Acharya, Shusil Dangi, Nitinraj Nair, ...

Keyword(s): Real Time, Semantic Segmentation, Gaze Tracking

A Study on the Characteristics of Consumer Visual-Perceptional Information Acquisition in Commercial Facilities in Regard to its Construction of Space from Real-Time Eye Gaze Tracking

Korean Society for Emotion and Sensibility, 2018, Vol 21 (2), pp. 3-14

DOI: 10.14695/kjsos.2018.21.2.3

Author(s): Sunmyung Park

Keyword(s): Real Time, Information Acquisition, Eye Gaze, Gaze Tracking

Real-Time Semantic Segmentation with Dual Encoder and Self-Attention Mechanism for Autonomous Driving

Sensors, 2021, Vol 21 (23), pp. 8072

DOI: 10.3390/s21238072

Author(s): Yu-Bang Chang, Chieh Tsai, Chang-Hong Lin, Poki Chen

Keyword(s): Deep Learning, Real Time, Network Architecture, Semantic Segmentation, Autonomous Driving, Attention Mechanism, Trade Off, Segmentation Methods, General Semantic, Deep Learning Model

As the techniques of autonomous driving become increasingly valued and universal, real-time semantic segmentation has become very popular and challenging in the field of deep learning and computer vision in recent years. However, in order to apply the deep learning model to edge devices accompanying sensors on vehicles, we need to design a structure that has the best trade-off between accuracy and inference time. In previous works, several methods sacrificed accuracy to obtain a faster inference time, while others aimed to find the best accuracy under the condition of real time. Nevertheless, the accuracies of previous real-time semantic segmentation methods still have a large gap compared to general semantic segmentation methods. As a result, we propose a network architecture based on a dual encoder and a self-attention mechanism. Compared with preceding works, we achieved a 78.6% mIoU with a speed of 39.4 FPS with a 1024 × 2048 resolution on a Cityscapes test submission.
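
The accuracy figure quoted in this abstract is a mean intersection-over-union (mIoU). As a rough illustration of the metric, the sketch below computes it from flat lists of predicted and ground-truth class labels. It assumes the common definition (per-class IoU averaged over the classes that occur) and uses a hypothetical function name; it does not reproduce the Cityscapes evaluation protocol, which has its own class list and ignored labels.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are equal-length sequences of integer class labels,
    one per pixel. Classes absent from both are skipped (an assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correctly labelled pixel is in both the intersection and the union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mislabelled pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [intersection[c] / union[c] for c in range(num_classes) if union[c] > 0]
    if not ious:
        raise ValueError("no labelled pixels")
    return sum(ious) / len(ious)


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

For the speed figure, a throughput of 39.4 FPS corresponds to roughly 1000 / 39.4 ≈ 25.4 ms per frame.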
