EmotioNet: An Accurate, Real-Time Algorithm for the Automatic Annotation of a Million Facial Expressions in the Wild

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) the low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often not compatible with low-power embedded systems. Therefore, in this paper, we deeply investigate all these issues, showing how they are both addressable by adopting appropriate network design and training strategies. Moreover, we also outline how to map the resulting networks on handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature required to tackle the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.

Download Full-text

Self-Interfaces: Utilizing Real-Time Biofeedback in the Wild to Elicit Subconscious Behavior Change

Proceedings of the Fourteenth International Conference on Tangible, Embedded, and Embodied Interaction ◽

10.1145/3374920.3374979 ◽

2020 ◽

Author(s):

Nava Haghighi ◽

Arvind Satyanarayan

Keyword(s):

Behavior Change ◽

Real Time ◽

In The Wild

Download Full-text

XERIS/APEX

ACM SIGAda Ada Letters ◽

10.1145/3463478.3463484 ◽

2021 ◽

Vol 40 (2) ◽

pp. 65-69

Author(s):

Richard Wai

Keyword(s):

Distributed Systems ◽

Real Time ◽

Distributed System ◽

Dynamic Scaling ◽

Distributed Application ◽

Heavy Weight ◽

System Models ◽

In The Wild ◽

On Line ◽

Language Technologies

Modern day cloud native applications have become broadly representative of distributed systems in the wild. However, unlike traditional distributed system models with conceptually static designs, cloud-native systems emphasize dynamic scaling and on-line iteration (CI/CD). Cloud-native systems tend to be architected around a networked collection of distinct programs ("microservices") that can be added, removed, and updated in real-time. Typically, distinct containerized programs constitute individual microservices that then communicate among the larger distributed application through heavy-weight protocols. Common communication stacks exchange JSON or XML objects over HTTP, via TCP/TLS, and incur significant overhead, particularly when using small size message sizes. Additionally, interpreted/JIT/VM-based languages such as Javascript (NodeJS/Deno), Java, and Python are dominant in modern microservice programs. These language technologies, along with the high-overhead messaging, can impose superlinear cost increases (hardware demands) on scale-out, particularly towards hyperscale and/or with latency-sensitive workloads.

Download Full-text

Hybrid Attention Cascade Network for Facial Expression Recognition

Sensors ◽

10.3390/s21062003 ◽

2021 ◽

Vol 21 (6) ◽

pp. 2003 ◽

Cited By ~ 1

Author(s):

Xiaoliang Zhu ◽

Shihao Ye ◽

Liang Zhao ◽

Zhicheng Dai

Keyword(s):

Facial Expression ◽

Facial Expressions ◽

Facial Expression Recognition ◽

Expression Recognition ◽

Spatial Features ◽

Face Images ◽

Temporal Features ◽

The Face ◽

In The Wild ◽

Fusion Features

As a sub-challenge of EmotiW (the Emotion Recognition in the Wild challenge), how to improve performance on the AFEW (Acted Facial Expressions in the wild) dataset is a popular benchmark for emotion recognition tasks with various constraints, including uneven illumination, head deflection, and facial posture. In this paper, we propose a convenient facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, in a video sequence, faces in each frame are detected, and the corresponding face ROI (range of interest) is extracted to obtain the face images. Then, the face images in each frame are aligned based on the position information of the facial feature points in the images. Second, the aligned face images are input to the residual neural network to extract the spatial features of facial expressions corresponding to the face images. The spatial features are input to the hybrid attention module to obtain the fusion features of facial expressions. Finally, the fusion features are input in the gate control loop unit to extract the temporal features of facial expressions. The temporal features are input to the fully connected layer to classify and recognize facial expressions. Experiments using the CK+ (the extended Cohn Kanade), Oulu-CASIA (Institute of Automation, Chinese Academy of Sciences) and AFEW datasets obtained recognition accuracy rates of 98.46%, 87.31%, and 53.44%, respectively. This demonstrated that the proposed method achieves not only competitive performance comparable to state-of-the-art methods but also greater than 2% performance improvement on the AFEW dataset, proving the significant outperformance of facial expression recognition in the natural environment.

Download Full-text

Enabling Real-time Sign Language Translation on Mobile Platforms with On-board Depth Cameras

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies ◽

10.1145/3463498 ◽

2021 ◽

Vol 5 (2) ◽

pp. 1-30

Author(s):

HyeonJung Park ◽

Youngki Lee ◽

JeongGil Ko

Keyword(s):

Real Time ◽

Sign Language ◽

Data Augmentation ◽

Language Translation ◽

Mobile Platforms ◽

Depth Cameras ◽

Language Data ◽

In The Wild ◽

Environmental Robustness ◽

Cloud Servers

In this work we present SUGO, a depth video-based system for translating sign language to text using a smartphone's front camera. While exploiting depth-only videos offer benefits such as being less privacy-invasive compared to using RGB videos, it introduces new challenges which include dealing with low video resolutions and the sensors' sensitiveness towards user motion. We overcome these challenges by diversifying our sign language video dataset to be robust to various usage scenarios via data augmentation and design a set of schemes to emphasize human gestures from the input images for effective sign detection. The inference engine of SUGO is based on a 3-dimensional convolutional neural network (3DCNN) to classify a sequence of video frames as a pre-trained word. Furthermore, the overall operations are designed to be light-weight so that sign language translation takes place in real-time using only the resources available on a smartphone, with no help from cloud servers nor external sensing components. Specifically, to train and test SUGO, we collect sign language data from 20 individuals for 50 Korean Sign Language words, summing up to a dataset of ~5,000 sign gestures and collect additional in-the-wild data to evaluate the performance of SUGO in real-world usage scenarios with different lighting conditions and daily activities. Comprehensively, our extensive evaluations show that SUGO can properly classify sign words with an accuracy of up to 91% and also suggest that the system is suitable (in terms of resource usage, latency, and environmental robustness) to enable a fully mobile solution for sign language translation.

Download Full-text

A Distributed Real-Time Algorithm for Preference-Based Agreement

IFAC Proceedings Volumes ◽

10.3182/20110828-6-it-1002.03155 ◽

2011 ◽

Vol 44 (1) ◽

pp. 8933-8938

Author(s):

Daniel Zelazo ◽

Mathias Bürger ◽

Frank Allgöwer

Keyword(s):

Real Time ◽

Time Algorithm

Download Full-text

Real-Time Identification and Tracking of Infrared Markers Based on Kalman Filter

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.249-250.1147 ◽

2012 ◽

Vol 249-250 ◽

pp. 1147-1153

Author(s):

Qiao Na Xing ◽

Da Yuan Yan ◽

Xiao Ming Hu ◽

Jun Qin Lin ◽

Bo Yang

Keyword(s):

Kalman Filter ◽

Visible Light ◽

Real Time ◽

Complex Terrain ◽

Experimental Results ◽

Identification Method ◽

In The Wild ◽

Real Time Identification

Automatic equipmenttransportation in the wild complex terrain circumstances is very important in rescue or military. In this paper, an accompanying system based on the identification and tracking of infrared LEDmarkers is proposed. This system avoidsthe defect that visible-light identification method has. In addition, this paper presents a Kalman filter to predict where infraredmarkers may appear in the nextframe imageto reduce the searchingarea of infrared markers, which remarkablyimproves the identificationspeed of infrared markers. The experimental results show that the algorithm proposed in this paper is effective and feasible.

Download Full-text

Real-Time Algorithm for Versatile Displacement Sensors Based on Self-Mixing Interferometry

IEEE Sensors Journal ◽

10.1109/jsen.2015.2478755 ◽

2016 ◽

Vol 16 (1) ◽

pp. 195-202 ◽

Cited By ~ 23

Author(s):

Antonio Luna Arriaga ◽

Francis Bony ◽

Thierry Bosch

Keyword(s):

Real Time ◽

Time Algorithm ◽

Displacement Sensors

Download Full-text