LARNet: Real-Time Detection of Facial Micro Expression Using Lossless Attention Residual Network

Facial micro expressions are brief, spontaneous, and crucial emotions deep inside the mind, reflecting the actual thoughts for that moment. Humans can cover their emotions on a large scale, but their actual intentions and emotions can be extracted at a micro-level. Micro expressions are organic when compared with macro expressions, posing a challenge to both humans, as well as machines, to identify. In recent years, detection of facial expressions are widely used in commercial complexes, hotels, restaurants, psychology, security, offices, and education institutes. The aim and motivation of this paper are to provide an end-to-end architecture that accurately detects the actual expressions at the micro-scale features. However, the main research is to provide an analysis of the specific parts that are crucial for detecting the micro expressions from a face. Many states of the art approaches have been trained on the micro facial expressions and compared with our proposed Lossless Attention Residual Network (LARNet) approach. However, the main research on this is to provide analysis on the specific parts that are crucial for detecting the micro expressions from a face. Many CNN-based approaches extracts the features at local level which digs much deeper into the face pixels. However, the spatial and temporal information extracted from the face is encoded in LARNet for a feature fusion extraction on specific crucial locations, such as nose, cheeks, mouth, and eyes regions. LARNet outperforms the state-of-the-art methods with a slight margin by accurately detecting facial micro expressions in real-time. Lastly, the proposed LARNet becomes accurate and better by training with more annotated data.

Download Full-text

Erratum to: Real-time, large-scale duplicate image detection method based on multi-feature fusion

Journal of Real-Time Image Processing ◽

10.1007/s11554-017-0673-8 ◽

2017 ◽

Vol 16 (5) ◽

pp. 1881-1881

Author(s):

Ming Chen ◽

Yuhua Li ◽

Zhifeng Zhang ◽

Ching-Hsien Hsu ◽

Shangguang Wang

Keyword(s):

Real Time ◽

Large Scale ◽

Detection Method ◽

Feature Fusion ◽

Image Detection ◽

Duplicate Image Detection

Download Full-text

Real-time, large-scale duplicate image detection method based on multi-feature fusion

Journal of Real-Time Image Processing ◽

10.1007/s11554-016-0632-9 ◽

2016 ◽

Vol 13 (3) ◽

pp. 557-570 ◽

Cited By ~ 6

Author(s):

Ming Chen ◽

Yuhua Li ◽

Zhifeng Zhang ◽

Ching-Hsien Hsu ◽

Shangguang Wang

Keyword(s):

Real Time ◽

Large Scale ◽

Detection Method ◽

Feature Fusion ◽

Image Detection ◽

Duplicate Image Detection

Download Full-text

Does the face show what the mind tells? A comparison between dynamic emotions obtained from facial expressions and Temporal Dominance of Emotions (TDE)

Food Quality and Preference ◽

10.1016/j.foodqual.2020.103976 ◽

2020 ◽

Vol 85 ◽

pp. 103976

Author(s):

Roelien van Bommel ◽

Markus Stieger ◽

Michel Visalli ◽

Rene de Wijk ◽

Gerry Jager

Keyword(s):

Facial Expressions ◽

The Face ◽

The Mind ◽

Temporal Dominance

Download Full-text

SSD7-FFAM: A Real-Time Object Detection Network Friendly to Embedded Devices from Scratch

Applied Sciences ◽

10.3390/app11031096 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1096

Author(s):

Qing Li ◽

Yingcheng Lin ◽

Wei He

Keyword(s):

Object Detection ◽

Real Time ◽

Large Scale ◽

Feature Fusion ◽

Contextual Information ◽

Attention Mechanism ◽

Detection Accuracy ◽

Single Shot ◽

Feature Maps ◽

Embedded Devices

The high requirements for computing and memory are the biggest challenges in deploying existing object detection networks to embedded devices. Living lightweight object detectors directly use lightweight neural network architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which results in poor network structure flexibility and is not suitable for some specific scenarios. In this paper, we propose a lightweight object detection network Single-Shot MultiBox Detector (SSD)7-Feature Fusion and Attention Mechanism (FFAM), which saves storage space and reduces the amount of calculation by reducing the number of convolutional layers. We offer a novel Feature Fusion and Attention Mechanism (FFAM) method to improve detection accuracy. Firstly, the FFAM method fuses high-level semantic information-rich feature maps with low-level feature maps to improve small objects’ detection accuracy. The lightweight attention mechanism cascaded by channels and spatial attention modules is employed to enhance the target’s contextual information and guide the network to focus on its easy-to-recognize features. The SSD7-FFAM achieves 83.7% mean Average Precision (mAP), 1.66 MB parameters, and 0.033 s average running time on the NWPU VHR-10 dataset. The results indicate that the proposed SSD7-FFAM is more suitable for deployment to embedded devices for real-time object detection.

Download Full-text

The Face of Time: Temporal Cues in Facial Expressions of Emotion

Psychological Science ◽

10.1111/1467-9280.00054 ◽

1998 ◽

Vol 9 (4) ◽

pp. 270-276 ◽

Cited By ~ 51

Author(s):

Kari Edwards

Keyword(s):

Facial Expression ◽

Real Time ◽

Facial Expressions ◽

Experimental Task ◽

Temporal Cues ◽

Strategic Processes ◽

The Face ◽

Facial Expressions Of Emotion ◽

Spontaneous Expression ◽

Better Than

Results of studies reported here indicate that humans are attuned to temporal cues in facial expressions of emotion. The experimental task required subjects to reproduce the actual progression of a target person's spontaneous expression (i.e., onset to offset) from a scrambled set of photographs. Each photograph depicted a segment of the expression that corresponded to approximately 67 ms in real time. Results of two experiments indicated that (a) individuals could detect extremely subtle dynamic cues in a facial expression and could utilize these cues to reproduce the proper temporal progression of the display at above-chance levels of accuracy; (b) women performed significantly better than men on the task designed to assess this ability; (c) individuals were most sensitive to the temporal characteristics of the early stages of an expression; and (d) accuracy was inversely related to the amount of time allotted for the task. The latter finding may reflect the relative involvement of (error-prone) cognitively mediated or strategic processes in what is normally a relatively automatic, nonconscious process.

Download Full-text

Stereo Vision Based Sensory Substitution for the Visually Impaired

Sensors ◽

10.3390/s19122771 ◽

2019 ◽

Vol 19 (12) ◽

pp. 2771 ◽

Cited By ~ 5

Author(s):

Simona Caraiman ◽

Otilia Zvoristeanu ◽

Adrian Burlacu ◽

Paul Herghelegiu

Keyword(s):

Real Time ◽

Stereo Vision ◽

Visually Impaired ◽

Large Scale ◽

Sensory Substitution ◽

Main Research ◽

Visually Impaired People ◽

Time Operation ◽

Outdoor Environments ◽

Real Time Operation

The development of computer vision based systems dedicated to help visually impaired people to perceive the environment, to orientate and navigate has been the main research subject of many works in the recent years. A significant ensemble of resources has been employed to support the development of sensory substitution devices (SSDs) and electronic travel aids for the rehabilitation of the visually impaired. The Sound of Vision (SoV) project used a comprehensive approach to develop such an SSD, tackling all the challenging aspects that so far restrained the large scale adoption of such systems by the intended audience: Wearability, real-time operation, pervasiveness, usability, cost. This article is set to present the artificial vision based component of the SoV SSD that performs the scene reconstruction and segmentation in outdoor environments. In contrast with the indoor use case, where the system acquires depth input from a structured light camera, in outdoors SoV relies on stereo vision to detect the elements of interest and provide an audio and/or haptic representation of the environment to the user. Our stereo-based method is designed to work with wearable acquisition devices and still provide a real-time, reliable description of the scene in the context of unreliable depth input from the stereo correspondence and of the complex 6 DOF motion of the head-worn camera. We quantitatively evaluate our approach on a custom benchmarking dataset acquired with SoV cameras and provide the highlights of the usability evaluation with visually impaired users.

Download Full-text

Reading Food Experiences from the Face: Effects of Familiarity and Branding of Soy Sauce on Facial Expressions and Video-Based RPPG Heart Rate

Foods ◽

10.3390/foods10061345 ◽

2021 ◽

Vol 10 (6) ◽

pp. 1345

Author(s):

Rene A. de Wijk ◽

Shota Ushiama ◽

Meeke Ummels ◽

Patrick Zimmerman ◽

Daisuke Kaneko ◽

...

Keyword(s):

Heart Rate ◽

Facial Expressions ◽

Skin Color ◽

Large Scale ◽

Soy Sauce ◽

Color Changes ◽

Video Images ◽

Study Results ◽

The Face ◽

Brand Information

Food experiences are not only driven by the food’s intrinsic properties, such as its taste, texture, and aroma, but also by extrinsic properties such as visual brand information and the consumers’ previous experiences with the foods. Recent developments in automated facial expression analysis and heart rate detection based on skin color changes (remote photoplethysmography or RPPG) allow for the monitoring of food experiences based on video images of the face. RPPG offers the possibility of large-scale non-laboratory and web-based testing of food products. In this study, results from the video-based analysis were compared to the more conventional tests (scores of valence and arousal using Emojis and photoplethysmography heart rate (PPG)). Forty participants with varying degrees of familiarity with soy sauce were presented with samples of rice and three commercial soy sauces with and without brand information. The results showed that (1) liking and arousal were affected primarily by the specific tastes, but not by branding and familiarity. In contrast, facial expressions were affected by branding and familiarity, and to a lesser degree by specific tastes. (2) RPPG heart rate and PPG both showed effects of branding and familiarity. However, RPPG heart rate needs further development because it underestimated the heart rate compared to PPG and was less sensitive to changes over time and with activity (viewing of brand information and tasting). In conclusion, this study suggests that recording of facial expressions and heart rates may no longer be limited to laboratories but can be done remotely using video images, which offers opportunities for large-scale testing in consumer science.

Download Full-text

A Lightweight Object Detection Framework for Remote Sensing Images

Remote Sensing ◽

10.3390/rs13040683 ◽

2021 ◽

Vol 13 (4) ◽

pp. 683

Author(s):

Lang Huyan ◽

Yunpeng Bai ◽

Ying Li ◽

Dongmei Jiang ◽

Yanning Zhang ◽

...

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Real Time ◽

Large Scale ◽

Feature Fusion ◽

Computational Cost ◽

Feature Representation ◽

Detection Accuracy ◽

Remote Sensing Images ◽

Low Level

Onboard real-time object detection in remote sensing images is a crucial but challenging task in this computation-constrained scenario. This task not only requires the algorithm to yield excellent performance but also requests limited time and space complexity of the algorithm. However, previous convolutional neural networks (CNN) based object detectors for remote sensing images suffer from heavy computational cost, which hinders them from being deployed on satellites. Moreover, an onboard detector is desired to detect objects at vastly different scales. To address these issues, we proposed a lightweight one-stage multi-scale feature fusion detector called MSF-SNET for onboard real-time object detection of remote sensing images. Using lightweight SNET as the backbone network reduces the number of parameters and computational complexity. To strengthen the detection performance of small objects, three low-level features are extracted from the three stages of SNET respectively. In the detection part, another three convolutional layers are designed to further extract deep features with rich semantic information for large-scale object detection. To improve detection accuracy, the deep features and low-level features are fused to enhance the feature representation. Extensive experiments and comprehensive evaluations on the openly available NWPU VHR-10 dataset and DIOR dataset are conducted to evaluate the proposed method. Compared with other state-of-art detectors, the proposed detection framework has fewer parameters and calculations, while maintaining consistent accuracy.

Download Full-text

The face is the index of the mind: understanding the association between self-construal and facial expressions

European Journal of Marketing ◽

10.1108/ejm-03-2019-0295 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Defeng Yang ◽

Hao Shen ◽

Robert S. Wyer

Keyword(s):

Causal Relationship ◽

Facial Expressions ◽

Social Orientation ◽

Emotional Expressions ◽

Daily Lives ◽

Content Type ◽

Self Construal ◽

Different Types ◽

The Face ◽

The Mind

Purpose This study aims to examine the relationship between consumers’ emotional expressions and their self-construals. The authors suggest that because an independent self-construal can reinforce the free expression of emotion, the expression of extreme emotions is likely to become associated with feelings of independence through social learning. Design/methodology/approach The paper includes five studies. Study 1A provided evidence that priming participants with different types of self-construal can influence the extremity of their emotional expressions. Study 1B showed that chronic self-construal could predict facial expressions of students who were told to smile for a group photograph. Studies 2–4 found that inducing people to either manifest or to simply view an extreme facial expression activated an independent social orientation and influenced their performance on tasks that reflect this orientation. Findings The studies provide support for a bidirectional causal relationship between individuals’ self-construals and the extremity of their emotional expressions. They show that people’s general social orientation could predict the spontaneous facial expressions that they manifest in their daily lives. Research limitations/implications Although this research was generally restricted to the effects of smiling, similar considerations influence the expression of other emotions. That is, dispositions to exhibit extreme expressions can generalize over different types of emotions. To this extent, expressions of sadness, anger or fear might be similarly associated with people’s social orientation and the behavior that is influenced by it. Practical implications The paper provides marketing implications into how marketers can influence consumers’ choices of unique options and how marketers can assess consumers’ social orientation based on their observation of consumers’ emotional expressions. Originality/value To the best of the authors’ knowledge, this research is the first to demonstrate a bidirectional causal relationship between individuals’ self-construals and the extremity of their emotional expressions, and to demonstrate the association between chronic social orientation and emotional expression people spontaneously make in their daily lives.

Download Full-text

Extracting Dynamic Facial Expressions from Naturalistic Videos

10.31234/osf.io/wsbdq ◽

2021 ◽

Author(s):

Jianxin Wang ◽

Craig Poskanzer ◽

Stefano Anzellotti

Keyword(s):

Facial Expression ◽

Facial Expressions ◽

Large Scale ◽

Social Cognitive ◽

Future Research ◽

Large Numbers ◽

Dynamic Facial Expressions ◽

The Face ◽

Face Stimuli ◽

Dynamics Of Action

Facial expressions are critical in our daily interactions. Studying how humans recognize dynamic facial expressions is an important area of research in social perception, but advancements are hampered by the difficulty of creating well-controlled stimuli. Research on the perception of static faces has made significant progress thanks to techniques that make it possible to generate synthetic face stimuli. However, synthetic dynamic expressions are more difficult to generate; methods that yield realistic dynamics typically rely on the use of infrared markers applied on the face, making it expensive to create datasets that include large numbers of different expressions. In addition, the use of markers might interfere with facial dynamics. In this paper, we contribute a new method to generate large amounts of realistic and well-controlled facial expression videos. We use a deep convolutional neural network with attention and asymmetric loss to extract the dynamics of action units from videos, and demonstrate that this approach outperforms a baseline model based on convolutional neural networks without attention on the same stimuli. Next, we develop a pipeline to use the action unit dynamics to render realistic synthetic videos. This pipeline makes it possible to generate large scale naturalistic and controllable facial expression datasets to facilitate future research in social cognitive science.

Download Full-text