Deep Spatial-Temporal Field for Human Head Orientation Estimation

Author(s):  
Zhansheng Xiong ◽  
Zhenhua Wang ◽  
Zheng Wang ◽  
Jianhua Zhang


Author(s):  
Stephanie Tan ◽  
David M. J. Tax ◽  
Hayley Hung

Human head orientation estimation has been of interest because head orientation serves as a cue to directed social attention. Most existing approaches rely on visual and high-fidelity sensor inputs and deep learning strategies that do not consider the social context of unstructured and crowded mingling scenarios. We show that alternative inputs, such as speaking status, body location, orientation, and acceleration, contribute to head orientation estimation. These are especially useful in crowded and in-the-wild settings where visual features are either uninformative due to occlusions or prohibitive to acquire due to physical space limitations and concerns about ecological validity. We argue that head orientation estimation in such social settings needs to account for the physically evolving interaction space formed by all the individuals in the group. To this end, we propose an LSTM-based head orientation estimation method that combines the hidden representations of the group members. Our framework jointly predicts the head orientations of all group members and is applicable to groups of different sizes. We analyze the contribution of different modalities to model performance in head orientation estimation. The proposed model outperforms baseline methods that do not explicitly consider the group context, and generalizes to an unseen dataset from a different social event.
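The group-aware, size-agnostic prediction described in this abstract can be pictured with a minimal sketch. The PyTorch code below is an illustrative assumption of one way to combine per-member LSTM hidden states into a shared group context (here via mean pooling); the feature layout, dimensions, class names, and pooling scheme are hypothetical, not the authors' released implementation.

```python
# A minimal sketch, assuming per-frame features (speaking status, body
# location, orientation, acceleration) are concatenated per group member.
import torch
import torch.nn as nn

class GroupHeadOrientationLSTM(nn.Module):
    def __init__(self, in_dim=8, hidden_dim=64):
        super().__init__()
        # Shared per-member temporal encoder.
        self.encoder = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        # Head orientation predicted as (cos, sin) to avoid angle wrap-around.
        self.head = nn.Linear(2 * hidden_dim, 2)

    def forward(self, x):
        # x: (batch, members, time, in_dim); weights are shared across
        # members, so groups of any size can be processed.
        b, m, t, d = x.shape
        h, _ = self.encoder(x.reshape(b * m, t, d))
        h = h[:, -1].reshape(b, m, -1)            # last hidden state per member
        group = h.mean(dim=1, keepdim=True)       # pooled group context
        joint = torch.cat([h, group.expand(-1, m, -1)], dim=-1)
        return self.head(joint)                   # (batch, members, 2)

# Usage: two groups of 4 people observed for 50 frames each.
model = GroupHeadOrientationLSTM()
feats = torch.randn(2, 4, 50, 8)
pred = model(feats)
angles = torch.atan2(pred[..., 1], pred[..., 0])  # radians per member
```

Because the group context is a mean over member representations, the same weights apply to groups of different sizes, matching the joint-prediction claim in the abstract.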


Author(s):  
Qiang Yang ◽  
Yuanqing Zheng

Voice interaction is friendly and convenient for users. Smart devices such as Amazon Echo allow users to interact with them through voice commands and have become increasingly popular in our daily life. In recent years, research has focused on using the microphone array built into smart devices to localize the user's position, which adds context information to voice commands. In contrast, few works explore the user's head orientation, which also contains useful context information. For example, when a user says, "turn on the light", the head orientation can indicate which light the user is referring to. Existing model-based works require a large number of microphone arrays to form an array network, while machine learning-based approaches need laborious data collection and training. The high deployment and usage cost of these methods is unfriendly to users. In this paper, we propose HOE, a model-based system that enables Head Orientation Estimation for smart devices with only two microphone arrays and a lower training overhead than previous approaches. HOE first estimates the user's head orientation candidates by measuring the voice energy radiation pattern. Then, the voice frequency radiation pattern is leveraged to obtain the final result. Real-world experiments show that HOE achieves a median estimation error of 23 degrees. To the best of our knowledge, HOE is the first model-based attempt to estimate head orientation with only two microphone arrays and without an arduous data training overhead.
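As a rough illustration of the two-stage, model-based idea (energy radiation pattern first, then frequency radiation pattern), here is a toy NumPy sketch. The cardioid mouth-radiation model, the inverse-square attenuation, the candidate count, and every constant below are assumptions for illustration only, not HOE's actual formulation.

```python
# A toy sketch: score candidate head orientations against an assumed speech
# radiation model at two known microphone arrays, then disambiguate with the
# high/low-frequency energy ratio (speech is more directional at high
# frequencies). All models and constants here are illustrative assumptions.
import numpy as np

def radiation_gain(angle_off_axis, directivity=0.5):
    # Assumed cardioid-like speech radiation: loudest on-axis, quieter behind.
    return (1 - directivity) + directivity * np.cos(angle_off_axis)

def estimate_orientation(user_pos, array_pos, observed_energy, hf_lf_ratio):
    """Return a head-orientation estimate (radians) from two arrays."""
    # Stage 1 (energy pattern): for each candidate heading, predict the
    # energy ratio between the two arrays (radiation gain x inverse-square
    # distance attenuation) and keep the best-matching candidates.
    cands = np.deg2rad(np.arange(0.0, 360.0, 1.0))
    to_array = [np.arctan2(p[1] - user_pos[1], p[0] - user_pos[0])
                for p in array_pos]
    dists = [np.linalg.norm(p - user_pos) for p in array_pos]
    pred = np.array([(radiation_gain(th - to_array[0]) / dists[0] ** 2) /
                     (radiation_gain(th - to_array[1]) / dists[1] ** 2 + 1e-9)
                     for th in cands])
    err = np.abs(pred - observed_energy[0] / observed_energy[1])
    top = cands[np.argsort(err)[:10]]  # ambiguous candidate set
    # Stage 2 (frequency pattern): the array receiving a larger HF/LF energy
    # ratio lies closer to the facing direction; pick the nearest candidate.
    prefer = to_array[0] if hf_lf_ratio[0] > hf_lf_ratio[1] else to_array[1]
    wrapped = np.angle(np.exp(1j * (top - prefer)))  # circular distance
    return top[np.argmin(np.abs(wrapped))]

# Usage with made-up measurements from two arrays at known positions.
arrays = [np.array([0.0, 2.0]), np.array([2.0, 0.0])]
theta = estimate_orientation(np.array([1.0, 1.0]), arrays,
                             observed_energy=(1.4, 0.8),
                             hf_lf_ratio=(2.1, 1.3))
print(f"estimated heading: {np.rad2deg(theta):.0f} degrees")
```

The two stages mirror the abstract's pipeline: the energy pattern narrows the heading to a candidate set, and the frequency pattern resolves the remaining ambiguity.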


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3073 ◽  
Author(s):  
Christian Nawroth ◽  
Alan G. McElligott

Animals domesticated for working closely with humans (e.g. dogs) have been shown to be remarkable in adjusting their behaviour to human attentional stance. However, there is little evidence for this form of information perception in species domesticated for production rather than companionship. We tested domestic ungulates (goats) for their ability to differentiate attentional states of humans. In the first experiment, we investigated the effect of the body and head orientation of one human experimenter on approach behaviour by goats. Test subjects (N = 24) significantly changed their behaviour when the experimenter turned her back to the subjects, but did not take into account head orientation alone. In the second experiment, goats (N = 24) could choose to approach one of two experimenters, while only one was paying attention to them. Goats preferred to approach humans that oriented their body and head towards the subject, whereas head orientation alone had no effect on choice behaviour. In the third experiment, goats (N = 32) were transferred to a separate test arena and were rewarded for approaching two experimenters who provided a food reward during training trials. In subsequent probe test trials, goats had to choose between the two experimenters, who differed in their attentional states. As in Experiments 1 and 2, goats did not show a preference for the attentive person when the inattentive person turned her head away from the subject. In this last experiment, goats preferred the attentive person over a person who closed their eyes or covered the whole face with a blind. However, goats showed no preference when one person covered only the eyes. Our results show that animals bred for production rather than companionship show differences in their approach and choice behaviour depending on the attentional state of humans. However, our results contrast with previous findings regarding the use of head orientation to attribute attention and show the importance of cross-validating results.


2014 ◽  
Vol 6 (0) ◽  
pp. 63-67 ◽  
Author(s):  
Mitsuru Nakazawa ◽  
Ikuhisa Mitsugami ◽  
Hirotake Yamazoe ◽  
Yasushi Yagi
