iCatcher+: Robust and automatic gaze classification of infant webcam videos
Studies of human infants provide a window into the origins of the mind, yet collecting and annotating behavioral data from them remains slow and laborious. Although online platforms enable families to participate in studies via webcam, manually annotating gaze directions in the collected videos remains time- and energy-consuming. Existing automatic gaze-coding algorithms either perform poorly on low-quality webcam video or still require considerable manual effort. In this project, we built on a promising system for automatic gaze annotation in human infants, iCatcher (Erel et al., 2020), and added two key features: a more robust infant face detector, and a gaze estimator that takes into account not only the features of the selected face but also where it appears in the frame and how far it is from the screen. In our framework, iCatcher+, all candidate faces in a video frame are first extracted by a face extractor; the infant's face is then selected by an infant selector; finally, the gaze direction is classified by a gaze estimator based on both the selected face and the features of its bounding box. We evaluated iCatcher+ on a large-scale dataset of infant videos collected via webcam. The experimental results show improvements in gaze estimation accuracy over the baseline framework. We see iCatcher+ as a key tool for enabling rapid large-scale studies of infant behavior.
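The three-stage pipeline described above (face extraction, infant selection, gaze estimation from the face plus its bounding-box features) can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual implementation: all names are hypothetical, the infant selector is replaced here by a simple largest-face heuristic, and the gaze estimator is passed in as a callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, width, height) in frame pixels

@dataclass
class Face:
    crop: object  # cropped face image (placeholder type in this sketch)
    bbox: BBox

def bbox_features(bbox: BBox, frame_size: Tuple[int, int]) -> List[float]:
    """Normalized center position and size of the face box: a rough proxy
    for where the face appears in the frame and how close the infant is
    to the screen (a nearer face yields a larger box)."""
    x, y, w, h = bbox
    fw, fh = frame_size
    return [(x + w / 2) / fw, (y + h / 2) / fh, w / fw, h / fh]

def select_infant(faces: List[Face]) -> Optional[Face]:
    """Illustrative stand-in for the infant selector: pick the largest
    detected face. (iCatcher+ uses a trained selector instead.)"""
    if not faces:
        return None
    return max(faces, key=lambda f: f.bbox[2] * f.bbox[3])

def classify_gaze_frame(
    faces: List[Face],
    frame_size: Tuple[int, int],
    gaze_estimator: Callable[[object, List[float]], str],
) -> str:
    """Run selection, then feed the chosen face crop together with its
    bounding-box features to the gaze estimator."""
    infant = select_infant(faces)
    if infant is None:
        return "no-face"
    return gaze_estimator(infant.crop, bbox_features(infant.bbox, frame_size))
```

A caller would obtain `faces` per frame from a face detector and supply a trained model as `gaze_estimator`; the key point is that the estimator receives the bounding-box features alongside the face crop.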