Social Saliency Prediction

Hyun Soo Park and Jianbo Shi

University of Pennsylvania

Figure 1. We present a method to estimate the likelihood of joint attention, called social saliency, from the spatial distribution of social members. The inset image shows the top view of the reconstructed scene. The blue points belong to humans. The heat map shows the predicted social saliency, which we overlay on the image by projecting it onto the ground plane.


This paper presents a method to predict social saliency, the likelihood of joint attention, given an input image or video by leveraging social interaction data captured by first-person cameras. Inspired by electric dipole moments, we introduce a social formation feature that encodes the geometric relationship between joint attention and its social formation. We learn this feature from first-person social interaction data, where we can precisely measure the locations of joint attention and its associated members in 3D. An ensemble classifier is trained to learn the geometric relationship. Using the trained classifier, we predict social saliency in real-world scenes with multiple social groups, including scenes from team sports captured in a third-person view. Our representation does not require directional measurements such as gaze directions. A geometric analysis of social interactions in terms of the F-formation theory is also presented.
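To make the geometric idea concrete, the following is a loose illustrative sketch and not the social formation feature defined in the paper. It assumes members are given as 2D ground-plane positions and treats each one as a unit "charge", so the dipole-like moment about a candidate location is simply the mean offset vector. The function name `formation_feature` and the three returned quantities are hypothetical choices made for this illustration; no classifier is trained here.

```python
import math


def formation_feature(members, candidate):
    """Hypothetical dipole-like descriptor of a formation around a candidate point.

    members:   list of (x, y) ground-plane positions (assumed input format)
    candidate: (x, y) location being scored as a possible joint-attention point
    Returns (dipole_magnitude, mean_distance, distance_std).
    """
    offsets = [(x - candidate[0], y - candidate[1]) for x, y in members]
    n = len(offsets)

    # Dipole-like moment with unit charges: the mean offset vector.
    # It vanishes when the members are balanced around the candidate.
    mean_dx = sum(dx for dx, _ in offsets) / n
    mean_dy = sum(dy for _, dy in offsets) / n
    dipole = math.hypot(mean_dx, mean_dy)

    # Distance statistics: a ring-like formation has a small spread.
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    mean_d = sum(dists) / n
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in dists) / n)
    return dipole, mean_d, std_d


# Four members standing on a unit circle (toy data, not from the paper).
members = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
center_feature = formation_feature(members, (0.0, 0.0))
offset_feature = formation_feature(members, (2.0, 0.0))
```

In this toy setup the candidate at the circle's center yields a zero dipole magnitude and zero distance spread, while the off-center candidate yields larger values for both. A descriptor of this kind uses only positions, which is in the spirit of a representation that does not need gaze directions; in a full pipeline, features like these would be passed to a trained classifier rather than compared by hand.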


Hyun Soo Park and Jianbo Shi, "Social Saliency Prediction," IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (oral), 2015. [paper, extended abstract, slides (pdf), bib]


video download (238 MB)

