Recognizing visual attention using dynamic head pose-gaze mapping and the robot conversational state
Samira Sheikhi, EPFL
The ability to recognize the Visual Focus of Attention (VFOA, i.e. what or whom a person is looking at) is important for robots or conversational agents interacting with multiple people, since it plays a key role in turn-taking, engagement and intention monitoring. As accurate eye gaze estimation is often infeasible, most systems currently rely on head pose as an approximation, which creates ambiguities since the same head pose can be used to look at different VFOA targets. To address this challenge, we propose a dynamic Bayesian model for VFOA recognition from head pose, where we make two main contributions. First, taking inspiration from behavioral models describing the relationships between the body, head and gaze orientations involved in gaze shifts, we propose novel gaze models that dynamically and more accurately predict the expected head orientation used for looking in a given gaze target direction. Second, we propose to exploit the robot conversational state as context to reduce recognition ambiguities. In this talk I will also give a quick overview of the head pose tracking, addressee estimation and nod recognition tasks studied as part of the Humavips European project.
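To give a flavor of the head-gaze relationship exploited here, the following is a minimal sketch, not the authors' actual model: behavioral studies report that the head typically covers only a fraction of a gaze shift, with the eyes covering the rest. A simple linear predictor of the expected head orientation for a given gaze target could then look as follows (the function name, the reference-orientation convention, and the contribution factor `kappa` are illustrative assumptions):

```python
import numpy as np

def expected_head_pose(target, reference, kappa=0.6):
    """Predict the head (pan, tilt) used to gaze at `target`, in degrees.

    Hypothetical linear model: the head rotates only a fraction `kappa`
    of the way from a resting `reference` orientation (e.g. toward the
    body or the main interaction partner) to the gaze target direction;
    the eyes are assumed to cover the remaining angle.
    """
    target = np.asarray(target, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return reference + kappa * (target - reference)

# A target 50 degrees to the left of a frontal reference orientation:
print(expected_head_pose([-50.0, 0.0], [0.0, 0.0]))  # head pans only -30 deg
```

Such a mapping makes explicit why raw head pose is ambiguous: several (target, reference) pairs yield the same head orientation, which is what the dynamic Bayesian model and the conversational-state context help disambiguate.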