Predicting, detecting, and explaining vocal interaction in multi-party conversation
Human Communication seminar series
The automatic understanding of conversation is a challenging problem. It calls not only for understanding each participant's contributions, but also for inferring the dependencies among consecutive or simultaneous contributions across participants. Legacy approaches tend to treat participant streams independently, processing each for detection, segmentation, word recognition, phrasing, and parsing, and only then merging information across streams. This is patently suboptimal: it precludes the use of overt inter-participant dependencies in early processing. This talk presents several generative density models of joint multi-participant behavior, suitable for vocal activity recognition, assignment of illocutionary intent and emotional state, and inference of social status.
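One way to picture the joint-behavior idea is a generative model whose state is the vocal-activity configuration of all participants at once, rather than one model per participant. The sketch below, with invented transition probabilities (not the speaker's actual models), samples from a first-order Markov chain over the 2^K joint speech/silence patterns of K participants:

```python
import itertools
import numpy as np

# K participants; the joint state space is every speech/silence
# pattern across all of them: (0,0), (0,1), (1,0), (1,1) for K = 2.
K = 2
states = list(itertools.product([0, 1], repeat=K))
S = len(states)

rng = np.random.default_rng(0)

# Illustrative transition matrix: states are "sticky" (strong
# self-transitions), other transitions uniform. Numbers are made up
# for the sketch, not estimated from data.
A = np.full((S, S), 0.05)
np.fill_diagonal(A, 0.8)
A /= A.sum(axis=1, keepdims=True)

def sample_joint_activity(T):
    """Sample T joint vocal-activity configurations from the chain."""
    seq = [0]  # start in the all-silent configuration
    for _ in range(T - 1):
        seq.append(rng.choice(S, p=A[seq[-1]]))
    return [states[s] for s in seq]

traj = sample_joint_activity(10)
print(traj)
```

Because the state encodes who is speaking jointly, the transition matrix can directly express interaction phenomena such as turn exchange and overlap avoidance, which independent per-participant models cannot capture.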