Abstract: In visual-inertial simultaneous localization and mapping (VI-SLAM), visual residuals are typically formulated using multiview geometry, parameterizing both camera poses and scene feature ...
Abstract: Audio-visual event localization (AVEL) aims to identify the categories and temporal boundaries of events that are both audible and visible in unconstrained videos. However, the inherent ...