Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
This research addresses the problem of automatically extracting semantic video scenes from feature films based on multi-modal information. A three-stage scene detection scheme is proposed. First, we use purely visual information to extract a coarse-level scene structure based on generated shot sinks. Second, audio cues are integrated to refine the scene detection results by considering various kinds of audiovisual scenarios. Finally, we bring users into the process by allowing them to interactively tune the final results to their own satisfaction. The generated scene structure forms a compact yet meaningful abstraction of the video data, which can facilitate content access. Preliminary experiments on integrating multiple media cues for movie scene extraction have yielded encouraging results. © 2004 Wiley Periodicals, Inc.
Sören Bleikertz, Carsten Vogel, et al.
ACSAC 2014
Minerva M. Yeung, Fred Mintzer
ICIP 1997
T. Syeda-Mahmood
Computer Vision and Image Understanding