Abstract
There is a growing evidence that visual saliency can be better modeled using top-down mechanisms that incorporate object semantics. This suggests a new direction for image and video analysis, where semantics extraction can be effectively utilized to improve video summarization, indexing and retrieval. This paper presents a framework that models semantic contexts for key-frame extraction. Semantic context of video frames is extracted and its sequential changes are monitored so that significant novelties are located using a one-class classifier. Working with wildlife video frames, the framework undergoes image segmentation, feature extraction and matching of image blocks, and then a co-occurrence matrix of semantic labels is constructed to represent the semantic context within the scene. Experiments show that our approach using high-level semantic modeling achieves better key-frame extraction as compared with its counterparts using low-level features.