Abstract
To improve scene understanding, the focus of image and video retrieval research has shifted from low-level feature extraction to producing high-level semantic representations of scenes. This paper presents a framework that produces semantic context features for image frame understanding and then employs a one-class classifier for key-frame extraction. Working with wildlife video frames, the framework starts with image segmentation, followed by low-level feature extraction and classification of the image blocks extracted from the image segments. The labeled image blocks are then scanned to generate a co-occurrence matrix of object labels, which represents the semantic context within the scene. The semantic co-occurrence matrices then undergo binarization and principal component analysis for dimension reduction, forming the basis features for frame representation. Experiments show that using high-level semantic features yields more semantically meaningful key-frames than using low-level features.
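
As an illustration of the co-occurrence and binarization steps, the following minimal sketch builds a label co-occurrence matrix from a grid of block labels and thresholds it. The abstract does not specify the scanning scheme, so several details here are assumptions made only for illustration: the 4-neighbor adjacency rule (right and down neighbors), the threshold of 1, the label names, and the helper names `cooccurrence_matrix`, `binarize`, and `flatten_upper`. The principal component analysis and one-class classification stages are omitted.

```python
from itertools import product


def cooccurrence_matrix(label_grid, num_labels):
    """Count label pairs found in horizontally or vertically adjacent blocks.

    Assumption: adjacency means the right and down neighbors of each block.
    """
    rows, cols = len(label_grid), len(label_grid[0])
    matrix = [[0] * num_labels for _ in range(num_labels)]
    for r, c in product(range(rows), range(cols)):
        a = label_grid[r][c]
        # Look only right and down so each adjacent pair is visited once.
        for dr, dc in ((0, 1), (1, 0)):
            nr, nc = r + dr, c + dc
            if nr < rows and nc < cols:
                b = label_grid[nr][nc]
                matrix[a][b] += 1
                if a != b:
                    matrix[b][a] += 1  # keep the matrix symmetric
    return matrix


def binarize(matrix, threshold=1):
    """Map each count to 1 if it reaches the (assumed) threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


def flatten_upper(matrix):
    """Flatten the upper triangle (including the diagonal) into a vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


# Hypothetical 3x4 grid of block labels:
# 0 = sky, 1 = grass, 2 = animal, 3 = water (absent from this frame)
grid = [
    [0, 0, 0, 0],
    [1, 2, 2, 1],
    [1, 1, 1, 1],
]

counts = cooccurrence_matrix(grid, num_labels=4)
binary = binarize(counts)
feature = flatten_upper(binary)
```

In this sketch, `feature` is the per-frame binary vector that a dimension reduction step such as principal component analysis would take as input. Because the water label never appears in the frame, its row and column in the binarized matrix are all zero.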