Abstract
Recent work on image and video retrieval appears to be shifting focus from low-level feature extraction towards high-level semantic representations of scenes. This paper presents a framework that derives semantic context features from video frames and uses them for key-frame extraction. Working with wildlife video frames, the framework starts with image segmentation, followed by low-level feature extraction and classification of the image blocks extracted from the image segments. A co-occurrence matrix is then constructed from the labels of neighbouring image blocks to represent the semantic context of the scene. The semantic co-occurrence matrices are binarized and reduced in dimension by principal component analysis, forming the feature vectors used by a one-class classifier that extracts the key-frames. Experiments show that using high-level semantic features results in better key-frame extraction than methods that use low-level features only.
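
The semantic context steps described above (label co-occurrence, binarization, principal component analysis) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each frame has already been reduced to a grid of integer block labels, that the neighbourhood is the four horizontally and vertically adjacent blocks, and that binarization marks any non-zero count. The function names, the threshold, and the random label grids are hypothetical. The one-class classifier that would consume the resulting vectors is left out.

```python
import numpy as np

def cooccurrence(labels, n_classes):
    """Count label pairs on horizontally and vertically adjacent blocks."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    # Horizontal neighbours first, then vertical neighbours (assumed neighbourhood).
    pairs = [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]
    for a, b in pairs:
        np.add.at(m, (a.ravel(), b.ravel()), 1)
        np.add.at(m, (b.ravel(), a.ravel()), 1)  # count both orders so m is symmetric
    return m

def binarize(m, threshold=0):
    """Mark label pairs that co-occur more than `threshold` times (assumed rule)."""
    return (m > threshold).astype(float)

def pca_fit(X, k):
    """Return the mean and the top-k principal directions of the rows of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def pca_transform(X, mean, components):
    """Project the rows of X onto the fitted principal directions."""
    return (X - mean) @ components.T

# Hypothetical input: 5 frames, each a 4x6 grid of block labels from 6 classes.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 6, size=(4, 6)) for _ in range(5)]

# One flattened binary co-occurrence matrix per frame, then PCA to 2 dimensions.
X = np.stack([binarize(cooccurrence(f, 6)).ravel() for f in frames])
mean, components = pca_fit(X, k=2)
features = pca_transform(X, mean, components)
```

Each row of `features` is the reduced context vector for one frame. In the framework described above, such vectors would be passed to the one-class classifier that selects the key-frames.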