Today I read a paper titled “Fast keypoint detection in video sequences”.
The abstract is:
A number of computer vision tasks exploit a succinct representation of the visual content in the form of sets of local features.
Given an input image, feature extraction algorithms identify a set of keypoints and assign to each of them a description vector, based on the characteristics of the visual content surrounding the interest point.
Several tasks might require local features to be extracted from a video sequence, on a frame-by-frame basis.
Although temporal downsampling has been proven to be an effective solution for mobile augmented reality and visual search, high temporal resolution is a key requirement for time-critical applications such as object tracking, event recognition, pedestrian detection, and surveillance.
In recent years, more and more computationally efficient visual feature detectors and descriptors have been proposed.
Nonetheless, such approaches are tailored to still images.
In this paper we propose a fast keypoint detection algorithm for video sequences that exploits the temporal coherence of the sequence of keypoints.
According to the proposed method, each frame is preprocessed so as to identify the parts of the input frame for which keypoint detection and description need to be performed.
Our experiments show that it is possible to achieve a reduction in computational time of up to 40%, without significantly affecting the task accuracy.
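The core idea, as I understand it, is that consecutive frames are largely similar, so keypoint detection only needs to be re-run where the frame content actually changed. Here is a minimal sketch of that preprocessing step, not the authors' actual method: the block size, the mean-absolute-difference criterion, and the threshold are all my assumptions.

```python
import numpy as np

def blocks_to_process(prev_frame, curr_frame, block=16, thresh=5.0):
    """Return a boolean mask over a block grid: True where a block changed
    enough that keypoint detection should be re-run on it.

    Hypothetical criterion: mean absolute pixel difference within the
    block exceeds `thresh` (the paper's actual test may differ)."""
    h, w = curr_frame.shape
    gh, gw = h // block, w // block
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    mask = np.zeros((gh, gw), dtype=bool)
    for by in range(gh):
        for bx in range(gw):
            patch = diff[by * block:(by + 1) * block,
                         bx * block:(bx + 1) * block]
            mask[by, bx] = patch.mean() > thresh
    return mask

# Toy example: two 64x64 grayscale frames, one block changes.
prev = np.zeros((64, 64), dtype=np.uint8)
curr = prev.copy()
curr[0:16, 0:16] = 100  # simulate motion in the top-left block
mask = blocks_to_process(prev, curr, block=16, thresh=5.0)
# Only the top-left block is flagged; the detector would skip the rest,
# reusing the previous frame's keypoints there.
```

On an unchanged frame the mask is all False, so detection cost drops to (roughly) the cost of the differencing pass, which is where the reported savings of up to 40% would come from.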