Today I read a paper titled “Keypoint Encoding for Improved Feature Extraction from Compressed Video at Low Bitrates”.
The abstract is:
In many mobile visual analysis applications, compressed video is transmitted over a communication network and analyzed by a server.
Typical processing steps performed at the server include keypoint detection, descriptor calculation, and feature matching.
Video compression has been shown to have an adverse effect on feature-matching performance.
The negative impact of compression can be reduced by using the keypoints extracted from the uncompressed video to calculate descriptors from the compressed video.
Based on this observation, we propose to provide these keypoints to the server as side information and to extract only the descriptors from the compressed video.
First, we introduce four different frame types for keypoint encoding to address different types of changes in video content.
These frame types represent a new scene, the same scene, a slowly changing scene, or a rapidly moving scene and are determined by comparing features between successive video frames.
Then, we propose Intra, Skip and Inter modes of encoding the keypoints for different frame types.
For example, keypoints for new scenes are encoded using the Intra mode, and keypoints for unchanged scenes are skipped.
As a result, the bitrate of the side information related to keypoint encoding is significantly reduced.
Finally, we present pairwise matching and image retrieval experiments conducted to evaluate the performance of the proposed approach using the Stanford mobile augmented reality dataset and 720p format videos.
The results show that the proposed approach offers significantly improved feature matching and image retrieval performance at a given bitrate.
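
To make the frame-type and encoding-mode idea concrete for myself, here is a minimal Python sketch. It is not the paper's algorithm: the abstract only says that frame types are determined by comparing features between successive frames, that new scenes use Intra mode, and that unchanged scenes are skipped. Everything else is my assumption, including the function names (classify_frame, select_mode, encode_keypoints), the two comparison statistics (match ratio and mean keypoint shift), the threshold values, the choice of Inter mode for both slowly changing and rapidly moving scenes, and the index-based pairing of keypoints in the Inter case.

```python
from enum import Enum


class FrameType(Enum):
    NEW_SCENE = "new scene"
    SAME_SCENE = "same scene"
    SLOW_CHANGE = "slowly changing scene"
    RAPID_MOTION = "rapidly moving scene"


class Mode(Enum):
    INTRA = "intra"
    SKIP = "skip"
    INTER = "inter"


def classify_frame(match_ratio, mean_shift,
                   min_ratio=0.3, still_px=0.5, slow_px=4.0):
    """Pick a frame type from two hypothetical statistics.

    match_ratio: fraction of the previous frame's keypoints that were
                 matched in the current frame (0.0 to 1.0).
    mean_shift:  mean displacement, in pixels, of the matched keypoints.
    The thresholds are illustrative values, not taken from the paper.
    """
    if match_ratio < min_ratio:
        return FrameType.NEW_SCENE
    if mean_shift <= still_px:
        return FrameType.SAME_SCENE
    if mean_shift <= slow_px:
        return FrameType.SLOW_CHANGE
    return FrameType.RAPID_MOTION


def select_mode(frame_type):
    """Map a frame type to an encoding mode."""
    if frame_type is FrameType.NEW_SCENE:
        return Mode.INTRA   # stated in the abstract
    if frame_type is FrameType.SAME_SCENE:
        return Mode.SKIP    # stated in the abstract
    return Mode.INTER       # assumption: both changing-scene types use Inter


def encode_keypoints(mode, current, previous):
    """Return the values that would be sent as side information.

    current, previous: lists of (x, y) keypoint positions.
    Assumption: in Inter mode the two lists are already paired by index.
    """
    if mode is Mode.SKIP:
        return []                      # nothing sent, server reuses keypoints
    if mode is Mode.INTRA:
        return list(current)           # positions sent without prediction
    # Inter: send only the displacement relative to the previous frame.
    return [(cx - px, cy - py)
            for (cx, cy), (px, py) in zip(current, previous)]
```

For example, calling classify_frame(0.9, 2.0) under these assumed thresholds gives a slowly changing scene, select_mode maps that to Inter, and encode_keypoints then emits small displacement pairs instead of full coordinates. That is the intuition behind the reduced side-information bitrate, since small differences and skipped frames should cost fewer bits than full positions.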