Today I read a paper titled “Dynamic Feature Description in Human Action Recognition”.
The abstract is:
This work aims to present novel description methods for human action recognition.
Generally, a video sequence can be represented as a collection of spatio-temporal words by detecting space-time interest points and describing the unique features around the detected points (a Bag-of-Words representation).
Interest points as well as the cuboids around them are considered informative for feature description in terms of both the structural distribution of interest points and the information content inside the cuboids.
Our proposed description approaches build on this idea and aim to make the feature descriptors more discriminative.
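The abstract does not spell out the new descriptors, so the following is only a generic sketch of the Bag-of-Words pipeline it builds on, under stated assumptions. Each cuboid descriptor is assigned to its nearest "visual word" in a codebook. The codebook is assumed to be learned beforehand, for example by k-means, and is simply passed in here. The video then becomes a normalized histogram of word counts. To illustrate the idea of also using the structural distribution of interest points, the hypothetical helper `point_spread` appends the per-axis spread of the (x, y, t) point positions. This is an illustrative stand-in, not the paper's method. All function names are invented for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Point = Tuple[float, float, float]  # (x, y, t) position of an interest point


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for i, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """L1-normalized histogram of visual-word occurrences (the Bag-of-Words vector)."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        counts[nearest_word(desc, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def point_spread(points: Sequence[Point]) -> List[float]:
    """Hypothetical structural cue: per-axis standard deviation of point positions."""
    n = len(points)
    if n == 0:
        return [0.0, 0.0, 0.0]
    means = [sum(p[k] for p in points) / n for k in range(3)]
    return [math.sqrt(sum((p[k] - means[k]) ** 2 for p in points) / n) for k in range(3)]


def video_descriptor(points: Sequence[Point], descriptors: Sequence[Vector],
                     codebook: Sequence[Vector]) -> List[float]:
    """Concatenate cuboid content (BoW histogram) with the hypothetical structural cue."""
    return bow_histogram(descriptors, codebook) + point_spread(points)


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0)]          # toy 2-word codebook
    descriptors = [(0.1, 0.0), (0.9, 1.2), (1.0, 0.8)]
    points = [(0, 0, 0), (2, 0, 0), (0, 2, 4), (2, 2, 4)]
    print(video_descriptor(points, descriptors, codebook))
```

With this toy input, one descriptor maps to word 0 and two map to word 1, so the histogram part is roughly [0.33, 0.67], followed by the spread [1.0, 1.0, 2.0]. Concatenation is just the simplest way to combine the two kinds of information; a real system would likely weight or normalize them differently.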