Today I read a paper titled “Watch-n-Patch: Unsupervised Learning of Actions and Relations”
The abstract is:
There is a large variation in the activities that humans perform in their everyday lives.
We consider modeling these composite human activities, which comprise multiple basic-level actions, in a completely unsupervised setting.
Our model learns high-level co-occurrence and temporal relations between the actions.
We consider the video as a sequence of short-term action clips, each containing human-words and object-words.
An activity is represented by a set of action-topics and object-topics, indicating which actions are present and which objects are being interacted with.
We then propose a new probabilistic model relating the words and the topics.
It allows us to model the long-range action relations that commonly exist in composite activities, which was challenging for previous works.
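The "long-range" part can be made concrete with a toy sketch. The paper's actual model is probabilistic, but the core idea of relating every pair of action-topics in an activity, not just adjacent ones, can be illustrated with simple counting (the topic labels and video data below are purely illustrative, not from the paper):

```python
from collections import Counter
from itertools import combinations

# Toy activities, each already reduced to a sequence of action-topics
# (illustrative labels, not the paper's dataset):
videos = [
    ["fetch-cup", "pour-water", "drink"],
    ["fetch-cup", "drink"],
    ["fetch-cup", "pour-water", "microwave", "drink"],
]

def cooccurrence_counts(videos):
    """Count how often two action-topics appear in the same activity."""
    co = Counter()
    for seq in videos:
        for a, b in combinations(sorted(set(seq)), 2):
            co[(a, b)] += 1
    return co

def transition_counts(videos):
    """Count pairwise temporal orderings, including long-range ones:
    every pair (a before b) in a sequence, not just adjacent clips."""
    trans = Counter()
    for seq in videos:
        for i in range(len(seq)):
            for j in range(i + 1, len(seq)):
                trans[(seq[i], seq[j])] += 1
    return trans

co = cooccurrence_counts(videos)
trans = transition_counts(videos)
```

Because `transition_counts` looks at all ordered pairs rather than only neighbouring clips, a relation like "fetch-cup eventually precedes drink" is captured even when other actions occur in between, which is the kind of dependency a plain Markov chain over adjacent clips would miss.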
We apply our model to unsupervised action segmentation and clustering, and to a novel application that detects forgotten actions, which we call action patching.
For evaluation, we contribute a new challenging RGB-D activity video dataset recorded with the new Kinect v2, containing several daily human activities composed of multiple actions that interact with different objects.
Moreover, we develop a robotic system that watches people and, by applying our action patching algorithm, reminds them of forgotten actions.
Our robotic setup can be easily deployed on any assistive robot.