Today I read a paper titled “Toward Natural Gesture/Speech Control of a Large Display”.
The abstract is:
In recent years, because of advances in computer vision research, free hand gestures have been explored as a means of human-computer interaction (HCI).
Together with improved speech processing technology, this is an important step toward natural multimodal HCI.
However, inclusion of non-predefined continuous gestures into a multimodal framework is a challenging problem.
In this paper, we propose a structured approach for studying patterns of multimodal language in the context of 2D-display control.
We consider a systematic analysis of gestures, from observable kinematic primitives to their semantics, as pertinent to a linguistic structure.
The proposed semantic classification of co-verbal gestures distinguishes six categories based on their spatio-temporal deixis.
We discuss the evolution of a computational framework for gesture and speech integration, which was used to develop an interactive testbed (iMAP).
The testbed enabled elicitation of adequate, non-sequential, multimodal patterns in a narrative mode of HCI.
The conducted user studies illustrate the significance of accounting for the temporal alignment of gesture and speech parts in semantic mapping.
Furthermore, co-occurrence analysis of gesture/speech production suggests a syntactic organization of gestures at the lexical level.
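To make the last two points concrete for myself, here is a minimal sketch, not taken from the paper, of what a temporal co-occurrence analysis of gesture and speech could look like. It assumes hypothetical timestamped annotations for gesture strokes and spoken keywords; the interval format, labels, and lag measure are my own illustration, not the authors' method.

```python
# Minimal sketch of gesture/speech co-occurrence analysis over hypothetical
# timestamped annotations (not the paper's actual pipeline or data format).
from dataclasses import dataclass
from collections import Counter, defaultdict

@dataclass
class Interval:
    label: str    # gesture category (e.g. "point") or spoken keyword (e.g. "here")
    start: float  # onset time in seconds
    end: float    # offset time in seconds

def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two annotated intervals share any span of time."""
    return a.start < b.end and b.start < a.end

def cooccurrence(gestures, words):
    """Count (gesture, word) pairs that overlap in time and record the
    onset lag of the word relative to the gesture stroke."""
    counts = Counter()
    lags = defaultdict(list)
    for g in gestures:
        for w in words:
            if overlaps(g, w):
                counts[(g.label, w.label)] += 1
                lags[(g.label, w.label)].append(w.start - g.start)
    return counts, lags

# Toy, made-up annotations just to exercise the functions.
gestures = [Interval("point", 0.2, 0.9), Interval("contour", 1.5, 2.4)]
words = [Interval("here", 0.5, 0.8), Interval("move", 1.6, 2.0)]
counts, lags = cooccurrence(gestures, words)
print(counts)                                            # which pairs co-occur
print({k: sum(v) / len(v) for k, v in lags.items()})     # mean onset lag per pair
```

Aggregating such pair counts and onset lags over many narrated interactions is roughly the kind of evidence that could support claims about temporal alignment and lexical-level organization of gestures, though the paper's actual analysis is surely more elaborate.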