Cite as:
Sigala R A, Serre T, Poggio T, Giese M A, Casile A, 2005, "Mid-level motion features for the recognition of biological movements" Perception 34 ECVP Abstract Supplement
Mid-level motion features for the recognition of biological movements
R A Sigala, T Serre, T Poggio, M A Giese, A Casile
Recognition of biological motion probably requires the integration of form and motion information. For recognition and categorisation of complex static shapes, recognition performance can be significantly increased by optimisation of the extracted mid-level form features. Several algorithms for the learning of optimised mid-level features from image data have been proposed. It seems likely that the visual recognition of complex movements is also based on optimised features. Exploiting a new physiologically inspired algorithm and classical unsupervised learning methods, we have tried to determine mid-level motion features that are maximally useful for the recognition of body movements from image sequences. We optimised mid-level neural detectors in a hierarchical model for the recognition of human actions (Giese and Poggio, 2003 Nature Reviews Neuroscience 4 179-192) by unsupervised learning. Learning is based on a memory trace learning rule: each detector is associated with a memory variable that increases when the detector is activated during correct classifications, and that decreases otherwise. Detectors whose memory variable falls below a critical threshold 'die' and are eliminated from the model. In addition, we tested a classical principal-components approach. The model is trained with movies showing different human actions, from which optic flow fields are computed. The tested learning algorithms extract mid-level motion features that lead to a substantial improvement of the recognition performance. For the special case of walking, many of the extracted motion features are characterised by horizontal opponent motion. This result is consistent with psychophysical data showing that opponent horizontal motion is a dominant mid-level feature that accounts for high recognition rates, even for strongly impoverished stimuli (Casile and Giese, 2005 Journal of Vision 5 348-360).
As for the categorisation of static shapes, recognition performance for human actions is improved by choosing optimised mid-level features. The learned features might predict receptive field properties of complex motion-selective neurons (eg in area KO/V3B).
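The memory trace rule described above can be illustrated with a minimal sketch. The abstract gives no update equations or parameter values, so everything numeric here is an assumption made for illustration: the increment `gain`, the decrement `decay`, the elimination `threshold`, and the response level `active_level` above which a detector counts as activated. The function name `prune_detectors` is likewise hypothetical and is not taken from the authors' model.

```python
def prune_detectors(activations, correct, gain=0.1, decay=0.05,
                    threshold=-0.5, active_level=0.5):
    """Return the indices of detectors that survive memory-trace pruning.

    activations: list of trials, each a list of detector responses in [0, 1]
    correct:     list of booleans, True if the trial was classified correctly
    All numeric defaults are illustrative assumptions, not published values.
    """
    n_detectors = len(activations[0])
    memory = [0.0] * n_detectors   # one memory variable per detector
    alive = [True] * n_detectors   # detectors that have not yet 'died'

    for responses, was_correct in zip(activations, correct):
        for i in range(n_detectors):
            if not alive[i]:
                continue
            if was_correct and responses[i] > active_level:
                # Activated during a correct classification: trace increases.
                memory[i] += gain
            else:
                # Any other case: trace decreases.
                memory[i] -= decay
            if memory[i] < threshold:
                # Trace fell below the critical threshold: eliminate detector.
                alive[i] = False

    return [i for i in range(n_detectors) if alive[i]]


# Toy usage: detector 0 always responds, detector 1 never does.
trials = [[1.0, 0.0] for _ in range(20)]
outcomes = [True] * 20
print(prune_detectors(trials, outcomes))  # → [0]
```

In this toy run the silent detector loses a fixed amount on every trial and is removed once its trace crosses the threshold, while the detector that is active on correctly classified trials is retained. How the real model defines activation and classification correctness is not specified in the abstract.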
[Supported by the DFG, HFSP and the Volkswagen Foundation. CBCL is supported by NIH, Office of Naval Research, DARPA and National Science Foundation.]
These web-based abstracts are provided for ease of searching and access, but certain aspects (such as mathematics) may not appear in their optimum form. For the final published version of this abstract, please see
ECVP 2005 Abstract Supplement (complete)