Thesis abstract:
This thesis describes a novel extension of morphological operators to three-dimensional images represented by voxels. These new operators are applied to the analysis and classification of a database of human actions drawn from a predefined vocabulary of gestures. The database consists of volumetric reconstructions of pose sequences performed by a single actor in a scene captured with a multi-Kinect system developed in our laboratory (ISPG). We cover the entire pipeline, from calibration, capture, preprocessing and volume reconstruction through to topological skeleton extraction, surface representation and classification.
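To make "morphological operators on voxels" concrete, the following is a minimal sketch of the basic binary operators on a 3D volume. It assumes (our choice, not stated in the abstract) that a volume is a set of occupied integer (x, y, z) voxels and that the structuring element is a 6-connected cross containing the origin; the names `CROSS_6`, `dilate`, `erode`, `opening` and `closing` are hypothetical and do not come from the thesis.

```python
# Minimal sketch: binary erosion and dilation on a voxel volume.
# Assumption (ours): a volume is a set of occupied integer (x, y, z) voxels,
# and the structuring element is a 6-connected cross that contains the origin.

CROSS_6 = {
    (0, 0, 0),
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
}


def dilate(volume, se=CROSS_6):
    """Dilation: translate every occupied voxel by every offset of the structuring element."""
    return {(x + dx, y + dy, z + dz) for (x, y, z) in volume for (dx, dy, dz) in se}


def erode(volume, se=CROSS_6):
    """Erosion: keep a voxel only if the structuring element centred on it fits in the volume.

    Scanning only the occupied voxels is valid because se contains the origin,
    so every eroded voxel is already part of the volume.
    """
    return {
        (x, y, z)
        for (x, y, z) in volume
        if all((x + dx, y + dy, z + dz) in volume for (dx, dy, dz) in se)
    }


def opening(volume, se=CROSS_6):
    """Opening: erosion followed by dilation; keeps only the parts in which se fits."""
    return dilate(erode(volume, se), se)


def closing(volume, se=CROSS_6):
    """Closing: dilation followed by erosion; fills small gaps and cavities."""
    return erode(dilate(volume, se), se)


if __name__ == "__main__":
    cube = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
    print(erode(cube))         # {(1, 1, 1)}
    print(len(opening(cube)))  # 7
```

On a 3x3x3 cube, erosion by the cross leaves only the centre voxel, and opening grows it back into the 7-voxel cross. A set-of-tuples representation keeps the sketch dependency-free; a real pipeline would more likely use dense boolean arrays.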
The first part of our research is dedicated to extracting volumetric information from the data delivered by the acquisition system we implemented. Reconstructing the scene in 3D before any analysis or recognition step allows the recognition system to work directly on 3D data, which inherently resolves problems such as viewpoint dependence and motion ambiguity. We show that combining knowledge of the underlying depth map with a visual snapshot of the scene can greatly improve the robustness of point matching in wide-baseline settings compared with state-of-the-art descriptors.
Frame-by-frame voxel representations of the scene serve as the input for all subsequent analysis and processing. We extend the morphological skeleton extraction algorithm to 3D and develop a new 3D thinning algorithm that computes an approximation of the topological curve skeleton. Our algorithm gives good results, preserves topology, is easy to implement and is robust to noise.
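For reference, the classic morphological skeleton (Lantuéjoul's formula) that a 3D extension would typically start from can be written as follows for a voxel set X and a 3D structuring element B containing the origin. Here nB denotes B dilated by itself n times (with 0B the single origin voxel), ∘ is the opening, and N is the last n for which X ⊖ nB is non-empty. We assume, but the abstract does not state, that the thesis builds on this form; the last line expresses the exact reconstruction of X from its skeleton subsets.

```latex
\begin{aligned}
S_n(X) &= (X \ominus nB) \setminus \big[ (X \ominus nB) \circ B \big], \qquad n = 0, 1, \dots, N \\
S(X)   &= \bigcup_{n=0}^{N} S_n(X) \\
X      &= \bigcup_{n=0}^{N} \big( S_n(X) \oplus nB \big)
\end{aligned}
```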
We consider two application scenarios in the context of human-computer interaction: Surface Reconstruction and Human Action Recognition. The proposed morphological skeleton extraction algorithm yields an accurate and computationally inexpensive method for reconstructing the surface of the actor's body. Moreover, because it relies only on morphological operators, it imposes no resolution requirements. The 3D thinning algorithm emphasizes the movement, increasing the similarity between sequences that represent the same action even when they are performed by actors of different genders or body builds. By extending the descriptor used to find robust stereo correspondences, we build motion features that are invariant with respect to the actor's position and orientation in the scene. The classification results are good, and the improvement brought by our thinning algorithm is reflected in the classification rate.
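The following sketch illustrates why a morphological skeleton supports surface reconstruction: the volume can be rebuilt exactly from its skeleton subsets, and a surface can then be read off as its boundary voxels. It is a sketch under stated assumptions, not the thesis's method: we assume the classic Lantuéjoul skeleton, a 6-connected structuring element, and a surface defined as the voxels removed by one erosion. All names (`CROSS_6`, `morphological_skeleton`, `reconstruct`, `surface_voxels`) are hypothetical.

```python
# Sketch (assumptions ours): Lantuejoul skeleton of a voxel set, exact
# reconstruction from the skeleton subsets, and boundary-voxel "surface".

# Hypothetical 6-connected structuring element; it contains the origin.
CROSS_6 = {
    (0, 0, 0),
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
}


def dilate(volume, se=CROSS_6):
    """Translate every occupied voxel by every offset of the structuring element."""
    return {(x + dx, y + dy, z + dz) for (x, y, z) in volume for (dx, dy, dz) in se}


def erode(volume, se=CROSS_6):
    """Keep voxels where se fits entirely inside the volume (se contains the origin)."""
    return {
        (x, y, z)
        for (x, y, z) in volume
        if all((x + dx, y + dy, z + dz) in volume for (dx, dy, dz) in se)
    }


def opening(volume, se=CROSS_6):
    return dilate(erode(volume, se), se)


def morphological_skeleton(volume, se=CROSS_6):
    """Return the skeleton subsets S_0..S_N, with S_n = E_n minus the opening of E_n.

    E_n is the volume eroded n times by se; the loop stops once E_n is empty,
    which always happens for a finite volume because each erosion shrinks it.
    """
    subsets = []
    eroded = set(volume)  # E_0 = X
    while eroded:
        subsets.append(eroded - opening(eroded, se))
        eroded = erode(eroded, se)
    return subsets


def reconstruct(subsets, se=CROSS_6):
    """Rebuild the volume as the union of S_n dilated n times by se."""
    volume = set()
    for n, s_n in enumerate(subsets):
        grown = set(s_n)
        for _ in range(n):
            grown = dilate(grown, se)
        volume |= grown
    return volume


def surface_voxels(volume, se=CROSS_6):
    """Boundary voxels: those removed by a single erosion."""
    return volume - erode(volume, se)


if __name__ == "__main__":
    cube = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
    skeleton = morphological_skeleton(cube)
    print([len(s) for s in skeleton])     # [20, 1]
    print(reconstruct(skeleton) == cube)  # True
    print(len(surface_voxels(cube)))      # 26
```

For a 3x3x3 cube the sketch yields two subsets: S_0 holds the 20 edge and corner voxels and S_1 the single centre voxel, and `reconstruct` returns the original 27 voxels. Reconstruction is exact for any finite voxel set because each eroded set E_n equals S_n united with the dilation of E_(n+1).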