CANNICI MARCO | Cycle: XXXIV
Section: Computer Science and Engineering
Tutor: AMIGONI FRANCESCO
Advisor: MATTEUCCI MATTEO

Major research topic: Deep learning approaches for event-based high-speed robotics

Abstract:
Event-based cameras are vision sensors that attempt to emulate the functioning of biological retinas. Unlike conventional cameras, which generate frames at a constant rate, these devices output sequences of asynchronous events that efficiently encode pixel-level brightness changes caused by objects moving within the scene. The growing popularity of this type of sensor, due to its benefits in terms of power consumption and temporal resolution, has encouraged the development of new event-based algorithms for various applications, e.g., depth estimation and 3D reconstruction, optical flow estimation, visual odometry and SLAM, as well as object tracking.
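To make the encoding concrete, the following is a minimal sketch of how an event-based camera can be modeled: each pixel independently emits an event (x, y, timestamp, polarity) whenever its log-brightness change exceeds a contrast threshold. The `Event` class, the function name, and the threshold value of 0.2 are illustrative assumptions, not a specific camera's specification.

```python
from dataclasses import dataclass

@dataclass
class Event:
    x: int          # pixel column
    y: int          # pixel row
    t: float        # timestamp (e.g., microseconds)
    polarity: int   # +1 for a brightness increase, -1 for a decrease

def brightness_event(log_old, log_new, x, y, t, threshold=0.2):
    """Emit an event when the log-brightness change at a pixel
    exceeds the contrast threshold; otherwise emit nothing.
    Threshold value is an illustrative assumption."""
    delta = log_new - log_old
    if delta >= threshold:
        return Event(x, y, t, +1)
    if delta <= -threshold:
        return Event(x, y, t, -1)
    return None  # sub-threshold change: the pixel stays silent
```

Because only pixels whose brightness actually changes produce output, static parts of the scene generate no data, which is the source of the sparsity and efficiency mentioned above.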
The most widespread method for extracting visual features from this type of data is Spiking Neural Networks (SNNs), a class of artificial neural networks whose units communicate with each other through spikes, much as biological neurons do. Although more efficient in terms of energy consumption, spike-based communication limits their expressive power and makes these models non-differentiable and therefore difficult to train properly. For these reasons, very few works to date have tackled complex vision tasks such as object detection and semantic segmentation.
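The non-differentiability mentioned above can be illustrated with a minimal leaky integrate-and-fire (LIF) neuron, a common SNN unit model. This is a sketch under assumed parameter names (`v_th`, `leak`) and a simple reset-to-zero rule, not a specific SNN framework's API; the hard threshold is the step that blocks ordinary gradient-based training.

```python
def lif_step(v, input_current, v_th=1.0, leak=0.9):
    """One discrete-time step of a leaky integrate-and-fire neuron.
    The membrane potential leaks, integrates the input, and emits a
    binary spike when it crosses the threshold; firing resets it."""
    v = leak * v + input_current
    spike = 1 if v >= v_th else 0   # hard threshold: non-differentiable
    if spike:
        v = 0.0                     # reset after firing
    return v, spike
```

Since the spike is a step function of the membrane potential, its derivative is zero almost everywhere, which is why backpropagation cannot be applied directly and surrogate-gradient tricks are typically needed.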
Due to the proven effectiveness of conventional artificial neural networks, especially convolutional and recurrent architectures, in complex vision tasks involving standard cameras, research is now also focusing on their use in event-based computation. However, designing real-time neural networks capable of effectively handling events while exploiting the sparseness of event streams is still an open issue.
This research aims to explore new approaches for handling events with conventional neural models, focusing on the design of effective event representations that adapt to the characteristics of the scene, and of frame-free architectures that intrinsically exploit the sparse nature of the event-based visual encoding. To achieve this goal, techniques such as automatic differentiation, graph neural models, recurrent networks, and efficient architectures for point cloud processing will be considered during this research.
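As a simple illustration of what an event representation for conventional networks looks like, the sketch below accumulates an event stream into a two-channel count image (one channel per polarity) that a standard CNN can consume. The function name and the (x, y, t, polarity) tuple layout are illustrative assumptions; this is one of the simplest hand-crafted representations, whereas the research described above targets learned, scene-adaptive ones.

```python
import numpy as np

def event_count_frame(events, height, width):
    """Accumulate (x, y, t, polarity) events into a two-channel
    count image: channel 0 counts positive events, channel 1
    counts negative events, per pixel."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in events:
        channel = 0 if p > 0 else 1
        frame[channel, y, x] += 1.0
    return frame
```

Note that this conversion discards the precise timestamps, which is exactly the kind of information loss that motivates graph- and point-cloud-based architectures operating on the raw event stream.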