Current students


Section: Telecommunications

Major research topic:
A multimodal approach to face multimedia forensics challenges

Creating and sharing multimedia content is becoming ever simpler. Thanks to improvements in smartphones and other consumer devices, we generate a large amount of material every day and share it on the web and on social media platforms. Moreover, as connection speed, bandwidth and storage space are no longer limiting factors, shared content is becoming more and more complex, moving from simple images to high-quality videos. While this progress opens the door to exciting new possibilities, it also makes it easier for malicious users to disseminate harmful material. For this reason, the ability to analyze multimedia material automatically, so as to prevent dangerous consequences, is becoming of paramount importance.

In this work, we propose to tackle this problem using a multimodal approach that simultaneously analyzes different content modalities (e.g., video, audio and text). By exploiting consistencies and differences between modalities, we can increase the amount of information available for solving a specific task. We do so by combining signal processing and deep learning techniques. The goal is to exploit the strengths of both families of methods while compensating for their respective weaknesses. For example, the information provided by model-based approaches can reduce the amount of data needed by data-driven techniques and improve the results they obtain.

The proposed multimodal approach can be exploited in the field of Multimedia Forensics. Here we are interested in a few specific and subtle traces contained within an image, a video or an audio track, while most of the remaining content (e.g., the colors in the scene or the timbre of a sound) is not relevant to the purpose. With a purely data-driven approach, the risk is to learn a content-based bias that is harmful to the final goal, leading to poor models and results. A hybrid approach, which is both data-driven and model-driven, instead allows us to focus the model on the traces of interest and improve its results. Moreover, the joint use of multiple data modalities opens the door to a new class of detectors that are intrinsically more robust to adversarial and anti-forensic attacks. Finally, this kind of approach can be used in many other fields. For example, jointly analyzing audio and video may help improve event classification methods, and mixing model-based and data-driven approaches may prove useful in applications related to facial analysis.