This project investigates advanced methodologies for forensics and anti-forensics analysis, specifically focusing on the family of source device identification problems.
Over the last 15 years, social media giants such as Facebook, Instagram and Twitter have become the primary channels for sharing content with friends, but also for marketing and political propaganda. Combined with the diffusion of innovative techniques for easily editing digital multimedia content, this has triggered a wildfire spread of easy-to-access content on the web. Consequently, the average time for absorbing and sharing new information has shrunk to a few seconds. Experts speak of information disorder: visual media are downloaded, modified and re-shared so rapidly that identifying their source, and telling original from modified content apart, becomes harder every day. This revolution in visual information calls for efficient and smart solutions to assess the trustworthiness of digital multimedia data. Indeed, finding ways to authenticate images or videos could help minimize the impact of visual content manipulation.
One pressing issue for forensic analysts is source device identification: given some multimedia content, determining the actual device that captured it. This is a fine-grained task, as it requires estimating the precise source device, not just the corresponding camera model or vendor. Being able to uniquely associate a visual media item with the device that acquired it may provide precious evidence both during investigations and before a court of law. For example, it can expose copyright violations or point to the authors of heinous crimes such as acts of terrorism or child exploitation.

The key assumption behind source identification is that acquisition devices leave distinctive traces in the acquired content. State-of-the-art solutions rely on the camera photo response non-uniformity (PRNU), introduced into all acquired images and videos by imperfections in the sensor manufacturing process. While attributing images to the correct device is a well-studied problem in the forensics community, it still has limitations when dealing with very large image databases, for example memory occupation and long computational times. Exploiting sensor traces in the video domain is an even more challenging problem, and several peculiar issues need to be addressed to obtain satisfactory performance. Indeed, differently from images, videos are almost always compressed at relatively low quality. Moreover, further issues have arisen in the last 5 years, as smartphones have started including motion stabilization technologies that strongly hinder the performance of standard forensic techniques for device identification.
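To make the correlation test underpinning PRNU-based attribution concrete, the following Python sketch averages noise residuals into a device fingerprint and correlates a query residual against it. All names, the synthetic data and the plain normalized cross-correlation are our simplifications: real pipelines extract residuals with a denoising filter (residual = image minus denoised image) and use more robust statistics such as the peak-to-correlation energy.

```python
import numpy as np

def estimate_fingerprint(residuals):
    """Estimate a device fingerprint by averaging noise residuals
    extracted from many images of the same device."""
    return np.mean(np.stack(residuals), axis=0)

def ncc(residual, fingerprint):
    """Normalized cross-correlation between a query residual and a
    candidate device fingerprint; higher means a better match."""
    a = residual - residual.mean()
    b = fingerprint - fingerprint.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Synthetic demo: residuals of "device A" share a common sensor pattern.
rng = np.random.default_rng(0)
pattern = rng.standard_normal((32, 32))
residuals_a = [pattern + 0.5 * rng.standard_normal((32, 32)) for _ in range(20)]
fp_a = estimate_fingerprint(residuals_a)

query_same = pattern + 0.5 * rng.standard_normal((32, 32))   # same device
query_other = rng.standard_normal((32, 32))                  # different device
```

Attribution then amounts to picking, among the known devices, the fingerprint yielding the highest correlation above a decision threshold.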
Recently, the anti-forensics perspective on the source device identification problem has gained importance as well. The goal, in this case, is the opposite: anonymize images and videos so that they can no longer be linked to the original device. The motivations behind this growing research field are twofold: (i) preserving the privacy of data owners, such as photoreporters in countries at war, human rights defenders and activists; (ii) spotting the weaknesses of forensic algorithms for source device identification. In order to test the vulnerability of commonly used algorithms against an attacker who manipulates the data for anonymization, it is useful to step into the attacker's shoes and try to make the source device identification algorithms fail.
Given these premises, this PhD research proposes novel contributions in both forensics and anti-forensics analysis, specifically focusing on the family of source device identification problems. Furthermore, we also include research on other applications that require technical approaches similar to those used in forensic investigations. In particular, the research activities can be split into three main areas:
- Investigation of forensics methods for source device identification;
- Investigation of anti-forensics methods for image anonymization;
- Other applications.

Forensics methods for source device identification
We face the source device identification problem for both images and videos. Different strategies are proposed, which can be broadly split into two categories: model-based and CNN-based approaches. Regarding model-based strategies, we solve the source device identification problem on stabilized video sequences and study how to detect and localize spliced portions in video compilations. The CNN-based approach consists of a strategy based on convolutional neural networks to solve the image-device attribution problem.
Investigations of source device identification on stabilized videos have been motivated by the recent advancements in motion stabilization of video content, which strongly hinder the robustness of PRNU-based techniques. Indeed, video stabilization introduces geometric transformations to video frames, thus making camera fingerprint estimation problematic with classical approaches. In order to deal with the challenging problem of attributing stabilized videos to their source device, we propose: (1) two different techniques to extract the characteristic fingerprint of a stabilized device, starting from a set of captured images; (2) a strategy to extract the device fingerprint using only stabilized video sequences taken by the device. Then, we provide two distinct strategies to match a stabilized video sequence with a given fingerprint. We leverage global optimization strategies (e.g., particle swarm optimization and genetic algorithms) and the Fourier-Mellin transform in order to match each video to the correct device.
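The core difficulty above is matching under an unknown geometric transformation. As a deliberately reduced illustration, the sketch below brute-forces only integer translations, picking the shift that best realigns a frame residual with the fingerprint; the actual problem also involves scale and rotation, which the text tackles with the Fourier-Mellin transform and global optimizers such as particle swarm optimization. All names and the toy search space are our assumptions.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_under_shift(residual, fingerprint, max_shift=3):
    """Exhaustively search the integer translation that best realigns a
    frame residual with the device fingerprint.
    Returns (best_correlation, (dy, dx))."""
    best_c, best_s = -1.0, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cand = np.roll(np.roll(residual, dy, axis=0), dx, axis=1)
            c = ncc(cand, fingerprint)
            if c > best_c:
                best_c, best_s = c, (dy, dx)
    return best_c, best_s
```

With richer transformation families, exhaustive search becomes infeasible, which is precisely why stochastic global optimizers are attractive for this matching step.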
The research on blind detection and localization of video temporal splicing has been inspired by the diffusion on the web of user-generated video compilations, obtained by splicing together in time various video shots coming from different devices. In order to perform forensic analysis on this kind of video, it can be useful to split the whole sequence into its originating shots. As the shots are seldom obtained with a single device, we propose to identify each video shot by exploiting sensor-based traces. We consider the challenging scenario in which videos are composed of few-second shots of variable and unknown length, coming from an unknown number of never-seen devices, and we aim at blindly detecting and localizing the splicing points.

Until now, data-driven approaches based on convolutional neural networks (CNNs) have limited their scope to camera model identification rather than source device identification. Indeed, these approaches are very accurate when discriminating one camera model from another, but cannot identify the specific device that shot an image. In this project, we propose a CNN-based approach for source device identification on images. We make use of the characteristic fingerprint of each device to train a CNN able to distinguish the subtle sensor traces left on images and match them with the correct source. Besides being computationally efficient (i.e., its running times are always below those of standard approaches), our method works on a reduced portion of the image, thus saving memory.
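Returning to the temporal-splicing scenario: once a sensor-based score (e.g., a per-frame fingerprint correlation) is available, candidate splicing points show up as abrupt changes in that score over time. The function below is a deliberately simple change detector over such a score sequence; the threshold and the example scores are illustrative only, and the blind method described in the text operates without any reference fingerprint.

```python
def detect_splice_points(frame_scores, jump=0.3):
    """Return frame indices where consecutive per-frame sensor scores
    differ by more than `jump` -- candidate temporal splicing points."""
    return [i for i in range(1, len(frame_scores))
            if abs(frame_scores[i] - frame_scores[i - 1]) > jump]

# Toy compilation of two shots: frames 0-2 correlate well with some
# device fingerprint, frames 3-5 do not (different source device).
scores = [0.81, 0.79, 0.83, 0.05, 0.07, 0.04]
```

On this toy input the detector flags a single splicing point at frame index 3, the boundary between the two shots.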
Furthermore, the experience acquired during these investigations enabled us to extend our studies to other forensic activities not strictly related to source device identification. As an example, we tackle the problem of estimating the number of JPEG compressions an image underwent, specifically up to 4 compression steps. Indeed, images available online are likely to be the result of a multi-step processing chain, raising concerns about their authenticity and integrity. Usually, images are compressed according to the JPEG standard, which leaves on each picture peculiar traces that can be exploited for forensic investigations. Our approach leverages a task-driven non-negative matrix factorization (TNMF) model to extract information from the discrete cosine transform of the image under analysis.

Anti-forensics methods for image anonymization
Since anti-forensics analysis is still in its infancy, this project investigates anonymization strategies dealing only with images, leaving the study of video sequences to future work. Specifically, we propose two different strategies for attenuating the PRNU traces left on images: one based on image inpainting and one based on convolutional neural networks.
Inpainting-based image anonymization builds on the idea of deleting and reconstructing image pixels in order to attenuate the device noise traces. We delete a predefined set of pixels and inpaint them from their neighbours by exploiting regularization techniques. This operation reduces the PRNU traces left on the image, thus lowering the cross-correlation test with the device fingerprint while guaranteeing an acceptable visual quality. This strategy is totally blind, namely it does not require any prior about the device fingerprint to be removed.
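A minimal numerical sketch of the delete-and-reconstruct idea follows. The regular pixel grid, the step size and the crude 4-neighbour averaging are our simplifications; the actual method reconstructs the deleted pixels with regularization-based inpainting, which is what attenuates the PRNU while preserving visual quality.

```python
import numpy as np

def anonymize_by_inpainting(img, step=4):
    """Delete a regular grid of pixels (every `step`-th row/column
    intersection) and refill each deleted pixel with the mean of its
    surviving 4-neighbours -- a crude stand-in for regularized inpainting."""
    out = img.astype(float).copy()
    h, w = out.shape
    for y in range(0, h, step):
        for x in range(0, w, step):
            neigh = []
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                # Use only in-bounds neighbours that are not themselves deleted.
                if 0 <= ny < h and 0 <= nx < w and (ny % step or nx % step):
                    neigh.append(img[ny, nx])
            if neigh:
                out[y, x] = sum(neigh) / len(neigh)
    return out
```

On a smooth image the reconstruction error is small (good visual quality), while the high-frequency PRNU pattern at the deleted locations is replaced by interpolated values, weakening the correlation with the device fingerprint.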
Contrary to the inpainting strategy, we also present a method based on knowledge of the device PRNU, in which an image-wise anonymization loop is built upon a CNN-based noise extractor. Specifically, an autoencoder-inspired fully convolutional neural network is trained as an anonymization function via back-propagation, exploiting the possibilities offered by a recently introduced CNN-based denoising method. We follow a different perspective with respect to standard uses of CNNs: instead of training a CNN on many images to learn a generalizable model, we “overfit” the proposed CNN on each single image to be anonymized. In other words, we treat the CNN as a parametric operator.

Other applications
The know-how acquired during the PhD program on regularization techniques for inpainting and denoising of digital images, together with the experience matured on convolutional neural networks, has opened the possibility of solving similar problems in research fields far removed from forensics. For instance, even if they involve higher-dimensional data and different constraints and priors, denoising and inpainting problems are commonplace in geophysics applications as well. Examples can be found in the removal of random and/or coherent noise in the seismic pre-processing workflow, or in the reconstruction of regular and densely sampled seismic traces. Indeed, there is keen interest in the geophysics community in data denoising and interpolation. Given these premises, our research also includes investigations of CNN architectures for denoising and/or interpolating 2D common shot gathers. Inspired by the great results achieved in image processing and computer vision, we investigate a particular architecture referred to as U-net, which implements a convolutional autoencoder able to describe the complex features of clean and regularly sampled data in order to reconstruct the corrupted ones.

Future perspectives
The research done in this project enabled us to define a set of future research lines. As a matter of fact, we are currently working on source device identification for stabilized video sequences. Since in-camera video stabilization affects the PRNU residuals in video frames by introducing geometric transformations that differ from frame to frame, there is still room for developing more efficient and faster strategies to identify the source device and estimate a reliable fingerprint directly from stabilized videos. Differently from the model-based approaches already investigated, we aim at exploiting CNN-based strategies, conceptually similar to the one proposed for image source device identification.
Specifically, we set two goals: (i) given a device reference fingerprint, training a CNN that takes as input the video frames and the device fingerprint and returns the probability that the video sequence comes from that device; (ii) given a set of video frames coming from the same device, computing the reference fingerprint of the source device. The major challenge with respect to CNNs applied to images is that the architecture and training strategy must be chosen to be robust to the transformations introduced by stabilization. In other words, we aspire to exploit the CNN as a sort of video de-stabilizer: given a stabilized video, the network should learn how to follow the subtle sensor traces left on the frames in order to correctly realign them, effectively undoing the stabilization.