BORRELLI CLARA | Cycle: XXXIV
Tutor: CESANA MATTEO
Advisor: SARTI AUGUSTO
Major Research topic: Merging machine learning and signal processing methods for audio applications.
Abstract:
My research project focuses on the possibilities offered by machine learning techniques in audio analysis applications. In recent years, artificial intelligence and deep learning have been central topics in computer science research and have been extensively applied to audio signal processing. A common framework consists of a feature learning step followed by a classification step. These data-driven methods rely on huge collections of data and often avoid making any assumption about the characteristics of the signals under consideration, thus ignoring prior knowledge of the context under analysis. This blind approach has led to ground-breaking results, but it is often limited by the availability of a large dataset, which may contain noisy or wrongly labeled samples, and it takes no advantage of the intrinsic properties of audio signals.
My goal is to explore the possibilities offered by classic signal processing techniques for audio signals (e.g. sub-band selection by pre-filtering, noise or interference cancellation, data selection, …) and to embed information from the surrounding environment or the application considered into the feature learning step. A model-based approach should be used to design a meaningful representation of the audio signals, while a data-driven approach should be employed to learn a solution to complex problems. This idea can be applied in different audio applications.
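As a minimal sketch of the sub-band selection idea mentioned above, the following Python example computes a log-magnitude spectrogram and keeps only the frequency bins inside a band chosen from prior knowledge of the task. The function name, the frame and hop sizes, and the 300-3400 Hz band in the usage note are illustrative assumptions, not part of the research project itself; the resulting matrix would then be passed to whatever data-driven model is being trained.

```python
import numpy as np


def subband_log_spectrogram(signal, sample_rate, f_low, f_high,
                            frame_len=1024, hop=512):
    """Log-magnitude STFT frames restricted to the band [f_low, f_high] Hz.

    Hypothetical helper: the band limits encode prior (model-based)
    knowledge, while the returned features feed a data-driven learner.
    """
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short signals so that at least one frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))

    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)

    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])

    # Magnitude spectrum of each frame and the matching frequency axis.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    # Sub-band selection: keep only the bins inside the chosen band.
    band = (freqs >= f_low) & (freqs <= f_high)
    features = np.log(spectrum[:, band] + 1e-10)
    return features, freqs[band]
```

For instance, calling `subband_log_spectrogram(x, 16000, 300.0, 3400.0)` on a speech recording `x` sampled at 16 kHz would discard content outside a telephone-like band before any feature learning takes place; a different task would call for a different band.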
Portable devices for audio reproduction, new compression algorithms and the rise of many music streaming services have allowed users to access large music databases anywhere and at any time. An urgent need to analyse and understand this new way of “consuming” music has arisen, leading to the birth of the music information retrieval research field. The aforementioned approach can be applied to classical computer music tasks such as genre recognition, instrument detection, emotion recognition, playlist generation and many others. The feature learning phase should be tailored to the specific task by introducing appropriate pre-processing blocks, properly designed filters or a dataset cleaning step.
On the other hand, acquisition devices are becoming cheaper and smaller. Microphones can be embedded and distributed in any environment, allowing the design of efficient surveillance systems and the collection of recordings from a variety of acoustic scenes. The audio forensics research field can greatly benefit from these datasets, for example by adopting machine learning techniques to solve open problems such as audio integrity verification. In fact, audio manipulation has become easier and cheaper, and can be performed automatically by dedicated software applications. By exploiting large collections of audio recorded in real environments, machine learning and deep learning methods can greatly improve the detection of artefacts in maliciously modified audio signals. The proposed approach to feature design can be effective here as well, with descriptors tailored to the context considered (speech, noisy or controlled environment, …). I am also interested in exploring the potential of this solution for other audio forensics problems, such as signal enhancement or audio event detection.