Thesis abstract:
In recent decades, the global amount of digital audio content has increased considerably, creating problems for organizing, navigating, and browsing music collections. Handling large music libraries calls for new paradigms. Current paradigms rely largely on the context-based approach, which uses music descriptors annotated manually by humans. The context-based approach involves both metadata, such as title or artist, and high-level features, which provide semantically rich information about other relevant aspects of music, such as its emotional content. High-level features can be classified as binary or dimensional descriptors. Binary descriptors express whether a feature applies to a music excerpt, whereas dimensional descriptors express how much it applies (for example, whether an excerpt is "happy" versus how happy it is). However, since context-based descriptors require manual annotation, they do not scale to large music libraries. Moreover, binary descriptors cannot capture how strongly a descriptor characterizes the audio content. New paradigms should therefore rely on dimensional high-level descriptors that can be computed automatically from the audio signal (the content-based approach). This task involves two levels: low-level features, extracted directly from the audio signal, which provide a numeric description suitable for scalable automatic computation; and high-level features, which are understandable by humans. The so-called "gap between low-level and high-level features" can be bridged by machine-learning techniques. The goal of this work is to develop dimensional, semantic high-level descriptors using machine-learning techniques.
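To make the distinction between low-level features, dimensional descriptors, and binary descriptors concrete, the following Python sketch computes one low-level feature (RMS energy) from synthetic excerpts, fits a one-feature least-squares model to hypothetical human ratings of an "energetic" label, and reads the model output both as a dimensional descriptor and, by thresholding, as a binary one. The feature, the linear model, the 0.5 threshold, the sine-wave excerpts, and the ratings are all illustrative assumptions; this work itself targets deep-learning models rather than linear regression.

```python
import math

SR = 8000  # sample rate in Hz (illustrative assumption)

def sine(amplitude, freq=440.0, seconds=0.1):
    """Synthetic mono excerpt standing in for real audio."""
    n = int(SR * seconds)
    return [amplitude * math.sin(2 * math.pi * freq * i / SR) for i in range(n)]

def rms(frame):
    """Low-level feature: root-mean-square energy, computed directly from samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def fit_linear(xs, ys):
    """Ordinary least squares for y = w * x + b with a single input feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx

def dimensional(model, feature):
    """Dimensional descriptor: how much the label applies, clipped to [0, 1]."""
    w, b = model
    return min(1.0, max(0.0, w * feature + b))

def binary(model, feature, threshold=0.5):
    """Binary descriptor: only whether the label applies (hypothetical threshold)."""
    return dimensional(model, feature) >= threshold

# Hypothetical training set: synthetic excerpts paired with made-up "energetic" ratings.
pairs = [(0.1, 0.1), (0.4, 0.4), (0.7, 0.6), (1.0, 0.9)]
train = [(sine(a), r) for a, r in pairs]
model = fit_linear([rms(x) for x, _ in train], [r for _, r in train])

quiet, loud = rms(sine(0.3)), rms(sine(0.9))
print(dimensional(model, quiet), binary(model, quiet))
print(dimensional(model, loud), binary(model, loud))
```

Both excerpts receive a graded value, but thresholding collapses them to `False` and `True`, discarding how strongly the label applies; this is the limitation of binary descriptors noted above.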
As far as low-level features are concerned, the research community has proposed many descriptors, and various machine-learning techniques have been applied to them. In the past few years, deep-learning techniques have been applied to music information retrieval (MIR) problems with promising results. We also intend to focus on deep-learning techniques, both to extract more effective low-level descriptors from the signal and to infer high-level descriptors. As far as high-level features are concerned, we focus in particular on dimensional semantic descriptors. These descriptors can model natural-language descriptions of music and are also well suited to visualizing and organizing large music libraries. Because music varies over time, high-level descriptors usually refer to excerpts within a song, and expressing the semantics of a whole song with a single, unambiguous value is difficult. We therefore also consider models that account for the time variance of music semantics.
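The point about time variance can be illustrated with a second sketch under stated assumptions: a synthetic "song" whose first second is quiet and whose second second is loud is split into fixed-length, non-overlapping excerpts, and a hypothetical energy-based dimensional descriptor is computed for each excerpt. The 0.1 s window, the placeholder weight `w = sqrt(2)`, and the mean/standard-deviation summary are assumptions chosen for illustration, not the time-aware models considered in this work.

```python
import math
import statistics

SR = 8000  # sample rate in Hz (illustrative assumption)

def sine(amplitude, seconds, freq=440.0):
    """Synthetic mono signal standing in for real audio."""
    n = int(SR * seconds)
    return [amplitude * math.sin(2 * math.pi * freq * i / SR) for i in range(n)]

def excerpts(signal, size, hop):
    """Split a signal into fixed-length excerpts (non-overlapping when hop == size)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def rms(frame):
    """Low-level feature: root-mean-square energy of one excerpt."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def energy_descriptor(frame, w=math.sqrt(2), b=0.0):
    """Hypothetical dimensional descriptor in [0, 1].

    w and b are placeholder weights, not learned ones: with w = sqrt(2),
    a full-scale sine (RMS = 1/sqrt(2)) maps to 1.0.
    """
    return min(1.0, max(0.0, w * rms(frame) + b))

# Hypothetical song: one quiet second followed by one loud second.
song = sine(0.2, 1.0) + sine(0.9, 1.0)
size = SR // 10  # 0.1 s excerpts
trajectory = [energy_descriptor(e) for e in excerpts(song, size, hop=size)]

song_level = statistics.mean(trajectory)  # one value for the whole song
spread = statistics.pstdev(trajectory)    # how much the descriptor varies over time
print(f"first excerpt {trajectory[0]:.2f}, last excerpt {trajectory[-1]:.2f}")
print(f"song-level mean {song_level:.2f}, spread {spread:.2f}")
```

The single song-level mean (0.55 here) describes neither the quiet nor the loud half, while the excerpt-level trajectory preserves the change; keeping per-excerpt values, or modeling their evolution, is one way to retain that information.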