SHAHNAWAZ MUHAMMAD | Cycle: XXX
Tutor: MONTI-GUARNIERI ANDREA VIRGILIO
Advisor: SARTI AUGUSTO
Abstract:
Interactive Spatial Sound Rendering through Binaural Audio
Over the past few decades, a great deal of effort has gone into technologies for immersive and realistic multimedia experiences, which has led to novel applications in tele-collaboration, simulators for military training, and entertainment. So far, audio engineers have provided a matching level of immersion using multi-channel sound systems. More recently, however, the availability of personal 3D video rendering systems such as Oculus Rift, Morpheus and Samsung Gear has set a new challenge: it demands the same level of immersive audio experience on personal devices. The best-suited solution for this is binaural audio rendering.
Listening to ordinary audio gives listeners the impression that all sound sources are located at the position of the loudspeaker. Listening to 3D audio, by contrast, lets a user locate a sound source anywhere in space: hearing a bird fly overhead from one side to the other, a whisper in one ear, or music as if the user were present at a live performance. To provide realistic 3D audio immersion, a sound reproduction system should either produce the actual sound field around the user's head, or create a virtual impression of being in that sound field by producing the same sound pressure level in the user's ear canals as an actual sound source at a specific position in space would have produced. The first method, producing the actual sound field, requires a spatial arrangement of multiple loudspeakers around the listener, together with complex sound rendering systems and techniques. The sound field is spatially filtered by the listener's ear lobes, head and torso before it reaches the eardrums, so the anatomy of the ears, head and torso controls the perception of 3D sound. Sound reaches the two ears at different times: if a sound source is on our right side, the sound takes some extra microseconds to reach the left ear. The sound at the right ear will also be louder than the sound received at the left ear. These differences in arrival time and level are called the interaural time difference (ITD) and the interaural level difference (ILD), respectively. In addition, the interaction of the sound field with the anatomy of the ears, torso and head creates listener-specific spatial filtering effects, which can be modeled as the response of a linear time-invariant (LTI) FIR filter called the HRIR (head-related impulse response). The brain uses these cues to locate the sound source in 3D space. The second method is to bypass the complex rendering process and directly inject the 1D sound signals into the listener's ear canals using simple in-ear headphones.
Because this is done using just two playback channels, one for each ear, it is called binaural audio rendering. In this case the response of the listener's body, which was applied automatically in the first case, has to be incorporated into the sound signal explicitly. Because the anatomy of every individual's head is unique, these filters are idiosyncratic in nature. This means that, in order to provide 3D audio to a user, the personal HRIR of that specific user should be used; otherwise localization errors will result. The reproduction of spatial sound through binaural rendering can be considered a threefold problem: first, the virtualization of the body response (HRIR) of the listener; second, the incorporation of this response into the audio stream in order to reproduce personalized 3D audio for the listener; and third, the generation of binaural 3D sound content. The most important and crucial process is the measurement of the HRIR for every individual user. It requires lengthy acoustic measurements inside a semi-anechoic room and requires the user to stand still for their whole duration. It is also very costly in terms of money, effort and time.
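The second step above, incorporating the body response into the audio stream, amounts to filtering the source signal with one HRIR per ear. The following is a minimal sketch in Python, not the method developed in this research. It assumes a toy HRIR built only from an ITD (spherical-head Woodworth approximation) and a fixed ILD, whereas a measured HRIR also encodes the spectral filtering of the ears, head and torso. The head radius, the speed of sound, the 6 dB level difference and all function names are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source.

    Spherical-head (Woodworth) approximation, intended for azimuths
    between -90 and 90 degrees; positive means the source is on the right.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def toy_hrir_pair(azimuth_deg, sample_rate=44100, ild_db=6.0):
    """Build a crude (left, right) HRIR pair from ITD and ILD only."""
    itd = woodworth_itd(azimuth_deg)
    delay = int(round(abs(itd) * sample_rate))  # far-ear delay in samples
    # The fixed ILD is an assumption; no attenuation for a frontal source.
    far_gain = 10.0 ** (-ild_db / 20.0) if delay > 0 else 1.0
    near = [1.0] + [0.0] * delay
    far = [0.0] * delay + [far_gain]
    # Source on the right: the right ear is the near ear.
    return (far, near) if itd >= 0 else (near, far)


def convolve(signal, kernel):
    """Direct-form FIR filtering (full linear convolution)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with each ear's HRIR."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

For example, `render_binaural(mono, *toy_hrir_pair(90))` places a source on the listener's right: the right channel carries the signal unchanged, while the left channel is attenuated and delayed by roughly 0.66 ms. Replacing the toy pair with a listener's measured HRIRs leaves the rendering step itself unchanged.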
During the course of this PhD research, my aim is to explore the possibility of making the generation of the HRTF (head-related transfer function, the frequency-domain counterpart of the HRIR) as simple and unobtrusive as possible. To achieve this goal, a number of research directions can be pursued. The first and most crucial operation in which we can introduce simplifications is the measurement of the HRIR and the binaural cues. We will start by exploring ways of simplifying HRIR generation for an individual, initially starting from 3D models. Solutions already exist today that allow the HRIR to be pre-calculated from a 3D model of the head and ears. This spares the user from going into an anechoic chamber and through long and complex acoustic measurements, and shifts the problem to finding an accurate 3D model of the ear lobes and head. Several techniques are available for generating a 3D model in a simple and accurate fashion. Taking this direction will require exploring possible solutions either for extracting a 3D model of the user's ear lobes, or for obtaining a user model by interpolating between the existing 3D models available in a database. First we will try interpolating the existing ear lobe models and estimating the user's HRIR from the resulting model. Further extensions of this work could consist of skipping the interpolation of 3D models and working directly on the interpolation of HRIRs stored in a database, using different acoustic cues or images of the ear lobes, head and torso. Solutions of this sort could have a significant impact on the research community. For testing purposes we will also need to generate 3D audio content. This brings problems of its own and calls for additional tools, such as object-based acoustic scene modelling, upmixing and downmixing. I will decide which of these to pursue depending on the direction my thesis takes and on its needs.
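To make the idea of interpolating HRIRs stored in a database concrete, here is a hedged sketch in Python. It is only one possible scheme, not the approach this research commits to: the target HRIR is estimated as an inverse-distance-weighted average of database HRIRs, where distance is measured between feature vectors describing each subject. The function name, the feature vectors (for instance ear measurements) and the weighting exponent are all hypothetical.

```python
import math


def interpolate_hrir(target_features, database, power=2.0):
    """Inverse-distance-weighted average of database HRIRs.

    database: list of (feature_vector, hrir) pairs; all HRIRs must have
    the same length and all feature vectors the same dimension.
    """
    weights = []
    for features, hrir in database:
        dist = math.dist(target_features, features)
        if dist == 0.0:
            return list(hrir)  # exact match: reuse the stored HRIR
        weights.append(1.0 / dist ** power)
    total = sum(weights)
    length = len(database[0][1])
    # Weighted sample-by-sample average across all database entries.
    return [
        sum(w * hrir[n] for w, (_, hrir) in zip(weights, database)) / total
        for n in range(length)
    ]
```

A sample-by-sample average like this smears responses whose onsets are not aligned, so a practical version would likely time-align the HRIRs first and treat the delay separately.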
Advisor: Dr. Augusto Sarti