Immersive Audio Rendering Using Higher Order Ambisonics

The creation of immersive 3-D sound over headphones has seen renewed interest of late, driven in part by technical advances in ultra high-definition video displays and interactive virtual reality headsets, as well as by growing production support for consumer 360-degree video and audio content generation and consumption. Binaural surround sound delivered over headphones is commonly used to accompany such immersive displays, and faces the challenge of rendering realistic (or hyper-real) soundfields that are experienced with good externalisation and localisation.

Binaural audio can be created by filtering source material with Head-Related Transfer Functions (HRTFs), which describe the interaction between a sound source at a given angle relative to the head and the head, torso and ears of a listener. HRTFs are typically measured at fixed angles relative to the head using probe microphones placed in a subject's ears or binaural dummy-head microphones. Such filters should ideally render source material so that it is externalised at the intended source angle and distance.
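At its core, this filtering is a convolution of the source signal with the left- and right-ear head-related impulse responses (HRIRs, the time-domain counterparts of HRTFs) for the desired direction. The sketch below illustrates the idea with purely illustrative placeholder impulse responses; a real renderer would use measured HRIRs from a dataset.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono source with the left/right HRIRs for one direction.

    Returns a (2, N) array: row 0 is the left-ear signal, row 1 the right.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Illustrative placeholders, NOT measured HRIRs: a source to the listener's
# left arrives earlier and louder at the left ear than at the right.
fs = 48000
mono = np.random.default_rng(0).standard_normal(fs // 10)
hrir_l = np.zeros(256); hrir_l[10] = 1.0   # short delay, full amplitude
hrir_r = np.zeros(256); hrir_r[40] = 0.6   # longer delay, attenuated
stereo = render_binaural(mono, hrir_l, hrir_r)
```

The interaural time and level differences built into the two impulse responses are what give the rendered signal its directional cue; with measured HRIRs the filters additionally encode the spectral shaping of the pinnae and torso.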

Rendering of sound source positions can be achieved by treating the measured HRTFs as virtual loudspeakers, where the measurement positions are chosen to represent the positions of a loudspeaker array around the head. In this research, Ambisonic soundfield decoding is used to generate the loudspeaker feeds, since Ambisonic recordings and mixes can be readily rotated via rotation matrices to counteract head movements detected by a motion tracker. Furthermore, Ambisonics is supported by soundfield recording microphones such as tetrahedral or spherical microphone arrays.
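The pipeline described above can be sketched for the first-order case: encode a source direction into B-format, rotate the soundfield to counter head yaw, and decode to virtual loudspeaker feeds. This is a minimal illustration assuming ACN channel ordering with SN3D normalisation and a basic "sampling" decoder, which is only one of the decoding schemes the research compares.

```python
import numpy as np

def foa_encode(azimuth, elevation):
    """First-order B-format (ACN order W, Y, Z, X; SN3D) encoding gains
    for a plane wave from the given direction (radians)."""
    w = 1.0
    y = np.sin(azimuth) * np.cos(elevation)
    z = np.sin(elevation)
    x = np.cos(azimuth) * np.cos(elevation)
    return np.array([w, y, z, x])

def yaw_rotation(yaw):
    """4x4 rotation of a first-order soundfield about the vertical axis.

    W and Z are unaffected by yaw; X and Y rotate in the horizontal plane,
    shifting every source azimuth by `yaw` radians.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    r = np.eye(4)
    r[1, 1], r[1, 3] = c, s    # Y' =  c*Y + s*X
    r[3, 1], r[3, 3] = -s, c   # X' = -s*Y + c*X
    return r

def sample_decode(b_format, speaker_dirs):
    """Basic 'sampling' decoder: project the B-format signal onto the
    encoding gains of each (azimuth, elevation) virtual loudspeaker."""
    gains = np.array([foa_encode(az, el) for az, el in speaker_dirs])
    return gains @ b_format
```

In a head-tracked binaural renderer, the rotation matrix would be updated each frame from the motion tracker, and each decoded feed would then be convolved with the HRTF pair for its virtual loudspeaker position; higher-order rendering follows the same structure with more channels per order.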

This research addresses significant open questions on the applicability of Binaural-Based Higher Order Ambisonics (BBHOA) to virtual reality applications. The perceptual timbral and spatial differences between BBHOA orders, decoding schemes, loudspeaker configurations and HRTF types are quantified in the context of head-tracked binaural rendering, against the commonly used Binaural-Based First Order Ambisonics (BBFOA) rendering. Characterising these differences, including how they vary across different HRTF datasets, is necessary to ensure high-quality consumption of Ambisonic content for the majority of listeners.

Members

  • Gavin Kearney
  • Calum Armstrong
  • Tom McKenzie
  • Lewis Thresh