Next-generation technologies, ranging from driverless cars to immersive virtual reality, are expected to understand and analyse the surrounding world through a range of high-resolution sensors. In particular, 3D audio sensors will endow them with a clear spatial sense of the environment, on par with the human auditory system. Exploiting this information can provide agents and autonomous applications with the capability of localizing and conveying sounds more efficiently and with a higher level of perceptual awareness. At the same time, analysing raw 3D audio data in real time poses significant new research and implementation challenges that hinder successful deployment. Algorithms must be able to understand the spatial distribution of audio sources in the sound field while also allowing for efficient inference from the raw waveforms in a variety of applications.
The aim of the HYD3A project (pronounced 'idea') is to design a family of deep learning algorithms tailored to such 3D audio signals and suited for deployment in immersive environments. To accomplish this goal, the algorithms will leverage a new generation of deep neural networks that model and learn signals in hypercomplex (e.g., quaternion) domains.
Prior research has shown that 3D audio can be naturally modelled in a hypercomplex representation. HYD3A will build upon these insights to design a set of deep networks for analysing 3D audio captured by a variety of microphone sensors. Hypercomplex deep networks have the potential to significantly reduce network complexity with respect to state-of-the-art competitors (thus simplifying their on-device implementation), while allowing for a more accurate learning and optimization procedure.
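To illustrate the idea behind this claim, the following minimal sketch (not HYD3A code, and purely an assumed setup) packs the four channels of a first-order Ambisonics signal (W, X, Y, Z) into quaternions and applies a toy quaternion dense layer based on the Hamilton product. Because weights are shared across the four components of each quaternion, such a layer stores roughly a quarter of the real-valued parameters of an equivalent real layer acting on the flattened channels, which is the source of the complexity reduction mentioned above.

```python
import numpy as np

# Hypothetical sketch: a first-order Ambisonics (B-format) frame with channels
# W, X, Y, Z can be viewed as a quaternion-valued signal q = W + X*i + Y*j + Z*k,
# one quaternion per time sample.

def hamilton_product(q, p):
    """Hamilton product of two quaternions given as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = p
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quaternion_dense(x, weights):
    """Toy quaternion 'dense' layer.

    x: (n_in, 4) input quaternions; weights: (n_out, n_in, 4) quaternion weights.
    Each output quaternion is a sum of Hamilton products, so the layer stores
    4 * n_in * n_out real parameters, versus 16 * n_in * n_out for a real-valued
    layer mapping the flattened 4 * n_in inputs to 4 * n_out outputs.
    """
    n_out = weights.shape[0]
    out = np.zeros((n_out, 4))
    for o in range(n_out):
        for i in range(x.shape[0]):
            out[o] += hamilton_product(weights[o, i], x[i])
    return out

# Example: 8 Ambisonics samples mapped to 2 quaternion outputs.
rng = np.random.default_rng(0)
frame = rng.standard_normal((8, 4))      # 8 samples of (W, X, Y, Z)
W = rng.standard_normal((2, 8, 4))       # 64 real parameters
print(quaternion_dense(frame, W).shape)  # (2, 4); a real layer would need 256 weights
```

This is only a schematic of quaternion-domain processing under the assumptions stated above; HYD3A's actual architectures, layer types, and input formats are to be defined within the project.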
HYD3A is expected to have a positive impact, in both research and industry, on a range of problems involving the analysis of 3D audio, including immersive sound localization, audio enhancement, acoustic scene recognition, and audio super-resolution.