We use our auditory system not only to listen to and recognize sounds, but also to make spatial sense of the surrounding environment and to navigate in it. The sense of spatial immersion in a sound field allows a listener to clearly perceive each surrounding sound, as well as to recognize an acoustic environment characterized by certain sounds. The ELeSA project focuses mainly on 3D acoustic scene analysis and understanding, with the aim of detecting, localizing and classifying sound sources and describing their nature. This goal also entails enhancing the audio quality of the signals recorded by 3D microphone arrays within the acoustic scene surrounding the user. 3D acoustic scene analysis can have a great impact in many applications, including audio virtual reality, speech and sound recognition, and safety and security. The same approach can also be applied in many other fields, from telecommunications to electronics, physics and the manufacturing industry.
To accomplish this goal, suitable and powerful algorithms can be developed and implemented based on the advanced machine learning paradigm called end-to-end learning. With traditional machine learning methods, the output of a model is only as accurate as the features selected for the specific task. Optimal feature selection, however, requires a priori knowledge of the input signals, which is not always available in practical applications. End-to-end learning processes raw data directly, which makes it possible to handle more complex structured data and yields more natural and reliable output. The main drawback of end-to-end learning is the large amount of computational resources it requires, an issue that can be mitigated by using high-performance GPU servers.
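The contrast between the two paradigms can be illustrated with a minimal sketch. It is not the ELeSA implementation: the function names, the two hand-picked descriptors (energy and zero-crossing rate), the kernel sizes and the test signal are all assumptions chosen for illustration. The first function computes fixed features selected by a designer; the second applies a bank of FIR filters directly to the raw waveform. In an end-to-end system the filter coefficients would be trained jointly with the rest of the model, whereas here they are only randomly initialized and no training is performed.

```python
import math
import random


def handcrafted_features(frame):
    """Traditional front end: descriptors fixed in advance by the designer."""
    energy = sum(x * x for x in frame) / len(frame)
    # Fraction of adjacent sample pairs whose signs differ.
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (len(frame) - 1)
    return [energy, zcr]


def learned_front_end(frame, kernels):
    """Raw-waveform front end: one FIR filter (1-D convolution) per kernel,
    followed by ReLU and mean pooling over time. In end-to-end learning the
    kernel values would be trainable parameters (assumption: not trained here)."""
    feats = []
    for k in kernels:
        n_out = len(frame) - len(k) + 1
        acc = 0.0
        for i in range(n_out):
            y = sum(k[j] * frame[i + j] for j in range(len(k)))
            acc += max(y, 0.0)  # ReLU
        feats.append(acc / n_out)  # mean pooling
    return feats


# Hypothetical input: a 16 ms frame of a 440 Hz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(256)]

# Hypothetical filter bank: 4 randomly initialized kernels of 9 taps each.
random.seed(0)
kernels = [[random.uniform(-1.0, 1.0) for _ in range(9)] for _ in range(4)]

fixed = handcrafted_features(frame)
learned = learned_front_end(frame, kernels)
print("hand-crafted:", fixed)
print("raw-waveform:", learned)
```

The hand-crafted path always returns the same two descriptors regardless of the task, while the number and shape of the raw-waveform features depend only on the filter bank, which a training procedure would be free to adapt to the data.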
In the medium and long term, we expect the ELeSA project to have a positive impact on both industry and the research community, thanks to the wide range of solutions that can be applied to several scenarios.
The project will develop and synergistically integrate different audio processing, machine audition, deep learning, and auditory augmented reality technologies to improve the audio scene awareness and the safety of subjects wearing headsets for hearing protection or entertainment. ELeSA is not aimed at informing people about specific dangers as they occur, but at making them aware of what is going on around them so that they can protect themselves. It aims to restore and augment the audio scene perception we have when not wearing a headset, while still preserving all the benefits of hearing protection and infotainment.
Examples of application scenarios are workers in a factory wearing hearing protection (indoor scenario) and a user wearing a headset for infotainment (outdoor scenario). To make these scenarios possible, the research project will focus on different technologies that could have an important scientific, technological, social and economic impact.
From a scientific and technological point of view, the ELeSA project will represent a milestone for both the machine audition and the audio augmented reality research fields, owing to the wide range of cost-effective solutions it will develop. For the application scenarios mentioned above to be feasible, all operations need to satisfy hard real-time constraints, which is one of the main challenges of the proposed project. Many of these audio tools, in particular the machine audition solutions for acoustic scene understanding, are typically implemented offline without strict latency limitations. In the application contexts addressed here, by contrast, faster execution is required to reproduce realistic sound fields through headsets, with the aim of preventing risks and thus enhancing personal safety.
Many applications with high social impact could be developed with the real-time technologies studied in this project. The factory scenario could be extended to improve the audio awareness and safety of workers wearing hearing protection in outdoor construction sites, shipyards, or mines. The technologies could also be applied to in-vehicle infotainment systems. The vehicle and the infotainment audio shelter the driver and the passengers from the disturbing sounds of the road. Reproducing specific sounds, e.g., the sirens of emergency vehicles or horns, with the right directional cue could improve the safety of the driver and passengers. Augmented audio scene perception could be helpful in search and rescue missions after earthquakes, building collapses, or avalanches. It could be a fundamental aid for firefighters moving through a smoke-filled building in search of possible victims.
The novel real-time technologies could also be employed in the remote operation of robots or drones. Using microphone arrays, the robot or drone could analyze the surrounding audio scene, which would then be faithfully reproduced for a remote operator controlling its actions.
The technologies could also be integrated into hearing aids, again with the intent of increasing the user's awareness of the audio scene and the user's safety. They could be combined with machine vision to support the mobility and safety of blind people.
Audio surveillance systems could also benefit from the machine audition, sound detection and localization solutions developed within this project.
The technologies could also be adapted and exploited for applications not related to security, such as cultural heritage, games, audio virtual reality, and infotainment in general.
Recent commercial devices that could integrate and benefit from the scientific and technological achievements of this project include smart glasses, such as those recently proposed for augmented audio reality.
We expect the ELeSA project to have a positive impact on society, particularly thanks to the user safety tools that will be developed. We believe that the proofs of concept developed during the lifetime of the project will steer the attention of industry towards devices that combine headsets and compact microphone arrays, leading to innovative industrial products. The project will enable the development of novel audio and security industrial products, with foreseeable economic benefits both in terms of increased production and, more importantly, of reduced losses through lower incident rates thanks to enhanced audio awareness. The scientific, technological and social impact that the project can produce is also attested by recent work of well-known international researchers in audio and machine learning.