Most TV stations are well used to their audience complaining about hard-to-understand dialogue – be it in films, documentaries, sports coverage, or even the news. The problem is not an easy one to solve: first, because the loudness balance between background sound and dialogue is a creative decision made anew for every piece of content, and second, because the “perfect” dialogue loudness is a very personal matter.
The evolution of AI-based technologies and object-based audio (OBA), however, has enabled the creation of technologies such as MPEG-H Dialog+ by Fraunhofer IIS. The technology uses deep neural networks (DNNs) to automatically identify the dialogue in existing content, separate it from the background sounds, and remix it with a lowered background level. Thanks to OBA, users can even adjust the dialogue level on their device to match their personal preferences.
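To make the remix step concrete, here is a minimal sketch of what happens after the DNN has separated the content into stems: the background stem is attenuated by a fixed amount and summed back with the dialogue stem. This is an illustrative sketch only, not the actual MPEG-H Dialog+ implementation; the file names, the `numpy`/`soundfile` tooling, and the 12 dB attenuation are all assumptions.

```python
# Illustrative remix step, assuming a DNN has already produced
# time-aligned dialogue and background stems as WAV files.
# File names and the 12 dB attenuation are placeholder assumptions,
# not part of MPEG-H Dialog+ itself.
import numpy as np
import soundfile as sf

def remix(dialogue_path, background_path, out_path, background_gain_db=-12.0):
    """Mix the dialogue stem with an attenuated background stem."""
    dialogue, rate = sf.read(dialogue_path)
    background, rate_bg = sf.read(background_path)
    assert rate == rate_bg, "stems must share a sample rate"
    # Convert the dB attenuation into a linear gain factor.
    gain = 10.0 ** (background_gain_db / 20.0)
    # Trim both stems to a common length, sum them, and clip to avoid overload.
    n = min(len(dialogue), len(background))
    mix = dialogue[:n] + gain * background[:n]
    sf.write(out_path, np.clip(mix, -1.0, 1.0), rate)

remix("dialogue.wav", "background.wav", "dialogue_plus.wav")
```

With OBA, the same gain would instead be applied at playback time on the user's device, which is what allows the dialogue level to be personalized rather than fixed at production.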
Recently, Fraunhofer IIS joined forces with the German public broadcaster WDR and Telos Alliance to develop a professional workflow and bring MPEG-H Dialog+ into use. Fraunhofer IIS conducted field tests via DVB and on the VoD platform “ARD Mediathek” to refine the requirements and production workflows. The results were then fed into the product development of the Telos Alliance Minnetonka AudioTools Server Dialog+ module. The software has now been implemented as part of an automatic workflow – from archive to transcoding farm – within the WDR production infrastructure.
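The article does not describe how the automation is wired up. Purely as a hypothetical illustration of such an archive-to-transcoding hand-off, a simple watch-folder loop might look like the sketch below; the paths and the `process_item.sh` command are invented placeholders, not the actual WDR or AudioTools Server interface.

```python
# Hypothetical watch-folder automation sketch; all paths and the
# process_item.sh command are invented placeholders and do not reflect
# the actual WDR or Telos Alliance AudioTools Server interface.
import subprocess
import time
from pathlib import Path

WATCH = Path("/archive/incoming")   # drop folder fed from the archive
READY = Path("/transcoding/ready")  # hand-off folder for the transcoding farm

while True:
    for item in sorted(WATCH.glob("*.wav")):
        # Run the (placeholder) Dialog+ processing step on each new file.
        subprocess.run(
            ["./process_item.sh", str(item), str(READY / item.name)],
            check=True,
        )
        item.unlink()  # remove the source so each file is processed only once
    time.sleep(30)  # poll the drop folder every 30 seconds
```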