The Fraunhofer IAIS Mining Platform is a microservice-based system for automatically extracting insights from text, audio, image, and video at scale. A large number of AI-based services for analyzing these modalities have already been integrated: text documents can be processed with named entity recognition, keyword extraction, topic modeling, or smart keywording services. Speech can be transcribed using the Fraunhofer IAIS Audio Mining solution, and faces can be detected and recognized in both image and video files.
Thanks to the platform's modular structure, additional services can easily be integrated, and AI-based services for object recognition, concept detection, cut and scene detection, and semantic keyframe extraction are currently under development. All services come with ready-made models trained on openly available datasets, but it is also possible to train and use customer-specific models, an advantage that other systems have not offered so far.
The individual steps for analyzing media files are governed by a workflow component, which makes large-scale insight extraction robust and fail-safe. Explicitly orchestrating the workflow steps also makes it possible to prioritize the analysis of media files by urgency. This is relevant, for example, when the Mining Platform is used for the daily business of news editors and journalists.
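Prioritizing by urgency can be illustrated with a small priority queue. This is only a minimal sketch and not the platform's actual workflow component: the names `AnalysisJob` and `JobQueue`, the file names, and the convention that a lower number means higher urgency are all assumptions made for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class AnalysisJob:
    """Hypothetical job record; not part of the Mining Platform API."""
    priority: int   # assumed convention: lower number = more urgent
    sequence: int   # tie-breaker that preserves submission order
    media_file: str = field(compare=False)


class JobQueue:
    """Hypothetical queue that hands out the most urgent job first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, media_file, priority):
        job = AnalysisJob(priority, next(self._counter), media_file)
        heapq.heappush(self._heap, job)

    def next_job(self):
        # Returns the file name of the most urgent job, or None if empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap).media_file


queue = JobQueue()
queue.submit("archive_footage.mp4", priority=5)
queue.submit("breaking_news.mp4", priority=0)
queue.submit("weekly_podcast.wav", priority=5)

order = [queue.next_job() for _ in range(3)]
print(order)
```

In this sketch the urgent file is processed first, and files with equal priority keep their submission order.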
The workflow component also allows modeling more complex media analysis workflows. For example, a video could be processed using face recognition (and other visual analysis services), while Audio Mining transcribes its speech content and recognizes the speakers. The extracted insights (the persons recognized in the video and the detected speakers) could then be linked, which would make it possible to detect when a person is shown in the video while also talking. The transcript could then be analyzed further using text mining services to identify topics as well as the people, places, or institutions that are mentioned.
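The linking step can be sketched as an interval intersection over time-stamped results. The example below is an illustration under stated assumptions, not the platform's implementation: it assumes that face recognition and speaker recognition each return segments with a person label and start and end times in seconds, and the names `Segment` and `on_screen_while_talking` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical result record: who was detected, and when (seconds)."""
    person: str
    start: float
    end: float


def on_screen_while_talking(faces, speakers):
    """Return the time spans in which a person is both visible and speaking."""
    results = []
    for face in faces:
        for speech in speakers:
            if face.person != speech.person:
                continue
            # Intersection of the two time intervals.
            start = max(face.start, speech.start)
            end = min(face.end, speech.end)
            if start < end:
                results.append(Segment(face.person, start, end))
    return sorted(results, key=lambda s: s.start)


# Made-up example data, not real analysis output.
faces = [Segment("Person A", 0.0, 12.0), Segment("Person B", 10.0, 30.0)]
speakers = [Segment("Person A", 8.0, 15.0), Segment("Person B", 40.0, 50.0)]

overlaps = on_screen_while_talking(faces, speakers)
print(overlaps)
```

Here only Person A is reported, for the span from 8 to 12 seconds, because Person B is never visible while speaking. The nested loop is adequate for a sketch; a sweep over sorted segments would scale better for long videos.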
Future Developments
To focus our development work and to benefit our customers, the Audio Mining components will be integrated into our Mining Platform. Because the Mining Platform can be customized as needed, such an integrated platform can serve both customers interested in automatic speech recognition services and customers in need of a versatile media analysis platform.
The availability of video mining services will facilitate the development of more complex analysis workflows that analyze videos comprehensively and combine the generated insights into a broader and deeper understanding of their content. In the field of speech recognition, we plan to extend our systems to other European languages and to support more dialects and accents.