Multimedia Content Analysis

  • Multimedia indexing
  • Multimedia mining
  • Multimedia abstraction and summarisation
  • Multimedia annotation, tagging and recommendation
  • Multimodal analysis for retrieval applications
  • Semantic analysis of multimedia and contextual data
  • Interactive learning
  • Multimedia knowledge acquisition and construction
  • Multimedia verification
  • Multimedia fusion methods
  • Multimedia content generation

Multimedia Signal Processing and Communications

  • Media representation and algorithms
  • Multimedia sensors and interaction modes
  • Multimedia privacy, security and content protection
  • Multimedia standards and related issues
  • Multimedia databases, query processing, and scalability
  • Multimedia content delivery, transport and streaming
  • Wireless and mobile multimedia networking
  • Sensor networks (video surveillance, distributed systems)
  • Audio, image, video processing, coding and compression
  • Multi-camera and multi-view systems

Multimedia Applications, Interfaces and Services

  • Media content retrieval, browsing and recommendation tools
  • Extended reality (AR/VR/MR) and virtual environments
  • Real-time and interactive multimedia applications
  • Multimedia analytics applications
  • Egocentric, wearable and personal multimedia
  • Urban and satellite multimedia
  • Mobile multimedia applications
  • Question answering, multimodal conversational AI and hybrid intelligence
  • Multimedia authoring and personalisation
  • Cultural, educational and social multimedia applications
  • Multimedia for e-health and medical applications

Ethical, Legal and Societal Aspects of Multimedia

  • Fairness, accountability, transparency and ethics in multimedia modeling
  • Environmental footprint of multimedia modeling
  • Large multimedia models and LLMs
  • Multimodal pretraining and representation learning
  • Reproducibility, interpretability, explainability and robustness
  • Embodied multimodal applications and tasks
  • Responsible multimedia modeling and learning
  • Legal and ethical aspects of multimodal generative AI
  • Multimedia research valorisation
  • Digital transformation

MMM is a leading international conference where researchers and industry practitioners share new ideas, original research results, and practical development experiences from all MMM-related areas. The conference calls for research papers reporting original investigation results, and for demonstrations reporting novel and compelling applications.

The proceedings of previous editions of MMM can be found here.

MMM 2024 calls for submissions reporting novel and compelling demonstrations of MMM-related technologies, in all areas listed in the call for (regular) papers. All kinds of demonstrations of working systems, prototypes, or proofs of concept that show new solutions, interesting ideas, or new applications of multimedia systems are welcome.

Demonstration paper submissions have specific requirements for length, content, and supporting materials; please check the submission guidelines for details.

The Brave New Ideas track of MMM 2024 calls for papers that suggest new opportunities and challenges in the general domain of multimedia analytics and modelling. A BNI paper is expected to stimulate activity towards addressing new, long-term challenges of interest to the multimedia modelling community. Papers should address topics with clear potential for high societal impact; authors should be able to argue that their proposal is important for solving problems, for supporting new perspectives, or for providing services that positively impact people. Note that it is not necessary for papers in this track to include large-scale experimental results or comparisons to the state of the art: large, publicly available datasets may not exist, and there may be no existing approaches against which the proposed approach can be compared.

BNI papers should adhere to the same formatting guidelines and page limits as the Regular and Special Session papers.

Special session papers must follow the same guidelines as regular research papers with respect to restrictions on formatting, length, and double-blind review. Only MDRE papers will undergo single-blind review; authors will not have to anonymise their MDRE papers because of the inherent difficulty of doing so for open datasets.

  • MDRE: Multimedia Datasets for Repeatable Experimentation. This special session focuses on sharing data and code to allow other researchers to replicate research results, with the long-term goal of improving the performance of systems and the reproducibility of published papers.
  • MOMST: Multi-Object Multi-Sensor Tracking. This special session addresses the challenging problem of multi-object multi-sensor tracking in computer vision and machine learning, essential in applications such as surveillance systems, autonomous vehicles, and robotics.
  • MARGeM: Multimodal Analytics and Retrieval of Georeferenced Multimedia. This special session focuses on multimodal analytics and retrieval techniques for georeferenced multimedia data, addressing challenges in lifelog computing, urban computing, satellite computing, and earth observation.
  • ICDAR: Intelligent Cross-Data Analysis and Retrieval. This special session focuses on intelligent cross-data analytics and retrieval research, with the aim of bringing a smart, sustainable society to human beings.
  • XR-MACCI: eXtended Reality and Multimedia - Advancing Content Creation and Interaction. This special session focuses on the latest advancements in extended reality (XR) and multimedia technologies, including the development and integration of XR solutions with multimedia analysis, retrieval and processing methods.
  • FMM: Foundation Models for Multimedia. This special session focuses on the transformative impact of Foundation Models (FMs) such as large language models (LLMs) and large vision language models (LVLMs) and explores the future directions and challenges in harnessing FMs for multimedia applications.
  • MULTICON: Towards Multimedia and Multimodality in Conversational Systems. This special session aims to present the most recent works and applications for addressing the challenges and opportunities in developing multimedia and multimodality-enabled conversational systems and chatbots. Indicative domains of application include healthcare, education, immigration, customer service, finance and others.
  • CultMM: Cultural AI in Multimedia. This special session aims to bring together experts from Cultural AI and Multimedia to discuss the challenges surrounding cultural data, as well as the complexities of human culture, that require multimedia solutions.

As in previous years, VBS 2024 will be part of the International Conference on MultiMedia Modeling 2024 (MMM 2024) in Amsterdam, The Netherlands, and will be organised as a special side event to the Welcome Reception. It will be a moderated session in which participants solve Known-Item Search (KIS), Ad-Hoc Video Search (AVS), and Question Answering (Q/A) tasks, issued as live presentations of scenes of interest, either as a visual clip or as a textual description. The goal is to find the correct segments (exactly one segment for KIS, many segments for AVS) or the correct answer (for Q/A tasks) as fast as possible and submit it (for KIS and AVS: a segment description, i.e., video id and frame number) to the VBS server (DRES), which evaluates the correctness of submissions.

More information can be found at

The Benchmarking Initiative for Multimedia Evaluation (MediaEval) offers challenges related to multimedia analysis, retrieval and exploration. MediaEval tasks involve multiple modalities (e.g., audio, visual, textual, and/or contextual) and focus on the human and social aspects of multimedia. The larger aim is to promote reproducible research that makes multimedia a positive force for society.

More information can be found at