Advanced Scene Understanding Methods and Applications in Multi-Modal Data (Submission Deadline: Nov. 26, 2025)
Chair: Weiyao Lin, Shanghai Jiaotong University, China
Co-chair: Xiankai Lu, Shandong University, China
Keywords:
Topic:
Summary:
Deep learning-based intelligent algorithms have demonstrated remarkable versatility across a wide range of domain-specific tasks. Despite these achievements, existing unimodal models still fall short of the diverse requirements of everyday applications. This has spurred researchers to explore pattern recognition over multimodal data, where models exemplified by CLIP have significantly advanced multimodal scene understanding. More recently, Large Multimodal Models (LMMs) such as Gemini (Google) and Sora (OpenAI) have further demonstrated powerful abilities in comprehending or creating realistic and imaginative videos. Although deep learning-based multimodal algorithms have attracted widespread attention, they still face numerous challenges when processing dynamic visual scenes, including:

- Integrating and aligning multimodal information (e.g., video, audio, 3D data)
- Addressing domain shift
- Handling noisy data and labeling defects
- Discovering novel objects or patterns

Furthermore, endowing these algorithms with temporal consistency and coherence remains a significant challenge for multimodal scene understanding. This special session aims to provide a platform for researchers to share the latest advances in multimodal model theories, methodologies, and applications. We also cordially invite submissions exploring the potential of multimodal data to enhance the diversity and inclusivity of scene understanding.