Title:
Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding.
Authors:
Liu, Yi¹ (liuyi0089@gmail.com); Li, Chengxin¹ (s23150812015@smail.cczu.edu.cn); Xu, Shoukun¹ (jpuxsk@163.com); Han, Jungong² (jungonghan77@gmail.com)
Source:
International Journal of Computer Vision. Jul 2025, Vol. 133, Issue 7, p4483-4503. 21p.
Database:
Academic Search Index

*Abstract*

*Multi-modal fusion plays a vital role in multi-modal scene understanding. Most existing methods focus on cross-modal fusion between two modalities, overlooking the more complex fusion of many modalities that real-world applications such as autonomous driving require, where visible, depth, event, LiDAR, and other sensors are used together. Moreover, the few existing attempts at multi-modal fusion, e.g., simple concatenation, cross-modal attention, and token selection, cannot adequately mine the intrinsic shared and specific details of multiple modalities. To tackle this challenge, we propose a Part-Whole Relational Fusion (PWRF) framework. For the first time, this framework treats multi-modal fusion as part-whole relational fusion: it routes multiple individual part-level modalities to a fused whole-level modality using the part-whole relational routing ability of Capsule Networks (CapsNets). Through this part-whole routing, PWRF derives modal-shared semantics from the whole-level modal capsules and modal-specific semantics from the routing coefficients. These modal-shared and modal-specific details can then be employed for multi-modal scene understanding tasks, including synthetic multi-modal segmentation and visible-depth-thermal salient object detection in this paper. Experiments on several datasets demonstrate the superiority of the proposed PWRF framework for multi-modal scene understanding. The source code has been released at https://github.com/liuyi1989/PWRF. [ABSTRACT FROM AUTHOR]*
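
*Illustrative Sketch*

To make the part-whole routing idea concrete, below is a minimal sketch of CapsNet-style dynamic routing (Sabour et al., 2017) applied to part capsules pooled from several modalities, as the abstract describes. This is not the authors' released implementation (see the GitHub link above); the function names, tensor shapes, and hyperparameters (`part_whole_routing`, `n_iters`, the capsule dimensions) are illustrative assumptions. In this reading, the whole-level capsules stand in for the modal-shared semantics and the routing coefficients for the modal-specific semantics.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Standard capsule squashing non-linearity (Sabour et al., 2017):
    # shrinks short vectors toward 0 and long vectors toward unit length.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def part_whole_routing(part_caps, W, n_iters=3):
    """Route part-level modality capsules to whole-level capsules.

    part_caps: (B, N, d_in)          N = modalities x capsules per modality
    W:         (N, K, d_in, d_out)   learned part-to-whole transforms
    Returns:   whole capsules (B, K, d_out)   -> modal-shared semantics
               routing coefficients (B, N, K) -> modal-specific semantics
    """
    B, N, _ = part_caps.shape
    K = W.shape[1]
    # Prediction ("vote") vectors: u_hat[b, n, k] = W[n, k] @ part_caps[b, n]
    u_hat = torch.einsum('bni,nkio->bnko', part_caps, W)
    b_logits = torch.zeros(B, N, K, device=part_caps.device)
    for _ in range(n_iters):
        c = F.softmax(b_logits, dim=-1)           # routing coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)  # weighted vote aggregation
        v = squash(s)                             # whole-level capsules
        # Increase logits where votes agree with the current whole capsule.
        b_logits = b_logits + torch.einsum('bnko,bko->bnk', u_hat, v)
    return v, c

# Toy usage: 3 modalities (e.g. visible, depth, thermal), 8 part capsules each.
B, M, P, d_in, d_out, K = 2, 3, 8, 16, 32, 4
parts = torch.randn(B, M * P, d_in)
W = torch.randn(M * P, K, d_in, d_out) * 0.1
whole, coeffs = part_whole_routing(parts, W)
print(whole.shape, coeffs.shape)  # (2, 4, 32), (2, 24, 4)
```

Reshaping `coeffs` to (B, M, P, K) and aggregating per modality would yield one weight map per input modality, which is one plausible way the routing coefficients could carry modal-specific information; the paper itself should be consulted for how PWRF actually extracts and uses these semantics.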