Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

Bibliographic Details
Title: Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios
Authors: Xu, Jialei, Liu, Xianming, Jiang, Junjun, Jiang, Kui, Li, Rui, Cheng, Kai, Ji, Xiangyang
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
Description: Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging under such conditions, they are inherently low-resolution and lack the rich texture and semantics delivered by RGB images. Current methods focus solely on a single modality due to the difficulty of identifying and integrating faithful depth cues from both sources. To address these issues, this paper presents a novel approach that identifies and integrates dominant cross-modality depth features within a learning-based framework. Concretely, we independently compute coarse depth maps with separate networks, fully utilizing the individual depth cues of each modality. As the advantageous depth cues spread across both modalities, we propose a novel confidence loss that steers a confidence predictor network to yield a confidence map specifying latent potential depth areas. With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner. Harnessing the proposed pipeline, our method demonstrates robust depth estimation across a variety of difficult scenarios. Experimental results on the challenging MS$^2$ and ViViD++ datasets demonstrate the effectiveness and robustness of our method.
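The description outlines a confidence-guided fusion pipeline: two branches produce coarse depth maps (RGB and infrared), a confidence predictor marks which branch is more trustworthy per pixel, and a fusion step combines them. A minimal numpy sketch of that weighting idea is below; the function names, the linear blending scheme, and the confidence target are assumptions for illustration, not the paper's actual networks or loss.

```python
import numpy as np

def fuse_depths(depth_rgb, depth_ir, confidence):
    """Blend two coarse depth maps with a per-pixel confidence map.

    confidence near 1 trusts the RGB branch; near 0 trusts the
    infrared branch. (Hypothetical stand-in for the paper's learned
    fusion network.)
    """
    confidence = np.clip(confidence, 0.0, 1.0)
    return confidence * depth_rgb + (1.0 - confidence) * depth_ir

def confidence_target(depth_rgb, depth_ir, depth_gt):
    """A simple proxy supervision signal for a confidence predictor:
    1 where the RGB estimate is closer to ground truth, 0 where the
    infrared estimate is closer. (Assumed, not the paper's loss.)
    """
    err_rgb = np.abs(depth_rgb - depth_gt)
    err_ir = np.abs(depth_ir - depth_gt)
    return (err_rgb < err_ir).astype(np.float32)

# Toy example: confidence 1 selects the RGB value, 0 selects the IR value.
d_rgb = np.array([[1.0, 5.0]])
d_ir = np.array([[2.0, 4.0]])
c = np.array([[1.0, 0.0]])
fused = fuse_depths(d_rgb, d_ir, c)  # → [[1.0, 4.0]]
```

In the paper the confidence map is predicted by a network trained with the proposed confidence loss and the fusion is itself learned end-to-end; the linear blend above only illustrates how a per-pixel confidence map can steer between modality-specific depth estimates.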
Document Type: Working Paper
Open Access: http://arxiv.org/abs/2402.11826
Accession Number: edsarx.2402.11826
Database: arXiv