Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection

Bibliographic Details
Title:	Object-Centric Masked Image Modeling-Based Self-Supervised Pretraining for Remote Sensing Object Detection
Authors:	Tong Zhang, Yin Zhuang, He Chen, Liang Chen, Guanqun Wang, Peng Gao, Hao Dong
Source:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol 16, Pp 5013-5025 (2023)
Publication Information:	IEEE, 2023.
Publication Year:	2023
Collection:	LCC:Ocean engineering LCC:Geophysics. Cosmic physics
Subject Terms:	Masked image modeling (MIM), object detection, optical remote sensing, self-supervised learning, vision transformer (ViT), Ocean engineering, TC1501-1800, Geophysics. Cosmic physics, QC801-809
Description:	Masked image modeling (MIM) has proven to be an optimal pretext task for self-supervised pretraining (SSP): it helps a model capture an effective task-agnostic representation during pretraining, which then improves fine-tuning performance on various downstream tasks. However, at the high random masking ratio that MIM uses, scene-level MIM-based SSP struggles to capture small-scale objects and local details in complex remote sensing scenes. When pretrained models that capture mostly scene-level information are then applied directly to object-level fine-tuning, there is an obvious representation learning misalignment between the pretraining and fine-tuning steps. This article therefore proposes a novel object-centric masked image modeling (OCMIM) strategy that helps the model capture object-level information during pretraining and thereby improves the object detection fine-tuning step. First, to better learn object-level representations covering full scales and multiple categories in MIM-based SSP, a novel object-centric data generator automatically sets up targeted pretraining data according to the objects themselves, providing data conditions specific to object detection model pretraining. Second, an attention-guided mask generator produces a guided mask for the MIM pretext task, which leads the model to learn more discriminative representations of highly attended object regions than a random masking strategy does. Finally, experiments on six remote sensing object detection benchmarks show that the proposed OCMIM-based SSP strategy is a better pretraining approach for remote sensing object detection than commonly used methods.
Document Type:	article
File Description:	electronic resource
Language:	English
ISSN:	2151-1535
Relation:	https://ieeexplore.ieee.org/document/10129022/; https://doaj.org/toc/2151-1535
DOI:	10.1109/JSTARS.2023.3277588
Open Access:	https://doaj.org/article/415638424ffe4b1cb98e0b51f3e25558
Accession Number:	edsdoj.415638424ffe4b1cb98e0b51f3e25558
Database:	Directory of Open Access Journals
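
Illustrative note: the description contrasts an attention-guided mask with the random masking normally used in MIM. The record does not say how the guided mask is built, so the following is only a minimal sketch under stated assumptions: patches are indexed on a flat grid, a per-patch attention score is already available, and the guided mask simply selects the most highly attended patches. The function names, the top-k rule, the toy scores, and the 0.25 mask ratio are all hypothetical and are not taken from the article.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Baseline: mask a uniformly random subset of patch indices."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def attention_guided_mask(attention, mask_ratio):
    """Mask the most highly attended patches.

    The top-k selection rule is an assumption made for illustration;
    the article's actual mask generator may differ.
    """
    num_masked = int(len(attention) * mask_ratio)
    ranked = sorted(range(len(attention)),
                    key=lambda i: attention[i], reverse=True)
    return sorted(ranked[:num_masked])


# Toy 4x4 patch grid (flattened row by row); the high scores in the
# centre stand in for an attended object region.
attention = [
    0.01, 0.02, 0.01, 0.01,
    0.02, 0.20, 0.25, 0.02,
    0.01, 0.22, 0.18, 0.01,
    0.01, 0.01, 0.01, 0.01,
]

rng = random.Random(0)
baseline = random_mask(len(attention), 0.25, rng)
guided = attention_guided_mask(attention, 0.25)

print("random mask:", baseline)
print("guided mask:", guided)  # guided mask: [5, 6, 9, 10]
```

In this toy case the guided mask covers exactly the four central patches, whereas the random mask selects patches without regard to where the object lies.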
