Academic Journal

SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger

Bibliographic Details
Title: SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger
Authors: Gao, Yuting; Liu, Jinfeng; Xu, Zihan; Wu, Tong; Zhang, Enwei; Li, Ke; Yang, Jie; Liu, Wei; Sun, Xing
Source: Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38 No. 3: AAAI-24 Technical Tracks 3; 1860-1868; 2374-3468; 2159-5399
Publication Information: Association for the Advancement of Artificial Intelligence
Publication Year: 2024
Collection: Association for the Advancement of Artificial Intelligence: AAAI Publications
Subject Terms: CV: Language and Vision, CV: Representation Learning for Vision
Description: Over the past two years, vision-language pre-training has achieved noteworthy success on several downstream tasks. Nevertheless, acquiring high-quality image-text pairs, where the pairs are entirely exclusive of each other, remains a challenging task, and noise exists in the commonly used datasets. To address this issue, we propose SoftCLIP, a novel approach that relaxes the strict one-to-one constraint and achieves a soft cross-modal alignment by introducing a softened target, which is generated from the fine-grained intra-modal self-similarity. The intra-modal guidance indicates that two pairs can share some local similarities and makes it possible to model many-to-many relationships between the two modalities. In addition, since the positive still dominates in the softened target distribution, we disentangle the negatives in the distribution to further boost the relation alignment with the negatives in cross-modal learning. Extensive experiments demonstrate the effectiveness of SoftCLIP. In particular, on the ImageNet zero-shot classification task, using CC3M/CC12M as the pre-training dataset, SoftCLIP brings a top-1 accuracy improvement of 6.8%/7.2% over the CLIP baseline.
Document Type: article in journal/newspaper
File Description: application/pdf
Language: English
Relation: https://ojs.aaai.org/index.php/AAAI/article/view/27955/27930; https://ojs.aaai.org/index.php/AAAI/article/view/27955/27931; https://ojs.aaai.org/index.php/AAAI/article/view/27955
DOI: 10.1609/aaai.v38i3.27955
Availability: https://doi.org/10.1609/aaai.v38i3.27955
https://ojs.aaai.org/index.php/AAAI/article/view/27955
Rights: Copyright (c) 2024 Association for the Advancement of Artificial Intelligence
Accession Number: edsbas.DB220F59
Database: BASE