Multi-Modal Prompt Learning on Blind Image Quality Assessment ...

Bibliographic details
Title:	Multi-Modal Prompt Learning on Blind Image Quality Assessment ...
Authors:	Pan, Wensheng; Gao, Timin; Zhang, Yan; Hu, Runze; Zheng, Xiawu; Zhang, Enwei; Gao, Yuting; Liu, Yutao; Shen, Yunhang; Li, Ke; Zhang, Shengchuan; Cao, Liujuan; Ji, Rongrong
Publication details:	arXiv
Publication year:	2024
Collection:	DataCite Metadata Store (German National Library of Science and Technology)
Subject terms:	Computer Vision and Pattern Recognition cs.CV, FOS Computer and information sciences
Description:	Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. However, the generalist nature of these pre-trained Vision-Language (VL) models often renders them suboptimal for IQA-specific tasks. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. Existing prompt-based VL models overly focus on incremental semantic information from text, neglecting the rich insights available from visual data analysis. This imbalance limits their performance improvements in IQA tasks. This paper introduces an innovative multi-modal prompt-based methodology for IQA. Our approach employs carefully crafted prompts ...
Document type:	article in journal/newspaper report
Language:	unknown
DOI:	10.48550/arxiv.2404.14949
Availability:	https://doi.org/10.48550/arxiv.2404.14949 https://arxiv.org/abs/2404.14949
Rights:	arXiv.org perpetual, non-exclusive license ; http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Accession number:	edsbas.75EF1814
Database:	BASE
