Learning Situation Hyper-Graphs for Video Question Answering

Bibliographic Details
Title: Learning Situation Hyper-Graphs for Video Question Answering
Authors: Khan, Aisha Urooj, Kuehne, Hilde, Wu, Bo, Chheu, Kim, Bousselham, Walid, Gan, Chuang, Lobo, Niels, Shah, Mubarak
Publication Details: arXiv, 2023.
Publication Year: 2023
Subject Terms: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Description: Answering questions about complex situations in videos requires capturing not only the presence of actors, objects, and their relations but also the evolution of these relationships over time. A situation hyper-graph is a representation that describes situations as scene sub-graphs for video frames and hyper-edges for connected sub-graphs, and it has been proposed to capture all such information in a compact, structured form. In this work, we propose an architecture for Video Question Answering (VQA) that answers questions about video content by predicting situation hyper-graphs, coined Situation Hyper-Graph based Video Question Answering (SHG-VQA). To this end, we train a situation hyper-graph decoder to implicitly identify graph representations with actions and object/human-object relationships from the input video clip, and to use cross-attention between the predicted situation hyper-graphs and the question embedding to predict the correct answer. The proposed method is trained end-to-end and optimized by a VQA loss with the cross-entropy function and a Hungarian matching loss for the situation graph prediction. The effectiveness of the proposed architecture is extensively evaluated on two challenging benchmarks: AGQA and STAR. Our results show that learning the underlying situation hyper-graphs helps the system significantly improve its performance on novel challenges of video question-answering tasks.
DOI: 10.48550/arxiv.2304.08682
Open Access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::254d5c4a2eaf4bbd46db4c934d769ad4Test
Rights: OPEN
Accession Number: edsair.doi.dedup.....254d5c4a2eaf4bbd46db4c934d769ad4
Database: OpenAIRE
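The description states that SHG-VQA is optimized with a cross-entropy VQA loss plus a Hungarian matching loss on the predicted situation graphs. The Python sketch below illustrates that kind of combined objective under stated assumptions; it is not the authors' implementation. It assumes (hypothetically) that each decoder slot predicts a single class label, such as an action or a relationship, with index 0 reserved for an empty "no object" slot as in DETR-style set prediction. The names `NO_OBJECT`, `optimal_matching`, `matching_loss`, `total_loss`, and the `weight` parameter are illustrative. For readability the optimal one-to-one assignment is found by exhaustive search over permutations, which yields the same minimum-cost matching as the Hungarian algorithm but only scales to a handful of slots.

```python
import itertools
import math

NO_OBJECT = 0  # hypothetical class index reserved for an empty prediction slot


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of class `target` under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def optimal_matching(pred_probs, gt_labels):
    """Match each ground-truth label to a distinct prediction slot so that the
    total cost sum(-p_slot(label)) is minimal. Exhaustive search gives the same
    optimum as the Hungarian algorithm but is only feasible for tiny sets."""
    n, m = len(pred_probs), len(gt_labels)
    assert m <= n, "need at least as many prediction slots as targets"
    best_cost, best_slots = math.inf, ()
    for slots in itertools.permutations(range(n), m):
        cost = sum(-pred_probs[s][g] for s, g in zip(slots, gt_labels))
        if cost < best_cost:
            best_cost, best_slots = cost, slots
    return list(best_slots)  # best_slots[j] is the slot matched to gt_labels[j]


def matching_loss(pred_logits, gt_labels):
    """Mean cross-entropy after matching; unmatched slots must predict NO_OBJECT."""
    assert NO_OBJECT not in gt_labels
    probs = [softmax(logits) for logits in pred_logits]
    assigned = optimal_matching(probs, gt_labels)
    targets = [NO_OBJECT] * len(pred_logits)
    for slot, label in zip(assigned, gt_labels):
        targets[slot] = label
    losses = [cross_entropy(l, t) for l, t in zip(pred_logits, targets)]
    return sum(losses) / len(losses), assigned


def total_loss(answer_logits, answer_idx, pred_logits, gt_labels, weight=1.0):
    """Answer cross-entropy plus a (hypothetically) weighted graph matching loss."""
    vqa = cross_entropy(answer_logits, answer_idx)
    graph, _ = matching_loss(pred_logits, gt_labels)
    return vqa + weight * graph


# Toy example: 3 decoder slots, 3 classes (0 = no object), 2 ground-truth labels.
slot_logits = [[0.0, 5.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
gt = [2, 1]
loss, assignment = matching_loss(slot_logits, gt)
print(assignment)  # [2, 0]: label 2 -> slot 2, label 1 -> slot 0
print(total_loss([0.0, 3.0], 1, slot_logits, gt))
```

In the toy example the leftover slot 1 is trained toward the no-object class. Replacing `optimal_matching` with a real Hungarian solver (for example `scipy.optimize.linear_sum_assignment` on the cost matrix) would change only the running time, not the resulting assignment, up to ties.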