Using an LLM to Turn Sign Spottings into Spoken Language Sentences

Bibliographic Details
Title: Using an LLM to Turn Sign Spottings into Spoken Language Sentences
Authors: Sincan, Ozge Mercanoglu, Camgoz, Necati Cihan, Bowden, Richard
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
Description: Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a hybrid SLT approach, Spotter+GPT, that utilizes a sign spotter and a powerful Large Language Model (LLM) to improve SLT performance. Spotter+GPT breaks down the SLT task into two stages. The videos are first processed by the Spotter, which is trained on a linguistic sign language dataset, to identify individual signs. These spotted signs are then passed to an LLM, which transforms them into coherent and contextually appropriate spoken language sentences. The source code of the Spotter is available at https://gitlab.surrey.ac.uk/cogvispublic/sign-spotter.
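The two-stage pipeline described in the abstract (spotter output fed to an LLM) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gloss labels, the `build_llm_prompt` helper, and the prompt wording are all hypothetical, and the actual LLM call is omitted.

```python
# Hypothetical sketch of the Spotter+GPT two-stage idea:
# stage 1 yields a sequence of spotted sign glosses;
# stage 2 asks an LLM to turn them into a spoken-language sentence.

def build_llm_prompt(spotted_signs):
    """Format spotted sign glosses into a prompt for an LLM.

    The prompt wording here is an assumption for illustration only.
    """
    gloss_string = " ".join(spotted_signs)
    return (
        "Convert the following sign language glosses into a coherent "
        f"spoken language sentence: {gloss_string}"
    )

# Example glosses a spotter might emit for a short clip (hypothetical).
prompt = build_llm_prompt(["TOMORROW", "RAIN", "NORTH"])
print(prompt)
```

In the paper's setup, the resulting prompt would be sent to an LLM, whose free-form response is taken as the translation.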
Document Type: Working Paper
Open Access: http://arxiv.org/abs/2403.10434
Accession Number: edsarx.2403.10434
Database: arXiv