Progressive Transformer Machine for Natural Character Reenactment

Bibliographic Details
Title: Progressive Transformer Machine for Natural Character Reenactment
Authors: Yongzong Xu, Zhijing Yang, Tianshui Chen, Kai Li, Chunmei Qing
Source: ACM Transactions on Multimedia Computing, Communications, and Applications, 19:1-22
Publication Information: Association for Computing Machinery (ACM), 2023.
Publication Year: 2023
Subject Terms: Computer Networks and Communications, Hardware and Architecture
Description: Character reenactment aims to control a target person's full-head movement with a driving monocular sequence composed of a driving character video. Current algorithms use convolutional neural networks within generative adversarial networks, which extract historical and geometric information to iteratively generate video frames. However, convolutional neural networks can only capture local information with limited receptive fields and ignore the global dependencies that play a crucial role in face synthesis, leading to unnatural video frames. In this work, we design a progressive transformer module that introduces multi-head self-attention with convolution refinement to simultaneously capture global-local dependencies. Specifically, we use a non-overlapping window-based multi-head self-attention mechanism with a hierarchical architecture to obtain larger receptive fields on low-resolution feature maps and thus extract global information. To better model local dependencies, we introduce a convolution operation that further refines the attention weights in the multi-head self-attention mechanism. Finally, we use several stacked progressive transformer modules with down-sampling operations to encode the appearance information of previously generated frames and the parameterized 3D face information of the current frame. Similarly, we use several stacked progressive transformer modules with up-sampling operations to iteratively generate video frames. In this way, the model captures global-local information to facilitate generating video frames that are globally natural while preserving sharp outlines and rich detail. Extensive experiments on several standard benchmarks suggest that the proposed method outperforms current leading algorithms.
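Note: The following is a minimal, illustrative PyTorch sketch of the core idea described in the abstract, namely non-overlapping window-based multi-head self-attention whose attention weights are refined by a convolution. The class name, window size, head count, and the use of a depth-wise convolution on the attention map are assumptions made for illustration only; this is not the authors' implementation.

```python
# Hypothetical sketch: window-based multi-head self-attention with
# convolutional refinement of the attention weights. All names and
# hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class WindowAttentionWithConvRefine(nn.Module):
    def __init__(self, dim, num_heads=4, window_size=8):
        super().__init__()
        self.num_heads = num_heads
        self.window_size = window_size
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        # Depth-wise convolution over the attention map: injects local
        # neighbourhood structure into the otherwise global attention weights.
        self.attn_refine = nn.Conv2d(num_heads, num_heads, kernel_size=3,
                                     padding=1, groups=num_heads)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, H, W, C); H and W are assumed divisible by window_size.
        B, H, W, C = x.shape
        ws = self.window_size
        # Partition into non-overlapping windows -> (B * num_windows, ws*ws, C).
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        Bw, N, _ = x.shape

        qkv = self.qkv(x).reshape(Bw, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (Bw, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (Bw, heads, N, N)
        attn = attn + self.attn_refine(attn)           # convolutional refinement
        attn = attn.softmax(dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(Bw, N, C)
        out = self.proj(out)

        # Reverse the window partition back to (B, H, W, C).
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return out


# Usage sketch: one block applied to a 64x64 feature map with 96 channels.
if __name__ == "__main__":
    block = WindowAttentionWithConvRefine(dim=96, num_heads=4, window_size=8)
    feats = torch.randn(2, 64, 64, 96)
    print(block(feats).shape)  # torch.Size([2, 64, 64, 96])
```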
ISSN: 1551-6865, 1551-6857
Open Access: https://explore.openaire.eu/search/publication?articleId=doi_________::8f23a76a938fab0f3f78e5bbd153653b
https://doi.org/10.1145/3559107
Accession Number: edsair.doi...........8f23a76a938fab0f3f78e5bbd153653b
Database: OpenAIRE