Academic Journal

Progressive Transformer Machine for Natural Character Reenactment.

Bibliographic Details
Title: Progressive Transformer Machine for Natural Character Reenactment.
Authors: YONGZONG XU, ZHIJING YANG, TIANSHUI CHEN, KAI LI, CHUNMEI QING
Source: ACM Transactions on Multimedia Computing, Communications & Applications; 2023 Suppl 2, Vol. 19, p1-22, 22p
Subject Terms: CONVOLUTIONAL neural networks, GENERATIVE adversarial networks, ELECTRIC transformers
Abstract: Character reenactment aims to control a target person's full-head movement with a driving monocular sequence made up of the driving character video. Current algorithms utilize convolutional neural networks in generative adversarial networks, which extract historical and geometric information to iteratively generate video frames. However, convolutional neural networks can merely capture local information with limited receptive fields and ignore the global dependencies that play a crucial role in face synthesis, leading to unnatural video frames. In this work, we design a progressive transformer module that introduces multi-head self-attention with convolution refinement to simultaneously capture global-local dependencies. Specifically, we utilize a non-overlapping window-based multi-head self-attention mechanism with a hierarchical architecture to obtain larger receptive fields on low-resolution feature maps and thus extract global information. To better model local dependencies, we introduce a convolution operation to further refine the attention weights in the multi-head self-attention mechanism. Finally, we use several stacked progressive transformer modules with a down-sampling operation to encode the appearance information of previously generated frames and the parameterized 3D face information of the current frame. Similarly, we use several stacked progressive transformer modules with an up-sampling operation to iteratively generate video frames. In this way, the model captures global-local information to facilitate generating video frames that are globally natural while preserving sharp outlines and rich detail. Extensive experiments on several standard benchmarks suggest that the proposed method outperforms current leading algorithms. [ABSTRACT FROM AUTHOR]
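The core mechanism the abstract describes — non-overlapping window-based self-attention whose attention weights are refined by a convolution before the softmax — can be illustrated with a minimal sketch. This is not the authors' implementation: it is a single-head, 1D-token simplification in NumPy, and the function name, window size, and the fixed smoothing kernel standing in for the learned convolution are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention_with_conv(x, window=4):
    """Sketch of non-overlapping window self-attention with a convolutional
    refinement of the attention weights (single head, 1D token sequence).

    x: (n_tokens, dim) feature array; n_tokens is assumed divisible by `window`.
    """
    n, d = x.shape
    out = np.empty_like(x)
    # A fixed smoothing kernel stands in for the paper's learned convolution
    # over attention weights (assumption for illustration only).
    kernel = np.array([0.25, 0.5, 0.25])
    for start in range(0, n, window):
        w = x[start:start + window]            # tokens in one local window
        scores = w @ w.T / np.sqrt(d)          # scaled dot-product scores
        # Convolutional refinement: smooth each row of the score map so
        # neighboring tokens influence each other's attention weights.
        refined = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"), 1, scores)
        attn = softmax(refined)                # normalize within the window
        out[start:start + window] = attn @ w   # attention-weighted aggregation
    return out

tokens = np.random.default_rng(0).normal(size=(8, 16))
result = window_attention_with_conv(tokens, window=4)
```

Because each window is processed independently, the cost is linear in the number of tokens; in the paper's hierarchical design, down-sampling between stacked modules is what lets these local windows cover progressively larger receptive fields.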
Copyright of ACM Transactions on Multimedia Computing, Communications & Applications is the property of Association for Computing Machinery and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index