Academic dissertation

Neural Rendering Techniques for Photo-realistic Image Generation and Novel View Synthesis

Bibliographic details
Title: Neural Rendering Techniques for Photo-realistic Image Generation and Novel View Synthesis
Authors: Meshry, Moustafa
Contributors: Shrivastava, Abhinav; Davis, Larry S.; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.); Computer Science
Publication year: 2022
Collection: University of Maryland: Digital Repository (DRUM)
Subject terms: Artificial intelligence, Computer science, Generative Adversarial Networks (GANs), Image synthesis, Neural Radiance Fields (NeRF), Neural rendering, Neural talking heads
Description: Recent advances in deep generative models have enabled computers to imagine and generate fictional images from any given distribution of images. Techniques like Generative Adversarial Networks (GANs) and image-to-image (I2I) translation can generate images by mapping random noise or an input image (e.g., a sketch or a semantic map) to photo-realistic images. However, there are still many challenges in training such models and in improving their output quality and diversity. Furthermore, to harness this imaginative and generative power for real-world applications, we need to control different aspects of the rendering process; for example, to specify the content and/or style of generated images, camera pose, lighting, etc. One challenge in training image generation models is the multi-modal nature of image synthesis. An image with a specific content, such as a cat or a car, can be generated with countless choices of styles (e.g., colors, lighting, and local texture details). To enable user control over the generated style, previous works train multi-modal I2I translation networks, but these suffer from complicated and slow training, and their training is specific to one target image domain. We address this limitation and propose a style pre-training strategy that generalizes across many image domains, improves training stability and speed, and improves performance in terms of output quality and diversity. Another challenge for GANs and I2I translation is providing 3D control over the rendering process. For example, applications such as AR/VR, virtual tours, and telepresence require generating consistent images or videos of 3D environments. However, GANs and I2I translation mainly operate in 2D, which limits their use for such applications. To address this limitation, we propose to condition image synthesis on coarse geometric proxies (e.g., a point cloud, a coarse mesh, or a voxel grid), and we augment these rough proxies with machine-learned components to fix and ...
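The description centers on multi-modal image-to-image translation, where one content input (e.g., a semantic map) can be rendered in many styles. The sketch below illustrates that idea only; it is a minimal, hypothetical PyTorch toy model, not the architecture proposed in the thesis. The generator takes a content image plus a random style code, so sampling different codes yields different renderings of the same content.

```python
# Minimal sketch of style-conditioned (multi-modal) I2I generation.
# Architecture, layer sizes, and the style-conditioning scheme are illustrative assumptions.
import torch
import torch.nn as nn

class StyleConditionedGenerator(nn.Module):
    def __init__(self, in_channels=3, style_dim=8, base=32):
        super().__init__()
        # Encode the content image (e.g., a semantic map or sketch).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
        )
        # Map the style code to per-channel scale and shift for simple feature modulation.
        self.style_mlp = nn.Linear(style_dim, base * 2)
        # Decode modulated features to an RGB image.
        self.decoder = nn.Sequential(
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, content, style):
        h = self.encoder(content)
        scale, shift = self.style_mlp(style).chunk(2, dim=1)
        h = h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        return self.decoder(h)

if __name__ == "__main__":
    g = StyleConditionedGenerator()
    content = torch.randn(1, 3, 64, 64)   # stand-in for a semantic map
    for _ in range(2):                     # two style codes -> two renderings of the same content
        style = torch.randn(1, 8)
        print(g(content, style).shape)     # torch.Size([1, 3, 64, 64])
```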
Document type: doctoral or postdoctoral thesis
File description: application/pdf
Language: English
Relation: https://doi.org/10.13016/byyk-qzwm; http://hdl.handle.net/1903/29327
DOI: 10.13016/byyk-qzwm
Availability: https://doi.org/10.13016/byyk-qzwm
http://hdl.handle.net/1903/29327
Accession number: edsbas.9FF5AE
Database: BASE