Academic dissertation
Neural Rendering Techniques for Photo-realistic Image Generation and Novel View Synthesis
Title: | Neural Rendering Techniques for Photo-realistic Image Generation and Novel View Synthesis |
Author: | Meshry, Moustafa |
Contributors: | Shrivastava, Abhinav; Davis, Larry S.; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.); Computer Science |
Publication year: | 2022 |
Collection: | University of Maryland: Digital Repository (DRUM) |
Subject terms: | Artificial intelligence; Computer science; Generative Adversarial Networks (GANs); Image synthesis; Neural Radiance Fields (NeRF); Neural rendering; Neural talking heads |
Description: | Recent advances in deep generative models have enabled computers to imagine and generate fictional images from any given distribution of images. Techniques like Generative Adversarial Networks (GANs) and image-to-image (I2I) translation can generate images by mapping random noise or an input image (e.g., a sketch or a semantic map) to photo-realistic images. However, there are still many challenges in training such models and improving their output quality and diversity. Furthermore, to harness this imaginative and generative power for real-world applications, we need to control different aspects of the rendering process; for example, to specify the content and/or style of generated images, camera pose, lighting, etc. One challenge in training image generation models is the multi-modal nature of image synthesis. An image with specific content, such as a cat or a car, can be generated with countless different styles (e.g., colors, lighting, and local texture details). To enable user control over the generated style, previous works train multi-modal I2I translation networks, but these suffer from complicated and slow training that is specific to one target image domain. We address this limitation and propose a style pre-training strategy that generalizes across many image domains, improves training stability and speed, and improves performance in terms of output quality and diversity. Another challenge for GANs and I2I translation is providing 3D control over the rendering process. For example, applications such as AR/VR, virtual tours, and telepresence require generating consistent images or videos of 3D environments. However, GANs and I2I translation mainly operate in 2D, which limits their use for such applications.
To address this limitation, we propose to condition image synthesis on coarse geometric proxies (e.g., a point cloud, a coarse mesh, or a voxel grid), and we augment these rough proxies with machine-learned components to fix and ... |
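The multi-modality described above can be made concrete with a toy sketch (not from the thesis): a style-conditioned generator G(content, style) where the same fixed content, paired with different style codes, yields different outputs. The scale-and-shift modulation below is a hypothetical stand-in, loosely inspired by AdaIN-style feature modulation; all names here are illustrative.

```python
import math

def toy_generator(content, style):
    # Toy stand-in for a style-conditioned generator G(content, style).
    # The style code sets a global scale and shift (a crude analogue of
    # AdaIN-style modulation), while the content fixes spatial structure.
    scale = 1.0 + math.tanh(sum(style) / len(style))
    shift = 0.1 * sum(math.sin(s) for s in style)
    return [[scale * v + shift for v in row] for row in content]

# Fixed "content" (think: a semantic map), two different style codes.
content = [[1.0] * 4 for _ in range(4)]
out_a = toy_generator(content, [0.5, -1.0])
out_b = toy_generator(content, [2.0, 0.3])
print(out_a == out_b)  # False: same content, different styles -> different images
```

The point of the sketch is only the interface: because style enters as a separate code, a user can resample or interpolate it to explore many renderings of one content, which is the control the multi-modal I2I setting aims to provide.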
Document type: | doctoral or postdoctoral thesis |
File description: | application/pdf |
Language: | English |
Relation: | https://doi.org/10.13016/byyk-qzwm; http://hdl.handle.net/1903/29327 |
DOI: | 10.13016/byyk-qzwm |
Availability: | https://doi.org/10.13016/byyk-qzwm http://hdl.handle.net/1903/29327 |
Accession number: | edsbas.9FF5AE |
Database: | BASE |