Dissertation

An efficient neural representation for videos

Bibliographic details
Title: An efficient neural representation for videos
Authors: Chen, Hao
Contributors: Shrivastava, Abhinav, Computer Science, Digital Repository at the University of Maryland, University of Maryland (College Park, Md.)
Publication year: 2023
Collection: University of Maryland: Digital Repository (DRUM)
Subject terms: Computer science, efficient video loading, Implicit neural representation, Video compression, video editing, video restoration
Description: With the increasing popularity of videos, it has become crucial to find efficient and compact ways to represent them for easier storage, transmission, and downstream video tasks. Our dissertation proposes an innovative neural representation for videos called NeRV, which stores each video implicitly as a neural network. Building on NeRV, we introduce a hybrid representation for videos called HNeRV, which improves internal generalization and representation capacity. HNeRV allows for highly efficient video representation and compression, with a model size that can be up to 1000 times smaller than the original raw video. Apart from efficiency, HNeRV's simple decoding process, which involves only a feedforward operation, enables fast video loading and easy deployment. To enhance efficiency further, we develop a neural video dataloader called NVLoader, which is 3-6 times faster than conventional video dataloaders. We also introduce the HyperNeRV framework to address encoding speed; it uses a hypernetwork to map input videos directly to NeRV model weights, making encoding 10^4 times faster. Aside from developing compact and implicit video neural representations, we explore several compelling applications, including frame interpolation, video restoration, and video editing. Furthermore, the compactness of these representations makes them an ideal output video format for video generation models, reducing the search space significantly. Additionally, they can serve as an efficient input for video understanding models.
Document type: doctoral or postdoctoral thesis
File description: application/pdf
Language: English
Relation: https://doi.org/10.13016/dspace/rpio-zrgb; http://hdl.handle.net/1903/30742
DOI: 10.13016/dspace/rpio-zrgb
Availability: https://doi.org/10.13016/dspace/rpio-zrgb
http://hdl.handle.net/1903/30742
Accession number: edsbas.61CE6841
Database: BASE