zheyangqin committed
Commit 7cac834 · verified · 1 Parent(s): dd404ec

Upload 3 files

Files changed (3):
  1. README.md +46 -3
  2. gitattributes +45 -0
  3. vader_videocrafter_hps_aesthetic.pt +3 -0
README.md CHANGED
@@ -1,3 +1,46 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+ <div align="center">
+
+ <!-- TITLE -->
+ # **Video Diffusion Alignment via Reward Gradients**
+ ![VADER](assets/vader_method.png)
+
+ [![Website](https://img.shields.io/badge/🌎-Website-blue.svg)](http://vader-vid.github.io)
+ </div>
+
+ For more information on how to use this checkpoint, please see the [GitHub repository](https://github.com/mihirp1998/VADER).
+
+ This is a checkpoint trained with the HPS and Aesthetic reward models for our paper [Video Diffusion Alignment via Reward Gradients](https://vader-vid.github.io/) by
+
+ Mihir Prabhudesai*, Russell Mendonca*, Zheyang Qin*, Katerina Fragkiadaki, and Deepak Pathak.
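The exact fine-tuning and inference code lives in the GitHub repository linked above; as a quick orientation, the `.pt` file can be opened as an ordinary PyTorch checkpoint. The sketch below is an assumption-laden illustration, not the official loading path: the `load_vader_checkpoint` helper name and the `state_dict` nesting check are hypothetical.

```python
import torch


def load_vader_checkpoint(path: str) -> dict:
    """Load a .pt checkpoint on CPU and return its weight dictionary."""
    state = torch.load(path, map_location="cpu")
    # Some checkpoints nest the weights under a "state_dict" key; unwrap if so.
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    return state
```

At ~10 MB the file is far smaller than a full VideoCrafter model, so it presumably holds only the fine-tuned adapter/LoRA weights to be merged into the base model per the repository's instructions.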
+
+ <!-- DESCRIPTION -->
+ ## Abstract
+ We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt them to specific downstream tasks, such as video-text alignment or ethical video generation. Adapting these models via supervised fine-tuning requires collecting target datasets of videos, which is challenging and tedious. In this work, we instead utilize pre-trained reward models that are learned via preferences on top of powerful discriminative models. These models contain dense gradient information with respect to generated RGB pixels, which is critical for learning efficiently in complex search spaces such as videos. We show that our approach enables alignment of video diffusion for aesthetic generation, for similarity between text context and video, as well as for long-horizon video generation 3X longer than the training sequence length. Our approach also learns far more efficiently, in terms of both reward queries and compute, than previous gradient-free approaches for video generation.
+
+ ## Demo
+ | | | |
+ | --- | --- | --- |
+ | <img src="assets/videos/8.gif"> | <img src="assets/videos/5.gif"> | <img src="assets/videos/7.gif"> |
+ | <img src="assets/videos/10.gif"> | <img src="assets/videos/3.gif"> | <img src="assets/videos/4.gif"> |
+ | <img src="assets/videos/9.gif"> | <img src="assets/videos/1.gif"> | <img src="assets/videos/11.gif"> |
+
+ ## Citation
+
+ If you find this work useful in your research, please cite:
+
+ ```bibtex
+ @misc{prabhudesai2024videodiffusionalignmentreward,
+       title={Video Diffusion Alignment via Reward Gradients},
+       author={Mihir Prabhudesai and Russell Mendonca and Zheyang Qin and Katerina Fragkiadaki and Deepak Pathak},
+       year={2024},
+       eprint={2407.08737},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2407.08737},
+ }
+ ```
gitattributes ADDED
@@ -0,0 +1,45 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/vader_method.png filter=lfs diff=lfs merge=lfs -text
+ assets/videos/1.gif filter=lfs diff=lfs merge=lfs -text
+ assets/videos/10.gif filter=lfs diff=lfs merge=lfs -text
+ assets/videos/11.gif filter=lfs diff=lfs merge=lfs -text
+ assets/videos/3.gif filter=lfs diff=lfs merge=lfs -text
+ assets/videos/4.gif filter=lfs diff=lfs merge=lfs -text
+ assets/videos/5.gif filter=lfs diff=lfs merge=lfs -text
+ assets/videos/7.gif filter=lfs diff=lfs merge=lfs -text
+ assets/videos/8.gif filter=lfs diff=lfs merge=lfs -text
+ assets/videos/9.gif filter=lfs diff=lfs merge=lfs -text
vader_videocrafter_hps_aesthetic.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e470dfa659d6bfcb2ee2bcefc4b53e30933299bbb2d9203989be53512a63c9bd
+ size 10189387
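This is a Git LFS pointer file, not the weights themselves: it records the blob's SHA-256 (`oid`) and byte size, and the real file is fetched via `git lfs pull` or the Hugging Face Hub. After downloading, the checkpoint can be checked against the pointer. A minimal stdlib sketch (the `verify_lfs_object` helper name is an assumption; the oid and size in the commented call are copied from the pointer above):

```python
import hashlib


def verify_lfs_object(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Check a downloaded file against the oid/size recorded in its LFS pointer."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large checkpoints don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest() == expected_sha256 and size == expected_size


# For this repository's checkpoint:
# verify_lfs_object(
#     "vader_videocrafter_hps_aesthetic.pt",
#     "e470dfa659d6bfcb2ee2bcefc4b53e30933299bbb2d9203989be53512a63c9bd",
#     10189387,
# )
```

A `False` result usually means the pointer file itself was downloaded instead of the resolved LFS object.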