wren93 committed 1698fb1 (parent: 9ad9773): Update README.md
Files changed (1): README.md (+80 -0)

---
license: mit
---
# ConsistI2V Model Card

[**🌐 Homepage**](https://tiger-ai-lab.github.io/ConsistI2V/) | [**📖 arXiv**](https://arxiv.org/abs/2402.04324) | [**🖥️ Code**](https://github.com/TIGER-AI-Lab/ConsistI2V) | [**📊 I2V-Bench**](https://drive.google.com/drive/folders/1eg_vtowKZBen74W-A1oeO4bR1K21giks)

We propose ConsistI2V, a diffusion-based method that enhances visual consistency in image-to-video (I2V) generation. Specifically, we introduce (1) spatiotemporal attention over the first frame to maintain spatial and motion consistency, and (2) noise initialization from the low-frequency band of the first frame to enhance layout consistency. Together, these two approaches enable ConsistI2V to generate highly consistent videos.

<img src="https://tiger-ai-lab.github.io/ConsistI2V/static/images/consisti2v_main.png" alt="ConsistI2V" style="width:50%;">

## Environment Setup
Prepare the codebase and Conda environment with the following commands:
```shell
git clone https://github.com/TIGER-AI-Lab/ConsistI2V
cd ConsistI2V

conda env create -f environment.yaml
conda activate consisti2v
```

## Inference
**Check out our [GitHub codebase](https://github.com/TIGER-AI-Lab/ConsistI2V) for detailed inference setups.**

To generate videos with ConsistI2V, modify the inference configurations in `configs/inference/inference.yaml` and the input prompt file `configs/prompts/default.yaml`, then run the sampling script with the following command:
```shell
python -m scripts.animate \
--inference_config configs/inference/inference.yaml \
--prompt_config configs/prompts/default.yaml \
--format mp4
```
The inference script automatically downloads the model from Hugging Face when `pretrained_model_path` in `configs/inference/inference.yaml` is set to `TIGER-Lab/ConsistI2V` (the default configuration). If the script has trouble downloading the model, you can store the model in local storage and point `pretrained_model_path` to the local model path.

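For example, one way to fetch the weights manually is to clone the model repository with Git LFS; the destination path below is only a placeholder:
```shell
# Download the ConsistI2V weights to local storage (destination path is an example)
git lfs install
git clone https://huggingface.co/TIGER-Lab/ConsistI2V /path/to/local/ConsistI2V

# Then set pretrained_model_path in configs/inference/inference.yaml to /path/to/local/ConsistI2V
```
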
You can also explicitly specify the input text prompt, negative prompt, sampling seed, and first-frame path on the command line:
```shell
python -m scripts.animate \
--inference_config configs/inference/inference.yaml \
--prompt "timelapse at the snow land with aurora in the sky." \
--n_prompt "your negative prompt" \
--seed 42 \
--path_to_first_frame assets/example/example_01.png \
--format mp4
```

To modify the inference configurations in `configs/inference/inference.yaml` from the command line, append the overrides to the end of the inference command:
```shell
# config overrides (e.g. sampling_kwargs.*, frameinit_kwargs.*) come after the regular arguments
python -m scripts.animate \
--inference_config configs/inference/inference.yaml \
--format mp4 \
sampling_kwargs.num_videos_per_prompt=4 \
frameinit_kwargs.filter_params.d_s=0.5
```

## Training
Modify the training configurations in `configs/training/training.yaml` and run the following command to train the model:
```shell
python -m torch.distributed.run \
--nproc_per_node=${GPU_PER_NODE} \
--master_addr=${MASTER_ADDR} \
--master_port=${MASTER_PORT} \
--nnodes=${NUM_NODES} \
--node_rank=${NODE_RANK} \
train.py \
--config configs/training/training.yaml \
-n consisti2v_training \
--wandb
```
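For reference, a single-node, single-GPU run might set the launcher variables like this (the values below are illustrative, not requirements of the codebase):
```shell
# Example launcher settings for one machine with one GPU (illustrative values)
export GPU_PER_NODE=1
export NUM_NODES=1
export NODE_RANK=0
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29500
```
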
The dataloader in our code assumes a root folder (`train_data.webvid_config.video_folder`) containing all videos and a JSONL file (`train_data.webvid_config.json_path`) listing relative video paths and captions, with each line in the following format:
```json
{"text": "A man rolling a winter sled with a child sitting on it in the snow close-up", "time": "30.030", "file": "relative/path/to/video.mp4", "fps": 29.97002997002997}
```
Videos can be stored in multiple subdirectories. Alternatively, you can modify the dataloader to support your own dataset. As with inference, you can also append additional arguments to the end of the training command to override the configurations in `configs/training/training.yaml`.

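If you are assembling your own metadata file, a quick sanity check such as the following can flag entries whose videos are missing before training starts; the two paths are placeholders for your `video_folder` and `json_path` settings:
```shell
# Verify that every "file" entry in the JSONL resolves to a video under the root folder
# (placeholder paths; substitute your train_data.webvid_config values)
VIDEO_FOLDER=/path/to/videos
JSON_PATH=/path/to/metadata.jsonl
while IFS= read -r line; do
  f=$(printf '%s' "$line" | python -c 'import json,sys; print(json.load(sys.stdin)["file"])')
  [ -f "$VIDEO_FOLDER/$f" ] || echo "missing video: $f"
done < "$JSON_PATH"
```
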
## Citation
Please kindly cite our paper if you find our code, data, models or results to be helpful.
```bibtex
@article{ren2024consisti2v,
  title={ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation},
  author={Ren, Weiming and Yang, Harry and Zhang, Ge and Wei, Cong and Du, Xinrun and Huang, Stephen and Chen, Wenhu},
  journal={arXiv preprint arXiv:2402.04324},
  year={2024}
}
```