
	---
	license: cc-by-nc-4.0
	---

	# InstaFlow: 2-Rectified Flow fine-tuned from Stable Diffusion v1.5

	2-Rectified Flow is a few-step text-to-image generative model fine-tuned from Stable Diffusion v1.5.

	We use text-conditioned reflow as described in [our paper](https://arxiv.org/abs/2309.06380).

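	Reflow has useful theoretical properties: for example, it straightens the trajectories of the flow and does not increase convex transport costs. For details, see [this ICLR paper](https://arxiv.org/abs/2209.03003) and [this arXiv paper](https://arxiv.org/abs/2209.14577).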
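To make the reflow objective concrete, here is a toy 1-D sketch. It is not the authors' training code. It makes three simplifications. First, the hypothetical `teacher_sample` is a deterministic map standing in for SD 1.5 plus an ODE solver. Second, the text condition is omitted. Third, it only evaluates the loss for a given velocity function rather than training a network.

Reflow pairs each noise sample `z0` with the output `x1` that the teacher produces from it. It then regresses the velocity at the interpolation `x_t = t*x1 + (1-t)*z0` onto the straight-line direction `x1 - z0`.

```python
import random

def teacher_sample(z0: float) -> float:
    """Hypothetical teacher: deterministically maps noise z0 to a data point x1."""
    return 2.0 * z0 + 1.0

def make_reflow_pairs(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Draw noise and record where the teacher sends it, giving (z0, x1) pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        pairs.append((z0, teacher_sample(z0)))
    return pairs

def reflow_loss(velocity, pairs, seed: int = 1) -> float:
    """Monte Carlo estimate of E[(v(x_t, t) - (x1 - z0))^2], x_t = t*x1 + (1-t)*z0."""
    rng = random.Random(seed)
    total = 0.0
    for z0, x1 in pairs:
        t = rng.random()                 # t ~ Uniform[0, 1)
        x_t = t * x1 + (1.0 - t) * z0    # point on the straight line from z0 to x1
        target = x1 - z0                 # straight-line velocity
        total += (velocity(x_t, t) - target) ** 2
    return total / len(pairs)

def ideal_velocity(x: float, t: float) -> float:
    # For this teacher, x_t = (1 + t) * z0 + t, so x1 - z0 = z0 + 1 = (x_t + 1) / (1 + t).
    return (x + 1.0) / (1.0 + t)

def zero_velocity(x: float, t: float) -> float:
    # A velocity field that ignores the data entirely.
    return 0.0

pairs = make_reflow_pairs(1000)
print(reflow_loss(ideal_velocity, pairs))  # zero up to rounding error
print(reflow_loss(zero_velocity, pairs))   # Monte Carlo estimate of E[(z0 + 1)^2] = 2
```

For this particular teacher, the straight-line velocity `(x + 1) / (1 + t)` drives the loss to zero, while a velocity that ignores the data does not. In the actual model, a network is trained to minimize this kind of loss over text-conditioned image pairs.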

	## Images Generated from Random DiffusionDB Prompts

	We compare SD 1.5 + DPM-Solver with 2-Rectified Flow on random prompts from DiffusionDB. 2-Rectified Flow has straighter sampling trajectories, which lets it generate images with fewer solver steps.
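"Straighter" refers to the trajectories of the sampling ODE. The toy sketch below shows why that matters. It uses two hypothetical 1-D velocity fields that both carry `x0 = 1` to `e` at `t = 1`. The first moves along a straight line at constant speed; the second follows the curve `e^t`.

A fixed-step Euler solver is exact on the straight path with a single step. On the curved path it needs many steps to get close.

```python
import math

def euler(velocity, x0: float, num_steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, i * dt)
    return x

def straight_velocity(x: float, t: float) -> float:
    # Constant speed: the path from x0 = 1 is the straight line 1 + (e - 1) t.
    return math.e - 1.0

def curved_velocity(x: float, t: float) -> float:
    # dx/dt = x: the path from x0 = 1 is the curve e^t, which bends upward.
    return x

for name, v in [("straight", straight_velocity), ("curved", curved_velocity)]:
    for steps in (1, 4, 25):
        error = abs(euler(v, 1.0, steps) - math.e)
        print(f"{name:8s} steps={steps:2d} endpoint error={error:.4f}")
```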

	| ![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/MXEZ5YQtsnr70XzVnH8gQ.png) |
	| :---: |
	| Prompt: a renaissance portrait of dwayne johnson, art in the style of rembrandt. |

	| ![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/dqPdE0JFqNtUnu6wy3ugF.png) |
	| :---: |
	| Prompt: a photo of a rabbit head on a grizzly bear body. |

	## Usage

	Please refer to the [official GitHub repository](https://github.com/gnobitab/InstaFlow).

	## Training

	Training pipeline:
	1. Reflow (Stage 1): We train the model using the text-conditioned reflow objective with a batch size of 64 for 70,000 iterations, initializing it from the pre-trained SD 1.5 weights (11.2 A100 GPU days).
	2. Reflow (Stage 2): We continue training with the text-conditioned reflow objective at an increased batch size of 1024 for 25,000 iterations (64 A100 GPU days).

	The final model is 2-Rectified Flow.

	Total training cost: obtaining 2-Rectified Flow takes 75.2 A100 GPU days (11.2 + 64).
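A few lines of arithmetic can cross-check the stage costs above. This sketch uses only the numbers stated in the training description. It assumes each iteration processes exactly one batch of the stated size.

```python
# Figures from the training description; one batch per iteration is assumed.
stages = [
    # (stage, batch size, iterations, A100 GPU days)
    ("Stage 1", 64, 70_000, 11.2),
    ("Stage 2", 1024, 25_000, 64.0),
]

total_samples = 0
total_days = 0.0
for stage, batch_size, iterations, gpu_days in stages:
    samples = batch_size * iterations          # training pairs processed
    days_per_million = gpu_days / (samples / 1e6)
    total_samples += samples
    total_days += gpu_days
    print(f"{stage}: {samples:,} samples, {gpu_days} GPU days, "
          f"{days_per_million:.2f} GPU days per million samples")

print(f"Total: {total_samples:,} samples, {total_days:.1f} A100 GPU days")
```

Both stages work out to about 2.5 A100 GPU days per million training samples. The larger Stage 2 cost therefore follows from processing roughly 5.7 times as many samples.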


	## Evaluation Results - Metrics

	The following metrics for 2-Rectified Flow are measured on MS COCO 2017 with 5,000 images and a 25-step Euler solver:

	FID-5k = 21.5, CLIP score = 0.315
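As a rough illustration of what the two numbers measure, the sketch below implements simplified versions of both metrics. It is not the evaluation code behind the reported values.

Standard FID fits Gaussians with full covariance matrices to 2048-dimensional Inception-v3 features. The hypothetical `frechet_distance_diagonal` instead assumes diagonal covariances, in which case the Fréchet distance reduces to a per-dimension sum.

For the CLIP score, this sketch assumes the mean cosine similarity between paired CLIP image and text embeddings, which is consistent with a reported value of 0.315. The features and embeddings in the usage lines are made up.

```python
import math
import statistics

def frechet_distance_diagonal(feats_a, feats_b) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets, assuming
    diagonal covariances: sum over dims of (mu_a - mu_b)^2 + (sd_a - sd_b)^2.
    Uses population variance for simplicity."""
    total = 0.0
    for d in range(len(feats_a[0])):
        col_a = [f[d] for f in feats_a]
        col_b = [f[d] for f in feats_b]
        mu_a, mu_b = statistics.fmean(col_a), statistics.fmean(col_b)
        sd_a = math.sqrt(statistics.pvariance(col_a))
        sd_b = math.sqrt(statistics.pvariance(col_b))
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total

def cosine_similarity(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def clip_score(image_embs, text_embs) -> float:
    """Mean cosine similarity over paired (image, text) embeddings."""
    return statistics.fmean([cosine_similarity(i, t) for i, t in zip(image_embs, text_embs)])

# Made-up 1-D "features" and 2-D "embeddings", for illustration only.
print(frechet_distance_diagonal([[0.0], [2.0]], [[1.0], [3.0]]))  # 1.0
print(clip_score([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```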

	Few-Step performance:

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/GS_ApYjpbtmwnICgHOZmD.png)

	## Evaluation Results - Impact of Guidance Scale

	We evaluate the impact of the guidance scale on 2-Rectified Flow.

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/h_GbLBjnE8tP67Fgzj6ER.png)

	Trade-off curve between FID and CLIP score as the guidance scale varies:

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/ldplYcANcoPogbqdOP1p9.png)
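The model card does not state how the guidance scale enters the sampler. A common choice, assumed in this sketch, is classifier-free guidance on the predicted velocity: `v = v_uncond + w * (v_cond - v_uncond)`. With `w = 1` this recovers the conditional velocity, and larger `w` pushes samples further toward the prompt. `v_cond_fn` and `v_uncond_fn` are hypothetical 1-D stand-ins for the network evaluated with and without the text prompt.

```python
def guided_velocity(v_cond: float, v_uncond: float, scale: float) -> float:
    """Classifier-free guidance on velocities (assumed form); scale = 1 returns v_cond."""
    return v_uncond + scale * (v_cond - v_uncond)

# Hypothetical stand-ins for the network with and without the text prompt.
def v_cond_fn(x: float, t: float) -> float:
    return 3.0 - x  # the "prompt" pulls samples toward 3.0

def v_uncond_fn(x: float, t: float) -> float:
    return 0.0 - x  # without the prompt, samples drift toward 0.0

def sample(x0: float, scale: float, num_steps: int) -> tuple[float, int]:
    """Guided Euler sampler from t = 0 to t = 1; returns (final x, velocity evaluations)."""
    x, dt, nfe = x0, 1.0 / num_steps, 0
    for i in range(num_steps):
        t = i * dt
        v = guided_velocity(v_cond_fn(x, t), v_uncond_fn(x, t), scale)
        nfe += 2  # one conditional and one unconditional evaluation per step
        x += dt * v
    return x, nfe

for scale in (1.0, 2.0, 4.0):
    x1, nfe = sample(0.0, scale, num_steps=25)
    print(f"guidance scale {scale}: final x = {x1:.3f}, evaluations = {nfe}")
```

Each guided Euler step costs two velocity evaluations. A 25-step guided sample therefore needs 50 network calls unless the conditional and unconditional inputs are batched together.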

	## Citation
	```
	@article{liu2023insta,
	title={InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation},
	author={Liu, Xingchao and Zhang, Xiwen and Ma, Jianzhu and Peng, Jian and Liu, Qiang},
	journal={arXiv preprint arXiv:2309.06380},
	year={2023}
	}
	```