Spaces:
Running
on
Zero
Running
on
Zero
File size: 10,019 Bytes
0305a63 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 |
<div align="center">
<h1>InstantIR: Blind Image Restoration with</br>Instant Generative Reference</h1>
[**Jen-Yuan Huang**](https://jy-joy.github.io)<sup>1 2</sup>, [**Haofan Wang**](https://haofanwang.github.io/)<sup>2</sup>, [**Qixun Wang**](https://github.com/wangqixun)<sup>2</sup>, [**Xu Bai**](https://huggingface.co./baymin0220)<sup>2</sup>, Hao Ai<sup>2</sup>, Peng Xing<sup>2</sup>, [**Jen-Tse Huang**](https://penguinnnnn.github.io)<sup>3</sup> <br>
<sup>1</sup>Peking University Β· <sup>2</sup>InstantX Team Β· <sup>3</sup>The Chinese University of Hong Kong
<!-- <sup>*</sup>corresponding authors -->
<a href='https://arxiv.org/abs/2410.06551'><img src='https://img.shields.io/badge/arXiv-2410.06551-b31b1b.svg'>
<a href='https://jy-joy.github.io/InstantIR/'><img src='https://img.shields.io/badge/Project-Website-green'></a>
<a href='https://huggingface.co./InstantX/InstantIR'><img src='https://img.shields.io/static/v1?label=Model&message=Huggingface&color=orange'></a>
<!-- [![GitHub](https://img.shields.io/github/stars/InstantID/InstantID?style=social)](https://github.com/InstantID/InstantID) -->
<!-- <a href='https://huggingface.co./spaces/InstantX/InstantID'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
[![ModelScope](https://img.shields.io/badge/ModelScope-Studios-blue)](https://modelscope.cn/studios/instantx/InstantID/summary)
[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/InstantX/InstantID) -->
</div>
**InstantIR** is a novel single-image restoration model designed to resurrect your damaged images, delivering extrem-quality yet realistic details. You can further boost **InstantIR** performance with additional text prompts, even achieve customized editing!
<!-- >**Abstract**: <br>
> Handling test-time unknown degradation is the major challenge in Blind Image Restoration (BIR), necessitating high model generalization. An effective strategy is to incorporate prior knowledge, either from human input or generative model. In this paper, we introduce Instant-reference Image Restoration (InstantIR), a novel diffusion-based BIR method which dynamically adjusts generation condition during inference. We first extract a compact representation of the input via a pre-trained vision encoder. At each generation step, this representation is used to decode current diffusion latent and instantiate it in the generative prior. The degraded image is then encoded with this reference, providing robust generation condition. We observe the variance of generative references fluctuate with degradation intensity, which we further leverage as an indicator for developing a sampling algorithm adaptive to input quality. Extensive experiments demonstrate InstantIR achieves state-of-the-art performance and offering outstanding visual quality. Through modulating generative references with textual description, InstantIR can restore extreme degradation and additionally feature creative restoration. -->
<img src='assets/teaser_figure.png'>
## π’ News
- **11/03/2024** π₯ We provide a Gradio launching script for InstantIR, you can now deploy it on your local machine!
- **11/02/2024** π₯ InstantIR is now compatitble with 𧨠`diffusers`, you can utilize features from this fascinating package!
- **10/15/2024** π₯ Code and model released!
## π TODOs:
- [ ] Launch online demo
- [x] Remove dependency on local `diffusers`
- [x] Gradio launching script
## β¨ Usage
<!-- ### Online Demo
We provide a Gradio Demo on π€, click the button below and have fun with InstantIR! -->
### Quick start
#### 1. Clone this repo and setting up environment
```sh
git clone https://github.com/JY-Joy/InstantIR.git
cd InstantIR
conda create -n instantir python=3.9 -y
conda activate instantir
pip install -r requirements.txt
```
#### 2. Download pre-trained models
InstantIR is built on SDXL and DINOv2. You can download them either directly from π€ huggingface or using Python package.
| π€ link | Python command
| :--- | :----------
|[SDXL](https://huggingface.co./stabilityai/stable-diffusion-xl-base-1.0) | `hf_hub_download(repo_id="stabilityai/stable-diffusion-xl-base-1.0")`
|[facebook/dinov2-large](https://huggingface.co./facebook/dinov2-large) | `hf_hub_download(repo_id="facebook/dinov2-large")`
|[InstantX/InstantIR](https://huggingface.co./InstantX/InstantIR) | `hf_hub_download(repo_id="InstantX/InstantIR")`
Note: Make sure to import the package first with `from huggingface_hub import hf_hub_download` if you are using Python script.
#### 3. Inference
You can run InstantIR inference using `infer.sh` with the following arguments specified.
```sh
infer.sh \
--sdxl_path <path_to_SDXL> \
--vision_encoder_path <path_to_DINOv2> \
--instantir_path <path_to_InstantIR> \
--test_path <path_to_input> \
--out_path <path_to_output>
```
See `infer.py` for more config options.
#### 4. Using tips
InstantIR is powerful, but with your help it can do better. InstantIR's flexible pipeline makes it tunable to a large extent. Here are some tips we found particularly useful for various cases you may encounter:
- **Over-smoothing**: reduce `--cfg` to 3.0ο½5.0. Higher CFG scales can sometimes rigid lines or lack of details.
- **Low fidelity**: set `--preview_start` to 0.1~0.4 to preserve fidelity from inputs. The previewer can yield misleading references when input latent is too noisy. In such cases, we suggest to disable the previewer at early timesteps.
- **Local distortions**: set `--creative_start` to 0.6~0.8. This will let InstantIR render freely in the late diffusion process, where the high-frequency details are generated. Smaller `--creative_start` spares more spaces for creative restoration, but will diminish fidelity.
- **Faster inference**: higher `--preview_start` and lower `--creative_start` can both reduce computational costs and accelerate InstantIR inference.
> [!CAUTION]
> These features are training-free and thus experimental. If you would like to try, we suggest to tune these parameters case-by-case.
### Use InstantIR with diffusers π§¨
InstantIR is fully compatible with `diffusers` and is supported by all those powerful features in this package. You can directly load InstantIR via `diffusers` snippet:
```py
# !pip install diffusers opencv-python transformers accelerate
import torch
from PIL import Image
from diffusers import DDPMScheduler
from schedulers.lcm_single_step_scheduler import LCMSingleStepScheduler
from module.ip_adapter.utils import load_adapter_to_pipe
from pipelines.sdxl_instantir import InstantIRPipeline
# suppose you have InstantIR weights under ./models
instantir_path = f'./models'
# load pretrained models
pipe = InstantIRPipeline.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0', torch_dtype=torch.float16)
# load adapter
load_adapter_to_pipe(
pipe,
f"{instantir_path}/adapter.pt",
image_encoder_or_path = 'facebook/dinov2-large',
)
# load previewer lora
pipe.prepare_previewers(instantir_path)
pipe.scheduler = DDPMScheduler.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0', subfolder="scheduler")
lcm_scheduler = LCMSingleStepScheduler.from_config(pipe.scheduler.config)
# load aggregator weights
pretrained_state_dict = torch.load(f"{instantir_path}/aggregator.pt")
pipe.aggregator.load_state_dict(pretrained_state_dict)
# send to GPU and fp16
pipe.to(device='cuda', dtype=torch.float16)
pipe.aggregator.to(device='cuda', dtype=torch.float16)
```
Then, you just need to call the `pipe` and InstantIR will handle your image!
```py
# load a broken image
low_quality_image = Image.open('./assets/sculpture.png').convert("RGB")
# InstantIR restoration
image = pipe(
image=low_quality_image,
previewer_scheduler=lcm_scheduler,
).images[0]
```
### Deploy local gradio demo
We provide a python script to launch a local gradio demo of InstantIR, with basic and some advanced features implemented. Start by running the following command in your terminal:
```sh
INSTANTIR_PATH=<path_to_InstantIR> python gradio_demo/app.py
```
Then, visit your local demo via your browser at `http://localhost:7860`.
## βοΈ Training
### Prepare data
InstantIR is trained on [DIV2K](https://www.kaggle.com/datasets/joe1995/div2k-dataset), [Flickr2K](https://www.kaggle.com/datasets/daehoyang/flickr2k), [LSDIR](https://data.vision.ee.ethz.ch/yawli/index.html) and [FFHQ](https://www.kaggle.com/datasets/rahulbhalley/ffhq-1024x1024). We adopt dataset weighting to balance the distribution. You can config their weights in ```config_files/IR_dataset.yaml```. Download these training sets and put them under a same directory, which will be used in the following training configurations.
### Two-stage training
As described in our paper, the training of InstantIR is conducted in two stages. We provide corresponding `.sh` training scripts for each stage. Make sure you have the following arguments adapted to your own use case:
| Argument | Value
| :--- | :----------
| `--pretrained_model_name_or_path` | path to your SDXL folder
| `--feature_extractor_path` | path to your DINOv2 folder
| `--train_data_dir` | your training data directory
| `--output_dir` | path to save model weights
| `--logging_dir` | path to save logs
| `<num_of_gpus>` | number of available GPUs
Other training hyperparameters we used in our experiments are provided in the corresponding `.sh` scripts. You can tune them according to your own needs.
## π Acknowledgment
Our work is sponsored by [HuggingFace](https://huggingface.co.) and [fal.ai](https://fal.ai).
## π Citation
If InstantIR is helpful to your work, please cite our paper via:
```
@article{huang2024instantir,
title={InstantIR: Blind Image Restoration with Instant Generative Reference},
author={Huang, Jen-Yuan and Wang, Haofan and Wang, Qixun and Bai, Xu and Ai, Hao and Xing, Peng and Huang, Jen-Tse},
journal={arXiv preprint arXiv:2410.06551},
year={2024}
}
```
|