---
license: apache-2.0
language:
- en
tags:
- video
- genmo
- diffusers
pipeline_tag: text-to-video
library_name: diffusers
---
# 🎥 Distilled Mochi Transformer
This repository contains a distilled transformer for Genmo's Mochi 1 (`genmo/mochi-1-preview`).
The distilled transformer has 42 blocks, compared with 48 blocks in the original transformer.
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/FCG0Mdzmlh-KsFk0v4ixl.mp4"></video>
### Training details
We analyzed the MSE of the latent after each block and iteratively dropped the block with the minimum MSE.
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/ILt2OOC_La0hcQIedkNhx.mp4"></video>
After each block drop, we trained the neighboring blocks (the one before and the one after the deleted block) for 1K steps.
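The drop-and-retrain loop described above can be sketched roughly as follows. This is a toy illustration, not the training code used for this model: it assumes the MSE is taken between each block's input and output latent, the blocks are plain Python functions standing in for transformer blocks, and `finetune` and `make_shift` are hypothetical helpers introduced only for this example.

```python
from typing import Callable, List, Optional, Sequence

Latent = List[float]
Block = Callable[[Latent], Latent]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized latents."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def block_scores(blocks: Sequence[Block], latent: Latent) -> List[float]:
    """Run the stack once and record how much each block changes its input.

    Assumption: the score is the MSE between a block's input and output.
    """
    scores = []
    for block in blocks:
        out = block(latent)
        scores.append(mse(latent, out))
        latent = out
    return scores


def prune_blocks(
    blocks: Sequence[Block],
    latent: Latent,
    target_depth: int,
    finetune: Optional[Callable[[List[Block], List[int]], None]] = None,
) -> List[Block]:
    """Drop the lowest-scoring block until `target_depth` blocks remain.

    After each drop, the hypothetical `finetune` callback receives the
    current stack and the indices of the blocks that now sit directly
    before and after the removed position.
    """
    blocks = list(blocks)
    while len(blocks) > target_depth:
        scores = block_scores(blocks, latent)
        idx = min(range(len(scores)), key=scores.__getitem__)
        del blocks[idx]
        # After deletion, idx - 1 is the block before and idx the block after.
        neighbors = [i for i in (idx - 1, idx) if 0 <= i < len(blocks)]
        if finetune is not None:
            finetune(blocks, neighbors)
    return blocks


def make_shift(delta: float) -> Block:
    """Toy block that adds a constant to every latent value."""
    return lambda latent: [x + delta for x in latent]


# Toy run: four blocks pruned down to two.
toy_blocks = [make_shift(d) for d in (1.0, 0.1, 0.5, 0.01)]
kept = prune_blocks(toy_blocks, [0.0, 0.0], target_depth=2)
print(len(kept))
```

In a real setting the scores would presumably be averaged over many prompts and timesteps, and `finetune` would run a short training job on the neighboring blocks only.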
### 🚀 Try it here: [Interactive Demo](https://nim.video/create/2855fa68-21b1-4114-b366-53e5e4705ebf?workflow=image2video)
---
## Usage
#### Minimal code example
```python
import torch
from diffusers import MochiPipeline, MochiTransformer3DModel
from diffusers.utils import export_to_video

# Load the distilled 42-block transformer from this repository.
transformer = MochiTransformer3DModel.from_pretrained(
    "NimVideo/mochi-1-transformer-42",
    torch_dtype=torch.bfloat16,
)

# Load the remaining pipeline components from the original model
# and swap in the distilled transformer.
pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview",
    transformer=transformer,
    variant="bf16",
    torch_dtype=torch.bfloat16,
)

# Reduce GPU memory usage.
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

prompt = "Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k."
frames = pipe(prompt, num_frames=85).frames[0]

export_to_video(frames, "mochi.mp4", fps=30)
```
## Acknowledgements
Original code and models: [mochi](https://github.com/genmoai/mochi).
## Contacts
Issues should be raised directly in the repository.