🎥 Distilled Mochi Transformer

This repository contains a distilled transformer for genmoai mochi-1. The distilled transformer has 42 blocks, compared with 48 blocks in the original transformer.

Training details

We analyzed the MSE of the latent after each block and iteratively dropped the blocks with the lowest MSE.

After each block drop, we trained the neighboring blocks (the one before and the one after the deleted block) for 1K steps.
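
The pruning loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the training code used for this model: it assumes the per-block score is the MSE between the latent entering and leaving a block, it uses plain Python lists in place of real latents, and the names `block_scores`, `prune_blocks`, and `retrain_neighbors` are hypothetical. The actual fine-tuning of neighboring blocks is left as an optional callback.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Latent = List[float]
Block = Callable[[Latent], Latent]


def mse(a: Latent, b: Latent) -> float:
    """Mean squared error between two equally sized latents."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def block_scores(blocks: Sequence[Block], latent: Latent) -> List[float]:
    """Assumed metric: MSE between the latent entering and leaving each block."""
    scores = []
    for block in blocks:
        out = block(latent)
        scores.append(mse(latent, out))
        latent = out  # the output feeds the next block
    return scores


def prune_blocks(
    blocks: Sequence[Block],
    latent: Latent,
    target_depth: int,
    retrain_neighbors: Optional[Callable[[List[Tuple[int, Block]], int], None]] = None,
) -> Tuple[List[Tuple[int, Block]], List[int]]:
    """Drop the lowest-scoring block until only target_depth blocks remain."""
    if target_depth < 0:
        raise ValueError("target_depth must be non-negative")
    kept = list(enumerate(blocks))  # (original index, block) pairs
    dropped: List[int] = []
    while len(kept) > target_depth:
        scores = block_scores([b for _, b in kept], latent)
        pos = min(range(len(scores)), key=scores.__getitem__)
        dropped.append(kept.pop(pos)[0])
        if retrain_neighbors is not None:
            # After the pop, kept[pos - 1] and kept[pos] are the blocks that sat
            # before and after the removed one (when they exist).
            retrain_neighbors(kept, pos)
    return kept, dropped


# Toy example: each "block" shifts the latent by a constant amount.
def make_shift(delta: float) -> Block:
    return lambda x: [v + delta for v in x]


toy_blocks = [make_shift(d) for d in (1.0, 0.1, 0.5, 0.01)]
kept, dropped = prune_blocks(toy_blocks, [0.0, 1.0], target_depth=2)
print(dropped)                 # original indices, in the order they were removed
print([i for i, _ in kept])    # original indices of the surviving blocks
```

In the toy example, the blocks that change the latent least are removed first. For this model the same loop would run from 48 blocks down to a target depth of 42, with the callback standing in for the 1K-step training of the two neighbors.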

🚀 Try it here: Interactive Demo


Usage

Minimal code example

import torch
from diffusers import MochiPipeline, MochiTransformer3DModel
from diffusers.utils import export_to_video

# Load the 42-block distilled transformer from this repository.
transformer = MochiTransformer3DModel.from_pretrained(
    "NimVideo/mochi-1-transformer-42",
    torch_dtype=torch.bfloat16,
)
# Build the original Mochi pipeline, swapping in the distilled transformer.
pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview",
    transformer=transformer,
    variant="bf16",
    torch_dtype=torch.bfloat16,
)

# Reduce GPU memory use: offload idle sub-models to CPU and decode the VAE in tiles.
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

prompt = "Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k."
frames = pipe(prompt, num_frames=85).frames[0]

# Write the generated frames to an MP4 file at 30 fps.
export_to_video(frames, "mochi.mp4", fps=30)

Acknowledgements

Original code and models: mochi.

Contacts

Issues should be raised directly in the repository.
