metadata
base_model: THUDM/CogVideoX-5b
datasets: finetrainers/crush-smol
library_name: diffusers
license: other
license_link: https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE
instance_prompt: >-
  DIFF_crush A red candle is placed on a metal platform, and a large metal
  cylinder descends from above, flattening the candle as if it were under a
  hydraulic press. The candle is crushed into a flat, round shape, leaving a
  pile of debris around it.
widget:
- text: >-
    DIFF_crush A red candle is placed on a metal platform, and a large metal
    cylinder descends from above, flattening the candle as if it were under a
    hydraulic press. The candle is crushed into a flat, round shape, leaving a
    pile of debris around it.
  output:
    url: ./assets/output_0.mp4
- text: >-
    DIFF_crush A bulb is placed on a wooden platform, and a large metal
    cylinder descends from above, crushing the bulb as if it were under a
    hydraulic press. The bulb is crushed into a flat, round shape, leaving a
    pile of debris around it.
  output:
    url: ./assets/output_1.mp4
- text: >-
    DIFF_crush A thick burger is placed on a dining table, and a large metal
    cylinder descends from above, crushing the burger as if it were under a
    hydraulic press. The bulb is crushed, leaving a pile of debris around it.
  output:
    url: ./assets/output_2.mp4
tags:
- text-to-video
- diffusers-training
- diffusers
- cogvideox
- cogvideox-diffusers
- template:sd-lora
This is a fine-tune of the THUDM/CogVideoX-5b model on the finetrainers/crush-smol dataset. A LoRA variant of the parameters is also available; see the LoRA section below.
Code: https://github.com/a-r-r-o-w/finetrainers
This is an experimental checkpoint, and it is known to generalize poorly.
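Every example prompt on this card starts with the `DIFF_crush` token and follows the same sentence pattern, so prompts that stay close to that pattern may be a reasonable starting point. The helper below is a minimal sketch of that pattern; the function name `build_crush_prompt` and its parameters are hypothetical and are not part of the finetrainers code.

```python
# Prefix shared by every example prompt on this card.
TRIGGER = "DIFF_crush"


def build_crush_prompt(subject_phrase: str, noun: str, surface: str) -> str:
    """Build a prompt in the style of the card's examples.

    All three parameters are illustrative names:
      subject_phrase: the object with its article, e.g. "A bulb"
      noun:           the bare noun used for later references, e.g. "bulb"
      surface:        where the object is placed, e.g. "a wooden platform"
    """
    return (
        f"{TRIGGER} {subject_phrase.strip()} is placed on {surface.strip()}, "
        "and a large metal cylinder descends from above, "
        f"crushing the {noun.strip()} as if it were under a hydraulic press. "
        f"The {noun.strip()} is crushed into a flat, round shape, "
        "leaving a pile of debris around it."
    )


# Reproduces the second example prompt on this card.
bulb_prompt = build_crush_prompt("A bulb", "bulb", "a wooden platform")
```

The resulting string can be passed as `prompt` to the pipeline call in the inference code.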
Inference code:
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the fine-tuned transformer, then swap it into the base CogVideoX-5b pipeline.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/crush-smol-v0", torch_dtype=torch.bfloat16
)
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# The prompt starts with the DIFF_crush token, as in the example prompts on this card.
prompt = """
DIFF_crush A thick burger is placed on a dining table, and a large metal cylinder descends from above, crushing the burger as if it were under a hydraulic press. The bulb is crushed, leaving a pile of debris around it.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Generate 81 frames at 768x512 and keep the first (only) video in the batch.
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=25)
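The inference code above places the whole pipeline on the GPU. On cards with less memory, a sketch like the following may help. It assumes the installed diffusers version exposes `enable_model_cpu_offload()` on the pipeline and `enable_tiling()` / `enable_slicing()` on the CogVideoX VAE; the helper name `apply_memory_savers` is hypothetical. When CPU offload is used, the pipeline would be built without the trailing `.to("cuda")`.

```python
def apply_memory_savers(pipeline):
    """Turn on optional memory-saving features of an already-built pipeline.

    Assumes `pipeline` is a diffusers CogVideoX pipeline that was created
    WITHOUT calling .to("cuda"), since CPU offload manages device placement.
    """
    # Keep each sub-model on the CPU and move it to the GPU only while it runs.
    pipeline.enable_model_cpu_offload()
    # Decode the video latents in spatial tiles and per-sample slices
    # to reduce peak memory during VAE decoding.
    pipeline.vae.enable_tiling()
    pipeline.vae.enable_slicing()
    return pipeline
```

These options trade speed for memory, and whether they are sufficient depends on the hardware.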
Training logs are available on WandB here.
LoRA
We extracted a rank-64 LoRA from the fine-tuned checkpoint (script here). This LoRA can be used to reproduce the same kind of effect:
Code
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the base model and apply the extracted rank-64 LoRA on top of it.
pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
pipeline.load_lora_weights("finetrainers/crush-smol-v0", weight_name="extracted_crush_smol_lora_64.safetensors")

prompt = """
DIFF_crush A thick burger is placed on a dining table, and a large metal cylinder descends from above, crushing the burger as if it were under a hydraulic press. The bulb is crushed, leaving a pile of debris around it.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Same generation settings as for the full fine-tune.
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output_lora.mp4", fps=25)
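For readers wondering how a LoRA can be extracted from a full fine-tune: one common approach is a truncated SVD of the difference between the fine-tuned and base weights. The sketch below illustrates that idea on a single synthetic matrix with NumPy. It is not the extraction script referenced above, and the function name `extract_lora`, the matrix sizes, and the rank of 4 are all illustrative assumptions.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Approximate (w_tuned - w_base) by a rank-`rank` product lora_up @ lora_down.

    Illustrative only: uses a truncated SVD of the weight difference.
    """
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Keep the `rank` largest singular directions.
    u, s, vt = u[:, :rank], s[:rank], vt[:rank, :]
    # Split the singular values evenly between the two factors.
    sqrt_s = np.sqrt(s)
    lora_up = u * sqrt_s              # shape (out_features, rank)
    lora_down = sqrt_s[:, None] * vt  # shape (rank, in_features)
    return lora_up, lora_down


# Synthetic example: a base weight plus an update whose true rank is 4.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((32, 48))
true_delta = rng.standard_normal((32, 4)) @ rng.standard_normal((4, 48))
w_tuned = w_base + true_delta

lora_up, lora_down = extract_lora(w_base, w_tuned, rank=4)
max_error = np.abs(lora_up @ lora_down - true_delta).max()
```

Here the update is exactly low-rank, so the reconstruction error is at the level of floating-point noise. For a real fine-tune the weight difference is generally not low-rank, so the chosen rank controls how closely the LoRA approximates the full checkpoint.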