THUDM/CogVideoX-5b · Why does it still fail to run on a T4 GPU even with quantization?

Sep 3, 2024

OutOfMemoryError: CUDA out of memory. Tried to allocate 56.50 GiB. GPU 0 has a total capacity of 14.75 GiB of which 7.37 GiB is free. Process 4566 has 7.38 GiB memory in use. Of the allocated memory 6.97 GiB is allocated by PyTorch, and 284.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
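The 56.50 GiB request is consistent with a full attention-score matrix being materialized in one piece. Here is a rough back-of-the-envelope check. Every constant is an assumption about the default CogVideoX-5b setup and none is stated in this thread: 49 frames at 480×720, 4× temporal and 8× spatial VAE compression, patch size 2, 226 text tokens, 48 attention heads, a batch of 2 from classifier-free guidance, and 2-byte bfloat16 scores.

```python
# All constants are assumptions about the default CogVideoX-5b configuration,
# not values reported in this thread.
FRAMES, HEIGHT, WIDTH = 49, 480, 720
TEMPORAL_COMPRESSION, SPATIAL_COMPRESSION, PATCH = 4, 8, 2
TEXT_TOKENS = 226   # assumed maximum prompt length
HEADS = 48          # assumed number of attention heads
BATCH = 2           # conditional + unconditional pass (classifier-free guidance)
BYTES_PER_SCORE = 2 # bfloat16

latent_frames = (FRAMES - 1) // TEMPORAL_COMPRESSION + 1
tokens_per_frame = (HEIGHT // SPATIAL_COMPRESSION // PATCH) * (
    WIDTH // SPATIAL_COMPRESSION // PATCH
)
seq_len = TEXT_TOKENS + latent_frames * tokens_per_frame

# One score per (query, key) pair, per head, per batch element.
score_bytes = BATCH * HEADS * seq_len**2 * BYTES_PER_SCORE
score_gib = score_bytes / 2**30

print(f"{seq_len} tokens -> {score_gib:.2f} GiB")  # 17776 tokens -> 56.50 GiB
```

If this reading is right, the failing allocation comes from the transformer's attention rather than from the weights, which would explain why int8 weight-only quantization does not help. On a T4 (compute capability 7.5), PyTorch's fused attention kernels may be unavailable for bfloat16 inputs, in which case scaled dot-product attention can fall back to a path that builds the whole matrix.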

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXPipeline, CogVideoXTransformer3DModel
from transformers import T5EncoderModel
from torchao.quantization import quantize_, int8_weight_only

# Quantize each component's weights to int8 in place (activations stay bfloat16)
quantization = int8_weight_only
text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b", subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

# Create the pipeline from the quantized components
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
# Keep only the active component on the GPU; decode the video in tiles
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
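One thing worth checking on a T4 is the dtype. Native bfloat16 support starts at compute capability 8.0 (Ampere), while the T4 is 7.5. Below is a minimal sketch with a hypothetical helper, `pick_dtype_name`, that picks a dtype name from the compute capability. Whether float16 gives acceptable output with this checkpoint, or avoids the out-of-memory error, is not confirmed anywhere in this thread.

```python
def pick_dtype_name(capability: tuple[int, int]) -> str:
    """Hypothetical helper: "bfloat16" on sm_80 or newer, otherwise "float16"."""
    return "bfloat16" if capability >= (8, 0) else "float16"


print(pick_dtype_name((7, 5)))  # T4 (sm_75)       -> float16
print(pick_dtype_name((8, 6)))  # RTX 3090 (sm_86) -> bfloat16
```

In a real script, `torch.cuda.get_device_capability()` returns a tuple of this shape, and `getattr(torch, name)` turns the returned name into the dtype to pass as `torch_dtype`.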

zRzRzRzRzRzRzR

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org Sep 3, 2024

Please see the pinned information in the GitHub issues.

astar987

Sep 3, 2024

A 3090 can run it within ComfyUI.

zRzRzRzRzRzRzR changed discussion status to closed Sep 4, 2024