为什么量化还是T4卡不能运行
#9
by
loong
- opened
OutOfMemoryError: CUDA out of memory. Tried to allocate 56.50 GiB. GPU 0 has a total capacity of 14.75 GiB of which 7.37 GiB is free. Process 4566 has 7.38 GiB memory in use. Of the allocated memory 6.97 GiB is allocated by PyTorch, and 284.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
quantization = int8_weight_only
text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())
transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())
vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b", subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())
# Create pipeline and run inference
pipe = CogVideoXPipeline.from_pretrained(
"THUDM/CogVideoX-5b",
text_encoder=text_encoder,
transformer=transformer,
vae=vae,
torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
请查看github issue置顶的信息
3090 could run within comfyui
zRzRzRzRzRzRzR
changed discussion status to
closed