Code used to convert this / could you do v3 base?
#3 opened by deltanym
I've been trying for a while to get V3 base running on 8xH100 via Modal, which has been frustrating to use, and my best setup right now takes >10 min per response. I'll try AutoAWQ since DeepSeek V3 support was recently added, but if you could do it, I would appreciate that a lot.
AutoAWQ is very slow; it would take about half a day, but here's the code:
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

MODEL_PATH = "in"
QUANT_PATH = "out"

# 4-bit AWQ with GEMM kernels; keep the MLA kv_a projection in full precision.
QUANT_CONFIG = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    "modules_to_not_convert": ["self_attn.kv_a_proj_with_mqa"],
}

def main():
    model = AutoAWQForCausalLM.from_pretrained(MODEL_PATH, low_cpu_mem_usage=True)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    # Calibrate and quantize the weights (this is the slow, roughly half-day step).
    model.quantize(
        tokenizer,
        quant_config=QUANT_CONFIG,
    )
    model.save_quantized(QUANT_PATH, shard_size="10GB")
    tokenizer.save_pretrained(QUANT_PATH)
    print(f"Model is quantized and saved at \"{QUANT_PATH}\".")

if __name__ == "__main__":
    main()
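
For reference, here is a minimal sketch (not from the original thread) of loading the resulting checkpoint for inference with vLLM, assuming a vLLM build that supports AWQ-quantized DeepSeek-V3 checkpoints; the model path, tensor_parallel_size, and sampling settings below are placeholders.

# Sketch only: assumes vLLM can serve this AWQ DeepSeek-V3 checkpoint;
# "out" is QUANT_PATH from the script above, tensor_parallel_size is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="out",             # directory produced by save_quantized above
    quantization="awq",      # tell vLLM the weights are AWQ 4-bit
    tensor_parallel_size=8,  # e.g. the 8xH100 node mentioned above
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)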
v2ray changed discussion status to closed