SageMaker Deployment

by PrajwalM

Can you provide a script for deploying this model on SageMaker?
I was trying to deploy it with the following config, using the latest Hugging Face TGI image on a g5.12xlarge instance.

import json

config = {
    'HF_MODEL_ID': "TheBloke/dolphin-2.7-mixtral-8x7b-AWQ",  # model ID on the Hugging Face Hub
    'SM_NUM_GPUS': json.dumps(2),                            # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(3072),                    # max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(4096),                    # max length of the generation (including input text)
    'MAX_BATCH_TOTAL_TOKENS': json.dumps(16000),             # max tokens processed per batch
    'HF_MODEL_QUANTIZE': "gptq"                              # quantization method passed to TGI
}
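
For context, this is roughly how I wrap that config when deploying (a minimal sketch using the standard SageMaker Hugging Face LLM container; the execution role and health-check timeout are placeholders for my actual values):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # placeholder: my SageMaker execution role

# resolve the latest Hugging Face TGI (LLM) container image
llm_image = get_huggingface_llm_image_uri("huggingface")

llm_model = HuggingFaceModel(
    role=role,
    image_uri=llm_image,
    env=config,  # the config dict shown above
)

predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=600,  # allow time for the model shards to load
)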

but got the following error:
ValueError: sharded is not supported for AutoModel.
Error: ShardCannotStart

I need a solution for this.
Thanks in advance.
