HuggingFaceModel deployment
Did you test deploying on SageMaker using the HuggingFaceModel API, similar to this notebook:
I cloned the repo and uploaded model.tar.gz to S3. When deploying it, I got an error that the task needs to be set:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Task couldn't be inferenced from ReplitLM. Inference Toolkit can only inference tasks from architectures ending with ['TapasForQuestionAnswering', 'ForQuestionAnswering', 'ForTokenClassification', 'ForSequenceClassification', 'ForMultipleChoice', 'ForMaskedLM', 'ForCausalLM', 'ForConditionalGeneration', 'MTModel', 'EncoderDecoderModel', 'GPT2LMHeadModel', 'T5WithLMHeadModel']. Use env HF_TASK to define your task."
}
After setting the task via the HF_TASK environment variable, I got this error:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Loading /.sagemaker/mms/models/model requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error."
}
Here is my implementation:
from sagemaker.huggingface import HuggingFaceModel

hub = {
    'HF_TASK': 'text-generation'
}

huggingface_model = HuggingFaceModel(
    model_data=s3_location,          # path to your model and script
    role=role,                       # IAM role with permissions to create an endpoint
    transformers_version="4.17.0",   # transformers version used
    pytorch_version="1.10.2",        # pytorch version used
    py_version="py38",               # python version used
    env=hub,
)
# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.8xlarge",
)
Would you please help me?
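For reference, the InvokeEndpoint call in the errors above corresponds to predictor.predict on the SDK side. A minimal invocation sketch for the text-generation task, with an illustrative prompt and parameters:

# query the deployed endpoint; the payload format follows the text-generation pipeline
result = predictor.predict({
    "inputs": "def fibonacci(n):",
    "parameters": {"max_new_tokens": 64},
})
print(result)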
@arminnorouzi Can you please tell me how you downloaded the model and uploaded it to S3? I am using a SageMaker notebook to do that, but I run out of space even when a larger instance is used.
@JoselinSushma I used relatively large instances, as I was trying a larger model (StarCoder). I tried ml.g4dn.2xlarge and it worked.
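For the download/upload step, one common approach (a sketch, not necessarily what was used here; the repo id and S3 prefix are placeholders) is to pull the weights with huggingface_hub, tar them, and upload via the SageMaker session. The download plus the tarball temporarily need roughly twice the model size on disk, so it is usually the notebook instance's EBS volume size, not the instance type, that needs increasing:

from huggingface_hub import snapshot_download
import sagemaker, tarfile, os

# download the model files locally (placeholder repo id; needs enough local disk)
local_dir = snapshot_download(repo_id="replit/replit-code-v1-3b")

# pack the files at the archive root, dereferencing symlinks so the real weight files are included
with tarfile.open("model.tar.gz", "w:gz", dereference=True) as tar:
    for name in os.listdir(local_dir):
        tar.add(os.path.join(local_dir, name), arcname=name)

# upload to S3; the returned URI is what goes into model_data
s3_location = sagemaker.Session().upload_data("model.tar.gz", key_prefix="replit-model")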
@arminnorouzi, thank you.
I tried adding custom functions for model_fn and predict_fn. Reference: https://huggingface.co./docs/sagemaker/inference
I use model_fn to load the model with trust_remote_code=True, as given in the model card. It worked.
However, invocation in SageMaker takes more than 60 seconds, which times out if longer code has to be generated.
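A note on that timeout: real-time SageMaker endpoints must respond to InvokeEndpoint within 60 seconds, so the usual workarounds are to cap how much is generated per request or to switch to asynchronous inference. A sketch of both, assuming request parameters are forwarded to the text-generation pipeline; prompt, token count, and bucket path are placeholders:

# option 1: keep each response under the 60s limit by capping generated tokens
predictor.predict({
    "inputs": "def quicksort(arr):",
    "parameters": {"max_new_tokens": 128},
})

# option 2: deploy with asynchronous inference, which is not bound by the 60s response limit
from sagemaker.async_inference import AsyncInferenceConfig

async_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.8xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://my-bucket/async-output",  # placeholder output location
    ),
)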
@JoselinSushma is it possible to share the custom function you wrote here? Also, for deployment, did you use these versions?
transformers_version="4.17.0", # transformers version used
pytorch_version="1.10.2", # pytorch version used
py_version='py38', # python version used
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

def model_fn(model_dir):
    # load the tokenizer and model from the unpacked model.tar.gz, trusting the repo's custom code
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
    # return a text-generation pipeline; the toolkit hands this object to predict_fn
    code_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
    return code_generator
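For completeness, a matching predict_fn could be a small wrapper like the sketch below (the parameter handling is an assumption; the toolkit's default predict_fn may already forward parameters to the pipeline returned by model_fn). Both functions go in code/inference.py inside model.tar.gz, per the inference docs linked above:

def predict_fn(data, code_generator):
    # extract the prompt and optional generation parameters from the request payload
    prompt = data.pop("inputs", data)
    parameters = data.pop("parameters", {})
    # run the pipeline returned by model_fn and return a JSON-serializable result
    return code_generator(prompt, **parameters)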
Thanks
@JoselinSushma, thanks for providing support on this issue!
Closing for now, as you seem to have made it work correctly.