Any reason why this longer context length wasn't applied to the chat and instruct versions?
It would be super useful to have a sequence length longer than 2048 for chat or instruct.
In the model cards we explain how to take advantage of ALiBi so that you can increase the maximum sequence length during inference:
Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:
import transformers

# load the config and raise the maximum sequence length beyond the 2048 used in training
config = transformers.AutoConfig.from_pretrained(
    'mosaicml/mpt-7b-instruct',
    trust_remote_code=True,
)
config.update({"max_seq_len": 4096})

# load the model with the updated config
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct',
    config=config,
    trust_remote_code=True,
)
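As a quick sanity check, here is a minimal sketch of using the extended window at inference time. It assumes the model fits in memory and uses the EleutherAI/gpt-neox-20b tokenizer noted in the MPT model cards; the prompt text is a placeholder:

# MPT models reuse the EleutherAI/gpt-neox-20b tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')

# the updated config now reports the longer window
print(model.config.max_seq_len)  # 4096

# hypothetical input that exceeds the original 2048-token window
long_prompt = "word " * 3000
inputs = tokenizer(long_prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))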
Many thanks @jacobfulano.
Is it possible to directly deploy with that configuration using SageMaker, or would I have to take a route like building a Docker image that includes the config update? Thanks
Hi @RonanMcGovern, I'm not sure what the limitations of SageMaker are, but as long as you can pass in a custom HF config, you can set config.max_seq_len=4096 dynamically at start time. It's just a config arg, no need for custom Docker images or anything.
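For example, here is a minimal sketch assuming your deployment path lets you supply a custom inference script (the model_fn hook below follows the SageMaker Hugging Face inference toolkit convention; adapt it to whatever entry point you actually use):

# inference.py (hypothetical custom entry point)
import transformers

def model_fn(model_dir):
    # model_dir is the directory SageMaker unpacks the artifacts into; a Hub ID would also work
    config = transformers.AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
    config.max_seq_len = 4096  # extend the context window via ALiBi
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_dir,
        config=config,
        trust_remote_code=True,
    )
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_dir)
    # return whatever object your handler expects; a text-generation pipeline is one option
    return transformers.pipeline('text-generation', model=model, tokenizer=tokenizer)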
Hi @abhi-mosaic, I've managed to get MPT-7B running well on Google Colab with:
# use cache directory while loading the model
self.model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    torch_dtype=torch_dtype,
    trust_remote_code=trust_remote_code,
    use_auth_token=use_auth_token,
)
I'm then trying to set max_seq_len in the config as well, but I'm not getting any joy:
# Load the configuration
config = transformers.AutoConfig.from_pretrained(
    model_name,
    trust_remote_code=trust_remote_code,
)

# Explicitly set the max_seq_len
config.max_seq_len = 4096

# Load the model with the updated configuration
self.model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    cache_dir=cache_dir,
    torch_dtype=torch_dtype,
    trust_remote_code=trust_remote_code,
    use_auth_token=use_auth_token,
)
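One quick way to see whether the override actually took effect (a minimal sketch, using only the objects created above):

# confirm the value set on the config made it into the loaded model
print(config.max_seq_len)             # 4096, the value set above
print(self.model.config.max_seq_len)  # should match if the custom config was applied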