Any reason why this longer context length wasn't applied to the chat and instruct versions?

#29
by RonanMcGovern - opened

It would be super useful to have more than 2048 sequence length for chat or instruct.

In the model cards we explain how to take advantage of ALiBi so that you can increase the maximum sequence length during inference:

Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:

import transformers

config = transformers.AutoConfig.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  trust_remote_code=True
)
# Raise the maximum sequence length that ALiBi will allow at inference time
config.update({"max_seq_len": 4096})
model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  trust_remote_code=True
)
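
For example, a short usage sketch (assuming a long prompt string is already defined, and using the GPT-NeoX tokenizer that the MPT model cards pair with the model):

tokenizer = transformers.AutoTokenizer.from_pretrained('EleutherAI/gpt-neox-20b')
# Prompts longer than the original 2048-token training window are now accepted
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))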
jacobfulano changed discussion status to closed
RonanMcGovern changed discussion status to open

Many thanks @jacobfulano.

Is it possible to deploy directly with that configuration using SageMaker, or would I have to take a route like building a Docker image that includes the config update? Thanks

Hi @RonanMcGovern, I'm not sure what the limitations of SageMaker are, but as long as you can pass in a custom HF config, you can set config.max_seq_len=4096 dynamically at start time. It's just a config arg, no need for custom Docker images or anything.
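
If it helps, here is a minimal sketch of that idea, assuming deployment with the SageMaker Hugging Face Inference Toolkit and a custom inference.py (model_fn is the hook the toolkit calls when the endpoint starts; the 4096 value is just an example):

# inference.py -- sketch only, not a tested deployment script
import transformers

def model_fn(model_dir):
    # Build the config first so max_seq_len is raised before the weights are loaded
    config = transformers.AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
    config.update({"max_seq_len": 4096})
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_dir,
        config=config,
        trust_remote_code=True,
    )
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_dir)
    return model, tokenizer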

abhi-mosaic changed discussion status to closed

Hi @abhi-mosaic, I've managed to get MPT-7B running well on Google Colab with:

    # use cache directory while loading model and tokenizer
    self.model = AutoModelForCausalLM.from_pretrained(
        model_name,
        cache_dir=cache_dir,
        torch_dtype=torch_dtype,
        trust_remote_code=trust_remote_code,
        use_auth_token=use_auth_token,
    )

I'm then trying to add the max_seq_len config, but not getting any joy:

        # Load the configuration
        config = transformers.AutoConfig.from_pretrained(model_name,
                                                        trust_remote_code=trust_remote_code)
        # Explicitly set the max_seq_len
        config.max_seq_len = 4096

        # Load the model with the updated configuration
        self.model = transformers.AutoModelForCausalLM.from_pretrained(
            model_name,
            config=config,
            cache_dir=cache_dir,
            torch_dtype=torch_dtype,
            trust_remote_code=trust_remote_code,
            use_auth_token=use_auth_token,
        )
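
A quick sanity check (a minimal sketch, appended to the end of the same method) would be to confirm whether the override actually reaches the loaded model before digging further:

        # Verify the custom config was picked up by the loaded model
        print(config.max_seq_len)             # expected: 4096
        print(self.model.config.max_seq_len)  # should also report 4096 if the override took effect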
RonanMcGovern changed discussion status to open
