Maybe remove the `+` signs in the demo code?
The `+` signs in the Python code seem to stem from a diff or something and may confuse people. Please remove them (or enlighten me about their use). Thanks.
Hi @petergrubercom, those are used to highlight the diff that users need to apply to enable things like 4-bit inference or FA-2. You can refer to the basic usage shared in the first snippet and apply the changes manually; each one is a 1 LoC change.
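To make the diff convention concrete, here is a minimal sketch of turning a diff-style snippet into plain Python by dropping the leading `+` markers. The helper name `strip_diff_markers` and the sample snippet are hypothetical, and the sketch assumes added lines start with `+` in the first column, which is how such snippets are usually rendered.

```python
def strip_diff_markers(snippet: str) -> str:
    """Remove a leading '+' diff marker (and one following space) from each line."""
    cleaned = []
    for line in snippet.splitlines():
        if line.startswith("+ "):
            cleaned.append(line[2:])
        elif line.startswith("+"):
            cleaned.append(line[1:])
        else:
            # Unmarked lines belong to the basic usage and are kept as-is.
            cleaned.append(line)
    return "\n".join(cleaned)


# Hypothetical diff-style snippet, for illustration only.
diff_snippet = """\
+ import torch
from transformers import AutoModelForCausalLM

+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
"""

print(strip_diff_markers(diff_snippet))
```

Only markers in the first column are removed, so an indented `+` used as a Python operator on a continuation line is left untouched.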
I get what the team has done. However, it is not clear how to apply two of the changes at once, for example loading the model with Flash Attention 2 and half-precision settings.
I used:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model with Flash Attention 2 and half-precision settings
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, attn_implementation="flash_attention_2")

# Define the prompt
prompt = "My name is"

# Tokenize the prompt
model_inputs = tokenizer([prompt], return_tensors="pt")

# Generate text based on the prompt
generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```
I am coming across quite a few issues loading this onto a Windows 11 box, however. I am more than happy to write out end-to-end install instructions for Windows 11 (a lot of people I know want to do this). Shall I post separately about what I have found and how I have been documenting it so far (e.g. minimum requirements, CUDA Toolkit installation, torch, wheel, etc.)?
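As a starting point for that kind of write-up, a small pre-flight check can record which pieces of the stack are present on a fresh machine. This is only a sketch: the function name `check_environment` and the module list are assumptions (`flash_attn` is the import name of the flash-attn package), and it only tests whether each module can be found, not whether it works.

```python
import importlib.util
import platform

# Import names assumed to matter for the snippet above; adjust as needed.
REQUIRED_MODULES = ["torch", "transformers", "flash_attn"]


def check_environment(modules=REQUIRED_MODULES):
    """Report the OS, the Python version, and whether each module is installed."""
    report = {
        "os": platform.system(),
        "python": platform.python_version(),
    }
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Once `torch` imports cleanly, `torch.cuda.is_available()` is a useful next check, since Flash Attention 2 needs a CUDA-capable GPU.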
> I am coming across quite a few issues loading this on to a Windows 11 box however. I am more than happy to write out and end to end install instruction for Windows 11
That would be really great, @paddyofitz!
I think you can open a new issue with a clear title explaining that it covers end-to-end instructions for Windows 11. It will definitely be extremely helpful for the community.
I will do - just stripping my build right back to work out the dependency chain on pip as if this was the first thing going on a machine @ybelkada