# InvestLM
This is the repo for InvestLM, a financial-domain large language model tuned from meta-llama/Meta-Llama-3.1-70B on a carefully curated instruction dataset for financial investment. We provide guidance on how to use InvestLM for inference.
GitHub link: InvestLM
Test only, not for sharing.
## About AWQ
AWQ (Activation-aware Weight Quantization) is an efficient and accurate low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference at comparable output quality.
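To illustrate the general idea of 4-bit weight quantization, here is a minimal, dependency-free sketch of group-wise signed 4-bit quantization. This is an illustration only, not the actual AWQ algorithm (AWQ additionally rescales salient channels based on activation statistics before quantizing); all names and values are made up for the example.

```python
# Toy sketch of group-wise signed 4-bit weight quantization (NOT real AWQ).

def quantize_4bit(weights, group_size=4):
    """Quantize a flat list of float weights to signed 4-bit ints, one scale per group."""
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Pick the scale so the largest magnitude in the group maps near 7.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        # Signed 4-bit integers cover [-8, 7].
        quantized.append([max(-8, min(7, round(w / scale))) for w in group])
    return quantized, scales

def dequantize_4bit(quantized, scales):
    """Recover approximate float weights from the 4-bit codes and per-group scales."""
    return [q * s for grp, s in zip(quantized, scales) for q in grp]

# Hypothetical weights, just to exercise the round trip.
weights = [0.12, -0.53, 0.07, 0.31, 1.40, -0.22, 0.05, -0.88]
q, s = quantize_4bit(weights)
restored = dequantize_4bit(q, s)
# Rounding error per weight is bounded by half a scale step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Real AWQ kernels additionally pack eight 4-bit codes per 32-bit word and fuse dequantization into the matmul, which is where the inference speedup comes from.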
## Inference
Please log in to Hugging Face first with the following command:

```shell
huggingface-cli login
```
### Generation
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "..."  # set to the InvestLM model id on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    padding_side="left",
    model_max_length=8192,
)
# Llama 3.1 reserves <|finetune_right_pad_id|> (token id 128004) for padding.
tokenizer.pad_token = "<|finetune_right_pad_id|>"
tokenizer.pad_token_id = 128004

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Llama 3.1 chat format: system prompt, user turn, then an open assistant turn.
finetune_template = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a financial AI assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

def generate(text):
    prompt = finetune_template.format(input=text)
    inputs = tokenizer(prompt,
                       add_special_tokens=False,  # template already has <|begin_of_text|>
                       return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs,
                                 max_length=8192,
                                 do_sample=False,  # greedy decoding
                                 pad_token_id=tokenizer.eos_token_id)
    # Strip the prompt tokens; keep only the newly generated ones.
    generated = outputs[:, inputs.input_ids.shape[1]:]
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```
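The prompt template above is plain string formatting, so its structure can be checked without loading the model. The sketch below is a minimal, dependency-free demonstration; the system message and the question are hypothetical placeholders, not values prescribed by InvestLM.

```python
# Llama 3.1-style chat template, with the system message parameterized for clarity.
template = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

system = "You are a financial AI assistant."   # hypothetical system message
question = "What is dollar-cost averaging?"    # hypothetical user question
prompt = template.format(system=system, input=question)

# The question sits between its user header and the assistant header, and the
# prompt ends with an open assistant turn for the model to complete.
```

Because the prompt ends mid-turn (no closing `<|eot_id|>` after the assistant header), greedy decoding continues from that point until the model emits its own end-of-turn token.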