# InvestLM
This is the repo for InvestLM, a financial-domain large language model tuned from meta-llama/Meta-Llama-3.1-70B on a carefully curated instruction dataset for financial investment. We provide guidance on how to use InvestLM for inference.
GitHub link: InvestLM
Test only, not for sharing.
## About AWQ
AWQ (Activation-aware Weight Quantization) is an efficient and accurate low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference at comparable output quality.
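To illustrate the general idea of 4-bit weight quantization, here is a minimal, dependency-free sketch of group-wise signed 4-bit quantization. This is an illustration only, not the actual AWQ algorithm (AWQ additionally rescales salient channels based on activation statistics before quantizing); all names and values are made up for the example.

```python
# Toy sketch of group-wise signed 4-bit weight quantization (NOT real AWQ).

def quantize_4bit(weights, group_size=4):
    """Quantize a flat list of float weights to signed 4-bit ints, one scale per group."""
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Pick the scale so the largest magnitude in the group maps near 7.
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        # Signed 4-bit integers cover [-8, 7].
        quantized.append([max(-8, min(7, round(w / scale))) for w in group])
    return quantized, scales

def dequantize_4bit(quantized, scales):
    """Recover approximate float weights from the 4-bit codes and per-group scales."""
    return [q * s for grp, s in zip(quantized, scales) for q in grp]

# Hypothetical weights, just to exercise the round trip.
weights = [0.12, -0.53, 0.07, 0.31, 1.40, -0.22, 0.05, -0.88]
q, s = quantize_4bit(weights)
restored = dequantize_4bit(q, s)
# Rounding error per weight is bounded by half a scale step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Real AWQ kernels additionally pack eight 4-bit codes per 32-bit word and fuse dequantization into the matmul, which is where the inference speedup comes from.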
## Inference
Please log in to Hugging Face first with the following command:

```shell
huggingface-cli login
```
### Generation
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "..."  # set to the InvestLM model id on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    padding_side="left",
    model_max_length=8192,
)
# Llama 3.1 reserves <|finetune_right_pad_id|> (token id 128004) for padding.
tokenizer.pad_token = "<|finetune_right_pad_id|>"
tokenizer.pad_token_id = 128004

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="sdpa",
)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Llama 3.1 chat format: system prompt, user turn, then an open assistant turn.
finetune_template = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a financial AI assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

def generate(text):
    prompt = finetune_template.format(input=text)
    inputs = tokenizer(prompt,
                       add_special_tokens=False,  # template already has <|begin_of_text|>
                       return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs,
                                 max_length=8192,
                                 do_sample=False,  # greedy decoding
                                 pad_token_id=tokenizer.eos_token_id)
    # Strip the prompt tokens; keep only the newly generated ones.
    generated = outputs[:, inputs.input_ids.shape[1]:]
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```
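The prompt template above is plain string formatting, so its structure can be checked without loading the model. The sketch below is a minimal, dependency-free demonstration; the system message and the question are hypothetical placeholders, not values prescribed by InvestLM.

```python
# Llama 3.1-style chat template, with the system message parameterized for clarity.
template = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

system = "You are a financial AI assistant."   # hypothetical system message
question = "What is dollar-cost averaging?"    # hypothetical user question
prompt = template.format(system=system, input=question)

# The question sits between its user header and the assistant header, and the
# prompt ends with an open assistant turn for the model to complete.
```

Because the prompt ends mid-turn (no closing `<|eot_id|>` after the assistant header), greedy decoding continues from that point until the model emits its own end-of-turn token.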