legolasyiu committed · verified
Commit 2923906 · 1 Parent(s): 86975c1

Update README.md

Files changed (1): README.md +65 -0
README.md CHANGED
@@ -24,6 +24,71 @@ This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth)
 
Fireball-Llama-3.11-V1

## How to use

This repository contains Fireball-Llama-3.11-V1, for use with `transformers` and with the original `llama` codebase.
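
To use the weights with the original `llama` codebase, you first need the original-format checkpoints rather than the `transformers` shards. A minimal download sketch with `huggingface_hub` follows; note that the `original/*` pattern is an assumption borrowed from Meta's Llama 3.1 repositories and may not match this repo's actual layout:

````py
# Sketch: download original-format checkpoints, assuming they are published
# under original/* (as in Meta's Llama 3.1 repos); adjust to this repo's layout.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="EpistemeAI/Fireball-Llama-3.11-8B-v1orpo",
    allow_patterns=["original/*"],  # fetch only the llama-codebase files
    local_dir="Fireball-Llama-3.11-8B-v1orpo",
)
````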
### Use with transformers

Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the `generate()` function.
Make sure to update your `transformers` installation via `pip install --upgrade transformers`.

Example:
````py
!pip install -U transformers trl peft accelerate bitsandbytes
````
````py
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

base_model = "EpistemeAI/Fireball-Llama-3.11-8B-v1orpo"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

sys = "You are a helpful assistant " \
      "(advanced natural-language interaction)."

messages = [
    {"role": "system", "content": sys},
    {"role": "user", "content": "What is DPO vs. ORPO fine-tuning?"},
]

# Method 1: build the prompt with the chat template and call generate() directly
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.to(model.device)  # move input tensors to the model's device
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
results = tokenizer.batch_decode(outputs)[0]
print(results)

# Method 2: use the text-generation pipeline
import transformers
pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, not the prompt
    task='text-generation',
    max_new_tokens=512,      # max number of tokens to generate in the output
    temperature=0.6,         # lower temperature for more focused, less random answers
    do_sample=True,
    top_p=0.9,
)

sequences = pipe(messages)
for seq in sequences:
    print(f"{seq['generated_text']}")
````
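
The install cell above pulls in `bitsandbytes`, which the bf16 example never actually uses. If the model does not fit in your GPU memory, here is a minimal 4-bit loading sketch using the standard `transformers` quantization API (a generic technique, not something this repo documents):

````py
# Optional: load the model in 4-bit with bitsandbytes to reduce GPU memory use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

base_model = "EpistemeAI/Fireball-Llama-3.11-8B-v1orpo"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
````

The quantized model drops into either method above unchanged, at some cost in output quality.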
  <img src="https://huggingface.co/EpistemeAI/Fireball-Llama-3.1-8B-v1dpo/resolve/main/fireball-llama.JPG" width="200"/>
 
  ## Responsibility & Safety