A planner LLM fine-tuned on synthetic trajectories from an agent simulation. It can be used in ReAct-style LLM agents where planning is separated from function calling. Trajectory generation and planner fine-tuning are described in the bot-with-plan project.
The planner has been fine-tuned on the krasserm/gba-trajectories dataset. 8-bit and 4-bit quantized GGUF versions of this model are available at krasserm/gba-planner-7B-v0.1-GGUF.
Usage example
Load the model and the tokenizer.
import json
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
)
device = "cuda:0"
repo_id = "krasserm/gba-planner-7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map=device,
)
Define a prompt that contains the user request and past task-observation pairs of the current trajectory (context information).
prompt = """User request:
```
Get the average Rotten Tomatoes scores for DreamWorks' last 5 movies.
```
Context information:
```
Task: Find the last 5 movies released by DreamWorks.
Result: The last five movies released by DreamWorks are "The Bad Guys" (2022), "Boss Baby: Family Business" (2021), "Trolls World Tour" (2020), "Abominable" (2019), and "How to Train Your Dragon: The Hidden World" (2019).
Task: Search the internet for the Rotten Tomatoes score of "The Bad Guys" (2022)
Result: The Rotten Tomatoes score of "The Bad Guys" (2022) is 88%.
```
Plan the next step."""
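The prompt layout above can also be assembled programmatically from a list of past steps. The following is a minimal sketch, not part of the model card's original code: `build_prompt`, `FENCE`, and the `(task, result)` tuple representation are hypothetical names chosen for illustration, and the layout simply mirrors the example prompt as shown. How an empty trajectory should be rendered is not specified here, so the sketch emits an empty context block in that case as an assumption.

```python
FENCE = "`" * 3  # triple backtick, built programmatically so it does not clash with this example's own fence


def build_prompt(request, steps):
    """Render a user request and past (task, result) pairs in the planner's prompt layout.

    Hypothetical helper: `steps` is a list of (task, result) string tuples.
    """
    context_lines = []
    for task, result in steps:
        context_lines.append(f"Task: {task}")
        context_lines.append(f"Result: {result}")
    parts = [
        "User request:",
        FENCE,
        request,
        FENCE,
        "Context information:",
        FENCE,
        *context_lines,
        FENCE,
        "Plan the next step.",
    ]
    return "\n".join(parts)


example_prompt = build_prompt(
    "Get the average Rotten Tomatoes scores for DreamWorks' last 5 movies.",
    [("Find the last 5 movies released by DreamWorks.", "The last five movies are ...")],
)
```

Keeping the trajectory as a plain list makes it easy to append each new observation and rebuild the prompt before the next planning step.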
Then generate a plan for the next step in the trajectory.
instruct_template = "[INST] {prompt} [/INST]{{"
instruct_prompt = instruct_template.format(prompt=prompt)
input_ids = tokenizer(instruct_prompt, return_tensors="pt", max_length=1024, truncation=True)["input_ids"]
input_ids = input_ids.to(device)
generation_config = GenerationConfig(
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=model.config.eos_token_id,
    pad_token_id=model.config.pad_token_id,
)
with torch.no_grad():
    result = model.generate(input_ids, generation_config=generation_config)
result = result[:, input_ids.shape[1] :]
decoded = tokenizer.batch_decode(result, skip_special_tokens=True)
decoded_dict = json.loads("{" + decoded[0])
print(json.dumps(decoded_dict, indent=2))
{
"context_information_summary": "The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019). The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
"thoughts": "Since we have the Rotten Tomatoes score for \"The Bad Guys\", the next logical step is to find the score for the next movie in the list, \"Boss Baby: Family Business\". This will allow us to calculate the average score for the first two movies.",
"task": "Search the internet for the Rotten Tomatoes score of \"Boss Baby: Family Business\" (2021).",
"selected_tool": "search_internet"
}
The planner selects a tool and generates a task for the next step. The task is tool-specific and executed by the tool, in this case the search_internet tool, which results in the next observation on the trajectory. If the final_answer tool is selected, a final answer is available or can be generated from the trajectory.
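The step described above, executing the selected tool and recording the resulting observation, might look roughly like the following. This is an illustrative sketch only and not the bot-with-plan implementation: `execute_step`, `TOOLS`, and `run_search_internet` are hypothetical names, and the search tool is a stub that performs no real search.

```python
def run_search_internet(task):
    # Hypothetical stub; a real tool would query a search engine and return its findings.
    return f"(stub) no search performed for: {task}"


# Hypothetical registry mapping tool names to callables that take the task string.
TOOLS = {"search_internet": run_search_internet}


def execute_step(plan, steps, tools=TOOLS):
    """Run the tool chosen by the planner and append the (task, result) pair to `steps`.

    Returns True when the planner selected final_answer, i.e. no further tool call is needed.
    """
    tool_name = plan["selected_tool"]
    task = plan["task"]
    if tool_name == "final_answer":
        return True
    if tool_name not in tools:
        raise KeyError(f"no implementation registered for tool {tool_name!r}")
    steps.append((task, tools[tool_name](task)))
    return False


steps = []
plan = {
    "task": 'Search the internet for the Rotten Tomatoes score of "Boss Baby: Family Business" (2021).',
    "selected_tool": "search_internet",
}
done = execute_step(plan, steps)
```

After each call, the updated `steps` list would be rendered into the context information of the next prompt, and the loop would stop once `execute_step` returns True.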
Tools
The planner learned a (static) set of available tools during fine-tuning. These are:
| Tool name | Tool description |
|---|---|
| ask_user | Useful for asking user about information missing in the request. |
| calculate_number | Useful for numerical tasks that result in a single number. |
| create_event | Useful for adding a single entry to my calendar at given date and time. |
| search_wikipedia | Useful for searching factual information in Wikipedia. |
| search_internet | Useful for up-to-date information on the internet. |
| send_email | Useful for sending an email to a single recipient. |
| use_bash | Useful for executing commands in a Linux bash. |
| final_answer | Useful for providing the final answer to a request. Must always be used in the last step. |
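Because the tool set is static, the generated plan can be checked against it before any tool is run. The sketch below is an assumption-laden illustration: `parse_plan`, `KNOWN_TOOLS`, and `REQUIRED_KEYS` are hypothetical names, and the set of required keys is taken from the single example output shown earlier rather than from a documented schema.

```python
import json

# Tool names copied from the table above.
KNOWN_TOOLS = {
    "ask_user", "calculate_number", "create_event", "search_wikipedia",
    "search_internet", "send_email", "use_bash", "final_answer",
}
# Assumption: every plan carries the four keys seen in the example output.
REQUIRED_KEYS = {"context_information_summary", "thoughts", "task", "selected_tool"}


def parse_plan(completion):
    """Parse generated text (which lacks the leading '{') and validate it against the tool set."""
    plan = json.loads("{" + completion)
    missing = REQUIRED_KEYS - plan.keys()
    if missing:
        raise ValueError(f"planner output is missing keys: {sorted(missing)}")
    if plan["selected_tool"] not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {plan['selected_tool']!r}")
    return plan


completion = (
    '"context_information_summary": "s", "thoughts": "t", '
    '"task": "Search for X.", "selected_tool": "search_internet"}'
)
plan = parse_plan(completion)
```

Failing early on an unknown tool name or a missing key keeps malformed generations from reaching tool execution.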
The framework provided by the bot-with-plan project can easily be adjusted to a different set of tools for specialization to other application domains.