AutoGPT Forge: Crafting Intelligent Agent Logic
By Craig Swift & Ryan Brandt
Hey there! Ready for part 3 of our AutoGPT Forge tutorial series? If you missed the earlier parts, catch up here:
Now, let's get hands-on! We'll use an LLM to power our agent and complete a task. The challenge? Making the agent write "Washington" to a .txt file. We won't give it step-by-step instructions—just the task. Let's see our agent in action and watch it figure out the steps on its own!
Get Your Smart Agent Project Ready
Make sure you've set up your project and created an agent as described in our initial guide. If you skipped that part, click here to get started. Once you're done, come back, and we'll move forward.
In the image below, you'll see my "SmartAgent" and the agent.py file inside the 'forge' folder. That's where we'll be adding our LLM-based logic. If you're unsure about the project structure or agent functions from our last guide, don't worry. We'll cover the basics as we go!
The Task Lifecycle
The lifecycle of a task, from its creation to execution, is outlined in the agent protocol. In simple terms: a task is initiated, its steps are systematically executed, and it concludes once completed.
Want your agent to perform an action? Start by dispatching a create_task request. This crucial step involves specifying the task details, much like how you'd send a prompt to ChatGPT, using the input field. If you're giving this a shot on your own, the UI is your best friend; it effortlessly handles all the API calls on your behalf.
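If you're curious what the UI sends on your behalf, here's a rough sketch of building a create_task request by hand. Treat the details as assumptions: the /ap/v1/agent/tasks path follows the Agent Protocol convention rather than anything shown in this guide, the port is the one from the startup log further down, and build_create_task_request is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed base URL: the agent server address printed at startup in this guide.
BASE_URL = "http://localhost:8000"


def build_create_task_request(
    task_input: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the agent to create a task.

    The endpoint path is an assumption based on the Agent Protocol convention.
    """
    # The task description travels in the "input" field of the JSON body.
    body = json.dumps({"input": task_input}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ap/v1/agent/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_task_request("Write the word 'Washington' to a .txt file")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the agent running, passing the request to urllib.request.urlopen would submit it, assuming the path matches your agent's API.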
When the agent gets this, it runs the create_task function. The code super().create_task(task_request) takes care of protocol steps. It then logs the task's start. For this guide, you don't need to change this function.
async def create_task(self, task_request: TaskRequestBody) -> Task:
    """
    The agent protocol, which is the core of the Forge, works by creating a task and then
    executing steps for that task. This method is called when the agent is asked to create
    a task.
    We are hooking into this function to add a custom log message, though you can do
    anything you want here.
    """
    task = await super().create_task(task_request)
    LOG.info(
        f"📦 Task created: {task.task_id} input: {task.input[:40]}{'...' if len(task.input) > 40 else ''}"
    )
    return task
After starting a task, the execute_step function runs until all steps are done. Here's a basic view of execute_step. I've left out the detailed comments for simplicity, but you'll find them in your project.
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
    # An example that writes a hard-coded answer to a file in the workspace
    step = await self.db.create_step(
        task_id=task_id, input=step_request, is_last=True
    )
    self.workspace.write(task_id=task_id, path="output.txt", data=b"Washington D.C")
    await self.db.create_artifact(
        task_id=task_id,
        step_id=step.step_id,
        file_name="output.txt",
        relative_path="",
        agent_created=True,
    )
    step.output = "Washington D.C"
    LOG.info(f"\t✅ Final Step completed: {step.step_id}")
    return step
Here's the breakdown of the 'write file' process in four steps:
- Database Step Creation: The first stage creates a step in the database, an essential part of the agent protocol. Notice that we flag the step with is_last=True, which signals to the agent protocol that no more steps are pending. For this guide, we assume our agent only tackles single-step tasks. Hang tight for future tutorials, where we'll level up and let the agent determine its own completion point.
- File Writing: Next, we write "Washington D.C" to output.txt using the workspace.write function.
- Artifact Database Update: After writing, we record the file in the agent's artifact database.
- Step Output & Logging: Finally, we set the step output to match the file content, log the executed step, and return the step object.
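To see why the is_last flag matters, here's a toy sketch of the lifecycle from the caller's side: keep requesting steps until one comes back flagged as last. ToyStep, ToyAgent, and run_task are hypothetical stand-ins written for this illustration, not Forge classes, and the max_steps safety cap is our own addition.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyStep:
    """Stand-in for the protocol's Step; only the fields this sketch needs."""

    output: str
    is_last: bool


class ToyAgent:
    """Hypothetical agent that needs a fixed number of steps to finish a task."""

    def __init__(self, steps_needed: int) -> None:
        self.steps_needed = steps_needed
        self.steps_done = 0

    async def execute_step(self, task_id: str) -> ToyStep:
        self.steps_done += 1
        # Flag the step as last once the agent has nothing left to do.
        finished = self.steps_done >= self.steps_needed
        return ToyStep(output=f"{task_id}: step {self.steps_done}", is_last=finished)


async def run_task(agent: ToyAgent, task_id: str, max_steps: int = 10) -> list[ToyStep]:
    """Request steps until one is flagged is_last, or the safety cap is reached."""
    steps: list[ToyStep] = []
    for _ in range(max_steps):
        step = await agent.execute_step(task_id)
        steps.append(step)
        if step.is_last:
            break
    return steps


# A single-step agent, like the one in this guide, finishes after one step.
steps = asyncio.run(run_task(ToyAgent(steps_needed=1), "task-1"))
print(len(steps), steps[-1].is_last)  # prints: 1 True
```

Because our agent always sets is_last=True, the loop ends after the first step. An agent that decides its own completion point would simply return is_last=False until it is done.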
With the 'write file' process clear, let's make our agent smarter and more autonomous. Ready to dive in?
Building the Foundations For Our Smart Agent
First, we need to update the execute_step() function. Instead of a fixed solution, it should use the given request.
To do this, we'll fetch the task details using the provided task_id:
task = await self.db.get_task(task_id)
Next, remember to create a database record and mark it as a single-step task with is_last=True:
step = await self.db.create_step(
    task_id=task_id, input=step_request, is_last=True
)
Your updated execute_step function will look like this:
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
    # Get the task details
    task = await self.db.get_task(task_id)
    # Add a new step to the database
    step = await self.db.create_step(
        task_id=task_id, input=step_request, is_last=True
    )
    return step
Now that we've set this up, let's move to the next exciting part: The PromptEngine.
The Art of Prompting
Prompting is like shaping messages for powerful language models like ChatGPT. Since these models respond to input details, creating the right prompt can be a challenge. That's where the PromptEngine comes in.
The "PromptEngine" helps you store prompts in text files, specifically in Jinja2 templates. This means you can change the prompts without changing the code. It also lets you adjust prompts for different LLMs. Here's how to use it:
First, add the PromptEngine from the SDK:
from .sdk import PromptEngine
In your execute_step function, set up the engine for the gpt-3.5-turbo LLM:
prompt_engine = PromptEngine("gpt-3.5-turbo")
Loading a prompt is straightforward. For instance, loading the system-format prompt, which dictates the response format from the LLM, is as easy as:
system_prompt = prompt_engine.load_prompt("system-format")
For intricate use cases, like the task-step prompt which requires parameters, employ the following method:
# Define the task parameters
task_kwargs = {
    "task": task.input,
    "abilities": self.abilities.list_abilities_for_prompt(),
}
# Load the task prompt with those parameters
task_prompt = prompt_engine.load_prompt("task-step", **task_kwargs)
Delving deeper, let's look at the task-step prompt template in prompts/gpt-3.5-turbo/task-step.j2:
{% extends "techniques/expert.j2" %}
{% block expert %}Planner{% endblock %}
{% block prompt %}
Your task is:
{{ task }}
Ensure to respond in the given format. Always make autonomous decisions, devoid of user guidance. Harness the power of your LLM, opting for straightforward tactics sans any legal entanglements.
{% if constraints %}
## Constraints
Operate under these confines:
{% for constraint in constraints %}
- {{ constraint }}
{% endfor %}
{% endif %}
{% if resources %}
## Resources
Utilize these resources:
{% for resource in resources %}
- {{ resource }}
{% endfor %}
{% endif %}
{% if abilities %}
## Abilities
Summon these abilities:
{% for ability in abilities %}
- {{ ability }}
{% endfor %}
{% endif %}
{% if best_practices %}
## Best Practices
{% for best_practice in best_practices %}
- {{ best_practice }}
{% endfor %}
{% endif %}
{% endblock %}
This template is modular. It uses the extends directive to build on the expert.j2 template. The different sections like constraints, resources, abilities, and best practices make the prompt dynamic. It guides the LLM in understanding the task and using resources and abilities.
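If the conditional blocks feel abstract, here's a plain-Python sketch of the same idea: each section appears only when you pass items for it. This is not how the PromptEngine works internally (Jinja2 does the real rendering), it ignores the extends part and the fixed instruction text, and render_section and render_task_prompt are hypothetical names. The ability description string in the example is made up for illustration.

```python
from typing import Optional, Sequence


def render_section(title: str, intro: str, items: Optional[Sequence[str]]) -> str:
    """Render one optional '## title' section; return '' when there are no items.

    This mirrors the template's `{% if ... %}` guard around each section.
    """
    if not items:
        return ""
    lines = [f"## {title}"]
    if intro:
        lines.append(intro)
    lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


def render_task_prompt(
    task: str,
    constraints: Optional[Sequence[str]] = None,
    resources: Optional[Sequence[str]] = None,
    abilities: Optional[Sequence[str]] = None,
    best_practices: Optional[Sequence[str]] = None,
) -> str:
    """Assemble the task prompt, skipping every section that has no items."""
    sections = [
        f"Your task is:\n{task}",
        render_section("Constraints", "Operate under these confines:", constraints),
        render_section("Resources", "Utilize these resources:", resources),
        render_section("Abilities", "Summon these abilities:", abilities),
        render_section("Best Practices", "", best_practices),
    ]
    return "\n\n".join(section for section in sections if section)


# Only the task and abilities are supplied, so only those parts are rendered.
print(
    render_task_prompt(
        task="Write the word 'Washington' to a .txt file",
        abilities=["write_file(file_path: string, data: bytes) -> None"],
    )
)
```

This matches how we call load_prompt in this guide: we pass only task and abilities, so the constraints, resources, and best practices sections drop out of the final prompt.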
The PromptEngine equips us with a potent tool to converse seamlessly with large language models. By externalizing prompts and using templates, we can ensure that our agent remains agile, adapting to new challenges without a code overhaul. As we march forward, keep this foundation in mind—it's the bedrock of our agent's intelligence.
Engaging with your LLM
To make the most of the LLM, you'll send a series of organized instructions, not just one prompt. Structure your prompts as a list of messages for the LLM. Using the system_prompt and task_prompt from before, create the messages list:
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": task_prompt},
]
With the prompt set, send it to the LLM. This step involves foundational code, focusing on the chat_completion_request. This function gives the LLM your prompt, and then gets the LLM's output. The other code sets up our request and interprets the feedback:
# Requires `import json` and `import pprint` at the top of agent.py
try:
    # Set the parameters for the chat completion
    chat_completion_kwargs = {
        "messages": messages,
        "model": "gpt-3.5-turbo",
    }
    # Get the LLM's response and interpret it
    chat_response = await chat_completion_request(**chat_completion_kwargs)
    answer = json.loads(chat_response.choices[0].message.content)
    # Log the answer for reference
    LOG.info(pprint.pformat(answer))
except json.JSONDecodeError:
    # Handle JSON decoding errors
    LOG.error(f"Can't decode chat response: {chat_response}")
except Exception as e:
    # Handle other errors
    LOG.error(f"Can't get chat response: {e}")
Extracting clear messages from LLM outputs can be complex. Our method is simple and works with GPT-3.5 and GPT-4. Future guides will show more ways to interpret LLM outputs. The goal? To go beyond JSON, as some LLMs work best with other response types. Stay tuned!
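As a small taste of what "more ways to interpret LLM outputs" could look like, here's a sketch of a slightly more forgiving parser. It assumes the model sometimes wraps its JSON in extra chatter, and it simply retries on the text between the first "{" and the last "}". parse_llm_json is a hypothetical helper, not part of the Forge SDK, and the sample reply is invented for the example.

```python
import json
from typing import Any


def parse_llm_json(raw: str) -> Any:
    """Parse JSON from an LLM reply, tolerating text around the JSON object.

    Tries the whole reply first; if that fails, retries on the span from the
    first '{' to the last '}'. Re-raises the original error if no span exists.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end < start:
            raise
        return json.loads(raw[start : end + 1])


# An invented reply with a sentence in front of the JSON payload.
reply = (
    "Sure, here is my plan:\n"
    '{"thoughts": {"speak": "Writing the file."}, '
    '"ability": {"name": "write_file", '
    '"args": {"file_path": "output.txt", "data": "Washington"}}}'
)
answer = parse_llm_json(reply)
print(answer["ability"]["name"])  # prints: write_file
```

This is a heuristic, not a guarantee: a reply with several separate JSON objects, or with stray braces in the surrounding text, will still fail to parse.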
Using and Creating Abilities
Abilities are the gears and levers that enable the agent to interact with tasks at hand. Let's unpack the mechanisms behind these abilities and how you can harness, and even extend, them.
In the Forge folder, there's an actions folder containing registry.py, finish.py, and a file_system subfolder. You can also add your own abilities here. registry.py is the main file for abilities. It contains the @action decorator and the ActionRegister class. This class tracks the available abilities and describes what each one does. The base Agent class includes a default ActionRegister, available via self.abilities. It looks like this:
self.abilities = ActionRegister(self)
The ActionRegister has two key methods. list_abilities_for_prompt prepares abilities for prompts. run_action executes an ability by name. An ability is a function with the @action decorator. It must have specific parameters, including the agent and task_id.
@action(
    name="write_file",
    description="Write data to a file",
    parameters=[
        {
            "name": "file_path",
            "description": "Path to the file",
            "type": "string",
            "required": True,
        },
        {
            "name": "data",
            "description": "Data to write to the file",
            "type": "bytes",
            "required": True,
        },
    ],
    output_type="None",
)
async def write_file(agent, task_id: str, file_path: str, data: bytes) -> None:
    pass
The @action decorator defines the ability's details, like its identity (name), functionality (description), and operational parameters.
Example of a Custom Ability: Webpage Fetcher
import requests


@action(
    name="fetch_webpage",
    description="Retrieve the content of a webpage",
    parameters=[
        {
            "name": "url",
            "description": "Webpage URL",
            "type": "string",
            "required": True,
        }
    ],
    output_type="string",
)
async def fetch_webpage(agent, task_id: str, url: str) -> str:
    response = requests.get(url)
    return response.text
This ability, fetch_webpage, accepts a URL as input and returns the HTML content of the webpage as a string. Custom abilities let you add more features to your agent. They can integrate other tools and libraries to enhance its functions. To make a custom ability, write an async function that takes the agent and task_id as its first parameters, then describe its name, purpose, and parameters in the @action decorator. With abilities like "fetch_webpage", your agent can take on more complex tasks.
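One thing worth knowing about the example above: requests.get is a blocking call, so while it waits on the network nothing else in the async agent can run. Here's one possible way around that, sketched with only the standard library (urllib in place of requests, which is a swap we made to keep the example dependency-free). fetch_text and _read_url are hypothetical helpers, and the 10 second timeout is an arbitrary choice.

```python
import asyncio
import urllib.request


def _read_url(url: str, timeout: float) -> str:
    """Blocking helper: download a URL and decode the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


async def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Run the blocking download in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(_read_url, url, timeout)
```

Inside an ability you would call it as `return await fetch_text(url)`. Note that asyncio.to_thread needs Python 3.9 or newer.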
Running an Ability
Now that you understand abilities and how to create them, let's use them. The last piece is the execute_step function. Our goal is to understand the agent's response, find the ability, and use it.
First, we get the ability details from the agent's answer:
# Extract the ability from the answer
ability = answer["ability"]
With the ability details in hand, we run it by calling the run_action function:
# Run the ability and get the output
# We don't actually use the output in this example
output = await self.abilities.run_action(
    task_id, ability["name"], **ability["args"]
)
Here, we’re invoking the specified ability. The task_id ensures continuity, ability['name'] pinpoints the exact function, and the arguments (ability["args"]) provide necessary context.
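Since the name and args come straight from the LLM, it can help to check their shape before running anything. Here's a small sketch of such a check. extract_ability_call is a hypothetical helper of ours, and the only thing it assumes is the answer layout used in this guide: an "ability" object with a "name" and optional "args".

```python
from typing import Any, Dict, Tuple


def extract_ability_call(answer: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (name, args) from a parsed answer, or raise ValueError with a clear reason."""
    ability = answer.get("ability")
    if not isinstance(ability, dict):
        raise ValueError("answer has no 'ability' object")
    name = ability.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("ability is missing a 'name'")
    # Treat a missing "args" key as "no arguments".
    args = ability.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("ability 'args' must be an object")
    return name, args


name, args = extract_ability_call(
    {"ability": {"name": "write_file", "args": {"file_path": "output.txt"}}}
)
print(name, args)  # prints: write_file {'file_path': 'output.txt'}
```

The returned pair slots straight into the call above as `run_action(task_id, name, **args)`, and a ValueError gives you a readable log message instead of a KeyError deep inside the step.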
Finally, we make the step's output show the agent's thinking:
# Set the step output to the "speak" part of the answer
step.output = answer["thoughts"]["speak"]
# Return the completed step
return step
And there you have it! Your first Smart Agent, sculpted with precision and purpose, stands ready to take on challenges. The stage is set. It’s showtime!
Here is what your function should look like:
# Requires `import json` and `import pprint` at the top of agent.py
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
    # Firstly we get the task this step is for so we can access the task input
    task = await self.db.get_task(task_id)
    # Create a new step in the database
    step = await self.db.create_step(
        task_id=task_id, input=step_request, is_last=True
    )
    # Log the message
    LOG.info(f"\t✅ Final Step completed: {step.step_id} input: {step.input[:19]}")
    # Initialize the PromptEngine with the "gpt-3.5-turbo" model
    prompt_engine = PromptEngine("gpt-3.5-turbo")
    # Load the system prompt
    system_prompt = prompt_engine.load_prompt("system-format")
    # Initialize the messages list with the system prompt
    messages = [
        {"role": "system", "content": system_prompt},
    ]
    # Define the task parameters
    task_kwargs = {
        "task": task.input,
        "abilities": self.abilities.list_abilities_for_prompt(),
    }
    # Load the task prompt with the defined task parameters
    task_prompt = prompt_engine.load_prompt("task-step", **task_kwargs)
    # Append the task prompt to the messages list
    messages.append({"role": "user", "content": task_prompt})
    try:
        # Define the parameters for the chat completion request
        chat_completion_kwargs = {
            "messages": messages,
            "model": "gpt-3.5-turbo",
        }
        # Make the chat completion request and parse the response
        chat_response = await chat_completion_request(**chat_completion_kwargs)
        answer = json.loads(chat_response.choices[0].message.content)
        # Log the answer for debugging purposes
        LOG.info(pprint.pformat(answer))
    except json.JSONDecodeError:
        # Handle JSON decoding errors
        LOG.error(f"Unable to decode chat response: {chat_response}")
    except Exception as e:
        # Handle other exceptions
        LOG.error(f"Unable to generate chat response: {e}")
    else:
        # Only act on the answer if it was received and parsed; otherwise
        # `answer` is undefined and the lines below would raise a NameError
        # Extract the ability from the answer
        ability = answer["ability"]
        # Run the ability and get the output
        # We don't actually use the output in this example
        output = await self.abilities.run_action(
            task_id, ability["name"], **ability["args"]
        )
        # Set the step output to the "speak" part of the answer
        step.output = answer["thoughts"]["speak"]
    # Return the completed step
    return step
Interacting with your Agent
⚠️ Heads up: The UI and benchmark are still in the oven, so they might be a tad glitchy.
With the heavy lifting of crafting our Smart Agent behind us, it’s high time to see it in action. Kick things off by firing up the agent with this command:
./run agent start SmartAgent
Once your digital playground is all set, your terminal should light up with:
[ASCII-art banner reading "AutoGPT Forge", v0.2.0]
[2023-09-27 15:39:07,832] [forge.sdk.agent] [INFO] 📝 Agent server starting on http://localhost:8000
Get Started
- Click the link to access the AutoGPT Agent UI.
Login
- Log in using your Google or GitHub account.
Navigate to Benchmarking
- Look to the left, and you'll spot a trophy icon. Click it to enter the benchmarking arena.
Select the 'WriteFile' Test
- Choose the 'WriteFile' test from the available options.
Initiate the Test Suite
- Hit 'Initiate test suite' to start the benchmarking process.
Monitor in Real-Time
- Keep your eyes on the right panel as it displays real-time output.
Check the Console
- For additional information, you can also monitor your console for progress updates and messages.
📝 📦 Task created: 70518b75-0104-49b0-923e-f607719d042b input: Write the word 'Washington' to a .txt fi...
📝 ✅ Final Step completed: a736c45f-65a5-4c44-a697-f1d6dcd94d5c input: y
If you see this, you've done it!
Troubleshooting
- If you encounter any issues or see cryptic error messages, don't worry. Just hit the retry button. Remember, LLMs are powerful but may occasionally need some guidance.
Wrap Up
- Stay tuned for our next tutorial, where we'll enhance the agent's capabilities by adding memory!
Keep Exploring
- Keep experimenting and pushing the boundaries of AI. Happy coding! 🚀