Tau LLM Unity ML Agents Project

Welcome to the Tau LLM Unity ML Agents Project repository! This project focuses on training reinforcement learning agents using Unity ML-Agents and the PPO algorithm. Our goal is to optimize the performance of the agents through various configurations and training runs.

Project Overview

This repository contains the code and configurations for training agents in a Unity environment using the Proximal Policy Optimization (PPO) algorithm. The agents are designed to learn and adapt to their environment, improving their performance over time.

Key Features

- Reinforcement Learning: Utilizes the PPO algorithm for training agents.
- Unity ML-Agents: Integrates with Unity ML-Agents for a seamless training experience.
- Custom Reward Functions: Implements gradient-based reward functions for nuanced feedback.
- Memory Networks: Incorporates memory networks to handle temporal dependencies.
- TensorBoard Integration: Monitors training progress and performance using TensorBoard.

Configuration

Below is the configuration used for training the agents:

behaviors:
  TauAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 4096
      learning_rate: 0.00003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 10
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 4
      vis_encode_type: simple
      memory:
        memory_size: 256
        sequence_length: 256
        num_layers: 4
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.995
        strength: 0.1
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 4
          learning_rate: 0.00003
    keep_checkpoints: 10
    checkpoint_interval: 100000
    threaded: true
    max_steps: 3000000
    time_horizon: 256
    summary_freq: 10000
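
As a quick sanity check on the values above, the sketch below works out a few quantities they imply. It copies the relevant numbers by hand into a plain Python dict instead of parsing the YAML file, and the helper name `derived_quantities` is hypothetical: it is not part of this repository or of ML-Agents. The minibatch count assumes the trainer splits the buffer evenly into batches, and the checkpoint count covers only periodic checkpoints.

```python
# Values copied by hand from the TauAgent configuration above.
CONFIG = {
    "batch_size": 256,
    "buffer_size": 4096,
    "num_epoch": 10,
    "max_steps": 3_000_000,
    "checkpoint_interval": 100_000,
    "keep_checkpoints": 10,
    "summary_freq": 10_000,
}


def derived_quantities(cfg: dict) -> dict:
    """Return simple arithmetic consequences of the configuration (hypothetical helper)."""
    # Assumption: the buffer is split evenly into minibatches of batch_size.
    minibatches = cfg["buffer_size"] // cfg["batch_size"]
    checkpoints_written = cfg["max_steps"] // cfg["checkpoint_interval"]
    return {
        "minibatches_per_epoch": minibatches,
        "minibatch_passes_per_update": minibatches * cfg["num_epoch"],
        "periodic_checkpoints_written": checkpoints_written,
        "periodic_checkpoints_kept": min(checkpoints_written, cfg["keep_checkpoints"]),
        "summaries_written": cfg["max_steps"] // cfg["summary_freq"],
    }


if __name__ == "__main__":
    for name, value in derived_quantities(CONFIG).items():
        print(f"{name}: {value}")
```

With these settings a full run reaches 30 checkpoint intervals but retains only the 10 most recent checkpoints, so raise `keep_checkpoints` if earlier models need to be compared.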

Model Naming Convention

The models in this repository follow the naming convention Tau_<series>_<max_steps>, which makes it easy to identify the series and the number of training steps for each model.
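
The convention can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the code assumes that `<series>` contains no underscore and that `<max_steps>` is written as a plain integer, neither of which this README specifies.

```python
import re
from typing import NamedTuple

# Assumed pattern: "Tau_<series>_<max_steps>", where <series> has no
# underscore and <max_steps> is a plain integer. Both are assumptions.
_MODEL_NAME = re.compile(r"^Tau_(?P<series>[^_]+)_(?P<max_steps>\d+)$")


class ModelName(NamedTuple):
    series: str
    max_steps: int


def format_model_name(series: str, max_steps: int) -> str:
    """Build a model name following Tau_<series>_<max_steps>."""
    return f"Tau_{series}_{max_steps}"


def parse_model_name(name: str) -> ModelName:
    """Split a model name into its series and step count, or raise ValueError."""
    match = _MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"not a Tau_<series>_<max_steps> name: {name!r}")
    return ModelName(match["series"], int(match["max_steps"]))
```

For example, a hypothetical series `A0` trained for 3000000 steps would be formatted as `Tau_A0_3000000`, and parsing that string returns the same two fields.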

Getting Started

Prerequisites

- Unity 6
- Unity ML-Agents Toolkit
- Python 3.10.11
- PyTorch
- Transformers

Installation

Clone the repository:

git clone https://github.com/p3nGu1nZz/Tau.git
cd Tau\MLAgentsProject

Install the required Python packages:
```
pip install -r requirements.txt
```
Open the Unity project:
- Launch Unity Hub and open the project folder.

Training the Agent

To start training the agent, run the following command:

mlagents-learn .\config\tau_agent_ppo_c.yaml --run-id=tau_agent_ppo_A0 --env .\Build --torch-device cuda --timeout-wait 300 --force

Note: The preferred workflow is to create a new build in the Build directory, which the command above references through the --env option.
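
If you launch training from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of this repository: `build_train_command` and `train` are hypothetical helper names, and the only flags used are the ones shown in the command above.

```python
import subprocess


def build_train_command(
    config_path: str,
    run_id: str,
    env_path: str,
    torch_device: str = "cuda",
    timeout_wait: int = 300,
    force: bool = True,
) -> list[str]:
    """Assemble the mlagents-learn argument list used in this README (hypothetical helper)."""
    command = [
        "mlagents-learn",
        config_path,
        f"--run-id={run_id}",
        "--env", env_path,
        "--torch-device", torch_device,
        "--timeout-wait", str(timeout_wait),
    ]
    if force:
        # --force overwrites existing results stored under the same run ID.
        command.append("--force")
    return command


def train(command: list[str]) -> int:
    """Run the command and return its exit code; requires mlagents-learn on PATH."""
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    cmd = build_train_command(
        r".\config\tau_agent_ppo_c.yaml", "tau_agent_ppo_A0", r".\Build"
    )
    print(" ".join(cmd))
```

Passing `force=False` leaves out `--force`, which is the safer choice when results from an earlier run with the same run ID should be kept.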

Monitoring Training

You can monitor the training progress using TensorBoard:

tensorboard --logdir results
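
If TensorBoard shows no runs, it can help to confirm that event files exist under the log directory. The sketch below is a hypothetical helper, not part of this repository. It assumes the `results` directory from the command above and TensorBoard's usual `events.out.tfevents.*` file naming, and it treats the first folder under the log directory as the run ID.

```python
from pathlib import Path


def find_event_files(logdir: str = "results") -> dict[str, list[str]]:
    """Group TensorBoard event files by their run directory (hypothetical helper)."""
    root = Path(logdir)
    runs: dict[str, list[str]] = {}
    for event_file in sorted(root.rglob("events.out.tfevents.*")):
        # The first path component under logdir is treated as the run ID.
        run_id = event_file.relative_to(root).parts[0]
        runs.setdefault(run_id, []).append(event_file.name)
    return runs


if __name__ == "__main__":
    for run_id, files in find_event_files().items():
        print(f"{run_id}: {len(files)} event file(s)")
```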

Results

The training results, including the average reward and cumulative reward, can be visualized using TensorBoard.

Citation

If you use this project in your research, please cite it as follows:

@misc{Tau,
  author = {K. Rawson},
  title = {Tau LLM Unity ML Agents Project},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/p3nGu1nZz/Tau}},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

- Unity ML-Agents Toolkit
- TensorFlow and PyTorch communities
- Hugging Face for hosting the model repository