|
--- |
|
license: apache-2.0 |
|
inference: false |
|
language: |
|
- en |
|
library_name: transformers |
|
--- |
|
|
|
# PlanLLM |
|
|
|
<img src="https://i.imgur.com/nHuVNAn.png" alt="drawing" style="width:300px;"/> |
|
|
|
## Model Details |
|
|
|
PlanLLM is a conversational assistant trained to guide users through a recipe from beginning to end, while answering any related requests the user might have along the way.
|
The model was also tested on DIY tasks, where it performed similarly.
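Since the model is a fine-tuned Vicuna v1.1, its inputs presumably follow the Vicuna v1.1 conversation format. As a minimal sketch (the exact system prompt and separators used by PlanLLM are assumptions here, not taken from the paper):

```python
# Sketch of how a conversation could be formatted for a Vicuna-v1.1-style
# model. The system prompt below is the stock Vicuna one; PlanLLM likely
# uses a task-grounding prompt of its own.

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(history, user_msg):
    """Assemble a Vicuna-v1.1-style prompt from (user, assistant) turn pairs."""
    parts = [SYSTEM]
    for user_turn, assistant_turn in history:
        parts.append(f"USER: {user_turn} ASSISTANT: {assistant_turn}</s>")
    parts.append(f"USER: {user_msg} ASSISTANT:")
    return " ".join(parts)

prompt = build_prompt([], "How do I start the pancake recipe?")
```

The resulting string would then be tokenized and passed to the model with the usual `transformers` `AutoTokenizer` / `AutoModelForCausalLM` pair.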
|
|
|
### Training |
|
|
|
PlanLLM was trained by fine-tuning a [Vicuna](https://huggingface.co./lmsys/vicuna-7b-v1.1) model on synthetic dialogues between users and an assistant about a given recipe.
|
The model was first trained with supervised fine-tuning (SFT) and then with Direct Preference Optimization (DPO).
|
|
|
#### Details |
|
|
|
SFT: |
|
- Train Type: Fully Sharded Data Parallel (FSDP) with 4 A100 40GB GPUs |
|
- Batch Size: 1 |
|
- Gradient Acc. Steps: 64 |
|
- Train steps: 600 |
|
|
|
DPO: |
|
- Train Type: Low-Rank Adaptation (LoRA) with 1 A100 40GB GPU |
|
- LoRA Rank: 64 |
|
- LoRA Alpha: 16 |
|
- Batch Size: 1 |
|
- Gradient Acc. Steps: 64 |
|
- Train steps: 350 |
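The DPO stage optimizes a preference objective over chosen/rejected response pairs. A minimal numeric sketch of that loss (illustrative only, not the training code used for PlanLLM):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen/rejected
    response under the policy being trained or the frozen reference model.
    beta=0.1 is a common default, not necessarily the value used here.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): small when the policy prefers the chosen
    # response more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the loss
# sits at log(2).
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

With a batch size of 1 and 64 gradient-accumulation steps, the effective batch size per optimizer update is 64 preference pairs.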
|
|
|
|
|
### Dataset |
|
|
|
PlanLLM was trained on synthetic user-system dialogues where the role of the system is to help the user complete a predetermined task. In our case, the tasks were recipes.
|
|
|
These dialogues were generated using user utterances collected from Alexa users who interacted with TWIZ, our entry in the first Alexa Prize TaskBot Challenge.
|
Using an intent classifier, we mapped each user utterance to a specific intent, allowing us to collect intent-specific utterances and to build a dialogue graph for each dialogue (with intents as the graph nodes).
|
For the system responses, we used a combination of templates, external knowledge sources, and Large Language Models. |
|
|
|
Using these components, we built a pipeline that navigates a dialogue graph, generating a user request and a system response at each turn, and thereby produces complete dialogues that follow the dialogue patterns of real users.
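The generation pipeline described above can be sketched as a walk over an intent-transition graph. The intent names and graph structure below are illustrative, not taken from the actual pipeline:

```python
import random

# Illustrative intent-transition graph: each intent maps to the intents
# observed to follow it in real dialogues. Edges could be weighted by
# observed frequency; uniform choice is used here for simplicity.
DIALOGUE_GRAPH = {
    "start_task": ["next_step", "ask_question"],
    "next_step": ["next_step", "ask_question", "stop_task"],
    "ask_question": ["next_step", "stop_task"],
    "stop_task": [],
}

def sample_dialogue_path(graph, start="start_task", max_turns=10, rng=random):
    """Walk the graph from the start intent, collecting one intent per turn."""
    path, intent = [start], start
    while graph[intent] and len(path) < max_turns:
        intent = rng.choice(graph[intent])
        path.append(intent)
    return path

path = sample_dialogue_path(DIALOGUE_GRAPH, rng=random.Random(0))
```

Each intent on the sampled path would then be rendered into a user utterance (drawn from the intent-specific utterances) and a system response (templates, external knowledge sources, or an LLM), yielding one complete synthetic dialogue.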
|
|
|
#### Details |
|
|
|
SFT: |
|
- Dialogues: 10k (90/5/5 splits) |
|
- Recipes: 1000 |
|
|
|
DPO: |
|
- Dialogues: 3k (90/5/5 splits) |
|
- Recipes: 1000 (same recipes used for SFT) |
|
|
|
|
|
### License |
|
|
|
The license is the same as Vicuna's: Apache 2.0, restricted to non-commercial use.
|
|
|
### Paper |
|
|
|
["Plan-Grounded Large Language Models for Dual Goal Conversational Settings" (Accepted at EACL 2024) |
|
Diogo Glória-Silva, Rafael Ferreira, Diogo Tavares, David Semedo, João Magalhães](https://arxiv.org/abs/2402.01053) |
|
|
|
#### Cite Us! |
|
|
|
``` |
|
@InProceedings{planllm_eacl24, |
|
author="Glória-Silva, Diogo |
|
and Ferreira, Rafael |
|
and Tavares, Diogo |
|
and Semedo, David |
|
and Magalhães, João", |
|
title="Plan-Grounded Large Language Models for Dual Goal Conversational Settings", |
|
booktitle="European Chapter of the Association for Computational Linguistics (EACL 2024)", |
|
year="2024", |
|
} |
|
``` |