Update README.md
<img src="https://cdn-uploads.huggingface.co/production/uploads/60f808c5c1adf9100f1f263c/rNGTfSfFWyWc9mEgyxTGL.png" width="800"/>

- [Introduction: LLMs as IRCs](#introduction-llms-as-ircs)
- [How to use](#how-to-use)
- [Safety testing](#safety-testing)
- [Fine-tuning setup](#fine-tuning-setup)
What does it take to chat with a base LLM?

Several papers (e.g., [URIAL](https://arxiv.org/abs/2312.01552)) have shown that base models can be used more reliably than expected. At the same time, we also increasingly find that RLHF, and other post-training approaches, may [limit](https://x.com/aidan_mclau/status/1860026205547954474) the creativity of LLMs.

LLMs can be more than smart assistants. In fact, they should have the potential to emulate all sorts of behaviours or patterns found in their pre-training datasets (usually a large chunk of the internet).
Relay is focused on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats. [IRC](https://www.youtube.com/watch?v=O2rGTXHvPCQ) provides a rich context for conversational modeling, combining natural dialogue with command-based interactions. Yet, it remains largely overlooked.
We found that base LLMs, as small as 12B, can be sufficiently familiar with the basic formatting of IRC to enable the generation of synthetic conversational datasets (see [based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407)). These synthetic conversations can then be used to fine-tune LLMs to unlock reliable turn-based dialogue, within an implicit IRC context that also supports the use of commands.
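To make the idea concrete, here is a minimal sketch of what IRC-style formatting looks like as a prompt template. The nicknames, the `/me` command handling, and the function itself are illustrative assumptions, not the actual format used by based-chat-v0.1:

```python
# Hypothetical sketch: render turn-based dialogue as an IRC-style log,
# the kind of pattern a base LLM will have seen in its pre-training data.
# Nicknames and command handling here are assumptions for illustration.

def to_irc_log(turns):
    """Format (nick, message) pairs as plain IRC log lines."""
    lines = []
    for nick, message in turns:
        if message.startswith("/me "):
            # IRC clients render "/me <action>" as "* <nick> <action>"
            lines.append(f"* {nick} {message[4:]}")
        else:
            lines.append(f"<{nick}> {message}")
    return "\n".join(lines)

print(to_irc_log([("alice", "hi everyone"), ("bob", "/me waves")]))
# <alice> hi everyone
# * bob waves
```

Because this surface form is common in web-scraped corpora, a base model prompted with a few such lines tends to continue the log in a consistent turn-based way.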
Assuming the model used for fine-tuning is the same one used to generate the synthetic dataset, this conversational model is essentially trained with self-supervision (except for conversation starters): no instruct datasets or reward methods. The fine-tuning approach is also lightweight: 4-bit QLoRA (see [Fine-tuning setup](#fine-tuning-setup)).
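For readers unfamiliar with 4-bit QLoRA, the configuration side typically looks like the sketch below, using the `transformers` and `peft` libraries. The specific hyperparameters (rank, alpha, target modules) are placeholder assumptions, not the values used for this model — see the Fine-tuning setup section for the actual recipe:

```python
# Illustrative 4-bit QLoRA configuration sketch (not this model's exact setup).
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Load the base model weights quantized to 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Train small low-rank adapters on top of the frozen quantized weights;
# r, lora_alpha, and target_modules are assumed values for illustration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The appeal of this setup is that only the adapter weights are trained, so a 12B base model can be fine-tuned on a single consumer GPU.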