Update README.md
<img src="https://cdn-uploads.huggingface.co/production/uploads/60f808c5c1adf9100f1f263c/rNGTfSfFWyWc9mEgyxTGL.png" width="800"/>

- [Introduction: LLMs as IRCs](#introduction-llms-as-ircs)
- [How to use](#how-to-use)
- [Safety testing](#safety-testing)
- [Fine-tuning setup](#fine-tuning-setup)
What does it take to chat with a base LLM?

Several papers (e.g., [URIAL](https://arxiv.org/abs/2312.01552)) have shown that base models can be used more reliably than expected. At the same time, we also increasingly find that RLHF, and other post-training approaches, may [limit](https://x.com/aidan_mclau/status/1860026205547954474) the creativity of LLMs.

LLMs can be more than smart assistants. In fact, they should have the potential to emulate all sorts of behaviours or patterns found in their pre-training datasets (usually a large chunk of the internet).
Relay is focused on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats. [IRC](https://www.youtube.com/watch?v=O2rGTXHvPCQ) provides a rich context for conversational modeling, combining natural dialogue with command-based interactions. Yet, it remains largely overlooked.
We found that base LLMs, as small as 12B, can be sufficiently familiar with the basic formatting of IRC to enable the generation of synthetic conversational datasets (see [based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407)). These synthetic conversations can then be used to fine-tune LLMs to unlock reliable turn-based dialogue, within an implicit IRC context that also supports the use of commands.
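To make the idea concrete, here is a minimal sketch of what IRC-style formatting looks like as a prompt template. The nicknames, the `/me` command handling, and the function itself are illustrative assumptions, not the actual format used by based-chat-v0.1:

```python
# Hypothetical sketch: render turn-based dialogue as an IRC-style log,
# the kind of pattern a base LLM will have seen in its pre-training data.
# Nicknames and command handling here are assumptions for illustration.

def to_irc_log(turns):
    """Format (nick, message) pairs as plain IRC log lines."""
    lines = []
    for nick, message in turns:
        if message.startswith("/me "):
            # IRC clients render "/me <action>" as "* <nick> <action>"
            lines.append(f"* {nick} {message[4:]}")
        else:
            lines.append(f"<{nick}> {message}")
    return "\n".join(lines)

print(to_irc_log([("alice", "hi everyone"), ("bob", "/me waves")]))
# <alice> hi everyone
# * bob waves
```

Because this surface form is common in web-scraped corpora, a base model prompted with a few such lines tends to continue the log in a consistent turn-based way.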
Assuming the model used for fine-tuning is the same one used to generate the synthetic dataset, this conversational model is essentially trained with self-supervision (except for conversation starters): no instruct datasets or reward methods. The fine-tuning approach is also lightweight: 4-bit QLoRA (see [Fine-tuning setup](#fine-tuning-setup)).
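For readers unfamiliar with 4-bit QLoRA, the configuration side typically looks like the sketch below, using the `transformers` and `peft` libraries. The specific hyperparameters (rank, alpha, target modules) are placeholder assumptions, not the values used for this model — see the Fine-tuning setup section for the actual recipe:

```python
# Illustrative 4-bit QLoRA configuration sketch (not this model's exact setup).
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Load the base model weights quantized to 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Train small low-rank adapters on top of the frozen quantized weights;
# r, lora_alpha, and target_modules are assumed values for illustration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

The appeal of this setup is that only the adapter weights are trained, so a 12B base model can be fine-tuned on a single consumer GPU.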