What exactly is the difference between "finetuned" and "instruction-tuned"?
I read the original thread about what "instruction tuned" means.
It seemed to come down to whether the model card mentions instruction tuning.
So most LLMs, as I understand it, will be instruction tuned in some way, with few exceptions. What is the utility of simple "fine-tuning"?
GPT-3 was a pretrained model. InstructGPT was instruction tuned with SFT, which led to GPT-3.5/text-davinci-003.
The Flan-T5 paper showed that instruction tuning generalizes across many tasks.
Tuning on dialog and RLHF is good for chat modes, which is a finer point on top of plain "instruction tuning" and is what gave us ChatGPT.
Almost all modern methods have followed the OpenAI method of instruction tuning. I don't know which models would not be instruction tuned, except maybe the older models that already got evaluated.
There are a few specialist models that do not do instruction tuning, like the replit models that are just for code autocomplete, but these models won't score comparably on a general leaderboard like this anyway.
So having said all that, I am not sure there is a huge benefit in distinguishing instruction-tuned vs fine-tuned vs chat-tuned.
Hi @felixz!
Instruction-tuning is quite "recent" (it originated with the Flan, T0, and Natural Instructions papers, so around 2021?), and as you mentioned, a lot of prior models are simply fine-tuned. (RLHF is even more recent.)
Some models are still "only" fine-tuned today (on higher-quality or in-domain datasets, for example; think of biomedical or legal LLMs).
So this is one reason why the distinction is useful.
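To make the contrast concrete, here is a rough sketch of how a training sample might look in each case. The function names, the `### Instruction:` / `### Response:` template, and the sample sentence are all assumptions made up for illustration, not the format of any particular model or dataset.

```python
# Hypothetical helpers: names and the prompt template are illustrative assumptions.

def format_plain_example(text: str) -> str:
    """Plain fine-tuning: keep training on raw in-domain text, unchanged."""
    return text


def format_instruction_example(instruction: str, response: str) -> str:
    """Instruction tuning: pair a task description with the desired answer."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"


# Made-up sample sentence standing in for in-domain (e.g. biomedical) text.
domain_sentence = "Aspirin irreversibly inhibits cyclooxygenase enzymes."

plain = format_plain_example(domain_sentence)
instructed = format_instruction_example(
    "Explain in one sentence how aspirin works.",
    domain_sentence,
)

print(plain)
print("---")
print(instructed)
```

Both variants can contain the same knowledge; what differs is whether the model is trained to follow a stated task or just to continue domain text.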
We also want to distinguish between vanilla fine-tuning, RL, and instruction fine-tuning, as these model types can perform differently on the leaderboard tasks, so it makes more sense for users to be able to compare models "category wise".
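As a minimal sketch of what "category wise" comparison could mean in practice, the snippet below groups models by type and ranks them only within their own group. The model names, type labels, and scores are invented for illustration, and this is not how the leaderboard itself is implemented.

```python
from collections import defaultdict

# Hypothetical rows: (model name, model type, average score). All values are made up.
rows = [
    ("model-a", "pretrained", 55.0),
    ("model-b", "fine-tuned", 58.5),
    ("model-c", "instruction-tuned", 61.0),
    ("model-d", "instruction-tuned", 63.5),
    ("model-e", "RL-tuned", 62.0),
]


def rank_within_category(rows):
    """Group models by type, then sort each group by score, best first."""
    by_type = defaultdict(list)
    for name, model_type, score in rows:
        by_type[model_type].append((name, score))
    return {
        model_type: sorted(models, key=lambda m: m[1], reverse=True)
        for model_type, models in by_type.items()
    }


ranked = rank_within_category(rows)
for model_type, models in ranked.items():
    print(model_type, models)
```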