You don't need "function calling fine-tuning" to build good agents ❌
It's trendy to share models "fine-tuned for function calling", but from my observations, this fine-tuning is neither necessary nor sufficient to build good agent systems.
To name only a few:
🐦‍⬛ Nexusflow/NexusRaven-V2-13B
⌘ CohereForAI/c4ai-command-r-plus
⛵️ mistralai/Mixtral-8x22B-Instruct-v0.1
"Fine-tuned for function-calling" generally means "fine-tuned to generate function calls in correct JSON for extremely simple tasks". In other terms, it means "improve the formatting of the tool calls".
Yet I discovered two things while improving Transformers Agents:
🚧 Even when used as JSON agents, these fine-tuned models don't perform very well
👍 Good base models perform better without any fine-tuning, just plain prompting (Llama-3-70B-Instruct, GPT-4o, Claude-3.5-Sonnet)
📊 The graph below shows the count of errors for my GPT-4o validation run on the GAIA benchmark: AgentParsingError and AgentExecutionError are the ones caused by incorrect formatting.
➤ As you can see, their count is already close to 0!
And given that GPT-4o is certainly not fine-tuned for our Code tool calling format, this shows that "function calling fine-tuning" is not necessary!
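For illustration, here is a minimal sketch of what "plain prompting" for tool calls can look like with an OpenAI-style client; the tool spec and prompt wording are my assumptions, not the setup from the run above. The point is that the tool format is just text in the system prompt:

```python
# Tool calling via plain prompting: no function-calling fine-tune,
# the tool spec is ordinary text in the system prompt.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "You can use one tool: search(query: str) -> str.\n"
    "To call it, reply with ONLY this JSON: "
    '{"tool": "search", "arguments": {"query": "..."}}'
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Who won the 2023 Ballon d'Or?"},
    ],
)
call = json.loads(resp.choices[0].message.content)
print(call["tool"], call["arguments"])
```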
The hardest thing to get right in an agent is still to plan good task-solving trajectories over several steps; a minimal sketch of such a loop follows the list below.
To improve this, we could:
- Use more powerful base models
- Build tool-calling datasets with complex solving trajectories
- Use RL! cc @lvwerra
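To show what "planning trajectories over several steps" actually demands, here is a minimal sketch of a multi-step agent loop; the `llm` helper and the JSON step format are assumptions for illustration, not the Transformers Agents API. The hard part that a formatting fine-tune never touches is the control flow: picking the right tool at each step and deciding when to stop.

```python
# Minimal multi-step agent loop (illustrative, not a library API).
import json

def run_agent(task, tools, llm, max_steps=5):
    """tools: dict of name -> callable; llm: callable(messages) -> str."""
    messages = [
        {"role": "system", "content": (
            "Solve the task step by step. Each turn, reply with ONLY JSON: "
            '{"tool": "<name>", "arguments": {...}} or {"final_answer": "..."}. '
            f"Available tools: {list(tools)}"
        )},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = llm(messages)  # one reason-then-act step
        messages.append({"role": "assistant", "content": reply})
        step = json.loads(reply)
        if "final_answer" in step:  # the trajectory is complete
            return step["final_answer"]
        # Execute the chosen tool and feed the observation back in.
        observation = tools[step["tool"]](**step["arguments"])
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped: max steps reached."
```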