m-ric posted an update on Jul 31
๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿญ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐—ณ๐—ถ๐—ป๐—ฎ๐—น๐—น๐˜† ๐—ด๐—ฒ๐˜ ๐˜๐—ต๐—ฒ๐—ถ๐—ฟ ๐—–๐—ต๐—ฎ๐˜๐—ฏ๐—ผ๐˜ ๐—”๐—ฟ๐—ฒ๐—ป๐—ฎ ๐—ฟ๐—ฎ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐ŸŽ–๏ธ

Given the impressive benchmarks published by Meta for their Llama-3.1 models, I was curious to see how these models would compare to top proprietary models on Chatbot Arena.

Now we've got the results! LMSys released the Elo ratings derived from thousands of user votes for the new models, and here are the rankings:

💥 The 405B model ranks 5th overall, ahead of GPT-4-Turbo! But behind GPT-4o, Claude-3.5 Sonnet, and Gemini-Advanced.
👍 The 70B model climbs up to 9th rank! From 1206 ➡️ 1244.
👍 The 8B model improves from 1152 ➡️ 1170.

✅ This confirms that Llama-3.1 is a strong contender for any task: each of its 3 model sizes is much cheaper to run than equivalent proprietary models!

For instance, here are the inference prices for the top models:
➤ GPT-4-Turbo inference price from OpenAI: $5/M input tokens, $15/M output tokens
➤ Llama-3.1-405B from the HF API (for testing only): $3/M for input or output tokens (source linked in the first comment)
➤ Llama-3.1-405B from the HF API (for testing only): free ✨
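
To make the price gap concrete, here is a minimal sketch of what a hypothetical workload would cost at the rates quoted above (the 1M-input / 1M-output token counts are illustrative assumptions, not measured usage):

```python
# Hypothetical workload: 1M input + 1M output tokens (illustrative assumption).
input_tokens = 1_000_000
output_tokens = 1_000_000

# Rates quoted above, in USD per million tokens.
gpt4_turbo_cost = 5 * input_tokens / 1e6 + 15 * output_tokens / 1e6
llama_405b_cost = 3 * input_tokens / 1e6 + 3 * output_tokens / 1e6

print(f"GPT-4-Turbo:    ${gpt4_turbo_cost:.2f}")   # $20.00
print(f"Llama-3.1-405B: ${llama_405b_cost:.2f}")   # $6.00
```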

Get a head start on the HF API (resource by @andrewrreed) 👉 https://huggingface.co./learn/cookbook/enterprise_hub_serverless_inference_api
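
If you want to try it right away, here is a minimal sketch using the `huggingface_hub` client against the serverless Inference API (the model id, token placeholder, and prompt are assumptions; use whichever Llama-3.1 checkpoint you have access to):

```python
from huggingface_hub import InferenceClient

# The model id and token below are placeholders / assumptions: pick the
# Llama-3.1 checkpoint you have access to and use your own HF access token.
client = InferenceClient(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    token="hf_...",  # your HF token
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is Chatbot Arena?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```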

It's really nice to see an open-source model in frontier territory! 🤗