CAMEL-33B-Combined-Data is a chat large language model obtained by fine-tuning the LLaMA-33B model on a total of 229K conversations: conversations collected through our CAMEL framework, 100K English public conversations from ShareGPT (available here), and 52K instructions from the Alpaca dataset (available here). We evaluate our model offline using EleutherAI's language model evaluation harness, which is also used by Hugging Face's Open LLM Leaderboard. CAMEL-33B scores an average of 64.2.

Regarding the prompt format, we follow the same prompt as LMSYS's [FastChat](https://github.com/lm-sys/FastChat/tree/main) Vicuna-13B-1.1 conversation template. It assumes a conversation between a user and an AI assistant, with a `</s>` token separating messages at the end of each role's turn. More details can be found [here](https://github.com/lm-sys/FastChat/blob/daa2b9abe20597ebf34dc5df164d450456610c74/fastchat/conversation.py#LL247C1-L247C1).
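As a rough illustration, the template can be sketched in plain Python as below. The system prompt wording, role names, and separator placement follow our reading of FastChat's `vicuna_v1.1` template; treat them as assumptions and consult FastChat's `conversation.py` (linked above) for the authoritative definition.

```python
# Sketch of the Vicuna-13B-1.1 conversation template (assumed wording;
# see FastChat's conversation.py for the canonical template).

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(turns):
    """Build a single prompt string from a list of (role, message) pairs.

    Roles are "USER" or "ASSISTANT". User turns are followed by a space;
    each completed assistant turn is terminated with "</s>". The prompt
    ends with "ASSISTANT:" so the model generates the next reply.
    """
    prompt = SYSTEM + " "
    for role, message in turns:
        if role == "USER":
            prompt += f"USER: {message} "
        else:
            prompt += f"ASSISTANT: {message}</s>"
    return prompt + "ASSISTANT:"
```

For example, `build_prompt([("USER", "Hello!")])` yields the system prompt followed by `USER: Hello! ASSISTANT:`, ready for the model to continue.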

---
license: cc-by-nc-4.0
---

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co./spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co./datasets/open-llm-leaderboard/details_camel-ai__CAMEL-33B-Combined-Data)

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 50.79   |
| ARC (25-shot)         | 62.97          |
| HellaSwag (10-shot)   | 83.83    |
| MMLU (5-shot)         | 58.98         |
| TruthfulQA (0-shot)   | 50.21   |
| Winogrande (5-shot)   | 78.3   |
| GSM8K (5-shot)        | 14.1        |
| DROP (3-shot)         | 7.12         |