Commit c916de7 by sunitha-ravi
1 Parent(s): bf52265
Update README.md
README.md CHANGED
@@ -98,6 +98,10 @@ The model was evaluated on [PatronusAI/HaluBench](https://huggingface.co/dataset
 
 | Model | HaluEval | RAGTruth | FinanceBench | DROP | CovidQA | PubmedQA | Overall
 | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| GPT-4o | 87.9% | 84.3% | **85.3%** | 84.3% | 95.0% | 82.1% | 86.5% |
+| GPT-4-Turbo | 86.0% | **85.0%** | 82.2% | 84.8% | 90.6% | 83.5% | 85.0% |
+| GPT-3.5-Turbo | 62.2% | 50.7% | 60.9% | 57.2% | 56.7% | 62.8% | 58.7% |
+| Claude-3.5-Sonnet | 84.5% | 79.1% | 69.3% | 69.7% | 70.8% |84.8% |83.7%|
 | RAGAS Faithfulness | 70.6% | 75.8% | 59.5% | 59.6% | 75.0% | 67.7% | 66.9% |
 | Mistral-Instruct-7B | 78.3% | 77.7% | 56.3% | 56.3% | 71.7% | 77.9% | 69.4% |
 | Llama-3-Instruct-8B | 83.1% | 80.0% | 55.0% | 58.2% | 75.2% | 70.7% | 70.4% |