sunitha-ravi commited on
Commit
c916de7
1 Parent(s): bf52265

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +4 -0
README.md CHANGED
@@ -98,6 +98,10 @@ The model was evaluated on [PatronusAI/HaluBench](https://huggingface.co/dataset
98
 
99
  | Model | HaluEval | RAGTruth | FinanceBench | DROP | CovidQA | PubmedQA | Overall
100
  | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
 
 
 
 
101
  | RAGAS Faithfulness | 70.6% | 75.8% | 59.5% | 59.6% | 75.0% | 67.7% | 66.9% |
102
  | Mistral-Instruct-7B | 78.3% | 77.7% | 56.3% | 56.3% | 71.7% | 77.9% | 69.4% |
103
  | Llama-3-Instruct-8B | 83.1% | 80.0% | 55.0% | 58.2% | 75.2% | 70.7% | 70.4% |
 
98
 
99
  | Model | HaluEval | RAGTruth | FinanceBench | DROP | CovidQA | PubmedQA | Overall
100
  | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
101
+ | GPT-4o | 87.9% | 84.3% | **85.3%** | 84.3% | 95.0% | 82.1% | 86.5% |
102
+ | GPT-4-Turbo | 86.0% | **85.0%** | 82.2% | 84.8% | 90.6% | 83.5% | 85.0% |
103
+ | GPT-3.5-Turbo | 62.2% | 50.7% | 60.9% | 57.2% | 56.7% | 62.8% | 58.7% |
104
+ | Claude-3.5-Sonnet | 84.5% | 79.1% | 69.3% | 69.7% | 70.8% |84.8% |83.7%|
105
  | RAGAS Faithfulness | 70.6% | 75.8% | 59.5% | 59.6% | 75.0% | 67.7% | 66.9% |
106
  | Mistral-Instruct-7B | 78.3% | 77.7% | 56.3% | 56.3% | 71.7% | 77.9% | 69.4% |
107
  | Llama-3-Instruct-8B | 83.1% | 80.0% | 55.0% | 58.2% | 75.2% | 70.7% | 70.4% |