Minseok Bae committed
Commit • 818ee3d
1 Parent(s): 5bcc476
modified about.py
Files changed: src/display/about.py (+6 −6)
src/display/about.py
CHANGED
@@ -19,7 +19,7 @@ class Tasks(Enum):
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation (…
+TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation Model (HHEM) leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
@@ -32,15 +32,15 @@ This leaderboard evaluates how often an LLM introduces hallucinations when summa
 LLM_BENCHMARKS_TEXT = """
 ## Introduction
 
-The Hughes Hallucination Evaluation Model (…
+The Hughes Hallucination Evaluation Model (HHEM) Leaderboard is dedicated to assessing the frequency of hallucinations in document summaries generated by Large Language Models (LLMs).
 
 Hallucinations refer to instances where a model introduces factually incorrect or unrelated content in its summaries.
 
 ## How it works
 
-Using Vectara's …
-Given a source document and a summary generated by an LLM, …
-The model card for …
+Using Vectara's HHEM, we measure the occurrence of hallucinations in generated summaries.
+Given a source document and a summary generated by an LLM, HHEM outputs a hallucination score between 0 and 1, with 0 indicating complete hallucination and 1 representing perfect factual consistency.
+The model card for HHEM can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).
 
 ## Evaluation Dataset
 
@@ -54,7 +54,7 @@ We generate summaries for each of these documents using submitted LLMs and compu
 - Average Summary Length: The average word count of generated summaries
 
 ## Note on non-Hugging Face models
-On …
+On HHEM leaderboard, There are currently models such as GPT variants that are not available on the Hugging Face model hub. We ran the evaluations for these models on our own and uploaded the results to the leaderboard.
 If you would like to submit your model that is not available on the Hugging Face model hub, please contact us at [email protected].
 
 ## Model Submissions and Reproducibility
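For context on the scoring described in the updated "How it works" text, below is a minimal sketch of scoring (source document, summary) pairs with the HHEM model from the Hugging Face Hub. It assumes the model loads as a sentence-transformers CrossEncoder, as its model card described; newer revisions may expose a different loading API, and the example texts and variable names are made up for illustration, not taken from the leaderboard code.

# Minimal sketch: scoring (source document, summary) pairs with Vectara's HHEM.
# Assumption: the model loads as a sentence-transformers CrossEncoder; newer
# model revisions may require a different loading path.
from sentence_transformers import CrossEncoder

# Load the hallucination evaluation model from the Hugging Face Hub.
hhem = CrossEncoder("vectara/hallucination_evaluation_model")

# Hypothetical (document, summary) pairs, e.g. one per LLM-generated summary.
pairs = [
    ("The capital of France is Paris, home to roughly two million people.",
     "Paris is the capital of France."),
    ("The capital of France is Paris, home to roughly two million people.",
     "Paris has ten million residents and is the capital of Italy."),
]

# predict() returns one score per pair in [0, 1]: 0 means the summary is fully
# hallucinated, 1 means it is factually consistent with the source document.
scores = hhem.predict(pairs)
for (_, summary), score in zip(pairs, scores):
    print(f"{score:.3f}  {summary}")

The leaderboard aggregates per-summary scores like these over its evaluation dataset; the exact aggregation into the reported metrics is defined in the leaderboard code, not in this sketch.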