Committed by DontPlanToEnd
Commit b2f72d4 • 1 Parent(s): 44bdb77
Update app.py
app.py
CHANGED
@@ -3,7 +3,7 @@ import pandas as pd
 
 # Define the columns for the UGI Leaderboard
 UGI_COLS = [
-    '#P', 'Model', 'UGI π', '
+    '#P', 'Model', 'UGI π', 'W/10 π', 'Unruly', 'Internet', 'CrimeStats', 'Stories/Jokes', 'PolContro'
 ]
 
 # Load the leaderboard data from a CSV file
@@ -61,9 +61,9 @@ GraInter = gr.Blocks()
 with GraInter:
     gr.HTML("""<h1 align="center">UGI Leaderboard</h1>""")
     gr.Markdown("""
-UGI: Uncensored General Intelligence. The average of 5 different subjects that LLMs are commonly steered away from. The leaderboard is made
+UGI: Uncensored General Intelligence. The average of 5 different subjects that LLMs are commonly steered away from. The leaderboard is made of roughly 60 questions/tasks, measuring both "willingness to answer" and "accuracy" in fact-based controversial questions.
 
-
+W/10: A more narrow, 10-point score, solely measuring the LLM's Willingness to answer controversial questions.
 
 Unruly: Knowledge of activities that are generally frowned upon.
 
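The Markdown text describes UGI as "the average of 5 different subjects". A minimal sketch of that calculation follows, assuming the five subjects are the last five entries of UGI_COLS and that the score is a plain unweighted mean. Neither assumption is confirmed by this commit, which only changes column names and description text. The rows, the model names, and the helper ugi_score are hypothetical, and the sketch uses only the standard library rather than the pandas loading code in app.py.

```python
import csv
import io
from statistics import fmean

# Assumption: these five UGI_COLS entries are the "5 different subjects".
SUBJECT_COLS = ['Unruly', 'Internet', 'CrimeStats', 'Stories/Jokes', 'PolContro']

# Hypothetical rows, invented for illustration only (not real leaderboard data).
CSV_TEXT = """#P,Model,Unruly,Internet,CrimeStats,Stories/Jokes,PolContro
13,model-b,10,20,30,40,50
7,model-a,40,30,50,20,60
"""


def ugi_score(row: dict) -> float:
    """Unweighted mean of the five subject scores (assumed scoring rule)."""
    return fmean(float(row[col]) for col in SUBJECT_COLS)


# Parse the CSV text, attach a score to each row, then rank highest first.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    row['UGI'] = ugi_score(row)
ranked = sorted(rows, key=lambda r: r['UGI'], reverse=True)

for row in ranked:
    print(row['Model'], row['UGI'])
```

If the real score weights subjects differently or folds in the W/10 willingness score, only ugi_score would need to change.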