ango committed
Commit d9d05e0 • 1 Parent(s): 8403841
fix content error
assets/content.py +0 -22
assets/content.py CHANGED
@@ -157,29 +157,7 @@ ABOUT_HTML = """
 
 <h4>Season For Dynamic Evaluation</h4>
 <p>Thanks to sampling strategies optimized for ANGO, we can periodically sample the test set and update the leaderboard. This prevents certain institutions or individuals from maliciously hacking ANGO to inflate the model's performance. However, due to the limited number of questions in some key areas, dynamic iteration may not be feasible for all questions.</p>
-<p>There are two special attributes in ANGO:</p>
-
-<ul>
-<li>
-<strong>Human Acc:</strong> Refers to the accuracy of humans in this question.
-</li>
-<li>
-<strong>Most Wrong:</strong> Represents the option that humans are prone to get wrong.
-</li>
-</ul>
-
-<p>So based on these two attributes, we have derived two new metrics for evaluation:</p>
 
-<ul>
-<li>
-<strong>Wrong Hit:</strong> Refers to the number of times the model's incorrect predictions match the options that humans are prone to get wrong.
-</li>
-<li>
-<strong>Wrong Value:</strong> Calculated by taking the average of the human accuracy for all the questions in wrong_hit and subtracting that value from 1.
-</li>
-</ul>
-
-<p>Wrong Value and Wrong Hit do not express the model's ability to perfectly solve the problem, but rather to some extent demonstrate the similarity between the model and real humans. Due to intentional guidance or design errors in the questions, humans often exhibit a tendency for widespread errors. In such cases, if the model's predicted answer is similar to the widespread human error tendency, it indicates that the model's way of thinking is closer to that of the majority of ordinary humans.</p>
 <h4>Question Elimination Mechanism</h4>
 <p>In addition to the aforementioned dynamic updating of season, a new question elimination mechanism has been proposed. This mechanism calculates the average accuracy of each question across all models for each iteration. Questions with accuracies exceeding a threshold are temporarily removed by ANGO to ensure reliable discrimination among questions in ANGO.</p>
 """
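
The "Question Elimination Mechanism" paragraph kept by this commit describes averaging each question's accuracy across all models and temporarily dropping questions above a threshold. A minimal sketch of that idea follows. The function name `eliminate_saturated_questions`, the shape of the `results` mapping, and the `0.9` threshold are all assumptions made for illustration; the text does not say how ANGO stores results or what threshold it uses.

```python
from statistics import mean


def eliminate_saturated_questions(results, threshold=0.9):
    """Split question ids into (kept, removed) by mean accuracy across models.

    results: hypothetical mapping of model name -> {question id -> bool},
             where the bool says whether that model answered correctly.
    threshold: assumed cutoff; questions whose average accuracy exceeds it
               are set aside for the current iteration.
    """
    question_ids = sorted({qid for answers in results.values() for qid in answers})
    kept, removed = [], []
    for qid in question_ids:
        # Average correctness over the models that answered this question.
        scores = [float(answers[qid]) for answers in results.values() if qid in answers]
        avg_accuracy = mean(scores)
        if avg_accuracy > threshold:
            removed.append(qid)  # too easy to separate models
        else:
            kept.append(qid)
    return kept, removed


# Toy data: three models, three questions.
results = {
    "model_a": {"q1": True, "q2": True, "q3": False},
    "model_b": {"q1": True, "q2": False, "q3": False},
    "model_c": {"q1": True, "q2": True, "q3": True},
}
kept, removed = eliminate_saturated_questions(results, threshold=0.9)
```

In this toy data every model answers `q1` correctly, so only `q1` is set aside. Because the removal is described as temporary, a real implementation would presumably recompute the split at each iteration instead of deleting questions permanently.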