DontPlanToEnd committed on
Commit
a4d1d48
1 Parent(s): 03dab03

Update app.py

Files changed (1)
  1. app.py +2 -0
app.py CHANGED
@@ -287,6 +287,8 @@ with GraInter:
  **Std:** The standard deviation of the model's predicted ratings. <0.5 means the model mostly spammed one number, 0.5-0.75: ~two numbers, 0.75-1: ~three, etc. Around 1.7-2.3 is a good distribution of ratings.
  <br>
  **Score:** A combination of Dif, Cor, and Std.
+ <br><br>
+ The question this leaderboard focuses on could've benefited from being split into multiple prediction prompts, each with its own input and prediction lists, and then averaging the accuracy of each list of predictions. This would have reduced the variability of prediction accuracy and produced a ranking with fewer outliers. Because retesting all of the models takes so long, these improvements will have to wait until it becomes absolutely necessary to update the leaderboard's questions.
  """)
 
  gr.Markdown("### **NA models:**")
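The Std description and the proposed multi-prompt averaging can be illustrated with a minimal Python sketch. It is not the leaderboard's actual code. It assumes that Std is the population standard deviation (`statistics.pstdev`) and uses a hypothetical exact-match accuracy per prediction list. The names `rating_std`, `list_accuracy`, and `averaged_accuracy` are illustrative only. With these assumptions, spamming one number gives a Std well under 0.5, and an even spread over three values lands near 0.82, consistent with the bands described above.

```python
from statistics import mean, pstdev


def rating_std(predicted):
    """Std of a model's predicted ratings.

    Assumption: population standard deviation (statistics.pstdev); the
    leaderboard may compute it differently.
    """
    return pstdev(predicted)


def list_accuracy(predicted, actual):
    """Hypothetical accuracy: fraction of predictions that exactly match."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equal-length, non-empty lists")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)


def averaged_accuracy(prompts):
    """Average per-prompt accuracy over several (predicted, actual) lists.

    Each prompt contributes equally, regardless of its list length, so one
    unusually easy or hard list moves the final number less.
    """
    return mean(list_accuracy(p, a) for p, a in prompts)


if __name__ == "__main__":
    print(rating_std([5] * 9 + [4]))  # mostly one number: about 0.3
    print(rating_std([4, 5] * 5))     # two numbers, evenly split: 0.5
    print(rating_std([4, 5, 6]))      # three numbers, evenly split: about 0.82
    # Two hypothetical prediction prompts: (predicted, actual) pairs.
    prompts = [([5, 4, 3], [5, 4, 2]), ([1, 2, 3, 4], [1, 2, 3, 4])]
    print(averaged_accuracy(prompts))  # (2/3 + 1) / 2, about 0.833
```

Averaging per-prompt scores (rather than pooling every prediction into one list) is the design choice the note above describes: each prompt becomes one sample, so the final ranking depends less on any single list.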