Yotam Perlitz committed on
Commit
1138892
1 Parent(s): acd921a

update images location

Files changed (2)
  1. .gitignore +0 -2
  2. app.py +3 -3
.gitignore CHANGED
@@ -1,6 +1,4 @@
  .vscode/launch.json
  .vscode/settings.json
  .DS_Store
- # assets/ablations.png
- # assets/motivation.png
  images/*
app.py CHANGED
@@ -280,7 +280,7 @@ st.markdown(
  )
 
  st.image(
- "assets/motivation.png",
+ "images/motivation.png",
  caption="Conclusions depend on the models considered. Kendall-tau correlations between the LMSys Arena benchmark and three other benchmarks: BBH, MMLU, and Alpaca v2. Each group of bars represents the correlation for different sets of top models, specifically the top 5, top 10, and top 15 (overlapping) models (according to the Arena). The results indicate that the degree of agreement between benchmarks varies with the number of top models considered, highlighting that different selections of models can lead to varying conclusions about benchmark agreement.",
  use_column_width=True,
  )
@@ -297,7 +297,7 @@ st.markdown(
  )
 
  st.image(
- "assets/pointplot_granularity_matters.png",
+ "images/pointplot_granularity_matters.png",
  caption="Correlations increase with number of models. Mean correlation (y) between each benchmark (lines) and the rest, given different numbers of models. The Blue and Orange lines are the average of all benchmark pair correlations with models sampled randomly (orange) or in contiguous sets (blue). The shaded lines represents adjacent sampling for the different benchmarks.",
  use_column_width=True,
  )
@@ -316,7 +316,7 @@ st.markdown(
 
 
  st.image(
- "assets/ablations.png",
+ "images/ablations.png",
  caption="Our recommendations substantially reduce the variance of BAT. Ablation analysis for each BAT recommendation separately and their combinations.",
  use_column_width=True,
  )
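
A possible follow-up, not part of this commit: the three `st.image` calls now repeat the `images/` prefix, and `.gitignore` still lists `images/*`, so the files may be missing in a fresh checkout. The sketch below builds each path from one constant and fails with a clear message if a file is absent. `IMAGES_DIR` and `image_path` are hypothetical names that do not appear in `app.py`.

```python
from pathlib import Path

# Assumption: images live in an "images" directory relative to the
# working directory, as the new paths in this commit suggest.
IMAGES_DIR = Path("images")


def image_path(name, base=IMAGES_DIR):
    """Return the path to an image as a string, or raise if the file is missing."""
    path = Path(base) / name
    if not path.is_file():
        raise FileNotFoundError(f"Expected image not found: {path}")
    return str(path)
```

A call site could then read `st.image(image_path("motivation.png"), ...)`, so a later directory move would only touch `IMAGES_DIR`.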