1024m committed
Commit dd01bb2 · verified · 1 Parent(s): 524bb32

Update README.md

Files changed (1)
  1. README.md +9 -0
README.md CHANGED
@@ -115,6 +115,7 @@ trained on a mixed language dataset.
  - ~1% better performance on English Tasks compared to the original (average benchmark scores)
  - ~4% better performance on Hindi Tasks compared to the original (average benchmark scores)
  - ~10% less emissions than the original (as reported on benchmark evaluations like open-llm-leaderboard)
+ - Less bias due to the ordering of choices when answering MCQs
 
  ### Model Details:
 
@@ -253,6 +254,14 @@ Unlike distillation from reasoning or CoT models which produced unnecessarily l
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c60dd7d655680b57ddbff/vgNk0bKthxNsxO0oAdaPD.png)
 
+ ### Model Responses vs. Order of Choices in MCQs
+
+ Since benchmarks like MMLU-Pro have up to 10 choices while most training datasets typically contain 4-5, we modified the ordering and labelling of choices in 5% of the MCQ samples for better robustness, i.e. re-ordering choices to create an imbalance opposing the original model's choice distribution and replacing the labels A/B/C/D with a/b/c/d, 1/2/3/4, w/x/y/z, etc.
+ This resulted in less bias towards the earlier choices in MCQs compared to the original phi-4. The images below show the distribution of choices selected by the model while being evaluated on MMLU-Pro.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c60dd7d655680b57ddbff/5DYCkLHpdk2jaTsALcwN8.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c60dd7d655680b57ddbff/hhNNE4s8mALYsxdVf-UCq.png)
+
  ### Team
 
  - Ram Mohan Rao Kadiyala
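
For clarity, here is a minimal sketch of the kind of choice re-ordering and relabelling described in the new section above. The sample format, function names, and label sets are assumptions for illustration, not the actual training pipeline behind this commit.

```python
import random

# Example label sets; the README mentions swapping A/B/C/D for a/b/c/d, 1/2/3/4, w/x/y/z, etc.
DEFAULT_LABELS = [chr(ord("A") + i) for i in range(10)]
ALT_LABEL_SETS = [
    [chr(ord("a") + i) for i in range(10)],
    [str(i + 1) for i in range(10)],
    list("wxyzvutsrq"),
]

def perturb_mcq(sample, rng):
    """Re-order and relabel the choices of one MCQ sample.

    `sample` is assumed (hypothetically) to look like
    {"question": str, "choices": [str, ...], "answer_idx": int}.
    """
    order = list(range(len(sample["choices"])))
    rng.shuffle(order)  # a real pipeline could bias this against the model's preferred positions
    labels = rng.choice(ALT_LABEL_SETS)
    return {
        "question": sample["question"],
        "choices": [sample["choices"][i] for i in order],
        "labels": labels[: len(order)],
        # the correct answer follows its choice to the new slot
        "answer_idx": order.index(sample["answer_idx"]),
    }

def augment(dataset, fraction=0.05, seed=0):
    """Perturb roughly `fraction` of the MCQ samples (the README cites ~5%)."""
    rng = random.Random(seed)
    out = []
    for sample in dataset:
        if rng.random() < fraction:
            out.append(perturb_mcq(sample, rng))
        else:
            out.append({**sample, "labels": DEFAULT_LABELS[: len(sample["choices"])]})
    return out
```

Keeping the perturbed fraction small (~5%) leaves the training distribution largely unchanged while still exposing the model to permuted answer positions and unfamiliar label sets.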