Added thinking ablation evaluation results

#3
by ranarag - opened
Files changed (1)
  1. README.md +47 -3
README.md CHANGED
@@ -166,7 +166,7 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
 
  **Evaluation Results:**
  <table>
-
+ <caption><b>Comparison with Other Models</b></caption>
  <thead>
  <tr>
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
@@ -300,7 +300,7 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
 
  <tr>
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
- <td style="text-align:center; background-color: #DAE8FF; color: black;">24.86</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
  <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
  <td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
  <td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
@@ -315,7 +315,51 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
  </tr>
 
 
-
+ <table>
+ <caption><b>Thinking Ablation</b></caption>
+ <thead>
+ <tr>
+ <th rowspan="2" style="text-align:left; background-color: #001d6c; color: white;">Models</th>
+ <th colspan="2" style="text-align:center; background-color: #001d6c; color: white;">Thinking=False</th>
+ <th colspan="2" style="text-align:center; background-color: #001d6c; color: white;">Thinking=True</th>
+ </tr>
+ <tr>
+ <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
+ <th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">30.34</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">40.54</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">36.89</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">61.19</td>
+ </tr>
+ <tr>
+ <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">30.42</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">31.65</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
+ </tr>
+ </tbody>
+ </table>
 
 
  </tbody></table>