Added thinking ablation evaluation results (#3), opened by ranarag
README.md CHANGED
@@ -166,7 +166,7 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
|
|
166 |
|
167 |
**Evaluation Results:**
|
168 |
<table>
|
169 |
-
|
170 |
<thead>
|
171 |
<tr>
|
172 |
<th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
@@ -300,7 +300,7 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
|
|
300 |
|
301 |
<tr>
|
302 |
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
|
303 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">
|
304 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
305 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
|
306 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
|
@@ -315,7 +315,51 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
|
|
315 |
</tr>
|
316 |
|
317 |
|
318 |
-
|
319 |
|
320 |
|
321 |
</tbody></table>
|
|
|
166 |
|
167 |
**Evaluation Results:**
|
168 |
<table>
|
169 |
+
<caption><b>Comparison with Other Models</b></caption>
|
170 |
<thead>
|
171 |
<tr>
|
172 |
<th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
|
|
300 |
|
301 |
<tr>
|
302 |
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
|
303 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
|
304 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
305 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
|
306 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
|
|
|
315 |
</tr>
|
316 |
|
317 |
|
318 |
+
<table>
|
319 |
+
<caption><b>Thinking Ablation</b></caption>
|
320 |
+
<thead>
|
321 |
+
<tr>
|
322 |
+
<th rowspan="2" style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
323 |
+
<th colspan="2" style="text-align:center; background-color: #001d6c; color: white;">Thinking=False</th>
|
324 |
+
<th colspan="2" style="text-align:center; background-color: #001d6c; color: white;">Thinking=True</th>
|
325 |
+
</tr>
|
326 |
+
<tr>
|
327 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
328 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
|
329 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
330 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
|
331 |
+
</tr></thead>
|
332 |
+
<tbody>
|
333 |
+
<tr>
|
334 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
335 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
|
336 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">30.34</td>
|
337 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
338 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
339 |
+
</tr>
|
340 |
+
<tr>
|
341 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
|
342 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
|
343 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
|
344 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
345 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
346 |
+
</tr>
|
347 |
+
<tr>
|
348 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
349 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">40.54</td>
|
350 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">36.89</td>
|
351 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
|
352 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">61.19</td>
|
353 |
+
</tr>
|
354 |
+
<tr>
|
355 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
|
356 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">30.42</td>
|
357 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">31.65</td>
|
358 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
|
359 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
360 |
+
</tr>
|
361 |
+
</tbody>
|
362 |
+
</table>
|
363 |
|
364 |
|
365 |
</tbody></table>
|