

lr_small_e5_sft

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2372
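A minimal loading sketch follows. It assumes that the repository stojchet/lr_small_e5_sft holds a PEFT adapter for deepseek-ai/deepseek-coder-1.3b-base (the PEFT entry under Framework versions suggests this) and that the `transformers` and `peft` packages are installed. The helper name `load_adapter` is hypothetical. The imports sit inside the function, so defining it does not require either package.

```python
def load_adapter(
    base_id="deepseek-ai/deepseek-coder-1.3b-base",
    adapter_id="stojchet/lr_small_e5_sft",
):
    """Hypothetical helper: load the base model, then apply the adapter on top.

    Assumes adapter_id points to a PEFT adapter trained on base_id.
    """
    # Imported lazily so this definition works even without the packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights fetched from adapter_id.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Usage (this downloads weights, so it is not run here): `tokenizer, model = load_adapter()`, then `model.generate(**tokenizer(prompt, return_tensors="pt"), max_new_tokens=64)`.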

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1.41e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 5
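The batch-size and scheduler entries above relate as shown in the sketch below. It assumes training ran on a single device, so total_train_batch_size = train_batch_size × gradient_accumulation_steps = 8 × 16 = 128. It also assumes no warmup, because no warmup setting is listed. Under those assumptions, the linear scheduler decays the learning rate from 1.41e-07 to zero over the 390 optimizer steps shown in the training results. The function names are illustrative. The multiplier mirrors the one used by `get_linear_schedule_with_warmup` in Transformers.

```python
def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device * grad_accum * num_devices


def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


BASE_LR = 1.41e-07   # learning_rate from the card
TOTAL_STEPS = 390    # last step in the training-results table

print(effective_batch_size(8, 16))  # 128, assuming a single device
for s in (0, 195, 390):
    print(s, linear_lr(s, BASE_LR, TOTAL_STEPS))
```

Because the peak rate is already small, the mid-training rate (step 195) is half of 1.41e-07 under this assumed schedule.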

Training results

Columns (space-separated, one row per training step): Training Loss, Epoch, Step, Validation Loss
1.2662 0.0128 1 1.2401
1.2266 0.0256 2 1.2401
1.2594 0.0384 3 1.2401
1.1592 0.0512 4 1.2400
1.2268 0.064 5 1.2400
1.2151 0.0768 6 1.2400
1.2202 0.0896 7 1.2400
1.2694 0.1024 8 1.2400
1.2341 0.1152 9 1.2400
1.2519 0.128 10 1.2399
1.2733 0.1408 11 1.2399
1.2018 0.1536 12 1.2399
1.1774 0.1664 13 1.2399
1.2955 0.1792 14 1.2399
1.2297 0.192 15 1.2399
1.2685 0.2048 16 1.2399
1.2485 0.2176 17 1.2398
1.2604 0.2304 18 1.2398
1.2048 0.2432 19 1.2398
1.2037 0.256 20 1.2398
1.2399 0.2688 21 1.2398
1.212 0.2816 22 1.2398
1.2505 0.2944 23 1.2398
1.2966 0.3072 24 1.2397
1.2109 0.32 25 1.2397
1.2595 0.3328 26 1.2397
1.2283 0.3456 27 1.2397
1.2027 0.3584 28 1.2397
1.2004 0.3712 29 1.2397
1.189 0.384 30 1.2397
1.1946 0.3968 31 1.2396
1.2234 0.4096 32 1.2396
1.269 0.4224 33 1.2396
1.2421 0.4352 34 1.2396
1.2031 0.448 35 1.2396
1.1914 0.4608 36 1.2396
1.2188 0.4736 37 1.2396
1.2434 0.4864 38 1.2395
1.276 0.4992 39 1.2395
1.212 0.512 40 1.2395
1.2414 0.5248 41 1.2395
1.2359 0.5376 42 1.2395
1.1813 0.5504 43 1.2395
1.2172 0.5632 44 1.2395
1.2052 0.576 45 1.2395
1.2451 0.5888 46 1.2394
1.1902 0.6016 47 1.2394
1.2322 0.6144 48 1.2394
1.2201 0.6272 49 1.2394
1.2086 0.64 50 1.2394
1.2439 0.6528 51 1.2394
1.1928 0.6656 52 1.2394
1.2412 0.6784 53 1.2394
1.2058 0.6912 54 1.2393
1.1883 0.704 55 1.2393
1.2065 0.7168 56 1.2393
1.2334 0.7296 57 1.2393
1.2276 0.7424 58 1.2393
1.2009 0.7552 59 1.2393
1.245 0.768 60 1.2393
1.2835 0.7808 61 1.2393
1.2671 0.7936 62 1.2392
1.201 0.8064 63 1.2392
1.2489 0.8192 64 1.2392
1.2038 0.832 65 1.2392
1.1625 0.8448 66 1.2392
1.2959 0.8576 67 1.2392
1.1787 0.8704 68 1.2392
1.1992 0.8832 69 1.2392
1.1789 0.896 70 1.2391
1.2455 0.9088 71 1.2391
1.269 0.9216 72 1.2391
1.2519 0.9344 73 1.2391
1.1986 0.9472 74 1.2391
1.2427 0.96 75 1.2391
1.2743 0.9728 76 1.2391
1.2139 0.9856 77 1.2391
1.2297 0.9984 78 1.2390
1.2004 1.0112 79 1.2390
1.1837 1.024 80 1.2390
1.1996 1.0368 81 1.2390
1.2621 1.0496 82 1.2390
1.2061 1.0624 83 1.2390
1.1926 1.0752 84 1.2390
1.2024 1.088 85 1.2390
1.2435 1.1008 86 1.2390
1.2531 1.1136 87 1.2389
1.1794 1.1264 88 1.2389
1.2244 1.1392 89 1.2389
1.2612 1.152 90 1.2389
1.187 1.1648 91 1.2389
1.1657 1.1776 92 1.2389
1.2265 1.1904 93 1.2389
1.2354 1.2032 94 1.2389
1.2089 1.216 95 1.2388
1.2879 1.2288 96 1.2388
1.1864 1.2416 97 1.2388
1.1877 1.2544 98 1.2388
1.2365 1.2672 99 1.2388
1.2211 1.28 100 1.2388
1.1654 1.2928 101 1.2388
1.2635 1.3056 102 1.2388
1.2163 1.3184 103 1.2388
1.2451 1.3312 104 1.2388
1.1779 1.344 105 1.2387
1.2307 1.3568 106 1.2387
1.216 1.3696 107 1.2387
1.2072 1.3824 108 1.2387
1.2682 1.3952 109 1.2387
1.2594 1.408 110 1.2387
1.2478 1.4208 111 1.2387
1.2661 1.4336 112 1.2387
1.2112 1.4464 113 1.2387
1.2198 1.4592 114 1.2386
1.2139 1.472 115 1.2386
1.2387 1.4848 116 1.2386
1.2283 1.4976 117 1.2386
1.2096 1.5104 118 1.2386
1.1933 1.5232 119 1.2386
1.2256 1.536 120 1.2386
1.2189 1.5488 121 1.2386
1.2513 1.5616 122 1.2386
1.2174 1.5744 123 1.2386
1.1715 1.5872 124 1.2386
1.2519 1.6 125 1.2385
1.3025 1.6128 126 1.2385
1.275 1.6256 127 1.2385
1.2159 1.6384 128 1.2385
1.2654 1.6512 129 1.2385
1.2578 1.664 130 1.2385
1.2109 1.6768 131 1.2385
1.2051 1.6896 132 1.2385
1.2345 1.7024 133 1.2385
1.2252 1.7152 134 1.2385
1.2907 1.728 135 1.2384
1.2581 1.7408 136 1.2384
1.2019 1.7536 137 1.2384
1.1398 1.7664 138 1.2384
1.248 1.7792 139 1.2384
1.2405 1.792 140 1.2384
1.2377 1.8048 141 1.2384
1.2202 1.8176 142 1.2384
1.2258 1.8304 143 1.2384
1.2795 1.8432 144 1.2384
1.2337 1.856 145 1.2383
1.2606 1.8688 146 1.2383
1.2036 1.8816 147 1.2383
1.2198 1.8944 148 1.2383
1.1932 1.9072 149 1.2383
1.2388 1.92 150 1.2383
1.2139 1.9328 151 1.2383
1.1596 1.9456 152 1.2383
1.2878 1.9584 153 1.2383
1.2157 1.9712 154 1.2383
1.2824 1.984 155 1.2382
1.2283 1.9968 156 1.2382
1.2511 2.0096 157 1.2382
1.2028 2.0224 158 1.2382
1.2694 2.0352 159 1.2382
1.2099 2.048 160 1.2382
1.1883 2.0608 161 1.2382
1.2332 2.0736 162 1.2382
1.2156 2.0864 163 1.2382
1.2147 2.0992 164 1.2382
1.1711 2.112 165 1.2381
1.2415 2.1248 166 1.2381
1.2761 2.1376 167 1.2381
1.2508 2.1504 168 1.2381
1.2206 2.1632 169 1.2381
1.2453 2.176 170 1.2381
1.2355 2.1888 171 1.2381
1.2047 2.2016 172 1.2381
1.2247 2.2144 173 1.2381
1.2012 2.2272 174 1.2381
1.1953 2.24 175 1.2381
1.2563 2.2528 176 1.2381
1.2429 2.2656 177 1.2380
1.1839 2.2784 178 1.2380
1.1571 2.2912 179 1.2380
1.2277 2.304 180 1.2380
1.2133 2.3168 181 1.2380
1.232 2.3296 182 1.2380
1.2567 2.3424 183 1.2380
1.2277 2.3552 184 1.2380
1.2397 2.368 185 1.2380
1.1894 2.3808 186 1.2380
1.1736 2.3936 187 1.2380
1.2767 2.4064 188 1.2380
1.2343 2.4192 189 1.2380
1.2082 2.432 190 1.2380
1.1952 2.4448 191 1.2379
1.2408 2.4576 192 1.2379
1.2013 2.4704 193 1.2379
1.231 2.4832 194 1.2379
1.24 2.496 195 1.2379
1.2435 2.5088 196 1.2379
1.2484 2.5216 197 1.2379
1.2466 2.5344 198 1.2379
1.2019 2.5472 199 1.2379
1.1954 2.56 200 1.2379
1.1974 2.5728 201 1.2379
1.2271 2.5856 202 1.2379
1.2532 2.5984 203 1.2379
1.1606 2.6112 204 1.2379
1.2247 2.624 205 1.2378
1.2458 2.6368 206 1.2378
1.2675 2.6496 207 1.2378
1.2415 2.6624 208 1.2378
1.227 2.6752 209 1.2378
1.1998 2.688 210 1.2378
1.242 2.7008 211 1.2378
1.2265 2.7136 212 1.2378
1.2242 2.7264 213 1.2378
1.2559 2.7392 214 1.2378
1.2286 2.752 215 1.2378
1.2285 2.7648 216 1.2378
1.2311 2.7776 217 1.2378
1.268 2.7904 218 1.2378
1.2513 2.8032 219 1.2377
1.2834 2.816 220 1.2377
1.2535 2.8288 221 1.2377
1.2214 2.8416 222 1.2377
1.2127 2.8544 223 1.2377
1.1854 2.8672 224 1.2377
1.2651 2.88 225 1.2377
1.2434 2.8928 226 1.2377
1.1727 2.9056 227 1.2377
1.237 2.9184 228 1.2377
1.182 2.9312 229 1.2377
1.1925 2.944 230 1.2377
1.2101 2.9568 231 1.2377
1.2566 2.9696 232 1.2377
1.2056 2.9824 233 1.2377
1.2068 2.9952 234 1.2377
1.2142 3.008 235 1.2377
1.2174 3.0208 236 1.2376
1.2084 3.0336 237 1.2376
1.2535 3.0464 238 1.2376
1.2019 3.0592 239 1.2376
1.2401 3.072 240 1.2376
1.187 3.0848 241 1.2376
1.2852 3.0976 242 1.2376
1.2061 3.1104 243 1.2376
1.1962 3.1232 244 1.2376
1.197 3.136 245 1.2376
1.2596 3.1488 246 1.2376
1.2476 3.1616 247 1.2376
1.163 3.1744 248 1.2376
1.2619 3.1872 249 1.2376
1.2432 3.2 250 1.2376
1.2739 3.2128 251 1.2376
1.2259 3.2256 252 1.2376
1.2601 3.2384 253 1.2376
1.223 3.2512 254 1.2375
1.2189 3.264 255 1.2375
1.2332 3.2768 256 1.2375
1.2355 3.2896 257 1.2375
1.2167 3.3024 258 1.2375
1.2379 3.3152 259 1.2375
1.2399 3.328 260 1.2375
1.2689 3.3408 261 1.2375
1.1815 3.3536 262 1.2375
1.2063 3.3664 263 1.2375
1.2283 3.3792 264 1.2375
1.2725 3.392 265 1.2375
1.2147 3.4048 266 1.2375
1.1906 3.4176 267 1.2375
1.2483 3.4304 268 1.2375
1.2373 3.4432 269 1.2375
1.2347 3.456 270 1.2375
1.2671 3.4688 271 1.2375
1.2533 3.4816 272 1.2375
1.1967 3.4944 273 1.2375
1.2325 3.5072 274 1.2375
1.1707 3.52 275 1.2374
1.2044 3.5328 276 1.2374
1.2309 3.5456 277 1.2374
1.2419 3.5584 278 1.2374
1.2066 3.5712 279 1.2374
1.19 3.584 280 1.2374
1.2569 3.5968 281 1.2374
1.2009 3.6096 282 1.2374
1.2628 3.6224 283 1.2374
1.2042 3.6352 284 1.2374
1.1863 3.648 285 1.2374
1.242 3.6608 286 1.2374
1.2452 3.6736 287 1.2374
1.2264 3.6864 288 1.2374
1.2975 3.6992 289 1.2374
1.2291 3.712 290 1.2374
1.2091 3.7248 291 1.2374
1.2253 3.7376 292 1.2374
1.2382 3.7504 293 1.2374
1.2249 3.7632 294 1.2374
1.2276 3.776 295 1.2374
1.2609 3.7888 296 1.2374
1.2503 3.8016 297 1.2374
1.1788 3.8144 298 1.2374
1.2189 3.8272 299 1.2374
1.1886 3.84 300 1.2373
1.2337 3.8528 301 1.2373
1.2084 3.8656 302 1.2373
1.2366 3.8784 303 1.2373
1.183 3.8912 304 1.2373
1.2253 3.904 305 1.2373
1.2123 3.9168 306 1.2373
1.1668 3.9296 307 1.2373
1.1947 3.9424 308 1.2373
1.2131 3.9552 309 1.2373
1.2225 3.968 310 1.2373
1.2312 3.9808 311 1.2373
1.1971 3.9936 312 1.2373
1.2206 4.0064 313 1.2373
1.1943 4.0192 314 1.2373
1.251 4.032 315 1.2373
1.217 4.0448 316 1.2373
1.2485 4.0576 317 1.2373
1.1592 4.0704 318 1.2373
1.2168 4.0832 319 1.2373
1.2029 4.096 320 1.2373
1.2057 4.1088 321 1.2373
1.2338 4.1216 322 1.2373
1.2558 4.1344 323 1.2373
1.2744 4.1472 324 1.2373
1.2222 4.16 325 1.2373
1.2182 4.1728 326 1.2373
1.2555 4.1856 327 1.2373
1.238 4.1984 328 1.2373
1.2253 4.2112 329 1.2373
1.2597 4.224 330 1.2373
1.1844 4.2368 331 1.2373
1.2253 4.2496 332 1.2373
1.2254 4.2624 333 1.2373
1.23 4.2752 334 1.2373
1.283 4.288 335 1.2373
1.2561 4.3008 336 1.2372
1.2368 4.3136 337 1.2372
1.2204 4.3264 338 1.2372
1.204 4.3392 339 1.2372
1.2165 4.352 340 1.2372
1.2591 4.3648 341 1.2372
1.2301 4.3776 342 1.2372
1.2 4.3904 343 1.2372
1.228 4.4032 344 1.2372
1.2481 4.416 345 1.2372
1.2043 4.4288 346 1.2372
1.2609 4.4416 347 1.2372
1.2163 4.4544 348 1.2372
1.257 4.4672 349 1.2372
1.2073 4.48 350 1.2372
1.2602 4.4928 351 1.2372
1.2639 4.5056 352 1.2372
1.2102 4.5184 353 1.2372
1.2104 4.5312 354 1.2372
1.1943 4.544 355 1.2372
1.223 4.5568 356 1.2372
1.2168 4.5696 357 1.2372
1.2054 4.5824 358 1.2372
1.2042 4.5952 359 1.2372
1.1875 4.608 360 1.2372
1.188 4.6208 361 1.2372
1.2374 4.6336 362 1.2372
1.2418 4.6464 363 1.2372
1.2363 4.6592 364 1.2372
1.2319 4.672 365 1.2372
1.2349 4.6848 366 1.2372
1.2357 4.6976 367 1.2372
1.1544 4.7104 368 1.2372
1.1897 4.7232 369 1.2372
1.2462 4.736 370 1.2372
1.1635 4.7488 371 1.2372
1.2841 4.7616 372 1.2372
1.2488 4.7744 373 1.2372
1.2466 4.7872 374 1.2372
1.23 4.8 375 1.2372
1.199 4.8128 376 1.2372
1.2095 4.8256 377 1.2372
1.1816 4.8384 378 1.2372
1.3457 4.8512 379 1.2372
1.2052 4.864 380 1.2372
1.233 4.8768 381 1.2372
1.2699 4.8896 382 1.2372
1.2544 4.9024 383 1.2372
1.221 4.9152 384 1.2372
1.2299 4.928 385 1.2372
1.1672 4.9408 386 1.2372
1.1694 4.9536 387 1.2372
1.2394 4.9664 388 1.2372
1.2328 4.9792 389 1.2372
1.1979 4.992 390 1.2372
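The table can be summarized with a short script. Over the run, validation loss falls only from 1.2401 at step 1 to 1.2372 at step 390. The sketch below parses a few rows copied from the table (the first three and last three; paste in the full table as needed). It reports that change and converts validation loss to perplexity via exp(loss). The conversion assumes that the logged loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models trained with PyTorch. The helper names are illustrative.

```python
import math

# Rows copied from the table above: training loss, epoch, step, validation loss.
# Only the first three and last three rows are included here for brevity.
TABLE = """\
1.2662 0.0128 1 1.2401
1.2266 0.0256 2 1.2401
1.2594 0.0384 3 1.2401
1.2394 4.9664 388 1.2372
1.2328 4.9792 389 1.2372
1.1979 4.992 390 1.2372
"""


def parse_rows(text):
    """Turn space-separated table rows into dicts keyed by column name."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append({"train_loss": float(train_loss), "epoch": float(epoch),
                     "step": int(step), "val_loss": float(val_loss)})
    return rows


def perplexity(loss):
    """Perplexity for a mean per-token cross-entropy measured in nats (assumption)."""
    return math.exp(loss)


rows = parse_rows(TABLE)
first, last = rows[0], rows[-1]
drop = first["val_loss"] - last["val_loss"]
print(f"validation loss: {first['val_loss']:.4f} -> {last['val_loss']:.4f} (drop {drop:.4f})")
print(f"perplexity: {perplexity(first['val_loss']):.3f} -> {perplexity(last['val_loss']):.3f}")
```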

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.0.dev0
  • PyTorch 2.2.2+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1
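To reproduce the environment, you can compare the installed packages with the versions listed above. The following standard-library sketch does this. It assumes the usual pip distribution names, notably `torch` for PyTorch. The helper `version_report` is a hypothetical name. Transformers 4.43.0.dev0 is a development build, which is typically only available by installing from source.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions"; keys are pip distribution names (assumed).
EXPECTED = {
    "peft": "0.10.0",
    "transformers": "4.43.0.dev0",
    "torch": "2.2.2+cu121",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def version_report(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected, installed); installed is None if the package is missing."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        report[name] = (want, have)
    return report


for name, (want, have) in version_report(EXPECTED).items():
    status = "ok" if have == want else "differs"
    print(f"{name:<12} expected {want:<14} installed {have or 'not installed':<14} {status}")
```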
Model tree for stojchet/lr_small_e5_sft: adapter of deepseek-ai/deepseek-coder-1.3b-base