---
library_name: transformers
license: llama3.1
datasets:
- euclaise/reddit-instruct-curated
- BintangFortuna/Reddit-Writing-SGPT
base_model:
- mergekit-community/mergekit-ties-svidyqt
---
### One Kaggle Account Fine-Tuning Challenge
I just realized that abusing free services isn't cool, so I set myself a challenge: fine-tune this model using only one Kaggle account.
[Placeholder for image, maybe... or not]
Base model: [mergekit-community/mergekit-ties-svidyqt](https://huggingface.co./mergekit-community/mergekit-ties-svidyqt)
The datasets are listed above, with a small addition of persona-like data generated with Gemma and some instruction-following data (probably fewer than 1,000 examples) for better generalization, since the two listed sets don't have system turns. Honestly, I also just wanted to round it up from 24K to 25K examples; it looks nicer when tokenizing.
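For reference, a minimal sketch of how such a mix could be assembled with the `datasets` library. The `persona.jsonl` and `extra_instruct.jsonl` files are hypothetical stand-ins, since the Gemma-generated persona data and the extra instruction-following set aren't published:
```python
# Sketch only: roughly how the ~25K-example mix described above could be built.
# "persona.jsonl" and "extra_instruct.jsonl" are hypothetical stand-ins for the
# unpublished persona data and the small instruction-following addition.
from datasets import load_dataset, concatenate_datasets

reddit_instruct = load_dataset("euclaise/reddit-instruct-curated", split="train")
reddit_writing = load_dataset("BintangFortuna/Reddit-Writing-SGPT", split="train")
persona = load_dataset("json", data_files="persona.jsonl", split="train")
extra_instruct = load_dataset("json", data_files="extra_instruct.jsonl", split="train")

# Assumes all four sets have already been normalized to the same chat-style columns;
# otherwise concatenate_datasets will refuse to merge them.
mixed = concatenate_datasets([reddit_instruct, reddit_writing, persona, extra_instruct])
mixed = mixed.shuffle(seed=42)
print(len(mixed))  # around 25K examples in total
```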
```python
# TRAINING: STAGE ONE
layers = [
{'layer': 0, 'components': ['v_proj', 'o_proj', 'down_proj', 'gate_proj']},
    {'layer': 1, 'components': ['o_proj', 'down_proj', 'gate_proj']},
{'layer': 2, 'components': ['v_proj', 'o_proj', 'gate_proj']},
{'layer': 3, 'components': ['o_proj', 'down_proj', 'gate_proj']},
{'layer': 4, 'components': ['v_proj', 'o_proj', 'down_proj', 'gate_proj']}
]
trainable_lm_head = True
trainable_embed_tokens = True
trainable_model_norm = True
# TRAINING: STAGE TWO
layers = [
{'layer': 5, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 6, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 7, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
#
{'layer': 11, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 12, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 13, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
#
{'layer': 17, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 18, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 19, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
#
{'layer': 23, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 24, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 25, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
#
{'layer': 28, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 29, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']}
]
trainable_lm_head = False
trainable_embed_tokens = False
trainable_model_norm = False
# TRAINING: STAGE THREE
# I changed the dataset seed for stage three, because... why not? The training was already a mess; might as well make it even more interesting.
layers = [
{'layer': 8, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 9, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 10, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
#
{'layer': 14, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 15, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 16, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
#
{'layer': 20, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 21, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 22, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
#
{'layer': 26, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 27, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
#
{'layer': 30, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']},
{'layer': 31, 'components': ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj']}
]
trainable_lm_head = False
trainable_embed_tokens = False
trainable_model_norm = False
```
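Per-layer, per-component unfreezing like this isn't a stock trainer option, so here's a minimal sketch of how a stage spec of that shape can be turned into parameter freezing. It assumes a Llama-style module layout and full fine-tuning of the selected modules; the `apply_stage` helper is hypothetical, not the actual training code:
```python
# Sketch only: enables gradients for the layers/components listed in a stage spec
# and freezes everything else. Assumes a Llama-style module layout
# (model.model.layers[i].self_attn.q_proj, model.model.layers[i].mlp.gate_proj, ...).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mergekit-community/mergekit-ties-svidyqt")

def apply_stage(model, layers, trainable_lm_head, trainable_embed_tokens, trainable_model_norm):
    # Start fully frozen, then re-enable only the requested pieces.
    for p in model.parameters():
        p.requires_grad = False

    for spec in layers:
        block = model.model.layers[spec['layer']]
        for name, p in block.named_parameters():
            # Names look like "self_attn.v_proj.weight" or "mlp.down_proj.weight".
            if any(component in name for component in spec['components']):
                p.requires_grad = True

    for p in model.lm_head.parameters():
        p.requires_grad = trainable_lm_head
    for p in model.model.embed_tokens.parameters():
        p.requires_grad = trainable_embed_tokens
    for p in model.model.norm.parameters():
        p.requires_grad = trainable_model_norm

# Stage one (abbreviated), matching the config above.
stage_one = [
    {'layer': 0, 'components': ['v_proj', 'o_proj', 'down_proj', 'gate_proj']},
    {'layer': 1, 'components': ['o_proj', 'down_proj', 'gate_proj']},
]
apply_stage(model, stage_one,
            trainable_lm_head=True,
            trainable_embed_tokens=True,
            trainable_model_norm=True)
```
Each stage would then get its own training pass over the data before the next spec is applied.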