Gryphe committed on
Commit 763e27d
1 Parent(s): 1fbc79a

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -11,7 +11,7 @@ I am currently in the process of cleaning up the code before publishing it, much

  ## Final merge composition

- After processing 12 models my algorithm ended up with the following (approximated) final composition, which are spread almost randomly throughout the final model due to the way my new method works.
+ After processing 12 models my algorithm ended up with the following (approximated) final composition:

  | Model | Contribution |
  |--------------------------|--------------|
@@ -28,6 +28,8 @@ After processing 12 models my algorithm ended up with the following (approximate
  | Mistral-7B-v0.1 | 2% |
  | Openchat_3.5 | 2% |

+ There is no real logic in how these models were divided throughout the merge - small bits and pieces were taken from each and then mixed in with other models on a layer-by-layer basis, using a pattern similar to my MythoMax recipe in which underlying tensors are mixed in a criss-cross manner.
+
  This new process only decides on the model's layers, not the singular lm_head and embed_tokens layers which influence much of the model's output. I ran a separate script for that, picking the singular tensors that create the longest responses, which settled on Toppy-M-7B.

  ## Prompt Format
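
The paragraph added in this commit only describes the merge at a high level, and the actual script has not been published yet. As a rough illustration of what a layer-level, criss-cross merge with a single-donor lm_head/embed_tokens could look like, here is a minimal sketch; the donor model IDs, the rotation rule, and every identifier in it are assumptions for illustration, not Gryphe's implementation.

```python
# Minimal sketch only: NOT the actual (unpublished) merge script.
# Donor model IDs and the criss-cross rotation rule are assumptions.
import torch
from transformers import AutoModelForCausalLM

donor_ids = [
    "mistralai/Mistral-7B-v0.1",  # hypothetical donor subset; the real merge used 12 models
    "Undi95/Toppy-M-7B",
    "openchat/openchat_3.5",
]

donors = [
    AutoModelForCausalLM.from_pretrained(mid, torch_dtype=torch.float16)
    for mid in donor_ids
]
donor_states = [d.state_dict() for d in donors]

base = donors[0]
merged = {k: v.clone() for k, v in donor_states[0].items()}

# "Criss-cross" mixing: within every decoder layer, rotate which donor
# supplies each tensor, so no donor contributes a contiguous block of the
# network.
for layer_idx in range(base.config.num_hidden_layers):
    prefix = f"model.layers.{layer_idx}."
    layer_keys = sorted(k for k in merged if k.startswith(prefix))
    for tensor_idx, key in enumerate(layer_keys):
        donor_state = donor_states[(layer_idx + tensor_idx) % len(donor_states)]
        merged[key] = donor_state[key].clone()

# embed_tokens and lm_head are not mixed per layer; they come wholesale from
# a single donor (Toppy-M-7B in the README, picked by a separate script that
# kept the tensors producing the longest responses).
head_state = donor_states[1]
merged["model.embed_tokens.weight"] = head_state["model.embed_tokens.weight"].clone()
merged["lm_head.weight"] = head_state["lm_head.weight"].clone()

base.load_state_dict(merged)
base.save_pretrained("criss-cross-merge-7b")
```

The only point the sketch tries to capture is that donor choice varies per layer and per tensor rather than per contiguous block, while the embedding and output head come from a single model chosen by a separate response-length test.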