can we have some more details?
#1
by
Tom-Neverwinter
- opened
What are the changes? Would love to hear more about this version.
Hmm sorry, I don't have exact details for this release. Let me see...
- Finetuned with more care in retaining intelligence
- Removed examples written after 2022 (Filtered out 30% of the dataset)
- Removed examples containing slop like "shivers" (Filtered out 20% afterwards). I overlooked "chills", so you might see a lot of it
- Removed examples shorter than 512 tokens (Filtered out another 20%)
I'm pretty sure the major pruning skewed the genre stats but I haven't checked
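For anyone who wants to picture the three filters above, here is a rough sketch in Python. It is not the actual pipeline used for this release: the `Example` record, its `year` field, and the whitespace-based `count_tokens` are all hypothetical stand-ins (the real tokenizer and metadata are not described in this thread). It also assumes each percentage applies to whatever was left after the previous step, which is only one reading of the list.

```python
from dataclasses import dataclass

# Hypothetical record type; the real dataset schema is not given in the thread.
@dataclass
class Example:
    text: str
    year: int  # assumed metadata: year the example was written

SLOP_TERMS = ("shivers", "chills")  # "chills" was missed in the actual release
MIN_TOKENS = 512
CUTOFF_YEAR = 2022


def count_tokens(text: str) -> int:
    """Whitespace split as a stand-in for the model's real tokenizer."""
    return len(text.split())


def keep(example: Example) -> bool:
    """Apply the three filters in the order they are listed above."""
    if example.year > CUTOFF_YEAR:  # written after 2022
        return False
    lowered = example.text.lower()
    if any(term in lowered for term in SLOP_TERMS):  # contains slop
        return False
    return count_tokens(example.text) >= MIN_TOKENS  # long enough


def retained_fraction(drop_rates: list[float]) -> float:
    """Fraction of the dataset left if each rate applies to the remainder."""
    kept = 1.0
    for rate in drop_rates:
        kept *= 1.0 - rate
    return kept


if __name__ == "__main__":
    # 0.7 * 0.8 * 0.8: a bit under half of the original dataset survives.
    print(f"{retained_fraction([0.30, 0.20, 0.20]):.3f}")
```

Under that sequential assumption, roughly 45% of the original examples would remain, which fits the guess that the genre balance shifted.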
Can this model be improved by extending its maximum context to 16-32K tokens?