About the data used
#1 by Ansemia
Realized I forgot to make this one public, heh. Don't have the settings used for training this anymore, sorry.
Can you remember which categories were picked, and roughly how many rows from the dataset were used? Were they filtered for gibberish and other junk?
I used this in a merge and the results were interesting, but I'd like to know more to get a general sense of how this model influenced it.
- 100% of bigdata-pw/the-x-files over 2 epochs (so effectively 200%)
- 100% of the small private set of RP data (should be properly filtered & gibberish-free) over 2 epochs (so effectively 200%)
- roughly 20-30% of c2, filtered for the worst junk but some likely slipped through
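
For reference, here's a minimal sketch of how a mix with those proportions could be put together using the Hugging Face `datasets` library. The c2 repo name, the `text` column, the 25% sampling rate, and the gibberish heuristic are placeholders for illustration, not the actual pipeline (and the private RP set obviously isn't loadable here):

```python
from datasets import load_dataset, concatenate_datasets

# Full public set; repeated below to approximate 2 epochs' worth of weight.
x_files = load_dataset("bigdata-pw/the-x-files", split="train")

# Hypothetical c2 source -- substitute whichever copy of c2 was actually used.
c2 = load_dataset("some-org/c2-logs", split="train")

def looks_like_gibberish(example):
    """Crude junk heuristic: reject rows that are mostly non-alphabetic."""
    text = example["text"]  # column name is an assumption
    if not text.strip():
        return True
    alpha = sum(ch.isalpha() or ch.isspace() for ch in text)
    return alpha / len(text) < 0.7

# Filter the worst junk, then keep roughly a quarter of what survives
# (somewhere in the stated 20-30% range).
c2_clean = c2.filter(lambda ex: not looks_like_gibberish(ex))
c2_sample = c2_clean.shuffle(seed=42).select(range(int(0.25 * len(c2_clean))))

# Repeating the full set twice in a single-pass mix is one way to realize
# the "2 epochs" weighting; the private RP set would slot in the same way.
mix = concatenate_datasets([x_files, x_files, c2_sample]).shuffle(seed=42)
```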