About the data used
#1 by Ansemia
Realized I forgot to make this one public, heh. Don't have the settings used for training this anymore, sorry.
Can you remember which categories were picked, and roughly how many rows from the dataset were used? Were they filtered for gibberish and other junk?
I used this in a merge and the results were interesting, but I'd like to know more to get a general sense of how this model influenced it.
- 100% of bigdata-pw/the-x-files over 2 epochs (so effectively 200%)
- 100% of the small private set of RP data (should be properly filtered & gibberish-free) over 2 epochs (so effectively 200%)
- roughly 20-30% of c2, filtered for the worst junk but some likely slipped through
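
For reference, here's a minimal sketch of how a mix with those proportions could be put together using the Hugging Face `datasets` library. The c2 repo name, the `text` column, the 25% sampling rate, and the gibberish heuristic are placeholders for illustration, not the actual pipeline (and the private RP set obviously isn't loadable here):

```python
from datasets import load_dataset, concatenate_datasets

# Full public set; repeated below to approximate 2 epochs' worth of weight.
x_files = load_dataset("bigdata-pw/the-x-files", split="train")

# Hypothetical c2 source -- substitute whichever copy of c2 was actually used.
c2 = load_dataset("some-org/c2-logs", split="train")

def looks_like_gibberish(example):
    """Crude junk heuristic: reject rows that are mostly non-alphabetic."""
    text = example["text"]  # column name is an assumption
    if not text.strip():
        return True
    alpha = sum(ch.isalpha() or ch.isspace() for ch in text)
    return alpha / len(text) < 0.7

# Filter the worst junk, then keep roughly a quarter of what survives
# (somewhere in the stated 20-30% range).
c2_clean = c2.filter(lambda ex: not looks_like_gibberish(ex))
c2_sample = c2_clean.shuffle(seed=42).select(range(int(0.25 * len(c2_clean))))

# Repeating the full set twice in a single-pass mix is one way to realize
# the "2 epochs" weighting; the private RP set would slot in the same way.
mix = concatenate_datasets([x_files, x_files, c2_sample]).shuffle(seed=42)
```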