AiAF/SCPWiki-Archive-02-March-2025-Datasets Update

#1
by AiAF - opened

Good... uhhh... morning! (It's 2:00 a.m. where I live.) I noticed you used one of my repos for this model, likely the pre-training .jsonl file specifically. I updated it around 6 hours ago to include several other datasets, including a question-answer dataset and a "no rag" dataset, among several others. I even included a giant zip archive containing all the output files from augmentoolkit (the tool I use to make these datasets: https://github.com/e-p-armstrong/augmentoolkit). I feel you may find the additions very useful.

https://huggingface.co./datasets/AiAF/SCPWiki-Archive-02-March-2025-Datasets/commit/92ea90a1b45326803380ce385535f66b70d49f69

https://huggingface.co./datasets/AiAF/SCPWiki-Archive-02-March-2025-Datasets/commit/efdde8cff820a994882cd38b508e093383145e90
