@takarajordan on Hugging Face: "I'm super excited to release my first open-source text dataset: WorldScenario…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

takarajordan

posted an update 13 days ago

Post

2191

I'm super excited to release my first open-source text dataset:

WorldScenario 20K is a novel dataset of 20,000 synthetically generated multi-stakeholder scenarios designed to simulate real-world decision-making processes. Each scenario explores a unique environmental, societal, or economic issue.

I used the brand new meta-llama/Llama-3.3-70B-Instruct model to generate this dataset and I put the dataset through some post processing to clean and evaluate the dataset for diversity.

I'd appreciate some feedback and thoughts on my new release! Thanks!

takarajordan/WorldScenario_20K

clem

12 days ago

congrats!

takarajordan

6 days ago

Thanks Clem!!

midrees2806

10 days ago

Sir how do u preprocess the dataset as i have also created a dataset for my university to fine tune llama 2 model but it does not giving me good output so please help me

takarajordan

6 days ago

I preprocessed this into ChatML format to train the model takarajordan/WorldScenario-3.2B_GGUF and I used Unsloth to finetune it!

If you want more help join the HuggingFace discord, I'm always in there.

In this post