Konrad Szafer

KonradSzafer

AI & ML interests

Foundation Models, RL, Continual Learning

Organizations

Blog-explorers · hf-qa-bot · Auton Lab · Hugging Face Discord Community

KonradSzafer's activity

posted an update 3 days ago
I've been experimenting with a "Tech Tree" to make ML research more systematic and transparent. It turned out to help me spot hidden interactions between experiments and share progress more easily. I wrote a short blog post with examples and insights! KonradSzafer/tech_tree_blog
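
The post itself doesn't include code, but as a rough illustration of the idea, here is a hypothetical sketch (the class and field names are mine, not from the blog post) of tracking experiments as nodes in a tree so the lineage of every run stays explicit:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a "tech tree" of experiments: each node records
# a run and its parent, so chains of changes (and their interactions)
# stay explicit instead of living in scattered notes.
@dataclass
class ExperimentNode:
    name: str                                    # e.g. "baseline", "lr-decay"
    config: dict                                 # hyperparameters for this run
    metrics: dict = field(default_factory=dict)  # results once the run finishes
    parent: "ExperimentNode | None" = None
    children: list = field(default_factory=list)

    def branch(self, name: str, **config_changes) -> "ExperimentNode":
        """Create a child experiment that inherits this node's config."""
        child = ExperimentNode(name, {**self.config, **config_changes}, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list:
        """Walk back to the root to see every change that led here."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))

root = ExperimentNode("baseline", {"lr": 3e-4, "batch_size": 64})
lr_decay = root.branch("lr-decay", lr_schedule="cosine")
print(lr_decay.lineage())  # ['baseline', 'lr-decay']
```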
updated a Space 3 days ago
published a Space 3 days ago
upvoted an article about 1 month ago

Open-R1: a fully open reproduction of DeepSeek-R1

reacted to gabrielmbmb's post with 🔥 6 months ago
Yesterday @mattshumer released mattshumer/Reflection-Llama-3.1-70B, an impressive model that achieved incredible results on benchmarks like MMLU. The model was fine-tuned using Reflection-Tuning and the dataset used wasn't released, but I created a small recipe with distilabel that allows generating a dataset with a similar output format:

1. We use MagPie 🐦 in combination with https://huggingface.co./meta-llama/Meta-Llama-3.1-70B-Instruct to generate reasoning instructions.
2. We generate a response again using https://huggingface.co./meta-llama/Meta-Llama-3.1-70B-Instruct, but we steer the LLM to generate a specific output format using a custom system prompt. In the system prompt, we instruct the LLM that it will first have to think 💭 and produce reflections that help resolve ambiguities. After that, we instruct the LLM to generate an output based on the previous thinking (a sketch of this steering step follows below).
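
The actual pipeline lives in reflection.py in the dataset repo; as a rough, non-authoritative sketch of step 2 (the real system prompt in the recipe may differ), the steering can be approximated with huggingface_hub's InferenceClient, asking the model to wrap its reasoning, reflections, and final answer in tags like those used by Reflection-Llama:

```python
from huggingface_hub import InferenceClient

# Hypothetical system prompt approximating the recipe's steering step; the
# real one is in reflection.py. The tags follow the Reflection-Llama output
# format: <thinking>, <reflection>, <output>.
SYSTEM_PROMPT = (
    "You are an assistant that reasons before answering. First think inside "
    "<thinking> tags, using <reflection> tags whenever you need to resolve an "
    "ambiguity or correct yourself. Then give the final answer inside "
    "<output> tags."
)

client = InferenceClient(model="meta-llama/Meta-Llama-3.1-70B-Instruct")

def generate_reflection_response(instruction: str) -> str:
    """Step 2 of the recipe: steer the model toward the reflection format."""
    completion = client.chat_completion(
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": instruction},
        ],
        max_tokens=1024,
    )
    return completion.choices[0].message.content

# The instructions would come from step 1 (MagPie); a hand-written one for demo:
print(generate_reflection_response("How many 'r's are in 'strawberry'?"))
```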

In the dataset gabrielmbmb/distilabel-reflection-tuning you can find 5 rows that I generated with this recipe. You can also find the code of the pipeline in the file called reflection.py.