Open Reproduction of DeepSeek-R1

	Open Reproduction of DeepSeek-R1 (github.com)
	241 points by yogthos 46 days ago | 18 comments
	tl;dr: Hugging Face's Open-R1 is an open-source effort to reproduce DeepSeek-R1's full pipeline, including SFT distillation, GRPO reinforcement learning, and evaluation. The project has completed Step 1 with the release of OpenR1-Distill-7B and the Mixture-of-Thoughts dataset (350k reasoning traces), matching DeepSeek-R1-Distill-Qwen-7B's performance on benchmarks like AIME 2024 and MATH-500. It also provides tooling for code-execution reward functions (via E2B/Morph sandboxes), dataset decontamination, and Slurm-based multi-node training.
	HN Discussion:
	↓ Project is outdated with no recent updates, making it less relevant
	↓ Other projects like OLMo and Nemotron offer more fully open training pipelines
	↓ OpenThoughts is a better alternative with superior datasets and models
	~ Skepticism about glossed-over difficulty of curating large reasoning datasets
	• Curious about the practical training costs involved