Open Reproduction of DeepSeek-R1 (github.com)
233 points by yogthos 22 hours ago | 18 comments
tl;dr: Hugging Face's Open-R1 is an open-source effort to reproduce DeepSeek-R1's full pipeline, including SFT distillation, GRPO reinforcement learning, and evaluation. The project has completed Step 1 with the release of OpenR1-Distill-7B and the Mixture-of-Thoughts dataset (350k reasoning traces), matching DeepSeek-R1-Distill-Qwen-7B's performance on benchmarks like AIME 2024 and MATH-500. It also provides tooling for code-execution reward functions (via E2B/Morph sandboxes), dataset decontamination, and Slurm-based multi-node training.
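For readers unfamiliar with the GRPO step mentioned above: its core idea is to score each sampled completion relative to the other completions drawn for the same prompt, instead of training a separate value model. Below is a minimal sketch of that group-relative advantage computation. It is not taken from the Open-R1 codebase; the function name, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration only: each advantage is the reward
    minus the group mean, divided by the group standard deviation. `eps`
    (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; real implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, two judged correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Equal rewards carry no learning signal, so every advantage is zero.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, which is why the reward functions (for example, the sandboxed code-execution checks) matter so much in this setup.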
HN Discussion:
  • Project is outdated with no recent updates, making it less relevant
  • Other projects like OLMo and Nemotron offer more fully open training pipelines
  • OpenThoughts is a better alternative with superior datasets and models
  • Skepticism that the project glosses over how hard it is to curate large reasoning datasets
  • Curious about the practical training costs involved