Open Reproduction of DeepSeek-R1 (github.com)
233 points by yogthos 22 hours ago | 18 comments
tl;dr: Hugging Face's Open-R1 is an open-source effort to reproduce DeepSeek-R1's full pipeline, including SFT distillation, GRPO reinforcement learning, and evaluation. The project has completed Step 1 with the release of OpenR1-Distill-7B and the Mixture-of-Thoughts dataset (350k reasoning traces), matching DeepSeek-R1-Distill-Qwen-7B's performance on benchmarks such as AIME 2024 and MATH-500. It also provides tooling for code-execution reward functions (via E2B/Morph sandboxes), dataset decontamination, and Slurm-based multi-node training.
HN Discussion: