Hard Examples Are All You Need for GRPO
This post summarizes a paper I co-authored with Benjamin Pikus and Pratyush Ranjan Tiwari. The full paper is on arXiv.

Fine-tuning a language model with GRPO is expensive. Collecting and annotating training data is expensive. So if you can only afford to train on 10% of your data, which 10% should you pick?

The intuitive answer might be: a representative sample. Maybe some easy, some hard, some in the middle. That’s what random selection gives you. ...