Can a Model Teach Itself With Prompts Instead of Gradients?
The question I’ve been thinking about: can an LLM, a stateless machine, teach itself? Does it have the introspection to understand its own mistakes and figure out how to improve? I spent the last few days running an experiment based on a paper called Training-Free GRPO. The core idea: instead of fine-tuning a model with reward signals, you extract natural-language “experiences” from its own successes and failures and inject them back into future prompts. ...
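Before going further, here is a minimal sketch of that loop, under my own assumptions rather than the paper's actual code: every name here (`call_llm`, `build_prompt`, `extract_experience`, the toy question) is a hypothetical stand-in, and the stubbed model is deterministic just to make the control flow visible.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call; swap in an API client in practice."""
    # Toy behavior: the model only gets the answer right when a past lesson
    # about checking arithmetic has been injected into the prompt.
    return "4" if "digit by digit" in prompt else "5"

def build_prompt(question: str, experiences: list[str]) -> str:
    """Inject previously extracted natural-language experiences into the prompt."""
    hints = "\n".join(f"- {e}" for e in experiences)
    return f"Lessons from past attempts:\n{hints}\n\nQuestion: {question}"

def extract_experience(question: str, answer: str, correct: bool) -> str:
    """In the real setup this would be another LLM call that summarizes
    why a rollout succeeded or failed; here it is a fixed template."""
    verdict = "worked" if correct else "failed"
    return f"On '{question}', the attempt {verdict}; check arithmetic digit by digit."

def training_free_step(question: str, reference: str,
                       experiences: list[str]) -> list[str]:
    """One step: roll out with current experiences, score against a reference
    answer, distill a lesson, and grow the experience library (no gradients)."""
    answer = call_llm(build_prompt(question, experiences))
    lesson = extract_experience(question, answer, answer == reference)
    return experiences + [lesson]

experiences: list[str] = []
for _ in range(2):  # two passes over the same question
    experiences = training_free_step("2 + 2 = ?", "4", experiences)
```

In this toy run the first rollout fails, the distilled lesson lands in the library, and the second rollout succeeds because the lesson is now in the prompt; the "weights" never change, only the context does.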