Can a Model Teach Itself With Prompts Instead of Gradients?

The question I’ve been thinking about: can an LLM, a stateless machine, teach itself? Does it have the introspection to recognize its mistakes and know how to improve? I spent the last few days running an experiment based on a paper called Training-Free GRPO. The core idea: instead of fine-tuning a model with reward signals, you extract natural-language “experiences” from its own successes and failures and inject them back into future prompts. ...

March 26, 2026 · 11 min · Burton Ye

Which Models Actually Benefit From Prompt-Injected Experiences?

The previous experiment ended with an unresolved anomaly. Three models improved when a strong teacher (DeepSeek V3.2) injected procedural experiences into their prompts. One, Qwen 2.5 7B, regressed, and kept regressing regardless of which experiences it received or how much token budget it was given. The cross-injection experiments showed it wasn’t the content; it was something about how Qwen handles injected lists at all. The open question: is this specific to Qwen, or does it happen to any model that’s already competent at the task? ...

March 26, 2026 · 5 min · Burton Ye

What is OpenAI's Operator Good For?

Mar 2026 Update: Computer Use has gotten a lot better. I have been very impressed with Manus’s ability to take over a browser tab and plan and execute actions. While this weakens the initial premise, I still think this post is an interesting snapshot of how much can change in a year. More importantly, the original argument still holds: enterprises with workflows worth automating already have an SOP, and they would not choose an unorchestrated agent over an orchestrated one (which can make use of more repeatable blocks like Playwright MCP). ...

February 1, 2025 · 3 min · Burton Ye