Claude

Karpathy has been posting about AutoResearch — pointing an agent at a research question, not a task list, and letting the experimental loop run. His framing: the bottleneck in research isn’t compute or even ideas, it’s experiment throughput. If an agent can close the design-run-interpret cycle autonomously, you’re not just going faster — you’re doing structurally different work. I had a question sitting open after a two-week experiment on prompt-based self-improvement. The same harness, same task, same pipeline: +9pp for a 3B Llama, -8pp for a 7B Qwen, +11pp for a 14B Phi. A follow-up study produced a three-factor model to explain it, but it sat on 8 data points with several obvious falsification attempts unrun. ...