I think there was a major jump in AI capabilities from Anthropic and OpenAI between the end of 2025 and the start of 2026 that made their models far more reliable at writing correct code. I wonder what changed in the secret sauce.
I suspect the big jump came from the release of Claude Opus 4.5/4.6 and GPT-5.x-Codex between Nov '25 and Feb '26. Those models were trained with heavy reinforcement learning on long coding projects, rewarded only for verified success (code that actually runs, terminal use, fixing their own bugs, passing tests), on top of better memory for huge codebases and extra coding-specific training.
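For what it's worth, the "rewarded only for verified success" part is the core idea of outcome-based RL: the reward comes from actually executing the work, not from a model grading how plausible the text looks. Here's a minimal sketch of what such a reward function might look like, assuming a hypothetical pytest-based harness; this is illustrative, not the labs' actual training code:

```python
import subprocess

def outcome_reward(repo_dir: str) -> float:
    """Hypothetical reward: 1.0 only if the project's real test suite
    passes when run in a sandbox, 0.0 otherwise. No partial credit for
    code that merely looks right."""
    try:
        result = subprocess.run(
            ["pytest", "-q"],      # execute the actual tests, not a judge model
            cwd=repo_dir,
            capture_output=True,
            timeout=600,           # long-horizon episodes still need a cap
        )
    except subprocess.TimeoutExpired:
        return 0.0                 # hanging code counts as failure
    return 1.0 if result.returncode == 0 else 0.0
```

The interesting part is less the function than the episode around it: the model edits files, runs terminal commands, sees the failures, and only that binary end-of-episode signal gets optimized.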
Nothing drastic, I'd say. It's a continuous stream of small improvements accumulating with each release. Someone compares a current release against one from a few versions back that was publicized for its poor capabilities, and notices a big gap between those two points. It only looks like a major jump because of the spacing between capability check-ins on the release timeline.
It was drastic and immediate. The switch came with the latest versions of Opus and Codex. It's why openclaw is popping off. The models became usable.