Beating Opus 4.6 and coming within striking distance of gpt-5.4 is impressive! Particularly given larger labs like Meta are struggling to catch up to OpenAI/Anthropic.
More competition among model vendors is great for developers!
Cursor is in a very tough situation right now. They don't have SOTA models (see the lack of benchmarks in the release), and they likely cannot subsidize usage through cheap subscriptions the way Claude Code and OpenAI do.
I wonder what their plan is moving forward; they have been releasing a ton of random features lately.
They seem to be pushing users away from OpenAI and Anthropic models. On March 16th, they released pricing changes to stop subsidizing state-of-the-art models for team/enterprise users on legacy pricing plans.
GPT 5.4 and Claude Opus/Sonnet 4.5/4.6 are now billed at API rates for all users, even enterprise customers. Previously, Cursor subsidized these models by roughly a factor of 10, billing per request rather than per token. Composer 2 bills $0.08 per request on the fast model and $0.04 per request on the slower model, no matter how many tokens are used.
It seems like they are targeting Enterprise above all else, relying on enterprises signing up a bunch of paying users that rarely touch Cursor to subsidize the power users burning through an excessive number of Composer tokens. It's a fair strategy: Cursor seems to increase output by 20-30%, so the price is well worth it for Enterprise customers.
To be fair, these frontier models have seriously increased their pricing of late. Opus 4.6 requests regularly cost over $5 now, with the average request costing ~$1-2. If Composer benchmarks better than Opus and costs $0.08 per request, that's a win for everyone.
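A quick back-of-the-envelope check on the numbers quoted above. The $0.08 flat rate and the ~$1-2 average Opus request come from this thread; the 200-requests-a-day power user and 22 working days are illustrative assumptions, not anyone's official figures.

```python
# Sketch: comparing the flat per-request price quoted for Composer
# against the ~$1-2 average per-request cost quoted for Opus 4.6.
# All "per user" figures below are illustrative assumptions.

COMPOSER_FAST_PER_REQUEST = 0.08   # quoted flat rate, tokens don't matter
OPUS_AVG_PER_REQUEST = 1.50        # midpoint of the ~$1-2 average quoted above

ratio = OPUS_AVG_PER_REQUEST / COMPOSER_FAST_PER_REQUEST
print(f"Opus costs ~{ratio:.0f}x more per average request")

# Monthly cost for a hypothetical power user making 200 agent requests a day:
requests_per_month = 200 * 22  # assume 22 working days
print(f"Composer: ${requests_per_month * COMPOSER_FAST_PER_REQUEST:,.0f}/mo")
print(f"Opus:     ${requests_per_month * OPUS_AVG_PER_REQUEST:,.0f}/mo")
```

Under those assumptions the gap is roughly a factor of 19, which is the whole argument: per-request flat billing only stings if your average request is cheap, and these averages are anything but.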
I know people like to hate on Composer, but competition benefits all of us, and I don't doubt Composer will take its own chunk of the consumer market.
Oh boy, it's really fast
Just when I increased my subscription with CC for more Opus 4.6 usage :)
Are there other coding benchmarks we should include next time? We included Terminal-Bench 2.0 and SWE-bench Multilingual.
We don't plan on reporting SWE-bench Verified, for similar reasons to OpenAI: https://openai.com/index/why-we-no-longer-evaluate-swe-bench...
...you're looking at their plan