You really have to use the exact prompts and agent scaffolding that you RL-trained with. If you try Cline or Codex CLI with Devstral 2507, you'll be reasonably impressed, but then try OpenHands, and it's just a whole different ballgame. Convinced this is why Claude Code is beating Cursor. My conclusion: Cursor needs a model, Claude needs an IDE, and they are definitely direct competitors.

@ericflo 💯 this.
Otherwise you leave the agent constantly confused about what's going on and fighting its own training.

@ericflo If you're lucky, you can guess from its outputs what its training regime was and maybe get your setup attuned to it.

And this is a case for true open source - not only the weights, but also the training setups, docs, etc.
