You really have to use the exact prompts and agent scaffolding that the model was RL-trained with. If you try Cline or Codex CLI with Devstral 2507, you'll be reasonably impressed, but then try OpenHands, and it's a whole different ballgame. I'm convinced this is why Claude Code is beating Cursor. My conclusion: Cursor needs a model, Claude needs an IDE, and they are definitely direct competitors.
@ericflo If you're lucky, you can guess from its outputs what its training regime was and maybe get your setup attuned to it.
And this is a case for true open source - not only the weights, but also the training setups, docs, etc.