I've been thinking a lot about how useful background coding agents actually are in practice. The same arguments keep getting repeated, like "parallel tasks" and "run things in the background", but I'm not sure how applicable that really is for individual contributors on a team who might be working on one ticket at a time.
From my experience so far, they shine most with small to medium ad hoc tasks that pop up throughout the day: things that are trivial but still eat mental bandwidth and force context switches. That said, this feels most relevant to people at early-stage startups, where there's high autonomy and you're constantly jumping on whatever needs doing next.
I'm curious how others think about this.
What kinds of tasks do you feel are genuinely well suited for background coding agents like Codex Web?
Or do you find them not particularly useful in your workflow at all?
I've been putting the new web-based Codex through its paces over the last 24 hours. Here are some thoughts:
- The pricing is wild: completely revolutionary and probably unsustainable
- It's better than most of my existing tools at writing code, but still pretty bad at planning or architecting solutions
- No web access once the session starts is a huge limitation, and it's buggy and poorly documented
For context: I'm working on an open source autonomous coding agent because I love this space, not because I'm trying to monetize it. I've spent serious time with Claude Code, Cline, Roo Code, Cursor, and pretty much every shiny new thing. Until now, Cline was my go-to, though Claude still has the edge in some areas.
Running these kinds of agents at scale often racks up $100+ a day in API usage - even if you're smart about it. Codex being included in a Pro subscription with no rate limits is completely nuts. I haven't hit any caps yet, and I've thrown a lot at it. I've easily put in $100 worth of equivalent usage in a single day. Multiple coding tasks running in parallel, no throttling. I have no idea how that is supposed to hold.
As for performance: when it comes to implementing code from a clear plan, it's the best tool I've used. If it were available inside Cline, it'd be my default Act agent. That said, it's clearly not the full o3 model; it really struggles with high-level planning or designing complex systems.
What's working well for me right now is doing the planning in o3, then passing that plan to Codex to execute. That combo gets solid results.
The GitHub integration is slick - write code, create commits, open pull requests - all within the browser. This is clearly the future of autonomous coding agents. I've been "coding" all day from my phone - queueing up 10 tasks, going about my day, then reviewing, merging, and deploying from wherever I am.
The ability to queue up a bunch of tasks at once is honestly incredible. For tougher problems, I've even tried sending the same task 5-10 times, then taking the git patches and feeding them into o3 to synthesize the best version from the different attempts. It works surprisingly well.
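The "send the same task several times, then have o3 merge the attempts" trick can be prepared locally before handing anything to o3. Below is a minimal Python sketch, assuming you have saved each attempt's git diff as a `.patch` file in one directory. `load_patches` and `build_synthesis_prompt` are hypothetical helpers and the prompt wording is only illustrative; the code just assembles the text to paste into o3 and does not call any OpenAI API.

```python
from pathlib import Path


def load_patches(directory: str) -> list[str]:
    """Read every *.patch file in a directory (one per attempt), sorted by filename."""
    return [p.read_text() for p in sorted(Path(directory).glob("*.patch"))]


def build_synthesis_prompt(task: str, patches: list[str]) -> str:
    """Combine several candidate diffs for the same task into one prompt.

    Hypothetical helper: it only builds text for a planning model such as o3;
    no model is called here.
    """
    if not patches:
        raise ValueError("need at least one patch to compare")
    parts = [
        f"Task: {task}",
        f"Below are {len(patches)} independent attempts at this task as git diffs.",
        "Compare them, keep the strongest ideas from each, and produce a single "
        "combined patch. Note which attempt each part came from.",
    ]
    # Label each attempt so the model can refer back to it by number.
    for i, patch in enumerate(patches, start=1):
        parts.append(f"--- Attempt {i} ---\n{patch.strip()}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [
        "diff --git a/app.py b/app.py\n+print('a')\n",
        "diff --git a/app.py b/app.py\n+print('b')\n",
    ]
    print(build_synthesis_prompt("Add a greeting", demo))
```

In practice you would call `build_synthesis_prompt(task, load_patches("attempts/"))` and paste the result into o3; keeping the attempts numbered makes it easier to ask follow-up questions about a specific version.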
The big issues:
- No web access once the session starts, which means testing anything that needs API calls or package installs is a nightmare
- Config is surprisingly confusing: the docs hint that you can prep the environment (e.g. install dependencies at the start), but they don't explain how. If you can't use their prebuilt tools, testing is basically a no-go right now, which kills the build -> test -> iterate workflow that's essential for SWE agents
Still, despite all that, Codex spits out some amazing code with the right prompting. Once the testing and environment setup limitations are fixed, this thing will be game-changing. Honestly, it already kind of is.