WebDevPro #138: The agentic coding loop: PRD → task planner → parallel agents
By Alexandre Zajac
Welcome to this week’s issue of WebDevPro!
Today’s piece comes from Alexandre Zajac, an engineer from Prime Video who’s been thinking deeply about how AI coding workflows actually behave in the real world.
Alex doesn’t just experiment at the surface level; he digs into the friction points, the failure modes, and the patterns that emerge when you push agents beyond toy problems.
He built this PDF starter kit to help you go through it!
In this article, he walks through a practical system for making agentic coding reliable, drawing from firsthand experience and real constraints. It’s thoughtful, grounded, and immediately useful if you’ve ever felt your AI sessions spiral after a few dozen turns.
👋 Hi! It’s Alex!
Let’s imagine this scene. You open a new chat. You describe the feature.
For twenty minutes, the AI is cooking.
Good code, too.
Around turn 30, it starts fixing a bug it introduced three turns ago.
It imports a module you deleted. It references a file path you renamed ten messages back.
You’re not building anymore. You’re managing.
Feeding corrections back in, watching the agent regenerate the same broken pattern with different variable names.
This is the death loop, and it’s not a prompt quality problem.
Session length is the variable that matters.
I built a simple system for shipping simple changes with agents (almost) autonomously. In this article, I want to show why and how I built it.
Context erosion
The Anthropic Claude Code docs say it directly: a single debugging session can generate tens of thousands of tokens, and when the context window fills up, Claude starts forgetting earlier instructions and making more mistakes.
A 10-turn session is fine.
A 50-turn session drifts.
At 80 turns, you’re spending more time correcting than building, and at that point, you might as well have written the code yourself.
Birgitta Böckeler, writing in the Martin Fowler engineering series, said it after months of daily agentic coding: the longer a session gets, the more hit-and-miss it becomes, regardless of the rigor in prompting.
Assumption stacking
Agents make assumptions about your codebase.
Most are correct.
When one is wrong and goes unchallenged, the agent builds on it. Then you build on the agent’s code. Three levels in, the wrong assumption is load-bearing.
Documented example from the same Martin Fowler series: an agent diagnosed a Docker build failure as an architecture mismatch and changed the Docker settings. The actual cause was node_modules built for the wrong platform. Classic error. Any developer who’s been burned by it once would catch it in 30 seconds. Without that catch, the agent would have spent hours deepening a wrong diagnosis into increasingly creative fixes.
Agents are confidently wrong with no built-in correction mechanism.
Filesystem collisions
Two agents, one working directory:
# agent-1:
git add . && git commit -m "fix auth middleware"
# agent-2:
git add . && git commit -m "refactor API layer"
# agent-2 just committed agent-1’s half-finished auth changes
Every agent sees every other agent’s uncommitted changes.
Files get overwritten mid-edit.
One agent’s npm install blows away another’s node_modules.
Commits become incoherent.
Open two Cursor tabs on the same repo and give it 20 minutes. Productive at first. Then a mess that takes longer to untangle than doing the work sequentially.
So how do we make this consistently okay?
Stage 1: The PRD
A PRD is not a 47-page corporate artifact. It’s a short document you write before opening your coding tool. Its job: prevent assumption stacking before the agent gets a chance to start.
I always use these for medium-to-large features.
It’s similar to the Harper Reed workflow published in February 2025.
It got traction because it was concretely different from how most people worked at the time. His first move is a conversational spec session where an LLM asks him one question at a time until the idea is fully elaborated.
The prompt:
“Ask me one question at a time so we can develop a thorough,
step-by-step spec for this idea. Each question should build
on my previous answers, and our end goal is to have a detailed
specification I can hand off to a developer. Let’s do this
iteratively and dig into every relevant detail. Remember,
only one question at a time.”
Here’s the idea: [IDEA]
When the questions run dry:
“Now that we’ve wrapped up the brainstorming process, can you
compile our findings into a comprehensive, developer-ready
specification? Include all relevant requirements, architecture
choices, data handling details, error handling strategies,
and a testing plan so a developer can immediately begin
implementation.”
Output goes to spec.md. The whole thing takes 15 minutes.
The Doozy founders shipped a 300k-line Next.js monorepo with 3 to 6 parallel agents at any given time. They do it differently. They built a /discussion command that runs before any code is written. Subagents explore the codebase and look up dependencies. The command produces no edits, only a written summary in .context/context.md. Other agents reference that file.
What goes in the spec
# spec.md
## Problem
What you’re building and why, in one sentence.
## Constraints
What the agent should NOT touch.
What patterns to follow.
## Files to read
src/auth/middleware.ts
src/routes/users.ts
prisma/schema.prisma
## Definition of done
npm run typecheck && npm test passes with no new failures.
150 to 300 words. The format matters less than writing it before you start prompting.
One test: if you can’t describe what the feature should not do, keep writing.
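The template above is easy to check mechanically before a session starts. Here is a minimal Python sketch, assuming the spec uses the four `##` headings shown above and the 150 to 300 word range; the function name `lint_spec` and its parameters are hypothetical, not part of any tool mentioned here.

```python
import re

# Section names taken from the spec.md template in this article.
REQUIRED_SECTIONS = ("Problem", "Constraints", "Files to read", "Definition of done")


def lint_spec(text, min_words=150, max_words=300):
    """Return a list of problems; an empty list means the spec looks complete."""
    problems = []
    # Collect every level-2 heading ("## Something") in the document.
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^##\s+(.+)$", text, flags=re.MULTILINE)
    }
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    # Rough word count: whitespace-separated tokens, headings included.
    words = len(text.split())
    if not min_words <= words <= max_words:
        problems.append(f"word count {words} outside {min_words}-{max_words}")
    return problems
```

A check like this only proves the sections exist. It cannot tell whether the Constraints section actually says what the feature should not do, so the "keep writing" test still needs a human.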
Where it lives: spec.md in the repo root, a context section in CLAUDE.md, or .context/context.md. Pick whatever your tool loads by default.
Watch out for CLAUDE.md files that balloon. The Claude Code docs are explicit about this: bloated CLAUDE.md files cause Claude to ignore your actual instructions. Treat it like code. Prune it. If you added a rule and Claude’s behavior didn’t change, the rule is noise. Delete it.
Stage 2: The task planner
The task planner converts a spec into a sequenced list of bounded tasks. Harper Reed uses a reasoning model for this step, and I do too. Not for code, just for decomposition.
A possible prompt:
Draft a detailed, step-by-step blueprint for building this project.
Then break it down into small, iterative chunks that build on each
other. Go another round to break it into small steps. Review and
make sure the steps are small enough to implement safely, but big
enough to move the project forward.
Make sure each prompt builds on the previous prompts, and ends with wiring things together. There should be no hanging or orphaned code that isn’t integrated into a previous step.
[SPEC]
Output goes to prompt_plan.md. Reed then asks for a todo.md checklist that the execution agent can check off. That’s how state persists across sessions without re-reading the full conversation history.
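To make the persistence idea concrete, here is a minimal Python sketch of reading and updating such a checklist. It assumes todo.md uses GitHub-style checkboxes (`- [ ]` and `- [x]`), which the article does not specify; `next_unchecked` and `check_off` are hypothetical helper names.

```python
from __future__ import annotations

import re

# Matches "- [ ] text" or "- [x] text" (also "*" bullets and indentation).
_ITEM = re.compile(r"^(\s*[-*] \[)([ xX])(\] )(.+)$")


def next_unchecked(todo_text: str) -> str | None:
    """Return the text of the first unchecked item, or None if all are done."""
    for line in todo_text.splitlines():
        m = _ITEM.match(line)
        if m and m.group(2) == " ":
            return m.group(4).strip()
    return None


def check_off(todo_text: str, item: str) -> str:
    """Return the checklist with the first matching unchecked item marked done."""
    out = []
    done = False
    for line in todo_text.splitlines():
        m = _ITEM.match(line)
        if not done and m and m.group(2) == " " and m.group(4).strip() == item:
            line = f"{m.group(1)}x{m.group(3)}{m.group(4)}"
            done = True
        out.append(line)
    return "\n".join(out) + "\n"
```

A fresh session only needs to call `next_unchecked` on the file to know where the previous one stopped.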
Anthropic describes this as the orchestrator-workers pattern: a central LLM breaks down tasks, delegates them to worker LLMs, and synthesizes results. The reasoning is that you can’t always predict the subtasks, especially in code, where the number of files and the nature of changes depend on the task.
What “agent-sized” means
A task is agent-sized if it fits in one context window, produces a green lint/typecheck, and has a binary done/not-done signal.
These will send your agent spiraling:
❌ “Refactor the authentication system”
❌ “Add tests”
❌ “Improve performance”
These work:
✅ “Add a useAuth hook to src/hooks/auth.ts that wraps the
existing authService and exposes login, logout, and
isAuthenticated. Update src/components/LoginButton.tsx
to use it. Done when: npm run typecheck passes.”
✅ “Add a POST /api/users/:id/preferences endpoint to
src/routes/users.ts that validates the body against
UserPreferencesSchema and writes to user_preferences.
Done when: the new route test passes.”
Specific file. Specific interface. Specific check.
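That three-part test can be turned into a rough pre-flight check. The Python sketch below is an illustration only: `TaskSpec`, `sizing_problems`, and the five-file limit are assumptions made for the example, and no code can tell whether a task really fits in one context window.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    description: str
    files: list[str] = field(default_factory=list)  # files the task names
    done_command: str = ""                           # the binary done signal


def sizing_problems(task: TaskSpec, max_files: int = 5) -> list[str]:
    """Return reasons the task is probably not agent-sized (empty list = OK)."""
    problems = []
    if not task.files:
        problems.append("no files named")
    elif len(task.files) > max_files:
        # max_files is an arbitrary illustrative threshold, not a rule from the article.
        problems.append(f"touches {len(task.files)} files (limit {max_files})")
    if not task.done_command.strip():
        problems.append("no done command")
    return problems
```

Run against the examples above, "Refactor the authentication system" fails on both counts, while the useAuth task with its two files and `npm run typecheck` passes.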
The Pane team scores every plan on a 1-10 confidence scale for one-pass implementation success. Below 8, iterate. Their rules: no aspirations, only instructions. No open questions (if something is unresolved, stop and research first). And if a plan is too big for one session, split it. Oversized plans are where the death loop starts.
Stage 3: Parallel agents
Here are the 3 concepts I use the most to make parallel agents work.
Once you have a dependency-ordered task list, tasks that don’t share files can run at the same time. The thing that used to block this was filesystem collisions. The fix has been sitting in git since 2015.
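One way to decide which tasks can run side by side is to group them into waves by file overlap. The following Python sketch is a simple greedy pass under stated assumptions: the task list is already in dependency order, each task lists every file it touches, and `Task` and `plan_waves` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    files: frozenset[str]


def plan_waves(tasks: list[Task]) -> list[list[Task]]:
    """Split an ordered task list into waves whose tasks share no files.

    Plan order is preserved: a task joins the current wave only if none of
    its files is already claimed there; otherwise it starts a new wave.
    """
    waves: list[list[Task]] = []
    claimed: set[str] = set()
    for task in tasks:
        if waves and claimed.isdisjoint(task.files):
            waves[-1].append(task)
        else:
            waves.append([task])
            claimed = set()
        claimed |= task.files
    return waves
```

Tasks inside one wave can each get their own agent; the next wave starts once the previous one has merged. The sketch only looks at file overlap, so a dependency that does not show up as a shared file still has to be encoded in the plan order.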
Git worktrees
One command. Sub-second. The new directory is a fully independent working tree on its own branch, sharing the same .git object store. No re-cloning. Disk cost is just for your working files.
git worktree add .worktrees/feature-auth -b session/feature-auth
Each agent gets a clean branch, zero visibility into other agents’ uncommitted changes, and full ability to run tests independently.
When done:
git worktree remove --force .worktrees/feature-auth
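Removing the worktree deletes only the directory. The session/feature-auth branch stays either way, so its commits remain available for review; delete it with git branch -d session/feature-auth (or -D if it was never merged) once you’re done with it. Note that --force also discards any uncommitted changes in the worktree.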
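If you script this per task, the two commands above are all you need to wrap. Below is a minimal Python sketch that derives the directory and branch names from a task name, following the `.worktrees/<slug>` and `session/<slug>` convention used in the example; the helper names are hypothetical.

```python
from __future__ import annotations

import re
import subprocess


def slugify(task_name: str) -> str:
    """Turn a task name into a lowercase, dash-separated slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", task_name.lower()).strip("-")
    if not slug:
        raise ValueError("task name has no usable characters")
    return slug


def worktree_add_cmd(slug: str) -> list[str]:
    # Same command as above: new directory, new branch.
    return ["git", "worktree", "add", f".worktrees/{slug}", "-b", f"session/{slug}"]


def worktree_remove_cmd(slug: str) -> list[str]:
    return ["git", "worktree", "remove", "--force", f".worktrees/{slug}"]


def create_worktree(task_name: str, repo_root: str = ".") -> str:
    """Create the worktree for a task and return its path (requires git)."""
    slug = slugify(task_name)
    subprocess.run(worktree_add_cmd(slug), cwd=repo_root, check=True)
    return f".worktrees/{slug}"
```

Keeping the command builders separate from the function that runs them makes the naming convention easy to inspect without touching a real repository.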
This 2015 primitive, which few people cared about until this year, now underpins several independent tools:
amux runs up to 30 Claude Code agents in parallel. Uses SQLite compare-and-swap for atomic task claiming. No Redis. No Kubernetes. Just a WHERE clause and SQLite’s write lock.
Emdash (YC W26) is an open-source desktop app supporting 20+ CLI providers, including Claude Code, Codex, and Gemini CLI, with direct integration into Linear, GitHub, and Jira, plus SSH to remote machines.
Pane is open-source, from the Doozy founders. Same team behind the 300k-line monorepo. Cross-platform, agent-agnostic, keyboard-first.
All of these tools are great, but you don’t need any of them to get started.
A task file and a terminal per worktree gets it done.
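The compare-and-swap claim mentioned for amux is worth seeing once, because it shows how little machinery atomic task claiming needs. This is a generic Python sketch of the technique using the standard `sqlite3` module, not amux’s actual schema or code; the `tasks` table, its columns, and the function names are assumptions.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) a task queue; the schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " owner TEXT)"
    )
    conn.commit()
    return conn


def claim_task(conn, task_id, agent):
    """Try to claim a task; return True only for the agent that wins.

    The WHERE clause is the compare step: the row is updated only if it is
    still 'pending'. SQLite serializes writers, so two agents cannot both
    see rowcount == 1 for the same task.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE tasks SET status = 'claimed', owner = ?"
            " WHERE id = ? AND status = 'pending'",
            (agent, task_id),
        )
    return cur.rowcount == 1
```

An agent that gets `False` back simply moves on to the next pending task; there is no lock to release and no coordinator to ask.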
Scoping agents
Every agent session starts with the task description from the plan, the specific files listed in the task, and the relevant section of the spec.
# Task: Add useAuth hook
Files to read:
- src/services/authService.ts (existing service to wrap)
- src/hooks/ (existing hook patterns to follow)
- src/components/LoginButton.tsx (file to update)
Constraints: Do not modify authService.ts.
Done: npm run typecheck passes, LoginButton uses useAuth.
The agent can’t drift into adjacent work because adjacent work isn’t in its context. It either completes the task or fails clearly.
Ten 20-turn sessions instead of one 200-turn session.
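If the plan lives in structured form, the brief for each session can be generated rather than hand-written. A minimal Python sketch, assuming the brief format shown above; `FileRef`, `TaskBrief`, and `render_brief` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileRef:
    path: str
    note: str  # why the agent should read this file


@dataclass
class TaskBrief:
    title: str
    files: list[FileRef]
    constraints: str
    done: str


def render_brief(brief: TaskBrief) -> str:
    """Render the brief in the same layout as the example above."""
    lines = [f"# Task: {brief.title}", "Files to read:"]
    lines += [f"- {ref.path} ({ref.note})" for ref in brief.files]
    lines.append(f"Constraints: {brief.constraints}")
    lines.append(f"Done: {brief.done}")
    return "\n".join(lines) + "\n"
```

Because the renderer only emits what is in the `TaskBrief`, anything you leave out of the task stays out of the agent’s context.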
Integration checks
TypeScript strict mode, npm run typecheck, and lint are your coordination layer across parallel agents. An agent’s work isn’t done until the checks pass. This prevents a type error in one worktree from silently propagating when branches merge. The same pattern carries over to other languages: substitute whatever typechecker, linter, and test runner your stack provides.
Try to enforce this per task: implement, typecheck, lint, format, and fix all issues before proceeding. After all tasks finish, a reviewer subagent reruns everything and verifies that the plan was completed.
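The per-task gate can be as small as a loop that runs each check inside the worktree and stops at the first failure. Here is a minimal Python sketch; `npm run typecheck` and `npm test` come from this article, while the `npm run lint` script name, `CheckResult`, `run_checks`, and `all_green` are assumptions for illustration.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass

# "npm run lint" is an assumed script name; use whatever your package.json defines.
DEFAULT_CHECKS = [
    ["npm", "run", "typecheck"],
    ["npm", "run", "lint"],
    ["npm", "test"],
]


@dataclass
class CheckResult:
    command: list[str]
    passed: bool
    output: str


def run_checks(commands, cwd="."):
    """Run each command in cwd, in order, stopping at the first failure."""
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
            passed, output = proc.returncode == 0, proc.stdout + proc.stderr
        except OSError as exc:  # e.g. the executable is not installed
            passed, output = False, str(exc)
        results.append(CheckResult(list(cmd), passed, output))
        if not passed:
            break
    return results


def all_green(results, expected):
    """True only if every expected check ran and passed."""
    return len(results) == expected and all(r.passed for r in results)
```

Calling `run_checks(DEFAULT_CHECKS, cwd=".worktrees/feature-auth")` gives the agent (or a reviewer subagent) the failing command’s output to work from, and `all_green` supplies the binary done signal.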
Human review stays in the loop for architectural decisions, ambiguous acceptance criteria, and code that’s syntactically valid but semantically wrong. Automated checks catch everything describable as a rule. They don’t catch everything.
What this still doesn’t fix
Agents still hallucinate library APIs. They’ll write syntactically correct code against an API that changed in the last major version. The planning stage reduces this by including research tasks with URLs, but it doesn’t eliminate it. Verify unfamiliar library usage against docs before shipping.
The spec quality ceiling is your knowledge. If you don’t understand your codebase well enough to write a clear spec, the agent won’t either. The planning phase surfaces this. If you can’t finish the plan without open questions, you need to learn more before you start.
Merge conflicts still happen when two tasks share a file the plan didn’t account for. Worktrees stop filesystem collisions. They don’t do design work.
The tooling layer is moving fast. Git worktrees are stable. The orchestration tools built on top of them aren’t. Build your workflow around git worktree add, not the wrapper tool.
Context pollution goes both ways. Too little context and the agent hallucinates your conventions. Too much, and the real rules get buried. The same principle applies to task files: dense, explicit, and short.
Entropy compounds. Incomplete refactors, mixed conventions, dead code. All of it degrades agent performance on the next feature because the agent has to reason about a messier codebase. This workflow doesn’t fix that.
Where to start this week
Don’t adopt all three stages at once.
Start with Stage 2. Before your next AI coding session, spend 10 minutes writing a task plan. Three things:
Which files does this touch (name them)
What it should NOT do (scope the blast radius)
The done signal (what command passes when it’s complete?)
No new tools. No worktrees yet. Write it down first.
Once that’s a habit, add Stage 1. The spec forces you to resolve assumptions before the agent makes them for you. In my experience, the spec plus a 15-minute planning session saves 2+ hours of correction.
For parallel execution: run git worktree add before you open a second tab.
One command, isolated branch, no collisions.
I put together a starter kit with every template from this post: the spec, the task planner prompt, the agent scoping format, the worktree setup script, and a lean CLAUDE.md ready to drop into any repo.
This Week in the News
🧠 A new contender just entered the coding model race: Poolside, the startup of former GitHub CTO Jason Warner, has released a new set of agentic coding models built specifically for software engineering workflows. This is a focused bet on where development is heading, with models designed for control, performance, and real-world coding tasks rather than general-purpose use.
⚙️ Cursor just turned its agents into an API: Cursor’s new TypeScript SDK lets you invoke its coding agents from CI pipelines, backend services, or even your own product. This moves agents out of the editor and into the system itself, where they can run tasks, automate workflows, and become part of how software gets built.
⚛️ React is still trying to automate performance: A year into the React Compiler experiment, the direction is becoming clearer. The goal is to shift performance optimization away from developers by letting the compiler handle memoization and rendering decisions, reducing the need for manual tuning in complex applications.
🟢 Node 26 hit a delay, but Temporal is still close: Node 26.0 was expected to land with the Temporal API enabled by default, but a macOS-related problem led to a last-minute delay. A fix is already in progress and a new release candidate is available, keeping Temporal on track to become part of the standard Node experience.
🧠 Where the goblins in ChatGPT actually came from: OpenAI dug into a strange behavior where its models kept referencing goblins and other creatures in unrelated contexts. The cause wasn’t a bug in the usual sense. It came from subtle training incentives, especially around personality tuning, where certain kinds of metaphors were rewarded more than expected and then spread across the model. It’s a small, almost funny example, but it points to something deeper. Model behavior is shaped by many tiny signals, and those signals don’t always stay contained.
Beyond the Headlines
🧩 Coding is shifting from prompts to orchestration: This piece makes it clear that the real shift is not better prompts, but better coordination. Codex is evolving into a system that orchestrates multiple steps, tools, and agents to complete work. The focus is no longer on generating code, but on managing how that code gets produced.
⚠️ When output looks right but meaning drifts: A small example shows how generated output can subtly drift from the original intent while still looking correct. Nothing is obviously broken, which is exactly why it gets through. This is the kind of failure that does not show up in syntax or tests, but in meaning.
🧠 Work that looks complete but lacks depth: This piece explores how modern tools can produce outputs that resemble real knowledge work without the depth behind them. The results look finished, but the reasoning is often shallow. That gap becomes clear the moment judgment is required.
🔐 SVGs are still an easy way to get security wrong: SVGs look harmless, but they can carry scripts and unexpected behavior if not handled carefully. This write-up shows how common sanitization approaches fail and why this issue keeps slipping into production systems.
⚛️ Most React accessibility issues are small and avoidable: Accessibility problems in React apps rarely come from complex logic. They come from small gaps like missing labels, broken semantics, and poor keyboard support. These are easy to miss and just as easy to fix once you know where to look.
Tool of the Week
🎬 Scroll-driven animations that finally feel natural
Most scroll animations feel disconnected because they rely on timelines instead of user input. This guide shows how to tie motion directly to scroll position, which makes interactions smoother and easier to reason about. It’s a small shift in approach that fixes a surprisingly common UI problem.
That’s all for this week. Have any ideas you want to see in the next article? Hit Reply!
Cheers!
Editor-in-chief,
Kinnari Chohan