Two ways to spend more than one model inference on a hard problem. One puts the plan in deterministic code; the other lets the model improvise the plan at runtime. They solve overlapping problems from opposite directions.
A dynamic workflow is a small plain-JavaScript program the model writes and the runtime executes in a sandbox (no filesystem, no network — just orchestration hooks). The script doesn't do the reasoning; it schedules reasoning. A handful of primitives are injected:
- `agent(prompt, opts)` spawns a subagent and returns its text or, with a `schema`, a validated object.
- `pipeline(items, ...stages)` streams each item through the stages with no barrier between them.
- `parallel(thunks)` is a barrier: it waits for all of them.
- `phase(title)` groups work.
- `budget` caps tokens.

The run itself is resumable and cached.
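```javascript
// The model emits this; the sandbox runs it deterministically.
// FINDINGS and VERDICT are JSON schemas assumed to be defined elsewhere in the script.
const DIMENSIONS = ['bugs', 'perf', 'security']

const results = await pipeline(DIMENSIONS,
  // stage 1: review each dimension (the reviews run concurrently)
  d => agent(`review the diff for ${d}`, {schema: FINDINGS}),
  // stage 2: adversarially verify each finding. There is no barrier between stages,
  // so 'bugs' findings are verified while 'perf' is still being reviewed.
  // Each verdict is attached to its finding so the filter below can read f.verdict.
  review => parallel(review.findings.map(f => async () =>
    ({...f, verdict: await agent(`try to REFUTE: ${f.title}`, {schema: VERDICT})})))
)

// keep only the findings that survived the refutation attempt
return results.flat().filter(f => f.verdict.isReal)
```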
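The difference between `pipeline` and `parallel` is easiest to see in the order that events occur. Below is a minimal sketch in Python's asyncio that mirrors the review example above. It assumes simplified stand-ins for the two primitives, because the real runtime is JavaScript and its internals aren't described here. The `review` and `verify` stages and their latencies are made up.

```python
import asyncio

# Hypothetical Python stand-ins for the injected primitives (the real runtime
# is JavaScript and its implementation is not shown in this article).

async def parallel(thunks):
    """Barrier: start every thunk, return only when all of them have finished."""
    return await asyncio.gather(*(t() for t in thunks))

async def pipeline(items, *stages):
    """No barrier: each item runs through all stages on its own, so one item
    can reach stage 2 while another is still in stage 1."""
    async def run_one(item):
        value = item
        for stage in stages:
            value = await stage(value)
        return value
    return await asyncio.gather(*(run_one(i) for i in items))

DELAYS = {"bugs": 0.01, "perf": 0.05}  # made-up latencies: 'perf' reviews slowly
log = []

async def review(d):
    await asyncio.sleep(DELAYS[d])      # stands in for a slow agent() call
    log.append(f"reviewed {d}")
    return d

async def verify(d):
    log.append(f"verifying {d}")
    return f"{d}: verified"

# Streamed: 'bugs' is verified before 'perf' has finished its review.
results = asyncio.run(pipeline(["bugs", "perf"], review, verify))
streamed = list(log)

# Barrier: every review must finish before any verification starts.
async def barrier_version():
    reviewed = await parallel([lambda d=d: review(d) for d in ["bugs", "perf"]])
    return await parallel([lambda d=d: verify(d) for d in reviewed])

log.clear()
asyncio.run(barrier_version())
barriered = list(log)
print(streamed)
print(barriered)
```

With `pipeline`, the fast item moves on to stage 2 as soon as it is ready. With two `parallel` calls in a row, the slowest review holds up every verification.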
Because control flow lives in code, it is deterministic and inspectable: the same script over the same inputs yields the same structure of work. You get explicit parallelism, concurrency caps (≈ min(16, cores−2) agents at once), token budgets, live phase progress, and journal-based resume. Each agent runs in a fresh, scoped context, so long-context degradation is avoided by isolation: no worker ever sees the whole problem.
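A concurrency cap and a token budget can be pictured as a semaphore plus a counter. The following is a minimal Python sketch under several assumptions. It uses the min(16, cores−2) formula quoted above and adds a floor of 1 that is our own addition. The budget reserves each agent's tokens up front and logs anything it drops. The names `concurrency_cap`, `Budget`, and `run_capped` are hypothetical, and the real runtime's accounting may well differ.

```python
import asyncio
import os

def concurrency_cap(cores=None):
    """min(16, cores - 2), floored at 1 (the floor is our addition for tiny machines)."""
    cores = cores or os.cpu_count() or 1
    return max(1, min(16, cores - 2))

class Budget:
    """Hypothetical token budget that reserves tokens up front and logs drops."""
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0
        self.dropped = []

    def try_spend(self, label, tokens):
        if self.used + tokens > self.max_tokens:
            self.dropped.append(label)        # logged drop instead of a silent overrun
            return False
        self.used += tokens
        return True

async def run_capped(tasks, cap, budget):
    """Run (label, tokens) tasks with at most `cap` in flight; return (done, peak)."""
    sem = asyncio.Semaphore(cap)
    in_flight = peak = 0

    async def worker(label, tokens):
        nonlocal in_flight, peak
        if not budget.try_spend(label, tokens):
            return None
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)         # stand-in for an agent call in a fresh context
            in_flight -= 1
            return label

    results = await asyncio.gather(*(worker(name, tok) for name, tok in tasks))
    return [r for r in results if r is not None], peak

tasks = [(f"agent-{i}", 1_000) for i in range(10)]   # made-up workload
budget = Budget(max_tokens=8_000)
done, peak = asyncio.run(run_capped(tasks, concurrency_cap(cores=6), budget))
print(done, peak, budget.dropped)
```

On a hypothetical 6-core machine the cap is 4. Eight agents fit the budget, and the last two are dropped and recorded rather than overrunning it.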
A Recursive Language Model flips the locus of control. Instead of a script driving the model, the model is dropped into an environment — typically a REPL — where its enormous context is just a variable it can inspect, slice, and act on with code. Rather than read 10M tokens in one shot (where quality rots), a root model writes code to split the context and recursively calls a language model — often itself — on each piece, then combines the answers.
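```python
# The ROOT model improvises something like this in a REPL, at inference time.
# ENV, THRESHOLD, split and call_llm are assumed to be provided by the REPL environment.
def answer(question):
    ctx = ENV["context"]                         # may be far larger than the window
    if len(ctx) < THRESHOLD:
        return call_llm(f"{ctx}\n\n{question}")  # base case: just answer
    chunks = split(ctx, 50_000)                  # the model decides how (~50k per chunk here)
    notes = [call_llm(f"Distil what is relevant to {question!r}:\n{c}") for c in chunks]
    # each call_llm above may ITSELF be a recursive LM
    return call_llm(f"Using these notes: {notes}\nAnswer: {question}")
```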
Nothing here is fixed in advance. Depth, fan-out, and decomposition are decided by the model as it reasons — it might recurse twice on a dense section and not at all on boilerplate. The win is near-unbounded context handled by deferral: the root never holds the whole input in its active window; it pulls in only what each sub-question needs. The cost is that the plan is stochastic and opaque — the same prompt can decompose differently run to run.
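To see recursion by deferral run end to end, here is a toy sketch in Python. It assumes a stub "model" that can read at most `WINDOW` characters and answers by keeping only the lines that mention a keyword. `WINDOW`, `MAX_DEPTH`, `call_llm`, `split`, and `rlm` are all hypothetical. A fixed halving rule stands in for the decomposition that a real root model would choose for itself.

```python
# Toy RLM: every name and constant here is hypothetical.
WINDOW = 200      # the stub model can "read" at most 200 characters
MAX_DEPTH = 8     # explicit guard against runaway recursion

def call_llm(text, keyword):
    """Stub LM: refuses over-long input, otherwise returns the relevant lines."""
    if len(text) > WINDOW:
        raise ValueError("input exceeds the stub model's window")
    return "\n".join(line for line in text.splitlines() if keyword in line)

def split(text, size):
    """Greedy split on line boundaries into pieces of at most `size` chars
    (assumes no single line is longer than `size`)."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > size:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

def rlm(ctx, keyword, depth=0, depths=None):
    """Recursively answer over ctx; `depths` records every depth visited."""
    if depths is not None:
        depths.append(depth)
    if len(ctx) <= WINDOW:
        return call_llm(ctx, keyword)                # base case: it fits, just answer
    if depth >= MAX_DEPTH:
        raise RecursionError("recursion limit reached")
    pieces = split(ctx, max(WINDOW, len(ctx) // 2))  # stands in for the model's choice
    notes = [rlm(p, keyword, depth + 1, depths) for p in pieces]
    combined = "\n".join(n for n in notes if n)
    return rlm(combined, keyword, depth + 1, depths) # the notes may still be too long

doc = "\n".join(
    f"line {i}: the secret code is 42" if i == 73 else f"line {i}: boilerplate filler text"
    for i in range(100)
)
depths = []
answer = rlm(doc, "secret", depths=depths)
print(answer, max(depths))

# Runaway case: every line is "relevant", so the notes never shrink and only
# the depth guard stops the recursion.
try:
    rlm("\n".join(["secret"] * 100), "secret")
    runaway_stopped = False
except RecursionError:
    runaway_stopped = True
```

The root never reads more than `WINDOW` characters at once, yet it still finds the one relevant line in a document about 16 times larger than that. The second call illustrates the instability failure mode: when the intermediate notes don't shrink, only an explicit depth limit stops the recursion.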
Take the same job under both execution models: “answer a question that needs six pieces of work.” The workflow fans out in structured phases you declared up front. The RLM grows a recursion tree that the model shapes as it goes.
| Axis | Dynamic Workflows | Recursive Language Models |
|---|---|---|
| Who plans | Code the model writes before execution | The model, during inference, via a REPL |
| Orchestrator | A sandboxed JS program (not a model) | A model interacting with an environment |
| Unit of work | Stateless subagent, structured I/O (schema) | An LM call over a context slice — possibly recursive |
| Determinism | Same script+inputs → same structure; resumable, cached | Stochastic decomposition; not naturally reproducible |
| Long context | Avoided by isolation — each agent sees only its slice | Avoided by deferral — root reads slices on demand |
| Parallelism | Explicit: pipeline (no barrier), parallel (barrier), concurrency cap | Implicit/sequential unless the model writes parallel calls |
| Verification | First-class stages — adversarial verify, judge panels | The model's own recursive judgement |
| Cost control | Token budget, concurrency caps, logged drops | Bounded by the recursion depth the model chooses |
| Observability | Phases, labels, live progress, journal/resume | Mostly opaque recursion trace |
| Failure mode | Rigidity — a bad script structure caps the outcome | Instability — runaway / insufficient recursion, compounding error |
| Best fit | Broad fan-out: review, migration, research, audits | Deep reasoning over one enormous input |
This isn't a contest: they nest. A workflow's `agent()` could itself be a recursive language model when one step must chew through a giant document. An RLM, conversely, could call a whole workflow as a tool when a sub-question needs structured fan-out and adversarial checks. Both are inference-time scaffolding: structure wrapped around raw next-token prediction so a model can do more than a single pass over a problem.
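Here is a toy sketch of the first kind of nesting, in Python. A workflow with a fixed, declared fan-out uses an RLM-style worker for each document. Everything here is hypothetical. The stub model reads at most `WINDOW` numbers and only answers "what is the largest number?", so the recursion is trivial, but the shape is what matters: the orchestrator's plan stays fixed while each step decides its own depth.

```python
import asyncio

WINDOW = 8  # the stub model can "read" at most 8 numbers at once (made up)

def stub_model(numbers):
    """Stub LM: answers 'what is the largest number?' for inputs that fit."""
    if len(numbers) > WINDOW:
        raise ValueError("input exceeds the stub model's window")
    return max(numbers)

def recursive_lm(numbers):
    """RLM-style worker: recurse on halves until each piece fits, then combine."""
    if len(numbers) <= WINDOW:
        return stub_model(numbers)
    mid = len(numbers) // 2
    partials = [recursive_lm(numbers[:mid]), recursive_lm(numbers[mid:])]
    return stub_model(partials)  # two partial answers always fit the window

async def agent(doc):
    # A workflow step; the orchestrator neither knows nor cares that it recurses.
    return await asyncio.to_thread(recursive_lm, doc)

async def workflow(docs):
    # Fixed, declared plan: one agent per document, then a barrier.
    return await asyncio.gather(*(agent(d) for d in docs))

docs = [list(range(100)), [5, 3, 9], list(range(1000, 0, -7))]
answers = asyncio.run(workflow(docs))
print(answers)
```

The short document is answered in a single call, while the long ones recurse several levels. The workflow's structure is identical in both cases.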
The deciding question is simply where you want the plan to live — pinned in deterministic, inspectable code, or grown by the model's own judgement at runtime. Predictability and breadth pull one way; adaptivity and unbounded context pull the other.