orchestration · inference-time scaffolding

Dynamic Workflows vs Recursive Language Models

Two ways to spend more than one model inference on a hard problem. One puts the plan in deterministic code; the other lets the model improvise the plan at runtime. They solve overlapping problems from opposite directions.

◆ Dynamic Workflows

A sandboxed JS script orchestrates subagents.
The plan is code — loops, fan-out, pipelines — written ahead of execution. Agents are stateless workers that return structured data.

◆ Recursive Language Models

A model recursively calls models over its context.
The plan is emergent — the model, inside a REPL, decides how to split a giant input and when to recurse on the pieces.

Paradigm 01: The JS sandbox as an orchestrator

A dynamic workflow is a small plain-JavaScript program the model writes and the runtime executes in a sandbox (no filesystem, no network — just orchestration hooks). The script doesn't do the reasoning; it schedules reasoning. A handful of primitives are injected:

agent(prompt, opts) spawns a subagent and returns its text or, with a schema, a validated object.
pipeline(items, …stages) streams each item through the stages with no barrier between them.
parallel(thunks) is a barrier: it waits for every thunk to finish.
phase(title) groups work, and budget caps tokens.
The run is resumable and cached.

// the model emits this; the sandbox runs it deterministically
// FINDINGS and VERDICT are schema objects assumed to be defined earlier in the script
const DIMENSIONS = ['bugs', 'perf', 'security']
const results = await pipeline(DIMENSIONS,
  // stage 1: review each dimension (runs concurrently)
  d => agent(`review the diff for ${d}`, {schema: FINDINGS}),
  // stage 2: adversarially verify each finding, no barrier:
  //   'bugs' findings verify while 'perf' is still being reviewed
  review => parallel(review.findings.map(f => async () => ({
    ...f,  // keep the finding and attach its verdict so the filter below can read both
    verdict: await agent(`try to REFUTE: ${f.title}`, {schema: VERDICT}),
  })))
)
return results.flat().filter(f => f.verdict.isReal)
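
The barrier distinction between pipeline() and parallel() is easy to miss in prose, so here is a minimal Python sketch of the two semantics using asyncio. It is not the sandbox's implementation: the Python `pipeline` and `parallel` functions only mimic the behaviour described above, and `fake_agent` is a hypothetical stand-in that sleeps instead of calling a model.

```python
import asyncio

async def parallel(thunks):
    """Barrier: start every thunk and return only when all have finished."""
    return await asyncio.gather(*(thunk() for thunk in thunks))

async def pipeline(items, *stages):
    """No barrier: each item runs through every stage on its own schedule."""
    async def run_one(item):
        value = item
        for stage in stages:
            value = await stage(value)
        return value
    return await asyncio.gather(*(run_one(item) for item in items))

events = []  # completion order, recorded so the interleaving is visible

async def fake_agent(label, seconds):
    """Hypothetical stand-in for agent(): sleeps, then records that it finished."""
    await asyncio.sleep(seconds)
    events.append(label)
    return label

async def demo():
    delays = {"bugs": 0.01, "perf": 0.2}  # 'perf' is the slow review

    async def review(dimension):
        return await fake_agent(f"review:{dimension}", delays[dimension])

    async def verify(reviewed):
        # barrier inside the stage: both checks must finish before this item is done
        return await parallel([
            lambda: fake_agent(f"verify-a:{reviewed}", 0.01),
            lambda: fake_agent(f"verify-b:{reviewed}", 0.02),
        ])

    return await pipeline(["bugs", "perf"], review, verify)

results = asyncio.run(demo())
print(events)
```

In the recorded order, both verifications of the 'bugs' review finish before the 'perf' review does, which is the "no barrier" property; inside each verify stage, parallel() still waits for both checks.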

Because control flow lives in code, it is deterministic and inspectable: same script + same inputs ⇒ same structure of work. You get explicit parallelism, concurrency caps (≈ min(16, cores−2) agents at once), token budgets, live phase progress, and journal-based resume. Each agent runs in a fresh, scoped context — long-context degradation is avoided by isolation: no worker ever sees the whole problem.
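
A rough Python sketch of two of those guarantees, the concurrency cap and journal-based resume. Everything here is illustrative: `run_all` and `fake_worker` are hypothetical names, flooring the cap at 1 is an assumption, and keying the journal by a hash of the prompt is one plausible scheme rather than the runtime's actual one.

```python
import asyncio
import hashlib
import os

def concurrency_cap(cores=None):
    """min(16, cores - 2) as quoted in the text; the floor of 1 is an assumption."""
    cores = cores or os.cpu_count() or 1
    return max(1, min(16, cores - 2))

async def run_all(prompts, worker, cap, journal):
    """Run worker(prompt) for every prompt, at most `cap` at a time, replaying the journal."""
    sem = asyncio.Semaphore(cap)
    stats = {"running": 0, "peak": 0, "calls": 0}

    async def agent(prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in journal:               # resume: this call already finished once
            return journal[key]
        async with sem:                  # concurrency cap
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
            try:
                result = await worker(prompt)
            finally:
                stats["running"] -= 1
        stats["calls"] += 1
        journal[key] = result            # record the result for later runs
        return result

    results = await asyncio.gather(*(agent(p) for p in prompts))
    return results, stats

async def fake_worker(prompt):
    """Hypothetical stand-in for a subagent."""
    await asyncio.sleep(0.01)
    return prompt.upper()

journal = {}
prompts = [f"task {i}" for i in range(10)]
first, first_stats = asyncio.run(run_all(prompts, fake_worker, cap=3, journal=journal))
second, second_stats = asyncio.run(run_all(prompts, fake_worker, cap=3, journal=journal))
print(first_stats, second_stats)
```

The second run makes no worker calls at all: every result is replayed from the journal, which is the sense in which a resumed run repeats the same structure of work without redoing it.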

[Figure: a JS script orchestrator fans out into a review phase (bugs, perf, security) followed by a verify phase (verify ×3, verify ×2, verify ×4).]
pipeline(): fan out by phase, verify each item the moment its review lands (no barrier)

Paradigm 02: The model as its own recursion

A Recursive Language Model flips the locus of control. Instead of a script driving the model, the model is dropped into an environment — typically a REPL — where its enormous context is just a variable it can inspect, slice, and act on with code. Rather than read 10M tokens in one shot (where quality rots), a root model writes code to split the context and recursively calls a language model — often itself — on each piece, then combines the answers.

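# the ROOT model improvises this in a REPL, at inference time
# call_llm, ENV, THRESHOLD and question are assumed to be provided by the REPL environment
def answer(ctx, question):
    if len(ctx) < THRESHOLD:
        return call_llm(f"{ctx}\n\n{question}")   # base case: just answer

    # the model decides how to split; fixed chunks of about 50,000 characters here
    chunks = [ctx[i:i + 50_000] for i in range(0, len(ctx), 50_000)]
    notes = [call_llm(f"distil for Q ({question}): {c}") for c in chunks]
    #              ↑ each call_llm may ITSELF be a recursive LM
    return call_llm(f"using {notes}, answer: {question}")

answer(ENV["context"], question)      # context may be far larger than the window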

Nothing here is fixed in advance. Depth, fan-out, and decomposition are decided by the model as it reasons — it might recurse twice on a dense section and not at all on boilerplate. The win is near-unbounded context handled by deferral: the root never holds the whole input in its active window; it pulls in only what each sub-question needs. The cost is that the plan is stochastic and opaque — the same prompt can decompose differently run to run.
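
To make "recurse where it is dense, skip the boilerplate" concrete, here is a toy Python sketch. It is deterministic, which a real RLM is not: `fake_llm` is a hypothetical stub that just keeps tokens mentioning the question, the substring check stands in for the root model peeking at a slice, and `WINDOW` and `MAX_DEPTH` are made-up limits.

```python
WINDOW = 8       # hypothetical per-call limit, in tokens
MAX_DEPTH = 6    # hypothetical guard against runaway recursion

def fake_llm(tokens, question, trace, depth):
    """Hypothetical stand-in for a model call: keeps tokens that mention the question."""
    trace["calls"] += 1
    trace["max_depth"] = max(trace["max_depth"], depth)
    return [t for t in tokens[:WINDOW] if question in t]  # anything past the window is lost

def rlm(tokens, question, trace, depth=0):
    # Stand-in for the root model inspecting the slice (e.g. a grep in the REPL).
    if not any(question in t for t in tokens):
        return []                                          # boilerplate: no call at all
    if len(tokens) <= WINDOW or depth >= MAX_DEPTH:
        return fake_llm(tokens, question, trace, depth)    # base case
    mid = len(tokens) // 2
    left = rlm(tokens[:mid], question, trace, depth + 1)
    right = rlm(tokens[mid:], question, trace, depth + 1)
    return fake_llm(left + right, question, trace, depth)  # combine the notes

boilerplate = ["lorem"] * 32
dense = [f"key{i}" if i % 4 == 0 else "lorem" for i in range(32)]
trace = {"calls": 0, "max_depth": 0}
answer = rlm(boilerplate + dense, "key", trace)
print(answer, trace)
```

With these inputs the sketch makes 8 stub calls, all of them on the dense half, and descends three levels there while the boilerplate half is dismissed without a single call. The depth guard also shows the failure mode: if it triggers on a slice larger than the window, the base case silently drops whatever does not fit.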

[Figure: recursion tree. A root LM with a REPL over ctx calls LMs on chunk 1, chunk 2 and chunk 3; chunk 2 recurses further into 2a and 2b. Depth grows where needed.]
recursion: the model decides where to descend; results bubble back up the tree

Watch the difference

Same job: “answer a question that needs six pieces of work.” The workflow fans out in structured phases you declared up front. The RLM grows a recursion tree the model shapes as it goes.

The comparison, axis by axis

| Axis | Dynamic Workflows | Recursive Language Models |
|---|---|---|
| Who plans | Code the model writes before execution | The model, during inference, via a REPL |
| Orchestrator | A sandboxed JS program (not a model) | A model interacting with an environment |
| Unit of work | Stateless subagent, structured I/O (schema) | An LM call over a context slice, possibly recursive |
| Determinism | Same script+inputs → same structure; resumable, cached | Stochastic decomposition; not naturally reproducible |
| Long context | Avoided by isolation: each agent sees only its slice | Avoided by deferral: root reads slices on demand |
| Parallelism | Explicit: pipeline (no barrier), parallel (barrier), concurrency cap | Implicit/sequential unless the model writes parallel calls |
| Verification | First-class stages: adversarial verify, judge panels | The model's own recursive judgement |
| Cost control | Token budget, concurrency caps, logged drops | Bounded by the recursion depth the model chooses |
| Observability | Phases, labels, live progress, journal/resume | Mostly opaque recursion trace |
| Failure mode | Rigidity: a bad script structure caps the outcome | Instability: runaway / insufficient recursion, compounding error |
| Best fit | Broad fan-out: review, migration, research, audits | Deep reasoning over one enormous input |

They compose

This isn't a contest — they nest. A workflow's agent() could itself be a recursive language model when one step must chew through a giant document. An RLM, conversely, could call a whole workflow as a tool when a sub-question needs structured fan-out and adversarial checks. Both are inference-time scaffolding: structure wrapped around raw next-token prediction so a model can do more than a single pass over a problem.
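
One direction of that nesting, a workflow whose agent recurses internally, might look like the following Python sketch. It is a toy under stated assumptions: `count_llm` is a hypothetical stub that counts a character instead of calling a model, `WINDOW` is a made-up limit, and the combine step is plain addition where a real RLM would issue another model call. The reverse direction (an RLM calling a workflow as a tool) is not shown.

```python
import asyncio

WINDOW = 16      # hypothetical per-call limit, in characters
calls = 0        # how many stub model calls were made

async def count_llm(text):
    """Hypothetical stand-in for a model call: counts 'x' in a slice that fits the window."""
    global calls
    assert len(text) <= WINDOW
    calls += 1
    return text.count("x")

async def recursive_agent(text):
    """An agent() worker that recurses only when its input does not fit."""
    if len(text) <= WINDOW:
        return await count_llm(text)
    mid = len(text) // 2
    left, right = await asyncio.gather(
        recursive_agent(text[:mid]), recursive_agent(text[mid:]))
    return left + right      # combine step; a real RLM would use another model call

async def workflow(docs):
    """Fixed fan-out, declared up front: one agent per document."""
    names = list(docs)
    counts = await asyncio.gather(*(recursive_agent(docs[n]) for n in names))
    return dict(zip(names, counts))

docs = {"small": "x.x.", "huge": "x." * 100}
totals = asyncio.run(workflow(docs))
print(totals, calls)
```

The outer structure stays fixed and inspectable (one agent per document), while the amount of recursion inside each agent depends on the input: the small document takes one call and the large one takes sixteen.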

The deciding question is simply where you want the plan to live — pinned in deterministic, inspectable code, or grown by the model's own judgement at runtime. Predictability and breadth pull one way; adaptivity and unbounded context pull the other.