Same code, three clocks — letting a quant agent trade on its own without losing the audit trail


In the last post I argued that an LLM should never hold the approval token on a trade. A human approves. The model only proposes. That works as long as a human is in the loop on every order.

Then a user does the obvious thing. They take a strategy the agent wrote, like what they see in the backtest, and say “put it on the paper account.”

They expect it to trade: follow the market in, follow it out, update positions while they sleep.

The honest truth at that point: status = 'promoted' was a database flag. Nobody was ticking the strategy’s on_bar. The account didn’t move. That gap was the whole feature.

Closing it means the machine now places orders on live bars with no human clicking approve each time. Which sounds like exactly the thing the last post said not to do.

This post is how you close the gap without throwing away the audit trail. And the four places the trust boundary has to be redesigned the moment no human is in the chair.

The easy half: same code, three clocks

Inalpha holds one invariant tight: the Python file you backtest is the file you paper-trade. No fork for production. You swap two things underneath the strategy — the Clock and the Gateway — and the business logic doesn’t move.

The invariant itself isn’t rare. What’s rare is the thing standing on top of it here.

The author of that file is an LLM. It was vetted by a human. And it’s now running itself on live bars.

Quant engines hold the invariant, but don’t assume an agent wrote the strategy. Agent frameworks assume the LLM, but have nowhere to put a trading harness. Inalpha sits in that seam. And the same-code invariant is exactly what makes the audit chain mean anything: there’s precisely one file to point a signature at.

How it runs — three deployment modes, two clocks, one file:

  • Backtest: a TestClock driven by historical bars; fills simulated against a reference price.
  • Paper (live runner): a LiveClock on real wall-clock time, bars pulled fresh on the strategy’s timeframe, the same matching engine, the order routed out through the real plan/exec path — the only simulated part is that fills are matched locally instead of sent to a broker.
  • Live (real capital): architecturally the same seam — LiveClock, same kernel, same plan/exec path, only the Gateway swapping to a real broker. But real-money trading is deliberately out of scope for this project; holding the invariant isn’t about chasing it. The payoff is narrower and real: backtest and paper are literally one code path, so the audit chain has exactly one file to point a signature at.

So “three clocks” is shorthand: two clock implementations (TestClock / LiveClock), the third mode (real capital) a seam the architecture leaves open but the project doesn’t pursue — and the strategy file never notices which one it’s running under.
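
Here is a minimal sketch of that seam, not the project’s actual code. TestClock and LiveClock are the names used above; Bar, the Gateway protocol, RecordingGateway, and the threshold strategy are hypothetical stand-ins I made up to show the shape.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Protocol, Tuple


@dataclass(frozen=True)
class Bar:
    ts: datetime
    close: float


class Clock(Protocol):
    def now(self) -> datetime: ...


class Gateway(Protocol):
    def submit(self, side: str, qty: float, price: float) -> None: ...


class TestClock:
    """Backtest clock: time only moves when a historical bar is replayed."""

    def __init__(self) -> None:
        self._now = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def advance_to(self, ts: datetime) -> None:
        self._now = ts

    def now(self) -> datetime:
        return self._now


class LiveClock:
    """Paper clock: real wall-clock time."""

    def now(self) -> datetime:
        return datetime.now(timezone.utc)


class RecordingGateway:
    """Hypothetical gateway that only records what the strategy asked for."""

    def __init__(self) -> None:
        self.orders: List[Tuple[str, float, float]] = []

    def submit(self, side: str, qty: float, price: float) -> None:
        self.orders.append((side, qty, price))


class Strategy:
    """The one strategy file: it only ever sees a Clock and a Gateway."""

    def __init__(self, clock: Clock, gateway: Gateway, threshold: float) -> None:
        self.clock = clock
        self.gateway = gateway
        self.threshold = threshold

    def on_bar(self, bar: Bar) -> None:
        # Toy rule for illustration: buy one unit above the threshold.
        if bar.close > self.threshold:
            self.gateway.submit("buy", 1.0, bar.close)
```

The Strategy class never imports either clock. Whoever constructs it picks the pair, which is what lets one file run in both modes.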

The live runner (services/paper/.../live_runner.py) is one long-lived task per running strategy. Each tick it does three things:

  1. pull the latest closed bar;
  2. feed it to a session that reuses the exact backtest kernel, firing the strategy’s on_bar;
  3. intercept the order the strategy emits and hand it to the guarded order path; the session itself does not match it.

When the fill comes back, it’s replayed into the session. So the strategy’s view of its own position stays consistent with what actually filled.
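
Those three steps plus the replay can be sketched as one function. This is an illustration under assumptions: Session, Order, Fill, run_tick, and the two callables are hypothetical names, not the contents of live_runner.py.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Order:
    side: str
    qty: float


@dataclass
class Fill:
    side: str
    qty: float
    price: float


@dataclass
class Session:
    """Hypothetical wrapper around the backtest kernel and the strategy."""

    on_bar: Callable[[float], Optional[Order]]
    position: float = 0.0

    def feed(self, close: float) -> Optional[Order]:
        # Fires the strategy's on_bar; the session does not match the order.
        return self.on_bar(close)

    def replay_fill(self, fill: Fill) -> None:
        signed = fill.qty if fill.side == "buy" else -fill.qty
        self.position += signed


def run_tick(
    session: Session,
    fetch_closed_bar: Callable[[], float],
    guarded_submit: Callable[[Order, float], Optional[Fill]],
) -> Optional[Fill]:
    close = fetch_closed_bar()            # 1. pull the latest closed bar
    order = session.feed(close)           # 2. fire on_bar through the session
    if order is None:
        return None
    fill = guarded_submit(order, close)   # 3. hand off to the guarded order path
    if fill is not None:
        session.replay_fill(fill)         # keep the strategy's position honest
    return fill
```

If the guarded path rejects the order, nothing is replayed, so the session never believes in a position that did not fill.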

Why this matters for audit-grade guarantees, not just convenience: if your backtest and live code are two different files, no signature chain will tell you which one ran when the $93k order happened. Same code, three clocks is the precondition. It’s also the boring half. Here’s the half that kept me up.

The hard half: who approves the order?

Last post’s thesis was a three-step state machine. The LLM drives step one. A human drives the approval:

trade.create_plan       → plan: pending_approval
trade.approve_plan      → mints a single-use token
trade.execute_plan(tok) → places the order

A runner that trades while you sleep can’t stop and wait for a click on every bar. So the naive fix is to delete the approval step for the automated path. That’s the fix that quietly turns “audit-grade” back into “trust me.”

We did the opposite. The automated path goes through the same plan/exec state machine. The approval is just stamped approved_by = "system:live_runner".

Machine approval. The order still creates a plan. Still mints and consumes a single-use token. Still writes the same signed audit line. Nothing on the order path got a shortcut.
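
A toy version of that state machine shows why the machine approver gets no shortcut. Everything here is an assumption for illustration: PlanBook and its in-memory storage are hypothetical, and the audit entry is a plain dict rather than the signed line the real path writes.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Plan:
    plan_id: int
    intent: str
    status: str = "pending_approval"
    approved_by: Optional[str] = None
    token: Optional[str] = None


class PlanBook:
    """Hypothetical in-memory stand-in for the plan/exec path."""

    def __init__(self) -> None:
        self._plans: Dict[int, Plan] = {}
        self.audit: List[dict] = []

    def create_plan(self, intent: str) -> Plan:
        plan = Plan(plan_id=len(self._plans) + 1, intent=intent)
        self._plans[plan.plan_id] = plan
        return plan

    def approve_plan(self, plan_id: int, approved_by: str) -> str:
        plan = self._plans[plan_id]
        if plan.status != "pending_approval":
            raise ValueError("plan is not awaiting approval")
        plan.token = secrets.token_hex(16)  # minted once per approval
        plan.status = "approved"
        plan.approved_by = approved_by
        return plan.token

    def execute_plan(self, plan_id: int, token: str) -> None:
        plan = self._plans[plan_id]
        valid = (
            plan.status == "approved"
            and plan.token is not None
            and secrets.compare_digest(plan.token, token)
        )
        if not valid:
            raise PermissionError("missing, wrong, or already used token")
        plan.token = None  # consume: the same token cannot place a second order
        plan.status = "executed"
        self.audit.append(
            {"plan_id": plan.plan_id, "intent": plan.intent,
             "approved_by": plan.approved_by}
        )
```

The runner would call approve_plan with approved_by="system:live_runner" and then execute_plan with the returned token, the same two calls a human-approved order goes through.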

Machine approval is only honest if it’s earned. Ours rests on two human gates upstream, and the agent can’t route around either:

  1. A human promotes the candidate. promote is a deliberate human action, with permission: ask on the agent side. The model can’t self-promote a strategy into the runnable set.
  2. A human starts the run. paper.start_strategy is an explicit call a person makes for a specific market and timeframe.

So the chain reads: a person vetted this strategy, a person chose to run it here. Given those two signatures, having the machine approve each later order on live bars is the expected behavior, not a bypass. The audit line records system:live_runner as the approver for exactly this reason — a replay shows where the human gates were and where the machine took over.

Every order the runner places also writes a decision record (strategy_run_decisions): the bar context, the order intent, and the outcome (filled, rejected, or risk_rejected), cross-referenced to the plan and the trade.
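
For a sense of what one such record might hold, here is a sketch. The three outcome values come from the text above; the field names and the “a fill must point at a trade” check are my assumptions, not the real strategy_run_decisions schema.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    FILLED = "filled"
    REJECTED = "rejected"
    RISK_REJECTED = "risk_rejected"


@dataclass(frozen=True)
class StrategyRunDecision:
    run_id: str
    bar_ts: str              # bar context: which bar triggered the decision
    bar_close: float
    side: str                # order intent
    qty: float
    outcome: Outcome
    plan_id: Optional[str]   # cross-reference to the plan
    trade_id: Optional[str]  # cross-reference to the trade, if one exists

    def __post_init__(self) -> None:
        # Assumed invariant: a filled decision always points at a trade.
        if self.outcome is Outcome.FILLED and self.trade_id is None:
            raise ValueError("filled decision needs a trade_id")


def to_row(decision: StrategyRunDecision) -> dict:
    """Flatten to a plain dict, e.g. for an INSERT."""
    row = asdict(decision)
    row["outcome"] = decision.outcome.value
    return row
```

Rejections get a row too, which is what makes the morning-after read complete: you see the bars where the strategy wanted to act and was stopped.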

The point of the autonomous path isn’t just that it trades. It’s that the next morning you can read, line by line, every bar where it wanted to act and what the harness did about it.

The trust boundary moves when the human leaves the chair

This isn’t a bug list. It’s four faces of one architectural question.

With a human in the loop, a lot of guarantees are propped up implicitly by “someone is at the screen.” Designing the unattended path means asking that again, on purpose: which of those props has to become something the system holds up on its own?

Four answers.

1. Identity has to become explicit.
When a human starts each run, ownership is implicit — whoever clicked owns it. Automate it, and ownership has to live in the data model, or there’s no boundary at all.

Concretely: the start path checked that a candidate was promoted, not that the caller owned it. So you could run someone else’s strategy on your own account.

The trap in fixing it was real. The candidate’s author_id is only set for UUID identities, while the account id falls back to uuid5 for everyone else. A naive author_id == account_id would lock out every non-UUID user. The fix derives an owner_account_id through the same function as the account id (migration 0013), so ownership is comparable for everyone.
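
The idea, sketched: run both sides of the comparison through one derivation. The function names and the uuid5 namespace below are assumptions for illustration; the real derivation and migration 0013 are in the repo.

```python
import uuid

# Assumption: the real namespace is project-specific; NAMESPACE_URL is a placeholder.
ACCOUNT_NAMESPACE = uuid.NAMESPACE_URL


def derive_account_id(identity: str) -> uuid.UUID:
    """UUID identities map to themselves; everything else falls back to uuid5."""
    try:
        return uuid.UUID(identity)
    except ValueError:
        return uuid.uuid5(ACCOUNT_NAMESPACE, identity)


def caller_owns(owner_account_id: uuid.UUID, caller_identity: str) -> bool:
    # owner_account_id was stored via derive_account_id at authoring time,
    # so the comparison works for UUID and non-UUID users alike.
    return owner_account_id == derive_account_id(caller_identity)
```

Because uuid5 is deterministic, a non-UUID user derives the same id every time, so they pass the ownership check on their own candidates and fail it on everyone else’s.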

2. Resource bounds are part of the trust boundary, not an ops detail.
A human starting runs self-limits. An API doesn’t. Each run is a long-lived task polling the data service on a timer. The only limit was one instance per candidate, but a user can promote arbitrarily many candidates. So the boundary grows a per-account cap (default 10) that returns HTTP 429, instead of letting one account quietly melt the event loop.
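
A sketch of the two limits side by side, framework-free. RunRegistry and TooManyRuns are hypothetical names; the default of 10 and the 429 status are the only parts taken from the text.

```python
from typing import Dict, Set


class TooManyRuns(Exception):
    """The API layer would translate this into an HTTP 429 response."""

    status_code = 429


class RunRegistry:
    def __init__(self, max_runs_per_account: int = 10) -> None:
        self.max_runs_per_account = max_runs_per_account
        self._running: Dict[str, Set[str]] = {}

    def start(self, account_id: str, candidate_id: str) -> None:
        runs = self._running.setdefault(account_id, set())
        if candidate_id in runs:
            # The older limit: one instance per candidate.
            raise ValueError("candidate is already running")
        if len(runs) >= self.max_runs_per_account:
            # The added limit: a cap across all of one account's candidates.
            raise TooManyRuns(f"cap of {self.max_runs_per_account} runs reached")
        runs.add(candidate_id)

    def stop(self, account_id: str, candidate_id: str) -> None:
        self._running.get(account_id, set()).discard(candidate_id)
```

Stopping a run frees a slot, so the cap bounds concurrent load rather than lifetime usage.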

3. With no human, the default has to invert.
Fail-open is a default that assumes a backstop. Letting risk checks fail open in dev is fine when a human is at the screen.

The unattended runner is not at a screen. A risk engine that’s disabled or fails to load becomes an autonomous order loop with zero risk checks — the worst possible default. So on this path the default inverts: fail closed. No risk guard, no run, unless you explicitly opt out.
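
In code, the inversion is small. This is a sketch under assumptions: resolve_risk_guard, the allow_unguarded flag, and the dict-shaped order are hypothetical, standing in for however the real service loads its risk engine.

```python
from typing import Callable, Optional

Order = dict
RiskGuard = Callable[[Order], bool]


class RiskGuardUnavailable(RuntimeError):
    pass


def resolve_risk_guard(
    load_guard: Callable[[], Optional[RiskGuard]],
    allow_unguarded: bool = False,
) -> RiskGuard:
    """Fail closed: without a working guard the run does not start."""
    cause = None
    try:
        guard = load_guard()
    except Exception as exc:  # a guard that fails to load counts as missing
        guard, cause = None, exc
    if guard is not None:
        return guard
    if allow_unguarded:
        # Explicit opt-out: the caller asked for no risk checks.
        return lambda order: True
    raise RiskGuardUnavailable("no risk guard; refusing to start run") from cause
```

The opt-out is a parameter someone has to pass on purpose, so an unguarded run shows up in the call site instead of hiding in a default.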

4. Backtest/live parity has to reach down to data shape.
A human wouldn’t trade a half-formed bar. The machine will, unless the architecture forbids it.

The latest bar each tick is often still forming — its close isn’t final. Acting on it silently diverges from the backtest, which only ever saw closed bars. So the runner decides only on closed bars, matching backtest semantics exactly.
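
One way to express “closed bars only”, assuming each bar carries its open time and the runner knows the timeframe. The Bar shape and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Bar:
    open_time: datetime
    close: float


def latest_closed_bar(
    bars: Iterable[Bar], timeframe: timedelta, now: datetime
) -> Optional[Bar]:
    """Return the newest bar whose window has fully elapsed, or None."""
    closed = [b for b in bars if b.open_time + timeframe <= now]
    return max(closed, key=lambda b: b.open_time) if closed else None
```

With one-minute bars opening at 10:00 and 10:01 and a tick at 10:01:30, this returns the 10:00 bar. The 10:01 bar is still forming, so the strategy does not see it until a later tick.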

(One implementation detail rides along. The loop treated every exception as retryable with backoff, so a deterministic error, such as a delisted symbol or a constraint violation, burned the whole retry budget before giving up. It now splits retryable from non-retryable and stops immediately on the latter. Plumbing, not architecture.)
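
The split looks roughly like this. NonRetryableError, run_with_retry, and the backoff numbers are hypothetical; the real loop’s error taxonomy may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NonRetryableError(Exception):
    """Errors that will fail the same way on every attempt."""


def run_with_retry(
    step: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except NonRetryableError:
            raise  # stop immediately, no budget spent
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError("max_attempts must be at least 1")
```

The sleep function is injectable so the backoff schedule can be checked in a test without waiting.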

Some of these I saw clearly only after an adversarial review of the shipped runner. But they aren’t scattered bugs. They’re four corollaries of one sentence: the trust boundary of an autonomous path is not the same boundary as one with a human in the loop.

What this still costs, and what we punted

Honesty section, same as last time.

  • The runner runs candidate code in the main event loop. The backtest path isolates strategy code in a resource-limited subprocess. The live session compiles and runs it inline. The AST audit is a static gate, not a runtime one — it won’t stop a pure-compute infinite loop from hanging the service. We lean on the two human gates to keep the code trusted. Subprocess/watchdog hardening is filed, not done.
  • A crash mid-fill can drift the in-memory position from the DB. The fill is committed to the DB first, then replayed into the session. If the process dies in between, a restart rebuilds the session from cash only, with no positions, not from the DB positions. “Resume a run from its real position” is the next robustness item, deliberately not faked in this release.
  • Single-instance only. Startup reconciliation marks every stranded running row as errored. Correct for one process, wrong the moment you run two. Multi-instance leasing is a Phase-F item, flagged in the code where it bites.

I’d rather ship the gates that are load-bearing now and name the ones that aren’t yet, than imply the autonomous path is hardened against things it isn’t.

So what

The cheap version of “let the agent trade for you” deletes the approval step and calls it autonomy.

The audit-grade version keeps the entire order path intact. It stamps the approver as the machine. It earns that stamp with two human gates the model can’t route around. Then it redesigns the trust boundary, so every guarantee the human used to backstop is one the system now holds on its own.

Autonomy isn’t the absence of the harness. It’s the harness running without you in the chair.

If this resonated:

  • 📬 Subscribe to Inalpha on Substack — one long-form post a month, ADRs and post-mortems, no algorithm between us and you
  • github.com/mirror29/inalpha — the live runner, the plan/exec path, and the four boundary changes above are all in services/paper
  • 👉 Next post: Sandboxed strategy evolution — three gates + multi-objective fitness. What happens when you actually let the LLM mutate trading code, and what catches it when it shouldn’t have. (Yes, the one I promised last time — it’s next.)