SoloEngine: How to Let AI Run Every Industry

As someone with three years of experience in large language model algorithms, agent development, and knowledge base construction, I recently had a thought: Vibe Coding took off in the programming industry simply because programmers know how to write code. Other industries don’t have their own Cursor or Claude Code, not because they don’t need Agentic AI, but because their practitioners can’t build with frameworks like LangChain or CrewAI. I wanted a tool that makes Agentic AI development as approachable as workflow tools like Dify. That is how SoloEngine was born.

SoloEngine, as the first low‑code Agentic AI development platform, fully encapsulates mechanisms such as ReAct, Tool, MCP, Skill, and SubAgent into backend services. When using it, you simply drag an agent onto the canvas, connect collaboration relationships, configure the required tools, and click Run. The backend then automatically compiles everything into your very own Claude Code — planning, execution, and delivery are all autonomously completed by the agent.

Comparison: SoloEngine vs Other Solutions

Feature | Dify, n8n, Zapier | LangChain, CrewAI, LangGraph | SoloEngine
Agentic AI | ✗ Scripted workflows only | ✓ ReAct / Multi-Agent | ✓ ReAct / Multi-Agent
No coding required | | ✗ Python mandatory |
Visual orchestration | | Partial support | ✓ Full canvas experience
Domain experts can build independently | ✓ (but workflows are not truly Agentic) | |
Multi-agent collaboration | | |

Core Design

For compilation efficiency, every agent node uses the same ReAct architecture. The platform derives supervisor-subordinate relationships from the canvas topology, and those connections determine which agents can call which others as SubAgents. The visual design on the canvas is compiled directly into an executable agent team.
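Here is a minimal sketch of how canvas topology could be turned into SubAgent relationships. It assumes the canvas exports a list of agent nodes plus directed edges, each running from a supervising agent to an agent it may call. The types and `compileTeam` are hypothetical illustrations, not SoloEngine's actual backend code.

```typescript
// Hypothetical canvas export: agent nodes plus directed edges (supervisor -> subordinate).
interface CanvasNode {
  id: string;
  name: string;
  tools: string[];
}
interface CanvasEdge {
  from: string;
  to: string;
}

// Compiled view: each agent lists the agents it may invoke as SubAgents.
interface CompiledAgent extends CanvasNode {
  subAgents: string[];
}

export function compileTeam(nodes: CanvasNode[], edges: CanvasEdge[]): { root: string; agents: Map<string, CompiledAgent> } {
  const agents = new Map<string, CompiledAgent>();
  for (const n of nodes) agents.set(n.id, { ...n, subAgents: [] });

  // Each edge makes the target a SubAgent of the source.
  const hasParent = new Set<string>();
  for (const e of edges) {
    const parent = agents.get(e.from);
    if (!parent || !agents.has(e.to)) throw new Error(`edge references unknown node: ${e.from} -> ${e.to}`);
    parent.subAgents.push(e.to);
    hasParent.add(e.to);
  }

  // The agent nobody supervises is the entry point that receives the user's task.
  const roots = nodes.filter((n) => !hasParent.has(n.id)).map((n) => n.id);
  if (roots.length !== 1) throw new Error(`expected exactly one root agent, found ${roots.length}`);

  // Reject cycles, which would let agents delegate to each other forever.
  const state = new Map<string, "visiting" | "done">();
  const visit = (id: string): void => {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") throw new Error(`SubAgent cycle at ${id}`);
    state.set(id, "visiting");
    for (const child of agents.get(id)!.subAgents) visit(child);
    state.set(id, "done");
  };
  for (const n of nodes) visit(n.id);

  return { root: roots[0], agents };
}
```

The single-root and no-cycle rules are one plausible set of constraints. A real compiler might allow several entry points.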

At runtime, each agent employs progressive disclosure, loading only the MCPs and Skills it needs on demand — token consumption can be reduced by over 85%.
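Progressive disclosure can be sketched as a registry whose prompt-facing index lists only skill names and one-line summaries. A skill's full instructions are loaded only when the agent asks for it. This is an illustrative design under that assumption: `SkillRegistry` and its methods are hypothetical, the sketch counts characters rather than tokens, and it does not reproduce the 85% figure.

```typescript
// Hypothetical skill registry illustrating progressive disclosure.
interface Skill {
  name: string;
  summary: string; // one line, always visible to the agent
  body: string; // full instructions / tool schemas, loaded on demand
}

export class SkillRegistry {
  private readonly skills = new Map<string, Skill>();
  private readonly loaded = new Set<string>();

  register(skill: Skill): void {
    this.skills.set(skill.name, skill);
  }

  // What goes into the system prompt up front: names and summaries only.
  index(): string {
    return Array.from(this.skills.values()).map((s) => `- ${s.name}: ${s.summary}`).join("\n");
  }

  // Called when the agent asks for a skill (for example through a load_skill tool call).
  load(name: string): string {
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`unknown skill: ${name}`);
    this.loaded.add(name);
    return skill.body;
  }

  // Context spent so far: the index plus the bodies of skills actually loaded (in characters).
  contextChars(): number {
    let total = this.index().length;
    this.loaded.forEach((name) => {
      total += this.skills.get(name)!.body.length;
    });
    return total;
  }
}
```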

On the model side, SoloEngine supports commonly used model providers such as OpenAI, Anthropic, Ollama, DeepSeek, Qwen, and Zhipu through a unified interface, so you can switch between them seamlessly.

Release Updates

After more than a dozen development iterations, v0.2, which adds file change tracking and a rollback mechanism, has been released and is relatively stable. An official release build will be available soon. v0.3’s one-click deployment for Agentic AI is in its final stages. It lets you package a compiled agent team as a standalone product to self-deploy, distribute, or sell. Long-term memory and autonomous evolution are also on the roadmap.

Quick Start

git clone https://github.com/Sh4r1ock/SoloEngine.git
cd SoloEngine

# Backend (Python 3.11+)
cd backend
pip install -r requirements.txt
python main.py

# Frontend (Node.js 18+): run in a second terminal, starting from the repository root
cd frontend
npm install
npm run dev

Open http://localhost:8991 to build your first agent team.

Get Involved

The project is in a phase of rapid iteration, and contributors are welcome to help AI drive every industry. We hope AI will eventually evolve from Vibe Coding into Vibe Everything.

Project repository: https://github.com/Sh4r1ock/SoloEngine

TLS Fingerprinting: How JA3 and JA4 Identify You Before You Send a Byte

Encryption hides the contents of your HTTPS connection — but the negotiation that sets up that encryption happens in the clear. The very first message your client sends, before a single byte of application data, has a distinctive shape. JA3 and JA4 turn that shape into a fingerprint that can identify your software, and sometimes route, throttle, or block you on the spot.

Every HTTPS connection starts with a TLS handshake, and the handshake starts with a message called the ClientHello. It is sent unencrypted, because the two sides have not yet agreed on a key. Inside it, your client announces everything it is willing to do: which TLS versions it supports, which cipher suites it prefers and in what order, which extensions it understands, which elliptic curves and signature algorithms it offers.

None of that is secret. None of it has to be. But taken together, the exact set and ordering of those parameters is remarkably specific to a particular piece of software at a particular version. Chrome 124 produces a different ClientHello from Firefox, which produces a different one from Python’s requests library, which differs from Go’s standard library, which differs from a curl built against a specific OpenSSL version. TLS fingerprinting is the practice of hashing that ClientHello into a short, stable identifier and looking it up.

What Goes Into the Fingerprint

The original technique, JA3, was published by three engineers at Salesforce in 2017 — John Althouse, Jeff Atkinson, and Josh Atkins, whose initials gave it the name. JA3 builds a string from five fields of the ClientHello, in order:

  • The TLS version offered
  • The list of cipher suites
  • The list of extensions
  • The list of supported elliptic curves (named groups)
  • The list of elliptic-curve point formats

Each field is rendered as its numeric values joined by hyphens, the fields are joined by commas, and the whole string is hashed with MD5 to produce a 32-character fingerprint. A companion technique, JA3S, does the same for the server’s ServerHello, so you can fingerprint both ends of a conversation. Pairing a client JA3 with a server JA3S is a common way to identify specific malware command-and-control channels, because the malware and its server both produce consistent, unusual hashes.
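Here is a minimal sketch of the JA3 string and hash in TypeScript. It assumes the ClientHello has already been parsed into its numeric fields; parsing raw TLS bytes is left out. The `ClientHelloFields` shape and the function names are illustrative. GREASE values are dropped, as the reference JA3 implementation does.

```typescript
import { createHash } from "node:crypto";

// The five ClientHello fields JA3 uses, already parsed into numbers (illustrative shape).
export interface ClientHelloFields {
  version: number; // ClientHello legacy_version, e.g. 771 for TLS 1.2 (0x0303)
  ciphers: number[]; // cipher suites in the order offered
  extensions: number[]; // extension types in the order they appear
  curves: number[]; // supported_groups (named groups / elliptic curves)
  pointFormats: number[]; // ec_point_formats
}

// RFC 8701 GREASE values: 0x0a0a, 0x1a1a, ..., 0xfafa (both bytes equal, low nibble 0xa).
export const isGrease = (v: number): boolean => (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);

// Values joined by "-", fields joined by ",", GREASE removed.
export function ja3String(h: ClientHelloFields): string {
  const field = (values: number[]) => values.filter((v) => !isGrease(v)).join("-");
  return [String(h.version), field(h.ciphers), field(h.extensions), field(h.curves), field(h.pointFormats)].join(",");
}

export function ja3Hash(h: ClientHelloFields): string {
  return createHash("md5").update(ja3String(h)).digest("hex");
}
```

If you swap the order of the extensions, the hash changes even though the client supports exactly the same things.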

Why ordering matters: Two clients can support the exact same cipher suites and still fingerprint differently, because they offer them in a different preference order. That ordering is baked into the TLS library and rarely changes between builds — which is exactly what makes it a stable signal.

Why JA3 Started to Break

JA3 worked well for years, but two developments eroded it. The first was GREASE (RFC 8701), a mechanism Google introduced to keep the TLS ecosystem flexible. GREASE makes clients insert random reserved values into their cipher and extension lists, so that middleboxes don’t hard-code assumptions about what they see. The side effect is that a naive JA3 implementation produces a different hash on every connection unless it explicitly strips the GREASE values out.

The second, in the TLS 1.3 era, was extension shuffling. Chrome began randomizing the order of some ClientHello extensions on each connection. The main goal was to keep servers and middleboxes from ossifying around one fixed order, but the change also undermines order-based fingerprints. Against a technique that depends on extension ordering, that is fatal: the same browser now yields many different JA3 hashes.

JA4: The Redesign

In 2023, John Althouse — one of the original JA3 authors, now at FoxIO — released JA4, the centerpiece of a broader suite called JA4+ that fingerprints not just TLS but HTTP, TCP, SSH, and more. JA4 was designed to survive the things that broke JA3.

The biggest structural change is that JA4 is partly human-readable. Instead of one opaque MD5, a JA4 fingerprint is divided into sections you can read at a glance:

  • A prefix describing the transport and TLS version, whether SNI is present, the count of cipher suites, the count of extensions, and the first and last characters of the first ALPN value (for example, h2 when the client leads with HTTP/2, or h1 for HTTP/1.1)
  • A truncated hash of the cipher suites, sorted numerically so that order-shuffling no longer changes the result
  • A truncated hash of the extensions, sorted the same way and with SNI and ALPN left out because the prefix already covers them, followed by the signature algorithms in the order the client offered them

GREASE values are stripped by definition. Because the cipher and extension lists are sorted before hashing, Chrome’s randomization no longer produces a moving target. The result is a fingerprint that is both more stable than JA3 and more informative, because a human analyst can read meaningful structure out of the prefix without consulting a lookup table.
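The sorting idea can be sketched as follows. This is a simplified, JA4-style computation based on our reading of FoxIO's published description, and it assumes the ClientHello fields are already parsed. It handles the TCP/QUIC prefix, GREASE removal, and sorted hashing. It skips edge cases such as non-alphanumeric ALPN values, so treat FoxIO's reference implementation as authoritative.

```typescript
import { createHash } from "node:crypto";

// Pre-parsed ClientHello fields (illustrative shape).
export interface Ja4Input {
  transport: "t" | "q"; // "t" = TLS over TCP, "q" = QUIC
  versions: number[]; // supported_versions values, or [legacy_version] if the extension is absent
  hasSni: boolean; // SNI extension carries a hostname
  ciphers: number[];
  extensions: number[];
  signatureAlgorithms: number[]; // kept in the order offered
  alpn: string[]; // e.g. ["h2", "http/1.1"]
}

const isGrease = (v: number) => (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);
const hex4 = (v: number) => v.toString(16).padStart(4, "0");
const hash12 = (s: string) => createHash("sha256").update(s).digest("hex").slice(0, 12);
const count2 = (n: number) => String(Math.min(n, 99)).padStart(2, "0");
const VERSION: Record<number, string> = { 0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10" };

export function ja4(h: Ja4Input): string {
  const ciphers = h.ciphers.filter((v) => !isGrease(v));
  const exts = h.extensions.filter((v) => !isGrease(v));
  const top = Math.max(...h.versions.filter((v) => !isGrease(v)));
  const firstAlpn = h.alpn[0] ?? "";

  // Section a: readable prefix, e.g. "t13d0305h2".
  const a =
    h.transport +
    (VERSION[top] ?? "00") +
    (h.hasSni ? "d" : "i") +
    count2(ciphers.length) +
    count2(exts.length) +
    (firstAlpn ? firstAlpn[0] + firstAlpn[firstAlpn.length - 1] : "00");

  // Section b: ciphers sorted numerically, so offer order no longer matters.
  const b = ciphers.length ? hash12([...ciphers].sort((x, y) => x - y).map(hex4).join(",")) : "000000000000";

  // Section c: sorted extensions without SNI (0x0000) and ALPN (0x0010), then signature algorithms as offered.
  const sortedExts = exts.filter((v) => v !== 0x0000 && v !== 0x0010).sort((x, y) => x - y).map(hex4).join(",");
  const sigs = h.signatureAlgorithms.map(hex4).join(",");
  const c = hash12(sigs ? `${sortedExts}_${sigs}` : sortedExts);

  return `${a}_${b}_${c}`;
}
```

If you shuffle the ciphers and extensions of the same hello, the function returns an identical fingerprint. Order-dependent JA3 lacks that property.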

Property | JA3 (2017) | JA4 (2023)
Output | Single MD5 hash | Structured, partly human-readable
Handles GREASE | Only if implementation strips it | Yes, by design
Survives extension shuffling | No, order-dependent | Yes, lists are sorted
Scope | TLS ClientHello / ServerHello | TLS, HTTP, TCP, SSH and more (JA4+)

Who Uses This, and For What

TLS fingerprinting is genuinely dual-use. On the defensive side, it is one of the more useful tools a network operator has. A client that claims to be Chrome in its User-Agent header but whose ClientHello matches Python’s requests is almost certainly a bot lying about itself. Security teams use JA3/JA4 to spot malware beaconing, to cluster automated traffic, and to flag scrapers that don’t match any real browser. Because the fingerprint is computed from bytes the client cannot easily fake without rebuilding its TLS stack, it is harder to spoof than a header.

That same strength is what makes it a censorship and tracking tool. A national firewall or a corporate middlebox can fingerprint every outbound connection and treat traffic differently based on what software produced it — throttling or blocking a circumvention tool whose handshake doesn’t look like a mainstream browser, even though it cannot read the encrypted payload. Anti-bot vendors and CDNs fingerprint connections to decide who gets served and who gets a challenge. The fingerprint becomes a passive selector applied before you have proven anything about who you are.

The encryption is doing its job perfectly. The leak is in the envelope, not the letter — and the envelope is, by necessity, written in the clear.

Can You Defend Against It?

Not cleanly, and that is the uncomfortable part. Because the fingerprint is derived from how your TLS library behaves, the only thorough defense is to make your traffic produce a common, unremarkable fingerprint — to look like everyone else. Circumvention tools increasingly do exactly this through uTLS, a Go library that lets a client mimic the precise ClientHello of a mainstream browser, GREASE and ordering included, so its JA3/JA4 blends into the crowd.

For an ordinary user, the practical reality is simpler: using a current, mainstream browser is itself a form of crowd-blending, because millions of others produce a near-identical handshake. The danger zone is unusual software — a custom client, an old library, a niche tool — that produces a rare fingerprint precisely because few others share it. This is the same logic that governs browser fingerprinting at the application layer: distinctiveness is the vulnerability, and the anonymity set is the defense.
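One standard way to quantify distinctiveness is surprisal. A fingerprint seen with probability p carries -log2(p) bits of identifying information, and its anonymity set is everyone who shares it. The sketch below assumes you have counts of how often each fingerprint appears in some population. The function name is illustrative.

```typescript
// Anonymity set size and surprisal for one fingerprint, given observed counts per fingerprint.
export function anonymity(counts: Map<string, number>, fingerprint: string): { setSize: number; bits: number } {
  const total = Array.from(counts.values()).reduce((a, b) => a + b, 0);
  const setSize = counts.get(fingerprint) ?? 0;
  // Never observed: nobody to hide among, and the surprisal is unbounded.
  if (total === 0 || setSize === 0) return { setSize: 0, bits: Infinity };
  // Surprisal -log2(p): each extra bit halves the crowd sharing this fingerprint.
  return { setSize, bits: -Math.log2(setSize / total) };
}
```

For example, with made-up counts where one custom client accounts for 1 of 1,024 connections, that client's fingerprint carries 10 bits. A browser seen on 768 of them carries well under 1 bit.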

The Broader Lesson

TLS fingerprinting is a clean illustration of a pattern that runs through nearly all privacy engineering: encrypting the contents of a channel does not hide the channel’s metadata, and the metadata is often enough. The handshake has to be in the clear so two strangers can agree on a key. The shape of that handshake leaks the identity of the software making it. No amount of payload encryption closes that gap, because the gap exists before encryption begins.

The honest takeaway is not that TLS is broken — it isn’t — but that “the connection is encrypted” answers a narrower question than most people think. Knowing what your tools reveal in the clear, and choosing tools whose visible behavior is common rather than distinctive, is the part of the threat model that fingerprinting forces you to take seriously.

Originally published at havenmessenger.com

RAG with Postgres pgvector in 2026: the full TypeScript pipeline.

I spent a week evaluating dedicated vector databases before deciding to just use the Postgres instance I already had. The pgvector extension handles similarity search well enough for most production workloads, and it collapses three infrastructure components into one. This walkthrough covers everything from schema to answer: chunk your docs, embed them, store in pgvector, retrieve by cosine similarity, and wire the results into an LLM call.

TL;DR

Step | Tool | Why
Enable vector store | pgvector 0.8.x, HNSW index | Runs in your existing Postgres, no extra infra
Embed | text-embedding-3-small (1,536 dims) | $0.02 per million tokens, fast
Query | <=> cosine distance, top-k | Works with both OpenAI and Voyage models
Augment | Claude or GPT-4o with retrieved docs | Context window stuffed, hallucination rate drops

1. Why pgvector instead of a dedicated vector database

Pinecone and Weaviate are good products. If you need multi-tenant isolation, sub-millisecond p99 at 100M+ vectors, or native hybrid search with BM25, they earn their place. For most teams, those are future problems.

The cost calculus changes when you consider ops burden. A dedicated vector DB means a new billing line, a new set of credentials to rotate, a new failure mode to track, and a new SDK to keep current in your application. pgvector runs as a Postgres extension: one connection string, one backup strategy, one source of truth. At 10M documents with 1,536-dimensional embeddings, an HNSW index on a reasonably sized Postgres instance returns top-10 results in under 10ms. That covers the overwhelming share of RAG use cases.

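pgvector 0.8.0 added iterative index scans. Before that release, a filtered HNSW query examined a fixed number of candidates (hnsw.ef_search, 40 by default) and applied the WHERE clause afterward. A selective filter could therefore return fewer rows than requested. Iterative scans keep pulling candidates until enough rows pass the filter. That made filtered similarity search practical without falling back to sequential scans every time a WHERE clause got specific. The 0.8.0 release was what tipped my team from “maybe later” to “ship it.”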

2. Schema setup

Enable the extension once per database, then create your table.

-- enable pgvector (run once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- documents table
CREATE TABLE documents (
  id         BIGSERIAL PRIMARY KEY,
  source     TEXT NOT NULL,          -- filename, URL, or ID of source doc
  chunk_idx  INT NOT NULL,           -- chunk number within the source
  content    TEXT NOT NULL,          -- raw text of the chunk
  embedding  vector(1536) NOT NULL,  -- OpenAI text-embedding-3-small
  created_at TIMESTAMPTZ DEFAULT NOW()
);

Choosing between HNSW and IVFFlat

HNSW builds a hierarchical navigable small world graph. Queries traverse the graph instead of comparing every row. Because HNSW needs no training step, you can build it on an empty table and query immediately. The tradeoff is memory: each index entry stores a full copy of the vector (4 bytes per dimension, about 6 KB per row at 1,536 dimensions) plus its neighbor links.

IVFFlat partitions the embedding space into centroid clusters. Faster to build, smaller memory footprint, but you must load rows before building the index or the centroid assignment is useless. If you are starting from zero rows, build HNSW.

-- HNSW index (recommended default)
-- m = connections per layer (default 16), higher = better recall at higher memory cost
-- ef_construction = candidate list during build (default 64), higher = better recall at slower build
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- IVFFlat alternative (only after loading rows); use the same operator class as your queries (<=> needs vector_cosine_ops)
-- lists = sqrt(row_count) is a good starting point for large tables
-- CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

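Use vector_cosine_ops with the <=> operator as the safe default, because cosine distance ignores vector length. OpenAI and Voyage return unit-length embeddings, and on unit vectors all three operators rank neighbors identically. Use vector_l2_ops with <-> when you genuinely want Euclidean distance on vectors that are not normalized. Use vector_ip_ops with <#> for inner product on normalized vectors. It skips the normalization work that cosine distance does, and because <#> returns the negative inner product, an ascending ORDER BY still puts the best match first. Whichever you choose, the index operator class must match the operator in your queries, or the planner will not use the index.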
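The equivalence on unit vectors follows from |a - b|² = 2 - 2·a·b and cosine distance = 1 - a·b. All three distances are increasing functions of -a·b. Here is a quick TypeScript check on plain arrays rather than pgvector; the helper names are illustrative.

```typescript
// For unit-length a and b (|a| = |b| = 1):
//   cosine distance (<=>):        1 - a·b
//   squared L2 distance (<->)²:   |a|² + |b|² - 2 a·b = 2 - 2 a·b
//   negative inner product (<#>): -a·b
// Each is an increasing function of -a·b, so all three produce the same nearest-neighbor order.
const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);

export const normalize = (a: number[]) => {
  const n = Math.sqrt(dot(a, a));
  return a.map((x) => x / n);
};
export const cosineDistance = (a: number[], b: number[]) => 1 - dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
export const l2Distance = (a: number[], b: number[]) => Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
export const negInnerProduct = (a: number[], b: number[]) => -dot(a, b);

// Indices of docs, nearest first, under the given distance.
export function rank(query: number[], docs: number[][], dist: (a: number[], b: number[]) => number): number[] {
  return docs
    .map((d, i) => ({ i, d: dist(query, d) }))
    .sort((x, y) => x.d - y.d)
    .map((r) => r.i);
}
```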

3. Ingest pipeline in TypeScript

The ingest function chunks a document, calls the embedding API, and bulk inserts rows. Use postgres (the npm package, not pg) for its tagged-template SQL and native array support.

import postgres from "postgres";
import OpenAI from "openai";

const sql = postgres(process.env.DATABASE_URL!);
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

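const CHUNK_SIZE = 512;   // words (the naive chunker below counts words, not tokens)
const CHUNK_OVERLAP = 64; // words of overlap between adjacent chunks

function chunkText(text: string, size: number, overlap: number): string[] {
  // naive word-boundary chunker; swap for a tokenizer such as tiktoken in production
  if (overlap >= size) throw new Error("overlap must be smaller than chunk size");
  const words = text.split(/\s+/).filter(Boolean); // drop empty strings from leading/trailing whitespace
  const chunks: string[] = [];
  let start = 0;
  while (start < words.length) {
    const end = Math.min(start + size, words.length);
    chunks.push(words.slice(start, end).join(" "));
    if (end === words.length) break; // last window reached; avoid a redundant overlap-only tail chunk
    start += size - overlap;
  }
  return chunks;
}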

async function embedBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
  return response.data.map((d) => d.embedding);
}

export async function ingestDocument(source: string, text: string): Promise<void> {
  const chunks = chunkText(text, CHUNK_SIZE, CHUNK_OVERLAP);

  // embed in batches of 100 to keep each request small (well under the API's per-request input limit)
  const BATCH = 100;
  for (let i = 0; i < chunks.length; i += BATCH) {
    const batch = chunks.slice(i, i + BATCH);
    const embeddings = await embedBatch(batch);

    const rows = batch.map((content, j) => ({
      source,
      chunk_idx: i + j,
      content,
      embedding: JSON.stringify(embeddings[j]),
    }));

    await sql`
      INSERT INTO documents (source, chunk_idx, content, embedding)
      SELECT
        r.source,
        r.chunk_idx::int,
        r.content,
        r.embedding::vector
      FROM jsonb_to_recordset(${JSON.stringify(rows)}::jsonb)
        AS r(source text, chunk_idx text, content text, embedding text)
    `;
  }

  console.log(`[ingest] ${source}: ${chunks.length} chunks stored`);
}

A note on chunk size: 512 words is a starting point. The right size depends on your source material. Legal documents with dense paragraphs do better at 256 words. Code files need at least 300 lines or you lose function context. The overlap prevents the embedding from missing a sentence that straddles a chunk boundary.

4. Query pipeline in TypeScript

Embed the user’s question, run a top-k cosine similarity search, return the matching chunks.

export async function queryDocuments(
  question: string,
  topK = 5,
): Promise<Array<{ source: string; content: string; distance: number }>> {
  // embed the question with the same model used at ingest time
  const [embedding] = await embedBatch([question]);
  const embeddingStr = JSON.stringify(embedding);

  const rows = await sql<{ source: string; content: string; distance: number }[]>`
    SELECT
      source,
      content,
      (embedding <=> ${embeddingStr}::vector) AS distance
    FROM documents
    ORDER BY embedding <=> ${embeddingStr}::vector
    LIMIT ${topK}
  `;

  return rows;
}

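The <=> operator returns cosine distance (0 = identical, 2 = opposite), so lower numbers win. If you add metadata filters, put them in the WHERE clause of the same query. The iterative index scans introduced in 0.8.0 keep the HNSW scan pulling candidates until enough rows pass the filter, but they are off by default. Enable them per session or transaction with SET hnsw.iterative_scan = strict_order (or relaxed_order, which is faster but may return rows slightly out of distance order).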

// filtered query example: restrict the search to one source (embeddingStr and topK as in queryDocuments; filterSource is the source to match)
const rows = await sql<{ source: string; content: string; distance: number }[]>`
  SELECT source, content, (embedding <=> ${embeddingStr}::vector) AS distance
  FROM documents
  WHERE source = ${filterSource}
  ORDER BY embedding <=> ${embeddingStr}::vector
  LIMIT ${topK}
`;

5. Wiring retrieved docs into an LLM call

Concatenate the retrieved chunks into a context block, then call your model of choice. Claude Sonnet and GPT-4o both handle long contexts well. Keep the context block under 80,000 tokens for cost reasons.

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

export async function answerWithRAG(question: string): Promise<string> {
  const docs = await queryDocuments(question, 5);

  if (docs.length === 0) {
    return "No relevant documents found.";
  }

  const context = docs
    .map((d, i) => `[${i + 1}] (${d.source})\n${d.content}`)
    .join("\n\n---\n\n");

  const prompt = `You are a helpful assistant. Answer the question using only the provided context.
If the context does not contain the answer, say so.

Context:
${context}

Question: ${question}`;

  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-5-20250929",
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });

  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}
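`answerWithRAG` retrieves only five chunks, so it normally stays far below the 80,000-token ceiling. If you raise topK or use larger chunks, a guard like the one below can enforce the budget before the context block is built. This is a sketch: `packContext` is a hypothetical helper, and the 4-characters-per-token ratio is a rough heuristic for English text, not a tokenizer.

```typescript
// Rough token estimate: about 4 characters per token for English text (heuristic, not a tokenizer).
const estimateTokens = (s: string) => Math.ceil(s.length / 4);

// Keep chunks in retrieval order (best first) and stop before the budget would be exceeded.
export function packContext<T extends { content: string }>(docs: T[], maxTokens: number): T[] {
  const kept: T[] = [];
  let used = 0;
  for (const d of docs) {
    const cost = estimateTokens(d.content);
    if (used + cost > maxTokens) break;
    kept.push(d);
    used += cost;
  }
  return kept;
}
```

For example, call packContext(docs, 80_000) on the result of queryDocuments before mapping the chunks into the context string.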

The “answer using only the provided context” instruction is load-bearing. Without it, the model mixes retrieval with parametric memory and you cannot tell which is which. If the answer comes from the context, citations work. If it comes from training data, they do not. Force the distinction at the prompt level.

One more thing worth noting: rerank before you send to the LLM. A fast cosine search returns the 5 closest chunks by vector distance, but distance does not always equal usefulness. A cross-encoder reranker (Cohere Rerank costs about $1 per 1,000 queries) takes your top-20 candidates and scores them for actual relevance before you trim to 5. The quality jump is noticeable. Skip the reranker while prototyping, add it before you hit production.
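The retrieve-20, rerank, keep-5 flow can be sketched with the reranker abstracted as a scoring function, which keeps it independent of any vendor SDK. `RerankScorer` and `rerank` are hypothetical names. Wiring in Cohere Rerank or another cross-encoder is the job of the scorer you pass in.

```typescript
// A cross-encoder scores (query, passage) pairs jointly; abstracted here so any backend can plug in.
export type RerankScorer = (query: string, passages: string[]) => Promise<number[]>;

export async function rerank<T extends { content: string }>(
  query: string,
  candidates: T[],
  score: RerankScorer,
  keep = 5,
): Promise<T[]> {
  const scores = await score(query, candidates.map((c) => c.content));
  if (scores.length !== candidates.length) throw new Error("scorer must return one score per candidate");
  return candidates
    .map((c, i) => ({ c, s: scores[i] }))
    .sort((a, b) => b.s - a.s) // higher relevance score first
    .slice(0, keep)
    .map((x) => x.c);
}
```

In answerWithRAG, that would mean calling queryDocuments(question, 20) and passing the result through rerank(question, candidates, scorer, 5).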

6. Two gotchas that bite everyone

Chunk size drives recall more than index parameters

Most teams spend hours tuning HNSW m and ef_construction and see marginal gains. The actual lever is chunk size and overlap. A chunk that is too short loses context (the model cannot answer a cross-sentence question). A chunk that is too long pulls in noise, dilutes the embedding, and wastes context window in the LLM call. Run a quick eval: take 20 representative questions, retrieve top-5, then manually score whether the answer appeared in the returned chunks. Adjust chunk size in 100-word steps until recall tops 85%. Then tune the index.
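The eval above reduces to one hit-rate number per chunking configuration. The sketch assumes you have labeled which chunk ids contain each answer and recorded the ids your pipeline returned for each question. The `EvalCase` shape is illustrative.

```typescript
// One eval question plus the ids of chunks a human judged to contain its answer.
export interface EvalCase {
  question: string;
  relevantIds: Set<string>;
}

// Share of questions where at least one relevant chunk appears in the top-k retrieved ids.
export function recallAtK(cases: EvalCase[], retrieved: Map<string, string[]>, k = 5): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    const top = (retrieved.get(c.question) ?? []).slice(0, k);
    if (top.some((id) => c.relevantIds.has(id))) hits++;
  }
  return hits / cases.length;
}
```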

Build the index after bulk loading, not before

HNSW indexing at insert time is slow. If you load 500,000 documents and the HNSW index exists, every INSERT pays the graph update cost. The fast path: load all rows with the index dropped, then build it once with CREATE INDEX. On a table of 500,000 rows with 1,536-dim embeddings, a cold HNSW build takes roughly 8 to 12 minutes on 4 vCPUs. That is far cheaper than the cumulative insert overhead.

-- drop the index before bulk load
DROP INDEX IF EXISTS documents_embedding_idx;

-- ... run your ingest pipeline ...

-- rebuild once after load
CREATE INDEX documents_embedding_idx
  ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

The bottom line

The full pipeline is about 120 lines of TypeScript and three SQL statements. pgvector 0.8.x is stable enough for production, HNSW is the right default index for most teams, and the two things that matter most for answer quality are chunk size and staying consistent between embed-at-ingest and embed-at-query time (same model, same preprocessing). Dedicated vector DBs are not wrong, they are just a layer you do not need until your row count passes 50M or your recall requirements get strict enough to warrant a tuning team.

What chunk size worked best for your use case? Drop it in the comments.

GDS K S · thegdsks.com · follow on X @thegdsks

Good retrieval beats a better model every time.

Same code, three clocks — letting a quant agent trade on its own without losing the audit trail


In the last post I argued that an LLM should never hold the approval token on a trade. A human approves. The model only proposes. That works as long as a human is in the loop on every order.

Then a user does the obvious thing: they take a strategy the agent wrote, like what they see in the backtest, and say “put it on the paper account.”

They expect it to trade: follow the market in, follow it out, update positions while they sleep.

The honest truth at that point: status = 'promoted' was a database flag. Nobody was ticking the strategy’s on_bar. The account didn’t move. That gap was the whole feature.

Closing it means the machine now places orders on live bars with no human clicking approve each time. Which sounds like exactly the thing the last post said not to do.

This post covers how to close that gap without throwing away the audit trail, and the four places where the trust boundary has to be redesigned the moment no human is in the chair.

The easy half: same code, three clocks

Inalpha holds one invariant tight: the Python file you backtest is the file you paper-trade. No fork for production. You swap two things underneath the strategy — the Clock and the Gateway — and the business logic doesn’t move.
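The seam can be illustrated in TypeScript. Inalpha itself is Python, and every name here is hypothetical. The strategy sees only bars and returns orders, while the clock and gateway are the parts swapped between modes. A LiveClock would implement the same nextBar by waiting for the next bar on wall-clock time. It is omitted because it needs a data feed.

```typescript
// Illustrative only: Inalpha is Python, and these names are hypothetical.
interface Bar {
  ts: number;
  close: number;
}
interface Order {
  side: "buy" | "sell";
  qty: number;
}

// The two seams swapped between modes; the strategy never sees them.
interface Clock {
  nextBar(): Bar | undefined;
}
interface Gateway {
  submit(order: Order, bar: Bar): void;
}
interface Strategy {
  onBar(bar: Bar): Order | undefined;
}

// Toy strategy: go long when the close crosses above a threshold, exit when it falls back below.
class ThresholdStrategy implements Strategy {
  private long = false;
  constructor(private readonly threshold: number) {}
  onBar(bar: Bar): Order | undefined {
    if (!this.long && bar.close > this.threshold) {
      this.long = true;
      return { side: "buy", qty: 1 };
    }
    if (this.long && bar.close < this.threshold) {
      this.long = false;
      return { side: "sell", qty: 1 };
    }
    return undefined;
  }
}

// Backtest clock: replays historical bars in order.
class TestClock implements Clock {
  private i = 0;
  constructor(private readonly bars: Bar[]) {}
  nextBar(): Bar | undefined {
    return this.bars[this.i++];
  }
}

// Simulated gateway: fills every order at the bar's close (a stand-in for the matching engine).
class SimulatedGateway implements Gateway {
  readonly fills: Array<Order & { price: number }> = [];
  submit(order: Order, bar: Bar): void {
    this.fills.push({ ...order, price: bar.close });
  }
}

// The engine loop is identical in every mode; only the clock and gateway passed in change.
function run(strategy: Strategy, clock: Clock, gateway: Gateway): void {
  for (;;) {
    const bar = clock.nextBar();
    if (!bar) break;
    const order = strategy.onBar(bar);
    if (order) gateway.submit(order, bar);
  }
}
```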

The invariant itself isn’t rare. What’s rare is the thing standing on top of it here.

The author of that file is an LLM. It was vetted by a human. And it’s now running itself on live bars.

Quant engines hold the invariant, but don’t assume an agent wrote the strategy. Agent frameworks assume the LLM, but have nowhere to put a trading harness. Inalpha sits in that seam. And the same-code invariant is exactly what makes the audit chain mean anything: there’s precisely one file to point a signature at.

How it runs — three deployment modes, two clocks, one file:

  • Backtest: a TestClock driven by historical bars; fills simulated against a reference price.
  • Paper (live runner): a LiveClock on real wall-clock time, bars pulled fresh on the strategy’s timeframe, the same matching engine, the order routed out through the real plan/exec path — the only simulated part is that fills are matched locally instead of sent to a broker.
  • Live (real capital): architecturally the same seam — LiveClock, same kernel, same plan/exec path, only the Gateway swapping to a real broker. But real-money trading is deliberately out of scope for this project; holding the invariant isn’t about chasing it. The payoff is narrower and real: backtest and paper are literally one code path, so the audit chain has exactly one file to point a signature at.

So “three clocks” is shorthand: two clock implementations (TestClock / LiveClock), the third mode (real capital) a seam the architecture leaves open but the project doesn’t pursue — and the strategy file never notices which one it’s running under.

The live runner (services/paper/.../live_runner.py) is one long-lived task per running strategy. Each tick it does three things:

  1. pull the latest closed bar;
  2. feed it to a session that reuses the exact backtest kernel, firing the strategy’s on_bar;
  3. intercept the order the strategy emits and hand it to the guarded order path — it does not match locally.

When the fill comes back, it’s replayed into the session. So the strategy’s view of its own position stays consistent with what actually filled.
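Here is a sketch of one tick under the same kind of assumptions. The names are hypothetical, and the code is synchronous for brevity even though the real runner is a long-lived async task. The caller supplies the latest closed bar and the session fires on_bar. Any order is routed through the guarded path instead of being matched locally, and the returned fill is replayed into the session.

```typescript
// Illustrative sketch; names are hypothetical, not Inalpha's API.
interface Bar {
  ts: number;
  close: number;
}
interface Order {
  side: "buy" | "sell";
  qty: number;
}
interface Fill extends Order {
  price: number;
}

// Wraps the backtest kernel: fires the strategy's on_bar, hands any order back to the runner
// instead of matching it locally, and accepts real fills afterwards.
interface Session {
  onBar(bar: Bar): Order | undefined;
  applyFill(fill: Fill): void;
}

// The guarded order path (plan, approve, execute). Returns the fill, or undefined if rejected.
type OrderPath = (order: Order, bar: Bar) => Fill | undefined;

export function tick(session: Session, latestClosedBar: Bar, placeOrder: OrderPath): Fill | undefined {
  const order = session.onBar(latestClosedBar); // same kernel the backtest uses
  if (!order) return undefined;
  const fill = placeOrder(order, latestClosedBar); // no local matching: go through the guarded path
  if (fill) session.applyFill(fill); // replay the actual fill so the strategy's position matches reality
  return fill;
}
```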

Why this matters for audit-grade trading, not just convenience: if your backtest and live code are two different files, no signature chain will tell you which one ran when the $93k order happened. Same code, three clocks is the precondition. It’s also the boring half. Here’s the half that kept me up.

The hard half: who approves the order?

Last post’s thesis was a three-step state machine. The LLM drives step one. A human drives the approval:

trade.create_plan       → plan: pending_approval
trade.approve_plan      → mints a single-use token
trade.execute_plan(tok) → places the order

A runner that trades while you sleep can’t stop and wait for a click on every bar. So the naive fix is to delete the approval step for the automated path. That’s the fix that quietly turns “audit-grade” back into “trust me.”

We did the opposite. The automated path goes through the same plan/exec state machine. The approval is just stamped approved_by = "system:live_runner".

Machine approval. The order still creates a plan. Still mints and consumes a single-use token. Still writes the same signed audit line. Nothing on the order path got a shortcut.
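The state machine with a machine approver can be sketched like this. It is illustrative only: `PlanBook` is a hypothetical class, the audit line is unsigned here, and persistence is left out. The point is that system:live_runner goes through exactly the same approve and single-use-token steps as a human.

```typescript
import { randomUUID } from "node:crypto";

type PlanStatus = "pending_approval" | "approved" | "executed";
interface Plan {
  id: string;
  order: string;
  status: PlanStatus;
  approvedBy?: string;
  token?: string;
}

export class PlanBook {
  private readonly plans = new Map<string, Plan>();
  readonly audit: string[] = [];

  createPlan(order: string): Plan {
    const plan: Plan = { id: randomUUID(), order, status: "pending_approval" };
    this.plans.set(plan.id, plan);
    return plan;
  }

  // The approver is a human id, or "system:live_runner" on the automated path; same code either way.
  approvePlan(planId: string, approvedBy: string): string {
    const plan = this.mustGet(planId);
    if (plan.status !== "pending_approval") throw new Error(`plan ${planId} is ${plan.status}`);
    plan.status = "approved";
    plan.approvedBy = approvedBy;
    plan.token = randomUUID();
    return plan.token;
  }

  executePlan(planId: string, token: string): void {
    const plan = this.mustGet(planId);
    if (plan.status !== "approved" || plan.token !== token) throw new Error("invalid or already-used token");
    plan.status = "executed";
    plan.token = undefined; // single use: the token cannot be replayed
    this.audit.push(`${plan.id} ${plan.order} approved_by=${plan.approvedBy}`);
  }

  private mustGet(id: string): Plan {
    const plan = this.plans.get(id);
    if (!plan) throw new Error(`unknown plan ${id}`);
    return plan;
  }
}
```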

Machine approval is only honest if it’s earned. Ours rests on two human gates upstream, and the agent can’t route around either:

  1. A human promotes the candidate. promote is a deliberate human action, with permission: ask on the agent side. The model can’t self-promote a strategy into the runnable set.
  2. A human starts the run. paper.start_strategy is an explicit call a person makes for a specific market and timeframe.

So the chain reads: a person vetted this strategy, a person chose to run it here. Given those two signatures, having the machine approve each later order on live bars is the expected behavior, not a bypass. The audit line records system:live_runner as the approver for exactly this reason — a replay shows where the human gates were and where the machine took over.

Every order the runner places also writes a decision record (strategy_run_decisions): the bar context, the order intent, and the outcome (filled, rejected, or risk_rejected), cross-referenced to the plan and the trade.

The point of the autonomous path isn’t just that it trades. It’s that the next morning you can read, line by line, every bar where it wanted to act and what the harness did about it.

The trust boundary moves when the human leaves the chair

This isn’t a bug list. It’s four faces of one architectural question.

With a human in the loop, a lot of guarantees are propped up implicitly by “someone is at the screen.” Designing the unattended path means asking that again, on purpose: which of those props has to become something the system holds up on its own?

Four answers.

1. Identity has to become explicit.
When a human starts each run, ownership is implicit — whoever clicked owns it. Automate it, and ownership has to live in the data model, or there’s no boundary at all.

Concretely: the start path checked that a candidate was promoted, not that the caller owned it. So you could run someone else’s strategy on your own account.

The trap in fixing it was real. The candidate’s author_id is only set for UUID identities, while the account id falls back to uuid5 for everyone else. A naive author_id == account_id would lock out every non-UUID user. The fix derives an owner_account_id through the same function as the account id (migration 0013), so ownership is comparable for everyone.
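The fix can be sketched as a single derivation function used for both the stored owner and the caller. This assumes, as described above, that UUID identities are used directly and everything else falls back to uuid5. `ACCOUNT_NS` and the function names are hypothetical, and uuid5 is hand-rolled from RFC 4122 so the block has no dependencies.

```typescript
import { createHash } from "node:crypto";

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical namespace for deriving account ids from non-UUID identities.
const ACCOUNT_NS = "3f0c2a8e-5b1d-4c7e-9a2f-1e6d8b4c0a57";

// RFC 4122 name-based UUID, version 5 (SHA-1 of namespace bytes + name bytes).
function uuid5(name: string, namespace: string): string {
  const ns = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const b = createHash("sha1").update(ns).update(name, "utf8").digest().subarray(0, 16);
  b[6] = (b[6] & 0x0f) | 0x50; // version 5
  b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 variant
  const h = b.toString("hex");
  return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
}

// The single derivation used everywhere: UUID identities map to themselves, others fall back to uuid5.
export function accountIdFor(identity: string): string {
  return UUID_RE.test(identity) ? identity.toLowerCase() : uuid5(identity, ACCOUNT_NS);
}

// Store ownerAccountId = accountIdFor(authorIdentity) when the candidate is created, then at start
// time compare it with the caller's account id derived the same way.
export function canStart(ownerAccountId: string, callerIdentity: string): boolean {
  return ownerAccountId === accountIdFor(callerIdentity);
}
```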

2. Resource bounds are part of the trust boundary, not an ops detail.
A human starting runs self-limits. An API doesn’t. Each run is a long-lived task polling the data service on a timer, and the only limit was one instance per candidate — but a user can promote arbitrarily many. So the boundary grows a per-account cap (default 10) that returns 429, instead of letting one account quietly melt the event loop.
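Here is a sketch of the per-account cap, assuming an in-memory registry keyed by account. `RunRegistry` is hypothetical, and returning 409 for a duplicate run of the same candidate is an illustrative choice. The default cap of 10 and the 429 status come from the text.

```typescript
// Hypothetical in-memory registry of running strategies, keyed by account.
export class RunRegistry {
  private readonly running = new Map<string, Set<string>>(); // accountId -> running candidate ids

  constructor(private readonly maxPerAccount = 10) {}

  // HTTP-style result: 201 started, 409 this candidate is already running, 429 account cap reached.
  start(accountId: string, candidateId: string): 201 | 409 | 429 {
    const runs = this.running.get(accountId) ?? new Set<string>();
    if (runs.has(candidateId)) return 409;
    if (runs.size >= this.maxPerAccount) return 429;
    runs.add(candidateId);
    this.running.set(accountId, runs);
    return 201;
  }

  stop(accountId: string, candidateId: string): void {
    this.running.get(accountId)?.delete(candidateId);
  }
}
```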

3. With no human, the default has to invert.
Fail-open is a default that assumes a backstop. Letting risk checks fail open in dev is fine when a human is at the screen.

The unattended runner is not at a screen. A risk engine that’s disabled or fails to load becomes an autonomous order loop with zero risk checks — the worst possible default. So on this path the default inverts: fail closed. No risk guard, no run, unless you explicitly opt out.
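Failing closed can be expressed as a loader that refuses to return anything except a working guard, unless the caller has explicitly opted out. `resolveRiskGuard` and `allowNoRiskGuard` are hypothetical names for illustration.

```typescript
// Hypothetical risk-guard interface for the unattended runner.
export interface RiskGuard {
  check(order: { symbol: string; qty: number }): boolean;
}

// Fail closed: a guard that is missing or fails to load stops the run, unless the caller
// explicitly opted out (a visible, deliberate choice rather than a silent default).
export function resolveRiskGuard(
  load: () => RiskGuard | undefined,
  opts: { allowNoRiskGuard?: boolean } = {},
): RiskGuard {
  let guard: RiskGuard | undefined;
  try {
    guard = load();
  } catch {
    guard = undefined; // a guard that throws while loading counts as missing
  }
  if (guard) return guard;
  if (opts.allowNoRiskGuard) return { check: () => true }; // explicit opt-out: every order passes
  throw new Error("risk guard unavailable; refusing to start an unattended run");
}
```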

4. Backtest/live parity has to reach down to data shape.
A human wouldn’t trade a half-formed bar. The machine will, unless the architecture forbids it.

The latest bar each tick is often still forming — its close isn’t final. Acting on it silently diverges from the backtest, which only ever saw closed bars. So the runner decides only on closed bars, matching backtest semantics exactly.
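Closed-bar selection comes down to one comparison, assuming each bar is stamped with its open time and covers a fixed timeframe. The `Bar` shape and `latestClosedBar` are hypothetical.

```typescript
// Hypothetical bar shape: openTime in milliseconds since the epoch.
interface Bar {
  openTime: number;
  close: number;
}

// A bar covering [openTime, openTime + timeframeMs) is closed only once the clock has reached its end.
// Assumes bars are sorted by openTime, oldest first; returns the newest closed bar, skipping any still forming.
export function latestClosedBar(bars: Bar[], timeframeMs: number, nowMs: number): Bar | undefined {
  for (let i = bars.length - 1; i >= 0; i--) {
    if (bars[i].openTime + timeframeMs <= nowMs) return bars[i];
  }
  return undefined;
}
```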

(One implementation detail rides along. The loop treated every exception as retryable with backoff, so a determined-wrong error — a delisted symbol, a constraint violation — burned the whole retry budget before giving up. It now splits retryable from non-retryable and stops immediately on the latter. Plumbing, not architecture.)
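The retryable/non-retryable split can be sketched as a retry helper that backs off only on errors explicitly marked transient. In this sketch anything unmarked is treated as non-retryable. That is a design choice, not something the post specifies. `RetryableError` and `withRetry` are hypothetical names.

```typescript
// Errors explicitly marked as transient (timeouts, rate limits, brief network failures).
export class RetryableError extends Error {
  readonly retryable = true;
}

const isRetryable = (err: unknown): boolean =>
  typeof err === "object" && err !== null && (err as { retryable?: unknown }).retryable === true;

// Retry only transient errors, with exponential backoff; anything else (a delisted symbol,
// a constraint violation) is rethrown immediately instead of burning the retry budget.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```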

Some of these I saw clearly only after an adversarial review of the shipped runner. But they aren’t scattered bugs. They’re four corollaries of one sentence: the trust boundary of an autonomous path is not the same boundary as one with a human in the loop.

What this still costs, and what we punted

Honesty section, same as last time.

  • The runner runs candidate code in the main event loop. The backtest path isolates strategy code in a resource-limited subprocess. The live session compiles and runs it inline. The AST audit is a static gate, not a runtime one — it won’t stop a pure-compute infinite loop from hanging the service. We lean on the two human gates to keep the code trusted. Subprocess/watchdog hardening is filed, not done.
  • A crash mid-fill can drift the in-memory position from the DB. The fill is committed to the DB first, then replayed into the session. If the process dies in between, a restart rebuilds the session from empty cash, not from the DB positions. “Resume a run from its real position” is the next robustness item, deliberately not faked in this release.
  • Single-instance only. Startup reconciliation marks every stranded running row as errored. Correct for one process, wrong the moment you run two. Multi-instance leasing is a Phase-F item, flagged in the code where it bites.

I’d rather ship the gates that are load-bearing now and name the ones that aren’t yet, than imply the autonomous path is hardened against things it isn’t.

So what

The cheap version of “let the agent trade for you” deletes the approval step and calls it autonomy.

The audit-grade version keeps the entire order path intact. It stamps the approver as the machine. It earns that stamp with two human gates the model can’t route around. Then it redesigns the trust boundary, so every guarantee the human used to backstop is one the system now holds on its own.

Autonomy isn’t the absence of the harness. It’s the harness running without you in the chair.

If this resonated:

  • 📬 Subscribe to Inalpha on Substack — one long-form post a month, ADRs and post-mortems, no algorithm between us and you
  • github.com/mirror29/inalpha — the live runner, the plan/exec path, and the four boundary changes above are all in services/paper
  • 👉 Next post: Sandboxed strategy evolution — three gates + multi-objective fitness. What happens when you actually let the LLM mutate trading code, and what catches it when it shouldn’t have. (Yes, the one I promised last time — it’s next.)