The US government just recalled an AI model – and a verbal jailbreak claim was enough

Three days after launch, the US government ordered Anthropic to pull its two highest-tier models off the market. Not suspend them for some users. Not restrict access by region. Pull them for everyone, everywhere — including Anthropic’s own employees. The reason? A verbal claim from another company that someone had jailbroken Fable 5.

“We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people.” — Anthropic

What actually happened

At 5:21 p.m. Eastern on Friday, Anthropic received an export control directive from the Commerce Department that cited national security authorities. The directive suspended access to Fable 5 and Mythos 5 for any foreign national, inside or outside the United States. Because Anthropic’s own workforce includes foreign nationals, the company concluded the only way to comply was to disable the models globally.

The trigger: a competing company claimed to have jailbroken Mythos. Axios reported that the administration first tried to get Anthropic to delay the launch, failed, and then sent the export control letter. Anthropic reviewed the alleged jailbreak demonstration and says it found a small number of previously known, minor vulnerabilities that other publicly available models expose without any bypass at all.

The alleged technique? Asking the model to read a codebase and fix the flaws it finds. Anthropic calls this a normal, widely available capability used by defenders every day.

The rest of the Claude lineup — Opus, Sonnet, Haiku — is unaffected.

Why this matters

This appears to be the first time a government has forced the recall of a commercial frontier AI model. It sets a precedent that should get every AI team’s attention:

  • A verbal claim was enough. Anthropic says the only evidence it’s received so far is verbal. No written technical disclosure, no formal security finding. A competitor’s allegation and a letter.
  • Export controls are a blunt instrument. The foreign-national framing of the directive meant a model used by hundreds of millions of people had to go dark globally — there’s no surgical option under that legal framework.
  • Moving fast has a new downside. Teams that piped Fable 5 into production this week (it launched Tuesday at $10 per million input tokens and $50 per million output tokens) are scrambling for a replacement. The lesson: don’t build critical dependencies on a model in its first week.
  • Anthropic is complying and pushing back simultaneously. They’re calling this a misunderstanding, promised more details within 24 hours, and explicitly warned that applying this standard across the industry “would essentially halt all new model deployments for all frontier model providers.”

What to do

  • If you’re on Fable 5 or Mythos 5: Switch to Claude Sonnet or Opus now — they’re unaffected and capable. Don’t wait on the “within 24 hours” timeline for production traffic.
  • If you’re building AI products: Treat export controls as a real operational risk, not a theoretical one. Build in model fallback paths from day one.
  • If you’re in AI policy or security: This is the opening salvo of a government asserting new authority over AI model availability. Watch how Anthropic’s pushback lands — the outcome will shape how far regulators think they can reach.
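The fallback-path advice can start very small. One approach is to try models in order of preference and move on when one has been withdrawn. The following is a minimal Python sketch. It assumes a hypothetical `call_model(model, prompt)` client and a hypothetical `ModelUnavailable` error. The model names are placeholders, not real API identifiers.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error a client raises when a model is withdrawn or unreachable."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, output) from the first that works."""
    errors: list[str] = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable as exc:
            # Record why this model failed, then fall through to the next one
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


# Fake client for illustration: the primary model has been pulled, the fallback works.
def fake_client(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise ModelUnavailable("access suspended")
    return f"[{model}] answer to: {prompt}"


used, text = complete_with_fallback(
    "Summarize this PR", ["primary-model", "fallback-model"], fake_client
)
print(used)  # fallback-model
```

In a real system, the list of models would live in configuration, so a recall becomes a config change instead of a redeploy.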

The export control regime was designed for chips and dual-use hardware, not software models running on commercial cloud. Anthropic’s argument — that applying this standard across the board would freeze all frontier AI deployment — is a real tension the government is going to have to work through.

The clock is ticking. Anthropic says it’s working to restore access. But the fact that it could be switched off at all, this fast, on a verbal claim — that’s the story.

Source: The New Stack — Matthew Burns | Axios | Anthropic statement

✏️ Drafted with KewBot (AI), edited and approved by Drew.

How Transformers Work — From Self-Attention to Modern LLM Architecture

Transformers changed AI because they stopped reading sequences one token at a time.

Instead of moving step by step like an RNN, a Transformer compares tokens directly.

That one design shift made modern LLMs possible.

Core Idea

A Transformer is a neural network architecture built around attention.

It looks at a sequence of tokens and learns how those tokens relate to each other.

This matters because language is contextual.

A word is not understood alone.

It is understood through its relationship with surrounding words.

That is why Self-Attention became the core mechanism.

The Key Structure

A simplified Transformer flow looks like this:

Tokens → Embeddings → Positional Information → Self-Attention → Feed-Forward Network → Output

More compactly:

Transformer = token representations + attention + position + stacked blocks

The model first converts text into token vectors.

Then it injects position information.

Then each Transformer block updates the token representations using attention and feed-forward layers.

Implementation View

At a high level, a Transformer processes text like this:

split text into tokens

convert tokens into embeddings

add positional information

for each Transformer block:
    compute Self-Attention

    mix token information

    apply feed-forward transformation

    keep stable flow with residual connections and normalization

produce contextual token representations

For decoder-based LLMs, generation continues like this:

predict next token

append generated token

reuse cached keys and values

repeat until stopping condition
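The generation loop can be made concrete with a toy stand-in for the decoder. The following sketch assumes a hypothetical bigram table in place of a real model. It uses greedy choice and treats `<eos>` as the stop token. The "cache" holds only the tokens already seen. A real decoder would instead store each token's key and value vectors there. Even so, the loop shape is the same: each step processes only the newest token.

```python
# Hypothetical toy "model": a bigram table standing in for a real decoder.
BIGRAMS = {
    "<bos>": {"the": 0.9, "a": 0.1},
    "the": {"animal": 0.6, "street": 0.4},
    "animal": {"was": 0.7, "<eos>": 0.3},
    "was": {"tired": 0.8, "<eos>": 0.2},
    "tired": {"<eos>": 1.0},
}


def toy_step(new_token, kv_cache):
    """Process only the newest token; earlier context lives in kv_cache.

    A real decoder would compute this token's key/value vectors, append them,
    and attend over the whole cache. Here the cached entry is just the token.
    """
    kv_cache.append(new_token)
    return BIGRAMS.get(new_token, {"<eos>": 1.0})  # next-token probabilities


def generate(max_new_tokens=10):
    kv_cache = []
    tokens = ["<bos>"]
    for _ in range(max_new_tokens):
        probs = toy_step(tokens[-1], kv_cache)   # predict next token
        next_token = max(probs, key=probs.get)   # greedy choice
        if next_token == "<eos>":                # stopping condition
            break
        tokens.append(next_token)                # append generated token
    return tokens[1:]


print(generate())  # ['the', 'animal', 'was', 'tired']
```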

This is why Transformers are practical for large-scale generation.

They can learn relationships across many tokens.

And with caching, they can generate efficiently.

Concrete Example

Take this sentence:

The animal did not cross the street because it was tired.

What does “it” refer to?

A simple left-to-right model may struggle if long context matters.

Self-Attention lets the token “it” compare itself with other tokens like “animal” and “street.”

The model can assign stronger attention to the token that best explains the meaning.

That is the intuition.

Attention lets tokens ask:

Which other tokens matter for understanding me?

RNN vs Transformer

This comparison explains why Transformers became so important.

RNN:

  • processes tokens step by step
  • carries information through hidden state
  • naturally captures order
  • is harder to parallelize
  • can struggle with long-range dependencies

Transformer:

  • processes tokens in parallel
  • compares tokens directly through attention
  • needs positional information for order
  • scales well on GPUs
  • handles long-range relationships more flexibly

So the Transformer was not just faster.

It changed how sequence relationships are represented.

RNNs remember through recurrence.

Transformers relate through attention.

Self-Attention

Self-Attention computes relationships between tokens in the same sequence.

Each token creates three vectors:

  • Query
  • Key
  • Value

The intuition is simple:

Query = what this token is looking for

Key = what each token offers for matching

Value = information to retrieve if the match is strong

The core formula is:

Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where d_k is the dimension of the key vectors

This means:

  1. compare queries and keys
  2. turn scores into weights
  3. use those weights to combine values

That is how each token becomes context-aware.
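The three steps map directly onto a few lines of plain Python. In a real model, Q, K, and V come from learned projections of the token embeddings. This sketch skips that step and uses hand-picked 2-dimensional vectors for "animal", "street", and "it". The numbers are made up so that the query for "it" resembles the key for "animal".

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention on lists of row vectors.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v. Returns (outputs, weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # 1. compare this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # 2. turn scores into weights that sum to 1
        w = softmax(scores)
        # 3. weighted sum of the value vectors
        out = [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


K = [[1.0, 0.0], [0.0, 1.0]]    # keys for "animal", "street" (made-up numbers)
V = [[10.0, 0.0], [0.0, 10.0]]  # values for "animal", "street"
Q = [[2.0, 0.0]]                # query for "it", chosen to resemble the "animal" key
out, w = attention(Q, K, V)
print(w[0])  # more weight on "animal" than on "street"
```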

Multi-Head Attention

One attention calculation is useful.

But one view is not enough.

Multi-Head Attention runs several attention heads in parallel.

Each head can focus on a different type of relationship.

One head may track syntax.

Another may track semantic similarity.

Another may track long-distance references.

Then the outputs are combined into one representation.

This makes attention richer than a single similarity calculation.
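The split-attend-concatenate structure can be shown with a deliberately simplified sketch. A real multi-head layer applies learned per-head projections (W_Q, W_K, W_V) and a final output projection W_O. This version omits them. It slices each token vector into `num_heads` equal parts, runs attention on each slice independently, and concatenates the results.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def single_head(Q, K, V):
    """Scaled dot-product attention over lists of row vectors."""
    d = len(K[0])
    out = []
    for q in Q:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K])
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return out


def multi_head(X, num_heads):
    """Give each head its own slice of every token vector, attend, then concatenate.

    Simplification: each head uses its slice of X as Q, K and V directly,
    with no learned projections.
    """
    d_model = len(X[0])
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    per_head = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        Xh = [row[cols] for row in X]  # this head's view of every token
        per_head.append(single_head(Xh, Xh, Xh))
    # Concatenate the head outputs back into one d_model vector per token
    return [[x for h in range(num_heads) for x in per_head[h][i]] for i in range(len(X))]


X = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0, 0.0]]
Y = multi_head(X, num_heads=2)
print(len(Y), len(Y[0]))  # 3 4
```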

Why Positional Encoding Is Needed

Self-Attention does not automatically know token order.

If you only give it a bag of token embeddings, the model needs another signal to know which token came first.

That is why positional information is added.

Common positional methods include:

  • Absolute Positional Embedding (APE)
  • Relative Positional Embedding (RPE)
  • Rotary Positional Embedding (RoPE)

APE gives each position its own vector.

RPE focuses on relative distance between tokens.

RoPE rotates query and key vectors based on position, making relative position work naturally inside attention.

This is why RoPE became common in modern LLMs.
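The "relative position works naturally" claim can be checked numerically. This sketch follows the common RoPE formulation: each consecutive pair of dimensions is rotated by an angle of position × base^(-2i/d), with base 10000. The test vectors are arbitrary. Shifting both positions by the same amount leaves the query-key dot product unchanged, because only the difference between the positions matters.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(0, d, 2):  # i = 2 * pair index
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.5, 0.8]   # arbitrary query vector
k = [1.0, 0.4, -0.7, 0.2]   # arbitrary key vector
s1 = dot(rope(q, 5), rope(k, 2))        # positions 5 and 2 (distance 3)
s2 = dot(rope(q, 105), rope(k, 102))    # positions 105 and 102 (distance 3)
print(abs(s1 - s2) < 1e-9)  # True
```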

Encoder, Decoder, and LLMs

The original Transformer used an Encoder-Decoder structure.

Encoder:

  • reads the input
  • builds contextual representations
  • works well for understanding tasks

Decoder:

  • generates output tokens
  • uses causal masking
  • works well for autoregressive generation

Encoder-Decoder:

  • connects input understanding with output generation
  • useful for translation-style tasks

Modern GPT-style LLMs are mostly decoder-based.

They generate text one token at a time.

The decoder predicts the next token, appends it, and repeats.
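Causal masking is what keeps this loop honest. During training and prefill, token i may attend only to positions up to i. The following sketch applies the mask to a matrix of already-scaled attention scores. The scores themselves are made up.

```python
import math


def causal_attention_weights(scores):
    """scores[i][j] = scaled similarity of query i to key j.

    Positions j > i are masked to -inf, so token i cannot see future tokens.
    """
    weights = []
    for i, row in enumerate(scores):
        masked = [s if j <= i else -math.inf for j, s in enumerate(row)]
        m = max(masked)                           # finite: j == i is never masked
        exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[0.5, 2.0, 1.0],
          [0.1, 0.3, 4.0],
          [1.0, 1.0, 1.0]]
for row in causal_attention_weights(scores):
    print([round(w, 3) for w in row])  # the first token can only attend to itself
```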

Decoding Strategies

Once the model produces logits, it needs to choose the next token.

Different decoding strategies create different behavior.

Greedy decoding:

  • chooses the most likely token
  • simple and deterministic
  • can be repetitive

Beam search:

  • keeps multiple candidate sequences
  • useful for structured generation
  • can still feel less diverse

Top-k sampling:

  • samples from the top k likely tokens
  • adds diversity

Top-p sampling:

  • samples from the smallest set of top tokens whose cumulative probability reaches a threshold p
  • adapts the candidate set dynamically

So generation quality is not only about the model.

It also depends on decoding.
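The following sketch shows greedy, top-k, and top-p selection over one made-up next-token distribution. In a real system the probabilities come from a softmax over the model's logits. Beam search is not shown.

```python
import random


def greedy(probs):
    return max(probs, key=probs.get)


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {t: q / total for t, q in kept.items()}


def sample(probs, rng):
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


probs = {"tired": 0.5, "hungry": 0.2, "wet": 0.15, "late": 0.1, "blue": 0.05}
print(greedy(probs))                      # tired
print(sorted(top_k_filter(probs, 2)))     # ['hungry', 'tired']
print(sorted(top_p_filter(probs, 0.8)))   # ['hungry', 'tired', 'wet']
rng = random.Random(0)
print(sample(top_p_filter(probs, 0.8), rng))  # one of the three kept tokens
```

The candidate set for top-p adapts to the distribution: a peaked distribution keeps few tokens, while a flat one keeps many.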

The Efficiency Problem

Full Attention is powerful but expensive.

If the sequence length is n, attention has roughly O(n^2) cost.

That means longer context becomes expensive quickly.

This is why efficient attention matters.

Local Attention reduces the view to nearby tokens.

Sparse Attention computes only selected attention links.

FlashAttention keeps the formula but improves GPU memory access.

The key idea:

Do less unnecessary work, or move data more efficiently.

Both make longer context more practical.
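The quadratic cost is easier to feel with numbers. The following sketch counts the query-key pairs that get scored. It uses sliding-window attention as one example of local attention; the window of 128 tokens on each side is an arbitrary choice. FlashAttention is not modeled here, because it computes the same pairs and only moves the data more efficiently.

```python
def full_attention_pairs(n):
    return n * n  # every token scores every token


def local_attention_pairs(n, window):
    """Each token scores only tokens within `window` positions on either side."""
    return sum(min(n - 1, i + window) - max(0, i - window) + 1 for i in range(n))


for n in (1_000, 2_000, 4_000):
    print(n, full_attention_pairs(n), local_attention_pairs(n, window=128))
```

Doubling n quadruples the full count. The local count grows roughly linearly, because each token scores at most 2 × window + 1 keys.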

KV Cache

Autoregressive generation has another problem.

When generating one token at a time, the model repeatedly needs past key and value tensors.

KV Cache stores those tensors.

So the model does not recompute them from scratch at every step.

The flow looks like this:

Generated tokens → cached keys and values → new query attends to cache → next token

This makes inference faster.

But it creates a memory problem.

Longer context means a larger KV Cache.

That is why modern LLMs use techniques like:

  • Multi-Query Attention
  • Grouped-Query Attention
  • Multi-Head Latent Attention

These methods reduce the memory cost of storing key-value information.
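The memory problem is easiest to see through back-of-the-envelope arithmetic. This sketch assumes the usual layout: one key tensor and one value tensor per layer, per KV head, per position, stored in fp16 (2 bytes per value). The model shape is hypothetical. Multi-Head Latent Attention caches a compressed latent instead, so it does not fit this formula and is not shown.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_value=2):
    """Cached keys + values: 2 tensors per layer, each kv_heads x head_dim per token."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


GIB = 1024 ** 3
# Hypothetical model: 32 layers, 32 query heads of size 128, fp16, 8k-token context.
for name, kv_heads in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    size = kv_cache_bytes(layers=32, kv_heads=kv_heads, head_dim=128, seq_len=8192)
    print(f"{name}: {size / GIB:.3f} GiB")
# MHA: 4.000 GiB, GQA: 1.000 GiB, MQA: 0.125 GiB
```

Grouped-Query Attention keeps all 32 query heads but makes groups of them share 8 key-value heads, which cuts the cache by 4× in this example.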

Modern Transformer Blocks

Modern LLMs still use the Transformer idea.

But the block has evolved.

A typical modern block looks like this:

Input
→ RMSNorm or Pre-Layer Normalization
→ Self-Attention with GQA and RoPE
→ Residual Connection
→ RMSNorm or Pre-Layer Normalization
→ Feed-Forward Network with SwiGLU or Mixture of Experts
→ Residual Connection

Important upgrades include:

  • RMSNorm for simpler normalization
  • RoPE for positional representation
  • GQA for efficient inference
  • SwiGLU for stronger feed-forward layers
  • MoE for sparse expert-based scaling

So today’s Transformer is not exactly the 2017 Transformer copied directly.

It is an evolved architecture family.
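Two of these upgrades are small enough to write out in full. The following sketch covers RMSNorm and a SwiGLU feed-forward layer, wired as a pre-norm residual sub-layer. The weights are tiny made-up matrices. The attention sub-layer and any Mixture of Experts routing are left out.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Divide x by its root-mean-square (no mean subtraction, unlike LayerNorm)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]


def silu(z):
    return z / (1.0 + math.exp(-z))


def matvec(W, x):
    """W is a list of rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU: W_down(silu(W_gate x) * (W_up x)), with an elementwise product."""
    gate = [silu(z) for z in matvec(W_gate, x)]
    up = matvec(W_up, x)
    return matvec(W_down, [g * u for g, u in zip(gate, up)])


def pre_norm_ffn_sublayer(x, gain, W_gate, W_up, W_down):
    """Pre-norm residual connection: x + FFN(RMSNorm(x))."""
    h = swiglu_ffn(rms_norm(x, gain), W_gate, W_up, W_down)
    return [a + b for a, b in zip(x, h)]


x = [1.0, -2.0, 3.0, 0.5]
gain = [1.0] * 4
W_gate = [[0.1, 0.2, -0.1, 0.0], [0.0, -0.3, 0.2, 0.1], [0.2, 0.1, 0.0, -0.2]]  # 3 x 4
W_up = [[0.3, 0.0, 0.1, -0.1], [0.1, 0.1, 0.1, 0.1], [-0.2, 0.0, 0.3, 0.0]]     # 3 x 4
W_down = [[0.5, -0.2, 0.1], [0.0, 0.3, 0.2], [0.1, 0.1, -0.4], [0.2, 0.0, 0.1]]  # 4 x 3
y = pre_norm_ffn_sublayer(x, gain, W_gate, W_up, W_down)
print(len(y))  # 4
```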

Transformer vs Modern LLM Architecture

Original Transformer:

  • encoder-decoder structure
  • standard multi-head attention
  • sinusoidal positional encoding
  • layer normalization
  • dense feed-forward layers

Modern LLM architecture:

  • often decoder-only
  • causal self-attention
  • RoPE
  • RMSNorm
  • GQA or related KV-sharing methods
  • SwiGLU
  • sometimes Mixture of Experts
  • KV Cache for inference

The core idea stayed the same.

The engineering changed dramatically.

Recommended Learning Order

If Transformer architecture feels too large, learn it in this order:

  1. Attention Mechanism
  2. Self-Attention
  3. QKV Computation
  4. Multi-Head Attention
  5. Positional Encoding
  6. Encoder-Decoder Architecture
  7. Transformer Decoder
  8. KV Cache
  9. Efficient Attention
  10. Modern Transformer Block

This order works because you first understand the relationship mechanism.

Then you understand generation.

Then you understand why modern LLMs needed efficiency upgrades.

Takeaway

The Transformer is the architecture language of modern LLMs.

The shortest version is:

Transformer = attention + position + stacked blocks + efficient generation

Self-Attention computes token relationships.

Positional encoding injects order.

The decoder generates tokens.

KV Cache makes autoregressive inference practical.

Modern upgrades like RoPE, RMSNorm, GQA, SwiGLU, and MoE make the architecture scalable.

If you remember one idea, remember this:

Transformers work by turning a sequence into a set of contextual relationships, then refining those relationships through stacked attention-based blocks.

Discussion

When learning Transformers, do you find it easier to start from the attention formula, the decoder generation loop, or the modern LLM block structure?

Originally published at zeromathai.com.
Original article: https://zeromathai.com/en/transformer-architecture-overview-en/

GitHub Resources
AI diagrams, study notes, and visual guides:
https://github.com/zeromathai/zeromathai-ai

From AI Prototype to Production: 7 Problems That Break AI Agents

Building an AI agent prototype is relatively easy. With an LLM, a retrieval pipeline, and several API connections, developers can create an impressive demonstration within days.

The real challenge begins when the system reaches production.

Real users submit unclear requests, external tools fail, business data changes, and model costs increase unexpectedly. An agent that performs well in a controlled test may become unreliable when thousands of people start using it.

A Real-World Example: Vanta’s Support Agent

Vanta provides a useful example of how an AI agent should be tested before full deployment.

According to an Intercom customer story, Vanta evaluated Fin AI Agent against its existing AI system using 400 real customer conversations. Fin resolved approximately 73% of the cases, compared with around 49% for the existing system.

After deployment, the agent achieved a 71% resolution rate for the chat conversations it handled. This represented nearly 2,500 conversations per month that did not require a human support agent.

The results are impressive, but the evaluation process is equally important. Vanta did not rely on a polished demo. It tested the agent with real questions and measured resolution rate, accuracy, and answer quality before expanding its use.

Here are seven problems developers should address when moving an AI agent into production.

1. Hallucinated Answers

LLMs can generate confident responses without reliable evidence. RAG can reduce this risk by connecting the agent to trusted information, but retrieved content must still be relevant and current.

2. Poor Retrieval Quality

A retrieval system may return incomplete, outdated, or unrelated documents. Evaluate retrieval separately using metrics such as precision, recall, relevance, and answer faithfulness.
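Precision and recall are the easiest of these metrics to automate once you have a small labelled set of queries. The following sketch computes them for a single query. The document IDs and relevance labels are hypothetical. Relevance grading and answer faithfulness usually need a human or model judge, so they are not covered here.

```python
def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Precision@k and recall@k for one query, given ground-truth relevant IDs."""
    top_k = retrieved_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divides by the number actually retrieved, in case fewer than k came back
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Hypothetical labelled example for the query "How do I rotate an API key?"
retrieved = ["kb-12", "kb-07", "kb-33", "kb-02"]
relevant = {"kb-07", "kb-02", "kb-41"}
print(precision_recall_at_k(retrieved, relevant, k=3))
```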

3. Failed Tool Calls

Agents often depend on APIs, databases, search services, or MCP servers. These tools may time out or return invalid data.

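def call_tool_safely(tool, arguments):
    """Call a tool and return a structured error instead of raising."""
    try:
        result = tool(**arguments)
    except TimeoutError:
        return {"error": "Tool timed out"}
    except Exception as exc:  # any other tool failure also becomes a structured error
        return {"error": f"Tool failed: {exc}"}
    # Only a missing response counts as empty; valid falsy results such as 0 or [] pass through
    if result is None:
        return {"error": "Empty response"}
    return result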

Production workflows need retries, timeout limits, validation, and fallback responses.
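Retries and validation can wrap the same kind of call. The following sketch assumes that the tool enforces its own timeout (for example, through an HTTP client timeout) and raises `TimeoutError` when it expires. The backoff schedule and the `validate` callback are illustrative choices, not a prescribed design.

```python
import time


def call_with_retries(tool, arguments, attempts=3, base_delay=0.5,
                      validate=None, sleep=time.sleep):
    """Retry on timeout with exponential backoff, validate the result, else fall back."""
    for attempt in range(attempts):
        try:
            result = tool(**arguments)
        except TimeoutError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
            continue
        if validate is not None and not validate(result):
            return {"error": "Tool returned invalid data"}
        return result
    return {"error": f"Tool timed out after {attempts} attempts"}


# Usage: a hypothetical flaky tool that times out twice, then succeeds.
calls = {"n": 0}


def flaky_lookup(order_id):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return {"id": order_id, "status": "shipped"}


delays = []
print(call_with_retries(flaky_lookup, {"order_id": 7}, sleep=delays.append))
```

Injecting `sleep` keeps the function testable without real waiting.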

4. Uncontrolled Agent Loops

An agent may repeatedly plan and call tools without completing the task. Set limits for tool calls, reasoning steps, execution time, and cost per request.
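Those limits can be enforced in the loop that drives the agent. The following is a minimal sketch. It assumes a hypothetical `plan_next_step(history)` callback that returns either a tool step or a final answer, each with an estimated cost. The default budget numbers are placeholders.

```python
import time
from dataclasses import dataclass


@dataclass
class Budget:
    max_steps: int = 8
    max_tool_calls: int = 5
    max_cost_usd: float = 0.50
    max_seconds: float = 30.0


def run_agent(plan_next_step, budget=None):
    """Run a plan/act loop until the task finishes or any budget is exhausted."""
    budget = budget or Budget()
    history, tool_calls, spent = [], 0, 0.0
    start = time.monotonic()
    for _ in range(budget.max_steps):
        if time.monotonic() - start > budget.max_seconds:
            return {"status": "stopped", "reason": "time limit", "history": history}
        step = plan_next_step(history)
        spent += step.get("cost", 0.0)
        if spent > budget.max_cost_usd:
            return {"status": "stopped", "reason": "cost limit", "history": history}
        if step["type"] == "final":
            return {"status": "done", "answer": step["answer"], "history": history}
        tool_calls += 1
        if tool_calls > budget.max_tool_calls:
            return {"status": "stopped", "reason": "tool-call limit", "history": history}
        history.append(step)  # a real agent would execute the tool call here
    return {"status": "stopped", "reason": "step limit", "history": history}


def looping_planner(history):
    return {"type": "tool", "cost": 0.01}  # never decides it is finished


result = run_agent(looping_planner)
print(result["status"], result["reason"])  # stopped tool-call limit
```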

5. Excessive Permissions

Agents should not have unrestricted access to business systems. Use role-based permissions and require human approval for sensitive actions such as issuing refunds or deleting data.
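A permission check can sit between the agent's decision and the actual action. The following sketch uses hypothetical role and action names. Each role gets an explicit allowlist. Sensitive actions need human approval on top of that, even when the role allows them.

```python
SENSITIVE_ACTIONS = {"issue_refund", "delete_record"}  # hypothetical action names

ROLE_PERMISSIONS = {  # hypothetical roles and allowlists
    "support_agent": {"lookup_order", "issue_refund"},
    "readonly_agent": {"lookup_order"},
}


def authorize(role, action, approved_by_human=False):
    """Return (allowed, reason). Sensitive actions also need explicit human approval."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False, f"role {role!r} may not perform {action!r}"
    if action in SENSITIVE_ACTIONS and not approved_by_human:
        return False, f"{action!r} requires human approval"
    return True, "ok"


print(authorize("support_agent", "issue_refund"))  # blocked until a human approves
```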

6. High Latency and Cost

Multiple model calls and retrieval steps can make an agent slow and expensive. Use caching, shorter prompts, parallel execution, and smaller models for simple tasks.

7. Missing Observability

Without tracing, developers cannot determine whether an error came from retrieval, the model, or an external tool.

A useful trace should capture prompts, retrieved documents, tool calls, errors, latency, token usage, cost, and final responses.
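Such a trace does not need heavy tooling to get started. The following sketch records one span per step, with timing, output, and any error. The class and field names are assumptions. Token usage and cost are fields the caller would fill in from the model provider's response. A production setup would more likely use an established tracing library.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Span:
    kind: str                    # "retrieval", "model", or "tool"
    name: str
    inputs: dict
    output: Any = None
    error: Optional[str] = None
    latency_ms: float = 0.0
    tokens: int = 0              # set by the caller for model calls, if known
    cost_usd: float = 0.0        # set by the caller, if known


@dataclass
class Trace:
    request_id: str
    spans: list = field(default_factory=list)

    def record(self, kind: str, name: str, fn: Callable[..., Any], **inputs: Any) -> Any:
        """Run fn(**inputs), timing it and capturing its output or error as a span."""
        span = Span(kind=kind, name=name, inputs=inputs)
        start = time.perf_counter()
        try:
            span.output = fn(**inputs)
            return span.output
        except Exception as exc:
            span.error = f"{type(exc).__name__}: {exc}"
            raise
        finally:
            span.latency_ms = (time.perf_counter() - start) * 1000
            self.spans.append(span)

    def to_json(self) -> str:
        return json.dumps(asdict(self), default=str)


trace = Trace(request_id="req-123")
docs = trace.record("retrieval", "search_kb", lambda query: ["kb-07"], query="rotate api key")
try:
    trace.record("tool", "lookup_order", lambda order_id: 1 / 0, order_id=42)
except ZeroDivisionError:
    pass  # the failure is still captured in the trace
print(trace.to_json())
```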

Production Readiness Is a System Problem

A reliable AI agent is more than an LLM connected to several tools. It requires testing, security, observability, fallback logic, and continuous evaluation.

Organizations building complex AI products may also work with an experienced technology partner. Varmeta develops AI and data solutions that help businesses transform early concepts into scalable production systems.

The best AI agents are not those that perform perfectly in a demo. They are those that remain useful when tools fail, data changes, and real users behave unpredictably.

Source: Intercom, “How Vanta unified its customer experience with Fin.”

Stop guessing colors: a faster way to add palettes to your CSS

Every time I start a new project, the same thing happens. I get the layout working, then I completely freeze on colors. I grab a color, drop it in, squint at it, change it, and half an hour later I’ve got something that’s… fine. Maybe.

If that sounds like you, here’s what finally fixed it for me.

Stop inventing colors from scratch

For years my mistake was trying to come up with a color scheme on my own — picking one color, then guessing what goes with it. It almost never looked right, and it ate up so much time.

The fix was simple: start from a palette that already works, then tweak it. When you begin with colors that are proven to look good together, everything after that is easy.

Where I get my palettes now

These days I grab them from PaletteCSS: https://palettecss.com

It’s a free library of thousands of hand-picked color palettes (and CSS gradients) for websites. You can browse by color, mood, theme, or industry, find one you like, and copy the CSS or hex codes in one click. No signup, no clutter. I find a palette, paste it into my project, and the part that used to eat my whole afternoon is done in a minute.

It also has a gradients section, which is great when you want a background that already looks balanced instead of fiddling with one yourself.

The takeaway

If colors are the scary part of your projects, stop guessing:

  • Start from a ready-made palette instead of inventing one.
  • Reuse the same colors consistently across your site.
  • Lean on tools so you spend your time building, not second-guessing hex codes.

That one change made design the easy part instead of the stressful part.

Full disclosure: I built PaletteCSS to solve this exact problem for myself. I’d genuinely love feedback from this community — what would make it more useful for your projects? Drop a comment below. 🙏

Opinion: The Anthropic Dispute Is Not Really About Anthropic. It’s About Trust.

When the US government effectively forced Anthropic to suspend access to two of its newest AI models, Fable 5 and Mythos 5, over security concerns, much of the debate immediately split into familiar camps. One side saw government overreach. The other saw a necessary intervention against potentially dangerous technology. Both sides may be missing the more important lesson.

What we know is that the U.S. government has not publicly identified a specific statute or section of law that it relied on. It could potentially cite the Export Control Reform Act (ECRA) as the legal framework, but it hasn’t publicly done so yet. So is this the “misunderstanding” Anthropic described in its public statement? What has been disclosed is that the action was taken under export-control authorities and framed as a national security measure.

Are any of us ready for the future?

Behind the legalities and compliance discussions, the real story is not whether Anthropic was right or wrong. It is whether the AI industry is prepared for a future in which provenance, vigilance, and safeguards become as important as model performance.

For years, the AI race has been measured in benchmarks, context windows, reasoning scores, and coding capabilities. The conversation has focused on what models can do. Increasingly, the question regulators, enterprises, and security teams are asking is something else entirely: how do we know what a model did, why it did it, and whether it should have been allowed to do it in the first place?

According to reports, US officials became concerned that Anthropic’s latest models could be jailbroken or used to identify and exploit software vulnerabilities. Anthropic disputed the severity and uniqueness of those risks, noting that many advanced models possess similar capabilities. Even so, the company ultimately disabled access to the affected systems while the dispute unfolded. Whether those specific concerns prove justified is almost secondary.

The more important reality is that we are entering an era where AI systems are increasingly trusted with consequential tasks. They write production code. They analyze legal documents. They assist with medical research. They are being evaluated for use in government, intelligence, defense, and critical infrastructure environments.

Trust cannot be built on capability alone

Consider the software supply chain. Modern organizations have spent two decades learning that visibility matters as much as functionality. That is why software bills of materials became important. That is why code signing became standard. That is why organizations increasingly require provenance information for open source dependencies. Nobody asks only whether a package works. They also ask where it came from, who modified it, and whether it can be trusted. AI is heading in exactly the same direction.

A model that can generate brilliant code but can’t explain its reasoning trail presents a governance challenge. A model that can autonomously call tools, access repositories, and interact with production systems without robust oversight introduces new attack surfaces. Recent academic research has highlighted how agentic AI systems can be manipulated through memory poisoning and other architectural weaknesses, creating outcomes that appear legitimate while hiding compromised behavior.

This is why provenance matters.

Organizations need to know which model generated a piece of code, which prompts were used, what external tools were called, what information sources were accessed, and what guardrails were active at the time. The future will not be won by the company with the smartest model. It will be won by the company that can prove how that model behaved. The software industry has already seen what happens when provenance is ignored.
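As a concrete illustration, the following sketch shows a provenance record attached to one AI-generated artifact. The field names are hypothetical. The record hashes the prompt rather than storing it, on the assumption that prompts may contain sensitive data. A real system would also need to sign these records and store them somewhere tamper-evident, and that part is omitted.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry for one AI-generated artifact."""
    artifact_sha256: str
    model: str
    prompt_sha256: str
    tools_called: tuple = ()
    sources_accessed: tuple = ()
    guardrails_active: tuple = ()
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def make_record(code, model, prompt, tools=(), sources=(), guardrails=()):
    return ProvenanceRecord(
        artifact_sha256=sha256(code), model=model, prompt_sha256=sha256(prompt),
        tools_called=tuple(tools), sources_accessed=tuple(sources),
        guardrails_active=tuple(guardrails),
    )


def verify(code, record):
    """True only if the artifact is byte-for-byte what the record describes."""
    return sha256(code) == record.artifact_sha256


code = "def add(a, b):\n    return a + b\n"
rec = make_record(code, "example-model-v1", "Write an add function",
                  tools=["repo_search"], guardrails=["secret-scan"])
print(verify(code, rec), verify(code + "# tampered\n", rec))  # True False
print(json.dumps(asdict(rec), indent=2))
```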

The SolarWinds attack demonstrated how trusted software updates could become a vehicle for compromise. The Log4Shell vulnerability showed how a widely used component could become a systemic risk across thousands of organizations. More recently, the rise of malicious open-source packages has highlighted how attackers exploit trust assumptions rather than technical weaknesses alone. AI introduces similar dynamics at a much larger scale.

If developers increasingly rely on AI-generated code, then questions about provenance become urgent. Research published this year found that all seven evaluated LLMs generated vulnerable code, with many vulnerabilities classified as having high or critical severity.

The lesson is not that AI-generated code is inherently unsafe. The lesson is that AI-generated code should be treated like any other artifact entering a software supply chain. It requires inspection. It requires policy enforcement. It requires traceability. Most importantly, it requires verification. That brings us to vigilance.

The old “trust but verify” adage needs to be updated

The cybersecurity community has long operated on a simple principle: trust, but verify. In the AI era, that principle needs updating. The new rule is verify continuously.

Models evolve. Training data changes. Safety mechanisms are updated. New jailbreak techniques emerge weekly. An AI system that behaves safely today may behave differently tomorrow.

This is not hypothetical. The current Anthropic controversy reportedly centers in part on concerns that safeguards could be bypassed through jailbreaking techniques. Whether or not those claims ultimately hold up, the fact that such concerns exist at all highlights the fragility of static trust assumptions. Safeguards therefore cannot be treated as marketing features. They must be operational controls.

Just as organizations continuously scan software for vulnerabilities, they will need continuous monitoring of AI systems. Just as security teams audit access privileges, they will need visibility into model permissions and tool usage. Just as software teams establish quality gates for code, they will need governance gates for AI-generated outputs.

This is where the industry should focus its attention. Not on whether one company won or lost a policy argument. Not on whether one model is more capable than another. Not on whether a particular government action was justified.

The real challenge is building an ecosystem where advanced AI can be deployed responsibly at scale.

How do we prepare for what’s next?

Stronger provenance mechanisms. Better auditability. Continuous monitoring. Transparent safeguards. Independent evaluation. Policy enforcement that exists beyond a model provider’s assurances. These are the only ways we can set teams up for better security and success in the AI era (if that is what it remains).

The Anthropic dispute will eventually be resolved. The export controls may be modified. The models may return. New models will certainly arrive, but the broader question will stay the same.

“In a world where AI increasingly acts on our behalf, trust can no longer be assumed. How do we earn, measure, verify, and continuously maintain it?” That is not a limitation on innovation. It is the foundation that will make innovation sustainable.


Kerry Beetge is from the JetBrains Qodana team – Qodana is a tool that helps teams put safeguards in place to control the quality and security of their code before it reaches production.


Please note: the reports about Anthropic being forced to suspend access to its latest models are very recent and some details remain disputed. Anthropic has challenged aspects of the government’s characterization, while officials have argued the models posed national security risks related to jailbreaks and vulnerability discovery capabilities.

When the guardrail becomes the target: reasoning-extension DoS against LLM safety layers

New research from HKUST (arXiv:2606.14517, June 12) turns the agent safety layer into the attack surface.

What happened

Reasoning-based guardrails — the LLM safety layers that screen an agent’s actions — can be trapped in their own analysis. Crafted inputs mimic the guardrail’s internal schema (risk enumerations, assessment matrices), and the model, in the authors’ words, “mechanically fills a template it has constructed for itself, trapped by its own instruction-following fidelity.”

The measured effect: 13–63× token amplification in isolation, and a 148× increase in end-to-end latency in a LangGraph multi-agent deployment, where a single guardrail call stretched to 730 seconds. Because the payload is fluent natural language, an injection classifier scored it below 0.001 probability and passed it through.

Why it matters

The attacker needs no model weights, no system prompt, no infrastructure access — only the ability to place text where the agent will read it: a web page, a repo comment, a tool result.

And every candidate fix the authors tested fails. A token-budget cutoff only relocates the failure: fail-open lets the attack bypass safety entirely; fail-closed converts it into agent-level DoS that starves co-located agents on shared guardrail infrastructure. A more capable guardrail performs worse — stronger reasoning produces longer loops.

This is a structural property of the reasoning-guardrail paradigm, not a defect to patch.

What catches it today

Part of it — and it’s the part most test harnesses get wrong. A guardrail that stalls or crashes under load must never be scored as a successful defense.

In our open-source agent-security harness, the verdict-correctness suite encodes exactly this: the rejection primitive treats transport failure and 5xx responses as not a rejection — the code comment reads “a 5xx may itself be the attack succeeding.” The tests assert that a dead or faulting defender cannot earn a passing verdict.
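The rule translates into a small classification function. This is not the harness's actual code. It is a hypothetical sketch of the same principle, and it assumes a guardrail API that returns `{"decision": "block" | "allow"}` with HTTP 200. Transport failures, 5xx responses, and anything unrecognized all map to "defender failed," and they never count as a rejection.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    REJECTED = "rejected"          # defender explicitly blocked the input
    ALLOWED = "allowed"            # defender explicitly let it through
    DEFENDER_FAILED = "failed"     # no real verdict: crash, timeout, 5xx, garbage


def classify(status_code: Optional[int], body: Optional[dict]) -> Verdict:
    """Map a guardrail response to a verdict; status_code is None on transport failure."""
    if status_code is None or status_code >= 500:
        # A 5xx may itself be the attack succeeding; never score it as a block.
        return Verdict.DEFENDER_FAILED
    if status_code == 200 and body is not None:
        decision = body.get("decision")
        if decision == "block":
            return Verdict.REJECTED
        if decision == "allow":
            return Verdict.ALLOWED
    return Verdict.DEFENDER_FAILED  # anything unrecognized is not a defense


def defense_passed(verdict: Verdict) -> bool:
    return verdict is Verdict.REJECTED


print(defense_passed(classify(503, None)))  # False
```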

The paper closes by calling for “cost-bounded safety architectures.” That is precisely what a governance layer enforces: a THROTTLE→FREEZE state machine halts discretionary spend the moment a gate fails, and a hard constraint surfaces any guardrail that has gone dark.
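The post does not spell out the governor's rules, so the following sketch is only one plausible shape for a THROTTLE→FREEZE state machine. The thresholds and transitions are assumptions. Crossing a soft budget limit throttles the governor, which halts discretionary spend. A failed gate, or an exhausted budget, freezes everything and records why. Modes only escalate, and recovery is left to an operator.

```python
from enum import IntEnum


class Mode(IntEnum):
    NORMAL = 0
    THROTTLE = 1   # discretionary spend halted; essential work continues
    FREEZE = 2     # all spend halted until an operator intervenes


class SpendGovernor:
    def __init__(self, budget_usd, throttle_fraction=0.8):
        self.budget = budget_usd
        self.throttle_at = throttle_fraction * budget_usd
        self.spent = 0.0
        self.mode = Mode.NORMAL
        self.reasons = []  # surfaced to operators, e.g. which guardrail went dark

    def _escalate(self, mode, reason):
        # Modes only move forward (NORMAL, THROTTLE, FREEZE); no automatic recovery.
        if mode > self.mode:
            self.mode = mode
            self.reasons.append(reason)

    def record_spend(self, amount):
        self.spent += amount
        if self.spent >= self.budget:
            self._escalate(Mode.FREEZE, "budget exhausted")
        elif self.spent >= self.throttle_at:
            self._escalate(Mode.THROTTLE, "soft budget limit reached")

    def gate_failed(self, gate_name):
        # A guardrail that stalls or goes dark freezes spend immediately.
        self._escalate(Mode.FREEZE, f"gate {gate_name!r} failed")

    def may_spend(self, discretionary):
        if self.mode is Mode.FREEZE:
            return False
        if self.mode is Mode.THROTTLE:
            return not discretionary
        return True


gov = SpendGovernor(budget_usd=10.0)
gov.record_spend(8.5)
print(gov.mode.name, gov.may_spend(discretionary=True))  # THROTTLE False
gov.gate_failed("reasoning-guardrail")
print(gov.mode.name, gov.reasons)
```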

What’s missing

The honest gap: protocol-layer DoS (batch bombs, oversized payloads, rate floods) and verdict-correctness are covered. Reasoning-extension DoS — a schema-mimicking payload that inflates an LLM guardrail’s own token and latency budget — is not. That’s a net-new test class, and it’s going on the roadmap.

A guardrail that can reason can be made to reason forever.

One question for operators

When your LLM guardrail hits a compute ceiling mid-evaluation, does it fail open or fail closed — and how do you distinguish a real “blocked” verdict from a guardrail that simply ran out of budget?

How to give Claude (or Cursor) access to your Rails app’s activity logs

Ask Claude this, today, with no setup:

“What did user 4421 do in our app yesterday?”

You will get an answer. It will be confident, specific, and completely made up. Claude has no access to your production database. So it does what a language model does when it has no facts: it pattern-matches what an answer should look like and hands you fiction in a calm voice.

That is the whole problem. Not that the model is dumb. That it is answering a question about your data without your data.

The Model Context Protocol fixes this, if you point it at the right source. This post is about what MCP is for a Rails dev who hasn’t wired one up yet, why you’d want one over your app’s activity log, and how I built the EZLogs MCP server so the answer is the boring true one instead of the confident fake one.

What MCP actually is

MCP is a small open protocol from Anthropic. It lets an AI client (Claude Desktop, Cursor, your internal copilot) call out to a server you control and ask for data, mid-conversation, before it answers you.

That’s it. It is not a framework. It is not a model. It is a way for the model to say “hold on, let me go look” instead of guessing.

A server exposes three things:

  • Tools the AI can call, like “find actions by this user between these dates”
  • Resources it can read, like “the 50 most recent significant actions”
  • Prompts it can use, like “draft an incident report for this action”

The AI decides when to call them. You decide what they return. The protocol is just JSON-RPC 2.0 messages; over HTTP, everything goes to a single endpoint.
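Here is roughly what a tool call looks like on the wire. The following sketch builds a JSON-RPC 2.0 request that uses MCP's `tools/call` method. The `find_actions` tool name comes from this post. The argument names and values are illustrative assumptions, not the EZLogs schema. The sketch also leaves out the session's initial handshake and the actual HTTP POST to the server's endpoint.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC ids just need to be unique per session


def tools_call_request(tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names and values are illustrative, not a real server's schema.
req = tools_call_request("find_actions", {"actor": "user:4421", "date": "2025-01-14"})
print(json.dumps(req, indent=2))
```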

Why your activity log is the right thing to expose

You could point an MCP server at your raw database. People do. It goes badly for the same reason raw logs go badly for humans: the model gets gid://app/Order/4421, a pile of foreign keys, and a Sidekiq job class, and now it has to do the translation. So it guesses at the join, guesses at what the columns mean, and you are back to a confident fake answer, just one layer deeper.

An activity log is already the translated layer. One row per user action, correlated across HTTP, background jobs, and database changes, with a human-readable name. “User 4421 tried to ship order #4421, address validation failed, the retry job ran three times and gave up.” When the model reads that, it has nothing left to invent. The facts are already facts.

So the activity log is the better MCP source precisely because the hard part, correlation and naming, was done before the model showed up.

The part that matters: deterministic, not generated

Here is the line I care about most, and the one I’d push back on hardest if you were evaluating any tool in this space.

The translation in EZLogs has no model in it.

The cards your team reads, and the data the MCP server returns, are produced by correlation plus templates. Same events in, same sentence out, every time. The action stream is folded into state with pure functions that never call an LLM and never write to the database, and every value carries the IDs of the specific events it came from. You can click any number back to its evidence.
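Folding events into state with pure functions, where every value carries its evidence, fits in a few lines. This is not EZLogs' code. The event fields (`id`, `kind`, `order`) and the sentence template are invented for illustration. The sketch only demonstrates the property the paragraph describes: no model, no I/O, and the same events always produce the same sentence plus the IDs that back it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    id: str      # hypothetical event ID, used as the citation
    kind: str    # e.g. "ship_attempt", "validation_failed", "retry"
    order: str

def fold(events):
    """Pure fold: events in, state out. No LLM calls, no database writes."""
    state = {"retries": 0, "failed": False, "evidence": []}
    for e in events:
        if e.kind == "validation_failed":
            state = {**state, "failed": True, "evidence": state["evidence"] + [e.id]}
        elif e.kind == "retry":
            state = {**state, "retries": state["retries"] + 1,
                     "evidence": state["evidence"] + [e.id]}
    return state

def render(order, state):
    """Fixed template: the same state always yields the same sentence."""
    outcome = "address validation failed" if state["failed"] else "no failure recorded"
    return (f"Order #{order}: {outcome}, retried {state['retries']} times "
            f"[events: {', '.join(state['evidence'])}]")

events = [Event("e1", "ship_attempt", "4421"),
          Event("e2", "validation_failed", "4421"),
          Event("e3", "retry", "4421"),
          Event("e4", "retry", "4421"),
          Event("e5", "retry", "4421")]
print(render("4421", fold(events)))
# prints: Order #4421: address validation failed, retried 3 times [events: e2, e3, e4, e5]
```

Because `fold` and `render` are pure, running them twice on the same events can't disagree. The bracketed IDs are what lets a reader click a claim back to the events that support it.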

That means when your AI asks the EZLogs MCP server “what happened to order 4421,” the rows it gets back are deterministic facts, not a second model’s opinion. Your AI narrates them. Our side never narrates. We hand over structured data with citations; the model turns it into a sentence and spends its own credits doing it.

This is the difference between “the AI made up an answer” and “the AI read the answer.” A year into shipping AI features, that is the distinction CTOs are actually asking about. Where do the facts come from, and can I trace them.

What the EZLogs MCP server exposes

Six query-only tools. Every one of them reads. None of them writes back into your app, ever. That is a permanent design rule, not a current limitation: EZLogs holds no write credentials into your systems, so the blast radius of the whole thing is read-only.

  • find_actions: search the activity log by actor, entity, outcome, date, significance
  • get_action: one action by id, with its full underlying event stream
  • actor_timeline: one user’s or agent’s recent actions plus their current state
  • entity_timeline: one order, account, document, whatever, plus its current state
  • compare_actions: line up two to five actions and find where they diverged
  • top_lists: rank actors and entities by activity in a window

find_actions and top_lists are on the free tier. The rest are paid. MCP usage itself is unlimited on every paid tier, because the AI spends its own credits to narrate; we just deliver the rows.

There are also read-only resources (ezlogs://recent, ezlogs://actor/{id}, and friends) and prebuilt prompts (summarize_day, investigate_failure, incident_report) for the common questions.

Wiring it up

Two steps. Get events flowing, then connect your AI.

Rails app (the gem captures HTTP, Sidekiq, and ActiveRecord with no manual middleware):

# Gemfile
gem 'ez_logs_agent'

# Terminal
bundle install
rails generate ez_logs_agent:install

# config/initializers/ez_logs_agent.rb
EzLogsAgent.configure do |config|
  config.server_url    = "https://app.ezlogs.io"
  config.project_token = "ezl_your_key_here"
end

Next.js app, if that’s your stack instead or as well, is npm install ezlogs-nextjs plus a small instrumentation.ts. Both agents emit the identical wire format, so the server treats them as one source.

Then connect Claude or Cursor. For Cursor, add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "ezlogs": {
      "url": "https://app.ezlogs.io/mcp",
      "headers": { "Authorization": "Bearer ezl_your_key_here" }
    }
  }
}

For Claude Desktop on macOS it’s one command, which backs up your existing config and merges in the EZLogs entry:

curl -fsSL https://app.ezlogs.io/install/claude-desktop.sh | sh

Quit Claude, reopen, and ask it “what happened yesterday?” This time it goes and looks.

What it isn’t

It isn’t a metrics platform and it isn’t an APM. It doesn’t watch CPU or page you at 3am. Datadog watches your infrastructure; this explains the work your app did, in language a support person can read.

It also isn’t an AI tool that translates your logs. The AI is on your side of the connection, the one asking questions. The translation underneath is deterministic and the same with the AI turned off. If removing every model from the path broke the answer, the design would be wrong. Removing them breaks nothing.

Try it

There’s a free tier, no card, and the MCP server is part of it. If you’ve got a Rails or Next.js app and an AI client you keep wanting to ask about production, point one at the other and see if the answer holds up.

https://ezlogs.io

If you wire it up and the answer comes back wrong, or worse, comes back confidently fake, tell me. That’s the one bug this whole thing exists to not have.

Razvan

A program is a tree — building a Verbose compiler in Verbose

Verbose is a small experimental language I’m building. Its compiler proves properties about your code — like termination — and emits tiny, readable x86-64 machine code: no runtime, no GC, no libc. This post stands on its own (you don’t need the rest of the series). What it’s about: I’m now writing a Verbose compiler in Verbose itself, and this is the foundation brick — how you represent a program as data so a compiler can work on it.

(English version of an article from my French series, originally on arcker.org.)

After cryptography, we take on something more vertiginous: a Verbose compiler written in Verbose. The language starting to describe itself.

Let’s be honest up front — it matters. This is not (yet) verbosec compiling the entirety of its own source. What exists today is a complete front end — tokenizer, parser, analyses, interpreter, type checker — written in Verbose, for a toy subset of the language. The whole thing compiled by verbosec to native machine code. Not an interpreted demo: a ~60 KB ELF binary that reads your program and tells you what’s wrong with it. That’s examples/vexprparse.verbose: 102 concepts, 219 rules.

We’ll walk through it brick by brick. This chapter lays the foundation without which nothing else exists: how to represent a program.

Why write the compiler in Verbose?

The question deserves an answer, because it isn’t just an exercise — it touches Verbose’s whole thesis.

Today, the compiler (verbosec) is written in Rust. And some of the logic — certain primitives — is Rust that emits x86-64 directly, with no Verbose source. The concrete consequence: to audit a Verbose binary, to really understand what it does, at some point you have to read Rust. And trust that Rust — and whoever, or whatever, wrote it.

That’s precisely what Verbose refuses. The whole series rests on five words: you don’t trust, you verify. You read the source, declared and proven. If the path from source to binary runs through unverifiable Rust, trust leaks out there.

Writing the front end in Verbose moves that logic into the language itself: the tokenizer, the parser, the analyses become a .verbose file, verified under Verbose’s proof regime, then compiled native. The auditor reads Verbose, not Rust. The remaining Rust shrinks to a small, stable, trusted-once base (the verifier). Per-binary trust moves from Rust to the proven source.

And it’s the ultimate dogfooding: a compiler is the hardest thing to express. If Verbose can describe its own front end, under its own proof regime, then the language isn’t a toy — it holds up on the most demanding task there is.

From text to a tree

A compiler can do nothing with flat text. x + y * 2, to a human, is a string of characters; to a compiler, it’s a structure — a tree, where the multiplication nests under the addition (operator precedence):

  The text  "x + y * 2"  is really a tree:

            ( + )
           /     \
         x      ( * )
               /     \
             y         2

Everything starts there. Before evaluating, type-checking, or catching an undefined variable — you first have to turn the text into that tree. That’s the AST (Abstract Syntax Tree). And to build it, you need a way to represent a tree as data.

A tree is a recursive sum type

This is where the earlier chapters pay off. A tree is declared in Verbose as a sum type — a type that can take several shapes — some of whose shapes reference themselves:

concept Ast
  variants:
    AstNum  of (value : number)
    AstVar  of (start : number, len : number)
    AstBin  of (op : number, lhs : Ast, rhs : Ast)
    AstNeg  of (inner : Ast)
    AstIf   of (cond : Ast, thn : Ast, els : Ast)
    AstCall of (callee_start : number, callee_len : number, args : ArgList)
    ...

Read AstBin: a binary operation holds an operator, a left subtree Ast, and a right subtree Ast. The type contains itself. That’s the recursion of a tree: an addition whose two sides are, themselves, expressions. AstIf holds three (condition, then branch, else branch). AstNum and AstVar are leaves — they hold no other Ast.

Our example then becomes, exactly:

  AstBin( + ,
          AstVar(x),
          AstBin( * , AstVar(y), AstNum(2)) )

The tree drawn above, written as a value. And a.b.c? AstField(AstField(AstVar(a), b), c) — the nesting follows the structure.

No pointers: an index arena

One problem remains. Verbose has no heap and no pointers — one of the reasons its binaries are so small and so verifiable. So how do you build a tree of arbitrary size?

The answer: an arena. All nodes live in a single bounded space, and a node points to its children by their index, not by a pointer.

  concept_group VExpr [max_depth: 4096, max_nodes: 65535]

  arena:  [0]  AstVar(x)
          [1]  AstVar(y)
          [2]  AstNum(2)
          [3]  AstBin( * , lhs=1, rhs=2)    ← references indices 1 and 2
          [4]  AstBin( + , lhs=0, rhs=3)    ← the root

The tree is built bottom-up: leaves first, then the nodes that link them. max_depth: 4096, max_nodes: 65535 aren’t decorative: they’re the static bounds the verifier needs to prove everything stays finite. No dynamic allocation, no possible overflow, and yet a tree of any shape within those bounds.
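The arena can be sketched in Python as well. All nodes go into one bounded list, children are referenced by integer index, and construction runs bottom-up, so a child index always exists before its parent is pushed. The capacity check stands in for Verbose's `max_nodes` bound: Python checks it at run time, where Verbose proves it statically. The tuple layout (`("num", v)`, `("var", name)`, `("bin", op, lhs, rhs)`) is an assumption made for illustration.

```python
MAX_NODES = 65535   # mirrors the max_nodes bound declared on the concept_group

class Arena:
    """All nodes in one bounded list; children are referenced by index, not pointer."""

    def __init__(self, capacity=MAX_NODES):
        self.nodes = []
        self.capacity = capacity

    def push(self, node):
        """Append a node tuple and return its index."""
        if len(self.nodes) >= self.capacity:
            raise OverflowError("arena is full")
        if node[0] == "bin":
            _, _, lhs, rhs = node
            # bottom-up: a child must already exist, so its index is smaller
            if not (0 <= lhs < len(self.nodes) and 0 <= rhs < len(self.nodes)):
                raise IndexError("child index refers to a node not built yet")
        self.nodes.append(node)
        return len(self.nodes) - 1

    def eval(self, i, env):
        """Evaluate node i; every recursive call moves to a strictly smaller index."""
        node = self.nodes[i]
        if node[0] == "num":
            return node[1]
        if node[0] == "var":
            return env[node[1]]
        _, op, lhs, rhs = node
        a, b = self.eval(lhs, env), self.eval(rhs, env)
        if op == "+":
            return a + b
        if op == "*":
            return a * b
        raise ValueError(f"unknown operator {op!r}")

arena = Arena()
x = arena.push(("var", "x"))             # [0]
y = arena.push(("var", "y"))             # [1]
two = arena.push(("num", 2))             # [2]
mul = arena.push(("bin", "*", y, two))   # [3] references indices 1 and 2
root = arena.push(("bin", "+", x, mul))  # [4] the root
print(root, arena.eval(root, {"x": 3, "y": 4}))  # prints: 4 11
```

A parent is always pushed after its children, so each child index is strictly smaller than its parent's. The recursive walk therefore always descends toward index 0 and must terminate, which is the kind of fact a verifier can exploit.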

In the same group live the tokens, the environments, and the diagnostics — all variants of VExpr, all linked by index. One arena for the whole front end.

Why this brick first

Because everything else plugs into it. The tokenizer will produce Tokens in this arena. The parser will consume them to build Asts. The analyses will walk the tree to find your mistakes. The interpreter will descend it to compute a result. Without a way to represent the tree — recursive, bounded, verifiable — there’s no compiler at all.

And it’s the direct payoff of what we built before: the recursion of chapter 1, the termination proofs of chapter 3. An AST is the recursive structure par excellence — and Verbose represents it under the same guarantees as everything else: bounded, pointerless, proven finite.

The program has become data. The next chapter builds it from raw text: the tokenizer.

Originally published on arcker.org, where the full series lives.

Great Stack to Doesn’t Work #9 — Distributed Tracing: “Why Does This Request Take 3 Seconds?”

A survival guide for when everything goes wrong in production.

A user clicks “Place Order.” The spinner spins. Three seconds pass. The order completes.

Three seconds. For a button click. The product manager asks: “Why does this take 3 seconds?” You check the API gateway. 50ms. You check the order service. 80ms. You check the payment service. 120ms. You check the inventory service. 60ms. The total is 310ms. Where’s the other 2,690ms?

It’s in the gaps. The network hops. The serialization. The queue wait times. The connection establishment. The TLS handshakes. The parts of the request lifecycle that no single service can see because they happen between services.

Distributed tracing makes the gaps visible.

The Mental Model: Traces, Spans, and Context

A trace is the complete journey of a request through your system. From the user’s browser click to the final database write and back. One trace, one request.

A span is a single operation within that trace. “Order service: validate order” is a span. “Payment service: charge card” is a span. “Database: INSERT into orders” is a span. Spans have a start time, duration, status, and parent span.

Spans nest. The “process order” span contains “validate order,” “check inventory,” “charge payment,” and “send confirmation” as child spans. Each child can have its own children. The full tree is the trace.

Trace context is the thread that connects spans across services. When Service A calls Service B, it passes a trace ID and a parent span ID in HTTP headers. Service B creates a new span with that trace ID and parent. Now both services’ spans are part of the same trace.

Without context propagation, each service creates an isolated trace. You can see what happened inside each service, but you can’t see the full request journey. The gaps between services — the 2,690ms — stay invisible.
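Here is what "the gaps" means in numbers. Take one parent span and the child spans recorded under it. The time the parent spent that no child accounts for is its duration minus the union of the child intervals. A minimal sketch, with spans reduced to hypothetical (start_ms, end_ms) pairs and illustrative timestamps chosen to match the 310ms of service time in the opening example. A tracing backend computes the same thing from the timestamps in a real trace.

```python
def uncovered_ms(parent, children):
    """Time inside `parent` not covered by any child span (overlaps merged)."""
    p_start, p_end = parent
    # clip children to the parent's window and sort them by start time
    spans = sorted((max(s, p_start), min(e, p_end))
                   for s, e in children if e > p_start and s < p_end)
    covered, run_start, run_end = 0, None, None
    for s, e in spans:
        if run_end is None or s > run_end:   # disjoint: close the previous run
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = s, e
        else:                                # overlapping: extend the current run
            run_end = max(run_end, e)
    if run_end is not None:
        covered += run_end - run_start
    return (p_end - p_start) - covered

# A 3,000 ms request whose four service spans add up to 50 + 80 + 120 + 60 = 310 ms
request = (0, 3000)
services = [(0, 50), (400, 480), (1200, 1320), (2500, 2560)]
print(uncovered_ms(request, services))  # prints 2690
```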

OpenTelemetry: The Standard

OpenTelemetry (OTel) is the industry standard for instrumentation. It provides SDKs for every major language, a collector for receiving and routing telemetry data, and semantic conventions for consistent naming.

Auto-instrumentation covers the basics without code changes:

# Python: install the packages
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# Run with auto-instrumentation
opentelemetry-instrument \
    --service_name order-service \
    --traces_exporter otlp \
    --metrics_exporter otlp \
    --exporter_otlp_endpoint http://otel-collector:4317 \
    python app.py

Auto-instrumentation hooks into HTTP frameworks, database drivers, and messaging libraries. It creates spans for incoming requests, outgoing HTTP calls, database queries, and message queue operations automatically.

Manual instrumentation adds business-specific spans:

from opentelemetry import trace

tracer = trace.get_tracer("order-service")

def process_order(order):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order.id)
        span.set_attribute("order.total", order.total)
        span.set_attribute("order.items_count", len(order.items))

        with tracer.start_as_current_span("validate_order"):
            validate(order)

        with tracer.start_as_current_span("check_inventory"):
            check_inventory(order.items)

        with tracer.start_as_current_span("charge_payment"):
            charge(order.payment_method, order.total)

The auto-instrumented spans tell you “the order service called the payment service.” The manual spans tell you “inside the order service, validation took 10ms, inventory check took 50ms, and the payment charge took 200ms.” Both are necessary for complete visibility.

Trace Context Propagation: W3C vs B3

When Service A calls Service B, the trace context travels in HTTP headers. Two standards dominate:

W3C Trace Context (the modern standard):

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: vendor=value

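The traceparent header encodes four dash-separated fields: version (2 hex chars), trace ID (32 hex chars), parent span ID (16 hex chars), and trace flags (2 hex chars; the lowest bit marks the trace as sampled). tracestate carries optional vendor-specific key-value pairs.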
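To make the layout concrete, here is a small parser. It is simplified: it only accepts the four-field layout used by version 00, and it rejects the forbidden version `ff` and all-zero IDs. A production parser should follow the W3C Trace Context specification's rules for handling future versions.

```python
import re

# version-traceid-parentid-flags, lowercase hex only
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header: str):
    """Split a traceparent into its fields; return None if it is invalid."""
    m = TRACEPARENT.match(header.strip())
    if not m:
        return None
    version, trace_id, parent_id, flags = m.groups()
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None  # forbidden version, or an all-zero ID
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # bit 0 is the sampled flag
    }

print(parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```

Service B keeps `trace_id`, records `parent_span_id` as the parent of its own new span, and sends its own span ID downstream in the next traceparent.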

B3 (Zipkin’s original format):

X-B3-TraceId: 4bf92f3577b34da6a3ce929d0e0e4736
X-B3-SpanId: 00f067aa0ba902b7
X-B3-Sampled: 1

Or the compact single-header version:

b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1

If you’re starting fresh: use W3C. It’s the standard, it’s widely supported, and it’s what OpenTelemetry defaults to.

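If you have existing Zipkin infrastructure: B3 works fine. OpenTelemetry SDKs can read and write both formats (for example, OTEL_PROPAGATORS=tracecontext,b3), and the Collector can receive Zipkin-format spans, so the two can coexist during a migration.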
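In practice you configure propagators instead of converting headers by hand. Still, a small converter makes the mapping between the two formats explicit. This sketch follows the single-header B3 layout shown above (`{TraceId}-{SpanId}-{SamplingState}`). It left-pads 64-bit B3 trace IDs to 128 bits, treats `d` (debug) as sampled, and, as a simplification, maps a missing or deferred sampling state to "not sampled". It does not validate that the IDs are hex.

```python
def b3_to_traceparent(b3: str):
    """Convert a single-header B3 value to a W3C traceparent (sketch)."""
    parts = b3.strip().lower().split("-")
    if len(parts) < 2:
        return None  # e.g. "b3: 0" carries only a sampling decision, no IDs
    trace_id, span_id = parts[0], parts[1]
    sampling = parts[2] if len(parts) > 2 else ""
    if len(trace_id) == 16:
        trace_id = trace_id.rjust(32, "0")  # 64-bit B3 trace IDs are left-padded
    if len(trace_id) != 32 or len(span_id) != 16:
        return None
    flags = "01" if sampling in ("1", "d") else "00"  # "d" (debug) implies sampled
    return f"00-{trace_id}-{span_id}-{flags}"

print(b3_to_traceparent("4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1"))
# prints: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```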

The critical rule: every service in the request path must propagate context. If Service A → B → C → D, and Service C doesn’t propagate headers, the trace breaks at C. You’ll see A → B in one trace and D in a separate trace with no connection.

This is exactly how we lost 3 weeks debugging the “where’s the other 2 seconds?” problem.

Sampling: You Can’t Trace Everything

At 10,000 requests per second, tracing every request generates enormous amounts of data. A single trace might have 30 spans, each with attributes and events. At 10K rps, that’s 300K spans per second. Storing and indexing all of them is expensive and often unnecessary.

Head-based sampling decides at the start of the trace whether to record it. Simple and predictable.

# OTel Collector config
processors:
  probabilistic_sampler:
    sampling_percentage: 10  # Keep 10% of traces

The problem: you decide before knowing if the trace is interesting. A 10% sample rate means you’ll capture 10% of errors — but if errors are 0.1% of traffic, most sampled traces are successful requests you don’t care about.
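Head-based samplers usually derive the decision from the trace ID alone, so every service that sees the same trace reaches the same verdict without coordinating. The sketch below follows the approach of OpenTelemetry's TraceIdRatioBased sampler: compare the lower 64 bits of the trace ID against a threshold. It is an illustration, not the SDK code. The Collector's probabilistic_sampler processor hashes the trace ID instead, but the principle is the same: the verdict is fixed before anyone knows how the request will turn out.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1

def head_sample(trace_id_hex: str, ratio: float) -> bool:
    """Decide once, at the root, from the trace ID alone (ratio in [0, 1])."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    low64 = int(trace_id_hex, 16) & TRACE_ID_LIMIT  # lower 64 bits of the trace ID
    return low64 < bound

random.seed(7)
ids = [f"{random.getrandbits(128):032x}" for _ in range(100_000)]
kept = sum(head_sample(t, 0.10) for t in ids)
print(kept / len(ids))  # close to 0.10, regardless of which requests later fail
```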

Tail-based sampling decides after the trace completes. It can keep all error traces, all slow traces, and sample normal traces.

processors:
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 1000}
      - name: normal
        type: probabilistic
        probabilistic: {sampling_percentage: 5}

This keeps 100% of errors, 100% of requests over 1 second, and 5% of everything else. The interesting traces are always captured. The boring ones are sampled.

The trade-off: tail-based sampling requires buffering complete traces in memory before deciding. The OTel Collector needs enough memory to hold all in-flight traces. For high-throughput services, this can be significant.
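The three policies in that config are OR-ed: a trace is kept if any of them matches. This sketch imitates that decision but is not the Collector's implementation; the percentage check here simply takes the trace ID modulo 100. It also assumes the whole trace is already buffered, which is exactly the memory cost described above. The `Span` shape is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Span:
    trace_id: str      # 32 hex chars, shared by every span in the trace
    start_ms: float
    end_ms: float
    error: bool

def keep_trace(spans, latency_threshold_ms=1000, normal_pct=5):
    """Evaluate the policies on a completed trace; any match keeps it."""
    if any(s.error for s in spans):                                   # policy: errors
        return True
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration > latency_threshold_ms:                               # policy: slow-requests
        return True
    return int(spans[0].trace_id, 16) % 100 < normal_pct              # policy: normal

tid = "4bf92f3577b34da6a3ce929d0e0e4736"
print(keep_trace([Span(tid, 0, 50, False), Span(tid, 10, 40, True)]))  # True: has an error
```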

Adaptive sampling adjusts the rate dynamically. Under normal conditions, sample 5%. When error rates spike, automatically increase to 50% or 100%. This captures detail when you need it and saves resources when you don’t.
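A minimal sketch of that rule: track recent outcomes in a sliding window and pick the ratio from the observed error rate. The window size and the 1% and 5% error-rate thresholds are assumptions for illustration. The returned ratio would feed a sampler like the head-based one earlier in this section.

```python
from collections import deque

class AdaptiveSampler:
    """Choose a sampling ratio from the error rate of recent requests."""

    def __init__(self, window=1000, base=0.05):
        self.outcomes = deque(maxlen=window)  # True = the request errored
        self.base = base

    def record(self, is_error: bool):
        self.outcomes.append(is_error)

    def rate(self) -> float:
        if not self.outcomes:
            return self.base
        error_rate = sum(self.outcomes) / len(self.outcomes)
        if error_rate >= 0.05:   # severe spike: keep everything (assumed threshold)
            return 1.0
        if error_rate >= 0.01:   # elevated: keep half (assumed threshold)
            return 0.5
        return self.base         # normal conditions: 5%
```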

Jaeger vs Tempo vs Zipkin: When to Use Which

Jaeger: The mature choice. Built by Uber, donated to CNCF. Strong UI for trace exploration. Stores traces in Elasticsearch/OpenSearch or Cassandra, and can put Kafka in front of storage as an ingestion buffer. If you need a standalone tracing system with its own storage and UI, Jaeger is battle-tested.

Grafana Tempo: The cost-efficient choice. Stores traces in object storage (S3, GCS) without a heavyweight index. This makes it dramatically cheaper than Jaeger for high volumes, since object storage costs pennies per GB. The trade-off: attribute search is weaker than in an index-backed store. Newer Tempo versions support TraceQL queries, but the fast paths are lookup by trace ID, by service name, or through Grafana’s integration with logs and metrics (find the trace ID in a log, click through to the trace).

If you’re already in the Grafana ecosystem (Prometheus + Loki + Grafana), Tempo is the natural addition.

Zipkin: The original. Simple, lightweight, easy to deploy. Good for smaller setups. Less feature-rich than Jaeger but also less complex.

The decision: if you’re running Grafana, choose Tempo. If you need standalone trace search by attributes, choose Jaeger. If you want the simplest possible setup, choose Zipkin.

Full-Stack Correlation: The Power Move

The real value of distributed tracing isn’t seeing individual traces. It’s correlating traces with metrics and logs.

In Grafana, with Prometheus + Loki + Tempo:

  1. Dashboard shows a latency spike (Prometheus metric).
  2. Click on the spike → Grafana shows exemplar traces during that window (Prometheus exemplars link to Tempo trace IDs).
  3. Open the trace → See the full span tree. One span in the payment service took 2.4 seconds.
  4. Click on the slow span → Grafana links to Loki logs filtered by that trace ID and time window. The log shows: “connection timeout to payment provider, retry 3 of 3.”

From “something is slow” to “the payment provider is timing out” in 4 clicks. No grep. No manual log correlation. No guessing.

The prerequisites:

  • Metrics: Use exemplars to embed trace IDs in Prometheus metrics.
  • Logs: Include trace_id and span_id in every structured log line.
  • Traces: Use OpenTelemetry to generate spans with service.name and standard attributes.
  • Grafana: Configure data source correlations between Prometheus, Loki, and Tempo.
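The logging prerequisite is the one teams most often skip, and it takes only a few lines. The sketch below uses the standard library only. A `ContextVar` stands in for the tracer's active span. With OpenTelemetry you would read `trace.get_current_span().get_span_context()` and hex-format its IDs, or let the OpenTelemetry logging instrumentation inject them for you. The logger name and IDs are illustrative.

```python
import contextvars
import io
import json
import logging

# Stand-in for the tracer's active span (hypothetical; OTel provides the real one).
current_ids = contextvars.ContextVar("current_ids", default=("", ""))

class TraceContextFilter(logging.Filter):
    """Copy the active trace_id/span_id onto every record the handler sees."""
    def filter(self, record):
        record.trace_id, record.span_id = current_ids.get()
        return True

class JsonFormatter(logging.Formatter):
    """One JSON object per line, so the log backend can index trace_id."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "trace_id": record.trace_id,
            "span_id": record.span_id,
        })

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(TraceContextFilter())
handler.setFormatter(JsonFormatter())
log = logging.getLogger("payment-service")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

current_ids.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
log.warning("connection timeout to payment provider, retry 3 of 3")
print(stream.getvalue())
```

With trace_id present in every log line, the Tempo-to-Loki jump in step 4 is a filter on a single field.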

Span Attributes and Events: Making Traces Useful

A span that says “HTTP POST /api/orders 200 180ms” is useful. A span that says “HTTP POST /api/orders 200 180ms, order_id=12345, items=3, total=$299.97, customer_tier=premium, warehouse=us-east” is actionable.

Attributes are key-value pairs attached to spans:

span.set_attribute("order.id", order_id)
span.set_attribute("order.items_count", len(items))
span.set_attribute("customer.tier", customer.tier)
span.set_attribute("db.statement", "INSERT INTO orders...")

Events are timestamped messages within a span’s lifetime:

span.add_event("inventory_check_passed", {
    "warehouse": "us-east",
    "all_items_available": True
})
span.add_event("payment_initiated", {
    "provider": "stripe",
    "amount": 299.97
})

Attributes describe the span. Events describe what happened during the span. Both are searchable (if your backend supports it) and both make the difference between a trace you can look at and a trace you can learn from.

Semantic conventions: OpenTelemetry defines standard attribute names. Use them.

  • http.method, http.status_code, http.url
  • db.system, db.statement, db.operation
  • messaging.system, messaging.destination
  • rpc.system, rpc.method

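Standard names mean your dashboards and alerts work across services without custom parsing. Check which version of the conventions your instrumentation emits: the stabilized HTTP conventions renamed http.method, http.status_code, and http.url to http.request.method, http.response.status_code, and url.full.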

War Story: The 450ms Across 7 Microservices

Checkout flow. User clicks “Pay.” Seven microservices involved: API Gateway → Order Service → Inventory Service → Pricing Service → Payment Service → Notification Service → Analytics Service.

Each service reported latency under 100ms. Total measured by the user: 3.2 seconds. Distributed tracing was deployed but nobody had looked at a full trace end-to-end.

The trace revealed:

  1. API Gateway → Order Service: 15ms network latency (normal).
  2. Order Service: 80ms internal processing. Then calls Inventory and Pricing sequentially, not in parallel. Inventory: 90ms. Pricing: 70ms. Sequential total: 160ms, where running them in parallel would take about 90ms.
  3. Inventory Service → Database: 45ms. But the span showed 3 round trips: check stock, reserve stock, confirm reservation. Each was a separate database call with its own connection establishment. With connection pooling and a single transaction: 12ms.
  4. Order Service → Payment Service: 120ms. Normal. But the trace showed a 400ms gap between “inventory check complete” and “payment initiated.” The order service was logging — synchronously writing to a file on an NFS mount. 400ms for a log write.
  5. Payment Service → External Payment Provider: 800ms. Expected. External API, nothing to optimize.
  6. Payment Service → Notification Service: 200ms. But the notification was sent synchronously. The user waited for the email to queue before seeing “Order confirmed.”
  7. Analytics event: 150ms. Also synchronous.

Fixes:

  1. Parallelize Inventory and Pricing calls: saved 70ms.
  2. Connection pooling on Inventory’s database: saved 33ms.
  3. Async logging (switch from synchronous file write to async buffer): saved 400ms.
  4. Async notification (fire-and-forget to a message queue): saved 200ms.
  5. Async analytics (same pattern): saved 150ms.
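The async-logging fix has a direct standard-library equivalent in Python: `QueueHandler` puts records on an in-memory queue, and `QueueListener` drains them to the slow handler on a background thread. The story doesn't say which language or library the team used, so treat this as an illustration of the pattern, not their change. A `StringIO` stands in for the NFS-backed file. The trade-offs: an unbounded queue exchanges latency for memory, and records still in the buffer are lost on a hard crash.

```python
import io
import logging
import logging.handlers
import queue

slow_stream = io.StringIO()                    # stand-in for the NFS-mounted log file
slow_handler = logging.StreamHandler(slow_stream)

log_queue = queue.Queue(-1)                    # unbounded buffer between request and disk
listener = logging.handlers.QueueListener(log_queue, slow_handler)
listener.start()                               # slow writes now happen on this thread

log = logging.getLogger("order-service")
log.addHandler(logging.handlers.QueueHandler(log_queue))  # enqueue only, returns at once
log.setLevel(logging.INFO)
log.propagate = False

log.info("inventory check complete for order %s", 12345)  # request path no longer waits
listener.stop()                                # drains the queue before returning
print(slow_stream.getvalue())
```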

Total saved: ~850ms (70 + 33 + 400 + 200 + 150 = 853ms, with the parallelization counted once). New checkout time: roughly 3.2s minus 0.85s, about 2.35 seconds. The 800ms payment provider call was the irreducible minimum.
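A quick check of the arithmetic, using only figures from the story: the parallelization saving is the sequential total minus the slower of the two calls, and the pooling saving is 45ms down to 12ms.

```python
savings_ms = {
    "parallel inventory + pricing": (90 + 70) - max(90, 70),  # 160 sequential vs 90 parallel
    "inventory DB pooling": 45 - 12,
    "async logging": 400,
    "async notification": 200,
    "async analytics": 150,
}
total = sum(savings_ms.values())
before_ms = 3200
print(total, before_ms - total)  # prints: 853 2347
```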

None of this was visible without distributed tracing. Each service saw “I processed my part in under 100ms.” The trace showed “yes, but you waited 400ms for a log write and called two services sequentially that could have been parallel.”

War Story: The Trace Context Black Hole

A team deployed OpenTelemetry across 12 services. Traces looked great — for 11 of them. Service #7 (a legacy Java service running an older framework) didn’t propagate W3C trace headers. Every trace that passed through Service #7 broke into two fragments: spans before it and spans after it.

The team spent 3 weeks thinking their tracing setup was misconfigured. They rebuilt collectors, redeployed agents, checked network policies. The actual problem: Service #7’s HTTP client library was configured with a custom interceptor that stripped unknown headers. The traceparent header was being removed at the HTTP client level.

Fix: one line. Add traceparent and tracestate to the allowed headers list.
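The failure is easy to reproduce with a header-filtering interceptor. The service in the story was Java. This Python sketch uses a hypothetical allowlist and helper names to show the one-line fix, plus a boundary check that could run in each service's tests so a stripped traceparent fails a build instead of fragmenting traces.

```python
ALLOWED_HEADERS = {"content-type", "authorization", "traceparent", "tracestate"}

def filter_outgoing(headers: dict) -> dict:
    """Hypothetical interceptor: forward only allowlisted headers (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() in ALLOWED_HEADERS}

def propagates_context(filter_fn) -> bool:
    """Boundary check: do the W3C context headers survive the filter?"""
    probe = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
             "tracestate": "vendor=value"}
    out = filter_fn(probe)
    return all(h in out for h in probe)

outgoing = {
    "Content-Type": "application/json",
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "X-Internal-Debug": "1",
}
print(filter_outgoing(outgoing))
print(propagates_context(filter_outgoing))  # True once both headers are allowlisted
```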

The lesson: trace context propagation is all-or-nothing. One service that doesn’t propagate breaks every trace that touches it. When deploying tracing, verify propagation at every service boundary, not just at the edges.

War Story: The 1% Sampling Regret

A high-traffic platform set sampling to 1% because storage was expensive. Normal operations: 1% sampling captured enough data for general analysis.

Then a subtle bug appeared. One in every 10,000 requests hit a code path that caused a 30-second timeout. Error rate: 0.01%. With 1% sampling and a 0.01% error rate, the chance that any given request was both slow and sampled was 0.01% × 1% = 0.0001%, one in a million. They processed about 1 million requests before capturing a single instance of the slow trace.
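The same numbers, worked through, show why waiting doesn't help much. This assumes the sampling decision is independent of whether the request turns out slow, as it is with head-based sampling. Even after a million requests, the chance of having captured nothing is about e^-1, roughly 37%.

```python
import math

error_rate = 1 / 10_000      # one request in 10,000 takes the slow path (0.01%)
sample_rate = 0.01           # 1% head-based sampling, independent of the outcome
p_capture = error_rate * sample_rate            # a given request is slow AND sampled
expected_requests = 1 / p_capture               # mean requests per captured slow trace
p_none_after_1m = (1 - p_capture) ** 1_000_000  # zero captures in a million requests

print(f"{p_capture:.4%}")            # 0.0001%
print(round(expected_requests))      # 1000000
print(round(p_none_after_1m, 3))     # 0.368, about e**-1
```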

For 2 weeks, users complained about random timeouts. The team could see the error rate in metrics but had zero traces showing the actual failure path. They eventually found it by adding targeted debug logging to the suspected code path — the thing distributed tracing was supposed to eliminate.

After the incident, they switched to tail-based sampling: 100% of errors and slow requests, 1% of everything else. Storage costs increased 30%. Debugging time decreased by 90%.

Key Takeaways

Distributed tracing answers the question that logs and metrics can’t: “What happened to this specific request across all the services it touched?”

Context propagation is the foundation. If one service doesn’t propagate headers, the trace breaks. Verify propagation across every service boundary before trusting your traces.

Sampling strategy matters more than you think. Head-based sampling is simple but misses rare events. Tail-based sampling captures what matters but needs memory. Choose based on your traffic volume and your tolerance for missing interesting traces.

The biggest wins from tracing are always in the gaps: sequential calls that should be parallel, synchronous operations that should be async, and network overhead that shouldn’t exist. No single service can see these problems. The trace reveals them instantly.

Over to You

Have you found the ‘hidden gap’ in a request’s journey using distributed tracing? What was the surprise? And what sampling strategy do you use in production?

If you enjoyed this, I write about production engineering, AI systems, and the messy reality of building software at scale.

Follow me:

  • LinkedIn — Mehmet TURAÇ
  • X/Twitter — @TuracTheThinker

This is part of the Great Stack to Doesn’t Work series, a survival guide for when everything goes wrong in production. Follow the series to catch every episode.

CQRS+ES: The Pubsub Bridge for Command Outcomes and Atomic Audit Logging

In the previous articles of this series, we covered the network and security layers of the authentication service: PKCS#12, timing oracle, mTLS, CRL. Let’s now dive into the application architecture. The service is built with CQRS + Event Sourcing, and two non-obvious patterns deserve an article.

The event-XOR-error invariant

In a well-disciplined CQRS/ES system, a command handler does exactly one thing: it emits an event OR returns an error. Never both. Never neither.

func (h *LoginHandler) Handle(cmd LoginCommand) ([]Event, error) {
    user, err := h.repo.Load(cmd.UserID)
    if err != nil {
        return nil, err // infra error = no event
    }

    if !user.VerifyPassword(cmd.Password) {
        return []Event{LoginFailed{UserID: cmd.UserID}}, nil // event, not error
    }

    return []Event{LoginSucceeded{UserID: cmd.UserID}}, nil // event, not error
}

This discipline is crucial: errors are infrastructure problems (DB down, network timeout). Business outcomes are events (login succeeded, login failed). Mixing the two breaks traceability and makes projectors unpredictable.
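A guard at the dispatch boundary turns the invariant into something enforced rather than merely remembered. The series' code is Go. This sketch uses Python, with an (events, error) pair mirroring Go's ([]Event, error). The event classes, the `checked` decorator, and the injected `verify` password checker are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoginSucceeded:
    user_id: str

@dataclass(frozen=True)
class LoginFailed:
    user_id: str

class InvariantViolation(Exception):
    pass

def checked(handler):
    """Wrap a command handler and enforce: events XOR error, never both or neither."""
    def run(cmd):
        events, err = handler(cmd)
        if bool(events) == (err is not None):   # both present, or both absent
            raise InvariantViolation(f"events={events!r}, err={err!r}")
        return events, err
    return run

@checked
def handle_login(cmd):
    user_id, password, verify = cmd             # verify: hypothetical password checker
    if verify is None:                          # stand-in for "repository unavailable"
        return [], ConnectionError("user repository unavailable")  # infra: no event
    if not verify(user_id, password):
        return [LoginFailed(user_id)], None     # business outcome: an event
    return [LoginSucceeded(user_id)], None

@checked
def broken(cmd):
    return [LoginSucceeded("bob")], ValueError("both")  # violates the invariant
```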

The problem: how to inform the caller?

The command handler emits a LoginSucceeded event. The event is persisted in the event store. A projector consumes it and updates the read model. All of this is asynchronous.

But the HTTP handler that dispatched the command needs a response now. The user is waiting. How do you tell them “your login succeeded, here’s your session cookie”?

The temptation: put the result in a typed error.

// DO NOT DO THIS
if user.VerifyPassword(cmd.Password) {
    return nil, &LoginResult{Success: true, SessionID: "abc"}
    // Breaks the invariant: it's a business result, not an infra error
}

This breaks the “error = infra” vs “business result = event” separation. Error middlewares, retry policies, circuit breakers — everything is calibrated on the assumption that error != nil means a problem, not a success.

The pubsub bridge pattern

The solution: the projector republishes the event on a pubsub channel with the command’s CorrelationID. The caller subscribes before dispatching the command, filters by CorrelationID, and translates the event into a return value.

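func (h *HTTPLoginHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    cmd := LoginCommand{
        UserID:        extractUserID(r),
        Password:      r.FormValue("password"),
        CorrelationID: uuid.New().String(),
    }

    // Subscribe BEFORE dispatching the command, so the bridged event can't be missed
    sub := h.pubsub.Subscribe(cmd.CorrelationID)
    defer sub.Close()

    // Dispatch the command; an error here is infrastructure, never a business outcome
    if err := h.bus.Dispatch(cmd); err != nil {
        http.Error(w, "Internal error", http.StatusInternalServerError)
        return
    }

    // Wait for the event, the client going away, or a timeout
    select {
    case event := <-sub.Events():
        switch e := event.(type) {
        case LoginSucceeded:
            setSessionCookie(w, e.SessionID)
            http.Redirect(w, r, "/dashboard", http.StatusFound)
        case LoginFailed:
            renderLoginPage(w, "Invalid credentials")
        default:
            // An unexpected event type must still produce a response
            http.Error(w, "Unexpected outcome", http.StatusInternalServerError)
        }
    case <-r.Context().Done():
        return // client disconnected; nothing left to write
    case <-time.After(5 * time.Second):
        http.Error(w, "Timeout", http.StatusGatewayTimeout)
    }
}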

The full flow:

  1. HTTP handler generates a CorrelationID and subscribes to pubsub
  2. HTTP handler dispatches the command
  3. Command handler emits an event (LoginSucceeded or LoginFailed)
  4. Event is persisted in the event store
  5. Projector consumes the event, republishes it on pubsub with the CorrelationID
  6. HTTP handler receives the event via pubsub, translates to HTTP response

The event-XOR-error invariant is preserved. The caller gets a synchronous response. The projector remains the single point for event → side-effect transformation.
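You can simulate the six steps in-process to see why subscribing before dispatching matters: the projector may publish the moment the event lands, and a late subscriber would miss it. In this sketch, an in-memory queue stands in for the event store and a dict of queues stands in for the pubsub (in production, something like pg_notify or an outbox relay). The event names follow the article. Everything else, including the `password_ok` shortcut, is hypothetical.

```python
import queue
import threading
import uuid

class PubSub:
    """In-memory pubsub keyed by correlation ID."""
    def __init__(self):
        self._subs = {}
        self._lock = threading.Lock()

    def subscribe(self, correlation_id):
        inbox = queue.Queue()
        with self._lock:
            self._subs[correlation_id] = inbox
        return inbox

    def unsubscribe(self, correlation_id):
        with self._lock:
            self._subs.pop(correlation_id, None)

    def publish(self, correlation_id, event):
        with self._lock:
            inbox = self._subs.get(correlation_id)
        if inbox is not None:        # no subscriber: the event is simply not bridged
            inbox.put(event)

def handle(cmd, store):
    """Command handler: emits exactly one event into the (simulated) event store."""
    event = "LoginSucceeded" if cmd["password_ok"] else "LoginFailed"
    store.put((cmd["correlation_id"], event))

def projector(store, pubsub):
    """Consumes stored events and republishes each under its correlation ID."""
    while (item := store.get()) is not None:
        correlation_id, event = item
        # (the read-model update would happen here)
        pubsub.publish(correlation_id, event)

def login(store, pubsub, password_ok, timeout=5.0):
    correlation_id = str(uuid.uuid4())
    inbox = pubsub.subscribe(correlation_id)    # step 1: subscribe BEFORE dispatching
    try:
        handle({"correlation_id": correlation_id, "password_ok": password_ok}, store)
        return inbox.get(timeout=timeout)       # step 6: the bridged event is the result
    except queue.Empty:
        return "Timeout"
    finally:
        pubsub.unsubscribe(correlation_id)

store, pubsub = queue.Queue(), PubSub()
worker = threading.Thread(target=projector, args=(store, pubsub), daemon=True)
worker.start()
first = login(store, pubsub, password_ok=True)
second = login(store, pubsub, password_ok=False)
store.put(None)   # sentinel: stop the projector
worker.join()
print(first, second)  # prints: LoginSucceeded LoginFailed
```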

Atomic audit logging

Second pattern: audit logging. In a projector, you often do several things in response to an event: update the read model, write an audit entry, send a notification, sometimes trigger a logout.

The trap: if the business projection succeeds and the audit fails, you have a divergence. The user was connected, but the audit log doesn’t know. Or worse: the audit says “login at 14:03” but the session was created at 14:02 because the audit was retried.

The pattern: all side-effects in a single DB transaction, except post-commit work (a notification, a triggered logout), which is best-effort.

func (p *LoginProjector) Handle(event LoginSucceeded) error {
    tx, err := p.db.Begin()
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // 1. Update read model
    if err := p.updateSession(tx, event); err != nil {
        return err
    }

    // 2. Write audit entry - IN the same transaction
    if err := p.writeAuditEntry(tx, AuditEntry{
        Action:    "login_succeeded",
        UserID:    event.UserID,
        Timestamp: event.Timestamp,
        IP:        event.IP,
    }); err != nil {
        return err
    }

    // 3. Publish to pubsub bridge - IN the same transaction
    // (uses pg_notify or outbox pattern)
    if err := p.publishOutbox(tx, event); err != nil {
        return err
    }

    // Atomic commit: all or nothing
    if err := tx.Commit(); err != nil {
        return err
    }

    // 4. Best-effort: notify, cleanup, etc.
    // If this fails, the transaction is already committed
    go p.notifySecurityTeam(event)

    return nil
}
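Python's built-in sqlite3 module reproduces the same all-or-nothing shape: `with conn:` commits the block, or rolls it back if it raises. The table names and the simulated audit failure are hypothetical. The point: a failure between the read-model write and the audit write leaves no half-applied projection behind.

```python
import sqlite3

def project_login(conn, event, fail_audit=False):
    """Read model, audit row and outbox row in one transaction (sketch)."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("INSERT INTO sessions (user_id) VALUES (?)", (event["user_id"],))
        if fail_audit:
            raise RuntimeError("audit write failed")  # simulated mid-transaction failure
        conn.execute("INSERT INTO audit_events (action, user_id, ts) VALUES (?, ?, ?)",
                     ("login_succeeded", event["user_id"], event["ts"]))
        conn.execute("INSERT INTO outbox (correlation_id, payload) VALUES (?, ?)",
                     (event["correlation_id"], "LoginSucceeded"))
    # best-effort work (notifications, cleanup) would run here, after the commit

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (user_id TEXT);
    CREATE TABLE audit_events (action TEXT, user_id TEXT, ts TEXT);
    CREATE TABLE outbox (correlation_id TEXT, payload TEXT);
""")

ev = {"user_id": "u1", "ts": "14:02", "correlation_id": "c1"}
project_login(conn, ev)
try:
    project_login(conn, {**ev, "user_id": "u2"}, fail_audit=True)
except RuntimeError:
    pass
counts = [conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("sessions", "audit_events", "outbox")]
print(counts)  # prints [1, 1, 1]: the failed projection left no partial rows
```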

Login failures on unknown users: slog only

A corollary trap: login failures on users that don’t exist. If you write an audit entry for every attempt, a brute-forcer trying 10 million random emails produces 10 million rows in audit_events.

The rule: login failures on existing accounts deserve an audit (it’s a security signal). Login failures on non-existent accounts are noise — slog.Warn and that’s it. The per-IP rate limiter upstream limits the volume of logs themselves.
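The rule itself is a single branch. A sketch with a hypothetical function and a list standing in for the audit table: failures on existing accounts become durable audit rows, and failures on unknown accounts only produce a warning log line.

```python
import logging

log = logging.getLogger("auth")

def record_login_failure(user_exists: bool, user_id: str, ip: str, audit_rows: list):
    """Existing account: audit row (security signal). Unknown account: log only."""
    if user_exists:
        audit_rows.append({"action": "login_failed", "user_id": user_id, "ip": ip})
    else:
        log.warning("login failure for unknown account from %s", ip)  # noise: no audit row
```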

Conclusion

The pubsub bridge solves the “synchronous return in an asynchronous system” problem without breaking the event-XOR-error invariant. Atomic audit logging enforces “all or nothing” on critical side-effects.

Both patterns share a common thread: they impose discipline on transactional boundaries. In an event-sourced system, these boundaries are the only safety net between “the system is consistent” and “we no longer know what happened.”

The architecture is in place, security is audited. But how was the audit itself conducted? The methodology — iterative audit passes, self-generated false positives, and how to turn probes into regression tests — that’s the subject of the next article.