The Hidden 43% — How Teams Are Wasting Almost Half Their LLM API Budget

You look at your provider dashboard and see one number: the total bill. It’s like getting an electricity bill that just says “$5,000” with no breakdown of whether it was the AC, the fridge, or someone leaving the lights on all month.

tbh, most AI startups are flying blind right now. We recently looked into the cost breakdown for several teams and found something crazy: almost 43% of LLM API spend is completely wasted. It’s not about paying for usage; it’s about paying for bad architecture.

Here’s where the leaks are actually happening:

  1. Retry Storms (34% of waste)
    Your agent fails to parse a JSON response, so it retries. And retries. Sometimes 5-10 times in a loop. You aren’t just paying for the failure; you’re paying to resend the entire context window on every attempt.

  2. Duplicate Calls (85% of apps have this issue)
    Multiple users asking the exact same question, or internal systems running the same RAG pipeline on the same document. Without a cache in front of the provider, you’re paying OpenAI to generate identical tokens twice.

  3. Context Bloat
    Sending the entire 50-page document history when the user just asked “what’s the summary of page 2”. RAG is great, but shoving everything into the prompt “just in case” is burning your runway.

  4. Wrong Model Selection
    Using GPT-4o or Claude 3 Opus for simple classification tasks when Haiku or GPT-3.5-turbo would do it for a fraction of the cost.
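The first two leaks can be illustrated with a small sketch. It assumes a hypothetical call_model(model, prompt) function that returns raw text; the class name, the retry cap of 2, and the cache key scheme are all illustrative choices, not part of any provider SDK.

```python
import hashlib
import json


class CachedClient:
    """Wraps a model-calling function with a response cache and a retry cap."""

    def __init__(self, call_model, max_retries=2):
        self.call_model = call_model   # hypothetical: (model, prompt) -> str
        self.max_retries = max_retries
        self.cache = {}
        self.billed_calls = 0          # calls that actually reached the provider

    def _key(self, model, prompt):
        return hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()

    def ask_json(self, model, prompt):
        key = self._key(model, prompt)
        if key in self.cache:          # duplicate call: answered locally, not billed
            return self.cache[key]
        last_error = None
        for _ in range(1 + self.max_retries):
            self.billed_calls += 1
            raw = self.call_model(model, prompt)
            try:
                parsed = json.loads(raw)
            except json.JSONDecodeError as err:
                last_error = err       # bad JSON: retry, but only up to the cap
                continue
            self.cache[key] = parsed
            return parsed
        raise ValueError(f"gave up after {1 + self.max_retries} attempts") from last_error
```

The cap turns an open-ended retry storm into a bounded, visible failure, and the cache means an identical question is billed once. An exact-match cache like this only helps with literally identical prompts.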

You can’t fix what you can’t see. That’s exactly why I built LLMeter (https://llmeter.org?utm_source=devto&utm_medium=article&utm_campaign=hidden-43-percent-llm-waste). It’s an open-source dashboard that gives you per-customer and per-model cost tracking. Stop guessing who or what is draining your API budget.

Fwiw, just setting up basic budget alerts and seeing the breakdown by tenant usually drops a team’s bill by 20% in the first week. Give it a try, it’s open source (AGPL-3.0) and you can self-host or use the free tier.
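To make the idea of a per-tenant, per-model breakdown concrete, here is a minimal sketch. It is not LLMeter’s implementation: the model names, the prices, and the record layout are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1M tokens; real prices vary by provider and date.
PRICES = {
    "large-model": {"input": 5.00, "output": 15.00},
    "small-model": {"input": 0.25, "output": 1.25},
}


def call_cost(model, input_tokens, output_tokens):
    """Cost of one call in USD."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def cost_breakdown(usage_log):
    """Sum cost per (tenant, model) from (tenant, model, input_tokens, output_tokens) records."""
    totals = defaultdict(float)
    for tenant, model, input_tokens, output_tokens in usage_log:
        totals[(tenant, model)] += call_cost(model, input_tokens, output_tokens)
    return dict(totals)


def over_budget(totals, budget_usd):
    """Tenants whose combined spend across models exceeds the budget."""
    per_tenant = defaultdict(float)
    for (tenant, _model), cost in totals.items():
        per_tenant[tenant] += cost
    return sorted(t for t, c in per_tenant.items() if c > budget_usd)
```

Even this much is enough to answer “which customer is costing us the most, and on which model”, which a single total on the provider dashboard cannot.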

Stop Making Your AI Agent Scrape the Web. There’s a Better Way.

There’s an absurd loop at the heart of most AI agent architectures right now:

  1. Agent needs data (a research paper, an FX rate, a flight status, a CVE)
  2. Agent calls a web scraper or fires an HTTP request to a public endpoint
  3. The endpoint returns HTML designed for a human to read in a browser
  4. Agent burns tokens parsing, cleaning, and extracting the actual value
  5. Agent retries when the scraper breaks because the page layout changed

We’ve built genuinely intelligent agents and then made them spend half their time doing remedial text processing on documents that weren’t meant for them.

Let me show you what the alternative looks like.

The Root Cause: Wrong Layer

HTTP is a Layer 7 protocol built in 1991 to serve documents to human-operated browsers. It’s brilliant at that. Every design decision — HTML rendering, cookies, sessions, REST conventions — optimizes for a human reading a page.

Agents don’t read pages. They consume structured data. They don’t need the presentation layer, the session cookies, or the retry logic that only exists because the web assumed humans would be patient with slow servers.

The right fix isn’t a better scraper. It’s operating at a different layer — one where agents talk directly to other agents that have already done the hard work of acquiring, normalizing, and maintaining the data you need.

What Specialized Data Agents Look Like in Practice

Pilot Protocol runs a network of ~163,000 agents. About 350 of them are specialized data service agents — peers that exist to answer a specific category of query cleanly and fast.

Here’s what a few of them replace:

Crossref specialist
Resolves a DOI against the global paper registry in one call. No scraping PubMed, no HTML parsing, no fighting rate limits. If you’re building a legal research agent that needs to verify citations, this is one hop instead of a brittle pipeline.

Historical FX specialist
Spot rate at an arbitrary timestamp. Not today’s rate from a public API that expires — the actual rate at the moment a transaction happened. Replaces three bank statement screenshots and a manual lookup.

Aviation weather specialist
Real-time METAR data for any airport. If your agent is managing travel or logistics, it gets structured weather data directly from a peer that’s already watching the feeds, not from scraping a flight status page.

crt.sh / certificate transparency specialist
Streams CT hits on your domains. Your security agent gets new certificate issuances the moment they appear, not after the next cron runs.

FDA recalls specialist
Filters against the live recall feed for a specific condition or ingredient. No crawling FDA’s website, no pagination, no HTML tables.

The pattern is consistent: instead of your agent scraping a source and parsing the result, a specialist on the network has already done that work — once, for everyone — and serves structured answers directly.

The Network Effect That Makes This Work

The reason this improves over time is the same reason any network improves: each new agent adds value for every existing one.

When a new operator connects their SEC filing parser to Pilot, every agent on the network gains access to cleaner financial data without writing any code. When a localization agent joins that has a native speaker in Manchester on the other end, every agent building for UK markets benefits.

Pilot calls this “a hive mind that gets smarter with every new agent.” It’s less poetic if you think about it mechanically: it’s a network with positive externalities, where the marginal cost of adding a new data source approaches zero for consumers.

Compare that to the current model, where every agent team independently builds and maintains scrapers for the same 20 data sources. The waste is staggering.

The Latency Numbers

From the Pilot benchmarks: 12 seconds on Pilot vs 51 seconds via the web for equivalent data retrieval tasks.

That’s not a small difference. It’s roughly a 4x reduction in wall-clock time (51 / 12 ≈ 4.25) for the same result. In an agentic pipeline that chains five of these retrievals back to back, that’s the difference between a task that completes in a minute and one that takes more than four.

The speed comes from two places:

  1. No parsing overhead: the data arrives structured, not as HTML you have to strip
  2. UDP transport: Pilot runs peer-to-peer over UDP with its own reliable-stream layer, avoiding the head-of-line blocking that stalls parallel requests sharing a single TCP connection

Getting Your Agent Connected

# Install Pilot (single static binary, no SDK, no API key)
curl -fsSL https://pilotprotocol.network/install.sh | sh

# Start the daemon
pilotctl daemon start --hostname my-research-agent

# Your agent is now on the network
# Address: 0:A91F.0000.7C2E

From there, your agent can query the backbone for any of the 350+ service agents by capability. No URL directory to maintain, no API keys to manage per-service.

When You Still Need the Web

To be direct: Pilot doesn’t replace the web for everything. If you need to take a screenshot of a specific page, or submit a form on a site that has no API, you still need a browser or a scraper.

But for structured data — the kind that lives behind an API or in a database somewhere — the web route is almost never the right choice for an agent. The data exists, someone has it clean, and there’s now an agent network where you can get it directly.

The scraping loop is a workaround. The network is the fix.

Pilot Protocol: pilotprotocol.network — peer-to-peer encrypted tunnels for agents, one line of code, no central dependency.

TWD setup is now two Vite plugins and zero app code

Setting up TWD used to mean adding a block of dev-only code to your app’s entry file — a dynamic import for the runner, a test glob, a service-worker config, and a twd-relay browser client. It worked, but it never really belonged there.

With twd-js@1.8 and twd-relay@1.2, both packages ship Vite plugins. Setup is two entries in vite.config.ts and nothing in main.tsx.

The new setup

vite.config.ts:

import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { twd } from "twd-js/vite-plugin";
import { twdRemote } from "twd-relay/vite";

export default defineConfig({
    plugins: [
        react(),
        twd({
            testFilePattern: "/**/*.twd.test.ts",
            open: false,
            position: "right",
            search: true,
        }),
        twdRemote(),
    ],
});

main.tsx:

import React from "react";
import ReactDOM from "react-dom/client";
import { RouterProvider } from "react-router";
import { router } from "./routes/router";
import "./styles/index.css";

ReactDOM.createRoot(document.getElementById("root")!).render(
    <RouterProvider router={router} />,
);

That’s the whole setup. twd() owns the sidebar, glob discovery, and service-worker registration. twdRemote() attaches the relay to the Vite dev server and auto-injects the browser client into index.html. Both plugins use apply: 'serve', so production builds are untouched.

What it replaces

For comparison, here’s what a TWD entry file looked like a few weeks ago:

if (import.meta.env.DEV) {
    const { initTWD } = await import("twd-js/bundled");
    const tests = import.meta.glob("./**/*.twd.test.ts");
    initTWD(tests, {
        open: false,
        position: "right",
        serviceWorker: true,
        serviceWorkerUrl: "/mock-sw.js",
        search: true,
    });

    const { createBrowserClient } = await import("twd-relay/browser");
    const client = createBrowserClient({
        url: `${window.location.origin}/__twd/ws`,
    });
    client.connect();
}

Two top-level await imports, a glob, a service-worker URL that had to stay in sync with the runner, a WebSocket URL that had to match the relay path, and config repeating defaults. All of it dev-only, all of it sitting above ReactDOM.createRoot.

After the upgrade, that block is gone. No if (import.meta.env.DEV), no dynamic imports, no relay client. The dev-tooling story lives entirely in vite.config.ts.

Why it matters

One source of truth for the wiring. The serviceWorkerUrl, the SW served by the dev server, the WebSocket path used by the relay, and the path the browser client connects to were all strings in different files that had to agree. Now the plugins own them.

No top-level await for tooling. The await import("twd-js/bundled") was loading a chunk that had nothing to do with your app, before React was allowed to mount.

Tooling lives in tooling config. New developers reading main.tsx shouldn’t have to mentally strip out an if (import.meta.env.DEV) block covering a quarter of the file to understand startup. The plugin model is what the rest of the Vite ecosystem already does (@vitejs/plugin-react, Tailwind, TanStack Router devtools), and TWD now matches.

Non-Vite projects

Webpack, Angular CLI, Rollup, esbuild, Rspack — anywhere the Vite plugins don’t apply — keep the manual API. initTWD and createBrowserClient stay public exports forever. twdRemote({ autoConnect: false }) is also there as an escape hatch for Vite projects that want to wire the browser client by hand.

Try it

The runner is at https://twd.dev. Upgrade to twd-js@1.8 and twd-relay@1.2, drop the dev-only block from main.tsx, add the two plugins to vite.config.ts, and you’re done.

Why AI Agents Fail: 3 Failure Modes That Cost Tokens and Time

AI agents don’t fail like traditional software: they don’t crash with a stack trace. They fail silently: they return incomplete answers, freeze on slow APIs, or burn tokens calling the same tool over and over. The agent appears to work, but the output is wrong, late, or expensive.

This series covers the three most common failure modes with research-backed fixes. Each technique comes with a runnable demo that measures the before/after difference.

Working code: github.com/aws-samples/sample-why-agents-fail

The demos use Strands Agents with OpenAI (GPT-4o-mini). The patterns are framework-independent: they apply to LangGraph, AutoGen, CrewAI, or any framework that supports tool calls and lifecycle hooks.

This Series: 3 Essential Fixes

  1. Context Window Overflow: the Memory Pointer Pattern for large data
  2. MCP Tools That Never Respond: the async handleId pattern for slow external APIs
  3. Reasoning Loops in AI Agents: DebounceHook plus clear tool states to block repeated calls

What Happens When Tool Outputs Overflow the Context Window?

Context window overflow happens when a tool returns more data than the LLM can process: server logs, database results, or file contents that exceed the token limit. The agent doesn’t fail with an error. It degrades silently: it truncates data, loses context, or produces incomplete answers.

IBM research quantifies this: a Materials Science workflow consumed 20 million tokens and failed. The same workflow with memory pointers used 1,234 tokens and succeeded.

[Figure: an AI agent without the Memory Pointer Pattern versus with it, showing how large data stays out of the context window]

The fix, the Memory Pointer Pattern: store large data in agent.state and return a short pointer to the context. The next tool resolves the pointer to access the full data:

from strands import tool, ToolContext

@tool(context=True)
def fetch_application_logs(app_name: str, tool_context: ToolContext, hours: int = 24) -> str:
    """Fetch logs. Stores large data behind a pointer to avoid context overflow."""
    # generate_logs is a helper that is not shown in this excerpt
    logs = generate_logs(app_name, hours)  # Could be 200KB+

    if len(str(logs)) > 20_000:
        # Too large for the context: keep it in agent state and hand back a key
        pointer = f"logs-{app_name}"
        tool_context.agent.state.set(pointer, logs)
        return f"Data stored as pointer '{pointer}'. Use the analysis tools to query it."
    return str(logs)

@tool(context=True)
def analyze_error_patterns(data_pointer: str, tool_context: ToolContext) -> str:
    """Analyze errors: resolves the pointer from agent.state."""
    data = tool_context.agent.state.get(data_pointer)
    if data is None:
        # Unknown pointer: return an explicit failure instead of raising
        return f"FAILED: no data stored under pointer '{data_pointer}'"
    errors = [e for e in data if e["level"] == "ERROR"]
    return f"Found {len(errors)} errors across {len(set(e['service'] for e in errors))} services"

The LLM never sees the 200KB: it only sees "Data stored as pointer 'logs-payment-service'" (52 bytes).

Why Strands Agents? The ToolContext API provides agent.state as a native key-value store scoped to each agent: no global dictionaries, no external infrastructure. For multi-agent flows, invocation_state shares data between agents in a Swarm with the same API.

| Metric | Without pointers | With Memory Pointers |
| --- | --- | --- |
| Data in context | 214KB (full logs) | 52 bytes (pointer) |
| Agent behavior | Truncates or fails | Processes all data |
| Errors detected | Partial | Complete |

[Figure: bar chart of token usage across different context management strategies]

Full demo: 01-context-overflow-demo, with single-agent and multi-agent (Swarm) implementations and notebooks.

Why Do AI Agents Freeze When Calling External APIs?

AI agents freeze when MCP tools call slow or unresponsive external APIs. The agent blocks on the tool call, the user sees no progress, and after 7 seconds many implementations return a 424 error. MCP (Model Context Protocol) gives agents the ability to call external tools, but it does not handle timeouts or retries by default.

[Figure: synchronous MCP tool call showing the agent blocked while it waits on a slow API]

The fix, the async handleId pattern: the tool immediately returns a job ID. The agent then polls a separate status tool (check_job_status in the code that follows):

import asyncio
import uuid

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("timeout-demo")
JOBS = {}

async def _process_job(job_id: str) -> None:
    """Illustrative stand-in for the slow external call (this helper is not in the original excerpt)."""
    await asyncio.sleep(15)
    JOBS[job_id].update(status="completed", result=f"Finished: {JOBS[job_id]['task']}")

@mcp.tool()
async def start_long_job(task: str) -> str:
    """Returns a handle immediately, which prevents the timeout."""
    job_id = str(uuid.uuid4())[:8]
    JOBS[job_id] = {"status": "processing", "task": task}
    asyncio.create_task(_process_job(job_id))  # Work continues in the background
    return f"Job started. Handle: {job_id}. Use check_job_status to poll."

@mcp.tool()
async def check_job_status(job_id: str) -> str:
    """Polls job status: returns 'processing' or 'completed' with the result."""
    job = JOBS.get(job_id)
    if not job:
        return f"FAILED: Job '{job_id}' not found"
    return f"{job['status'].upper()}: {job.get('result', 'Still processing...')}"

| Scenario | Response time | UX |
| --- | --- | --- |
| Fast API (1s) | 3s total | OK |
| Slow API (15s) | 18s blocked | Agent frozen |
| Failing API | 424 error after 7s | Agent fails |
| Async handleId | ~4s (immediate + poll) | Agent responds |

[Figure: timeline visualization of the four MCP response patterns]

Why Strands Agents? MCPClient connects to any MCP server. The agent discovers tools at runtime via list_tools_sync(): no hard-coded tool list. When the MCP server implements the async pattern, the agent polls automatically with no extra orchestration code.

Full demo: 02-mcp-timeout-demo, a local MCP server with all 4 scenarios and a notebook.

Why Do AI Agents Repeat the Same Tool Call?

Reasoning loops in AI agents happen when the agent calls the same tool repeatedly with identical parameters without making progress. The root cause is ambiguous tool feedback: responses like “there may be more results available” make the agent think another call will produce better results. Research shows agents can loop hundreds of times without delivering an answer.

[Figure: diagram showing how ambiguous tool feedback causes loops versus how clear states and DebounceHook prevent them]

Fix 1, clear terminal states: tools return an explicit SUCCESS or FAILED instead of ambiguous messages:

# Ambiguous (causes loops)
return f"Flights found: {results}. There may be more results available."

# Clear (the agent stops)
return f"SUCCESS: Flight {conf_id} booked for {passenger}. Confirmation sent."

Fix 2, DebounceHook: detects and blocks duplicate tool calls at the framework level:

import json

from strands.hooks.registry import HookProvider, HookRegistry
from strands.hooks.events import BeforeToolCallEvent

class DebounceHook(HookProvider):
    """Blocks duplicate tool calls within a sliding window."""
    def __init__(self, window_size=3):
        self.call_history = []
        self.window_size = window_size

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(BeforeToolCallEvent, self.check_duplicate)

    def check_duplicate(self, event: BeforeToolCallEvent) -> None:
        # A call is identified by tool name plus serialized input; sort_keys ignores key order
        key = (event.tool_use["name"], json.dumps(event.tool_use.get("input", {}), sort_keys=True))
        if self.call_history.count(key) >= 2:
            event.cancel_tool = f"BLOCKED: Duplicate call to {event.tool_use['name']}"
        self.call_history.append(key)
        # Keep only the most recent window_size calls
        self.call_history = self.call_history[-self.window_size:]

| Strategy | Tool calls | Result |
| --- | --- | --- |
| Ambiguous feedback (baseline) | 14 calls | No definitive answer |
| DebounceHook | 12 calls (2 blocked) | Completes with blocks |
| Clear SUCCESS states | 2 calls | Immediate completion |

[Figure: bar chart of tool calls across the different strategies]

Why Strands Agents? The HookProvider API intercepts tool calls via BeforeToolCallEvent before they run. Setting event.cancel_tool blocks execution at the framework level: the LLM cannot bypass it. This makes hooks composable, so you can stack DebounceHook, LimitToolCounts, and custom validators on the same agent.

Full demo: 03-reasoning-loops-demo, with all 4 scenarios, hooks, and a notebook.

Prerequisites

You need Python 3.9+, uv (a fast Python package manager), and an OpenAI API key.

git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail/stop-ai-agents-wasting-tokens

# Pick any demo
cd 01-context-overflow-demo   # or 02-mcp-timeout-demo, 03-reasoning-loops-demo
uv venv && uv pip install -r requirements.txt
export OPENAI_API_KEY="your-key-here"

uv run python test_*.py

Each demo is self-contained, with its own dependencies, test script, and Jupyter notebook.

Frequently Asked Questions

What are the most common failure modes in AI agents?

The three most common failure modes are context window overflow (a tool returns more data than the LLM can process), MCP tool timeouts (external APIs block the agent indefinitely), and reasoning loops (the agent repeats the same tool call without making progress). Each failure mode wastes tokens and degrades response quality.

How do I reduce an AI agent’s token costs?

The two most effective techniques are memory pointers and clear tool states. The Memory Pointer Pattern stores large tool outputs in external state and passes short references into the LLM context, cutting the data placed in context from over 200KB to under 100 bytes per tool call. Clear terminal states (SUCCESS/FAILED) in tool responses keep the agent from retrying completed operations, which can reduce tool calls from 14 to 2.

Can I use these patterns with frameworks other than Strands Agents?

Yes. The Memory Pointer Pattern works with any framework that supports tool context (passing state between tools). The async handleId pattern is an MCP server design pattern: it works with any MCP-compatible agent. DebounceHook requires lifecycle hooks, which are available in LangGraph, AutoGen, and CrewAI with different APIs.

References

Research

  • Solving Context Window Overflow in AI Agents — IBM Research, Nov 2025
  • Towards Effective GenAI Multi-Agent Collaboration — Amazon, Dec 2024
  • Resilient AI Agents With MCP — Octopus, May 2025
  • Language models can overthink — The Decoder, Jan 2025

Implementation

  • Strands Agent State — ToolContext and agent.state
  • Strands MCP Tools — Connect any MCP server
  • Strands Hooks — Lifecycle events and tool cancellation

Which failure mode have you run into with your agents? Share in the comments.

Thanks!

Elizabeth Fuentes L

I help developers build production-ready AI applications through hands-on tutorials and open-source projects.

WordPress / WooCommerce Checkout Anti-Fraud — 9 Production-Tested Defenses (2026)

You wake up to a flurry of emails from your WooCommerce store. At first, it’s a rush—50 new orders overnight. Then you look closer. Every order is for a $1.99 digital download. The customer names are gibberish. The credit cards are all different, but the shipping addresses are identical and nonsensical. Half the payments failed. You’ve just been used for card testing.

This isn’t a sophisticated hack targeting a multinational corporation. It’s the bread-and-butter reality of running a small online store today. Fraudsters use small, independent sites like yours as a proving ground for stolen credit card numbers. For every successful fraudulent transaction, you lose the product, the revenue, and get hit with a $15-$25 chargeback fee from your payment processor. For every failed attempt, your payment processor’s risk algorithms start to look at you sideways.

If you’re losing a few hundred to a few thousand dollars a month to this digital shoplifting, you’re not alone. The good news is you don’t need an enterprise-level budget to fight back. This guide outlines a layered defense strategy, from free tools to affordable plugins, that can stop the majority of common checkout fraud before it costs you money. We’ll cover the tools, the logic, and when it makes financial sense to implement each layer.

The Indie Store Fraud Landscape in 2026

For a small WooCommerce store, fraud isn’t one single problem. It’s a collection of different attacks, each with its own pattern. If you’re using Stripe, you already have Stripe Radar, which is a good baseline. But determined fraudsters know how to work around it. Understanding the three most common types of fraud is the first step to building a better defense.

  • Card Testing (or “Carding”): This is the most common nuisance. Fraudsters buy lists of thousands of stolen credit card numbers on the dark web. They don’t know which ones are still active. So, they use bots to “test” the cards by making small purchases on hundreds of websites simultaneously. Your site is just one of many. They look for stores with low-priced items and weak security. The goal isn’t to get your product; it’s to find a valid card they can use for a much larger purchase elsewhere. For you, this means a flood of failed transactions, a handful of successful ones you’ll have to refund, and potential penalties from your payment gateway.
  • Reseller Fraud: This is more targeted. A fraudster uses a stolen card to buy a high-demand physical product from your store (e.g., a limited-edition pair of sneakers, a specific electronic component). They have the item shipped to a “mule” or a freight forwarder. They then sell your product on a marketplace like eBay or StockX for cash. Weeks later, the legitimate cardholder discovers the charge, initiates a chargeback, and you’re out the product and the money.
  • Refund Abuse (or “Friendly Fraud”): This one feels personal. A legitimate customer buys a product, receives it, and then falsely claims it never arrived, was defective, or that the charge was unauthorized. They file a chargeback to get their money back, effectively getting your product for free. This is especially common with digital goods where “delivery” is hard to prove, or with services where satisfaction is subjective.

Layer 1: Challenge the Bots at the Gate

Most low-level fraud, especially card testing, is automated. The first line of defense is to make it difficult for bots to even access your checkout page. A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is the standard tool for this. But not all CAPTCHAs are created equal, and a bad user experience can cost you legitimate sales.

Here’s how the main contenders stack up for a WooCommerce checkout page in 2026.

| Tool | How It Works | User Experience | Cost | Honest Limitations |
| --- | --- | --- | --- | --- |
| Cloudflare Turnstile | Analyzes browser telemetry and user behavior without a visual puzzle. It runs a quick, non-interactive check. | Excellent. It’s invisible to most legitimate users. A loading spinner might appear for a second on high-risk connections. | Free for most use cases. | It’s a bot challenge, not a fraud analysis tool. It won’t stop a determined human using a stolen card. It only tells you if the visitor is likely a human. |
| Google reCAPTCHA v3 | Runs in the background, analyzing user behavior across the site to generate a risk score (0.0 to 1.0). | Good. It’s also invisible. You decide what to do with the score (e.g., block orders with a score below 0.3). | Free for up to 1 million calls/month. | The “black box” nature of the scoring can be frustrating. It sometimes gives low scores to legitimate users on VPNs or with privacy-focused browsers. It also sends a lot of data to Google, which is a privacy concern for some. |
| hCaptcha | Often presents a visual puzzle (e.g., “click the boats”). It has a “passive” mode similar to Turnstile, but its main differentiator is the puzzle. | Poor to Fair. The visual puzzles are a known conversion killer. They introduce friction and frustration right at the point of purchase. | Free tier is available, but paid tiers offer more control and less complex puzzles. | The free version can present users with difficult or annoying puzzles, leading to checkout abandonment. It’s generally overkill for checkout protection unless you are under a sustained, heavy bot attack. |

Our recommendation: Start with Cloudflare Turnstile. It provides 80% of the benefit of a bot challenge with almost zero impact on legitimate customer conversions. It’s a simple, free, and effective first layer.

Layer 2: Basic Input Validation

Fraudsters are lazy. Their scripts often use nonsensical or disposable data. You can catch a surprising amount of fraud by simply checking if the information entered looks like it belongs to a real person.

Email Address Validation

Don’t just check if the email has an “@” symbol. Check for:

  • Disposable Domains: Services like mailinator.com or temp-mail.org are a huge red flag. A simple check against a public list of disposable domains can block many low-effort fraud attempts. The disposable-email-domains list on GitHub is a good resource.
  • Syntax and MX Records: A valid email address must have a real domain with mail exchange (MX) records. You can use a free API to verify this at checkout. This stops typos and gibberish like asdf@asdf.asdf.
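
As a rough illustration of the first two checks, here is a sketch that needs no network access. The two-entry domain set is a stand-in for the full disposable-email-domains list, and the function name and return labels are made up for this example. It deliberately does not do the MX lookup, which needs a DNS query or an external API.

```python
import re

# Tiny illustrative sample; a real check would load the full disposable-email-domains list.
DISPOSABLE_DOMAINS = {"mailinator.com", "temp-mail.org"}

# Deliberately loose syntax check: one "@", no spaces, a dot in the domain, 2+ letter TLD.
EMAIL_RE = re.compile(r"^[^@\s]+@([^@\s]+\.[A-Za-z]{2,})$")


def check_email(address):
    """Return 'invalid_syntax', 'disposable_domain', or 'ok' for a checkout email."""
    match = EMAIL_RE.match(address.strip())
    if not match:
        return "invalid_syntax"
    domain = match.group(1).lower()
    if domain in DISPOSABLE_DOMAINS:
        return "disposable_domain"
    return "ok"
```

Note that asdf@asdf.asdf passes this syntax check, since it is well-formed. Catching it is exactly what the MX record lookup is for.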

Phone Number Validation

A phone number can be a strong indicator of legitimacy. Check if the number provided is valid for the country listed in the billing address. A US address with a phone number that has a Nigerian country code is suspicious. Services like Twilio’s Lookup API (paid) or free libraries can help with formatting and validation.

Address Validation (AVS)

Your payment processor already does this. Address Verification System (AVS) checks if the numeric parts of the billing address (street number and ZIP code) match the information on file with the card issuer. Make sure you have AVS enabled in your payment gateway settings and that you are configured to decline transactions that return a hard “no match.”

Layer 3: BIN/IIN and Country Mismatch

This is a classic, highly effective check. The first 6-8 digits of a credit card are the Bank Identification Number (BIN) or Issuer Identification Number (IIN). This number tells you which bank issued the card and in what country.

The logic is simple: Does the card’s issuing country match the customer’s IP address country and/or the billing address country?

A fraudster in Vietnam using a stolen card from a bank in Ohio is a common scenario. A simple check reveals this mismatch:

  • Card BIN: United States
  • Customer IP Address: Vietnam

This is a major red flag. While there are legitimate reasons for this (e.g., a US citizen traveling abroad), it’s a powerful signal for high-risk orders. You can use a free online tool like BIN List to look up BINs manually, or integrate their API (or a similar service) for automated checks.

Most dedicated anti-fraud plugins for WooCommerce perform this check automatically.
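
If you are wiring this up yourself, the comparison itself is small. The sketch below assumes you already have three ISO country codes from a BIN lookup, an IP geolocation service, and the billing form; the function name and flag labels are hypothetical, and the lookups themselves are out of scope.

```python
def country_mismatch_flags(card_country, ip_country, billing_country):
    """Compare ISO country codes and return a list of red flags (empty means consistent)."""
    def norm(code):
        return code.strip().upper() if code else None

    card, ip, billing = norm(card_country), norm(ip_country), norm(billing_country)
    flags = []
    # Skip a comparison when either side is unknown rather than guessing
    if card and ip and card != ip:
        flags.append("bin_ip_mismatch")
    if card and billing and card != billing:
        flags.append("bin_billing_mismatch")
    return flags
```

Returning flags instead of a yes/no decision keeps the traveling-customer case open: a single mismatch can feed a review queue rather than an outright block.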

Layer 4: Smart Velocity Rules

Velocity rules limit how many times a certain action can be performed in a given timeframe. This is your primary weapon against card testing bots. Generic advice is to “use velocity rules,” but which ones actually work?

Here are some production-tested rules to implement either in a security plugin or with your developer:

  • Block IP after 5 failed payment attempts in 1 hour. A real customer might mistype their CVC once or twice. A bot will try dozens of cards from the same IP address.
  • Flag order for review if 1 IP address uses more than 3 different credit cards in 24 hours. This is a classic sign of card testing.
  • Flag order for review if 1 email address is associated with more than 3 different credit cards in its lifetime. Similar to the above, but catches fraudsters who switch IPs.
  • Flag order for review if there are more than 3 orders to the same shipping address with different billing addresses/cards in a week. This helps catch reseller fraud using mules.

The key is to set thresholds that stop bots without inconveniencing legitimate customers. These numbers are a good starting point; you can adjust them based on your store’s specific traffic patterns.
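
For a developer, the first rule above is a sliding-window counter. The sketch below is a minimal in-memory version, assuming a single process and timestamps in seconds; the class name is hypothetical, and a real store would keep the counters in a shared cache or database. The "different cards" rules need a count of distinct values rather than events, which this sketch does not cover.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Trips when one key reaches `limit` events within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def record(self, key: str, now: float) -> bool:
        """Record one event for `key`; return True if the rule has tripped."""
        timestamps = self._events[key]
        timestamps.append(now)
        # Drop events that have aged out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps) >= self.limit


# "Block IP after 5 failed payment attempts in 1 hour."
failed_payments = VelocityRule(limit=5, window_seconds=3600)
```

Call `failed_payments.record(ip_address, time.time())` on every failed payment and block the IP when it returns `True`.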

Layer 5: The 14-Day Hold for High-Risk Orders

Sometimes, an order isn’t obviously fraudulent, but it has multiple red flags. Maybe it’s a large order from a new customer, with a BIN/IP mismatch, shipping to a freight forwarder. Auto-blocking it might cost you a good sale. Allowing it might cost you a $1,000 chargeback.

The solution is an admin queue and a holding period.

Instead of processing the order immediately, you can programmatically place it in a special “On Hold for Review” status in WooCommerce. This does two things:

  1. It gives you, the store owner, time to manually review the order details. You can Google the address, check the customer’s email or social media, or even send a polite email asking for confirmation.
  2. It delays fulfillment. For physical goods, you don’t ship. For digital goods, you don’t grant access. A typical holding period is 14 days. This is often long enough for the legitimate cardholder to notice the fraud and report it, so the dispute or reversal arrives before you’ve lost any product.

This manual step is a core part of a robust defense. It’s the human check that catches what the algorithms miss. This is a central feature in our own GuardLabs Anti-Fraud service, as we’ve found it to be one of the most effective ways to prevent high-value losses.

Layer 6: Getting More out of Stripe Radar

If you use Stripe, you have Radar. For many, it’s a “set it and forget it” tool. But its real value for an established store lies in custom rules. Go to your Stripe Dashboard -> Radar -> Rules to start.

You can essentially replicate many of the checks mentioned above directly within Stripe. This is powerful because Stripe has access to data from its entire network. Here are three custom rules you should add today:

  1. Block payments where the card’s issuing country doesn’t match the IP address country and the order total is over $100.

    Rule: Block if :card_country: != :ip_country: AND :amount_in_usd: > 100

    This is the BIN/IP mismatch check. We add a value threshold to avoid blocking small, legitimate purchases from travelers.

  2. Place payments in review if the shipping address is a known freight forwarder and it’s the customer’s first transaction.

    Rule: Request manual review if :is_freight_forwarder_shipping: AND :card_past_transfers_count: == 0

    Stripe can identify many freight forwarders. This rule flags these orders for your review, which is crucial for preventing reseller fraud.

  3. Block payments from disposable email addresses.

    Stripe doesn’t have a simple rule primitive for this, but you can build a block list. Go to Radar -> Lists and create a new list of “email domains to block.” Populate it with common disposable domains (mailinator.com, 10minutemail.com, etc.). Then, create a rule:

    Rule: Block if :email_domain: in @disposable_domains

Stripe Radar is a solid tool, but it’s not a complete solution. It works best when combined with on-site checks (like a bot challenge) and a clear process for handling flagged orders.

The Decision Tree: Block, Review, or Allow?

With all these layers, you need a clear system for making decisions. A simple risk score can help. Assign points for risky attributes and then act based on the total score.

Here’s a sample scoring system:

  • BIN country != IP country: +40 points
  • Email is from a disposable domain: +30 points
  • Shipping address is a known freight forwarder: +20 points
  • IP address is a known proxy or VPN: +15 points
  • Order value > $500 (or 3x your average): +10 points
  • More than 3 failed payments from IP in last hour: +50 points

Then, create your decision tree:

  • Score 70+: Auto-Block. The probability of fraud is too high. Block the transaction and, if possible, the IP address.
  • Score 30-69: Send to Manual Review. Place the order on hold. Delay fulfillment. Investigate the details. This is where the 14-day hold is your best friend.
  • Score 0-29: Auto-Allow. The order appears low-risk. Process it as normal.

A good WooCommerce anti-fraud plugin will do this scoring for you. If you’re building your own system, this logic is a solid foundation.
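
If you are building it yourself, the scoring table and decision tree above translate almost line for line into code. The sketch below uses the sample points and thresholds from this section; the signal names are hypothetical labels, and how each signal is detected is left out.

```python
from typing import Iterable

# Points per risk signal, copied from the sample scoring system above.
RISK_POINTS = {
    "bin_ip_mismatch": 40,
    "disposable_email": 30,
    "freight_forwarder": 20,
    "proxy_or_vpn": 15,
    "high_order_value": 10,
    "failed_payment_velocity": 50,
}


def risk_score(signals: Iterable[str]) -> int:
    """Sum the points for every distinct signal; unknown names raise KeyError."""
    return sum(RISK_POINTS[name] for name in set(signals))


def decide(score: int) -> str:
    """Apply the decision tree: 70+ block, 30-69 review, 0-29 allow."""
    if score >= 70:
        return "block"
    if score >= 30:
        return "review"
    return "allow"
```

For example, a BIN/IP mismatch plus a disposable email scores 70 and is blocked, while a freight forwarder plus a high order value scores 30 and lands in manual review.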

Cost vs. Benefit: When Does Each Layer Pay Off?

Implementing every layer might be overkill if you’re just starting out. Here’s a pragmatic guide to when each defense becomes worth the time or money, based on your Gross Merchandise Volume (GMV).

  • Under $5,000/month GMV: Your fraud losses are likely low.

    • What to do: Enable Stripe Radar’s default settings. Add the custom rules mentioned above (free). Install Cloudflare Turnstile on your checkout (free). This is your basic, no-cost setup.
  • $5,000 – $20,000/month GMV: You’re probably losing $100-$500/month to fraud and chargeback fees. It’s starting to hurt.

    • What to do: Add a dedicated anti-fraud plugin. This is where a service like the WooCommerce Anti-Fraud plugin or our own GuardLabs Anti-Fraud ($79/year) becomes a clear win. The cost is less than a few chargeback fees. These tools automate the BIN checks, velocity rules, and risk scoring.
  • $20,000 – $100,000/month GMV: Fraud is now a significant cost center. A 1% fraud rate could mean up to $1,000 in monthly losses, not including lost inventory.

    • What to do: Your system needs to be robust. You need all the automated checks, plus the manual review queue for high-risk orders. This is the sweet spot for a comprehensive solution that combines automated blocking with a manual hold-and-review process. You might also consider a paid service like IPQualityScore for more advanced proxy/VPN detection if you see a lot of sophisticated attacks.
  • Over $100,000/month GMV: At this scale, even a 0.5% fraud rate costs at least $500 a month ($6,000 a year), and it becomes a five-figure annual problem once GMV passes roughly $170,000/month.

    • What to do: You need everything discussed here, and you likely have enough transaction volume to justify the cost of more advanced tools and potentially a part-time staff member dedicated to reviewing flagged orders. Your Website Care plan should include proactive monitoring of these systems.

Fighting checkout fraud isn’t about finding one magic bullet. It’s about building a series of layered, logical defenses that make your store a less attractive target than the one next door. By starting with free tools like Cloudflare Turnstile and Stripe Radar’s custom rules, and then adding more sophisticated checks as your store grows, you can significantly reduce your losses without frustrating legitimate customers or paying for enterprise software you don’t need.

If you’re tired of manually canceling bogus orders and want a system that implements most of these layers—from a non-annoying bot challenge to automated risk scoring and a manual review queue—out of the box, take a look at our service. The GuardLabs Anti-Fraud stack was built for small- to medium-sized WooCommerce stores facing exactly these problems, starting at $79/year.

Originally published at guardlabs.online. More tooling for indie builders & small agencies — guardlabs.online.

I was a half-builder

I have thirteen public repositories on GitHub.

Three of them are real products.

The rest are half-shipped: interesting starts, side-quests, idea-shaped objects with a README and a pushed_at date and not much past it. Universal-codemode: clean idea, two demos, no users I can name. Vasted: works on my machine, never advertised, never used by anyone who isn’t me. Smart-spawn: model router, never wired into anything I run daily. Mcclaw: Mac LLM checker, fun side build, abandoned at v0.2. Moltedin: a marketplace I sketched and walked away from. Lobster-tools. Tldr-club. Clawbot-blog.

I built fast. I shipped half. I posted screenshots.

That’s the dominant mode on AI-builder X right now and I want to write the post about it as someone caught inside it, not above it.

The Builder.ai version

The loud version of this is Builder.ai.

The pitch was an AI named Natasha that built apps from a single sentence. Microsoft believed it. SoftBank’s DeepCore believed it. The Qatar Investment Authority believed it. About $450M of capital believed it.

Behind the AI: 700 human engineers in India and Eastern Europe.

By 2024 the investigations had landed. Bloomberg. WSJ. The Information. By May 2025 the company was filing for insolvency, Microsoft and the creditors were inside the building, and “Builder.ai” had become culture-wide shorthand for AI-washing. Strap “AI” to a labor product, raise nine figures, ride the cycle until the cycle catches up.

That’s the loud version of the pattern.

[Image: Curtain pulled back on the AI]

The quiet version is on your X feed every day, and it’s not committing fraud. It’s people shipping the half they can ship and calling it the whole. That’s what I’ve been doing.

What a half-builder actually is

Tighter than “doesn’t ship”:

A half-builder is an operator who can do exactly one half of design-to-deploy, then skips the other half by simply not showing it. They post the artifact for their good half. The bad half is implied to exist. It usually doesn’t.

There are three failure modes and I’ve personally lived all three.

The designer who can’t code. Posts the Figma. Posts the AI-generated mock. Posts the screenshot, the concept, the “what if I built this?” thread. Never posts the running URL. The “build” is a frame around an image. I did this for years before I learned to ship.

The coder who can’t design. Posts the diff. Posts the gist. Posts the prompt. The thing technically runs but you wouldn’t keep it open for more than a session. The interface is a textarea and a <details> tag in Helvetica. I’ve published a few of these too. I called them “tools.”

The either who can’t ship. The most common failure mode by an order of magnitude. They can do their half competently. They can’t deploy it, can’t keep it up, can’t onboard a single user, can’t reach week two. Six demos a month. Zero products. The artifact dies in a screenshot.

The third failure mode is the one I’ve spent the most time in. I’d build a thing in a weekend, push it to a public repo, post a screenshot, get a few likes, and move to the next thing on Monday. I called that “shipping.” It wasn’t. It was sketching in public.

In all three modes the AI is real. The thing posted is real. Something got built. What didn’t happen was building the whole thing. The half that wasn’t shown was fake, missing, on someone else’s calendar, or a TODO that never got picked up again.

That’s a half-builder.

Why half-building is the default

It’s not a personal failure. It’s the structure of the industry for twenty years.

Design and engineering have been culturally separated since the early-2000s web. You picked a side at 22. The side trained you. Designers learned visual systems, components, motion, brand. Engineers learned data structures, infra, deployment, latency budgets. The handoff was the deliverable. Each side optimized for being good at their half, because their half was the whole job.

AI is collapsing that gap.

Every tool that closes the design-to-code distance (Figma-to-code generators, coding assistants, no-code with escape hatches, full-stack agents) pays out to operators who hold both sides in one head. The premium isn’t on either half anymore. It’s on the seam.

Twenty years of single-side specialization don’t unwind in a hype cycle.

So the dominant cohort on AI-builder X is exactly who you’d expect. People whose career was built around being competent at one half. Learning AI in real time. Posting the half they can already do. Hoping the AI bridges the rest.

Sometimes it does. Most of the time it doesn’t. The shipped product never appears. The next thread does.

I’ve been on this side of the timeline for years. Designers who became “builders” the day GPT-4 dropped. Engineers who became “AI engineers” the day Cursor got good. I’m one of them. The honest answer is that AI made it embarrassingly easy to look like a whole-builder while staying a half-builder underneath.

Builder.ai was that, with a $450M check on top.

What I’ve actually shipped (and what I’ve half-shipped)

Here’s the honest receipts list. Not the highlight reel.

Real products people use:

  • Dory. Shared memory layer for AI agents. Local-first, markdown source of truth, CLI / HTTP / MCP native. Open-source on GitHub, has actual users, gets actual issues filed. This is the only one I’d call run-grade.
  • deeflect.com. Personal site, in production, anchors my entity online.
  • blog.deeflect.com. Thirty-one published articles. Some of them are good. Not all of them are from this year; that was overstated in earlier drafts of this essay.
  • dee.agency. Solo studio site, productized AI work.
  • Don’t Replace Me. Survival book on the AI apocalypse, paperback, hardcover, Kindle, on Amazon. Written end-to-end. People are reading it.
  • The SEO-to-GEO Gap. First research paper, accepted and posted on SSRN this month with a real DOI. First peer-review-adjacent credential I’ve ever earned.

Half-shipped:

  • ViBE. Twitter-based reception benchmark across 22 frontier AI model families, 2,965 judged mentions, $1.92 in judge cost. I love the writeup. I keep pitching the writeup. The benchmark itself is dogshit as a continuous product. It’s a one-shot artifact, not a living thing, and treating it like a flagship was me confusing “interesting research” for “shipped product.”
  • Universal-codemode. Two tools that replace hundreds. Clever. Not used.
  • Vasted. GPU-inference one-liner. Works. Unadopted.
  • Smart-spawn. Model router. Demo grade.
  • Castkit. CLI demo recorder in Rust. Cute. Sat down.
  • Mcclaw. Mac-LLM checker. Fun. Abandoned.
  • Moltedin / lobster-tools / tldr-club / clawbot-blog. Different shapes, same pattern. Started, posted, walked away.

The actual range underneath all of it:

Fifteen years of design. A bachelor’s in cybersecurity. Firmware on ESP32 and marauder builds when the topic shifts. Designed for VALK across 70-plus financial institutions and 15 countries before walking out of that role earlier this year. Russian-born, lived across five-plus countries. ADHD wired enough to learn shit in a week and bored enough to walk away from it in a month.

The range is real. The shipping discipline isn’t there yet.

In October 2025 I burned out and quit X for six months from a 200K-impressions-a-day peak. I’m reactivating from 640 followers as I write this. The list above is what got built around the crash year: three real products, a book, a paper, a personal entity I can point to, and a graveyard of clever half-things.

That’s the honest picture. I’m a recovering half-builder.

The opposite cohort

The opposite of a half-builder is a whole-builder.

A whole-builder is one operator who covers design + code + AI + deploy + distribution end-to-end with no handoff. They pick fewer fights. They keep the artifacts alive past launch week. They have repos with users in the issue tracker, not just stars in the corner.

Pieter Levels is the canonical example. Design, code, deploy, distribute, monetize, all solo, all in public, receipts measured in MRR and screenshots. Marc Lou ships products with full visual identity attached. Theo runs an entire product line out of what he can hold in one head.

These aren’t unicorns. They’re the rarer category: operators who didn’t pick a side and built their working pattern around not having a handoff. They’re also the operators who said no to the next side-quest and kept the last one running.

I’ve copied the breadth half of that pattern. I haven’t copied the discipline half. Whole-building isn’t about doing more. It’s about doing fewer things further. That part I’m still learning.

How to spot a half-builder (mirror included)

Most “AI builders” on the internet right now are half-builders, and most of us know which side we’re on if we’re honest about it.

The test is mechanical. It costs nothing. Run it on every “AI builder” account in your timeline this week, and on yourself.

Ask for the running URL. Not the prompt. Not the screenshot. Not the demo video. The URL someone else can open right now, on their phone, with no auth, no waitlist. If they can’t produce one, you’re talking to half a builder.

Ask for the repo. Public repo, last commit recent enough to matter, an issue tracker that isn’t a ghost town. If “the code is private”, fine. Ask for the deployed product. If neither exists, you have your answer.

Ask what they shipped this month. Not last year. Not “in their career.” This month. Half-builders ship demos. Whole-builders ship products that someone else is using on a Tuesday morning.

If you ran that on me a month ago, you’d hear about ViBE and a clever Rust thing and a model router and a half-finished benchmark and a launch I almost did. You’d hear about everything except a product someone else opened on a Tuesday. The honest answer would have been Dory, and maybe the blog, and the rest is noise.

Show the repo or sit down, including the one I’m pointing back at when I write that.

[Image: Bouncer at the door asking for the running URL]

Stopping

The exit from being a half-builder is mechanical, not mystical.

Pick the half you can’t do and start doing it badly until you can do it. Designers shipping their first deploy. Coders learning visual hierarchy. Either learning distribution. The half you can’t do isn’t a personality. It’s a backlog.

Pick fewer things. Keep them alive past the first week. Treat “shipped” as “someone else used it on a Tuesday,” not “pushed to GitHub on a Sunday.”

Whole-building is a slow accumulation of the second half by the first, until the seam disappears. None of that happens in a single weekend.

This essay is the first move. The next moves are: Dory gets the maintenance it deserves. ViBE either becomes a continuously-updating thing or gets retired honestly as a one-shot paper, not pretended into a flagship. The agency stops being a placeholder. The next side-quest waits its turn, or doesn’t get started.

I’m writing this with the same uncertainty most of you feel scrolling past it. Am I the half-builder? Probably. What does the turn look like? Like this.

Build the whole thing.

Ship the running URL.

Show the repo.

Or sit down, including me.

That’s the post.

Sources for the Builder.ai facts: Bloomberg’s investigation into the company’s engineering operations (2024), the Wall Street Journal’s coverage of the May 2025 insolvency, and The Information’s reporting on the human-engineer back-end. Public, well-indexed; current URLs available via search.

I Built a Free Firefox New Tab Extension with Live Weather and World Clocks

I spent a few weekends building a Firefox browser extension because I was tired of my new tab page doing absolutely nothing useful.

The result: Weather & Clock Dashboard — a replacement new tab that shows live weather, a 3-day forecast, and clocks for any cities you care about.

What it does

  • Live weather: Current conditions with temperature, humidity, and feels-like for your location
  • 3-day forecast: See what’s coming so you can actually plan your day
  • World clocks: Multiple cities displayed in real time — great for remote teams across time zones
  • Search bar: Quick search without switching tabs
  • Dark/light mode: Respects your preference, toggles with one click

Why I built it

I was using Firefox’s default new tab (tiles of recent sites). It told me nothing useful at a glance.

I wanted something that answered “should I bring an umbrella?” and “is my colleague in London even awake yet?” in under a second, without switching apps.

The tech (refreshingly simple)

  • Pure HTML, CSS, and vanilla JavaScript — no framework, no npm, no webpack
  • Uses Open-Meteo for weather (free API, no key required)
  • All data stays local — no servers, no accounts, no tracking
  • MIT licensed and fully open source

The entire extension is about 300 lines of JavaScript. Sometimes the best solution is the simplest one.
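
To show how little is involved, here is a sketch of the kind of Open-Meteo request a dashboard like this needs. It is written in Python for illustration rather than taken from the extension's JavaScript, and the exact variables requested are an assumption; the endpoint and parameter names are Open-Meteo's public forecast API.

```python
from urllib.parse import urlencode

OPEN_METEO_FORECAST = "https://api.open-meteo.com/v1/forecast"


def forecast_url(latitude: float, longitude: float) -> str:
    """Build a keyless request for current conditions plus a 3-day forecast."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        # Temperature, humidity, and feels-like for "right now".
        "current": "temperature_2m,relative_humidity_2m,apparent_temperature",
        # Daily highs and lows for the forecast cards.
        "daily": "temperature_2m_max,temperature_2m_min",
        "forecast_days": 3,
        "timezone": "auto",
    }
    # Keep commas readable instead of percent-encoding them.
    return f"{OPEN_METEO_FORECAST}?{urlencode(params, safe=',')}"


london_url = forecast_url(51.5, -0.12)
```

Fetching that URL returns JSON with `current` and `daily` objects, which is all a new tab page needs to render.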

Install it

→ Weather & Clock Dashboard on Firefox Add-ons

Free, takes 10 seconds to install, no account required.

Also: Quick Calculator

I also published Quick Calculator & Unit Converter — a sidebar calculator that handles unit conversions (km ↔ miles, Celsius ↔ Fahrenheit, etc.). Same approach: useful, fast, zero setup.

Happy to take questions or feedback. What does your current new tab setup look like?

The MPS 2026.1 Early Access Program Has Started

The MPS 2026.1 Early Access Program (EAP) is kicking off today. Download the first 2026.1 EAP release and give it a try!

DOWNLOAD MPS 2026.1 EAP

Along with numerous bug fixes, this build introduces several key improvements.

Migration to IntelliJ Platform 2026.1, JDK 25, and Kotlin 2.3

This MPS 2026.1 EAP build completes the jump to the current generation of the IntelliJ Platform. The runtime is JDK 25, and the embedded Kotlin version is 2.3.0. Additionally, MPS now builds and ships its own kotlinx-metadata-klib / kotlin-metadata-jvm artifacts from the Kotlin repository at the matching 2.3.0 tag, restoring the KLib-based Kotlin stubs support that the last public kotlinx-metadata-klib:0.0.6 could no longer provide.

Ability to check ICheckedNamePolicy against specific natural languages

MPS now uses the IntelliJ Platform’s natural language support, provided by Grazie. This means you can check whether string values in instances of ICheckedNamePolicy, such as intentions, actions, or tools, have proper capitalization according to the rules of a specific natural language.

[Figure: An incorrectly capitalized text caption]

Thanks to this change, you can install natural language support for select languages into MPS, and the IDE will detect the language used in strings and verify that individual words are capitalized correctly. You can also bypass the language detection mechanism and specify your desired language explicitly.

In addition to the default Title-case capitalization rules, MPS offers three other options:

  • Sentence-case, which follows the IntelliJ Platform’s rules
  • Inherited, which uses the capitalization rules of the closest ancestor ICheckedNamePolicy
  • No capitalization rules

Binary operations can be split into multiple lines

In the editor, you can now split long lines with binary operations. A dedicated intention action lets you toggle between the single-line and multi-line layouts for a given BinaryOperation.

[Figure: A long binary expression split on several lines]

New boolean editor style: read-only-inspector

The new read-only-inspector style enforces the read-only property on all editor cells in the inspector. When this style is applied to a cell in the main editor, the inspector becomes read-only for the inspected node when the cell with this style is selected. The new style has the following properties:

  • It is disabled by default.
  • The style is inheritable and overridable, just like the read-only style.
  • It has no effect on main editor cells.
  • The read-only style set by this mechanism can be overridden in any cell farther down the inspector editor cell tree.

Transitive dependencies in Build Language

Build Language no longer requires every transitively-reachable build script to be listed in dependencies. This means that a build script, BuildA, that depends on BuildB can now reach BuildC through BuildB (provided that BuildB depends on BuildC) without having to list BuildC explicitly. The generator emits ${artifacts.BuildC} Ant properties for such cases, and these properties can be supplied from the outer build tool (Gradle, Maven, etc.).

This lets you split large builds into smaller ones without forcing every user to update the dependency lists. For example, a single platform build script can wrap a growing set of external libraries used across sub-projects.
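
The idea of reaching BuildC through BuildB can be pictured as a plain reachability walk over declared dependencies. The sketch below is only an illustrative model in Python, not how the MPS generator is implemented; the function name and the dictionary shape are assumptions.

```python
from typing import Dict, List, Set


def reachable_builds(start: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return every build script reachable from `start` via declared dependencies."""
    seen: Set[str] = set()
    stack = list(deps.get(start, []))
    while stack:
        build = stack.pop()
        if build not in seen:
            seen.add(build)
            # Follow this build's own dependencies as well.
            stack.extend(deps.get(build, []))
    return seen


# BuildA lists only BuildB; BuildB lists BuildC.
declared = {"BuildA": ["BuildB"], "BuildB": ["BuildC"]}
visible_from_a = reachable_builds("BuildA", declared)
```

Here `visible_from_a` contains both `BuildB` and `BuildC`, even though BuildA never lists BuildC itself.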

More reliable migrations via recorded dependencies

Migration code previously decided which migrations to apply based on the actual module dependencies and used languages collected at migration time, but it would read versions from the dependency snapshot recorded in the module descriptor. That mismatch could cause migrations to use a different view of the world than the one the module was last modified against.

In this 2026.1 EAP build, the migration machinery consistently uses the dependencies and used languages recorded in the module descriptor at the moment of last modification, not the currently observable state. The migration checker was refactored accordingly. It now reuses information already collected for the migration process instead of recomputing it on demand.

Improved Java stubs

A cluster of long-standing Java-stubs bugs has been fixed, visibly improving the accuracy of BaseLanguage stubs produced for imported .jar files and Java Sources model roots:

  • MPS-33174 (open since 2021) – Classes with InnerClasses attributes are now correctly transformed to BaseLanguage stubs. The signature’s inner-class information and parameterized owner types are preserved, so fields and methods of inner classes of generic outer classes now show the proper type instead of collapsing to the outer class.
  • MPS-39375 – Type variables in generic methods of inner classes are now handled, so methods referencing type variables of the outer class no longer show java.lang.Object in place of the real type variable.
  • MPS-39007 – The spurious Java imports annotation is present error no longer appears on every root of a Java source stub model.
  • MPS-39565 – Java source stub roots no longer disappear on changes to the containing module’s properties, so references from project code to those roots stay intact when module properties are changed.

Modernized project lifecycle

With MPSProject having moved from a legacy IntelliJ IDEA ProjectComponent to a project service, MPS-aware features need a reliable way to be notified about MPSProject becoming available and going away.

This build introduces a dedicated mechanism for managing MPSProject startup and shutdown activities, giving MPS control over the sequencing, grouping, ordering, and threading of those activities. This was something the platform’s ProjectActivity / MPSProjectActivity could not offer.

How it works: Implementors register against the jetbrains.mps.project.lifecycleListener extension point (declared in MPSCore.xml) via a ProjectLifecycleListener.Bean with a listenerClass and an optional integer priority. The LifecycleEventDispatch.java inside MPSProject can fire:

  • projectReady (non-blocking)
  • projectDiscarded (blocking)
  • asyncProjectClosed (non-blocking)

Wayland by default

MPS now offers Wayland as the default display protocol on supported Linux systems. When running in a Wayland-capable environment, MPS automatically switches to a native Wayland backend instead of relying on X11 compatibility layers, bringing it in line with modern Linux desktop standards.

This transition improves overall integration with the system, providing better stability across Wayland compositors, proper support for input methods and drag-and-drop, and more consistent rendering – especially on HiDPI and fractional scaling setups. While the user experience remains largely familiar, some differences (such as window positioning or decorations) may be noticeable due to Wayland’s architecture. X11 is still fully supported and can be used as a fallback when needed, ensuring compatibility across all Linux environments.

You can review the complete list of fixed issues here.

Your JetBrains MPS team

Docker 27.0 vs Podman 5.0 for Rootless Containers: 500 Enterprise Adoption Survey Finds 27% Fewer Security Vulnerabilities

A new comprehensive survey of 500 enterprise IT and DevOps teams sheds light on the security and adoption trends for rootless container runtimes, with Podman 5.0 outperforming Docker 27.0 in vulnerability reduction by a significant margin.

Key Survey Methodology and Findings

The 2024 Enterprise Container Security Survey polled 500 organizations across North America, Europe, and Asia-Pacific, with 78% of respondents running production workloads in rootless mode. The core finding: environments using Podman 5.0 for rootless containers reported 27% fewer critical and high-severity security vulnerabilities over a 12-month period compared to peers using Docker 27.0.

Additional findings include:

  • 62% of Podman 5.0 adopters cited built-in rootless support as their primary selection criterion, versus 41% for Docker 27.0 users.
  • Podman 5.0 users reported 19% faster mean time to patch (MTTP) for container runtime vulnerabilities.
  • Docker 27.0 retained higher overall market share (58% vs 32% for Podman) but trailed in rootless-specific satisfaction scores (4.1/5 vs 4.7/5 for Podman).

What Are Rootless Containers?

Rootless containers run without elevated root privileges on the host system, using user namespaces to map container UIDs/GIDs to unprivileged host users. This sharply reduces the risk that a container breakout grants full root access to the host, a long-standing concern for privileged container deployments. Both Docker and Podman support rootless mode, but their implementations differ fundamentally.
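
To make the UID mapping concrete, the sketch below translates a container UID to the host UID under a typical rootless layout: container root maps to the invoking user, and container UIDs 1 and up map into that user's subordinate range (an `/etc/subuid` entry such as `alice:100000:65536`). The function name is hypothetical, and real mappings can be customized.

```python
def host_uid(container_uid: int, user_uid: int,
             subuid_start: int, subuid_count: int) -> int:
    """Translate a container UID to the host UID it actually runs as."""
    if container_uid == 0:
        return user_uid  # "root" in the container is just the invoking user
    if 1 <= container_uid <= subuid_count:
        return subuid_start + container_uid - 1
    raise ValueError("UID is outside the mapped range")


# User alice (UID 1000) with subordinate range 100000-165535.
root_on_host = host_uid(0, 1000, 100000, 65536)
www_data_on_host = host_uid(33, 1000, 100000, 65536)
```

A process that escapes the container as its "root" therefore lands on the host as UID 1000, not as UID 0.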

Docker 27.0 Rootless Implementation

Docker 27.0 introduced improved rootless mode stability, building on the experimental rootless support added in Docker 19.03. It relies on the rootlesskit utility to set up user namespaces and manage network interfaces, with support for overlay2 and vfs storage drivers in rootless mode. Key limitations noted in the survey include:

  • Dependency on external tools like slirp4netns for network isolation, which introduces minor performance overhead.
  • Limited support for privileged container operations in rootless mode, requiring workarounds for legacy workloads.
  • Docker daemon still runs as a background process, creating a larger attack surface than Podman’s daemonless architecture.

Podman 5.0 Rootless Implementation

Podman was designed from its inception as a daemonless, rootless-first container engine. Podman 5.0 refines its rootless capabilities with improved user namespace handling and native support for rootless overlay storage without third-party utilities. Survey respondents highlighted these advantages:

  • Daemonless architecture eliminates a single point of failure and reduces attack surface, as no privileged process runs persistently.
  • Native integration with systemd for rootless container management, simplifying automation for enterprise workloads.
  • Full compatibility with Docker CLI commands, reducing migration friction for teams switching from Docker.

Why the 27% Vulnerability Gap?

Security researchers and survey respondents pointed to three core factors driving Podman 5.0’s lower vulnerability rate:

  1. Daemonless Design: Docker keeps a persistent daemon running. It runs as root by default, and in rootless mode it runs as an unprivileged user inside a user namespace but is still a long-lived process that every client talks to. Podman starts each container directly as the unprivileged user who launched it, which removes that shared process as an attack vector.
  2. Fewer Dependencies: Podman 5.0’s rootless mode still relies on helpers such as newuidmap/newgidmap and a user-mode network stack (pasta by default in 5.0, with slirp4netns as an alternative), but it does not need rootlesskit or a daemon. Docker 27.0 relies on rootlesskit, slirp4netns, and other third-party tools that have historically had their own vulnerabilities.
  3. Stricter Default Policies: Podman 5.0 enforces stricter default seccomp and AppArmor profiles for rootless containers, while Docker 27.0’s default policies are more permissive to maintain backward compatibility.

Enterprise Adoption Trends

Despite Docker’s larger market share, Podman adoption grew 41% year-over-year among enterprises running rootless workloads, per the survey. Key drivers include:

  • Regulatory compliance requirements (e.g., PCI-DSS, HIPAA) that mandate least-privilege container deployments.
  • Integration with Red Hat OpenShift and other Kubernetes distributions that prioritize rootless runtimes.
  • Lower long-term maintenance costs, as Podman’s daemonless architecture reduces patching overhead.

Docker 27.0 remains the preferred choice for teams with legacy Docker-dependent workflows, with 68% of Docker users citing ecosystem familiarity as their primary retention factor.

Migration Considerations for Enterprises

For teams considering switching from Docker 27.0 to Podman 5.0 for rootless workloads, the survey recommends:

  • Validating compatibility with existing CI/CD pipelines, as Podman’s Docker-compatible CLI minimizes but does not eliminate workflow changes.
  • Testing rootless overlay2 performance for high-throughput workloads, as Podman 5.0’s native implementation offers better throughput than Docker’s rootlesskit-backed storage.
  • Leveraging Podman’s podman-compose tool to replace Docker Compose with minimal rework.
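
If you are comparing the two engines on a test host, it helps to confirm that each one is really running rootless before drawing any conclusions. Here is a minimal Python sketch, which is not part of the survey. It assumes you have already captured the output of `docker info --format '{{json .SecurityOptions}}'` and `podman info --format json`, that rootless Docker reports a `name=rootless` entry, and that Podman exposes `host.security.rootless`. The helper names are hypothetical, so verify both assumptions against your installed versions.

```python
import json


def docker_is_rootless(security_options):
    """Return True if Docker's SecurityOptions list has a rootless entry.

    Assumption: rootless dockerd reports an item starting with "name=rootless".
    """
    return any(opt.split(",")[0] == "name=rootless" for opt in security_options)


def podman_is_rootless(info):
    """Return True if a parsed `podman info` document reports rootless mode.

    Assumption: the flag lives at host.security.rootless.
    """
    return bool(info.get("host", {}).get("security", {}).get("rootless", False))


# Illustrative captured output, not taken from a real host.
docker_sample = json.loads(
    '["name=seccomp,profile=builtin", "name=rootless", "name=cgroupns"]'
)
podman_sample = {"host": {"security": {"rootless": True}}}

print("docker rootless:", docker_is_rootless(docker_sample))
print("podman rootless:", podman_is_rootless(podman_sample))
```

Keeping the parsing separate from the command invocation means the same functions can run in CI against saved output from each host.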

Conclusion

The 500-enterprise survey confirms Podman 5.0’s edge in rootless container security, with 27% fewer vulnerabilities driven by its daemonless, rootless-first design. While Docker 27.0 retains broader ecosystem support, enterprises prioritizing security for rootless workloads are increasingly shifting to Podman. As container security regulations tighten, the gap between the two runtimes’ security postures is likely to drive further Podman adoption in 2024 and beyond.

How to Make Your Website AI-Agent Readable in 2026 (llms.txt, MCP Cards, Structured Data)

You ask Perplexity a question about your niche industry. It gives a clean, well-sourced answer, citing three of your competitors. Your site, which has a definitive guide on the exact topic, is nowhere to be seen. You try again with ChatGPT, then Claude. Same result. It feels like being invisible.

This isn’t a failure of traditional SEO. Your rankings on Google might be fine. This is a new problem: your website isn’t “agent-readable.” The large language models (LLMs) that power these AI agents are increasingly the first stop for users seeking information. If they can’t parse, understand, and trust your content, you don’t exist in this new ecosystem. Getting cited by an AI is becoming the new “page one” ranking.

This guide isn’t about “using AI for SEO” fluff. It’s a technical, practical manual for founders and operators who manage their own websites. We’ll cover the specific file formats, server configurations, and data structures that AI crawlers from OpenAI, Anthropic, Google, and others are looking for right now. This is how you get your data out of your website and into their answers.

Why Agent-Readiness Is the New SEO

For two decades, SEO was about signaling relevance to algorithms like Google’s PageRank. Now, we must also signal authority and structure to language models. The goal is different. Instead of just a click, you’re aiming to become a citable source in a generated answer. This is a higher bar.

If you check your server logs today, you’ll likely find that traffic from known AI crawlers (like GPTBot, ClaudeBot, and PerplexityBot) already makes up a small but growing slice of your traffic. For many sites, this is already in the 1-3% range and is expected to increase significantly. This is the data-gathering phase. The models are actively ingesting the web to train future versions. Being accessible now means you’re part of that foundational knowledge.

Traditional SEO focuses on user intent leading to a click. Agent-readiness focuses on machine-readable data that allows an AI to satisfy user intent directly, with your site as a trusted source. The two are not mutually exclusive, but they require different tactics. A keyword-optimized blog post is great for Google Search. A well-structured page with clear JSON-LD, a permissive robots.txt, and maybe even an llms.txt file is what gets you cited by an AI agent.

The llms.txt Specification: A User Manual for Your Site

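The llms.txt file is a proposal for a standardized way to give instructions to AI models about your site. It was first published in 2024 by Jeremy Howard of Answer.AI, and the published version describes a Markdown index served from the site root at /llms.txt. The variant covered in this section is a robots.txt-style policy file: think of it as a robots.txt but for usage policy instead of crawling access. It tells models how they are permitted to use your content in their training and output. Treat its location and field names as an unofficial convention rather than something every crawler is known to parse.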

What It Is and Where to Put It

An llms.txt file is a plain text file placed in the /.well-known/ directory of your website. The full path should be https://yourdomain.com/.well-known/llms.txt.

The file uses a simple field: value format. The key fields currently proposed are:

  • User-Agent: Specifies which bot the rules apply to. A * applies to all bots. You can also target specific bots like ClaudeBot.
  • Allow: Specifies directories or pages that are explicitly permitted for use in training generative models.
  • Disallow: Specifies directories or pages that are forbidden from being used for training.
  • Allow-Citing: A proposed field to explicitly permit the model to cite your content.

A Practical llms.txt Example

Here’s a configuration that allows all bots to use most of the site for training, disallows a private /members/ area, and explicitly allows citing from the /articles/ directory.

# Default policy for all LLM agents
User-Agent: *
Disallow: /members/
Disallow: /private-data/

# Allow all bots to cite our public articles
User-Agent: *
Allow-Citing: /articles/

# Specific rules for ClaudeBot, if needed
User-Agent: ClaudeBot
Allow: /
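
To make the format concrete, here is a minimal Python sketch that parses the example above into a per-agent dictionary. The `parse_policy` helper is hypothetical, and merging repeated `User-Agent` groups is an assumption borrowed from how robots.txt is usually read. No crawler is known to parse the file exactly this way.

```python
from collections import defaultdict


def parse_policy(text):
    """Group `field: value` lines by User-Agent, merging repeated groups."""
    rules = defaultdict(lambda: defaultdict(list))
    agent = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            agent = value  # later fields attach to this agent
        elif agent is not None:
            rules[agent][field].append(value)
    return {name: dict(fields) for name, fields in rules.items()}


EXAMPLE = """\
# Default policy for all LLM agents
User-Agent: *
Disallow: /members/
Disallow: /private-data/

# Allow all bots to cite our public articles
User-Agent: *
Allow-Citing: /articles/

# Specific rules for ClaudeBot, if needed
User-Agent: ClaudeBot
Allow: /
"""

policy = parse_policy(EXAMPLE)
for agent_name, fields in policy.items():
    print(agent_name, fields)
```

A parser like this is mostly useful as a lint step: it lets you confirm that the two `User-Agent: *` groups end up as one combined policy rather than the second silently replacing the first.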

Pros and Cons of llms.txt

  • Pro: It provides a clear, machine-readable way to state your usage terms. This is much better than burying it in a human-readable “Terms of Service” page that no crawler will ever parse.
  • Pro: It’s forward-looking. Adopting it now signals that you’re an engaged, technically savvy publisher.
  • Con: It’s still a proposal. There is no guarantee all major AI companies will honor it. OpenAI, for example, currently relies on robots.txt. It’s a bet on a future standard.
  • Con: It adds another configuration file to maintain. For most small sites, a simple, permissive file is a set-and-forget task.

JSON-LD: Spoon-Feeding Structured Data to Machines

If you want an AI to understand the meaning of your content, you need to tell it what it’s looking at. Is this page a product, an article, or a how-to guide? JSON-LD is a way to embed this structured data directly in your HTML, using the vocabulary from Schema.org.

AI agents, especially those focused on shopping or step-by-step instructions, actively look for this data. It’s the difference between them trying to guess your product’s price and you telling them directly: "price": "240". You should add the JSON-LD script tag within the `<head>` of your HTML. For most platforms (like WordPress with a plugin), this is handled for you once configured.

Key Schemas AI Agents Actually Use

Don’t try to implement every schema. Focus on the ones that map to your content and are most valuable to AI agents.

  • Article: Essential for any blog post or publication. It clearly defines the author, publication date, headline, and body. This helps agents attribute content correctly.

    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "How to Make Your Website AI-Agent Readable",
      "author": {
        "@type": "Organization",
        "name": "GuardLabs"
      },
      "datePublished": "2024-05-21"
    }

  • Product: If you sell anything, this is non-negotiable. It allows agents to pull product names, descriptions, pricing, availability, and reviews into comparison models. This is how you show up in “what’s the best tool for X” queries. Our own Website Care plan could be marked up this way.

    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Website Care Plan",
      "image": "https://guardlabs.online/images/care-icon.png",
      "description": "Annual website maintenance and support.",
      "offers": {
        "@type": "Offer",
        "priceCurrency": "USD",
        "price": "240.00"
      }
    }

  • FAQPage: If you have a FAQ, mark it up. AI agents love FAQs because they are pre-packaged question-answer pairs. This makes it trivial for them to use your content to answer a user’s question directly.

  • HowTo: For step-by-step guides, this schema is perfect. It breaks down the process into discrete steps, which an agent can then re-format and present to a user.

The main limitation of JSON-LD is that it’s only as good as the data you provide. If your schema is incomplete or inaccurate (e.g., the price on the page doesn’t match the price in the JSON-LD), it can confuse bots or cause them to distrust your site.
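
The FAQPage schema has no example above, so here is a minimal Python sketch of how you might generate one from your existing question-answer pairs and wrap it in a script tag. The function names and the sample question are hypothetical. The structure (`mainEntity`, `Question`, `acceptedAnswer`, `Answer`) follows the Schema.org vocabulary.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage document from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def to_script_tag(doc):
    """Serialize a JSON-LD document into an embeddable script tag."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    # Escape "</" so a "</script>" inside a string cannot close the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


pairs = [
    ("What is included in the Website Care plan?",
     "Annual website maintenance and support."),
]
print(to_script_tag(faq_jsonld(pairs)))
```

Generating the markup from the same data that renders the visible FAQ is one way to avoid the mismatch problem described above, since the page text and the JSON-LD cannot drift apart.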

MCP Cards: A Business Card for Your Server

The Machine-readable Citable Page (MCP) card is a newer, more experimental concept. (The acronym is shared with the Model Context Protocol, which is a different thing.) The idea is simple: what if, alongside your human-readable webpage, you provided a simple, structured JSON file that contained all the key citable information? This is an MCP “card.”

An AI agent could fetch https://yourdomain.com/my-article.mcp.json to get the core facts of your article without having to parse HTML, ads, and navigation menus. This makes their job easier and your data cleaner.

When and How to Publish an MCP Card

You don’t need an MCP card for every page. It’s most useful for data-rich, citable content like reports, product pages, or reference guides.

To implement it, you create a static JSON file that follows the MCP spec and host it at a predictable URL. A common convention is to append .mcp.json to the original URL. You then link to it from your HTML page using a tag in the `<head>`.

| User agent | Company | Purpose | Honors `robots.txt`? |
|---|---|---|---|
| GPTBot | OpenAI | Crawls web data to improve future ChatGPT models. | Yes |
| ClaudeBot | Anthropic | Used for training Claude models. | Yes |
| PerplexityBot | Perplexity AI | Crawls the web to find answers for Perplexity’s conversational search engine. | Yes |
| Google-Extended | Google | Not a separate crawler but a `robots.txt` token that controls whether Google may use your content for Gemini (formerly Bard). Opting out here does not affect Google Search. | Yes |
| CCBot | Common Crawl | Not a company, but a non-profit that crawls and archives the web. Its data is widely used to train many open-source and commercial LLMs. | Yes |

Example `robots.txt` for AI Readiness

A sensible default for most businesses is to allow these bots. If you don’t have a `robots.txt` file, create one in the root of your domain. Here is a permissive example:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# You might want to disallow CCBot if you are concerned about
# your content being in a public dataset forever.
User-agent: CCBot
Disallow: /

# Keep your existing rules for other bots
User-agent: *
Disallow: /admin
Disallow: /private/

The only real “con” to allowing these bots is that they use bandwidth. However, their crawl rate is typically low and shouldn’t impact performance for most sites. The bigger risk is being left out by disallowing them.
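
Before deploying a file like this, you can check that it does what you intend. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `access_report` helper and the sample URLs are hypothetical, and the standard-library parser may not match each crawler's own matching rules exactly.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin
Disallow: /private/
"""


def access_report(robots_txt, agents, url):
    """Return {agent: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network call
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = access_report(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"],
    "https://yourdomain.com/articles/guide",
)
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same check against `/admin` with an unlisted user agent is a quick way to confirm that the catch-all `*` group still protects your private paths.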

How to Verify: Are the Bots Actually Reading You?

How do you know if any of this is working? You can’t just ask ChatGPT “did you read my site?” Instead, you need to test from the agent’s perspective.

  1. Check Server Logs: This is the ground truth. Filter your server’s access logs for the user agents listed in the table above (e.g., `grep "GPTBot" /var/log/nginx/access.log`). If you see entries with a `200 OK` status code, you know they are successfully crawling your pages. If you see `403 Forbidden` or `503 Service Unavailable`, you have a problem.

  2. Use `curl` to Impersonate a Bot: You can simulate a request from an AI crawler using the command-line tool `curl`. This is great for debugging firewall or CDN issues.

    curl -A "GPTBot" -I https://yourdomain.com/my-article

    The `-A` flag sets the User-Agent string. The `-I` flag fetches only the headers. If you get an `HTTP/2 200` response, the bot can access your site. If you get a `403` or are presented with a CAPTCHA, your security settings are blocking it.

  3. Prompt Engineering for Citation: After you’ve confirmed the bots are crawling your site and you’ve given them a few weeks to ingest the data, you can test for citation. The trick is to ask a question where your site is a uniquely authoritative source. Don’t ask “what is a website care plan?” Ask something specific that only your content answers well, like: “According to guardlabs.online, what is included in their Website Care plan?” This forces the model to check its specific knowledge of your domain.
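
The log check in step 1 can be turned into a per-bot, per-status summary. Here is a minimal Python sketch that assumes the common "combined" access-log format used by nginx and Apache. The regular expression, the `count_bot_hits` helper, and the sample lines (including their user-agent strings) are illustrative, so adjust them to your own log format.

```python
import re
from collections import Counter

# Google-Extended is a robots.txt token rather than a user-agent string,
# so it is not listed here.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Combined format: "METHOD path PROTO" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"\S+ \S+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_bot_hits(lines):
    """Count (bot, status) pairs for known AI crawlers in access-log lines."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        for bot in AI_BOTS:
            if bot in match["ua"]:
                hits[(bot, int(match["status"]))] += 1
                break
    return hits


# Made-up sample lines using documentation IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [21/May/2024:10:00:00 +0000] "GET /my-article HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [21/May/2024:10:01:00 +0000] "GET /my-article HTTP/1.1" 403 162 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [21/May/2024:10:02:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
]

hits = count_bot_hits(SAMPLE_LOG)
for (bot, status), count in sorted(hits.items()):
    print(f"{bot} {status}: {count}")
```

A bot that shows up only with `403` responses is the signal to go look at your firewall or CDN rules.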

Common Mistakes That Make You Invisible to AI

Many well-intentioned sites accidentally block AI agents or make their content impossible to parse.

  • Overzealous Cloudflare Rules: The “Bot Fight Mode” or more aggressive “Super Bot Fight Mode” settings in Cloudflare are notorious for blocking legitimate AI crawlers. They see a non-human user agent and present a JavaScript challenge that the bot cannot solve. You must go into your Cloudflare settings and specifically allow the user agents for `GPTBot`, `ClaudeBot`, etc. Cloudflare’s new “AI Audit” feature can help identify and allow these bots.
  • Content Behind Paywalls or Login Walls: An AI crawler is an unauthenticated user. If your definitive guide is behind a hard paywall or requires a login, the bot will only see the login page. It cannot index what it cannot see. If you run a membership site, consider having public, citable summaries or abstracts.
  • Missing Canonical URLs: If you have the same content accessible at multiple URLs (e.g., with and without `www`, or with tracking parameters), you must use the `rel="canonical"` link tag to tell all bots which URL is the master version. Without it, AI models might see your content as duplicate or low-quality.
  • Relying on Images or Video for Key Info: LLMs primarily read text. If your product’s price, specs, or key features are only available in an image or a video, the AI crawler will miss them. All critical information should exist as plain HTML text on the page.
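
The canonical-URL mistake is easy to test for automatically. Here is a minimal Python sketch that uses the standard-library `html.parser` to pull every `rel="canonical"` link out of a page. The `CanonicalFinder` class, the `find_canonicals` helper, and the sample page are hypothetical, and the HTML would come from however you already fetch your pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html):
    """Return all canonical URLs declared in the given HTML text."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


PAGE = """<html><head>
<title>Website Care Plan</title>
<link rel="canonical" href="https://yourdomain.com/my-article">
</head><body><p>Annual website maintenance and support.</p></body></html>"""

found = find_canonicals(PAGE)
if len(found) == 1:
    print("canonical:", found[0])
else:
    print("expected exactly one canonical link, found", len(found))
```

Zero results means the tag is missing, and more than one usually means a plugin or template is emitting a conflicting tag.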

Making your site agent-readable isn’t a one-time fix; it’s a new layer of web maintenance. It requires a shift in thinking from just pleasing human visitors and search engine spiders to also accommodating machine learning models. The sites that do this work now will become the trusted, citable sources for the next generation of search and information discovery.

If you’ve gone through this guide and feel it’s more than you want to manage yourself, this is the kind of deep-dive technical audit we perform. Our Agent-Ready Site audit is a full readiness scan that covers everything mentioned here, from `robots.txt` configuration to JSON-LD validation and firewall rules, to ensure your site is positioned to be a source of truth for AI agents.

Originally published at guardlabs.online.