Building a Multi-Agent Fleet with No Central Server

Most multi-agent architectures have the same shape: a coordinator talks to workers through a central hub. The hub is usually a message queue, a shared database, or an orchestration service like Ray or Temporal.

That hub is also the first thing that breaks. It’s a single point of failure, a scaling bottleneck, and an operational cost you pay even when the agents aren’t working.

Here’s how to build a fleet where agents find each other and route tasks without any central intermediary.

The Central Hub Problem

When you’re spinning up a 5-agent prototype, a central coordinator makes sense. It’s simple, debuggable, and gets out of your way.

At 50 agents it starts to fray. At 500 it becomes your hardest reliability problem.

The hub becomes a global lock. Every message goes through it. Every failure cascades through it. Every scaling decision has to account for it.

The alternative — having agents discover and contact each other directly — sounds appealing but has historically been hard. How does Agent A know Agent B’s address? How do you handle NAT traversal? How do you authenticate the connection?

These are solved problems in networking. We just haven’t applied the solutions to agents until now.

Peer-to-Peer at the Session Layer

Pilot Protocol operates at OSI Layer 5 — the session layer, the same slot TLS occupies for the web. It gives each agent:

  • A permanent 48-bit address (0:A91F.0000.7C2E)
  • Automatic NAT traversal (STUN → hole-punch → relay fallback for symmetric NATs)
  • End-to-end encrypted tunnels (X25519 key exchange, AES-256-GCM, Ed25519 identity)
  • A global directory (the backbone) for agent discovery

With Pilot, the hub isn’t a server you run. It’s the network itself — and the network is maintained by the protocol, not by your ops team.

A Fleet Pattern That Actually Works

Here’s a concrete pattern for a research fleet:

Coordinator agent
    ↓ Pilot (P2P, encrypted)
[Specialist A] [Specialist B] [Specialist C]
    ↓                ↓               ↓
  Papers           FX data       News feeds

Each specialist registers its capabilities on the Pilot backbone when it starts. The coordinator queries the backbone — “I need a peer that can resolve academic citations” — and gets back the address of Specialist A. Direct connection from there.

No service registry you maintain. No hardcoded addresses. No configuration file you update when a worker moves.
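
What does that look like in code? The Pilot SDK surface isn't shown here, so treat this as a hypothetical sketch of the flow — the pilot module, discover(), and dial() are illustrative names, not the real API:

# Hypothetical client sketch — illustrates the discovery flow, not Pilot's real SDK
import pilot  # assumed client library

node = pilot.connect_daemon()  # attach to the local pilotctl daemon

# Ask the backbone for a peer advertising the capability you need
peer = node.discover(capability="resolve-academic-citations")
print(peer.address)  # e.g. 0:4B2E.0000.1A3D

# Open an encrypted tunnel directly to that peer — no hub in the path
with node.dial(peer.address) as tunnel:
    tunnel.send({"task": "resolve", "doi": "10.1000/xyz123"})
    result = tunnel.recv()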

The Code

Getting an agent online:

curl -fsSL https://pilotprotocol.network/install.sh | sh
pilotctl daemon start --hostname coordinator

That’s it. The agent is addressable, authenticated, and reachable from any other Pilot peer — regardless of NAT, firewall, or cloud region.

For the specialists:

# On each worker node
pilotctl daemon start --hostname specialist-papers
pilotctl daemon start --hostname specialist-fx
pilotctl daemon start --hostname specialist-news

Each one joins the backbone automatically. The coordinator can ping them:

pilotctl ping specialist-papers
# ✓ reply from 0:4B2E.0000.1A3D · 22ms

Self-Organization: How Groups Work

Beyond individual peer connections, Pilot has a concept of groups — clusters of agents that self-organize around a shared domain.

A trading fleet might form a TRADING group. A research fleet might join RESEARCH. Agents within a group can broadcast to all members or route to the most relevant peer within the domain.

This is closer to how human organizations actually work: a new employee joins the company and immediately has access to colleagues in their department, not just a single manager they have to route everything through.
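
In code, group membership might reduce to something this small — again a hypothetical sketch, not Pilot's real API:

# Hypothetical sketch — group join and domain routing
import pilot  # assumed client library

node = pilot.connect_daemon()
node.join_group("RESEARCH")

# Broadcast to every member, or let the protocol pick the most relevant peer
node.broadcast("RESEARCH", {"task": "survey", "topic": "NAT traversal"})
reply = node.route("RESEARCH", {"task": "survey", "topic": "NAT traversal"})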

The Pilot network status page shows these groups live: BACKBONE, TRAVEL, TRADING, RESEARCH, INSURANCE, and more, with real-time agent counts.

What You Give Up

Centralized orchestration isn’t all downside. You give up some things going P2P:

Observability. A central hub is easy to instrument. A P2P mesh requires distributed tracing from day one. Plan for this.

Debuggability. When something goes wrong, “what was the message queue state at time T” is easier to answer than “what was the P2P graph state.” Log aggressively at the agent level.

Simplicity. For a 3-agent prototype, a coordinator is simpler. P2P earns its complexity at scale.

When to Switch

The right time to move to a P2P architecture is usually later than you think but earlier than you want. Signals that you’re ready:

  • You’re spending meaningful eng time on coordinator reliability
  • Agents in different cloud regions are paying latency costs to route through a central server
  • You want agents from different operators to collaborate without giving either access to your infrastructure
  • Your fleet is growing fast enough that a central bottleneck is becoming a scaling conversation

If two or more of those are true, the session-layer approach is worth the investment.

Further Reading

  • Pilot Protocol documentation — addressing, groups, NAT traversal
  • Multi-agent setups on Pilot — pre-wired fleet configurations
  • The IETF Internet-Draft — the protocol spec if you want to go deep

The network is live: ~163,000 agents, 12.7B+ requests routed, +28% growth in the past week.

One line to get started: curl -fsSL https://pilotprotocol.network/install.sh | sh

The Hidden 43% — How Teams Are Wasting Almost Half Their LLM API Budget

You look at your provider dashboard and see one number: the total bill. It’s like getting an electricity bill that just says “$5,000” with no breakdown of whether it was the AC, the fridge, or someone leaving the lights on all month.

tbh, most AI startups are flying blind right now. We recently looked into the cost breakdown for several teams and found something crazy: almost 43% of LLM API spend is completely wasted. It’s not about paying for usage; it’s about paying for bad architecture.

Here’s where the leaks are actually happening (a minimal mitigation sketch for the first two follows the list):

  1. Retry Storms (34% of waste)
    Your agent fails to parse a JSON response, so it retries. And retries. Sometimes 5-10 times in a loop. You aren’t just paying for the failure; you’re paying to resend the massive context window every single time.

  2. Duplicate Calls (85% of apps have this issue)
    Multiple users asking the exact same question, or internal systems running the same RAG pipeline on the same document. Without a caching layer in front of the provider, you’re paying OpenAI to generate identical tokens twice.

  3. Context Bloat
    Sending the entire 50-page document history when the user just asked “what’s the summary of page 2”. RAG is great, but shoving everything into the prompt “just in case” is burning your runway.

  4. Wrong Model Selection
    Using GPT-4o or Claude 3 Opus for simple classification tasks when Haiku or GPT-3.5-turbo would do it for a fraction of the cost.
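
You can’t fix all four leaks in an afternoon, but the first two fall to a few lines of code. A minimal sketch — call_llm stands in for whatever provider client you use:

import hashlib
import json

_cache = {}  # prompt hash -> parsed response; swap for Redis in production

def cached_call(call_llm, prompt: str, max_retries: int = 3):
    # Leak 2: duplicate calls — identical prompts are served from cache
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]

    # Leak 1: retry storms — cap the retries instead of looping until it parses
    last_err = None
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            _cache[key] = json.loads(raw)
            return _cache[key]
        except json.JSONDecodeError as err:
            last_err = err
    raise RuntimeError(f"Unparseable after {max_retries} attempts") from last_err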

You can’t fix what you can’t see. That’s exactly why I built LLMeter (https://llmeter.org?utm_source=devto&utm_medium=article&utm_campaign=hidden-43-percent-llm-waste). It’s an open-source dashboard that gives you per-customer and per-model cost tracking. Stop guessing who or what is draining your API budget.

Fwiw, just setting up basic budget alerts and seeing the breakdown by tenant usually drops a team’s bill by 20% in the first week. Give it a try, it’s open source (AGPL-3.0) and you can self-host or use the free tier.

Stop Making Your AI Agent Scrape the Web. There’s a Better Way.

There’s an absurd loop at the heart of most AI agent architectures right now:

  1. Agent needs data (a research paper, an FX rate, a flight status, a CVE)
  2. Agent calls a web scraper or fires an HTTP request to a public endpoint
  3. The endpoint returns HTML designed for a human to read in a browser
  4. Agent burns tokens parsing, cleaning, and extracting the actual value
  5. Agent retries when the scraper breaks because the page layout changed

We’ve built genuinely intelligent agents and then made them spend half their time doing remedial text processing on documents that weren’t meant for them.

Let me show you what the alternative looks like.

The Root Cause: Wrong Layer

HTTP is a Layer 7 protocol built in 1991 to serve documents to human-operated browsers. It’s brilliant at that. Every design decision — HTML rendering, cookies, sessions, REST conventions — optimizes for a human reading a page.

Agents don’t read pages. They consume structured data. They don’t need the presentation layer, the session cookies, or the retry logic that only exists because the web assumed humans would be patient with slow servers.

The right fix isn’t a better scraper. It’s operating at a different layer — one where agents talk directly to other agents that have already done the hard work of acquiring, normalizing, and maintaining the data you need.

What Specialized Data Agents Look Like in Practice

Pilot Protocol runs a network of ~163,000 agents. About 350 of them are specialized data service agents — peers that exist to answer a specific category of query cleanly and fast.

Here’s what a few of them replace:

Crossref specialist
Resolves a DOI against the global paper registry in one call. No scraping PubMed, no HTML parsing, no fighting rate limits. If you’re building a legal research agent that needs to verify citations, this is one hop instead of a brittle pipeline.

Historical FX specialist
Spot rate at an arbitrary timestamp. Not today’s rate from a public API that expires — the actual rate at the moment a transaction happened. Replaces three bank statement screenshots and a manual lookup.

Aviation weather specialist
Real-time METAR data for any airport. If your agent is managing travel or logistics, it gets structured weather data directly from a peer that’s already watching the feeds, not from scraping a flight status page.

crt.sh / certificate transparency specialist
Streams CT hits on your domains. Your security agent gets new certificate issuances the moment they appear, not after the next cron runs.

FDA recalls specialist
Filters against the live recall feed for a specific condition or ingredient. No crawling FDA’s website, no pagination, no HTML tables.

The pattern is consistent: instead of your agent scraping a source and parsing the result, a specialist on the network has already done that work — once, for everyone — and serves structured answers directly.
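
Side by side, the swap for the citation case might look like this — the client calls below are a hypothetical sketch, but the shape is the point:

# Before: scrape, strip, extract — brittle and token-hungry
# html = http_get("https://pubmed.ncbi.nlm.nih.gov/?term=...")
# citation = llm_extract(strip_html(html))  # burns tokens on remedial parsing

# After: one hop to the Crossref specialist (hypothetical client API)
import pilot  # assumed client library
node = pilot.connect_daemon()
peer = node.discover(capability="crossref-doi-resolution")
citation = node.request(peer.address, {"doi": "10.1000/xyz123"})
# arrives structured — no HTML, no extraction pass, no layout breakage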

The Network Effect That Makes This Work

The reason this improves over time is the same reason any network improves: each new agent adds value for every existing one.

When a new operator connects their SEC filing parser to Pilot, every agent on the network gains access to cleaner financial data without writing any code. When a localization agent joins that has a native speaker in Manchester on the other end, every agent building for UK markets benefits.

Pilot calls this “a hive mind that gets smarter with every new agent.” It’s less poetic if you think about it mechanically: it’s a network with positive externalities, where the marginal cost of adding a new data source approaches zero for consumers.

Compare that to the current model, where every agent team independently builds and maintains scrapers for the same 20 data sources. The waste is staggering.

The Latency Numbers

From the Pilot benchmarks: 12 seconds on Pilot vs 51 seconds via the web for equivalent data retrieval tasks.

That’s not a small difference. It’s a 4x reduction in wall-clock time for the same result. In an agentic pipeline where you’re making dozens of these calls, that’s the difference between a task that completes in a minute and one that takes five.

The speed comes from two places:

  1. No parsing overhead — the data arrives structured, not as HTML you have to strip
  2. UDP transport — Pilot runs peer-to-peer over UDP with its own reliable-stream layer, avoiding the head-of-line blocking that makes TCP slow for parallel requests

Getting Your Agent Connected

# Install Pilot (single static binary, no SDK, no API key)
curl -fsSL https://pilotprotocol.network/install.sh | sh

# Start the daemon
pilotctl daemon start --hostname my-research-agent

# Your agent is now on the network
# Address: 0:A91F.0000.7C2E

From there, your agent can query the backbone for any of the 350+ service agents by capability. No URL directory to maintain, no API keys to manage per-service.

When You Still Need the Web

To be direct: Pilot doesn’t replace the web for everything. If you need to take a screenshot of a specific page, or submit a form on a site that has no API, you still need a browser or a scraper.

But for structured data — the kind that lives behind an API or in a database somewhere — the web route is almost never the right choice for an agent. The data exists, someone has it clean, and there’s now an agent network where you can get it directly.

The scraping loop is a workaround. The network is the fix.

Pilot Protocol: pilotprotocol.network — peer-to-peer encrypted tunnels for agents, one line of code, no central dependency.

TWD setup is now two Vite plugins and zero app code

Setting up TWD used to mean adding a block of dev-only code to your app’s entry file — a dynamic import for the runner, a test glob, a service-worker config, and a twd-relay browser client. It worked, but it never really belonged there.

With twd-js@1.8 and twd-relay@1.2, both packages ship Vite plugins. Setup is two entries in vite.config.ts and nothing in main.tsx.

The new setup

vite.config.ts:

import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { twd } from "twd-js/vite-plugin";
import { twdRemote } from "twd-relay/vite";

export default defineConfig({
    plugins: [
        react(),
        twd({
            testFilePattern: "/**/*.twd.test.ts",
            open: false,
            position: "right",
            search: true,
        }),
        twdRemote(),
    ],
});

main.tsx:

import React from "react";
import ReactDOM from "react-dom/client";
import { RouterProvider } from "react-router";
import { router } from "./routes/router";
import "./styles/index.css";

ReactDOM.createRoot(document.getElementById("root")!).render(
    <RouterProvider router={router} />,
);

That’s the whole setup. twd() owns the sidebar, glob discovery, and service-worker registration. twdRemote() attaches the relay to the Vite dev server and auto-injects the browser client into index.html. Both plugins use apply: 'serve', so production builds are untouched.

What it replaces

For comparison, here’s what a TWD entry file looked like a few weeks ago:

if (import.meta.env.DEV) {
    const { initTWD } = await import("twd-js/bundled");
    const tests = import.meta.glob("./**/*.twd.test.ts");
    initTWD(tests, {
        open: false,
        position: "right",
        serviceWorker: true,
        serviceWorkerUrl: "/mock-sw.js",
        search: true,
    });

    const { createBrowserClient } = await import("twd-relay/browser");
    const client = createBrowserClient({
        url: `${window.location.origin}/__twd/ws`,
    });
    client.connect();
}

Two top-level await imports, a glob, a service-worker URL that had to stay in sync with the runner, a WebSocket URL that had to match the relay path, and config repeating defaults. All of it dev-only, all of it sitting above ReactDOM.createRoot.

After the upgrade, that block is gone. No if (import.meta.env.DEV), no dynamic imports, no relay client. The dev-tooling story lives entirely in vite.config.ts.

Why it matters

One source of truth for the wiring. The serviceWorkerUrl, the SW served by the dev server, the WebSocket path used by the relay, and the path the browser client connects to were all strings in different files that had to agree. Now the plugins own them.

No top-level await for tooling. The await import("twd-js/bundled") was loading a chunk that had nothing to do with your app, before React was allowed to mount.

Tooling lives in tooling config. New developers reading main.tsx shouldn’t have to mentally if (import.meta.env.DEV)-out a quarter of the file to understand startup. The plugin model is what the rest of the Vite ecosystem already does — @vitejs/plugin-react, Tailwind, TanStack Router devtools — and TWD now matches.

Non-Vite projects

Webpack, Angular CLI, Rollup, esbuild, Rspack — anywhere the Vite plugins don’t apply — keep the manual API. initTWD and createBrowserClient stay public exports forever. twdRemote({ autoConnect: false }) is also there as an escape hatch for Vite projects that want to wire the browser client by hand.

Try it

The runner is at https://twd.dev. Upgrade to twd-js@1.8 and twd-relay@1.2, drop the dev-only block from main.tsx, add the two plugins to vite.config.ts, and you’re done.

Why AI Agents Fail: 3 Failure Modes That Cost Tokens and Time

AI agents don’t fail like traditional software: they don’t crash with a stack trace. They fail silently: they return incomplete answers, stall on slow APIs, or burn tokens calling the same tool over and over. The agent looks like it’s working, but the output is wrong, late, or expensive.

This series covers the three most common failure modes, with research-backed solutions. Each technique comes with a runnable demo that measures the before/after difference.

Working code: github.com/aws-samples/sample-why-agents-fail

The demos use Strands Agents with OpenAI (GPT-4o-mini). The patterns are framework-agnostic: they apply to LangGraph, AutoGen, CrewAI, or any framework that supports tool calling and lifecycle hooks.

This Series: 3 Essential Solutions

  1. Context Window Overflow — Memory Pointer Pattern for large data
  2. MCP Tools That Never Respond — Async handleId pattern for slow external APIs
  3. Reasoning Loops in AI Agents — DebounceHook + clear tool states to block repeated calls

What Happens When Tool Outputs Overflow the Context Window?

Context window overflow happens when a tool returns more data than the LLM can process: server logs, database results, or file contents that exceed the token limit. The agent doesn’t fail with an error. It degrades silently: it truncates data, loses context, or produces incomplete answers.

IBM research quantifies this: a Materials Science workflow consumed 20 million tokens and failed. The same workflow with memory pointers used 1,234 tokens and succeeded.

(Figure: an AI agent without vs. with the Memory Pointer Pattern, showing how large data stays out of the context window)

The solution — the Memory Pointer Pattern: store large data in agent.state and return a short pointer into the context. The next tool resolves the pointer to access the full data:

from strands import tool, ToolContext

@tool(context=True)
def fetch_application_logs(app_name: str, tool_context: ToolContext, hours: int = 24) -> str:
    """Fetches logs. Stores large data as a pointer to avoid context overflow."""
    logs = generate_logs(app_name, hours)  # Could be 200KB+

    if len(str(logs)) > 20_000:
        pointer = f"logs-{app_name}"
        tool_context.agent.state.set(pointer, logs)
        return f"Data stored as pointer '{pointer}'. Use the analysis tools to query it."
    return str(logs)

@tool(context=True)
def analyze_error_patterns(data_pointer: str, tool_context: ToolContext) -> str:
    """Analyzes errors — resolves the pointer from agent.state."""
    data = tool_context.agent.state.get(data_pointer)
    errors = [e for e in data if e["level"] == "ERROR"]
    return f"Found {len(errors)} errors across {len(set(e['service'] for e in errors))} services"

The LLM never sees the 200KB: it sees only "Data stored as pointer 'logs-payment-service'" (52 bytes).

Why Strands Agents? The ToolContext API provides agent.state as a native key-value store scoped to each agent: no global dictionaries, no external infrastructure. For multi-agent flows, invocation_state shares data between agents in a Swarm through the same API.
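
Wiring both tools into an agent is a couple of lines. A minimal sketch, assuming the standard Strands Agent constructor:

from strands import Agent

# agent.state is scoped to this instance — the pointer lives alongside the tools
agent = Agent(tools=[fetch_application_logs, analyze_error_patterns])

# The LLM sees the 52-byte pointer, then resolves it through the analysis tool
agent("Fetch the last 24h of logs for payment-service and summarize the errors")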

Metric               Without pointers        With Memory Pointers
Data in context      214KB (full logs)       52 bytes (pointer)
Agent behavior       Truncates or fails      Processes all the data
Errors detected      Partial                 Complete

(Figure: bar chart of token usage across context-management strategies)

Full demo: 01-context-overflow-demo — single-agent and multi-agent (Swarm) implementations with notebooks.

Why Do AI Agents Freeze When Calling External APIs?

AI agents freeze when MCP tools call slow or unresponsive external APIs. The agent blocks on the tool call, the user sees no progress, and after 7 seconds many implementations return a 424 error. MCP (Model Context Protocol) gives agents the ability to call external tools, but it doesn’t handle timeouts or retries by default.

(Figure: a synchronous MCP tool call, with the agent blocked while waiting on a slow API)

The solution — the async handleId pattern: the tool returns a job ID immediately, and the agent polls a separate check_status tool:

import asyncio
import uuid

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("timeout-demo")
JOBS = {}

async def _process_job(job_id: str) -> None:
    """Background worker — stands in for the slow (15s) external API call."""
    await asyncio.sleep(15)
    JOBS[job_id].update(status="completed", result="Done")

@mcp.tool()
async def start_long_job(task: str) -> str:
    """Returns a handle immediately — prevents the timeout."""
    job_id = str(uuid.uuid4())[:8]
    JOBS[job_id] = {"status": "processing", "task": task}
    asyncio.create_task(_process_job(job_id))  # Work continues in the background
    return f"Job started. Handle: {job_id}. Use check_job_status to poll it."

@mcp.tool()
async def check_job_status(job_id: str) -> str:
    """Polls the job — returns 'processing' or 'completed' with the result."""
    job = JOBS.get(job_id)
    if not job:
        return f"FAILED: Job '{job_id}' not found"
    return f"{job['status'].upper()}: {job.get('result', 'Still processing...')}"

Scenario            Response time               UX
Fast API (1s)       3s total                    OK
Slow API (15s)      18s blocked                 Agent frozen
Failing API         424 error after 7s          Agent fails
Async handleId      ~4s (immediate + polling)   Agent stays responsive

(Figure: timeline of the four MCP response patterns)

Why Strands Agents? The MCPClient connects to any MCP server. The agent discovers tools at runtime via list_tools_sync(): no hardcoded tool list. When the MCP server implements the async pattern, the agent polls automatically, with no extra orchestration code.
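
The agent side is correspondingly small. A sketch following the documented Strands MCPClient usage — the server filename is a placeholder for wherever you saved the code above:

from mcp import StdioServerParameters, stdio_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Launch the local MCP server and discover its tools at runtime
client = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="python", args=["timeout_demo_server.py"])
))

with client:
    tools = client.list_tools_sync()  # finds start_long_job / check_job_status
    agent = Agent(tools=tools)
    agent("Start the nightly report job and tell me when it finishes")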

Full demo: 02-mcp-timeout-demo — a local MCP server with all 4 scenarios and a notebook.

Why Do AI Agents Repeat the Same Tool Call?

Reasoning loops happen when an agent calls the same tool repeatedly with identical parameters, making no progress. The root cause is ambiguous tool feedback: responses like "there may be more results available" make the agent believe another call will produce better results. Research shows agents can loop hundreds of times without delivering an answer.

(Figure: how ambiguous tool feedback causes loops, versus how clear states and a DebounceHook prevent them)

Solution 1 — clear terminal states: tools return an explicit SUCCESS or FAILED instead of ambiguous messages:

# Ambiguous (causes loops)
return f"Flights found: {results}. There may be more results available."

# Clear (the agent stops)
return f"SUCCESS: Flight {conf_id} booked for {passenger}. Confirmation sent."

Solution 2 — DebounceHook: detects and blocks duplicate tool calls at the framework level:

import json

from strands.hooks.registry import HookProvider, HookRegistry
from strands.hooks.events import BeforeToolCallEvent

class DebounceHook(HookProvider):
    """Blocks duplicate tool calls within a sliding window."""
    def __init__(self, window_size=3):
        self.call_history = []
        self.window_size = window_size

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(BeforeToolCallEvent, self.check_duplicate)

    def check_duplicate(self, event: BeforeToolCallEvent) -> None:
        # Key on tool name + serialized input to catch exact repeats
        key = (event.tool_use["name"], json.dumps(event.tool_use.get("input", {})))
        if self.call_history.count(key) >= 2:
            event.cancel_tool = f"BLOCKED: Duplicate call to {event.tool_use['name']}"
        self.call_history.append(key)
        self.call_history = self.call_history[-self.window_size:]

Strategy                        Tool calls              Result
Ambiguous feedback (baseline)   14 calls                No definitive answer
DebounceHook                    12 calls (2 blocked)    Completes with blocks
Clear SUCCESS states            2 calls                 Completes immediately

(Figure: bar chart of tool calls across strategies)

Why Strands Agents? The HookProvider API intercepts tool calls via BeforeToolCallEvent before they execute. Setting event.cancel_tool blocks execution at the framework level: the LLM can’t bypass it. That makes hooks composable — you can stack DebounceHook, LimitToolCounts, and custom validators on the same agent, as in the sketch below.
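
Attaching the hook is a constructor argument. A minimal sketch, assuming the hooks parameter on Agent and the booking tools from the demo (tool names here are illustrative):

from strands import Agent

agent = Agent(
    tools=[search_flights, book_flight],   # illustrative tool names
    hooks=[DebounceHook(window_size=3)],   # stacks with other HookProviders
)

# A third identical search_flights call inside the window is now blocked
agent("Book the cheapest GRU to SCL flight on Friday")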

Full demo: 03-reasoning-loops-demo — all 4 scenarios with hooks and a notebook.

Prerequisites

You need Python 3.9+, uv (a fast Python package manager), and an OpenAI API key.

git clone https://github.com/aws-samples/sample-why-agents-fail
cd sample-why-agents-fail/stop-ai-agents-wasting-tokens

# Pick any demo
cd 01-context-overflow-demo   # or 02-mcp-timeout-demo, 03-reasoning-loops-demo
uv venv && uv pip install -r requirements.txt
export OPENAI_API_KEY="your-key-here"

uv run python test_*.py

Each demo is self-contained, with its own dependencies, test script, and Jupyter notebook.

Frequently Asked Questions

What are the most common failure modes in AI agents?

The three most common failure modes are context window overflow (a tool returns more data than the LLM can process), MCP tool timeouts (external APIs block the agent indefinitely), and reasoning loops (the agent repeats the same tool call without progress). Each failure mode wastes tokens and degrades answer quality.

How do I reduce an AI agent’s token costs?

The two most effective techniques are memory pointers and clear tool states. The Memory Pointer Pattern stores large tool outputs in external state and passes short references into the LLM context, cutting token usage from 200KB+ to under 100 bytes per tool call. Clear terminal states (SUCCESS/FAILED) in tool responses keep the agent from retrying completed operations, which can cut tool calls from 14 to 2.

Can I use these patterns with frameworks other than Strands Agents?

Yes. The Memory Pointer Pattern works with any framework that supports tool context (passing state between tools). The async handleId pattern is an MCP server design pattern: it works with any MCP-compatible agent. DebounceHook requires lifecycle hooks, which LangGraph, AutoGen, and CrewAI provide with different APIs.

References

Research

  • Solving Context Window Overflow in AI Agents — IBM Research, Nov 2025
  • Towards Effective GenAI Multi-Agent Collaboration — Amazon, Dec 2024
  • Resilient AI Agents With MCP — Octopus, May 2025
  • Language models can overthink — The Decoder, Jan 2025

Implementation

  • Strands Agent State — ToolContext and agent.state
  • Strands MCP Tools — Connect any MCP server
  • Strands Hooks — Lifecycle events and tool cancellation

Which failure mode have you run into with your agents? Share in the comments.

Thanks!

— Elizabeth Fuentes
I help developers build production-ready AI applications through hands-on tutorials and open-source projects.