Web scraping for AI agents: How to give your agents web access

AI agents are only as useful as the information they can act on. A reasoning model with a January knowledge cutoff can’t tell you today’s pricing, yesterday’s news, or what your competitor just changed on their homepage. Giving your agent a way to reach out and pull fresh data from the web is how you fix that.

Web scraping is the most direct way to get there. This guide walks through how it works, what breaks in production, and how to wire it cleanly into an AI agent workflow.

Why agents need live web access

Most LLMs are trained once and frozen. They know a lot, but that knowledge has an expiry date. This creates a fundamental problem for agents doing anything time-sensitive:

  • A research agent summarizing a competitor’s product page will surface stale pricing.
  • A lead generation agent building contact lists from directories misses companies founded last month.
  • A news monitoring agent trained on data from six months ago isn’t monitoring anything.
  • A price tracking agent with no live feed is just guessing.

Equipping your agent with a tool call that fetches current HTML, parses it intelligently, and returns structured data is how you solve this.

What scraping looks like in an agent loop

In practice, scraping fits into an agent’s tool-use loop the same way a database query or API call does. The agent decides it needs information from a URL, calls the scraping tool, gets back structured data, and continues reasoning.

Agent needs: "What's the current price of product X?"
  → calls scrapeUrl(url, prompt)
  → gets back: { "name": "Product X", "price": 49.99, "currency": "USD" }
  → continues: "The price is $49.99, which is $5 lower than last week..."

This workflow is also represented in the diagram below:

[Figure: What scraping looks like in an agent loop]

The key design question is: what does scrapeUrl actually do under the hood?

Different scraping approaches

There are a few ways to implement web access for an agent. They sit on a spectrum of complexity vs. reliability.

Raw HTTP + HTML parsing

The simplest approach: fetch the URL with the built-in fetch API, parse the HTML with a library like Cheerio, and extract what you need with CSS selectors.

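import * as cheerio from "cheerio";

async function scrape(url) {
  const res = await fetch(url, { headers: { "User-Agent": "Mozilla/5.0" } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const html = await res.text();
  const $ = cheerio.load(html);
  $("script, style").remove(); // keep only human-readable text
  return $("body").text().replace(/\s+/g, " ").trim();
}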

The problem: many modern websites, especially single-page apps, don't return meaningful HTML on the first request. Their content is rendered by JavaScript after the page loads, so the code above gets back a mostly empty shell. Without proxy rotation, you'll also get rate-limited or blocked quickly on sites with bot protection.
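Before reaching for a browser, it can help to detect when a raw fetch came back as a shell. The sketch below is a rough heuristic, not a reliable detector: it assumes that a page with very little visible text once scripts, styles, and tags are stripped, or with an empty SPA mount point such as <div id="root"></div>, probably needs JavaScript to render. The function name looksLikeJsShell and the 200-character threshold are illustrative assumptions.

```javascript
// Rough heuristic: does this HTML look like an empty client-side-rendered shell?
// Assumptions: the 200-character threshold and the mount-point ids are illustrative.
function looksLikeJsShell(html, minTextLength = 200) {
  // Drop script/style contents, then strip remaining tags to approximate visible text
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  // Common SPA mount points left empty in the server response
  const emptyMountPoint = /<div\s+id=["'](root|app|__next)["']\s*>\s*<\/div>/i.test(html);

  return emptyMountPoint || text.length < minTextLength;
}
```

When it returns true, that is a reasonable signal to fall back to a headless browser or a scraping API for that URL.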

Headless browsers

Tools like Playwright and Puppeteer launch a real browser, wait for JavaScript to execute, and then let you extract the rendered content. This is far more reliable for modern, JavaScript-heavy sites.

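import { chromium } from "playwright";

async function scrapeRendered(url) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Wait until network activity settles so JS-rendered content is in the DOM
    await page.waitForLoadState("networkidle");
    return await page.content();
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}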

The problem: this is expensive to run at scale. Infrastructure, browser pools, proxy management, and CAPTCHA handling all become your problem. And sophisticated anti-bot systems can still block you based on browser fingerprinting.

Scraping APIs

The third option: delegate all of that to a purpose-built API. You send a URL and a description of what you want. The API handles browser automation, proxy rotation, and CAPTCHA solving, then returns clean structured data.

For most agent workloads, this is the pragmatic choice: you get a simple async interface and reliable results without managing headless browser infrastructure.

The real challenges (and why they matter for agents)

Before picking an approach, understand what actually breaks in production:

  • Anti-bot detection: IP rate limiting, CAPTCHA challenges, browser fingerprinting. If your agent scrapes the same site repeatedly, naive implementations get blocked fast.

  • JavaScript-rendered content: Many product pages, social feeds, and dashboards render content after the initial HTML loads. Raw HTTP fetches get empty shells.

  • Unstructured output: Raw HTML or even extracted text isn’t what your agent wants. Agents reason better over {"price": 49.99} than over a wall of text that contains the price somewhere.

  • Async workflows: Scraping takes time (seconds, not milliseconds). Your agent can’t block waiting for a result. You need job submission, polling, and async result handling baked in.

  • Scale: If your agent processes 100 leads at a time, you need batch processing. Running 100 sequential scrape calls is slow and fragile.
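For the batch case, a common middle ground between 100 sequential calls and 100 simultaneous ones is a small concurrency pool. A minimal sketch, assuming task is any async function, such as the scrape() helper from Example 1 below; mapWithConcurrency is a hypothetical name, and each result is recorded as fulfilled or rejected so one failed URL does not sink the whole batch.

```javascript
// Run `task` over `items` with at most `limit` calls in flight at once.
// Results keep input order; failures are captured instead of thrown.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to start

  async function worker() {
    while (next < items.length) {
      const i = next++; // safe: this runs synchronously between awaits
      try {
        results[i] = { status: "fulfilled", value: await task(items[i], i) };
      } catch (err) {
        results[i] = { status: "rejected", reason: err };
      }
    }
  }

  // Start up to `limit` workers that pull from the shared index
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Usage would look like mapWithConcurrency(urls, 5, (url) => scrape(url, prompt, schema)). The limit of 5 is arbitrary; the right number depends on the API's rate limits.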

What agent-ready scraping looks like

Here’s what the ideal scraping tool looks like from an agent’s perspective:

  1. Natural language prompts: The agent describes what it wants, not how to get it. "Extract the job title, company, and salary range" rather than a CSS selector.
  2. Structured JSON output: Returns a typed object matching a schema the agent defines. No parsing, no regex, no string manipulation.
  3. Async with polling: Submit a job, get a job ID, poll for results. Non-blocking.
  4. Proxy and anti-bot handling built in: The agent doesn’t care about IP rotation. That’s infrastructure.
  5. Batch support: Submit 50 URLs at once, get 50 results back.

Let’s build this.

Practical Implementation

The following examples use Spidra, an API built specifically for this pattern: browser automation, proxy rotation, CAPTCHA solving, and AI-powered extraction in one endpoint. The concepts translate to any scraping API with similar capabilities.

Setup

Get an API key from app.spidra.io → Settings → API Keys.

[Figure: API key on the Spidra dashboard]

Base URL: https://api.spidra.io/api
Auth: x-api-key header on every request.

Example 1: Simple scrape tool for an agent

The pattern is always the same: submit a job, get a jobId, poll until complete.

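const API_KEY = "your-api-key";
const BASE_URL = "https://api.spidra.io/api";
const HEADERS = { "x-api-key": API_KEY, "Content-Type": "application/json" };
const POLL_INTERVAL_MS = 3000;
const POLL_TIMEOUT_MS = 120000; // assumption: adjust to how long your jobs usually take

async function scrape(url, prompt, schema, options = {}) {
  const payload = {
    urls: [{ url }],
    prompt,
    output: "json",
    useProxy: true,
    ...(schema && { schema }),
    ...options,
  };

  // Submit the job; fail fast on auth or validation errors instead of polling a missing jobId
  const res = await fetch(`${BASE_URL}/scrape`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Scrape request failed: HTTP ${res.status}`);
  const { jobId } = await res.json();

  // Poll until the job finishes, but give up after POLL_TIMEOUT_MS instead of looping forever
  const deadline = Date.now() + POLL_TIMEOUT_MS;
  while (Date.now() < deadline) {
    const status = await fetch(`${BASE_URL}/scrape/${jobId}`, {
      headers: HEADERS,
    }).then((r) => r.json());

    if (status.status === "completed") return status.result.content;
    if (status.status === "failed") throw new Error(status.error);

    await new Promise((r) => setTimeout(r, POLL_INTERVAL_MS));
  }
  throw new Error(`Scrape job ${jobId} did not finish within ${POLL_TIMEOUT_MS / 1000}s`);
}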

Now your agent has a clean tool call:

const result = await scrape(
  "https://news.ycombinator.com",
  "List the top 5 stories with title, points, and comment count",
  {
    type: "object",
    required: ["stories"],
    properties: {
      stories: {
        type: "array",
        items: {
          type: "object",
          required: ["title", "points", "comments"],
          properties: {
            title: { type: "string" },
            points: { type: "number" },
            comments: { type: "number" },
            url: { type: ["string", "null"] },
          },
        },
      },
    },
  }
);

// result.stories → [{ title, points, comments, url }, ...]

The agent gets back a typed list it can iterate, filter, and reason over. No parsing.

Example 2: Structured output with JSON schema

The schema field is the most important feature for agent use. Instead of getting unpredictable text, you define the exact shape of the response and the API enforces it.

Here’s a job listing extractor:

const result = await scrape(
  "https://jobs.example.com/senior-engineer",
  "Extract all details about this job listing.",
  {
    type: "object",
    required: ["title", "company", "remote"],
    properties: {
      title: { type: "string" },
      company: { type: "string" },
      location: { type: ["string", "null"] },
      remote: { type: ["boolean", "null"] },
      salary_min: { type: ["number", "null"] },
      salary_max: { type: ["number", "null"] },
      employment_type: {
        type: ["string", "null"],
        enum: ["full_time", "part_time", "contract", null],
      },
      skills: {
        type: "array",
        items: { type: "string" },
      },
    },
  }
);

// Guaranteed shape: fields in `required` always present, nullable where marked
// {
//   title: "Senior Engineer",
//   company: "Acme Corp",
//   location: "Austin, TX",
//   remote: true,
//   salary_min: 140000,
//   salary_max: 180000,
//   employment_type: "full_time",
//   skills: ["TypeScript", "React", "AWS"]
// }

A few rules worth knowing:

  • Fields in required always appear, as null if the data isn’t found.
  • Optional fields are omitted entirely if unavailable.
  • Mark anything that might be missing as ["type", "null"] to avoid surprises.
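Even when the API enforces the schema, it is cheap to check the shape on your side before the agent reasons over it. The sketch below is a deliberately tiny validator covering only what these examples use (type as a string or an array such as ["number", "null"], required, properties, items, enum). checkShape is a hypothetical helper; for anything beyond this subset, a full JSON Schema validator such as Ajv is the safer choice.

```javascript
// Minimal shape check for the JSON Schema subset used in these examples.
// Returns a list of error strings; an empty list means the value passed.
function checkShape(value, schema, path = "$") {
  const errors = [];
  const types = Array.isArray(schema.type) ? schema.type : [schema.type];

  // Map a JS value to its JSON Schema type name
  const actual = value === null ? "null" : Array.isArray(value) ? "array" : typeof value;

  if (schema.type && !types.includes(actual)) {
    errors.push(`${path}: expected ${types.join("|")}, got ${actual}`);
    return errors;
  }
  if (schema.enum && !schema.enum.includes(value)) {
    errors.push(`${path}: ${JSON.stringify(value)} not in enum`);
  }
  if (actual === "object" && schema.properties) {
    for (const key of schema.required ?? []) {
      if (!(key in value)) errors.push(`${path}.${key}: missing required field`);
    }
    for (const [key, sub] of Object.entries(schema.properties)) {
      if (key in value) errors.push(...checkShape(value[key], sub, `${path}.${key}`));
    }
  }
  if (actual === "array" && schema.items) {
    value.forEach((item, i) => errors.push(...checkShape(item, schema.items, `${path}[${i}]`)));
  }
  return errors;
}
```

If the list is non-empty, the agent can retry the scrape or report the data as unreliable instead of reasoning over a malformed object.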

Example 3: Crawling an entire site

Sometimes your agent doesn’t know which pages to scrape. It needs to discover them. The crawl endpoint handles this: give it a base URL, tell it which pages to find, and what to extract from each.

async function crawlSite(baseUrl, crawlInstruction, extractInstruction, maxPages = 20) {
  const res = await fetch(`${BASE_URL}/crawl`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      baseUrl,
      crawlInstruction,
      transformInstruction: extractInstruction,
      maxPages,
      useProxy: true,
    }),
  });
  if (!res.ok) throw new Error(`Crawl request failed: HTTP ${res.status}`);
  const { jobId } = await res.json();

  while (true) {
    const data = await fetch(`${BASE_URL}/crawl/${jobId}`, {
      headers: HEADERS,
    }).then((r) => r.json());

    if (data.status === "completed") return data.result;
    if (data.status === "failed") throw new Error("Crawl failed");

    console.log(data.progress?.message ?? "crawling...");
    await new Promise((r) => setTimeout(r, 5000));
  }
}

// Example: crawl a competitor's blog for content strategy research
const posts = await crawlSite(
  "https://competitor.com/blog",
  "Find all blog post pages published in the last 6 months",
  "Extract the title, author, publish date, and a one-sentence summary",
  30
);

// posts → [{ url, title, data: { title, author, publish_date, summary } }, ...]
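A natural-language instruction like "published in the last 6 months" is a request, not a guarantee, so it is worth re-checking the crawl output in code. This sketch assumes the result shape shown in the comment above, with publish_date as a string that Date can parse. recentPosts is a hypothetical helper: it de-duplicates by URL and drops entries that are older than the cutoff or have a missing or unparseable date.

```javascript
// Keep crawl results whose data.publish_date falls within the last `months` months.
// Assumes items shaped like { url, title, data: { publish_date, ... } }.
function recentPosts(items, months = 6, now = new Date()) {
  const cutoff = new Date(now);
  cutoff.setMonth(cutoff.getMonth() - months);

  const seen = new Set();
  return items.filter((item) => {
    if (seen.has(item.url)) return false; // drop duplicate pages
    seen.add(item.url);

    const published = new Date(item.data?.publish_date);
    // Invalid or missing dates produce NaN and are excluded
    return !Number.isNaN(published.getTime()) && published >= cutoff;
  });
}
```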

Example 4: Geo-targeted scraping

Some sites show different content based on the visitor’s country: prices in local currency, region-specific inventory, geo-restricted offers. Use proxyCountry to scrape from a specific location.

// Scrape a German Amazon page with a German IP
const result = await scrape(
  "https://www.amazon.de/gp/bestsellers/electronics",
  "List the top 10 bestselling electronics with name and price in EUR",
  {
    type: "object",
    required: ["products"],
    properties: {
      products: {
        type: "array",
        items: {
          type: "object",
          properties: {
            name: { type: "string" },
            price_eur: { type: ["number", "null"] },
            rank: { type: "number" },
          },
        },
      },
    },
  },
  { proxyCountry: "de" }
);

// Spidra supports 50+ country codes: us, gb, de, fr, jp, au, ca, br, in, ...
// Use "eu" for rotating EU proxies, "global" for worldwide rotation

Example 5: Authenticated scraping

For pages behind a login (dashboards, account pages, paywalled content), pass session cookies directly.

// Export cookies from your browser DevTools (Application → Cookies).
// document.cookie in the console won't show HttpOnly cookies, and session tokens often are HttpOnly.

const result = await scrape(
  "https://app.example.com/dashboard/reports",
  "Extract monthly revenue, active users, and conversion rate for the last 3 months",
  {
    type: "object",
    required: ["months"],
    properties: {
      months: {
        type: "array",
        items: {
          type: "object",
          properties: {
            month: { type: "string" },
            revenue: { type: "number" },
            active_users: { type: "number" },
            conversion_rate: { type: "number" },
          },
        },
      },
    },
  },
  { cookies: "session=abc123; auth_token=xyz789; csrf=def456" }
);
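Session cookies are credentials, so hard-coding them as above is only suitable for a quick test. A safer sketch reads them from an environment variable at runtime. The variable name APP_SESSION_COOKIES and the cookiesFromEnv helper are assumptions for illustration; the helper only normalizes the "name=value; name=value" format and fails early on an empty or malformed value.

```javascript
// Read session cookies from the environment instead of committing them to source.
// APP_SESSION_COOKIES is a hypothetical variable, e.g. "session=...; auth_token=..."
function cookiesFromEnv(varName = "APP_SESSION_COOKIES", env = process.env) {
  const raw = (env[varName] ?? "").trim();
  if (!raw) throw new Error(`${varName} is not set`);

  // Normalize to "name=value; name2=value2" and reject malformed pairs early
  const pairs = raw
    .split(";")
    .map((p) => p.trim())
    .filter(Boolean);
  for (const pair of pairs) {
    if (!/^[^=\s]+=/.test(pair)) throw new Error(`Malformed cookie pair: ${pair}`);
  }
  return pairs.join("; ");
}
```

The result drops in as the options argument: { cookies: cookiesFromEnv() }.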

Wiring it into an agent (full example)

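Here’s a minimal but complete research agent using the Vercel AI SDK with scrapeUrl as a tool. The code uses the AI SDK 4.x API (parameters and maxSteps); AI SDK 5 renamed parameters to inputSchema and replaced maxSteps with stopWhen. The SDK handles the agentic loop: the model decides when to call the tool, the tool fetches live data, and the model reasons over the result.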

import { generateText, tool } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const result = await generateText({
  model: anthropic("claude-opus-4-6"),
  maxSteps: 5,
  tools: {
    scrapeUrl: tool({
      description:
        "Fetch and extract structured data from a URL. Use this when you need current information from a website.",
      parameters: z.object({
        url: z.string().describe("The URL to scrape"),
        prompt: z
          .string()
          .describe("What to extract from the page, in plain English"),
      }),
      execute: async ({ url, prompt }) => {
        const data = await scrape(url, prompt);
        return JSON.stringify(data);
      },
    }),
  },
  prompt:
    "What are the top 3 trending repositories on GitHub today, and what do they do?",
});

console.log(result.text);

maxSteps lets the model make multiple tool calls in sequence if it needs to follow links, cross-reference sources, or refine its query. The scraping layer handles everything else. The model just decides what to fetch and what to ask for.
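Two details tend to matter once this runs unattended: depending on the SDK version, an error thrown inside execute may abort the whole generateText call, and a large page can flood the model's context. One option is to return errors as data the model can read and work around, and to cap the size of what goes back. The toToolResult helper and the 8,000-character cap below are assumptions, not part of the AI SDK.

```javascript
// Wrap a scrape call so the model always gets a bounded, readable string back.
// Hypothetical helper; maxChars = 8000 is an arbitrary budget, tune it per model.
async function toToolResult(run, maxChars = 8000) {
  try {
    const data = await run();
    const json = JSON.stringify(data ?? null);
    if (json.length <= maxChars) return json;
    // Truncate and say so, so the model knows the data is partial
    return json.slice(0, maxChars) + `... [truncated ${json.length - maxChars} chars]`;
  } catch (err) {
    // Surface the failure as data the model can reason about (e.g. try another URL)
    return JSON.stringify({ error: String(err?.message ?? err) });
  }
}
```

Inside the tool definition this becomes execute: async ({ url, prompt }) => toToolResult(() => scrape(url, prompt)).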

Practical agent use cases

To make this concrete, here are a few agent patterns that become viable with web access:

  • Competitive intelligence agent: Crawls competitor sites weekly, diffs pricing and feature changes, surfaces meaningful deltas to a Slack channel.

  • Lead enrichment agent: Given a list of company names, scrapes their websites, LinkedIn pages, and job boards to build structured profiles: company size, tech stack, recent hires, open roles.

  • Research agent: Given a topic, searches the web, scrapes the top results, synthesizes findings into a structured report with citations.

  • Price monitoring agent: Tracks SKUs across multiple retailers, alerts when prices drop below a threshold or when a product goes out of stock.

  • News digest agent: Crawls a configured list of sources each morning, extracts headlines and summaries, sends a curated briefing tailored to the user’s interests.

Each of these follows the same fundamental pattern: the agent knows what it wants, the scraping layer fetches and structures the data, and the agent reasons over clean output rather than raw HTML.
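The "reason over clean output" step is often plain code. For the price monitoring case, once two scrapes return structured records, alerting reduces to a diff. A sketch assuming each record looks like { sku, price, inStock }; those field names and the alert shapes are assumptions, not from any particular API.

```javascript
// Compare two snapshots of { sku, price, inStock } records and emit alerts.
// Field names and alert shapes are illustrative assumptions.
function priceAlerts(previous, current, threshold) {
  const before = new Map(previous.map((p) => [p.sku, p]));
  const alerts = [];

  for (const item of current) {
    const old = before.get(item.sku);
    const wasBelow = old?.price != null && old.price < threshold;
    // Alert when the price first crosses below the threshold, not on every poll while it stays low
    if (item.price != null && item.price < threshold && !wasBelow) {
      alerts.push({ type: "price_below_threshold", sku: item.sku, price: item.price });
    }
    // Alert on the transition from in stock to out of stock
    if (old?.inStock === true && item.inStock === false) {
      alerts.push({ type: "out_of_stock", sku: item.sku });
    }
  }
  return alerts;
}
```

Alerting on transitions rather than states keeps the channel quiet: a price that stays under the threshold for a week produces one alert, not one per poll.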

Wrapping up

Web access expands the category of problems an AI agent can tackle. A scraping tool lets it monitor competitor pages, research live topics, track prices, and respond to things happening right now. Without it, your agent is limited to reasoning over whatever it already knows.

The implementation is straightforward: a submit-and-poll pattern, a JSON schema for the output shape, and a proxy-enabled API to handle the infrastructure. The agent doesn’t need to know how any of that works. It just needs a reliable tool call that returns structured data. That’s the interface worth building toward.

Thanks for reading!

Build a Social Media Event Bus: React to Posts, Comments, and Follows in Real-Time

Social media platforms don’t give you webhooks. Instagram won’t ping your server when someone comments. TikTok won’t notify you when a creator posts.

So you build your own.

I built an event bus that polls social media APIs and converts changes into events. New post? Event. New comment? Event. Follower count changed by more than 5%? Event. Then any downstream system can subscribe — Discord bots, email senders, dashboards, CRMs.

It turned 10 separate “check social media” scripts into one system.

Architecture

Poller (cron jobs)
  │
  ├── Check profiles every 30 minutes
  ├── Check posts every 15 minutes
  ├── Check comments every hour
  │
  ↓ Detect changes (diff against last known state)
  │
Event Bus (in-process EventEmitter or Redis Pub/Sub)
  │
  ├── → Discord notifier
  ├── → Email sender
  ├── → Database logger
  ├── → Slack alerter
  └── → Webhook forwarder (POST to any URL)

The pollers detect changes. The event bus routes them. The handlers do whatever you want. Completely decoupled.

The Stack

  • Node.js – runtime
  • SociaVault API – data source
  • EventEmitter (built-in) – event bus for single-process; Redis Pub/Sub for multi-process
  • better-sqlite3 – state tracking
  • node-cron – polling schedule

Setup

mkdir social-event-bus && cd social-event-bus
npm init -y
npm install axios better-sqlite3 node-cron dotenv

Step 1: The State Store

To detect changes, you need to know what things looked like last time you checked.

// state.js
const Database = require('better-sqlite3');
const db = new Database('./state.db');

db.exec(`
  CREATE TABLE IF NOT EXISTS known_state (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
  );
`);

const getState = db.prepare('SELECT value FROM known_state WHERE key = ?');
const setState = db.prepare(`
  INSERT INTO known_state (key, value, updated_at) VALUES (?, ?, CURRENT_TIMESTAMP)
  ON CONFLICT(key) DO UPDATE SET value = excluded.value, updated_at = CURRENT_TIMESTAMP
`);

module.exports = {
  get: (key) => {
    const row = getState.get(key);
    return row ? JSON.parse(row.value) : null;
  },
  set: (key, value) => {
    setState.run(key, JSON.stringify(value));
  },
};

Step 2: The Event Bus

// bus.js
const { EventEmitter } = require('events');

class SocialEventBus extends EventEmitter {
  emit(eventType, payload) {
    const event = {
      type: eventType,
      timestamp: new Date().toISOString(),
      ...payload,
    };

    // Log every event
    console.log(`[EVENT] ${eventType}: ${payload.platform}/@${payload.username || 'unknown'}`);

    // Emit both the specific event and a wildcard
    super.emit(eventType, event);
    super.emit('*', event);

    return true;
  }
}

// Singleton
const bus = new SocialEventBus();
module.exports = bus;

Event types we’ll generate:

Event               Trigger
new_post            Creator published a new post/video (detected as a higher post count)
post_milestone      A post's view count crossed 10K, 100K, 1M, or 10M
follower_change     Follower count changed by at least 5% or at least 1,000
new_comment         New comment on a tracked post
engagement_spike    A post's like count grew 3x or more between two polls
profile_updated     Profile bio changed

Step 3: The Pollers

Each poller fetches current data, diffs against stored state, and emits events for any changes.

// pollers/profile-poller.js
const axios = require('axios');
const state = require('../state');
const bus = require('../bus');

const api = axios.create({
  baseURL: 'https://api.sociavault.com/v1/scrape',
  headers: { 'x-api-key': process.env.SOCIAVAULT_API_KEY },
});

async function pollProfile(platform, username) {
  const endpoint = platform === 'instagram' ? '/instagram/profile' : '/tiktok/profile';

  try {
    // Pass the username as a query param so axios URL-encodes it
    const { data: res } = await api.get(endpoint, { params: { username } });
    const profile = res.data || res;

    const key = `profile:${platform}:${username}`;
    const previous = state.get(key);

    const current = {
      followers: profile.followersCount || profile.followerCount || 0,
      following: profile.followingCount || 0,
      posts: profile.postsCount || profile.videoCount || 0,
      bio: profile.bio || profile.signature || '',
      displayName: profile.fullName || profile.nickname || '',
    };

    if (previous) {
      // Check for follower changes (±5% or ±1000)
      const followerDelta = current.followers - previous.followers;
      const followerPercent = previous.followers > 0
        ? Math.abs(followerDelta / previous.followers) * 100
        : 0;

      if (followerPercent >= 5 || Math.abs(followerDelta) >= 1000) {
        bus.emit('follower_change', {
          platform,
          username,
          previous: previous.followers,
          current: current.followers,
          delta: followerDelta,
          percentChange: parseFloat(followerPercent.toFixed(1)),
        });
      }

      // Check for new posts
      if (current.posts > previous.posts) {
        bus.emit('new_post', {
          platform,
          username,
          previousCount: previous.posts,
          currentCount: current.posts,
          newPosts: current.posts - previous.posts,
        });
      }

      // Check for bio changes
      if (current.bio !== previous.bio) {
        bus.emit('profile_updated', {
          platform,
          username,
          field: 'bio',
          old: previous.bio,
          new: current.bio,
        });
      }
    }

    state.set(key, current);
  } catch (err) {
    console.error(`Poll failed for ${platform}/@${username}: ${err.message}`);
  }
}

module.exports = { pollProfile };

// pollers/post-poller.js
const axios = require('axios');
const state = require('../state');
const bus = require('../bus');

const api = axios.create({
  baseURL: 'https://api.sociavault.com/v1/scrape',
  headers: { 'x-api-key': process.env.SOCIAVAULT_API_KEY },
});

async function pollPosts(platform, username) {
  const endpoint = platform === 'instagram' ? '/instagram/posts' : '/tiktok/profile-videos';

  try {
    // Let axios build and URL-encode the query string
    const { data: res } = await api.get(endpoint, { params: { username, limit: 5 } });
    const posts = res.data || res.posts || [];

    for (const post of posts) {
      const postId = post.id || post.shortcode || post.videoId;
      if (!postId) continue;

      const key = `post:${platform}:${postId}`;
      const previous = state.get(key);

      const current = {
        likes: post.likesCount || post.diggCount || 0,
        comments: post.commentsCount || post.commentCount || 0,
        views: post.viewCount || post.playCount || null,
        shares: post.shareCount || null,
      };

      if (previous) {
        // Check for engagement spike
        const likeGrowth = previous.likes > 0
          ? current.likes / previous.likes
          : 0;

        if (likeGrowth >= 3) {
          bus.emit('engagement_spike', {
            platform,
            username,
            postId,
            metric: 'likes',
            previous: previous.likes,
            current: current.likes,
            multiplier: parseFloat(likeGrowth.toFixed(1)),
          });
        }

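        // Check for view milestones (10K, 100K, 1M, 10M)
        const milestones = [10000, 100000, 1000000, 10000000];
        // Skip when the previous view count was unknown, so old milestones don't all fire at once
        if (current.views && previous.views != null) {
          for (const milestone of milestones) {
            if (previous.views < milestone && current.views >= milestone) {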
              bus.emit('post_milestone', {
                platform,
                username,
                postId,
                milestone,
                currentViews: current.views,
              });
            }
          }
        }
      }

      state.set(key, current);
    }
  } catch (err) {
    console.error(`Post poll failed for ${platform}/@${username}: ${err.message}`);
  }
}

module.exports = { pollPosts };
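The diff logic is the part most worth unit testing, and it is easier to test once it is pulled out of the pollers into pure functions. A sketch that mirrors the thresholds used above (±5% or ±1,000 followers, 3x like growth, fixed view milestones); the function names are hypothetical and not part of the pollers as written.

```javascript
// Pure versions of the change checks used in the pollers, testable without the API.
const VIEW_MILESTONES = [10000, 100000, 1000000, 10000000];

function isSignificantFollowerChange(prev, curr, minPercent = 5, minAbsolute = 1000) {
  const delta = curr - prev;
  const percent = prev > 0 ? Math.abs(delta / prev) * 100 : 0;
  return percent >= minPercent || Math.abs(delta) >= minAbsolute;
}

function isEngagementSpike(prevLikes, currLikes, multiplier = 3) {
  return prevLikes > 0 && currLikes / prevLikes >= multiplier;
}

// Milestones crossed between two view counts; an unknown count crosses nothing
function crossedMilestones(prevViews, currViews, milestones = VIEW_MILESTONES) {
  if (prevViews == null || currViews == null) return [];
  return milestones.filter((m) => prevViews < m && currViews >= m);
}
```

The pollers then shrink to fetch, call these, emit, and save state, which keeps the threshold rules in one place.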

Step 4: The Handlers

This is where you plug in whatever actions you want:

// handlers/discord.js
const axios = require('axios');
const bus = require('../bus');

const DISCORD_WEBHOOK = process.env.DISCORD_WEBHOOK_URL;

// EventEmitter doesn't await async listeners, so a failed POST would become an
// unhandled promise rejection (fatal by default since Node 15). Catch it here.
async function notify(content) {
  if (!DISCORD_WEBHOOK) return;
  try {
    await axios.post(DISCORD_WEBHOOK, { content }, { timeout: 5000 });
  } catch (err) {
    console.error(`Discord notify failed: ${err.message}`);
  }
}

bus.on('new_post', (event) => {
  notify(`🆕 **@${event.username}** posted ${event.newPosts} new ${event.newPosts === 1 ? 'post' : 'posts'} on ${event.platform}!`);
});

bus.on('engagement_spike', (event) => {
  notify(`🔥 **Engagement spike!** @${event.username}'s post likes grew ${event.multiplier}x since the last check on ${event.platform}`);
});

bus.on('follower_change', (event) => {
  const direction = event.delta > 0 ? '📈' : '📉';
  const sign = event.delta > 0 ? '+' : '';
  notify(`${direction} **@${event.username}** ${sign}${event.delta.toLocaleString()} followers (${event.percentChange}%) on ${event.platform}`);
});

// handlers/webhook-forwarder.js
const axios = require('axios');
const bus = require('../bus');

// Forward all events to an external URL (your own API, Zapier, n8n, etc.)
const WEBHOOK_URL = process.env.FORWARD_WEBHOOK_URL;

bus.on('*', async (event) => {
  if (!WEBHOOK_URL) return;

  try {
    await axios.post(WEBHOOK_URL, event, {
      headers: { 'Content-Type': 'application/json' },
      timeout: 5000,
    });
  } catch (err) {
    console.error(`Webhook forward failed: ${err.message}`);
  }
});

Step 5: Main Entry Point

// index.js
require('dotenv').config();
const cron = require('node-cron');
const { pollProfile } = require('./pollers/profile-poller');
const { pollPosts } = require('./pollers/post-poller');

// Load handlers (they self-register on the bus)
require('./handlers/discord');
require('./handlers/webhook-forwarder');

// Accounts to monitor
const WATCHED = [
  { platform: 'instagram', username: 'competitor_1' },
  { platform: 'instagram', username: 'competitor_2' },
  { platform: 'tiktok', username: 'competitor_3' },
  { platform: 'tiktok', username: 'your_own_account' },
];

async function runProfilePolls() {
  console.log(`[${new Date().toISOString()}] Polling profiles...`);
  for (const account of WATCHED) {
    await pollProfile(account.platform, account.username);
    await new Promise(r => setTimeout(r, 500));
  }
}

async function runPostPolls() {
  console.log(`[${new Date().toISOString()}] Polling posts...`);
  for (const account of WATCHED) {
    await pollPosts(account.platform, account.username);
    await new Promise(r => setTimeout(r, 500));
  }
}

// Initial run
runProfilePolls();
runPostPolls();

// Schedule
cron.schedule('*/30 * * * *', runProfilePolls);  // Profiles every 30 min
cron.schedule('*/15 * * * *', runPostPolls);      // Posts every 15 min

console.log(`Social event bus started. Watching ${WATCHED.length} accounts.`);
console.log('Profile polls: every 30 minutes');
console.log('Post polls: every 15 minutes');
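Depending on the version, node-cron may fire the next tick while the previous async run is still going, which becomes likely once the watch list grows or the API slows down. A small guard that skips a tick while a run is in flight is one way to handle it; skipIfRunning is a hypothetical helper, not part of node-cron.

```javascript
// Wrap an async job so overlapping invocations are skipped rather than stacked.
function skipIfRunning(name, job) {
  let running = false;
  return async () => {
    if (running) {
      console.warn(`[${name}] previous run still in progress, skipping this tick`);
      return false;
    }
    running = true;
    try {
      await job();
      return true;
    } finally {
      running = false; // always release, even if the job throws
    }
  };
}
```

The schedule lines would then read cron.schedule('*/15 * * * *', skipIfRunning('posts', runPostPolls)), and the same for profiles.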

Why This Pattern?

Because polling scripts always start simple and end up as spaghetti. You start with one script that checks competitors and sends a Discord message. Then your boss wants Slack too. Then email. Then someone wants to log it to a spreadsheet. Then you need to check comments too, not just posts.

The event bus pattern means:

  • Adding a new data source = write one poller function
  • Adding a new action = write one handler function
  • They don’t know about each other — the poller doesn’t care if Discord or Slack or email is listening

I’ve run this pattern for 6 months. Added 4 handlers and 2 pollers without touching existing code once.

Scaling Up

When you outgrow a single Node.js process:

  1. Replace EventEmitter with Redis Pub/Sub — pollers publish, handlers subscribe, can run on different machines
  2. Move pollers to separate workers — one per platform
  3. Add a dead letter queue for failed handler deliveries
  4. Add a simple web UI to see recent events (Express + SSE)
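The dead letter queue in step 3 can start much smaller than a separate service. A sketch, assuming a handler is any async function of the event that throws on failure: retry a few times with exponential backoff, then hand the event to a dead-letter callback (an in-memory array in the test; a database table or Redis list would be the durable version). withRetry and its options are hypothetical names. Note that the webhook forwarder above catches its own errors, so it would need to rethrow for retries to kick in.

```javascript
// Retry an async event handler with exponential backoff; park the event on final failure.
function withRetry(handler, { retries = 3, baseDelayMs = 500, onDeadLetter } = {}) {
  return async (event) => {
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        return await handler(event);
      } catch (err) {
        if (attempt === retries) {
          // Out of retries: record the event and error instead of crashing the process
          if (onDeadLetter) await onDeadLetter(event, err);
          return undefined;
        }
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  };
}
```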

But honestly, a single Node process on a $5 VPS handles 50+ accounts with room to spare.

Read the Full Guide

Build a Social Media Event Bus → SociaVault Blog

Turn social media data into real-time events with SociaVault — one API for TikTok, Instagram, YouTube, and 10+ platforms. Profiles, posts, comments, followers — all endpoints, one key.

Discussion

What’s your approach to “real-time” social media monitoring when the platforms don’t offer webhooks? Poll and diff like this, or a different strategy entirely?


Why Single-Pass AI Test Generation Produces Garbage

After 9 years of writing test cases manually, I built an AI tool that generates them from User Stories. The first version used a single API call. The output looked reasonable until I tried to automate it.

“Verify the system works correctly.” What does that mean in Playwright?

“Enter valid data and submit.” What data? Which field? What’s the expected state after submit?

Single-pass AI treats test case writing like creative writing. But test cases are engineering artifacts. They need specific values, verifiable assertions, and steps an automation engineer can translate to code without asking questions.

So I rebuilt the pipeline with three passes. The quality jumped from 4-5/10 to 8-9/10 consistently. Here’s what I learned.

The single-pass problem

Give any LLM a User Story and ask for test cases. You get a reasonable-looking list. But look at what’s actually there:

Vague assertions — “Verify the system displays correct results.” What results? Where? How do I assert that?

Missing coverage — 8 acceptance criteria in the story, 3 test cases in the output. Five requirements untested.

No priority differentiation — every test case is Priority 1. When the build breaks and you have 10 minutes, which ones do you run?

Placeholder data — “Enter a valid email.” My automation script needs user@example.com, not a description of what to enter.

Merged scenarios — three distinct AC collapsed into one test. When it fails, which requirement is broken?
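Several of these failure modes can be caught mechanically before a human reads the output. A sketch of such a check, assuming generated test cases arrive as JSON objects with title, priority, and steps (each step having action and expected), and that each test lists the acceptance criteria it covers in a covers array. The field names, the phrase list, and lintTestCases itself are assumptions, not CasePilot's actual format.

```javascript
// Flag common single-pass problems in generated test cases (field names are assumptions).
const VAGUE_PHRASES = [/works correctly/i, /valid data/i, /a valid \w+/i, /correct results/i];

function lintTestCases(testCases, acceptanceCriteria) {
  const issues = [];

  // Vague assertions and placeholder data
  testCases.forEach((tc, i) => {
    for (const step of tc.steps ?? []) {
      const text = `${step.action ?? ""} ${step.expected ?? ""}`;
      if (VAGUE_PHRASES.some((re) => re.test(text))) {
        issues.push(`test ${i + 1}: vague or placeholder wording in "${text.trim()}"`);
      }
    }
  });

  // Missing coverage: acceptance criteria that no test claims to cover
  const covered = new Set(testCases.flatMap((tc) => tc.covers ?? []));
  for (const ac of acceptanceCriteria) {
    if (!covered.has(ac)) issues.push(`uncovered acceptance criterion: ${ac}`);
  }

  // No priority differentiation: every test has the same priority
  const priorities = new Set(testCases.map((tc) => tc.priority));
  if (testCases.length > 1 && priorities.size === 1) {
    issues.push("all test cases share the same priority");
  }
  return issues;
}
```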

This isn’t a prompt engineering problem. I spent weeks tweaking prompts. The real issue is structural: one pass doesn’t have enough context to generate AND review simultaneously.

Three passes: Worker, Judge, Optimizer

Here’s what CasePilot does instead.

Pass 1 — Worker

The Worker generates initial test cases from full context:

  • User Story title, description, acceptance criteria
  • Discussion comments (filtered: human only, no bot noise)
  • Project Knowledge (tech stack, business rules, UI patterns)
  • Wiki/Confluence pages linked to the project
  • Parent Epic context (if the story is part of a larger feature)
  • Existing test cases (to avoid generating duplicates)

The Worker prompt instructs the model to think like a mid-level QA automation engineer. Not a writer. Each acceptance criterion gets its own test. Test data uses concrete values, not placeholders.

The Worker also applies ISTQB test design techniques directly in the prompt:

  • Boundary Value Analysis — min, min+1, max-1, max for every numeric field
  • Equivalence Partitioning — valid class, invalid class, edge class
  • Decision Table Testing — combinations of conditions for complex logic
  • State Transition Testing — valid and invalid workflow transitions
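Boundary Value Analysis is mechanical enough to compute rather than leave entirely to the model. A sketch for an inclusive integer range, returning the in-range boundaries listed above plus the two values just outside the range (min-1 and max+1), which three-point BVA also tests as invalid inputs. boundaryValues is a hypothetical helper, for example for pre-filling test data.

```javascript
// Boundary values for an inclusive integer range [min, max].
// valid: the in-range boundaries; invalid: the nearest out-of-range values.
function boundaryValues(min, max) {
  if (!Number.isInteger(min) || !Number.isInteger(max) || min > max) {
    throw new RangeError("expected integers with min <= max");
  }
  // A Set removes duplicates when the range is very narrow (e.g. min === max)
  const valid = [...new Set([min, min + 1, max - 1, max])].filter((v) => v >= min && v <= max);
  const invalid = [min - 1, max + 1];
  return { valid, invalid };
}
```

For a quantity field that accepts 1 to 100, this yields valid values 1, 2, 99, 100 and invalid values 0 and 101.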

Pass 2 — Judge

The Judge receives the Worker’s output plus the original User Story. It reviews like a QA Lead reviewing a pull request:

  • Can each test be translated directly into a test method?
  • Are assertions programmatically verifiable?
  • Are coverage gaps filled?
  • Are there duplicate or overlapping tests?

The Judge rewrites vague tests, adds missing edge cases, removes unnecessary ones, and scores the overall quality 1-10.

Real example: Worker generates 11 test cases for a registration form. Judge consolidates three email-validation tests into one parameterized test, removes a redundant “form displays correctly” check, adds a missing duplicate-email test. Result: 7 tests, quality score 9/10.

Pass 3 — Optimizer

For sets of 3+ test cases, the Optimizer analyzes the full suite:

  • Duplicate steps — “Navigate to login page” appears in 6 tests. Extract to shared precondition.
  • Overlapping coverage — Test 3 and Test 7 both verify the same error message. Merge or differentiate.
  • Suggested groups — Tests 1, 2, 5 share the same setup. Group them under “Authenticated User” precondition.

The Optimizer doesn’t change the tests. It gives you insights on how to structure your test suite when you automate.
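Structurally, the three passes are three model calls where each one sees the previous output plus the original story. A skeleton of that flow, assuming a hypothetical callModel(systemPrompt, input) function that resolves to parsed JSON; the prompt names are placeholders rather than CasePilot's actual prompts, and the 3-test threshold for the Optimizer mirrors the text above.

```javascript
// Skeleton of a Worker -> Judge -> Optimizer pipeline with an injected model client.
// callModel(systemPrompt, input) is hypothetical and must resolve to parsed JSON.
async function generateTestCases(story, context, callModel) {
  // Pass 1: Worker drafts tests from the full context
  const draft = await callModel("WORKER_PROMPT", { story, context });

  // Pass 2: Judge reviews the draft against the original story, rewrites, and scores it
  const judged = await callModel("JUDGE_PROMPT", { story, testCases: draft.testCases });

  // Pass 3: Optimizer only suggests suite structure; it does not modify the tests
  let optimization = null;
  if (judged.testCases.length >= 3) {
    optimization = await callModel("OPTIMIZER_PROMPT", { testCases: judged.testCases });
  }

  return { testCases: judged.testCases, score: judged.score, optimization };
}
```

Injecting callModel keeps the pipeline testable with a fake model and independent of any particular LLM provider.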

What this looks like in practice

A User Story about applying discount codes at checkout. 8 acceptance criteria: valid percentage coupon, invalid coupon, expired coupon, empty cart, multiple coupons, coupon removal, minimum order amount, case-insensitive codes.

Single-pass output:
3 generic test cases, all Priority 1, 1-2 steps each. “Apply a valid coupon and verify discount.” No test data. No edge cases.

Three-pass output:
8 specific test cases. Mixed P1/P2/P3. Each has 3-5 steps with concrete data:

Title: [Checkout] should reject expired coupon code with clear error message
Category: negative
Priority: 2
Preconditions:
  - User is logged in with items in cart (total: $150.00)
  - Coupon "SUMMER2024" exists but expired on 2024-12-31
Steps:
  1. Navigate to checkout page
     Expected: Cart shows $150.00 total
  2. Enter "SUMMER2024" in coupon field and click Apply
     Expected: Error message "This coupon has expired" displayed
     Test Data: coupon = "SUMMER2024"
  3. Verify cart total remains $150.00
     Expected: No discount applied, total unchanged

An automation engineer reads this and starts writing code. No questions needed.

Five things I learned building this

1. Token budget matters more than prompt engineering.

I spent weeks tweaking prompts. The real breakthrough was increasing max output tokens from 4,096 to 8,192. The AI was literally running out of space to finish generating test cases. It would produce 3 good tests and then stop because the response was truncated. Not a quality problem. A capacity problem.
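One way to catch this failure mode is to check for it explicitly. Many chat APIs report why generation stopped (OpenAI's Chat Completions API, for example, returns a `finish_reason` of `"length"` when the token limit cuts a response off), and that signal should be preferred when it is available. As a client-side fallback, the hypothetical `looks_complete` function below tracks bracket depth outside string literals. A truncated JSON array usually ends with unclosed brackets or inside an unterminated string. The check does not validate full JSON syntax.

```rust
/// Hypothetical truncation check for a model's JSON output.
/// Tracks bracket depth outside string literals; returns false when brackets
/// are unbalanced, a string is left open, or the output is empty.
fn looks_complete(json: &str) -> bool {
    let mut depth: i64 = 0;
    let mut in_string = false;
    let mut escaped = false;
    for ch in json.chars() {
        if in_string {
            if escaped {
                escaped = false; // this character was escaped by a backslash
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue; // brackets inside strings do not count
        }
        match ch {
            '"' => in_string = true,
            '[' | '{' => depth += 1,
            ']' | '}' => {
                depth -= 1;
                if depth < 0 {
                    return false; // more closers than openers
                }
            }
            _ => {}
        }
    }
    depth == 0 && !in_string && !json.trim().is_empty()
}
```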

2. The model follows examples, not instructions.

“Generate at least one test per acceptance criterion” — ignored.
“Each test must have 3-5 steps with specific expected results” — partially followed.

Adding a concrete JSON example in the system prompt with 3 steps, specific assertions, real test data, and a [Feature Area] prefix fixed everything instantly. The AI pattern-matches on examples far more reliably than parsing natural language instructions.

3. Post-processing catches what prompts can’t enforce.

The AI won’t always:

  • Add [Feature Area] prefixes to titles
  • Distribute tests across positive/negative/edge categories
  • Include all ISTQB technique labels

Code-based post-processing handles these reliably. Trust AI for content, trust code for formatting. My pipeline has a postProcess step that enforces category distribution, adds feature area tags, scores flakiness risk, and flags shallow tests (fewer than 3 steps).
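Here is a minimal sketch of two of those rules, assuming a simplified test-case shape. `TestCase`, `Processed`, `post_process`, and `MIN_STEPS` are illustrative names, not the author's actual code. Category distribution and flakiness scoring are left out.

```rust
/// Hypothetical test-case shape; field names are illustrative, not CasePilot's schema.
#[derive(Debug, Clone, PartialEq)]
struct TestCase {
    title: String,
    steps: Vec<String>,
}

/// Result of post-processing one test case.
#[derive(Debug, PartialEq)]
struct Processed {
    case: TestCase,
    shallow: bool, // true when the test has fewer than MIN_STEPS steps
}

const MIN_STEPS: usize = 3;

/// Adds a "[Feature Area]" prefix when the title lacks one and flags shallow tests.
fn post_process(mut case: TestCase, feature_area: &str) -> Processed {
    if !case.title.trim_start().starts_with('[') {
        case.title = format!("[{}] {}", feature_area, case.title.trim());
    }
    let shallow = case.steps.len() < MIN_STEPS;
    Processed { case, shallow }
}
```

Keeping these rules in code means they hold even when the model ignores the corresponding prompt instruction.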

4. The Judge pass pays for itself.

Three API calls cost roughly three times as much as one. But the quality difference means users generate once instead of regenerating three times. Net token cost is actually lower. And the Judge catches real issues: a Worker test that says “Verify the page loads” gets rewritten to “Verify the checkout page displays cart items with prices, quantities, and subtotal matching the cart state.”

5. Speed vs quality is a false tradeoff.

The three-pass pipeline takes 30-60 seconds on GPT-5.4. Users are fine waiting one minute for test cases they can actually automate. They are not fine getting instant results they have to rewrite manually.

I added a three-phase progress bar showing Worker, Judge, Optimizer passes so users see progress instead of staring at a spinner. Perception of speed matters more than actual speed.

Beyond test cases

The same two-pass pattern (Worker + Judge) powers three tools now:

  • CasePilot — test case generation from User Stories
  • BugPilot — structured bug reports from vague descriptions (repro steps, severity, root cause, impact radius)
  • StoryPilot — complete User Story enrichment from a title (description, AC, priority, story points, risks, DoD)

The pattern works because review is fundamentally different from generation. The Worker creates. The Judge evaluates against the source material. Two different cognitive tasks that don’t combine well in a single prompt.
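Stripped of domain details, the pattern is small enough to sketch generically. In this hypothetical Rust version, `worker` and `judge` are closures standing in for model calls, and no particular LLM API is assumed. The key design point is that the Judge receives the original source alongside the draft.

```rust
/// Output of the two passes; keeping the draft makes the Judge's changes auditable.
#[derive(Debug, PartialEq)]
struct TwoPassOutput {
    draft: String,
    reviewed: String,
}

/// Hypothetical two-pass runner: `worker` drafts from the source material,
/// and `judge` reviews that draft against the same source and returns a revision.
fn two_pass<W, J>(source: &str, worker: W, judge: J) -> TwoPassOutput
where
    W: Fn(&str) -> String,
    J: Fn(&str, &str) -> String,
{
    let draft = worker(source);
    // The Judge sees the source, not just the draft, so it can spot coverage gaps.
    let reviewed = judge(source, &draft);
    TwoPassOutput { draft, reviewed }
}
```

The same runner would serve test cases, bug reports, or story enrichment; only the two prompts behind the closures change.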

Try it

CasePilot is on the Azure DevOps Marketplace and coming to Jira. Free tier: 20 test cases/month, no credit card.

If you want to use the flakiness prediction and boundary value generation in your own test framework, I open-sourced those as a standalone npm package: npm install @iklab/testkit. Zero dependencies, works with Jest, Vitest, Playwright, anything.

I’m interested in how other people handle AI output quality for structured data. The three-pass approach works for test cases. Does it generalize to other domains where AI output needs to be precise and actionable? Let me know in the comments.

Ihor Kosheliev — Senior QA Automation Engineer. Building AI tools for QA at iklab.dev.

Building VoiceAgent: From Speech to Safe Action

Introduction

Voice interfaces feel natural to humans, but systems require structure, validation, and control.

VoiceAgent was built to bridge that gap — a system that takes voice input, understands intent, and executes actions safely.

This article focuses on the architecture, design choices, and challenges behind building the system.

System Architecture

The system follows a structured pipeline:

Voice → Text → Intent → Validation → Approval → Action

Each stage plays a critical role in ensuring both functionality and safety.
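A staged pipeline lets any stage stop the request before an action runs. The following is a minimal sketch that assumes each stage is passed in as a closure. All names, including the `Intent` fields, are hypothetical and not from the VoiceAgent repository. The stages are chained with Rust's `?` operator:

```rust
/// Hypothetical parsed intent; field names are illustrative.
struct Intent {
    action: String,
    target: String,
}

/// Voice -> Text -> Intent -> Validation -> Approval -> Action.
/// Each stage can fail, and `?` stops the pipeline before anything executes.
fn run_pipeline(
    audio: &[f32],
    transcribe: impl Fn(&[f32]) -> Result<String, String>,
    detect_intent: impl Fn(&str) -> Result<Intent, String>,
    validate: impl Fn(&Intent) -> Result<(), String>,
    approve: impl Fn(&Intent) -> bool,
    execute: impl Fn(&Intent) -> Result<String, String>,
) -> Result<String, String> {
    let text = transcribe(audio)?;
    let intent = detect_intent(&text)?;
    validate(&intent)?;
    if !approve(&intent) {
        return Err("rejected by user".to_string());
    }
    execute(&intent)
}
```

Approval is checked before `execute` is called, so a rejected request never reaches the action stage.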

1. Speech-to-Text (Whisper)

For transcription, I used a local Whisper model.

Why Whisper?

  • High accuracy for speech recognition
  • Works offline (no API dependency)
  • No per-request API costs

Key Consideration

Handling audio input required:

  • Converting audio to float32 format
  • Normalizing amplitude
  • Resampling to 16 kHz for consistent input
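The three steps can be sketched without any audio crates. This hypothetical version assumes mono 16-bit PCM input. It reads “normalizing amplitude” as peak normalization and resamples with linear interpolation for simplicity. A production pipeline would usually use a band-limited resampler to avoid aliasing when downsampling.

```rust
/// Sample rate Whisper expects.
const TARGET_RATE: u32 = 16_000;

/// Convert signed 16-bit PCM samples to f32 in [-1.0, 1.0).
fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

/// Scale so the loudest sample has magnitude 1.0; silence is left unchanged.
fn normalize_peak(samples: &mut [f32]) {
    let peak = samples.iter().fold(0.0f32, |m, s| m.max(s.abs()));
    if peak > 0.0 {
        for s in samples.iter_mut() {
            *s /= peak;
        }
    }
}

/// Resample from `from_rate` Hz to 16 kHz with linear interpolation.
fn resample_linear(samples: &[f32], from_rate: u32) -> Vec<f32> {
    if samples.is_empty() || from_rate == TARGET_RATE {
        return samples.to_vec();
    }
    // Input samples advanced per output sample.
    let ratio = from_rate as f64 / TARGET_RATE as f64;
    let out_len = (samples.len() as f64 / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = samples[idx];
            // At the last sample there is no right neighbour, so hold the value.
            let b = *samples.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}
```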

2. Intent Detection (Groq + LLM)

Once text is generated, it is passed to a language model via Groq.

Why Groq?

  • Fast inference speed
  • Free tier available
  • Reliable for structured prompting

Approach

Instead of free-form output, I enforced structured JSON responses:

{
  "intent": "...",
  "params": {...},
  "reasoning": "..."
}

This ensured:

  • Predictability
  • Easier parsing
  • Better control over execution
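Structured output pays off most when the `intent` field is mapped onto a closed set of actions. The sketch below assumes three hypothetical labels, because the article does not list VoiceAgent's actual ones. It treats anything unrecognized as a plain text response, so an unexpected label can never trigger a file operation.

```rust
/// Hypothetical set of actions the executor supports.
#[derive(Debug, PartialEq)]
enum IntentKind {
    CreateFile,
    WriteCode,
    TextResponse,
}

/// Map the model's "intent" string onto a closed set of actions.
/// Unknown or misspelled labels fall back to a plain text reply.
fn parse_intent(label: &str) -> IntentKind {
    match label.trim().to_ascii_lowercase().as_str() {
        "create_file" => IntentKind::CreateFile,
        "write_code" => IntentKind::WriteCode,
        _ => IntentKind::TextResponse,
    }
}
```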

3. Validation Layer

Before executing any action, the system performs strict validation:

  • Filename sanitization
  • Allowed file extensions only
  • File size limits
  • Prevention of overwriting existing files

This layer ensures that the system remains safe and controlled.
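A minimal sketch of the first three checks might look like the following. The allowed extensions, the 100 KB limit, and the function names are assumptions for illustration, not values from the VoiceAgent code.

```rust
/// Assumed limits for illustration; the article does not state VoiceAgent's values.
const ALLOWED_EXTENSIONS: &[&str] = &["txt", "md", "py", "json"];
const MAX_BYTES: usize = 100 * 1024;

/// Drop directory parts and keep only ASCII letters, digits, '.', '-' and '_'.
fn sanitize_filename(raw: &str) -> Option<String> {
    // Last path segment only, so "../../notes.txt" becomes "notes.txt".
    let base = raw.rsplit(|c: char| c == '/' || c == '\\').next().unwrap_or("");
    let cleaned: String = base
        .chars()
        .filter(|&c| c.is_ascii_alphanumeric() || matches!(c, '.' | '-' | '_'))
        .collect();
    // Leading dots would create hidden files such as ".bashrc".
    let cleaned = cleaned.trim_start_matches('.');
    if cleaned.is_empty() {
        None
    } else {
        Some(cleaned.to_string())
    }
}

/// Returns the sanitized filename if the name, extension and size all pass.
fn validate_file(raw_name: &str, content: &str) -> Result<String, String> {
    let name = sanitize_filename(raw_name).ok_or("filename is empty after sanitization")?;
    let ext = name
        .rsplit_once('.')
        .map(|(_, e)| e.to_ascii_lowercase())
        .unwrap_or_default();
    if !ALLOWED_EXTENSIONS.contains(&ext.as_str()) {
        return Err(format!("extension '{ext}' is not allowed"));
    }
    if content.len() > MAX_BYTES {
        return Err(format!("content is {} bytes; the limit is {}", content.len(), MAX_BYTES));
    }
    Ok(name)
}
```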

4. Human-in-the-Loop

For file-related actions, execution is not automatic.

The system pauses and asks for user confirmation.

This prevents unintended or harmful actions and adds an extra safety layer.
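A confirmation gate can be very small. This hypothetical `confirm` function defaults to “no” and accepts only an explicit `y` or `yes`. Because it takes a generic reader and writer instead of stdin and stdout, it is easy to test.

```rust
use std::io::{self, BufRead, Write};

/// Ask for confirmation; anything other than "y"/"yes" (including empty input) means no.
fn confirm<R: BufRead, W: Write>(prompt: &str, input: &mut R, output: &mut W) -> io::Result<bool> {
    write!(output, "{prompt} [y/N]: ")?;
    output.flush()?;
    let mut answer = String::new();
    input.read_line(&mut answer)?;
    let answer = answer.trim().to_ascii_lowercase();
    Ok(answer == "y" || answer == "yes")
}
```

In a terminal program, it would be called as `confirm("Create output/notes.txt?", &mut io::stdin().lock(), &mut io::stdout())`.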

5. Execution Engine

Once approved, the system executes the action:

  • File creation
  • Code writing
  • Text responses

All operations are restricted to a local output/ directory.
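Restricting writes to one directory is mostly a matter of path hygiene. The hypothetical `sandboxed_path` function below rejects absolute paths and `..` components. It then joins the name onto the sandbox root (such as `output/`) and refuses to overwrite an existing file.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolve `name` inside the sandbox `root`, rejecting anything that could
/// escape it, and refuse to return a path that already exists.
fn sandboxed_path(root: &Path, name: &str) -> Result<PathBuf, String> {
    let relative = Path::new(name);
    for component in relative.components() {
        match component {
            Component::Normal(_) | Component::CurDir => {}
            // RootDir/Prefix mean an absolute path; ParentDir could climb out of root.
            _ => return Err(format!("'{name}' escapes the sandbox")),
        }
    }
    let target = root.join(relative);
    if target.exists() {
        return Err(format!("'{}' already exists; refusing to overwrite", target.display()));
    }
    Ok(target)
}
```

This sketch has two caveats. First, the `exists` check and the later write are separate steps, so opening the file with `std::fs::OpenOptions::new().write(true).create_new(true)` is the more robust way to refuse overwrites. Second, symlinks inside the sandbox can still point elsewhere unless they are checked as well.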

Challenges Faced

1. Audio Handling

Handling both microphone input and file uploads required a unified processing pipeline. Different formats and sampling rates had to be normalized.

2. Transcription Noise

Speech models can produce unexpected outputs when audio is unclear. This was addressed using normalization and controlled inference settings.

3. Safe Execution

Allowing an AI system to create files introduces risk. The solution was a combination of:

  • Validation
  • Restricted directories
  • User confirmation

4. Structured LLM Output

Ensuring consistent JSON output from the model required careful prompt design and fallback handling.
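One common fallback, sketched here as an assumption rather than the project's actual approach, is to locate the JSON object inside a reply that also contains prose or markdown fences before parsing it. The hypothetical `extract_json_object` only finds the candidate slice. A real JSON parser (such as the `serde_json` crate) should still validate it, and a failed parse can trigger a retry or a plain text response.

```rust
/// Hypothetical fallback: return the slice from the first '{' to the last '}',
/// or None if the reply contains no plausible JSON object.
fn extract_json_object(reply: &str) -> Option<&str> {
    let start = reply.find('{')?;
    let end = reply.rfind('}')?;
    // Both delimiters are ASCII, so these byte indices are valid char boundaries.
    if end > start {
        Some(&reply[start..=end])
    } else {
        None
    }
}
```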

Key Design Decisions

  • Use local Whisper to avoid API costs and enable offline capability
  • Use Groq for fast and efficient inference
  • Enforce structured JSON output for reliability
  • Add human confirmation for safety
  • Restrict execution to a sandboxed directory

Conclusion

VoiceAgent is not just about converting speech to text.

It is about building a system that:

  • Understands
  • Validates
  • Executes

— all while keeping the user in control.

This project highlights that in AI systems, safety and structure are just as important as intelligence.

Links

GitHub: https://github.com/Suraj308/VoiceAgent
Demo Video: https://youtu.be/gGnH3v7BVdQ

Servo Now on crates.io: What Rust Devs Need to Know

TL;DR: Servo, the experimental browser engine originally developed by Mozilla and now hosted by the Linux Foundation, is now available as a crate on crates.io. This means Rust developers can embed a real, modern web rendering engine directly into their applications by adding a single Cargo dependency (some native system libraries are still required). It’s a significant milestone for the Rust ecosystem and for anyone building apps that need HTML/CSS rendering without shipping a full browser.

Key Takeaways

  • Servo is now available on crates.io, making it much easier to add browser-engine capabilities to any Rust project
  • The crate enables embedding HTML, CSS, and JavaScript rendering directly into desktop and embedded applications
  • This is a major step toward Servo becoming a practical, production-ready alternative to WebView-based solutions
  • Early adopters should expect some API instability — this is still maturing software
  • The move signals growing confidence from the Servo project team and the broader Rust community in the engine’s stability

What Is Servo, and Why Does This Matter?

If you’ve been following the Rust ecosystem for any length of time, you’ve probably heard of Servo. Originally born inside Mozilla Research around 2012, Servo was an ambitious attempt to build a next-generation browser engine from scratch — one that could take full advantage of parallelism, memory safety, and modern systems programming techniques.

After Mozilla’s restructuring in 2020, the project was transferred to the Linux Foundation, where it has continued to evolve with renewed community energy. Fast-forward to today, and Servo is now available on crates.io — a milestone that fundamentally changes how Rust developers can interact with the project.

Why does this matter? Because before this, integrating Servo into your project meant cloning a massive repository, wrestling with complex build dependencies, and hoping nothing broke between commits. Now, you can add it as a dependency like any other crate. That’s a qualitative shift in accessibility.

The State of Browser Engines in Rust Applications

Before diving into the specifics of the Servo crate, it’s worth understanding the landscape that makes this announcement significant.

The Problem With Existing Solutions

Rust developers who need to render HTML and CSS in their applications have historically had a few options, none of them particularly elegant:

  • WebView wrappers (like Tauri): Use the operating system’s built-in browser engine (WebKit on macOS/iOS, WebView2 on Windows, WebKitGTK on Linux). This keeps binary sizes small but means inconsistent rendering behavior across platforms.
  • CEF (Chromium Embedded Framework): Powerful and consistent, but you’re shipping a significant portion of Chromium with your app. Expect binary sizes in the hundreds of megabytes.
  • Custom renderers: Some applications (game engines, terminal UIs) implement just enough HTML/CSS parsing for their needs. Fragile and expensive to maintain.
  • Building from Servo’s source directly: Technically possible, but the barrier to entry was high.

None of these options are universally great. WebView gives you inconsistency. CEF gives you bloat. Custom renderers give you maintenance nightmares.

Where Servo Fits

Servo aims to occupy a middle ground: a standards-focused web engine that you can embed in your application, with a Rust-native API, and without the overhead of bundling all of Chromium. Now that Servo is available on crates.io, that middle ground is within practical reach for working developers.

Getting Started: Adding Servo to Your Rust Project

Let’s get practical. Here’s what you need to know to actually use the Servo crate today.

Basic Installation

Adding Servo to your Cargo.toml is now as straightforward as any other dependency:

[dependencies]
servo = "0.0.1"  # Check crates.io for the latest version

You’ll want to check crates.io/crates/servo directly for the current version, as the project is iterating quickly.

System Prerequisites

Servo still has native system dependencies that Cargo can’t fully manage on its own. Before building, you’ll need:

  • GStreamer (for media playback support)
  • OpenGL or a compatible graphics backend
  • Platform-specific libraries depending on your target OS

The project’s documentation covers platform-specific setup in detail. On Linux, most dependencies are available through your package manager. On macOS and Windows, the setup is somewhat more involved, though the Servo team has been actively improving this story.

A Minimal Embedding Example

Here’s a simplified look at what embedding Servo can look like conceptually:

// Conceptual sketch only: names and signatures are simplified, and the API is subject to change, so always check the latest docs
use servo::Servo;
use servo::embedder_traits::EmbedderMsg;

fn main() {
    // Initialize Servo with your window/surface handle
    let mut servo = Servo::new(/* embedder config */);

    // Load a URL
    servo.load_url("https://example.com".parse().unwrap());

    // Run the event loop
    loop {
        servo.handle_events(vec![]);
        // Handle embedder messages, render frames, etc.
    }
}

This is deliberately simplified — the actual API involves event loops, surface management, and embedder trait implementations. The Servo embedding documentation and the servoshell example application (which ships with the project) are your best reference points for real implementation.

What the Servo Crate Actually Gives You

It’s worth being specific about capabilities, because “browser engine” can mean a lot of things.

What’s Included

Feature | Status
HTML5 parsing and rendering | ✅ Supported
CSS layout (Flexbox, Grid) | ✅ Actively developed
JavaScript (via SpiderMonkey) | ✅ Supported
WebGL | ✅ Supported
Media playback (video/audio) | ✅ Via GStreamer
WebAssembly | ✅ Supported
Accessibility tree | 🔄 In progress
Full CSS3 compliance | 🔄 Ongoing work
WebGPU | 🔄 Experimental

What to Be Realistic About

Servo is not Chromium. There will be websites and web apps that don’t render perfectly, particularly those relying on browser-specific behaviors or very recent web APIs. For embedding use cases — rendering documentation, displaying UI built with HTML/CSS, running controlled web content — Servo is increasingly capable. For rendering arbitrary web content from the open internet, you’ll encounter rough edges.

The project has been transparent about this. The Servo team actively publishes compatibility progress, and the trajectory is clearly positive.

Real-World Use Cases for the Servo Crate

So who should actually be excited about this? Let’s be concrete.

Desktop Application UIs

If you’re building a desktop application in Rust and want to use HTML/CSS for your UI layer, without Electron-style overhead or the platform inconsistency of WebView, Servo is now a genuinely viable option to evaluate. Think of it as an alternative to what Tauri does: you ship a larger binary than a system WebView requires, but you control the rendering engine and get the same rendering on every platform.

Document and Report Rendering

Applications that need to render HTML documents — whether that’s a PDF-generation pipeline, an email client, or a documentation browser — can now embed Servo to handle that rendering in a consistent, spec-compliant way.

Embedded and Kiosk Systems

Servo’s architecture was designed with parallelism and memory efficiency in mind. For kiosk displays, automotive infotainment systems, or other embedded Linux environments where you want web-based UI without the weight of a full browser, Servo is worth serious consideration.

Game Engine UI Overlays

Several game engines and simulation environments use HTML/CSS for their UI layers. With Servo available on crates.io, Rust-based game engines (like those built with Bevy) could potentially integrate web-based UI directly.

Developer Tools and IDEs

Rich developer tools that need to render documentation, changelogs, or UI components described in HTML could benefit from a native Rust rendering engine rather than spinning up a separate browser process.

Comparing Your Options: Servo vs. Alternatives

Criterion | Servo (crates.io) | Tauri/WebView | CEF | Custom Renderer
Binary size impact | Medium | Small | Very Large | Small
Rendering consistency | High | Low (OS-dependent) | High | Varies
Rust-native API | ✅ Yes | Partial | ❌ No | ✅ Yes
JavaScript support | ✅ Yes | ✅ Yes | ✅ Yes | ❌ Usually No
Maintenance burden | Low (crate) | Low | Medium | High
Production readiness | Maturing | Mature | Mature | Varies
License | MPL 2.0 | MIT/Apache | BSD | N/A

The honest takeaway: if you need production-grade stability today for rendering arbitrary web content, Tauri or CEF are safer bets. If you’re building something new, have some tolerance for API evolution, and want a Rust-native solution with a bright future, Servo on crates.io is now worth serious evaluation.

The Bigger Picture: What This Means for the Rust Ecosystem

The availability of Servo on crates.io isn’t just a convenience improvement — it’s a signal.

Ecosystem Maturity

For a project as complex as a browser engine to publish on crates.io, the build system, dependency management, and public API surface have to reach a certain level of stability. The Servo team making this move indicates confidence that the project is ready for broader adoption and experimentation.

Competing With Electron’s Dominance

One of the most persistent criticisms of the modern app development landscape is the proliferation of Electron-based applications — apps that ship an entire Chromium instance to render what is essentially a website. The combination of Rust’s performance characteristics and Servo’s embedding-focused architecture represents a genuine alternative path. It won’t replace Electron overnight, but the building blocks are getting real.

Attracting Contributors

Publishing on crates.io dramatically lowers the barrier to experimentation, which means more developers will try Servo, find bugs, write fixes, and contribute back. This is how open source projects accelerate.

Practical Advice for Early Adopters

If you’re planning to start experimenting with the Servo crate, here’s what I’d recommend based on the current state of the project:

  1. Start with servoshell: Before writing your own embedder, run the reference shell application. It’ll help you understand how the embedding API is meant to be used.

  2. Pin your version carefully: The API is evolving. Use a specific version in your Cargo.toml and update deliberately, reviewing the changelog each time.

  3. Join the community: The Servo project is active on GitHub and has a Zulip chat. If you’re building something with the crate, engaging with the community will save you significant debugging time.

  4. Don’t use it for untrusted content yet: If your use case involves rendering arbitrary user-supplied HTML from the internet, be cautious. Security hardening for embedding use cases is ongoing.

  5. Contribute your findings: If you hit a bug or limitation, file an issue. The team is responsive, and early-adopter feedback directly shapes the API.

Frequently Asked Questions

Q: Is Servo production-ready now that it’s on crates.io?

Not universally. For controlled use cases — rendering your own HTML/CSS content, building application UIs, displaying documentation — Servo is increasingly capable and the crates.io publication reflects meaningful stability. For rendering arbitrary web content from the open internet, you’ll encounter compatibility gaps. Evaluate it against your specific requirements.

Q: How does Servo’s performance compare to Chromium or WebKit?

Servo was architecturally designed to leverage parallelism in ways that older engines like Blink (Chromium) and WebKit weren’t. In specific benchmarks, particularly around CSS layout, Servo can be competitive or faster. In overall real-world browsing performance, the comparison is more nuanced. For embedding use cases, Servo’s performance profile is generally favorable.

Q: Can I use the Servo crate in a commercial application?

Yes. Servo is licensed under the Mozilla Public License 2.0 (MPL 2.0), which is a file-level copyleft license. You can use it in commercial applications; you’re required to make available any modifications you make to MPL-licensed files themselves, but your application code remains your own. Consult a lawyer for your specific situation.

Q: Does the Servo crate work on all platforms?

Servo supports Linux, macOS, and Windows. Android support is in progress. The degree of polish varies by platform — Linux tends to be best-supported given the development environment of most contributors. Check the project’s current platform support matrix before committing to a target.

Q: What’s the difference between Servo and the WebRender crate?

WebRender is Servo’s GPU-accelerated rendering backend, which Firefox also adopted as the renderer inside its Gecko engine. WebRender handles the final painting of pixels. Servo is the full browser engine stack: HTML parsing, CSS layout, JavaScript execution, and WebRender for the final render. If you just need GPU-accelerated 2D graphics, WebRender might be the more focused tool; if you need a full web rendering pipeline, Servo is what you want.

The Bottom Line

Servo is now available on crates.io, and that’s genuinely exciting news for the Rust ecosystem. It represents years of work reaching a new level of accessibility, and it opens up use cases that were previously impractical for most developers.

Is it ready to replace your production WebView setup today? Probably not for every use case. Is it worth experimenting with if you’re building a new Rust application that needs HTML rendering? Absolutely yes.

The best way to form your own opinion is to try it. Add the crate, run the examples, and see how it fits your use case. The Servo team has made that easier than ever.

Have you tried embedding Servo in a Rust project? Drop your experience in the comments — real-world usage reports help the whole community understand where the project stands today.