This Concurrency Bug Stayed Hidden for a Year

We had a background job that processed thousands of records in parallel.
Each batch ran concurrently, and we kept track of total successful and failed records.

Everything worked perfectly.

For almost a year.

Then one day, the totals started coming out… wrong.

No exceptions.
No crashes.
Just incorrect numbers.

The Setup

  • Records processed in chunks
  • Multiple chunks running concurrently
  • Shared counters tracking totals
  • Periodic database updates with progress

All standard parallel batch processing.

And yet — totals drifted.

The Symptom

  • Some runs showed fewer successful records than expected
  • Re-running the same data produced different counts
  • The issue appeared only in one environment

Classic signs of a concurrency issue.

But the tricky part?

We were already using thread-safe collections.

What Was Actually Happening

Imagine two workers updating the same counter:

Initial total = 10

Worker A reads total (10)
Worker B reads total (10)

Worker A increments → 11
Worker B increments → 11  (overwrites A)

Final total = 11  ❌ (should be 12)

No exception.
No crash.
Just a lost update.

This is a race condition.

The Buggy Code

A simplified version looked like this:

int totalSuccess = 0;

Parallel.ForEach(records, record =>
{
    if (Process(record))
    {
        totalSuccess++; // not atomic
    }
});

++ is not atomic. It performs:

  1. Read
  2. Increment
  3. Write

Multiple threads interleaving these steps leads to lost updates.

Why volatile Alone Doesn’t Fix It

A common attempt is to use volatile:

private static volatile int totalSuccess = 0;

This ensures visibility, but not atomicity.

Two threads can still:

  • read the same value
  • increment
  • overwrite each other

So volatile alone does not solve the race.

Why It Took a Year to Appear

Concurrency bugs are timing dependent.

The race condition existed from the beginning, but it didn’t surface consistently.
In fact, it only appeared in one environment.

Subtle runtime differences — thread scheduling, CPU contention, and execution timing — made overlapping updates more likely there, eventually exposing the issue.

No code changes were required.
Just different timing.

The Fix: Atomic Counters

We replaced non-atomic updates with atomic operations:

int totalSuccess = 0;

Parallel.ForEach(records, record =>
{
    if (Process(record))
    {
        Interlocked.Increment(ref totalSuccess);
    }
});

This guarantees increments are atomic.

The Real-World Fix: Snapshot-Based Progress Reporting

We also had periodic progress updates.
Multiple workers updated counters while one periodically persisted totals.

The correct pattern was:

var finished = Interlocked.Increment(ref completedChunks);

if (finished % maxConcurrency == 0)
{
    var successSnapshot = Volatile.Read(ref totalSuccess);
    var failureSnapshot = Volatile.Read(ref totalFailed);

    job.TotalSuccessfulRecords = successSnapshot;
    job.TotalFailedRecords = failureSnapshot;

    await UpdateJobProgress(job);
}

Why This Works

  • Interlocked → atomic updates
  • Volatile.Read → latest visible value
  • Snapshot → consistent progress reporting
  • Batched DB updates → reduced contention

This eliminates inconsistent totals.

Additional Improvement: Local Aggregation

To reduce contention further:

Parallel.ForEach(chunks, chunk =>
{
    int localSuccess = 0;
    int localFailure = 0;

    foreach (var record in chunk)
    {
        if (Process(record))
            localSuccess++;
        else
            localFailure++;
    }

    Interlocked.Add(ref totalSuccess, localSuccess);
    Interlocked.Add(ref totalFailed, localFailure);
});

This minimizes shared writes.

Lessons Learned

  • Thread-safe collections ≠ thread-safe logic
  • ++ is not atomic
  • volatile ensures visibility, not correctness
  • Use Interlocked for counters
  • Snapshot values using Volatile.Read
  • Reduce shared mutable state
  • Batch progress updates
  • Concurrency bugs are timing dependent

Takeaway

If you’re running parallel batch jobs and tracking totals:

  • Use atomic counters
  • Take snapshot reads for reporting
  • Avoid frequent shared writes

Otherwise, everything may look fine…

Until it doesn’t.

I Shrunk My Docker Image From 1.58GB to 186MB. Then I Had to Explain What I Actually Broke.

Most Docker tutorials end at the win.

“Look, smaller image! Ship it!” And then you’re left alone at 11pm wondering why your perfectly optimized container is crashing in production on something it handled fine before.

This article doesn’t do that. We’re going through both sides: how I got from 1.58GB to 186MB, every error I hit along the way, and the honest conversation about what Alpine actually takes away from you. Because the shrink is real. But so are the trade-offs.

First, What Even Is a Docker Image?

Your app works on your machine because your machine has Node installed, the right OS, the right dependencies. Someone else’s server has none of that. Docker fixes this by packaging your app together with everything it needs to run — the runtime, the OS slice, the dependencies — into a sealed portable unit called an image.

A Dockerfile is the recipe. docker build executes it and produces the image. That image can now run anywhere Docker is installed, identically.

The problem is most beginners write that recipe without thinking about what goes into the package. I learned this the hard way — and I want to save you the 11pm production surprise. So let’s do this properly: the win, the errors, and everything the win quietly broke.

The Fat Build

Here’s the Dockerfile I started with:

FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["node", "app.js"]

Clean. Readable. Standard tutorial stuff.

When you build this and check the image size, the number that comes back stops you cold. 1.58 gigabytes. For a Node.js app that runs a simple HTTP server.

Every layer bakes into that image permanently. RUN npm install alone contributes a heavy frozen layer of dependencies, and COPY . . adds more on top. All of it is locked inside the image forever.

The problem is not the app. The app is tiny. The problem is node:18. That base image is built on Debian Linux — a full operating system — and ships with compilers, build tools, package managers, debugging utilities, and about 400MB of things you will never use in production. When your npm install runs on top of that, all of it bakes into the final image together.

You are shipping the construction site instead of the finished building.

The .dockerignore vs .gitignore Mistake

Before we go further, this caught me early and it will catch you too.

.dockerignore and .gitignore are completely separate files.

  • .dockerignore tells Docker what not to copy into the build context.
  • .gitignore tells Git what not to track.

I had a .dockerignore but no .gitignore. When I pushed to GitHub, my entire node_modules folder went with it — hundreds of files committed to the repo. I had to go back and clean the git history.

Always create both. They often contain the same entries but they serve different tools entirely. Get this right before you build anything else.

Enter Multi-Stage Builds

The fix is separating your build environment from your runtime environment.

  • Build environment needs everything: the full OS, npm, build tools, all of it.
  • Runtime environment needs almost nothing: just Node and your app files.

Multi-stage builds let you use both in one Dockerfile, but only ship the second one.

# Stage 1: builder (does the work, never ships)
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .

# Stage 2: runtime (only this becomes your image)
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/app.js ./app.js
COPY --from=builder /app/package.json ./package.json
CMD ["node", "app.js"]

The COPY --from=builder line is the bridge. It reaches back into Stage 1 and pulls only what you specify. Everything else in Stage 1 — the full Debian OS, the compiler tools, the cache — gets discarded and never touches the final image.

Simple idea. But getting there cost me three separate errors.

Error 1: The Empty Dockerfile

ERROR: failed to build: failed to solve: the Dockerfile cannot be empty

I ran docker build before writing anything in the file. The file existed but was empty. Not a deep error — but worth including because it’s the kind of thing that makes you feel stupid for ten seconds before you realise it’s just a file issue.

Fix: write something in the file before you build it.

Error 2: The NUL Character Ambush

After the fat build succeeded I set up my .dockerignore using PowerShell’s echo command:

echo "node_modules" > .dockerignore
echo ".git" >> .dockerignore
echo "*.log" >> .dockerignore
echo ".env" >> .dockerignore

Built again. Got this:

<input>:1:1: invalid character NUL
<input>:1:3: invalid character NUL
<input>:1:5: invalid character NUL

Sixteen lines of it.

PowerShell’s echo writes files in UTF-16 LE with a BOM by default. Docker’s parser expects UTF-8. The invisible encoding header and the null bytes between every character made the entire file unreadable to Docker.

The build still finished because Docker warned and continued — but my .dockerignore was being completely ignored. node_modules was getting copied into the build context on every single build, silently, without telling me.

The fix — always do this on Windows:

"node_modules`n.git`n*.log`n.env" | Out-File -FilePath .dockerignore -Encoding utf8

Or create the file in VS Code and confirm it saves as UTF-8. Never trust PowerShell echo for config files that other tools will read.

Error 3: The builder Name Collision (The Sneaky One)

This is the one that will catch most beginners.

I wrote my multi-stage Dockerfile but forgot AS builder on my first FROM statement:

FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/app.js ./app.js
COPY --from=builder /app/package.json ./package.json
CMD ["node", "app.js"]

Built it. Got this:

ERROR: failed to build: failed to solve: builder: failed to resolve 
source metadata for docker.io/library/builder:latest: pull access 
denied, repository does not exist

Docker looked at --from=builder and thought I was referencing an external Docker Hub image called builder. It went to Docker Hub looking for library/builder:latest. That image does not exist.

--from=builder only works when builder is an alias defined with AS builder in an earlier FROM statement. Without it, Docker has nothing to reference locally and defaults to treating builder as an external image name.

The fix:

# AS builder here is not optional
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .

# Stage 2: no alias needed, this is the final image
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/app.js ./app.js
COPY --from=builder /app/package.json ./package.json
CMD ["node", "app.js"]

AS builder on the first FROM gives Stage 1 a name. --from=builder references that name. Without it, Docker goes looking on the internet for something that doesn’t exist.

The Result

  • myapp:fat – 1.58GB disk usage, 397MB content size
  • myapp:slim – 186MB disk usage, 45.6MB content size

88% reduction. Same app.

The slim image history only contains COPY node_modules, COPY app.js, COPY package.json. That’s it. The entire Debian OS, the build tools, the npm cache — none of it made it through. COPY --from=builder is surgical. You get exactly what you name and nothing else.

Now The Part Most Articles Skip

The slim image runs fine for a basic Node app. But “the app is the same” is only true if your app doesn’t touch anything Alpine removed.

Both images produce the same output. Same server on port 3000. Good so far.

Now run this:

docker run --rm myapp:slim bash

Bash does not exist in Alpine. Alpine only ships sh. Any script in your app or CI pipeline that calls bash will crash. And the error message isn’t clean — it throws a full Node.js module-not-found stack trace, because the node image’s entrypoint falls back to node, which then tries to load bash as a script. That’s a deeply confusing error if you don’t know what you’re looking at.

Here’s what else is missing:

glibc: Alpine uses musl libc instead. This is the silent killer. Native npm packages like bcrypt, sharp, canvas, and sqlite3 are compiled against glibc. When you run them on Alpine they break — with no warning during build. The error surfaces at runtime in production when a user tries to do something.

npm: You didn’t copy it into Stage 2. You cannot run npm install inside a running slim container.

curl, wget, ps: Your standard debugging tools. When something goes wrong in a running Alpine container you have almost nothing to work with.

apt-get: Alpine uses apk instead, which has a much smaller package registry.

So When Is Alpine Actually Safe?

Alpine is safe when:

  • Your app is pure JavaScript with no native compiled dependencies
  • You have no bash scripts in your startup or CI process
  • You don’t need to exec into running containers to debug
  • Your node_modules are all JavaScript packages — run npm install and check for node-gyp in the output. That flags a native package.

Alpine is risky when:

  • You use bcrypt for password hashing
  • You use sharp for image processing
  • You use canvas, sqlite3, puppeteer, or anything that compiles C++ bindings
  • Your Dockerfile or startup scripts reference bash anywhere

If you need native packages but still want a smaller image, use node:18-slim instead of node:18-alpine. It’s Debian-based so it keeps glibc, but strips out the heavy development tools. You’ll land around 300–400MB — not as dramatic as Alpine, but safe for production.

The Decision Framework Before You Slim Any Image

1. Do any of my npm packages use node-gyp?

npm install

Check the output for gyp. If it appears, do not use Alpine.
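
If you’d rather check programmatically, here is a rough Node script (not from the original article) that flags packages shipping a binding.gyp file, the usual marker of native C++ bindings:

// detect-native-deps.js - rough check for native (node-gyp) packages
const fs = require("fs");
const path = require("path");

function findNativePackages(dir = "node_modules") {
  const native = [];
  for (const name of fs.readdirSync(dir)) {
    const pkgDir = path.join(dir, name);
    if (!fs.statSync(pkgDir).isDirectory()) continue;
    if (name.startsWith("@")) {
      // Scoped packages (@scope/pkg) live one level deeper
      native.push(...findNativePackages(pkgDir));
    } else if (fs.existsSync(path.join(pkgDir, "binding.gyp"))) {
      native.push(name);
    }
  }
  return native;
}

console.log(findNativePackages()); // e.g. [ 'bcrypt', 'sharp' ] means: avoid Alpine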

2. Do any of my scripts call bash?

grep -r "#!/bin/bash" .

If yes, switch to sh or do not use Alpine.

3. Do I need to exec into running containers for debugging?

If yes, use node:18-slim instead.

4. Is CI pipeline speed a priority?

Smaller images pull faster in every environment. If you’re running 50 builds a day the difference between 1.58GB and 186MB compounds significantly.

The Full Working Dockerfile

# Stage 1: build environment (discarded after build)
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .

# Stage 2: runtime environment (this is what ships)
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/app.js ./app.js
COPY --from=builder /app/package.json ./package.json
CMD ["node", "app.js"]

Build and verify:

# Build the slim image
docker build -f slim/Dockerfile -t myapp:slim .

# Compare sizes
docker images myapp

# Confirm the app runs
docker run --rm myapp:slim node app.js

# Confirm what is missing
docker run --rm myapp:slim bash

Full repo with both Dockerfiles, the app, and all screenshots:
github.com/Arbythecoder/docker-optimization

What I Actually Learned

Going from 1.58GB to 186MB felt like a win. It is a win — for the right app.

But the real skill isn’t knowing how to shrink an image. It’s knowing whether to shrink it, what you’re trading away, and how to verify nothing broke before it reaches production.

Most tutorials give you the happy path. Production gives you everything else.

This article is part of my Docker for Production ebook series. Ebook 4 covers the complete pre-deployment checklist for containerized Node.js apps — including the full audit framework before you slim any production image. Follow me on DEV.to, LinkedIn and X to get notified when it drops.

Why your AI agent gets dumber over time (and how to fix memory drift)

Last week, a coding agent in a test repo did something weird: it opened the right files, referenced the wrong API version, and confidently wrote code for a migration we had already rolled back.

Nothing was “broken” in the usual sense. The prompts were fine. The tools were available. The model was good.

The problem was memory drift.

If you’ve built anything with long-running agents, you’ve probably seen it too: the agent starts strong, then gradually retrieves stale facts, outdated decisions, or half-relevant chunks from old work. Over time, its “memory” turns into a confidence amplifier for bad context.

A lot of teams try to solve this with a bigger vector store. That helps… until it doesn’t.

The real issue: vector stores decay quietly

Vector stores are great for fuzzy retrieval. If your agent needs “something similar to this design doc” or “the auth code near this endpoint,” embeddings are useful.

But agent memory is not just similarity search.

It’s often:

  • what changed
  • what supersedes what
  • who approved a decision
  • which fact is still valid
  • what depends on what
  • what should never be forgotten

That’s where vector-only memory starts to decay.

A simple example

Suppose your agent stores these facts over time:

  • JWT auth is used for internal APIs
  • Moved to mTLS for service-to-service auth
  • JWT still used for browser sessions
  • Deprecated auth middleware in v3
  • Hotfix restored old middleware for admin routes

A vector store can retrieve “similar auth-related stuff,” but it won’t naturally answer:

  • which statement is the latest truth?
  • which fact overrides another?
  • which context applies only to admin routes?
  • which decision was temporary?

That’s not an embedding problem. That’s a relationship problem.

Knowledge graphs don’t replace vectors — they constrain them

The best pattern I’ve seen is:

  • vector store for recall
  • knowledge graph for truth maintenance

Think of it like this:

User query
   |
   v
[Vector Search] ---> finds possibly relevant notes/docs/chunks
   |
   v
[Knowledge Graph] ---> resolves relationships:
                      - supersedes
                      - depends_on
                      - approved_by
                      - valid_for
                      - expires_at
   |
   v
[LLM Context] ---> smaller, fresher, less contradictory

A knowledge graph gives your system structure around memory:

  • entities: services, APIs, users, incidents, tasks
  • edges: supersedes, blocked_by, owned_by, approved_by
  • timestamps: when a fact became true
  • scope: where that fact applies
  • confidence: whether it’s canonical or provisional

Instead of asking “what text looks similar?”, you can ask:

  • “What is the current auth method for internal APIs?”
  • “What decision replaced this one?”
  • “Which open task depends on this migration?”
  • “What facts are stale after last deploy?”

That’s how you stop memory from becoming a junk drawer.

A practical rule of thumb

Use a vector store when you need:

  • semantic search
  • fuzzy recall
  • document retrieval
  • broad context gathering

Use a knowledge graph when you need:

  • state over time
  • versioned truth
  • explicit dependencies
  • conflict resolution
  • auditable memory

If you only use vectors, your agent will eventually retrieve both the old answer and the new answer and act like they’re equally valid.

A tiny runnable example

Here’s a minimal Node example using a graph to resolve the “latest truth” for a fact.

npm install graphology

const Graph = require("graphology");

const graph = new Graph();

graph.addNode("auth_v1", { value: "JWT for internal APIs", ts: 1 });
graph.addNode("auth_v2", { value: "mTLS for internal APIs", ts: 2 });

graph.addDirectedEdge("auth_v2", "auth_v1", { type: "supersedes" });

function currentFact(nodes) {
  return nodes
    .filter((n) => graph.inDegree(n) === 0)
    .map((n) => graph.getNodeAttribute(n, "value"));
}

console.log(currentFact(["auth_v1", "auth_v2"]));
// => [ 'mTLS for internal APIs' ]

Obviously, real systems need more than this. But the core idea matters: memory should encode replacement, not just storage.

What this looks like in production

A useful pattern is:

  1. Store raw docs, chats, and artifacts in a vector index
  2. Extract durable facts into a graph
  3. Mark facts with:
    • source
    • timestamp
    • scope
    • confidence
    • supersession links
  4. Retrieve from both systems
  5. Let the graph filter or rank what the LLM actually sees (sketched below)
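
As a rough sketch of step 5, building on the graphology example above: take whatever the vector store returned and drop anything a newer fact supersedes before it reaches the LLM. The candidates array here is a hard-coded stand-in for real vector-search results.

const Graph = require("graphology");

const graph = new Graph();
graph.addNode("auth_v1", { value: "JWT for internal APIs" });
graph.addNode("auth_v2", { value: "mTLS for internal APIs" });
graph.addDirectedEdge("auth_v2", "auth_v1", { type: "supersedes" });

// Pretend the vector store returned both facts as "relevant" to the query
const candidates = ["auth_v1", "auth_v2"];

// Keep only facts with no incoming "supersedes" edge
const fresh = candidates.filter((id) =>
  graph.inEdges(id).every((e) => graph.getEdgeAttribute(e, "type") !== "supersedes")
);

console.log(fresh.map((id) => graph.getNodeAttribute(id, "value")));
// => [ 'mTLS for internal APIs' ]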

If you already have a policy engine like OPA in your stack, this is also a good place to enforce rules like:

  • only approved memories can be treated as canonical
  • expired decisions should not be retrieved
  • temporary incident workarounds should not leak into normal planning

That’s usually a better answer than trying to prompt-engineer your way out of stale context.

The trap nobody mentions

The biggest mistake isn’t “using vectors.”

It’s treating all memory as text.

Some memory is text.
Some memory is state.
Some memory is policy.
Some memory is provenance.

If you flatten all of that into embeddings, your agent can retrieve context — but it can’t reliably reason about whether that context is still true.

That’s where drift starts.

Try it yourself

If you’re building agents and want to pressure-test the surrounding security and tooling:

  • Want to check your MCP server? Try https://tools.authora.dev
  • Run npx @authora/agent-audit to scan your codebase
  • Add a verified badge to your agent: https://passport.authora.dev
  • Check out https://github.com/authora-dev/awesome-agent-security for more resources

My take

Vector stores are still the right tool for retrieval.

But if you want long-lived agents that don’t slowly poison themselves with stale context, you need something that models truth over time.

Usually that means adding a knowledge graph, or at least graph-like relationships, on top of your retrieval layer.

How are you handling agent memory today: pure RAG, graph-backed memory, or something else? Drop your approach below.

— Authora team

This post was created with AI assistance.

Web scraping for AI agents: How to give your agents web access

AI agents are only as useful as the information they can act on. A reasoning model with a January knowledge cutoff can’t tell you today’s pricing, yesterday’s news, or what your competitor just changed on their homepage. Giving your agent a way to reach out and pull fresh data from the web is how you fix that.

Web scraping is how you do that. This guide walks through how it works, what breaks, and how to wire it cleanly into an AI agent workflow.

Why agents need live web access

Most LLMs are trained once and frozen. They know a lot, but that knowledge has an expiry date. This creates a fundamental problem for agents doing anything time-sensitive:

  • A research agent summarizing a competitor’s product page will surface stale pricing.
  • A lead generation agent building contact lists from directories misses companies founded last month.
  • A news monitoring agent trained on data from six months ago isn’t monitoring anything.
  • A price tracking agent with no live feed is just guessing.

Equipping your agent with a tool call that fetches current HTML, parses it intelligently, and returns structured data is how you solve this.

What scraping looks like in an agent loop

In practice, scraping fits into an agent’s tool-use loop the same way a database query or API call does. The agent decides it needs information from a URL, calls the scraping tool, gets back structured data, and continues reasoning.

Agent needs: "What's the current price of product X?"
  → calls scrapeUrl(url, prompt)
  → gets back: { "name": "Product X", "price": 49.99, "currency": "USD" }
  → continues: "The price is $49.99, which is $5 lower than last week..."


The key design question is: what does scrapeUrl actually do under the hood?

Different scraping approaches

There are a few ways to implement web access for an agent. They sit on a spectrum of complexity vs. reliability.

Raw HTTP + HTML parsing

The simplest approach: fetch the URL with fetch, parse the HTML with a library like Cheerio, extract what you need with selectors.

import * as cheerio from "cheerio";

async function scrape(url) {
  const res = await fetch(url, { headers: { "User-Agent": "Mozilla/5.0" } });
  const html = await res.text();
  const $ = cheerio.load(html);
  return $("body").text();
}

The problem: Most modern websites don’t return meaningful HTML on the first HTTP request. They’re JavaScript-rendered. The above returns a shell. The content loads after JS executes. You’ll also get blocked quickly with no proxy rotation.

Headless browsers

Tools like Playwright and Puppeteer launch a real browser, wait for JS to execute, then let you extract content. More reliable for modern sites.

import { chromium } from "playwright";

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto(url);
await page.waitForLoadState("networkidle");
const content = await page.content();
await browser.close();

The problem: This is expensive to run at scale. Infrastructure, browser pools, proxy management, and CAPTCHA handling all become your problem. And sophisticated anti-bot systems will still block you based on browser fingerprinting.

Scraping APIs

The third option: delegate all of that to a purpose-built API. You send a URL and a description of what you want. The API handles browser automation, proxy rotation, CAPTCHA solving, and returns clean structured data.

For agents, this is almost always the right call. You get a simple async interface, reliable results, and you’re not managing headless browser infrastructure.

The real challenges (and why they matter for agents)

Before picking an approach, understand what actually breaks in production:

  • Anti-bot detection: IP rate limiting, CAPTCHA challenges, browser fingerprinting. If your agent scrapes the same site repeatedly, naive implementations get blocked fast.

  • JavaScript-rendered content: Most product pages, social feeds, and dashboards render content after the initial HTML loads. Raw HTTP fetches get empty shells.

  • Unstructured output: Raw HTML or even extracted text isn’t what your agent wants. Agents reason better over {"price": 49.99} than over a wall of text that contains the price somewhere.

  • Async workflows: Scraping takes time (seconds, not milliseconds). Your agent can’t block waiting for a result. You need job submission, polling, and async result handling baked in.

  • Scale: If your agent processes 100 leads at a time, you need batch processing. Running 100 sequential scrape calls is slow and fragile.

What agent-ready scraping looks like

Here’s what the ideal scraping tool looks like from an agent’s perspective:

  1. Natural language prompts: The agent describes what it wants, not how to get it. "Extract the job title, company, and salary range" rather than a CSS selector.
  2. Structured JSON output: Returns a typed object matching a schema the agent defines. No parsing, no regex, no string manipulation.
  3. Async with polling: Submit a job, get a job ID, poll for results. Non-blocking.
  4. Proxy and anti-bot handling built in: The agent doesn’t care about IP rotation. That’s infrastructure.
  5. Batch support: Submit 50 URLs at once, get 50 results back.

Let’s build this.

Practical Implementation

The following examples use Spidra, an API built specifically for this pattern: browser automation, proxy rotation, CAPTCHA solving, and AI-powered extraction in one endpoint. The concepts translate to any scraping API with similar capabilities.

Setup

Get an API key from app.spidra.io → Settings → API Keys.


Base URL: https://api.spidra.io/api
Auth: x-api-key header on every request.

Example 1: Simple scrape tool for an agent

The pattern is always the same: submit a job, get a jobId, poll until complete.

const API_KEY = "your-api-key";
const BASE_URL = "https://api.spidra.io/api";
const HEADERS = { "x-api-key": API_KEY, "Content-Type": "application/json" };

async function scrape(url, prompt, schema, options = {}) {
  const payload = {
    urls: [{ url }],
    prompt,
    output: "json",
    useProxy: true,
    ...(schema && { schema }),
    ...options,
  };

  const res = await fetch(`${BASE_URL}/scrape`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify(payload),
  });
  const { jobId } = await res.json();

  while (true) {
    const status = await fetch(`${BASE_URL}/scrape/${jobId}`, {
      headers: HEADERS,
    }).then((r) => r.json());

    if (status.status === "completed") return status.result.content;
    if (status.status === "failed") throw new Error(status.error);

    await new Promise((r) => setTimeout(r, 3000));
  }
}

Now your agent has a clean tool call:

const result = await scrape(
  "https://news.ycombinator.com",
  "List the top 5 stories with title, points, and comment count",
  {
    type: "object",
    required: ["stories"],
    properties: {
      stories: {
        type: "array",
        items: {
          type: "object",
          required: ["title", "points", "comments"],
          properties: {
            title: { type: "string" },
            points: { type: "number" },
            comments: { type: "number" },
            url: { type: ["string", "null"] },
          },
        },
      },
    },
  }
);

// result.stories → [{ title, points, comments, url }, ...]

The agent gets back a typed list it can iterate, filter, and reason over. No parsing.

Example 2: Structured output with JSON schema

The schema field is the most important feature for agent use. Instead of getting unpredictable text, you define the exact shape of the response and the API enforces it.

Here’s a job listing extractor:

const result = await scrape(
  "https://jobs.example.com/senior-engineer",
  "Extract all details about this job listing.",
  {
    type: "object",
    required: ["title", "company", "remote"],
    properties: {
      title: { type: "string" },
      company: { type: "string" },
      location: { type: ["string", "null"] },
      remote: { type: ["boolean", "null"] },
      salary_min: { type: ["number", "null"] },
      salary_max: { type: ["number", "null"] },
      employment_type: {
        type: ["string", "null"],
        enum: ["full_time", "part_time", "contract", null],
      },
      skills: {
        type: "array",
        items: { type: "string" },
      },
    },
  }
);

// Guaranteed shape: fields in `required` always present, nullable where marked
// {
//   title: "Senior Engineer",
//   company: "Acme Corp",
//   location: "Austin, TX",
//   remote: true,
//   salary_min: 140000,
//   salary_max: 180000,
//   employment_type: "full_time",
//   skills: ["TypeScript", "React", "AWS"]
// }

Three rules worth knowing:

  • Fields in required always appear, as null if the data isn’t found.
  • Optional fields are omitted entirely if unavailable.
  • Mark anything that might be missing as ["type", "null"] to avoid surprises.

Example 3: Crawling an entire site

Sometimes your agent doesn’t know which pages to scrape. It needs to discover them. The crawl endpoint handles this: give it a base URL, tell it which pages to find, and what to extract from each.

async function crawlSite(baseUrl, crawlInstruction, extractInstruction, maxPages = 20) {
  const res = await fetch(`${BASE_URL}/crawl`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      baseUrl,
      crawlInstruction,
      transformInstruction: extractInstruction,
      maxPages,
      useProxy: true,
    }),
  });
  const { jobId } = await res.json();

  while (true) {
    const data = await fetch(`${BASE_URL}/crawl/${jobId}`, {
      headers: HEADERS,
    }).then((r) => r.json());

    if (data.status === "completed") return data.result;
    if (data.status === "failed") throw new Error("Crawl failed");

    console.log(data.progress?.message ?? "crawling...");
    await new Promise((r) => setTimeout(r, 5000));
  }
}

// Example: crawl a competitor's blog for content strategy research
const posts = await crawlSite(
  "https://competitor.com/blog",
  "Find all blog post pages published in the last 6 months",
  "Extract the title, author, publish date, and a one-sentence summary",
  30
);

// posts → [{ url, title, data: { title, author, publish_date, summary } }, ...]

Example 4: Geo-targeted scraping

Some sites show different content based on the visitor’s country: prices in local currency, region-specific inventory, geo-restricted offers. Use proxyCountry to scrape from a specific location.

// Scrape a German Amazon page with a German IP
const result = await scrape(
  "https://www.amazon.de/gp/bestsellers/electronics",
  "List the top 10 bestselling electronics with name and price in EUR",
  {
    type: "object",
    required: ["products"],
    properties: {
      products: {
        type: "array",
        items: {
          type: "object",
          properties: {
            name: { type: "string" },
            price_eur: { type: ["number", "null"] },
            rank: { type: "number" },
          },
        },
      },
    },
  },
  { proxyCountry: "de" }
);

// Spidra supports 50+ country codes: us, gb, de, fr, jp, au, ca, br, in, ...
// Use "eu" for rotating EU proxies, "global" for worldwide rotation

Example 5: Authenticated scraping

For pages behind a login: dashboards, account pages, paywalled content. Pass session cookies directly.

// Export cookies from your browser DevTools (Application → Cookies)
// or grab them with document.cookie from the console

const result = await scrape(
  "https://app.example.com/dashboard/reports",
  "Extract monthly revenue, active users, and conversion rate for the last 3 months",
  {
    type: "object",
    required: ["months"],
    properties: {
      months: {
        type: "array",
        items: {
          type: "object",
          properties: {
            month: { type: "string" },
            revenue: { type: "number" },
            active_users: { type: "number" },
            conversion_rate: { type: "number" },
          },
        },
      },
    },
  },
  { cookies: "session=abc123; auth_token=xyz789; csrf=def456" }
);

Wiring it into an agent (full example)

Here’s a minimal but complete research agent using the Vercel AI SDK with scrapeUrl as a tool. The SDK handles the agentic loop: the model decides when to call the tool, the tool fetches live data, and the model reasons over the result.

import { generateText, tool } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const result = await generateText({
  model: anthropic("claude-opus-4-6"),
  maxSteps: 5,
  tools: {
    scrapeUrl: tool({
      description:
        "Fetch and extract structured data from a URL. Use this when you need current information from a website.",
      parameters: z.object({
        url: z.string().describe("The URL to scrape"),
        prompt: z
          .string()
          .describe("What to extract from the page, in plain English"),
      }),
      execute: async ({ url, prompt }) => {
        const data = await scrape(url, prompt);
        return JSON.stringify(data);
      },
    }),
  },
  prompt:
    "What are the top 3 trending repositories on GitHub today, and what do they do?",
});

console.log(result.text);

maxSteps lets the model make multiple tool calls in sequence if it needs to follow links, cross-reference sources, or refine its query. The scraping layer handles everything else. The model just decides what to fetch and what to ask for.

Practical agent use cases

To make this concrete, here are a few agent patterns that become viable with web access:

  • Competitive intelligence agent: Crawls competitor sites weekly, diffs pricing and feature changes, surfaces meaningful deltas to a Slack channel.

  • Lead enrichment agent: Given a list of company names, scrapes their websites, LinkedIn pages, and job boards to build structured profiles: company size, tech stack, recent hires, open roles.

  • Research agent: Given a topic, searches the web, scrapes the top results, synthesizes findings into a structured report with citations.

  • Price monitoring agent: Tracks SKUs across multiple retailers, alerts when prices drop below a threshold or when a product goes out of stock.

  • News digest agent: Crawls a configured list of sources each morning, extracts headlines and summaries, sends a curated briefing tailored to the user’s interests.

Each of these follows the same fundamental pattern: the agent knows what it wants, the scraping layer fetches and structures the data, and the agent reasons over clean output rather than raw HTML.

Wrapping up

Web access expands the category of problems an AI agent can tackle. A scraping tool lets it monitor competitor pages, research live topics, track prices, and respond to things happening right now. Without it, your agent is limited to reasoning over whatever it already knows.

The implementation is straightforward: a submit-and-poll pattern, a JSON schema for the output shape, and a proxy-enabled API to handle the infrastructure. The agent doesn’t need to know how any of that works. It just needs a reliable tool call that returns structured data. That’s the interface worth building toward.

Thanks for reading!

Build a Social Media Event Bus: React to Posts, Comments, and Follows in Real-Time

Social media platforms don’t give you webhooks. Instagram won’t ping your server when someone comments. TikTok won’t notify you when a creator posts.

So you build your own.

I built an event bus that polls social media APIs and converts changes into events. New post? Event. New comment? Event. Follower count changed by more than 5%? Event. Then any downstream system can subscribe — Discord bots, email senders, dashboards, CRMs.

It turned 10 separate “check social media” scripts into one system.

Architecture

Poller (cron jobs)
  │
  ├── Check profiles every 30 minutes
  ├── Check posts every 15 minutes
  ├── Check comments every hour
  │
  ↓ Detect changes (diff against last known state)
  │
Event Bus (in-process EventEmitter or Redis Pub/Sub)
  │
  ├── → Discord notifier
  ├── → Email sender
  ├── → Database logger
  ├── → Slack alerter
  └── → Webhook forwarder (POST to any URL)

The pollers detect changes. The event bus routes them. The handlers do whatever you want. Completely decoupled.

The Stack

  • Node.js – runtime
  • SociaVault API – data source
  • EventEmitter (built-in) – event bus for single-process; Redis Pub/Sub for multi-process
  • better-sqlite3 – state tracking
  • node-cron – polling schedule

Setup

mkdir social-event-bus && cd social-event-bus
npm init -y
npm install axios better-sqlite3 node-cron dotenv

Step 1: The State Store

To detect changes, you need to know what things looked like last time you checked.

// state.js
const Database = require('better-sqlite3');
const db = new Database('./state.db');

db.exec(`
  CREATE TABLE IF NOT EXISTS known_state (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
  );
`);

const getState = db.prepare('SELECT value FROM known_state WHERE key = ?');
const setState = db.prepare(`
  INSERT INTO known_state (key, value, updated_at) VALUES (?, ?, CURRENT_TIMESTAMP)
  ON CONFLICT(key) DO UPDATE SET value = excluded.value, updated_at = CURRENT_TIMESTAMP
`);

module.exports = {
  get: (key) => {
    const row = getState.get(key);
    return row ? JSON.parse(row.value) : null;
  },
  set: (key, value) => {
    setState.run(key, JSON.stringify(value));
  },
};

Step 2: The Event Bus

// bus.js
const { EventEmitter } = require('events');

class SocialEventBus extends EventEmitter {
  emit(eventType, payload) {
    const event = {
      type: eventType,
      timestamp: new Date().toISOString(),
      ...payload,
    };

    // Log every event
    console.log(`[EVENT] ${eventType} ${payload.platform || 'unknown'}/@${payload.username || 'unknown'}`);

    // Emit both the specific event and a wildcard
    super.emit(eventType, event);
    super.emit('*', event);

    return true;
  }
}

// Singleton
const bus = new SocialEventBus();
module.exports = bus;

Event types we’ll generate:

  • new_post – Creator published a new post/video
  • post_milestone – A post crossed a view/like threshold
  • follower_change – Follower count changed significantly (±5%)
  • new_comment – New comment on a tracked post
  • engagement_spike – Post engagement rate is 3x+ above creator’s average
  • profile_updated – Bio, name, or profile pic changed
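
For reference, this is the shape of the payload a handler receives for one of these events. The values are made up; the fields come straight from the bus and poller code below.

{
  "type": "follower_change",
  "timestamp": "2025-06-01T12:00:00.000Z",
  "platform": "instagram",
  "username": "competitor_1",
  "previous": 120000,
  "current": 126500,
  "delta": 6500,
  "percentChange": 5.4
}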

Step 3: The Pollers

Each poller fetches current data, diffs against stored state, and emits events for any changes.

// pollers/profile-poller.js
const axios = require('axios');
const state = require('../state');
const bus = require('../bus');

const api = axios.create({
  baseURL: 'https://api.sociavault.com/v1/scrape',
  headers: { 'x-api-key': process.env.SOCIAVAULT_API_KEY },
});

async function pollProfile(platform, username) {
  const endpoint = platform === 'instagram'
    ? `/instagram/profile?username=${username}`
    : `/tiktok/profile?username=${username}`;

  try {
    const { data: res } = await api.get(endpoint);
    const profile = res.data || res;

    const key = `profile:${platform}:${username}`;
    const previous = state.get(key);

    const current = {
      followers: profile.followersCount || profile.followerCount || 0,
      following: profile.followingCount || 0,
      posts: profile.postsCount || profile.videoCount || 0,
      bio: profile.bio || profile.signature || '',
      displayName: profile.fullName || profile.nickname || '',
    };

    if (previous) {
      // Check for follower changes (±5% or ±1000)
      const followerDelta = current.followers - previous.followers;
      const followerPercent = previous.followers > 0
        ? Math.abs(followerDelta / previous.followers) * 100
        : 0;

      if (followerPercent >= 5 || Math.abs(followerDelta) >= 1000) {
        bus.emit('follower_change', {
          platform,
          username,
          previous: previous.followers,
          current: current.followers,
          delta: followerDelta,
          percentChange: parseFloat(followerPercent.toFixed(1)),
        });
      }

      // Check for new posts
      if (current.posts > previous.posts) {
        bus.emit('new_post', {
          platform,
          username,
          previousCount: previous.posts,
          currentCount: current.posts,
          newPosts: current.posts - previous.posts,
        });
      }

      // Check for bio changes
      if (current.bio !== previous.bio) {
        bus.emit('profile_updated', {
          platform,
          username,
          field: 'bio',
          old: previous.bio,
          new: current.bio,
        });
      }
    }

    state.set(key, current);
  } catch (err) {
    console.error(`Poll failed for ${platform}/@${username}: ${err.message}`);
  }
}

module.exports = { pollProfile };

// pollers/post-poller.js
const axios = require('axios');
const state = require('../state');
const bus = require('../bus');

const api = axios.create({
  baseURL: 'https://api.sociavault.com/v1/scrape',
  headers: { 'x-api-key': process.env.SOCIAVAULT_API_KEY },
});

async function pollPosts(platform, username) {
  const endpoint = platform === 'instagram'
    ? `/instagram/posts?username=${username}&limit=5`
    : `/tiktok/profile-videos?username=${username}&limit=5`;

  try {
    const { data: res } = await api.get(endpoint);
    const posts = res.data || res.posts || [];

    for (const post of posts) {
      const postId = post.id || post.shortcode || post.videoId;
      if (!postId) continue;

      const key = `post:${platform}:${postId}`;
      const previous = state.get(key);

      const current = {
        likes: post.likesCount || post.diggCount || 0,
        comments: post.commentsCount || post.commentCount || 0,
        views: post.viewCount || post.playCount || null,
        shares: post.shareCount || null,
      };

      if (previous) {
        // Check for engagement spike
        const likeGrowth = previous.likes > 0
          ? current.likes / previous.likes
          : 0;

        if (likeGrowth >= 3) {
          bus.emit('engagement_spike', {
            platform,
            username,
            postId,
            metric: 'likes',
            previous: previous.likes,
            current: current.likes,
            multiplier: parseFloat(likeGrowth.toFixed(1)),
          });
        }

        // Check for view milestones (10K, 100K, 1M)
        const milestones = [10000, 100000, 1000000, 10000000];
        if (current.views) {
          for (const milestone of milestones) {
            if (previous.views < milestone && current.views >= milestone) {
              bus.emit('post_milestone', {
                platform,
                username,
                postId,
                milestone,
                currentViews: current.views,
              });
            }
          }
        }
      }

      state.set(key, current);
    }
  } catch (err) {
    console.error(`Post poll failed for ${platform}/@${username}: ${err.message}`);
  }
}

module.exports = { pollPosts };

Step 4: The Handlers

This is where you plug in whatever actions you want:

// handlers/discord.js
const axios = require('axios');
const bus = require('../bus');

const DISCORD_WEBHOOK = process.env.DISCORD_WEBHOOK_URL;

bus.on('new_post', async (event) => {
  if (!DISCORD_WEBHOOK) return;

  await axios.post(DISCORD_WEBHOOK, {
    content: `🆕 **@${event.username}** posted ${event.newPosts} new ${event.newPosts === 1 ? 'post' : 'posts'} on ${event.platform}!`,
  });
});

bus.on('engagement_spike', async (event) => {
  if (!DISCORD_WEBHOOK) return;

  await axios.post(DISCORD_WEBHOOK, {
    content: `🔥 **Engagement spike!** @${event.username}'s post is getting ${event.multiplier}x normal likes on ${event.platform}`,
  });
});

bus.on('follower_change', async (event) => {
  if (!DISCORD_WEBHOOK) return;

  const direction = event.delta > 0 ? '📈' : '📉';
  const sign = event.delta > 0 ? '+' : '';
  await axios.post(DISCORD_WEBHOOK, {
    content: `${direction} **@${event.username}** ${sign}${event.delta.toLocaleString()} followers (${event.percentChange}%) on ${event.platform}`,
  });
});

// handlers/webhook-forwarder.js
const axios = require('axios');
const bus = require('../bus');

// Forward all events to an external URL (your own API, Zapier, n8n, etc.)
const WEBHOOK_URL = process.env.FORWARD_WEBHOOK_URL;

bus.on('*', async (event) => {
  if (!WEBHOOK_URL) return;

  try {
    await axios.post(WEBHOOK_URL, event, {
      headers: { 'Content-Type': 'application/json' },
      timeout: 5000,
    });
  } catch (err) {
    console.error(`Webhook forward failed: ${err.message}`);
  }
});

Step 5: Main Entry Point

// index.js
require('dotenv').config();
const cron = require('node-cron');
const { pollProfile } = require('./pollers/profile-poller');
const { pollPosts } = require('./pollers/post-poller');

// Load handlers (they self-register on the bus)
require('./handlers/discord');
require('./handlers/webhook-forwarder');

// Accounts to monitor
const WATCHED = [
  { platform: 'instagram', username: 'competitor_1' },
  { platform: 'instagram', username: 'competitor_2' },
  { platform: 'tiktok', username: 'competitor_3' },
  { platform: 'tiktok', username: 'your_own_account' },
];

async function runProfilePolls() {
  console.log(`[${new Date().toISOString()}] Polling profiles...`);
  for (const account of WATCHED) {
    await pollProfile(account.platform, account.username);
    await new Promise(r => setTimeout(r, 500));
  }
}

async function runPostPolls() {
  console.log(`[${new Date().toISOString()}] Polling posts...`);
  for (const account of WATCHED) {
    await pollPosts(account.platform, account.username);
    await new Promise(r => setTimeout(r, 500));
  }
}

// Initial run
runProfilePolls();
runPostPolls();

// Schedule
cron.schedule('*/30 * * * *', runProfilePolls);  // Profiles every 30 min
cron.schedule('*/15 * * * *', runPostPolls);      // Posts every 15 min

console.log(`Social event bus started. Watching ${WATCHED.length} accounts.`);
console.log('Profile polls: every 30 minutes');
console.log('Post polls: every 15 minutes');

Why This Pattern?

Because polling scripts always start simple and end up as spaghetti. You start with one script that checks competitors and sends a Discord message. Then your boss wants Slack too. Then email. Then someone wants to log it to a spreadsheet. Then you need to check comments too, not just posts.

The event bus pattern means:

  • Adding a new data source = write one poller function
  • Adding a new action = write one handler function
  • They don’t know about each other — the poller doesn’t care if Discord or Slack or email is listening

I’ve run this pattern for 6 months. Added 4 handlers and 2 pollers without touching existing code once.

Scaling Up

When you outgrow a single Node.js process:

  1. Replace EventEmitter with Redis Pub/Sub (sketched below) — pollers publish, handlers subscribe, and they can run on different machines
  2. Move pollers to separate workers — one per platform
  3. Add a dead letter queue for failed handler deliveries
  4. Add a simple web UI to see recent events (Express + SSE)
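
For step 1, a minimal sketch of what a Redis-backed bus.js could look like, assuming ioredis (not part of the stack above) and keeping the same emit/on surface so pollers and handlers don’t change:

// bus-redis.js - drop-in replacement for bus.js backed by Redis Pub/Sub
const Redis = require('ioredis');

const pub = new Redis();           // publisher connection
const sub = new Redis();           // dedicated subscriber connection
const CHANNEL = 'social-events';

const handlers = [];

sub.subscribe(CHANNEL);
sub.on('message', (_channel, message) => {
  const event = JSON.parse(message);
  for (const { type, fn } of handlers) {
    if (type === '*' || type === event.type) fn(event);
  }
});

module.exports = {
  emit(eventType, payload) {
    const event = { type: eventType, timestamp: new Date().toISOString(), ...payload };
    pub.publish(CHANNEL, JSON.stringify(event));
  },
  on(eventType, fn) {
    handlers.push({ type: eventType, fn });
  },
};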

But honestly, a single Node process on a $5 VPS handles 50+ accounts with room to spare.

Read the Full Guide

Build a Social Media Event Bus → SociaVault Blog

Turn social media data into real-time events with SociaVault — one API for TikTok, Instagram, YouTube, and 10+ platforms. Profiles, posts, comments, followers — all endpoints, one key.

Discussion

What’s your approach to “real-time” social media monitoring when the platforms don’t offer webhooks? Poll and diff like this, or a different strategy entirely?

#javascript #nodejs #architecture #webdev #api

Why Single-Pass AI Test Generation Produces Garbage

After 9 years of writing test cases manually, I built an AI tool that generates them from User Stories. The first version used a single API call. The output looked reasonable until I tried to automate it.

“Verify the system works correctly.” What does that mean in Playwright?

“Enter valid data and submit.” What data? Which field? What’s the expected state after submit?

Single-pass AI treats test case writing like creative writing. But test cases are engineering artifacts. They need specific values, verifiable assertions, and steps an automation engineer can translate to code without asking questions.

So I rebuilt the pipeline with three passes. The quality jumped from 4-5/10 to 8-9/10 consistently. Here’s what I learned.

The single-pass problem

Give any LLM a User Story and ask for test cases. You get a reasonable-looking list. But look at what’s actually there:

Vague assertions — “Verify the system displays correct results.” What results? Where? How do I assert that?

Missing coverage — 8 acceptance criteria in the story, 3 test cases in the output. Five requirements untested.

No priority differentiation — every test case is Priority 1. When the build breaks and you have 10 minutes, which ones do you run?

Placeholder data — “Enter a valid email.” My automation script needs user@example.com, not a description of what to enter.

Merged scenarios — three distinct AC collapsed into one test. When it fails, which requirement is broken?

This isn’t a prompt engineering problem. I spent weeks tweaking prompts. The real issue is structural: one pass doesn’t have enough context to generate AND review simultaneously.

Three passes: Worker, Judge, Optimizer

Here’s what CasePilot does instead.

Pass 1 — Worker

The Worker generates initial test cases from full context:

  • User Story title, description, acceptance criteria
  • Discussion comments (filtered: human only, no bot noise)
  • Project Knowledge (tech stack, business rules, UI patterns)
  • Wiki/Confluence pages linked to the project
  • Parent Epic context (if the story is part of a larger feature)
  • Existing test cases (to avoid generating duplicates)

The Worker prompt is instructed to think like a mid-level QA automation engineer. Not a writer. Each acceptance criterion gets its own test. Test data uses concrete values, not placeholders.

The Worker also applies ISTQB test design techniques directly in the prompt:

  • Boundary Value Analysis — min, min+1, max-1, max for every numeric field
  • Equivalence Partitioning — valid class, invalid class, edge class
  • Decision Table Testing — combinations of conditions for complex logic
  • State Transition Testing — valid and invalid workflow transitions

Pass 2 — Judge

The Judge receives the Worker’s output plus the original User Story. It reviews like a QA Lead reviewing a pull request:

  • Can each test be translated directly into a test method?
  • Are assertions programmatically verifiable?
  • Are coverage gaps filled?
  • Are there duplicate or overlapping tests?

The Judge rewrites vague tests, adds missing edge cases, removes unnecessary ones, and scores the overall quality 1-10.

Real example: Worker generates 11 test cases for a registration form. Judge consolidates three email-validation tests into one parameterized test, removes a redundant “form displays correctly” check, adds a missing duplicate-email test. Result: 7 tests, quality score 9/10.

Pass 3 — Optimizer

For sets of 3+ test cases, the Optimizer analyzes the full suite:

  • Duplicate steps — “Navigate to login page” appears in 6 tests. Extract to shared precondition.
  • Overlapping coverage — Test 3 and Test 7 both verify the same error message. Merge or differentiate.
  • Suggested groups — Tests 1, 2, 5 share the same setup. Group them under “Authenticated User” precondition.

The Optimizer doesn’t change the tests. It gives you insights on how to structure your test suite when you automate.

What this looks like in practice

A User Story about applying discount codes at checkout. 8 acceptance criteria: valid percentage coupon, invalid coupon, expired coupon, empty cart, multiple coupons, coupon removal, minimum order amount, case-insensitive codes.

Single-pass output:
3 generic test cases, all Priority 1, 1-2 steps each. “Apply a valid coupon and verify discount.” No test data. No edge cases.

Three-pass output:
8 specific test cases. Mixed P1/P2/P3. Each has 3-5 steps with concrete data:

Title: [Checkout] should reject expired coupon code with clear error message
Category: negative
Priority: 2
Preconditions:
  - User is logged in with items in cart (total: $150.00)
  - Coupon "SUMMER2024" exists but expired on 2024-12-31
Steps:
  1. Navigate to checkout page
     Expected: Cart shows $150.00 total
  2. Enter "SUMMER2024" in coupon field and click Apply
     Expected: Error message "This coupon has expired" displayed
     Test Data: coupon = "SUMMER2024"
  3. Verify cart total remains $150.00
     Expected: No discount applied, total unchanged

An automation engineer reads this and starts writing code. No questions needed.

Five things I learned building this

1. Token budget matters more than prompt engineering.

I spent weeks tweaking prompts. The real breakthrough was increasing max output tokens from 4,096 to 8,192. The AI was literally running out of space to finish generating test cases. It would produce 3 good tests and then stop because the response was truncated. Not a quality problem. A capacity problem.

2. The model follows examples, not instructions.

“Generate at least one test per acceptance criterion” — ignored.
“Each test must have 3-5 steps with specific expected results” — partially followed.

Adding a concrete JSON example in the system prompt with 3 steps, specific assertions, real test data, and a [Feature Area] prefix fixed everything instantly. The AI pattern-matches on examples far more reliably than parsing natural language instructions.
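
For illustration, something in the spirit of that embedded example — my reconstruction, not the actual CasePilot system prompt:

{
  "title": "[Checkout] should apply 10% discount for a valid percentage coupon",
  "category": "positive",
  "priority": 1,
  "steps": [
    {
      "action": "Navigate to checkout with a cart total of $100.00",
      "expected": "Cart shows $100.00"
    },
    {
      "action": "Enter coupon \"SAVE10\" and click Apply",
      "expected": "Discount line \"-$10.00\" is displayed",
      "testData": "coupon = \"SAVE10\""
    },
    {
      "action": "Verify the order total",
      "expected": "Total equals $90.00"
    }
  ]
}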

3. Post-processing catches what prompts can’t enforce.

The AI won’t always:

  • Add [Feature Area] prefixes to titles
  • Distribute tests across positive/negative/edge categories
  • Include all ISTQB technique labels

Code-based post-processing handles these reliably. Trust AI for content, trust code for formatting. My pipeline has a postProcess step that enforces category distribution, adds feature area tags, scores flakiness risk, and flags shallow tests (fewer than 3 steps).
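
A rough sketch of what that kind of post-processing step can look like; the names and fields are illustrative, not the actual CasePilot code:

function postProcess(testCases, featureArea) {
  return testCases.map((tc) => ({
    ...tc,
    // Enforce the [Feature Area] title prefix the model sometimes forgets
    title: tc.title.startsWith(`[${featureArea}]`) ? tc.title : `[${featureArea}] ${tc.title}`,
    // Default missing categories so distribution checks have something to count
    category: tc.category || 'positive',
    // Flag shallow tests: fewer than 3 steps usually means an untestable one-liner
    shallow: (tc.steps ? tc.steps.length : 0) < 3,
  }));
}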

4. The Judge pass pays for itself.

Three API calls cost ~3x more than one. But the quality difference means users generate once instead of regenerating three times. Net token cost is actually lower. And the Judge catches real issues: a Worker test that says “Verify the page loads” gets rewritten to “Verify the checkout page displays cart items with prices, quantities, and subtotal matching the cart state.”

5. Speed vs quality is a false tradeoff.

The three-pass pipeline takes 30-60 seconds on GPT-5.4. Users are fine waiting one minute for test cases they can actually automate. They are not fine getting instant results they have to rewrite manually.

I added a three-phase progress bar showing Worker, Judge, Optimizer passes so users see progress instead of staring at a spinner. Perception of speed matters more than actual speed.

Beyond test cases

The same two-pass pattern (Worker + Judge) powers three tools now:

  • CasePilot — test case generation from User Stories
  • BugPilot — structured bug reports from vague descriptions (repro steps, severity, root cause, impact radius)
  • StoryPilot — complete User Story enrichment from a title (description, AC, priority, story points, risks, DoD)

The pattern works because review is fundamentally different from generation. The Worker creates. The Judge evaluates against the source material. Two different cognitive tasks that don’t combine well in a single prompt.

Try it

CasePilot is on the Azure DevOps Marketplace and coming to Jira. Free tier: 20 test cases/month, no credit card.

If you want to use the flakiness prediction and boundary value generation in your own test framework, I open-sourced those as a standalone npm package: npm install @iklab/testkit. Zero dependencies, works with Jest, Vitest, Playwright, anything.

I’m interested in how other people handle AI output quality for structured data. The three-pass approach works for test cases. Does it generalize to other domains where AI output needs to be precise and actionable? Let me know in the comments.

Ihor Kosheliev — Senior QA Automation Engineer. Building AI tools for QA at iklab.dev.

Building VoiceAgent: From Speech to Safe Action

Introduction

Voice interfaces feel natural to humans, but the systems behind them require structure, validation, and control.

VoiceAgent was built to bridge that gap — a system that takes voice input, understands intent, and executes actions safely.

This article focuses on the architecture, design choices, and challenges behind building the system.

System Architecture

The system follows a structured pipeline:

Voice → Text → Intent → Validation → Approval → Action

Each stage plays a critical role in ensuring both functionality and safety.

1. Speech-to-Text (Whisper)

For transcription, I used a local Whisper model.

Why Whisper?

  • High accuracy for speech recognition
  • Works offline (no API dependency)
  • No cost involved

Key Consideration

Handling audio input required:

  • Converting audio to float32 format
  • Normalizing amplitude
  • Resampling to 16 kHz for consistent input
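
A minimal sketch of that preprocessing, assuming 16-bit PCM input and using NumPy/SciPy (the project itself may rely on different audio libraries):

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # Whisper expects 16 kHz input

def prepare_audio(samples: np.ndarray, source_sr: int) -> np.ndarray:
    # int16 PCM -> float32 in roughly [-1.0, 1.0]
    audio = samples.astype(np.float32)
    if samples.dtype == np.int16:
        audio /= 32768.0

    # Normalize amplitude so quiet recordings still transcribe well
    peak = np.abs(audio).max()
    if peak > 0:
        audio /= peak

    # Resample whatever the microphone or upload produced to 16 kHz
    if source_sr != TARGET_SR:
        audio = resample_poly(audio, TARGET_SR, source_sr)

    return audio.astype(np.float32)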

2. Intent Detection (Groq + LLM)

Once text is generated, it is passed to a language model via Groq.

Why Groq?

  • Fast inference speed
  • Free tier available
  • Reliable for structured prompting

Approach

Instead of free-form output, I enforced structured JSON responses:

{
  "intent": "...",
  "params": {...},
  "reasoning": "..."
}

This ensured:

  • Predictability
  • Easier parsing
  • Better control over execution
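
As a sketch of how this can look in code, using Groq's OpenAI-compatible Python client (the model name and prompt here are placeholders, not necessarily what VoiceAgent ships with):

import json
from groq import Groq  # pip install groq; the client reads GROQ_API_KEY from the environment

client = Groq()

SYSTEM_PROMPT = (
    "You are an intent parser. Respond ONLY with JSON of the form "
    '{"intent": "...", "params": {...}, "reasoning": "..."}'
)

def detect_intent(transcript: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": transcript},
        ],
        temperature=0,  # keep the structured output as deterministic as possible
    )
    return json.loads(response.choices[0].message.content)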

3. Validation Layer

Before executing any action, the system performs strict validation:

  • Filename sanitization
  • Allowed file extensions only
  • File size limits
  • Prevention of overwriting existing files

This layer ensures that the system remains safe and controlled.
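
A simplified sketch of those checks (the allowed extensions and size limit here are illustrative, not the project's actual values):

from pathlib import Path

OUTPUT_DIR = Path("output")
ALLOWED_EXTENSIONS = {".txt", ".md", ".py"}  # illustrative allowlist
MAX_SIZE_BYTES = 100_000                     # illustrative limit

def validate_write(filename: str, content: str) -> Path:
    # Sanitize: keep only the final path component, dropping directories and ".."
    safe_name = Path(filename).name
    target = (OUTPUT_DIR / safe_name).resolve()

    # Stay inside the sandboxed output/ directory
    if OUTPUT_DIR.resolve() not in target.parents:
        raise ValueError("Path escapes the output directory")

    if target.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Extension not allowed: {target.suffix}")

    if len(content.encode("utf-8")) > MAX_SIZE_BYTES:
        raise ValueError("File too large")

    if target.exists():
        raise ValueError("Refusing to overwrite an existing file")

    return target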

4. Human-in-the-Loop

For file-related actions, execution is not automatic.

The system pauses and asks for user confirmation.

This prevents unintended or harmful actions and adds an extra safety layer.
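
In its simplest form this is a blocking prompt before the execution step (a sketch; the actual interface may differ):

def confirm(action_summary: str) -> bool:
    # Require an explicit "y" before any file-related action runs
    answer = input(f"About to execute: {action_summary}. Proceed? [y/N] ")
    return answer.strip().lower() == "y"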

5. Execution Engine

Once approved, the system executes the action:

  • File creation
  • Code writing
  • Text responses

All operations are restricted to a local output/ directory.

Challenges Faced

1. Audio Handling

Handling both microphone input and file uploads required a unified processing pipeline. Different formats and sampling rates had to be normalized.

2. Transcription Noise

Speech models can produce unexpected outputs when audio is unclear. This was addressed using normalization and controlled inference settings.

3. Safe Execution

Allowing an AI system to create files introduces risk. The solution was a combination of:

  • Validation
  • Restricted directories
  • User confirmation

4. Structured LLM Output

Ensuring consistent JSON output from the model required careful prompt design and fallback handling.
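
One common fallback pattern (a sketch, not necessarily VoiceAgent's exact logic, and the default intent name is illustrative) is to try strict parsing first, then salvage the first JSON object embedded in the reply, and finally fall back to a harmless default:

import json
import re

def parse_intent(raw: str) -> dict:
    # 1) Ideal case: the reply is pure JSON
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # 2) Salvage: the model wrapped the JSON in prose or code fences
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass

    # 3) Safe default: treat the input as a plain text response, never an action
    return {"intent": "respond_text", "params": {"text": raw}, "reasoning": "fallback"}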

Key Design Decisions

  • Use local Whisper to avoid API costs and enable offline capability
  • Use Groq for fast and efficient inference
  • Enforce structured JSON output for reliability
  • Add human confirmation for safety
  • Restrict execution to a sandboxed directory

Conclusion

VoiceAgent is not just about converting speech to text.

It is about building a system that:

  • Understands
  • Validates
  • Executes

— all while keeping the user in control.

This project highlights that in AI systems, safety and structure are just as important as intelligence.

Links

GitHub: https://github.com/Suraj308/VoiceAgent
Demo Video: https://youtu.be/gGnH3v7BVdQ

Servo Now on crates.io: What Rust Devs Need to Know

TL;DR: Servo, the experimental browser engine originally developed by Mozilla and now maintained by the Linux Foundation, is now available as a crate on crates.io. This means Rust developers can embed a real, modern web rendering engine directly into their applications with a single dependency. It’s a significant milestone for the Rust ecosystem and for anyone building apps that need HTML/CSS rendering without shipping a full browser.

Key Takeaways

  • Servo is now available on crates.io, making it trivially easy to add browser-engine capabilities to any Rust project
  • The crate enables embedding HTML, CSS, and JavaScript rendering directly into desktop and embedded applications
  • This is a major step toward Servo becoming a practical, production-ready alternative to WebView-based solutions
  • Early adopters should expect some API instability — this is still maturing software
  • The move signals growing confidence from the Servo project team and the broader Rust community in the engine’s stability

What Is Servo, and Why Does This Matter?

If you’ve been following the Rust ecosystem for any length of time, you’ve probably heard of Servo. Originally born inside Mozilla Research around 2012, Servo was an ambitious attempt to build a next-generation browser engine from scratch — one that could take full advantage of parallelism, memory safety, and modern systems programming techniques.

After Mozilla’s restructuring in 2020, the project was transferred to the Linux Foundation, where it has continued to evolve with renewed community energy. Fast-forward to today, and Servo is now available on crates.io — a milestone that fundamentally changes how Rust developers can interact with the project.

Why does this matter? Because before this, integrating Servo into your project meant cloning a massive repository, wrestling with complex build dependencies, and hoping nothing broke between commits. Now, you can add it as a dependency like any other crate. That’s a qualitative shift in accessibility.

The State of Browser Engines in Rust Applications

Before diving into the specifics of the Servo crate, it’s worth understanding the landscape that makes this announcement significant.

The Problem With Existing Solutions

Rust developers who need to render HTML and CSS in their applications have historically had a few options, none of them particularly elegant:

  • WebView wrappers (like Tauri): Use the operating system’s built-in browser engine (WebKit on macOS/iOS, WebView2 on Windows, WebKitGTK on Linux). This keeps binary sizes small but means inconsistent rendering behavior across platforms.
  • CEF (Chromium Embedded Framework): Powerful and consistent, but you’re shipping a significant portion of Chromium with your app. Expect binary sizes in the hundreds of megabytes.
  • Custom renderers: Some applications (game engines, terminal UIs) implement just enough HTML/CSS parsing for their needs. Fragile and expensive to maintain.
  • Building from Servo’s source directly: Technically possible, but the barrier to entry was high.

None of these options are universally great. WebView gives you inconsistency. CEF gives you bloat. Custom renderers give you maintenance nightmares.

Where Servo Fits

Servo aims to occupy a middle ground: a full-featured, spec-compliant web engine that you can embed in your application, with a Rust-native API, and without the overhead of bundling all of Chromium. Now that Servo is available on crates.io, that middle ground is actually reachable for working developers.

Getting Started: Adding Servo to Your Rust Project

Let’s get practical. Here’s what you need to know to actually use the Servo crate today.

Basic Installation

Adding Servo to your Cargo.toml is now as straightforward as any other dependency:

[dependencies]
servo = "0.0.1"  # Check crates.io for the latest version

You’ll want to check crates.io/crates/servo directly for the current version, as the project is iterating quickly.

System Prerequisites

Servo still has native system dependencies that Cargo can’t fully manage on its own. Before building, you’ll need:

  • GStreamer (for media playback support)
  • OpenGL or a compatible graphics backend
  • Platform-specific libraries depending on your target OS

The project’s documentation covers platform-specific setup in detail. On Linux, most dependencies are available through your package manager. On macOS and Windows, the setup is somewhat more involved, though the Servo team has been actively improving this story.

A Minimal Embedding Example

Here’s a simplified look at what embedding Servo can look like conceptually:

// Note: API is subject to change — always check the latest docs
use servo::Servo;
use servo::embedder_traits::EmbedderMsg;

fn main() {
    // Initialize Servo with your window/surface handle
    let mut servo = Servo::new(/* embedder config */);

    // Load a URL
    servo.load_url("https://example.com".parse().unwrap());

    // Run the event loop
    loop {
        servo.handle_events(vec![]);
        // Handle embedder messages, render frames, etc.
    }
}

This is deliberately simplified — the actual API involves event loops, surface management, and embedder trait implementations. The Servo embedding documentation and the servoshell example application (which ships with the project) are your best reference points for real implementation.

What the Servo Crate Actually Gives You

It’s worth being specific about capabilities, because “browser engine” can mean a lot of things.

What’s Included

Feature                         Status
HTML5 parsing and rendering     ✅ Supported
CSS layout (Flexbox, Grid)      ✅ Actively developed
JavaScript (via SpiderMonkey)   ✅ Supported
WebGL                           ✅ Supported
Media playback (video/audio)    ✅ Via GStreamer
WebAssembly                     ✅ Supported
Accessibility tree              🔄 In progress
Full CSS3 compliance            🔄 Ongoing work
WebGPU                          🔄 Experimental

What to Be Realistic About

Servo is not Chromium. There will be websites and web apps that don’t render perfectly, particularly those relying on browser-specific behaviors or very recent web APIs. For embedding use cases — rendering documentation, displaying UI built with HTML/CSS, running controlled web content — Servo is increasingly capable. For rendering arbitrary web content from the open internet, you’ll encounter rough edges.

The project has been transparent about this. The Servo team actively publishes compatibility progress, and the trajectory is clearly positive.

Real-World Use Cases for the Servo Crate

So who should actually be excited about this? Let’s be concrete.

Desktop Application UIs

If you’re building a desktop application in Rust and want to use HTML/CSS for your UI layer — without the Electron-style overhead or the platform inconsistency of WebView — Servo is now a genuinely viable option to evaluate. Think of it as a lighter-weight alternative to what Tauri does, but with more control over the rendering engine itself.

Document and Report Rendering

Applications that need to render HTML documents — whether that’s a PDF-generation pipeline, an email client, or a documentation browser — can now embed Servo to handle that rendering in a consistent, spec-compliant way.

Embedded and Kiosk Systems

Servo’s architecture was designed with parallelism and memory efficiency in mind. For kiosk displays, automotive infotainment systems, or other embedded Linux environments where you want web-based UI without the weight of a full browser, Servo is worth serious consideration.

Game Engine UI Overlays

Several game engines and simulation environments use HTML/CSS for their UI layers. With Servo available on crates.io, Rust-based game engines (like those built with Bevy) could potentially integrate web-based UI directly.

Developer Tools and IDEs

Rich developer tools that need to render documentation, changelogs, or UI components described in HTML could benefit from a native Rust rendering engine rather than spinning up a separate browser process.

Comparing Your Options: Servo vs. Alternatives

                        Servo (crates.io)   Tauri/WebView        CEF          Custom Renderer
Binary size impact      Medium              Small                Very Large   Small
Rendering consistency   High                Low (OS-dependent)   High         Varies
Rust-native API         ✅ Yes              Partial              ❌ No        ✅ Yes
JavaScript support      ✅ Yes              ✅ Yes               ✅ Yes       ❌ Usually No
Maintenance burden      Low (crate)         Low                  Medium       High
Production readiness    Maturing            Mature               Mature       Varies
License                 MPL 2.0             MIT/Apache           BSD          N/A

The honest takeaway: if you need production-grade stability today for rendering arbitrary web content, Tauri or CEF are safer bets. If you’re building something new, have some tolerance for API evolution, and want a Rust-native solution with a bright future, Servo on crates.io is now worth serious evaluation.

The Bigger Picture: What This Means for the Rust Ecosystem

The availability of Servo on crates.io isn’t just a convenience improvement — it’s a signal.

Ecosystem Maturity

For a project as complex as a browser engine to publish on crates.io, the build system, dependency management, and public API surface have to reach a certain level of stability. The Servo team making this move indicates confidence that the project is ready for broader adoption and experimentation.

Competing With Electron’s Dominance

One of the most persistent criticisms of the modern app development landscape is the proliferation of Electron-based applications — apps that ship an entire Chromium instance to render what is essentially a website. The combination of Rust’s performance characteristics and Servo’s embedding-focused architecture represents a genuine alternative path. It won’t replace Electron overnight, but the building blocks are getting real.

Attracting Contributors

Publishing on crates.io dramatically lowers the barrier to experimentation, which means more developers will try Servo, find bugs, write fixes, and contribute back. This is how open source projects accelerate.

Practical Advice for Early Adopters

If you’re planning to start experimenting with the Servo crate, here’s what I’d recommend based on the current state of the project:

  1. Start with servoshell: Before writing your own embedder, run the reference shell application. It’ll help you understand how the embedding API is meant to be used.

  2. Pin your version carefully: The API is evolving. Use a specific version in your Cargo.toml and update deliberately, reviewing the changelog each time (see the snippet after this list).

  3. Join the community: The Servo project is active on GitHub and has a Zulip chat. If you’re building something with the crate, engaging with the community will save you significant debugging time.

  4. Don’t use it for untrusted content yet: If your use case involves rendering arbitrary user-supplied HTML from the internet, be cautious. Security hardening for embedding use cases is ongoing.

  5. Contribute your findings: If you hit a bug or limitation, file an issue. The team is responsive, and early-adopter feedback directly shapes the API.
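
For example, Cargo's "=" requirement operator pins a dependency to exactly one version (the version number here is illustrative):

[dependencies]
servo = "=0.0.1"  # exactly this version, no automatic semver bumps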

Frequently Asked Questions

Q: Is Servo production-ready now that it’s on crates.io?

Not universally. For controlled use cases — rendering your own HTML/CSS content, building application UIs, displaying documentation — Servo is increasingly capable and the crates.io publication reflects meaningful stability. For rendering arbitrary web content from the open internet, you’ll encounter compatibility gaps. Evaluate it against your specific requirements.

Q: How does Servo’s performance compare to Chromium or WebKit?

Servo was architecturally designed to leverage parallelism in ways that older engines like Blink (Chromium) and WebKit weren’t. In specific benchmarks, particularly around CSS layout, Servo can be competitive or faster. In overall real-world browsing performance, the comparison is more nuanced. For embedding use cases, Servo’s performance profile is generally favorable.

Q: Can I use the Servo crate in a commercial application?

Yes. Servo is licensed under the Mozilla Public License 2.0 (MPL 2.0), which is a file-level copyleft license. You can use it in commercial applications; you’re required to make available any modifications you make to MPL-licensed files themselves, but your application code remains your own. Consult a lawyer for your specific situation.

Q: Does the Servo crate work on all platforms?

Servo supports Linux, macOS, and Windows. Android support is in progress. The degree of polish varies by platform — Linux tends to be best-supported given the development environment of most contributors. Check the project’s current platform support matrix before committing to a target.

Q: What’s the difference between Servo and the WebRender crate?

WebRender is Servo’s GPU-accelerated rendering backend, which was actually adopted by Firefox as its production rendering engine. WebRender handles the final painting of pixels. Servo is the full browser engine stack — HTML parsing, CSS layout, JavaScript execution, and WebRender for the final render. If you just need GPU-accelerated 2D graphics, WebRender might be the more focused tool; if you need a full web rendering pipeline, Servo is what you want.

The Bottom Line

Servo is now available on crates.io, and that’s genuinely exciting news for the Rust ecosystem. It represents years of work reaching a new level of accessibility, and it opens up use cases that were previously impractical for most developers.

Is it ready to replace your production WebView setup today? Probably not for every use case. Is it worth experimenting with if you’re building a new Rust application that needs HTML rendering? Absolutely yes.

The best way to form your own opinion is to try it. Add the crate, run the examples, and see how it fits your use case. The Servo team has made that easier than ever.

Have you tried embedding Servo in a Rust project? Drop your experience in the comments — real-world usage reports help the whole community understand where the project stands today.

HTTP Security Headers: The Complete Guide to Securing Your Website

TL;DR
HTTP security headers are your first line of defense against cross-site scripting (XSS), clickjacking,
MIME sniffing, and data injection attacks. Despite being simple response headers, a 2024 scan of the
top 1 million websites found that fewer than 12% deploy a Content Security Policy. This guide
covers every critical security header with production-ready Nginx and Apache configurations.

📑 Table of Contents

  • Why Security Headers Matter
  • Content Security Policy (CSP)
  • Strict-Transport-Security (HSTS)
  • X-Frame-Options
  • X-Content-Type-Options
  • Referrer-Policy
  • Permissions-Policy
  • Additional Useful Headers
  • Nginx & Apache Configuration
  • Best Practices
  • Common Mistakes
  • Tools
  • References

Why Security Headers Matter

Security headers instruct browsers on how to handle your content — which scripts can run,
whether your page can be framed, and what information is leaked in referrers. They cost nothing to deploy
and defend against entire categories of attacks identified in the OWASP Top 10.

📖 Definition — HTTP security headers are response headers sent by the server that activate browser-side security mechanisms, restricting behavior that could be exploited by attackers.

Content Security Policy (CSP)

CSP is the most powerful security header. It defines an allowlist of content sources, effectively neutralizing
XSS, data injection, and unauthorized inline scripts.

Content-Security-Policy:
  default-src 'self';
  script-src 'self' https://cdn.example.com;
  style-src 'self' 'unsafe-inline';
  img-src 'self' data: https:;
  font-src 'self' https://fonts.gstatic.com;
  connect-src 'self' https://api.example.com;
  frame-ancestors 'none';
  base-uri 'self';
  form-action 'self';
  upgrade-insecure-requests;

Directive         Controls                          Recommended Value
default-src       Fallback for all resource types   'self'
script-src        JavaScript sources                'self' + specific CDNs
style-src         CSS sources                       'self' (avoid 'unsafe-inline')
img-src           Image sources                     'self' data: https:
frame-ancestors   Who can embed your page           'none'
base-uri          Restricts the <base> element      'self'

🎯 Start with Content-Security-Policy-Report-Only to log violations without blocking. Use the report-uri or report-to directive to collect reports, then tighten the policy iteratively.
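
For example, a report-only rollout of the policy above might look like this (the /csp-reports endpoint is a placeholder for whatever collector you use):

Content-Security-Policy-Report-Only:
  default-src 'self';
  script-src 'self' https://cdn.example.com;
  report-uri /csp-reports;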

🚫 Never use ‘unsafe-eval’ in production CSP. It re-enables eval(), completely undermining XSS protection. Refactor code that calls eval(), new Function(), or inline event handlers.

Strict-Transport-Security (HSTS)

Forces browsers to connect over HTTPS only, preventing protocol downgrade attacks and SSL stripping.

Strict-Transport-Security: max-age=63072000; includeSubDomains; preload

⚠️ Once HSTS is deployed with includeSubDomains, every subdomain must have a valid TLS certificate. Rolling this out without full HTTPS coverage will break subdomains.

X-Frame-Options

Prevents your page from being embedded in <frame>, <iframe>, <embed>, or <object> elements on other sites — blocking clickjacking attacks.

X-Frame-Options: DENY

Value        Behavior
DENY         Never allow framing (most secure)
SAMEORIGIN   Allow framing only from same origin

💡 CSP’s frame-ancestors directive is the modern replacement for X-Frame-Options, offering more granular control. Deploy both for backward compatibility.

X-Content-Type-Options

Prevents browsers from MIME-sniffing a response away from the declared Content-Type.
Blocks attacks that disguise executable content as harmless file types.

X-Content-Type-Options: nosniff

Referrer-Policy

Controls how much referrer information is sent when navigating away from your site.

Referrer-Policy: strict-origin-when-cross-origin

Policy                            Same-Origin   Cross-Origin (HTTPS→HTTPS)   Downgrade (HTTPS→HTTP)
no-referrer                       None          None                         None
strict-origin                     Origin only   Origin only                  None
strict-origin-when-cross-origin   Full URL      Origin only                  None
no-referrer-when-downgrade        Full URL      Full URL                     None

Permissions-Policy

Controls which browser features (camera, microphone, geolocation, etc.) your site and embedded iframes can use.
Formerly known as Feature-Policy.

Permissions-Policy: camera=(), microphone=(), geolocation=(), payment=(self), usb=()

Pro Tip: Set unused features to () (empty allowlist) to explicitly disable them. This prevents embedded third-party scripts from silently accessing sensitive APIs like the camera or microphone.

Additional Useful Headers

Header                         Value          Purpose
Cross-Origin-Opener-Policy     same-origin    Isolates browsing context, enables SharedArrayBuffer
Cross-Origin-Embedder-Policy   require-corp   Ensures all embedded resources opt in to being loaded
Cross-Origin-Resource-Policy   same-origin    Prevents other origins from loading your resources
X-DNS-Prefetch-Control         off            Prevents speculative DNS lookups (privacy)

Nginx & Apache Configuration

Nginx

# /etc/nginx/snippets/security-headers.conf
add_header Content-Security-Policy "default-src 'self'; script-src 'self'; style-src 'self'; img-src 'self' data: https:; frame-ancestors 'none'; base-uri 'self'; form-action 'self';" always;
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
add_header X-Frame-Options "DENY" always;
add_header X-Content-Type-Options "nosniff" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Permissions-Policy "camera=(), microphone=(), geolocation=()" always;
add_header Cross-Origin-Opener-Policy "same-origin" always;
add_header Cross-Origin-Embedder-Policy "require-corp" always;

# Include in server block:
# include snippets/security-headers.conf;

Apache

# .htaccess or httpd.conf
Header always set Content-Security-Policy "default-src 'self'; script-src 'self'; frame-ancestors 'none';"
Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
Header always set X-Frame-Options "DENY"
Header always set X-Content-Type-Options "nosniff"
Header always set Referrer-Policy "strict-origin-when-cross-origin"
Header always set Permissions-Policy "camera=(), microphone=(), geolocation=()"

Best Practices

Deploy CSP in report-only mode first, analyze violations, then enforce.

Use nonce-based CSP ('nonce-{random}') instead of 'unsafe-inline' for inline scripts.

Add the always keyword in Nginx to send headers on all response codes (including 4xx/5xx).

Test headers in staging before production — overly strict CSP can break legitimate functionality.

Audit headers regularly with automated scanners as your site’s dependencies evolve.

Common Mistakes

Mistake                                        Impact                             Fix
Using 'unsafe-inline' + 'unsafe-eval' in CSP   Nullifies XSS protection           Use nonces or hashes instead
Missing always keyword in Nginx                Headers absent on error pages      Add always to every add_header
HSTS without full HTTPS coverage               Subdomains become unreachable      Ensure all subdomains have valid TLS certs first
Forgetting frame-ancestors in CSP              Clickjacking still possible        Add frame-ancestors 'none' to CSP
Setting Referrer-Policy: unsafe-url            Full URL leaked to third parties   Use strict-origin-when-cross-origin

Tools

Scan your website’s security headers:

  • 🔧 Security Header Scanner — Analyze all security headers and get an actionable report with grades.

References

  • 📄 MDN — Content Security Policy (CSP)

  • 📄 MDN — Strict-Transport-Security

  • 📄 MDN — Permissions-Policy

  • 📄 OWASP Secure Headers Project

  • 📄 OWASP HTTP Headers Cheat Sheet

  • 📄 MDN — Referrer-Policy

🎯 Key Takeaway: Security headers are free, high-impact defenses. At minimum, deploy CSP, HSTS, X-Frame-Options,
X-Content-Type-Options, Referrer-Policy, and Permissions-Policy. Start CSP in report-only mode,
iterate based on real violation reports, then enforce. Combine with a regular scanning cadence
to catch regressions as third-party dependencies change.

Originally published on StarNomina ToolBox. Try our free online tools — no signup required.

DNS Records: The Complete Reference Guide for Every Record Type

TL;DR
DNS (Domain Name System) translates human-readable domain names into IP addresses and service endpoints.
With over 1.1 trillion DNS queries handled daily worldwide, understanding every record type — from
the ubiquitous A record to specialized CAA and SRV entries — is fundamental to deploying, securing,
and troubleshooting any internet service. This reference covers all major record types with real-world examples.

📑 Table of Contents

  • How DNS Works
  • A & AAAA Records
  • CNAME Records
  • MX Records
  • TXT Records
  • NS & SOA Records
  • SRV Records
  • CAA Records
  • PTR Records
  • Understanding TTL
  • Best Practices
  • Common Mistakes
  • Tools
  • References

How DNS Works

A DNS query follows a hierarchical resolution path: your device’s stub resolver asks a
recursive resolver (e.g., 1.1.1.1 or 8.8.8.8), which queries root servers,
then the TLD nameserver (.com, .org), and finally the domain’s authoritative nameserver
to return the answer. Responses are cached at each level according to the record’s TTL.

📖 Definition — A DNS record (Resource Record) is an entry in a zone file that maps a domain name to a specific value — an IP address, mail server, text string, or another domain name.

A & AAAA Records

The most fundamental record types. A records map a domain to an IPv4 address;
AAAA records map to an IPv6 address.

; A Record — IPv4
example.com.    300    IN    A      93.184.216.34

; AAAA Record — IPv6
example.com.    300    IN    AAAA   2606:2800:220:1:248:1893:25c8:1946

🎯 Always publish both A and AAAA records for dual-stack compatibility. IPv6 adoption crossed 40% globally in 2024.

CNAME Records

A CNAME (Canonical Name) record aliases one domain to another. The DNS resolver follows the chain
until it reaches an A/AAAA record.

www.example.com.    3600    IN    CNAME    example.com.
blog.example.com.   3600    IN    CNAME    myhost.github.io.

⚠️ A CNAME cannot coexist with any other record type at the same name (RFC 1034 §3.6.2). You cannot place a CNAME at the zone apex alongside SOA/NS records. Use ALIAS/ANAME (provider-specific) for apex domains.

MX Records

MX (Mail Exchanger) records direct email to the correct mail servers. The priority
value determines failover order — lower numbers are tried first.

example.com.    3600    IN    MX    10    mail1.example.com.
example.com.    3600    IN    MX    20    mail2.example.com.

Priority   Server              Role
10         mail1.example.com   Primary mail server
20         mail2.example.com   Backup mail server

TXT Records

TXT records store arbitrary text and are heavily used for email authentication, domain verification, and security policies.

; SPF — Authorize mail senders
example.com.    3600    IN    TXT    "v=spf1 include:_spf.google.com ~all"

; DKIM — Email signature verification
selector._domainkey.example.com.    3600    IN    TXT    "v=DKIM1; k=rsa; p=MIGfMA0G..."

; DMARC — Email policy
_dmarc.example.com.    3600    IN    TXT    "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"

; Domain verification
example.com.    3600    IN    TXT    "google-site-verification=abc123..."

💡 A single domain can have multiple TXT records. However, only one SPF record is allowed per domain — multiple SPF records cause authentication failures (RFC 7208 §3.2).
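
For example (the second include is a placeholder for any additional sending service):

; Wrong: two SPF records cause a PermError
example.com.    3600    IN    TXT    "v=spf1 include:_spf.google.com ~all"
example.com.    3600    IN    TXT    "v=spf1 include:spf.other-esp.example ~all"

; Right: one merged record
example.com.    3600    IN    TXT    "v=spf1 include:_spf.google.com include:spf.other-esp.example ~all"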

NS & SOA Records

NS records delegate a zone to specific nameservers. SOA (Start of Authority) records
define the zone’s primary nameserver, admin email, and serial/refresh/retry/expire timers.

; NS Records
example.com.    86400    IN    NS    ns1.provider.com.
example.com.    86400    IN    NS    ns2.provider.com.

; SOA Record
example.com.    3600    IN    SOA    ns1.provider.com. admin.example.com. (
                        2024031501  ; Serial
                        7200        ; Refresh (2h)
                        3600        ; Retry (1h)
                        1209600     ; Expire (14d)
                        86400       ; Minimum TTL (1d)
)

SRV Records

SRV records specify the host and port for specific services (e.g., SIP, XMPP, LDAP).

; _service._protocol.name    TTL    class    SRV    priority weight port target
_sip._tcp.example.com.    3600    IN    SRV    10 60 5060 sip1.example.com.
_sip._tcp.example.com.    3600    IN    SRV    10 40 5060 sip2.example.com.

💡 The weight field enables load balancing among servers with the same priority. Higher weight = more traffic share.

CAA Records

CAA (Certificate Authority Authorization, RFC 8659) records specify which CAs are permitted to issue
certificates for a domain — a critical security control.

example.com.    3600    IN    CAA    0 issue "letsencrypt.org"
example.com.    3600    IN    CAA    0 issuewild ";"
example.com.    3600    IN    CAA    0 iodef "mailto:security@example.com"

🎯 Use issuewild “;” to explicitly block wildcard certificate issuance if you don’t need wildcards. The iodef tag notifies you of policy violations.

PTR Records

PTR (Pointer) records provide reverse DNS — mapping an IP address back to a domain name.
Essential for mail server reputation and network diagnostics.

; Reverse DNS for 93.184.216.34
34.216.184.93.in-addr.arpa.    3600    IN    PTR    example.com.

Understanding TTL

TTL Value   Duration    Use Case
60          1 minute    Failover, migrations, testing
300         5 minutes   Dynamic services, CDNs
3600        1 hour      Standard web records
86400       24 hours    Stable records (NS, MX)

Pro Tip: Before a planned DNS change, lower the TTL to 60–300 seconds at least 48 hours in advance (to let the old high TTL expire from caches). After the change propagates, raise TTL back to its normal value.
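
In zone-file terms (addresses illustrative):

; 48+ hours before the change: shorten the TTL
www.example.com.    300     IN    A    93.184.216.34

; after the cutover has propagated: point at the new host, then restore the TTL
www.example.com.    3600    IN    A    203.0.113.10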

Best Practices

Publish both A and AAAA records for every public hostname.

Set CAA records to restrict certificate issuance to your chosen CA.

Configure SPF + DKIM + DMARC TXT records for every domain that sends email.

Use at least two geographically diverse NS records.

Set up PTR records for all mail server IPs.

Lower TTL before migrations, restore afterward.

Common Mistakes

Mistake                              Impact                                   Fix
CNAME at zone apex                   Broken NS/SOA coexistence                Use ALIAS/ANAME or an A record
Multiple SPF TXT records             SPF PermError, email fails auth          Merge into one v=spf1 record
Missing trailing dot in zone files   Relative name interpreted incorrectly    Always use FQDNs with a trailing dot
TTL too high before migration        Long propagation delays                  Pre-lower TTL 48h before changes
No CAA records                       Any CA can issue certs for your domain   Publish restrictive CAA records

Tools

Inspect and verify your DNS configuration:

  • 🔧 DNS Lookup — Query A, AAAA, MX, NS, SOA, SRV, and other record types.

  • 🔧 TXT Record Lookup — Inspect SPF, DKIM, DMARC, and verification records.

  • 🔧 CNAME Lookup — Trace CNAME chains to their canonical target.

References

  • 📄 RFC 1035 — Domain Names: Implementation and Specification

  • 📄 RFC 8659 — DNS Certification Authority Authorization (CAA)

  • 📄 RFC 7208 — Sender Policy Framework (SPF)

  • 📄 RFC 2782 — A DNS RR for Specifying the Location of Services (SRV)

  • 📄 Cloudflare DNS Documentation

🎯 Key Takeaway: DNS is the invisible foundation of every internet service. Master the record types — A/AAAA for addresses,
CNAME for aliases, MX for mail, TXT for authentication, CAA for certificate control, and SRV for service
discovery. Combine proper TTL management with email authentication (SPF/DKIM/DMARC) to build a secure,
resilient DNS configuration.

Originally published on StarNomina ToolBox. Try our free online tools — no signup required.