Your Team Has 5 CLAUDE.md Files and They All Say Different Things

You have a team. Everyone uses Claude Code. Everyone wrote their own CLAUDE.md.

Alice wrote: “Never push to production without confirmation.”
Bob wrote: “Always ask before running tests.”
Carlos wrote: “Explain changes before executing.”
Diana wrote: “Use TypeScript strict mode on all new files.”
Elena wrote nothing. She is winging it.

Five developers. Five different agents. No shared behavior. No baseline. No consistency.

And nobody noticed — because each person only sees their own agent.

Why This Happens

CLAUDE.md is not a global config file. It is a per-project, per-developer document that each agent reads at session start and interprets independently.

There is no registry. No enforcement layer. No way to know whether your team’s five CLAUDE.md files agree on anything.

This is fine when you work alone. It becomes a serious problem the moment your codebase is shared.

What Inconsistent CLAUDE.md Files Look Like in Practice

Scenario 1: Merge conflict in behavior, not in code

Alice and Bob both work on the same feature. Bob’s agent asks for confirmation before running tests. Alice’s agent runs them automatically. The PR reviews look different. The commit histories look different. The behavior of the codebase under AI assistance is inconsistent.

Nobody merges a CLAUDE.md conflict. They just live with it.

Scenario 2: The missing rule gap

Your team decides: no any types in TypeScript. Alice adds it to her CLAUDE.md. Bob does not. Bob’s agent keeps suggesting any. Bob thinks Claude Code is broken. It is not. His CLAUDE.md just never got the memo.

Scenario 3: The rule that traveled

Your company started with two rules. You are now at twelve. Alice has twelve. Bob has eight. Carlos has five. Nobody knows which of the missing rules caused the regressions last sprint.

The Real Cost

Inconsistent CLAUDE.md files mean:

  • Your AI agent enforces different standards depending on who runs it
  • Bugs introduced in one developer’s session would not have happened in another developer’s session
  • You cannot onboard a new developer with a reliable baseline
  • You cannot diagnose whether a compliance failure was a CLAUDE.md problem or a model problem
  • Code review becomes “review the human” instead of “review the agent behavior”

The Fix: A Shared CLAUDE.md Baseline

The solution is not to enforce one CLAUDE.md for everyone. Developers have legitimate personal preferences. The solution is to separate shared rules from personal rules.

Layer 1: Shared baseline (committed to the repo)

## Mandatory rules (apply to all contributors)
- Never push to production without explicit confirmation
- Never run destructive database operations without a dry run
- Always explain what you are about to do before executing it
- Do not modify files outside the current task scope
- When in doubt, ask. Do not guess.

## Code standards
- TypeScript: strict mode on all new files, no `any` types
- Tests: do not skip, do not mock without labeling mocks
- Commits: conventional commit format required

Layer 2: Developer-specific additions (gitignored)

# CLAUDE.md.local — Personal additions (not committed)
- I prefer step-by-step explanations over summaries
- Always suggest tests before implementation

Layer 3: A quarterly review

Someone on the team owns the shared CLAUDE.md. Once a quarter, you compare behavior across team members and update the baseline rules to match what actually works.
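
To make that quarterly comparison concrete, one option is to diff each developer's file against the shared baseline. The sketch below is a minimal illustration, assuming rules are written as "- " bullet lines like the baseline above; extractRules and missingRules are hypothetical helper names, not part of Claude Code, and exact-text matching is a deliberate simplification.

```typescript
// Hypothetical helper: collect the "- " bullet rules from a CLAUDE.md body.
function extractRules(markdown: string): string[] {
  return markdown
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2).trim());
}

// Rules present in the shared baseline but absent from a developer's file.
// Comparison is case-insensitive but otherwise exact.
function missingRules(baseline: string, developerFile: string): string[] {
  const have = new Set(extractRules(developerFile).map((r) => r.toLowerCase()));
  return extractRules(baseline).filter((r) => !have.has(r.toLowerCase()));
}

const baseline = [
  "## Mandatory rules (apply to all contributors)",
  "- Never push to production without explicit confirmation",
  "- When in doubt, ask. Do not guess.",
].join("\n");

const bobsFile = "- Never push to production without explicit confirmation";

// Lists the one baseline rule Bob's file lacks.
console.log(missingRules(baseline, bobsFile));
```

Running this against every team member's file before the review gives the owner a list of gaps to discuss instead of a vague sense of drift.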

Rule Ordering Matters Too

Within your shared CLAUDE.md, put the most critical safety rules first. Not last.

Claude Code reads the file top to bottom. Over a long session, as the context window fills, rules near the bottom tend to be followed less reliably. Your “never push to production” rule should be the first thing Claude reads, not the seventh.

Tooling for Shared CLAUDE.md Management

If you want pre-written, team-tested rules that cover the most common failure modes — scope creep, silent pushes, test skipping, destructive operations, ambiguous instructions — the CLAUDE.md Rules Pack includes a structured template built for teams.

It is not a single file you paste blindly. It is a modular structure: shared baseline in the repo, personal layer gitignored, rule ordering that survives context compaction.

If you want to start for free first: Download the free starter — includes the core structure and a minimal shared baseline you can test with your team today.

Summary

Problem | Impact
Each developer writes CLAUDE.md from scratch | No shared baseline
No agreed rule set | Agent behaves differently per developer
No rule ordering discipline | Critical rules buried and deprioritized
No review process | Drift compounds silently

The fix: shared committed baseline + gitignored personal layer + quarterly review.

If your team is running Claude Code without a shared CLAUDE.md, you do not have one AI agent. You have five.

Built Log Stripper: A VS Code Extension to Remove Debug Logs Across 23+ Languages

Every developer has a version of this story.

You’re about to open a pull request. The code works. The tests pass. Everything looks good.

Then you do one final scan and find this:

console.log("HERE");
console.log("user:", user);
console.log("final:", result);

And then, before the PR:

Search → Delete. Search → Delete. Search → Delete.

I got tired of it. So I built a tool to automate it.

What is Log Stripper?

Log Stripper is a VS Code extension that removes debug, log, and print statements from 23+ programming languages – with a preview so you always see what will be deleted before anything changes.

🔗 VS Code Marketplace
🔗 GitHub

The Features

🔍 Preview mode (Ctrl+Shift+D)

Shows every line that will be removed. You confirm. Only then does anything change.

🎨 Highlight mode (Ctrl+Shift+H)

Marks debug lines in red. No file modification. Review first, strip later.

🌍 23+ languages

JS, TS, Python, Java, Go, Rust, C#, Swift, Kotlin, Dart, Ruby, PHP, C, C++, Shell, Lua, Scala, Elixir, Haskell, R, Vue, Svelte, JSX/TSX

🏢 Workspace cleanup

Strip debug statements from an entire project in one command.

Why not just use grep/sed?

You could. But:

  • No preview
  • No multiline support
  • No VS Code integration
  • No safety rules
  • Works differently on Windows vs Mac vs Linux

Log Stripper handles all of this inside VS Code with a consistent UX.

The Interesting Technical Part: Multiline Removal

Removing a single-line console.log("x") is trivial. The interesting case is:

console.log(
  "user:",
  JSON.stringify(user, null, 2),
  "role:",
  user.role
);

To remove this correctly, you need to:

  1. Find the opening (
  2. Track paren depth across lines
  3. Handle parens inside strings (don’t count those)
  4. Find the exact closing ) and optional ;
  5. Remove the entire block without touching surrounding code
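
Here’s the core of skipParenBlock:

// Scans forward from the first "(" on startLine and returns the position just
// past the matching ")" (plus an optional ";"), or null if the parens never balance.
function skipParenBlock(
  lines: string[],
  startLine: number
): { endLine: number; endCol: number } | null {
  const openIdx = lines[startLine].indexOf("(");
  if (openIdx === -1) return null; // no opening paren on the start line
  let bal = 1, li = startLine, ci = openIdx + 1;
  let inStr: string | null = null; // quote character of the string we are inside, if any

  while (li < lines.length) {
    const s = lines[li];
    while (ci < s.length) {
      const ch = s[ci];
      if (inStr) {
        if (ch === "\\") { ci += 2; continue; } // skip the escaped character
        if (ch === inStr) inStr = null;          // closing quote
      } else if (ch === '"' || ch === "'" || ch === "`") {
        inStr = ch;
      } else if (ch === "(") {
        bal++;
      } else if (ch === ")") {
        bal--;
        if (bal === 0) {
          ci++;
          if (s[ci] === ";") ci++; // consume an optional trailing semicolon
          return { endLine: li, endCol: ci };
        }
      }
      ci++;
    }
    li++; ci = 0;
  }
  return null; // unbalanced: no closing paren found
}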

Architecture Decision

I separated the core logic from the VS Code layer:

src/
├── extension.ts   ← VS Code commands, UI, workspace
└── stripper.ts    ← Pure logic, zero VS Code deps

Why? Because stripper.ts can be tested without launching a VS Code host:

node --test out-test/test/stripper.test.js

151 tests. Fast. No mocking needed.

Safety Rules

The extension will never remove:

// console.log("commented")                        ← comment line, always kept
return console.log(x);                             ← inline code, always kept
const x = doThing() || console.log("fallback");    ← inline, kept

Only whole-statement debug calls that are the entire line (after indentation) are removed.

The Languages Covered

Language | Examples
JS/TS | console.log, debugger
Python | print(), logging.debug(), breakpoint()
Java | System.out.println, logger.debug()
Go | fmt.Println, log.Fatal
Rust | println!, dbg!, eprintln!
C# | Console.WriteLine, Debug.Write
Swift | print(), NSLog()
Ruby | puts, binding.pry
PHP | var_dump(), dd()
Shell | echo, printf
…13 more

Try It

# Install from Marketplace
code --install-extension saurabhchoudhary.log-stripper

Or search “Log Stripper” in the VS Code extensions panel.

Open a file with debug statements → Ctrl+Shift+D → see the preview → confirm.

  • Ctrl+Shift+H | Toggle Debug Highlights | Highlight/unhighlight debug lines
  • Ctrl+Shift+D | Preview & Strip Current File | Shows preview, then strips on confirm
  • Command Palette | Strip Entire Workspace | Clean whole project
  • Right-click → menu | Strip Current File | Strip without preview
  • Editor toolbar icons (eye / eye-closed / trash)

Built this because I couldn’t find a tool that did all of this properly. Turned out to be one of the most satisfying weekend projects I’ve done.

If you try it, I’d love feedback in the comments or as a GitHub issue. 🙌

By Saurabh Choudhary – Software Engineer

Introduction

Every developer has a version of this story.

You’re about to open a pull request. The code works. The tests pass. You’re ready. Then you do one last scroll through the file and see it:

console.log("HERE");
console.log("user data:", JSON.stringify(userData));
console.log("response", res);

Three debug logs you forgot to remove. You delete them, re-run the linter, and open the PR.

Two weeks later, someone reports that production logs are noisy with debug output. A developer on your team had left a logger.debug() in a hot path. Nobody caught it in review.

This isn’t a rare occurrence. It’s a pattern that plays out in every team, on every codebase, in every programming language.

I’ve experienced it across JavaScript, TypeScript, PHP, Go, Python, Vue.js, NestJS, Laravel, and Angular projects. Different languages, different ecosystems – same repetitive problem.

Eventually I stopped accepting it as “just part of the workflow” and decided to build a proper solution.

That solution is Log Stripper – a VS Code extension that removes debug, log, and print statements from 23+ programming languages with a safety-first, preview-driven approach.

The Problem

The problem has two dimensions that are easy to underestimate.

First: it’s repetitive. Before every commit, before every PR, before every release, developers go through the same manual process of searching for and deleting debug statements. On large codebases with dozens of files touched in a single feature branch, this becomes genuinely time-consuming.

Second: it’s error-prone. Humans miss things. A console.log inside a nested callback, a print() inside a conditional branch, a System.out.println buried in a Java service – these are easy to overlook when scanning files manually. And when they slip through, they end up in production.

There’s also a subtler issue: developers working under time pressure will rush the cleanup. The more pressure, the more likely something gets missed.

This is exactly the kind of task that should be automated. It is deterministic, repetitive, and rule-based. A computer should do it.

Why Existing Workflows Were Frustrating

Before building Log Stripper, I searched for existing solutions.

I found a few extensions that handled console.log removal for JavaScript. Some worked well for that specific case. But they had significant limitations:

Language coverage was narrow. Most extensions focused exclusively on JavaScript. If you wrote Python, Go, Java, or Rust, you were on your own.

No preview mode. Several extensions deleted statements immediately without showing you what would be removed. That’s unsafe. A developer who accidentally removes a legitimate logger call in production code will stop trusting the tool immediately.

No highlight mode. Sometimes I want to see the debug statements in a file without removing anything yet. I want to review them, understand them, decide manually. None of the tools I found offered non-destructive inspection.

No workspace cleanup. File-by-file cleanup doesn’t scale. Before a major release, I want to scan an entire project and clean everything in one pass.

Unreliable multiline handling. A console.log that spans multiple lines is common:

console.log(
  "processing user:",
  JSON.stringify(user, null, 2)
);

Many tools would only remove the first line and leave the rest, creating syntax errors.

Why I Built Log Stripper

The decision to build rather than adapt came down to one realization: the problem deserved a proper solution, not a workaround.

I wanted an extension that:

  1. Worked across every major language I use day-to-day
  2. Showed a preview before making any changes
  3. Could highlight debug lines without modifying files
  4. Could clean an entire workspace safely
  5. Was smart enough to never break code it shouldn’t touch
  6. Was backed by automated tests

None of those requirements are unreasonable. Together, they’re the minimum bar for a tool I’d actually trust.

The weekend I started building it, I intended it to be a one-day project. It grew into something more substantial as I realized the edge cases, the safety requirements, and the value of doing it properly.

Architecture

The most important architectural decision I made was separating the core logic from the VS Code integration layer.

The repository has two source files:

src/
├── extension.ts    ← VS Code commands, UI, workspace operations
└── stripper.ts     ← Core strip logic (zero VS Code dependencies)

stripper.ts has no awareness of VS Code. It takes a string of text and a language ID, and returns a new string with debug statements removed plus metadata about what was changed. That’s it.

This separation has two major benefits:

Testability. I can test the core logic without launching a VS Code Extension Development Host. The 151 automated tests run with a plain node --test command. Fast, simple, reliable.

Maintainability. If VS Code changes its API, or if I want to port the logic to a CLI tool or a git pre-commit hook, the core logic doesn’t need to change.
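
As a rough illustration of that boundary, the sketch below shows what a VS Code-free entry point could look like. The StripResult shape, the stripDebug name, and the single hard-coded pattern are assumptions for illustration; the post does not show the real signatures.

```typescript
// Hypothetical result shape: the stripped text plus metadata about what changed.
interface StripResult {
  text: string;           // source with debug lines removed
  removedCount: number;   // how many lines were dropped
  removedLines: number[]; // 1-based line numbers in the original text
}

// Toy single-line stripper. Note there is no `vscode` import anywhere,
// which is what lets plain `node --test` exercise it.
function stripDebug(text: string, languageId: string): StripResult {
  const patterns: Record<string, RegExp[]> = {
    javascript: [/^\s*console\.log\s*\(.*\)\s*;?\s*$/],
  };
  const active = patterns[languageId] ?? []; // unknown language: nothing to match
  const kept: string[] = [];
  const removedLines: number[] = [];
  text.split("\n").forEach((line, i) => {
    if (active.some((re) => re.test(line))) {
      removedLines.push(i + 1);
    } else {
      kept.push(line);
    }
  });
  return { text: kept.join("\n"), removedCount: removedLines.length, removedLines };
}

const sample = 'const a = 1;\nconsole.log("HERE");\nreturn a;';
console.log(stripDebug(sample, "javascript"));
```

Because the function is a plain string-in, result-out transformation, the same code could sit behind a CLI or a pre-commit hook without modification.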

The VS Code layer handles everything user-facing: commands, keybindings, settings, decorations, progress notifications, and workspace file scanning.

Multi-Language Support

Supporting 23 languages sounds ambitious. The implementation is actually straightforward once you have the right architecture.

Each language is a key in the LANGUAGE_PATTERNS object, and its value is an array of regular expressions:

const LANGUAGE_PATTERNS: Record<string, RegExp[]> = {
  javascript: [
    /^(\s*)console\.(log|debug|info|warn|error|trace|...)\s*\(/,
    /^(\s*)debugger\s*;?$/,
  ],
  python: [
    /^(\s*)print\s*\(/,
    /^(\s*)logging\.(debug|info|warning|error|critical)\s*\(/,
    /^(\s*)breakpoint\s*\(\s*\)/,
  ],
  // ... 21 more languages
};

Language aliases handle cases where VS Code’s language ID differs from the pattern group:

const LANG_ALIASES: Record<string, string> = {
  svelte: "javascript",
  "vue-html": "vue",
  typescriptreact: "typescript",
};

When the extension receives a file, it looks up the language ID, finds the pattern list, and applies each pattern against every line. If no patterns are found for a language, the file is returned unchanged.

Adding a new language is a matter of adding one entry to LANGUAGE_PATTERNS and a set of test cases to stripper.test.ts. The infrastructure handles the rest.
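
The lookup described above can be sketched as follows. This is an assumed, simplified version: PATTERNS and ALIASES are cut-down stand-ins for the real tables, and patternsFor and matchingLineIndices are hypothetical names.

```typescript
// Cut-down stand-ins for LANGUAGE_PATTERNS and LANG_ALIASES.
const PATTERNS: Record<string, RegExp[]> = {
  javascript: [/^(\s*)console\.(log|debug)\s*\(/, /^(\s*)debugger\s*;?$/],
};
const ALIASES: Record<string, string> = { svelte: "javascript" };

// Resolve an alias first, then fall back to the raw language ID.
// Unknown languages get an empty pattern list, so the file is left unchanged.
function patternsFor(languageId: string): RegExp[] {
  const key = ALIASES[languageId] ?? languageId;
  return PATTERNS[key] ?? [];
}

// Returns the 0-based indices of lines matching any pattern for the language.
function matchingLineIndices(text: string, languageId: string): number[] {
  const patterns = patternsFor(languageId);
  const hits: number[] = [];
  text.split("\n").forEach((line, i) => {
    if (patterns.some((re) => re.test(line))) hits.push(i);
  });
  return hits;
}

const source = "let x = 1;\n  console.log(x);\ndebugger;";
console.log(matchingLineIndices(source, "svelte")); // alias resolves to javascript
console.log(matchingLineIndices(source, "cobol"));  // unknown language: no matches
```

The patterns carry no g flag, so calling test() repeatedly on the same RegExp object stays stateless.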

Preview Mode

The preview mode was the feature I was most deliberate about designing.

When a developer triggers “Preview & Strip” with Ctrl+Shift+D, the extension:

  1. Runs the stripping logic internally (without modifying the file)
  2. Collects the list of lines that would be removed
  3. Shows a modal dialog listing those lines with their line numbers
  4. Waits for explicit confirmation before making any changes

This turns a potentially destructive operation into a fully transparent, opt-in action. The developer sees exactly what will happen and retains full control.

The modal includes the first 20 matching lines and appends “…and N more” if there are additional matches. This keeps the dialog readable on files with heavy debug coverage.

If the developer cancels, nothing changes. Not a single character in the file is modified.
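
The truncation behaviour is easy to sketch as a pure function. The following assumes a hypothetical PreviewEntry shape and formatPreview helper; only the 20-line limit and the “…and N more” suffix come from the description above.

```typescript
// Hypothetical shape for one line that would be removed.
interface PreviewEntry {
  lineNumber: number; // 1-based line number in the file
  text: string;       // the original line
}

// Builds the dialog body: at most `limit` entries, then an "…and N more" line.
function formatPreview(entries: PreviewEntry[], limit = 20): string {
  const shown = entries
    .slice(0, limit)
    .map((e) => `Line ${e.lineNumber}: ${e.text.trim()}`);
  const extra = entries.length - shown.length;
  if (extra > 0) shown.push(`…and ${extra} more`);
  return shown.join("\n");
}

// 25 matches: the first 20 are listed, the remaining 5 are summarised.
const manyEntries: PreviewEntry[] = Array.from({ length: 25 }, (_, i) => ({
  lineNumber: i + 1,
  text: `  console.log(${i});`,
}));
console.log(formatPreview(manyEntries));
```

Keeping the formatting separate from the dialog call means the truncation rule can be unit-tested without a VS Code host.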

Highlight Mode

Highlight mode solves a different need: I want to see the debug statements but I’m not ready to remove them yet.

When the developer presses Ctrl+Shift+H, the extension:

  1. Finds all lines matching debug patterns using findDebugLineIndices()
  2. Applies a VS Code text decoration that highlights those lines in red
  3. Adds an inline annotation “← debug” at the end of each line

The file is not modified. This is purely a visual overlay.

Pressing Ctrl+Shift+H again (or clicking the eye-closed icon in the toolbar) removes the highlights.

The highlight controller also handles three lifecycle events automatically:

  • When you strip the file, highlights are cleared (no red lines on lines that no longer exist)
  • When you edit the file and refreshHighlightsOnEdit is enabled, highlights recompute after a 120ms debounce
  • When you switch tabs and clearHighlightsOnTabChange is enabled, highlights are cleared automatically
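
The debounce on edit can be sketched with a plain timer. This is a generic debounce, not the extension's actual highlight controller; the debounce helper and refreshHighlights callback are hypothetical, and only the 120 ms delay comes from the text above.

```typescript
// Generic debounce: only the last call within `waitMs` actually runs.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer); // cancel the pending run
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

let recomputeCount = 0;
// Stand-in for "recompute the red highlights for the active editor".
const refreshHighlights = debounce(() => {
  recomputeCount++;
}, 120);

// Three rapid edits schedule a single recompute roughly 120 ms after the last one.
refreshHighlights();
refreshHighlights();
refreshHighlights();
```

Without the debounce, every keystroke would rerun the pattern scan over the whole file.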

Workspace Cleanup

The workspace strip command scans every file in the project matching the supported language extensions and strips debug statements from each one.

The implementation uses VS Code’s workspace.findFiles() API with exclusion patterns to skip node_modules, dist, .git, vendor, build, and out by default. All exclusion patterns are configurable in settings.

The operation runs with a progress notification showing the current file being processed and a cancellation option. After completion, a summary shows how many statements were removed across how many files.

Because this modifies files on disk, the documentation explicitly recommends using version control. A git diff after the workspace strip is always informative.
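
One plausible way to turn the default exclusions into a single exclude glob is shown below. buildExcludeGlob is a hypothetical helper; the assumption is that the resulting string is passed as the exclude argument of workspace.findFiles(), which accepts brace-grouped glob patterns.

```typescript
// Default directories skipped by the workspace strip, as listed above.
const DEFAULT_EXCLUDED_DIRS = ["node_modules", "dist", ".git", "vendor", "build", "out"];

// Builds one glob such as "{**/node_modules/**,**/dist/**}" from directory names.
function buildExcludeGlob(dirs: string[]): string {
  const parts = dirs.map((d) => `**/${d}/**`);
  // A single directory needs no brace group; an empty list excludes nothing.
  if (parts.length <= 1) return parts[0] ?? "";
  return `{${parts.join(",")}}`;
}

console.log(buildExcludeGlob(DEFAULT_EXCLUDED_DIRS));
```

Building the glob from a list keeps the user-configurable setting a simple array of directory names.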

Safety Design

Safety was a first-class concern throughout the design.

The extension enforces several rules to avoid breaking code it shouldn’t touch:

Rule 1: Whole-line only. A debug call is only removed if it is the entire statement on the line (after indentation). This prevents removal of:

return console.log(x);         // kept - inline with return
const result = doWork() || console.log("fallback");  // kept - chained

Rule 2: Comments are never touched. Any line where the first non-whitespace characters form a comment prefix (//, #, --, /*, *) is skipped entirely.

Rule 3: Multiline paren balancing. When a debug call spans multiple lines, the extension tracks open and close parentheses (accounting for parens inside strings) to identify the exact end of the statement. The closing ) and optional ; are consumed, and any remaining code on that final line is preserved.

Rule 4: No modification on cancel. Preview mode never touches the file unless the user explicitly confirms. The stripping logic runs on an in-memory copy of the text, and the result is only written back if the user says yes.
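
Rules 1 and 2 can be approximated with two small checks. This is a simplified sketch, not the extension's code: isCommentLine and isRemovableLine are hypothetical names, and the single console.log pattern stands in for the per-language pattern list.

```typescript
// Comment prefixes named in Rule 2.
const COMMENT_PREFIXES = ["//", "#", "--", "/*", "*"];

// Rule 2: a line whose first non-whitespace characters form a comment prefix is skipped.
function isCommentLine(line: string): boolean {
  const trimmed = line.trimStart();
  return COMMENT_PREFIXES.some((prefix) => trimmed.startsWith(prefix));
}

// Rule 1: the pattern is anchored at line start, so the debug call must be the
// first thing on the line after indentation.
function isRemovableLine(line: string, pattern: RegExp): boolean {
  return !isCommentLine(line) && pattern.test(line);
}

const consoleCall = /^\s*console\.log\s*\(/;

const candidates = [
  '  console.log("x");',
  '// console.log("commented")',
  "return console.log(x);",
  'const result = doWork() || console.log("fallback");',
];
for (const line of candidates) {
  console.log(isRemovableLine(line, consoleCall), line);
}
```

Only the first candidate is reported as removable; the comment and the two inline uses are kept.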

Testing Strategy

I wanted the extension to be trustworthy. That required a proper test suite.

The tests live in test/stripper.test.ts and use Node’s built-in test runner – no Jest, no Mocha, no additional dependencies. The test command is:

tsc -p tsconfig.test.json && node --test out-test/test/stripper.test.js

The 151 test cases cover:

  • Every language in LANGUAGE_PATTERNS
  • Simple single-line removal
  • Multiline call removal with paren balancing
  • Comment preservation
  • Inline code preservation
  • Real-world code samples (NestJS controllers, Python classes, Go functions)
  • Language aliases (Vue, Svelte, JSX, TSX)
  • Unknown languages (should be a no-op)
  • The findDebugLineIndices() function (used by highlight mode)

Each test specifies the input code, the expected output, and the expected removal count. Any test that would cause the extension to modify code it shouldn’t – or fail to modify code it should – is a failing test.

Writing tests before polishing features was the right decision. It caught several edge cases during development and gave me confidence when refactoring the paren-balancing logic.

Publishing to the VS Code Marketplace

The Marketplace publishing process is simpler than I expected.

The key steps:

  1. Create a publisher account at marketplace.visualstudio.com
  2. Generate a Personal Access Token from Azure DevOps with Marketplace → Manage scope
  3. Install @vscode/vsce globally: npm install -g @vscode/vsce
  4. Run vsce package to create the .vsix file
  5. Run vsce publish --pat TOKEN or upload the .vsix manually via the publisher portal

The .vscodeignore file controls what goes into the package. I excluded src/, test/, and node_modules/ – only the compiled out/ directory ships. This keeps the extension lightweight.

One thing I’d emphasize: set the repository.url in package.json before publishing. It shows up prominently on the Marketplace listing and signals to users that the extension is open source and maintainable.

Lessons Learned

Separate your core logic from your integration layer. This applies far beyond VS Code extensions. Any time you can isolate the testable business logic from the framework or platform code, do it. It makes everything easier.

Safety features build trust. The preview mode and the “never remove inline code” rule weren’t technically necessary for the extension to work. But they’re what make a developer comfortable trusting the tool with real production code.

Test the edge cases first. Multiline calls, strings containing parens, empty files, unsupported languages – these were the cases most likely to cause silent failures. Writing tests for them before the happy path forced me to build robust logic from the start.

Ship before it’s perfect. The first version didn’t have highlight mode. The workspace strip was added later. Shipping v1 with the core feature – preview and strip – got real feedback faster than spending another two weeks on features users might not need.

Small tools have real value. This isn’t a SaaS platform. It’s not an AI product. It’s a focused tool that does one thing well. Developer tools like this get used quietly but consistently. That’s enough.

Future Improvements

Several improvements are on the roadmap:

Custom pattern rules. Allow developers to define their own patterns to remove. For teams using custom logging frameworks, this would make Log Stripper work with any codebase.

Pre-commit hook integration. A script or CLI wrapper that can run as a git pre-commit hook, catching debug statements before they ever reach a commit.

Team-wide configuration. A .logstripper.json configuration file that defines excluded patterns per project, shareable across a team through version control.

Ignore comments. A // log-stripper-ignore annotation that tells the extension to skip a specific line or block.

More language coverage. Zig, Nim, OCaml, and Erlang are candidates for future additions.

Conclusion

Log Stripper is a small tool. It solves one problem, and it solves it well.

But the process of building it taught me something more valuable than any individual feature: the instinct to look at a repetitive problem and ask “why am I doing this manually?” is one of the most productive instincts an engineer can develop.

We accept small frictions as part of the job. We work around limitations instead of fixing them. We solve the same problem repeatedly instead of automating it once.

Log Stripper exists because I stopped accepting one of those frictions.

If you’re reading this and you have something similar – a small repetitive task, an annoying workflow step, a gap in your tooling – build the solution. The process will teach you more than you expect, and the tool will serve you for years.

Try Log Stripper:

VS Code Marketplace: https://marketplace.visualstudio.com/items?itemName=saurabhchoudhary.log-stripper

GitHub: https://github.com/saurabhzaiswal/Log-Stripper-VS-Code

I’m actively improving Log Stripper, so real-world feedback is incredibly valuable.

If Log Stripper helps keep a few forgotten debug statements out of production, then it has already paid for itself.

One-Second BLE Pairing: UX and Security Best Practices

  • Why the One-Second Pair Is the UX North Star
  • Choosing Pairing Modes with Speed and Security in Mind
  • Advertising and Scanning Patterns for Instant Discovery
  • Bonding, Reconnection, and Key Management
  • Handling Pairing Failures and User Recovery
  • Practical Checklist for One-Second Pairing

A one-second BLE pairing is not marketing fluff — it’s a systems design constraint. Delivering that blink-fast experience requires synchronizing advertising duty cycle, the selected pairing method, the OS scanner heuristics, and how keys are stored and resolved.

Devices that miss the one-second target show the same symptoms: frustrated users tapping “retry”, poor conversion on first use, and support tickets asking why setup takes so long. You’re seeing long discover times, repeated OS permission dialogs, or pairing stalls where encryption never completes — all of which typically point to mismatched radio schedules or an inappropriate pairing method for the device’s I/O capabilities.

Why the One-Second Pair Is the UX North Star

A fast pairing is the single interaction users remember. When pairing takes seconds rather than milliseconds the product feels unreliable; when it’s instant it feels invisible. For many consumer products the practical goal is to make the first-connect flow complete during the time a user has the phone in hand and attention focused — roughly one second. This means you must budget the sequence: discovery → connect → security handshake → service discovery, and tune each stage to shave milliseconds wherever possible.

  • Fast discovery only happens when the peripheral advertises aggressively while the phone actively scans with low-latency settings. The Android Fast Pair workstream demonstrates how OS-level orchestration and special BLE advertisements can dramatically reduce UI friction for first-time pairing and account association.
  • Security choice dominates the CPU/latency budget: LE Secure Connections uses P‑256 (ECDH) for authenticated key exchange and is cryptographically stronger than legacy pairing, but it consumes CPU and therefore time on constrained MCUs. Use the Bluetooth Security Manager specification as the reference for methods and their guarantees.
  • Advertising intervals and duty-cycle strategies are the practical lever you control in firmware; BLE profiles such as the Heart Rate Profile provide recommended fast/slow advertising cadence patterns (e.g., short aggressive burst windows followed by a long low-power period). Use those patterns as starting points for consumer-facing fast-pair flows.

Choosing Pairing Modes with Speed and Security in Mind

You need a decision framework rather than a single “best” method. Pairing modes trade user friction against MITM protection and CPU cost. The Bluetooth Security Manager enumerates the methods you can use (Just Works, Passkey Entry, Numeric Comparison, OOB) and clarifies which provide MITM protection.

Pairing Method | MITM protection? | User friction | Speed (typical) | Recommended when
Just Works | No | None | Fast | Headless sensors, initial quick-demo; only if threat model allows
Passkey Entry / Passkey Display | Yes | Medium (user types or reads) | Moderate | Devices with keypad or display
Numeric Comparison | Yes | Low–Medium (user taps confirm) | Moderate | Devices with simple display + phone UI
Out-of-Band (OOB) | Yes (strong) | Variable (requires external channel) | Fast (if OOB already available) | Paired ecosystems or secure provisioning

Concrete rules-of-thumb you can apply:

  • When the device has no input and no display, Just Works is the only practical initial option; mitigate risk by restricting services until a UX consent step happens in-app.
  • When the device can show a 6-digit code or accept a code, use passkey pairing for authenticated MITM protection when practical. The security properties are defined in the Security Manager.
  • Use OOB (NFC, QR provisioning) when you can — it moves the authentication off-air and can be fast and secure for first-time setup, but requires additional hardware and process changes.

Decision-tree pseudo-code (use this in firmware/product docs and as the basis for acceptance tests):

// Pseudocode: pairing_mode_select()
if (has_display && phone_ui_supports_numeric_comparison) {
    return NUMERIC_COMPARISON;
} else if (has_input_or_keypad && can_enter_passkey) {
    return PASSKEY_ENTRY;
} else if (oob_channel_available) {
    return OOB;
} else {
    return JUST_WORKS; // fallback, reduce exposed services until app consent
}

See the Bluetooth Security Manager specification for the exact guarantees and trade-offs of each pairing method.

Advertising and Scanning Patterns for Instant Discovery

Discovery is an on-air scheduling problem. Treat advertising as a budgeted resource: high duty cycle for roughly the first 30 seconds, then back off. The Heart Rate Profile recommends an initial advertising interval of 20–30 ms for the first 30 seconds and then a longer interval (1–2.5 s) to conserve battery. Use that two-phase pattern as your baseline for first-use UX.

Practical advertising primitives and how to use them:

  • Use connectable undirected advertising for first-time pairing; switch to directed advertising when reconnecting to a known central to get deterministic, near-instant reconnection. The Link Layer/GAP defines directed advertising and how the TargetA field lets you address a known peer using RPAs or identity addresses.
  • Keep advertising packets small and focused: include only the minimum AD fields required for discovery: Service UUID, short local name (if needed), and optionally the Tx Power Level AD field (AD Type 0x0A) to enable proximity heuristics on the phone.
  • For Android, prefer ScanSettings with SCAN_MODE_LOW_LATENCY and apply a ScanFilter for your service UUID so the OS spends fewer cycles and reports results immediately. The Android BLE guide documents these APIs and explains background vs foreground scanning behavior.
  • For iOS, use scanForPeripherals(withServices:options:) and be aware background scanning behaves differently — CBCentralManagerScanOptionAllowDuplicatesKey is ignored in background and the OS coalesces discovery events to preserve battery. Use service-filtered scans and state restoration for reliable reacquisition.

Example: peripheral advertising pattern (pseudo-C for Zephyr / Nordic SDK)

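/* aggressive advertising for initial pairing */
/* intervals are in units of 0.625 ms: 0x0020 = 32 units = 20 ms, 0x0030 = 48 units = 30 ms */
const struct bt_le_adv_param *adv_fast = BT_LE_ADV_PARAM(
    BT_LE_ADV_OPT_CONNECTABLE,  // connectable undirected; option name varies across Zephyr versions
    0x0020, // 20 ms lower bound
    0x0030, // 30 ms upper bound
    NULL    // no peer address: undirected advertising
);
/* RPA use is configured separately (CONFIG_BT_PRIVACY in Zephyr) */

bt_le_adv_start(adv_fast, ad, ARRAY_SIZE(ad), sd, ARRAY_SIZE(sd));
/* after timeout, switch to slow adv: 1s - 2.5s */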

Example: Android Kotlin scanner snippet (simplified)

val filter = ScanFilter.Builder()
    .setServiceUuid(ParcelUuid(UUID.fromString("0000feed-0000-1000-8000-00805f9b34fb")))
    .build()

val settings = ScanSettings.Builder()
    .setScanMode(ScanSettings.SCAN_MODE_LOW_LATENCY)
    .build()

bluetoothLeScanner.startScan(listOf(filter), settings, scanCallback)

On iOS, set CBCentralManagerScanOptionAllowDuplicatesKey only in the foreground and only when you need continuous RSSI updates or dynamic advertising data; avoid it otherwise, because duplicate callbacks cost CPU and power.

Important: Directed advertising for bonded peers gives the fastest reconnection but consumes controller/airtime and should only be enabled briefly when you expect an immediate reconnect. The Link Layer supports high- and low-duty-cycle directed adv modes; prefer low-duty-cycle unless low-latency reconnection is essential.

Bonding, Reconnection, and Key Management

Bonding is what makes the one-second reconnect possible. The security manager defines the keys exchanged during pairing: the Long Term Key (LTK), Identity Resolving Key (IRK), and optional CSRK. The LTK enables encrypted reconnects; the IRK enables resolvable private addresses (RPA) so devices can preserve privacy while still recognizing each other.

Operational checklist you must implement in firmware:

  • After a successful pairing that results in bonding, add the peer’s identity address and IRK to the Controller’s resolving list and (optionally) add the peer to the controller white list (called the Filter Accept List in current Core Specification versions) so the controller can resolve RPAs and filter events without waking the host, which reduces host wakeups and power. The LTK is not part of the resolving list; it stays in the host’s key store and is supplied when encryption starts.
  • Securely persist keys in protected flash with checksums and versioning. Corruption or an interrupted write must not leave the device with a partially valid bond: provide atomic updates or a fallback staging area.
  • Implement a deterministic bond eviction policy (LRU or oldest-bond) and expose a clear OTA/maintenance path for handling exhausted bond storage on devices with limited NVM.
  • Protect LTKs and IRKs with hardware-backed crypto or secure enclaves when available; do not send keys to cloud backup unless you have a robust threat model and clear user consent.
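The persistence and eviction items in the checklist can be prototyped off-target before committing to a flash layout. The sketch below is a minimal model, assuming a record format of version byte, 16-byte LTK, 16-byte IRK, and a CRC32 trailer, with least-recently-used eviction. The record layout, the capacity of 4, and all names are assumptions for illustration; real firmware would also need the atomic staging write described above, which this in-memory model does not cover.

```python
import struct
import zlib
from collections import OrderedDict

RECORD_VERSION = 1
RECORD_SIZE = 37   # 1 (version) + 16 (LTK) + 16 (IRK) + 4 (CRC32)
MAX_BONDS = 4      # assumed capacity


def encode_bond(ltk: bytes, irk: bytes) -> bytes:
    """Serialize one bond as version | LTK | IRK | CRC32."""
    body = struct.pack("<B16s16s", RECORD_VERSION, ltk, irk)
    return body + struct.pack("<I", zlib.crc32(body))


def decode_bond(record: bytes):
    """Return (ltk, irk), or None if the record is torn, corrupt, or another version."""
    if len(record) != RECORD_SIZE:
        return None
    body = record[:-4]
    (crc,) = struct.unpack("<I", record[-4:])
    if zlib.crc32(body) != crc:
        return None
    version, ltk, irk = struct.unpack("<B16s16s", body)
    return (ltk, irk) if version == RECORD_VERSION else None


class BondStore:
    """In-memory stand-in for the NVM bond table with LRU eviction."""

    def __init__(self, capacity: int = MAX_BONDS):
        self.capacity = capacity
        self.records = OrderedDict()  # peer identity address -> encoded record

    def add(self, peer: str, ltk: bytes, irk: bytes):
        """Store a bond and return the evicted peer address, if any."""
        evicted = None
        if peer not in self.records and len(self.records) >= self.capacity:
            evicted, _ = self.records.popitem(last=False)  # least recently used
        self.records[peer] = encode_bond(ltk, irk)
        self.records.move_to_end(peer)
        return evicted

    def lookup(self, peer: str):
        record = self.records.get(peer)
        keys = decode_bond(record) if record is not None else None
        if keys is not None:
            self.records.move_to_end(peer)  # mark as recently used
        return keys


store = BondStore(capacity=2)
store.add("AA:AA", b"\x01" * 16, b"\x02" * 16)
store.add("BB:BB", b"\x03" * 16, b"\x04" * 16)
store.lookup("AA:AA")                                   # AA:AA becomes most recent
evicted = store.add("CC:CC", b"\x05" * 16, b"\x06" * 16)
print("evicted:", evicted)
```

Returning the evicted address from `add` gives the caller a hook to remove the same peer from the controller's resolving list, so the two stay in sync.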

How reconnection typically works:

  1. Central starts scanning (often filtered for service UUID).
  2. Peripheral advertises using an RPA; the controller resolves it using the resolving list (if populated), then the controller/host applies the white list policy and accepts the connection.
  3. On a reconnect, the central starts encryption and passes EDIV and Rand so the peripheral can look up the correct LTK and resume encryption without re-pairing. With LE Secure Connections both values are zero, and the peripheral selects the LTK by the peer’s identity instead.

Keep an eye on the IRK lifecycle: if a device is reset or a bond is erased on one side, the other peer will have stale entries in its resolving list. Design the mobile app and device to handle this gracefully (clear stale entries or re-establish the bond). Recent Bluetooth work also encourages randomized RPA update strategies that move address randomization into the controller for power and privacy benefits; follow the Core 6.x guidance for controller-offloaded RPA updates if your controller supports it.

Handling Pairing Failures and User Recovery

Pairing failures happen for a small set of repeatable reasons: MITM detected, incompatible IO capabilities, key mismatch after reset, or OS-level permission issues. The Security Manager defines Pairing Failed messages with error codes you can use to diagnose problems.

A robust recovery flow (embed this as telemetry events and a troubleshooting UI step):

  1. Detect and log the Pairing Failed error code and increment a per-device failure counter.
  2. On the mobile app, show a single concise instruction: “Put the device into pairing mode (hold X for Y seconds) — reconnecting will be automatic.” Avoid verbose security explanations. Use visuals; people scan for an instruction and the timer.
  3. If the device fails to respond after N attempts, trigger a bond reset option: this should clear the device’s local keys and the host-side bond (present “Forget this device” pattern). Make the reset action explicit and protected (long press / hardware button) so it’s not accidentally triggered.
  4. If automatic reconnection fails because of an RPA/IRK mismatch (common after factory reset of the peripheral), have the mobile app attempt a fresh discovery (no white-list) and present a guided re-pair flow; include a “factory reset” fallback path if necessary.
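The recovery flow above maps naturally onto a small decision function. The sketch below is one possible shape for it on the app side, assuming a threshold of three attempts for "N" and hypothetical action names. The reason codes are a subset of the Pairing Failed codes defined by the Security Manager.

```python
from collections import defaultdict

# Subset of SMP Pairing Failed reason codes (Core Specification, Vol 3, Part H).
PAIRING_FAILED_REASONS = {
    0x01: "Passkey Entry Failed",
    0x03: "Authentication Requirements",
    0x04: "Confirm Value Failed",
    0x05: "Pairing Not Supported",
    0x08: "Unspecified Reason",
    0x09: "Repeated Attempts",
    0x0B: "DHKey Check Failed",
    0x0C: "Numeric Comparison Failed",
}

MAX_ATTEMPTS = 3  # the "N" in step 3; an assumed value


class PairingRecovery:
    def __init__(self, max_attempts: int = MAX_ATTEMPTS):
        self.max_attempts = max_attempts
        self.failures = defaultdict(int)  # per-device failure counter (step 1)
        self.log = []                     # telemetry events

    def on_pairing_failed(self, device_id: str, reason: int,
                          irk_mismatch: bool = False) -> str:
        """Record the failure and return the next UI action."""
        self.failures[device_id] += 1
        name = PAIRING_FAILED_REASONS.get(reason, f"Unknown (0x{reason:02X})")
        self.log.append((device_id, reason, name, self.failures[device_id]))
        if irk_mismatch:
            return "rediscover_and_repair"   # step 4: unfiltered scan, guided re-pair
        if self.failures[device_id] >= self.max_attempts:
            return "offer_bond_reset"        # step 3: explicit "Forget this device"
        return "show_pairing_mode_hint"      # step 2: single concise instruction

    def on_pairing_succeeded(self, device_id: str) -> None:
        self.failures.pop(device_id, None)


recovery = PairingRecovery()
actions = [recovery.on_pairing_failed("dev-1", 0x04) for _ in range(3)]
print(actions)
```

Keeping the counter per device prevents one flaky peripheral from pushing every other device into the reset path.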

Diagnostics to report in logs and support tools:

  • HCI/LL events for advertisement reception and resolution success/failure.
  • Pairing Failed code and the IO capability negotiation values.
  • Key store status (number of bonds, last bond timestamp).

Use that data to refine the device’s advertising window, pairing method, or NVM bonding capacity.

Practical Checklist for One-Second Pairing

Below is a deployable checklist you can use in sprint planning, firmware releases, and mobile-app acceptance tests.

Firmware checklist

  • [ ] Implement two advertising modes: fast initial (20–30 ms intervals for ~20–30 s) and slow background.
  • [ ] Support connectable undirected advertising for first-time pairing, and directed connectable advertising for fast reconnects to bonded devices.
  • [ ] On successful bonding: store LTK/IRK atomically, populate the Controller resolving list, and optionally add to the controller white list.
  • [ ] Provide a secure, user-accessible factory-reset method to clear bonds.

Mobile app checklist

  • [ ] Use OS filtering: Android ScanFilter + SCAN_MODE_LOW_LATENCY.
  • [ ] For iOS, scan for specific service UUIDs and implement state preservation/restoration for background reconnections.
  • [ ] Keep the pairing UI focused: one action, visible progress (0–100%), and clear failure text that maps to device hardware steps.
  • [ ] Implement robust “forget device” and “retry pairing” flows in the app with telemetry for failures.

Testing matrix (minimum)

  • First-time pairing: clean phone, clean device.
  • Reconnect after sleep: bonded device reconnects when in range.
  • Reconnect after peripheral reboot: keys present on phone, device restarted.
  • Reconnect after phone factory reset: peripheral must accept new bond.
  • Bond capacity: exceed N bonds and validate eviction policy.
  • RPA resolution tests: verify controller resolves RPAs when resolving list is full vs not full.

Sample acceptance test for “one-second” (practical)

  • Setup: phone screen awake, app in foreground, device 50 cm from phone.
  • Criteria: discovery + connect + secure pairing + service access completes < 1s in 9/10 runs; log distribution to find outliers. Use real-world reference phones, and measure with automated scripts as part of your QA runs. Note: certification testbeds (e.g., Fast Pair validator) have formal pass/fail metrics that can be stricter or different in scope.
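The pass criterion above is simple enough to automate in the QA harness. The following sketch evaluates a batch of end-to-end timings against the "under 1 s in 9 of 10 runs" rule and lists the outliers. The function name is hypothetical and the sample timings are invented for illustration, not measurements.

```python
def evaluate_runs(durations_s, limit_s=1.0, min_pass=9, out_of=10):
    """Check that at least min_pass/out_of of the runs finished under limit_s."""
    if not durations_s:
        raise ValueError("no runs recorded")
    ordered = sorted(durations_s)
    passed = sum(1 for d in ordered if d < limit_s)
    return {
        "runs": len(ordered),
        "passed": passed,
        "worst_s": ordered[-1],
        "outliers": [d for d in ordered if d >= limit_s],
        # Integer comparison avoids floating-point surprises at the threshold.
        "ok": passed * out_of >= min_pass * len(ordered),
    }


# Invented sample: discovery + connect + pairing + service access, in seconds.
runs = [0.62, 0.71, 0.58, 0.93, 0.66, 1.40, 0.74, 0.69, 0.81, 0.77]
report = evaluate_runs(runs)
print(report)
```

Logging the outliers alongside the verdict matters more than the verdict itself, since the slow runs are where scan scheduling or controller issues tend to show up.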

Sources

Bluetooth Core Specification — Part H: Security Manager Specification – Definitions of pairing methods (Just Works, Passkey, Numeric Comparison, OOB), key distribution (LTK, IRK, CSRK), and Pairing Failed semantics used to reason about MITM and key-management trade-offs.

Bluetooth Heart Rate Profile (Profile guidance on advertising intervals) – Practical recommended advertising cadence (e.g., 20–30 ms fast window then slower background intervals) used as a baseline for consumer fast-pair flows.

Bluetooth Core Specification — Generic Access Profile & Link Layer (directed advertising, resolving list) – Rules for directed vs undirected advertising, resolvable private address (RPA) resolution and how the resolving list and target address fields work.

Bluetooth® Technology Blog — Randomized RPA Updates (privacy & controller offload) – Recent guidance on controller-offloaded/resolution and randomized RPA updates that affect privacy and power trade-offs.

Google Fast Pair Service — Introduction & BLE device spec – Fast Pair design and features that show how OS-level integration and a special BLE advertising flow reduce user friction for instant pairing.

Android Developers — Bluetooth Low Energy (BLE) Overview – Official Android guidance for scanners: ScanFilter, ScanSettings (low-latency), and background/foreground scanning behavior referenced for mobile-side orchestration.

Apple Developer — Core Bluetooth Background Processing for iOS Apps (archived) – Official Apple guidance on scanning and advertising differences when apps are in background, duplicate coalescing, and state preservation.

Bluetooth Assigned Numbers — AD Types & Characteristics (Tx Power, Reconnection Address) – AD Type mapping (0x0A = Tx Power Level) and GATT characteristic UUID references (e.g., Reconnection Address) for advertising payload design.

SimpleLink BLE5 Stack — GAP Bond Manager / Resolving List (TI docs) – Practical description of the resolving list and white list semantics and how controller-side lists are maintained for power-efficient reconnection.

Nordic DevZone — scanning/extended advertising discussion (practical Android/extended adv notes) – Field discussion and pointers about extended advertising, Android scanning incompatibilities (legacy vs extended), and practical developer observations when implementing modern advertising schemes.

A one-second pair is an orchestration problem: align your advertising, choose the right pairing method for the device’s I/O, populate the resolving and white lists on the controller, and design the mobile app to scan and connect aggressively only during the initial pairing window. When those pieces run in lockstep, the pairing disappears into the background and your product feels polished.

Toolbox App 3.5: Better Remote Development Observability, More Reliable Enterprise Configuration, and Smoother Everyday Interactions

Toolbox App 3.5 focuses on making daily work smoother and managed development environments easier to monitor. The app now supports interface zooming with familiar shortcuts, provides OpenTelemetry metrics for enterprise remote development connections, and handles several long-standing reliability issues more gracefully.

Remote development observability

The Toolbox App now emits OpenTelemetry metrics for remote development connection latency and reliability. You can send them to Grafana, Datadog, Prometheus, or another OTEL-compatible stack to monitor connection health across your developer fleet.

Zoom controls

You can now zoom the Toolbox App interface using familiar keyboard shortcuts: Cmd/Ctrl + to zoom in, Cmd/Ctrl – to zoom out, and Cmd/Ctrl 0 to reset. The setting persists across restarts, so your preferred zoom level is preserved.

Cleaner update progress

Checking for updates no longer hides behind a generic spinner. You’ll now see what the app is checking, what it’s unpacking, and how far along it is – providing a clearer sense of progress.

Enterprise configuration

For enterprise customers using JetBrains IDE Services, the Toolbox App now sends static and dynamic headers together when communicating with backend services. Header updates are also pushed automatically to running IDEs – no need to restart an IDE to pick up new headers.

Bug fixes

  • IntelliJ-based IDEs no longer randomly disappear from the Toolbox App home view. 
  • Android Studio and other aliased IDEs keep their display name after updates.
  • The taskbar icon on KDE Plasma 6.6 and the tray and app icons on Pop!_OS now appear reliably.

Remote development fixes

  • SSH canonicalization failures no longer abort the connection.
  • The remote development environment list no longer shows an empty page when the canCreateNewEnvironments flag is set.

Download the latest version

We’d love to hear your thoughts on Toolbox App 3.5! Your feedback helps us improve the product, so please share your experience in the comments.

The JetBrains Toolbox App team

Top Agentic Frameworks for Building Applications 2026

In 2026, the world of AI is changing at a serious pace. The days of AI systems dealing solely in single-prompt interactions are coming to an end. Instead, these models are evolving into agentic systems – long-running, goal-driven software enabled by agentic frameworks that are becoming a critical layer in modern application architecture.

This rapid shift means that Python developers building autonomous systems are increasingly relying on agentic frameworks to manage reasoning, memory, tools, and collaboration among multiple agents.

You’ve probably already heard of some of the most popular frameworks. LangChain and AutoGen have risen to prominence, but there are dozens more, many of them open-source and only one to two years old. With so many frameworks promising different agentic capabilities, the real challenge is knowing which ones are best suited for the kind of application you want to build.

Let’s take a closer look at some of the most important agentic frameworks on the market in 2026, comparing what each does best and rating them based on our key comparison criteria to help you discover which is best for your projects.

What are AI agents?

An AI agent is a piece of software capable of autonomously reasoning, setting goals, and performing tasks on behalf of a user or another system. As the name suggests, AI agents have a level of agency to learn, adapt, and make decisions independently. This means they can improve their behavior and, over time, choose their own actions to achieve specific goals or outcomes.

AI agents work by following a perceive, reason, act, reflect (PRAR) cycle, which allows them to:

  • Perceive: Observe the environment, including user input, system state, tools, and memory, to understand the current context and constraints of the task.
  • Reason: Plan, make decisions, and select actions using a large language model (LLM) or hybrid logic.
  • Act: Execute actions like calling tools, updating memory, or triggering workflows.
  • Reflect: Evaluate the outcome of previous actions and adjust future decisions, plans, or prompts to improve results.
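The cycle is easier to see in code than in prose. The toy loop below drives a counter toward a target value, with a simple rule standing in for the LLM in the Reason step. It is a sketch of the control flow only and does not use any framework's API; every name in it is made up for illustration.

```python
def run_agent(goal: int, start: int, max_steps: int = 10):
    """Toy agent that drives a counter toward `goal` using a PRAR loop."""
    state = {"value": start}   # the environment
    memory = []                # outcomes of earlier steps
    step_size = 4              # current plan: how far to move per action

    for _ in range(max_steps):
        # Perceive: read the environment and compare it with the goal.
        gap = goal - state["value"]
        if gap == 0:
            break
        # Reason: choose an action. A real agent would call an LLM here.
        delta = max(-step_size, min(step_size, gap))
        # Act: apply the action to the environment.
        state["value"] += delta
        # Reflect: record the outcome and refine the plan when close to the goal.
        memory.append((delta, state["value"]))
        if abs(goal - state["value"]) < step_size:
            step_size = 1
    return state["value"], memory


value, memory = run_agent(goal=10, start=0)
print(value, memory)
```

The `max_steps` cap is the important production detail: without a step budget, an agent that never reaches its goal loops forever.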

AI agents rely on the natural language processing capabilities of large language models, but unlike traditional LLMs and AI chatbots, they don’t require continuous user input to perform tasks. Agents are proactive, working autonomously to achieve a goal based on a specified set of rules and parameters.

What is an agentic framework?

An agentic framework provides the infrastructure needed to build, run, and control AI agents at scale. Most modern frameworks offer three core capabilities:

  • Orchestration: Controls how agents are sequenced, coordinated, or allowed to collaborate.
  • Tools: Define how agents interact with external systems like APIs or databases.
  • Memory: Sets out how agents retain and retrieve information across steps or sessions.

While it’s possible to build an agent without a framework, frameworks are vital in ensuring agents are reliable, scalable, and safe.

Agentic frameworks help turn experimental agent builds into maintainable software by facilitating:

  • Multi-agent coordination: When multiple agents communicate to plan, work together, and specialize in different areas of a task.
  • Human-in-the-loop (HITL) checkpoints: Intentional pause points where a human can review what an agent is about to do.
  • Observability, control, and reproducibility: The ability to see what an agent is doing, guide agent behavior, or re-run an agent and receive the same results.

Core orchestration paradigms

Before comparing individual frameworks, it’s important to understand how they operate. Let’s look at the three most commonly used orchestration models in 2026.

Graph-based orchestration

Graph-based orchestration provides maximum control by organizing agents and tools as nodes in a directed graph. Instead of letting an agent freely decide what to do next, the flow that agents are allowed to follow is clearly defined.
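As a rough illustration of the idea, the sketch below implements a tiny graph runner in plain Python: each node transforms a shared state, and a router function picks the next node from a fixed set. This is not LangGraph's API or any other framework's; the class, the node names, and the draft/review workflow are all hypothetical.

```python
END = "END"


class Graph:
    """Minimal directed-graph runner: nodes transform state, routers pick the next node."""

    def __init__(self):
        self.nodes = {}    # name -> function(state) -> new state
        self.routers = {}  # name -> function(state) -> next node name

    def add_node(self, name, fn, router):
        self.nodes[name] = fn
        self.routers[name] = router

    def run(self, start, state, max_steps=20):
        current, trace = start, []
        while current != END:
            if len(trace) >= max_steps:
                raise RuntimeError("step budget exhausted")
            if current not in self.nodes:
                raise KeyError(f"undefined node: {current}")
            trace.append(current)              # checkpoint: which node ran
            state = self.nodes[current](state)
            current = self.routers[current](state)
        return state, trace


graph = Graph()
graph.add_node(
    "draft",
    lambda s: {**s, "revisions": s["revisions"] + 1},
    lambda s: "review",
)
graph.add_node(
    "review",
    lambda s: {**s, "approved": s["revisions"] >= 2},
    lambda s: END if s["approved"] else "draft",
)

final_state, trace = graph.run("draft", {"topic": "release notes", "revisions": 0})
print(trace, final_state)
```

The trace is what makes this style easy to debug: when a run fails, the last entry names the node that was executing.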

Strengths

  • More deterministic control: Predictable behavior is critical for production systems that require reliable results.
  • Easier debugging: Pinpoint exactly which node failed thanks to clear checkpoints and boundaries.
  • Production-grade reliability: This approach is ideal for customer-facing applications, enterprise systems, or regulated environments.

Limitations

  • More upfront design: The workflow must be defined in advance, which slows initial development.
  • Less “emergent” behavior: Agents are constrained by the graph, leaving less room for experimentation and creativity.

Role-based orchestration

Role-based orchestration is most effective when simplicity is a priority. Agents are assigned specific roles, such as “Planner”, “Researcher”, or “Builder”, and collaborate by sending messages to one another.
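A minimal sketch of this pattern, again in plain Python rather than the API of AutoGen or CrewAI: each agent has a role and a handler that turns an incoming message into messages for other roles. The handlers here are simple string functions standing in for LLM calls, and all names are illustrative.

```python
from collections import deque


class Agent:
    def __init__(self, role, handler):
        self.role = role
        self.handler = handler  # content -> list of (recipient_role, content)


def run_team(agents, first_message, max_messages=20):
    """Deliver messages between roles until the queue drains or the budget runs out."""
    by_role = {agent.role: agent for agent in agents}
    inbox = deque([first_message])
    transcript = []
    while inbox and len(transcript) < max_messages:
        recipient, content = inbox.popleft()
        transcript.append((recipient, content))
        if recipient in by_role:  # a message to "User" has no handler and ends the thread
            inbox.extend(by_role[recipient].handler(content))
    return transcript


team = [
    Agent("Planner", lambda task: [("Researcher", f"research: {task}")]),
    Agent("Researcher", lambda req: [("Builder", f"notes for [{req}]")]),
    Agent("Builder", lambda notes: [("User", f"draft from {notes}")]),
]

transcript = run_team(team, ("Planner", "BLE pairing guide"))
for recipient, content in transcript:
    print(f"-> {recipient}: {content}")
```

Nothing outside the handlers decides who speaks next, which is why this model is quick to set up and hard to constrain.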

Strengths

  • Intuitive mental model: This type of operation is easy to understand because it effectively mirrors how human teams work.
  • Rapid prototyping: Minimal setup is required, allowing more time to explore outcomes.

Limitations

  • Harder-to-constrain behavior: Because agents have the freedom to decide what to do next, it’s difficult to enforce strict execution paths.
  • Limited determinism: The same input can yield different outcomes, making it tricky to reproduce results and achieve consistency.

Chain-based orchestration

Chain-based orchestration, also known as adaptive orchestration, arguably offers the greatest flexibility. Agents in this model operate in dynamic chains or loops, deciding the next step autonomously.

Strengths

  • Flexible workflows: Agents are not constrained to a pre-defined path and can freely explore different strategies.
  • Suitability for creative tasks: This approach is ideal for research, discovery, and experimentation, as agents can iteratively explore ideas, pivot strategies, and adapt their approach.

Limitations

  • Less predictability: Testing and debugging are more challenging because execution paths are harder to reproduce and trace.
  • More difficult governance at scale: This unpredictability grows as tasks become more complex.

Best agentic frameworks for your projects

Now that we’re familiar with the key orchestration paradigms of agentic frameworks, it’s time to compare some of the most popular frameworks on the market in 2026. Below, we evaluate each framework’s performance against our key comparison criteria:

  • Primary orchestration model.
  • Multi-agent support.
  • Memory capabilities.
  • Human-in-the-loop (HITL) support.
  • Best-fit applications.

| Framework | Orchestration model | Multi-agent support | Memory capabilities | HITL support | Best used for |
|---|---|---|---|---|---|
| LangChain | Chain-based | Partial | Moderate | Limited to moderate | Rapid LLM app development |
| LangGraph | Graph-based | Yes | Strong | Strong | Production-grade agent workflows |
| LlamaIndex | Retrieval-centric | Limited | Strong | Moderate | Knowledge-heavy agents |
| Haystack | Pipeline-based/modular | Moderate | Strong | Moderate | Production RAG and context-heavy AI systems |
| AutoGen | Role-based | Strong | Moderate | Limited | Conversational multi-agent systems |
| CrewAI | Role-based | Strong | Light | Limited | Task-oriented agent teams |
| Semantic Kernel | Planner-based | Moderate | Moderate | Strong | Enterprise AI |
| smolagents | Minimalist | Limited | Light | Minimal | Lightweight experiments |
| OpenAI Agents SDK | Graph-based | Yes | Managed | Strong | Hosted agent applications |
| Phidata | Agent-centric | Limited to moderate | Strong | Moderate | Data and tool-heavy agents |

Let’s take a closer look at the strengths and weaknesses of each framework, along with the applications they’re most suited to.

LangChain

  • Core design: Chain-based orchestration.
  • Philosophy: Developer velocity and flexibility.

Launched in 2022, LangChain is one of the most widely adopted frameworks due to its broad ecosystem of integrations. It serves as an accessible interface for nearly any LLM and is an ideal starting point for enthusiasts or startups looking to explore agentic AI. While not strictly “agent-first”, it provides the building blocks for agentic behavior.

LangChain provides less control than other frameworks, but it’s still a fantastic entry point into agentic systems, especially for projects where speed and creativity take precedence over enforcing strict workflows.

Strengths

  • Huge ecosystem.
  • Easy tool integration.
  • Rapid prototyping.

Limitations

  • Less control than graph-based systems.
  • Agent logic that can be difficult to understand as it grows in complexity.

Best applications

  • Prototyping of agentic features.
  • Tool-augmented chatbots.
  • LLM-powered backend services.

If you want to go beyond the basics, read our LangChain Python Tutorial: A Complete Guide for 2026. It takes a deeper look at what LangChain offers and walks through real-world use cases for building AI agents in Python.

LangGraph

  • Core design: Graph-based orchestration.
  • Philosophy: Explicit control over agent behavior.

LangGraph has emerged as the leading standard for production-grade agent systems. Built on top of LangChain, it replaces implicit chains with explicit graphs, providing strict control over workflows and excellent HITL support via interrupts.

While the graph structure itself can actually make debugging easier by clearly mapping how agents and tools interact, LangGraph does come with a learning curve. Much of this complexity comes from designing the graph and managing explicit state between nodes. Once you understand these concepts, the framework becomes a powerful option for building predictable and controllable agent systems.

Strengths

  • Deterministic workflows.
  • Native state management.
  • Excellent HITL support via interrupts.
  • Suitability for regulated or mission-critical systems.

Limitations

  • Higher upfront design effort.
  • Steeper learning curve due to explicit graph and state management.
  • Reduced flexibility for open-ended tasks.

Best applications

  • Autonomous customer support systems.
  • AI-driven DevOps workflows.
  • Multi-step decision engines.

LlamaIndex

  • Core design: Retrieval-centric orchestration.
  • Philosophy: Data-first agents.

LlamaIndex is a Python framework designed to help AI systems understand, store, and retrieve information from large amounts of documents and data.

Rather than starting with agents and adding data later, LlamaIndex takes the opposite approach – it starts with data and then builds agent behavior around it. This is why it is often described as data-first or retrieval-centric.

Because it operates in this way, LlamaIndex excels at indexing, memory, and retrieval, making it ideal for building agents whose intelligence depends on accessing the right information rather than executing complex actions.

Strengths

  • Advanced document indexing.
  • Strong long-term memory patterns.

Limitations

  • Limited suitability for complex, action-heavy orchestration.
  • Limited support for multi-agent orchestration.

Best applications

  • Research assistants.
  • Knowledge base agents.
  • Enterprise document intelligence.

Haystack

  • Core design: Modular pipeline orchestration.
  • Philosophy: Context engineering and production-ready AI systems.

Haystack is an open-source AI orchestration framework created by deepset for building production-ready AI agents, retrieval-augmented generation (RAG) systems, and multimodal applications.

Instead of focusing purely on agent behavior, Haystack structures applications as explicit pipelines composed of retrievers, routers, memory layers, tools, evaluators, and generators. This modular architecture gives you control over how information flows through a system, allowing each component to be tested and improved independently.

Haystack is particularly strong in applications where the quality of retrieved information determines the quality of the model’s output. Its design also makes it well-suited for enterprise environments that require transparency and reliability in production systems.

Strengths 

  • Highly modular pipeline architecture.
  • Excellent support for RAG and document processing.
  • Strong ecosystem, particularly in search and RAG-focused enterprise use cases.
  • Flexible integrations with models and vector databases.

Limitations 

  • More infrastructure and setup than lightweight frameworks.
  • Less focus on emergent multi-agent collaboration.

Best applications

  • Retrieval-augmented generation (RAG) systems.
  • Enterprise document intelligence.
  • Data-heavy AI applications.
  • Production AI pipelines that require strong context control.

AutoGen

  • Core design: Role-based multi-agent collaboration.
  • Philosophy: Conversation-driven autonomy.

AutoGen, an open-source Microsoft framework, popularized the idea of agents collaborating through structured conversation, organizing systems as teams of agents, each with its own specific role. Unlike in other frameworks, there’s no central controller enforcing a strict execution path – the collaboration itself drives progress.

This approach makes AutoGen ideal for exploratory, creative, and research-driven multi-agent systems, at the cost of predictability, HITL, and strict execution control.

Strengths 

  • Natural multi-agent interaction. 
  • Minimal orchestration overhead. 
  • Suitability for emergent problem-solving. 

Limitations 

  • Limited execution control.
  • Weak HITL support.

Best applications

  • Coding agents.
  • Brainstorming systems.
  • AI research experiments.

CrewAI

  • Core design: Role-based task delegation.
  • Philosophy: Teams of specialized agents.

CrewAI is centered around building simple, structured multi-agent systems. It is similar to AutoGen, modeling AI agents as members of a “crew” where each agent has a clearly defined role. The goal is to make multi-agent systems approachable, even if you are new to agentic AI.

CrewAI prioritizes simplicity and speed over deep memory and production controls, making it easy to learn and a strong option for prototypes and small teams. However, its limited toolset for observability, HITL, and error handling at scale makes it less suited for larger systems.

Strengths

  • Very approachable API.
  • Clear role separation.
  • Fast setup.

Limitations

  • Lightweight memory.
  • Limited production controls.

Best applications

  • Content pipelines.
  • Market research automation.
  • Simple workflow agents.

Semantic Kernel

  • Core design: Planner-based orchestration.
  • Philosophy: Enterprise-grade AI integration.

Semantic Kernel is another open-source Microsoft framework, designed for building AI-powered applications that integrate with existing enterprise systems.

It was created with production concerns in mind from the start, emphasizing governance, safety, observability, and human oversight. Rather than maximizing agent autonomy, it focuses on making AI predictable, controllable, and auditable.

By combining structured workflows with LLM reasoning, it trades flexibility and emergent behavior for trust, safety, and operational reliability.

Strengths

  • Strong HITL support.
  • Enterprise-friendly architecture.
  • Good observability.

Limitations

  • Heavier upfront structure.
  • Less flexibility for open-ended autonomy.
  • Steeper learning curve.

Best applications

  • Internal enterprise tools.
  • AI copilots.
  • Business process automation.

smolagents

  • Core design: Minimalist chain-based.
  • Philosophy: Simplicity over scale.

smolagents is a bare-bones framework designed to make agentic AI as straightforward and transparent as possible. It prioritizes simple, readable code that makes it easy to understand how an agent works without needing to learn a large framework.

smolagents aims to make agent behavior accessible and easy to experiment with by keeping abstractions minimal and logic transparent. It offers first-class support for code-based and tool-calling agents, broad model and tool compatibility, and lightweight CLI utilities, while intentionally trading large-scale orchestration and production features for simplicity and clarity.

Strengths

  • Extremely lightweight design.
  • High degree of transparency.
  • Fast experimentation.

Limitations

  • Limited suitability for scaling.
  • Minimal production features.

Best applications

  • Educational projects.
  • Proofs of concept.
  • Lightweight local agents.

OpenAI Agents SDK

  • Core design: Managed workflow-driven orchestration (often graph-based).
  • Philosophy: Hosted, production-ready agents.

Thanks to ChatGPT’s explosion in popularity, we’ve all heard of OpenAI. The Agents SDK is the company’s effort to provide a managed platform for building and running agents without having to maintain your own orchestration infrastructure.

Rather than assembling agents from scratch, you define agent behavior and workflows, while OpenAI provides orchestration, memory management, monitoring, and safety controls. This makes the Agents SDK particularly attractive for teams that want production-ready agents quickly.

Strengths

  • Minimal infrastructure burden.
  • Built-in safety and observability.
  • Strong multi-agent support.

Limitations

  • Reduced customization and control.
  • Limited suitability for experimental research.

Best applications

  • SaaS agent features.
  • Customer-facing autonomous systems.
  • Teams prioritizing speed over customization.

Phidata

  • Core design: Agent-centric, tool-heavy.
  • Philosophy: Practical agents for real-world data tasks.

Phidata is designed for building practical, tool-driven AI agents that operate on real-world data.

Rather than focusing on abstract orchestration patterns, Phidata centers the agent around direct interaction with systems such as APIs, databases, and internal services.

Its design reflects the fact that many agents spend most of their time fetching, transforming, and acting on data.

Strengths

  • Strong tool integration.
  • Suitability for data-centric workflows.

Limitations

  • Less emphasis on orchestration.
  • Limited multi-agent capabilities.

Best applications

  • Data analysis agents.
  • Finance and ops automation.
  • Tool-driven decision systems.

Choosing the right framework

Now that you’re familiar with many of the most popular frameworks in 2026, it’s time to choose the right one for your project. Let’s take a look at some of the key use cases, along with the frameworks that fit them best.

| Orchestration model | Where to use | Recommended frameworks |
|---|---|---|
| Graph-based | Projects involving complex branching logic and requiring high levels of reliability, auditability, and control. | LangGraph, OpenAI Agents SDK |
| Role-based | Projects involving rapid development and intuitive design that benefit from emergent collaboration between agents. | AutoGen, CrewAI |
| Chain-based | Projects requiring maximum flexibility, where agents need to adapt dynamically and determine next steps autonomously. | LangChain |
| Retrieval-based | Projects where deep, reliable access to knowledge matters more than high levels of autonomy. | LlamaIndex, Haystack |
| Enterprise-oriented | Projects where strong governance and human-in-the-loop processes are non-negotiable requirements. | Semantic Kernel |
| Lightweight | Rapid prototyping, educational use, and simple local agents where transparency and control matter more than orchestration complexity. | smolagents |
| Tool-centric | Building production agents that primarily interact with APIs, databases, and external systems rather than complex multi-step orchestration. | Phidata |

In 2026, agentic frameworks have evolved from experimental tools into foundational infrastructure for many applications. The key decision is no longer whether to use agents, but how much control, autonomy, and governance your systems require.

Your NAS Is Loud Because of Docker (and How to Fix It)

You buy a NAS for silent, always-on storage. It sits in a corner, humming quietly, doing its thing.

You installed Docker on it for the same reason I did: to save money. Every open-source service you’d otherwise pay a VPS for — your media server, your download automation, your file sync, your home automation bridge — all of it can run on the NAS for free. No monthly VPS bills, no cloud subscriptions, no $5/mo here and $10/mo there that add up to a second rent. Just one box, your box, doing everything.

The problem is that Docker wasn’t designed for spinning disks.

And suddenly the HDDs never stop. Seeking, spinning, clicking, whirring — not occasionally, not every few minutes, but constantly. At 2am you can hear it from the next room. Through a closed door. It drives you insane because you bought this thing specifically so it would not make noise.

I know, because I lived with it for months. Every night, the same clicking. Every morning, the same relief when the TV drowned it out. The NAS was supposed to be invisible, and instead it was the loudest thing in the house.

Here’s what causes it, and how I went from that to 99.9% less noise in one afternoon.

What’s Actually Causing the Noise

Mechanical HDDs make noise when the read/write head moves. The more random the I/O — small reads and writes scattered across the disk — the more seeking, the more noise. Sequential writes to a single file are quiet. Random I/O across thousands of small files is loud.

Docker is pathological for HDDs.

Docker overlay2

Docker’s default storage driver is overlay2. Every container runs on top of layered filesystems — the image layers are stacked, and a thin writable layer sits on top for each running container.

The first time a container modifies a file that lives in a lower image layer, overlay2 performs a copy-up: the entire file gets copied to the writable layer before the write happens. Reads alone do not trigger this, but many services modify files constantly. On an SSD this is fast and silent. On spinning HDDs with mechanical heads, every copy-up is a seek, a read, and a write, often scattered across the disk.

And it’s not just copy-on-write. Docker’s overlay2 metadata lives in small files across a deep directory tree. Container startup reads dozens of these. Log rotation writes to them. Health checks touch them. Any container doing anything at all generates constant scattered I/O.

Now multiply that by however many containers you’re running. Every single one is generating random I/O all day, every day. The HDDs never get a break.

Everything Else Piling On

Beyond Docker, a typical homelab NAS has:

  • System monitoring tools running on cron (every 5-10 minutes, writing stats to disk)
  • systemd journal flushing logs
  • The NAS OS itself doing housekeeping

None of these alone would be noticeable. But on top of Docker’s constant churn, the drives never spin down. Not for a second.

Diagnosing Which Process Is the Problem

Before moving anything, confirm what’s actually hitting the disk:

# Real-time I/O per process (needs the iotop package)
sudo iotop -o

# Disk utilization over time: 10 samples, 2 seconds apart (needs sysstat)
iostat -x 2 10

On a typical Docker setup you’ll see the Docker daemon at the top, with periodic spikes from cron jobs.
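
To put a number on what those tools show, one option is to read the kernel's I/O counters before and after a quiet period. The sketch below assumes a Linux NAS that exposes /proc/diskstats. The helper names disk_ios and io_delta are made up for illustration, and sda stands in for whichever device is your HDD.

```shell
# Print total completed reads + writes for one block device.
# In /proc/diskstats, column 3 is the device name, column 4 is reads
# completed and column 8 is writes completed.
# Usage: disk_ios DEVICE [STATS_FILE]
disk_ios() {
  dev="$1"
  stats="${2:-/proc/diskstats}"
  awk -v d="$dev" '$3 == d { print $4 + $8 }' "$stats"
}

# Print how many I/Os a device completed over an interval.
# Usage: io_delta DEVICE SECONDS
io_delta() {
  before=$(disk_ios "$1")
  sleep "$2"
  after=$(disk_ios "$1")
  echo "$(( after - before ))"
}
```

Running io_delta sda 60 while nobody is touching your files gives a baseline you can compare against after the migration.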

The Fix: Move Docker Off the HDDs

The HDDs are loud because they’re doing work they shouldn’t be doing. The solution is to give that work to something that doesn’t make noise.

An external SSD connected via USB is cheap, silent, and fast enough for everything Docker needs. USB 3.0 to a SATA SSD delivers 400+ MB/s — far more than any container workload requires.

The goal: the HDDs only handle the NAS OS and your actual data (media, documents, backups). Everything Docker-related moves to the SSD.

What to Migrate

Docker data-root — the overlay2 layers, image cache, container writable layers. This is the biggest source of random I/O and the highest-impact thing to move.

# /etc/docker/daemon.json
{"data-root": "/mnt/external-ssd/@docker"}
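
The config change alone does not move any data. A rough sketch of the surrounding steps follows, with the assumptions spelled out: Docker is managed by systemd under the unit names docker.service and docker.socket, the current data-root is the common default /var/lib/docker, and plain cp -a is an acceptable copy tool on your OS (on UGOS Pro, use the tar approach from the companion post instead). The run and migrate_data_root helpers are hypothetical, and the script only prints the steps unless DRY_RUN=0 is set.

```shell
# Sketch of the data-root move. DRY_RUN=1 (the default) only prints each step.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: migrate_data_root OLD_DATA_ROOT NEW_DATA_ROOT
migrate_data_root() {
  old="$1"   # current data-root, commonly /var/lib/docker (assumption)
  new="$2"   # target on the SSD, e.g. /mnt/external-ssd/@docker
  run systemctl stop docker.socket docker.service   # unit names are an assumption
  run mkdir -p "$new"
  run cp -a "$old/." "$new/"                        # copy contents, keep ownership and modes
  # Edit /etc/docker/daemon.json by hand at this point so data-root points at "$new".
  run systemctl start docker.service
}

migrate_data_root /var/lib/docker /mnt/external-ssd/@docker
```

Once Docker is back up, docker info --format '{{ .DockerRootDir }}' should report the new location. Keeping the old directory around until the containers have run cleanly for a while leaves you a way back.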

Bind-mount volumes — the persistent data your containers read and write (databases, config files). If these live on the HDD, container writes hit the HDD. Move them to the SSD.

For bind mounts, a symlink keeps things transparent — containers keep using the same paths, no reconfiguration needed:

# Run this only after the data has been moved to the SSD and the original path no longer exists
ln -s /mnt/external-ssd/volumes /original/volumes/path
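
The order of operations matters here: if the original directory still exists when ln -s runs, the link is created inside it rather than replacing it. A minimal sketch of a safer sequence, assuming the containers using the volume are stopped first and that cp -a is a safe copy tool on your OS (the UGOS Pro caveat further down applies). The function name relocate_with_symlink is hypothetical.

```shell
# Move a directory to a new location and leave a symlink at the old path.
# Usage: relocate_with_symlink OLD_PATH NEW_PATH
relocate_with_symlink() {
  old="$1"
  new="$2"
  # Refuse to run on something that is missing or already a symlink.
  [ -d "$old" ] && [ ! -L "$old" ] || { echo "not a real directory: $old" >&2; return 1; }
  # Refuse to overwrite an existing target.
  [ ! -e "$new" ] || { echo "target already exists: $new" >&2; return 1; }
  mkdir -p "$(dirname "$new")"
  cp -a "$old" "$new"      # copy first, keeping ownership and modes
  mv "$old" "$old.bak"     # keep the original until the copy is verified
  ln -s "$new" "$old"      # the old path now resolves to the new location
}
```

A call such as relocate_with_symlink /original/volumes/path /mnt/external-ssd/volumes leaves a .bak copy on the HDD, which you can delete once the containers have been running normally from the SSD.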

What Stays on the HDDs

  • The NAS operating system
  • System logs
  • Your actual data files (documents, media, backups) — these have sequential I/O patterns that HDDs handle well and don’t cause the constant seeking noise

A Note on Copying Files (UGOS Pro Caveat)

If your NAS runs UGOS Pro (UGREEN’s Debian-based OS), there’s a critical gotcha: rsync will silently corrupt permissions when copying from the NAS filesystem to an external drive. This is caused by proprietary kernel-level xattr hooks in UGOS Pro.

The fix and the full technical explanation are in a separate post: Why rsync Destroys Permissions on UGOS Pro — and the Only Fix That Works

Short version: use tar --xattrs-exclude='ug.*' instead of rsync for any file copy on UGOS Pro.

Mount the SSD Correctly

# /etc/fstab
UUID=<your-uuid> /mnt/external-ssd ext4 defaults,noatime,nofail 0 2

Two flags matter:

  • noatime — disables access time updates on every file read. Eliminates a whole class of unnecessary writes.
  • nofail — if the SSD disconnects and the NAS reboots, it boots normally instead of hanging at the fstab error.
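
Since a typo in fstab is easy to miss, a small check like the following can confirm that both flags are present on the entry before you reboot. This is only a sketch: fstab_has_opt is a made-up helper name, and it assumes the standard whitespace-separated fstab layout with the options in the fourth column.

```shell
# Succeeds if the fstab entry for MOUNTPOINT lists OPTION.
# Usage: fstab_has_opt MOUNTPOINT OPTION [FSTAB_FILE]
fstab_has_opt() {
  awk -v mp="$1" -v opt="$2" '
    # Skip comment lines, then match on the mount point column.
    $1 !~ /^#/ && $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "${3:-/etc/fstab}"
}

# Report on the two flags discussed above.
for flag in noatime nofail; do
  if fstab_has_opt /mnt/external-ssd "$flag"; then
    echo "$flag: present"
  else
    echo "$flag: missing"
  fi
done
```

The UUID for the entry comes from sudo blkid. After editing, sudo mount -a followed by findmnt /mnt/external-ssd shows the options actually in effect.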

The Result: Silence

After the migration, the difference is night and day. Before, the HDDs were seeking constantly — a low but relentless clicking that never stopped, day or night. It was the kind of noise you don’t notice during the day but drives you crazy at 3am when the house is silent.

Now? Nothing. The HDDs spin up occasionally — a system log flush, a cron job writing stats — but the constant background noise is gone. The drives spend most of their time parked, doing what they were designed to do: sit there quietly and hold your data.

I’d estimate 99.9% reduction in audible HDD activity. The clicking that used to drive me insane? Completely gone. The NAS is back to being the quiet box in the corner it was always supposed to be.

You don’t realize how much that noise was bothering you until it stops. Trust me.

If your NAS runs UGOS Pro, read the companion post before attempting the migration — the rsync issue will cost you time if you hit it blind: Why rsync Destroys Permissions on UGOS Pro

The Developer Job Description Quietly Rewrote Itself in 2026. Did You Notice?

Here’s a number that will either reassure you or unsettle you depending on where you sit in the developer hierarchy.

Workers with AI skills earned 56% more than colleagues doing the same jobs without those skills in 2026. The year before, that premium was 25%. The year before that, 18%. It’s not flattening. It’s accelerating.

Here’s the other number: senior software developers saw a 10% salary decline year-over-year. Senior. Not junior. Not mid-level. Senior developers with years of experience watched their market rate fall while a subset of their peers with specific AI fluency saw theirs climb at 9.2% annually.

The job didn’t collapse. It forked.

One fork leads toward a market where your value is measured by how much code you produce, and AI has made that kind of value dramatically cheaper. The other fork leads toward a market where your value is measured by whether you can design systems, direct agents, govern output quality, and make architecture decisions that AI can’t make on its own.

The developers choosing the second fork, deliberately rather than accidentally, are the ones whose compensation trajectories look anomalous in a flat market. This post is about how they made that choice and what it actually looks like in practice.

What happened to the job description

The traditional senior developer job description, simplified, looked like this: deep expertise in a specific language or framework, strong ability to write correct and maintainable code, ability to debug complex problems, ability to mentor juniors and review PRs.

The 2026 version of that job description at a well-run engineering organisation looks notably different. The emphasis has shifted from writing to directing, from syntax to architecture, from individual contributor to systems thinker.

Addy Osmani, engineering leader at Google, described the shift as moving from coder to conductor to orchestrator. At the start of 2024, AI-assisted programming resembled a significantly improved autocomplete. By 2026, it has shifted toward agent systems that operate on codebases over time.

What that means concretely: the most valuable thing a senior engineer can do in 2026 is not write a feature. It’s define the architecture that lets an agent write the feature correctly. It’s create the context files that make agents generate consistent, idiomatic code. It’s build the test infrastructure that catches what agents get wrong. It’s review the agent’s output with the judgment of someone who understands the system deeply enough to spot the subtle errors.

The skills that mattered most in 2021 — fast accurate implementation, deep framework knowledge, encyclopaedic API recall — are the exact skills that AI tools have made cheaper. The skills that matter most in 2026 are the ones AI tools don’t have: judgment, architectural thinking, understanding of tradeoffs, and the ability to know when AI output is wrong even when it looks right.

The two roles senior developers are splitting into

Senior developers in 2026 are effectively splitting into two categories: code validators and architects. If you choose validation — code review — you’ll be in demand but burned out.

That framing is a little stark, but it captures something real. Here’s a fuller picture of what each path looks like.

Path 1: The AI Orchestrator

This is the role that has the strongest compensation trajectory and the most clearly defined skill set.

An AI orchestrator doesn’t primarily write code. They design the system within which agents write code. They author the context files, rules, and architectural boundaries that guide agent behaviour. They define acceptance criteria that agents must satisfy. They review agent output for architectural soundness, not just syntactic correctness. They manage the MCP integrations that give agents access to the right tools. They run parallel background agents on different parts of the codebase and integrate the outputs.

The skills this requires are old and new simultaneously. Old: system design, domain modelling, architectural pattern knowledge, understanding of non-functional requirements (performance, security, scalability). These are the skills that take years to develop and can’t be replaced by prompt engineering. New: context engineering, agentic workflow design, knowing which tasks to delegate and how to brief them, governing AI output quality at scale.

The new standard for senior developers is context orchestration — the ability to guide agentic IDEs that understand the entire repository, documentation, and architectural patterns.

Path 2: The Specialist Who Goes Deeper

Not every developer needs to become an AI orchestrator. The other path with a strong compensation trajectory is going deep in areas where human expertise is still clearly irreplaceable and where AI tools augment rather than replace.

Security engineering is one example. The tools generate insecure code at a documented rate and are worst at the subtle architectural vulnerabilities that only deep security knowledge catches. Security engineers who understand the full threat model, can trace attack surfaces through AI-generated code, and can design systems that are secure by construction are in shorter supply in 2026 than they were in 2024, not greater. AI made the problem bigger; it did not make the expertise less valuable.

Performance engineering is another. AI generates functionally correct code. It doesn’t generate code that’s been profiled against your actual traffic patterns, optimised for your specific hardware characteristics, or designed with the full operational context in mind. The developer who can take AI-generated output and make it production-ready from a performance standpoint is doing work AI doesn’t do.

Distributed systems architecture is a third. The agent can implement a microservice. It can’t decide whether you should have a microservice, whether the boundaries are right, whether the consistency model is appropriate, or whether the failure modes are acceptable for your specific reliability requirements. Those decisions require human judgment informed by years of building things that broke in unexpected ways.

GenAI Engineer and MLOps Specialist postings are growing at 2–3× the rate of traditional roles year-over-year. Developers who layer AI fluency on top of solid software fundamentals are commanding salary premiums of 15–25% above peers without AI skills.

The salary data, stated plainly

AI Engineers earn 12% more than general Software Engineers in equivalent roles and levels. The premium is higher for specialised LLM developers, who average $209,000. Mid-level AI engineers saw the highest year-over-year gains at 9.2%. Senior software developers without AI specialisation saw a 10% salary decline year-over-year.

Workers with AI skills earn 56% more than same-role colleagues without those skills, a premium that has more than doubled from 25% a year earlier. The premium accrues almost entirely to experienced practitioners with genuine production depth. Surface familiarity with AI tools is now table stakes, not a differentiator.

That last sentence matters more than the headline number. The 56% premium doesn’t accrue to developers who have used Copilot for six months. It accrues to developers who have built production AI systems, who understand the failure modes, who have fixed things when they broke, and who can make architectural decisions about where AI belongs in a system and where it doesn’t.

The premium is accessible. It’s not easy.

What “AI fluency” actually means for career development

There’s a version of this conversation that reduces to “learn prompt engineering.” That’s not what the salary data is tracking. Here’s what it’s actually tracking.

System-level thinking about AI integration. Understanding where AI agents belong in a technical architecture — what they should be trusted to do autonomously, what should require human review, what should never be delegated. This requires understanding the failure modes well enough to design around them. Developers who have shipped things that broke because of AI output and understand exactly why are more valuable than developers who have only shipped things that worked.

Context engineering as a discipline. The ability to create the information environment — CLAUDE.md files, rules files, architecture decision records, system prompts — that makes agents perform reliably and consistently. This is not writing prompts. It’s designing the persistent layer of institutional knowledge that makes AI tools behave like they understand your system.

Quality governance at scale. As AI generates more code faster, the bottleneck shifts to quality validation. The ability to review AI output efficiently, catch what’s subtly wrong, and build automated validation that scales — test suites, linters, security scans, architecture fitness functions — is increasingly what distinguishes teams that ship reliably from teams that ship fast and break things.

MCP and tool integration design. Deciding which external tools agents should have access to, what permissions are appropriate, how to scope access safely, and how to design workflows that use agents effectively for real tasks. This is infrastructure thinking applied to AI tooling.

None of these skills are learned by watching tutorials. They’re learned by building things, having them break, understanding why, and building better the second time. The developers who are making the transition well are the ones who have enough production experience that their AI fluency is anchored in real understanding — not just fluency with the tools themselves.

The honest roadmap for developers who want to make this shift

If you’re a developer who has been coding for three-plus years and wants to shift toward the higher-value end of this market, here’s what that actually looks like as a practice.

Build and break a production AI system. Not a toy project. Something with real traffic, real data, and real failure modes. The goal is to experience what happens when an AI component behaves unexpectedly in production — hallucination, context drift, reasoning errors under edge cases — and to understand how to build systems that handle those failures gracefully. This experience is what the premium is actually paying for.

Get deep on context engineering. Spend a week doing nothing but improving your CLAUDE.md and rules files and measuring the output quality difference. Build architecture decision records for your team’s non-obvious conventions. Create a Codebase Orientation document for the project you know best. This practice builds the intuition for what makes AI tools effective at the system level.

Take ownership of your team’s AI governance. If your team is using AI tools without systematic quality governance, offer to build it — the code review standards for AI-generated code, the security review checklist, the test coverage requirements. This work is undervalued, genuinely useful, and directly demonstrates the skills the market is paying for.

Write the architectural documentation that AI tools need. Go through your codebase and document the decisions that AI tools consistently get wrong because they don’t have the context. Why does the payments module use optimistic locking? Why is the notification system pull-based rather than push-based? Why does the user authentication flow have that seemingly redundant step? This documentation is valuable for the AI tools that will read it, valuable for the humans who will onboard to this codebase, and it demonstrates exactly the kind of architectural understanding that commands premium compensation.

Pick one deep specialisation and go all the way in. The developers with the strongest compensation trajectories in 2026 are not generalists with light AI fluency. They’re specialists — security engineers, performance engineers, distributed systems architects — with deep expertise in areas where AI tools create new problems rather than solving them, augmented by AI tool fluency that makes them dramatically more productive. Pick the specialisation that matches your existing depth and go deeper, not sideways.

What the next two years look like

The transition from coder to orchestrator is already taking place and is expected to be mostly complete by 2028. The transition represents a fundamental shift in how software is created — humans will spend less time writing code and more time directing AI agents to do so.

The framing of “will AI take developer jobs” is the wrong question. The right question is: which developer skills will be more valuable in two years and which will be less, and how do you make sure you’re developing the right ones?

The skills that will be less valuable: fast accurate implementation of known patterns, deep memorisation of framework APIs, the ability to write boilerplate quickly. AI has made all of these abundant.

The skills that will be more valuable: system architecture, security judgment, performance engineering, the ability to specify precisely what you want and verify that you got it, understanding of AI failure modes, and the accumulated intuition that comes from building production systems and watching them break.

AI didn’t change what matters. It just made the shortcuts disappear. The developers who thrived by “knowing the frameworks” are now struggling. The ones who understood why systems work the way they do are thriving.

The market has not been subtle about this. The compensation data is clear and the direction of the premium is consistent. The developers who are positioned well are the ones who treated the last two years as an opportunity to go deeper — not the ones who used AI tools as a reason not to.

Originally published on ZyVOP

Stop Fearing the Blinking Cursor: Overcoming Terminal Anxiety

The first time I opened a terminal, I stared at it for a solid minute.

No buttons. No menus. No helpful icons.

Just a black screen and a blinking cursor that seemed to be asking, “Well? What are you waiting for?”

If you’ve ever felt intimidated by the command line, you’re not alone. The terminal has an unfair reputation. Movies portray it as a mysterious tool used by elite hackers, while online forums are full of stories about people accidentally deleting important files with a single command.

It’s enough to make any beginner nervous.

The most common fear is simple:

“What if I type the wrong thing and break my computer?”

The good news is that the terminal is far less dangerous than most people think. In reality, it’s not a weapon waiting to be misused—it’s a conversation between you and your operating system.

Like any conversation, once you learn a few basic words, everything starts to make sense.

In this article, we’ll strip away the mystery and focus on a handful of beginner-friendly commands that help you navigate your system confidently. By the end, you’ll understand how to move around, find your way back when you’re lost, and—most importantly—stop being afraid of that blinking cursor.

The Myth of the “Self-Destruct Button”

The Fear

“I’m going to destroy my computer by typing the wrong thing.”

It’s a reasonable concern. The terminal looks powerful because it is powerful.

The Reality

Your operating system has safeguards in place. Most potentially dangerous actions require elevated privileges, often through commands such as sudo, which stands for SuperUser Do.

Without those permissions, your computer actively prevents many system-level changes.

For beginners learning navigation commands, you’re essentially exploring in a safe environment. You’re looking around, not rewiring the house.

Think of the Terminal as Read-Only Exploration

Before changing anything, let’s learn how to look around.

These commands are completely safe because they only display information.

pwd — Where Am I?

Think of pwd as your GPS.

pwd

Output:

/home/brendan/projects

It simply tells you your current location in the filesystem.

No files are modified.
Nothing is deleted.
Zero risk.

ls — What’s Here?

Think of ls as opening your eyes.

ls

Output:

Documents Downloads Pictures projects

It lists the contents of your current directory.

Again, nothing changes. You’re simply observing.

When I was learning the terminal, I probably ran ls hundreds of times because I was constantly checking where I was and what was around me.

That’s completely normal.

Taking Control: Moving Around

Once you know where you are, it’s time to move.

cd — Change Directory

Imagine clicking through several folders in a graphical file manager:

Documents → Projects → Portfolio → Website

In the terminal, you can jump there instantly:

cd Documents/Projects/Portfolio/Website

No clicking required.

The more projects you work on, the more you’ll appreciate how much faster this becomes.

cd .. — The Universal Undo

Made a wrong turn?

Use:

cd ..

This moves you up one level, to the parent directory of wherever you are now.

Think of it as the terminal equivalent of saying:

“Actually, take me back.”

It’s one of the most useful commands you’ll ever learn.
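
Putting the navigation commands together, a short practice session might look like this. It assumes a Linux or macOS shell where /tmp exists. The last line uses cd -, one extra shortcut not covered above, which returns you to the directory you were just in.

```shell
cd /tmp    # go to a scratch location
pwd        # confirm where you are: prints /tmp
ls         # look around; nothing is changed
cd ..      # go up one level, to the parent directory
pwd        # now prints /
cd -       # jump back to the previous directory (/tmp)
```

Nothing in this session creates, edits, or deletes a file, so it is safe to repeat as many times as you like.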

What to Do When You Get Stuck

Every beginner eventually runs into a moment where:

  • A command won’t stop running.
  • The screen fills with text.
  • Something looks confusing.
  • Panic starts creeping in.

Fortunately, the terminal comes with built-in escape hatches.

Ctrl + C — Emergency Stop

If a command seems stuck, press:

Ctrl + C

This tells the terminal:

Stop whatever you’re doing and give me my cursor back.

Learning this shortcut instantly made me more confident because I realized I wasn’t trapped if I made a mistake.

clear (or Ctrl + L) — Start Fresh

When your screen becomes cluttered:

clear

Or use:

Ctrl + L

The terminal clears the screen and gives you a clean workspace.

Sometimes the best debugging technique is simply removing the visual chaos and starting again.

Final Thoughts

The terminal isn’t scary because it’s dangerous.

It’s scary because it’s unfamiliar.

Every developer, system administrator, DevOps engineer, and cybersecurity professional started exactly where you are now: staring at a blinking cursor and wondering what to type next.

The secret isn’t knowing hundreds of commands.

It’s knowing a few basic ones well enough to explore confidently.

Start with:

  • pwd
  • ls
  • cd
  • cd ..
  • clear

And remember your panic button:

  • Ctrl + C

Once you master these, that blinking cursor stops looking intimidating and starts looking like an invitation.

Do you remember your first experience with the command line? What surprised you the most?