Xoul – Local Personal Assistant Agent Release (Beta, v0.1.0-beta)

Xoul — An Open-Source AI Agent That Runs Locally

Introducing Xoul, a personal assistant agent powered by local LLMs and virtual machine isolation.

What Is Xoul

Xoul is a personal AI agent. It’s not a chatbot — it manages files, sends emails, browses the web, and runs code at the OS level. All actions run inside a QEMU virtual machine, keeping the host system untouched. When using a local LLM, personal data never leaves the machine.

Key Features

  • 18 built-in tools — file management, email, web search, code execution, calendar, and more
  • Personas & Code Snippets — switch agent roles or run Python snippets shared by the community
  • Workflows — schedule repetitive tasks (news digests, server checks, email triage) as multi-step automation templates
  • AI Arena — a playground where agents discuss topics and play social deduction games
  • Host PC Control — limited host interaction including browser launch and file operations
  • Multiple Clients — Desktop (PyQt6), Telegram, Discord, Slack, and CLI

Architecture

The Xoul agent runs inside a QEMU virtual machine. LLM inference is handled locally on the GPU via Ollama, while the desktop app serves as the host-side UI. VM isolation ensures the host system stays safe regardless of what the agent does.

Beyond local LLMs, Xoul also supports commercial APIs (Claude, GPT-5, Gemini, DeepSeek, Grok, Mistral) and external OpenAI-compatible servers (vLLM, LM Studio, etc.).

Supported Models

For local execution, models are automatically recommended based on available VRAM:

| Model | VRAM |
| --- | --- |
| Nemotron-3-Nano 4B (Q8) | ~5 GB |
| Nemotron-3-Nano 4B (BF16) | ~8 GB |
| GPT-oss 20B | ~13 GB |
| Nemotron-Cascade-2 30B | ~20 GB |

BGE-M3 (embedding) and Qwen 2.5 3B (summarization, CPU-only) are also installed automatically.
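
The VRAM-based recommendation can be pictured as a simple lookup. The sketch below is only an illustration: the VRAM figures come from the table above, but the function name `recommendModel` and the "largest model that fits" rule are assumptions, not Xoul's actual selection logic.

```javascript
// VRAM figures copied from the table above, sorted from smallest to largest.
const MODELS = [
  { name: 'Nemotron-3-Nano 4B (Q8)', vramGb: 5 },
  { name: 'Nemotron-3-Nano 4B (BF16)', vramGb: 8 },
  { name: 'GPT-oss 20B', vramGb: 13 },
  { name: 'Nemotron-Cascade-2 30B', vramGb: 20 },
];

// Hypothetical helper: return the largest listed model whose approximate
// VRAM need fits, or null when even the smallest one does not fit.
function recommendModel(availableVramGb) {
  const fitting = MODELS.filter((m) => m.vramGb <= availableVramGb);
  return fitting.length > 0 ? fitting[fitting.length - 1].name : null;
}

console.log(recommendModel(16)); // GPT-oss 20B
console.log(recommendModel(4));  // null
```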

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| CPU | x86-64, 8 cores | |
| RAM | 8 GB | 16 GB+ |
| GPU | NVIDIA 30-series, 8 GB VRAM | NVIDIA 40-series, 16 GB+ VRAM |
| OS | Windows 11 (10 experimental) | |
| Disk | 20 GB free | |

Installation

Quick Start

  1. Download the release file
  2. Extract xoul_rel.zip
  3. Run install.bat inside the extracted folder

install.bat handles file placement, dependency installation, and configuration automatically. Python 3.12, Ollama, and QEMU are installed as needed. An interactive setup walks through language selection, LLM model, VM configuration, user profile, and optional service integrations (Gmail, Tavily, Telegram, etc.).

Install from Source

git clone https://github.com/xoul-project/xoul.git
cd xoul
.\scripts\setup_env.ps1

Once setup completes, the Desktop App launches automatically. After that, you can start it with c:\xoul\desktop\xoul.bat.

Community

Through Xoul Store, you can import workflows, personas, and code snippets created by other users with one click. You can also publish your own.

License

Released under the MIT License.

Links

  • Website: https://www.xoulai.net/
  • GitHub: https://github.com/xoul-project/xoul
  • Discussions: https://github.com/xoul-project/xoul/discussions

Kodee’s Kotlin Roundup: Kotlin 2.3.20, Interview With Josh Long, and More

March was a busy month for Kotlin, with a new language release, fresh tooling, ecosystem updates, and plenty of inspiration ahead of KotlinConf’26. From practical improvements to exciting steps in AI and multiplatform, there’s a lot worth exploring. Here are the stories that stood out to me most.

Where you can learn more

  • Workshops – KotlinConf 2026, May 20–22, Munich
  • Spring AI Kotlin Tutorials – Build AI-Powered Applications
  • Google Summer of Code 2026 Is Here: Contribute to Kotlin
  • Elevating AI-Assisted Android Development and Improving LLMs With Android Bench

YouTube highlights

  • Explicit Backing Fields are experimental in Kotlin 2.3
  • Kotlin Devs Diversify: Android Is 25% Now
  • How Major Metros Run on Kotlin Multiplatform | Talking Kotlin #145

Amper 0.10 – JDK Provisioning, a Maven Converter, Custom Compiler Plugins, and More

Amper 0.10.0 is out, and it brings a variety of new features, such as JDK provisioning, custom Kotlin compiler plugins, a Maven-to-Amper converter, and numerous IDE improvements! Read on for all of the details, and see the release notes for the full list of changes and bug fixes.

To get support for Amper’s latest features, use IntelliJ IDEA 2025.3.4 or IntelliJ IDEA 2026.1 (or newer). Make sure the latest version of the Amper plugin is installed.

JDK provisioning

Amper needs a JDK (Java Development Kit) in order to perform various tasks in the project: compile Kotlin and Java sources, run tests, run JVM apps, etc.

Our philosophy is that you should be able to run your project without manually installing anything on your machine or having to configure anything. This is why Amper is able to provision a JDK automatically for you – JDK 21 by default. 

However, some projects require specific JDK versions. You can now specify the criteria for the necessary JDK in module.yaml, and Amper will download and install the matching JDK.

settings:
  jvm:
    jdk:
      version: 21 # major version
      distributions: [ zulu, temurin ] # acceptable distributions

Amper also takes the JAVA_HOME environment variable into account, since it is a common way to set the JDK to be used on the machine. You can read more about Amper’s JDK provisioning behavior in the documentation.

Maven converter and Maven plugin compatibility

If you have an existing Maven project, you don’t have to rewrite your build configuration from scratch. This release introduces a semi-automated conversion tool that reads your pom.xml files, including those in multi-module reactor projects, and generates the corresponding project.yaml and module.yaml files for you. To use it, simply run:

./amper tool convert-project

The converter maps your dependencies, BOMs, repositories, publishing coordinates, compiler flags, and other settings to their Amper equivalents. To support using both build systems during the transition, it sets layout: maven-like in every module so that your source directory structure, including src/main/java and src/main/kotlin, stays the same and no files need to be moved.

Well-known Maven plugins such as maven-compiler-plugin and spring-boot-maven-plugin are translated into built-in Amper settings. Other Maven plugins are added to the new mavenPlugins configuration section in module.yaml, and Amper can execute them during the build process through our Maven plugin compatibility layer.

The conversion is best-effort, so some projects may require tweaks afterward. For a full walkthrough and a list of limitations, see the documentation.

Kotlin compiler plugins

This release brings support for third-party Kotlin compiler plugins. Enabling this support is as easy as adding the following to module.yaml:

settings:
  kotlin:
    compilerPlugins:
      - id: org.example.my.plugin
        dependency: org.example:my-plugin:1.0.0
        options:
          myKey1: myValue1
          myKey2: myValue2

See the documentation for examples and how to enable IDE support for custom plugins.

We also added built-in support for the kotlinx.rpc and JsPlainObjects compiler plugins. 

IDE improvements

Reworked UX for running Amper commands

We’ve revisited the UI for creating and editing run configurations in the IDE. New custom views allow you to configure the options for run and test commands in a more convenient way.

Additionally, you can now create a configuration for any Amper command by choosing Amper in the Add New Configuration menu.

If you want to run a command in an ad-hoc way, you can use Run Anything (Ctrl+Ctrl) and prepend your command with amper.

Run gutters for native applications in module.yaml

Native applications (linux/app, macos/app, windows/app) can now be run from the IDE via the gutter.

Better test names in the Test tool window

The @DisplayName and @ParameterizedTest.name JUnit 5 annotations are now respected in the Test tool window when showing the test execution tree.

@ParameterizedTest(name = "Test #{0}")
@DisplayName("My parameterized test")
@ValueSource(ints = [1, 2, 3])
fun parameterized(i: Int) {}

Ktor plugin assistance

If your module has the Ktor server dependency, the module.yaml file provides support for searching and adding plugins via the Add Plugins… inlay.

Alternatively, you can use completion in the Kotlin code, which will add all the necessary dependencies to the module without you even having to touch the module.yaml file.

Support for profiling JVM applications

Note: This feature requires IntelliJ IDEA Ultimate.

The configuration for the run command in jvm/app modules can now be run using IntelliJ IDEA’s support for profilers.

Amper plugin development

The previous release of Amper brought the preview of Amper’s extensibility system. We received a lot of feedback, and we are working on extending the capabilities of plugins. While the ability to publish and share plugins is still a work in progress, a valuable improvement is already available in this release: you can now reference the module settings from the plugin using ${module.settings} in plugin.yaml.

Other improvements

Starting with version 0.10, Amper supports Maven profiles declared in the POM files of transitive dependencies.

In this release, we’ve also introduced the ability to add module descriptions in module.yaml. The description is formatted in Markdown and can occupy multiple lines. This text is used by the ./amper show modules command in the CLI, as well as by the IDE to show information about the module. For libraries, it is also used as a description in published metadata by default.

Updated default versions

We updated some of our default versions for toolchains and frameworks:

  • Kotlin: 2.3.20
  • Android minimum SDK: 23
  • Compose: 1.10.3
  • KSP: 2.3.6
  • Ktor: 3.4.1
  • Spring Boot: 4.0.5

Try Amper 0.10.0

To update an existing project, use the ./amper update command.

To get started with Amper, check out our Getting started guide. Take a look at some examples, follow a tutorial, or read the comprehensive user guide depending on your learning style.

Share your feedback

Amper is still experimental and under active development. You can provide feedback about your experience by joining the discussion in the Kotlinlang Slack’s #amper channel or sharing your suggestions and ideas in a YouTrack issue. Your input and use cases help shape the future of Amper!

Profile .NET Apps Without Restarting: Monitoring Comes to ReSharper

Tracking down performance bottlenecks in Visual Studio often means interrupting your workflow, restarting your application in profiling mode, and hoping you can reliably reproduce the exact issue. We think there’s a better way.

If you have already used Monitoring in Rider, this experience will feel familiar. Now, the same Monitoring experience is available in ReSharper, bringing real-time performance insights directly inside Visual Studio.

Try it in ReSharper 2026.1

Curious about what else is included in this update? Head over to the What’s New in ReSharper page to explore all the other improvements that have landed in the latest release.

What is Monitoring?

When you run or debug your app, ReSharper automatically opens the Monitoring tool window and shows you what’s happening in real time: CPU and memory usage, GC activity, counters, metrics, and so on.

The best part: profiling without restart

> Important: This feature requires a dotUltimate license.

The most powerful part of Monitoring is what happens after you notice a problem.

Monitoring does some profiling work in the background, so you can select a time range directly on the chart and open it in the built-in profiler without stopping the application, restarting it in profiling mode, and trying to reproduce the problem. You go straight from “I see a spike here” to “show me the Call Tree for that exact interval”, and can then inspect the collected data in detail: the call tree, call time, and related runtime events.

Automatic issue detection

Monitoring is not limited to charts. It can also automatically detect issues and list them below the timeline. Currently supported issue types include:

  • Performance hotspot
  • ASP.NET Core issues
    • Slow MVC action
    • Slow Razor page handler
    • Slow Razor view component
  • Database issues
    • Slow DB command
    • Excessive DB commands
    • Large DB result set
    • Excessive DB connections

These issues appear as the app runs, so you can catch bottlenecks while they occur rather than waiting until after a profiling session or checking logs.

Analyze issues in place

> Important: This feature requires a dotUltimate license.

The issue list is not just a report. It is also a starting point for investigation.

When Monitoring detects a problem, you can select that issue and analyze the corresponding time range in the built-in profiler. That gives you the same advantage as manual interval selection, but now the interesting intervals are found for you automatically.

Counters, metrics, and environment data

You can also use Monitoring as a live runtime dashboard. It includes tabs for counters, metrics, and environment data. This is especially handy when you want a single place to monitor both low-level runtime behavior and higher-level application signals during local development.

How to enable Monitoring in ReSharper

Monitoring is designed to be available by default. It starts automatically when you run or debug your project. If you want, you can change that behavior and keep Monitoring enabled only for debug sessions or turn it off completely. 

A simpler path from symptom to root cause

What makes Monitoring valuable is not any single chart or issue detector on its own. It is the workflow:

  1. You run the app.
  2. You notice a spike, slowdown, or detected issue.
  3. You select the interesting interval.
  4. You open it in the built-in profiler.
  5. You inspect the call tree and find the cause.

We are happy to bring Monitoring to ReSharper and make this runtime investigation workflow available in Visual Studio as well.

Give it a try in the 2026.1 release, and as always, we would love to hear what works well, what is missing, and what you would like us to improve next.

WCAG 2.2: What Changed, Why It Matters, and How to Implement It

Nine new success criteria. One removed. Here is what every frontend engineer needs to know.

WCAG 2.2 became an official W3C Recommendation on October 5, 2023; the current edition of the Recommendation is dated December 12, 2024. If your team is still targeting 2.1 as a compliance baseline, you are already behind. The W3C explicitly advises using 2.2 to maximize future applicability of accessibility efforts, and regulators in the EU, UK, and US are actively aligning their policies to the latest version.

This article covers every new success criterion using a consistent format: what the spec requires, why the criterion exists and who it protects, and how to implement it in practice.

What Was Removed First: 4.1.1 Parsing

Before the new criteria, one was cut. WCAG 2.2 removed 4.1.1 Parsing, which previously required well-formed HTML so assistive technologies could reliably parse it.

Why removed: Modern browsers and screen readers have become resilient enough to handle malformed markup without accessibility failures. The criterion no longer reliably predicted real-world accessibility outcomes, so the working group dropped it.

Practical note: If your organization is contractually obligated to WCAG 2.0 or 2.1 conformance, you may still need to test and report on 4.1.1 separately. For new 2.2 audits, it is gone.

The 9 New Success Criteria

1. Focus Not Obscured (Minimum) — 2.4.11 — Level AA

What: When a UI component receives keyboard focus, the focused element must not be entirely hidden by author-created content. Partially obscured is acceptable at this level. Entirely hidden is not.

Why: Users who navigate by keyboard (people with motor disabilities, switch device users, power users) need to see where focus is at all times. Sticky headers, floating cookie banners, fixed chat widgets, and bottom navigation bars are the most common offenders. When focus moves behind one of these layers and disappears completely, the user loses their place on the page with no visual cue for what is currently selected. This is especially disorienting for users with cognitive disabilities who are more sensitive to context loss during a task.

How to implement:

The core fix is ensuring scroll-padding-top accounts for any fixed header height so the browser scrolls enough to keep focused elements visible.

/* If your sticky header is 64px tall */
html {
  scroll-padding-top: 80px; /* header height + breathing room */
}

/* Alternatively, scoped to focusable elements */
a:focus,
button:focus,
[tabindex]:focus {
  scroll-margin-top: 80px;
}

For dynamic header heights (collapsing navs, announcement banners that appear after load), update the value from JavaScript:

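function updateScrollPadding() {
  const header = document.querySelector('.sticky-header');
  const height = header?.getBoundingClientRect().height ?? 0;
  document.documentElement.style.scrollPaddingTop = `${height + 16}px`;
}

// ResizeObserver also fires when the header itself changes height
// (collapsing nav, late-loading banner), which a window resize listener misses.
const stickyHeader = document.querySelector('.sticky-header');
if (stickyHeader) {
  new ResizeObserver(updateScrollPadding).observe(stickyHeader);
}
window.addEventListener('resize', updateScrollPadding);
updateScrollPadding();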

Test it: Tab through your page with the sticky header visible. Every focused element should be at least partially visible, never fully hidden behind the header or any other fixed layer.
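
The manual check can be complemented by a small geometry helper. This is a minimal sketch, not an official test procedure: `overlapFraction` and `isEntirelyObscured` are hypothetical names, and the rectangles are assumed to have the shape returned by `getBoundingClientRect()`.

```javascript
// Fraction (0..1) of the target rectangle covered by the overlay rectangle.
// Rects use the { top, left, width, height } fields of a DOMRect.
function overlapFraction(target, overlay) {
  const xOverlap = Math.max(
    0,
    Math.min(target.left + target.width, overlay.left + overlay.width) -
      Math.max(target.left, overlay.left)
  );
  const yOverlap = Math.max(
    0,
    Math.min(target.top + target.height, overlay.top + overlay.height) -
      Math.max(target.top, overlay.top)
  );
  const area = target.width * target.height;
  return area === 0 ? 0 : (xOverlap * yOverlap) / area;
}

// 2.4.11 (AA) is only violated when the focused element is entirely hidden.
function isEntirelyObscured(target, overlay) {
  return overlapFraction(target, overlay) >= 1;
}

const header = { top: 0, left: 0, width: 1280, height: 64 };
const hiddenLink = { top: 20, left: 100, width: 80, height: 24 };
const halfHiddenLink = { top: 52, left: 100, width: 80, height: 24 };

console.log(isEntirelyObscured(hiddenLink, header));  // true
console.log(overlapFraction(halfHiddenLink, header)); // 0.5
```

In a browser you could pass `document.activeElement.getBoundingClientRect()` and the header's rect after each focus change. The stricter Enhanced criterion (2.4.12) would flag any fraction above 0.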

2. Focus Not Obscured (Enhanced) — 2.4.12 — Level AAA

What: Same intent as 2.4.11, but stricter. The focused component must not be obscured at all, not even partially.

Why: At AA (2.4.11), a focused element that is 10% visible technically passes. For users with low vision who rely on high zoom levels or screen magnification, even partial obscuring can make the focus indicator undetectable in practice. The AAA version closes that gap entirely.

How to implement:

Everything from 2.4.11 applies. The additional requirement is that no part of the focused element is covered by overlapping author-created content. In practice this means:

  • scroll-padding values must fully clear the focused element above any sticky layers.
  • Fixed overlays (modals, drawers, sheets) must trap focus inside themselves while open, so keyboard focus can never land on content behind them.

// Trap focus inside an open modal
function trapFocus(modalElement) {
  const focusable = modalElement.querySelectorAll(
    'a, button, input, textarea, select, [tabindex]:not([tabindex="-1"])'
  );
  const first = focusable[0];
  const last = focusable[focusable.length - 1];

  modalElement.addEventListener('keydown', (e) => {
    if (e.key !== 'Tab') return;
    if (e.shiftKey) {
      if (document.activeElement === first) {
        e.preventDefault();
        last.focus();
      }
    } else {
      if (document.activeElement === last) {
        e.preventDefault();
        first.focus();
      }
    }
  });
}

3. Focus Appearance — 2.4.13 — Level AAA

What: When a keyboard focus indicator is visible, it must meet specific size and contrast requirements. The focus indicator area must be at least as large as a 2 CSS pixel thick perimeter of the unfocused component (roughly the perimeter length multiplied by 2 CSS pixels), and it must have a contrast ratio of at least 3:1 between the same pixels in the focused and unfocused states.

Why: Browser default focus outlines are frequently invisible against common backgrounds, and many codebases globally suppress them with outline: none (still a widespread anti-pattern). Users with low vision, cognitive disabilities, and anyone relying entirely on keyboard navigation depend on a focus indicator that is visually obvious, not just technically present. A faint, thin blue ring at low contrast does not serve these users.

How to implement:

The first step is removing the global outline: none pattern. If you need to suppress the browser ring for mouse users, use :focus-visible instead of :focus:

/* Wrong: removes focus ring for everyone, including keyboard users */
*:focus {
  outline: none;
}

/* Right: removes ring only when pointer (not keyboard) is in use */
*:focus:not(:focus-visible) {
  outline: none;
}

/* Custom focus indicator that satisfies AAA geometric and contrast requirements */
*:focus-visible {
  outline: 3px solid #0f62fe;
  outline-offset: 2px;
  border-radius: 2px;
}

A practical shortcut: a 3px solid outline in a color with at least 3:1 contrast against the surrounding background satisfies the geometric requirement for most standard interactive components. For components on dark surfaces, check the contrast of your focus color against the dark background, not just the default page background.
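
To sanity-check a candidate focus style, the numbers can be computed directly. The sketch below uses the WCAG relative-luminance and contrast-ratio formulas; the helper names are hypothetical, only 6-digit hex colors are handled, and the size rule is taken as perimeter length multiplied by 2 CSS pixels for a rectangular component, which is an approximation.

```javascript
// Convert an 8-bit sRGB channel to its linear value (WCAG relative luminance).
function channelToLinear(value8bit) {
  const c = value8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as '#0f62fe'.
function relativeLuminance(hex) {
  const n = parseInt(hex.replace('#', ''), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// WCAG contrast ratio between two colors, always >= 1.
function contrastRatio(hexA, hexB) {
  const la = relativeLuminance(hexA);
  const lb = relativeLuminance(hexB);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Approximate minimum indicator area for a width x height rectangle:
// perimeter length multiplied by 2 CSS pixels.
function minIndicatorArea(width, height) {
  return 2 * (2 * (width + height));
}

// Area of a solid outline drawn outside the component at a given offset.
function outlineArea(width, height, thickness, offset) {
  const outerW = width + 2 * (offset + thickness);
  const outerH = height + 2 * (offset + thickness);
  const innerW = width + 2 * offset;
  const innerH = height + 2 * offset;
  return outerW * outerH - innerW * innerH;
}

console.log(contrastRatio('#000000', '#ffffff').toFixed(2)); // 21.00
console.log(minIndicatorArea(90, 30));  // 480
console.log(outlineArea(90, 30, 3, 2)); // 804
```

For a 90×30 button, the 3px outline with a 2px offset covers 804 square pixels, comfortably above the approximate 480 minimum. Whether the color passes still depends on the pixels it replaces, so run `contrastRatio` against your real backgrounds.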

4. Dragging Movements — 2.5.7 — Level AA

What: Any functionality that uses a dragging movement (click-and-drag, touch drag) must also be achievable with a single pointer action (click or tap) without dragging. Exceptions apply only when the drag is essential to the functionality itself.

Why: Dragging requires simultaneously pressing, holding, and moving a pointer. This compound gesture is unreliable or impossible for users with hand tremors, limited fine motor control, or motor disabilities affecting pointer precision. Sortable lists, kanban boards, sliders, map pan gestures, and date range pickers are common failure cases. The criterion does not prohibit drag interactions. It requires that a non-drag path exists to accomplish the same result.

How to implement:

For sortable lists, provide explicit move buttons alongside the drag handle:

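function SortableItem({ item, onMoveUp, onMoveDown, onDragStart, onDragEnd }) {
  return (
    <div draggable onDragStart={onDragStart} onDragEnd={onDragEnd}>
      <span>{item.label}</span>
      {/* Single-click alternatives to dragging; the arrow glyphs are placeholder labels */}
      <button type="button" aria-label={`Move ${item.label} up`} onClick={onMoveUp}>↑</button>
      <button type="button" aria-label={`Move ${item.label} down`} onClick={onMoveDown}>↓</button>
    </div>
  );
}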

For range sliders, use native <input type="range"> wherever possible. It supports arrow key adjustment out of the box. Custom slider implementations frequently break keyboard support:

<input
  type="range"
  min={0}
  max={100}
  value={value}
  onChange={(e) => setValue(Number(e.target.value))}
  aria-label="Price range maximum"
/>

For map or canvas drag interactions, provide explicit pan controls: arrow-key panning and clickable pan buttons in the UI.
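
The move buttons in a sortable list need a reorder function behind them. A minimal sketch follows, assuming a hypothetical `moveItem` helper that the `onMoveUp` and `onMoveDown` handlers could call with a delta of -1 or +1. It returns a new array, so it fits React state updates.

```javascript
// Return a copy of `items` with the element at `index` shifted by `delta`
// positions. Out-of-range moves return the original array unchanged.
function moveItem(items, index, delta) {
  const target = index + delta;
  if (index < 0 || index >= items.length || target < 0 || target >= items.length) {
    return items;
  }
  const next = items.slice();
  const [moved] = next.splice(index, 1);
  next.splice(target, 0, moved);
  return next;
}

console.log(moveItem(['a', 'b', 'c'], 0, 1));  // [ 'b', 'a', 'c' ]
console.log(moveItem(['a', 'b', 'c'], 0, -1)); // [ 'a', 'b', 'c' ]
```

The same function can back the drop handler of the drag interaction, so both input paths produce identical results.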

5. Target Size (Minimum) — 2.5.8 — Level AA

What: The size of the pointer target for interactive elements must be at least 24×24 CSS pixels. Exceptions apply when: the target’s offset from adjacent targets is at least 24px, the target is inline within text content, the browser controls the target size (default form controls), or a small size is essential to the information conveyed.

Why: Small tap targets fail users with tremors, limited dexterity, or motor disabilities who use alternative pointer devices with reduced precision. Tightly packed icon buttons, small checkboxes, link-dense navigation menus, and close buttons in notification toasts are the most common failure patterns. Note that 24x24px is the AA minimum. The AAA version (2.5.5, carried forward from 2.1) requires 44x44px. Most mobile UX guidelines already recommend 44px. WCAG 2.2 establishes the legal floor.

How to implement:

Set a baseline minimum for all interactive elements:

button,
a,
[role="button"],
input[type="checkbox"],
input[type="radio"] {
  min-width: 24px;
  min-height: 24px;
}

/* Prefer the AAA-level 44x44 on touch interfaces */
@media (pointer: coarse) {
  button,
  a,
  [role="button"] {
    min-width: 44px;
    min-height: 44px;
  }
}

For icon-only buttons where the visual size is constrained by design, expand the hit area using padding while keeping the visible footprint the same:

.icon-button {
  padding: 10px; /* expands hit area to 44x44 if icon is 24x24 */
  display: inline-flex;
  align-items: center;
  justify-content: center;
}

The spacing exception is a legitimate tool for constrained layouts. If two 16×16 icons are spaced so their center-to-center distance is 24px or more, they satisfy the minimum even without being 24×24 in physical size. Use it as a fallback, not a design default.
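
The size rule and the spacing exception can be checked with a little geometry. This is a simplified sketch: `passesTargetSize` is a hypothetical helper, rectangles follow the `getBoundingClientRect()` shape, and the other exceptions (inline, user-agent controlled, essential) are not modeled. For the spacing exception it centers a 24px-diameter circle on each undersized target and checks that the circle does not intersect another target or another undersized target's circle.

```javascript
const MIN_SIZE = 24; // CSS pixels

function isUndersized(rect) {
  return rect.width < MIN_SIZE || rect.height < MIN_SIZE;
}

function centerOf(rect) {
  return { x: rect.left + rect.width / 2, y: rect.top + rect.height / 2 };
}

// Shortest distance from a point to a rectangle (0 when the point is inside).
function distancePointToRect(point, rect) {
  const dx = Math.max(rect.left - point.x, 0, point.x - (rect.left + rect.width));
  const dy = Math.max(rect.top - point.y, 0, point.y - (rect.top + rect.height));
  return Math.hypot(dx, dy);
}

// True when the target is at least 24x24, or qualifies for the spacing exception.
function passesTargetSize(target, otherTargets) {
  if (!isUndersized(target)) return true;
  const c = centerOf(target);
  return otherTargets.every((other) => {
    // The 24px circle around this target must not reach into another target.
    if (distancePointToRect(c, other) < MIN_SIZE / 2) return false;
    // Nor may it intersect the circle of another undersized target.
    if (isUndersized(other)) {
      const oc = centerOf(other);
      if (Math.hypot(c.x - oc.x, c.y - oc.y) < MIN_SIZE) return false;
    }
    return true;
  });
}

const icon = { left: 0, top: 0, width: 16, height: 16 };
const spacedIcon = { left: 24, top: 0, width: 16, height: 16 }; // centers 24px apart
const tightIcon = { left: 20, top: 0, width: 16, height: 16 };  // centers 20px apart

console.log(passesTargetSize(icon, [spacedIcon])); // true
console.log(passesTargetSize(icon, [tightIcon]));  // false
```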

6. Consistent Help — 3.2.6 — Level A

What: If a web page provides a help mechanism (human contact, self-help documentation, automated contact, or a contact form), that mechanism must appear in the same relative location, meaning in the same order relative to other page content, across all pages within the site, unless the user initiates a change.

Why: Users with cognitive disabilities often need help completing tasks and struggle when support resources appear in different places on different pages. If the help icon is in the top-right corner on the homepage but shifts to the footer on the checkout page, the inconsistency creates friction precisely when the user is most likely to need assistance. This criterion does not require that you have a help mechanism. It only requires that if you do, its location is stable.

How to implement:

This is primarily a layout and design system decision. Anchor help mechanisms inside a shared layout component so they cannot drift between pages:

// Layout.jsx
export function Layout({ children }) {
  return (
    <>
      <GlobalHeader />     {/* help trigger lives here, always */}
      <main>{children}</main>
      <GlobalFooter />
    </>
  );
}

Avoid conditionally hiding the help trigger on specific page types. If suppression is unavoidable (full-screen checkout flows, immersive experiences), make sure the mechanism reappears in the same location once normal layout resumes.

“Same relative location” means the same area of the page (top-right, bottom-right, etc.), not exact pixel coordinates. Responsive layouts that shift the help button between breakpoints are acceptable as long as it is consistently placed within each breakpoint’s layout pattern.

7. Redundant Entry — 3.3.7 — Level A

What: Information that a user has already provided in a multi-step process must either be auto-populated in subsequent steps or be selectable from previously entered values. Users must not be required to re-enter the same information within the same session unless re-entry is essential (e.g., password confirmation for security) or the information is no longer valid.

Why: Re-entering data is a significant cognitive and motor burden. For users with cognitive disabilities, being asked to retype a name or address they entered three steps ago interrupts task flow, increases error likelihood, and often causes abandonment. For users with motor disabilities, every additional keystroke carries a real physical cost. This criterion formalizes what good UX already recommends: do not ask for something you already have.

How to implement:

In a multi-step React form, store session state at a high level and pre-populate later steps:

// FormContext.jsx
const FormContext = React.createContext({});

export function FormProvider({ children }) {
  const [formData, setFormData] = React.useState({});

  const updateFormData = (values) => {
    setFormData((prev) => ({ ...prev, ...values }));
  };

  return (
    <FormContext.Provider value={{ formData, updateFormData }}>
      {children}
    </FormContext.Provider>
  );
}

// Step 3: Shipping -- pre-populate from billing when same
function ShippingStep() {
  const { formData, updateFormData } = React.useContext(FormContext);
  const [sameAsBilling, setSameAsBilling] = React.useState(false);

  const address = sameAsBilling
    ? formData.billingAddress
    : formData.shippingAddress;

  return (
    <>
      <label>
        <input
          type="checkbox"
          checked={sameAsBilling}
          onChange={(e) => setSameAsBilling(e.target.checked)}
        />
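        Same as billing address
      </label>
      {/* AddressFields is assumed to call onChange with the full address object. */}
      {/* The key remounts the fields so defaultValues apply after toggling. */}
      <AddressFields
        key={sameAsBilling ? 'billing' : 'shipping'}
        defaultValues={address}
        onChange={(values) => updateFormData({ shippingAddress: values })}
      />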
    </>
  );
}

The “same as billing” pattern already present in most e-commerce checkouts is a textbook 3.3.7 implementation. Apply the same logic to any multi-step flow where information asked in step N could have been collected in step N-1 or earlier.

8. Accessible Authentication (Minimum) — 3.3.8 — Level AA

What: A cognitive function test (memorizing a password, solving a puzzle, transcribing characters) must not be required at any step of an authentication process unless: an alternative authentication method is available that does not require a cognitive function test, a mechanism is available to help complete the test (such as copy-paste support or a password manager), or the test involves recognizing objects or personal content the user themselves provided.

Why: Password recall is itself a cognitive function test. Many users with cognitive disabilities, memory impairments, or learning disabilities cannot reliably memorize and recall complex passwords on demand. CAPTCHAs add another cognitive or visual puzzle on top of that. This criterion protects access to the authentication layer itself, which is a prerequisite for using everything else on the platform.

How to implement:

The highest-impact single change: allow paste into password fields and respect autocomplete attributes. Blocking paste breaks password managers and forces manual re-entry.

// Wrong: blocks paste, breaks password managers
<input
  type="password"
  onPaste={(e) => e.preventDefault()}
/>

// Right: paste allowed, autocomplete declared
<input
  type="password"
  autoComplete="current-password"
/>

Use the correct autocomplete values for the browser and password managers to fill credentials automatically:

<input type="email" autocomplete="username" />
<input type="password" autocomplete="current-password" />
<input type="password" autocomplete="new-password" /> <!-- registration -->

Additional paths to compliance:

  • Offer magic link login (no password to recall).
  • Support passkeys as an alternative.
  • If you use a CAPTCHA, provide an audio alternative and a non-CAPTCHA path for users who cannot complete visual challenges.

The object recognition and personal content exceptions cover two kinds of test: recognizing objects in images, and identifying content the users provided themselves (security images, personal photos). Personal content is permitted because the cognitive anchor is personal memory, not abstract recall.

9. Accessible Authentication (Enhanced) — 3.3.9 — Level AAA

What: Same as 3.3.8, but removes the object recognition and personal content exceptions. A cognitive function test is permitted in the authentication flow only when an alternative method or a supporting mechanism (such as password manager support) is available.

Why: Even object recognition and personal image selection require a level of memory and visual processing that users with severe cognitive or visual disabilities may not be able to reliably perform. The AAA version removes those two exceptions, leaving only the alternative-method and mechanism exceptions.

How to implement:

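The most robust paths at AAA are authentication methods that require no cognitive recall at all (password fields that support paste and password managers also conform through the mechanism exception):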

Passkeys (WebAuthn/FIDO2): device-level biometric or PIN-based auth. No password to memorize or recall.

// Passkey authentication
const assertion = await navigator.credentials.get({
  publicKey: {
    challenge: serverGeneratedChallenge,
    allowCredentials: [{ type: 'public-key', id: existingCredentialId }],
    userVerification: 'preferred',
  },
});
// Send assertion to server for verification
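
One practical detail: WebAuthn expects `challenge` and each credential `id` as binary buffers, not strings. The sketch below assumes the server delivers them base64url-encoded, which is a common convention but not something the snippet above states; `base64UrlToBytes` is a hypothetical helper name.

```javascript
// Decode a base64url string (no padding, '-' and '_' alphabet) into bytes.
// Relies on the global atob(), available in browsers and in Node.js 16+.
function base64UrlToBytes(value) {
  const base64 = value.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

console.log(Array.from(base64UrlToBytes('AQID'))); // [ 1, 2, 3 ]
```

With such a helper, the values passed as `challenge` and `id` in the passkey call could be produced by `base64UrlToBytes(...)` from whatever your server sends.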

Magic links: one-time login URLs delivered to a verified email or phone. The user clicks a link in their inbox. No password involved.

SSO delegation: the authentication burden is delegated to a trusted identity provider. The provider’s own authentication is outside your conformance boundary.

AAA is not required for most products, but passkeys are rapidly becoming the industry default regardless of compliance requirements. Implementing them satisfies the accessibility requirement and the general trend toward passwordless authentication simultaneously.

Audit Checklist for WCAG 2.2 Compliance

If you are auditing an existing product, prioritize in this order:

Level A (minimum baseline)

  • [ ] Help mechanisms appear in a consistent location across all pages (3.2.6)
  • [ ] Multi-step forms do not re-ask for information already collected in the session (3.3.7)

Level AA (legal and enterprise standard)

  • [ ] No focused element is entirely hidden by sticky headers, footers, or overlays (2.4.11)
  • [ ] All interactive targets are at least 24×24 CSS pixels or have adequate spacing (2.5.8)
  • [ ] Every drag interaction has a single-pointer alternative (2.5.7)
  • [ ] Password fields allow paste and declare correct autocomplete attributes (3.3.8)
  • [ ] No authentication step requires a cognitive function test without an accessible alternative (3.3.8)
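For the target size item (2.5.8), a first pass can be automated. The sketch below is a rough filter, assuming you have already collected each target's rendered size in CSS pixels (for example from `getBoundingClientRect()`); the `findUndersizedTargets` name and the input shape are illustrative. It only flags candidates: the spacing exception still needs a manual check.

```javascript
const MIN_TARGET_PX = 24;

// rects: [{ name, width, height }] measured in CSS pixels.
// Returns the names of targets smaller than the minimum in either dimension.
function findUndersizedTargets(rects, min = MIN_TARGET_PX) {
  return rects
    .filter(({ width, height }) => width < min || height < min)
    .map(({ name }) => name);
}

const flagged = findUndersizedTargets([
  { name: 'close-icon', width: 16, height: 16 },
  { name: 'submit', width: 120, height: 40 },
  { name: 'inline-link', width: 60, height: 18 },
]);
console.log(flagged); // [ 'close-icon', 'inline-link' ]
```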

Level AAA (aspirational or contractual)

  • [ ] No focused element is partially obscured by author-created overlays (2.4.12)
  • [ ] Focus indicators meet minimum size and 3:1 contrast requirements (2.4.13)
  • [ ] Authentication requires no cognitive function tests of any kind (3.3.9)

The Bigger Picture

WCAG 2.2’s additions are tightly scoped around three user groups: people with cognitive or learning disabilities, users with low vision, and users on mobile and touch devices. Every new criterion maps to a failure mode that real products ship regularly: password fields that block paste, drag interactions with no keyboard fallback, sticky headers that swallow focused elements, icon buttons too small to tap precisely.

None of these fixes are expensive once you know what to look for. The authentication changes are often one autocomplete attribute away. The target size and focus visibility issues are a few lines of CSS. The redundant entry problem is a state management question you have probably already partially solved elsewhere in your codebase.

The investment is low. The user impact is not.

Questions about implementing any of these criteria? Drop them in the comments.

Mastering the Orchestration Pattern in React: Taming Complex Component Logic

TL;DR: The Orchestration Pattern is a powerful way to manage complex interactions between components, API calls, and state updates in React. Instead of letting logic scatter across dozens of useEffect hooks and event handlers, you centralize it into a dedicated “orchestrator” component or hook. This approach makes your code more predictable, testable, and maintainable—especially in enterprise applications with complex workflows.

The Problem: When React Components Become Spaghetti

Let’s be honest. We’ve all been there. You start building a feature—say, a multi-step checkout form. Initially, it’s simple. A few inputs, a submit button.

But then requirements grow:

  • “We need to validate the address against a third-party API.”
  • “If the user is a returning customer, pre-fetch their saved payment methods.”
  • “Apply discount codes, but only after shipping is calculated.”
  • “If payment fails, show a specific error and roll back the shipping selection.”

Suddenly, your component looks like this:

const Checkout = () => {
  const [step, setStep] = useState(1);
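  const [cart, setCart] = useState(null);
  const [address, setAddress] = useState(null);
  const [shipping, setShipping] = useState(null);
  const [payment, setPayment] = useState(null);
  const [discountCode, setDiscountCode] = useState('');
  const [discount, setDiscount] = useState(null);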
  const [errors, setErrors] = useState({});
  const [loading, setLoading] = useState(false);

  useEffect(() => {
    // Fetch cart on mount
  }, []);

  useEffect(() => {
    // Recalculate shipping when address changes
  }, [address]);

  useEffect(() => {
    // Apply discount when cart or code changes
  }, [discountCode, cart]);

  const handlePayment = async () => {
    // Complex logic with multiple steps and error handling
  };

  // 300+ more lines of imperative, hard-to-follow code...
};

This is imperative spaghetti. The “what” (user wants to checkout) is buried in the “how” (fetch this, update that, call this API, show this error). It’s hard to test, hard to debug, and even harder for new team members to understand.

Enter the Orchestration Pattern.

What Is the Orchestration Pattern in React?

Inspired by backend microservices architecture (where an orchestrator coordinates multiple services), the Orchestration Pattern in React applies the same principle: centralize complex workflow logic into a single coordinator.

Think of it like a movie director:

  • Orchestrator (Director): Knows the script. Calls “Action!” to the camera team, tells the actor when to enter, signals the lighting crew.
  • Components/APIs (Actors/Crew): Do one thing well. They don’t know the full script—they just respond to commands.

In React terms, the orchestrator manages:

  • The sequence of operations (API call A, then B, then C)
  • Branching logic (if response X, do Y; else do Z)
  • Error handling and compensation (if step 3 fails, roll back step 2)
  • State transitions (loading → success → error)
  • Side effect coordination (avoiding race conditions)
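To make the sequencing and compensation ideas concrete, here is a framework-free sketch. The `runWithCompensation` helper and the `{ name, run, undo }` step shape are assumptions made for illustration, not an established API. Steps run in order, and if one throws, the steps that already completed are undone in reverse.

```javascript
// Run steps in order; if one throws, undo the completed ones in reverse order.
// Each step is { name, run(context), undo?(context) } (hypothetical shape).
async function runWithCompensation(steps, context = {}) {
  const completed = [];
  for (const step of steps) {
    try {
      // A step may return an object whose fields are merged into the context.
      Object.assign(context, await step.run(context));
      completed.push(step);
    } catch (error) {
      for (const done of completed.reverse()) {
        if (done.undo) await done.undo(context);
      }
      return { ok: false, failedAt: step.name, error, context };
    }
  }
  return { ok: true, context };
}
```

A checkout could pass steps such as "reserve shipping" and "charge card", where the first step's `undo` releases the reservation if the charge fails. The same control flow is what an orchestrator hook expresses with dispatches and state transitions.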

A Simple Orchestration Pattern Implementation

Let’s refactor the checkout example using a custom hook as our orchestrator.

Step 1: Define the Orchestrator Hook

// hooks/useCheckoutOrchestrator.js
import { useReducer, useCallback } from 'react';
import { validateAddress } from '../api/address';
import { calculateShipping } from '../api/shipping';
import { applyDiscount } from '../api/discount';
import { processPayment } from '../api/payment';

// State machine for the checkout process
const initialState = {
  status: 'idle', // idle, validating, calculating, paying, success, error
  step: 1, // 1 = address, 2 = shipping and discount, 3 = payment, 4 = confirmation
  cart: null,
  address: null,
  shipping: null, // shipping options returned by calculateShipping
  selectedShipping: null, // the option the user picked
  discount: null,
  paymentResult: null,
  error: null,
};

function checkoutReducer(state, action) {
  switch (action.type) {
    case 'SET_CART':
      return { ...state, cart: action.payload };
    case 'SET_ADDRESS':
      return { ...state, address: action.payload };
    case 'VALIDATION_START':
      return { ...state, status: 'validating', error: null };
    case 'VALIDATION_SUCCESS':
      // Keep the validated address; the step advances once shipping options arrive
      return { ...state, address: action.payload };
    case 'SHIPPING_START':
      return { ...state, status: 'calculating', error: null };
    case 'SHIPPING_SUCCESS':
      // New options invalidate any earlier selection
      return { ...state, status: 'idle', shipping: action.payload, selectedShipping: null, step: 2 };
    case 'SELECT_SHIPPING':
      return { ...state, selectedShipping: action.payload };
    case 'SET_DISCOUNT':
      return { ...state, discount: action.payload };
    case 'GO_TO_PAYMENT':
      return { ...state, step: 3 };
    case 'PAYMENT_START':
      return { ...state, status: 'paying', error: null };
    case 'PAYMENT_SUCCESS':
      return { ...state, status: 'success', paymentResult: action.payload, step: 4 };
    case 'VALIDATION_ERROR':
    case 'PAYMENT_ERROR':
    case 'SET_ERROR':
      // Every failure lands in the same error state; the step is left unchanged
      return { ...state, status: 'error', error: action.payload };
    case 'CLEAR_ERROR':
      return { ...state, status: 'idle', error: null };
    case 'RESET':
      return initialState;
    default:
      return state;
  }
}

export function useCheckoutOrchestrator() {
  const [state, dispatch] = useReducer(checkoutReducer, initialState);

  const setCart = useCallback((cart) => {
    dispatch({ type: 'SET_CART', payload: cart });
  }, []);

  const setAddress = useCallback((address) => {
    dispatch({ type: 'SET_ADDRESS', payload: address });
  }, []);

  const selectShipping = useCallback((option) => {
    dispatch({ type: 'SELECT_SHIPPING', payload: option });
  }, []);

  const goToPayment = useCallback(() => {
    dispatch({ type: 'GO_TO_PAYMENT' });
  }, []);

  const clearError = useCallback(() => {
    dispatch({ type: 'CLEAR_ERROR' });
  }, []);

  // The orchestrator's main workflow
  const validateAndProceed = useCallback(async (address) => {
    dispatch({ type: 'VALIDATION_START' });

    try {
      // Step 1: Validate address
      const isValid = await validateAddress(address);
      if (!isValid) {
        throw new Error('Invalid address format');
      }
      dispatch({ type: 'VALIDATION_SUCCESS', payload: address });

      // Step 2: Calculate shipping based on validated address
      dispatch({ type: 'SHIPPING_START' });
      const shippingOptions = await calculateShipping(address, state.cart);
      dispatch({ type: 'SHIPPING_SUCCESS', payload: shippingOptions });

    } catch (error) {
      dispatch({ type: 'VALIDATION_ERROR', payload: error.message });
    }
  }, [state.cart]);

  const applyDiscountCode = useCallback(async (code) => {
    if (!state.cart) return;

    try {
      const discount = await applyDiscount(code, state.cart);
      dispatch({ type: 'SET_DISCOUNT', payload: discount });
      // Recalculate shipping with discount applied
      dispatch({ type: 'SHIPPING_START' });
      const updatedShipping = await calculateShipping(state.address, state.cart, discount);
      dispatch({ type: 'SHIPPING_SUCCESS', payload: updatedShipping });
    } catch (error) {
      dispatch({ type: 'SET_ERROR', payload: error.message });
    }
  }, [state.cart, state.address]);

  const processPaymentAndComplete = useCallback(async (paymentDetails) => {
    dispatch({ type: 'PAYMENT_START' });

    try {
      const result = await processPayment({
        cart: state.cart,
        shipping: state.selectedShipping,
        discount: state.discount,
        paymentDetails,
      });

      // status becomes 'success', so the consumer can render a confirmation
      dispatch({ type: 'PAYMENT_SUCCESS', payload: result });
      return result;

    } catch (error) {
      dispatch({ type: 'PAYMENT_ERROR', payload: error.message });

      // Compensation logic: if payment fails, the shipping selection remains
      // and the step stays on payment so the user can retry
      throw error;
    }
  }, [state.cart, state.selectedShipping, state.discount]);

  const reset = useCallback(() => {
    dispatch({ type: 'RESET' });
  }, []);

  return {
    // State
    status: state.status,
    step: state.step,
    cart: state.cart,
    shipping: state.shipping,
    selectedShipping: state.selectedShipping,
    discount: state.discount,
    error: state.error,
    paymentResult: state.paymentResult,

    // Actions (the public API of our orchestrator)
    setCart,
    setAddress,
    selectShipping,
    goToPayment,
    validateAndProceed,
    applyDiscountCode,
    processPaymentAndComplete,
    clearError,
    reset,
  };
}

Step 2: Consume the Orchestrator in Components

Now your components become “dumb” presentational components that simply call the orchestrator’s methods:

// CheckoutPage.jsx
import { useCheckoutOrchestrator } from '../hooks/useCheckoutOrchestrator';
import { AddressForm } from './AddressForm';
import { DiscountInput } from './DiscountInput';
import { ShippingSelector } from './ShippingSelector';
import { PaymentForm } from './PaymentForm';
import { OrderConfirmation } from './OrderConfirmation';
import { LoadingSpinner } from './LoadingSpinner';
import { ErrorAlert } from './ErrorAlert';
// Assumed helper (hypothetical path): sums the cart, shipping, and discount
import { calculateTotal } from '../utils/pricing';

export const CheckoutPage = () => {
  // The cart is assumed to be loaded elsewhere and passed in via setCart
  const {
    status,
    step,
    cart,
    shipping,
    selectedShipping,
    discount,
    error,
    paymentResult,
    validateAndProceed,
    applyDiscountCode,
    selectShipping,
    goToPayment,
    processPaymentAndComplete,
    clearError,
    reset,
  } = useCheckoutOrchestrator();

  // Components don't need to know the complex flow!
  // They just call the orchestrator's methods.

  const handleAddressSubmit = async (addressData) => {
    await validateAndProceed(addressData);
  };

  const handleDiscountApply = async (code) => {
    await applyDiscountCode(code);
  };

  const handlePaymentSubmit = async (paymentDetails) => {
    try {
      await processPaymentAndComplete(paymentDetails);
      // On success, status becomes 'success' and the confirmation renders below
    } catch (err) {
      // Error is already in state, but we can show a toast if needed
    }
  };

  const isBusy = ['validating', 'calculating', 'paying'].includes(status);

  if (status === 'success') {
    return <OrderConfirmation order={paymentResult} onNewOrder={reset} />;
  }

  return (
    <div className="checkout">
      {error && <ErrorAlert message={error} onDismiss={clearError} />}

      {isBusy ? (
        <LoadingSpinner message="Processing your order..." />
      ) : (
        <>
          {step === 1 && (
            <AddressForm onSubmit={handleAddressSubmit} />
          )}

          {step === 2 && (
            <>
              <DiscountInput onApply={handleDiscountApply} />
              <ShippingSelector
                options={shipping}
                onSelect={selectShipping}
              />
              <button onClick={goToPayment} disabled={!selectedShipping}>
                Continue to Payment
              </button>
            </>
          )}

          {step === 3 && (
            <PaymentForm
              total={calculateTotal(cart, selectedShipping, discount)}
              onSubmit={handlePaymentSubmit}
            />
          )}
        </>
      )}
    </div>
  );
};

Benefits of the Orchestration Pattern

1. Separation of Concerns

  • Components focus on presentation and user interactions
  • Orchestrator handles the “how” and “when”
  • API layers handle raw data fetching

2. Testability

Test the orchestrator in isolation without rendering UI:

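import { renderHook, act } from '@testing-library/react';
import { validateAddress } from '../api/address';
import { useCheckoutOrchestrator } from '../hooks/useCheckoutOrchestrator';

// Auto-mock the module the hook imports, so validateAddress is a jest.fn()
jest.mock('../api/address');

test('checkout flow handles validation failure', async () => {
  // Mock API to fail
  validateAddress.mockRejectedValue(new Error('Invalid'));

  const { result } = renderHook(() => useCheckoutOrchestrator());

  await act(async () => {
    await result.current.validateAndProceed({ street: '123 Main' });
  });

  expect(result.current.status).toBe('error');
  expect(result.current.error).toBe('Invalid');
  expect(result.current.step).toBe(1); // Still on address step
});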

3. Reusability

The same orchestrator can be used across different UI implementations:

  • Mobile checkout screen
  • Desktop checkout modal
  • Admin panel order creation

4. Observability

Centralized logic makes it easy to add logging, analytics, or error tracking:

const validateAndProceed = useCallback(async (address) => {
  analytics.trackEvent('checkout_address_validation_started');

  try {
    // ... validation logic
    analytics.trackEvent('checkout_address_validation_success');
  } catch (error) {
    analytics.trackEvent('checkout_address_validation_failed', { error });
    Sentry.captureException(error);
  }
}, []);
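If several actions need the same instrumentation, the tracking calls can be factored into a wrapper. This is a sketch, assuming a `track(eventName, payload)` function that you supply (for example a thin adapter over your analytics client); `withTracking` is an illustrative name, not a library function.

```javascript
// Wrap an async action so every call reports start, success, or failure.
function withTracking(eventName, action, track) {
  return async (...args) => {
    track(`${eventName}_started`);
    try {
      const result = await action(...args);
      track(`${eventName}_success`);
      return result;
    } catch (error) {
      track(`${eventName}_failed`, { message: error.message });
      throw error; // rethrow so the orchestrator still handles the failure
    }
  };
}
```

Inside the hook, an action such as address validation could then be wrapped once, keeping the workflow code free of repeated analytics calls.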

When to Use Orchestration (and When Not To)

Great Use Cases:

  • Multi-step forms (checkout, onboarding, surveys)
  • Wizard-style workflows (report generation, deployment pipelines)
  • Features with complex dependencies (dashboard with sequential data fetches)
  • Operations requiring rollback/compensation (bank transfers, reservations)

Overkill For:

  • Simple CRUD forms with one API call
  • Independent, isolated components with no coordination needs
  • Small applications where complexity doesn’t justify abstraction

Best Practices

  1. Keep orchestrators stateless where possible — store state in React state or a state machine, not in the orchestrator instance itself.

  2. Use TypeScript — define clear interfaces for your orchestrator’s context and events:

interface CheckoutContext {
  cart: Cart | null;
  address: Address | null;
  shipping: ShippingOption[] | null;
}

type CheckoutEvent = 
  | { type: 'VALIDATE_ADDRESS'; address: Address }
  | { type: 'SELECT_SHIPPING'; method: ShippingMethod }
  | { type: 'PROCESS_PAYMENT'; details: PaymentDetails };

  3. Single responsibility: an orchestrator should coordinate ONE business process. Don’t create a “god orchestrator” that handles checkout, profile updates, and notifications all in one place.

  4. Compose orchestrators: for complex apps, create smaller orchestrators that work together:

function useOrderOrchestrator() {
  const cart = useCartOrchestrator();
  const checkout = useCheckoutOrchestrator();
  const payment = usePaymentOrchestrator();

  // Compose them into a higher-level workflow
}

Conclusion

The Orchestration Pattern transforms React applications from a collection of scattered useEffect hooks and imperative logic into a clean, declarative system. By centralizing workflow coordination, you get:

  • Components that are simple and focused on presentation
  • Orchestrators that clearly express business logic
  • Code that’s easier to test, debug, and maintain

Whether you implement it with custom hooks, XState, or a full workflow engine, the principle remains the same: coordinate complexity in one place, not everywhere.

Have you used the Orchestration Pattern in your React apps? What challenges did you face? Let me know in the comments! 👇

Rebooting a Production VM on Oracle Cloud: A Reference Guide

Commands, explanations, and real output — for engineers who want to understand what’s actually happening, not just copy-paste their way through it.

☁️ Pre-Flight Checklist

Before we taxi down the runway, here’s your flight plan. Keep this handy to navigate your flight path. Welcome aboard the cloud! ☁️

🌥️ Takeoff

  • Prerequisites

⛅️ Cruising Altitude

  • Part 1 — Pre-Reboot Checklist
  • Part 2 — Running the Reboot
  • Part 3 — Post-Reboot Verification
  • Part 4 — Measuring Time to Recovery (TTR)

🌤️ Landing & Taxi

  • Quick Reference: All Commands
  • Troubleshooting Reference

Enjoy your flight! ☁️

There’s a specific kind of anxiety that comes with running sudo reboot on a server with real users on it. You know the system should come back, but “should” feels a lot less reassuring at the moment your SSH session freezes. This guide removes the guesswork. It covers everything from reading your apt upgrade output intelligently, to verifying your stack is healthy after the reboot, to measuring your actual recovery time with real commands and real numbers so that the next time you need to do this, it’s a procedure, not a gamble.

Prerequisites

This guide assumes:

  • Ubuntu 22.04 on an OCI Compute instance (ARM or x86)
  • Docker + Docker Compose managing your services
  • All long-running services configured with restart: always in your docker-compose.yml
  • SSH access to the instance

If restart: always isn’t set on your services, your containers will not come back after a reboot. Check this first.

services:
  backend:
    image: your-backend-image
    restart: always  # ✅ restarts automatically after reboot or crash

  migrations:
    image: your-migrations-image
    # no restart policy  # ✅ correct — this should run once and exit

restart: always tells Docker to relaunch the container whenever it stops — whether from a crash or a full system reboot. The one exception to be deliberate about is one-shot containers like database migrations: they’re designed to run once and exit cleanly, so no restart policy is the right call for those.

Part 1 — Pre-Reboot Checklist

Never reboot without completing this checklist. It takes under two minutes and prevents the most common post-reboot problems.

1.1 Verify no critical process is mid-flight

docker ps

What to look for:

| STATUS | Meaning |
| --- | --- |
| Up 2 days (healthy) | Safe to reboot |
| Up 3 minutes | Something recently restarted; investigate |
| Restarting (1) | Container is crash-looping; fix before rebooting |
| Up 2 hours (unhealthy) | Health check is failing; fix before rebooting |

If everything shows Up [days/weeks] (healthy), you are clear.

Why this matters: If a database migration container is mid-run, or a background job is processing a large task, a reboot will kill it mid-execution. You want to reboot during a quiet moment.
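If you want to script this check, the status table translates into a small classifier. The sketch below is written in JavaScript (Node.js) and assumes you feed it the STATUS column text, for example from `docker ps --format '{{.Status}}'`. The verdict names and the "minutes or less means recently restarted" threshold are illustrative choices.

```javascript
// Map a `docker ps` STATUS string to a pre-reboot verdict.
function classifyStatus(status) {
  if (/^Restarting/.test(status)) return 'fix-first';   // crash loop
  if (/\(unhealthy\)/.test(status)) return 'fix-first'; // failing health check
  const up = status.match(
    /^Up (?:About )?(?:an?|\d+) (second|minute|hour|day|week|month)s?/
  );
  if (!up) return 'investigate'; // Exited, Created, "Up Less than a second", ...
  // Uptime measured in seconds or minutes suggests a recent restart.
  return up[1] === 'second' || up[1] === 'minute' ? 'investigate' : 'ok';
}

for (const s of ['Up 2 days (healthy)', 'Up 3 minutes', 'Restarting (1) 5 seconds ago']) {
  console.log(s, '->', classifyStatus(s));
}
```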

1.2 Validate your Compose configuration

cd ~/your-project
docker compose config

Expected output: Your full resolved docker-compose.yml printed to the terminal, with no errors.

Why this matters: docker compose config resolves all environment variables and validates YAML syntax. If there’s a broken variable reference or a typo in your file, this command catches it now — not after the reboot when containers silently fail to start. A common mistake is editing a .env file or docker-compose.yml and not realising you’ve introduced a syntax error. This is your safety net.

1.3 Read your apt upgrade output

When you run sudo apt update && sudo apt upgrade -y before a reboot, the output tells you exactly what changed on your system. Don’t skip past it.

Here’s a real upgrade output and what each part means:

The following packages will be upgraded:
  containerd.io coreutils docker-ce docker-ce-cli
  docker-ce-rootless-extras docker-compose-plugin docker-model-plugin
  gitlab-runner gitlab-runner-helper-images libnftables1 nftables
  python3-pyasn1

How to read this list:

| Package | What it is | Reboot needed? |
| --- | --- | --- |
| docker-ce, containerd.io, docker-ce-cli | The Docker engine and its runtime | Recommended |
| docker-compose-plugin | The docker compose CLI plugin | No |
| nftables, libnftables1 | Firewall tooling for the Linux kernel's nftables subsystem | Yes |
| coreutils | Fundamental Linux utilities (ls, cp, etc.) | Recommended |
| gitlab-runner, gitlab-runner-helper-images | CI/CD runner agent | Service restarts during upgrade |
| python3-pyasn1 | Python ASN.1 library (used by crypto and TLS tooling) | No |

The rule of thumb: If the upgrade touches anything in the kernel, networking stack, or container runtime — reboot. If it’s only application-level packages — a reboot is optional but never harmful.

1.4 Understand the service restart messages

After apt upgrade, Ubuntu’s needrestart tool prints which services were restarted automatically and which were deferred:

Restarting services...
 systemctl restart irqbalance.service ssh.service rsyslog.service ...

Service restarts being deferred:
 systemctl restart networkd-dispatcher.service
 systemctl restart systemd-logind.service

“Restarting services” — These were restarted immediately. Your SSH connection stayed alive because ssh.service restarts in-place without dropping existing sessions.

“Service restarts being deferred” — These require a full reboot to apply safely. systemd-logind manages user sessions; restarting it mid-session can cause issues, so Ubuntu defers it to the next clean boot.

No containers need to be restarted.

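This line is also part of the needrestart summary, not a message from Docker. It reports that needrestart found no containers that need restarting because of the upgraded libraries. It does not check whether your Docker images are up to date, so it says nothing about whether your application images need a rebuild.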

1.5 Check available disk space

df -h /

Example output:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        48G   12G   36G  23% /

You want at least 20% free on your root partition. Docker image pulls and accumulated log files are the two most common causes of a full disk, which can prevent containers from starting after a reboot.
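The 20% threshold is easy to check in a script. A minimal sketch in JavaScript (Node.js), assuming the two-line `df -h /` output shown above is passed in as a string; `checkRootDisk` is an illustrative name.

```javascript
// Parse the data row of `df -h /` and report whether enough space is free.
function checkRootDisk(dfOutput, minFreePercent = 20) {
  const row = dfOutput.trim().split('\n')[1]; // skip the header line
  const usePercentField = row.trim().split(/\s+/)[4]; // the "Use%" column
  const usedPercent = Number(usePercentField.replace('%', ''));
  const freePercent = 100 - usedPercent;
  return { usedPercent, freePercent, ok: freePercent >= minFreePercent };
}

const sample = [
  'Filesystem      Size  Used Avail Use% Mounted on',
  '/dev/sda1        48G   12G   36G  23% /',
].join('\n');
console.log(checkRootDisk(sample)); // { usedPercent: 23, freePercent: 77, ok: true }
```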

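Tip: Pruning unused Docker build cache can free a lot of space before a reboot. Note that "Total reclaimed space" is the summary line printed by Docker's prune commands (for example docker system prune), not by apt itself, so you will only see it if a prune step is part of your maintenance routine. In a real maintenance run, this printed: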

Total reclaimed space: 4.165GB

Part 2 — Running the Reboot

Once the checklist is complete:

sudo reboot

What happens next, step by step:

  1. The OS sends SIGTERM to all running processes, giving them time to shut down cleanly.
  2. Docker receives the signal and stops all containers gracefully.
  3. The kernel shuts down and the VM restarts.
  4. Your SSH session prints Connection to [ip] closed by remote host. and terminates. This is normal.

How long to wait: OCI ARM instances (Ampere A1) typically reboot in 45–90 seconds. Wait at least 60 seconds before trying to reconnect.

ssh -i ~/.ssh/id_rsa ubuntu@YOUR_IP

Part 3 — Post-Reboot Verification

Run these checks in order. Each one builds on the last.

3.1 Check the Docker daemon

sudo systemctl status docker

Expected output:

 docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled)
     Active: active (running) since Mon 2026-03-30 15:55:51 UTC; 5min ago

Key things to check:

  • Active: active (running) — the daemon is running ✅
  • enabled — it is configured to auto-start on every future boot ✅

If the daemon isn’t running:

sudo systemctl enable docker   # ensure it starts on future reboots
sudo systemctl start docker    # start it now

3.2 Check all containers are up

docker ps

Example output:

CONTAINER ID   IMAGE              COMMAND        CREATED      STATUS                  PORTS    NAMES
fc46f84c7bd5   app-backend        "uv run uvi…"  2 days ago   Up 5 minutes (healthy)  8000/tcp app_backend
a3e9a2eeb160   redis:alpine       "docker-ent…"  2 weeks ago  Up 5 minutes (healthy)  6379/tcp app_redis
f4afe2edb00c   caddy:alpine       "caddy run …"  4 weeks ago  Up 5 minutes (healthy)  80, 443  caddy_proxy

What to check:

  • Every service you expect should be present. If one is missing, it crashed on startup.
  • STATUS should be Up or Up (healthy). (health: starting) is fine for the first 30 seconds after boot.
  • The CREATED timestamp does not reset on reboot — it reflects when the container was first created with docker compose up. This is normal.

If a container is missing or in a restart loop:

docker compose logs [service_name] --tail=50

This shows the last 50 log lines for that specific service, which will usually tell you exactly why it failed.

3.3 Watch the live logs

cd ~/your-project
docker compose logs -f --tail=20

The -f flag follows the log stream in real time. --tail=20 shows the last 20 lines per service as a starting point.

What healthy output looks like:

app_gate    | 127.0.0.1 - - [30/Mar/2026:16:00:00 +0000] "GET / HTTP/1.1" 200 4140
app_backend | INFO: 127.0.0.1:58562 - "GET /health HTTP/1.1" 200 OK
caddy_proxy | {"level":"info","msg":"received request","uri":"/config/"}
app_redis   | * Ready to accept connections tcp

What a transient (non-critical) error looks like:

app_worker | redis.exceptions.ConnectionError: Error while reading
            from redis:6379 : (104, 'Connection reset by peer')
app_worker | 15:56:15: Starting worker for 1 functions: process_message
app_worker | 15:56:15: redis_version=8.6.1 mem_usage=1.38M clients_connected=1

This pattern — an error followed immediately by a successful connection message — is normal during cold starts. When all containers launch simultaneously, a dependent service (like a worker) may attempt its first connection before its dependency (like Redis) has finished initialising. The container retries and connects successfully on the next attempt. This is expected behavior.
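The retry behavior described here can be sketched in a few lines. This is a generic illustration in JavaScript, not the code of the worker shown in the logs; `connectWithRetry`, its defaults, and the injectable `sleep` parameter are assumptions made for the example.

```javascript
// Retry an async connect function, doubling the wait after each failure.
async function connectWithRetry(connect, { retries = 5, baseDelayMs = 200, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      // Back off before the next attempt: 200 ms, 400 ms, 800 ms, ...
      if (attempt < retries) await wait(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError; // still failing after all retries: treat as critical
}
```

A service that wraps its first dependency connection this way produces exactly the log shape above: one or two connection errors during a cold start, then a normal startup message.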

What a critical error looks like:

app_backend | sqlalchemy.exc.OperationalError: connection refused
app_backend | [after 5 retries] giving up

A critical error is one that does not resolve on its own. If you see continuous errors without a recovery line following them, press Ctrl+C and investigate that service.

3.4 Check additional system services

If you run a CI/CD runner or similar agent alongside Docker:

sudo gitlab-runner status

Expected output:

gitlab-runner: Service is running

If it’s not running:

sudo gitlab-runner start

Part 4 — Measuring Time to Recovery (TTR)

TTR is the total time from sudo reboot to the moment your application is serving healthy responses. Measuring it gives you accurate data for maintenance window planning and user communications.

4.1 Measure OS boot time

systemd-analyze

Example output:

Startup finished in 3.617s (kernel) + 19.608s (userspace) = 23.225s
graphical.target reached after 18.845s in userspace

Breaking this down:

| Phase | Time | What’s happening |
| --- | --- | --- |
| Kernel | 3.6s | The Linux kernel loads into memory and initialises hardware drivers |
| Userspace | 19.6s | All systemd services start in parallel (networking, Docker, SSH, etc.) |
| Total | 23.2s | OS is fully booted |

4.2 Find the bottleneck in the boot sequence

systemd-analyze blame | head -20

This lists every service sorted by how long it took to start, slowest first:

12.186s docker.service
 4.821s cloud-init.service
 1.204s snapd.service
   38ms docker.socket

In this case, Docker itself accounted for 12 of the 23 total seconds. This is normal — Docker has to read its state from disk, re-attach networks, and prepare to launch containers.

Why this is useful: If your boot time is unexpectedly long, systemd-analyze blame tells you exactly which service is the bottleneck.

4.3 Find the exact moment containers started

docker inspect --format='{{.Name}}: {{.State.StartedAt}}' $(docker ps -q)

Example output:

/app_ftp_bridge:  2026-03-30T15:55:57.766Z
/app_worker:      2026-03-30T15:55:57.695Z
/app_backend:     2026-03-30T15:55:57.646Z
/app_gate:        2026-03-30T15:55:57.830Z
/app_admin:       2026-03-30T15:55:57.742Z
/app_redis:       2026-03-30T15:55:57.794Z
/caddy_proxy:     2026-03-30T15:55:57.615Z

Every container launched within the same second. This is because Docker starts all containers in parallel as soon as the daemon is ready. Note: this timestamp reflects when Docker launched the container process, not when the application inside it was ready to serve traffic. A container may take a further 5–30 seconds to pass its health check after this point.
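To turn that output into a number, you can compute the spread between the first and last container start. A small sketch in JavaScript (Node.js), assuming the `/name: timestamp` line format produced by the command above; `startSpread` is an illustrative name, and fractional seconds are trimmed to milliseconds before parsing.

```javascript
// Parse "/name: timestamp" lines and report the first and last container to start.
function startSpread(inspectOutput) {
  const starts = inspectOutput.trim().split('\n').map((line) => {
    const [name, timestamp] = line.split(/:\s+/, 2);
    // Docker may print nanoseconds; keep millisecond precision for Date.parse.
    const trimmed = timestamp.trim().replace(/(\.\d{3})\d+/, '$1');
    return { name: name.replace(/^\//, ''), at: Date.parse(trimmed) };
  });
  const sorted = [...starts].sort((a, b) => a.at - b.at);
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  return { first: first.name, last: last.name, spreadMs: last.at - first.at };
}

const sample = [
  '/app_redis:       2026-03-30T15:55:57.794Z',
  '/app_gate:        2026-03-30T15:55:57.830Z',
  '/caddy_proxy:     2026-03-30T15:55:57.615Z',
].join('\n');
console.log(startSpread(sample)); // { first: 'caddy_proxy', last: 'app_gate', spreadMs: 215 }
```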

4.4 Build your full TTR timeline

Combining the data from the above commands:

| Event | Time (relative to reboot) |
| --- | --- |
| sudo reboot executed | T+0s |
| SSH connection closed | T+~5s |
| Kernel boot complete | T+~8s |
| Userspace boot complete (OS ready) | T+~28s |
| Docker daemon ready | T+~28s (12s of the userspace phase) |
| All containers launched | T+~28s |
| Redis accepting connections | T+~30s |
| Backend /health returning 200 | T+~35s |
| All health checks passing | T+~55s |
| Total TTR | ~55–60 seconds |

4.5 Use TTR to plan user communications

With a measured TTR, you can set honest expectations.

Internal / engineering team:

“Maintenance reboot at [time]. Expected downtime: ~2 minutes.”

The 2-minute internal window gives a buffer above the measured ~60 seconds for anything unexpected.

External users:

“Scheduled maintenance in progress. Services will be restored within 5 minutes.”

The 5-minute external window is deliberately conservative. If a container fails its first health check and requires a full restart cycle (up to 5 retries × 5 seconds = 25 extra seconds), you’re still within your stated window. Under-promise, over-deliver.
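The arithmetic behind those windows, as a quick sketch in JavaScript. The function names are illustrative; the inputs are the measured ~60 second TTR and the 5 retries at 5 seconds each mentioned above.

```javascript
// Worst case: the measured TTR plus one full cycle of failed health checks.
function worstCaseTtrSeconds({ measuredTtrSeconds, retries, retryIntervalSeconds }) {
  return measuredTtrSeconds + retries * retryIntervalSeconds;
}

// True when the announced window covers the worst case.
function fitsWindow(worstCaseSeconds, windowMinutes) {
  return worstCaseSeconds <= windowMinutes * 60;
}

const worstCase = worstCaseTtrSeconds({
  measuredTtrSeconds: 60,
  retries: 5,
  retryIntervalSeconds: 5,
});
console.log(worstCase, fitsWindow(worstCase, 2), fitsWindow(worstCase, 5)); // 85 true true
```

Even the worst case of 85 seconds fits the 2-minute internal window, and the 5-minute external window leaves room for a manual intervention.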

Quick Reference: All Commands

# --- PRE-REBOOT ---
docker ps                              # check container states
docker compose config                  # validate compose file syntax
df -h /                                # check available disk space

# --- REBOOT ---
sudo reboot                            # initiate the reboot

# --- POST-REBOOT ---
sudo systemctl status docker           # confirm daemon is running
docker ps                              # confirm containers are up
docker compose logs -f --tail=20       # watch live logs
sudo gitlab-runner status              # check runner (if applicable)

# --- TTR MEASUREMENT ---
systemd-analyze                        # total OS boot time
systemd-analyze blame | head -20       # per-service boot time breakdown
docker inspect --format='{{.Name}}: {{.State.StartedAt}}' $(docker ps -q)
                                       # exact container start timestamps

Troubleshooting Reference

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Container missing from docker ps | Crashed on startup | docker compose logs [service] --tail=50 |
| Container stuck in (health: starting) after 2+ minutes | Health check command failing | docker inspect --format '{{json .State.Health}}' [id], then read the Log entries |
| Docker daemon not running | Not enabled in systemd | sudo systemctl enable docker && sudo systemctl start docker |
| SSH times out for more than 3 minutes | VM didn’t boot cleanly | Check the OCI console and open the instance serial console for kernel panic output |
| All containers up but app unreachable externally | Reverse proxy (Caddy/Nginx) issue | docker compose logs caddy --tail=50 |
| Connection errors right after a cold start | Dependent service started before its dependency was ready | Wait 60 seconds, then re-check; most resolve automatically |

Cover photo by BoliviaInteligente on Unsplash

GenAIOps on AWS: End-to-End Observability Stack – Part 3

Reading time: ~22-25 minutes

Level: Intermediate to Advanced

Series: Part 3 of 4 – End-to-End Observability

What you’ll learn: Build comprehensive observability for GenAI systems with CloudWatch GenAI Observability, X-Ray distributed tracing, and custom metrics

The Problem: When GenAI Goes Wrong at 3 AM

It’s 3 AM. PagerDuty wakes you up:

You open your logs. 10,000 lines of JSON. Where do you start?

Everything returns 200. But users are complaining. What’s actually failing?

  • Is retrieval slow? Can’t tell from these logs
  • Is the LLM hallucinating? No quality metrics captured
  • Why is cost 5x higher? Token counts missing
  • Which model is being used? Not tracked
  • What context was retrieved? Lost in the void

Traditional observability wasn’t built for this. You need GenAI-specific observability that captures the full story: retrieval quality, token consumption, model behavior, and end-to-end traces showing exactly where things break.

This is what we’re building today.

The GenAI Observability Challenge

GenAI systems are fundamentally different from traditional microservices:

Traditional Microservice Request

GenAI System Request

The challenge: A request can succeed (200 OK) but still fail the user:

  • Retrieved wrong documents → bad answer
  • LLM hallucinated → user misinformed
  • Cost spiked 5x → budget blown
  • Latency is 8s → user abandoned request

Traditional observability captures success/failure. GenAI observability captures quality/cost/performance at every step.
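To make the distinction concrete, here is a minimal sketch that flags requests which returned 200 but would still fail the user. The `RequestRecord` fields, the `silent_failures` helper, and every threshold are illustrative assumptions, not values from AWS or from this series.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestRecord:
    """One GenAI request, with the signals a status code alone hides (hypothetical schema)."""
    status_code: int
    top_similarity: float  # best retrieval score, 0..1
    faithfulness: float    # how well the answer is grounded in context, 0..1
    cost_usd: float
    latency_ms: float


def silent_failures(
    rec: RequestRecord,
    *,
    min_similarity: float = 0.70,    # assumed thresholds, tune per workload
    min_faithfulness: float = 0.75,
    max_cost_usd: float = 0.05,
    max_latency_ms: float = 5000,
) -> List[str]:
    """Return the reasons a request failed the user, even if it returned 200."""
    problems = []
    if rec.status_code != 200:
        problems.append("http_error")
    if rec.top_similarity < min_similarity:
        problems.append("weak_retrieval")
    if rec.faithfulness < min_faithfulness:
        problems.append("possible_hallucination")
    if rec.cost_usd > max_cost_usd:
        problems.append("cost_spike")
    if rec.latency_ms > max_latency_ms:
        problems.append("slow_response")
    return problems


# A "successful" request that a status-code dashboard would never flag
record = RequestRecord(status_code=200, top_similarity=0.42,
                       faithfulness=0.55, cost_usd=0.11, latency_ms=8000)
print(silent_failures(record))
```

Every check here needs a signal that a plain access log does not carry, which is why the rest of this post focuses on capturing those signals at each step.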

AWS CloudWatch GenAI Observability

AWS launched CloudWatch GenAI Observability in preview (Q4 2024) and made it generally available in October 2025. It is purpose-built for LLM applications.

What It Provides Out-of-the-Box

1. Model Invocation Dashboard

Automatic tracking of:

  • Invocation metrics: Count, success rate, throttles
  • Token metrics: Input tokens, output tokens, total tokens
  • Cost attribution: Per-model, per-request costs
  • Latency breakdown: Time-to-first-token, generation latency
  • Error tracking: Model errors, throttling, timeouts
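For a component the dashboard does not cover, the latency breakdown can be approximated by hand. The sketch below times any iterator of text chunks. `measure_stream` and `fake_stream` are hypothetical names, and the fake stream merely stands in for a real streaming model response.

```python
import time
from typing import Dict, Iterable, Iterator, Optional


def measure_stream(chunks: Iterable[str]) -> Dict[str, object]:
    """Consume a stream of text chunks and record time-to-first-token and generation time."""
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    pieces = []

    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()  # first token arrived
        pieces.append(chunk)

    end = time.perf_counter()
    got_output = first_chunk_at is not None
    return {
        "text": "".join(pieces),
        "chunks": len(pieces),
        # None when the stream produced nothing at all
        "ttft_ms": (first_chunk_at - start) * 1000 if got_output else None,
        "generation_ms": (end - first_chunk_at) * 1000 if got_output else None,
        "total_ms": (end - start) * 1000,
    }


def fake_stream() -> Iterator[str]:
    """Stand-in for a streaming LLM response (assumption for the example)."""
    time.sleep(0.02)  # simulated queueing and prompt processing
    for piece in ("Returns ", "within ", "30 days."):
        yield piece
        time.sleep(0.005)  # simulated per-chunk decode time


stats = measure_stream(fake_stream())
print(f"TTFT: {stats['ttft_ms']:.1f} ms, total: {stats['total_ms']:.1f} ms")
```

Separating the two numbers matters because a slow first token usually points at queueing or a long prompt, while slow generation points at output length.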

2. AgentCore Agent Dashboard

For Amazon Bedrock AgentCore agents:

  • Session tracking: Duration, turn count, completion
  • Tool usage: Which tools called, frequency, success rate
  • Memory operations: Reads, writes, retrieval performance
  • Gateway metrics: API latency, auth failures
  • Reasoning traces: Step-by-step agent decision logs
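The tool-usage numbers are simple aggregates over individual tool calls. The sketch below shows one way they could be derived. The event shape (`tool`, `ok`) and the `summarize_tool_calls` helper are assumptions for illustration, not the AgentCore log format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarize_tool_calls(events: Iterable[Mapping[str, object]]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-tool call counts and success rates from raw call events."""
    calls = defaultdict(int)
    successes = defaultdict(int)

    for event in events:
        tool = event["tool"]
        calls[tool] += 1
        if event["ok"]:
            successes[tool] += 1

    return {
        tool: {"calls": count, "success_rate": successes[tool] / count}
        for tool, count in calls.items()
    }


# Hypothetical events from one agent session
events = [
    {"tool": "web_search", "ok": True},
    {"tool": "web_search", "ok": False},
    {"tool": "calendar", "ok": True},
    {"tool": "web_search", "ok": True},
]

summary = summarize_tool_calls(events)
for tool, stats in summary.items():
    print(f"{tool}: {stats['calls']} calls, {stats['success_rate']:.0%} success")
```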

3. OpenTelemetry Integration

  • Distributed tracing: End-to-end request flows
  • Custom spans: Instrument your components
  • Automatic instrumentation: AWS SDK calls auto-traced
  • X-Ray integration: Service maps and bottleneck detection

Architecture: Complete Observability Stack

Setting Up OpenTelemetry with ADOT

AWS Distro for OpenTelemetry (ADOT) is AWS’s distribution of OpenTelemetry, pre-configured for AWS services.

Installation

Basic Configuration

Auto-Instrumentation Setup

Instrumenting Your RAG Application

Now let’s instrument a complete RAG pipeline:

# instrumented_rag_system.py
import boto3
import json
from typing import List, Dict
from opentelemetry import trace
from datetime import datetime

class InstrumentedRAGSystem:
    """
    Fully instrumented RAG system with distributed tracing

    Captures:
    - End-to-end request traces
    - Per-component latency
    - Token consumption and costs
    - Quality signals
    - Error details
    """

    def __init__(self):
        self.bedrock_runtime = boto3.client('bedrock-runtime')
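        # Control-plane client only (collections, policies); vector queries
        # need an OpenSearch data-plane client, see _vector_search below
        self.opensearch = boto3.client('opensearchserverless')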
        self.cloudwatch = boto3.client('cloudwatch')

        # Get tracer
        self.tracer = trace.get_tracer(__name__)

        # Model pricing (per 1K tokens)
        self.pricing = {
            "anthropic.claude-sonnet-4-20250514": {
                "input": 0.003,
                "output": 0.015
            },
            "amazon.titan-embed-text-v2:0": {
                "input": 0.0001,
                "output": 0
            }
        }

    def query(self, user_query: str, user_id: str = None) -> Dict:
        """
        Process RAG query with full instrumentation

        Args:
            user_query: User's question
            user_id: Optional user identifier for tracking

        Returns:
            Dict with answer and metadata
        """

        # Start root span
        with self.tracer.start_as_current_span("rag_query") as root_span:

            # Add request attributes
            root_span.set_attribute("query", user_query)
            root_span.set_attribute("query_length", len(user_query))
            if user_id:
                root_span.set_attribute("user_id", user_id)
            root_span.set_attribute("timestamp", datetime.now().isoformat())

            try:
                # Step 1: Generate embeddings
                with self.tracer.start_as_current_span("generate_embeddings") as span:
                    embeddings, embed_cost = self._generate_embeddings(user_query)

                    span.set_attribute("embedding_dimension", len(embeddings))
                    span.set_attribute("embedding_cost_usd", embed_cost)
                    span.set_attribute("model", "amazon.titan-embed-text-v2:0")

                # Step 2: Vector search
                with self.tracer.start_as_current_span("vector_search") as span:
                    contexts = self._vector_search(embeddings, top_k=5)

                    span.set_attribute("documents_retrieved", len(contexts))
                    if contexts:
                        avg_score = sum(c['score'] for c in contexts) / len(contexts)
                        span.set_attribute("avg_similarity_score", round(avg_score, 3))
                        span.set_attribute("top_score", round(contexts[0]['score'], 3))

                    # Publish retrieval quality metric
                    self._publish_metric(
                        "RetrievalQuality",
                        avg_score if contexts else 0,
                        namespace="GenAI/RAG/Retrieval"
                    )

                # Step 3: Rerank (optional but recommended)
                with self.tracer.start_as_current_span("rerank_documents") as span:
                    contexts = self._rerank_contexts(user_query, contexts, top_k=3)

                    span.set_attribute("documents_after_rerank", len(contexts))
                    if contexts:
                        span.set_attribute("top_rerank_score", round(contexts[0]['rerank_score'], 3))

                # Step 4: Build prompt and count tokens
                with self.tracer.start_as_current_span("prompt_construction") as span:
                    prompt = self._build_prompt(user_query, contexts)
                    input_tokens = self._estimate_tokens(prompt)

                    span.set_attribute("input_tokens", input_tokens)
                    span.set_attribute("context_documents", len(contexts))
                    span.set_attribute("prompt_length_chars", len(prompt))

                    # Check context window
                    max_context_window = 200000  # Claude Sonnet 4
                    if input_tokens > max_context_window:
                        span.set_attribute("error", "context_window_exceeded")
                        raise ValueError(f"Input tokens ({input_tokens}) exceed context window")

                # Step 5: Generate response
                with self.tracer.start_as_current_span("llm_generation") as span:
                    response = self._generate_response(prompt)

                    # Extract metrics
                    usage = response.get('usage', {})
                    input_tokens = usage.get('input_tokens', 0)
                    output_tokens = usage.get('output_tokens', 0)
                    model_id = "anthropic.claude-sonnet-4-20250514"

                    # Calculate cost
                    cost = self._calculate_cost(
                        model_id=model_id,
                        input_tokens=input_tokens,
                        output_tokens=output_tokens
                    )

                    # Add span attributes
                    span.set_attribute("model_id", model_id)
                    span.set_attribute("input_tokens", input_tokens)
                    span.set_attribute("output_tokens", output_tokens)
                    span.set_attribute("total_tokens", input_tokens + output_tokens)
                    span.set_attribute("generation_cost_usd", cost)
                    span.set_attribute("stop_reason", response.get('stop_reason', 'unknown'))

                    # Publish token metrics
                    self._publish_metric("InputTokens", input_tokens, namespace="GenAI/Tokens")
                    self._publish_metric("OutputTokens", output_tokens, namespace="GenAI/Tokens")
                    self._publish_metric("GenerationCost", cost, namespace="GenAI/Cost")

                # Step 6: Extract answer
                answer = response['content'][0]['text']

                # Add overall metrics to root span
                total_cost = embed_cost + cost
                root_span.set_attribute("total_cost_usd", round(total_cost, 4))
                root_span.set_attribute("total_tokens", input_tokens + output_tokens)
                root_span.set_attribute("answer_length", len(answer))
                root_span.set_attribute("status", "success")

                # Publish overall metrics
                self._publish_metric("RequestCost", total_cost, namespace="GenAI/Cost")
                self._publish_metric("RequestSuccess", 1, namespace="GenAI/Quality")

                return {
                    "answer": answer,
                    "metadata": {
                        "input_tokens": input_tokens,
                        "output_tokens": output_tokens,
                        "total_cost": round(total_cost, 4),
                        "contexts_used": len(contexts),
                        "model": model_id
                    }
                }

            except Exception as e:
                # Capture error in span
                root_span.set_attribute("error", True)
                root_span.set_attribute("error_type", type(e).__name__)
                root_span.set_attribute("error_message", str(e))
                root_span.set_attribute("status", "error")

                # Publish error metric
                self._publish_metric("RequestErrors", 1, namespace="GenAI/Errors")

                # Re-raise
                raise

    def _generate_embeddings(self, text: str) -> tuple:
        """Generate embeddings with Bedrock Titan"""

        response = self.bedrock_runtime.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            body=json.dumps({
                "inputText": text,
                "dimensions": 1024,
                "normalize": True
            })
        )

        result = json.loads(response['body'].read())
        embeddings = result['embedding']

        # Calculate cost
        token_count = len(text.split()) * 1.3  # Rough estimate
        cost = (token_count / 1000) * self.pricing["amazon.titan-embed-text-v2:0"]["input"]

        return embeddings, cost

    def _vector_search(self, embeddings: List[float], top_k: int = 5) -> List[Dict]:
        """
        Search OpenSearch vector index

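        Note: the mock below makes no network calls, so only the enclosing
        vector_search span appears in the trace. A real OpenSearch query adds
        its own child span only if the client library is instrumented.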
        """

        # OpenSearch vector search
        # In production, use actual OpenSearch client

        # Mock response for example
        return [
            {
                "id": "doc_1",
                "score": 0.89,
                "text": "Electronics can be returned within 30 days..."
            },
            {
                "id": "doc_2",
                "score": 0.76,
                "text": "Damaged items require photo documentation..."
            },
            {
                "id": "doc_3",
                "score": 0.71,
                "text": "Restocking fees apply to opened electronics..."
            }
        ]

    def _rerank_contexts(
        self,
        query: str,
        contexts: List[Dict],
        top_k: int = 3
    ) -> List[Dict]:
        """
        Rerank contexts using cross-encoder

        In production, use:
        - Bedrock reranking model
        - Cohere rerank
        - Custom cross-encoder
        """

        # For example, just return top contexts
        # In production, apply reranking model
        for ctx in contexts[:top_k]:
            ctx['rerank_score'] = ctx['score'] * 1.1  # Mock rerank

        return contexts[:top_k]

    def _build_prompt(self, query: str, contexts: List[Dict]) -> str:
        """Build prompt from query and contexts"""

        context_text = "\n\n".join([
            f"Document {i+1}:\n{ctx['text']}"
            for i, ctx in enumerate(contexts)
        ])

        prompt = f"""You are a helpful customer service assistant. Answer the user's question based on the provided context.

Context:
{context_text}

Question: {query}

Answer the question using only information from the context. If the context doesn't contain enough information, say so."""

        return prompt

    def _estimate_tokens(self, text: str) -> int:
        """Rough token estimation"""
        # 1 token ≈ 0.75 words for English
        return int(len(text.split()) * 1.3)

    def _generate_response(self, prompt: str) -> Dict:
        """Generate response with Bedrock"""

        response = self.bedrock_runtime.invoke_model(
            modelId="anthropic.claude-sonnet-4-20250514",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 2048,
                "temperature": 0.7,
                "messages": [
                    {"role": "user", "content": prompt}
                ]
            })
        )

        return json.loads(response['body'].read())

    def _calculate_cost(
        self,
        model_id: str,
        input_tokens: int,
        output_tokens: int
    ) -> float:
        """Calculate request cost"""

        pricing = self.pricing.get(model_id, {"input": 0, "output": 0})

        cost = (
            (input_tokens / 1000) * pricing["input"] +
            (output_tokens / 1000) * pricing["output"]
        )

        return cost

    def _publish_metric(
        self,
        metric_name: str,
        value: float,
        namespace: str = "GenAI/Custom"
    ):
        """Publish custom metric to CloudWatch"""

        try:
            self.cloudwatch.put_metric_data(
                Namespace=namespace,
                MetricData=[
                    {
                        'MetricName': metric_name,
                        'Value': value,
                        'Unit': 'None',
                        # Timezone-aware, so the data point is not shifted by the local UTC offset
                        'Timestamp': datetime.now().astimezone()
                    }
                ]
            )
        except Exception as e:
            # Don't fail request if metric publishing fails
            print(f"Warning: Failed to publish metric {metric_name}: {e}")

# Usage
rag_system = InstrumentedRAGSystem()

response = rag_system.query(
    user_query="What's the return policy for damaged electronics?",
    user_id="user_12345"
)

print(f"Answer: {response['answer']}")
print(f"Cost: ${response['metadata']['total_cost']}")
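# The metadata dict has no 'total_tokens' key, so sum the two counts it does return
total_tokens = response['metadata']['input_tokens'] + response['metadata']['output_tokens']
print(f"Tokens: {total_tokens}")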
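
To sanity-check the cost figures the class reports, the arithmetic can be reproduced in isolation. The sketch below reuses the per-1K-token prices from the class above. The token counts are made-up example values, and `cost_usd` is a hypothetical standalone version of `_calculate_cost`.

```python
# Per-1K-token prices, copied from the pricing table in the class above
PRICING = {
    "anthropic.claude-sonnet-4-20250514": {"input": 0.003, "output": 0.015},
    "amazon.titan-embed-text-v2:0": {"input": 0.0001, "output": 0.0},
}


def cost_usd(model_id: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Cost of one call: tokens are billed per 1,000, input and output priced separately."""
    price = PRICING[model_id]
    return (
        (input_tokens / 1000) * price["input"]
        + (output_tokens / 1000) * price["output"]
    )


# Example request (assumed counts): 1,200 prompt tokens, 350 answer tokens,
# plus a 16-token query sent to the embedding model
generation = cost_usd("anthropic.claude-sonnet-4-20250514", 1200, 350)
embedding = cost_usd("amazon.titan-embed-text-v2:0", 16)
total = generation + embedding

print(f"Generation: ${generation:.6f}")
print(f"Embedding:  ${embedding:.6f}")
print(f"Total:      ${total:.6f}")
```

With these numbers generation comes to about $0.00885 (0.0036 for input plus 0.00525 for output), while the embedding call costs a small fraction of a cent. Output tokens dominate even though there are fewer of them, which is why `max_tokens` and stop reasons are worth tracking alongside cost.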

AWS X-Ray Integration

X-Ray provides the service map and bottleneck detection that traces alone can’t give you.

Enabling X-Ray Active Tracing

Lambda Function:

Terraform Configuration:

Custom X-Ray Segments

# custom_xray_segments.py
from aws_xray_sdk.core import xray_recorder

class XRayInstrumentedRAG:
    """RAG system with custom X-Ray segments"""

    def query(self, user_query: str):
        """Process query with custom segments"""

        # Retrieval segment
        with xray_recorder.capture('retrieval') as segment:
            contexts = self._retrieve_contexts(user_query)

            # Add annotations (indexed for filtering)
            segment.put_annotation('documents_found', len(contexts))
            if contexts:  # avoid ZeroDivisionError when nothing is retrieved
                segment.put_annotation('avg_relevance',
                                      sum(c['score'] for c in contexts) / len(contexts))

            # Add metadata (not indexed)
            segment.put_metadata('retrieval_method', 'vector_search')
            segment.put_metadata('top_documents', [c['id'] for c in contexts[:3]])

        # Generation segment
        with xray_recorder.capture('generation') as segment:
            response = self._generate(user_query, contexts)

            # Annotations
            segment.put_annotation('input_tokens', response['input_tokens'])
            segment.put_annotation('output_tokens', response['output_tokens'])
            segment.put_annotation('cost_usd', response['cost'])

            # Metadata
            segment.put_metadata('model_id', response['model_id'])
            segment.put_metadata('stop_reason', response['stop_reason'])

        return response
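
Choosing between annotations and metadata can be made mechanical. X-Ray annotations accept only string, number, and Boolean values, and their keys must be alphanumeric plus underscores to work in filter expressions. The sketch below is a pure-Python helper under those rules. `split_for_xray` and the `index_keys` allowlist are hypothetical names, and the helper does not call the X-Ray SDK itself.

```python
import re
from typing import Any, Dict, Set, Tuple

_SCALAR_TYPES = (str, int, float, bool)


def split_for_xray(
    attrs: Dict[str, Any], index_keys: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split attributes into (annotations, metadata).

    Only keys listed in index_keys with scalar values become annotations;
    everything else is kept as metadata, which is stored but not indexed.
    """
    annotations: Dict[str, Any] = {}
    metadata: Dict[str, Any] = {}

    for key, value in attrs.items():
        if key in index_keys and isinstance(value, _SCALAR_TYPES):
            # Replace characters that filter expressions cannot match
            safe_key = re.sub(r"[^A-Za-z0-9_]", "_", key)
            annotations[safe_key] = value
        else:
            metadata[key] = value

    return annotations, metadata


attrs = {
    "documents_found": 3,
    "cost-usd": 0.0089,
    "model_id": "anthropic.claude-sonnet-4-20250514",
    "top_documents": ["doc_1", "doc_2", "doc_3"],
}

annotations, metadata = split_for_xray(
    attrs, index_keys={"documents_found", "cost-usd", "top_documents"}
)
print(annotations)
print(metadata)
```

Each resulting pair could then be passed to `put_annotation` or `put_metadata` as in the class above. An explicit allowlist keeps the number of indexed keys small and deliberate.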

X-Ray Service Map Insights

X-Ray automatically generates service maps showing:

Building Comprehensive CloudWatch Dashboards

Create unified dashboards showing the full picture:

# comprehensive_dashboard.py
import boto3
import json
from typing import Dict, List

class GenAIDashboardBuilder:
    """Build comprehensive CloudWatch dashboards for GenAI systems"""

    def __init__(self):
        self.cloudwatch = boto3.client('cloudwatch')

    def create_production_dashboard(self) -> str:
        """
        Create production-grade dashboard with:
        - Quality metrics
        - Performance metrics
        - Cost tracking
        - Error monitoring
        - User satisfaction
        """

        dashboard_body = {
            "widgets": self._build_all_widgets()
        }

        response = self.cloudwatch.put_dashboard(
            DashboardName='GenAI-Production-Observability',
            DashboardBody=json.dumps(dashboard_body)
        )

        dashboard_url = (
            f"https://console.aws.amazon.com/cloudwatch/home"
            f"?region=us-east-1#dashboards:name=GenAI-Production-Observability"
        )

        print(f"✓ Dashboard created: {dashboard_url}")
        return dashboard_url

    def _build_all_widgets(self) -> List[Dict]:
        """Build all dashboard widgets"""

        widgets = []

        # Row 1: Quality Metrics (0, 0)
        widgets.append(self._quality_metrics_widget(x=0, y=0))
        widgets.append(self._quality_distribution_widget(x=12, y=0))

        # Row 2: Performance Metrics (0, 6)
        widgets.append(self._latency_breakdown_widget(x=0, y=6))
        widgets.append(self._throughput_widget(x=12, y=6))

        # Row 3: Cost & Tokens (0, 12)
        widgets.append(self._cost_metrics_widget(x=0, y=12))
        widgets.append(self._token_usage_widget(x=8, y=12))
        widgets.append(self._cost_per_user_widget(x=16, y=12))

        # Row 4: Errors & Alerts (0, 18)
        widgets.append(self._error_rate_widget(x=0, y=18))
        widgets.append(self._error_breakdown_widget(x=8, y=18))
        widgets.append(self._recent_errors_log_widget(x=16, y=18))

        # Row 5: Model Performance (0, 24)
        widgets.append(self._model_comparison_widget(x=0, y=24))
        widgets.append(self._stop_reasons_widget(x=12, y=24))

        # Row 6: User Experience (0, 30)
        widgets.append(self._user_satisfaction_widget(x=0, y=30))
        widgets.append(self._session_metrics_widget(x=12, y=30))

        # Row 7: X-Ray Service Map (0, 36)
        widgets.append(self._xray_service_map_widget(x=0, y=36))

        return widgets

    def _quality_metrics_widget(self, x: int, y: int) -> Dict:
        """Real-time quality metrics"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Quality", "Faithfulness", {
                        "stat": "Average",
                        "label": "Faithfulness"
                    }],
                    [".", "AnswerRelevancy", {
                        "stat": "Average",
                        "label": "Relevancy"
                    }],
                    [".", "ContextPrecision", {
                        "stat": "Average",
                        "label": "Context Precision"
                    }],
                    [".", "ContextRecall", {
                        "stat": "Average",
                        "label": "Context Recall"
                    }]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": "us-east-1",
                "title": "📊 RAG Quality Metrics",
                "period": 300,
                "yAxis": {
                    "left": {
                        "min": 0,
                        "max": 1,
                        "label": "Score"
                    }
                },
                "annotations": {
                    "horizontal": [
                        {
                            "value": 0.85,
                            "label": "Target",
                            "color": "#2ca02c"
                        },
                        {
                            "value": 0.75,
                            "label": "Warning",
                            "color": "#ff7f0e"
                        },
                        {
                            "value": 0.60,
                            "label": "Critical",
                            "color": "#d62728"
                        }
                    ]
                }
            }
        }

    def _quality_distribution_widget(self, x: int, y: int) -> Dict:
        """Quality score distribution"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Quality", "Faithfulness", {
                        "stat": "p50",
                        "label": "P50"
                    }],
                    ["...", {
                        "stat": "p90",
                        "label": "P90"
                    }],
                    ["...", {
                        "stat": "p99",
                        "label": "P99"
                    }]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": "us-east-1",
                "title": "📈 Faithfulness Distribution (P50/P90/P99)",
                "period": 300
            }
        }

    def _latency_breakdown_widget(self, x: int, y: int) -> Dict:
        """Latency breakdown by component"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Performance", "EmbeddingLatency", {
                        "stat": "Average",
                        "label": "Embeddings"
                    }],
                    [".", "VectorSearchLatency", {
                        "stat": "Average",
                        "label": "Vector Search"
                    }],
                    [".", "RerankLatency", {
                        "stat": "Average",
                        "label": "Reranking"
                    }],
                    [".", "GenerationLatency", {
                        "stat": "Average",
                        "label": "LLM Generation"
                    }],
                    [".", "EndToEndLatency", {
                        "stat": "Average",
                        "label": "Total",
                        "color": "#1f77b4"
                    }]
                ],
                "view": "timeSeries",
                "stacked": True,
                "region": "us-east-1",
                "title": "⚡ Latency Breakdown (Stacked)",
                "period": 300,
                "yAxis": {
                    "left": {
                        "label": "Milliseconds"
                    }
                }
            }
        }

    def _throughput_widget(self, x: int, y: int) -> Dict:
        """Request throughput"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Throughput", "RequestCount", {
                        "stat": "Sum",
                        "label": "Total Requests"
                    }],
                    [".", "SuccessfulRequests", {
                        "stat": "Sum",
                        "label": "Successful"
                    }],
                    [".", "FailedRequests", {
                        "stat": "Sum",
                        "label": "Failed"
                    }]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": "us-east-1",
                "title": "🔄 Request Throughput",
                "period": 300
            }
        }

    def _cost_metrics_widget(self, x: int, y: int) -> Dict:
        """Cost tracking"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 8,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Cost", "TotalCost", {
                        "stat": "Sum",
                        "label": "Total Cost"
                    }],
                    [".", "EmbeddingCost", {
                        "stat": "Sum",
                        "label": "Embeddings"
                    }],
                    [".", "GenerationCost", {
                        "stat": "Sum",
                        "label": "Generation"
                    }]
                ],
                "view": "timeSeries",
                "stacked": True,
                "region": "us-east-1",
                "title": "💰 Cost Breakdown (USD)",
                "period": 300,
                "yAxis": {
                    "left": {
                        "label": "USD"
                    }
                }
            }
        }

    def _token_usage_widget(self, x: int, y: int) -> Dict:
        """Token consumption"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 8,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Tokens", "InputTokens", {
                        "stat": "Sum",
                        "label": "Input Tokens"
                    }],
                    [".", "OutputTokens", {
                        "stat": "Sum",
                        "label": "Output Tokens"
                    }]
                ],
                "view": "timeSeries",
                "stacked": True,
                "region": "us-east-1",
                "title": "🎫 Token Usage",
                "period": 300
            }
        }

    def _cost_per_user_widget(self, x: int, y: int) -> Dict:
        """Cost per user/query"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 8,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Cost", "CostPerQuery", {
                        "stat": "Average",
                        "label": "Avg per Query"
                    }],
                    ["...", {
                        "stat": "p95",
                        "label": "P95 per Query"
                    }]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": "us-east-1",
                "title": "💵 Cost Per Query",
                "period": 300,
                "yAxis": {
                    "left": {
                        "label": "USD"
                    }
                }
            }
        }

    def _error_rate_widget(self, x: int, y: int) -> Dict:
        """Error rate tracking"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 8,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Errors", "ErrorRate", {
                        "stat": "Average",
                        "label": "Error Rate %"
                    }]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": "us-east-1",
                "title": "❌ Error Rate",
                "period": 300,
                "yAxis": {
                    "left": {
                        "label": "Percentage",
                        "min": 0,
                        "max": 100
                    }
                },
                "annotations": {
                    "horizontal": [
                        {
                            "value": 1,
                            "label": "Target < 1%",
                            "color": "#2ca02c"
                        },
                        {
                            "value": 5,
                            "label": "Critical > 5%",
                            "color": "#d62728"
                        }
                    ]
                }
            }
        }

    def _error_breakdown_widget(self, x: int, y: int) -> Dict:
        """Error breakdown by type"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 8,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Errors", "RetrievalErrors", {
                        "stat": "Sum",
                        "label": "Retrieval"
                    }],
                    [".", "GenerationErrors", {
                        "stat": "Sum",
                        "label": "Generation"
                    }],
                    [".", "ThrottlingErrors", {
                        "stat": "Sum",
                        "label": "Throttling"
                    }],
                    [".", "ValidationErrors", {
                        "stat": "Sum",
                        "label": "Validation"
                    }]
                ],
                "view": "timeSeries",
                "stacked": True,
                "region": "us-east-1",
                "title": "🔍 Error Breakdown",
                "period": 300
            }
        }

    def _recent_errors_log_widget(self, x: int, y: int) -> Dict:
        """Recent errors from logs"""
        return {
            "type": "log",
            "x": x,
            "y": y,
            "width": 8,
            "height": 6,
            "properties": {
                "query": """
                SOURCE '/aws/lambda/rag-api'
                | fields @timestamp, @message, error_type, request_id
                | filter @message like /ERROR/
                | sort @timestamp desc
                | limit 20
                """,
                "region": "us-east-1",
                "title": "📋 Recent Errors",
                "view": "table"
            }
        }

    def _model_comparison_widget(self, x: int, y: int) -> Dict:
        """Compare model performance"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
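                    # Dimensions are positional name/value pairs in the metric array,
                    # not a key inside the rendering options
                    ["GenAI/Models", "AvgLatency", "ModelId", "claude-sonnet-4", {
                        "stat": "Average"
                    }],
                    ["GenAI/Models", "AvgLatency", "ModelId", "claude-opus-4", {
                        "stat": "Average"
                    }],
                    ["GenAI/Models", "AvgLatency", "ModelId", "claude-haiku-4", {
                        "stat": "Average"
                    }]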
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": "us-east-1",
                "title": "🤖 Model Latency Comparison",
                "period": 300
            }
        }

    def _stop_reasons_widget(self, x: int, y: int) -> Dict:
        """LLM stop reasons distribution"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
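                    # Dimensions are positional name/value pairs in the metric array,
                    # not a key inside the rendering options
                    ["GenAI/Behavior", "StopReason", "Reason", "end_turn", {
                        "stat": "SampleCount"
                    }],
                    ["GenAI/Behavior", "StopReason", "Reason", "max_tokens", {
                        "stat": "SampleCount"
                    }],
                    ["GenAI/Behavior", "StopReason", "Reason", "stop_sequence", {
                        "stat": "SampleCount"
                    }]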
                ],
                "view": "timeSeries",
                "stacked": True,
                "region": "us-east-1",
                "title": "🛑 Stop Reasons",
                "period": 300
            }
        }

    def _user_satisfaction_widget(self, x: int, y: int) -> Dict:
        """User feedback scores"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/UserExperience", "FeedbackScore", {
                        "stat": "Average",
                        "label": "Avg Satisfaction"
                    }]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": "us-east-1",
                "title": "⭐ User Satisfaction (1-5)",
                "period": 300,
                "yAxis": {
                    "left": {
                        "min": 1,
                        "max": 5
                    }
                }
            }
        }

    def _session_metrics_widget(self, x: int, y: int) -> Dict:
        """Session-level metrics"""
        return {
            "type": "metric",
            "x": x,
            "y": y,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [
                    ["GenAI/Sessions", "AvgSessionDuration", {
                        "stat": "Average",
                        "label": "Avg Duration (s)"
                    }],
                    [".", "AvgTurnsPerSession", {
                        "stat": "Average",
                        "label": "Avg Turns"
                    }]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": "us-east-1",
                "title": "💬 Session Metrics",
                "period": 300
            }
        }

    def _xray_service_map_widget(self, x: int, y: int) -> Dict:
        """X-Ray service map"""
        return {
            "type": "trace",
            "x": x,
            "y": y,
            "width": 24,
            "height": 8,
            "properties": {
                "title": "🗺️ X-Ray Service Map - RAG System",
                "region": "us-east-1"
            }
        }

# Create dashboard
builder = GenAIDashboardBuilder()
dashboard_url = builder.create_production_dashboard()

Alarming Strategy for GenAI Systems

Set up intelligent alarms that catch real issues:

# genai_alarms.py
import boto3
from typing import Dict, List

class GenAIAlarmManager:
    """Comprehensive alarming for GenAI systems"""

    def __init__(self, sns_topic_arn: str):
        self.cloudwatch = boto3.client('cloudwatch')
        self.sns_topic_arn = sns_topic_arn

    def create_all_alarms(self):
        """Create complete alarm suite"""

        alarms = [
            # Quality alarms
            self._quality_degradation_alarm(),
            self._faithfulness_critical_alarm(),

            # Performance alarms
            self._high_latency_alarm(),
            self._latency_spike_alarm(),

            # Cost alarms
            self._cost_spike_alarm(),
            self._daily_budget_alarm(),

            # Error alarms
            self._high_error_rate_alarm(),
            self._retrieval_failure_alarm(),

            # Composite alarms
            self._system_degraded_composite_alarm()
        ]

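        for alarm_config in alarms:
            # Composite alarms are defined by an AlarmRule and need their own API call
            if 'AlarmRule' in alarm_config:
                self.cloudwatch.put_composite_alarm(**alarm_config)
            else:
                self.cloudwatch.put_metric_alarm(**alarm_config)
            print(f"✓ Created alarm: {alarm_config['AlarmName']}")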

    def _quality_degradation_alarm(self) -> Dict:
        """Alert when quality metrics drop"""
        return {
            'AlarmName': 'RAG-Quality-Degradation',
            'ComparisonOperator': 'LessThanThreshold',
            'EvaluationPeriods': 2,
            'DatapointsToAlarm': 2,  # 2 out of 2
            'MetricName': 'Faithfulness',
            'Namespace': 'GenAI/Quality',
            'Period': 300,
            'Statistic': 'Average',
            'Threshold': 0.75,
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': 'Faithfulness score dropped below 0.75 for 10 minutes',
            'TreatMissingData': 'notBreaching'
        }

    def _faithfulness_critical_alarm(self) -> Dict:
        """Critical alarm for severe quality drop"""
        return {
            'AlarmName': 'RAG-Faithfulness-Critical',
            'ComparisonOperator': 'LessThanThreshold',
            'EvaluationPeriods': 1,
            'DatapointsToAlarm': 1,
            'MetricName': 'Faithfulness',
            'Namespace': 'GenAI/Quality',
            'Period': 300,
            'Statistic': 'Average',
            'Threshold': 0.60,
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': 'CRITICAL: Faithfulness below 0.60 - immediate action required',
            'TreatMissingData': 'breaching'
        }

    def _high_latency_alarm(self) -> Dict:
        """Alert on high P95 latency"""
        return {
            'AlarmName': 'RAG-High-Latency-P95',
            'ComparisonOperator': 'GreaterThanThreshold',
            'EvaluationPeriods': 3,
            'DatapointsToAlarm': 2,
            'MetricName': 'EndToEndLatency',
            'Namespace': 'GenAI/Performance',
            'Period': 300,
            'ExtendedStatistic': 'p95',
            'Threshold': 5000,  # 5 seconds
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': 'P95 latency exceeded 5 seconds',
            'TreatMissingData': 'notBreaching'
        }

    def _latency_spike_alarm(self) -> Dict:
        """Detect sudden latency spikes using anomaly detection"""
        return {
            'AlarmName': 'RAG-Latency-Anomaly',
            'ComparisonOperator': 'GreaterThanUpperThreshold',
            'EvaluationPeriods': 2,
            'Metrics': [
                {
                    'Id': 'm1',
                    'ReturnData': True,
                    'MetricStat': {
                        'Metric': {
                            'Namespace': 'GenAI/Performance',
                            'MetricName': 'EndToEndLatency'
                        },
                        'Period': 300,
                        'Stat': 'Average'
                    }
                },
                {
                    'Id': 'ad1',
                    'Expression': 'ANOMALY_DETECTION_BAND(m1, 2)',
                    'Label': 'Latency (expected)'
                }
            ],
            'ThresholdMetricId': 'ad1',
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': 'Latency anomaly detected (2 standard deviations)',
            'TreatMissingData': 'notBreaching'
        }

    def _cost_spike_alarm(self) -> Dict:
        """Alert on unexpected cost spikes"""
        return {
            'AlarmName': 'RAG-Cost-Spike',
            'ComparisonOperator': 'GreaterThanThreshold',
            'EvaluationPeriods': 1,
            'DatapointsToAlarm': 1,
            'MetricName': 'TotalCost',
            'Namespace': 'GenAI/Cost',
            'Period': 300,
            'Statistic': 'Sum',
            'Threshold': 50.0,  # $50 per 5 minutes
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': 'Cost spike detected: >$50 in 5 minutes',
            'TreatMissingData': 'notBreaching'
        }

    def _daily_budget_alarm(self) -> Dict:
        """Alert when approaching daily budget"""
        return {
            'AlarmName': 'RAG-Daily-Budget-Warning',
            'ComparisonOperator': 'GreaterThanThreshold',
            'EvaluationPeriods': 1,
            'DatapointsToAlarm': 1,
            'MetricName': 'TotalCost',
            'Namespace': 'GenAI/Cost',
            'Period': 86400,  # 24 hours
            'Statistic': 'Sum',
            'Threshold': 800.0,  # $800 per day (80% of $1000 budget)
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': 'Daily cost approaching budget limit (80%)',
            'TreatMissingData': 'notBreaching'
        }

    def _high_error_rate_alarm(self) -> Dict:
        """Alert on elevated error rate"""
        return {
            'AlarmName': 'RAG-High-Error-Rate',
            'ComparisonOperator': 'GreaterThanThreshold',
            'EvaluationPeriods': 2,
            'DatapointsToAlarm': 2,
            'MetricName': 'ErrorRate',
            'Namespace': 'GenAI/Errors',
            'Period': 300,
            'Statistic': 'Average',
            'Threshold': 5.0,  # 5% error rate
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': 'Error rate exceeded 5%',
            'TreatMissingData': 'notBreaching'
        }

    def _retrieval_failure_alarm(self) -> Dict:
        """Alert on retrieval failures"""
        return {
            'AlarmName': 'RAG-Retrieval-Failures',
            'ComparisonOperator': 'GreaterThanThreshold',
            'EvaluationPeriods': 1,
            'DatapointsToAlarm': 1,
            'MetricName': 'RetrievalErrors',
            'Namespace': 'GenAI/Errors',
            'Period': 300,
            'Statistic': 'Sum',
            'Threshold': 10,  # 10 retrieval failures in 5 min
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': 'Multiple retrieval failures detected',
            'TreatMissingData': 'notBreaching'
        }

    def _system_degraded_composite_alarm(self) -> Dict:
        """Composite alarm for multiple degradation signals"""
        return {
            'AlarmName': 'RAG-System-Degraded',
            'AlarmRule': (
                '(ALARM("RAG-Quality-Degradation") OR ALARM("RAG-High-Latency-P95")) '
                'AND ALARM("RAG-High-Error-Rate")'
            ),
            'ActionsEnabled': True,
            'AlarmActions': [self.sns_topic_arn],
            'AlarmDescription': (
                'System degraded: Multiple quality/performance/error issues detected'
            )
        }

# Usage
alarm_manager = GenAIAlarmManager(
    sns_topic_arn='arn:aws:sns:us-east-1:123456789012:genai-alerts'  # placeholder 12-digit account ID
)

alarm_manager.create_all_alarms()
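
Several of the alarms above combine EvaluationPeriods, DatapointsToAlarm, and TreatMissingData. As a rough mental model, the sketch below re-implements that "M out of N" check in plain Python. The function name is_alarming and its parameters are hypothetical, and CloudWatch's real handling of missing data is more involved than this, so treat it as an illustration rather than a description of the service's internals.

```python
from typing import List, Optional


def is_alarming(
    datapoints: List[Optional[float]],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
    greater_than: bool = True,
    treat_missing_as_breaching: bool = False,
) -> bool:
    """Simplified 'M out of N' check over the most recent periods.

    datapoints holds one aggregated value per period, oldest first;
    None marks a period with no data.
    """
    window = datapoints[-evaluation_periods:]
    breaching = 0
    for value in window:
        if value is None:
            # Loosely mirrors TreatMissingData: 'breaching' vs 'notBreaching'
            if treat_missing_as_breaching:
                breaching += 1
        elif greater_than and value > threshold:
            breaching += 1
        elif not greater_than and value < threshold:
            breaching += 1
    return breaching >= datapoints_to_alarm


# RAG-High-Latency-P95: 2 of the last 3 five-minute periods above 5000 ms
p95_latency_ms = [3200, 5400, 4100, 6100]
print(is_alarming(p95_latency_ms, 5000, evaluation_periods=3, datapoints_to_alarm=2))

# RAG-Faithfulness-Critical: the latest period has no data and counts as breaching
faithfulness = [0.82, None]
print(is_alarming(faithfulness, 0.60, 1, 1,
                  greater_than=False, treat_missing_as_breaching=True))
```

Both calls return True. The second one shows the effect of 'TreatMissingData': 'breaching' on the critical alarm: a period with no Faithfulness datapoints is treated as a failure, which is useful if silence from the evaluator should page someone, and noisy if traffic is legitimately sparse.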

Integration with Existing Observability Tools

Grafana Integration

Datadog Integration

Key Takeaways

  1. CloudWatch GenAI Observability is purpose-built – Provides out-of-the-box dashboards for Bedrock model invocations and AgentCore agents. No custom instrumentation needed for basic metrics.

  2. OpenTelemetry + ADOT enables custom observability – Use ADOT to instrument your application with custom spans capturing retrieval quality, token usage, and costs. It also traces boto3 AWS SDK calls automatically.

  3. X-Ray provides the service map – Distributed tracing shows bottlenecks across your RAG pipeline. Service maps visualize dependencies and highlight slow components (typically vector search).

  4. Comprehensive dashboards require custom metrics – Quality scores (faithfulness, relevancy), cost per query, and token breakdowns need custom CloudWatch metrics alongside out-of-the-box Bedrock metrics.

  5. Intelligent alarming prevents incidents – Set thresholds for quality degradation, cost spikes, and latency. Use composite alarms for multi-signal degradation detection. Anomaly detection catches unusual patterns.

  6. Integration extends visibility – Export to Grafana, Datadog, or existing observability stacks using CloudWatch exporters or direct API integration. Don’t build in isolation.

  7. Traces + Metrics + Logs = Complete picture – You need all three: traces for request flows, metrics for aggregates, logs for debugging specific failures. CloudWatch GenAI Observability provides this unified view.
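
Takeaway 4 depends on your application publishing its own datapoints. Here is a minimal sketch of what that could look like, assuming the GenAI/Quality and GenAI/Cost namespaces and the metric names used by the alarms earlier. The helper build_metric_payloads is hypothetical, and the actual boto3 call is left commented out because it needs AWS credentials.

```python
from typing import Dict, List


def build_metric_payloads(faithfulness: float, query_cost_usd: float) -> List[Dict]:
    """Build keyword arguments for CloudWatch put_metric_data calls.

    One payload per namespace, because each put_metric_data call
    accepts a single Namespace.
    """
    return [
        {
            "Namespace": "GenAI/Quality",
            "MetricData": [{
                "MetricName": "Faithfulness",
                "Value": faithfulness,
                "Unit": "None",
            }],
        },
        {
            "Namespace": "GenAI/Cost",
            "MetricData": [{
                "MetricName": "TotalCost",
                "Value": query_cost_usd,
                "Unit": "None",
            }],
        },
    ]


payloads = build_metric_payloads(faithfulness=0.91, query_cost_usd=0.0042)
for payload in payloads:
    print(payload["Namespace"], payload["MetricData"][0]["MetricName"])

# To publish (requires AWS credentials):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   for payload in payloads:
#       cloudwatch.put_metric_data(**payload)
```

The datapoints are published without dimensions on purpose: the alarm definitions earlier specify only a namespace and a metric name, and CloudWatch treats every dimension combination as a separate metric. If you add a ModelId dimension here, the alarms need the same dimension to see the data.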

What’s Next in This Series

Part 4: Production Hardening & Advanced Patterns (Coming Next Week – Series Finale!)

We’ll close the series with production-ready patterns:

  • Guardrails in production: Content filtering, PII detection, toxicity screening
  • Human-in-the-loop evaluation: Building feedback loops and annotation workflows
  • Incident response playbooks: What to do when GenAI fails at 3 AM
  • A/B testing strategies: Testing prompts, models, and RAG configurations
  • Canary deployments: Safe rollout strategies with automated rollback
  • Advanced cost optimization: Model routing, caching, and batch processing
  • Security hardening: Protecting against prompt injection and jailbreaks

Additional Resources

AWS Documentation:

  • CloudWatch GenAI Observability
  • AWS X-Ray Developer Guide
  • AWS Distro for OpenTelemetry
  • OpenTelemetry Python SDK

Sample Code & Workshops:

  • CloudWatch GenAI Observability Samples
  • AWS Observability Workshop
  • X-Ray SDK for Python

OpenTelemetry Resources:

  • OpenTelemetry Specification
  • ADOT Collector Documentation
  • Semantic Conventions for GenAI

Integration Guides:

  • Prometheus CloudWatch Exporter
  • Datadog AWS Integration
  • Grafana CloudWatch Data Source

Let’s Connect!

Building observability for production GenAI systems? Let’s share experiences!

Follow me for Part 4 (the series finale!) on Production Hardening & Advanced Patterns. We’ll cover guardrails, incident response, A/B testing, and cost optimization—everything you need to run GenAI at scale.

About the Author

Shoaibali Mir

I’m an engineer with 4+ years of experience spanning DevOps, Data, Cloud, and AI/ML engineering. Alongside full-time work, I’m pursuing a Master’s degree in AI/ML from BITS Pilani.

Connect with me on:

  • LinkedIn
  • Twitter/X

Tags: #aws #genai #observability #cloudwatch #xray #opentelemetry #monitoring #genaops #bedrock #distributedtracing

NgSysV2-10.2: PowerShell Scripting essentials

This post series is indexed at NgateSystems.com. You’ll find a super-useful keyword search facility there too.

Last reviewed: Apr’26

Introduction

The concept of “scripting” for PowerShell terminal session procedures was introduced in Post 4.3 without any attempt to describe the language’s syntax.

This post still doesn’t fully cover the subject, but here are the essentials, plus one or two “extras” I’ve found invaluable.

Variables, Arrays and Operators

A variable in PowerShell is a named container that holds a value, such as a string, number, array or object. PowerShell variables are dynamically typed, meaning you don’t need to declare a variable’s data type when assigning a value; it is determined by the assigned value. But note that if you do declare a type (for example, [int]$Count = 5), PowerShell enforces it on every later assignment to that variable.

Variable names in PowerShell are introduced with a $ symbol. Variable names are not case sensitive, so, for instance, $MyVariable and $myvariable refer to the same variable.

Operators

Arithmetic: for mathematical calculations (+, -, *, /, %)
Comparison: for comparing values (-eq, -ne, -gt, -lt, -ge, -le)
Logical: for combining conditions (-and, -or, -not)

Comments

Introduce these with a # symbol. Everything after the # on that line is ignored.

Output

You can write to the console with the following arrangement:

Write-Output "$MyMessage" # write this to the terminal

You can write to a file as follows:

$LogPath = "c:/path-to-my-log-file"
$MyMessage = " .... "
Add-Content -Path $LogPath -Value $MyMessage

Loops

For loop: used when you know the exact number of iterations required. It’s commonly used for tasks such as incrementing counters or processing arrays.
While loop: continues executing as long as a specified condition evaluates to True. It’s ideal for scenarios where the number of iterations depends on a dynamic condition.
ForEach loop: designed for iterating through collections like arrays or output from commands.

For example, here is a While loop:

$i = 0 # Initialise a counter
while ($i -lt 5) {
    Write-Output "Iteration: $i"
    $i++
}

Conditionals

PowerShell’s If-Else syntax follows this model:

$num = 10
if ($num -gt 5) {
    Write-Output "$num is greater than 5"
}
else {
    Write-Output "$num is less than or equal to 5"
}

“Null”

In PowerShell,

$MyVariable = $null

means “This variable exists, but it contains no value”

It is particularly useful because it evaluates to “false”, along with:

  • $false → explicit boolean false
  • 0 → numeric zero
  • "" → empty string
  • @() → empty array

Which is very handy because you can now do:

$MyVariable = $null

if ($MyVariable) {
    Write-Output "True" # or - a handy shortcut - just say "True"
}
else {
    Write-Output "False"
}

Functions

PowerShell lets you define and reference shared blocks of code with the following model:

function Display-Greeting {
    param(
        [string]$Name,
        [int]$Count
    )
    Write-Output "Name: $Name"
    Write-Output "Count: $Count"

    for ($i = 1; $i -le $Count; $i++) {
        Write-Output "Hello $Name ($i)"
    }
}
...
Display-Greeting -Name "Martin" -Count 3

Options exist for the function to accept only valid parameter values (for example, the [ValidateSet()] and [ValidateRange()] attributes) – ask ChatGPT for details.

Referencing a script with parameters

A parent script can call a subordinate “child” script as follows:

& ".\child.ps1" -Param1 value1 -Param2 value2

Meanwhile, the child script might be configured as follows:

param(
    [Parameter(Mandatory = $true)]
    [string]$Param1,

    [Parameter(Mandatory = $true)]
    [string]$Param2
)

The “Try … Catch” block

Just as in JavaScript, you can capture exceptions and direct them to a “problem-resolution” block:

try {
    # Code that might throw an error
} catch {
    Write-Output "An error occurred: $_"
} finally {
    Write-Output "This code always runs, regardless of errors"
}

More usefully, you might consider opening an account with Pushover and configuring your script to send a notification to your mobile phone. Registration and configuration are extremely easy and, last time I checked, a lifetime account with Pushover costs just $5.

} catch {    
    curl.exe -s -o NUL --form-string "priority=1" `
        --form-string "token=aiu7yk ..obfuscated... 5uerqq6ix" `
        --form-string "user=ueczz ..obfuscated... jrv54u22" `
        --form-string "message=Something has gone wrong with your nightly .. run" `
        https://api.pushover.net/1/messages.json
}

The -s -o NUL parameters above simply suppress the display of the “response” from curl.exe, which is no help at all when the script is running in the Windows scheduler.

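But when adding a try block, you need to know that many PowerShell errors are “non-terminating” by default, so they never reach the catch block. (External programs such as curl.exe report failure through their exit code instead, so check $LASTEXITCODE after calling them.) For PowerShell commands, you can fix this either by adding explicit error-handling instructions to individual commands: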

Some-Command -ErrorAction Stop

or, more realistically, by setting global instructions with:

$ErrorActionPreference = "Stop"

The “pipeline”

An advanced PS1 script will make extensive use of an arrangement that lets you “chain” commands together with a “pipe” symbol – | – that passes the output from one command as an object that then provides input to the next. So, in

Get-Process | Sort-Object CPU
  • Get-Process → produces “process objects”
  • Sort-Object → receives those objects and sorts them by a property

No string parsing, no fragile text handling — just structured data flowing through. PowerShell passes objects, not text, and binds them by property name or type. Here’s another example:

Get-Service | Where-Object {$_.Status -eq 'Running'}
  • Get-Service → outputs service objects
  • Where-Object → filters them, selecting those with status “running”

Most pipelines follow this shape:

Producer | Filter | Transform | Output

For example:

Get-Process |
Where-Object {$_.CPU -gt 100} |
Select-Object Name, CPU |
Out-File processes.txt

You’ll use these constantly:

  • Where-Object → filter
  • Select-Object → pick properties
  • Sort-Object → sort
  • ForEach-Object → act on each item

What’s New in RustRover 2026.1

Welcome to RustRover 2026.1. This version focuses on supporting the way modern Rust teams build, test, and maintain their code. Highlights include:

  • Native cargo-nextest integration
  • Call hierarchy for faster navigation
  • Easier access to macro expansions
  • Configurable visibility on module creation
  • Support for more AI agents, including GitHub Copilot and Cursor

Key updates

Code analysis is now more accurate

We’ve continued improving RustRover’s code analysis, with a recent focus on reducing false positives that can cause confusion.

If you notice any false positives, please report them in our issue tracker so we can keep improving code insight.

Run tests faster with cargo-nextest support in the IDE

Running tests in large Rust workspaces can be slow with the default test runner. Many teams rely on cargo-nextest for faster, more scalable execution, but until now, it required switching to the terminal.

We’ve added native support for cargo-nextest directly in the IDE. You can now run and monitor nextest sessions with full progress reporting and structured results in the Test tool window, without leaving your development workflow.

Trace call chains more easily

If you’ve ever tried to trace how execution reaches a function in a trait-heavy codebase, a flat list of usages can be hard to interpret. You get the matches, but you lose the bigger picture of the call chain.

RustRover 2026.1 adds Call Hierarchy support for Rust, so you can explore call relationships in a dedicated view and navigate complicated code faster. The hierarchy is Rust-aware and distinguishes between trait method calls and calls to concrete implementations.

ACP Registry in RustRover

In addition to Junie, Claude Agent, and most recently Codex, RustRover now lets you work with more AI agents directly in the AI chat. You can choose from agents such as GitHub Copilot, Cursor, and many others supported through the Agent Client Protocol (ACP).

Choose module visibility on creation 

When you create a new module, you often know right away whether it should be public or private. Previously, that meant creating the file first and then updating visibility manually.

RustRover now lets you choose module visibility directly in the New Rust Module dialog. This means you can create public or private modules and attach them to a module in a single step, reducing cleanup and keeping project structure consistent.

Workflow improvements

Updated LLDB debugger

RustRover 2026.1 updates LLDB to version 21, bringing performance and reliability improvements for debugging sessions. Expect faster loading of debug information through improved DWARF indexing and parallel shared-library parsing, along with more reliable breakpoint behavior in inline code.   

Macro expansion, one step away

Rust macros can hide a lot of logic behind a single line. When you need to confirm what code will actually be compiled, seeing the expansion is often the fastest way to understand what is going on.

RustRover makes it easier to find macro expansions right where you need them. Use the gutter icon on macro calls or the ⌥↩ (macOS) / Alt+Enter (Windows/Linux) shortcut to open the Show Context Actions menu and inspect the generated code without leaving the editor. 

Bug fixes and code insight improvements

Code insight improvements for derive macros

Derive and procedural macros generate code behind the scenes, which can make IDE analysis harder than it looks in the source. 

RustRover 2026.1 improves name resolution to reduce misleading warnings and keep editor feedback more dependable. Expect cleaner inspections and steadier code insight in macro-heavy projects.

Restored trust in IDE diagnostics when working with rustc crates

If you work with nightly and compiler-internal crates (rustc_*), you may have seen RustRover report E0463 errors even though the project still built successfully. That mismatch can make it harder to rely on editor feedback when you are working close to compiler internals. RustRover 2026.1 reduces these false positives, so diagnostics in the editor better match what you get from cargo build and cargo check when using rustc_* crates.

AI updates

Next edit suggestions, now quota-free 

Next edit suggestions help you apply related edits across a file, not just at the cursor. In RustRover 2026.1, they are available without consuming AI quota for JetBrains AI Pro, Ultimate, and Enterprise subscriptions, helping you keep changes consistent and stay in the flow while you iterate.

More agent options in the AI chat

RustRover now supports a wider choice of agents in the AI chat, including Junie and Codex, so you can pick the one that best fits the task at hand. This lets you switch between assistance styles without leaving the development workflow.

AI help for database work

When you’re working with a connected database, RustRover’s AI chat can help you query and analyze data, adjust SQL queries, and confirm changes right in the IDE. This keeps database work in the same flow as your code, instead of bouncing between tools. External agents can access the same database support through an MCP server.

Code With Me sunset

As we continue to evolve our IDEs and focus on the areas that deliver the most value to developers, we’ve decided to sunset Code With Me, our collaborative coding and pair programming service. Demand for this type of functionality has declined in recent years, and we’re prioritizing more modern workflows tailored to professional software development.

As of version 2026.1, Code With Me will be unbundled from all JetBrains IDEs. Instead, it will be available on JetBrains Marketplace as a separate plugin. 2026.1 will be the last IDE version to officially support Code With Me, as we gradually sunset the service.
