Layers Made It Universal. Harnesses Made It Run

A continuation of Flip the Axis: A Layer-Based Approach to Multi-Service Migrations.

TL;DR

You can’t script your way across a fleet of snowflake repositories. Neither can you just ask an AI agent to “migrate this service” and hope. What worked for the eight-quarter migration was a harness — a prompt pipeline in which each layer was a sequence of ordered steps: some calling scripts for deterministic changes, some using AI to discover and adapt, and some validating the results. The harness chained their outputs, ran each layer across 21 repos at once, and landed merge requests when it was done.

Here’s what that looked like in practice — including the parts we got wrong.

From methodology to machinery

The previous article ended on a line worth repeating: the methodology enables the tooling, not the other way around. This post is where that line becomes machinery.

The project: migrating 21 services from ECS to EKS — the final wave of an eight-quarter effort. Four engineers, targeting roughly ten repos per engineer per day on a given layer. The services were snowflakes: each with its own code style, CI configuration, logging framework, naming conventions, and infrastructure setup. The layers defined what to do — add an OIDC provider, swap the logging appender, rewrite the CI pipeline, set up a piece of infra — and the action was identical across services. But how to execute each layer varied across repositories. Same change, different wiring, 21 times over — toil that doesn’t yield to a single script. The layer approach made that pace imaginable. The harness made it real — though the most important piece wasn’t what we built first.

How a layer runs

Every layer ran through the same pipeline — a workflow of ordered prompts that mixed step types.

Take logging. One Java service uses logback.xml, a Node.js service uses a custom logging setup, and a third has a custom appender in a shared library. You can’t script that discovery, but you can script adding the dependency once the agent finds the right config. So the workflow combines all three step types: an AI step to discover and adapt, a script step for the deterministic edit, and a validation step to check the result.

Some steps called Go tools for deterministic changes — known target, computable edit, no reasoning required. Others used AI to discover and adapt: Terraform is a good example — we could not realistically script the changes, but we could give the LLM the modules to use for adding new configuration. Others validated what the earlier ones produced. The pipeline chained their outputs, for example:

  • What a script produced in step 3 informed what the agent analyzed in step 4.
  • What the agent implemented in step 5 was what the validation prompt checked in step 7.

[Figure: Step chaining]
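A minimal sketch of that chaining, assuming each step is a plain function that receives the outputs of every earlier step. The names `runPipeline`, `findConfig`, `planEdit`, and the `kind` labels are hypothetical, not the harness's actual API, and the AI step is stubbed with a pure function so the example runs on its own.

```javascript
// Hypothetical sketch: run ordered steps and pass every earlier output forward.
// `kind` is only a label here ("script", "ai", "validate"); a real AI step would
// call an agent instead of the stub functions used below.
function runPipeline(steps, initialContext = {}) {
  const outputs = { ...initialContext };
  const log = [];
  for (const step of steps) {
    // Each step sees a frozen snapshot of everything produced so far.
    const result = step.run(Object.freeze({ ...outputs }));
    outputs[step.name] = result;
    log.push({ name: step.name, kind: step.kind });
  }
  return { outputs, log };
}

// Example workflow: a script finds the config, a stubbed "AI" step plans the
// edit, and a validation step checks the plan against what the script found.
const loggingWorkflow = [
  { name: "findConfig", kind: "script", run: () => ({ file: "logback.xml" }) },
  {
    name: "planEdit",
    kind: "ai",
    run: (ctx) => ({ target: ctx.findConfig.file, change: "swap logging appender" }),
  },
  {
    name: "validate",
    kind: "validate",
    run: (ctx) => ({ ok: ctx.planEdit.target === ctx.findConfig.file }),
  },
];

const result = runPipeline(loggingWorkflow, { repo: "service-a" });
console.log(result.outputs.validate); // { ok: true }
```

Keeping every output under its step name means a later prompt can cite exactly which earlier step a fact came from.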

Why “ask Claude to migrate this service” fails

You cannot ask an AI agent to “migrate this service to EKS.” The task is too broad. The context is too large. The agent will hallucinate a plausible-looking solution, skip steps it decided weren’t important, or produce something that looks right and isn’t.

Note: The failure mode isn’t that the AI is dumb. It’s that the task has no structure, so the agent invents its own, and its structure drifts between runs.

You get 21 repositories migrated in 21 different ways, with 21 different sets of mistakes to audit. That’s worse than having done nothing.

The fix isn’t a better prompt. The fix is everything around the prompt.

Harness engineering, named a little late

A few months ago, the term harness engineering started showing up; it was a zero-volume search term until early 2026. The idea: design the constraints and scaffolding around an LLM that make it reliable. The question is no longer whether to use AI, but what you do once it’s part of your toolchain.

So far, the conversation is mostly about coding agents. We got here through infrastructure migration — and structured infra work is where the pattern fits most naturally. The changes follow patterns. The validation criteria are concrete. And the same change repeats across dozens of repos — which is exactly what a harness is built for.

The answer turns out to be unsexy: a pipeline that runs prompts in a fixed order, enforces what the agent can touch at each step, and validates its own work before declaring done.

I didn’t have that term in October 2025 when we built our own version of one. It’s easier to name a thing after you’ve already built it wrong twice. We just kept running into the same failure — the agent doing too much, too fast, with not enough guardrails — and kept adding constraints until it stopped failing. Mutation boundaries. Ordered steps. Explicit reasoning gates. Validation passes.

What was new — for us — was treating the harness as the primary engineering artifact. Not the prompt. Not the model. The pipeline around them.

The shift worth taking from it is this — stop optimizing the prompt, start engineering the pipeline that runs it.

What our harness looked like

A workflow is a sequence of numbered markdown files — each one a step prompt. The agent runs them in order inside an isolated Claude Code session. Context compounds: what step 2 discovered informs what step 5 decides.

The shape is always the same: context → discovery → analysis → planning → implementation → validation → ship → report.
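As a sketch of how a runner might put the step files in order, assume file names carry a numeric prefix such as `0-context.md`. That naming scheme and the `orderSteps` helper are assumptions for illustration; the article only says the files are numbered.

```javascript
// Hypothetical file names: the article only says steps are numbered markdown files.
// Sorting on the parsed number (not the string) keeps step 10 after step 2.
function orderSteps(fileNames) {
  return fileNames
    .filter((name) => /^\d+.*\.md$/.test(name))
    .map((name) => ({ name, index: Number.parseInt(name.match(/^\d+/)[0], 10) }))
    .sort((a, b) => a.index - b.index)
    .map((entry) => entry.name);
}

const ordered = orderSteps([
  "10-report.md",
  "2-analysis.md",
  "0-context.md",
  "README.txt",
  "1-discovery.md",
]);
console.log(ordered);
// [ '0-context.md', '1-discovery.md', '2-analysis.md', '10-report.md' ]
```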

Not every step needs the agent to reason. Some prompts call a Go tool or script for a deterministic change: same edit, known target, no drift. Others need AI to discover, analyze, and adapt. The workflow mixes both and chains their outputs forward.

Six things make this structure trustworthy:

  • Mutation boundaries. Every step is tagged READ ONLY, MAKES CHANGES, or PLANNING ONLY. The agent knows what it’s allowed to do at each point. No surprise writes during discovery.

  • Context anchoring. Step 0 explains why, not just what. “Here is why we move this piece of infrastructure and need it to be X and Z in a new destination.” “Here is why templates can’t be applied blindly.” The agent gets the intent before it touches any code.

  • Domain knowledge in the prompts. Hard-won lessons encoded directly: “BOM manages versions — you still must declare STS explicitly.” “Check actual pipeline behavior, not just config files.” These are the footnotes a senior engineer would leave for a junior one. The agent gets them every time.

The first three give the agent the right starting position. The rest keep it from wandering.

  • Early exits. If the dependency already exists, skip to the report. If a values file for a Helm chart is already configured, skip to the report. Not every repo needs every step.

  • Explicit reasoning gates. Complex steps state outright that the agent must reason through the problem, and write that reasoning down, before it acts. No freestyle implementation.

  • The implement/validate split. For critical layers, two separate workflows ran in sequence: one to implement, one to validate. The validation workflow reviewed the implementation against defined criteria rather than rubber-stamping its own work. This did more for reliability than any other constraint we added.

None of these is about making the agent smarter. They’re about making it predictable. Every constraint trades a degree of freedom for a degree of reliability. That’s the whole game.
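One way to picture a mutation boundary is as a check that runs after each step. The three tags come from the list above; the `checkBoundary` helper and the idea of comparing against a list of changed files are assumptions, not a description of how the harness enforced it.

```javascript
// Tags are the ones named in the text; the enforcement logic is an assumption.
const WRITE_ALLOWED = new Map([
  ["READ ONLY", false],
  ["PLANNING ONLY", false],
  ["MAKES CHANGES", true],
]);

// `changedFiles` would come from something like a working-tree diff taken after
// the step; it is passed in directly so the sketch stays self-contained.
function checkBoundary(step, changedFiles) {
  if (!WRITE_ALLOWED.has(step.tag)) {
    return { ok: false, reason: `unknown tag "${step.tag}" on ${step.name}` };
  }
  if (!WRITE_ALLOWED.get(step.tag) && changedFiles.length > 0) {
    return {
      ok: false,
      reason: `${step.name} is ${step.tag} but modified ${changedFiles.join(", ")}`,
    };
  }
  return { ok: true, reason: null };
}

const discovery = { name: "1-discovery", tag: "READ ONLY" };
const implementation = { name: "4-implementation", tag: "MAKES CHANGES" };

console.log(checkBoundary(discovery, []).ok); // true
console.log(checkBoundary(discovery, ["pom.xml"]).ok); // false
console.log(checkBoundary(implementation, ["pom.xml"]).ok); // true
```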

The instrument

Workflows describe what the agent should do per repo. We still needed an orchestrator to run them — and to run them across a batch of repos at once.

So we built a parallel execution wrapper on top of the Claude Agent SDK. Think of it as a prompt pipeline runner. Point it at a workflow and a list of repositories, and for each repo it clones into its own git worktree, spins up an isolated Claude Code session, runs the workflow’s step files in order, enforces the mutation boundaries from the previous section, and lands a per-repo report when it’s done. One engineer monitors the batch and reviews the results.

[Figure: Parallel execution. A workflow and a repo list fan out to isolated sessions per repository, then converge on engineer review.]

Each session adapts to its repository while following the same workflow. The agent in repo A figures out one logging setup; the agent in repo B figures out a completely different one. Both produce the same kind of report. The engineer opens ten reports instead of writing ten PRs.

This is the automated flip-the-axis model. One layer, one workflow, N repos, one human in the loop. All twelve migration layers ran through this pipeline.

Prompt pipeline + parallel sessions + per-repo reports + human review at the end.
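The fan-out can be sketched as a small worker pool. Here `runWorkflow` is a hypothetical stand-in for "clone a worktree, start an isolated session, run the step files"; none of this is Claude Agent SDK code, and the concurrency limit is an assumed parameter.

```javascript
// Hypothetical runner: `runWorkflow` is a stub, not an SDK call.
async function runBatch(repos, runWorkflow, concurrency = 4) {
  const reports = new Array(repos.length);
  let next = 0;

  // Each worker pulls the next unclaimed repo until the list is exhausted.
  async function worker() {
    while (next < repos.length) {
      const index = next++;
      const repo = repos[index];
      try {
        reports[index] = { repo, status: "done", report: await runWorkflow(repo) };
      } catch (err) {
        // One failing repo must not abort the rest of the batch.
        reports[index] = { repo, status: "failed", error: err.message };
      }
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, repos.length) }, worker);
  await Promise.all(workers);
  return reports;
}

// Stub workflow: fails for one repo to show per-repo isolation.
const fakeWorkflow = async (repo) => {
  if (repo === "service-b") throw new Error("no logging config found");
  return `${repo}: logging layer applied`;
};

const batch = runBatch(["service-a", "service-b", "service-c"], fakeWorkflow, 2);
batch.then((reports) => console.log(reports.map((r) => `${r.repo}=${r.status}`)));
// [ 'service-a=done', 'service-b=failed', 'service-c=done' ]
```

Capturing failures as report entries, rather than rejecting the whole batch, matches the review model above: the engineer still gets one report per repo.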

Prompts as production code

The non-obvious part: all of this lived in a shared repository. Workflows, runbooks, Go tools, scripts, per-service metadata — all version-controlled in one place. The project headquarters.

This wasn’t about convenience. It was a knowledge-sharing mechanism.

When an engineer discovered a prompt needed more specificity — say, the IAM workflow missed an edge case with cross-account trust policies — they updated the prompt and committed it. On the next pull, every teammate got the improvement. Same for validation scripts, runbook instructions, and service metadata.

The lesson: treat AI prompts and workflows like production code. Review changes. Iterate as a team. The prompts you write in week 1 will not be the prompts you need in week 6 — and every improvement should propagate to everyone automatically.

If your team is adopting AI for infra work and the prompts live in private gists, you’ve already lost the compounding advantage.

Honest assessment

Cross-referencing was the hardest problem we didn’t fully solve. The Helm values layer required correlating data from Terraform configs, ECS task definitions, and application codebases into a single file. AI agents working within a single repository can’t hold that full picture. Human judgment and manual cross-checking stayed essential here, and I don’t think that’s going to change for this class of problem anytime soon.

The reliability gradient was predictable. Within each workflow, the deterministic steps — the ones calling scripts — never drifted. The AI steps were reliable when isolated to a single file following a repeatable pattern. Reliability dropped when a step required cross-repository context or judgment calls about tradeoffs. Those stayed with humans.

We validated — but not enough. The implement/validate split worked wherever we applied it — it was the highest-leverage pattern in the whole setup. But we didn’t apply it to every layer, and we should have. The same workflows we used for implementation could have run in verification mode at near-zero additional cost. Our QA and PREPROD environments caught what universal validation would have. They shouldn’t have had to.

If I were starting this project again tomorrow, the first thing I’d build is a validation workflow for every implementation workflow, from day one. Not as an afterthought. As a matching pair.

What this means

Layers made the work universal. Harnesses made the AI trustworthy enough to run it — even across 21 repos, each wired differently. Together they closed out an eight-quarter migration — four engineers, not forty.

The next time someone frames it as “just write a script” versus “just use AI” — it’s the wrong question. Build the harness that runs both.

Further reading:

  • Harness Design for Long-Running Application Development — Anthropic’s engineering team on the same concept applied to agentic coding.
  • Harness Engineering — OpenAI’s take on the same discipline.
  • It’s a Skill Issue: Harness Engineering for Coding Agents — HumanLayer on harness configuration for coding agents.

Flip the Axis: A Layer-Based Approach to Multi-Service Migrations

TL;DR

When you’re migrating many services through the same steps, parallelize by step, not by service. Sweep one type of change across all services, then the next – it compounds learning, catches inconsistencies early, and makes automation viable. But recognize which services don’t fit the pattern: architecturally unique services should still be migrated serially.

The Problem

You’ve probably seen the shape of this problem before. You’re planning next quarter’s migration – could be Kubernetes, a new database engine, a cloud provider switch, a major framework version bump. You count the services. You count the engineers. The math doesn’t work.

Here’s what it looked like for us: 2025 Q4 planning, 21 services still running on ECS (Amazon’s container orchestration service) that needed to move to EKS (their managed Kubernetes platform). A headcount cut left us with 4 engineers. Each service migration had been taking 3-4 weeks of effort. The project had already been running since January 2024 – nearly two years – and the serial execution model from the previous quarter had required 8 engineers for 19 services. We had half the people, more services, and a timeline that was starting to feel permanent.

Nobody was telling us it had to be done next quarter. Our management said, “It’s okay if we can’t, don’t worry.”

But you know what happens to migrations that stretch for many months. They lose momentum. Engineers rotate off. Institutional knowledge erodes. The remaining services – always the hardest ones – sit in a permanent “next quarter” backlog.

You don’t have a staffing problem. You have an execution model problem.

Why serial breaks down

The default migration approach is serial: one engineer owns a service end-to-end and walks it through every step – networking, permissions, environment adjustments, certificates, CI/CD, DNS, cleanup. This works fine for a couple of services. This breaks down at scale.

The engineer context-switches across completely different types of work – networking, then application configuration, then debugging a permissions issue – and never builds deep fluency in any of them. Services are unique snowflakes – each with its own code style, dependency patterns, and configuration quirks. Serial migration means absorbing that uniqueness for every service, at every step.

Even worse – learning stays siloed. An engineer who figured out a networking edge case in week 2 can’t help the engineer who hits the same issue in week 6 – by then, they’ve moved on. Everyone is deep in a different service, at a different stage. The team can’t effectively pair, review, or unblock each other.

The Insight

When I looked at what we’d actually done in Q3 – service by service, step by step – the pattern was obvious in hindsight: we were doing the same work over and over again. Networking, permissions, application setup – identical across services.

It only looked unique because we were thinking one service at a time.

[Figure: Serial Migration Strategy]

What if we flipped the axis? Instead of completing all steps for one service before moving to the next, complete one step across all services before moving to the next step.

That’s the core of the layer-based approach. A layer is one type of infrastructure or configuration change, applied to every service in the migration scope. You sweep through all services at one layer, validate, then move to the next layer.

[Figure: Layer-based Migration Strategy]
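In code terms, flipping the axis is swapping the inner and outer loop. A small sketch with made-up service and layer names shows the two orderings side by side.

```javascript
// Serial: finish every layer for one service before touching the next service.
function serialOrder(services, layers) {
  return services.flatMap((service) => layers.map((layer) => `${service}:${layer}`));
}

// Layered: sweep one layer across every service before starting the next layer.
function layerOrder(services, layers) {
  return layers.flatMap((layer) => services.map((service) => `${service}:${layer}`));
}

// Illustrative names only.
const services = ["svc-a", "svc-b"];
const layers = ["networking", "permissions"];

console.log(serialOrder(services, layers));
// [ 'svc-a:networking', 'svc-a:permissions', 'svc-b:networking', 'svc-b:permissions' ]
console.log(layerOrder(services, layers));
// [ 'svc-a:networking', 'svc-b:networking', 'svc-a:permissions', 'svc-b:permissions' ]
```

Both orderings do the same units of work; only the layered one puts identical changes next to each other.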

Why this works

  • Repetition builds expertise. By the third service in a layer, you’ve seen the pattern. By the tenth, you’re fast.
  • Cross-service checks catch errors early. When you’re applying the same change to 20 services in a row, inconsistencies become obvious.
  • Learning compounds across the team. Everyone works the same layer simultaneously – discoveries spread instantly instead of weeks later.
  • Automation becomes viable. Identical changes across services are exactly what tooling excels at – predictable patterns with minor per-service variations.

Defining Your Layers

The number of layers depends on your migration. Ours had 14. Yours might have 8 or 20.

Here are the categories we found useful, grouped by concern:

  • Discovery: mapping downstream dependencies – services, databases, endpoints, protocols
  • Connectivity: networking between environments, firewall configurations
  • Identity: permissions, service accounts, trust policies, OIDC configuration
  • App level security: certificates, TLS termination, WAF rules
  • Application: runtime configuration, environment variables, secrets, logging adjustments
  • Delivery: CI/CD pipelines, ingress and routing, traffic management for gradual rollout

Your categories will differ. The names don’t matter – the decomposition does.

How to decompose your own migration

Start from a single service migration you’ve already done. List every change you made, in order. Group changes by type, not by when they happened. Each group is a candidate layer.

Then validate: can this layer be applied independently of the next one? Can you validate it before moving on? If yes, it’s a good layer boundary. If two changes are tightly coupled and can’t be validated separately, merge them into one layer.
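The grouping step can be sketched as a plain group-by over a change log. The log entries and the `candidateLayers` helper below are invented for illustration; a real list would come from the pull requests of a finished migration.

```javascript
// Hypothetical change log from one finished migration, in the order it happened.
const changeLog = [
  { type: "networking", change: "open security group to new cluster" },
  { type: "identity", change: "add OIDC trust policy" },
  { type: "networking", change: "add internal DNS record" },
  { type: "delivery", change: "rewrite CI deploy job" },
  { type: "identity", change: "create service account" },
];

// Group by type, keeping first-seen order so the layer sequence stays readable.
function candidateLayers(changes) {
  const layers = new Map();
  for (const { type, change } of changes) {
    if (!layers.has(type)) layers.set(type, []);
    layers.get(type).push(change);
  }
  return layers;
}

const layersByType = candidateLayers(changeLog);
console.log([...layersByType.keys()]); // [ 'networking', 'identity', 'delivery' ]
console.log(layersByType.get("identity").length); // 2
```

Each key is only a candidate: the independence check described above still decides whether it stays a layer or merges with a neighbor.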

One rule we learned the hard way: one layer per pull request. Early on, some PRs combined changes from multiple layers – networking and permissions in the same commit. Validation got complex, rollbacks got messy. Keep them separate.

Execution Model

A layer sweep works like this: the team takes on a layer, splits the service list among themselves, and each engineer applies that layer to their assigned services. Everyone works the same type of change simultaneously.

One engineer can realistically sweep a single layer across 6-8 services in a day. That number surprises people – until they know the tooling. We paired the layered methodology with AI-assisted automation that handled the repetitive configuration work across services. But the important thing is: the layer-based structure is what makes that automation possible.

When every service needs the same type of change with minor variations, you can build prompts, scripts, and validation checks that apply across the board. Serial, per-service work is too varied to automate effectively. The AI tooling story – what worked, what failed, and where human judgment was irreplaceable – is the subject of the next post in this series.

During execution, the team meets briefly to sync on edge cases – because the work is homogeneous, an edge case in one service is immediately relevant to every other service going through the same layer.

Progress tracking

A simple table – one column per layer, one row per service – serves as the source of truth. The team updates it in real time. Status per cell: not started, in progress, done, blocked, not applicable. This sounds basic, but it’s surprisingly effective. You can see at a glance where the project stands, which layers are complete, and where blockers are clustering.
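If the table lives in a structured form, the at-a-glance questions become small queries. A sketch, assuming a service-by-layer object; the statuses are the ones listed above, while the services, layers, and `layerSummary` helper are made up.

```javascript
// Statuses are the ones named in the text; services and layers are made up.
const progress = {
  "svc-a": { networking: "done", identity: "done", delivery: "in progress" },
  "svc-b": { networking: "done", identity: "blocked", delivery: "not started" },
  "svc-c": { networking: "done", identity: "not applicable", delivery: "not started" },
};

// A layer counts as complete when every service is done or not applicable.
function layerSummary(table, layer) {
  const cells = Object.entries(table).map(([service, row]) => ({
    service,
    status: row[layer] ?? "not started",
  }));
  return {
    layer,
    complete: cells.every((c) => c.status === "done" || c.status === "not applicable"),
    blocked: cells.filter((c) => c.status === "blocked").map((c) => c.service),
  };
}

console.log(layerSummary(progress, "networking"));
// { layer: 'networking', complete: true, blocked: [] }
console.log(layerSummary(progress, "identity"));
// { layer: 'identity', complete: false, blocked: [ 'svc-b' ] }
```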

Services that don’t fit

Not every service fits the pattern. In our case, 4 out of 21 services were architecturally complex enough that the layered approach didn’t help – they required deep, per-service analysis that negated the speed advantage.

We recognized this early and migrated them serially, with dedicated engineers working in parallel with the layer sweeps. Trying to force these into the pattern would have slowed everything down.

The lesson: the layer-based approach is a force multiplier for homogeneous work. When a service is genuinely unique, serial migration is the right tool. Budget for both.

Coordination That Matches the Work

The coordination model that works during one phase can hurt you in the next.

During layers: synchronize

When the whole team works the same layer, synchronous coordination is natural and cheap. Team syncs are short because everyone has context on the same type of work. An edge case discovered by one engineer is immediately useful to the others. Knowledge transfer happens without any deliberate mechanism – the work itself is identical.

During traffic switching: structure async handoffs

When the project moves from layer execution to per-service traffic switching, the work diverges. Each service has its own timeline, its own blockers, its own owning team with a different schedule. Synchronous coordination becomes expensive – the team is now working on different problems.

This is where a handoff log pays for itself. A shared document – not “made progress on Service X” but the actual PR link, the specific blocker, the decision to skip WAF configuration for this service, and why. What made it work: specificity over summary, explicit ownership, and early surfacing of blockers.

We heavily used this approach during the last phase of migration – the traffic switch – when two team members went to our SF hub to be on site with service owners, and two stayed in Berlin. But this isn’t a timezone trick – it works for co-located teams just as well. Fewer meetings, more focused execution, and a written record that prevents “I thought you were handling that” conversations.

The lesson: match the coordination model to the shape of the work. When work is homogeneous, synchronize. When it diverges, structure async handoffs and get out of each other’s way.

The Traffic Switch Cadence

Layer execution is predictable. Traffic switching is where the surprises live.

We used a graduated weekly cadence: Monday preflight (verify hostnames, certificates, ingress, autoscaling, dashboards – deploy one instance), Tuesday scale up and shift 1% of traffic, Wednesday observe and fix, Thursday shift to 50%, Friday observe and fix, following Monday shift to 100%.

The observation days weren’t idle; they were when most of the debugging happened. Issues that don’t surface at 1% show up at 50%. Fixes discovered for one service often apply to others in the same batch.
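The cadence reads naturally as data. In this sketch the days, actions, and percentages come from the description above; the field names, the `trafficOn` helper, and the assumption that preflight Monday carries 0% traffic are mine.

```javascript
// Cadence from the text, expressed as data. Day offsets count from the first
// Monday. Assumption: the preflight day carries no production traffic.
const cadence = [
  { dayOffset: 0, day: "Monday", action: "preflight, deploy one instance", trafficPercent: 0 },
  { dayOffset: 1, day: "Tuesday", action: "scale up, shift traffic", trafficPercent: 1 },
  { dayOffset: 2, day: "Wednesday", action: "observe and fix", trafficPercent: 1 },
  { dayOffset: 3, day: "Thursday", action: "shift traffic", trafficPercent: 50 },
  { dayOffset: 4, day: "Friday", action: "observe and fix", trafficPercent: 50 },
  { dayOffset: 7, day: "Monday", action: "shift traffic", trafficPercent: 100 },
];

// Return the traffic share that should be live on a given day of the cycle.
function trafficOn(dayOffset) {
  let current = 0;
  for (const step of cadence) {
    if (step.dayOffset <= dayOffset) current = step.trafficPercent;
  }
  return current;
}

console.log(trafficOn(2)); // 1
console.log(trafficOn(5)); // 50 (weekend: Friday's weight stays in place)
console.log(trafficOn(7)); // 100
```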

Batch your traffic switches. Running multiple services through this cadence simultaneously amortizes the coordination overhead – the preflight checklist, once built, applies to every service.

When This Works (and When It Doesn’t)

The layered approach is not universal. It works well under specific conditions.

Use layers when:

  • The migration is decomposable into independent, repeatable steps
  • The same type of change applies across many targets with minor per-service variations
  • Changes can be batch-validated – all services at one layer before moving on
  • The team is small relative to the workload and needs a force multiplier

Use serial when:

  • Services are architecturally unique and complex, and require deep, per-service analysis
  • The number of targets is small enough that coordination overhead outweighs the parallelization benefit

This is not an either/or decision. In our migration, the layered approach covered 17 of 21 services. The remaining 4 were migrated serially. Recognizing which services don’t fit the pattern early is just as important as the pattern itself.

What We’d Do Differently

Start the handoff log from day one. We introduced it when the team split across workstreams during traffic switching. In retrospect, the discipline of specificity and explicit ownership helps even when everyone is in the same room working the same layer.

Run validation sweeps after each layer, not at the end. We deferred some validation to later phases, when we ran the traffic switch on preproduction environments. That made fixing errors more expensive and created pressure during the most time-sensitive window.

Define service owner readiness criteria upfront. Some services reached the traffic switch phase with owners who weren’t fully briefed, dashboards that weren’t adjusted, etc. Clear criteria before the switch phase would have eliminated friction during the highest-pressure window.

Plan for the energy arc. An intensive, multi-month migration grinds people down. Build rotation points into the plan. Bring fresh perspective at deliberate moments – especially before the production switch phase.

Track decisions explicitly, separate from action items. Some decisions logged in the handoff document were missed because they were buried among task updates. A dedicated “decisions” section prevents teams from diverging without realizing it.

Key Takeaways

  1. Flip the axis. When many services go through the same steps, parallelize by step, not by service. The efficiency gain comes from repetition, shared learning, and automation – not from working harder.
  2. Define your layers by decomposing a single service migration. Group changes by type, validate that layers can be applied independently, and enforce one layer per merge request.
  3. Match coordination to the shape of the work. Synchronize when work is homogeneous. Structure async handoffs when it diverges.
  4. Recognize what doesn’t fit the pattern. Some services are genuinely unique. Budget for serial migration alongside the layer sweeps.
  5. The traffic switch is its own phase. Layer execution is predictable. Traffic switching is where surprises live. Treat it with a graduated cadence and observation days.
  6. The methodology enables the tooling, not the other way around. We paired layer-based execution with AI-assisted automation – and that’s what made one engineer sweeping 6-8 services in a day realistic. But the automation only worked because the layers created predictable, repeatable patterns. That story is next.

If this feels like a problem you’ve hit – or you’re about to – I’d like to hear your approach. Same constraint, different solution? A migration where layers didn’t work? Drop a comment.

Correlation-Aware Memory Search: How I Taught OpenClaw to Remember What Matters

This is a submission for the OpenClaw Challenge.

What I Built

I built a correlation-aware memory search plugin for OpenClaw — openclaw-correlation-plugin.

The problem: OpenClaw’s memory returns keyword matches, but doesn’t know that certain contexts always matter together. Search for “backup error” and you get hits on those words — but you also need “last backup time”, “recovery procedures”, and “recent changes”. You have to think to ask for them.

The solution: A rule-based correlation layer. Define correlations once:
{
  "id": "cr-error-001",
  "trigger_context": "backup-operation",
  "trigger_keywords": ["backup", "git push", "commit", "workspace"],
  "must_also_fetch": ["last-backup-time", "backup-status", "recovery-procedures"],
  "confidence": 0.9,
  "relationship_type": "related_to",
  "learned_from": "backup-verification-failed-silently"
}

When you search for a backup issue, the plugin matches this rule and suggests the additional searches automatically. Zero extra keystrokes.
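A rough sketch of what matching a rule like this could look like. The rule fields are taken from the example above; `matchRules` and `suggestSearches` are hypothetical helpers, and the simple substring test is an assumption rather than the plugin's actual matcher.

```javascript
// Rule shape copied from the example above; the matching logic is an assumption.
const rules = [
  {
    id: "cr-error-001",
    trigger_context: "backup-operation",
    trigger_keywords: ["backup", "git push", "commit", "workspace"],
    must_also_fetch: ["last-backup-time", "backup-status", "recovery-procedures"],
    confidence: 0.9,
  },
];

// Return every rule with at least one trigger keyword present in the query.
function matchRules(query, ruleSet) {
  const text = query.toLowerCase();
  return ruleSet.filter((rule) =>
    rule.trigger_keywords.some((keyword) => text.includes(keyword.toLowerCase()))
  );
}

// Merge the extra searches of all matched rules, dropping duplicates.
function suggestSearches(query, ruleSet) {
  const matched = matchRules(query, ruleSet);
  return [...new Set(matched.flatMap((rule) => rule.must_also_fetch))];
}

console.log(suggestSearches("backup error", rules));
// [ 'last-backup-time', 'backup-status', 'recovery-procedures' ]
console.log(suggestSearches("deploy status", rules)); // []
```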

How I Used OpenClaw

Plugin SDK: Simple but Tricky

The SDK makes tool registration easy — call api.registerTool() with your tools, parameters, and handlers. I built two tools:

  1. memory_search_with_correlation — Enriched memory search. Returns matches + suggested additional searches based on correlation rules.
  2. correlation_check — Debug tool. Test rule matches without performing searches.

Gotcha: The registration API requires { names: [...] } as the second argument, not just tool objects. Documented, but easy to miss.

Three Matching Modes

| Mode | Use for | Tradeoff |
| --- | --- | --- |
| auto (default) | General use | Keyword + context, normalizes hyphens/underscores |
| strict | Zero false positives | Word-boundary only, may miss valid matches |
| lenient | Fallback | Fuzzy when nothing else matches |

The auto mode’s normalization is small but powerful: “backup operation” matches backup-operation rules.
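To make the difference between auto and strict concrete, here is a sketch built only from the descriptions in the table. The `matches` function is hypothetical, and lenient mode is left out because the text does not say what kind of fuzzy matching it uses.

```javascript
// Assumed behaviour for two of the modes, based only on the table's descriptions.
const normalize = (text) => text.toLowerCase().replace(/[-_]+/g, " ").trim();

// Escape regex metacharacters so keywords are matched literally.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function matches(query, keyword, mode = "auto") {
  if (mode === "strict") {
    // Word-boundary match on the raw text: "backups" does not match "backup".
    return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "i").test(query);
  }
  // auto: compare after folding hyphens and underscores into spaces.
  return normalize(query).includes(normalize(keyword));
}

console.log(matches("backup operation failed", "backup-operation")); // true
console.log(matches("backup operation failed", "backup-operation", "strict")); // false
console.log(matches("nightly backups", "backup", "strict")); // false
console.log(matches("nightly backups", "backup")); // true
```

The last two lines show the tradeoff from the table: strict avoids the loose hit on "backups" but would also miss it if it were a valid match.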

Rule Lifecycle: CI/CD Borrowing

proposal → testing → validated → promoted → retired

Rules follow a promotion pipeline. retired rules are kept but not matched — no data loss. This lesson came hard: I deleted rules that didn’t work, losing their learned_from institutional memory. Now rules get retired, not trashed.
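The lifecycle can be sketched as a tiny state machine. The state names come from the pipeline above; the transition rules (advance one step at a time, retire from any state) and the `transition` and `activeRules` helpers are assumptions.

```javascript
// States from the pipeline above. Assumed transitions: each state may advance
// one step, and any non-retired state may be retired.
const ORDER = ["proposal", "testing", "validated", "promoted", "retired"];

function transition(rule, nextState) {
  const from = ORDER.indexOf(rule.state);
  const to = ORDER.indexOf(nextState);
  if (from === -1 || to === -1) throw new Error(`unknown state: ${rule.state} -> ${nextState}`);
  const allowed = (nextState === "retired" && rule.state !== "retired") || to === from + 1;
  if (!allowed) throw new Error(`cannot move ${rule.id} from ${rule.state} to ${nextState}`);
  // Return a copy so the rule, including learned_from, is never lost.
  return { ...rule, state: nextState };
}

// Retired rules stay in storage but are skipped at match time.
const activeRules = (ruleSet) => ruleSet.filter((rule) => rule.state !== "retired");

const draft = { id: "cr-error-001", state: "testing", learned_from: "backup-verification-failed-silently" };
const retired = transition(draft, "retired");

console.log(retired.learned_from); // backup-verification-failed-silently
console.log(activeRules([draft, retired]).length); // 1
```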

Confidence Scoring: Not “Higher is Better”

I set everything to 0.95 because “high confidence sounds better.” Result: signal drowning. Every query returned the same high-confidence rules, burying context-specific correlations.

The production model:

  • 0.95–0.99: Catastrophic if missed (config changes, gateway restarts)
  • 0.85–0.90: Reliable patterns (backup operations, error debugging)
  • 0.70–0.80: Useful with some false-positive risk (session recovery, git ops)
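A sketch of how the tiers might be applied at query time. The boundaries follow the list above; the tier labels, the `confidenceTier` and `rankMatches` helpers, and the handling of values that fall between the listed ranges (for example 0.92) are assumptions.

```javascript
// Tier boundaries follow the list above; labels and gap handling are assumptions.
function confidenceTier(confidence) {
  if (confidence >= 0.95) return "critical";
  if (confidence >= 0.85) return "reliable";
  if (confidence >= 0.7) return "useful";
  return "unranked";
}

// Sort matches so the rare critical rules surface first instead of everything tying.
function rankMatches(matched) {
  return [...matched].sort((a, b) => b.confidence - a.confidence);
}

// Illustrative rule ids only.
const matched = [
  { id: "git-ops", confidence: 0.75 },
  { id: "gateway-restart", confidence: 0.97 },
  { id: "backup-operation", confidence: 0.9 },
];

console.log(rankMatches(matched).map((r) => `${r.id}:${confidenceTier(r.confidence)}`));
// [ 'gateway-restart:critical', 'backup-operation:reliable', 'git-ops:useful' ]
```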

Zero Runtime Dependencies

The plugin has zero runtime dependencies — only esbuild and vitest for dev. A memory plugin that reads local files has no business pulling in transitive deps. Code is read-only: no filesystem writes, no network, no credentials. Passed security audit in March 2026.

Heartbeat Integration: The Killer Feature

On-demand correlation search is fine. Proactive surfacing is better. Every 5 heartbeats, a script scans the current work context and surfaces related memories before the agent thinks to ask. This is the difference between a search tool and a decision-support system.
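The "every 5 heartbeats" trigger can be sketched as a counter wrapped around a scan function. Only the interval comes from the text; `createHeartbeatScanner` and the `scan` callback are hypothetical stand-ins for the real script.

```javascript
// Hypothetical scheduler: `scan` stands in for the script that inspects the
// current work context. Only the "every 5 heartbeats" interval is from the text.
function createHeartbeatScanner(scan, interval = 5) {
  let beats = 0;
  return function onHeartbeat(context) {
    beats += 1;
    // Skip the scan on most beats to keep token use low.
    if (beats % interval !== 0) return null;
    return scan(context);
  };
}

const onHeartbeat = createHeartbeatScanner((context) => `scan for: ${context}`);
const results = [];
for (let beat = 1; beat <= 10; beat += 1) {
  results.push(onHeartbeat("editing backup script"));
}

console.log(results.filter((r) => r !== null).length); // 2
```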

Demo

Query: "backup error" with memory_search_with_correlation
{
  "query": "backup error",
  "matched_rules": [
    {
      "id": "cr-error-001",
      "context": "backup-operation",
      "additional_searches": ["last-backup-time", "backup-status", "recovery-procedures"]
    },
    {
      "id": "cr-session-001",
      "context": "error-debugging",
      "additional_searches": ["recovery-procedures", "recent-changes", "similar-errors"]
    }
  ],
  "suggested_additional_searches": [
    "recovery-procedures", "recent-changes", "similar-errors",
    "last-backup-time", "backup-status"
  ]
}

Same query. 5 extra contexts. Zero extra keystrokes.

What I Learned

1. Two half-solutions beat greenfield

This plugin merged two earlier experiments: proper SDK lifecycle + rich matching. The code still supports dual formats from both (must_also_fetch and correlations). Sometimes synthesis > from-scratch design.

2. Confidence scores tier, don’t max

0.95 for everything = useless. Tiered confidence prevents signal drowning. Only catastrophic correlations sit at the top.

3. Rules are organizational memory

The learned_from field captures why a rule exists. Deleting rules burns institutional knowledge. Retire, don’t trash.

4. Proactive > reactive

On-demand search is reactive. Heartbeat integration is proactive. Every 5 heartbeats is the sweet spot: useful without token burn.

5. Check ESM/CommonJS compatibility first

A dependency went ESM-only while the gateway uses CommonJS require(). Result: ERR_REQUIRE_ASYNC_MODULE, memory system disabled. Fix: local embeddings via Ollama. Always check module system before upgrading.

6. Know when NOT to correlate

Anti-patterns: 1:1 relationships (write a script instead), generic keywords like “help” or “status” (creates noise). Correlation rules are for probabilistic relationships — real but not guaranteed.

Repo: github.com/ether-btc/openclaw-correlation-plugin

License: MIT

OpenClaw Plugin Registry: correlation-memory (v2.1.0)

Building the OpenClaw Smart Finance Tracker – An AI-Powered Expense Parser

This is a submission for the OpenClaw Challenge.

The Problem

We all get dozens of bank SMS alerts, emails, and app notifications about our spending every week. Something like: “Alert: Card ending 1234 charged $42.50 at WHOLEFDS MRKT on 04/25”.

Tracking these manually in a spreadsheet is tedious. Traditional regex parsers break the moment your bank changes their SMS format. I needed something smarter, something that could understand context.

The Solution: OpenClaw Smart Finance Tracker

The OpenClaw Smart Finance Tracker is a sleek web dashboard that acts as an intelligent middleman. You simply paste your raw notification strings, and the power of OpenClaw’s intelligent parsing extracts the precise data:

  • Amount
  • Merchant
  • Category
  • Date

It then logs this perfectly into a visual dashboard, giving you a real-time health check of your monthly spending.

Check out the project code here: GitHub Repository

How I Built It (OpenClaw in Action / Wealth of Knowledge)

Here is a breakdown of how the application was architected:

1. Frontend Architecture

I wanted the app to feel premium and fast, so I skipped bulky frameworks. It’s built using pure HTML, Vanilla JS, and custom CSS featuring a modern glassmorphism aesthetic.

2. OpenClaw Integration

The real magic happens in app.js. The core functionality handles receiving the raw text string and passing it to the OpenClaw LLM via API.

The LLM is instructed with a specific system prompt to take the unstructured text and output structured JSON. OpenClaw is perfect for this because of its speed and accuracy in reasoning through unstructured text.

Here’s a conceptual look at how we process the data:

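// Example conceptual approach. callOpenClawAPI is a placeholder for your own API wrapper.
async function parseExpense(rawText) {
  // JSON.stringify quotes the text and escapes any quotes or newlines inside it
  const prompt = `Extract the Amount, Merchant, Category, and Date from this text and return it as JSON: ${JSON.stringify(rawText)}`;

  const response = await callOpenClawAPI(prompt);
  try {
    return JSON.parse(response.content);
  } catch (err) {
    // The model may reply with something that is not valid JSON
    throw new Error(`Could not parse model output as JSON: ${err.message}`);
  }
}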
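Model output is not guaranteed to match the requested shape, so it can help to validate the object before it reaches the dashboard. The following is a sketch under assumptions: normalizeExpense is a hypothetical helper, and the field names simply mirror the four fields named in the prompt.

```javascript
// Hypothetical validator for the object returned by parseExpense.
const REQUIRED_FIELDS = ["amount", "merchant", "category", "date"];

function normalizeExpense(parsed) {
  // Lower-case the keys so "Amount" and "amount" are treated alike.
  const lowered = Object.fromEntries(
    Object.entries(parsed).map(([key, value]) => [key.toLowerCase(), value])
  );
  const missing = REQUIRED_FIELDS.filter((field) => lowered[field] == null);
  if (missing.length > 0) {
    throw new Error(`Missing fields: ${missing.join(", ")}`);
  }
  // Strip currency symbols and thousands separators before converting.
  const cleaned = String(lowered.amount).replace(/[^0-9.-]/g, "");
  const amount = cleaned === "" ? NaN : Number(cleaned);
  if (!Number.isFinite(amount)) {
    throw new Error(`Amount is not a number: ${lowered.amount}`);
  }
  return {
    amount,
    merchant: String(lowered.merchant).trim(),
    category: String(lowered.category).trim(),
    date: String(lowered.date).trim(),
  };
}

const sample = { Amount: "$42.50", Merchant: " WHOLEFDS MRKT ", Category: "Groceries", Date: "04/25" };
console.log(normalizeExpense(sample));
```

Throwing on a bad shape keeps a malformed reply out of the table; a real app might instead surface the raw text for manual entry.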

3. Dynamic Rendering

Once the structured JSON is returned from the OpenClaw model, the dynamic tables on the frontend update immediately.

Try it out!

Want to run it locally?

  1. Clone the repository.
  2. Serve index.html (e.g., using VSCode Live Server).
  3. Point the designated endpoint in app.js at your local OpenClaw instance.

Enjoy taking your time back from tedious admin work! Let me know what you think in the comments below.

I Run My AI Content Pipeline on a $20 VPS (Because My $200 PC Crashed)

This is a submission for the OpenClaw Writing Challenge

The PC came from my daughter. She was getting a new one … this was going in the trash. 16GB of RAM, a GPU, and the best part? It sounds like a rocket ship whenever I open Chrome. I’d been a Mac guy forever. The thought of using a Windows machine made my skin crawl. But the PC was free, so I took it. Within hours of setting it up, I downloaded OpenClaw. Within the first fifteen minutes I saw the black screen of death. Not blue … black. I’d heard of the blue screen. Apparently the new thing is black. Maybe I was right all this time.

Everyone said get a Mac Mini. You for sure want the local model for privacy. So I looked. The M4 Pro starts at $1,399 with 24GB unified memory … enough to run models. My friends bought them, created LLCs, wrote off the hardware as CapEx. Good for them.

I wasn’t ready to drop fourteen hundred bucks.

The thing is … I’m not against spending money on tools. I’m against spending my money.

The $40 Discovery

I was determined to find another solution. AWS is my bread and butter. It’s my go-to. When I need cloud infrastructure, I don’t think about Azure or Google Cloud. I think about AWS.

I went that route first. Asked other people using OpenClaw what specs I’d need, then looked up AWS pricing based on those specs. Came back with the price … and wanted to throw up.

So … I did what we all do. Asked ChatGPT. Asked Claude. Both acted like they never heard of OpenClaw. It must be their distant cousin they pretend not to know.

So I went to Google. Typed “OpenClaw Cloud Hosting.” Found a YouTube video by Hostinger. They made it look easy. I didn’t believe it would work based on the specs … but at the end of the day I was like, tack on Ollama cloud and … “I can try it for $40.”

Spun up a KVM 1 instance with 1 vCPU and 4GB RAM, and stumbled into something that’s now running 24/7 on my phone, my laptop, my Slack. The Mac Mini I was supposed to buy sits at $1,399 in my browser history, unwatched.

I guarantee a $20 VPS is not as good as a Mac Mini. The models are obviously not running locally … but it works for me.

The System, Not the Tool

What I have isn’t just “OpenClaw on a VPS.” That’s the headline. The reality is more interesting.

OpenClaw is running on that $20 VPS 24/7. It’s integrated into my daily workflows through Slack and Telegram. I can message it from anywhere … laptop, phone, doesn’t matter. It has access to my content pipeline skills: research, drafting, editing, story management.

The assistant doesn’t write for me. It removes the friction between having a thought and getting it down.

The gap between “I have an idea” and “that idea is captured” has always been the hard part. Not the thinking. Not the editing. The transfer from brain to document.

OpenClaw bridges that gap. I’m still the strategist. I’m still the editor. I’m still the one with the voice. But I don’t get stuck on blank pages anymore.

The Authenticity Question

Here’s what I see happening with AI and content: people are using it to make up shit. They’re generating posts about experiences they haven’t had, advice they haven’t tested, frameworks they haven’t built. The AI writes it, they publish it, and it sounds … off.

I get why they do it. The content treadmill is brutal. Daily posting is unsustainable without help. But there’s a difference between AI-assisted and AI-generated.

My hundred-story library contains real things that happened to me. OpenClaw helps me structure them, find connections, get unstuck. But the source material is mine. The judgment about what to publish is mine. The voice that lands or doesn’t land is mine.

That’s the part people miss when they outsource the whole thing. AI can amplify what you have. It can’t create what you don’t.

What I Actually Built

For the curious, here’s the stack:

  • Hostinger KVM 1 VPS ($20/month): 1 vCPU, 4GB RAM, 50GB NVMe, 4TB bandwidth, Ubuntu 22.04
  • Ollama Pro ($20/month)
  • OpenClaw: Running 24/7 on the VPS
  • Slack + Telegram integration: Interface from any device
  • Custom skills: Story library, research assistance, drafting support, LinkedIn optimization
  • Ollama: Running models on the VPS for specific tasks

The VPS runs quietly. I SSH in whenever I need to. The rest of the time, OpenClaw is just … there. An endpoint I can hit from anywhere. A consistent presence that knows my patterns.

It’s not fancy. It’s reliable. It’s been running for months without me touching it.

What OpenClaw Gets Right

I’ve tried a lot of AI tools. What OpenClaw gets right that others don’t:

It’s not trying to be a chatbot

It’s trying to be an assistant … something with memory, with skills, with integration into your actual life. The skill system means you can teach it what you need, not just prompt it differently.

It lives where you want

Local if you want. VPS if you want. The abstraction is portable. You’re not locked into someone else’s infrastructure.

It’s built for builders

The people who made OpenClaw seem to understand that the value isn’t in the model … it’s in the system around the model. The orchestration. The memory. The integration.

What This Means For You

If you’re reading this and thinking about personal AI, ask yourself what you’re optimizing for.

Privacy? A VPS on a reputable host is private enough for most workflows. Your threat model may differ.

Speed? Local wins on latency, but only if your hardware is good. My daughter’s old PC with Intel graphics was slower than the VPS.

Cost? $40/month vs $1,400 upfront is math you can do.

Control? OpenClaw gives you plenty. It’s open. You own your data. You’re not locked into anyone’s ecosystem.
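For the cost question, the math looks like this. It is a rough sketch using only the figures quoted in this post ($20 VPS plus $20 Ollama Pro per month, $1,399 for the Mac Mini), and it ignores electricity, resale value, and any subscription you might keep paying alongside local hardware.

```javascript
// Figures from the article: $20 VPS + $20 Ollama Pro per month, $1,399 for the Mac Mini.
const monthlyCost = 20 + 20;
const upfrontCost = 1399;

// Whole months after which cumulative hosting spend reaches the hardware price.
function breakEvenMonths(upfront, monthly) {
  return Math.ceil(upfront / monthly);
}

console.log(breakEvenMonths(upfrontCost, monthlyCost)); // 35
```

That is close to three years of hosting before the two options cost the same.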

The point isn’t that my setup is better. The point is that “local AI” became a default answer before people asked what problem they were actually solving.

The Real Win

I was already writing daily. I’d been using Claude for that. But I’m on the Pro plan, and I was running into my weekly limit with the number of things I was asking it to do.

With OpenClaw and Ollama, I’ve never hit my rolling window. The assistant is always there. I can message it without counting tokens or watching a progress bar. It removes the friction between having something to say and getting it said.

My story library has 100+ entries. I add to it weekly. The assistant helps me find patterns, structure arguments, and get unstuck. But the stories are mine. The voice is mine. The judgment about what goes out is mine.

That’s the system. Not the Mac Mini. Not the VPS. The system of having source material, a process, and an assistant that removes friction instead of adding it.

You don’t need $1,400 to get started. You need a clear sense of what you’re trying to solve … and the willingness to build around your constraints instead of someone else’s recommendation.

The Mac Mini I was supposed to buy sits at $1,399 in my browser history, unwatched. I don’t need it. OpenClaw on a $20 VPS + Ollama Pro at $20 is the right abstraction for my wiring.

If you’re building something similar with OpenClaw, I’m curious about your setup. Drop a comment or find me on LinkedIn. I’m always interested in how people solve the same problem with different constraints.

Take Control of AI Code Quality in CI: Live Demo

AI is accelerating coding, but without the right checks, it can also introduce risk, inconsistency, and hidden issues into your codebase. Businesses are offering “total automation” and “AI-driven checks” while consumers lose control of code quality and security. 

In this livestream, we’ll show how to take control of AI-generated code by bringing deterministic, repeatable quality checks into your CI pipeline.

You’re invited!

Join JetBrains experts Kai (Product Specialist, Qodana), Alex (Solutions Engineer, Qodana), and Artem (Solutions Engineer, TeamCity) as they demonstrate how Qodana and TeamCity work together to:

  • Automatically analyze AI-generated code in CI.
  • Enforce consistent quality standards with deterministic inspections.
  • Reduce review bottlenecks and improve developer confidence.
  • Catch issues before they reach production.

We’ll also run a live demo, showing how AI-generated code flows through a CI pipeline and how Qodana applies reliable, repeatable checks to keep your codebase clean and maintainable.

Whether you’re experimenting with AI-assisted development or already using it in production, this session will help you build workflows that are both fast and trustworthy.

Your speakers

Kai Schmithuesen

Kai is an accomplished product specialist with over 15 years of experience in software sales, focusing on developer tools and practices. Originally from Germany, Kai spent over 17 years living and working abroad, working for international software companies before returning to Berlin.

Also from Qodana…

Alex Costa, Solutions Engineer at Qodana

Alex has spent over a decade helping teams implement modern code quality workflows, working closely with clients to provide live demos and building tailored proofs of concepts and custom solutions. Outside of work, he has a merry band of kids and enjoys crafting handmade dice – a creative outlet that reflects his attention to detail and love of building things from scratch.

From TeamCity…

Artem Rokhin, Solutions Engineer at TeamCity

Artem started out at JetBrains as a release manager over a decade ago and is now based in the Netherlands. As a certified JetBrains and TeamCity expert, he helps teams automate their CI/CD pipelines so every code change is built, tested, and validated before reaching production. He works closely with developer advocates and the developer community, putting his master’s degree in technology to good use.
