72% of Enterprises Think They Control Their AI. Ask Them What Their Agents Are Spending.

A VentureBeat survey of 40 enterprise organizations published in Q1 2026 found that 72% of enterprises believe they have meaningful control over their AI deployments. They have dashboards. They have policies. They have vendor contracts with safety clauses.

Ask them one question and the illusion breaks: what did your AI agents spend this week?

Silence.

Enterprise AI governance in 2026 has a systematic blind spot. Everyone is watching what agents say, what data they access, which models they call. Nobody is watching what they spend. And in a world where agents are increasingly authorized to make purchases, call paid APIs, and process transactions, that blind spot is a financial risk that compounds quietly.

Shadow AI Became Shadow Spending

Retool’s 2026 Build vs. Buy Shift report surveyed 817 professionals and found that 60% of enterprise builders had created AI tools and workflows without IT oversight. A quarter of them did this frequently.

These tools were connected to production data. They were running automated workflows. They had API keys.

Now consider: many of those same tools are calling external APIs. Some are calling paid APIs. Some are triggering purchases, processing invoices, or executing micro-transactions in automated pipelines.

The governance layer that was supposed to audit these actions? It was never built for the payment surface.

Mass General Brigham, with 90,000 employees, had to build a custom security layer on top of Microsoft Copilot because the platform’s native governance could not account for the real-world workflows running on top of it. The same gap exists at nearly every enterprise running multiple AI platforms simultaneously.

The Three Governance Failures

When we map enterprise AI governance onto payment workflows, three failure modes emerge consistently.

1. Credential Sprawl

An agent that calls OpenAI, Anthropic, a third-party data enrichment API, and a payment processor is using four separate credential chains. Each one has different scope, different expiry, different audit trail. The IT team sees none of it as a single coherent spend profile.

Result: you cannot answer the question ‘what did our AI cost us this month’ with any accuracy.

2. Budget Without Enforcement

Most enterprise AI budget controls exist at the procurement level. A team is allocated $10,000 for AI APIs. But at the agent execution level, there is no real-time enforcement. An agent can exceed the monthly budget in a day of unexpected behavior, and the budget owner finds out three weeks later on the invoice.

Result: cost surprises that feel like infrastructure failures.

3. Audit Trail Gaps

When something goes wrong and an agent made an unauthorized or erroneous payment, reconstructing what happened is extremely difficult. API logs exist in silos across different vendors. The agent’s decision context is separate from the transaction record. Compliance teams cannot establish a clear chain of custody.

Result: regulatory exposure that increases as agent autonomy increases.

What Real-Time Payment Governance Looks Like

The solution is not more dashboards. It is moving payment authorization infrastructure outside the agent layer entirely.

When an agent’s payment credentials are scoped at issuance, the governance problem changes shape. Instead of monitoring what agents are spending after the fact, you define what they are allowed to spend before execution begins.

Here is what that looks like in practice with rosud-pay:

// Issue a scoped payment credential for an agent
const credential = await rosud.credentials.create({
  agentId: "procurement-agent-prod",
  maxAmount: 500,
  dailyLimit: 2000,
  allowedDomains: [
    "api.openai.com",
    "api.anthropic.com",
    "data.clearbit.com"
  ],
  requireApproval: {
    above: 200
  },
  expiresIn: "7d"
});
// Any attempt to pay outside the defined scope is rejected at infrastructure level

The credential itself encodes governance. There is no separate monitoring system to build. The constraint is enforced at the infrastructure level, not the application level.

This matters because of a core security principle: if your agent generates the payment authorization logic, it could also manipulate that logic. Governance must live in a layer the agent cannot modify.

Closing the Audit Trail Gap

Real-time enforcement is one half of the problem. Auditability is the other.

rosud-pay records every payment event with the agent identity, the credential scope, the transaction context, and a timestamp. This means that when compliance asks what happened, you have a structured record that maps AI decisions to financial outcomes.

// Query the spend audit trail for a specific agent
const auditLog = await rosud.payments.history({
  agentId: "procurement-agent-prod",
  from: "2026-04-01",
  to: "2026-04-25",
  format: "structured"
});
// Returns: totalSpend, currency, per-transaction records
// Each record maps: agentDecision -> vendor -> amount -> timestamp
// No manual reconciliation required

This is the governance infrastructure that enterprise AI deployments are missing. Not a policy document. Not a vendor audit. A real-time, scoped, auditable payment layer that operates at the infrastructure level.

The 72% Problem Is Actually a Measurement Problem

The VentureBeat survey did not find that enterprises are reckless. It found that enterprises are measuring the wrong things. They count model calls. They track prompt costs. They monitor data access.

They are not measuring the financial actions their agents are taking autonomously.

As agent capabilities expand and autonomous spending becomes normalized, the governance frameworks that enterprises are building today will have systematic gaps where payment flows are concerned. The organizations that close that gap now will have a significant advantage when regulators begin requiring it.

rosud-pay is the infrastructure layer that makes agent spending visible, constrained, and auditable. You can learn more at https://www.rosud.com/rosud-pay.

Key Takeaways

  • 72% of enterprises believe they control their AI, but few have visibility into what agents are spending
  • Shadow AI created shadow spending: 60% of enterprise builders created tools without IT oversight
  • Real payment governance requires scoped credentials enforced at the infrastructure level
  • Audit trails must map AI decisions to financial outcomes, not just API call logs
  • rosud-pay provides the spending governance layer that enterprise AI deployments are missing

Turning AI Coding Assistants into Engineering Mentors with Modular Skills

I’ve been experimenting with a problem I keep noticing while using AI coding assistants for learning.

Most coding agents are optimized for solving tasks quickly:
prompt → code dump → copy-paste → done.

That works for productivity.

But when learning from GitHub repositories, technical documentation, or complex codebases, this workflow often creates shallow understanding. The developer finishes the task without actually understanding the architecture, debugging process, or reasoning behind the implementation.

So I started building something around that idea.

🔗 https://github.com/yugash007/edu-agent-skills

edu-agent-skills is an open-source modular skill system for AI coding agents like Gemini CLI, Claude Code, Cursor, and others.

Instead of treating coding assistants as static chat interfaces, the project injects specialized behavioral skills into the workflow.

Current skills include:

🧠 Socratic mentoring
🔍 Misconception detection
🏗️ Architecture review
📦 Project critique
🐞 Debugging guidance
📚 Active learning workflows
📈 Weak-area tracking

The main use case right now is learning from:

  • GitHub repositories
  • markdown-based tutorials
  • OSS documentation
  • implementation-heavy engineering resources

For example, instead of immediately generating the final implementation, the agent can:

  • guide debugging step-by-step
  • ask targeted reasoning questions
  • detect flawed mental models
  • force active recall
  • review architectural decisions
  • adapt explanations based on repeated mistakes

The goal is not to replace developers with AI.

The goal is to make AI-assisted engineering workflows more educational, interactive, and reasoning-driven.

🛠️ Installation is simple:

npx edu-agent-skills install

The installer auto-detects supported local agents and configures the skills automatically.

Right now I’m actively exploring:
⚡ Skill composition
⚡ Agent behavior orchestration
⚡ Repository-aware learning workflows
⚡ Adaptive educational feedback
⚡ Modular agent capability injection

One thing I’ve realized while building this:
GitHub repositories are increasingly becoming the “new textbooks” for software engineering.

But reading repositories passively rarely builds deep understanding.

I think AI agents can help bridge that gap — if they are optimized for learning instead of only code generation.

Would genuinely like feedback from developers working with:

  • AI coding agents
  • OSS tooling
  • educational workflows
  • developer productivity systems

Open to contributors, ideas, and criticism.

From Blender Demos to Agent Toolchains: Why Terminal Skills Matter

Most AI + Blender demos still follow the same pattern:

Ask the model for a prompt.
Generate a scene or script.
Hope the result looks close enough.
Try again when it breaks.

That can be useful for experiments.

But it is not how real creative work usually gets done.

Blender is not just an image generator. It is a full production environment with scenes, objects, cameras, lights, materials, modifiers, animation timelines, render settings, exporters, and a Python API.

So the interesting question is not:

Can an AI agent describe a Blender workflow?

The better question is:

Can an AI agent actually operate Blender as part of a repeatable toolchain?

That is where terminal-native skills become interesting.

The gap between “knowing Blender” and using Blender

Modern AI models can explain Blender concepts very well.

They can tell you what a bevel modifier does.
They can describe three-point lighting.
They can write a Python script that creates a simple scene.
They can explain camera focal lengths, materials, render engines, and file exports.

But knowing the tool is not the same as reliably using the tool.

If you ask an agent to create a product render in Blender, a lot can go wrong:

  • the camera may not frame the object
  • the lights may be too weak or too harsh
  • the material names may be inconsistent
  • the render settings may be missing
  • the script may assume the wrong scene state
  • the output file may never be verified
  • the workflow may work once and fail tomorrow

That is the difference between a demo and a production workflow.

A demo can be impressive once.

A workflow needs to be repeatable.

Why Blender is a good test case for AI agents

Blender is creative, but it is also deeply scriptable.

That makes it a useful benchmark for agent workflows.

It is not enough for the agent to say something plausible. At the end, there should be an actual artifact:

  • a .blend file
  • a rendered image
  • an animation preview
  • an exported asset
  • a contact sheet
  • a set of named cameras
  • a reusable scene setup

Either the output exists or it does not.

That makes Blender less forgiving than a text-only task, and that is exactly why it is valuable.

It forces the agent to move from language into execution.

The role of Terminal Skills

Terminal Skills is an open-source catalog of skills for AI agents.

The idea is simple:

Agents do not just need more prompts. They need reusable operational workflows.

A skill can teach an agent how to perform a specific type of work:

  • when to use the workflow
  • what inputs are expected
  • which commands or scripts matter
  • what conventions to follow
  • how to verify the result
  • what failure modes to avoid
  • what output should be returned

That is different from just giving the agent a tool.

A tool gives the agent capability.

A skill gives the agent a path.

For Blender, that path matters a lot.

From GUI work to agent-operable workflows

Blender has a powerful GUI, and artists should absolutely use it.

But a GUI is not always the best interface for an AI agent.

Agents work best when they can:

  • run a command
  • inspect files
  • read logs
  • modify scripts
  • verify outputs
  • repeat the process

That is why terminal-native workflows are such a natural fit.

A terminal workflow gives the agent a clean feedback loop:

intent → command/script → output → verification → next step

Instead of guessing inside a visual interface, the agent can perform concrete operations and check whether they worked.

For example, a Blender skill might help an agent:

  • create a clean scene setup
  • generate camera variants
  • apply consistent material conventions
  • create lighting presets
  • render previews
  • export assets
  • validate that the output file exists
  • return a short summary of what changed

The human still controls taste and direction.

The agent handles the repeatable production layer.

What belongs inside a Blender skill

A useful Blender skill is not just a prompt template.

It should behave more like a small operating manual for the agent.

It should define:

  • what the workflow is for
  • what inputs are required
  • which files the agent may create or modify
  • which commands or scripts should be run
  • what naming conventions to follow
  • what output artifacts must exist
  • how to verify those artifacts
  • what common failure modes to check before reporting success

For example, instead of giving the agent a vague request like this:

Make a Blender product scene.

A skill can define a stronger contract:

Create or update the scene.
Save the .blend file.
Render a preview.
Confirm the preview file exists.
Return the paths and a short summary of what changed.

That contract is the important part.

It gives the agent a definition of done that is stronger than “the response sounds plausible.”

The skill is the interface

A lot of agent tooling conversations focus on connectors.

Can the agent access this app?
Can it call this API?
Can it run this command?
Can it control this environment?

Those questions matter.

But access is not the whole workflow.

If an agent can run Blender from the terminal, that is useful. But the more important layer is the operating pattern around that access:

  • what should the agent do first?
  • what should it avoid touching?
  • how should files be named?
  • when should it render a preview?
  • what should it check before saying done?
  • what should it hand back to the human?

That is why I like thinking about skills as interfaces for work.

They make the task boundary explicit.

The agent is not just dropped into a powerful tool and told to figure it out.

It gets a workflow it can execute, inspect, and repeat.

A better definition of done

For many AI tasks, “done” is too fuzzy.

The model stops writing, so the interaction feels complete.

But production work needs a stronger definition.

For a Blender workflow, “done” might mean:

  • the .blend file was saved
  • the preview render exists
  • the output path was returned
  • the scene contains named cameras and lights
  • the agent reports what changed
  • the human has something concrete to review

This is where terminal-native skills become especially useful.

They can push the agent toward evidence-based completion.

Not just:

Here is a script you could run.

But:

I ran the workflow, created these artifacts, checked these outputs, and here is what needs review.

That difference is small in a demo.

It is huge in real work.

Why this matters for reproducibility

One-off AI outputs can be impressive, but they are hard to build on.

If the process is hidden inside a long prompt and a lucky generation, it is difficult to answer basic questions:

  • Can we run this again next week?
  • Can another agent follow the same process?
  • Can we change one input and keep the rest consistent?
  • Can we debug why the output failed?
  • Can we tell which step created which artifact?

Terminal-native workflows are not glamorous, but they help with all of that.

Commands can be rerun.
Files can be inspected.
Logs can be read.
Outputs can be checked.
Conventions can be documented.

A skill wraps those pieces into something the agent can reuse.

That is the real value.

Not magic.

Repeatability.

Why this matters beyond Blender

Blender is just one example.

The same pattern applies to many agent workflows:

  • video processing
  • data cleanup
  • documentation updates
  • screenshot generation
  • test automation
  • asset exports
  • report generation
  • deployment checks

In each case, the problem is not only whether the model understands the task.

The problem is whether the agent has a reliable way to perform the task.

That usually requires more than a prompt.

It requires operational knowledge:

  • steps
  • defaults
  • constraints
  • checks
  • outputs
  • failure handling

That is what skills are good at packaging.

Skills make agent work more auditable

One underrated benefit of terminal-native skills is auditability.

When an agent uses a repeatable workflow, it can leave evidence:

  • which files were created
  • which commands were run
  • which checks passed
  • where the output was saved
  • what still needs human review

That makes agent work easier to trust.

Not because the agent becomes magically perfect.

Because the workflow becomes visible.

For creative work, that matters.

A human should not have to guess whether the agent actually rendered the scene, exported the asset, or just stopped after writing a script.

The output should be inspectable.

The practical takeaway

If you are building with AI agents, do not only ask:

What tools can my agent access?

Ask:

What repeatable workflows can my agent follow?

Blender makes this obvious because the final result is concrete.

A good agent workflow should not end with “here is some code you could run.”

It should end with an artifact, a check, and a clear next step.

That is the shift Terminal Skills is designed around:

  • less one-off prompting
  • more reusable workflows
  • less hidden improvisation
  • more executable, verifiable work

Agents do not need to become artists.

But they can become much better production assistants.

And for tools like Blender, that is already a very useful place to start.

The bigger point

Blender is useful here because it makes the gap visible.

If the render file does not exist, the workflow failed.
If the camera misses the object, the workflow failed.
If the agent cannot explain what changed, the workflow is hard to trust.

That same lesson applies to other agent work too.

Terminal Skills is about turning repeatable work into reusable operational knowledge: not just what the agent should know, but how it should act, check itself, and report the result.

If you want to explore the catalog, Terminal Skills is open-source and available at terminalskills.io.

AI assistance was used while drafting this article. The final structure, edits, and publishing decisions are human-reviewed.

Effortless Dart Coding with dart_extensions_pro


Introduction
Introducing dart_extensions_pro a Dart package that offers a collection of handy extensions and helper functions designed to enhance the development process. By simplifying common tasks and providing streamlined solutions, it allows developers to write code more efficiently and focus on building features rather than repetitive tasks. Ideal for improving productivity, this package is a valuable tool for both novice and experienced programmers.

Key Features
📊 Comparison: Simplify comparison operations with intuitive extension methods.

📅** Date Handling:** Effortlessly manage date and time with a variety of helpful functions.

✍️ String Utilities: Enhance string manipulation with powerful utility functions.

📋 List Enhancements: Improve list handling with convenient extensions for common operations.

🧭 Navigation: Streamline navigation tasks with specialized navigation functions.

👆 Tap Gestures: Easily handle tap gestures to improve user interaction.

🔁 Iterable Enhancements: Optimize iterable processing with enhanced methods.

🎨 Color Conversion: Simplify color manipulations and conversions with dedicated functions.

🔢 Number Utilities: Access a range of number-related utilities for calculations and formatting.


🛠️ Utility Functions: Utilize various handy utility functions to simplify your coding experience.

Installation
Add dependency to your pubspec.yaml file & run Pub get

dependencies:
  dart_extensions_pro: ^0.0.1

And import package into your class file

import 'package:dart_extensions_pro/dart_extensions_pro.dart';
**Analytics**

Visit EXTENSIONS.md for a complete list of all the available extensions.

Extensions:                    271
Helper Classes:                7
Helper Functions & Getters:    21
Typedefs:                      7
Mixins:                        2

Here’s a quick preview of dart_extensions_pro,
String extension

'hello'.iscapitalize(); // Capitalizes first letter // Hello
'Copy this text'.copyTo(); // Copies string to clipboard
'test@example.com'.isValidEmail(); // Checks if valid email // true
'flutter'.reverse(); // Reverses string // rettulf
'madam'.isPalindrome(); // Checks for palindrome // true
'flutter example'.toCamelCase(); // Converts to camel case // FlutterExample
'{"name": "Flutter"}'.decodeJson(); // Parses JSON string to map // {name: Flutter}

Comparison extension


5.gt(3);  // true, checks if 5 is greater than 3
3.lt(5);  // true, checks if 3 is less than 5
5.eq(5);  // true, checks if 5 is equal to 5
3.lte(3); // true, checks if 3 is less than or equal to 3
5.gte(3); // true, checks if 5 is greater than or equal to 3
5.ne(3);  // true, checks if 5 is not equal to 3

Date extension

DateTime.now().isSameDate(DateTime(2023, 9, 14));  // true, checks if today matches the provided date
DateTime.now().isToday();  // true, checks if today is today
DateTime.now().isTomorrow();  // true, checks if today is tomorrow (unlikely)
DateTime.now().wasYesterday();  // true, checks if today is yesterday (false)
DateTime.now().addDays(5);  // adds 5 days to the current date
DateTime.now().addMonths(3);  // adds 3 months to the current date
DateTime.now().addYears(2);  // adds 2 years to the current date
DateTime.now().subtractDays(7);  // subtracts 7 days from the current date
DateTime.now().subtractMonths(1);  // subtracts 1 month from the current date
DateTime.now().subtractYears(1);  // subtracts 1 year from the current date

List extension

final list = [1, 2, 3] << 4;  // [1, 2, 3, 4], appends 4 to the list using the `<<` operator
list.replaceFirstWhere(10, (item) => item == 2);  // true, replaces the first occurrence of 2 with 10
list.replaceLastWhere(20, (item) => item > 1);  // true, replaces the last item greater than 1 with 20

Navigation extension

context.to(MyPage());  // Navigates to `MyPage` using `to()`
context.toNamed('/home');  // Navigates to the named route '/home' using `toNamed()`
context.back();  // Pops the current route using `back()`
context.backUntil((route) => route.isFirst);  // Pops routes until the first one using `backUntil()`
context.toWithReplace(AnotherPage());  // Replaces current route with `AnotherPage` using `toWithReplace()`
context.replaceWithNamed('/dashboard');  // Replaces the current route with named route '/dashboard' using `replaceWithNamed()`
context.toAndRemoveAll(HomePage(), (route) => false);  // Navigates to `HomePage` and removes all previous routes using `toAndRemoveAll()`
context.toNamedAndRemoveAll('/login', (route) => false);  // Navigates to named route '/login' and removes all previous routes using `toNamedAndRemoveAll()`

Gesture extension

widget.onInkTap(() => 'Tapped!'.logMsg());  // Adds an ink splash effect with `onInkTap()`
widget.onTap(() => 'Tapped!'.logMsg());  // Adds a basic tap gesture with `onTap()`
widget.onDoubleTap(() => 'Double Tapped!'.logMsg());  // Adds a double-tap gesture with `onDoubleTap()`
widget.onTapCancel(() => 'Tap Cancelled!'.logMsg());  // Adds a tap cancel gesture with `onTapCancel()`
widget.onLongPress(() => 'Long Pressed!'.logMsg());  // Adds a long press gesture with `onLongPress()`
widget.onTapDown((details) => 'Tap Down!'.logMsg());  // Adds a tap down gesture with `onTapDown()`
widget.onScale(
  onScaleStart: (details) => 'Scale Started!'.logMsg(),
  onScaleUpdate: (details) => 'Scaling!'.logMsg(),
  onScaleEnd: (details) => 'Scale Ended!'.logMsg(),
);  // Adds a scale gesture with `onScale()`

Iterable extension

iterable.lastElementIndex;  // Returns the index of the last element or -1 if empty.
iterable.hasSingleElement;  // Checks if the iterable has exactly one element.
iterable.addAllMatchingTo(targetList, (e) => e.isEven);  // Adds elements matching the predicate to the target list.
iterable.whereFilter((e) => e.isEven);  // Filters elements matching the predicate.
iterable.whereFilterIndexed((index, e) => index % 2 == 0);  // Filters elements with their index.
iterable.mapTransform((e) => e.toString());  // Transforms each element and maps to a new iterable.
iterable.skipElements(2);  // Skips the first 2 elements.
iterable.takeLastElements(2);  // Takes the last 2 elements.
iterable.skipWhileElements((e) => e < 5);  // Skips elements while the predicate is true.
iterable.skipLastElements(2);  // Skips the last 2 elements.

Color conversion

String.toColor();  // Converts a hex color string to a Color object, assuming full opacity.
HexColor.getColorFromHex(hexColor);  // Converts a hex color string to an integer color value, adding alpha if missing.
HexColor(hexColor);  // Creates a HexColor instance from a hex color string.

Number conversion


num.negative;  // Converts positive numbers to their negative counterparts.
num.isBetween(value1, value2, {inclusive = false});  // Checks if [this] is between [value1] and [value2], inclusive if [inclusive] is true.
num.roundToDecimals(decimalPlaces);  // Rounds the number to [decimalPlaces] decimal places.
double.asRadians;  // Converts degrees to radians.
double.asDegrees;  // Converts radians to degrees.
T.maxim(upperBound, {exclusive = false});  // Limits the value to [upperBound], exclusive if [exclusive] is true.
T.minm(lowerBound, {exclusive = false});  // Ensures the value is not less than [lowerBound], exclusive if [exclusive] is true.
T.clampAtMin(lowerBound);  // Ensures the value is not below [lowerBound].
T.clampAtMax(upperBound);  // Ensures the value does not exceed [upperBound].
num.orZero;  // Returns this value or 0 if null.
num.orOne;  // Returns this value or 1 if null.
num.or(value);  // Returns this value or [value] if null.

Utility conversion

double.isWhole;  // Checks if the value is a whole number.
double.roundToPrecision(nthPosition);  // Rounds the value to [precision] decimal places.
bool.isCloseTo(other, {precision = 1.0e-8});  // Checks if the value is close to [other] within [precision].
double.randomDouble({max});  // Generates a random double between 0.0 (inclusive) and 1.0 (exclusive).
int Duration.inYears;  // Returns the number of whole years spanned by this [Duration].
bool Duration.isInYears;  // Returns `true` if the [Duration] is equal to or longer than one year.
int Duration.absoluteSeconds;  // Returns the number of seconds remaining after accounting for whole minutes.
void Map<K, V>.operator <<(MapEntry<K, V> entry);  // Inserts a [MapEntry] into the map using the `<<` operator.
String Map<K, V>.toJson();  // Converts the map into a JSON string.

For more information, check out the below link

dart_extensions_pro | Flutter package

A Dart package that provides handy extensions and helper functions, designed to simplify and speed up development, making coding more efficient and streamlined.

favicon
pub.dev


Thanks for reading! If this post was helpful, feel free to share it and follow for more Flutter tips and tutorials. Keep coding and stay awesome!

Spring AI Explained — ChatClient, RAG, Advisors, and Every Core Component

Most Spring AI tutorials jump straight to code. You copy the dependency, paste the config, call ChatClient, and something works. But when you need to actually build something — a chatbot that remembers conversations, an API that answers questions from your own documents — you hit a wall. Because you don’t know what’s actually doing what. Friend’s Link

What Spring AI actually is — in one sentence

Spring AI is an abstraction layer that lets you wire LLMs into your Spring Boot app without hardcoding any particular AI provider.

That last part matters. OpenAI, Google Gemini, Anthropic Claude, and Ollama are running locally on your machine — Spring AI talks to all of them through the same API. Swap providers without touching your business logic. That’s the entire value proposition, and everything else is built on top of it.

Spring AI Components

ChatClient — the front door
ChatClient is the component you’ll interact with the most. It’s the fluent API that sits at the top of the stack and handles the actual request-response cycle with the LLM.

Think of it like a RestTemplate or a WebClient— but instead of calling a REST endpoint, you’re sending a prompt and getting a response back. It handles all the low-level connection details, request formatting, and response parsing so you don’t have to.

What makes ChatClient genuinely well-designed is its fluent builder style. You don’t configure it once globally and hope for the best. Each call is composable — you can set the system prompt, attach advisors, pass user input, and control the output format all in one readable chain.

It also separates two things that often get conflated: the default configuration you set at startup (your system prompt, default advisors, model parameters) and the per-request configuration you apply at call time. That separation matters in production, where different endpoints need different behaviours from the same underlying client.

PromptTemplate — how you talk to the LLM properly

A raw string shoved into an LLM is not a prompt. A prompt is a structured piece of text with placeholders, context, and instructions — and this PromptTemplate is how Spring AI handles that.

The idea is simple: you define a template with variables, and at runtime, you fill those variables in. Instead of building prompt strings with Java string concatenation — which gets messy fast — you define the shape of the prompt separately from the data that goes into it.

This matters for three reasons. First, it keeps prompts readable and maintainable. Second, it separates the “what to ask” from the “what data to inject” which is the same separation concerns you apply everywhere else in your codebase. Third, it makes prompt versioning possible. When your prompt needs tweaking, you’re editing a template, not hunting through business logic.

PromptTemplate also gives you a proper Prompt object that carries both the human message and the system message. That distinction — system prompt (the instructions) vs user prompt (the question) — is one of the most important things to understand when working with LLMs, and Spring AI models it explicitly.

EmbeddingModel — the piece that makes search smart

An EmbeddingModel takes text and converts it into a vector — a list of floating point numbers that represents the meaning of that text in multi-dimensional space.

That sounds abstract. Here’s the concrete thing to grasp: two pieces of text that mean similar things will produce vectors that are close to each other mathematically. “What’s your return policy?” and “How do I get a refund?” are different strings, but their vectors will be very close — because semantically, they’re the same question.

This is what makes semantic search possible. Traditional search matches keywords. Embedding-based search matches meaning. A user asking “how do I cancel my order” will find a document titled “Order cancellation policy” even if the words don’t overlap, because the meanings are geometrically close in vector space.

In Spring AI, EmbeddingModel is the interface that abstracts over whatever embedding service you’re using — OpenAI’s text-embedding-ada-002, Gemini’s embedding API, or a local model via Ollama. The abstraction is consistent regardless of provider, which means your RAG pipeline doesn’t break if you switch models.

VectorStore — where embeddings live

VectorStore is the database for embeddings. You put vectors in, and you query them by similarity — “give me the top 5 stored vectors that are closest to this query vector.”

It’s worth understanding that this is a fundamentally different kind of database from what you’re used to. You don’t query it with SQL. You don’t look things up by ID. You ask: which stored content is most semantically similar to this input? And it returns the matches ranked by similarity score.

Spring AI’s VectorStore interface abstracts over the actual storage engine underneath. In development, you might use SimpleVectorStore an in-memory implementation. In production, you’d swap to Pinecone, Weaviate, pgvector on top of Postgres, or Elasticsearch. The interface stays identical. Your code doesn’t change.

The VectorStore is also responsible for handling the metadata that travels alongside each vector — the document title, page number, source URL, whatever you stored at ingestion time. When it returns matching chunks, that metadata comes with it, so your prompt builder knows where the information came from.

Advisors — the middleware nobody talks about enough

This is the component most tutorials skip, and it’s arguably the most powerful part of the whole framework.

An Advisor in Spring AI is a piece of middleware that wraps around every ChatClient request. Before the request goes to the LLM, advisors can intercept it and modify it — add context, inject memory, apply safety rules, log the conversation, filter the input. After the response comes back, they can post-process it too.

The important thing to understand is that advisors form a chain. Each one wraps the next, like servlet filters in a web application. You configure which advisors run in which order, and each one has a defined responsibility.

QuestionAnswerAdvisor is the one you’ll use for RAG. Before your question reaches the LLM, this advisor takes that question, queries VectorStore for the most relevant chunks, and injects them into the prompt automatically. From ChatClient’s perspective, you just asked a question. Internally, your question has been enriched with your own data before the LLM ever sees it.

MessageChatMemoryAdvisor is what makes conversations persistent. Without it, every call to ChatClient starts fresh — no memory of what was said before. With it, previous turns from ChatMemory are injected into each new request so the LLM has context.

You can write your own advisors too. Any cross-cutting concern that applies to every LLM call — rate limiting, PII detection, response caching, A/B testing between prompts — belongs in an advisor, not in your business logic.

ChatMemory — giving the LLM a memory

LLMs are stateless. Every API call is completely independent. Ask an LLM “what’s the capital of France,” then ask “what did I just ask you,” and it has no idea — because, from its perspective, that second request is the first thing you’ve ever said.

ChatMemory is how Spring AI solves this. It’s a storage abstraction for conversation history. After each exchange, the message — both the user’s question and the LLM’s response — gets saved. On the next request, that history gets loaded and injected into the prompt so the LLM has context.

InMemoryChatMemory is the default — history lives in your application’s heap and disappears on restart. That’s fine for development and short stateless sessions. For production chatbots that need to remember users across sessions, you’d implement a persistent ChatMemory backed by Redis or a database.

There’s a real constraint here worth knowing upfront: every message you inject into the conversation history costs tokens. LLMs have a context window limit — usually somewhere between 8K and 128K tokens, depending on the model. If a conversation goes on long enough, the accumulated history will either exceed the limit and fail, or you’ll need to implement a summarisation strategy to compress older messages.

This is not a Spring AI problem — it’s a fundamental LLM constraint. But ChatMemory is where you manage it.

RAG Flow

How RAG brings it all together
RAG — Retrieval-Augmented Generation — is the pattern that makes Spring AI genuinely production-useful. The diagram above shows both phases. Here’s the thinking behind it.

The core problem: your LLM knows nothing about your company. It doesn’t know your product documentation, your internal policies, your customer data. Fine-tuning a model on your data is expensive, slow, and goes stale every time the data changes.

RAG is the pragmatic answer. Instead of teaching the model your data, you just hand it the relevant pages at the moment it needs them. Like giving a contractor a specific clause from the contract rather than asking them to memorise the whole thing.

The ingestion phase runs once, or whenever your data changes. Your documents are loaded, split into manageable chunks, embedded into vectors, and stored in a VectorStore. This is how your data gets indexed for semantic retrieval.

The query phase runs on every request. The user’s question is embedded into a vector. That vector is used to query the VectorStore for the closest matching chunks. Those chunks — plus the original question — get injected into the prompt. The LLM reads them as context and answers based on what it finds there.

The LLM never “learned” your data. It reads it fresh on each request, like an open-book exam. That framing matters because it sets the right expectations: if the relevant information isn’t in the retrieved chunks, the model will still try to answer — and that’s when hallucinations happen. RAG reduces hallucinations by providing grounding. It doesn’t eliminate them.

The part that controls retrieval quality isn’t the LLM and isn’t the vector database — it’s the chunking strategy. How you split your documents determines what gets retrieved. A chunk that’s too large buries the relevant detail in noise. A chunk too small loses the surrounding context that makes it meaningful. Getting chunking right is usually where the real tuning work happens.

The one-line mental model for each component

ChatClient — you talk to the LLM through this. PromptTemplate — You structure what you say. EmbeddingModel — converts meaning into math. VectorStore — stores and searches that math. Advisors — middleware that enriches every request automatically. ChatMemory — gives the conversation a past. Together, they’re the full stack for building LLM features that actually behave like software — predictable, configurable, and debuggable.

Building a UI for a Team Task Manager (Laravel + Tailwind)

“I’ve been working on a project, a simple team task management system built with Laravel. After finishing the backend features like teams, invitations, and authentication, I finally shifted my focus to something I was avoiding for quite a while now: the UI. I also added a name for this project instead of just calling it a team task manager. The website name is Worklio.

Here’s the home page of the website:
Welcome page

I’m keeping the UI as simple as possible because my main focus is learning the backend side. I also use ChatGPT to help generate some of the frontend code, and the results have been surprisingly good. Right now, I just need to break parts of the UI into reusable components to make the codebase easier to read and maintain, while also centralizing repeated logic and code.

Here’s the rest of the pages:

Team Page

Team page

Show Single Team Page with Projects

Project Page

What’s next

I currently have both frontend and backend for teams, invitations, projects, and now I’ll be working on the task part.

Note: I’m using ChatGPT as part of my learning process—to sanity-check ideas, understand Laravel patterns, and think through tradeoffs. I still write the code myself and use the project to test whether I actually understand what I’m building.

Hermes Agent Remembers You

For the past two years, the AI industry has obsessed over model intelligence.
Bigger context windows.
Smarter benchmarks.
More parameters.
Faster inference.
But most AI assistants still suffer from the same fatal flaw:
They forget everything.
Every session starts from zero.
Every workflow requires re-explaining context.
Every “AI agent” often behaves like a temporary script wearing a chatbot costume.
Then Hermes Agent arrived.
Built by Nous Research, Hermes Agent is not trying to be another copilot or another flashy autonomous demo. It is attempting something much more ambitious:
An AI system that evolves through use.
And that changes the conversation entirely.
What Is Hermes Agent?
Hermes Agent is an open-source autonomous AI agent framework designed around one central idea:
Persistence.
Not just persistent memory.
Persistent skills.
Persistent workflows.
Persistent identity.
Unlike traditional chat-based assistants, Hermes runs as a long-lived system that can continuously operate across platforms, tools, terminals, APIs, and messaging apps.
The official tagline says it best:
“The agent that grows with you.”
That sounds like marketing copy at first.
Until you understand how Hermes actually works.
The Core Breakthrough: AI That Learns Operationally
Most AI systems today are stateless.
Even when they simulate memory, the “memory” is usually just:
conversation history,
vector retrieval,
or manually injected context.
Hermes goes further.
After solving tasks, Hermes creates reusable “skills” from successful execution traces. Those skills become searchable operational knowledge the agent can reuse later.
This is the real innovation.
Hermes does not merely answer.
It accumulates experience.
That distinction matters more than most people realize.
Why Hermes Agent Feels Different
The easiest way to understand Hermes is this:
Chatbots respond.
Copilots assist.
Hermes persists.
That persistence creates entirely new behavior patterns.
A normal AI assistant:
solves a task,
forgets it,
and starts over next time.
Hermes:
solves a task,
stores successful workflows,
refines them,
and reuses them later.
Over time, your agent slowly becomes specialized around:
your workflows,
your preferences,
your infrastructure,
and your recurring problems.
That is much closer to hiring a junior operator than opening a chatbot.
The Three-File Architecture That Makes Hermes Unique
One of the most fascinating design decisions inside Hermes is its identity system.
According to community documentation and framework breakdowns, Hermes organizes persistent behavior into three evolving files:
SOUL.md → personality, principles, behavioral constants
MEMORY.md → accumulated factual knowledge
USER.md → evolving understanding of the user
This is incredibly important conceptually.
Most AI systems merge everything into one giant context blob.
Hermes separates:
identity,
memory,
and user modeling.
That separation mirrors how humans actually operate.
You are not the same as your memories.
And your memories are not the same as your understanding of another person.
Hermes encodes that distinction directly into the architecture.
That is not just clever engineering.
It is a glimpse into where agent design is heading.
Hermes vs Traditional Agent Frameworks
The current AI agent ecosystem is crowded:
LangChain
AutoGen
OpenClaw
CrewAI
OpenAI Agents SDK
countless orchestration layers
Most frameworks optimize for:
tool calling,
chaining,
orchestration,
or multi-agent coordination.
Hermes optimizes for continuity.
That is a fundamentally different design philosophy.
Framework Type Main Focus
LangChain Orchestration
AutoGen Multi-agent collaboration
OpenAI Agents API-level workflows
OpenClaw Autonomous execution
Hermes Agent Persistent self-improving operation
Hermes is less interested in “agent demos.”
It is trying to become infrastructure.
The Most Underrated Feature: Multi-Platform Presence
Hermes can operate across:
Telegram,
Discord,
Slack,
WhatsApp,
Signal,
email,
terminal interfaces,
IDE integrations,
and more.
At first glance, this sounds like a convenience feature.
It is not.
This transforms Hermes from a tool into an ambient computing layer.
Imagine:
asking your agent something from Telegram,
continuing the task in VS Code,
receiving summaries through Slack,
and letting background automations continue overnight.
The agent persists independently from the interface.
That architecture feels much closer to operating systems than applications.
Local-First AI Finally Becomes Real
One reason Hermes exploded in popularity is because it aligns perfectly with a growing movement in AI:
AI sovereignty.
Developers increasingly want:
local models,
self-hosted infrastructure,
private memory,
ownership of workflows,
and freedom from API lock-in.
Hermes supports multiple providers and local inference backends, including OpenAI-compatible APIs, Hugging Face integrations, Anthropic, Google, OpenRouter, and local stacks like LM Studio.
It can run:
on a laptop,
on a cheap VPS,
or on GPU infrastructure.
That flexibility matters.
For years, powerful AI systems required centralized cloud dependency.
Hermes suggests another future:
personal AI infrastructure.
The Real Shift: From Prompt Engineering to Agent Evolution
Prompt engineering dominated the first wave of generative AI.
But Hermes points toward something bigger:
Experience engineering.
The value is no longer just crafting prompts.
The value becomes:
shaping long-term agent behavior,
building reusable operational knowledge,
and evolving persistent systems over time.
This is a massive conceptual shift.
Instead of:
“How do I prompt the model?”
The question becomes:
“How do I train my operational agent ecosystem through use?”
That is a much more interesting future.
The Biggest Weaknesses of Hermes Agent
Hermes is exciting.
But it is not magic.
There are still major limitations.

  1. Complexity
    Hermes is not beginner-friendly.
    Running persistent self-hosted agents requires:
    infrastructure knowledge,
    API management,
    model selection,
    memory management,
    and operational discipline.
    This is still very much a builder’s tool.
  2. Long-Running Drift
    Persistent agents introduce a new category of problems:
    memory pollution,
    behavioral drift,
    recursive errors,
    and degraded context quality over time.
    An agent that remembers incorrectly can become dangerous faster than one that forgets.
  3. Autonomous Reliability Is Still Hard
    Even advanced agents still struggle with:
    long task chains,
    edge cases,
    hallucinated tool use,
    and execution reliability.
    Hermes improves the structure around the model.
    It does not magically solve reasoning limitations.
    Why Developers Are Paying Attention
    Hermes Agent grew extraordinarily fast because it landed at the exact right moment.
    The industry is moving from:
    isolated prompts
    toward:
    persistent autonomous systems.
    From:
    AI chat
    toward:
    AI operations.
    From:
    asking questions
    toward:
    delegating workflows.
    Hermes is one of the clearest early examples of what that transition looks like in practice.
    My Take: Hermes Agent Is More Important Than Most People Realize
    The biggest idea behind Hermes is not tool use.
    It is not automation.
    It is not memory.
    The biggest idea is this:
    AI systems are starting to accumulate operational experience.
    That changes everything.
    Because once agents can:
    remember,
    refine,
    specialize,
    and evolve through execution,
    they stop behaving like software in the traditional sense.
    They begin behaving more like digital coworkers.
    We are still early.
    The systems are imperfect.
    The reliability problems are real.
    But Hermes Agent feels like one of the first open-source projects pointing clearly toward the next era of AI:
    Not isolated intelligence.
    Persistent intelligence.

Introducing Agent Note: saving the why behind AI-assisted code in Git

Hi, I’m wasabeef.

I have been using coding agents such as Claude Code, Codex CLI, Cursor, and Gemini CLI regularly in daily development.

They no longer feel like experiments. They can already produce reviewable Pull Requests. But while reviewing AI-assisted changes, I kept running into the same problem.

A diff tells you what changed. It does not tell you why it changed.

That is already a problem with human-written commits when the commit message is weak. With AI-assisted commits, the missing context is even larger: the prompt, the response, the discussion that led to the implementation, the agent that touched each file, and the reason a particular path was chosen.

That is why I built Agent Note.

Agent Note — AI conversations saved to Git

This article focuses less on the exact usage and more on why this kind of record is needed, and how Agent Note keeps that context in Git.

What is missing in AI-era code review

AI coding agents have become common in everyday development.

They write code quickly. They add tests. They update documentation. They can even open Pull Requests.

But review exposes a different problem.

The final diff does not show the background of the implementation.

  • What request started the change?
  • What assumptions did the AI make?
  • Did the direction change halfway through?
  • Is this a generated bundle, or source code someone intentionally edited?
  • Which commits were mostly AI-assisted, and which were mostly human follow-up?

In human-to-human development, commit messages, Pull Request descriptions, and review comments have carried that context.

In AI-assisted development, prompts and responses also belong in the review context. Without them, reviewers lose the trail before review even starts.

Until now, the conversation with the AI often stayed inside the agent UI or a local transcript. Once the session ended, the team usually received only the commit and the Pull Request.

The reason behind the change disappears.

AI review tools need context too

I also use AI review tools such as Copilot, CodeRabbit, Devin, and Greptile.

Their main inputs are usually the diff and the repository code.

That means AI can review AI-written code without seeing the prompt or intent that produced it.

When that happens, the review tends to stay near the surface of the diff.

To judge whether an implementation matches the intended change, a reviewer needs more than the final code. The reviewer needs to know what the author asked for, what the agent understood, and which parts of the repository were supposed to change.

Agent Note keeps that context in the Pull Request in a form AI review tools can read.

It renders a human-readable summary in the Pull Request body, and also embeds an agentnote-reviewer-context hidden comment. It is invisible in the rendered PR body, but AI review tools that read the raw Pull Request description can use it to understand changed areas, review focus, and author intent.

The reviewer gets more than the diff.

Today

git diff
Pull Request description

Prompt?       missing
Response?     missing
Why this way? reviewers have to infer it

With Agent Note

git diff
Pull Request description
refs/notes/agentnote
Dashboard

Prompt / Response / Context / AI Ratio stay connected to the commit

What gets recorded

Agent Note saves the AI conversation and changed files for each commit.

Think of it as git log with the AI conversation behind the change attached to it.

It records four kinds of information.

Data What it helps you see
Prompt / Response What was requested and how the AI answered
Files Which files the agent touched
AI Ratio A practical estimate of how much of the commit involved AI
Context Extra context when the prompt alone is too short

For example, a prompt like yes, implement it does not carry enough meaning when it appears alone in a Pull Request.

Agent Note does not try to inflate that prompt. Instead, when the surrounding commit evidence helps, it can attach a short Context note.

Context shown in the Agent Note Dashboard

The point is not to say “this code is correct because AI wrote it” or “this code is risky because AI wrote it.”

The point is to give reviewers better evidence.

How it works

Agent Note is not a hosted service.

It adds a thin recording layer next to the normal Git workflow.

You prompt your coding agent
        │
        ▼
Agent hooks save the conversation and session info
        │
        ▼
The agent edits files
        │
        ▼
Hooks or local transcripts record changed files
        │
        ▼
You run `git commit`
        │
        ▼
A Git hook links the session to the commit
        │
        ▼
Agent Note writes a Git note for that commit
        │
        ▼
Agent Note's pre-push hook shares `refs/notes/agentnote`

Temporary session data lives under .git/agentnote/.

The permanent record lives in refs/notes/agentnote.

Agent Note does not modify the commit diff. It adds only a short session trailer to the commit message and stores the detailed record in Git notes. When you need the AI context behind a commit, you read the Git note.

Why Git notes

The design constraint I cared about most was avoiding unnecessary workflow changes.

I did not want to replace git commit, and I did not want the core record to depend on a hosted service.

The context behind AI-assisted code should be a team asset, just like the commit itself. Keeping that context in Git felt natural.

Git notes let Agent Note attach structured data to a commit without changing the regular commit history.

That balance felt right.

  • Use normal git log and Pull Requests most of the time
  • Read Agent Note data only when you need the deeper context
  • Share it with the team through refs/notes/agentnote
  • Avoid requiring a hosted service

The design keeps AI development context close to Git instead of sending it somewhere else.

What Agent Note does not do

Agent Note is not a tool for proving that AI-written code is correct.

AI Ratio is not an automatic judgment of responsibility or quality. It is a practical signal for understanding how much AI involvement a commit appears to have.

Agent Note also does not claim perfect line-to-prompt attribution today. agent-note why is a shortcut from a line, to the blamed commit, to the prompts, responses, and context attached to that commit.

The goal is not to replace review. The goal is to keep the context reviewers need from disappearing.

How it fits with Spec-Driven Development

Spec-Driven Development makes the intent explicit before implementation.

That works well with AI coding agents. If the input is vague, the agent may still produce code quickly, but reviewers later have to guess why the implementation took that shape.

A spec alone does not preserve the implementation conversation. It does not show how the agent interpreted the task, what changed during the session, or which prompts ended up in each commit.

If the spec is the intent before implementation, Agent Note is the execution record after implementation.

Together, they let reviewers compare the implementation against the spec, and also inspect the AI conversation that produced the commit.

How it relates to Entire

Agent Note is not the only project working on this problem.

Entire also connects the context behind AI-assisted code changes to Git. Entire records prompts, transcripts, tool calls, changed files, and other session data as Checkpoints linked to commits. It is a broader system for agent development history, including rewind, resume, search, and a web UI.

Agent Note is intentionally narrower.

It focuses on commits and Pull Request review. The persistent record lives in Git notes under refs/notes/agentnote, and the main surfaces are the PR Report, Dashboard, hidden reviewer context for AI review tools, and agent-note why.

I do not see this as a matter of which approach is correct. The scope is different.

If you want full session Checkpoints, rewind, resume, and repository-wide search, a system like Entire makes sense. If you mainly want lightweight commit-level review context in Pull Requests, Agent Note is designed for that narrower workflow.

PR Report and Dashboard

In Pull Requests, Agent Note renders a human-readable summary.

## 🧑💬🤖 Agent Note

**Total AI Ratio:** ██████░░ 73%
**Model:** `claude-sonnet-4-20250514`

| Commit | AI Ratio | Prompts | Files |
|---|---|---|---|
| ce941f7 feat: add auth | ████░ 73% | 2 | auth.ts, token.ts |

Open Dashboard ↗

The PR Report is the entry point for review.

The Dashboard is for deeper reading.

In the Dashboard, you can inspect Prompt / Response, changed files, AI Ratio, and diffs by PR and by commit.

Agent Note Dashboard preview

The report answers “what should I look at first?” The Dashboard answers “what happened in this commit?”

The idea behind agent-note why

Agent Note also includes agent-note why.

It starts from a target line, uses git blame to find the commit, then reads the Agent Note attached to that commit.

npx agent-note why README.md:111

It does not claim exact line-to-prompt attribution yet.

But even without a new schema, connecting an individual line to the commit conversation is useful. It shortens the path from “why is this line here?” to “what did we ask the agent to do in that commit?”

Eventually, I want to get closer to line-level explanations. The MVP is intentionally smaller: connect existing Git blame data with existing Git note data and make the available context easy to reach.

Different agents expose different context

Agent Note supports multiple coding agents, but each agent exposes a different level of detail.

That is because every agent exposes hooks and transcripts differently.

Claude Code provides the richest signal today. Codex CLI, Cursor, and Gemini CLI are also supported, but Agent Note records only the prompt, response, changed files, and AI Ratio evidence that each agent can expose reliably.

I also do not want to overstate the evidence.

If Agent Note cannot know something reliably, it does not pretend to know it. AI Ratio is an estimate, not proof.

The latest support matrix is available in Agent Support.

Things to keep in mind

Agent Note records conversations with AI for the team.

That record should be handled carefully.

  • Do not put secrets in prompts or responses
  • When Git notes are pushed, the team can read the saved conversation
  • AI Ratio is an estimate, not an automatic judgment of quality or responsibility
  • Different agents expose different levels of detail
  • Gemini CLI support is still Preview

Agent Note is closer to review context than to an audit verdict.

Closing

The more we use AI coding agents, the less a diff alone is enough for code review.

Human commits have commit messages and Pull Request discussions. AI-assisted commits should also preserve prompts, responses, context, and AI Ratio.

Agent Note is an open source, Git-native way to do that.

  • GitHub: https://github.com/wasabeef/AgentNote
  • Documentation: https://wasabeef.github.io/AgentNote/
  • npm: https://www.npmjs.com/package/agent-note

If you want AI-assisted code to remain understandable after the session is over, please give Agent Note a try.

Building a Dating App with No Backend: How I used Rust, Tauri 2.0, and P2P Mesh Networking to Fight the Loneliness Pandemic

The Problem: The “Skinner Box” of Modern Dating

Traditional dating apps have a fundamental conflict of interest: if you find a partner, they lose a customer. Their algorithms are designed to keep you swiping, not to help you meet. Plus, your most intimate preferences and behavioral patterns are stored in a centralized database, ripe for data mining.

I decided to build Aura: a dating app that is a privacy-first utility, not a business.

The Architecture: Local-First and Serverless

Aura has no central server. No “matchmaking” algorithm running in the cloud. No user database.

Here’s how it works:

  1. Encrypted Local Storage: Everything—your swiping history, chats, and profile—lives in an encrypted SQLCipher database on your device, managed by a native Rust backend.
  2. P2P Discovery: Instead of a cloud API, Aura uses libp2p to scan for nearby “Resonances.” Devices act as nodes in a gossip mesh, propagating encrypted discovery packets.
  3. Store-Carry-Forward: The network is “living.” Your phone “carries” encrypted profiles through physical movement, gossiping them to other peers as you move through different areas.

The Tech Stack: Why I switched to Tauri 2.0 + Rust

I recently migrated Aura from React Native to Tauri 2.0. This was a game-changer:

  • Rust for the Heavy Lifting: P2P networking, encrypted storage, and a local preference optimizer (a tiny ML model that learns your interests and optimizes suggestions) are all in Rust. It’s fast, memory-safe, and handles background tasks beautifully.
  • Vite + React for the UI: I can build a premium, high-performance UI with standard web tech, leveraging glassmorphism and modern animations without a heavy framework overhead.
  • Atomic IPC: Tauri’s bridge allows the frontend to talk to the secure Rust core with minimal latency.

Solving Trust: The Relational Reputation Mesh

One of the biggest challenges in a serverless app is safety. How do you trust someone without a central moderator?

Aura uses a Decentralized Reputation Mesh. Your “Aura Score” isn’t a global number; it’s a Relational Valence. Your score is calculated locally on your device based on specific gossip you’ve received. We even implemented Asymmetric Time Decay: negative signals decay 4x faster than positive ones to allow for “redemption arcs” while rewarding long-term positive behavior.

What’s Next?

Aura is fully open-source (AGPL v3). I’ve just finished the F-Droid metadata recipe and am validating the build simulation to get it published.

I’m looking for contributors! If you’re into P2P protocols, local-first architectures, or just want to build tech that actually helps people connect in the real world, check out the repo:

👉 https://github.com/bensiv/aura

What do you think about the future of local-first social apps? Let’s discuss in the comments!

API Versioning: Current Approaches and Choices in the Ecosystem

The moment you launch an API, one of the biggest nightmares associated with it is versioning decisions. Especially when you consider that this API will be used by different clients or is expected to have a long lifespan, things become even more complex. Looking back at the trouble I got myself into by underestimating this topic in the past, my pragmatic approach today is rooted in those painful experiences.

API versioning is one of the key ways an application can evolve over time while ensuring that existing clients are not disrupted. In a production ERP system I worked on, the painful lesson of 20 different operator screens suddenly throwing errors due to a minor change in a JSON field name taught me how critical this issue is. In this post, I will discuss API versioning approaches, the experiences I’ve gained in my projects, and the trade-offs these choices bring.

Why is API Versioning Necessary? My First Wrong Choices

When developing an API, everything initially goes through a single version, and it’s easy. However, over time, business requirements change, new features are added, and old features are updated or removed. These changes directly affect the applications consuming the API (mobile apps, web frontends, other services) and are called breaking changes. A breaking change causes an application using the API to stop working without code modifications.

In my early projects, I overlooked this. While developing the backend for a financial calculator side project, I didn’t implement versioning with the thought, “I’m the only one using it, so it’s not necessary.” A few months later, while adding a new feature, I had to change the data format my existing mobile app received from the backend. The result: I had to deploy both the backend and the mobile app simultaneously and wait for users to update their applications. This was unacceptable for a system with zero downtime expectations.

⚠️ Proven by Experience: Versioning is Essential

Versioning is critical not only when your API is exposed externally but also in internal inter-service communication. If one service uses another service’s API, a breaking change can cause the other service to break, leading to hours of downtime in the production environment.

These early mistakes taught me that versioning is not just a “technical detail” but also a matter of product strategy and operational flexibility. It is essential to define a versioning strategy from the outset to extend the life of your API and avoid inconveniencing your clients. Otherwise, we face a significant regression risk with every change.

Main API Versioning Approaches: Pros and Cons

There are three main approaches at the core of API versioning: URL Path, Query Parameter, and Header (Content Negotiation). Each has its own advantages and disadvantages, and choosing the right one depends on the project’s needs. I’ve also experienced different approaches in various projects and seen when each works and when it causes headaches.

Approach Pros Cons When I Preferred It
URL Path – Simple, understandable, discoverable – Can lead to URL bloat – Public APIs, simple projects (my own side projects)
Query Parameter – Keeps URLs relatively clean – Open to misuse, low discoverability – Rarely, usually as a “fallback”
Header (Custom) – Keeps URLs cleanest – Low discoverability, requires client support – Internal APIs, places with strict control (ERP)
Header (Accept) – Compliant with HTTP, Content Negotiation – Client implementation more complex – When different output formats are required

As a general rule, for external and public APIs, I’ve usually preferred the URL Path method because its simplicity and understandability provide the least friction for external developers. For internal APIs, I can opt for Header-based approaches that are more flexible and keep URLs clean.

URL Path Versioning: Direct and Understandable

URL Path versioning is perhaps the most common and understandable method. You include the API’s version number directly in the URL path, for example, /v1/users or /api/v2/products. This method can be easily tested even in a browser, and its documentation is straightforward.

When developing a production ERP system, I used this method for a few integration APIs exposed externally. Our clients could clearly see which version they were calling from the URL. However, when I kept two different versions live simultaneously, I started seeing if version == "v1" blocks in the backend code, which was not a pleasant situation in terms of code quality. We can make this situation a bit more manageable with a reverse proxy like Nginx.

# Nginx config example
server {
    listen 80;
    server_name api.example.com;

    location ~ ^/(v1|v2)/ {
        # Routing for v1 or v2 paths
        proxy_pass http://backend_api_cluster/$request_uri;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /v1/ {
        # Specific backend or logic for v1
        proxy_pass http://v1_backend_cluster;
        proxy_set_header Host $host;
    }

    location /v2/ {
        # Specific backend or logic for v2
        proxy_pass http://v2_backend_cluster;
        proxy_set_header Host $host;
    }
}

In the Nginx example above, you can route different versions to different backends. This gives me the flexibility to deploy different versions independently and allows me to develop new versions independently while supporting older versions. However, this approach leads to longer URLs and the need to define a new path for each new version. It’s inevitable that URLs will “bloat” as they go from v3, v4, and so on.

Header and Content Negotiation Versioning: More Flexible Methods

Header-based versioning specifies the API version via HTTP Request Headers. This keeps URLs clean and ensures that the API’s base resource URI does not change. For example, we can use a custom header like X-API-Version: 2.

Versioning using Content Negotiation is a more standard HTTP approach. By using the Accept header, the client indicates that it is requesting a specific media type or API version. For instance, with a value like Accept: application/vnd.myapi.v2+json, the client is saying “I want the 2nd version of myapi in JSON format.” I used this approach in a client project where the same endpoint needed to work with different data models.

# FastAPI example: Versioning with Accept header
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/items")
async def get_items(request: Request):
    accept_header = request.headers.get("Accept")

    if "application/vnd.myapi.v1+json" in accept_header:
        # Return data according to v1 model
        return JSONResponse({"version": "v1", "data": [{"id": 1, "name": "Item A"}]})
    elif "application/vnd.myapi.v2+json" in accept_header:
        # Return data according to v2 model (e.g., with an additional field)
        return JSONResponse({"version": "v2", "data": [{"uuid": "abc", "item_name": "Item A", "price": 100}]})
    else:
        raise HTTPException(status_code=406, detail="Unsupported Accept header")

This approach has been useful, especially in internal service-to-service communication or with tightly controlled clients like mobile applications. When developers correctly set the Accept header, it was clear which version of the API they were using. However, it is less discoverable than URL Path; for someone using the API for the first time, setting the header correctly requires reading a bit more documentation. Also, some client libraries or tools may not automatically support such custom headers or Content Negotiation, which means additional development costs.

My Versioning Strategy Choices in Projects

The choice of API versioning strategy varies depending on the nature of the project and its target audience. In my experience, there is no “single right way”; there are always trade-offs. The decisions I’ve made in my own side projects or in internal platforms of banks I’ve consulted for differ from those made in a production ERP system where I gained experience.

For example, for the public API of my financial calculator side project, I preferred URL Path versioning. The reason was simple: to enable external developers to use and integrate with the API easily. A simple URL like /v1/calculate speeds up the onboarding process, compared to a complex header structure. Currently, I maintain both v1 and v2 versions live and ensure that older clients using v1 continue to work without issues. However, this comes with the burden of maintaining the code for both versions in the backend.

On the other hand, for services I developed for a bank’s internal platform, I versioned the APIs via the Accept header. On this platform, since the clients were typically enterprise applications, developers had a higher ability to manage HTTP headers correctly. Additionally, keeping the URLs clean provided a more organized view for the bank’s security and audit teams. In this choice, the idea that “keeping URLs clean and unchanging simplifies firewall rules and monitoring configurations” was influential. Especially when a service has more than 20 APIs, going with /v1/users, /v2/users, /v3/users can be a more elegant solution than Accept: application/vnd.bank.users.v3+json.

ℹ️ Which Method When?

When making a choice, consider your target audience and the lifespan of your API. Simplicity (URL Path) for public APIs and flexibility (Header) for closed systems or custom integrations generally yield better results. Remember, the ultimate goal of API versioning is to be able to evolve your API without breaking your clients.

In these choices, I always asked myself, “How much flexibility will I need in the future?” and “What will be the operational cost of this approach?” Sometimes I accepted technical debt for the sake of simplicity, and sometimes I opted for a slightly more complex start for long-term sustainability. The important thing is to be aware of these trade-offs and make informed decisions.

Zero Downtime and Deprecation Management During Version Transitions

One of the most challenging parts of API versioning is ensuring a smooth transition from older versions to newer ones. When working with the goal of “zero downtime,” both old and new versions need to be live simultaneously for a period. This is usually managed through a process called “graceful deprecation.”

During the transition from v1 API to v2 in an ERP system for a manufacturing firm, I observed that the old operator screens continued to use v1, while the newly developed mobile applications started using v2. To manage this transition process, we set a deprecation period of approximately 6 months. During this period, by monitoring calls to the v1 API, we identified which clients were still using the old version. I notified clients by adding a warning header (X-API-Deprecated: true; Deprecation-Date: 2026-12-31) to the v1 endpoints.

HTTP/1.1 200 OK
Content-Type: application/json
X-API-Deprecated: true; Deprecation-Date: 2026-12-31
Link: <https://api.example.com/v2/docs>; rel="sunset"; type="text/html"

{
  "message": "This API version (v1) will be deprecated on December 31, 2026. Please migrate to v2."
  // ... v1 response data
}

This type of deprecation notice gives client-side developers sufficient time and helps them plan the transition. Seeing the number of requests to v1 endpoints decrease over the deprecation period was a key metric indicating that we could safely shut down v1. If I hadn’t done this monitoring, I might have shut down v1 and put a customer still using it in a difficult situation. I previously wrote about [related: observability strategies], and this is part of that.

I haven’t shied away from making mistakes either. Once, while performing final checks before shutting down v1, I decided to shut it down within an hour with the thought, “no one is using it anyway.” However, a reporting tool that was overlooked was still using v1, and that tool’s reports suddenly started coming back “empty.” This taught me the lesson: “Never assume, always look at the data.” In such situations, having rollback mechanisms ready is a lifesaver. My [related: post on CI/CD reliability] also touches upon these topics.

Development and Operational Cost of API Versioning

API versioning brings significant development and operational costs with it. Defining a strategy without considering these costs can lead to major problems down the line. In my experience, these costs are generally categorized under three main headings: documentation, testing, and deployment.

Documentation: Each new API version requires an updated or new set of documentation. While tools like Swagger/OpenAPI simplify this process, maintaining separate documentation for two or more active versions requires continuous effort. Especially in cases where a field in v1 changes or is removed in v2, clear and up-to-date documentation is vital to avoid confusing client developers. I know I spend a few hours a week keeping the API documentation for my side project up to date.

Testing: Each new version means expanding the existing test suite. The tests you wrote for v1 may not be valid for v2, or you may need to add new scenarios specific to v2. In an ERP project, we updated our CI/CD pipeline to test v1 and v2 APIs simultaneously. This increased test time by 30%, but it was necessary to catch regressions. I especially had to write separate integration tests for each version in automated tests.

# Example CI/CD test step (pseudo-code)
# Similar steps can be used in Gitlab CI/CD or Github Actions

test_api_versions:
  stage: test
  script:
    - echo "Running tests for API v1"
    - docker-compose -f docker-compose.v1.yml up -d
    - pytest tests/v1/
    - docker-compose -f docker-compose.v1.yml down

    - echo "Running tests for API v2"
    - docker-compose -f docker-compose.v2.yml up -d
    - pytest tests/v2/
    - docker-compose -f docker-compose.v2.yml down

Deployment and Operations: Keeping multiple API versions live complicates deployment strategies. Having different versions in different codebases (e.g., separate Git branches) or containing conditional logic within the same codebase increases the maintenance burden. In a client project, I ran two different API versions in separate Docker containers and routed traffic using Nginx. This meant allocating separate resources (CPU, RAM) for each version, which increased operational costs. I previously shared my experiences with [related: Nginx reverse proxy configurations].

To reduce these costs, it’s important to avoid unnecessary versioning and include non-breaking changes (e.g., adding a new field) in the current version. Furthermore, setting long deprecation periods encourages clients to migrate to newer versions, helping to reduce the maintenance burden of older versions.

Conclusion: Versioning is a Strategy, Not a Feature

API versioning is more of a product and operational strategy than a technical feature. One of the most important lessons I’ve learned in my 20 years of field experience is not to postpone this topic by saying “we’ll handle it later.” Before launching an API, having a clear versioning strategy and ensuring your team is in agreement on it will prevent many future headaches.

The right versioning approach varies depending on the nature of your project, your target audience, and your operational capacity. The simplicity of URL Path, the flexibility of Header-based versioning, or the rare use cases of Query Parameter… I’ve experienced them all and seen that each has its own pros and cons. The important thing is to be aware of these trade-offs and make an informed decision.

Remember, as your API evolves, your clients will also have to evolve. Versioning is one of the most powerful tools we have to make this transition as smooth and transparent as possible. My clear position is: Start early, monitor, communicate, and never assume.