Five Eyes published the policy on 1 May. Mickai filed the engineering 4 weeks earlier.

Cross-posted from mickai.co.uk.

On 1 May 2026, the Five Eyes intelligence alliance (UK NCSC, US CISA, Australia ASD, Canada CCCS, New Zealand NCSC NZ) issued joint guidance on Agentic AI security. The headline findings: AI agents need verifiable identity, signed audit trails, and cryptographic attestation of behaviour.

Four weeks earlier, on 4 April 2026, I (Micky Irons) filed UK patent application GB2610413.3 at the Intellectual Property Office: the Open Inter-Vendor Audit Record (OAR) format. Twenty claims. The same engineering primitive the Five Eyes guidance describes, only it is already in the public patent record.

The OAR primitive in plain English

Every action an AI agent takes (prompt received, tool call dispatched, model invoked, memory written, response emitted) is captured as an Audit Record. Each record is:

  • Cryptographically signed with a hardware-bound key (post-quantum, ML-DSA-65, FIPS 204).
  • Chained to the previous record so tampering breaks the chain.
  • Vendor-portable. The record format is open. A regulator, an auditor, or the user can verify the chain without depending on the vendor that produced it.

That last property is the policy hook. Five Eyes asked: how does a defender prove what an agent did? OAR’s answer: read the chain, verify the signatures, done. No vendor cooperation required.
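
To make the "read the chain, verify the signatures" idea concrete, here is a minimal sketch of a hash-chained, signed log using Node's built-in crypto module. It is not the OAR format, whose record layout is not described in this post. The field names (seq, action, prevHash) are hypothetical, Ed25519 stands in for ML-DSA-65, and the key lives in memory rather than in hardware.

```javascript
// Hash-chained, signed audit log: a generic sketch, NOT the OAR format.
// Assumptions: Ed25519 stands in for ML-DSA-65, and the key is held in
// memory here rather than in a TPM / Secure Enclave.
const crypto = require("node:crypto");

const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
const GENESIS = "0".repeat(64); // placeholder "previous hash" for the first record

function sha256Hex(text) {
  return crypto.createHash("sha256").update(text).digest("hex");
}

// Append one action to the chain. Field names are hypothetical.
function appendRecord(chain, action) {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS;
  const body = JSON.stringify({ seq: chain.length, action, prevHash });
  const hash = sha256Hex(body);
  const signature = crypto.sign(null, Buffer.from(hash), privateKey).toString("base64");
  chain.push({ body, hash, signature });
  return chain;
}

// Verify using only the records and the public key: no vendor call needed.
function verifyChain(chain, pubKey) {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const rec = chain[i];
    const parsed = JSON.parse(rec.body);
    if (parsed.seq !== i || parsed.prevHash !== expectedPrev) return false;
    if (sha256Hex(rec.body) !== rec.hash) return false;
    const sig = Buffer.from(rec.signature, "base64");
    if (!crypto.verify(null, Buffer.from(rec.hash), pubKey, sig)) return false;
    expectedPrev = rec.hash;
  }
  return true;
}

const chain = [];
appendRecord(chain, "prompt_received");
appendRecord(chain, "tool_call_dispatched");
appendRecord(chain, "response_emitted");
console.log(verifyChain(chain, publicKey)); // true

// Tampering with a stored record breaks verification.
chain[1].body = chain[1].body.replace("tool_call_dispatched", "nothing_happened");
console.log(verifyChain(chain, publicKey)); // false
```

The verifier needs only the records and the public key, which is the vendor-independence property described above.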

Why “4 weeks earlier” matters

Filing dates at the UK IPO are immutable public record. GB2610413.3 has a UK IPO filing date of 4 April 2026. The Five Eyes guidance is dated 1 May 2026. Anyone can verify both dates independently.

This is not a coincidence. Mickai’s broader portfolio is 31 UK patent applications and 914 claims, all named to Mickarle Wagstaff-Irons (Micky Irons, the founder), all filed without external counsel via the UK IPO’s no-fee Apply for a Filing Date route. The work was done before the policy was written, because the policy was the obvious next step once the engineering existed.

What changes for builders

If you are shipping an agent today and you want to be ready for the regulatory wave that the Five Eyes guidance is about to trigger, the OAR primitive gives you three properties:

  1. Verifiability without vendor lock-in. Your customers can audit your agents without your help.
  2. Post-quantum readiness. ML-DSA-65 is a parameter set of the ML-DSA signature scheme standardized in FIPS 204. Quantum-resistant from day one.
  3. Hardware-bound identity. Keys live in TPM / Secure Enclave / TrustZone, not in environment variables.

The full architecture is documented at mickai.co.uk. The article that pegs this to the Five Eyes news is here:

Five Eyes Published the Policy. Mickai Filed the Engineering.

Mickai is a sovereign AI operating system built in Workington, Cumbria, by Micky Irons. 31 UK patent applications, 914 claims. No cloud round-trip. No telemetry. Sovereign by default.

Debugging the Deployment Pipeline (When the MDT Image Goes Ghost)

They call me a Support Tech, but I see myself as a Value Architect. I don’t just “install apps”—I engineer the logic that makes them deploy at scale. Recently, my flow was interrupted when our MDT image decided to stop cooperating. What should have been a routine laptop setup quickly turned into a high-stakes deep dive into systems integrity and deployment architecture.

The Glitch: The Logic Break
I was preparing to image a batch of fresh laptops when the process hit a wall. The system couldn’t find the instructions it needed to start, and Disk Management showed the drive as “Unallocated”.

  • The Problem: The bootable logic on the MDT image was corrupted.
  • The Stake: High-stakes deployments for the IMEA region were at a complete standstill.

The Systems Logic Fix
Instead of just re-downloading and hoping for a miracle, I treated the failure like a software bug that needed a structural fix:

  • Re-partitioning via Script: I didn’t just format the drive; I used Diskpart to re-align the partition logic to match modern UEFI standards—the specific environment Windows 11 requires to function.
  • Verifying Source Integrity: I navigated back to the source on SharePoint to download a fresh, verified IMEA MDT image. This ensured the “code” I was deploying was clean and optimized from the start.
  • The Result: The “ghost” drive was restored, becoming a perfectly functioning deployment tool once again.

Why This is Software Development
Software development is ultimately about creating repeatable, logical processes. By fixing the MDT pipeline, I wasn’t just fixing one laptop; I was ensuring that every future deployment followed a clean, automated script.

This is the exact mindset I am bringing into Data Science—identifying where a data flow is broken and re-building the pipe for maximum efficiency. Whether you’re writing Python or managing MDT images, the goal is Systems Logic. If the foundation is broken, the software won’t run. Fix the foundation first. ✌️

String Polyfills and Common Interview Methods in JavaScript

Hello readers 👋, welcome to the 24th blog in this JavaScript series!

Last time we explored the clever spread and rest operators, learning how the same three dots can either unpack or collect data. Today we are shifting gears to a topic that deeply sharpens your understanding of how JavaScript works under the hood: string polyfills and common interview methods.

Have you ever wondered how "hello".includes("ell") works inside the engine? Or how you could make the same behavior work in an old browser that doesn’t support includes? That’s exactly what we are going to dive into. We will not just use string methods, we will build several of them from scratch, understand the logic, and then solve some classic interview string problems.

Let’s get our hands dirty.

What string methods are

In JavaScript, every string is a primitive value, but when you access a property or method on it, the engine wraps it in a String object behind the scenes. This gives you access to a large set of built-in methods like indexOf, slice, substring, includes, startsWith, endsWith, trim, repeat, and many more.

These methods help you manipulate and inspect strings without manually looping through characters. For example:

const greeting = "Hello, Satya";
console.log(greeting.includes("Satya")); // true
console.log(greeting.startsWith("Hello")); // true
console.log(greeting.endsWith("ya")); // true
console.log(greeting.repeat(2)); // "Hello, SatyaHello, Satya"

They are convenient, but they are also abstractions over simple character-by-character operations. Understanding what happens under the hood is not only fascinating but also extremely valuable for technical interviews.

Why developers write polyfills

A polyfill is a piece of code that provides modern functionality to older browsers that lack it. For example, String.prototype.includes was introduced in ES6 (2015). If you needed to support an environment that didn’t have it, you would write your own version of includes and attach it to String.prototype (carefully, of course).

But even in modern development, writing polyfills serves another purpose: it forces you to truly understand how a method works. Interviewers love to ask, “Can you implement your own version of startsWith?” or “How would you write a polyfill for repeat?” Knowing the internal logic makes you a stronger developer and prepares you for these moments.

So, let’s start building.

Implementing simple string utilities: polyfills

We will write our own versions of several common string methods. For each one, I’ll explain the logic step by step and then show the code. Remember, these are simplified educational versions that aim to mimic the core behavior as documented on MDN.

Polyfill for includes

The includes method determines whether one string can be found within another string, returning true or false. It takes a search string and an optional position from which to start searching. It is case-sensitive.

Logic: Loop through the main string from the given start position. For each index, check if the substring starting there matches the search string. If we find a full match, return true. If we reach the end without a match, return false.

if (!String.prototype.myIncludes) {
  String.prototype.myIncludes = function(search, start) {
    if (search instanceof RegExp) {
      throw new TypeError("First argument must not be a RegExp");
    }
    start = start || 0;
    if (start + search.length > this.length) return false;
    return this.indexOf(search, start) !== -1;
  };
}

Wait, we used indexOf above! That’s cheating if we want a from-scratch polyfill that doesn’t lean on another built-in search method. So let’s do a straight loop:

String.prototype.myIncludes = function(search, start) {
  if (search instanceof RegExp) throw new TypeError("First argument must not be a RegExp");
  start = start || 0;
  var source = this;
  while (start + search.length <= source.length) {
    var match = true;
    for (var i = 0; i < search.length; i++) {
      if (source[start + i] !== search[i]) {
        match = false;
        break;
      }
    }
    if (match) return true;
    start++;
  }
  return false;
};

Now we have a pure implementation. The nested loop checks for character-by-character equality at each possible starting position.

Polyfill for startsWith

startsWith checks if a string begins with the characters of a specified string, returning true or false. It also accepts an optional position.

Logic: From the given position, compare the characters of the source string with the search string. If all match, return true. If any mismatch or if the remaining length is shorter than the search string, return false.

String.prototype.myStartsWith = function(searchString, position) {
  position = position || 0;
  if (position + searchString.length > this.length) return false;
  for (var i = 0; i < searchString.length; i++) {
    if (this[position + i] !== searchString[i]) {
      return false;
    }
  }
  return true;
};

This is straightforward: we just compare the two strings side by side from the starting index.

Polyfill for endsWith

endsWith checks if a string ends with the characters of a specified string, returning true or false. It accepts an optional length parameter, which sets the length of the string to consider. If not provided, it defaults to the full string length.

Logic: We need to compare the end of the source string (up to the given length) with the search string. The start index for comparison will be (length - searchString.length).

String.prototype.myEndsWith = function(searchString, length) {
  var sourceLen = length !== undefined ? length : this.length;
  if (sourceLen > this.length) sourceLen = this.length;
  var startIndex = sourceLen - searchString.length;
  if (startIndex < 0) return false;
  for (var i = 0; i < searchString.length; i++) {
    if (this[startIndex + i] !== searchString[i]) {
      return false;
    }
  }
  return true;
};

Test it:

console.log("Hello world".myEndsWith("world")); // true
console.log("Hello world".myEndsWith("Hello", 5)); // true

Polyfill for repeat

repeat constructs and returns a new string which contains the specified number of copies of the string on which it was called, concatenated together. It throws a RangeError if the count is negative or Infinity, and a count of zero returns an empty string.

Logic: Start with an empty string, then loop count times, appending the original string each time. This takes count concatenations, which is slow for very large counts; a doubling strategy needs only about log2(count) steps. For a polyfill, a simple loop is fine, as it is unlikely to be used with huge numbers in older browsers.

So we’ll use the simple loop and handle non-integer counts by flooring them.

String.prototype.myRepeat = function(count) {
  if (count < 0 || count === Infinity) {
    throw new RangeError("Invalid count value");
  }
  count = Math.floor(count);
  var result = "";
  for (var i = 0; i < count; i++) {
    result += this;
  }
  return result;
};

Test: "abc".myRepeat(3) gives "abcabcabc".
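
For comparison, here is a sketch of the doubling idea. The name repeatByDoubling is my own, and it is written as a standalone function rather than a prototype method so it can be tested in isolation. It walks the binary digits of count, doubling a working chunk at each step.

```javascript
// Repeat by doubling: walks the bits of count, so it needs about
// log2(count) loop iterations instead of count iterations.
function repeatByDoubling(str, count) {
  if (count < 0 || count === Infinity) {
    throw new RangeError("Invalid count value");
  }
  count = Math.floor(count) || 0; // NaN becomes 0, fractions are floored
  var result = "";
  var chunk = String(str);
  while (count > 0) {
    if (count % 2 === 1) result += chunk; // this bit is set: take the current chunk
    count = Math.floor(count / 2);
    if (count > 0) chunk += chunk;        // double the chunk for the next bit
  }
  return result;
}

console.log(repeatByDoubling("abc", 3)); // "abcabcabc"
console.log(repeatByDoubling("ab", 5));  // "ababababab"
console.log(repeatByDoubling("x", 0));   // ""
```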

Polyfill for trim

trim removes whitespace from both ends of a string. Whitespace includes spaces, tabs, no-break spaces, and all line terminator characters.

Logic: Use a regular expression to strip leading and trailing whitespace. Alternatively, loop from the start and end to find the first non-whitespace character and slice the string.

A regex approach is simple and works in older environments:

String.prototype.myTrim = function() {
  return this.replace(/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g, "");
};

\s matches spaces, tabs, and line breaks; \uFEFF is the BOM (Byte Order Mark); \xA0 is the non-breaking space. This closely matches the ES5 specification.
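
The loop-based alternative from the logic above can be sketched like this. The helper names are my own, and the whitespace list is a deliberately small subset (it omits the less common Unicode space separators that the native trim also strips), so treat it as an illustration of the two-pointer walk rather than a spec-complete polyfill.

```javascript
// Assumption: a reduced whitespace set, enough to show the technique.
function isTrimWhitespace(ch) {
  return ch === " " || ch === "\t" || ch === "\n" || ch === "\r" ||
         ch === "\v" || ch === "\f" || ch === "\u00A0" || ch === "\uFEFF" ||
         ch === "\u2028" || ch === "\u2029";
}

function trimByScanning(str) {
  var start = 0;
  var end = str.length - 1;
  // Walk in from the left until a non-whitespace character is found.
  while (start <= end && isTrimWhitespace(str[start])) start++;
  // Walk in from the right, never crossing the left pointer.
  while (end >= start && isTrimWhitespace(str[end])) end--;
  return str.slice(start, end + 1);
}

console.log(JSON.stringify(trimByScanning("  hi there \n"))); // "hi there"
console.log(JSON.stringify(trimByScanning("   ")));           // ""
```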

Common interview string problems and their logic

Beyond polyfills, interviews often test your ability to manipulate strings using basic algorithms. Here are a few classic problems, along with the thought process and solutions.

1. Reverse a string

Probably the most famous beginner question: “Write a function that reverses a string.” Interviewers usually want to see the manual loop first, without the built-in reverse method on arrays, so we will show both approaches.

Approach 1: Loop from end to start

function reverseString(str) {
  var reversed = "";
  for (var i = str.length - 1; i >= 0; i--) {
    reversed += str[i];
  }
  return reversed;
}

Approach 2: Using array methods (built-in but still asked)

function reverseString(str) {
  return str.split("").reverse().join("");
}

2. Check if a string is a palindrome

A palindrome reads the same forward and backward. You can use the reverse function and compare, but it’s more efficient to compare characters from both ends moving inward.

function isPalindrome(str) {
  str = str.toLowerCase().replace(/[^a-z0-9]/g, ""); // sanitize
  var left = 0;
  var right = str.length - 1;
  while (left < right) {
    if (str[left] !== str[right]) return false;
    left++;
    right--;
  }
  return true;
}

3. Count occurrences of a character or substring

Use a straightforward loop, or the split trick:

function countChar(str, char) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    if (str[i] === char) count++;
  }
  return count;
}

// Using split trick (but not recommended for interviews if they want algorithm)
function countSubstring(str, sub) {
  return str.split(sub).length - 1;
}
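
If the interviewer wants the algorithm rather than the split trick, a manual scan might look like the sketch below. The name countSubstringManual is my own. It counts non-overlapping matches, which mirrors what split does, and it treats an empty search string as zero matches, which is a choice I made for the example.

```javascript
function countSubstringManual(str, sub) {
  if (sub.length === 0) return 0; // assumption: an empty needle counts as zero matches
  var count = 0;
  var i = 0;
  while (i + sub.length <= str.length) {
    var match = true;
    for (var j = 0; j < sub.length; j++) {
      if (str[i + j] !== sub[j]) {
        match = false;
        break;
      }
    }
    if (match) {
      count++;
      i += sub.length; // skip past the match: non-overlapping, like split
    } else {
      i++;
    }
  }
  return count;
}

console.log(countSubstringManual("banana", "an")); // 2
console.log(countSubstringManual("aaaa", "aa"));   // 2
```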

4. Truncate a string

Write a function that truncates a string to a given length and appends “…” if it was truncated.

function truncate(str, maxLength) {
  if (str.length <= maxLength) return str;
  return str.slice(0, maxLength) + "...";
}

5. Capitalize the first letter of each word

A classic transformation:

function capitalizeWords(str) {
  return str.split(" ").map(function(word) {
    if (word.length === 0) return word;
    return word[0].toUpperCase() + word.slice(1).toLowerCase();
  }).join(" ");
}

6. Remove duplicates from a string

We can use a Set, but interviewers might ask for a manual approach:

function removeDuplicates(str) {
  var seen = {};
  var result = "";
  for (var i = 0; i < str.length; i++) {
    if (!seen[str[i]]) {
      seen[str[i]] = true;
      result += str[i];
    }
  }
  return result;
}
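
For reference, the Set version mentioned above is a one-liner. The function name is my own. One small difference from the manual loop: a Set built from a string iterates by code point, so characters outside the Basic Multilingual Plane (many emoji, for example) stay intact instead of being split into surrogate halves.

```javascript
function removeDuplicatesWithSet(str) {
  // A Set keeps only the first occurrence of each value, in insertion order.
  return Array.from(new Set(str)).join("");
}

console.log(removeDuplicatesWithSet("programming")); // "progamin"
```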

These exercises train you to think in loops and conditionals, which is exactly the skill needed to write polyfills or solve algorithmic challenges.

The importance of understanding built-in behavior

When you write a polyfill, you are essentially stepping into the shoes of the JavaScript engine. You learn to handle edge cases: what happens if the argument is a RegExp? What if the count is Infinity? How does case-sensitivity work? This depth of understanding makes you a safer and more precise developer.

In an interview, if you can not only use includes but also explain how it might be implemented and then code it up on a whiteboard, you demonstrate a fundamental command of the language that many candidates lack. It shows you don’t just memorize methods; you understand principles.

Moreover, knowing the internal logic helps you debug mysterious bugs. For instance, understanding that trim removes a specific set of whitespace characters, not just spaces, prevents unexpected failures when dealing with user input.

Visualizing string processing flow

I often picture a string method as a loop with a pointer scanning across the characters. For includes, imagine a sliding window. You have the main string, and you slide the search string along it, checking each position until you find a match or reach the end.

For startsWith, you only look at the very beginning, like checking the first few letters of a book title to see if it matches a given prefix. For endsWith, you look at the last letters, moving your attention to the tail of the string.

For something like repeat, it’s like a factory that takes a template and stamps out copies, joining them one after another. For trim, you walk in from both edges, trimming away whitespace until you hit a visible character, then capture the inner part.

This mental imagery makes writing polyfills much easier because you translate the visual into code.


Conclusion

Today we’ve gone beyond just using string methods and dug into the logic that powers them. We built polyfills for includes, startsWith, endsWith, repeat, and trim, seeing how simple loops and conditionals can replicate browser-native behavior. We also solved common interview string problems to reinforce the thinking pattern.

Let’s summarize the key takeaways:

  • String methods are built-in tools that manipulate strings, but they are all based on fundamental operations like looping, comparison, and slicing.
  • Polyfills are implementations of modern methods that enable them in older environments, and writing them deepens your understanding of JavaScript.
  • We created step-by-step polyfills for includes, startsWith, endsWith, repeat, and trim, each with a clear logic.
  • Interview problems like reversing a string, checking palindromes, counting occurrences, and capitalizing words rely on the same foundational skills.
  • Truly comprehending built-in behavior prepares you for technical interviews and makes you a more effective developer.

The next time you use a string method, you’ll know exactly what’s happening behind the scenes, and you’ll be ready to tackle any string-related challenge that comes your way.

Hope you found this helpful! If you spot any mistakes or have suggestions, let me know. You can find me on LinkedIn and X, where I post more about web development.

What is dogfooding? How JetBrains builds better developer tools

Dogfooding in software development means using your own products to build, test, and improve them. At JetBrains, it’s a core part of how we create developer tools like IntelliJ IDEA, YouTrack, and Rider.

We don’t rely on assumptions or abstract user personas. We use our tools every day in real workflows, which keeps us close to the problems developers actually face.

Our CEO, Kirill Skrygan, puts it this way:

“You can only build truly great software if you use it yourself. Every feature and every decision comes from firsthand experience.”

What is dogfooding in software development?

Dogfooding — short for “eating your own dog food” — means putting your product through the same real-world use as your customers.

Our engineers, designers, product managers, and even technical writers build their daily workflows around JetBrains tools. We write code in IntelliJ IDEA and track issues and internal project statuses in YouTrack.

It’s not about internal compliance – no one forces anyone to use a product. It’s about trust. We use our tools because they help us do our jobs better, and when they don’t, we fix them.

This direct connection between building and using keeps us grounded. We don’t chase trends or design for hypothetical users. If something slows us down, we know it likely affects thousands of developers too.

Benefits of dogfooding: Faster feedback and better software

Dogfooding gives us what every product company dreams of: immediate, unfiltered feedback.

Instead of waiting weeks for customer reports, our developers spot issues as they code.
When a feature feels unintuitive or a shortcut doesn’t work as expected, the fix often starts that same day or even the same hour.

This tight feedback loop turns every JetBrainer into a quality advocate. It shortens the distance between problem and solution, helping us catch things long before they ever reach users.

It also fosters empathy. Using the tools ourselves means we understand not only what users say, but what they experience. We feel the slowdowns, the friction points, and the “why is this like that?” moments – and we care enough to address them.

“Those thousands of tiny corrections made over time are what turn a good product into a great one,” Kirill shared. “They come from people who use the tool every day and want it to be better, not for KPIs, but because they genuinely care.”

Examples of dogfooding at JetBrains

Dogfooding shapes every JetBrains product, often long before release.

Rider: From unstable to production-ready

One of the best examples of dogfooding in action is Rider, our .NET IDE. Back in 2016, when it was still unstable and full of rough edges, JetBrains developers began using it for their work long before it was officially released. Some days, you couldn’t even type because the editor would crash. But instead of giving up, teams fixed the issues they encountered on the spot.

That perseverance turned Rider from an experiment into a world-class IDE. The same principle has shaped countless JetBrains products since.

YouTrack: Built and managed in itself

Another case is the YouTrack team, who use their own issue tracker to manage every internal project, including the improvement workflows for the product itself. That constant internal use surfaces edge cases and drives continuous refinement.

Junie: Shaped before users ever saw it

The team started using Junie, one of our newer tools, internally in December 2024, months before it reached closed beta. From the very beginning, internal feedback played a major role in shaping how the product evolved. Team members quickly identified things that didn’t feel quite right, from small interface quirks to moments where Junie didn’t respond as expected. This early insight helped the team refine the experience long before anyone outside JetBrains ever saw it.

One particularly important piece of feedback was that Junie didn’t explain enough about what it was doing. That lack of clarity made some interactions feel confusing. Because the team experienced this themselves, they were able to rethink the product’s communication early on and make it more transparent and helpful.

Another area that benefited enormously from dogfooding was Junie’s connection with different work environments used throughout the company. JetBrainers rely on a wide variety of setups in their daily work, and using Junie across these revealed many edge cases the team wouldn’t have spotted otherwise. Each of these discoveries turned into improvements – hundreds of them.

How dogfooding improves developer experience and ownership

Dogfooding doesn’t just improve products — it changes how teams work. When you use what you build, the distinction between “developer” and “user” disappears. There’s no handoff, no abstraction.

That perspective creates stronger ownership. Decisions have immediate, visible impact. Teams see the results of their work in real time.

Dogfooding AI tools at JetBrains

Our teams use AI-assisted features internally long before release, testing what feels useful, what feels distracting, and what actually improves productivity.

This helps us avoid building AI for the sake of trends. We build it because we need it — and we refine it until it works in real development environments.

Why dogfooding matters for building better software

Dogfooding is how we make sure our tools meet the same high standards our users expect. It keeps us honest, motivated, and connected to the work we do. It’s not always comfortable – finding bugs in your own product rarely is – but it’s the most authentic way we know to build software that truly makes a difference.

This is what has kept JetBrains thriving for over two decades: a culture of doers who build, test, and improve from the inside.

As one of our technical leads put it:

“If I start any new project, the first milestone for it is definitely dogfooding. It’s one of the most important quality gates for the product and a crucial source of high-quality feedback.”

Build what you believe in

Dogfooding isn’t just a process we follow – it’s a fundamental part of how we work. It helps us stay close to our mission, keep improving, and make sure that when developers everywhere open a JetBrains tool, it feels like it was built by someone who truly understands them.

Because it was.

If this way of working resonates with you, if you care about the craft and prefer solving real problems over just chasing trends, you’ll likely feel at home here. Check out our careers page for open roles!

The smoke tests that never got automated

I’ve been a frontend dev for a few years now, and there’s a pattern I kept seeing across almost every small team I worked with.

New feature ships. Everyone’s happy. Then three days later something completely unrelated breaks and nobody caught it.
Not because the QA testers didn’t know what to test. They did. They had the test cases written, they understood the flows, they knew exactly what needed a smoke test after every deploy.

The problem was always the same: automating that required Playwright or Selenium, and that was “a dev thing”. And the devs were busy shipping the next feature. So the smoke tests stayed manual, got skipped when things got hectic, and eventually everyone just hoped nothing broke between sprints.

I watched this happen enough times that I started building something about it.

What I built


Flow Testing is a drag-and-drop builder that lets anyone (QA testers, PMs, whoever does the clicking) create and run real Playwright tests without writing code. You connect nodes visually (navigate, click, fill, assert) and it runs against real browsers under the hood.

What’s working today:

  • Visual canvas with nodes for navigate, click, fill, assert
  • Runs on actual Playwright (Chromium, Firefox, WebKit)
  • Trace viewer with screenshots and network logs when something fails
  • API mocking for edge cases
  • AI Agents (Planner, Generator & Healer) still improving


Why I’m posting this
The product has a paid plan but I’m more interested in finding people who will actually use it and tell me what’s broken, what’s missing, or what doesn’t make sense.

If that’s you, free month on me. Just reach out.
https://flowtesting.io

Token Consumption Anxiety and the Open Source App I Built to Solve It

Thanks to AI, I’ve spent more time architecting and building apps, which means I spend a lot of time looking at frontier models and agonizing over token use. I’ve also been battling a very modern affliction: token consumption anxiety.

It feels like modern AI-powered app architecture asks us to slap an LLM at the front door. You want to dynamically pick the best model for a specific task? Great, the industry standard is to call an expensive, heavy model just to decide if the prompt should go to Claude, Gemini, or a smaller open-source model. We are burning latency and spending tokens at near absurd levels.

I got tired of this cycle. I wanted a model picker with exactly zero models in the request path. So, I fired up Antigravity, let the AI (a trio of Gemini, Codex, and Claude) do the coding while I directed the architecture, and built a tool to solve my own headache.

The result is RightModel. It’s a tool that evaluates your task and recommends the ideal model—but the way it gets there is entirely different. Let’s walk through the architecture.

Handling the request

When you submit a task to RightModel, there are zero LLM calls in the default path. The system evaluates your parameters, computes the ideal model against a pre-existing ruleset, and returns the response instantly.

Here’s an example JSON snippet:

{
  "task_type": "code_generation",
  "recommended_model": "claude-3-5-sonnet",
  "reason": "High complexity context matched; tier 1 code model selected."
}

Everything interesting happens before the request, not during it.

The “intelligence” at runtime

The core of the app is the ruleset. It contains task-type classification rules, model-tier mapping, and tie-breakers.

While I used AI to help author these rules initially, the final artifact is human-reviewable and human-owned. I’m not relying on an LLM to make a black-box runtime decision; I’m executing code.
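
To show what "executing code" instead of calling a model can look like, here is a minimal sketch of a rule lookup. Everything in it is a placeholder I made up for illustration: the rule fields, the complexity score, and the "small-..." model names are not RightModel's real ruleset.

```javascript
// Hypothetical ruleset: rule fields, tiers, and the cheaper model names are
// illustrative placeholders, not RightModel's real data.
var ruleset = {
  version: "example-1",
  rules: [
    { taskType: "code_generation", minComplexity: 7, model: "claude-3-5-sonnet",
      reason: "High complexity context matched; tier 1 code model selected." },
    { taskType: "code_generation", minComplexity: 0, model: "small-code-model",
      reason: "Low complexity code task; cheaper tier selected." },
    { taskType: "summarization", minComplexity: 0, model: "small-general-model",
      reason: "Summarization rarely needs a frontier model." }
  ]
};

// First matching rule wins, so rule order acts as the tie-breaker.
// No network call and no model call happens anywhere in this path.
function recommend(task, rules) {
  for (var i = 0; i < rules.rules.length; i++) {
    var rule = rules.rules[i];
    if (rule.taskType === task.taskType && task.complexity >= rule.minComplexity) {
      return { task_type: task.taskType, recommended_model: rule.model, reason: rule.reason };
    }
  }
  return { task_type: task.taskType, recommended_model: null, reason: "No rule matched." };
}

var result = recommend({ taskType: "code_generation", complexity: 9 }, ruleset);
console.log(result.recommended_model); // "claude-3-5-sonnet"
```

Because the ruleset is plain data, it can be diffed, reviewed, and versioned like any other file in the repository.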

Solving the staleness problem

The LLM landscape moves fast, so a static ruleset needs to stay up to date. To keep RightModel accurate without making live API calls during a user request, the app pulls fresh pricing data from OpenRouter through a scheduled workflow triggered by Google Cloud Scheduler. Another scheduling service would work too, depending on the app architecture.

Notice what gets regenerated: the pricing data, not the rule logic. The logic remains a curated, human-authored layer. I also caution the user about this staleness directly with a footer stating exactly when the data was last refreshed, for transparency.

AI as an escalation path

Sometimes, requests don’t fit cleanly into a ruleset. A task might trigger an “ambiguous” or “low confidence” flag.

When this happens, RightModel doesn’t perform a silent fallback or an automatic, expensive upgrade. Instead, the user sees an explicit “Deep Analysis” button. This LLM call is powered by Gemini 2.5 Flash, but I plan to tweak this based on user feedback and technology updates.

Enter: Precomputed AI

Building this app made me realize this architecture isn’t isolated to picking models. A happy accident, really, and I’ve been calling this pattern Precomputed AI.

At its core, Precomputed AI shifts LLM reasoning out of the real-time request path and into an asynchronous build pipeline. It requires three specific properties, all of which power RightModel:

  • A versioned artifact (the ruleset)
  • A regeneration cadence (the pricing cron and visible staleness)
  • A declared escalation path (the Deep Analysis button)
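
Here is one way the three properties could show up together in code. This is a sketch under my own assumptions: the artifact shape, the 24-hour cadence, the 0.6 confidence threshold, and the "small-general-model" name are all hypothetical and chosen only to make the example runnable.

```javascript
// Hypothetical precomputed artifact, produced offline by a build pipeline.
var artifact = {
  version: "2026-05-01.1",             // versioned artifact
  generatedAt: "2026-05-01T00:00:00Z", // stamped by the regeneration job
  maxAgeHours: 24,                     // regeneration cadence
  answers: {
    code_generation: { model: "claude-3-5-sonnet", confidence: 0.9 },
    data_extraction: { model: "small-general-model", confidence: 0.4 }
  }
};

// Request-time lookup: reads the artifact only, never calls a model.
function lookup(art, taskType, now) {
  var ageHours = (now.getTime() - Date.parse(art.generatedAt)) / 36e5;
  var hit = art.answers[taskType];
  var lowConfidence = !hit || hit.confidence < 0.6;
  return {
    model: hit ? hit.model : null,
    artifactVersion: art.version,
    stale: ageHours > art.maxAgeHours, // surfaced to the user, not hidden
    // Declared escalation: offered to the user, never triggered silently.
    escalation: lowConfidence ? "deep_analysis_available" : null
  };
}

var noon = new Date("2026-05-01T12:00:00Z");
console.log(lookup(artifact, "code_generation", noon).stale);      // false
console.log(lookup(artifact, "data_extraction", noon).escalation); // "deep_analysis_available"
```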

What do you think?

If you’re shipping LLM-powered tools right now, I challenge you to ask yourself: which parts of your reasoning actually need to be live?

You can read more at the Precomputed AI website, and try out the RightModel app. I’d particularly value feedback from people creating AI-powered apps and solutions.

Why MCP Apps are going to be the next big thing

There’s a quiet shift happening in how AI tools interact with users, and most developers haven’t noticed yet. Just one week after Anthropic published the MCP Apps spec, OpenAI launched a huge marketing campaign around OpenClaw and got all the attention, at least for a while.

For the past two years, every AI assistant has been stuck behind the same interface: a text box. You ask a question, you get text back. Maybe some markdown. Maybe a code block. Maybe an image.

Claude, GPT, Copilot, Gemini, and every local model all render into the same narrow pipe. MCP Apps change that.

What MCP Apps actually is

MCP Apps is a protocol extension that lets MCP tool results include interactive UIs. Actual interactive components running inside the AI host’s sandbox.

The mechanics are straightforward:

  1. Your MCP tool returns a structuredContent payload alongside the normal text content
  2. The host loads a ui:// resource (an HTML page you provide) into a sandboxed iframe
  3. The host forwards the tool result to your iframe via postMessage using a JSON-RPC 2.0 protocol (ui/* methods)
  4. Your renderer mounts the UI inside the iframe
  5. The UI can call tools back, send messages to the conversation, request display mode changes, and resize itself
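
As a rough illustration of steps 1 and 3, the Python sketch below builds a tool result and wraps it in a JSON-RPC 2.0 notification. The content and structuredContent fields follow the MCP tool result shape. The layout of params inside the ui/notifications/tool-result message is an assumption here, so check the spec before relying on it. The helper names are hypothetical.

```python
import json


def make_tool_result(summary: str, data: dict) -> dict:
    """Tool result carrying text for the model plus structured data for the UI."""
    return {
        "content": [{"type": "text", "text": summary}],
        "structuredContent": data,
    }


def make_notification(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 notification (no "id", so no reply is expected)."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params})


result = make_tool_result("3 patients found", {"patients": [{"name": "Alice"}]})
# Roughly what a host would forward to the iframe; the params layout is assumed.
wire = make_notification("ui/notifications/tool-result", result)
decoded = json.loads(wire)
```

The text entry is what the model and non-rendering hosts see. The structured payload is what the iframe renders.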

The text content still serves LLM reasoning and hosts without rendering support. The UI is a progressive enhancement. Let’s be honest: natural language is not always the most convenient or fastest way to express a desire. A click or a tap on a button can be much faster.

The spec behind it

The MCP Apps protocol (spec version 2026-01-26) is an official extension to MCP maintained by Anthropic. It defines a JSON-RPC 2.0 message layer over postMessage between a sandboxed iframe and its host. The spec covers:

  • Handshake: ui/initialize request/response with protocol version negotiation, app capabilities, and host capabilities
  • Lifecycle: tool-input, tool-result, tool-cancelled, tool-input-partial (streaming), host-context-changed, resource-teardown
  • Actions: ui/open-link, ui/message, ui/request-display-mode, ui/update-model-context, tools/call
  • Sizing: ui/notifications/size-changed (reactive, from app to host), ui/notifications/preferred-size (declarative hints)
  • Security: CSP via _meta.ui.csp on resource content items, Permissions Policy for camera/microphone/geolocation/clipboard

The spec is deliberately renderer-agnostic. It defines how the host and the iframe talk to each other. What you put inside the iframe is entirely your choice. The host doesn’t parse your component tree or validate your DOM structure. It sends you JSON-RPC messages and expects JSON-RPC messages back.

This is a deliberate design decision and the reason multiple rendering approaches can coexist. The Anthropic ext-apps SDK, a prefab renderer, a raw React app, a Svelte component — all valid. The protocol doesn’t care.

Current host implementations:

Host | Status | Notes
VS Code Copilot Chat | Shipping | Full spec support, CSP via <meta> tag, acquireVsCodeApi() transport
Claude Desktop | Shipping | Full spec support, CSP via HTTP headers on sandboxed origin ({hash}.claudemcpcontent.com). Can send results before init completes
ChatGPT | Shipping | Full spec support, CSP via HTTP headers on sandboxed origin ({slug}.oaiusercontent.com)

Three independent hosts shipping the same protocol make it clear that this is not a proposal but infrastructure.

Why this matters

Here’s the thing I think people are going to underestimate: this turns every MCP server into a full-stack application.

An MCP server is already a backend that exposes typed tools over stdio or HTTP. It already has access to databases, APIs, file systems. The only thing missing was a frontend.

Now it has one that lives inside the AI conversation, gets tool arguments and results pushed to it automatically, can call tools back on its own MCP server via the host, inherits the host’s theme for free, and works across VS Code, Claude Desktop, ChatGPT, and anything else that implements the spec.

That last point is the critical one. Write once, render everywhere. The protocol is the same across hosts. The sandboxing model is the same. The postMessage bridge is the same.

Two different things driven by the same idea

MCP Apps is going to do to AI tooling what PWAs were supposed to do to mobile apps. The difference is the starting position: PWAs were competing against native apps that users already loved, while MCP Apps are filling a vacuum. There is no existing standard for rendering interactive UIs inside AI conversations. The alternative is pasting JSON into the chat. MCP Apps are going to be “AI Native Apps”.

The bar is low. And the protocol is good enough.

What it looks like in practice

Here’s a complete MCP tool that returns an interactive patient list with search, sorting, and click-to-view detail:

import { display, autoTable, Column, H1 } from '@maxhealth.tech/prefab'

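// `db` is assumed to be a database client configured elsewhere in the server.
async function listPatients(args) {
  const patients = await db.query('SELECT * FROM patients')
  // Compose a heading and an auto-generated table, then wrap them for the host.
  return display(
    Column({ gap: 6 }, [
      H1('Patients'),
      autoTable(patients),
    ]),
    { title: 'Patient List' }
  )
}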

display() wraps the component tree into the MCP wire format. The host renders it. The table is interactive — search, sort, row selection — without writing any frontend code.

But you don’t even need to compose components. The auto-renderers infer the right UI from your data shape:

import { display } from '@maxhealth.tech/prefab/mcp'
import { autoTable, autoChart, autoForm, autoMetrics, autoDetail } from '@maxhealth.tech/prefab'

// Array of objects -> searchable, sortable table
return display(autoTable(patients))

// Array with numeric fields -> line/bar chart with axes and tooltips
return display(autoChart(salesData, { xAxis: 'month', title: 'Revenue' }))

// Schema fields -> validated form that submits back to an MCP tool
return display(autoForm([
  { name: 'name', label: 'Name', required: true },
  { name: 'email', label: 'Email', type: 'email' },
  { name: 'role', label: 'Role', type: 'select', options: ['admin', 'user'] },
], 'save_user'))

// Key-value object -> formatted detail card
return display(autoDetail({ name: 'Alice', status: 'active', lastSeen: '2026-04-30' }))

// Object with numeric values -> metric cards with labels
return display(autoMetrics({ patients: 1284, appointments: 47, waitTime: '12min' }))

Each auto-renderer picks columns, axes, labels, and formatting based on what it finds in the data. You pass an array or object, you get a production-quality UI back. When you outgrow them, you drop down to the component API and build exactly what you want.

And when you want to tweak the look without writing full custom components, every element accepts utility classes:

import { display, Column, H1, Badge, autoTable } from '@maxhealth.tech/prefab'

return display(
  Column({ gap: 6, cssClass: 'p-6 max-w-4xl' }, [
    H1('Patient Dashboard'),
    Badge({ label: '47 active', variant: 'success', cssClass: 'text-sm' }),
    autoTable(patients, { cssClass: 'rounded-lg shadow-md' }),
  ]),
  { title: 'Dashboard', layout: { preferredHeight: 600 } }
)

The built-in CSS ships ~200 Tailwind-compatible utility classes (padding, margin, flex, grid, gap, typography, colors, borders, shadows, max-height, overflow). No Tailwind dependency, no build step, no purge config. They ship in the 15 KB stylesheet that the CDN serves alongside the renderer.

Three levels of control: auto-renderers for zero-config, utility classes for visual tweaks, full component API for complete custom UIs.

The HTML renderer is only 80 KB and loads with a single script tag:

<div id="root"></div>
<script src="https://cdn.jsdelivr.net/npm/@maxhealth.tech/prefab@0.2/dist/renderer.auto.min.js"></script>

The protocol is the product

The real value is the protocol itself.

ui/initialize handshake. ui/notifications/tool-result for pushing data. ui/notifications/size-changed for responsive layout. ui/open-link for navigation. ui/message for sending messages back into the conversation. All JSON-RPC 2.0 over postMessage.

Anyone can build a renderer against this protocol. React, Svelte, vanilla JS, raw DOM manipulation. The host sends JSON-RPC messages to an iframe and expects JSON-RPC messages back. Your rendering stack is your business.

This is why I think MCP Apps will win where similar attempts failed. It’s a host protocol that allows any rendering approach, and that makes adoption straightforward on both sides.

What’s missing

It’s early. The spec is dated 2026-01-26. Here’s what’s still rough:

  • CSP implementation varies by host. All hosts read _meta.ui.csp from the content item returned by readResource, but the enforcement mechanism differs. VS Code injects a <meta> tag. Claude Desktop and ChatGPT enforce via HTTP headers on a sandboxed origin. The spec standardizes the declaration format, but you should test your CSP config against each host you target.
  • No standard component format. The protocol defines the transport, the payload is up to you. Every renderer invents its own component schema. (We use a $prefab JSON wire format, but nothing stops someone from using entirely different components.)
  • Permissions Policy support varies. Camera, microphone, geolocation, and clipboard-write access are granted via the iframe allow attribute. Hosts report what they support in hostCapabilities.sandbox.permissions, but not all hosts honor all permissions yet.
  • Buffering and timing are tricky. Claude Desktop can send tool-result before the ui/initialize response arrives. If your renderer doesn’t buffer, you lose the first result. This took us several hours to debug.
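
The buffering fix from the last bullet is easy to sketch. A real renderer would live in JavaScript inside the iframe, so treat the Python below as a language-neutral outline of the logic, not our actual implementation. The class name, the method names, and the seq field in params are all hypothetical.

```python
class BufferedRenderer:
    """Queues host messages that arrive before the handshake completes."""

    def __init__(self):
        self.initialized = False
        self.pending = []   # messages received before initialization
        self.rendered = []  # sequence numbers actually handled, in order

    def on_message(self, message: dict) -> None:
        if not self.initialized:
            self.pending.append(message)  # hold it instead of dropping it
            return
        self._handle(message)

    def on_initialized(self) -> None:
        """Call once the ui/initialize response has arrived."""
        self.initialized = True
        queued, self.pending = self.pending, []
        for message in queued:  # replay in arrival order
            self._handle(message)

    def _handle(self, message: dict) -> None:
        self.rendered.append(message["params"]["seq"])


renderer = BufferedRenderer()
# A result that arrives early, before the handshake has finished.
renderer.on_message({"method": "ui/notifications/tool-result", "params": {"seq": 1}})
early = list(renderer.rendered)
renderer.on_initialized()
renderer.on_message({"method": "ui/notifications/tool-result", "params": {"seq": 2}})
```

Nothing is rendered until initialization completes, and the early result is replayed first, so ordering is preserved.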

These are solvable problems. The architecture is sound.

The prediction

Within 12 months, we will see a market emerging around MCP Apps.

MCP Apps hosts will compete on rendering quality, theme support, and permission handling. MCP servers will compete on UI polish. And the protocol will quietly become the standard that holds it all together. MCP hosts could potentially replace the need for a web browser.

Follow Max Health on GitHub

Inside Job Logs: What to Look For When Things Break

When a job fails on an HPC cluster, your first instinct might be to rerun it and hope for a different outcome. That rarely works. The real answers are almost always sitting quietly in your job logs.

Understanding how to read those logs effectively can save hours of guesswork and help you fix issues faster and more confidently.

Start With the Basics: Exit Codes

Every job finishes with an exit code. This is the simplest signal of what happened.

  • 0 means success
  • Non-zero values indicate failure

In Slurm, you will often see something like:

ExitCode=1:0

The first number is the job’s exit status, and the second is the signal that terminated it, if any. A non-zero signal usually points to something more abrupt, like a kill or crash.
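
If you process many jobs, it can help to split this field programmatically. Here is a small Python sketch. The function name and the returned dictionary layout are assumptions for illustration, not part of Slurm.

```python
import signal


def parse_exit_code(field: str) -> dict:
    """Split a Slurm 'ExitCode=<status>:<signal>' field into its two parts."""
    value = field.split("=", 1)[-1]  # also accepts a bare "1:0"
    status_text, signal_text = value.split(":")
    status, signum = int(status_text), int(signal_text)
    # Translate the signal number to a name where the platform knows it.
    try:
        name = signal.Signals(signum).name if signum else None
    except ValueError:
        name = f"signal {signum}"
    return {"status": status, "signal": signum, "signal_name": name,
            "ok": status == 0 and signum == 0}


failed = parse_exit_code("ExitCode=1:0")
killed = parse_exit_code("0:9")
```

A record with status 0 but a non-zero signal is still a failure, which is why the ok flag checks both numbers.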

Check Standard Output and Error Files

Slurm writes logs to files like:

slurm-<jobid>.out

Or custom paths defined in your job script:

#SBATCH --output=job.out
#SBATCH --error=job.err

These files are your primary source of truth.

  • stdout shows normal program output
  • stderr shows warnings, errors, and crashes

Always read stderr first when debugging.

Look for the First Error, Not the Last

A common mistake is focusing on the last line of the log. In reality, the root cause often appears much earlier.

For example:

File not found: input.dat
Segmentation fault (core dumped)

The segmentation fault is just a consequence. The missing file is the real issue.
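
One way to build this habit is to scan from the top and stop at the first suspicious line. The Python sketch below does that. The pattern list is a hypothetical starting point, not an exhaustive set, and you will want to extend it for your own applications.

```python
import re

# Hypothetical starter patterns; extend them for your own applications.
ERROR_PATTERN = re.compile(
    r"error|not found|no such file|traceback|segmentation fault|killed",
    re.IGNORECASE,
)


def first_error(log_text: str):
    """Return (line_number, line) for the earliest matching line, else None."""
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            return number, line.strip()
    return None


log = """Loading configuration
File not found: input.dat
Segmentation fault (core dumped)
"""
found = first_error(log)
```

On the example above, the function reports the missing file on line 2 and never reaches the segmentation fault.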

Memory Issues: Subtle but Common

Memory problems show up in different ways depending on how the system enforces limits.

Typical signs include:

  • Out Of Memory
  • Killed
  • oom-kill event

In Slurm, you might also see:

slurmstepd: error: Detected 1 oom-kill event(s)

If this happens, your job likely exceeded its allocated memory. Increase --mem (or --mem-per-cpu) or reduce the job’s memory usage.

Node-Level Failures vs Application Errors

Not every failure is your fault.

Application Errors

  • Segmentation faults
  • Python tracebacks
  • Missing libraries

These point to issues in your code or environment.

System or Node Issues

  • Block device required
  • I/O error
  • Node unreachable messages

These suggest problems with the compute node, filesystem, or scheduler.

If multiple jobs fail on the same node, it’s a strong signal of a node issue.

Environment and Dependency Problems

A job might fail simply because something isn’t loaded.

Look for:

command not found
module: not found
libXYZ.so: cannot open shared object file

These errors usually mean:

  • Missing modules
  • Incorrect environment setup
  • Wrong software versions

Double-check your module loads and environment variables.

MPI and Multi-Node Clues

For parallel jobs, logs can get noisy. Focus on patterns:

  • Rank-specific failures
  • Communication errors
  • Timeouts

Examples include:

MPI_ABORT was invoked
NCCL error
connection timed out

These often point to network issues, misconfiguration, or mismatched libraries.
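
The messages quoted so far can be grouped into rough categories to speed up triage. This Python sketch uses only strings that appear in this article. The category names and the grouping are my own assumptions, and real logs will need more patterns.

```python
import re

# Category -> patterns quoted in this article. The grouping is an assumption.
CATEGORIES = {
    "memory": [r"oom-kill", r"out of memory", r"\bKilled\b"],
    "environment": [r"command not found", r"module: not found",
                    r"cannot open shared object file"],
    "mpi": [r"MPI_ABORT", r"NCCL error", r"connection timed out"],
    "node": [r"Block device required", r"I/O error", r"Node unreachable"],
}


def classify(line: str):
    """Return the first category whose pattern matches the line, else None."""
    for category, patterns in CATEGORIES.items():
        if any(re.search(p, line, re.IGNORECASE) for p in patterns):
            return category
    return None


def summarize(log_text: str) -> dict:
    """Count matching lines per category across a whole log."""
    counts = {}
    for line in log_text.splitlines():
        category = classify(line)
        if category:
            counts[category] = counts.get(category, 0) + 1
    return counts
```

A summary dominated by one category tells you where to look first. A mix of categories usually means an early failure cascaded.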

Timing and Resource Clues

Sometimes the issue isn’t a crash, but inefficiency or limits.

Look for:

  • Jobs stopping exactly at walltime
  • Slow startup or long idle times
  • Uneven resource usage

Slurm accounting tools like sacct and seff can complement logs and give a clearer picture.
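
As a sketch of how sacct output can be pulled into a script, the Python below builds a sacct command with pipe-delimited output and parses the result. The field selection is just one reasonable choice, and the sample output is illustrative, not captured from a real cluster.

```python
FIELDS = ["JobID", "State", "ExitCode", "Elapsed", "MaxRSS"]


def sacct_command(job_id: str) -> list:
    """Build a sacct invocation with pipe-delimited, header-free output."""
    return ["sacct", "-j", job_id, "--format=" + ",".join(FIELDS),
            "--parsable2", "--noheader"]


def parse_sacct(output: str) -> list:
    """Turn sacct --parsable2 output into one dict per job step."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            rows.append(dict(zip(FIELDS, line.split("|"))))
    return rows


# Illustrative output, not captured from a real cluster.
sample = "12345|FAILED|1:0|00:02:11|\n12345.batch|FAILED|1:0|00:02:11|1024K\n"
steps = parse_sacct(sample)
```

On a login node you could feed it real data with subprocess.run(sacct_command("12345"), capture_output=True, text=True).stdout.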

Build a Debugging Habit

Instead of reacting randomly to failures, follow a consistent approach:

  1. Check exit code
  2. Read stderr from top to bottom
  3. Identify the first real error
  4. Correlate with resource usage and job settings
  5. Verify environment and dependencies

Over time, patterns become familiar, and debugging gets faster.

Final Thoughts

Logs are not just noise. They are structured clues about what went wrong and why.

The more time you spend understanding them, the less time you waste guessing. In HPC environments, that difference matters.

STRATEGY.md as code — turning a doc nobody reads into an LLM contract

TL;DR:

  • Strategy docs that live in Notion or in a slide deck rarely affect day-to-day decisions, because nobody reads them at the moment a decision happens. LLM agents have the same problem in a worse form: they reinvent generic best practice every session.
  • STRATEGY.md as code is the pattern of writing strategy as a file the agent must load before acting. Persona, USP, brand voice, goals, operation mode, and explicit constraints become inputs to the LLM’s reasoning, not slides for a quarterly review.
  • The pattern generalizes. INCIDENT_RESPONSE.md for SREs, INVESTMENT_THESIS.md for funds, THREAT_MODEL.md for security teams — anywhere the local rules differ from the generic best practice an LLM was trained on, a contract file closes that gap.

I have a Notion page from 2024 titled “FY24 Marketing Strategy”. It is 18 pages. Three people on the team can find it. Two have read it. One read it once, then forgot it existed. Last quarter we made a budget reallocation decision that contradicted page 7 of that document, and it took six weeks for someone to notice.

This is the normal state of strategy documents. They are written to be approved, not to be consulted. The act of writing them is what the organization wants; the act of reading them is what nobody has time for. The Notion page exists so that, in a future meeting, somebody can say “as discussed in our strategy doc”. It does not exist so that, on a Tuesday afternoon when you are deciding whether to bid on a competitor’s brand term, you reread page 7.

LLM agents inherit this problem and make it worse. Every new chat session starts from zero context. The agent has read the entire public internet but knows nothing about your business. Ask Claude or ChatGPT a question about your ad account, your codebase, or your investment thesis, and it will helpfully return whatever the average answer to that kind of question looks like. “For a B2B SaaS account, you should consider…” Yes. For a B2B SaaS account. Not for mine.

The fix is not “prompt better”. The fix is to write the business-specific constraints down in a file the agent loads at the start of every session, and to design the agent’s tools so that constraints in that file are evaluated against the actions it proposes. Strategy as a contract, not a deck. That is what STRATEGY.md as code means in practice.

What goes in the file

The shape that has held up for me, after writing this kind of file for an ad-ops product (mureo) and watching agents use it across a few hundred sessions, is six sections. Each one is a different kind of constraint on what the LLM is allowed to recommend.

Persona answers “who are we selling to”. The agent reads this and stops generating ad copy aimed at the wrong audience. Without a persona, an LLM defaults to the median small business; with one, it speaks to a specific buyer.

USP answers “what is the differentiator”. The agent reads this and stops recommending tactics that flatten the differentiator into commodity competition. If your USP is “the only one with X”, the agent should not be suggesting price-led headlines.

Brand voice answers “how do we sound”. This one is the easiest to underestimate. An LLM with no brand voice constraint produces text that sounds like an LLM. With three sentences of “no exclamation marks, no superlatives, no metaphors about journeys”, the output stops sounding generic.

Goals answer “what numbers matter and by when”. A goal is Target | Deadline | Current | Priority. The agent reads this and prioritizes the metric you actually care about, not the metric most legible on a dashboard.

Operation mode answers “what is the current posture”. I use seven values internally: ONBOARDING_LEARNING, TURNAROUND_RESCUE, SCALE_EXPANSION, EFFICIENCY_STABILIZE, COMPETITOR_DEFENSE, CREATIVE_TESTING, LTV_QUALITY_FOCUS. The exact set is less interesting than the fact that there is a set, with rules about which actions each mode permits. Mode is what flips an agent from “find new keywords to test” to “do not change anything, the algorithm is still learning”.

Constraints answer “what is forbidden, and why”. This is where the document earns its keep. Generic best practice plus your specific exceptions. The constraints section in our demo STRATEGY.md for a fitness-app account, for example, contains three rules:

1. No competitor-name bidding on either platform.
   Past attempt cost JPY 600K/quarter and produced 8 subscriptions
   at JPY 75K CPS — 16x worse than baseline.
2. Conversion campaigns optimize for "Subscribed" (paid signup),
   never for "Started trial". Trial-volume optimization trains the
   ad platforms toward shallow signups whose downstream trial-to-paid
   rate falls below the 12% quality floor.
3. Meta Lookalike stack capped at 3 active variants, and the
   variants must be 1%, 2%, 5% (or smaller). Past stacking of LAL 7%
   and 10% caused audience overlap, frequency above 8x, and CTR
   collapse within 14 days.

A vanilla LLM has no idea about any of those rules. They are the residue of past mistakes. Each rule has a number and a reason attached, because rules without reasons get ignored the moment they look inconvenient. With this file loaded, the agent has access to the same accumulated organizational scar tissue that a senior team member would carry in their head — and unlike that team member, the agent loads it on every single decision.

What “as code” actually changes

A document is read by humans, sometimes. A contract file is loaded by an agent, always. The difference is mechanical: the file goes through a tool call into the LLM’s context window before any reasoning happens.

In mureo’s case, v0.8.0 (released 2026-05-02) shipped five MCP tools that expose STRATEGY.md and STATE.json to the agent — mureo_strategy_get, mureo_strategy_set, mureo_state_get, mureo_state_action_log_append, mureo_state_upsert_campaign. The names are boring on purpose. What matters is the side effect: any host that speaks MCP — Claude Desktop chat, Cowork, the web UI, any of them — can now read the strategy file without needing direct filesystem access. Read and write. The agent can also propose updates to the strategy when the human says “we just changed our quarterly target”, and the file moves with the conversation.

The reason this matters more than “the agent reads a markdown file” might suggest: most LLM hosts outside Claude Code do not have Read and Write filesystem tools. If your strategy lives in a local file, the agent in the chat window literally cannot see it. Exposing the file through an MCP tool is the bridge. The contract becomes part of the agent’s input regardless of which client it is talking to.

The other thing that moves once the file is a contract: the agent’s recommendations stop being LLM-generic. Concretely, here is what /daily-check produces today against a synthetic D2C cosmetics scenario where Meta CPA has spiked because of a broken Pixel:

The single biggest story: Meta CPA is 5.2x Google CPA — well past the STRATEGY.md “50% sibling-channel divergence ⇒ diagnose before more spend” tripwire — and three prior manual cuts have worsened the curve.

Google Ads (last 30d): blended CPA ¥2,054. Healthy.
Meta Ads (last 30d): blended CPA ¥10,714 against a ≤¥4,500 target.

Recommend: run /rescue (pixel / Conversions API audit) on Meta. Hold all Meta bid/budget moves until divergence is diagnosed.

Read that closely. The agent quoted a constraint by name — “50% sibling-channel divergence ⇒ diagnose before more spend” — that exists nowhere outside of this account’s STRATEGY.md. A vanilla LLM looking at the same numbers would tell you Meta CPA is high and suggest pausing underperformers. That is exactly what the human manager in the demo did, three times over twenty-five days, and it made things worse, because pausing a campaign whose tracking is broken does not fix the tracking.
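
The tripwire itself is simple enough to write down as a check. The Python below is a sketch, not mureo's code. It assumes "divergence" means the relative gap between the two channel CPAs, measured against the lower one, and the function names are hypothetical.

```python
def divergence(cpa_a: float, cpa_b: float) -> float:
    """Relative gap between two channel CPAs, measured against the lower one."""
    low, high = sorted((cpa_a, cpa_b))
    return (high - low) / low


def daily_check(google_cpa: float, meta_cpa: float, tripwire: float = 0.5) -> str:
    """Apply the 'diagnose before more spend' rule from the contract."""
    if divergence(google_cpa, meta_cpa) > tripwire:
        return "diagnose"  # hold bid/budget moves until the gap is explained
    return "proceed"


ratio = 10714 / 2054            # the 5.2x figure quoted in the report
gap = divergence(2054, 10714)   # about 4.2, far above the 0.5 tripwire
decision = daily_check(2054, 10714)
```

With the numbers from the report, the check returns "diagnose", which matches the recommendation to hold Meta bid and budget moves.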

The contract was the thing standing between the LLM and the same mistake.

A second example, less dramatic, more typical

The seasonality-trap demo is the loud kind of failure. The strategy-drift demo is the quiet kind, and honestly the more important one.

A subscription fitness app. STRATEGY.md says no competitor bidding, optimize for Subscribed not Trial, cap Meta Lookalike at three variants. A new growth manager joins on Day 30 and, over the next month, violates each rule. They launch a “Competitor Names” campaign. They flip the Meta optimization target from Subscribed to Started Trial. They add LAL 4%, 7%, 10% on top of the existing 1%, 2%, 5%.

None of the three actions appears in the dashboard as red. The competitor campaign generates apparent volume. The optimization swap inflates the result count because trials are easier than subscriptions. The bigger Lookalike stack reaches more people. Each violation is paired with a better-looking surface metric. The action_log in STATE.json shows zero new entries since Day 30. That silence is itself the diagnostic signal, because the strategy says all actions must be logged.

mureo’s STRATEGY-vs-STATE compliance audit walks the constraints, reads the campaign list, and produces a violation report with start dates and JPY-impact estimates. Not because the agent is clever. Because the rules and the state are both in files the agent can load, and “is rule N violated?” is a check the agent can perform mechanically. The cleverness is upstream — in the decision to write the rules down with reasons attached, instead of relying on tribal knowledge that the new manager never absorbed.
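
To show how mechanical the check is, here is a minimal Python sketch of the three demo constraints evaluated against a state dictionary. It is not mureo's audit. The state layout, the field names, and the audit function are assumptions for illustration.

```python
def audit(state: dict) -> list:
    """Check campaign state against the three demo constraints."""
    violations = []
    for campaign in state["campaigns"]:
        # Rule 1: no competitor-name bidding.
        if campaign.get("targets_competitor_names"):
            violations.append((1, campaign["name"], "competitor-name bidding"))
        # Rule 2: conversion campaigns optimize for "Subscribed" only.
        goal = campaign.get("optimization_goal", "Subscribed")
        if goal != "Subscribed":
            violations.append((2, campaign["name"], f"optimizes for {goal}"))
    # Rule 3: at most 3 Lookalike variants, each 5% or smaller.
    lookalikes = state.get("meta_lookalike_percents", [])
    if len(lookalikes) > 3 or any(p > 5 for p in lookalikes):
        violations.append((3, "Meta Lookalike stack",
                           f"{len(lookalikes)} variants, largest {max(lookalikes)}%"))
    return violations


# Hypothetical state after the new manager's changes.
state = {
    "campaigns": [
        {"name": "Brand", "optimization_goal": "Subscribed"},
        {"name": "Competitor Names", "targets_competitor_names": True,
         "optimization_goal": "Subscribed"},
        {"name": "Meta Prospecting", "optimization_goal": "Started Trial"},
    ],
    "meta_lookalike_percents": [1, 2, 5, 4, 7, 10],
}
report = audit(state)
```

Each rule becomes a plain predicate over state, which is only possible because the rule was written down with its numbers attached.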

I find this scenario more useful for explaining the pattern than the seasonality one. Loud failures get caught eventually; the manager will eventually realize their cuts are not working. Quiet drift compounds for months because the surface metrics keep looking fine. A contract file is the only thing that catches it.

Where else the pattern fits

Marketing is not special here. Anywhere an LLM agent operates and the local rules differ from the generic best practice it was trained on, the contract pattern works.

For SREs running incident response, an INCIDENT_RESPONSE.md codifies severity classification, escalation chains, the postmortem template, and the explicit anti-patterns (“do not page the on-call DBA for read replica lag below 30 seconds — it self-resolves”). An on-call agent reading that file at the start of a page makes calibrated decisions instead of fashionable ones.

For an investment fund, an INVESTMENT_THESIS.md codifies the thesis itself, position sizing rules, sell triggers, and the kind of trade the fund refuses to make regardless of expected value. An agent doing pre-meeting prep with that file in context surfaces concerns that match the fund’s actual operating principles, not LinkedIn’s.

For a security team, a THREAT_MODEL.md lists assets in priority order, threats per asset, and the mitigations already in place. An agent triaging a CVE alert can compare it against the model and decide whether the vuln matters for this threat surface, not whether it is generally bad.

The shape under all three is the same: a file the agent loads, written in a format the agent can parse, containing the local rules that overrule generic best practice. The format does not have to be markdown. It has to be loadable, parseable, and short enough that loading it does not blow the context window.

The hardest part is not the format. It is convincing yourself to write the rules down, with reasons attached, instead of trusting them to live in three senior people’s heads.

What does not work

Honest limitations, because the pattern is not magic.

The contract is exactly as good as the human who wrote it. A vague STRATEGY.md produces vague reasoning. “Be data-driven” is not a constraint. “Do not change bids on a campaign whose 7-day conversion volume is below 30 unless the action_log shows two prior weeks of stable spend” is a constraint. The first one cannot be checked; the second one can.
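
The second constraint can be turned into a function, which is a decent test of whether a rule is checkable at all. In this Python sketch, "stable spend" is assumed to mean a week-over-week change within 10%. That threshold and the function signature are my assumptions, not part of the rule as written.

```python
def may_change_bids(conversions_7d: int, weekly_spend: list,
                    tolerance: float = 0.10) -> bool:
    """Return True when the constraint permits a bid change."""
    if conversions_7d >= 30:
        return True  # enough volume: the rule does not apply
    if len(weekly_spend) < 2:
        return False  # not enough history to show stable spend
    previous, latest = weekly_spend[-2], weekly_spend[-1]
    if previous <= 0:
        return False
    # "Stable" is assumed to mean week-over-week change within the tolerance.
    return abs(latest - previous) / previous <= tolerance
```

Try writing the same function for "Be data-driven" and the difference becomes obvious: there is nothing to compare and nothing to return.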

Tacit knowledge still escapes. A senior ad ops person knows the brand without writing it down. They know not to bid on certain compound terms because of a regulatory issue from 2019 that nobody bothered to document. The agent does not know any of that. Writing those things down is the work, and most of the work happens after the first version of the file ships, when the agent makes a recommendation that violates an unwritten rule and the human has to either correct the agent or update the file. The file gets better; the unwritten rules shrink.

A grounded agent is not automatically a careful agent. The agent can quote the constraint and still recommend a bad action — confident-sounding writing is the LLM’s default. The contract reduces a class of mistakes (generic recommendations that ignore local rules); it does not eliminate the meta-class of “LLMs are sometimes confidently wrong”. Approval gates and rollback paths still matter; the contract is the input layer, not the safety layer.

The file has to be maintained. A STRATEGY.md that goes a year without edits reflects either a stable business or neglect. Most are the second case. The fix is to make the agent propose edits and the human approve them (mureo_strategy_set exists for that), but the discipline of actually doing it through quarterly review still has to come from a human who cares.

A small thing, if you want to try the pattern

Pick the smallest constraint your team has paid for in past mistakes. One sentence. Write it down with a number attached and a one-line reason. Put it in a file your LLM-using tooling can read at the start of a session. See if the next decision the agent surfaces is meaningfully different.

If it is, you have the start of a contract. Add a constraint per week as new edges of “the agent gave a generic answer that does not fit our context” become visible. Within a quarter you will have something a senior teammate could read and say “yes, that is how we actually operate”. At which point you have something the LLM can also read.

mureo is one implementation of this pattern, focused on ad ops. The repo is at github.com/logly/mureo; the strategy doc format is in docs/strategy-context.md, the demo scenarios that exercise it live in mureo/demo/scenarios/. I would be more interested in seeing the pattern picked up outside marketing than in seeing more marketing-specific implementations of it. If you build an INCIDENT_RESPONSE.md or INVESTMENT_THESIS.md flavor of the same idea, I am at @yoshinaga on X — show me what you ended up with.

Yoshinaga (founder, mureo)

PyTorch vs. TensorFlow: Choosing the Right Framework in 2026

Choosing between PyTorch and TensorFlow isn’t about finding the “better” framework – it’s about finding the right fit for your project. Both power cutting-edge AI systems, but they excel in different domains. PyTorch dominates research and experimentation, while TensorFlow leads in production deployment at scale.

The frameworks have evolved significantly since their early days, each building tools and capabilities to support research and production. Despite these improvements, fundamental differences remain in their philosophies, ecosystems, and ideal use cases, which will naturally influence which framework will best fit your project.

This guide examines where each framework shines, compares them across key dimensions, and helps you choose the right tool for your natural language processing, computer vision, and reinforcement learning projects.

What sets PyTorch and TensorFlow apart?

PyTorch and TensorFlow took different approaches from day one. Google launched TensorFlow in 2015, focusing on production deployment and enterprise scalability. Meta released PyTorch in 2016, prioritizing research flexibility and Pythonic development. These roots still shape each framework today.

The key difference between the two lies in computational graphs. PyTorch uses dynamic graphs that execute operations immediately, making debugging natural – you use standard Python tools and inspect tensors at any point. TensorFlow originally required static graphs defined before execution, though version 2.x now defaults to eager execution while retaining optional graph compilation for performance.
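
To see what "dynamic graph" means without installing either framework, here is a toy define-by-run autodiff in plain Python. It is a teaching sketch under heavy simplification (scalars only, two operations) and does not reflect how PyTorch or TensorFlow are implemented internally. The Value class and model function are hypothetical names.

```python
class Value:
    """A scalar that records how it was computed, as the code runs."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents    # Values this one was computed from
        self._grad_fns = grad_fns  # maps output grad to each parent's grad

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        """Propagate gradients back through the recorded graph."""
        order, seen = [], set()

        def visit(node):  # depth-first walk yields a topological order
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, fn in zip(node._parents, node._grad_fns):
                parent.grad += fn(node.grad)


def model(x):
    # Ordinary Python control flow decides the graph's shape on each call.
    if x.data > 0:
        return x * x
    return x + x


a = Value(3.0)
ya = model(a)
ya.backward()   # graph was x*x, so the gradient is 2x

b = Value(-3.0)
yb = model(b)
yb.backward()   # graph was x+x, so the gradient is 2
```

The graph is rebuilt on every call from whatever code actually ran, so an if statement or a breakpoint behaves exactly as it would in any other Python program. A static graph would need that branch expressed as a graph operation before execution.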

Market data shows TensorFlow holds a 37% market share, while PyTorch commands 25%. But the research tells a different story: PyTorch powers 85% of deep learning papers presented at top AI conferences.

PyTorch: Strengths and weaknesses

PyTorch’s Pythonic API treats models like regular Python code, making development feel intuitive from the start. The framework’s dynamic computational graphs execute operations immediately rather than requiring upfront model definition, fundamentally changing how you approach debugging and experimentation.

This design philosophy has made PyTorch the dominant choice in research, where flexibility matters more than deployment infrastructure. However, this research-first design means production deployment tools remain less mature than TensorFlow’s enterprise infrastructure.

PyTorch strengths

  • Intuitive, Pythonic API: Models use standard Python syntax with minimal framework-specific concepts, reducing the learning curve dramatically compared to other frameworks.
  • Dynamic graphs enable natural debugging: Set breakpoints in training loops, inspect tensor values mid-execution, and modify architectures on the fly using tools you already know.
  • Priority access to the latest techniques: Because of its research dominance, when cutting-edge architectures or methods emerge, they’re implemented in PyTorch before anywhere else.
  • Strong ecosystem: Libraries like PyTorch Lightning handle training loops and best practices automatically, letting you focus on model architecture.

PyTorch weaknesses

  • Production deployment tools are less mature: Deployment options lag behind TensorFlow’s battle-tested infrastructure, so you need to do more setup work for production systems.
  • Mobile and edge deployment is limited: PyTorch Mobile is functional but less polished than TensorFlow Lite for smartphones and IoT devices.
  • Dynamic nature complicates optimization: The flexibility that aids development can make optimization for production performance harder without additional tools like TorchScript.
  • Smaller enterprise adoption: Fewer production patterns and case studies compared to TensorFlow’s extensive enterprise documentation.

TensorFlow: Strengths and weaknesses

TensorFlow’s production ecosystem provides you with a comprehensive infrastructure for deploying models at scale. Google built the framework specifically for enterprise environments where reliability, performance, and deployment flexibility matter most.

This production-first approach created mature tooling for serving, mobile optimization, and MLOps that PyTorch is still catching up to. The trade-off comes in development experience – TensorFlow’s API can feel more complex and less intuitive than PyTorch’s streamlined approach.

TensorFlow strengths

  • Mature production deployment tools: Battle-tested infrastructure with TensorFlow Serving for high-throughput serving, TensorFlow Lite for mobile, and TensorFlow.js for browsers.
  • Superior mobile and edge optimization: TensorFlow Lite delivers industry-standard performance and comprehensive device support for smartphones and edge devices.
  • Strong enterprise adoption: Proven production patterns used by thousands of companies, with extensive documentation for scaling systems serving millions of predictions.
  • Comprehensive MLOps tooling: TensorFlow Extended (TFX) gives you end-to-end pipelines for production ML workflows, from data validation through model monitoring.
  • TPU support for large-scale training: Access to Google’s specialized Tensor Processing Units for training at massive scale with performance advantages over GPU infrastructure.

TensorFlow weaknesses

  • Steeper learning curve: More complexity when implementing custom models or debugging issues, even with Keras integration simplifying high-level operations.
  • More verbose code for custom work: Novel architectures or training procedures require significantly more code compared to PyTorch’s streamlined approach.
  • Larger, less cohesive API: Broader API surface with multiple ways to accomplish the same task creates confusion and longer learning curves.
  • Debugging can be challenging: Graph-related issues may require you to understand TensorFlow’s internal execution model despite eager execution improvements.
  • Slower adoption of research techniques: New methods from research papers typically take longer to appear in TensorFlow compared to PyTorch.

If you’re new to TensorFlow and want a hands-on starting point, check out How to Train Your First TensorFlow Model in PyCharm, where you’ll build and train a simple model step by step using Keras and visualize the results.

PyTorch vs. TensorFlow: Head-to-head comparison

Choosing between PyTorch and TensorFlow isn’t always straightforward, and there are many factors to consider. 

The table below provides a high-level head-to-head comparison of PyTorch and TensorFlow so you can quickly assess which framework generally fits your needs. We’ll later consider project-specific scenarios and provide a detailed decision matrix to guide your choice.

Dimension | PyTorch | TensorFlow
Learning curve | Easier: Pythonic and intuitive | Steeper: more complex API despite Keras
Debugging | Excellent: standard Python tools work naturally | Good: improved with eager execution
Production deployment | Improving: TorchServe and TorchScript available | Excellent: mature ecosystem (Serving, Lite, JS)
Research/experimentation | Dominant: 85% of deep-learning research papers | Present: but trailing PyTorch in adoption
Community ecosystem | Research-focused: Hugging Face, PyTorch Lightning | Enterprise-focused: TFX, strong cloud integration
Performance at scale | Strong: DDP for distributed training | Strong: graph optimization, TPU support
Industry adoption | Growing: used by 15,800+ companies | Established: used by more than 23,000 companies

PyTorch vs. TensorFlow for different use cases and applications 

Your framework choice depends heavily on what you’re building. Here’s how PyTorch and TensorFlow stack up for major machine learning domains.

Natural language processing

PyTorch dominates NLP with no signs of slowing. The Hugging Face Transformers library – the de facto standard for working with language models – started as a PyTorch-only framework and later added TensorFlow support as a secondary option. When you’re fine-tuning transformers, implementing custom attention mechanisms, or experimenting with novel architectures, PyTorch’s flexibility accelerates your iteration.

Verdict: PyTorch leads NLP decisively. Choose TensorFlow only if you have specific mobile deployment requirements that override all other considerations.

Computer vision

Computer vision presents a more balanced landscape for your projects. PyTorch benefits from research momentum – when you’re developing novel detection algorithms or experimenting with architectures, you’ll find state-of-the-art implementations appear in PyTorch first. TensorFlow excels for building production CV systems, especially for mobile object detection or on-device image classification, where TensorFlow Lite’s optimization matters most.

For a hands-on example, watch the video on how to build a TensorFlow object detection app, which shows how to take a pre-trained model and turn it into a real-time object detection app running on a robot in PyCharm.

Verdict: Use case dependent. Choose PyTorch for research and novel architectures, TensorFlow when your deployment priorities favor mobile and edge devices.

Reinforcement learning

PyTorch holds a slight edge in reinforcement learning, driven by the research community’s preference for it. When you’re implementing custom RL algorithms, modifying reward functions dynamically, or debugging agent behavior, PyTorch’s flexibility serves you better. TensorFlow offers solid capabilities through TF-Agents for production RL systems at scale.

Verdict: Choose PyTorch for RL research and experimentation, or TensorFlow for building large-scale, production-grade RL systems such as recommendation engines.

Tooling and developer experience in PyCharm

PyCharm provides comprehensive support for both frameworks, streamlining your development workflow regardless of which you choose.

  • Debugging: Set breakpoints in training loops, inspect tensor values, and step through model forward passes using the integrated debugger that works naturally with PyTorch’s dynamic graphs and TensorFlow’s eager execution.
  • Jupyter notebook support: Prototype in notebooks, inspect data transformations visually, then move to scripts for production training with seamless integration.
  • Package management: Handle complex dependency trees and CUDA requirements using virtual environment management to prevent conflicts between frameworks.
  • Remote interpreters: Connect to remote GPU servers, develop locally while training remotely, and sync code automatically to take advantage of powerful hardware without leaving your IDE.
  • TensorBoard integration: Track training metrics, visualize model graphs, and compare experiments within PyCharm using native TensorFlow support or torch.utils.tensorboard for PyTorch.
  • Code completion: Get framework-specific suggestions for layer definitions, optimizer configurations, and data pipeline operations that reduce errors and accelerate development.

Performance, scalability, and deployment

Training performance barely differs between frameworks for most workloads – both handle GPU training efficiently with comparable speeds. TensorFlow gains an edge when you need TPU support for large-scale training, offering more mature integration with Google’s specialized hardware. For multi-GPU scaling, both deliver strong performance with PyTorch’s DDP and TensorFlow’s MirroredStrategy.

Deployment scenarios differentiate the frameworks more clearly. TensorFlow Serving handles production model serving at scale with built-in versioning and A/B testing that PyTorch’s TorchServe can’t yet match in maturity. When deploying to mobile devices or edge hardware, TensorFlow Lite provides industry-standard optimization through quantization and pruning. For browser deployment, TensorFlow.js offers more integrated, optimized inference compared to serving PyTorch models via ONNX Runtime.

Memory management affects development experience – PyTorch’s caching allocator handles GPU memory efficiently with dynamic batch sizes, causing fewer surprises when experimenting with different model configurations.

Community, ecosystem, and library support

PyTorch’s research dominance created a vibrant, innovation-focused community that accelerates development. The PyTorch Conference 2024 saw triple the registrations versus 2023, and when cutting-edge techniques emerge, they tend to appear in PyTorch first. The Hugging Face ecosystem amplifies this advantage – with more than 220,000 PyTorch-compatible models versus around 15,000 for TensorFlow, you are far more likely to find a pre-trained starting point, which makes a tangible difference in development speed.

TensorFlow’s community skews toward production engineering, providing comprehensive enterprise-grade documentation and proven deployment patterns. Google’s backing ensures strong cloud platform integrations, particularly with Google Cloud, offering managed services that reduce operational complexity. The Model Garden provides production-ready implementations optimized for deployment rather than research experimentation.

Learning resources reflect these different audiences – PyTorch tutorials emphasize research workflows and novel implementations, while TensorFlow documentation prioritizes production deployment patterns and enterprise-scale systems.

Choosing the right framework for your project

Many successful teams use both frameworks strategically – researching and experimenting in PyTorch, then deploying in TensorFlow. The frameworks aren’t mutually exclusive. You can use ONNX to enable model conversion between them when needed.

When making a choice, it helps to prioritize the factors most relevant to your project: mobile deployment requirements may override other considerations, research-heavy work might make PyTorch essential, and enterprise support with MLOps integration could tip the scales toward TensorFlow.

Use the table below to match your project requirements with the framework strengths. 

Decision factor | PyTorch | TensorFlow
By use case
Natural language processing | ✅ NLP standard choice | Only if mobile deployment is critical
Computer vision | ✅ Research/novel architectures | ✅ Production mobile/edge apps
Reinforcement learning | ✅ Research and experimentation | ✅ Large-scale production RL
By experience level
Beginner | ✅ More intuitive API | Keras simplifies learning
Intermediate/Advanced | ✅ Research and prototyping | ✅ Production systems at scale
By project phase
Research/Experimentation | ✅ Dynamic graphs aid iteration | Graph compilation for optimization
Rapid prototyping | ✅ Fast experimentation | Keras for simple models
Production deployment | TorchServe improving | ✅ Mature deployment tools
By deployment target
Cloud/Server | Strong performance | ✅ Strong performance, slight GCP advantage
Mobile/Edge devices | Basic support via PyTorch Mobile | ✅ TensorFlow Lite industry standard
Web applications | Via ONNX Runtime | ✅ TensorFlow.js optimized
By team context
Research-focused team | ✅ Natural fit for researchers | If already using TensorFlow
Production-focused team | If comfortable with tooling | ✅ Proven enterprise patterns
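
If you want to apply the table to your own priorities, one rough way is to weight the factors that matter to you and add up the check marks. The sketch below is an assumption-laden illustration, not an official scoring method: the factor names, the 1/0 encoding of check marks, and the recommend function are all invented here, and only a handful of rows are transcribed.

```python
# 1 where the decision table shows a check mark for the framework, 0 otherwise.
# This encoding and the factor names are assumptions made for this sketch.
CHECKS = {
    "nlp": {"pytorch": 1, "tensorflow": 0},
    "research": {"pytorch": 1, "tensorflow": 0},
    "rapid_prototyping": {"pytorch": 1, "tensorflow": 0},
    "production_deployment": {"pytorch": 0, "tensorflow": 1},
    "mobile_edge": {"pytorch": 0, "tensorflow": 1},
    "web": {"pytorch": 0, "tensorflow": 1},
}


def recommend(weights):
    """Return (framework, totals) for a dict of factor -> importance weight."""
    totals = {"pytorch": 0.0, "tensorflow": 0.0}
    for factor, weight in weights.items():
        for framework, check in CHECKS[factor].items():
            totals[framework] += weight * check
    # On a tie, max() returns the first key, so "pytorch" wins by dict order only.
    return max(totals, key=totals.get), totals


choice, totals = recommend({"nlp": 3, "research": 2, "mobile_edge": 1})
print(choice, totals)
```

Treat the result as a conversation starter rather than a verdict: a single hard requirement, such as on-device inference, can outweigh any weighted total.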