What Are The Security Risks of CI/CD Plugin Architectures?

CI/CD pipelines are deeply embedded in modern software delivery. They interact with source code, secrets, cloud credentials, and production deployment targets. 

That position makes them an attractive target for attackers, and the plugin ecosystems that power many CI/CD platforms are an increasingly common point of entry.

This article explains how plugin-centric CI/CD architectures create security risk, what the vulnerability data actually shows, and how integrated platforms handle these risks differently. 

We’ll also be direct about TeamCity’s own security history, because we think that context matters when a CI/CD vendor writes about security.

What is a plugin-centric CI/CD architecture?

A plugin-centric CI/CD architecture is one where core platform functionality (integrations, triggers, build steps, notifications, and so on) is delivered through independently developed and maintained plugins rather than built into the platform itself.

Jenkins is the most widely used example. The Jenkins ecosystem includes thousands of community plugins, each maintained separately, with its own release cycle, security practices, and maintenance status.

This model offers significant flexibility. It’s also what introduces a specific class of security risk.

What are the security risks of CI/CD plugins?

When you rely on a plugin-centric CI/CD architecture, you run the risk of introducing any of these systemic weaknesses:

  • Decentralized development: Community-driven plugin development can result in inconsistent security standards and delayed patching of vulnerabilities. Simply put, you’re not in control of the plugin developer’s coding or security practices.
  • Plugin abandonware: Some plugins may no longer be maintained, leaving known vulnerabilities unaddressed.
  • Opaque dependencies: Complex interdependencies between plugins can create hidden attack surfaces that are difficult to monitor and secure.
  • Excessive permissions: Plugins often require broad permissions, which increases the potential impact of a compromised or vulnerable plugin.

These weaknesses amplify the risk of security breaches and complicate efforts to maintain a secure CI/CD environment.
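To make the "plugin abandonware" point concrete, here's a minimal Python sketch that flags plugins with no release in over a year. The inventory format, plugin names, and dates are invented for illustration, not pulled from any real update site:

```python
from datetime import date

# Hypothetical inventory: plugin name -> date of its last release.
INVENTORY = {
    "git-parameter": date(2025, 6, 1),
    "legacy-notifier": date(2021, 3, 14),
    "build-stats": date(2019, 11, 2),
}

def flag_stale(inventory, today, max_age_days=365):
    """Return plugin names whose last release is older than max_age_days."""
    return sorted(
        name for name, last in inventory.items()
        if (today - last).days > max_age_days
    )

stale = flag_stale(INVENTORY, today=date(2025, 12, 1))
print(stale)  # ['build-stats', 'legacy-notifier']
```

A check like this won't tell you whether a stale plugin is vulnerable, but it gives you a shortlist to investigate.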

How many security vulnerabilities do Jenkins plugins have?

In 2025 alone, more than seventy security vulnerabilities have been found in Jenkins, most of them related to plugins. These range from CVE-2025-31722 in the Templating Engine plugin, which allows potential remote code execution due to insufficient sandboxing, to CVE-2025-53652 in the Git Parameter plugin, where misconfigured parameters can be abused for command injection.

Many of these vulnerabilities remain unpatched in live environments long after fixes are available. Last year, the Shadowserver Foundation detected over forty-five thousand internet-exposed Jenkins servers still vulnerable to CVE-2024-23897, indicating that attackers actively scan for and attempt to exploit outdated instances.
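Checking whether an instance is past a fix release is a simple version comparison. A sketch follows; the fix versions used here (2.442 for weekly, 2.426.3 for LTS) are widely reported for CVE-2024-23897, but treat them as values to verify against the official advisory rather than authoritative:

```python
def parse_version(v):
    """Split a Jenkins-style version string ('2.426.3') into an integer tuple."""
    return tuple(int(part) for part in v.split("."))

def is_patched(installed, fixed_weekly="2.442", fixed_lts="2.426.3"):
    """True if the installed version is at or past the relevant fix release.

    LTS versions have three components, weekly releases two. The fix
    versions are defaults for illustration -- confirm them in the advisory.
    """
    inst = parse_version(installed)
    fixed = fixed_lts if len(inst) == 3 else fixed_weekly
    return inst >= parse_version(fixed)

print(is_patched("2.440"))    # False: weekly release before the fix
print(is_patched("2.426.3"))  # True: LTS at the fix version
```

The Shadowserver numbers suggest that even a check this simple, run regularly, would put an organization ahead of tens of thousands of exposed instances.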

In some cases, the fallout has already been severe.

Has a CI/CD plugin vulnerability ever caused a real breach?

Unfortunately, yes. The 2022 BORN Group supply chain compromise was traced back to a vulnerable Jenkins plugin. Attackers were able to use the plugin as an entry point into the broader build environment.

This incident illustrates a risk pattern that’s also present in other dependency ecosystems like npm and PyPI: a compromised or abandoned plugin that’s automatically trusted and updated by a pipeline can silently inject malicious code into builds before anyone detects it. 

CI/CD plugins sit in a particularly sensitive position because the pipeline has direct access to repositories, secrets, and deployment targets.

What is CI/CD supply chain risk?

CI/CD supply chain risk refers to the possibility that a component in your build and delivery pipeline (a plugin, a dependency, a build image) is compromised in a way that affects the software that you ship to customers.

In plugin-heavy CI/CD environments, this risk is elevated because:

  • Many plugins are maintained outside formal security oversight
  • Abandoned projects can be quietly taken over by malicious actors
  • Pipelines often automatically apply plugin updates without review
  • The CI/CD system’s privileged access means a compromised plugin can affect everything downstream

The scale of CI/CD supply chain incidents is typically smaller than high-profile npm or PyPI cases, but the access that CI/CD systems have to production infrastructure makes the potential impact significant.
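One low-effort mitigation for the auto-update risk above is pinning artifacts and verifying them before installation. A minimal sketch, with a hypothetical pin file (the artifact name and digest below are made up for illustration):

```python
import hashlib

# Hypothetical pin file: artifact name -> expected SHA-256 hex digest.
PINNED = {
    "example-plugin-1.2.3.hpi": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def verify_artifact(path, name, pinned=PINNED):
    """Return True only if the file's SHA-256 matches the pinned digest."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return pinned.get(name) == digest
```

In a pipeline, a failed verification would abort the install, turning a silent upstream compromise into a visible build failure.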


How do CI/CD plugin vulnerabilities affect compliance?

If your CI/CD pipelines process or have access to personally identifiable information (which is true of most production systems), plugin security has regulatory implications.

GDPR, SOC 2, and HIPAA don’t prescribe specific CI/CD configurations, but they do require organizations to implement adequate security controls and maintain auditability over systems that handle protected data.

An unpatched plugin with known vulnerabilities, sitting inside a pipeline with access to production secrets, is a reasonable finding in a security audit.

Compliance teams and legal counsel are increasingly aware of CI/CD as a risk surface. It’s no longer a concern that can stay entirely within the engineering team.

How do integrated CI/CD platforms handle plugin security differently?

Integrated CI/CD platforms bundle core functionality natively rather than relying on external plugins for essential features. This changes the security model in a few specific ways:

Single vendor accountability. When a vulnerability is discovered in a core platform capability, there is one responsible party, one patch cycle, and one documented upgrade path.

You don’t need to track the release schedules of dozens of independent plugin maintainers.

Narrower external dependency surface. Fewer third-party plugins means fewer external dependencies to audit, monitor, and patch. The attack surface is smaller by design.

Native security capabilities. Secret management, access controls, and audit logging built into the platform are subject to the same security standards as the rest of the product. They don’t inherit the risk profile of a community-developed add-on.

More predictable patching. A critical vulnerability in a core platform feature gets a coordinated response. In plugin ecosystems, patch availability and adoption vary widely depending on who maintains each plugin.

Has TeamCity had security vulnerabilities?

Sadly, yes.

CVE-2024-27198 was a critical authentication bypass vulnerability in TeamCity that allowed unauthenticated remote code execution. It was rated 9.8 out of 10 on the CVSS scale and required urgent patching across all affected installations.

CVE-2023-42793 was another critical authentication bypass, also allowing remote code execution without authentication, which was actively exploited in the wild by threat actors including state-sponsored groups.

These were serious incidents. We’re not in a position to claim that integrated platforms are immune to vulnerabilities: we’re not, and our own history makes that clear. 

What we can say is that when these vulnerabilities were discovered, there was a single coordinated response, clear communication to users, and a defined upgrade path.

That’s the difference integrated platforms offer: not the absence of vulnerabilities, but a more accountable response when they occur.

How do I assess my current CI/CD platform’s security risk?

Regardless of which platform you use, these questions are worth working through periodically:

  • How many plugins are active in your pipeline? Do you have a current inventory?
  • When was each plugin last updated? Are any no longer actively maintained?
  • What permissions do your plugins have? Are they scoped to what they actually need?
  • How long does it take you to apply a critical security patch? Do you have a tested process?
  • Who is responsible for plugin security in your organization? Is there clear ownership?
  • Does your CI/CD configuration receive security review? Or only your application code?

These questions apply to any CI/CD environment. The answers tell you more about your actual risk posture than any platform comparison.
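For Jenkins specifically, the first two questions can be partially automated. Jenkins exposes installed plugins at `<jenkins-url>/pluginManager/api/json?depth=1`; here is a sketch that summarizes such a response. The sample payload is hand-written in the shape that API typically returns, and field availability can vary by version:

```python
def summarize_plugins(payload):
    """Summarize a pluginManager API payload: counts plus flagged plugins."""
    plugins = payload.get("plugins", [])
    needs_update = [p["shortName"] for p in plugins if p.get("hasUpdate")]
    inactive = [p["shortName"] for p in plugins if not p.get("active", True)]
    return {
        "total": len(plugins),
        "needs_update": sorted(needs_update),
        "inactive": sorted(inactive),
    }

# Hand-written sample approximating the API's response shape.
sample = {"plugins": [
    {"shortName": "git", "version": "5.2.0", "active": True, "hasUpdate": True},
    {"shortName": "credentials", "version": "1319", "active": True, "hasUpdate": False},
    {"shortName": "old-theme", "version": "0.9", "active": False, "hasUpdate": False},
]}
print(summarize_plugins(sample))
```

Feeding a live response into a function like this turns "do we have an inventory?" from a meeting question into a one-line report.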

Is Jenkins insecure?

Not inherently. Jenkins is a mature, capable platform that thousands of engineering teams operate successfully and securely.

The security risks associated with Jenkins are largely a function of scale and the plugin model. When you have thousands of community plugins with varying maintenance quality, vulnerabilities are statistically inevitable.

Teams running Jenkins securely tend to do a few things consistently: they maintain a minimal plugin footprint, audit plugins regularly, apply patches promptly, and treat CI/CD configuration with the same rigor as application code. The operational discipline required is higher, but it’s achievable. 

The question isn’t whether Jenkins can be run securely. It’s whether your team has the capacity and processes to do so at your current scale.

When does it make sense to consider switching CI/CD platforms?

While the extensibility of many CI/CD platforms is convenient, it can create hidden vulnerabilities that disrupt operations, compromise sensitive data, or expose the organization to regulatory scrutiny. 

These risks affect IT teams, yes, but they also impact partners, customers, and the organization’s reputation. Leaders who understand these potential exposures can proactively reduce security risks, prevent disruptions, and ensure that CI/CD processes support both reliable operations and strategic growth.

Integrated CI/CD platforms can help mitigate these risks. They reduce reliance on third-party plugins, provide native security and compliance capabilities, and offer vendor-managed updates with predictable patch cycles.

Switching CI/CD platforms is a significant undertaking and shouldn’t be driven by vendor comparisons alone. It makes sense to evaluate alternatives when:

  • Your team is spending disproportionate time managing plugin updates and compatibility issues.
  • You’ve had a security incident or near-miss traced to a plugin.
  • Compliance or audit requirements are creating friction your current setup can’t easily address.
  • Your plugin footprint has grown to the point where it’s difficult to audit or maintain.
  • The operational overhead of maintaining your current platform is affecting delivery velocity.

If none of these apply, the case for switching is much weaker than any vendor will tell you.

Summary: what to take away from this

  • Plugin-centric CI/CD architectures introduce structural security risks that are worth understanding clearly, regardless of which platform you use.
  • Jenkins plugin vulnerabilities are frequent and well-documented; the patching gap between fix availability and actual deployment is a real operational challenge.
  • CI/CD supply chain risk is real and follows the same patterns seen in other dependency ecosystems.
  • Integrated platforms offer a different risk profile: not zero risk, but clearer accountability and more predictable patching.
  • TeamCity has had critical vulnerabilities of its own, including two severe authentication bypass issues in 2023 and 2024. 
  • The most important security variable is usually your team’s processes and discipline, not your platform choice.

Java Annotated Monthly – March 2026

A lot is happening in tech and beyond, and as we step into March, we have pulled together a fresh batch of articles, thought pieces, and videos to help you learn, connect, and see things from new angles. 

This edition shines with Holly Cummins, whose sharp voice and sharp finds on Java bring both insight and inspiration. 

We are also excited to feature the premiere of IntelliJ IDEA — The IDE That Changed Java Forever. From a tiny team of visionary engineers to a global product powering millions, JetBrains didn’t just build an IDE, it redefined what developer tools could be.
The documentary is now available on the CultRepo YouTube channel.

IntelliJ IDEA, The Documentary

Featured Content

Holly Cummins

Holly Cummins is a Senior Technical Staff Member on the IBM Quarkus team and a Java Champion. Over her career, Holly has been a full-stack JavaScript developer, a build architect, a client-facing consultant, a JVM performance engineer, and an innovation leader. Holly has led projects to understand climate risks, count fish, help a blind athlete run ultra-marathons in the desert solo, and invent stories (although not at all the same time). She gets worked up about sustainability, technical empathy, extreme programming, the importance of proper testing, and automating everything. You can find her at http://hollycummins.com or follow her on socials at @holly_cummins.

Hello, Java-Monthly-ers! This month, Java Marches On (see what I did there?). The cherry trees are blooming, the daffodils are emerging, and there’s so much new Java stuff to play with. This time of year also means conference season, so part of me is excited, and part of me is cursing past-me for being over-optimistic about how much I can synthesise. I’ve got three talks in three days in the middle of March, and all of them are new talks, on semi-unfamiliar topics. Still, it’s good to learn and try new things, right?

Right now, I’m impressed by how many new things Java is trying. If you want to be picky, Java is an inanimate platform and can’t actually try things. But grammar is for parsers, right? Loads of new things are appearing in the Java runtime itself, and even more new things are popping up in the Java ecosystem.

I enjoyed exploring java.evolved as a way of reminding myself how much the Java language has been improving. Most of the new patterns were familiar, but some of them I didn’t know, so it was good learning, too. However, for me, some of the most exciting Java innovations aren’t about syntax, but performance.

I care a lot about sustainability, and that means I care about performance by default. A few years ago, GraalVM knocked everyone’s socks off by showing how a Java application could be compiled to binary and start faster than a lightbulb. But how fast can a Java application start while still being a Java application? The promise of Project Leyden is to allow a sort of sliding scale of do-up-front-ness, while always allowing a fallback to the dynamic Java that we love. The Quarkus team has been experimenting with Leyden and has started to write about it. My colleague Guillaume wrote a fantastic blog post digging deep into some of the optimisations Quarkus was able to make to fully leverage Leyden (spoiler: sub-100 ms start time for a pure-Java application).

Java’s fast and getting faster, but it’s also versatile. Project Babylon is allowing Java to take advantage of GPUs and run machine learning models (with a little help from some FFM friends). Chicory allows the JVM to run WebAssembly, and since almost any language can be compiled to WASM, the JVM can run almost anything (yes, that means JavaScript on the JVM, and C on the JVM, and …).

What about the front end? The ecosystem for Java UIs hasn’t had all that much excitement for a while (like… a decade). But I predict a back-to-the-future moment. The terminal is back, but this time it’s got CSS, pictures, forms, and animations… and Java has joined the party. TamboUI is a Terminal UI framework for Java that enables interactive, pretty terminal-based applications. The demo trailer is pretty eye-popping. After I wrote this, I spotted Awesome Java UI, a catalog of Java UI frameworks which seemed specifically designed to prove me wrong when I said the Java UI space wasn’t where the energy was. I’ll admit that my statement was a bit sweeping, but I also notice that many of the new projects in the awesome-java list are command-line-oriented, like TamboUI, JLine, and Æsh.

And with that, I’d better get back to writing about Commonhaus, Developer Joy, trade-offs, knockers-up, and interest rates. You’ll be able to see what I end up with (and a preview of upcoming talks) on my website.

Java News

Fresh Java news, hot off the press, so you stay sharp, fast, and one step ahead:

  • Java News Roundup 1, 2, 3, 4
  • LazyConstants in JDK 26 – Inside Java Newscast #106
  • Quality Outreach Heads-Up – JDK 26: DecimalFormat Uses the Double.toString(double) Algorithm
  • Quality Outreach Heads-Up – JDK 27: Removal of ThreadPoolExecutor.finalize()
  • JEP targeted to JDK 27: 527: Post-Quantum Hybrid Key Exchange for TLS 1.3
  • Episode 45 “Announcement – The New Inside Java Podcast”
  • JDK 26 Release Candidate | JavaOne and More Heads-Up
  • Towards Better Checked Exceptions – Inside Java Newscast #107
  • JDK 26 and JDK 27: What We Know So Far
  • Episode 46 “Java’s Plans for 2026”

Java Tutorials and Tips

 Dive in and level up your Java game:

  • 25 Years of IntelliJ IDEA: The IDE That Grew Up With Java (#91)
  • Level Up Your LangChain4j Apps for Production
  • Carrier Classes and Carrier Interfaces Proposed to Extend Java Records
  • Bringing Java Closer to Education: A Community-Driven Initiative
  • Local Variable Type Inference in Java: Friend or Foe?
  • Optimizing Java Class Metadata in Project Valhalla
  • Bootstrapping a Java File System
  • Reactive Java With Project Reactor
  • Feedback on Checked Exceptions and Lambdas 
  • A Bootiful Podcast: Java Champion and Hilarious Friend, Richard Fichtner
  • A Bootiful Podcast: Java Developer Advocate Billy Korando on the Latest-and-Greatest in the Java Ecosystem
  • Inside Java Podcast Episode 44 “Java, Collections & Generics, BeJUG”
  • Foojay Podcast #90: Highlights of the Java Features Between LTS 21 and 25
  • What 2,000+ Professionals Told Us About the State of Java, AI, Cloud Costs, and the Future of the Java Ecosystem
  • Ports and Adapters in Java: Keeping Your Core Clean
  • Episode 47 “Carrier Classes” [IJN]
  • The Java Developer’s Roadmap for 2026: From First Program to Production-Ready Professional

Kotlin Corner

Learn the news and pick up a few neat tricks to help you write cleaner Kotlin:

  • Compose Multiplatform 1.9.0 Released 
  • 15 Things To Do Before, During, and After KotlinConf’26 
  • Java to Kotlin Conversion Comes to Visual Studio Code 
  • Koog x ACP: Connect an Agent to Your IDE and More 
  • New tutorial: AI-Powered Applications With Kotlin and Spring AI 
  • klibs.io, the search application for Kotlin Multiplatform libraries, is now published on GitHub: https://github.com/JetBrains/klibs-io
  • Intro to Kotlin’s Flow API
  • Explicit Backing Fields in Kotlin 2.3 – What You Need to Know 
  • Qodana for Android: Increasing Code Quality for Kotlin-First Teams

AI 

Explore what’s possible with smart tools, real use cases, and practical tips on AI:

  • Why Most Machine Learning Projects Fail to Reach Production
  • Anthropic Agent Skills Support in Spring AI 
  • Code. Check. Commit. 🚀 Never Leave the Terminal With Claude Code + SonarQube MCP
  • Let the AI Debug It: JFR Analysis Over MCP
  • Researching Topics in the Age of AI – Rock-Solid Webhooks Case Study
  • Safe Coding Agents in IntelliJ IDEA With Docker Sandboxes
  • Latest Gemini and Nano Banana Enhancements in LangChain4j
  • Spring AI Agentic Patterns (Part 5): Building Interoperable Agent Systems With A2A Integration 
  • From Prompts to Production: A Playbook for Agentic Development
  • The Craft of Software Architecture in the Age of AI Tools
  • Beyond Code: How Engineers Need to Evolve in the AI Era
  • 🌊 Windsurf AI + Sonar: The Agentic Dream Team for Java Devs 🚀
  • Enabling AI Agents to Use a Real Debugger Instead of Logging
  • Runtime Code Analysis in the Age of Vibe Coding
  • Context Engineering for Coding Agents 
  • A Language For Agents 
  • Easy Agent Skills With Spring AI and the New Skillsjars Project!

Languages, Frameworks, Libraries, and Technologies

Discover what’s new in the tools and technologies shaping your stack today:

  • This Week in Spring 1, 2, 3, 4
  • How to Integrate Gemini CLI With IntelliJ IDEA Using ACP
  • A Bootiful Podcast: JetBrains and Spring community Legend Marco Behler
  • Getting Feedback From Test-Driven Development and Testing in Production
  • Kubernetes Drives AI Expansion as Cultural Shift Becomes Critical
  • MongoDB Sharding: What to Know Before You Shard
  • The Shai-Hulud Cyber Worm and More Thoughts on Supply Chain Attacks 
  • Redacting Data From Heap Dumps via hprof-redact – Mostly Nerdless

Conferences and Events

Plan your trips or schedule online presence for the following events:

  • Devnexus – Atlanta, USA, March 4–6; Anton Arhipov will speak about Debugging with IntelliJ IDEA and Database Migration Tools.
  • JavaLand – Rust, Germany, March 10–12; Marit van Dijk is presenting her famous talk on being more productive with IntelliJ IDEA.
  • JavaOne – Redwood City, USA, March 17–19; Anton Arhipov and Arun Gupta will be at the event – come and meet them.
  • Voxxed Days Zurich – Zurich, Switzerland, March 24; Marit van Dijk is the speaker.
  • Voxxed Days Bucharest – Bucharest, Romania, March 26–27
  • Voxxed Days Amsterdam – Amsterdam, the Netherlands, April 1–2; Meet the JetBrains people there – Anton Arhipov, Marit van Dijk, and Rachel Appel.

Culture and Community

Join the conversation full of stories, voices, and ideas that bring developers together:

  • How to Be Remarkable
  • So, You ’10x’d’ Your Work… 
  • How I Estimate Work as a Staff Software Engineer 
  • Get Specific!

And Finally…

The most recent IntelliJ IDEA news and updates are here:

  • Wayland By Default in 2026.1 EAP
  • Editor Improvements: Smooth Caret Animation and New Selection Behavior
  • Migrating to Modular Monolith Using Spring Modulith and IntelliJ IDEA

That’s it for today! We’re always collecting ideas for the next Java Annotated Monthly – send us your suggestions via email or X by March 20. Don’t forget to check out our archive of past JAM issues for any articles you might have missed!

ReSharper for Visual Studio Code, Cursor, and Compatible Editors Is Out

ReSharper has been a trusted productivity tool for C# developers in Visual Studio for over 20 years. Today, we’re taking the next step and officially releasing the ReSharper extension for Visual Studio Code and compatible editors.

After a year in Public Preview, ReSharper has been refined to bring its C# code analysis and productivity features to developers who prefer VS Code and other editors – including AI-first coding environments like Cursor and Google Antigravity.

Whether you’re coming from ReSharper in Microsoft Visual Studio, JetBrains Rider, or you’re a VS Code C# developer, the goal is the same – to help you write, navigate, and maintain C# code with confidence and ease.

Why ReSharper for VS Code and compatible editors

ReSharper brings JetBrains’ decades-long C# expertise into lightweight, flexible editor workflows to elevate your code quality.

What it’s designed for:

  • Professional-grade C# code quality
    Advanced inspections, quick-fixes, refactoring, and formatting for C#, Razor, Blazor, and XAML.
  • Refining AI-generated code
    ReSharper helps review and refine AI-assisted code to make sure it meets professional standards before it ships.
  • Wide editor compatibility
    ReSharper works seamlessly across all compatible editors, meeting your needs wherever you code.
  • Proven JetBrains expertise
    Built on over two decades of experience developing .NET tooling used by teams worldwide.
  • Free for non-commercial use
    Available at no cost for learning, hobby projects, and non-commercial development.

Availability

ReSharper is available from:

  • Visual Studio Code Marketplace
  • Open VSX Registry (for Cursor, Google Antigravity, Windsurf, and other compatible editors)

How to install ReSharper

You can install the extension via the Extensions view:

  1. Open Visual Studio Code or another compatible editor.
  2. Go to the Extensions view.
  3. Search for ReSharper.
  4. Click Install.

You can also install the extension via Quick Open:

  1. Open Visual Studio Code or another compatible editor.
  2. Open Quick Open (Ctrl+P / Cmd+P).
  3. Paste: ext install JetBrains.resharper-code
  4. Press Enter, and ReSharper will be installed automatically.

Key features at a glance

ReSharper focuses on the core workflows C# developers use daily.

  • Insightful code analysis
    Real-time inspections and quick-fixes help keep your code readable, maintainable, and consistent across projects.
  • Smart coding assistance
    Context-aware code completion, auto-imports, live templates, and inline documentation go way beyond the standard capabilities of a code editor.
  • Solution Explorer
    A central hub for managing files, folders, NuGet packages, source generators, and projects across a solution – just like the one in JetBrains Rider or ReSharper in Microsoft Visual Studio.
  • Reliable unit testing
    Run and manage tests for NUnit, xUnit.net, and MSTest directly in VS Code or a compatible editor, with easy navigation to failing tests.
  • Refactorings you can trust
    Rename works across your solution while safely handling conflicts and references.
  • Fast navigation, including to external and decompiled sources
    Navigate to symbols, usages, files, and types across your solution. When source code isn’t available, ReSharper can decompile assemblies and take you directly to the relevant declarations.

For more information on ReSharper’s functionality, please see our Documentation.

What’s next

The next major area of focus for ReSharper for VS Code is debugging support. Based on feedback collected during the Preview, we’re actively working on support for launching debugging sessions and attaching to processes in .NET and .NET Framework applications.

Beyond debugging, our roadmap includes continued quality improvements and expanding the set of available refactorings.

We’ll be listening closely to your feedback as we define the next priorities. If there’s something that would make ReSharper indispensable in your workflow, we’d love to hear from you.

Licensing

ReSharper for VS Code and compatible editors is available under ReSharper, dotUltimate, and All Products Pack licenses. You can review the pricing options here. 

The extension will continue to be available for free for non-commercial use, including learning and self-education, open-source contributions without earning commercial benefits, any form of content creation, and hobby development.

Get started

  1. Install ReSharper.
  2. Open a workspace/folder in VS Code, Cursor, or another compatible editor.
  3. ReSharper will automatically detect any .sln/.slnx/.slnf (solution) files or a .csproj file in the folder:
  • If only one solution is found, it will open automatically.
  • If multiple solutions are found, click the Open Solution button in a pop-up menu to choose which one to open.

If you encounter any issues, have feedback to share, or additional features to request, you can do so by creating a ticket here.

What production-ready AI agent systems look like

Many discussions about open source AI agents start with the same image: a single assistant responding to prompts. That model works well for demonstrations, but it breaks down quickly in production.

One of our speakers for the AI track at OCX 26, Luca Bianchi, explained in an interview, “a production system generally uses a lot of models, not just one model doing back and forth with the user.” Once systems move beyond experimentation, even common patterns such as retrieval-augmented generation become multi-stage pipelines rather than simple request–response flows.

In theory, this looks straightforward. Knowledge is embedded, queries are encoded, and results are retrieved based on semantic distance. 

In practice, production constraints surface immediately. Luca described how, in a real system, similarity scores should have ranged from 0 to 1, but instead clustered between 0.5 and 0.7, making it difficult to distinguish results. Solving that problem required additional steps: re-ranking, metadata-based filtering, query rewriting, and selective composition.
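The score-compression problem Luca describes can be illustrated in a few lines. When raw cosine similarities cluster in a narrow band, a simple min-max rescaling spreads them back out; this is just one of several possible re-ranking steps, and the scores below are invented:

```python
def rescale(scores):
    """Min-max rescale a list of similarity scores to the full [0, 1] range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # degenerate case: all scores identical
    return [(s - lo) / (hi - lo) for s in scores]

raw = [0.52, 0.61, 0.55, 0.68, 0.59]  # clustered between 0.5 and 0.7
print(rescale(raw))  # spread across [0, 1], easier to threshold and rank
```

In a real system this would sit alongside re-ranking models, metadata filters, and query rewriting rather than replace them.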

Each of those steps introduces another model into the system. Luca mentioned, “You end up building a complex pipeline of many different models, and this is just for RAG (Retrieval-Augmented Generation).” When teams move further into agentic architectures, orchestration becomes unavoidable. A controlling agent must route requests to sub-agents, each of which invokes its own workflows and models. In production environments, these agentic pipelines place very different demands on latency, cost, and orchestration than a single assistant responding to prompts.

This is where the limits of the “single assistant” model become clear. Latency and cost compound across the pipeline: if each model in the chain takes tens of seconds to respond, the end-to-end response time quickly stretches into minutes.
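The compounding effect is easy to quantify. A toy calculation for a sequential pipeline, with made-up per-stage latencies:

```python
# Hypothetical per-stage latencies, in seconds, for a sequential pipeline:
# query rewrite -> retrieval -> re-rank -> generation -> guardrail check.
stage_latencies = [1.5, 0.8, 2.0, 12.0, 1.2]

total = sum(stage_latencies)
print(f"end-to-end (sequential): {total:.1f}s")

# Stages that don't depend on each other can run concurrently,
# capping latency at the slowest stage instead of the sum.
concurrent = max(stage_latencies)
print(f"best case with full parallelism: {concurrent:.1f}s")
```

The gap between the two numbers is exactly the kind of orchestration decision that separates a demo from a production system.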

At that point, system design is no longer about prompts or raw model capability. It is about how pipelines are structured, how responsibilities are split across models, and how orchestration is handled. Production-ready AI agents are not assistants. They are pipelines and their success depends on engineering decisions made early.

In this session at OCX 26 in Brussels, Luca Bianchi will break down how real production-ready AI agent ecosystems are designed, using concrete examples of multi-model pipelines and orchestration. Attendees will gain a practical understanding of how agentic systems evolve beyond single assistants and of the architectural decisions that determine whether those pipelines remain usable at scale. 

 


Daniela Nastase


Classifying Amazon Reviews with Python: From Raw Text to 88% Accuracy

Ever wondered how businesses know if customers are happy or not? In this project, I built a machine learning model that classifies Amazon product reviews as Positive or Negative using NLP techniques. Here’s how I did it.

  1. The Dataset
    I used the Amazon Review Polarity Dataset — sampling 200,000 reviews for training and 50,000 for testing. The dataset was perfectly balanced between positive and negative reviews, which is ideal for classification.

  2. Cleaning the Text
    Raw reviews are messy. I wrote a preprocessing function to lowercase the text, strip punctuation and numbers, and remove stopwords using NLTK. This helps the model focus on meaningful words.

import re
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))

def clean_text(text):
    text = str(text).lower()
    text = re.sub(r"[^\w\s]", "", text)  # strip punctuation
    text = re.sub(r"\d+", "", text)      # strip numbers
    words = [word for word in text.split() if word not in stop_words]
    return " ".join(words)
  3. Converting Text to Numbers with TF-IDF
    Machine learning models need numbers, not words. TF-IDF weighs words by how unique they are to each review — common words like “the” get ignored, meaningful words like “terrible” get prioritised.

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=5000, min_df=5, max_df=0.9)
X_train = vectorizer.fit_transform(train_df["clean_text"])
X_test = vectorizer.transform(test_df["clean_text"])
  4. Training & Comparing Models
    I trained and compared three models — Logistic Regression, Naive Bayes, and Linear SVM. Logistic Regression performed best and was used for the final evaluation.

  5. Results
    Tested on 50,000 reviews:

    Metric     Negative   Positive
    Precision  0.89       0.88
    Recall     0.88       0.89
    F1-score   0.88       0.89

    Overall accuracy: 88%, with balanced performance across both classes.
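The per-class precision, recall, and F1 numbers above are the shape produced by scikit-learn's classification_report (an assumption about the tooling used). A minimal sketch with toy labels standing in for the real test set:

```python
from sklearn.metrics import accuracy_score, classification_report

# Toy labels standing in for the real 50,000-review test set (1 = Positive).
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]

# Prints per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(y_true, y_pred, target_names=["Negative", "Positive"]))
print("Accuracy:", accuracy_score(y_true, y_pred))
```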

  6. Real-Time Predictions

Model identifying positive and negative reviews

def predict_sentiment(text):
    # "model" is the trained Logistic Regression from step 4
    cleaned = clean_text(text)
    vectorized = vectorizer.transform([cleaned])
    prediction = model.predict(vectorized)[0]
    return "Positive" if prediction == 1 else "Negative"

“This product is amazing!” -> Positive
“Completely useless, waste of money” -> Negative

  7. Visualizations
    Three charts helped tell the story:

    Sentiment distribution — confirmed the dataset was balanced between the two classes.
    Word cloud — displayed the top positive words: great, love, best.
    Confusion matrix — symmetric errors and no class bias, with TP, TN, FP, and FN counts along and off the diagonal.

What I Learned
Working at this scale (250k reviews) taught me that clean data and a balanced dataset matter more than model complexity. Logistic Regression beat fancier approaches simply because the data was well prepared.
Next steps: hyperparameter tuning, cross-validation, and eventually a BERT-based model for higher accuracy.
Full code on my GitHub — feel free to clone and try it on your own dataset!

Found this helpful? Drop a like or leave a comment below!

Automated Code Review: Benefits, Tools & Implementation (2026 Guide)

Code review has become the single biggest bottleneck in modern software development. As AI coding tools accelerate generation, with 41% of all code now AI-assisted, review queues have ballooned, creating a paradox where individual developer speed rises while organizational throughput stalls or declines. The DORA 2024 report found that a 25% increase in AI tool adoption correlated with a 7.2% decrease in delivery stability, largely because AI enables larger changesets that overwhelm review capacity.

This guide walks you through the three levels of automated code review. From basic linting through Static Analysis to AI-powered semantic analysis, you will see how to implement a system that turns review from a bottleneck into a competitive advantage.

The stakes are real. Research consistently shows that a bug caught in production costs 10x more than one found during design, with some estimates putting that multiplier as high as 100x. The Consortium for IT Software Quality pegs the total US cost of poor software quality at $2.41 trillion annually. Yet analysis of 730,000+ pull requests across 26,000 developers reveals that PRs sit idle for 5 out of every 7 days of cycle time. Automated code review directly attacks this gap by catching defects earlier, accelerating merge velocity, and freeing human reviewers to focus on architecture and business logic.

The AI code explosion has made review the new constraint

A 2025 Faros AI study of 10,000+ developers found that engineers using AI tools complete 21% more tasks and merge 98% more PRs, but PR review time increased by 91%. Teams that once handled 10 to 15 PRs per week now face 50 to 100. Features that take 2 hours to generate can require 4 hours to review. LinearB’s 2025 benchmark of 8.1 million PRs confirmed the pattern: AI-generated PRs wait 4.6x longer before a reviewer picks them up.

More code is entering pipelines than human reviewers can properly validate. A CodeRabbit analysis of 470 GitHub PRs found AI-generated code produces 1.7x more issues than human-written code, logic errors up 75%, security vulnerabilities up 1.5 to 2x, and performance inefficiencies appearing 8x more frequently. The Sonar 2026 State of Code survey confirmed that 96% of developers don’t fully trust AI-generated code’s functional accuracy, yet only 48% always verify it before committing.

The cycle of increasing pressure

DORA’s 2024 research identified the root cause: AI tools violate small-batch principles by enabling larger changesets that increase risk. Elite-performing teams deploy multiple times daily with sub-5% change failure rates. However, AI adoption without review automation pushes teams toward larger batches, eroding the very practices that make elite performance possible. The path forward is automating the review process itself, not just code generation.

Level 1: linting and formatting eliminate the noise

The foundation of any automated review system is deterministic tooling that enforces consistency and catches syntax-level issues before they reach human reviewers. This layer eliminates style debates entirely and ensures every PR starts from a clean baseline.

Linters analyse your code for logical errors, anti-patterns, and style violations. Rather than checking whether code runs, they encode your team’s standards as rules applied automatically on every change. Formatters handle a narrower but equally important job: they take any valid code and rewrite it into a single canonical style, making diffs cleaner and reviews faster. The two tools work in tandem, with the linter catching what you mean, and the formatter controlling how it looks.

In the JavaScript ecosystem, ESLint and Prettier are the dominant tools for these roles respectively, and both saw significant releases in early 2026. ESLint’s v10 completed a multi-year architectural overhaul, added multithreading for large codebases, and expanded beyond JavaScript to cover CSS, HTML, JSON, and Markdown. Prettier’s v3.8 introduced a Rust-powered CLI with meaningful speed improvements. Together they cover virtually every file type in a modern web project.

Implementing both via GitHub Actions is straightforward and should be the first automation any team deploys:

name: Code Quality
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npx eslint . --cache --max-warnings 0
      - run: npx prettier --check .

In CI, run formatters in --check mode (developers should fix issues locally) and enforce passing checks via branch protection rules. Adding ESLint caching and parallel jobs per language keeps feedback under 30 seconds, which is critical for developer adoption. Pre-commit hooks using tools like Husky and lint-staged catch issues before they even reach CI.

Level 2: SAST and security scanning catch what linters miss

Static Application Security Testing tools analyse code for vulnerabilities, complexity, and deeper quality issues that pattern-based linters cannot detect. SonarQube Server 2026.1 LTA leads this category with support for 30+ languages, advanced taint analysis tracking data flow across functions and files, and detection of OWASP Top 10 vulnerabilities including SQL injection, XSS, SSRF, command injection, and path traversal. SonarQube’s AI CodeFix feature uses LLMs to generate remediation suggestions for detected issues, while its AI Code Assurance capability automatically identifies and applies stricter quality gates to AI-generated code.

SAST tools commonly detect injection flaws (SQL injection, XSS, command injection, LDAP injection, SSRF, and XXE), data exposure issues (hardcoded secrets and credentials, sensitive data in logs, missing encryption), memory and buffer issues (buffer overflows, use-after-free, integer overflows), and input validation failures (path traversal, insecure deserialization, unvalidated redirects).

The different levels of automated review

Detection rates vary significantly. On the OWASP Benchmark, modern AI-enhanced SAST tools like Qwiet AI have achieved 100% true positive rates with 25% false positive rates, while traditional tools historically scored around 33%. SonarQube achieves false positive rates as low as 1% on mature codebases. The key advance in 2025 to 2026 has been combining SAST with LLM-based post-processing. One study showed this combination reduced false positives by 91% compared to standalone Semgrep scanning.

SonarQube’s Clean as You Code philosophy, where quality gates apply only to new code rather than the entire codebase, makes adoption practical for legacy projects. Configure gates to fail on any new blocker or critical vulnerability, while incrementally addressing existing technical debt. This approach follows a zero-noise principle: only flag issues developers can act on right now.

Level 3: AI-powered review and workflow platforms change everything

The most significant shift in 2025 to 2026 has been the emergence of AI-powered code review that understands code semantics, developer intent, and project context, moving well beyond pattern matching into genuine comprehension. This is where platforms like Graphite operate, combining AI review intelligence with workflow automation to address the full “outer loop” of development.

The AI foundation is now proven. Anthropic’s Claude model family powers multiple code review tools across the Claude Sonnet, Haiku, and Opus tiers, balancing capability, speed, and cost for different review workloads. Claude Code includes a built-in /code-review command that launches four parallel review agents, scores issues by confidence, and surfaces only findings above an 80% confidence threshold — important for managing false positives.

Graphite exemplifies the Level 3 platform approach. Following its acquisition by Cursor in December 2025 (at a valuation exceeding its previous $290M), Graphite serves 100,000+ developers across 500+ companies including Shopify, Snowflake, Figma, and Notion. Its thesis: AI tools have dramatically accelerated the “inner loop” of writing code, making the “outer loop” of review, merge, and deploy the new constraint. Graphite addresses this with four integrated capabilities.

Graphite Agent provides AI-powered PR review built on Anthropic’s Claude. Unlike general-purpose AI reviewers with a 5-15% false positive rate, it achieves a 5-8% false positive rate through multi-step validation including voting, chain-of-reasoning, and self-critique. The results are compelling: 67% of AI suggestions lead to actual code changes, and the tool maintains a 96% positive feedback rate from developers. You can define custom review rules in plain language, something like “ensure auth-service never makes direct database calls”, and Graphite Agent enforces them on every PR.

Stacked PRs directly address the batch-size problem identified by DORA. Analysis of 50,000+ PRs shows defect detection rates drop from 87% for PRs under 100 lines to just 28% for PRs over 1,000 lines. Stacking breaks large features into small, dependent PRs that build on each other. Graphite’s CLI (gt stack submit) manages the entire stack lifecycle including automatic recursive rebasing. The impact is measurable: Semgrep saw a 65% increase in code shipped per engineer after adopting stacking, while Shopify reports 33% more PRs shipped per developer.

Merge Queue is the only stack-aware merge queue available, processing dependent PRs in parallel while ensuring the main branch stays green. It supports batching multiple PRs to reduce CI costs and hot-fix prioritization for critical changes.

Customer metrics demonstrate the platform effect. Ramp achieved a 74% decrease in median time between merged PRs (from 10 hours to 3). Asana engineers shipped 21% more code and saved 7 hours per week per engineer within 30 days. Across all customers, the average Graphite user merges 26% more PRs while reducing median PR size by 8 to 11%.

Rolling out automation without overwhelming your team

The most common failure mode is deploying too many blocking checks at once, triggering alert fatigue that erodes developer trust. Research shows false positives are the number-one adoption killer for automated review tools. The solution is a progressive, trust-building rollout.

Phase 1 (Weeks 1 to 4): Foundation. Deploy ESLint and Prettier as non-blocking CI checks. Add PR size warnings for changes exceeding 400 lines. Establish baseline metrics: current cycle time, defect escape rate, and PR merge frequency. This phase should be completely frictionless — developers see suggestions but are never blocked.

Phase 2 (Weeks 5 to 10): Security gates. Introduce SonarQube or equivalent SAST scanning in advisory mode. Configure severity thresholds so only critical security findings (SQL injection, hardcoded secrets) become blocking. All other findings appear as PR comments. Begin tracking false positive rates and tune rules aggressively — a finding that never gets fixed is noise, not signal.

Phase 3 (Weeks 11 to 16): AI-powered review. Enable Graphite Agent or equivalent AI review as a non-blocking reviewer. Start with 1 to 3 volunteer teams who provide feedback on suggestion quality. Use this phase to configure custom team rules and calibrate the AI to your codebase’s conventions. The key metric to track is acceptance rate — the percentage of AI comments that result in code changes.

Phase 4 (Week 17+): Full platform. Introduce stacked PR workflows, merge queue automation, and promote AI review to soft-gate status (require acknowledgment of critical findings). Implement productivity insights to measure before/after impact.

The stages of AI code review adoption

Three principles govern successful rollouts. First, start non-blocking and graduate to blocking only after false positive rates stabilize below 5%. Second, integrate into existing workflows. Review feedback should appear as inline PR comments, not in separate dashboards. Third, measure and share wins: when developers see that automated review caught a real bug or saved them 30 minutes, adoption becomes self-reinforcing.

The cost equation favors aggressive automation

The financial case for automated code review is straightforward to model. A team processing 200 PRs monthly that saves eight minutes of reviewer time per PR at an $80 loaded rate generates roughly $25,600 in annual savings from review efficiency alone. Blocking even 10 high-severity bugs per quarter that would have cost $5,000 each in production adds another $200,000 in avoided remediation costs. Against typical platform costs of $20,000 to $40,000 annually for a 25-person team, the total benefit of roughly $226,000 delivers an ROI of between 5:1 and 11:1 in the first year, depending on platform tier.
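The model above is easy to reproduce. A back-of-envelope sketch, where the minutes-saved figure is an assumption chosen to match the stated $25,600 annual savings:

```python
# Back-of-envelope ROI model for automated review (assumptions, not measurements).
prs_per_month = 200
minutes_saved_per_pr = 8            # assumed reviewer time saved per PR
loaded_rate_per_hour = 80           # loaded engineering cost, $/hour
bugs_blocked_per_quarter = 10
cost_per_production_bug = 5_000
platform_cost_low, platform_cost_high = 20_000, 40_000  # annual platform cost range

review_savings = prs_per_month * 12 * minutes_saved_per_pr / 60 * loaded_rate_per_hour
bug_savings = bugs_blocked_per_quarter * 4 * cost_per_production_bug
total_benefit = review_savings + bug_savings

print(f"Review savings:  ${review_savings:,.0f}")
print(f"Bug savings:     ${bug_savings:,.0f}")
print(f"Total benefit:   ${total_benefit:,.0f}")
print(f"ROI range: {total_benefit / platform_cost_high:.1f}:1 "
      f"to {total_benefit / platform_cost_low:.1f}:1")
```

With these inputs the model lands at roughly $225,600 in total benefit, which is where the 5:1 to 11:1 ROI range comes from.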

The deeper value is strategic, though. DORA research consistently shows that elite teams combine fast delivery with high stability, and they achieve this through small batches, automated testing, and rapid feedback loops. Automated code review is the mechanism that makes this possible at scale, especially as AI-generated code volumes continue to grow. Teams that treat review as an afterthought will face compounding technical debt: 75% of technology decision-makers are projected to face moderate-to-severe technical debt from AI-speed practices by end of 2026.

Conclusion

The automated code review landscape in 2026 has matured into a clear three-level stack.

Level 1: Linting with ESLint and Prettier. This is table stakes that every team should have deployed.
Level 2: SAST with tools like SonarQube. This catches security vulnerabilities and code smells that linters miss.
Level 3: AI-powered semantic review combined with workflow automation. This represents the frontier, and it’s where the highest-impact gains live.

Platforms like Graphite that integrate AI review, stacked PRs, and merge automation into a unified system address the full outer-loop bottleneck rather than just one piece of it. The data is clear: small PRs reviewed by AI catch 3x more defects than large PRs reviewed by humans alone, and teams using integrated automation platforms ship 20 to 65% more code while maintaining or improving quality. For engineering leaders, the question is no longer whether to automate code review, but how quickly you can reach Level 3.

🚀 Vibe Coding Tools Are Changing the Way We Build Software

A few years ago, building an app meant writing hundreds or thousands of lines of code. Today, things are different. Welcome to the world of vibe coding, where you describe what you want, and AI helps turn that idea into real, working code.

Instead of spending hours debugging or writing boilerplate, developers now collaborate with AI tools that generate, fix, and improve code instantly. It feels less like traditional programming and more like guiding the “vibe” of what you want to build.

Here are some powerful vibe-coding tools developers are loving right now:

Cursor – An AI-powered code editor that can generate files, refactor code, and even fix bugs automatically.
Replit – A cloud coding platform where you can build and deploy apps directly from your browser.
GitHub Copilot – Your AI coding partner that suggests functions and completes code as you type.
Vercel v0 – Perfect for turning UI prompts into beautiful React components instantly.
Lovable – Generate full-stack applications just by explaining your idea.
Bolt – Quickly create and test apps in the browser using AI prompts.

💡 Why vibe coding is trending:
• Faster development
• Easier prototyping
• Less repetitive coding
• Perfect for startups and indie developers

But remember: AI can help you build faster, yet understanding the code still matters if you want stable and secure apps.

The future of coding isn’t just writing code…
It’s collaborating with AI to bring ideas to life faster than ever.

Do you think vibe coding will become the new normal for developers?

Claude Code Framework Preference Bias and Developer Marketing

Something quietly strange is happening inside AI-assisted development workflows. Claude Code—Anthropic’s agentic coding tool—doesn’t just write code. It recommends frameworks. And those recommendations aren’t always neutral.

The pattern is drawing attention from developers who’ve noticed Claude Code steering toward specific stacks in ways that feel less like engineering judgment and more like a popularity contest. Whether that’s a training artifact, a reflection of documentation quality across frameworks, or something more intentional, the implications for developer tooling decisions are worth examining carefully.

Key Takeaways

  • Claude Code’s framework recommendations show measurable bias toward well-documented frameworks like Next.js and React, likely reflecting training data distribution rather than objective technical merit.
  • Anthropic’s growing integration of Claude Code into marketing automation workflows—demonstrated across multiple 2026 community tutorials—creates a conflict of interest in how the tool surfaces recommendations.
  • Developers relying on Claude Code for stack decisions without cross-checking against framework-specific benchmarks risk optimizing for AI familiarity rather than project fit.
  • The Claude Code framework preference bias dynamic is expected to intensify as AI coding tools capture a larger share of the junior-to-mid developer workflow.
  • Framework communities with thinner documentation coverage face a structural disadvantage in AI-assisted project scaffolding, regardless of technical quality.

How We Got Here

Claude launched in March 2023. By late 2024, Anthropic had shipped Claude Code as a standalone agentic tool capable of multi-step programming tasks—not just autocomplete, but full project scaffolding, dependency selection, and architecture recommendations.

That’s a significant shift. When a developer asks Claude Code to “spin up a new web app,” the tool doesn’t just write code. It chooses. React or Vue? Express or Fastify? Supabase or PlanetScale? Each of those choices carries downstream consequences for months of development work.

The timeline matters. Claude 3.5 Sonnet (released mid-2024) demonstrated substantially improved coding benchmarks—scoring 49% on SWE-bench Verified, according to Anthropic’s published model card. Claude 3.7 Sonnet, released in February 2025, pushed further with extended thinking capabilities specifically tuned for agentic workflows. By early 2026, Claude Code had become a default scaffolding layer for a non-trivial slice of greenfield projects.

Parallel to this, Anthropic’s ecosystem partners began shipping Claude Code-powered marketing automation tools—the kind that auto-generate landing pages, email sequences, and content pipelines. Stormy AI’s agentic marketing documentation explicitly frames Claude Code as the orchestration layer for growth workflows. YouTube tutorials like “Claude Skills: Build Your First AI Marketing Team in 16 Minutes” have accumulated significant developer mindshare.

The convergence is the issue. Claude Code is simultaneously a coding tool and increasingly embedded in marketing infrastructure. That dual role creates conditions where framework bias isn’t just a technical curiosity—it’s a business vector.

The Bias Pattern: What Developers Are Seeing

The core complaint is consistent: Claude Code defaults to the same short list of frameworks regardless of project constraints. Ask it to scaffold a backend API and it reaches for Express.js or FastAPI. Ask for a frontend and it defaults to Next.js or React. Ask for a database layer and it gravitates toward PostgreSQL-backed ORMs.

None of those choices are wrong. They’re often reasonable. But “reasonable default” and “best fit for your specific project” are different things.

The mechanism behind this is almost certainly training data distribution. React has dramatically more Stack Overflow threads, GitHub repositories, and documentation pages than Svelte or SolidJS. Next.js has orders of magnitude more indexed tutorial content than Remix or Astro circa 2023–2024—when Claude’s core training likely crystallized. Claude Code’s recommendations are, at least partially, a reflection of documentation density rather than framework quality.

Think of it as search engine result bias. Not conspiracy, but structural advantage baked into the data pipeline.

The Marketing Angle: When Tooling Becomes a Channel

The framework preference bias gets sharper when you examine who benefits from these defaults.

Frameworks with enterprise backing—Vercel (Next.js), Meta (React), Microsoft (TypeScript)—have invested heavily in documentation, tutorials, and community presence. That investment translates directly into training data volume. When Claude Code defaults to Next.js, it’s partly because Vercel has spent years ensuring Next.js is the best-documented React framework on the internet.

That’s not a scandal. It’s a rational content strategy that happens to produce a feedback loop: better docs → more training data → more AI recommendations → more adoption → more investment in docs.

But developers should know that’s what’s happening. The recommendations coming out of Claude Code aren’t agnostic engineering opinions. They carry the weight of documentation investment and—increasingly—explicit commercial relationships as AI tooling integrates deeper into SaaS ecosystems.

Comparing Your Options

Criteria            Claude Code                  GitHub Copilot               Manual Research
Speed               Seconds                      Seconds                      Hours
Bias Source         Training data distribution   Training data + telemetry    Developer experience
Transparency        Low                          Low                          High
Framework Coverage  Broad but weighted           Broad but weighted           Project-specific
Update Lag          Model training cycle         Model training cycle         Real-time
Best For            Rapid scaffolding            In-editor completion         Strategic stack decisions

Both Claude Code and GitHub Copilot carry structural bias toward high-documentation frameworks. Manual research is slower but surfaces niche frameworks—SvelteKit for performance-critical SPAs, Hono for edge-native APIs—that AI tools consistently underweight.

The trade-off isn’t “AI bad, manual good.” It’s about knowing what each source optimizes for.

For teams shipping fast, Claude Code’s bias toward well-supported frameworks actually reduces risk. React and PostgreSQL have massive community support, which means debugging resources exist at every turn. The gravity toward popular stacks is a feature if your team prioritizes hiring pipelines and long-term maintainability over raw performance optimization.

But for specialized workloads—edge computing, WebAssembly targets, real-time systems—that same bias becomes a liability. Claude Code doesn’t consistently recommend Rust-based frameworks for WASM-heavy projects or Cloudflare Workers-native tooling like Hono, because those ecosystems, despite rapid growth in 2025–2026, haven’t yet accumulated the documentation density needed to shift AI recommendations. The technical quality is there. The training signal isn’t.

Practical Implications

If you’re a developer or engineer: Letting Claude Code make stack decisions without cross-referencing framework-specific benchmarks—like TechEmpower’s Web Framework Benchmarks or State of JS 2025 survey data—means outsourcing a strategic decision to a system that doesn’t know your performance requirements or your team’s actual skill set.

If you’re leading an engineering team: Treat Claude Code recommendations as a starting hypothesis, not a conclusion. Document why you chose a framework—not just what Claude Code suggested. That creates accountability and forces genuine evaluation before a decision calcifies into six months of technical debt.

If you’re thinking about end users: Framework choices affect product performance and shipping velocity. Apps scaffolded onto heavy client-side React when a leaner alternative would have fit better ship more slowly. That’s a user experience problem that traces directly back to tooling bias.

What to Do About It

Short-term (next 1–3 months):

  • When Claude Code scaffolds a project, explicitly ask: “What alternatives exist, and why might they be better for a [specific constraint] project?”
  • Cross-check against State of JS 2025 satisfaction scores—not just popularity metrics
  • Build a team-specific prompt template that includes your stack constraints upfront

Longer-term (next 6–12 months):

  • Watch for Anthropic’s model cards to include training data composition disclosures—developers are already pushing for this
  • Evaluate whether your organization wants to build internally fine-tuned models that reflect your actual stack preferences
  • Track how framework communities are investing in documentation specifically to influence AI training pipelines

What Comes Next

The bottom line:

  • Claude Code’s framework defaults reflect training data distribution, not objective technical ranking
  • The overlap between Claude Code as a coding tool and its role in marketing automation creates structural incentives worth monitoring
  • Popular frameworks with strong documentation pipelines will continue to benefit disproportionately from AI recommendations
  • Manual framework evaluation remains necessary for any project with specific performance, scale, or niche requirements

Over the next 6–12 months, expect framework communities to invest explicitly in “AI-training-friendly” documentation—structured, comprehensive, high-volume. That’s already happening. Vercel’s documentation team, Remix’s contributor guides, and FastAPI’s tutorial library all read like they were written with LLM training in mind. That arms race will only sharpen.

The mindset shift worth making: treat AI framework recommendations the way you treat Google search results. Useful signal, not final answer. Claude Code’s suggestions tell you what’s popular and well-documented. What they don’t tell you is whether that’s actually the right choice for your problem.

Trusting the defaults uncritically can fail quietly. Teams discover the mismatch six months in, after the scaffolding has hardened into architecture. By then, switching costs are real.

What frameworks has your team found Claude Code consistently under-recommending? The answer probably says something interesting about where documentation investment hasn’t caught up with technical quality.


References

  1. Claude Skills: Build Your First AI Marketing Team in 16 Minutes (Claude Code) – YouTube
  2. Claude (language model) – Wikipedia
  3. Agentic Marketing: Automating Your Growth Strategy with Claude Code | Stormy AI Blog

Is SaaS Dead?

There’s been a lot of noise lately about whether SaaS is dead. Spoiler: it’s not. But the way people use SaaS is changing in a pretty significant way.

If we think about how media has evolved, we can see that history has a pattern here. Radio didn’t kill newspapers, TV didn’t kill radio, streaming didn’t kill TV. But each shift changed how people consumed media, and those who adapted survived. SaaS is about to face its own version of that shift.

The “Headless SaaS” Wave

Here’s the change that’s coming: a significant chunk of SaaS users will stop using SaaS UIs directly. Instead, they’re using AI agents and LLMs to do it for them.

So instead of logging in, navigating dashboards and clicking through workflows, users issue commands through a conversational interface:

  • “Update that record.”
  • “Pull last quarter’s churn drivers.”
  • “Generate a renewal forecast.”
  • “Create onboarding tasks for this new client.”

The SaaS app doesn’t disappear. It becomes infrastructure, handling the stuff that actually requires structure: data integrity, permissions, compliance, domain logic. The AI layer just sits on top and acts as the interface.

What This Means for the SaaS Stack

Right now, most SaaS products are optimized around the UI. Product investment has focused on features, workflows and dashboards, and for good reason since that’s where users spent their time.

As AI agents become more capable, though, a bigger share of users will operate “headlessly.” They’ll delegate execution to an AI and never open the dashboard. The SaaS back-end still does all the work. The front-end just becomes one of several possible entry points.

The future stack looks something like:

  • Back-end: Structured data, domain logic, permissions, compliance
  • Interface layer: Traditional UI plus AI-driven, conversational or agent-based access

For many users, the AI becomes the primary operating environment for work.

The Strategic Dilemma for SaaS Companies

This creates a thorny set of questions for product teams:

  • If the UI isn’t the primary engagement point, what’s your differentiator?
  • If AI agents call your API directly, who owns the customer relationship?
  • If multiple LLMs are hitting your endpoints, how do you enforce security, governance, and tenancy isolation?

SaaS companies have historically optimized for UI/UX, feature depth, and native integrations. Now they also need to optimize for:

  • API completeness and consistency
  • Machine-readable action schemas
  • Monitoring of AI-driven traffic
  • Secure mediation between external agents and internal systems

The API is no longer just “for integrations”: it is the interface.

This Isn’t the Death of SaaS

SaaS as a category is fine. The underlying value of cloud software still matters and still drives real business outcomes.

What’s changing is the surface area. The UI was the front door for the last 20 years. Going forward, it’ll share that role with AI-driven interaction.

The front-end won’t vanish overnight, but it won’t be the only front door anymore. The companies that architect for headless, AI-mediated usage early will define the next era of SaaS. The ones that wait may find their API strategy overwhelmed before they’ve had a chance to adapt.

The directional signal is clear. The question is whether you’re building for it now, or scrambling to catch up later.

This post is an adapted version of an article originally published on the Cyclr blog. All credit for the original ideas and content goes to Cyclr CEO, Fraser Davidson.

dotInsights | March 2026

Did you know? The async and await keywords in C# were introduced in C# 5.0 (2012) to simplify asynchronous programming. Under the hood, the compiler uses a state machine to transform your asynchronous code into manageable tasks. As a developer, you never need to worry about that complexity.

Welcome to dotInsights by JetBrains! This newsletter is the home for recent .NET and software development related information.

🔗 Links

Here’s the latest from the developer community.

  • The Skill That Separates Good Developers from GREAT ONES 🎥 – Emily Bache
  • Predicting the Next Edit in JetBrains IDEs 🎥 – Michelle Frost
  • You’re Refactoring When You Should Be Deleting 🎥 – Gui Ferreira
  • Async Await Just Got A Massive Improvement in .NET 🎥 – Nick Chapsas
  • Simplifying Grid Layout in .NET MAUI Using Extension Methods – Leomaris Reyes
  • Why Small Changes Turn Into Big Refactors – CodeOpinion by Derek Comartin
  • Lease Pattern in .NET: A Lock With an Expiration Date That Saves Your Data – Chris Woodruff
  • An ode to “Slowly” handcrafted code – Urs Enzler
  • Creating standard and “observable” instruments – Andrew Lock
  • Announcing the Duende IdentityServer4 Migration Analysis Tool – Khalid Abuhakmeh & Maarten Balliauw
  • Encrypting Properties with System.Text.Json and a TypeInfoResolver Modifier (Part 2) and Encrypting Properties with System.Text.Json and a TypeInfoResolver Modifier (Part 1) – Steve Gordon
  • Introducing MoreSpeakers.com and The Technology Behind MoreSpeakers.com – Joseph Guadagno
  • Writing a .NET Garbage Collector in C# – Part 7: Marking handles – Kevin Gosse
  • A minimal way to integrate Aspire into your existing project – Tim Deschryver
  • WinUI Tips & Tricks for WinForms Developers – Greg Lutz
  • AI-Powered Smart TextArea for ASP.NET Core: Smarter Typing with Intelligent Autocompletion – Arun Kumar Ragu
  • Building a Greenfield System with the Critter Stack – Jeremy D. Miller
  • Are exceptions exposing vulnerabilities in your .NET App? – David Grace
  • Use client assertions in ASP.NET Core using OpenID Connect, OAuth DPoP and OAuth PAR – Damien Bowden
  • I Started Programming When I Was 7. I’m 50 Now, and the Thing I Loved Has Changed – James Randall
  • Public Speaking at Tech Events 101: Being Uncomfortable Is Worth It – Lou Creemers
  • Ralph Wiggum Explained: Stop Telling AI What You Want — Tell It What Blocks You – Matt Mattei
  • Implementing strongly-typed IDs in .NET for safer domain models – Ali Hamza Ansari
  • Automatic Service Discovery in C# with Needlr: How It Works – Nick Cosentino

☕ Coffee Break

Take a break to catch some fun social posts.

😅 American friends…

Coding then vs coding now….

🗞️ JetBrains News

What’s going on at JetBrains? Find out here:

📊 Check out our Developer Ecosystem Survey: The State of .NET 2025 📊

  • C# Extension Members
  • Rider 2025.3: Day-One Support for .NET 10 and C# 14, a New Default UI, and Faster Startup
  • Open Source in Focus: .NET Projects and the Tools Behind Them

✉️ Comments? Questions? Send us an email.

Subscribe to dotInsights