What production-ready AI agent systems look like

Many discussions about open source AI agents start with the same image: a single assistant responding to prompts. That model works well for demonstrations, but it breaks down quickly in production.

One of our speakers for the AI track at OCX 26, Luca Bianchi, explained in an interview, “a production system generally uses a lot of models, not just one model doing back and forth with the user.” Once systems move beyond experimentation, even common patterns such as retrieval-augmented generation become multi-stage pipelines rather than simple request–response flows.

In theory, this looks straightforward. Knowledge is embedded, queries are encoded, and results are retrieved based on semantic distance. 

In practice, production constraints surface immediately. Luca described how, in a real system, similarity scores should have ranged from 0 to 1, but instead clustered between 0.5 and 0.7, making it difficult to distinguish results. Solving that problem required additional steps: re-ranking, metadata-based filtering, query rewriting, and selective composition.
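A minimal sketch of such a multi-stage pipeline, using toy vectors and invented metadata (the scoring, the recency boost, and the document set are all illustrative assumptions, not a production design):

```python
import math

def cosine(a, b):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# toy corpus: (text, embedding, metadata)
docs = [
    ("refund policy", [0.9, 0.1], {"source": "faq", "year": 2024}),
    ("shipping times", [0.7, 0.3], {"source": "blog", "year": 2020}),
    ("returns process", [0.8, 0.2], {"source": "faq", "year": 2023}),
]
query_vec = [1.0, 0.0]

# Stage 1: dense retrieval. Note that raw scores cluster near the top of the
# range, which mirrors the discrimination problem described above.
candidates = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)

# Stage 2: metadata-based filtering (keep only curated FAQ entries)
filtered = [d for d in candidates if d[2]["source"] == "faq"]

# Stage 3: re-ranking, here with a small recency boost
reranked = sorted(
    filtered,
    key=lambda d: cosine(query_vec, d[1]) + 0.01 * (d[2]["year"] - 2020),
    reverse=True,
)
```

In a real system each stage would typically be its own model (an embedder, a cross-encoder re-ranker, a query rewriter), which is exactly how the pipeline accumulates models.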

Each of those steps introduces another model into the system. Luca mentioned, “You end up building a complex pipeline of many different models, and this is just for RAG (Retrieval-Augmented Generation).” When teams move further into agentic architectures, orchestration becomes unavoidable. A controlling agent must route requests to sub-agents, each of which invokes its own workflows and models. In production environments, these agentic pipelines place very different demands on latency, cost, and orchestration than a single assistant responding to prompts.

This is where the limits of the “single assistant” model become clear. Latency and cost compound across the pipeline. If each model in the chain takes tens of seconds to respond, end-to-end latency quickly compounds into minutes.

At that point, system design is no longer about prompts or raw model capability. It is about how pipelines are structured, how responsibilities are split across models, and how orchestration is handled. Production-ready AI agents are not assistants. They are pipelines, and their success depends on engineering decisions made early.

In this session at OCX 26 in Brussels, Luca Bianchi will break down how real production-ready AI agent ecosystems are designed, using concrete examples of multi-model pipelines and orchestration. Attendees will gain a practical understanding of how agentic systems evolve beyond single assistants and of the architectural decisions that determine whether those pipelines remain usable at scale. 

 


Daniela Nastase


Classifying Amazon Reviews with Python: From Raw Text to 88% Accuracy

Ever wondered how businesses know if customers are happy or not? In this project, I built a machine learning model that classifies Amazon product reviews as Positive or Negative using NLP techniques. Here’s how I did it.

  1. The Dataset
    I used the Amazon Review Polarity Dataset — sampling 200,000 reviews for training and 50,000 for testing. The dataset was perfectly balanced between positive and negative reviews, which is ideal for classification.

  2. Cleaning the Text
    Raw reviews are messy. I wrote a preprocessing function to lowercase the text, strip punctuation and numbers, and remove stopwords using NLTK. This keeps the model focused on meaningful words instead of noise.

import re
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))  # requires nltk.download("stopwords") once

def clean_text(text):
    text = str(text).lower()
    text = re.sub(r"[^\w\s]", "", text)  # strip punctuation
    text = re.sub(r"\d+", "", text)      # strip numbers
    words = [word for word in text.split() if word not in stop_words]
    return " ".join(words)
  3. Converting Text to Numbers with TF-IDF
    Machine learning models need numbers, not words. TF-IDF weighs words by how unique they are to each review — common words like “the” get ignored, meaningful words like “terrible” get prioritised.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=5000, min_df=5, max_df=0.9)
X_train = vectorizer.fit_transform(train_df["clean_text"])  # fit vocabulary on training data only
X_test = vectorizer.transform(test_df["clean_text"])        # reuse it for the test split
  4. Training & Comparing Models
    I trained and compared three models — Logistic Regression, Naive Bayes, and Linear SVM. Logistic Regression performed best and was used for the final evaluation.
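A condensed sketch of that comparison on a toy corpus (the handful of reviews here are invented for illustration; the scikit-learn model classes match the project, with default hyperparameters):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# tiny invented sample, standing in for the 200k-review training split
texts = [
    "great product love it", "best purchase ever", "works great highly recommend",
    "terrible waste of money", "completely useless broke fast", "awful do not buy",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = Positive, 0 = Negative

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "NaiveBayes": MultinomialNB(),
    "LinearSVM": LinearSVC(),
}
# fit each model and record its training accuracy for a side-by-side look
scores = {name: m.fit(X, labels).score(X, labels) for name, m in models.items()}
```

On the real dataset you would of course score against the held-out test split rather than the training data.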

  5. Results
    Tested on 50,000 reviews:

    Metric     Negative   Positive
    Precision  0.89       0.88
    Recall     0.88       0.89
    F1-Score   0.88       0.89

    Overall Accuracy: 88% — balanced performance across both classes.

  6. Real-Time Predictions

Model identifying positive and negative reviews

def predict_sentiment(text):
    cleaned = clean_text(text)
    vectorized = vectorizer.transform([cleaned])
    prediction = model.predict(vectorized)[0]  # model = the trained Logistic Regression
    return "Positive" if prediction == 1 else "Negative"

“This product is amazing!” -> Positive
“Completely useless, waste of money” -> Negative

  7. Visualizations
    Three charts helped tell the story:

Sentiment distribution — confirmed the dataset was balanced

Word cloud — top positive words: great, love, best

Confusion matrix — symmetric errors, no class bias

What I Learned
Working at this scale (250k reviews) taught me that clean data and a balanced dataset matter more than model complexity. Logistic Regression beat fancier approaches simply because the data was well prepared.
Next steps: hyperparameter tuning, cross-validation, and eventually a BERT-based model for higher accuracy.
Full code on my GitHub — feel free to clone and try it on your own dataset!

Found this helpful? Drop a like or leave a comment below!

Automated Code Review: Benefits, Tools & Implementation (2026 Guide)

Code review has become the single biggest bottleneck in modern software development. As AI coding tools accelerate generation, with 41% of all code now AI-assisted, review queues have ballooned, creating a paradox where individual developer speed rises while organizational throughput stalls or declines. The DORA 2024 report found that a 25% increase in AI tool adoption correlated with a 7.2% decrease in delivery stability, largely because AI enables larger changesets that overwhelm review capacity.

This guide walks you through the three levels of automated code review. From basic linting through Static Analysis to AI-powered semantic analysis, you will see how to implement a system that turns review from a bottleneck into a competitive advantage.

The stakes are real. Research consistently shows that a bug caught in production costs 10x more than one found during design, with some estimates putting that multiplier as high as 100x. The Consortium for IT Software Quality pegs the total US cost of poor software quality at $2.41 trillion annually. Yet analysis of 730,000+ pull requests across 26,000 developers reveals that PRs sit idle for 5 out of every 7 days of cycle time. Automated code review directly attacks this gap by catching defects earlier, accelerating merge velocity, and freeing human reviewers to focus on architecture and business logic.

The AI code explosion has made review the new constraint

A 2025 Faros AI study of 10,000+ developers found that engineers using AI tools complete 21% more tasks and merge 98% more PRs, but PR review time increased by 91%. Teams that once handled 10 to 15 PRs per week now face 50 to 100. Features that take 2 hours to generate can require 4 hours to review. LinearB’s 2025 benchmark of 8.1 million PRs confirmed the pattern: AI-generated PRs wait 4.6x longer before a reviewer picks them up.

More code is entering pipelines than human reviewers can properly validate. A CodeRabbit analysis of 470 GitHub PRs found AI-generated code produces 1.7x more issues than human-written code, logic errors up 75%, security vulnerabilities up 1.5 to 2x, and performance inefficiencies appearing 8x more frequently. The Sonar 2026 State of Code survey confirmed that 96% of developers don’t fully trust AI-generated code’s functional accuracy, yet only 48% always verify it before committing.

The cycle of increasing pressure

DORA’s 2024 research identified the root cause: AI tools violate small-batch principles by enabling larger changesets that increase risk. Elite-performing teams deploy multiple times daily with sub-5% change failure rates. However, AI adoption without review automation pushes teams toward larger batches, eroding the very practices that make elite performance possible. The path forward is automating the review process itself, not just code generation.

Level 1: linting and formatting eliminate the noise

The foundation of any automated review system is deterministic tooling that enforces consistency and catches syntax-level issues before they reach human reviewers. This layer eliminates style debates entirely and ensures every PR starts from a clean baseline.

Linters analyse your code for logical errors, anti-patterns, and style violations. Rather than checking whether code runs, they encode your team’s standards as rules applied automatically on every change. Formatters handle a narrower but equally important job: they take any valid code and rewrite it into a single canonical style, making diffs cleaner and reviews faster. The two tools work in tandem, with the linter catching what you mean, and the formatter controlling how it looks.

In the JavaScript ecosystem, ESLint and Prettier are the dominant tools for these roles respectively, and both saw significant releases in early 2026. ESLint’s v10 completed a multi-year architectural overhaul, added multithreading for large codebases, and expanded beyond JavaScript to cover CSS, HTML, JSON, and Markdown. Prettier’s v3.8 introduced a Rust-powered CLI with meaningful speed improvements. Together they cover virtually every file type in a modern web project.

Implementing both via GitHub Actions is straightforward and should be the first automation any team deploys:

name: Code Quality
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npx eslint . --cache --max-warnings 0
      - run: npx prettier --check .

In CI, run formatters in --check mode (developers should fix issues locally) and enforce passing checks via branch protection rules. Adding ESLint caching and parallel jobs per language keeps feedback under 30 seconds, which is critical for developer adoption. Pre-commit hooks using tools like Husky and lint-staged catch issues before they even reach CI.

Level 2: SAST and security scanning catch what linters miss

Static Application Security Testing tools analyse code for vulnerabilities, complexity, and deeper quality issues that pattern-based linters cannot detect. SonarQube Server 2026.1 LTA leads this category with support for 30+ languages, advanced taint analysis tracking data flow across functions and files, and detection of OWASP Top 10 vulnerabilities including SQL injection, XSS, SSRF, command injection, and path traversal. SonarQube’s AI CodeFix feature uses LLMs to generate remediation suggestions for detected issues, while its AI Code Assurance capability automatically identifies and applies stricter quality gates to AI-generated code.

SAST tools commonly detect injection flaws (SQL injection, XSS, command injection, LDAP injection, SSRF, and XXE), data exposure issues (hardcoded secrets and credentials, sensitive data in logs, missing encryption), memory and buffer issues (buffer overflows, use-after-free, integer overflows), and input validation failures (path traversal, insecure deserialization, unvalidated redirects).
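As a concrete illustration of the first category, the injection flaw that SAST taint analysis flags is untrusted input flowing into a query string; the fix is parameterization. A sketch using Python's built-in sqlite3 (table and function names are invented for the example):

```python
import sqlite3

def find_user_unsafe(conn, name):
    # FLAGGED by taint analysis: attacker-controlled `name` is interpolated
    # directly into the SQL string (classic SQL injection)
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name):
    # parameterized query: the driver binds `name` as data, never as SQL
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "' OR '1'='1"
# the unsafe version leaks every row; the parameterized version returns none
```

Pattern-based linters generally miss this because the string interpolation is syntactically valid; tracking the tainted value across the call is exactly what taint analysis adds.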

The different levels of automated review

Detection rates vary significantly. On the OWASP Benchmark, modern AI-enhanced SAST tools like Qwiet AI have achieved 100% true positive rates with 25% false positive rates, while traditional tools historically scored around 33%. SonarQube achieves false positive rates as low as 1% on mature codebases. The key advance in 2025 to 2026 has been combining SAST with LLM-based post-processing. One study showed this combination reduced false positives by 91% compared to standalone Semgrep scanning.

SonarQube’s Clean as You Code philosophy, where quality gates apply only to new code rather than the entire codebase, makes adoption practical for legacy projects. Configure gates to fail on any new blocker or critical vulnerability, while incrementally addressing existing technical debt. This approach follows a zero-noise principle: only flag issues developers can act on right now.

Level 3: AI-powered review and workflow platforms change everything

The most significant shift in 2025 to 2026 has been the emergence of AI-powered code review that understands code semantics, developer intent, and project context, moving well beyond pattern matching into genuine comprehension. This is where platforms like Graphite operate, combining AI review intelligence with workflow automation to address the full “outer loop” of development.

The AI foundation is now proven. Anthropic’s Claude model family powers multiple code review tools across the Claude Sonnet, Haiku, and Opus tiers, balancing capability, speed, and cost for different review workloads. Claude Code includes a built-in /code-review command that launches four parallel review agents, scores issues by confidence, and surfaces only findings above an 80% confidence threshold — important for managing false positives.

Graphite exemplifies the Level 3 platform approach. Following its acquisition by Cursor in December 2025 (at a valuation exceeding its previous $290M), Graphite serves 100,000+ developers across 500+ companies including Shopify, Snowflake, Figma, and Notion. Its thesis: AI tools have dramatically accelerated the “inner loop” of writing code, making the “outer loop” of review, merge, and deploy the new constraint. Graphite addresses this with four integrated capabilities.

Graphite Agent provides AI-powered PR review built on Anthropic’s Claude. Unlike general-purpose AI reviewers with a 5-15% false positive rate, it achieves a 5-8% false positive rate through multi-step validation including voting, chain-of-reasoning, and self-critique. The results are compelling: 67% of AI suggestions lead to actual code changes, and the tool maintains a 96% positive feedback rate from developers. You can define custom review rules in plain language, something like “ensure auth-service never makes direct database calls”, and Graphite Agent enforces them on every PR.

Stacked PRs directly address the batch-size problem identified by DORA. Analysis of 50,000+ PRs shows defect detection rates drop from 87% for PRs under 100 lines to just 28% for PRs over 1,000 lines. Stacking breaks large features into small, dependent PRs that build on each other. Graphite’s CLI (gt stack submit) manages the entire stack lifecycle including automatic recursive rebasing. The impact is measurable: Semgrep saw a 65% increase in code shipped per engineer after adopting stacking, while Shopify reports 33% more PRs shipped per developer.

Merge Queue is the only stack-aware merge queue available, processing dependent PRs in parallel while ensuring the main branch stays green. It supports batching multiple PRs to reduce CI costs and hot-fix prioritization for critical changes.

Customer metrics demonstrate the platform effect. Ramp achieved a 74% decrease in median time between merged PRs (from 10 hours to 3). Asana engineers shipped 21% more code and saved 7 hours per week per engineer within 30 days. Across all customers, the average Graphite user merges 26% more PRs while reducing median PR size by 8 to 11%.

Rolling out automation without overwhelming your team

The most common failure mode is deploying too many blocking checks at once, triggering alert fatigue that erodes developer trust. Research shows false positives are the number-one adoption killer for automated review tools. The solution is a progressive, trust-building rollout.

Phase 1 (Weeks 1 to 4): Foundation. Deploy ESLint and Prettier as non-blocking CI checks. Add PR size warnings for changes exceeding 400 lines. Establish baseline metrics: current cycle time, defect escape rate, and PR merge frequency. This phase should be completely frictionless — developers see suggestions but are never blocked.
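The PR-size warning in Phase 1 can be as simple as a non-blocking CI step; a minimal sketch (the 400-line threshold comes from the text above, the function name is invented):

```python
def pr_size_warning(lines_changed: int, limit: int = 400):
    """Return a warning string for oversized PRs, or None. Never blocks the merge."""
    if lines_changed > limit:
        return (
            f"warning: this PR touches {lines_changed} lines "
            f"(soft limit {limit}); consider splitting it"
        )
    return None
```

In CI this would post the string as a PR comment when present and exit 0 either way, matching the "completely frictionless" goal of Phase 1.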

Phase 2 (Weeks 5 to 10): Security gates. Introduce SonarQube or equivalent SAST scanning in advisory mode. Configure severity thresholds so only critical security findings (SQL injection, hardcoded secrets) become blocking. All other findings appear as PR comments. Begin tracking false positive rates and tune rules aggressively — a finding that never gets fixed is noise, not signal.

Phase 3 (Weeks 11 to 16): AI-powered review. Enable Graphite Agent or equivalent AI review as a non-blocking reviewer. Start with 1 to 3 volunteer teams who provide feedback on suggestion quality. Use this phase to configure custom team rules and calibrate the AI to your codebase’s conventions. The key metric to track is acceptance rate — the percentage of AI comments that result in code changes.

Phase 4 (Week 17+): Full platform. Introduce stacked PR workflows, merge queue automation, and promote AI review to soft-gate status (require acknowledgment of critical findings). Implement productivity insights to measure before/after impact.

The stages of AI code review adoption

Three principles govern successful rollouts. First, start non-blocking and graduate to blocking only after false positive rates stabilize below 5%. Second, integrate into existing workflows. Review feedback should appear as inline PR comments, not in separate dashboards. Third, measure and share wins: when developers see that automated review caught a real bug or saved them 30 minutes, adoption becomes self-reinforcing.

The cost equation favors aggressive automation

The financial case for automated code review is straightforward to model. A team processing 200 PRs monthly that saves 20 minutes of reviewer time per PR at an $80 loaded rate generates roughly $64,000 in annual savings from review efficiency alone. Blocking even 10 high-severity bugs per quarter that would have cost $5,000 each in production adds another $200,000 in avoided remediation costs. Against typical platform costs of $20,000 to $40,000 annually for a 25-person team, the total benefit of roughly $264,000 delivers an ROI of between 6:1 and 13:1 in the first year, depending on platform tier.
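The model is easy to sanity-check with a quick calculation (all inputs are the article's illustrative assumptions, not measured data):

```python
# review-efficiency savings
prs_per_month = 200
minutes_saved_per_pr = 20
loaded_hourly_rate = 80  # USD

review_savings = prs_per_month * 12 * minutes_saved_per_pr / 60 * loaded_hourly_rate
# 2,400 PRs/year * 20 min = 800 hours * $80 = $64,000

# avoided production-bug remediation
bugs_blocked_per_quarter = 10
cost_per_production_bug = 5_000
avoided_remediation = bugs_blocked_per_quarter * 4 * cost_per_production_bug  # $200,000

total_benefit = review_savings + avoided_remediation  # $264,000
roi_low = total_benefit / 40_000   # against the high end of platform cost
roi_high = total_benefit / 20_000  # against the low end
```

Swapping in your own PR volume, loaded rate, and escape-rate figures takes seconds, which is the point: the case survives even pessimistic inputs.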

The deeper value is strategic, though. DORA research consistently shows that elite teams combine fast delivery with high stability, and they achieve this through small batches, automated testing, and rapid feedback loops. Automated code review is the mechanism that makes this possible at scale, especially as AI-generated code volumes continue to grow. Teams that treat review as an afterthought will face compounding technical debt: 75% of technology decision-makers are projected to face moderate-to-severe technical debt from AI-speed practices by end of 2026.

Conclusion

The automated code review landscape in 2026 has matured into a clear three-level stack.

Level 1: Linting with ESLint and Prettier. This is table stakes that every team should have deployed.
Level 2: SAST with tools like SonarQube. This catches security vulnerabilities and code smells that linters miss.
Level 3: AI-powered semantic review combined with workflow automation. This represents the frontier, and it’s where the highest-impact gains live.

Platforms like Graphite that integrate AI review, stacked PRs, and merge automation into a unified system address the full outer-loop bottleneck rather than just one piece of it. The data is clear: small PRs reviewed by AI catch 3x more defects than large PRs reviewed by humans alone, and teams using integrated automation platforms ship 20 to 65% more code while maintaining or improving quality. For engineering leaders, the question is no longer whether to automate code review, but how quickly you can reach Level 3.

🚀 Vibe Coding Tools Are Changing the Way We Build Software

A few years ago, building an app meant writing hundreds or thousands of lines of code. Today, things are different. Welcome to the world of vibe coding, where you describe what you want, and AI helps turn that idea into real, working code.

Instead of spending hours debugging or writing boilerplate, developers now collaborate with AI tools that generate, fix, and improve code instantly. It feels less like traditional programming and more like guiding the “vibe” of what you want to build.

Here are some powerful vibe-coding tools developers are loving right now:

Cursor – An AI-powered code editor that can generate files, refactor code, and even fix bugs automatically.
Replit – A cloud coding platform where you can build and deploy apps directly from your browser.
GitHub Copilot – Your AI coding partner that suggests functions and completes code as you type.
Vercel v0 – Perfect for turning UI prompts into beautiful React components instantly.
Lovable – Generate full-stack applications just by explaining your idea.
Bolt – Quickly create and test apps in the browser using AI prompts.

💡 Why vibe coding is trending:
• Faster development
• Easier prototyping
• Less repetitive coding
• Perfect for startups and indie developers

But remember: AI helps you build faster, yet understanding the code still matters if you want stable and secure apps.

The future of coding isn’t just writing code…
It’s collaborating with AI to bring ideas to life faster than ever.

Do you think vibe coding will become the new normal for developers?

Claude Code Framework Preference Bias and Developer Marketing

Something quietly strange is happening inside AI-assisted development workflows. Claude Code—Anthropic’s agentic coding tool—doesn’t just write code. It recommends frameworks. And those recommendations aren’t always neutral.

The pattern is drawing attention from developers who’ve noticed Claude Code steering toward specific stacks in ways that feel less like engineering judgment and more like a popularity contest. Whether that’s a training artifact, a reflection of documentation quality across frameworks, or something more intentional, the implications for developer tooling decisions are worth examining carefully.

Key Takeaways

  • Claude Code’s framework recommendations show measurable bias toward well-documented frameworks like Next.js and React, likely reflecting training data distribution rather than objective technical merit.
  • Anthropic’s growing integration of Claude Code into marketing automation workflows—demonstrated across multiple 2026 community tutorials—creates a conflict of interest in how the tool surfaces recommendations.
  • Developers relying on Claude Code for stack decisions without cross-checking against framework-specific benchmarks risk optimizing for AI familiarity rather than project fit.
  • The Claude Code framework preference bias dynamic is expected to intensify as AI coding tools capture a larger share of the junior-to-mid developer workflow.
  • Framework communities with thinner documentation coverage face a structural disadvantage in AI-assisted project scaffolding, regardless of technical quality.

How We Got Here

Claude launched in March 2023. By late 2024, Anthropic had shipped Claude Code as a standalone agentic tool capable of multi-step programming tasks—not just autocomplete, but full project scaffolding, dependency selection, and architecture recommendations.

That’s a significant shift. When a developer asks Claude Code to “spin up a new web app,” the tool doesn’t just write code. It chooses. React or Vue? Express or Fastify? Supabase or PlanetScale? Each of those choices carries downstream consequences for months of development work.

The timeline matters. Claude 3.5 Sonnet (released mid-2024) demonstrated substantially improved coding benchmarks—scoring 49% on SWE-bench Verified, according to Anthropic’s published model card. Claude 3.7 Sonnet, released in February 2025, pushed further with extended thinking capabilities specifically tuned for agentic workflows. By early 2026, Claude Code had become a default scaffolding layer for a non-trivial slice of greenfield projects.

Parallel to this, Anthropic’s ecosystem partners began shipping Claude Code-powered marketing automation tools—the kind that auto-generate landing pages, email sequences, and content pipelines. Stormy AI’s agentic marketing documentation explicitly frames Claude Code as the orchestration layer for growth workflows. YouTube tutorials like “Claude Skills: Build Your First AI Marketing Team in 16 Minutes” have accumulated significant developer mindshare.

The convergence is the issue. Claude Code is simultaneously a coding tool and increasingly embedded in marketing infrastructure. That dual role creates conditions where framework bias isn’t just a technical curiosity—it’s a business vector.

The Bias Pattern: What Developers Are Seeing

The core complaint is consistent: Claude Code defaults to the same short list of frameworks regardless of project constraints. Ask it to scaffold a backend API and it reaches for Express.js or FastAPI. Ask for a frontend and it defaults to Next.js or React. Ask for a database layer and it gravitates toward PostgreSQL-backed ORMs.

None of those choices are wrong. They’re often reasonable. But “reasonable default” and “best fit for your specific project” are different things.

The mechanism behind this is almost certainly training data distribution. React has dramatically more Stack Overflow threads, GitHub repositories, and documentation pages than Svelte or SolidJS. Next.js has orders of magnitude more indexed tutorial content than Remix or Astro circa 2023–2024—when Claude’s core training likely crystallized. Claude Code’s recommendations are, at least partially, a reflection of documentation density rather than framework quality.
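The documentation-density effect can be made concrete with a toy calculation (the volume numbers below are invented purely to illustrate the skew, not real corpus counts):

```python
# invented documentation volumes (arbitrary units), standing in for
# training-data density across frameworks
doc_volume = {"React": 1000, "Vue": 400, "Svelte": 60, "SolidJS": 25}

total = sum(doc_volume.values())
share = {name: vol / total for name, vol in doc_volume.items()}

# a model trained on this corpus encounters React ~17x as often as Svelte,
# so its "defaults" track documentation density, not framework quality
skew = doc_volume["React"] / doc_volume["Svelte"]
```

Nothing in that calculation measures technical merit, which is exactly the point: the default falls out of the corpus, not a comparison.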

Think of it as search engine result bias. Not conspiracy, but structural advantage baked into the data pipeline.

The Marketing Angle: When Tooling Becomes a Channel

The framework preference bias gets sharper when you examine who benefits from these defaults.

Frameworks with enterprise backing—Vercel (Next.js), Meta (React), Microsoft (TypeScript)—have invested heavily in documentation, tutorials, and community presence. That investment translates directly into training data volume. When Claude Code defaults to Next.js, it’s partly because Vercel has spent years ensuring Next.js is the best-documented React framework on the internet.

That’s not a scandal. It’s a rational content strategy that happens to produce a feedback loop: better docs → more training data → more AI recommendations → more adoption → more investment in docs.

But developers should know that’s what’s happening. The recommendations coming out of Claude Code aren’t agnostic engineering opinions. They carry the weight of documentation investment and—increasingly—explicit commercial relationships as AI tooling integrates deeper into SaaS ecosystems.

Comparing Your Options

Criteria            Claude Code                  GitHub Copilot             Manual Research
Speed               Seconds                      Seconds                    Hours
Bias Source         Training data distribution   Training data + telemetry  Developer experience
Transparency        Low                          Low                        High
Framework Coverage  Broad but weighted           Broad but weighted         Project-specific
Update Lag          Model training cycle         Model training cycle       Real-time
Best For            Rapid scaffolding            In-editor completion       Strategic stack decisions

Both Claude Code and GitHub Copilot carry structural bias toward high-documentation frameworks. Manual research is slower but surfaces niche frameworks—SvelteKit for performance-critical SPAs, Hono for edge-native APIs—that AI tools consistently underweight.

The trade-off isn’t “AI bad, manual good.” It’s about knowing what each source optimizes for.

For teams shipping fast, Claude Code’s bias toward well-supported frameworks actually reduces risk. React and PostgreSQL have massive community support, which means debugging resources exist at every turn. The gravity toward popular stacks is a feature if your team prioritizes hiring pipelines and long-term maintainability over raw performance optimization.

But for specialized workloads—edge computing, WebAssembly targets, real-time systems—that same bias becomes a liability. Claude Code doesn’t consistently recommend Rust-based frameworks for WASM-heavy projects or Cloudflare Workers-native tooling like Hono, because those ecosystems, despite rapid growth in 2025–2026, haven’t yet accumulated the documentation density needed to shift AI recommendations. The technical quality is there. The training signal isn’t.

Practical Implications

If you’re a developer or engineer: Letting Claude Code make stack decisions without cross-referencing framework-specific benchmarks—like TechEmpower’s Web Framework Benchmarks or State of JS 2025 survey data—means outsourcing a strategic decision to a system that doesn’t know your performance requirements or your team’s actual skill set.

If you’re leading an engineering team: Treat Claude Code recommendations as a starting hypothesis, not a conclusion. Document why you chose a framework—not just what Claude Code suggested. That creates accountability and forces genuine evaluation before a decision calcifies into six months of technical debt.

If you’re thinking about end users: Framework choices affect product performance and shipping velocity. Apps scaffolded onto heavy client-side React when a leaner alternative would have fit better ship slower. That’s a user experience problem that traces directly back to tooling bias.

What to Do About It

Short-term (next 1–3 months):

  • When Claude Code scaffolds a project, explicitly ask: “What alternatives exist, and why might they be better for a [specific constraint] project?”
  • Cross-check against State of JS 2025 satisfaction scores—not just popularity metrics
  • Build a team-specific prompt template that includes your stack constraints upfront

Longer-term (next 6–12 months):

  • Watch for Anthropic’s model cards to include training data composition disclosures—developers are already pushing for this
  • Evaluate whether your organization wants to build internally fine-tuned models that reflect your actual stack preferences
  • Track how framework communities are investing in documentation specifically to influence AI training pipelines

What Comes Next

The bottom line:

  • Claude Code’s framework defaults reflect training data distribution, not objective technical ranking
  • The overlap between Claude Code as a coding tool and its role in marketing automation creates structural incentives worth monitoring
  • Popular frameworks with strong documentation pipelines will continue to benefit disproportionately from AI recommendations
  • Manual framework evaluation remains necessary for any project with specific performance, scale, or niche requirements

Over the next 6–12 months, expect framework communities to invest explicitly in “AI-training-friendly” documentation—structured, comprehensive, high-volume. That’s already happening. Vercel’s documentation team, Remix’s contributor guides, and FastAPI’s tutorial library all read like they were written with LLM training in mind. That arms race will only sharpen.

The mindset shift worth making: treat AI framework recommendations the way you treat Google search results. Useful signal, not final answer. Claude Code’s suggestions tell you what’s popular and well-documented. What they don’t tell you is whether that’s actually the right choice for your problem.

This approach can fail quietly. Teams discover the mismatch six months in, after the scaffolding has hardened into architecture. By then, switching costs are real.

What frameworks has your team found Claude Code consistently under-recommending? The answer probably says something interesting about where documentation investment hasn’t caught up with technical quality.

Is SaaS Dead?

There’s been a lot of noise lately about whether SaaS is dead. Spoiler: it’s not. But the way people use SaaS is changing in a pretty significant way.

Media history offers a pattern here: radio didn’t kill newspapers, TV didn’t kill radio, and streaming didn’t kill TV. But each shift changed how people consumed media, and those who adapted survived. SaaS is about to face its own version of that shift.

The “Headless SaaS” Wave

Here’s the change that’s coming: a significant chunk of SaaS users will stop using SaaS UIs directly. Instead, they’ll use AI agents and LLMs to do it for them.

So instead of logging in, navigating dashboards and clicking through workflows, users issue commands through a conversational interface:

  • “Update that record.”
  • “Pull last quarter’s churn drivers.”
  • “Generate a renewal forecast.”
  • “Create onboarding tasks for this new client.”

The SaaS app doesn’t disappear. It becomes infrastructure, handling the stuff that actually requires structure: data integrity, permissions, compliance, domain logic. The AI layer just sits on top and acts as the interface.

What This Means for the SaaS Stack

Right now, most SaaS products are optimized around the UI. Product investment has focused on features, workflows, and dashboards, and for good reason: that’s where users spent their time.

As AI agents become more capable, though, a bigger share of users will operate “headlessly.” They’ll delegate execution to an AI and never open the dashboard. The SaaS back-end still does all the work. The front-end just becomes one of several possible entry points.

The future stack looks something like:

  • Back-end: Structured data, domain logic, permissions, compliance
  • Interface layer: Traditional UI plus AI-driven, conversational or agent-based access

For many users, the AI becomes the primary operating environment for work.
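
As a rough sketch of this split, the same domain function can sit behind both entry points. All names below are hypothetical, not any real SaaS API:

```typescript
// Hypothetical sketch: one back-end operation, two entry points.

interface ChurnReport {
  quarter: string;
  topDrivers: string[];
}

// Back-end: domain logic, permissions, and data integrity live here.
// It neither knows nor cares which interface invoked it.
function churnDrivers(quarter: string): ChurnReport {
  // A real system would query structured data and enforce tenancy here.
  return { quarter, topDrivers: ["pricing", "onboarding friction"] };
}

// Entry point 1: the traditional UI. A route handler serializes the
// result for the dashboard to render.
function dashboardHandler(quarter: string): string {
  return JSON.stringify(churnDrivers(quarter));
}

// Entry point 2: the AI layer. "Pull last quarter's churn drivers"
// becomes a tool call returning the structured object to the agent.
function agentToolCall(args: { quarter: string }): ChurnReport {
  return churnDrivers(args.quarter);
}
```

The point of the sketch is that nothing in the domain function changes when the AI layer arrives; only the number of front doors does.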

The Strategic Dilemma for SaaS Companies

This creates a thorny set of questions for product teams:

  • If the UI isn’t the primary engagement point, what’s your differentiator?
  • If AI agents call your API directly, who owns the customer relationship?
  • If multiple LLMs are hitting your endpoints, how do you enforce security, governance, and tenancy isolation?

SaaS companies have historically optimized for UI/UX, feature depth, and native integrations. Now they also need to optimize for:

  • API completeness and consistency
  • Machine-readable action schemas
  • Monitoring of AI-driven traffic
  • Secure mediation between external agents and internal systems
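
To make the second item concrete, here is what a machine-readable action schema might look like, sketched in TypeScript. The action name, fields, and scope strings are illustrative assumptions, not any vendor's real contract:

```typescript
// Hypothetical action schema for an agent-callable SaaS endpoint.

type JsonSchema = {
  type: string;
  description?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
};

interface ActionDefinition {
  name: string;            // stable identifier the agent invokes
  description: string;     // natural-language hint for the LLM
  inputSchema: JsonSchema; // machine-readable contract for arguments
  scopes: string[];        // permissions the back-end enforces
}

const createOnboardingTasks: ActionDefinition = {
  name: "create_onboarding_tasks",
  description: "Create the standard onboarding task list for a new client.",
  inputSchema: {
    type: "object",
    properties: {
      clientId: { type: "string", description: "ID of the new client" },
      template: { type: "string", description: "Onboarding template to apply" },
    },
    required: ["clientId"],
  },
  scopes: ["tasks:write"],
};
```

Publishing contracts like this is what lets an external agent map “create onboarding tasks for this new client” to a concrete, permission-checked endpoint instead of guessing at UI flows.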

The API is no longer just “for integrations”; it is the interface.

This Isn’t the Death of SaaS

SaaS as a category is fine. The underlying value of cloud software still matters and still drives real business outcomes.

What’s changing is the surface area. The UI was the front door for the last 20 years. Going forward, it’ll share that role with AI-driven interaction.

The front-end won’t vanish overnight, but it won’t be the only front door anymore. The companies that architect for headless, AI-mediated usage early will define the next era of SaaS. The ones that wait may find their API strategy overwhelmed before they’ve had a chance to adapt.

The directional signal is clear. The question is whether you’re building for it now, or scrambling to catch up later.

This post is an adapted version of an article originally published on the Cyclr blog. All credit for the original ideas and content goes to Cyclr CEO, Fraser Davidson.

dotInsights | March 2026

Did you know? The async and await keywords in C# were introduced in C# 5.0 (2012) to simplify asynchronous programming. Under the hood, the compiler transforms your asynchronous method into a state machine that manages its tasks, so you rarely need to deal with that complexity directly.

Welcome to dotInsights by JetBrains! This newsletter is your home for recent .NET and software development news and information.

🔗 Links

Here’s the latest from the developer community.

  • The Skill That Separates Good Developers from GREAT ONES 🎥 – Emily Bache
  • Predicting the Next Edit in JetBrains IDEs 🎥 – Michelle Frost
  • You’re Refactoring When You Should Be Deleting 🎥 – Gui Ferreira
  • Async Await Just Got A Massive Improvement in .NET 🎥 – Nick Chapsas
  • Simplifying Grid Layout in .NET MAUI Using Extension Methods – Leomaris Reyes
  • Why Small Changes Turn Into Big Refactors – CodeOpinion by Derek Comartin
  • Lease Pattern in .NET: A Lock With an Expiration Date That Saves Your Data – Chris Woodruff
  • An ode to “Slowly” handcrafted code – Urs Enzler
  • Creating standard and “observable” instruments – Andrew Lock
  • Announcing the Duende IdentityServer4 Migration Analysis Tool – Khalid Abuhakmeh & Maarten Balliauw
  • Encrypting Properties with System.Text.Json and a TypeInfoResolver Modifier (Part 2) and Encrypting Properties with System.Text.Json and a TypeInfoResolver Modifier (Part 1) – Steve Gordon
  • Introducing MoreSpeakers.com and The Technology Behind MoreSpeakers.com – Joseph Guadagno
  • Writing a .NET Garbage Collector in C#  – Part 7: Marking handles – Kevin Gosse
  • A minimal way to integrate Aspire into your existing project – Tim Deschryver
  • WinUI Tips & Tricks for WinForms Developers – Greg Lutz
  • AI-Powered Smart TextArea for ASP.NET Core: Smarter Typing with Intelligent Autocompletion – Arun Kumar Ragu
  • Building a Greenfield System with the Critter Stack – Jeremy D. Miller
  • Are exceptions exposing vulnerabilities in your .NET App? – David Grace
  • Use client assertions in ASP.NET Core using OpenID Connect, OAuth DPoP and OAuth PAR – Damien Bowden
  • I Started Programming When I Was 7. I’m 50 Now, and the Thing I Loved Has Changed – James Randall
  • Public Speaking at Tech Events 101: Being Uncomfortable Is Worth It – Lou Creemers
  • Ralph Wiggum Explained: Stop Telling AI What You Want — Tell It What Blocks You – Matt Mattei
  • Implementing strongly-typed IDs in .NET for safer domain models – Ali Hamza Ansari
  • Automatic Service Discovery in C# with Needlr: How It Works – Nick Cosentino

☕ Coffee Break

Take a break to catch some fun social posts.

😅 American friends…

Coding then vs coding now….

🗞️ JetBrains News

What’s going on at JetBrains? Find out here:

📊 Check out our Developer Ecosystem Survey: The State of .NET 2025 📊

  • C# Extension Members
  • Rider 2025.3: Day-One Support for .NET 10 and C# 14, a New Default UI, and Faster Startup
  • Open Source in Focus: .NET Projects and the Tools Behind Them

✉️ Comments? Questions? Send us an email.

Subscribe to dotInsights

Cursor Joined the ACP Registry and Is Now Live in Your JetBrains IDE

Cursor is now available as an AI agent inside JetBrains IDEs through the Agent Client Protocol. Select it from the agent picker, and it gets full access to your project.

If you’ve spent any time in the AI coding space, you already know Cursor. It has been one of the most requested additions to the ACP Registry.

What you get

Cursor is known for its AI-native, agentic workflows. JetBrains IDEs are valued for deep code intelligence – refactoring, debugging, code quality checks, and the tooling professionals rely on at scale. ACP brings the two together.

You can now use Cursor’s agentic capabilities directly inside your JetBrains IDE – within the workflows and features you already use. 

A growing open ecosystem

Cursor joins a growing list of agents available through ACP in JetBrains IDEs. Every new addition to the ACP Registry means you have more choice – while still working inside the IDE you already rely on. You get access to frontier models from major providers, including OpenAI, Anthropic, Google, and now also Cursor.

This is part of our open ecosystem strategy. Plug in the agents you want and work in the IDE you love – without getting locked into a single solution.

Cursor is focused on building the best way to build software with AI. By integrating Cursor with JetBrains IDEs, we’re excited to provide teams with powerful agentic capabilities in the environments where they’re already working.

– Jordan Topoleski, COO at Cursor

Get started

You need version 2025.3.2 or later of your JetBrains IDE with the AI Assistant plugin enabled. From there, open the agent selector, select Install from ACP Registry…, install Cursor, and start working. You don’t need a JetBrains AI subscription to use Cursor as an AI agent.

The ACP Registry keeps growing, and many agents have already joined it – with more on the way. Try it today with Cursor and experience agent-driven development inside your JetBrains IDE. For more information about the Agent Client Protocol, see our original announcement and the blog post on the ACP Agent Registry support.