Logic Apps Agent Loop + MCP: Two Bugs Worth Knowing About

I spent the long weekend pushing Logic Apps MCP server capabilities further than I had before — and hit two bugs worth documenting. Both are filed. If you’re building in this space, save yourself the debugging time.

Context

If you’ve been following along, the MCP server and BODMAS Agent are covered in the previous posts. This post is just about what broke when I wired them together.

Bug 1 — Intermittent duplicate key error at tool registration

What happens

The Agent Loop fails with a BadRequest before making a single MCP call:

HTTP request failed: 'An item with the same key has already been added. Key: {tool_name}'.

The key referenced in the error — BasicArithmeticMCP, ExtendedArithmeticMCP, whatever you name it — appears exactly once in the workflow definition. There is no actual duplicate in the JSON.

What makes it particularly frustrating to diagnose

It is intermittent. With the same workflow, the same expressions, and the same input, some runs fail and others succeed; nothing changes between a failing run and a succeeding one.

Load test

I fired 5 to 10 parallel requests at the Agent Loop as a mini stress test. Several runs in the batch failed with the duplicate key error.

Sequential calls with proper spacing between them worked fine.

What you can’t do

The Agent action has a default retry policy, but it does not help here. That policy retries only on 408, 429, and 5xx responses and on connectivity failures; a 400 BadRequest is a client error and is not retried. So even with retries configured, the duplicate key error causes an immediate terminal failure. There is no clean in-workflow workaround.

Bug 2 — MCP Connector does not support OAuth

What happens

Both the MCP server and the MCP client are Logic Apps Standard. When OAuth is configured on the MCP server side, the server workflow never triggers: requests don’t reach the Logic App and no run is created. The OAuth setup corrupts the connection at design time, with two visible symptoms:

  • Tools don’t load, but the workflow still saves.
  • Sending a request returns a 502 Bad Gateway error.

The same endpoint called directly from Postman with a valid bearer token works fine.

Why it matters

To get the Agent Loop working, the MCP server has to run with either anonymous authentication or key-based authentication. OAuth simply does not work with the built-in MCP client connector.

Current state

Both issues are filed on the Logic Apps GitHub repo:

Agent Loop: “An item with the same key has already been added” when using McpClientTool

The issue covers both bugs with full workflow JSON, reproduction steps, and screenshots. If you’ve hit either of these, add a reaction or comment — the more signal on the issue, the better.

What works in the meantime

  • Set "type": "anonymous" in the McpServerEndpoints authentication block in host.json — removes the OAuth blocker for dev and demo use
  • Accept the intermittent failure rate on the Agent Loop and re-trigger manually when it hits — not a fix, but the success rate is high enough to keep building and testing
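If re-triggering by hand gets tedious, the same workaround can live in the caller. The following is a minimal sketch, not a Logic Apps feature, and it assumes your workflow surfaces the Agent action’s failure to the HTTP caller (for example through a Response action). It serializes calls, since sequential requests were the ones that succeeded, and retries only a 400 whose body contains the duplicate-key message, since the action’s own retry policy won’t. The `send` callable, the attempt count, and the backoff are hypothetical placeholders.

```python
import threading
import time
from typing import Callable, Tuple

# Substring of the Bug 1 error; matching on it keeps every other 400 terminal.
DUPLICATE_KEY_MARKER = "An item with the same key has already been added"

# Sequential calls avoided the error in testing, so never let two calls overlap.
_call_lock = threading.Lock()


def call_agent_loop(
    send: Callable[[dict], Tuple[int, str]],
    payload: dict,
    max_attempts: int = 3,
    backoff_seconds: float = 2.0,
) -> Tuple[int, str]:
    """Invoke the workflow via `send`, retrying only the duplicate-key 400.

    `send` is a hypothetical callable that posts `payload` to the workflow's
    trigger URL and returns (status_code, response_body).
    """
    status, body = 0, ""
    for attempt in range(1, max_attempts + 1):
        with _call_lock:
            status, body = send(payload)
        is_duplicate_key = status == 400 and DUPLICATE_KEY_MARKER in body
        if not is_duplicate_key or attempt == max_attempts:
            return status, body
        # Back off outside the lock so other callers are not blocked while we wait.
        time.sleep(backoff_seconds * attempt)
    return status, body
```

Matching on the message text is deliberately narrow: any other 400 still fails fast. The lock only serializes calls within one process, so callers running elsewhere can still overlap.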

Both issues are filed. If you hit either of them, the GitHub issue is the right place to add signal.

Mythos Found a 27-Year-Old Bug in OpenBSD. Your Code Is Next.

Anthropic’s new Mythos Preview surfaced a 27-year-old vulnerability in OpenBSD — the most-audited operating system in commercial software — and generated 181 working Firefox exploits in a benchmark where Claude Opus 4.6 managed two. Eleven organizations are inside the launch cohort. The rest of us aren’t, and the next Mythos won’t be gated.

What Mythos is, in hard numbers

On April 7, Anthropic announced Claude Mythos Preview, a frontier general-purpose model with a step-change in computer security capability. The numbers are the story:

  • A 27-year-old vulnerability in OpenBSD, surfaced by Mythos in the TCP SACK implementation. OpenBSD’s audit posture is the high bar in the industry.
  • A 16-year-old vulnerability in FFmpeg’s H.264 codec — the media component shipped in nearly every modern browser and video pipeline.
  • A 17-year-old remote code execution vulnerability in FreeBSD’s NFS implementation (CVE-2026-4747).
  • Linux kernel vulnerabilities autonomously chained by the model into a complete privilege escalation to root.
  • 181 working Firefox exploits in a benchmark where Claude Opus 4.6 produced two — an order-of-magnitude leap in a single model generation.
  • 271 vulnerabilities patched in Firefox 150 after Mozilla used an early version of Mythos Preview to scan its codebase. Mozilla described the model as “every bit as capable” as the best human security researchers.
  • Thousands of zero-days identified in operating systems, browsers, and infrastructure software in the weeks before announcement.

Anthropic was clear about something else worth dwelling on: the company did not explicitly train Mythos for these capabilities. They emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model a better defender make it a better attacker. That equivalence is the whole story.

Mythos isn’t a security tool. It’s a frontier model that happens to be very good at a security task that turns out to require general intelligence. The distinction matters: capability of this kind doesn’t stay siloed.

The asymmetry just collapsed

For thirty years, the offensive-defensive asymmetry in software security was: attackers needed to find one bug, defenders needed to find all of them. The economics favored attackers — but only because finding bugs was hard, slow, and required deep human expertise.

Mythos didn’t flip the asymmetry. It collapsed the cost difference between the two activities. The same model that can find thousands of zero-days for a defender can find thousands of zero-days for an attacker. There is no “attacker mode” and “defender mode.” There is one capability with two uses, and the user picks.

For the launch cohort inside Project Glasswing (including Microsoft, Google, Apple, AWS, JPMorganChase, NVIDIA, the Linux Foundation, and major security vendors), this is a defensive windfall. They get to find and patch their own bugs before anyone else can. For everyone else, the math is uglier. When this class of capability becomes broadly available (and it will), the same scan that takes Apple a quiet weekend will take a determined adversary the same quiet weekend.

What this changes about threat modeling

Pre-Mythos, the assumption underlying most enterprise risk frameworks was that vulnerabilities cost time to discover. Post-Mythos, that assumption no longer holds for sophisticated actors. The vulnerabilities are already there, in code that’s already deployed. The only question is who finds them first.

Project Glasswing’s narrow gate

Anthropic’s response to the dual-use problem is Project Glasswing: instead of releasing Mythos publicly, the model is gated to vetted partners doing defensive security work on critical infrastructure. The launch cohort is eleven outside organizations — AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks — with another forty-plus organizations given extended access. Anthropic has committed $100M in Mythos usage credits and additional funding to upstream open-source security ($2.5M to Alpha-Omega and OpenSSF, $1.5M to the Apache Software Foundation). On April 21, Bloomberg and TechCrunch reported that a small group of unauthorized users — reportedly a third-party Anthropic contractor who guessed the model’s online location — had accessed Mythos on the same day Anthropic announced the limited release.

The Glasswing structure is a reasonable response to a hard problem. The cohort is a serious set of defenders, the Linux Foundation’s inclusion broadens the open-source impact, and the upstream funding commitments are not trivial. But the structure has implications worth thinking through:

  • The launch cohort is well-resourced and concentrated. Megacaps, major security vendors, and one open-source foundation. Most enterprises, healthcare systems, utilities, and government agencies are not in the launch cohort.
  • The cohort is the world’s biggest target. Concentrating frontier offensive capability inside a known list of well-resourced firms makes those firms far more valuable to compromise. The April 21 unauthorized-access incident is an early warning, not an outlier.
  • The gate is temporary. The capability emerged from general intelligence improvements. Other labs are on the same trajectory. Within twelve to twenty-four months, equivalent capability will be available somewhere — through a competitor, an open-weights model, or a leak. Anthropic’s caution buys the industry time. It does not buy the industry safety.
  • The defenders inside the gate have a head start. The defenders outside the gate don’t. By the time Mythos-class capability is broadly available, the cohort will have spent a year hardening their stacks. Everyone else will be starting cold.

None of this is criticism of Glasswing. It’s a description of where the rest of the industry sits: outside the gate, on the clock, with a window of a year or so to spend on infrastructure that doesn’t assume bug discovery is expensive.

Why your legacy stack is the easy target

If Mythos found a bug in OpenBSD that survived twenty-seven years of obsessive auditing, what does it find in code that’s been quietly running in production since 1998 with no audit at all?

Legacy systems are uniquely exposed to this class of capability for reasons that have nothing to do with their original quality:

  • The code was written in a different threat model. COBOL batch jobs, C-based middleware, and FORTRAN scientific computing were written assuming network isolation, trusted operators, and small adversary budgets. None of those assumptions hold today.
  • The maintainers are gone. The engineers who wrote the original code retired a decade ago. The people who maintain it now read it; they don’t reason about it. A capable adversary scanning the same code reasons about it just fine.
  • The scale is enormous. A typical Fortune 100 enterprise runs millions of lines of legacy code. Manual audit is impossible at this volume; automated tools were built for the threat model where bug discovery was expensive. Mythos-class capability inverts that economics.
  • Latent bugs accumulate. Old code has been running long enough that bugs which never triggered in production are still sitting there, and decades of uptime are not evidence of correctness. The defects are there. They just haven’t been found yet.
  • The patch path is brittle. Even when a bug is found in a legacy system, the cost of patching is often catastrophic — recompiling a forty-year-old build chain, validating against a forty-year-old behavior contract, regression-testing dependencies that may no longer have maintainers. “We can’t patch this” is a common honest answer for legacy systems, and adversaries know it.

The 27-year-old OpenBSD bug is the canary. OpenBSD is among the most-audited code in the world. Your COBOL payroll system, your FORTRAN actuarial engine, your C-based supply chain ETL: they have not had that audit. They are just as old. They are not as hardened.

The honest framing is this: Mythos-class capability does not introduce new vulnerabilities. It surfaces vulnerabilities that have been latent in your systems for years or decades. The defects are already there. The economics of finding them just changed.

The defender’s playbook for the next 90 days

If we accept that Mythos-class capability will be broadly available within twenty-four months and that legacy systems are the most exposed surface, the defensive question is what to do this quarter that materially reduces risk. Five things worth prioritizing.

1. Get an honest inventory of your legacy attack surface

Most enterprises do not have an accurate inventory of what legacy code they actually run, what it touches, and what depends on it. The first step is unglamorous: catalog the legacy systems, their network exposure, the data they process, and the dependencies that would break if they went down. You cannot defend what you cannot see.

2. Build the SBOM you should already have

A Software Bill of Materials isn’t a compliance artifact; it’s the data structure you need to answer the question “is the new zero-day in our stack?” in minutes instead of weeks. Federal contractors will need one for compliance under recent OMB guidance. Build it now, before the next Mythos disclosure forces the question.
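To make “minutes instead of weeks” concrete, here is a minimal sketch that assumes a CycloneDX JSON SBOM (a top-level `components` list whose entries carry `name` and `version`). The advisory dict shape is hypothetical, and exact version matching is a simplification: real advisories usually express version ranges and identify packages by purl.

```python
import json


def find_affected_components(sbom_json: str, advisory: dict) -> list[dict]:
    """Return SBOM components whose name and version match the advisory.

    sbom_json: a CycloneDX JSON document.
    advisory: hypothetical shape {"name": "<package>", "versions": ["<v1>", ...]}.
    """
    sbom = json.loads(sbom_json)
    affected_versions = set(advisory["versions"])
    matches = []
    # CycloneDX components can nest (a library bundling another), so walk all levels.
    pending = list(sbom.get("components", []))
    while pending:
        component = pending.pop()
        pending.extend(component.get("components", []))
        if (
            component.get("name") == advisory["name"]
            and component.get("version") in affected_versions
        ):
            matches.append(component)
    return matches
```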

3. Modernize the highest-exposure legacy primitives first

Total legacy modernization is a multi-year program. Prioritized modernization isn’t. Identify the legacy components with (a) network exposure, (b) sensitive data flow, and (c) no maintainer — and modernize those first. Pull the C-based parser out of the perimeter. Replace the COBOL service that processes external data with a memory-safe equivalent. Leave the back-office batch job for next year.
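The three criteria translate directly into an ordering over the inventory from step 1. A minimal sketch; the `LegacyComponent` fields and the one-point-per-criterion ranking are illustrative assumptions, not a standard scoring model.

```python
from dataclasses import dataclass


@dataclass
class LegacyComponent:
    # Illustrative inventory fields mirroring criteria (a), (b), and (c).
    name: str
    network_exposed: bool
    handles_sensitive_data: bool
    has_maintainer: bool


def criteria_met(c: LegacyComponent) -> int:
    """Count how many of the three risk criteria a component meets (0-3)."""
    return sum([c.network_exposed, c.handles_sensitive_data, not c.has_maintainer])


def modernization_order(components: list[LegacyComponent]) -> list[LegacyComponent]:
    """Components meeting the most criteria come first; ties keep inventory order."""
    return sorted(components, key=criteria_met, reverse=True)
```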

4. Assume the patch tsunami is coming

If Mythos-class scanning produces ten thousand findings against your stack, your security team cannot triage ten thousand findings by hand. Invest in automated patch prioritization, exploit-prediction scoring (EPSS), and patch-deployment automation now — before you need it under pressure. The bottleneck of the next two years is not finding bugs. It’s deciding which ones to patch first and shipping the patches without breaking production.
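One way to start on prioritization is to combine exploit likelihood (the EPSS probability of exploitation in the next 30 days, published by FIRST) with impact (the CVSS base score) and boost anything internet-facing. The sketch below does that; the multiplication and the 2.0 exposure weight are assumptions to calibrate against your own environment, not an industry formula.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    finding_id: str
    epss: float             # FIRST EPSS: probability of exploitation in the next 30 days (0.0-1.0)
    cvss: float             # CVSS base score (0.0-10.0)
    internet_exposed: bool


def priority_score(finding: Finding, exposure_weight: float = 2.0) -> float:
    """Hypothetical weighting: likelihood x impact, doubled for internet-facing assets."""
    score = finding.epss * finding.cvss
    return score * exposure_weight if finding.internet_exposed else score


def triage(findings: list[Finding], top_n: int = 10) -> list[Finding]:
    """Highest-priority findings first, truncated to what the team can act on."""
    return sorted(findings, key=priority_score, reverse=True)[:top_n]
```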

5. Threat-model with AI-assisted attackers in scope

Update your threat models to assume adversaries have Mythos-class capability. The questions change. “What’s our mean-time-to-detect?” matters more than “Is this code vulnerable?” (it almost certainly is). “What’s the blast radius if a single legacy primitive is fully compromised?” matters more than “Is this primitive likely to be compromised?” (it is more likely than it was). Defense in depth, network segmentation, and rapid containment become first-class controls, not best-practice nice-to-haves.

The shift in posture

Pre-Mythos: defenders optimize for bug-finding cost. Post-Mythos: defenders optimize for time-to-patch and blast-radius containment, because bugs will be found whether you find them first or someone else does.

A note for federal contractors

Federal contractors and agencies have an extra layer of implications: the procurement and compliance machinery that governs federal software is going to reckon with this, slowly but inexorably. Expect SBOM and provenance requirements (already mandated under EO 14028) to get enforced in earnest. Expect NIST SSDF / SP 800-218 to shift from documentation to continuous attestation. Expect legacy waivers to become harder to defend, with risk-acceptance memos required to explicitly acknowledge Mythos-class threats. Expect patch SLAs to compress: sub-week response on high-severity findings against widely deployed primitives is the realistic floor, not the ceiling. Vendor due-diligence will move from annual questionnaires to continuous attestation.

The realistic posture for the next twenty-four months is not “modernize everything.” It is “modernize the exposed surface, instrument the rest, and assume the rest will eventually be reached.” The agencies and primes that prepare for that reality now will not be the ones writing breach-notification letters in 2027.

The honest read

Mythos is not a doomsday model. It is a step on a curve that the entire industry has been on for several years, and Anthropic’s decision to gate it through Glasswing is, in our view, the responsible move. We don’t think the right reaction is panic, and we don’t think the right reaction is dismissal.

The right reaction is to use the Glasswing window, the twelve to twenty-four months where this capability is concentrated in a small group of vetted organizations and a national-security agency, to do the unglamorous defensive work that everyone has been deferring. Inventory the legacy. Build the SBOM. Modernize the exposed primitives. Automate the patch path. Threat-model with AI-assisted attackers in scope.

We don’t know exactly when the next Mythos lands or who ships it. We do know it will not be gated like this one. The defenders who used the window will be fine. The defenders who didn’t will be writing the postmortem.

Codavyn helps enterprise and federal teams modernize the exposed surface of legacy stacks before AI-assisted scanning catches up. Custom software, modernization, and a threat model that assumes the attacker is reading your code as fast as you are. See our modernization services or book a 30-minute risk review.

How to Prevent IDOR Vulnerabilities in Django REST APIs

An authenticated user changes /api/orders/42/ to /api/orders/43/ and reads someone else’s order. No privilege escalation needed — the endpoint just returns it. This is IDOR in its simplest form, and it’s endemic in Django REST Framework code because DRF makes it trivially easy to wire up a ModelViewSet that exposes every object in a table. The authentication layer does its job; the authorization layer was never written.

How IDOR Attacks Work Against Django REST APIs

IDOR (Insecure Direct Object Reference) happens when an API accepts a user-controlled identifier — a URL path segment, query param, or request body field — and retrieves the corresponding object without verifying that the requesting user has any right to it. Authentication proves who you are. Authorization proves what you can touch. Most IDOR bugs exist because the first check was implemented and the second was skipped.

A typical attack against a vulnerable DRF app:

  1. Attacker authenticates as alice@example.com and creates an order. The response contains {"id": 101, ...}.
  2. Attacker sends GET /api/orders/100/. The API returns Bob’s order because nothing checks ownership.
  3. Attacker scripts a loop from ID 1 to 10000, dumps every order in the database. Sequential integer PKs make enumeration take seconds.
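The three-step attack above needs nothing more than a loop. The framework-free sketch below models it with a hypothetical in-memory order table: `vulnerable_retrieve` stands in for a ViewSet whose `get_queryset()` returns everything, and `scoped_retrieve` for one that filters by owner first. Only the first leaks other users’ orders, and the scoped version returns the same 404 for “not yours” and “doesn’t exist”.

```python
# Hypothetical order table: sequential integer IDs, each order owned by one user.
ORDERS = {
    100: {"id": 100, "owner": "bob", "total": "42.00"},
    101: {"id": 101, "owner": "alice", "total": "99.99"},
    102: {"id": 102, "owner": "carol", "total": "15.50"},
}


def vulnerable_retrieve(user: str, order_id: int):
    # Like Order.objects.all(): the caller is authenticated, ownership is never checked.
    order = ORDERS.get(order_id)
    return (200, order) if order else (404, None)


def scoped_retrieve(user: str, order_id: int):
    # Like filtering by owner first: other users' orders look exactly like missing ones.
    order = ORDERS.get(order_id)
    if order is None or order["owner"] != user:
        return 404, None
    return 200, order


def enumerate_orders(retrieve, user: str, id_range) -> list[int]:
    """Step 3 of the attack: walk sequential IDs, keep anything owned by someone else."""
    leaked = []
    for order_id in id_range:
        status, body = retrieve(user, order_id)
        if status == 200 and body["owner"] != user:
            leaked.append(order_id)
    return leaked
```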

Here is the vulnerable ViewSet pattern we see most often in real codebases:

# views.py — VULNERABLE
from rest_framework import viewsets
from rest_framework.permissions import IsAuthenticated
from .models import Order
from .serializers import OrderSerializer

class OrderViewSet(viewsets.ModelViewSet):
    serializer_class = OrderSerializer
    permission_classes = [IsAuthenticated]  # proves identity, not ownership

    def get_queryset(self):
        # Returns every order in the database — any authenticated user
        # can retrieve, update, or delete any order by guessing its PK.
        return Order.objects.all()

IsAuthenticated blocks anonymous requests, which makes it look like the endpoint is secured. But any valid session token, including one the attacker registered themselves, satisfies it. The retrieve(), update(), partial_update(), and destroy() actions in ModelViewSet all call get_object(), which calls get_queryset() and then filters by the URL pk. Since get_queryset() returns everything, get_object() happily resolves any ID.

Fixing IDOR by Scoping Querysets to the Authenticated User

The correct fix is to scope get_queryset() to the authenticated user so that the object simply doesn’t exist from the API’s perspective if it doesn’t belong to the requester. This gives you a 404 instead of a 403, which is almost always the right behavior — a 403 confirms the resource exists and leaks information about the ID space.

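Add a second layer with a custom BasePermission that implements has_object_permission. The queryset filter covers every action, including list. DRF calls check_object_permissions from inside get_object(), so has_object_permission runs on the detail actions (retrieve, update, partial_update, destroy) but never on list, which is why the queryset filter can’t be skipped. If you write a custom action that doesn’t use get_object(), or override get_object() yourself, you have to call check_object_permissions explicitly.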

# permissions.py
from rest_framework.permissions import BasePermission

class IsOwner(BasePermission):
    def has_object_permission(self, request, view, obj):
        # Explicit ownership check — queryset scoping is the first line,
        # but we defend in depth for any path that bypasses get_queryset.
        return obj.owner == request.user

# views.py — FIXED
from rest_framework import viewsets
from rest_framework.permissions import IsAuthenticated
from .models import Order
from .serializers import OrderSerializer
from .permissions import IsOwner

class OrderViewSet(viewsets.ModelViewSet):
    serializer_class = OrderSerializer
    permission_classes = [IsAuthenticated, IsOwner]

    def get_queryset(self):
        # Scope to the requesting user at the ORM layer — objects that don't
        # belong to this user never enter the retrieval pipeline at all.
        return Order.objects.filter(owner=self.request.user).select_related("owner")

    def perform_create(self, serializer):
        # Bind the new object to the authenticated user so the POST path
        # can't accept a user-controlled owner field.
        serializer.save(owner=self.request.user)

Filtering at the queryset layer beats checking IDs inside the view body for two reasons. First, it’s impossible to forget: every action (list, retrieve, update, partial update, destroy) goes through get_queryset(). Second, it eliminates the class of bugs where ownership is checked in one handler, say GET, but never re-checked in another, say PATCH.

The same defense-in-depth principle applies to object-level auth in gRPC services and any RPC-style API where the framework doesn’t give you a queryset abstraction: filter first, check permissions on the resolved object second.

Use Unguessable Identifiers Instead of Sequential IDs

Sequential integer PKs are an enumeration gift. Once an attacker has one valid ID, they have a roadmap to every other record. Replacing exposed identifiers with UUIDs or opaque slugs doesn’t fix the authorization hole (that requires the fixes above), but it raises the cost of bulk enumeration from “write a loop” to “brute-force a 122-bit random space.”
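Some rough arithmetic on that cost difference. With sequential IDs, knowing order 101 tells an attacker that 100 and 102 almost certainly exist. With random UUIDs, each guess is a lottery ticket. The sketch assumes version-4 UUIDs (what `uuid.uuid4` generates) and a hypothetical table of ten million orders.

```python
import uuid

# A version-4 UUID has 128 bits; 4 encode the version and 2 the variant,
# which leaves 122 bits of randomness.
UUID4_RANDOM_BITS = 122


def chance_per_guess(valid_ids: int) -> float:
    """Probability that one random UUIDv4 guess matches any of `valid_ids` records."""
    return valid_ids / 2**UUID4_RANDOM_BITS


sample = uuid.uuid4()
print(sample.version)                         # 4
print(f"{chance_per_guess(10_000_000):.1e}")  # 1.9e-30
```

No remote attacker can send anywhere near 10^30 requests, which is why UUIDs defeat blind enumeration. They do nothing once an ID leaks.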

# models.py
import uuid
from django.db import models

class Order(models.Model):
    # Use UUIDField as the primary key to prevent sequential enumeration.
    # This is defense in depth — queryset scoping is still mandatory.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    owner = models.ForeignKey(
        "auth.User", on_delete=models.CASCADE, related_name="orders"
    )
    total = models.DecimalField(max_digits=10, decimal_places=2)
    created_at = models.DateTimeField(auto_now_add=True)

# urls.py
from rest_framework.routers import DefaultRouter
from .views import OrderViewSet

router = DefaultRouter()
router.register(r"orders", OrderViewSet, basename="order")
urlpatterns = router.urls

# views.py: optional additions to the fixed OrderViewSet above.
# The default lookup_field ("pk") already resolves to the UUID primary key,
# and the router's default URL pattern accepts UUIDs. Setting these makes the
# lookup explicit and rejects non-UUID path segments at the routing layer.
class OrderViewSet(viewsets.ModelViewSet):
    lookup_field = "id"  # matches the UUIDField name on the model
    lookup_value_regex = "[0-9a-fA-F-]{36}"
    # ... rest of ViewSet unchanged from the fix above

One tradeoff: UUIDs inflate index size and can slow joins on large tables. If that matters, use a separate public_id = models.UUIDField(default=uuid.uuid4, editable=False, unique=True) alongside an integer PK, set lookup_field = "public_id" on the ViewSet, and expose only public_id in serializers and URLs. The internal integer PK never appears in any HTTP response.

Never treat opaque IDs as a substitute for proper authorization. We’ve reviewed APIs that switched to UUIDs, removed the queryset scoping because “users can’t guess them now,” and then leaked UUIDs in webhook payloads, browser history, or third-party analytics — instantly making every ID known to an attacker.

Enforce Authorization at the Serializer and Nested Resource Level

Queryset scoping protects URL-path-based access. IDOR also hides in writable foreign key fields where a user submits a payload referencing another tenant’s object. A user who owns projects 10 and 11 might try {"project": 99} on a task creation endpoint to attach their task to someone else’s project.

This is especially common in multi-tenant SaaS applications where related resources belong to different organizational boundaries.

# serializers.py
from rest_framework import serializers
from .models import Task, Project

class TaskSerializer(serializers.ModelSerializer):
    class Meta:
        model = Task
        fields = ["id", "title", "project", "due_date"]

    def validate_project(self, value):
        request = self.context.get("request")
        if request is None:
            raise serializers.ValidationError("No request context available.")

        # Reject foreign keys that don't belong to the authenticated user —
        # without this check, any user can write into any project by ID.
        if not Project.objects.filter(id=value.id, owner=request.user).exists():
            raise serializers.ValidationError(
                "Project not found."  # Deliberately vague — don't confirm existence
            )
        return value

Always pass request in serializer context. DRF does this automatically when you use get_serializer() inside a view, but if you instantiate serializers directly (in management commands, signals, or background tasks), you must pass context={"request": request} manually. When there’s no request context at all — background jobs, for example — you need a different mechanism to establish the authorization boundary, typically passing the owner explicitly.
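A caveat on the validate_project approach: if project is a PrimaryKeyRelatedField with queryset=Project.objects.all(), DRF resolves the ID before validate_project runs and rejects IDs that don’t exist with its own “object does not exist” error. Another user’s real project then gets “Project not found.” instead, so the two responses still distinguish “exists” from “doesn’t exist.” Scoping the field’s queryset to request.user removes the difference; one common pattern is assigning self.fields["project"].queryset in the serializer’s __init__. The framework-free sketch below, with a hypothetical project table, shows the principle. It takes the owner as an explicit argument, which is also the shape a background job needs when there is no request.

```python
# Hypothetical project table: project id -> owner.
PROJECTS = {10: "alice", 11: "alice", 99: "bob"}


class ValidationError(Exception):
    pass


def resolve_unscoped(project_id: int, owner: str) -> int:
    # Like queryset=Project.objects.all() plus an ownership check afterwards:
    # the two failure paths produce different messages, leaking existence.
    if project_id not in PROJECTS:
        raise ValidationError(f'Invalid pk "{project_id}" - object does not exist.')
    if PROJECTS[project_id] != owner:
        raise ValidationError("Project not found.")
    return project_id


def resolve_scoped(project_id: int, owner: str) -> int:
    # Like a queryset filtered by owner: the lookup only sees this owner's rows,
    # so "missing" and "someone else's" are the same failure.
    visible = {pid for pid, o in PROJECTS.items() if o == owner}
    if project_id not in visible:
        raise ValidationError("Project not found.")
    return project_id


def error_for(resolve, project_id: int, owner: str):
    """Return the validation message, or None if the project resolved."""
    try:
        resolve(project_id, owner)
    except ValidationError as exc:
        return str(exc)
    return None
```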

The same class of bug appears in writable nested serializers. If a LineItem serializer accepts a nested order object with an id field, a user can point that id at any order. Validate every inbound relation. For more on how this nesting problem scales, the same concepts appear in authorization patterns in GraphQL APIs, where every resolver is effectively a relation that needs its own ownership check.

Test for IDOR with Automated Authorization Checks

The only reliable way to prevent IDOR regressions is to write tests that explicitly attempt cross-user access and assert they fail. Code reviews miss it. Manual QA misses it. Tests that authenticate as user B and try to touch user A’s resources catch it every time — if you write them.

# tests/test_order_idor.py
import pytest
from django.contrib.auth import get_user_model
from rest_framework.test import APIClient
from orders.models import Order

User = get_user_model()

@pytest.fixture
def alice(db):
    return User.objects.create_user(username="alice", password="testpass123")  # noqa: S106

@pytest.fixture
def bob(db):
    return User.objects.create_user(username="bob", password="testpass123")  # noqa: S106

@pytest.fixture
def alice_order(alice):
    return Order.objects.create(owner=alice, total="99.99")

@pytest.mark.django_db
class TestOrderIDOR:
    def _client_for(self, user):
        client = APIClient()
        client.force_authenticate(user=user)
        return client

    def test_bob_cannot_retrieve_alice_order(self, alice_order, bob):
        # 404, not 403 — we don't confirm the resource exists to unauthorized users.
        response = self._client_for(bob).get(f"/api/orders/{alice_order.id}/")
        assert response.status_code == 404

    def test_bob_cannot_update_alice_order(self, alice_order, bob):
        response = self._client_for(bob).patch(
            f"/api/orders/{alice_order.id}/", {"total": "0.01"}, format="json"
        )
        assert response.status_code == 404

    def test_bob_cannot_delete_alice_order(self, alice_order, bob):
        response = self._client_for(bob).delete(f"/api/orders/{alice_order.id}/")
        assert response.status_code == 404

    def test_bob_list_does_not_include_alice_order(self, alice_order, bob):
        # List endpoint must not leak cross-user data even if IDs are unknown.
        response = self._client_for(bob).get("/api/orders/")
        assert response.status_code == 200
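        # Assumes a paginated response; drop ["results"] if pagination is off.
        # Compare as strings so the check works for integer and UUID PKs alike.
        ids = {str(item["id"]) for item in response.data["results"]}
        assert str(alice_order.id) not in ids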

The list-endpoint test is easy to forget and catches a different bug: get_queryset() returning everything on list() but correctly filtering on retrieve(). Write both.

Wire these into CI as required checks. A failing IDOR test should block a merge the same way a failing unit test does. This is not optional — the whole point is that a developer adding a new ModelViewSet in a Friday pull request doesn’t ship a data leak to production by Monday.

Catch IDOR in Code Review and CI

Human review of pull requests should pattern-match on a short list of high-risk constructs. Any Model.objects.get(pk=...) or Model.objects.filter(id=...) call that doesn’t chain a user-scoping filter is a candidate IDOR. Any ViewSet without an explicit permission_classes either falls back to DEFAULT_PERMISSION_CLASSES in settings (AllowAny unless you’ve changed it) or inherits from a base class that may not have adequate defaults. Any serializer field of type PrimaryKeyRelatedField with a broad queryset is a potential cross-tenant write.

Automate this with Semgrep. Here is a rule that flags the most common pattern: an .objects.get() lookup by pk or id that doesn’t also filter on owner:

# semgrep/rules/drf-idor.yml
rules:
  - id: drf-unscoped-objects-get
    patterns:
      # Match lookups by pk or id, whatever other keyword arguments are present.
      - pattern-either:
          - pattern: $MODEL.objects.get(..., pk=$ID, ...)
          - pattern: $MODEL.objects.get(..., id=$ID, ...)
      # Skip lookups that are already scoped to an owner.
      - pattern-not: $MODEL.objects.get(..., owner=$OWNER, ...)
      - pattern-not: $MODEL.objects.get(..., owner__in=$OWNERS, ...)
    message: >
      Unscoped .objects.get() lookup by pk or id: add an owner filter or replace it
      with a queryset scoped in get_queryset(). Risk: IDOR.
    languages: [python]
    severity: ERROR
    metadata:
      cwe: CWE-639

Run this rule in your CI pipeline on every pull request. To shift IDOR checks left in your CI/CD pipeline, add it as a required status check alongside your test suite — not a separate “security scan” that developers learn to ignore.

Code review checklist for IDOR-prone patterns:

  • ModelViewSet or GenericAPIView subclass with no explicit get_queryset override — check what the default queryset returns.
  • permission_classes = [] or a ViewSet that inherits permission_classes from a base class you don’t control.
  • PrimaryKeyRelatedField(queryset=Model.objects.all()) in any writable serializer — this gives any user access to the full table.
  • perform_create or perform_update that doesn’t pin the owner field, leaving it open to user-supplied values.
  • Tests that only assert status_code == 200 for the happy path, with no cross-user negative test.
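Parts of this checklist are mechanical enough to script. The sketch below uses only Python’s standard ast module to flag three of the items: a view class with no get_queryset override, an empty permission_classes, and a PrimaryKeyRelatedField whose queryset ends in .all(). It is a hypothetical pre-review helper, not a substitute for the Semgrep rule or a reviewer: it matches names syntactically and knows nothing about imports or inheritance beyond the direct base class names.

```python
import ast

# Base classes whose subclasses should normally override get_queryset().
VIEW_BASES = {"ModelViewSet", "ReadOnlyModelViewSet", "GenericViewSet", "GenericAPIView"}


def _name(node: ast.AST) -> str:
    """Last dotted segment: viewsets.ModelViewSet -> 'ModelViewSet'."""
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def find_idor_smells(source: str) -> list[str]:
    """Flag a few IDOR-prone patterns in Python source; a triage aid, not a proof."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and {_name(b) for b in node.bases} & VIEW_BASES:
            methods = {n.name for n in node.body if isinstance(n, ast.FunctionDef)}
            if "get_queryset" not in methods:
                findings.append(f"{node.name}: no get_queryset override")
            for stmt in node.body:
                # permission_classes = [] (or ()) disables permission checks entirely.
                if (
                    isinstance(stmt, ast.Assign)
                    and any(_name(t) == "permission_classes" for t in stmt.targets)
                    and isinstance(stmt.value, (ast.List, ast.Tuple))
                    and not stmt.value.elts
                ):
                    findings.append(f"{node.name}: empty permission_classes")
        elif isinstance(node, ast.Call) and _name(node.func) == "PrimaryKeyRelatedField":
            for kw in node.keywords:
                # queryset=Model.objects.all() lets a writer reference any row in the table.
                if (
                    kw.arg == "queryset"
                    and isinstance(kw.value, ast.Call)
                    and _name(kw.value.func) == "all"
                ):
                    findings.append(f"line {node.lineno}: PrimaryKeyRelatedField with .all() queryset")
    return findings
```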

SAST tools like Semgrep will catch structural patterns; they won’t catch logic bugs where the filter is present but uses the wrong field. Code review has to cover that gap. The combination — automated rules catching the obvious omissions, human review focused on logic — is more effective than either alone.

Hardening Checklist and Next Steps

The layered controls, in priority order:

Queryset scoping (required): get_queryset() filters by request.user. No exceptions for convenience. If an admin view needs to return all objects, it lives in a separate ViewSet with explicit admin permission checks.

Object-level permissions (required): IsOwner or equivalent BasePermission with has_object_permission as a second line of defense. Attach it to every mutating ViewSet.

Serializer-level FK validation (required for relational writes): Every PrimaryKeyRelatedField or nested writable serializer validates that the referenced object belongs to request.user.

perform_create owner binding (required): Never accept owner from request data. Always call serializer.save(owner=self.request.user).

Opaque identifiers (defense in depth): UUIDs or opaque public IDs in all URLs and serializer output. The controls above are still mandatory.

Automated cross-user tests (required for CI gates): One test class per resource that authenticates as User B and asserts 404 on User A’s list, retrieve, update, and delete endpoints.

SAST rules in CI (defense in depth): Semgrep rules flagging unscoped .objects.get() and missing permission_classes, run as required checks on pull requests.

These controls address the majority of IDOR patterns in DRF, but authorization bugs extend well beyond the patterns covered here. If you want to build systematic habits around authorization review — across frameworks, auth protocols, and API types — the Application Security Engineer learning path on Code Review Lab covers the full scope, including scenarios more complex than single-tenant ownership checks.

The part most teams skip is the test suite. You can write perfect queryset scoping today and watch a future contributor add a get_object_or_404(Order, pk=pk) shortcut that bypasses it entirely. Tests that authenticate as the wrong user and assert 404 are the most direct automated check for that regression. Write them now, gate CI on them, and review them alongside any new ViewSet. If you want a reference for how IDOR shows up in security interviews and assessments, common IDOR interview questions are a useful signal for the gaps engineers typically leave in production systems.

Further Reading

  • OWASP IDOR Prevention Cheat Sheet — authoritative guidance on access control patterns across frameworks.
  • CWE-639: Authorization Bypass Through User-Controlled Key — the formal taxonomy entry with real-world consequences and detection guidance.
  • Django REST Framework: Permissions — official DRF docs on has_permission and has_object_permission, including check_object_permissions call semantics.
  • Application Security Engineer learning path on Code Review Lab — structured curriculum for building authorization review skills across multiple API paradigms.
  • PortSwigger Web Security Academy: IDOR — interactive labs that demonstrate enumeration, parameter tampering, and horizontal privilege escalation in concrete exercises.

Never trust the client with your Stripe price

I was reading a Stripe tutorial last week and watched the author write amount: req.body.amount. That single line lets any user buy Premium for $1. It’s also a common pattern in Stripe Checkout starter code. This post is about why, and how to make it impossible.

The setup

You’re building a paywalled product. You wire up Stripe Checkout, follow a popular tutorial, ship it. Looks great. Tests pass. Users are paying.

Six months later, someone opens DevTools, edits the request body, and pays €1 for your Premium plan. Your Stripe dashboard shows a successful charge. Stripe doesn’t validate your business logic. It charged what it was told to charge. Your database shows a Premium subscription. Your billing logic is doing exactly what you wrote.

This is price tampering. It happens at the one line where the server decides what to charge.

The vulnerable pattern

Here’s the shape of the bug. Paraphrased from a tutorial I won’t link. You’ve seen this shape before:

// app/api/checkout/route.ts (don't do this)
export async function POST(req: Request) {
  const { priceId, amount, plan } = await req.json();

  const session = await stripe.checkout.sessions.create({
    mode: "payment",
    line_items: [
      {
        price_data: {
          currency: "eur",
          product_data: { name: plan },
          unit_amount: amount, // attacker controls this
        },
        quantity: 1,
      },
    ],
    success_url: `${origin}/success`,
    cancel_url: `${origin}/cancel`,
  });

  return Response.json({ url: session.url });
}

The frontend POSTs { priceId: "premium", amount: 2999, plan: "Premium" }. The server passes amount straight into Stripe. Stripe charges what it’s told.

Exploiting this needs nothing fancy:

curl -X POST https://yoursite.com/api/checkout \
  -H "Content-Type: application/json" \
  -H "Cookie: session=..." \
  -d '{"priceId":"premium","amount":100,"plan":"Premium"}'

amount: 100 is €1.00 in cents. Attacker gets a Stripe Checkout link for €1, completes the payment, and your post-checkout webhook hands them Premium.

The same bug shape applies to priceId if you trust it from the client:

// Also bad. Trusting which price the client picked.
const { priceId } = await req.json();
const session = await stripe.checkout.sessions.create({
  line_items: [{ price: priceId, quantity: 1 }],
  // ...
});

If your “Hobby” plan’s priceId is price_xxx_5eur and your “Enterprise” plan’s priceId is price_xxx_500eur, an attacker swaps the value in the request body and pays €5 for Enterprise.

Why this keeps happening

Three reasons it slips through.

1. Most Stripe tutorials are demos. They want to show you Stripe in 50 lines of code, so they wire the frontend straight to the checkout endpoint. Demos become starter templates. Starter templates become production code.

2. The bug looks like working code. Real users complete real payments. Until somebody opens DevTools, you have no signal that anything is wrong. Logs, dashboards, webhooks, all green.

3. Stripe gives you both APIs. price_data (inline price definition) and price (reference to a Price object) live side by side in their docs. Inline price_data has legitimate uses (true dynamic pricing, donations, marketplace splits). But it’s the same shape as the vulnerable pattern, so the bug hides in plain sight.

The fix in one rule

The client tells you which plan the user wants. The server decides what that plan costs.

That’s it. Implementation:

// app/api/checkout/route.ts (server-determined pricing)
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

const PLANS = {
  hobby: { priceId: process.env.STRIPE_PRICE_HOBBY },
  premium: { priceId: process.env.STRIPE_PRICE_PREMIUM },
  enterprise: { priceId: process.env.STRIPE_PRICE_ENTERPRISE },
} as const;

type PlanKey = keyof typeof PLANS;

export async function POST(req: Request) {
  // Only `plan` is read from the body; every other field is ignored.
  const { plan } = (await req.json()) as { plan?: unknown };

  // 1. Validate the plan key against a server-side allowlist
  if (typeof plan !== "string" || !Object.hasOwn(PLANS, plan)) {
    return new Response("Invalid plan", { status: 400 });
  }

  // 2. Look up the priceId server-side. Never accept it from the client.
  const { priceId } = PLANS[plan as PlanKey];
  if (!priceId) {
    // Missing env var: fail closed instead of creating a broken session.
    return new Response("Plan not configured", { status: 500 });
  }

  const origin = new URL(req.url).origin;

  const session = await stripe.checkout.sessions.create({
    mode: "subscription",
    line_items: [{ price: priceId, quantity: 1 }],
    success_url: `${origin}/success`,
    cancel_url: `${origin}/cancel`,
  });

  return Response.json({ url: session.url });
}

The client sends { plan: "premium" }. That’s the most they can influence. The mapping from "premium" to a real, server-controlled priceId is unforgeable. If the attacker sends { plan: "free_lifetime" }, the allowlist check rejects it. If they send { plan: "premium", amount: 100 }, the amount field is ignored. It doesn’t exist in the server’s logic.
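The allowlist rule is easy to unit-test once it is pulled out of the route handler. Below is a minimal sketch, assuming a hypothetical `resolvePriceId` helper and placeholder price IDs (`price_hobby` and the others are not real Stripe IDs). It returns the server-side price for a known plan and `null` for anything else, whatever extra fields the body carries.

```typescript
// Hypothetical plan table for this sketch; the price IDs are placeholders,
// not real Stripe IDs. In the route handler they come from environment variables.
const PLAN_PRICES: Readonly<Record<string, string>> = {
  hobby: "price_hobby",
  premium: "price_premium",
  enterprise: "price_enterprise",
};

// Map an untrusted request body to a server-side price ID.
// Only `plan` is read; `amount`, `priceId`, and any other field are ignored.
function resolvePriceId(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;
  const plan = (body as { plan?: unknown }).plan;
  if (typeof plan !== "string") return null;
  // Own-property check: `"toString" in PLAN_PRICES` is true because `in`
  // also walks Object.prototype, so `in` is not a safe allowlist test.
  if (!Object.prototype.hasOwnProperty.call(PLAN_PRICES, plan)) return null;
  return PLAN_PRICES[plan];
}

console.log(resolvePriceId({ plan: "premium", amount: 100 }));           // price_premium
console.log(resolvePriceId({ plan: "premium", priceId: "price_FAKE" })); // price_premium
console.log(resolvePriceId({ plan: "free_lifetime" }));                  // null
console.log(resolvePriceId({ plan: "toString" }));                       // null
```

The route's `Object.hasOwn` check does the same job. The point is that the lookup is an own-property check against a table only the server controls, so a tampered body can only select a real plan or be rejected.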

For genuinely dynamic amounts (donations, custom one-off charges), you compute the amount on the server from inputs you’ve validated:

// Dynamic amount, still server-determined
const { donationCents } = await req.json();

if (
  typeof donationCents !== "number" ||
  !Number.isInteger(donationCents) || // Stripe amounts are whole cents
  donationCents < 100 ||
  donationCents > 100000
  return new Response("Invalid amount", { status: 400 });
}

const session = await stripe.checkout.sessions.create({
  mode: "payment",
  line_items: [
    {
      price_data: {
        currency: "eur",
        product_data: { name: "Donation" },
        unit_amount: donationCents,
      },
      quantity: 1,
    },
  ],
  // ...
});

The user can choose the amount, but only within bounds you’ve defined. They can’t pass unit_amount: 1 if your minimum is 100.

How to verify you don’t have this bug

A two-minute self-audit:

# 1. Open your /pricing page. Click your most expensive plan.
#    Watch the Network tab when you hit "Subscribe" or "Buy".

# 2. Find the request to your checkout-create endpoint. Copy it as cURL.

# 3. Replay it with a tampered body. Change priceId, amount, plan name,
#    quantity, anything money-shaped:
curl -X POST https://yoursite.com/api/checkout \
  -H "Content-Type: application/json" \
  -H "Cookie: <your auth cookie>" \
  -d '{"plan":"premium","priceId":"price_FAKE","amount":1,"quantity":-1}'

# 4. Check the response. If you got a Stripe Checkout URL, open it.
#    If the price shown is anything other than your real plan price, you have a bug.

If the resulting Stripe Checkout page shows the correct, original price regardless of what you sent, you’re safe. If it reflects the tampered fields, patch before you do anything else.

Three more places the same bug hides

Once “the server owns money-shaped values” clicks for you, you start seeing it everywhere.

1. Quantity. Same bug, different field. quantity: -1 in older Stripe versions caused weird negative-amount behavior. Validate quantity bounds explicitly.

2. Coupon / promo codes from client. If you let the client say “apply coupon XYZ,” the server has to verify XYZ is real, active, and applies to this plan for this user. Never just pass it through.

3. Customer ID. If the client sends { customerId } to attach the checkout to an existing Stripe customer, an attacker can swap their customerId for someone else’s. Always derive customerId from the authenticated session on the server.

The pattern: anything that influences money or attribution comes from authenticated server state, not from the request body.
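The three cases above (quantity, coupon, customer ID) can share one server-side gate. The following is a minimal sketch, not Stripe SDK code. `buildCheckoutInput`, the in-memory `COUPONS` table, `MAX_QUANTITY = 10`, and the `AuthenticatedUser` shape are all hypothetical, and in a real app the coupon lookup would hit your database or Stripe. It assumes `plan` was already checked against the plan allowlist, and it omits per-user redemption limits.

```typescript
// Hypothetical shapes for this sketch; these are not Stripe SDK types.
interface AuthenticatedUser {
  id: string;
  stripeCustomerId: string; // stored server-side when the customer was created
}

interface Coupon {
  active: boolean;
  plans: readonly string[]; // plans this coupon may be applied to
}

// Hypothetical server-side coupon table (in practice: your DB or Stripe).
const COUPONS: Readonly<Record<string, Coupon>> = {
  LAUNCH20: { active: true, plans: ["premium"] },
  EXPIRED10: { active: false, plans: ["hobby", "premium"] },
};

const MAX_QUANTITY = 10; // assumption: choose a bound that fits your product

interface CheckoutRequestBody {
  plan: string; // assumed already validated against the plan allowlist
  quantity?: unknown;
  coupon?: unknown;
  customerId?: unknown; // accepted only to show that it is ignored
}

interface CheckoutInput {
  customerId: string;
  quantity: number;
  couponCode?: string;
}

// Build checkout parameters from an untrusted body plus server state.
// Throws on invalid input instead of silently "fixing" it.
function buildCheckoutInput(body: CheckoutRequestBody, user: AuthenticatedUser): CheckoutInput {
  // 1. Quantity: an integer within explicit bounds (rejects -1, 0, 1.5, "3").
  const quantity = body.quantity ?? 1;
  if (typeof quantity !== "number" || !Number.isInteger(quantity) || quantity < 1 || quantity > MAX_QUANTITY) {
    throw new Error("Invalid quantity");
  }

  // 2. Coupon: must exist, be active, and apply to the requested plan.
  let couponCode: string | undefined;
  if (body.coupon !== undefined) {
    if (typeof body.coupon !== "string" || !Object.prototype.hasOwnProperty.call(COUPONS, body.coupon)) {
      throw new Error("Unknown coupon");
    }
    const coupon = COUPONS[body.coupon];
    if (!coupon.active || !coupon.plans.includes(body.plan)) {
      throw new Error("Coupon not applicable");
    }
    couponCode = body.coupon;
  }

  // 3. Customer: always taken from the authenticated session; body.customerId is never read.
  return { customerId: user.stripeCustomerId, quantity, couponCode };
}

const user: AuthenticatedUser = { id: "u_1", stripeCustomerId: "cus_real" };
console.log(buildCheckoutInput({ plan: "premium", quantity: 2, coupon: "LAUNCH20", customerId: "cus_victim" }, user));
```

Rejecting bad input, rather than clamping it, keeps tampering visible in your logs instead of quietly turning it into a valid checkout.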

The principle

Stripe is one of the safer payment APIs because it pushes you toward the right patterns most of the time. But it can’t enforce “client doesn’t send money values”. That’s on your code. The same principle applies anywhere the client shouldn’t have authority: authorization roles, feature flags, internal IDs, prices, plan tiers, expiration dates.

Think of a request body as a wish, not a fact. The server decides what to grant.

I run MatchResume.ai, a B2C SaaS with token-based pricing. Exactly the kind of product where this bug would have been embarrassing. The pattern above is what I wish every Stripe tutorial led with, instead of saving it for a footnote.

If you ship paid features and you’ve never tampered your own checkout request as a test, do it tonight. Two minutes, one curl, real peace of mind.

AI Can Write Your Code. But It Can’t Design Your System.

We are living in the golden age of developer productivity. With tools like Copilot and ChatGPT, you can generate hundreds of lines of boilerplate and complex API endpoints in seconds.

It feels like magic. But there is a hidden danger lurking behind that flashing cursor: If you don’t possess foundational architectural knowledge, AI will just help you build a Big Ball of Mud faster than ever before.

The “Junior Developer on Steroids”

Think of AI as the most enthusiastic, tireless, and blisteringly fast Junior Developer you’ve ever managed. It knows the syntax of every language perfectly.

But it has a fatal flaw: It defaults to the easiest path, not the right one.

If you prompt an AI to “write a function to process a user order,” it will happily give you a massive, 300-line controller method. It will hard-code the database connection, mix in the business validation, trigger a third-party payment API synchronously, and tightly couple the entire thing together.

The code will compile. The tests might even pass. But architecturally? It is a ticking time bomb.

Why Foundational Knowledge is Your Superpower

The developers who will thrive in the AI era are not the ones who can type the fastest. The future belongs to the Clarity Engineers—the developers who understand system design, tradeoffs, and architectural boundaries.

When you know software architecture, your relationship with AI completely changes. Instead of accepting its first messy draft, an architected prompt looks like this:

“Write a service class to process user orders. Ensure the core business logic is decoupled from the database using Hexagonal Architecture (Ports and Adapters). The payment processing must not be synchronous; instead, publish a domain event to a message broker so we achieve temporal decoupling.”

Suddenly, the AI isn’t just writing code. It is executing your blueprint.

The Takeaway

AI isn’t going to replace software architects. It is going to make them 10x more powerful. But to wield that power, you need to know the rules of the game so you can instruct the AI on how to play it.

My new book, Grokking Software Architecture (published by Manning Publications Co.), is the practical, conversational guide I wish I’d had when I started my journey nearly two decades ago. It’s fun, engaging, and filled with information you can start using on DAY ONE in your new job, or starting TODAY at your current job.

Don’t just accept the code the AI hands you. Learn how to hand the AI a blueprint.

Grab your Early Access (MEAP) copy at 🔥 50% OFF today during Manning’s Sitewide Sale: http://hubs.la/Q03-d27Y0

Let’s build systems that last.

Deploying a Web Page on Amazon EC2 with Nginx

Creating and Deploying an Amazon EC2 Instance

Have you ever wondered how cloud servers work, or how you can publish your own website on the internet without owning a physical server?

In this lab I’ll walk you step by step through creating an Amazon EC2 instance, explaining each required setting clearly so you can understand the process and follow it without trouble.

We won’t stop at theory, either. We’ll use Nginx to deploy a real website and customize it with our own content, making it reachable from anywhere.

Step 1: Open Amazon EC2

To start launching an Amazon EC2 instance, go to the search bar in the AWS console and type “EC2”.

When the service appears, click it to open the main dashboard. There you’ll find an orange “Launch instance” button. Select it to start creating the instance.

Step 2: Initial instance configuration

In this step we define the basic parameters of our Amazon EC2 instance.

First, we give it a name that makes it easy to identify. In this case we use “laboratorio-ec2”.

Next, we select the AMI (Amazon Machine Image), the operating system template our instance will run. The AMI includes the base system and the initial configuration it needs to work.

For this lab we choose Amazon Linux, since it is optimized for AWS, lightweight, and widely used in real-world environments.

For the instance type we use t3.micro, one of the smallest and cheapest options AWS offers:

  • It’s suitable for learning and testing
  • It’s eligible for the AWS Free Tier
  • It has enough resources for small projects

Step 3: Create the key pair

In this step we create a key pair, which lets us connect securely to our Amazon EC2 instance over SSH.

First, we give the key pair a name so we can identify it easily.

Then we select the RSA key type, one of the most widely used and compatible algorithms for SSH authentication, offering good security and ease of use.

For the format we choose .pem, the best fit for connecting from Linux, macOS, or tools like Git Bash on Windows, because it lets you use the ssh command directly.

Note that although this lab created the key pair, we did not use it to connect. Instead, we accessed the instance through EC2 Instance Connect, which connects directly from the browser without configuring the private key. Even so, .pem keys are essential in real environments and are the standard practice for secure SSH connections.

Important tip

Download this file and keep it somewhere safe. AWS lets you download the private key only once, and if you lose it you won’t be able to connect to the instance over SSH with this key pair.

Step 4: Network settings

In this step we configure access rules for our Amazon EC2 instance with a Security Group, which acts as a firewall that controls incoming traffic.

For this lab we enable the following rules:

SSH (port 22): lets us connect to the instance remotely from our machine.
HTTP (port 80): makes the website reachable from a browser.

These settings are essential. Without HTTP access, we couldn’t view the deployed web page.

That completes the configuration, and we can launch our EC2 instance.

Step 5: Connect to the instance

Once the Amazon EC2 instance is launched, we open its details, where we’ll find the option to connect.

To do this, select the instance and click “Connect”. In that section, go to the EC2 Instance Connect option, which gives us access directly from the browser with no extra configuration.

Finally, click the “Connect” button. This opens a terminal where we can interact with our instance.

Step 6: Update the system and install Nginx

This command updates the operating system, installing the latest available versions of its packages and patching known vulnerabilities.

sudo dnf update -y

This command downloads and installs Nginx on the instance, leaving it ready to configure and use.

sudo dnf install nginx -y

Step 7: Start and enable Nginx

This command starts Nginx so the web server begins handling requests.

sudo systemctl start nginx

This makes Nginx start automatically every time the instance reboots.

sudo systemctl enable nginx

Step 8: Get the public IP address

To reach our web server, we need the public IP address of the Amazon EC2 instance.

Go to the Instances panel, select the instance we created, and find the “Public IPv4 address” field in the details section.

This is the address we’ll use in the browser to view our web page.

This is the web page we created.

Step 9: Modify the web page

To customize our site’s content on the Amazon EC2 instance, we need to go to the folder where Nginx stores its web files.

First, go to that directory:

cd /usr/share/nginx/html

Then open the page’s main file:

sudo nano index.html

This file contains what the browser displays. Here we can edit it and replace the default Nginx page with our own design.

Step 10: Edit and save the web page

To customize our site, we delete the existing content of index.html and replace it with the code of our own web page.

Once the changes are made, we save them in the nano editor:

Press Ctrl + X
The editor asks whether to save the changes (Y/N)
Press Y (Yes)
Finally, press Enter to confirm the file name.

Step 11: View the web page

To see the result of our work, we use the public IP address of the Amazon EC2 instance again.

Enter this address in the browser:

http://YOUR_PUBLIC_IP

And this is the final result of our web page after the changes.

What I learned in this lab

In this lab I learned, step by step, how to launch and configure an Amazon EC2 instance. I also learned how to connect remotely with EC2 Instance Connect and how to deploy a working web server with Nginx.

I also came to understand the importance of Security Groups for controlling SSH and HTTP access, and how the public IP makes a web page reachable from the internet.

Overall, it was a useful exercise for connecting theory with practice and understanding how an application is published in the cloud.

Open-source AI I’m watching: DeepSeek V4, VibeVoice, and the n8n effect

Sunday is my day to skim what shipped, note what seems worth going deeper on, and write a short annotated list before the week catches up with me again. This week was genuinely busy: three frontier labs released major models within a 10-day window, a speech model landed quietly from Microsoft, and n8n crossed a milestone that made me rethink some assumptions.

I’m running three AI-curated directory sites built on Astro 5 + Claude Haiku 4.5. These releases matter to me not just as interesting tech but as practical inputs for what I build next.

DeepSeek V4 Preview (April 24)

DeepSeek dropped V4 on April 24: a 1.6T-parameter Mixture-of-Experts model with 49B parameters activated per forward pass, a 1M-token context window, and an MIT license. The V4-Pro and V4-Flash variants are both live via their API, with Pro at $0.30 per million tokens.

What makes this worth watching for me specifically: 49B activated parameters at that price point puts it in direct competition with Claude Haiku 4.5 for content-generation workloads. I haven’t benchmarked it against my actual task — concise, non-hallucinating product descriptions at scale — so I won’t claim it’s better. But the SWE-bench Pro number (81%) is not nothing, and the MIT license means fine-tuning on domain data is an option if I ever have the infrastructure budget for it. I don’t right now. Good to know it exists.

The other thing I’m noting: the 1M-token context window is large enough to feed an entire site’s content into a single prompt. Whether that’s useful for quality or just a headline feature, I’ll know in a month of testing.

GPT-5.5 (April 23–24)

OpenAI also dropped GPT-5.5 on April 23, with API access following the next day. The notable framing from OpenAI: this isn’t a post-training increment. They rebuilt the architecture, the pretraining corpus, and the training objectives from scratch — first time they’ve done that since GPT-4.5.

I’m watching this more cautiously than the benchmark numbers suggest I should. When pretraining changes substantially, so do second-order behaviors: emergent capabilities, failure modes, prompt sensitivities. The leaderboard tells you the headline. It doesn’t tell you how the model behaves when your prompt is ambiguous or your domain is narrow. I’ll wait 30–45 days for the community to find the edges before I run serious evals.

Microsoft VibeVoice (April 29)

Microsoft released VibeVoice on April 29, a frontier speech AI model, fully open-source and hosted on GitHub. Honest take: I haven’t used it. Speech AI isn’t in my current stack at all. But the open-source release is interesting because Microsoft has historically distributed frontier models through Azure, not GitHub.

If it holds up technically, high-quality speech AI joins the list of things you can self-host without paying a cloud API per-minute rate. That matters more for the open-source ecosystem in aggregate than it does for my specific projects. I’m flagging it because the distribution model, not the capability, is what changed.

n8n crossing 180k GitHub stars

n8n crossed 180,000 stars. It’s a workflow automation platform — visual canvas, 400+ integrations, self-hosted, fair-code license, and now with native AI workflow support built in.

Here’s the honest competitive thought this triggered: n8n can do what my GitHub Actions cron pipelines do — scrape, enrich, call Claude, publish — but without writing YAML. If a non-coder can set up an n8n flow that generates content and posts it to Dev.to, the differentiation for my approach has to come from somewhere else: speed, volume, domain-specific prompt quality, site architecture. That’s where I’m trying to compete. The milestone is a useful reminder to be honest about what is and isn’t a moat.

OpenClaw: from 9k to 210k+ stars

OpenClaw is an open-source personal AI assistant that connects to WhatsApp, Telegram, Slack, Discord, Signal, and iMessage. It went from 9,000 to over 210,000 stars in a matter of weeks earlier this year and is still climbing.

I track this not because it’s relevant to my stack, but because the growth curve is its own signal. OpenClaw didn’t solve a new technical problem — it packaged existing capabilities in a way that fit how people already communicate. That’s a distribution lesson, not a model lesson. When I think about what makes a directory site useful rather than just indexed, I keep coming back to the same question: is this packaged where people already are, or does it require them to come to me?

Five things, five different stakes. DeepSeek V4 and GPT-5.5 are direct inputs to infrastructure decisions I’ll make in the next 60 days. n8n is a competitive signal worth taking seriously. VibeVoice and OpenClaw are watching briefs — I’ll check back in 30 days and see if either has changed my thinking.

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.

AI in Journalism

I’ve been running an experiment. I wanted to see whether AI could generate opinion articles that, while written by AI, capture my personality and perspectives. My AI Daily News site was initially just a way for me to aggregate news stories about AI into something I could digest in the morning before I started work.

Later I decided to give it a range of my prior writing and have it prepare an ‘Opinion’ piece with my name on the byline. Would it produce something plausibly by me, presenting my views, but on the news of the day?

Sadly, I think there has been a fundamental change since the early days of OpenAI models, when the results were creative, unpredictable and entertaining. Now models are trained in a way that produces the same bland writing style regardless of the instructions you provide.

I got into the habit of waking up each morning, reading the ‘opinion’ and coaching Claude to rework it. Why? Because it would write opinions that conflicted with my own documented views in fundamental ways. It would use the terms I have used without internalizing the concepts behind them. So each day I had to correct it.

Multiple news organisations have banned the use of AI in journalism, and now I have first-hand experience of why. It isn’t just the opinion pieces, however; the news stories it writes also have opinions injected beyond the facts. At least for the news stories I have always linked to the original source material at the bottom, which for most stories means at least two sources.

I am not doing journalism by any measure. Journalism means doing the research, doing the interviews, cross-referencing, and creating a cohesive angle for the article. Journalism isn’t unbiased, in that it is influenced by the point of view of the writer, but journalistic integrity still means something.

Does this mean that AI can’t play a part in journalism?

My experience with AI in software has parallels. AI will happily generate code that passes ‘tests’ by altering the tests, in effect changing the conditions of success to please the user. AI does work in software development, but only when you have a framework that prevents this kind of gaming.

People have lost trust in journalism, partly as a result of AI slop: text that doesn’t differentiate between fact and fantasy. There is also the angst of journalists who, fearing for their jobs, resist and minimize the utility of AI. There is a temptation to cite ethics as a reason not to use AI when the real motivation is fear of being replaced.

I think the answer will be to apply to AI the same disciplines that apply to human journalists: checking facts and resisting the temptation to opine, while still creating compelling, entertaining and informative articles.

In my software development, AI has become a partner, but not a replacement. It still needs me to apply that discipline to get good results. Just like software, journalism could benefit from AI, but only with stringent disciplines around how it functions.

AI journalism needs to be more than a way of ripping off the work of actual journalists. It needs to engage with the real world and be held to the same standards of accuracy. The impact of AI on jobs is a larger issue, but it should not be confused with the utility of AI.

Serving HTTP/3 from Go 1.24: How It Cuts Latency for High-Traffic APIs


Go 1.24 services can serve HTTP/3 today, but not from the standard library. Go 1.24 does not ship an HTTP/3 or QUIC implementation, so the usual route is the quic-go project’s http3 package, which plugs into the existing net/http handler ecosystem. For teams running high-traffic APIs, serving HTTP/3 directly from the Go process removes the need for a separate QUIC-terminating proxy and simplifies deployment. Below, we break down how it works, why it can outperform HTTP/1.1 and HTTP/2 for high-throughput workloads, and how to migrate existing services.

Why HTTP/3 Matters for High-Traffic APIs

HTTP/3 is built on QUIC, a UDP-based transport protocol that addresses long-standing issues with TCP-based HTTP/2: transport-level head-of-line blocking, slow connection establishment, and poor performance on lossy networks. For high-traffic APIs serving millions of requests per second, these issues add up to measurable latency spikes and wasted throughput.

Key QUIC advantages include:

  • 0-RTT connection resumption: Returning clients can send requests in their first flight without waiting for a handshake, saving one to two round trips compared with TCP plus TLS, which adds up on long-distance links. 0-RTT data can be replayed by an attacker, so servers should accept it only for idempotent requests.
  • Independent streams: HTTP/2 over TCP stalls every stream on a connection when a single packet is lost, until that packet is retransmitted. QUIC delivers each stream independently, so a lost packet only delays the requests whose data it carried.
  • Integrated encryption: QUIC builds TLS 1.3 into the transport, combining the transport and cryptographic handshakes into a single round trip instead of separate TCP and TLS handshakes.

Go 1.24’s HTTP/3 Implementation

The HTTP/3 API used below comes from the github.com/quic-go/quic-go/http3 package, not from a net/http3 package in the standard library. It implements HTTP/3 (RFC 9114) on top of quic-go’s QUIC implementation (RFC 9000) and works with the existing net/http types.

Key design points of quic-go’s http3 package:

  • http3.Server takes a standard http.Handler, so existing muxes, handlers, and middleware work unchanged.
  • http3.RoundTripper implements http.RoundTripper, so it drops into an http.Client. It speaks only HTTP/3 and does not fall back to HTTP/1.1 or HTTP/2.

Benchmarking Latency Improvements

We tested a sample high-traffic API (10k requests/second, 1KB payload) across the three protocols with Go 1.24 servers. Results were measured on a 100ms RTT link between us-east-1 and eu-west-1:

Protocol | Median latency | 99th percentile latency | Throughput (req/s)
HTTP/1.1 | 112 ms | 340 ms | 8,200
HTTP/2 | 98 ms | 290 ms | 9,100
HTTP/3 | 67 ms | 180 ms | 11,400

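In this test, HTTP/3 cut median latency by about 32% versus HTTP/2 (40% versus HTTP/1.1), cut p99 latency by about 38% (47% versus HTTP/1.1), and raised throughput by about 25% versus HTTP/2. For high-traffic APIs, gains like these can mean fewer dropped requests and lower infrastructure costs.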

Migrating Your API to HTTP/3

Migration is straightforward for existing net/http users, because http3.Server accepts a standard http.Handler. For servers, you can add HTTP/3 alongside existing HTTP/1.1 and HTTP/2 listeners with a few lines of code:

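package main

import (
    "crypto/tls"
    "log"
    "net/http"

    "github.com/quic-go/quic-go/http3"
)

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/api/v1/health", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("ok"))
    })

    tlsConfig := loadTLSConfig() // Your existing TLS config

    // HTTP/3 listener on UDP :443
    h3Srv := &http3.Server{
        Handler:   mux,
        Addr:      ":443",
        TLSConfig: tlsConfig,
    }
    go func() {
        log.Fatal(h3Srv.ListenAndServe())
    }()

    // HTTP/1.1 + HTTP/2 listener on TCP :443 for backward compatibility.
    // Browsers discover HTTP/3 through the Alt-Svc header added here.
    httpSrv := &http.Server{
        Addr:      ":443",
        TLSConfig: tlsConfig,
        Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            h3Srv.SetQUICHeaders(w.Header())
            mux.ServeHTTP(w, r)
        }),
    }
    // Empty cert/key paths: certificates come from TLSConfig.
    log.Fatal(httpSrv.ListenAndServeTLS("", ""))
}

func loadTLSConfig() *tls.Config {
    // Load your TLS certificate and key here, e.g. with tls.LoadX509KeyPair.
    return &tls.Config{}
}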

Clients can enable HTTP/3 by using the http3.RoundTripper in place of the default http.Transport:

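package main

import (
    "log"
    "net/http"
    "net/http3"
)

func main() {
    client := &http.Client{Transport: &http3.RoundTripper{}}
    resp, err := client.Get("https://api.example.com/health")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    log.Println(resp.Proto, resp.Status)
}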

Considerations for Production

While Go 1.24’s HTTP/3 support is production-ready, keep these caveats in mind:

  • UDP traffic must be allowed through your firewall (HTTP/3 runs over QUIC, which typically uses UDP port 443).
  • Some legacy load balancers may not support QUIC, so test compatibility with your infrastructure first.
  • HTTP/3 server push is disabled by default, as it’s rarely useful for REST APIs.

For teams running high-traffic APIs, Go 1.24’s HTTP/3 support removes a major performance bottleneck with zero third-party dependencies. The latency and throughput gains are immediate for global user bases, making this one of the most impactful updates for Go backend developers in recent years.

The Story of How I Built a VPN Protocol: Part 1

🚨🚨🚨 Disclaimer 🚨🚨🚨

This article and the VPN itself are written for educational purposes only.

How It All Started

I recently switched to Arch. Everything started off well: I installed all the utilities I needed, then decided to install the VPN client I had been using. That's when a problem appeared: it didn't work on Arch (not even as an AppImage).

My provider also supported Shadowsocks, but instead of using it, I decided to write my own VPN, for the extra practice.

VPN Protocol

My VPN protocol is designed for maximum stealth. In my opinion, one of the most important properties here is that traffic is encrypted from the very first packet. As in Shadowsocks, this is achieved with a pre-shared key.

Encryption algorithm: ChaCha20-Poly1305.

It's also worth mentioning that the protocol runs over TCP. A random number of junk bytes is appended to each packet to obfuscate its length.

Packet Structure

Each packet starts with a 5-byte header, masked to look like encrypted data by XORing it with the first 5 bytes of the key.

  • First 2 bytes: total packet length. This is needed to find where the packet ends, because TCP delivers a byte stream and does not preserve packet boundaries.
  • Third byte: flags. Currently only 2 flags are used:

    • Bit 1: the packet is fake and should not be processed (not yet implemented).
    • Bit 2: the packet carries data for an ECDH (Elliptic Curve Diffie‑Hellman) key exchange.
  • Last 2 bytes: ciphertext length, used to separate the ciphertext from the junk bytes.
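To make the header layout concrete, here is a minimal Go sketch of encoding and masking it. Several details are assumptions not stated in the post: big-endian byte order, the total length counting the header itself, the ciphertext length including the AEAD tag, and "bit 1"/"bit 2" meaning the two lowest bits of the flags byte. The names `header`, `encodeHeader`, and `decodeHeader` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerLen is the fixed size of the masked packet header.
const headerLen = 5

// Flag bits. Mapping "bit 1" and "bit 2" to the two lowest bits is an assumption.
const (
	flagFake byte = 1 << 0 // decoy packet, must be dropped
	flagECDH byte = 1 << 1 // carries an ephemeral ECDH public key
)

// header mirrors the 5-byte layout. Big-endian byte order is an assumption.
type header struct {
	totalLen  uint16 // whole packet length (assumed to include these 5 bytes)
	flags     byte
	cipherLen uint16 // ciphertext length (assumed to include the AEAD tag)
}

// mask XORs the first 5 bytes of b with the first 5 bytes of key.
// XOR is its own inverse, so the same call masks and unmasks.
func mask(b, key []byte) {
	for i := 0; i < headerLen; i++ {
		b[i] ^= key[i]
	}
}

// encodeHeader serializes and masks h; key must be at least 5 bytes long.
func encodeHeader(h header, key []byte) []byte {
	b := make([]byte, headerLen)
	binary.BigEndian.PutUint16(b[0:2], h.totalLen)
	b[2] = h.flags
	binary.BigEndian.PutUint16(b[3:5], h.cipherLen)
	mask(b, key)
	return b
}

// decodeHeader unmasks a copy of the first 5 bytes and parses the fields.
func decodeHeader(wire, key []byte) (header, error) {
	if len(wire) < headerLen || len(key) < headerLen {
		return header{}, errors.New("header or key shorter than 5 bytes")
	}
	b := make([]byte, headerLen)
	copy(b, wire)
	mask(b, key)
	return header{
		totalLen:  binary.BigEndian.Uint16(b[0:2]),
		flags:     b[2],
		cipherLen: binary.BigEndian.Uint16(b[3:5]),
	}, nil
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // demo key only
	h := header{totalLen: 120, flags: flagECDH, cipherLen: 80}
	wire := encodeHeader(h, key)
	got, err := decodeHeader(wire, key)
	fmt.Println(err == nil && got == h, got.flags&flagECDH != 0) // true true
}
```

Because XOR with a fixed key prefix is its own inverse, the same `mask` function both hides and reveals the header. It also means that any header byte whose value an observer can guess leaks the matching key byte.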

After the header comes:

  • 12 bytes — randomly generated nonce;
  • ciphertext;
  • AEAD authentication tag;
  • junk bytes.
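The rest of the packet (nonce, ciphertext with tag, junk) can be sketched the same way. One substitution to flag: the protocol uses ChaCha20-Poly1305, but this sketch uses the standard library's AES-256-GCM so that it runs without external modules. Both take a 12-byte nonce and produce a 16-byte tag, and `chacha20poly1305.New` from golang.org/x/crypto returns the same `cipher.AEAD` interface. `sealBody`, `openBody`, and the 64-byte junk cap are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"math/big"
)

const nonceSize = 12

// newAEAD returns the packet cipher. AES-256-GCM stands in for ChaCha20-Poly1305
// here only because it ships with the standard library.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealBody builds nonce || ciphertext+tag || junk and returns the
// ciphertext+tag length that the header's last two bytes would carry.
func sealBody(aead cipher.AEAD, plaintext []byte, maxJunk int64) ([]byte, int, error) {
	nonce := make([]byte, nonceSize)
	if _, err := rand.Read(nonce); err != nil {
		return nil, 0, err
	}
	ct := aead.Seal(nil, nonce, plaintext, nil)

	n, err := rand.Int(rand.Reader, big.NewInt(maxJunk+1)) // 0..maxJunk junk bytes
	if err != nil {
		return nil, 0, err
	}
	junk := make([]byte, n.Int64())
	if _, err := rand.Read(junk); err != nil {
		return nil, 0, err
	}

	body := make([]byte, 0, nonceSize+len(ct)+len(junk))
	body = append(body, nonce...)
	body = append(body, ct...)
	body = append(body, junk...)
	return body, len(ct), nil
}

// openBody uses cipherLen to cut off the junk, then authenticates and decrypts.
func openBody(aead cipher.AEAD, body []byte, cipherLen int) ([]byte, error) {
	if len(body) < nonceSize+cipherLen {
		return nil, errors.New("truncated packet body")
	}
	nonce := body[:nonceSize]
	ct := body[nonceSize : nonceSize+cipherLen]
	return aead.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; real keys come from the handshake
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	body, cipherLen, err := sealBody(aead, []byte("ip packet"), 64)
	if err != nil {
		panic(err)
	}
	pt, err := openBody(aead, body, cipherLen)
	fmt.Println(err == nil, bytes.Equal(pt, []byte("ip packet"))) // true true
}
```

The ciphertext-length field is what lets `openBody` ignore the junk: without it, the receiver could not tell where the authenticated data ends.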

Handshake and Key Exchange

1. First packet from the client

The client sends its 16-byte username to the server (encrypted, of course).

2. Server response

If the server finds a user with that username, it:

  • sends the client a randomly generated 32-byte salt;
  • starts computing the keys:
    • sending key (server → client)
    • receiving key (server ← client)

3. Key computation on the server

The server stores the user’s password in plaintext.

  • Receiving key (for decrypting from the client) = hash(password + first 16 bytes of salt).
  • Sending key (for encrypting to the client) = hash(password + last 16 bytes of salt).

4. Client actions

The client receives the salt, decrypts it, and does the same thing, but the key roles are inverted:

  • the server's sending key is the client's receiving key, and vice versa.
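Steps 3 and 4 amount to deriving two directional keys from the password and the 32-byte salt. The post doesn't name the hash function, so SHA-256 and plain concatenation are assumptions here, as are the names `deriveKeys` and `sessionKeys`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// sessionKeys holds one key per direction.
type sessionKeys struct {
	send, recv [32]byte
}

// deriveKeys: hash(password || salt[:16]) protects client->server traffic,
// hash(password || salt[16:]) protects server->client traffic.
// SHA-256 is an assumption; the post does not name the hash.
func deriveKeys(password []byte, salt [32]byte, isServer bool) sessionKeys {
	c2s := sha256.Sum256(append(append([]byte{}, password...), salt[:16]...))
	s2c := sha256.Sum256(append(append([]byte{}, password...), salt[16:]...))
	if isServer {
		return sessionKeys{send: s2c, recv: c2s}
	}
	return sessionKeys{send: c2s, recv: s2c}
}

func main() {
	var salt [32]byte
	for i := range salt {
		salt[i] = byte(i) // fixed demo salt; the server would use crypto/rand
	}
	pw := []byte("correct horse")
	srv := deriveKeys(pw, salt, true)
	cli := deriveKeys(pw, salt, false)
	fmt.Println(srv.recv == cli.send, srv.send == cli.recv) // true true
}
```

The final check is the property step 4 relies on: each side's sending key equals the other side's receiving key. For comparison, HKDF (RFC 5869) is the more conventional way to derive directional keys from a shared secret.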

5. ECDH and connection finalization

Once the client has derived the keys, it generates an ephemeral Curve25519 key pair for ECDH. It then sends a connection confirmation (first byte = 0xFF) together with its ephemeral public key, with the ECDH flag set.

The server receives the packet, deobfuscates it, and gets the confirmation and the client’s ephemeral key. Then it:

  • assigns an IP address to the client from a local private network;
  • generates its own ephemeral key pair;
  • sends the client its assigned IP address and the server’s public key;
  • performs the ECDH round.

After sending, the server updates its keys by hashing the old keys with the secret obtained from ECDH.

6. Client finalization

After receiving the packet with the IP address and the server’s public ephemeral key, the client:

  • creates a local tunnel;
  • sets its IP address (received from the server);
  • performs the ECDH round;
  • updates its keys.

Main Work Loop

After the connection is established and keys are generated, the main work loop begins.

Client Side

Three goroutines run on the client side:

First goroutine (reading from the tunnel and preparing packets)

  • Reads packets from the tunnel.
  • Generates an 8-byte salt to update the sending key (by hashing the old sending key with the salt).
  • Prepends this salt to the plaintext, so the plaintext is the salt followed by the packet read from the tunnel.
  • Encrypts everything.
  • Adds random junk bytes for obfuscation.
  • Stores the prepared packet in a buffer.

Second goroutine (sending packets)

  • Responsible for sending already prepared packets.
  • Packets are sent in batches of 1 to 5 (the stream is of course still segmented at OSI layers 3 and 4, but I can't control that).

Third goroutine (receiving packets from the server)

  • Responsible for receiving packets from the server.
  • Performs deobfuscation and decryption.
  • Writes the decrypted data to the tunnel.
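The first two client goroutines form a simple producer/consumer pipeline. This is an illustration under stated assumptions: the tunnel is modelled as a channel instead of a TUN device, `prepare` skips the real salting and encryption, the third (receiving) goroutine is omitted, and the sender does not wait for a full batch but sends whatever is ready, up to a randomly chosen size of 1 to 5. `runClient` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// prepare stands in for goroutine 1's per-packet work (salt, encrypt, junk);
// here it only copies the packet so the sketch stays self-contained.
func prepare(pkt []byte) []byte {
	return append([]byte(nil), pkt...)
}

// runClient connects the tunnel reader and the batching sender through a
// buffered channel and returns once the tunnel is closed and drained.
func runClient(tunnel <-chan []byte, send func(batch [][]byte)) {
	prepared := make(chan []byte, 64) // buffer of ready-to-send packets
	var wg sync.WaitGroup
	wg.Add(2)

	// Goroutine 1: read from the tunnel and prepare packets.
	go func() {
		defer wg.Done()
		defer close(prepared)
		for pkt := range tunnel {
			prepared <- prepare(pkt)
		}
	}()

	// Goroutine 2: send prepared packets in batches of 1 to 5.
	go func() {
		defer wg.Done()
		for first := range prepared {
			batch := [][]byte{first}
			want := 1 + rand.Intn(5) // target batch size, 1..5
		fill:
			for len(batch) < want {
				select {
				case p, ok := <-prepared:
					if !ok {
						break fill
					}
					batch = append(batch, p)
				default:
					break fill // nothing else ready: send what we have
				}
			}
			send(batch)
		}
	}()

	wg.Wait()
}

func main() {
	tunnel := make(chan []byte)
	go func() {
		for i := 0; i < 12; i++ {
			tunnel <- []byte{byte(i)}
		}
		close(tunnel)
	}()

	total := 0
	runClient(tunnel, func(batch [][]byte) { total += len(batch) })
	fmt.Println("packets sent:", total) // packets sent: 12
}
```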

Server Side

The server has three main goroutines, plus one additional goroutine per connected client for receiving that client's packets.

First goroutine (handshake handling)

Handles incoming handshake requests from clients. If the handshake is successful, a new goroutine is created to process packets sent by that client.

Second goroutine (reading from the tunnel)

Reads packets from the tunnel and sends them to clients.

Third goroutine (cleaning inactive connections)

Cleans up inactive connections.

Key Updates

Salt in every packet

Every packet (whether from client or server) contains a salt. It is used to update the keys:

  • When the server sends a packet, it includes a salt. After sending, it updates its sending key by hashing the old key with that salt.
  • When the client receives and decrypts that packet, it updates its receiving key (not its sending key) with the same salt, so both sides stay in sync.
  • When the client sends a packet, the same happens with the roles reversed.
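The per-packet salt ratchet fits in a few lines. The post only says the old key is "hashed with" the salt, so SHA-256(oldKey || salt) is an assumption, and `sender`, `receiver`, and `nextKey` are hypothetical names. Encryption itself is left out; the point is that both sides advance their keys in lockstep.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

const saltLen = 8

// nextKey derives the following key as SHA-256(oldKey || salt) (assumed construction).
func nextKey(old [32]byte, salt []byte) [32]byte {
	return sha256.Sum256(append(old[:], salt...))
}

// sender owns one direction's sending key.
type sender struct{ key [32]byte }

// frame prepends a fresh salt to payload and returns the plaintext together
// with the key it must be encrypted under, then advances the sending key.
func (s *sender) frame(payload []byte) ([]byte, [32]byte) {
	salt := make([]byte, saltLen)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}
	used := s.key
	plaintext := append(salt, payload...)
	s.key = nextKey(s.key, salt)
	return plaintext, used
}

// receiver owns the matching receiving key on the other side.
type receiver struct{ key [32]byte }

// accept runs after a packet was decrypted with r.key: it strips the salt,
// advances the receiving key, and returns the payload and the key used.
func (r *receiver) accept(plaintext []byte) ([]byte, [32]byte, error) {
	if len(plaintext) < saltLen {
		return nil, r.key, errors.New("plaintext shorter than salt")
	}
	used := r.key
	r.key = nextKey(r.key, plaintext[:saltLen])
	return plaintext[saltLen:], used, nil
}

func main() {
	var shared [32]byte // stands in for the key agreed during the handshake
	s, r := &sender{key: shared}, &receiver{key: shared}
	inSync := true
	for i := 0; i < 3; i++ {
		pt, sendKey := s.frame([]byte("data"))
		_, recvKey, err := r.accept(pt)
		inSync = inSync && err == nil && sendKey == recvKey
	}
	fmt.Println("keys in sync:", inSync) // keys in sync: true
}
```

Because every packet advances the key, both ends must process packets in exactly the same order; TCP's in-order delivery is what makes this work.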

Periodic ECDH updates

Every 4 minutes or after 2³² packets have been sent (whichever comes first), the keys are refreshed with a new ECDH round. The ephemeral public keys are carried inside regular data packets.

And that is the entire protocol. For the implementation, I considered Go and Rust, and chose Go for its simplicity.

Implementation Process

To be honest, the protocol architecture was mostly developed while writing the code. It has quite a few problems — both in terms of protocol design and implementation.

Example problems

  • Constant username packet length

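    The nonce, ciphertext, and tag of the encrypted username packet always add up to 44 bytes (a 12-byte nonce, 16 bytes of ciphertext, and a 16-byte AEAD tag). Knowing this, and suspecting that someone is using this protocol, an observer can predict the plaintext of the header's ciphertext-length field and XOR it with the bytes on the wire to calculate the 4th and 5th bytes of the key.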

  • Repository duplication

    I foolishly created two separate repositories — one for the client and one for the server. As a result, the branches containing common modules just duplicate each other.

  • Git flow

    I tried to follow git flow, but failed here too.

  • Vulnerabilities

    I also have a feeling that there are more vulnerabilities in the code than working logic.

  • No graceful shutdown

    There is no proper negotiated client-server disconnect — just a connection break.
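The first problem in this list, the constant-length username packet, can be made concrete with a short Go sketch. It assumes the header's ciphertext-length field is big-endian and counts ciphertext plus tag (16 + 16 = 32 for the username packet); if the field counts only the ciphertext, the known value is 16 and the recovery works the same way. The key bytes and function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// maskLenField encodes a 2-byte big-endian length and XORs it with
// key bytes 4 and 5 (indices 3 and 4), as the header mask does.
func maskLenField(length uint16, key []byte) [2]byte {
	var b [2]byte
	binary.BigEndian.PutUint16(b[:], length)
	return [2]byte{b[0] ^ key[3], b[1] ^ key[4]}
}

// recoverKeyBytes undoes the mask using the predictable plaintext length.
func recoverKeyBytes(observed [2]byte, knownLen uint16) [2]byte {
	var b [2]byte
	binary.BigEndian.PutUint16(b[:], knownLen)
	return [2]byte{observed[0] ^ b[0], observed[1] ^ b[1]}
}

func main() {
	key := []byte{0x3a, 0x91, 0x5c, 0xe7, 0x08} // hypothetical key prefix
	const usernameCipherLen = 16 + 16          // ciphertext + tag (assumed field contents)
	wire := maskLenField(usernameCipherLen, key)
	got := recoverKeyBytes(wire, usernameCipherLen)
	fmt.Printf("%#x %#x\n", got[0], got[1]) // 0xe7 0x8
}
```

The random junk does not help here: it only changes the total-length field (bytes 1 and 2), not the ciphertext-length field.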

Still, considering this is my first project, I don't think it turned out too badly. If anyone wants to check out this mess, here are the links:

  • Client: https://github.com/SmileUwUI/smileTun-client
  • Server: https://github.com/SmileUwUI/smileTun-server

The implementation currently works; in fact, I'm writing this article over my own VPN protocol.

Future Plans

  • Merge both repositories into one.
  • Add fake packet sending.
  • Add TLS mimicry.
  • And much more.

If anyone has any questions or recommendations — leave them in the comments. For now, I bid you farewell. Good luck to everyone!