How To Make Your Design System AI-Ready

AI-generated prototypes often don’t deliver consistently decent results because of tiny inconsistencies scattered all across a design system. It’s decisions made but never documented, hard-coded values never cleaned up, or relying too much on AI to make sense of mock-ups or design flows on its own.

Yesterday I stumbled upon a useful practical guide by Hardik Pandya from Atlassian on how to reduce drift, minimize mistakes, maintain context, and improve the quality of AI-generated prototypes. Let’s see how it works.

1. Design Decisions Are Infrastructure

Unsurprisingly, better AI prototypes come from better data, but also from better human guidance. We shouldn’t assume that AI knows how to choose the right component or how to design with accessibility in mind. It needs priorities, a clear account of how we make decisions, design principles, examples, and do’s and don’ts.

In fact, we should treat design decisions as infrastructure. Every time we make a decision (not just a design decision, but also a decision on how we prioritize our work and how we make decisions around here), it must find its way into the spec file that AI then consumes.

2. Auditing: FigmaLint

One useful tool for auditing the quality of a design system is FigmaLint, a free Figma plugin. It audits tokens, states, and accessibility; binds tokens and renames layers; detects detached instances, missing interactive states, and hard-coded values; and helps prepare design documentation.

If you often have to work with vendors and third parties who supply you with their design systems and component libraries, that’s a great helper to have by your side — especially if you want to improve the quality of prototypes, AI-generated code, and AI-written documentation.

3. Three Layers: Spec Files + Token Layer + Auditing

To ensure quality, we establish design principles, guidelines, and rules in the form of “spec files”: structured Markdown files that include spacing rules, color choices, component usage guidelines, priorities, etc. AI reads and reuses these spec files every time it generates a prototype.

Because spec files are plain text, they are much more cost-effective and also much more accurate: we don’t rely on AI recognizing or decoding patterns from mock-ups, but give it specific guidelines instead. In fact, extending existing code is often more effective than generating code from mock-ups.

The token layer lists every token used throughout the design system and keeps that list up to date. AI always chooses from a closed set of named variables instead of inventing plausible values ad hoc.

An audit script catches what AI gets wrong. It scans the prototype and flags every hard-coded value. This can be regular, deterministic software, with AI waiting for its feedback before it continues.
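Here is a minimal Python sketch of such an audit script. It assumes the prototype's styles are available as plain CSS text and that the token layer can be exported as a name-to-value map. `TOKENS`, `RAW_VALUE` and `audit` are hypothetical names, not part of FigmaLint or Atlassian's tooling. The regular expression only covers hex colors and pixel lengths.

```python
import re

# Hypothetical closed token set (name -> value), standing in for a real token layer
TOKENS = {
    "color-text": "#172b4d",
    "color-brand": "#0c66e4",
    "space-100": "8px",
    "space-200": "16px",
}

# What this sketch treats as "hard-coded": hex colors and pixel lengths
RAW_VALUE = re.compile(r"#[0-9a-fA-F]{3,8}\b|\b\d+px\b")


def audit(source: str) -> list[dict]:
    """Flag every hard-coded value and suggest a token when one has the same value."""
    by_value = {value.lower(): name for name, value in TOKENS.items()}
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in RAW_VALUE.finditer(line):
            raw = match.group(0)
            findings.append({
                "line": line_no,
                "value": raw,
                # None means no token exists for this value: a decision nobody documented
                "suggestion": by_value.get(raw.lower()),
            })
    return findings


css = ".card { color: #172B4D; padding: 16px; }\n.badge { background: var(--color-brand); margin: 12px; }"
for finding in audit(css):
    print(finding)
```

A finding that has a suggestion can be fixed mechanically. A finding whose `suggestion` is `None` is a value the design system has no token for, which is exactly the kind of decision a human should resolve.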

Finally, when a design system ships updates, a sync routine flags which spec files need updating. The goal is to make sure that AI always reads up-to-date, current specs, not the ones written against an outdated version.
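The sync routine can be sketched the same way. This assumes tokens are exported as a name-to-value map for each release and that spec files reference tokens by name. The function names are hypothetical, and plain substring matching is a simplification; a real routine would parse token references properly.

```python
def changed_tokens(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Token names whose value changed, or that were removed, in the new release."""
    return {name for name, value in old.items() if new.get(name) != value}


def stale_specs(specs: dict[str, str], old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Map each spec file to the changed tokens it still mentions."""
    changed = changed_tokens(old, new)
    report = {}
    for path, text in specs.items():
        hits = sorted(name for name in changed if name in text)  # naive substring match
        if hits:
            report[path] = hits
    return report


old_release = {"color-brand": "#0c66e4", "space-100": "8px", "radius-100": "4px"}
new_release = {"color-brand": "#1868db", "space-100": "8px"}
specs = {
    "specs/buttons.md": "Use color-brand for primary buttons.",
    "specs/layout.md": "Default gap is space-100.",
    "specs/cards.md": "Corners use radius-100.",
}
print(stale_specs(specs, old_release, new_release))
```

This sketch does not flag added tokens. Only changed or removed tokens can make an existing spec wrong.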

4. Examples of AI-Ready Design Systems

  • Atlassian
  • Carbon
  • CMS Design System
  • Nordhealth

Wrapping Up

Ultimately, AI cannot magically resolve technical debt or design debt without proper guidance. It relies heavily on clear decisions, established priorities, and well-defined principles.

The more deliberate and precise designers are in guiding AI, the better the overall outcomes will be. This requires not just cleaning up and improving design systems but also maintaining them over time as decisions need to trickle down into Markdown files. We’ll be busy for years to come.

Meet “Design Patterns For AI Interfaces”

Meet Design Patterns For AI Interfaces, Vitaly’s new video course with 100s of real-life examples and UX guidelines to design AI features that people actually use — with a live UX training later this year. Jump to a free preview.


Useful Resources

  • FigmaLint, by TJ Pitre
  • Atlassian AI-Ready Design System Example, by Atlassian
  • Carbon AI-Ready Design System Example, by IBM
  • CMS Design System AI-Ready Example, by Centers for Medicare & Medicaid Services
  • Nordhealth AI-Ready Design System Example, by Nordhealth

I Tried to Stretch DeepSeek’s 5M Free Tokens to 30 Days. R1 Is the Trap.

DeepSeek’s 5M free API tokens sound generous. The takes I kept seeing were:

“That’s basically a free month of AI.”
“R1 is the obvious default because it’s smarter.”
“Just prototype until the balance is gone.”

Two of those are wrong. The third is how you wake up with an empty token balance and no idea what happened.

I spent time digging through a real 14-day burn log from one DeepSeek test account. The numbers changed how I’d use free API credits.

TL;DR

  • No, 5M free tokens is not a huge credit balance. At DeepSeek V4 rates, it’s roughly $3.40 of paid usage.
  • The fastest way to waste it is defaulting to R1 for non-reasoning tasks. In our test prompts, R1 used up to 6.7x as many tokens as V4, and about 3x on classification and code review.
  • Missing max_tokens is the quiet killer. One classification task dropped from 380 output tokens to 8 after adding a 20-token cap.
  • Full-document RAG in every prompt is how you donate your free tier back to the provider.
  • If you’re disciplined, 5M tokens can support a real solo-dev prototype for almost a month. If you’re sloppy, it can feel gone in a long weekend.

What actually happened

DeepSeek gives new accounts 5,000,000 free tokens. No credit card is required, based on the account setup flow we tracked in the signup walkthrough, and the account balance is visible in the DeepSeek platform dashboard.

The catch: a token grant is not the same thing as a month of usage.

At DeepSeek’s published V4 pricing of $0.27 / 1M input tokens and $1.10 / 1M output tokens (DeepSeek pricing docs), a balanced 5M-token allowance is worth about:

Mix | Input cost | Output cost | Total value
--- | --- | --- | ---
2.5M input + 2.5M output | $0.675 | $2.75 | $3.425
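The table's arithmetic fits in a small helper that uses the V4 prices quoted above. The function name is illustrative.

```python
# Published V4 prices quoted above, in USD per 1M tokens
INPUT_PRICE_PER_M = 0.27
OUTPUT_PRICE_PER_M = 1.10


def grant_value(input_tokens: int, output_tokens: int) -> float:
    """Dollar value of a token mix at the per-million prices above."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


print(round(grant_value(2_500_000, 2_500_000), 3))  # 3.425
```

A mostly-input workload is worth less. 5M input tokens come to $1.35, while 5M output tokens come to $5.50.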

That number is tiny and useful at the same time.

Tiny, because you shouldn’t treat it like a serious cloud credit. Useful, because DeepSeek is cheap enough that $3.40 still buys a meaningful prototype if your calls are controlled.

The test account used DeepSeek for a documentation Q&A bot, basic coding help, classification, extraction, and some RAG experiments. Every call’s prompt_tokens and completion_tokens were logged into SQLite.
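The source only says that each call's `prompt_tokens` and `completion_tokens` went into SQLite. A minimal logger along those lines might look like the sketch below. The table layout and function names are assumptions, and `usage` is expected to be the `response.usage` object returned by an OpenAI-compatible client.

```python
import sqlite3
import time


def open_log(path: str = "usage.db") -> sqlite3.Connection:
    """Open (or create) the usage log."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS usage (
               ts REAL, task TEXT, model TEXT,
               prompt_tokens INTEGER, completion_tokens INTEGER)"""
    )
    return conn


def log_usage(conn: sqlite3.Connection, task: str, model: str, usage) -> None:
    """Record one call; `usage` needs prompt_tokens and completion_tokens attributes."""
    conn.execute(
        "INSERT INTO usage VALUES (?, ?, ?, ?, ?)",
        (time.time(), task, model, usage.prompt_tokens, usage.completion_tokens),
    )
    conn.commit()


def burn_by_task(conn: sqlite3.Connection) -> list[tuple]:
    """Total input and output tokens per task, biggest spender first."""
    return conn.execute(
        """SELECT task, SUM(prompt_tokens), SUM(completion_tokens)
           FROM usage GROUP BY task
           ORDER BY SUM(prompt_tokens) + SUM(completion_tokens) DESC"""
    ).fetchall()
```

Call `log_usage(conn, "classification", "deepseek-chat", response.usage)` after every request. `burn_by_task` then shows which workload is eating the grant, and that is how spikes like Day 3 and Day 10 become visible.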

Here’s the burn curve that mattered:

Period | Main activity | Tokens used | Cumulative burn
--- | --- | --- | ---
Days 1-2 | Wrapper code, hello world | 18K | 0.4%
Day 3 | RAG prototype, naive chunking | 712K | 14.6%
Days 4-5 | RAG fixes + reruns | 480K | 24.2%
Day 6 | Switched from R1 back to V4 | 215K | 28.5%
Days 7-9 | Real prototype iteration | 1.64M | 61.3%
Day 10 | Found max_tokens was unset | 410K | 69.5%
Days 11-13 | Prompt/output trimming | 1.18M | 93.1%
Day 14 | Quota exhausted mid-session | 345K | 100%

The embarrassing part is that the two big spikes were avoidable.

Day 3 was a RAG design mistake.

Day 10 was a missing parameter.

That’s the whole story of AI API cost: not one catastrophic bill, just small defaults compounding while you’re focused on shipping.

The number that made me stop using R1 by default

R1 is the fun model. It reasons. It thinks more. It feels like the serious choice.

But for a lot of API work, “serious” means “expensive for no reason.”

Same task, same prompt family:

Task | DeepSeek V4 tokens | DeepSeek R1 tokens | Multiplier
--- | --- | --- | ---
Short classification | ~400 | ~1,200 | 3x
Code review | ~800 | ~2,500 | 3.1x
Math problem | ~600 | ~4,000 | 6.7x
Creative writing | ~1,200 | ~1,500 | 1.25x

My rule now is simple:

Use V4 by default. Escalate to R1 only for math, multi-step logic, or tasks where the reasoning trace is worth the burn.

Here’s the pain translated into a monthly bill:

Scenario | Model choice | Approx. tokens/call | Tokens/day at 500 calls/day | Monthly burn
--- | --- | --- | --- | ---
Classification on V4 | Right default | 400 | 200K | 6M
Classification on R1 | Wrong default | 1,200 | 600K | 18M
Math on V4 | Possibly underpowered | 600 | 300K | 9M
Math on R1 | Worth it | 4,000 | 2M | 60M
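The monthly column is tokens per call × calls per day × 30. A short helper (names are illustrative) also shows how long the 5M grant lasts at the same volume.

```python
def monthly_burn(tokens_per_call: int, calls_per_day: int, days: int = 30) -> int:
    """Tokens consumed over `days` at a steady daily call volume."""
    return tokens_per_call * calls_per_day * days


def days_until_empty(grant: int, tokens_per_call: int, calls_per_day: int) -> float:
    """How many days a fixed token grant lasts at that volume."""
    return grant / (tokens_per_call * calls_per_day)


GRANT = 5_000_000
for label, per_call in [("Classification on V4", 400), ("Classification on R1", 1_200)]:
    print(label, monthly_burn(per_call, 500), round(days_until_empty(GRANT, per_call, 500), 1))
```

At 500 calls/day, the grant lasts 25 days for classification on V4 and about 8 days on R1.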

At free-tier scale, the R1 mistake drains your grant three times as fast.

At paid scale, the same mistake becomes a recurring line item.

The max_tokens bug is more expensive than it looks

This was the funniest and most annoying discovery in the log.

The task was classification. Expected output: one label.

The model returned paragraphs.

Before:

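import os
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible; the key is read from an environment variable
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "user",
            "content": "Classify this support ticket into one of 5 categories: ..."
        }
    ],
)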

After:

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "user",
            "content": "Classify this support ticket into one of 5 categories. Return only the label: ..."
        }
    ],
    max_tokens=20,
    temperature=0,
)

The average output dropped from 380 tokens to 8.

That’s a 47x output reduction for one parameter and one sentence.

Now translate it:

Workload | Before | After | What it means
--- | --- | --- | ---
10K classifications | 3.8M output tokens | 80K output tokens | Almost the whole free grant saved
50K classifications/month | 19M output tokens | 400K output tokens | Paid bill stops being silly
200K classifications/month | 76M output tokens | 1.6M output tokens | This becomes architecture, not tuning

This is why I don’t trust “cheap model” discussions that ignore output caps.

A cheap model with runaway output is not cheap.

The RAG mistake: full context is not retrieval

Day 3 burned 712K tokens because the prototype pasted a 2,400-token reference document into every call.

That’s not RAG. That’s panic with a context window.

The fix was boring: top-k retrieval.

Approach | Average input tokens | Quality result
--- | --- | ---
Full document in every prompt | 2,400 | Baseline
Top-3 chunks, ~120 tokens each | ~400 | Slightly better

The quality improved because the model stopped reading irrelevant context.

This is the part people miss: context reduction is not just cost optimization. It can be quality optimization.
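Below is a minimal top-k retrieval sketch, with some assumptions stated up front. Word overlap stands in for embedding similarity. Chunk size and the context budget are counted in words, assuming roughly 90 words per ~120-token chunk. Every function name is hypothetical. The sketch keeps only chunks that share words with the question, and it stops at k chunks or at the budget, whichever comes first.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word set used as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk(document: str, size: int = 90) -> list[str]:
    """Split a document into chunks of about `size` words (~120 tokens, by assumption)."""
    tokens = document.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def top_k(question: str, chunks: list[str], k: int = 3, budget_words: int = 300) -> list[str]:
    """Up to k chunks with the most word overlap, within a word budget."""
    q = words(question)
    scored = sorted(((len(q & words(c)), c) for c in chunks), key=lambda s: s[0], reverse=True)
    picked, used = [], 0
    for score, c in scored[:k]:
        n = len(c.split())
        if score == 0 or used + n > budget_words:  # irrelevant chunk or budget exhausted
            break
        picked.append(c)
        used += n
    return picked


def build_prompt(question: str, document: str) -> str:
    """Send a few relevant chunks instead of pasting the whole document."""
    context = "\n---\n".join(top_k(question, chunk(document)))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Swapping `top_k`'s scoring for a vector search changes the ranking but not the shape of the fix. The prompt carries a few relevant chunks instead of the whole document.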

Let’s do the monthly math:

RAG style | Calls/day | Input tokens/call | Monthly input tokens
--- | --- | --- | ---
Full-doc prompt | 200 | 3,000 | 18M
Top-k retrieval | 200 | 800 | 4.8M

Same product. Same user experience. 13.2M fewer input tokens/month.

On a free grant, that is the difference between finishing your prototype and spending the last week debugging quota errors.

The 5M-token decision tree

If I were starting with a fresh DeepSeek balance today, this is the routing function I’d use:

def deepseek_free_tier_plan(workload):
    if workload in ["classification", "extraction", "short_qa", "rewrite"]:
        return {
            "model": "deepseek-chat",   # V4
            "max_tokens": 20 if workload == "classification" else 300,
            "temperature": 0,
            "rule": "Do not use R1 here."
        }

    if workload in ["math", "formal_reasoning", "multi_step_debugging"]:
        return {
            "model": "deepseek-reasoner",  # R1
            "max_tokens": 1200,
            "temperature": 0,
            "rule": "Use R1, but log token cost per task."
        }

    if workload in ["rag", "docs_bot", "support_search"]:
        return {
            "model": "deepseek-chat",
            "retrieval": "top_k_3_to_5",
            "max_context_tokens": 900,
            "rule": "Never paste the whole document."
        }

    return {
        "model": "deepseek-chat",
        "max_tokens": 500,
        "rule": "Start cheap, escalate only after failure."
    }
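One way to apply the plan to a real call is to strip the notes-to-self keys before passing it to the client. `request_kwargs` and `NON_API_KEYS` are hypothetical helpers. The 500-token default mirrors the fallback branch above and also covers the RAG branch, which sets no `max_tokens`.

```python
# Keys in a plan that are notes for us, not parameters of the chat completions API
NON_API_KEYS = {"rule", "retrieval", "max_context_tokens"}


def request_kwargs(plan: dict, messages: list[dict]) -> dict:
    """Turn a routing plan into keyword arguments for client.chat.completions.create."""
    kwargs = {k: v for k, v in plan.items() if k not in NON_API_KEYS}
    kwargs["messages"] = messages
    # Every call gets an output cap, even when the plan forgot one
    kwargs.setdefault("max_tokens", 500)
    return kwargs
```

Usage: `client.chat.completions.create(**request_kwargs(deepseek_free_tier_plan("classification"), messages))`.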

I like writing it as code because it exposes the real decision.

The question is not “which model is best?”

The question is “which model is enough for this task?”

What I’d do if I were starting today

If I were a solo developer:

  • I’d claim the 5M tokens and spend the first hour building a usage logger.
  • I’d use V4 for everything by default.
  • I’d set max_tokens on every call before writing real app code.
  • I’d keep system prompts under 200 tokens.
  • I’d only switch to R1 after writing down why V4 failed.

If I were building a RAG prototype:

  • I’d ban full-document prompts.
  • I’d start with top-3 retrieval.
  • I’d log input tokens separately from output tokens.
  • I’d test answer quality after removing context, not only after adding it.
  • I’d budget 100-150 calls/day if I wanted the grant to last close to 30 days.

If I were running this inside a small team:

  • I’d treat the 5M grant as onboarding, not infrastructure.
  • I’d give each workflow a daily token ceiling.
  • I’d set a fallback before the balance hits zero.
  • I’d compare DeepSeek V4 against OpenAI/Claude only on cost per successful task, not vibes.
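A per-workflow daily ceiling can be as simple as a counter keyed by workflow and date. This is an in-memory sketch with hypothetical names. A real setup would persist the counters (for example in the SQLite usage log) and size the ceilings from the grant; for 5M tokens over 30 days, that is about 166K tokens/day in total.

```python
from collections import defaultdict
from datetime import date


class DailyBudget:
    """Per-workflow daily token ceilings: check before a call, record after it."""

    def __init__(self, ceilings: dict[str, int]):
        self.ceilings = ceilings
        self.used = defaultdict(int)  # (workflow, day) -> tokens used

    def remaining(self, workflow: str, day=None) -> int:
        day = day or date.today()
        return self.ceilings[workflow] - self.used[(workflow, day)]

    def allow(self, workflow: str, estimated_tokens: int, day=None) -> bool:
        return self.remaining(workflow, day) >= estimated_tokens

    def record(self, workflow: str, tokens: int, day=None) -> None:
        self.used[(workflow, day or date.today())] += tokens


budget = DailyBudget({"docs_bot": 100_000, "classification": 50_000})
if budget.allow("classification", 30):
    budget.record("classification", 28)  # e.g. prompt_tokens + completion_tokens from the response
```

When `allow` returns `False`, switch to the fallback instead of discovering the empty balance mid-session.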

The bigger picture

The interesting part isn’t that DeepSeek gives away 5M tokens.

The interesting part is that the allowance is big enough to teach you the economics of AI APIs before you pay.

You learn fast that:

  • Reasoning models are not default models.
  • Output tokens are where “cheap” gets expensive.
  • RAG without retrieval is just context stuffing.
  • Free credits hide the same mistakes that later show up as paid bills.

DeepSeek is one of the few providers where a small token balance can still support real experimentation. But free-tier discipline matters precisely because the paid tier is cheap. If your workflow is wasteful at $3.40, it will still be wasteful at $34, $340, or $3,400.

If you want to swap between OpenAI / Anthropic / Google / DeepSeek models through one OpenAI-compatible endpoint, that’s roughly what TokenMix does. Disclosure: I work on the research side. The full data-cited breakdown of this DeepSeek test is on the original article.

Bottom line

DeepSeek’s 5M free tokens are enough for a serious prototype, not enough for careless defaults.

My default is now V4, capped outputs, short system prompts, and top-k retrieval. R1 earns its place per task.

If you had 5M free tokens and 30 days, what would you spend them on first: a coding assistant, a docs bot, a RAG prototype, or something else?

Honest Build: No Fake Exception Codegen

CrabPascal offers two execution paths: run (internal runtime interpreter) and build-exe (C code generation + native compiler). For a long time, the codegen path pretended to support exceptions — emitting stub blocks that compiled but did not behave like Delphi or like run. v2.21.0 (Sprint 13) ends that charade.

The problem with simulated exceptions

Consider typical Delphi error handling:

try
  ProcessOrder(OrderId);
except
  on E: Exception do
    LogError(E.Message);
end;

In the runtime, try/except unwinds the stack and matches exception types. The old C backend generated placeholder code — enough to pass a compile step, not enough to run correctly. Worse, it hid parity gaps: developers thought native build worked until production crashed differently than run.

That violates a core CrabPascal principle: honest tooling. If we cannot do it correctly yet, say so loudly.

What changed in v2.21.0

The codegen module now refuses to generate C for:

  • try / except / finally blocks
  • raise statements

Instead you get an explicit error pointing to run:

crab-pascal build-exe MyApp.dpr
# error: exception handling not supported in native codegen yet; use `crab-pascal run`

This aligns expectations immediately. CI pipelines fail for the right reason, not silently.
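CrabPascal's codegen is written in Rust, and its internals aren't shown here. The following is therefore only a Python illustration of the principle: walk the tree and fail loudly on constructs the backend cannot lower correctly, instead of emitting a stub. `Node`, `UNSUPPORTED`, `CodegenError` and `check_codegen_support` are hypothetical. The error text is the one quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified AST node: a kind plus child nodes."""
    kind: str
    children: list["Node"] = field(default_factory=list)


# try/except/finally blocks and raise statements
UNSUPPORTED = {"try", "raise"}


class CodegenError(Exception):
    pass


def check_codegen_support(node: Node) -> None:
    """Refuse constructs the C backend cannot honor, instead of emitting fake C."""
    if node.kind in UNSUPPORTED:
        raise CodegenError(
            "exception handling not supported in native codegen yet; use `crab-pascal run`"
        )
    for child in node.children:
        check_codegen_support(child)
```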

When to use each path

Command | Best for
--- | ---
run | Development, Horse APIs, OOP with exceptions, rapid iteration
check | IDE feedback, CI static analysis
build-exe | Performance-sensitive code without exception constructs

crab-pascal check examples/crud/crud.dpr
crab-pascal run examples/crud/crud.dpr

The CRUD example uses JSON and HTTP — exception-free hot paths compile fine with build-exe where supported.

Regression test

Sprint 13 added codegen::tests::test_codegen_errors_on_try_raise, locking the behavior:

// Pseudocode intent: codegen must Err on try/raise, not emit fake C
assert!(codegen_result.is_err());
assert!(message.contains("use `run`"));

Future work on real exception lowering (setjmp/longjmp, table-based handlers, or LLVM) will flip specific tests green — not reintroduce silent stubs.

Roadmap for real native exceptions

Honest failure is step one. Step two is implementing Delphi-compatible exception tables in codegen — likely coordinated with RTL types in System.SysUtils and runtime object layout. Until then, document the split clearly (this article, release notes, check hints).

Developer takeaway

If your project relies on structured exception handling — and most Delphi code does — run is the supported path today. Native build is for subsets of the language where parity is proven. Sprint 13 chose trust over checkbox features. That makes CrabPascal safer to adopt incrementally.

Questions? @crabpascal on Dev.to or issues on Bitbucket.


Published on dev.to/@crabpascal · Code in CrabPascal

The Most Confusing Thing in VirtualBox: Networking Explained

If you’ve ever run this command in Kali Linux:

ip a

you’ve probably seen something like this:

lo      127.0.0.1
eth0    10.0.2.15
eth1    192.168.56.10
docker0 172.17.0.1

And then immediately asked:

Why does my machine have multiple IP addresses?

What are eth0 and eth1?

Which IP should I use?

Why can Windows access one IP but not the other?

You’re not alone.

This is probably one of the most confusing topics for beginners learning VirtualBox, Kali Linux, and networking.

Let’s fix that.

First: What Is an IP Address?

Think of an IP address as a house address.

If someone wants to send you a letter, they need your address.

The internet works the same way.

Every device needs an address so other devices know where to send data.

Example:

Your Phone
192.168.1.10

Your Laptop
192.168.1.20

Your Router
192.168.1.1

Without addresses, communication wouldn’t be possible.

Then Why Does Kali Have Multiple IP Addresses?

Because Kali has multiple network interfaces.

Think of a computer as a building.

A building can have:

  • Front Door
  • Back Door
  • Garage Door

Each door connects to a different area.

Computers work similarly.

Each network interface is a separate network door.

Understanding Network Interfaces

When you run:

ip a

Linux shows every network interface available on the system.

Example:

lo
eth0
eth1
docker0

Each one serves a different purpose.

Interface 1: lo (Loopback)

lo
127.0.0.1

This is called the loopback interface.

Think of it as talking to yourself.

Visual:

Kali
 │
 └── talks to Kali

When you visit:

http://127.0.0.1

the traffic never leaves your machine.

It doesn’t reach:

  • Windows
  • Your Router
  • The Internet

Everything happens internally.

This is commonly used by:

  • Web servers
  • Databases
  • Local applications
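You can try the loopback idea with nothing but Python's standard library. This sketch starts a one-shot echo server on 127.0.0.1 and connects to it from the same process. The function name is illustrative. The exchange runs entirely over lo and never reaches eth0, eth1, Windows, or the router.

```python
import socket
import threading


def loopback_echo(message: bytes) -> bytes:
    """Send bytes to a server on 127.0.0.1 and read them back."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve_once() -> None:
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))  # echo back whatever arrived

    thread = threading.Thread(target=serve_once)
    thread.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)
    thread.join()
    server.close()
    return reply


print(loopback_echo(b"hello from Kali to Kali"))
```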

Interface 2: eth0

In my lab:

eth0
10.0.2.15

This interface was created by VirtualBox using NAT mode.

Visual:

Kali
  │
  ▼
VirtualBox NAT
  │
  ▼
Windows
  │
  ▼
Internet

This interface allows Kali to:

  • Browse websites
  • Download tools
  • Run apt update
  • Access the internet

Example:

ping google.com

That traffic most likely goes out through eth0.

Interface 3: eth1

In my lab:

eth1
192.168.56.10

This interface belongs to a Host-Only network.

My Windows machine has:

192.168.56.1

Visual:

Windows
192.168.56.1
       │
       │
       │
Kali
192.168.56.10

This network exists entirely inside my laptop.

No internet.

No router.

No external devices.

Just Windows and the virtual machines.

Why Cybersecurity Labs Use Host-Only Networks

Imagine you are learning:

  • Nmap
  • Metasploit
  • Burp Suite
  • Web Security Testing

You need targets.

Instead of attacking real systems, you create a private lab.

Example:

Windows
192.168.56.1

Kali
192.168.56.10

Metasploitable
192.168.56.20

Now Kali can safely scan and test Metasploitable.

Everything stays inside your laptop.

What Is a Virtual NIC?

NIC stands for:

Network Interface Card

A real computer has a physical network card.

A virtual machine doesn’t.

So VirtualBox creates a virtual network card.

Visual:

Physical Laptop
│
├── Real Network Card
│
└── VirtualBox
      │
      ├── Virtual NIC (eth0)
      │
      └── Virtual NIC (eth1)

To Kali, these look like real network adapters, even though they are completely virtual.

Reading ip a Like a Professional

Suppose Kali shows:

lo
127.0.0.1

Meaning:

I can talk to myself.

Suppose Kali shows:

eth0
10.0.2.15

Meaning:

I can reach the Internet.

Suppose Kali shows:

eth1
192.168.56.10

Meaning:

I can communicate with machines on my lab network.

Once you understand what each interface is connected to, the output becomes much easier to read.
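Python's `ipaddress` module can automate the same reading. The subnets below are the usual defaults: VirtualBox NAT 10.0.2.0/24, Host-Only 192.168.56.0/24, and Docker's bridge 172.17.0.0/16. Treat them as assumptions, because any of them can be changed in VirtualBox or Docker settings.

```python
import ipaddress

# Usual default subnets (assumptions: all of them can be changed in settings)
KNOWN_NETWORKS = {
    "loopback: talk to yourself": ipaddress.ip_network("127.0.0.0/8"),
    "VirtualBox NAT: door to the internet": ipaddress.ip_network("10.0.2.0/24"),
    "VirtualBox Host-Only: door to the lab": ipaddress.ip_network("192.168.56.0/24"),
    "Docker bridge: door to containers": ipaddress.ip_network("172.17.0.0/16"),
}


def describe(address: str) -> str:
    """Name the network an interface address belongs to."""
    ip = ipaddress.ip_address(address)
    for label, network in KNOWN_NETWORKS.items():
        if ip in network:
            return label
    return "unknown network"


for name, addr in [("lo", "127.0.0.1"), ("eth0", "10.0.2.15"),
                   ("eth1", "192.168.56.10"), ("docker0", "172.17.0.1")]:
    print(f"{name:8} {addr:15} {describe(addr)}")
```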

A Real Example

I started Apache on Kali.

My Kali address:

192.168.56.10

From Windows, I opened:

http://192.168.56.10

And immediately reached the Apache web server running inside Kali.

Why?

Because Windows and Kali were connected through the same Host-Only network.
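To check the "same network" reasoning, compare subnets with `ipaddress`. Being on the same subnet is a precondition here, not a guarantee, since a firewall can still block the request. The NAT address fails the check. VirtualBox NAT also does not accept connections from the host unless port forwarding is configured.

```python
import ipaddress


def same_network(a: str, b: str) -> bool:
    """True if two interface addresses (in CIDR form) sit on the same subnet."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


windows_host_only = "192.168.56.1/24"
print(same_network(windows_host_only, "192.168.56.10/24"))  # True: same Host-Only subnet as Kali's eth1
print(same_network(windows_host_only, "10.0.2.15/24"))      # False: eth0 is on VirtualBox's private NAT subnet
```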

The request traveled like this:

Windows Browser
        │
        ▼
Host-Only Network
        │
        ▼
Kali eth1
        │
        ▼
Apache

No internet involved.

Everything happened inside a single laptop.

The Biggest Networking Mistake Beginners Make

Many people see multiple IP addresses and assume they’re all the same.

They’re not.

Each interface belongs to a different network.

Each network serves a different purpose.

The IP address itself is only half the story.

The interface attached to that IP matters just as much.

Final Thoughts

VirtualBox networking feels complicated at first because one machine suddenly has multiple IP addresses.

But the idea is actually simple.

Think of every interface as a separate door.

lo      = Door to yourself

eth0    = Door to the Internet

eth1    = Door to your lab network

docker0 = Door to your containers

Once you understand what each door connects to, networking becomes much easier to visualize.

And that’s the moment VirtualBox starts making sense.

Async Calls and the Batch API for LLMs: How to Save Up to 50% and Speed Up Processing

Figure: LLM performance infographic. Left: a synchronous for loop with a long sequential time bar; center: parallel asyncio.gather with a short, dense bar; right: Batch API at −50% with a small bar and the discount label.

With 10 LLM requests, a synchronous for loop is fine. At 1,000, it becomes a bottleneck and the pipeline runs for hours. At 100,000, the regular API gets expensive and token spend eats into your unit economics. There are two classic solutions: async parallelism (asyncio + aiohttp for 50–500 requests per second) and the Batch API (an offline mode with a 50% discount on input and output).

This guide covers working code for both patterns through the single Promptra gateway (Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, DeepSeek V4 Pro), a calculation of real savings on typical scenarios, queue and retry patterns for production, and clear rules for when to use which. Payment is in rubles under a contract, with a full set of closing documents and prices in rubles at the Central Bank of Russia exchange rate.

TL;DR: two modes, two kinds of savings

Async (via asyncio.gather + AsyncOpenAI):

  • Real-time, responses in seconds
  • Throughput of up to 500 RPS per key, depending on the key’s rate limit
  • Same price as the regular API
  • When: UIs, agents, real-time chats

Batch API (via client.batches.create):

  • Offline, with an SLA of up to 24 hours (usually an hour or two)
  • 50% discount on input and output
  • No size limit (millions of requests in a single file)
  • When: labeling, classifying an archive, summarizing an entire database

A high-performance production stack uses both.

Part 1: Async calls with asyncio

The basic pattern runs N requests in parallel with asyncio.gather:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key="sk-promptra-...",
    base_url="https://api.promptra.ru/v1",
)

async def call_one(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main():
    prompts = [f"Tell me a short fact about the number {i}" for i in range(100)]
    # gather runs all coroutines concurrently and returns results in input order
    results = await asyncio.gather(*[call_one(p) for p in prompts])
    for p, r in zip(prompts, results):
        print(p, "->", r[:80])

asyncio.run(main())

100 requests per second is usually realistic (the ceiling is the key’s rate limit, not the SDK). Compared with a synchronous loop:

Scenario | Time | Throughput
--- | --- | ---
Sync for loop (100 requests) | ~180 s | 0.5 RPS
asyncio.gather(100) | ~3.5 s | ~28 RPS
asyncio.gather + Semaphore(20) | ~6 s | ~17 RPS

Async gives a 30–50x speedup at a typical latency of 1–2 seconds per request. But without concurrency control, you will quickly hit the rate limit.
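To see where the speedup comes from without spending tokens, this sketch replaces the API call with `asyncio.sleep`, simulating a 50 ms latency. `fake_call` and the other names are hypothetical. Sequential awaits add the latencies up, while `asyncio.gather` overlaps them.

```python
import asyncio
import time

LATENCY = 0.05  # simulated seconds per request, a stand-in for a real API call


async def fake_call(i: int) -> int:
    await asyncio.sleep(LATENCY)
    return i


async def sequential(n: int) -> list[int]:
    return [await fake_call(i) for i in range(n)]  # one request at a time


async def concurrent(n: int) -> list[int]:
    return await asyncio.gather(*(fake_call(i) for i in range(n)))  # all in flight at once


def timed(coro_fn, n: int) -> float:
    start = time.perf_counter()
    asyncio.run(coro_fn(n))
    return time.perf_counter() - start


seq, par = timed(sequential, 20), timed(concurrent, 20)
print(f"sequential: {seq:.2f}s, gather: {par:.2f}s, speedup ~{seq / par:.0f}x")
```

With 20 simulated requests, the sequential version takes about one second and the gathered one takes about a single latency. Real API numbers, like those in the table, also depend on the rate limit and response length.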

Figure: throughput comparison. Sync for loop: 0.5 RPS; asyncio.gather: 28 RPS; aiohttp + Semaphore(50): 70 RPS (“×60 on a single key”). Title: “Async speeds things up by tens of times”.

Rate limiting with a Semaphore

If N in gather is too large, you’ll get 429s from the API. The fix is asyncio.Semaphore:

async def call_with_semaphore(sem: asyncio.Semaphore, prompt: str) -> str:
    async with sem:
        return await call_one(prompt)

async def main():
    sem = asyncio.Semaphore(20)   # at most 20 concurrent requests
    prompts = [f"Request {i}" for i in range(1000)]
    results = await asyncio.gather(*[call_with_semaphore(sem, p) for p in prompts])

Semaphore(20) means that no more than 20 requests are ever in flight; as soon as one task finishes, the next one starts. This gives a steady load without spikes.

Through Promptra, the typical per-key limit is 600 RPM (10 RPS), and it can be raised from the dashboard as traffic grows. The Semaphore is a control on your side that keeps 429s out of your application code. Note that it caps concurrency, not rate: 20 requests in flight at 1–2 s latency still means 10–20 requests per second.
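A Semaphore alone can still exceed a requests-per-second limit, because it caps how many requests are in flight rather than how many start each second. Below is a minimal sketch combining both controls, assuming a simple evenly spaced schedule. `RateLimiter`, `limited_call`, `fake_call` and `demo` are hypothetical, and `fake_call` stands in for `call_one` from above.

```python
import asyncio
import time


class RateLimiter:
    """Spaces out request starts so that at most `rps` begin per second."""

    def __init__(self, rps: float):
        self.interval = 1.0 / rps
        self.next_start = 0.0
        self.lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self.lock:
            now = time.monotonic()
            delay = self.next_start - now
            # Reserve the next free slot for whoever calls wait() after us
            self.next_start = max(now, self.next_start) + self.interval
        if delay > 0:
            await asyncio.sleep(delay)


async def fake_call(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for the real API request
    return prompt


async def limited_call(sem: asyncio.Semaphore, limiter: RateLimiter, prompt: str) -> str:
    async with sem:            # cap on requests in flight
        await limiter.wait()   # cap on requests started per second
        return await fake_call(prompt)


async def demo(n: int, rps: float, concurrency: int) -> float:
    sem, limiter = asyncio.Semaphore(concurrency), RateLimiter(rps)
    start = time.monotonic()
    await asyncio.gather(*(limited_call(sem, limiter, f"request {i}") for i in range(n)))
    return time.monotonic() - start


elapsed = asyncio.run(demo(11, rps=50, concurrency=5))
print(f"11 requests at 50 RPS took {elapsed:.2f}s")
```

For the 600 RPM key limit mentioned above, `RateLimiter(10)` would keep starts at about 10 per second regardless of the Semaphore size.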

Retry with exponential backoff

Even with a Semaphore, you will occasionally get 429s (bursts), 503s (temporary API failures), and timeouts. The standard tool is tenacity:

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from openai import RateLimitError, APIConnectionError, APITimeoutError, InternalServerError

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    # 429s, network errors, timeouts and 5xx (such as 503) only; 400/401 are not retried
    retry=retry_if_exception_type(
        (RateLimitError, APIConnectionError, APITimeoutError, InternalServerError)
    ),
    reraise=True,
)
async def call_with_retry(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": prompt}],
        timeout=120,
    )
    return response.choices[0].message.content

Key points:

  • 5 attempts: more is usually pointless, because the problem won’t go away.
  • Exponential backoff: with these settings, tenacity waits 2, 2, 4 and 8 seconds between the five attempts (no single wait exceeds 30 s), giving the API time to recover.
  • Retry only on retryable errors: don’t retry 400 (bad request) or 401 (auth), since those are your errors.
  • reraise=True: after 5 failed attempts, the original exception is raised in your code instead of being wrapped in tenacity’s RetryError.
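To check the backoff schedule, this sketch re-implements the formula tenacity's `wait_exponential` applies (multiplier * 2 ** (attempt - 1), clamped to [min, max]) rather than calling tenacity itself. The function name is illustrative. Five attempts means four waits, so the 16 s and 30 s steps only appear with more attempts.

```python
def exponential_waits(attempts: int, multiplier: float = 1, minimum: float = 2, maximum: float = 30) -> list:
    """Wait before each retry: multiplier * 2 ** (attempt - 1), clamped to [minimum, maximum]."""
    waits = []
    for attempt in range(1, attempts):  # no wait after the final attempt
        raw = multiplier * 2 ** (attempt - 1)
        waits.append(max(minimum, min(raw, maximum)))
    return waits


print(exponential_waits(5))  # [2, 2, 4, 8]
```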

Full async production template

Putting it all together, here is a pattern for processing thousands of tasks:

import asyncio
from openai import AsyncOpenAI, RateLimitError, APIConnectionError, InternalServerError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

client = AsyncOpenAI(
    api_key="sk-promptra-...",
    base_url="https://api.promptra.ru/v1",
)

# Retry transient errors only; after 5 attempts the original exception is re-raised
@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(min=2, max=30),
    retry=retry_if_exception_type((RateLimitError, APIConnectionError, InternalServerError)),
    reraise=True,
)
async def classify(text: str, sem: asyncio.Semaphore) -> dict:
    async with sem:  # the slot is released while tenacity waits between attempts
        response = await client.chat.completions.create(
            model="gpt-5-4",
            messages=[
                {"role": "system", "content": "Classify the text: positive/neutral/negative."},
                {"role": "user", "content": text},
            ],
            temperature=0,
            timeout=60,
        )
        return {
            "text": text[:100],
            "label": response.choices[0].message.content.strip().lower(),
            "tokens": response.usage.total_tokens,
        }

async def process_dataset(texts: list[str], concurrency: int = 20) -> list[dict]:
    sem = asyncio.Semaphore(concurrency)
    results = await asyncio.gather(
        *[classify(t, sem) for t in texts],
        return_exceptions=True,
    )
    # separate successes from failures
    successes = [r for r in results if not isinstance(r, Exception)]
    failures = [r for r in results if isinstance(r, Exception)]
    print(f"Succeeded: {len(successes)}, failed: {len(failures)}")
    return successes

my_texts = ["Great service, thank you!", "The parcel arrived damaged."]  # your data here
asyncio.run(process_dataset(my_texts, concurrency=30))

return_exceptions=True — критично: одна упавшая задача не валит весь батч. Анализируете ошибки отдельно, ретраите при необходимости.

Part 2: Batch API, 50% off for offline work

Async helps with throughput, but it doesn't change the price per token. The Batch API gives a 50% discount on both input and output if you're willing to wait up to 24 hours. This article is part of a pillar guide: the complete technical guide to LLM APIs in Python (tokens, function calling, streaming, RAG, batch).

Architecture: you prepare a JSONL file with thousands of requests, upload it to the server, wait for processing to finish, and download a JSONL file with the results.

Step 1. Prepare the JSONL

Each line is one request:

import json

def make_batch_file(texts: list[str], model: str, path: str):
    with open(path, "w", encoding="utf-8") as f:
        for i, text in enumerate(texts):
            request = {
                "custom_id": f"task-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [
                        {"role": "system", "content": "Classify the text."},
                        {"role": "user", "content": text},
                    ],
                    "temperature": 0,
                },
            }
            # one JSON object per line, hence the newline
            f.write(json.dumps(request, ensure_ascii=False) + "\n")

make_batch_file(texts, "gpt-5-4", "/tmp/batch.jsonl")

custom_id is your identifier for matching each result with its original request. Usually it is a record id from your database or an index.
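Checking the file locally before uploading is cheap. The sketch below assumes the provider rejects duplicate custom_id values within one batch; OpenAI's Batch guide requires them to be unique. `validate_batch_file` is a hypothetical helper, not part of the SDK.

```python
import json
from collections import Counter


def validate_batch_file(path: str) -> list[str]:
    """Return the problems found in a batch JSONL file (an empty list means it looks OK)."""
    problems: list[str] = []
    ids: list[str] = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append(f"line {lineno}: empty line")
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            cid = record.get("custom_id")
            if not cid:
                problems.append(f"line {lineno}: missing custom_id")
            else:
                ids.append(cid)
    # every custom_id must identify exactly one request
    for cid, count in Counter(ids).items():
        if count > 1:
            problems.append(f"custom_id {cid!r} appears {count} times")
    return problems
```

For example, call `validate_batch_file("/tmp/batch.jsonl")` and upload only if it returns an empty list.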

Step 2. Upload the file and start the batch

from openai import OpenAI

client = OpenAI(
    api_key="sk-promptra-...",
    base_url="https://api.promptra.ru/v1",
)

# upload the file
with open("/tmp/batch.jsonl", "rb") as f:
    upload = client.files.create(file=f, purpose="batch")
print(f"File ID: {upload.id}")

# start the batch
batch = client.batches.create(
    input_file_id=upload.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"task": "classify_tickets", "version": "v3"},
)
print(f"Batch ID: {batch.id}, status: {batch.status}")

completion_window="24h" is the guaranteed processing window. In practice, a batch usually finishes within an hour or two.

Step 3. Poll the status and download the result

import time

def wait_for_batch(batch_id: str, poll_interval: int = 60):  # returns the final Batch object
    while True:
        batch = client.batches.retrieve(batch_id)
        print(f"Status: {batch.status}, completed: {batch.request_counts.completed}/{batch.request_counts.total}")
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            return batch
        time.sleep(poll_interval)

batch = wait_for_batch(batch.id)

if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    with open("/tmp/batch_results.jsonl", "wb") as f:
        f.write(output.content)

Step 4. Parse the results

results = {}
with open("/tmp/batch_results.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        custom_id = record["custom_id"]
        if record.get("error"):
            results[custom_id] = {"error": record["error"]}
        else:
            answer = record["response"]["body"]["choices"][0]["message"]["content"]
            results[custom_id] = {"answer": answer}

# now match the results to your source data by custom_id
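OpenAI's Batch API writes failed requests to a separate file, referenced by `batch.error_file_id`, rather than to the output file. Step 4 alone can therefore miss them. The sketch below treats both files the same way. It assumes the line format used above (`custom_id`, `response.status_code`, `response.body`, `error`). `split_batch_records` is a hypothetical helper.

```python
import json
from typing import Any, Iterable


def split_batch_records(lines: Iterable[str]) -> tuple[dict[str, str], dict[str, Any]]:
    """Split batch result lines into {custom_id: answer} and {custom_id: error details}."""
    answers: dict[str, str] = {}
    errors: dict[str, Any] = {}
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        cid = record["custom_id"]
        response = record.get("response") or {}
        if record.get("error"):
            # the request never produced a response
            errors[cid] = record["error"]
        elif response.get("status_code") != 200:
            # the provider answered with an HTTP error; keep its body for diagnosis
            errors[cid] = response.get("body")
        else:
            answers[cid] = response["body"]["choices"][0]["message"]["content"]
    return answers, errors


sample = [
    '{"custom_id": "task-0", "response": {"status_code": 200, "body": '
    '{"choices": [{"message": {"content": "positive"}}]}}, "error": null}',
    '{"custom_id": "task-1", "response": {"status_code": 400, "body": '
    '{"error": {"message": "bad request"}}}, "error": null}',
    '{"custom_id": "task-2", "response": null, "error": {"code": "expired"}}',
]
answers, errors = split_batch_records(sample)
```

In practice, feed it the lines of /tmp/batch_results.jsonl and, when `batch.error_file_id` is set, the lines of the file downloaded with `client.files.content(batch.error_file_id)`.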

For API details, see OpenAI's official Batch documentation and Anthropic's Message Batches documentation.

Figure: “Batch API: offline at 50% off”. Four steps, top to bottom: 1. JSONL file (1000+ requests); 2. files.create + batches.create; 3. status polling, 1-2 hours; 4. download the output JSONL and parse by custom_id.

Economics: real numbers

Example scenario: classifying 100,000 tickets, averaging 1,500 input + 500 output tokens each.

Via the regular async API

| Model | Price, ₽ per 1M tokens (input/output) | Cost |
|---|---|---|
| Claude Opus 4.7 | 350/1790 ₽ | 142,000 ₽ |
| Claude Sonnet 4.6 | 210/1070 ₽ | 85,000 ₽ |
| GPT-5.5 | 350/2150 ₽ | 160,000 ₽ |
| GPT-5.4 | 170/1070 ₽ | 79,000 ₽ |
| Gemini 3.1 Pro | 140/860 ₽ | 64,000 ₽ |
| DeepSeek V4 Pro | 30/60 ₽ | 7,500 ₽ |

Via the Batch API (−50%)

| Model | Batch price, ₽ per 1M tokens (input/output) | Cost | Savings |
|---|---|---|---|
| Claude Opus 4.7 | 175/895 ₽ | 71,000 ₽ | 71,000 ₽ |
| Claude Sonnet 4.6 | 105/535 ₽ | 42,500 ₽ | 42,500 ₽ |
| GPT-5.5 | 175/1075 ₽ | 80,000 ₽ | 80,000 ₽ |
| GPT-5.4 | 85/535 ₽ | 39,500 ₽ | 39,500 ₽ |
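Every cost cell in these tables comes from the same arithmetic: requests × tokens per request × price, with prices quoted per 1M tokens. Here is a small sketch for plugging in your own volumes. `run_cost` is a hypothetical helper.

```python
def run_cost(n_requests: int, in_tokens: int, out_tokens: int,
             price_in: float, price_out: float) -> float:
    """Cost of one run in ₽; price_in/price_out are ₽ per 1M tokens."""
    total_in = n_requests * in_tokens    # total input tokens for the run
    total_out = n_requests * out_tokens  # total output tokens for the run
    return (total_in * price_in + total_out * price_out) / 1_000_000


# 100,000 tickets, 1,500 input + 500 output tokens each
opus_sync = run_cost(100_000, 1_500, 500, 350, 1790)   # 142000.0
opus_batch = run_cost(100_000, 1_500, 500, 175, 895)   # 71000.0
gpt54_sync = run_cost(100_000, 1_500, 500, 170, 1070)  # 79000.0
```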

If you have offline processing, Batch is simply a free 50% cut in spend. At large volumes, that adds up to millions of rubles saved per year.

Figure: “Batch API: −50% on token prices”. A horizontal bar chart compares sync and batch prices for four models. Each model has a gray “sync” bar and a “batch” bar half as long, labeled in rubles.

When to use async and when to use batch: a decision tree

When a task comes in:

┌─────────────────────────────────┐
│ Need an answer within seconds?  │
└────┬────────────────────┬───────┘
     │ yes                │ no
     ▼                    ▼
  ┌────────┐       ┌─────────────────┐
  │ async  │       │ Volume > 1000   │
  │ (UI,   │       │ requests?       │
  │ agent, │       └─────┬──────┬────┘
  │ chat)  │             │ yes  │ no
  └────────┘             ▼      ▼
                      ┌──────┐ ┌──────┐
                      │batch │ │async │
                      │−50%  │ │fast  │
                      └──────┘ └──────┘

Rules:

  • Real-time UI (chat, assistant) → async + streaming
  • Agent with tool calls → async (it needs several round trips)
  • Embedding a large knowledge base → batch (instead of 100K parallel async calls)
  • Nightly reclassification → batch
  • A/B testing prompts on a dataset → batch
  • If you want both → async with a background queue, plus batch for archival tasks
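If you want the same rule in code, for example in a job router, the tree fits in a few lines of Python. In this sketch, the 1000-request threshold comes from the tree, and `choose_api` is a hypothetical name.

```python
def choose_api(needs_answer_in_seconds: bool, n_requests: int,
               threshold: int = 1000) -> str:
    """Encode the decision tree: real-time work goes async, large offline jobs go batch."""
    if needs_answer_in_seconds:
        return "async"  # UI, agent, chat
    if n_requests > threshold:
        return "batch"  # 50% cheaper, results arrive within the completion window
    return "async"      # small offline job: cheaper to just run it now
```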

Production pattern: async + batch in one system

The architecture of a mature LLM service:

# Real-time endpoint: async. app, ChatRequest, db and the helper functions here are placeholders.
@app.post("/chat")
async def chat(req: ChatRequest):
    return await async_chat_via_streaming(req)

# Background tasks via a queue (Celery/RQ/Dramatiq); this example uses Celery
@celery.task
def reclassify_all_tickets():
    tickets = db.query("SELECT id, text FROM tickets WHERE status='new'").all()
    batch_id = submit_to_batch(tickets, model="claude-sonnet-4-6")
    # check back in an hour (countdown is in seconds)
    check_batch.apply_async(args=[batch_id], countdown=60 * 60)

@celery.task
def check_batch(batch_id: str):
    batch = client.batches.retrieve(batch_id)
    if batch.status == "completed":
        results = download_and_parse(batch.output_file_id)
        save_to_db(results)
    elif batch.status in ("validating", "in_progress", "finalizing"):
        # not ready yet: schedule another check
        check_batch.apply_async(args=[batch_id], countdown=30 * 60)
    else:
        # failed, expired, cancelling or cancelled
        alert_team(f"Batch {batch_id} did not complete: {batch.status}")

The same token balance, the same Promptra key, and a single invoice to your legal entity. Because everything goes through one gateway, billing is unified: async and batch spending appear in the same dashboard.

Figure: “Async + Batch in one system”. On the left, a “Real-time UI chats” block connects to “async endpoints”. In the center is the “Promptra gateway”. On the right, a “Background: Celery worker” block connects via “Batch API submit/poll”. Both paths converge on “single invoice + EDI”.

Common mistakes

1. Using a synchronous for loop for 1000+ requests. At 1.5 s latency per request, that takes 25 minutes instead of about 30 seconds with async.

2. Running async without a Semaphore. You get a storm of 429s, and your retries make it worse.

3. Using the regular API where Batch would do. On 100K requests, that's an extra 40-80K ₽ per run. At 10 runs a month, that's 400-800K ₽ a month, or roughly 5-10M ₽ a year.

4. Not handling errors in batch results. Each record can be either a response or an error. If you ignore errors, you'll silently lose 0.1-1% of records.

5. Not using custom_id meaningfully. If it's just an index, you lose the link to your source data once the file is reordered. Use the ids from your database.

6. Launching a batch without estimating its cost first. Estimate the expected input/output with a tokenizer (see “How to count tokens in LLMs”), multiply by the batch rates, and check the result against your budget, all before launch.
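Here is a minimal pre-launch check along those lines, assuming the Step 1 JSONL format with plain-string message contents. The default `rough_token_count` (about 4 characters per token) is a crude placeholder, not a tokenizer. Swap in the real tokenizer for your model, since ratios differ across models and languages, and chat formatting adds a few tokens per message. Output length is not known in advance, so `expected_out_tokens` is your own estimate. All function names are hypothetical.

```python
import json
import math
from typing import Callable


def rough_token_count(text: str) -> int:
    # Crude assumption: ~4 characters per token. Replace with your model's tokenizer.
    return math.ceil(len(text) / 4)


def estimate_batch_cost(path: str, price_in: float, price_out: float,
                        expected_out_tokens: int,
                        count_tokens: Callable[[str], int] = rough_token_count) -> float:
    """Estimate a batch's cost in ₽ from its JSONL file; prices are ₽ per 1M tokens."""
    total_in = 0
    n_requests = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            body = json.loads(line)["body"]
            # sum the tokens of every message in this request
            total_in += sum(count_tokens(m["content"]) for m in body["messages"])
            n_requests += 1
    total_out = n_requests * expected_out_tokens
    return (total_in * price_in + total_out * price_out) / 1_000_000


def check_budget(estimate: float, budget: float) -> None:
    """Refuse to proceed when the estimate exceeds the budget."""
    if estimate > budget:
        raise RuntimeError(f"Estimated {estimate:,.0f} ₽ exceeds budget {budget:,.0f} ₽")
```

For example, with the GPT-5.4 batch rates from the table, run `estimate_batch_cost("/tmp/batch.jsonl", 85, 535, expected_out_tokens=500)` and pass the result to `check_budget` before calling `client.batches.create`.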

Payment and closing documents

Async and Batch use the same models through the same Promptra gateway. The service provider is a Russian legal entity resident in the Russian Federation. A 5% service fee is charged only when you top up your balance; there is no markup on tokens. The 50% Batch discount is shown directly in the dashboard under the “Batch usage” line item. The full set of closing documents (public offer agreement, invoice, certificate of services rendered, VAT invoice, universal transfer document) is delivered via EDI: Diadoc, SBIS, or Kontur. See the “Pricing” page for details.

What's next

Async calls take you from 0.5 RPS to 30+ on a single key with a Semaphore. The Batch API cuts token prices by 50% in exchange for waiting an hour or two. A mature production stack uses both: real-time async for user interfaces, and batch for offline processing and archival tasks. At 100K+ requests a month, Batch savings run to hundreds of thousands of rubles. Useful next steps are “Function calling and tool use” for async agents, “Embeddings and vector search” for batch indexing of large knowledge bases, and “Streaming LLM responses” for real-time UI. If you need to estimate costs for your traffic or connect a key through a legal entity, message the Promptra team on Telegram.

Promptra — Russian LLM API aggregator. One OpenAI-compatible endpoint to all flagship models: OpenAI (GPT-5.5, GPT-5.4), Anthropic (Claude Opus 4.7, Sonnet 4.6), Google (Gemini 3.1 Pro, 3.5 Flash), DeepSeek V4 Pro, Qwen 3.6 Plus.

Provider prices 1-to-1 at CBR rate — no markup on tokens. Ruble billing per contract, full closing documents through EDI. No VPN — legal B2B service in Russia.

I Tried to Build a Car Listing Website From Scratch. Here’s What Actually Happened.

Everyone told me building a marketplace was “just a CRUD app.”
Those people have never built a marketplace.
So a few months ago I had this idea.
People in my city were still selling cars through WhatsApp groups and Facebook posts with zero photos and descriptions like “good condition, contact for price.”
There was clearly a gap. I was going to fill it. I was going to build the next OLX.
This was, in retrospect, extremely optimistic.

Week 1: The Excitement Phase

I sat down and mapped out everything the site needed.
• Car listings with specs
• Search and filters (make, model, year, price, location)
• User accounts for buyers and sellers
• Messaging between users
• Payment for featured ads
• Mobile responsive design
• Admin panel to manage listings and flag spam
Simple, right?
I opened my code editor and started building.
I named the project car-marketplace-v1.
It is currently called car-marketplace-v7-final-ACTUAL-final-please-work.

Week 3: The Reality Phase

The search filter alone took me two weeks.
Not because filters are hard.
Because I kept discovering things I hadn’t thought about.
“What if someone searches by fuel type but doesn’t pick a price range?” “What if two users message each other simultaneously?” “What if someone uploads a 40MB image?”
These are not edge cases. These are Tuesday.

The Specific Things That Broke Me

The image upload. I had not thought deeply about image storage, compression, or what happens when someone uploads a photo of their car taken in 2009 on a Nokia phone. Spoiler: bad things happen.
The spam problem. First week of testing I invited 10 people to post listings. Three of them were fake dealers trying to post the same car 15 times under different accounts. I had not built a moderation system yet. I had built exactly zero moderation.
The mobile layout. Looked perfect on my laptop. On my phone the search button was behind the keyboard. For two weeks.

What I Actually Shipped

Eventually I stopped trying to build everything custom and used a ready-made car classifieds script instead.
I know. I know.
But here’s what I realised after month two of solo development:
The boring infrastructure stuff — VIN lookup, spam controls, admin panel, multi-image upload with compression, Ajax search — none of that is the competitive advantage.
The advantage is the community you build on top of it.
Which is the part I should have been spending my time on from day one.

What I Learned

Building a marketplace is not a CRUD app. It is a trust problem wearing a CRUD costume.
The admin panel is not optional. It’s actually like 40% of the product. Nobody warns you about this.
Your first users will try to break everything. Not maliciously. They are just users.
The 20% of features you skip “for now” are always the ones your first real user asks for on day one.
Anyway. The site is live now. We have real listings. Real buyers. Zero disasters this week, which is my current definition of success.
Honest question for anyone who’s built a marketplace: What was the one thing you completely underestimated?
For me it was moderation. Still thinking about those fake dealers.

RustWeek 2026: What We Learned, Who We Met, and What’s Next for Rust

RustWeek 2026 brought more than 900 Rust developers, educators, and maintainers to Utrecht, Netherlands, for a few days of talks, hallway conversations, community meetups, hackathons, and workshops about all things Rust.

As a Gold sponsor for the third year in a row, the RustRover team joined the event to support the Rust ecosystem, connect with developers in person, and learn more about how people are building with Rust today.

Along the way, we sat down with members of the Rust community for a series of quick interviews about the future of Rust. In this post, we’re sharing the conversations, trends, and community moments that stood out most to us.

What brought RustRover to RustWeek 2026

Developer conferences are one of the few places where conversations happen completely outside of tickets, issue trackers, and release notes. For our team, RustWeek 2026 was an opportunity to step away from our screens and spend time with the people building real projects with Rust every day.

RustRover team at RustWeek 2026

We brought demos, stickers, quizzes, a few quirky prize ideas, and some of our latest RustRover updates, including ACP, Cargo nextest support, and call hierarchy features.

We also came with a camera and five quick questions for members of the Rust community. Our goal was simple – to capture honest perspectives from the people shaping, teaching, and working with Rust in different ways – and we think we succeeded.

At the booth

One thing we underestimated before RustWeek was how competitive Rust developers can get during quiz sessions.

We hosted one Rust quiz each day of the conference, with attendees testing their knowledge. At the end of the quiz, the winner got to choose a prize from the booth table, which somehow made stickers, cat ears, and Francesco Ciulla’s The Rust Programming Handbook feel incredibly high stakes.

Congratulations to our two Rust quiz champions Nikolai Golub and Mateusz Mackowski, who survived ownership questions, async trivia, and increasingly competitive crowd reactions!

Conversations naturally drifted toward the kinds of projects people are building with Rust today. Embedded systems came up far more often than we expected, especially discussions around probe-rs, remote workflows, and debugging setups. Other attendees shared onboarding stories, editor preferences, or simply stopped by to talk about their experience learning Rust.

The most common questions we heard about RustRover

Many attendees were curious about how RustRover fits into existing Rust workflows, especially for developers already using VS Code, Vim, or Zed. Conversations often centered around debugging, Cargo integration, embedded development, and onboarding to Rust for newer developers.

Several attendees were also interested in remote workflows, custom toolchains, and how RustRover handles larger multi-language projects.

5 questions, 3 perspectives from the Rust community

One of the best parts of RustWeek was the chance to hear what different people in the Rust ecosystem think about the language and its future. To capture some of those perspectives, we caught three members of the Rust community between sessions for separate one-on-one conversations. Same five questions, three very different perspectives.

Our guests: Vlad Beskrovny from the RustRover team, Lori Lorusso from the Rust Foundation, and Stefan Baumgartner, Rust educator and author.

“In five years, Rust will be ___”

We asked each of our guests how they see Rust evolving over the next few years.

“Rust will become a boring language, and that’s a good thing. It’ll become boring when it’s adopted by everyone.”

Vladislav Beskrovny
RustRover team, JetBrains

“Adopted by more companies than you would have imagined.”

Lori Lorusso
Rust Foundation

Full interview with Lori Lorusso from RustWeek 2026

“What’s your unpopular opinion about Rust?”

“Everyone says it’s really hard to learn, but it’s just finding the right pathway in. It’s not that hard.”

Lori Lorusso

“Rust is an easy language to learn. It’s just really hard to unlearn old habits.”

Stefan Baumgartner

Full interview with Stefan Baumgartner

“What are you most excited about at RustWeek 2026?”

“Every corner you turn, you see someone you haven’t met in a year.”

Stefan Baumgartner

“I’m excited about giving my talk. It’s a movie theater, so it’s just awesome.”

Lori Lorusso

“I’m really excited about my talk with Lukas Wirth about IDE engines.”

Vlad Beskrovny

Full interview with Vlad Beskrovny

One of the conference highlights for the RustRover team was Vlad’s talk with Lukas Wirth, where they explored IDE architecture, developer tooling, and ideas behind modern Rust language support.

Watch the livestream
Read the blog post

What Rust developers were talking about at RustWeek 2026

A few themes kept coming up throughout the conference, both at the booth and in hallway conversations.

Embedded Rust is growing. The Espressif booth stayed consistently busy, and multiple attendees stopped by to talk about embedded workflows, no_std development, and debugging setups. While many projects were still experimental or hobby-focused, interest in embedded Rust continues to grow steadily.

Beyond embedded, a lot of conversations centered around tooling, learning resources, and how developers are integrating Rust into existing workflows. Compared to previous years, there also seemed to be more people actively using Rust professionally rather than experimenting with it on the side.

More than anything, RustWeek felt energetic. Every hallway conversation seemed to turn into another recommendation, debate, or spontaneous deep dive into someone’s latest project.

What the RustRover team learned at RustWeek 2026

RustWeek reinforced something our team already believed but doesn’t always get to experience firsthand: The most useful feedback often comes from the conversations you don’t plan. Between talks, during quiz sessions, or while someone’s waiting for a sticker – that’s where the real insights surface.

We left Utrecht with a clearer picture of where the Rust ecosystem is heading: more embedded interest, more developers using Rust professionally, and a growing focus on tooling and developer experience. And we left with a long list of ideas, feature requests, and conversations we want to continue.

RustWeek reminded us that the Rust community continues to grow without losing what makes it special: the curiosity, the willingness to help, and the kind of enthusiasm that makes you want to start a new project on the train ride home.

IntelliJ IDEA 2025.3.6 Is Out!

IntelliJ IDEA 2025.3.6 is now available with the latest Oracle critical patch update for Java 21. The update includes the corresponding JetBrains Runtime changes and fixes the issue [IDEA-389015], providing improved reliability and security.

You can update to this version from inside the IDE, using the Toolbox App, or using snaps if you are an Ubuntu user. You can also download it from our website.

For a comprehensive overview of the fixes, see the release notes. If you spot any issues, let us know via the issue tracker.

Happy developing!

Async VFS Content Writes – What Plugin Authors Need to Know

Some plugin code follows this pattern:

  1. Save open documents.
  2. Get a file or directory path.
  3. Pass that path to something outside the IDE, such as a formatter, linter, compiler, VCS command, language server, or custom CLI tool.

Historically, it was reasonable to assume that once the save finishes, the file on disk already contains the latest editor text.

That is no longer guaranteed.

The IntelliJ Platform can now update the VFS first and finish the disk write in the background a bit later. Code that reads the file through IntelliJ Platform file APIs still sees the new content immediately. Code that reads the same file through Path, File, Files.*, or an external process may need an explicit flush before the handoff.

The official SDK docs cover that contract in “When are VirtualFile changes persisted on disk and loaded from disk to VFS?”.

Why This Exists

Writes to VirtualFile must happen under a write action. Until now, saving a file often meant doing the actual file-system write while that write action was still open.

That is expensive when the file system is slow, remote, or mounted through WSL or Docker. Moving the disk write out of the write action is meant to reduce freezes during document saves.

The Rule

If your plugin saves and reads files using IntelliJ Platform file APIs, you probably do not need to change anything. This is fine:

  • save a document through FileDocumentManager
  • read it later through VirtualFile
  • use VFS APIs such as contentsToByteArray, getInputStream, or VfsUtil

VFS behaves as if the write has already happened. For example, a read action started after the write action should see the new content when it reads through VFS.

If your code is about to read the physical file directly, or pass the path to another process, flush pending VFS writes first with ManagingFS:

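import com.intellij.openapi.fileEditor.FileDocumentManager
import com.intellij.openapi.vfs.newvfs.ManagingFS

FileDocumentManager.getInstance().saveAllDocuments()
// Flush outside a write action; this may wait for disk I/O.
ManagingFS.getInstance().flushPendingUpdates()
// commandLine: the GeneralCommandLine for your external tool
commandLine.createProcess()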

If you know the exact file, use the narrower version:

import java.nio.file.Files

FileDocumentManager.getInstance().saveDocument(document)
// Flush outside a write action; this may wait for disk I/O.
ManagingFS.getInstance().flushPendingUpdates(virtualFile)
val textOnDisk = Files.readString(virtualFile.toNioPath())

The throwing variants can wait for I/O and can throw IOException, so call them at the boundary where disk access is about to happen. Do not add a flush after every save just to be safe.

For user-triggered actions where an IDE notification is more appropriate than handling an exception in your own code, use:

ManagingFS.getInstance().flushPendingUpdatesOrNotify()

For example, an action that opens a generated or saved file in a browser can flush before BrowserLauncher hands it to the browser:

FileDocumentManager.getInstance().saveAllDocuments()
ManagingFS.getInstance().flushPendingUpdatesOrNotify()
BrowserLauncher.instance.browse(url, browser, project)

If saving happened earlier, keep the same idea: flush immediately before the external reader touches the file system.

Places Worth Checking

The fragile spots are handoffs from VFS-written files to direct disk readers. These can show up as stale reads, external tools seeing old content, or tests that become flaky because they write through VFS and assert through NIO.

The platform codebase has been adjusted for many of these transitions, but plugins may still have their own cases. Common examples:

  • launching formatters, linters, compilers, test runners, VCS commands, or language servers
  • reading through Files.readString, Files.newInputStream, Path, or File
  • passing a project directory or file path to a CLI tool
  • tests that write through VFS and assert through NIO
  • VFS listeners that schedule later disk I/O

For VFS listeners, flush where the disk access actually happens. If the listener only enqueues work, do not flush inside the synchronous listener. That puts waiting back under the write action.

Current platform code may flush pending writes from some VirtualFile.toNioPath() paths, because path conversion is often followed by NIO access or process launch. Do not use path conversion as the synchronization point in plugin code. If disk visibility matters, call the flush API explicitly.

Opt-In and Troubleshooting

The feature is enabled by default, but not every getOutputStream() call automatically becomes async.

The requestor passed to VirtualFile.getOutputStream(requestor) has to opt in. Today, the important path is editor saves: FileDocumentManagerImpl opts in, so files saved from the editor can go through the new branch.

The opt-in marker itself, AsyncFileContentWriteRequestor, is currently internal, so most third-party plugins should not rush to adopt async writes directly. The more immediate task is to audit assumptions around saveAllDocuments() and direct disk access.

To check whether a problem is related to this behavior, temporarily disable it with:

-Dvfs.async-content-write.enabled=false

When running a plugin with the IntelliJ Platform Gradle Plugin, pass the flag to the IDE process through the runIde task:

import org.gradle.process.CommandLineArgumentProvider
tasks {
  runIde {
    jvmArgumentProviders += CommandLineArgumentProvider {
      listOf("-Dvfs.async-content-write.enabled=false")
    }
  }
}

Test Failures You May See

This kind of test can become flaky:

writeThroughVfs(virtualFile)
assertEquals("expected", Files.readString(virtualFile.toNioPath()))

The test writes through one view of the file system and reads through another. Make the boundary explicit:

writeThroughVfs(virtualFile)
ManagingFS.getInstance().flushPendingUpdates(virtualFile)
assertEquals("expected", Files.readString(virtualFile.toNioPath()))

If the assertion reads through VFS, no flush should be needed.

WPF Hot Reload Is Here: Edit Your XAML and Watch It Update Live in Rider

WPF Hot Reload is now available in Rider, starting with the 2026.2 EAP 2 build. You can edit your XAML while your app is running under the debugger and see the changes immediately, with no rebuild, no restart, and no losing your place in the application. Together with the C# Hot Reload support that’s already in Rider, this completes the Edit and Continue workflow for WPF.

This one has been a long time coming, and we want to be straight about that. The request for WPF Hot Reload is one of the most upvoted issues in Rider’s entire history. We read every word of your comments, and the feature we’re shipping today is shaped directly by that feedback. 

A quick, honest note before we go further: this is a beta. The surface area of WPF is large, and some scenarios are not covered yet. We’ll lay those out plainly below so you know exactly what to expect. 

Download Rider 2026.2 EAP

What WPF Hot Reload does

When you’re running a WPF app under the Rider debugger, you can now modify your XAML and have the saved changes reflected in the live, running application. Adjust a margin, restyle a button, tweak a DataTemplate, change a color, rework a layout – then save, and the UI updates in place. The application keeps its current state. You don’t need to navigate back through five screens to get to the view you were working on. You don’t have to rebuild. You just keep iterating.

Why this matters for the way you work

WPF UI development has a natural rhythm: Change something, see how it looks, adjust. Without Hot Reload, that rhythm keeps getting interrupted by a rebuild and a click-path back to the screen you were on. A few situations make that friction especially frustrating, and they are unfortunately not uncommon.

Large, long-lived WPF applications. For many teams, WPF isn’t legacy. It’s the present and the roadmap, with a decade or more of active development ahead across applications maintained by many developers at once. In a codebase like that, UI iteration speed isn’t a nicety; it compounds across every developer, every day. Hot Reload takes the most repetitive loop in WPF UI work – the change, rebuild, navigate back, and check cycle – and collapses it.

Applications with complex UI structures. This is where it gets interesting. Hot Reload has held up across genuinely non-trivial setups, including XAML resource dictionaries that hold control templates for custom controls in a shared WPF library. That’s exactly the kind of structure that tends to be fragile under hot-reload implementations elsewhere. If your UI is built from layered styles, templated custom controls, and shared resource dictionaries, there’s a good chance Hot Reload was a crucial missing piece of your workflow.

Dual-IDE setups. It’s common for teams to run Rider for most of their work while keeping Visual Studio open specifically for the live XAML loop. Maintaining two IDEs for one task is friction nobody wants. Hot Reload removes the need to keep switching, and for a lot of developers, it’s the only remaining reason to keep a second IDE installed.

Supported target frameworks

The current EAP build supports the latest versions of .NET and .NET Framework. net9.0-windows and net10.0-windows work as expected, and .NET Framework targets are supported as well.

How to try it

  1. Download Rider 2026.2 EAP 2 or later.
  2. Open your WPF project and start a Debug session from Rider.
  3. Edit and save your XAML changes, whether styles, templates, layout, or resources.
  4. Watch the running app update in place.

In this example, we edit a weather app’s UI to illustrate the Hot Reload experience for a WPF project in Rider 2026.2.

That’s the whole loop. No extra configuration, no separate mode to enable.

Known limitations

Here’s the part worth reading carefully: These are the cases where Hot Reload won’t apply a change in place today, along with possible workarounds. For some of these limitations, we don’t have any immediate plans for development, while others will be resolved in upcoming releases. Here is where the beta stands now.

  • Adding, removing, or updating NuGet packages. Restore packages, then restart the debugging session.
  • Adding new controls, windows, pages, or other files to your project while the app is running. Restart the debugging session to pick them up.
  • Changing the root type or x:Class of an already-loaded XAML file (for example, turning a Window into a Page). 
  • Making changes to runtime-created resources or runtime-switched theme dictionaries. Restart the debugging session to apply them.
  • Adding new WPF class members that rely on static registration, such as a DependencyProperty, an attached property, or a RoutedEvent. Note: This only applies when registration happens in a static field initializer; assigning the field later in a method may also work.
  • Adding new x:Name values in XAML. This one is partial: The XAML update itself is applied live, but the new names only become available from your C# code-behind after you restart the debugging session.
  • Changing animations started by one-time triggers, such as an EventTrigger on Loaded. The updated animation won’t restart until the view is loaded again. 

Help us prioritize 

Most of the limitations above are tracked in YouTrack with plans to address them in upcoming releases. Each has its own ticket where you can upvote and add details from your setup. The more signal we have on which ones are blocking your work, the easier it is to prioritize them:

  • [RIDER-138349] Hot Reload after Attach to Process
  • [RIDER-138659] Changes to runtime-created resources or runtime-switched theme dictionaries
  • [RIDER-138874] Adding new WPF class members that rely on static registration
  • [RIDER-138348] Changing animations started by one-time triggers

A few scenarios sit further out, including Hot Reload after attaching to an already-running process. These require a bit more research before we can commit to a particular approach in implementation. If your architecture relies on one of those, please open a ticket with concrete repro details from your real setup.

Tell us how it goes

WPF Hot Reload in Rider exists because developers repeatedly and specifically told us what they needed, so the best thing you can do now is keep that feedback coming. Try it on your real projects, not just a sample, and tell us how it’s working for you here in the comments, over on X, or via our issue tracker.

We’re going to keep expanding framework coverage and chipping away at the limitations above through the EAP cycle and beyond. Thanks to everyone who voted, commented, and waited, and welcome to everyone trying this for the first time. 

Download Rider 2026.2 EAP