Why Your $200 AI Workflow Actually Costs $20k in DevOps 😭

I’ve been spending a lot of time lately looking at different AI automation setups. Mostly, I’ve just been trying to figure out where the actual leverage is for smaller engineering and ops teams.

What I keep finding? A lot of what we’re calling “AI workflows” are really just traditional, deterministic scripts with a chatbot tacked onto the front.

And for the ones that actually do rely on LLMs for core logic? They end up being surprisingly expensive to run in production. But rarely for the reasons people expect.

📌 The Reality Check

  • The token bill is a rounding error: You aren’t going broke on OpenAI API calls. You are bleeding cash on the developer time required to figure out why those calls randomly failed over the weekend.

  • Traditional software fails loudly; AI rots silently: An API endpoint changes, and your old code throws a clean 500 Internal Server Error. An AI agent hits an undocumented data format change and confidently writes garbage data into your CRM.

  • Self-hosting is an infrastructure trade-off: Moving to open-source tools like n8n looks cheap on paper right up until you’re debugging Redis queue bottlenecks at 2 AM.

  • Human-in-the-loop often turns into a rubber stamp: When people get alert fatigue from reviewing non-deterministic outputs, they stop auditing and start blindly clicking “approve.”

💸 Why API Token Costs Aren’t the Main Problem

People get really hung up on API token pricing. You see deep comparison guides tracking input/output costs down to the sixth decimal place. And to be fair, inference is remarkably cheap.

But the token bill is almost never what kills a project’s budget.

Just last month, I was working on an automation designed to handle a shared group financial settlement process. The goal was simple: use an LLM to parse detailed bank statement records and automatically reconcile incoming payments and subsidies into a clean tracking sheet for a 10-person group.

The API calls to run the extraction cost pennies. The real cost was the entire weekend spent debugging.

The model kept hallucinating calculated amounts whenever two distinct line items shared an identical transaction date. I ended up spending a massive amount of engineering hours writing fallback scripts, custom schema validators, and data sanitization layers just to handle exceptions for a system that supposedly cost $0.15 to run.
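For a sense of what one of those validation layers looks like, here is a minimal sketch. It is not the code from that project: the field names (`date`, `description`, `amount`), the `validate_extraction` helper, and the idea of checking against a known item count and statement total are all assumptions for illustration. The point is that the check is deterministic, so two items on the same date cannot be silently merged or recalculated.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class LineItem:
    day: date
    description: str
    amount: Decimal


def validate_extraction(raw_items, expected_count, expected_total):
    """Return (items, errors) for model output. All names are illustrative."""
    items, errors = [], []
    for i, raw in enumerate(raw_items):
        try:
            items.append(LineItem(
                day=date.fromisoformat(raw["date"]),
                description=str(raw["description"]).strip(),
                amount=Decimal(str(raw["amount"])),
            ))
        except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
            errors.append(f"item {i}: malformed field ({exc!r})")
    # Items sharing a date must stay separate, so the count has to match.
    if len(items) != expected_count:
        errors.append(f"expected {expected_count} items, got {len(items)}")
    # The model may not invent amounts: the sum must equal the statement total.
    total = sum((item.amount for item in items), Decimal("0"))
    if total != expected_total:
        errors.append(f"total {total} does not match statement total {expected_total}")
    return items, errors


# Made-up sample: two distinct items on the same transaction date.
model_output = [
    {"date": "2024-05-01", "description": "Rent share", "amount": "120.00"},
    {"date": "2024-05-01", "description": "Subsidy", "amount": "45.50"},
]
items, errors = validate_extraction(model_output, 2, Decimal("165.50"))
```

Anything that comes back with a non-empty `errors` list goes to a human instead of the tracking sheet.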

I also realized during all this that I was spending more time trying to fix the prompt than the original manual process would have taken in the first place, which was a slightly depressing moment. Half the time, the workflow technically worked. I just stopped trusting it enough to leave it unattended.

📉 Automation Entropy: How AI Systems Drift Over Time

When you test an AI pipeline in a closed environment, it feels like magic. But production environments are fundamentally unsympathetic to probabilistic software.

Traditional software infrastructure is rigid, which makes it stable. You write a Python script that pulls a specific JSON key from a third-party API. It runs cleanly until the third party deprecates the endpoint. When it breaks, it throws a loud exception.

AI systems are vulnerable to a much subtler decay: Automation Entropy. The models change. The data changes. Eventually the workflow starts drifting. Compare AI agents with traditional automation over a long enough horizon and you’ll see a massive divergence in long-term reliability.

A few months ago, I was helping a 6-member team build out a specialized chatbot designed to estimate groundwater variations by reading through highly unstructured geological reports. In the dev sandbox, our vector retrieval setup was crushing every test query.

Then we pushed it live and left it unattended over a weekend.

An upstream source changed its document formatting slightly, and our pipeline experienced a sudden timeout loop that nobody had properly caught in the error-handling config. The retry queues backed up silently. Because the system aggressively retried failed steps without a hard circuit breaker, it repeatedly hammered the model endpoints. By Monday morning, we didn’t have an elegant groundwater report—we had a backed-up queue, duplicated database writes, and an incredibly messy cleanup job.
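The missing piece was a hard circuit breaker around the model calls. A minimal sketch of the idea follows. The class name, thresholds, and injectable clock are my own assumptions for illustration, not what we eventually shipped: after a fixed number of consecutive failures the breaker refuses further calls until a cool-down has passed, so a retry loop cannot hammer the endpoint all weekend.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count.
        self.failures = 0
        return result
```

Wrap every model or API call in `breaker.call(...)` and treat `CircuitOpenError` as a signal to park the job and alert someone, not to retry.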

At some point, somebody still ends up babysitting the thing. Maybe less than before, sure, but definitely not zero. This is a core part of AI reliability engineering that most teams ignore until their production data gets corrupted.

You can usually tell an AI pipeline is succumbing to operational rot when nobody on the engineering team wants to touch the prompt anymore. The instruction set becomes an accumulation of hyper-specific edge cases like: “Do not format dates as DD/MM if the client is based in North America, unless the text explicitly references European logistics hubs, and make sure to ignore headers that look like…”

At this stage you’ve moved past basic prompt writing; what you are actually dealing with is context engineering, and it is the reason prompt engineering alone is no longer enough. Without it, the prompt becomes just as fragile as legacy regex code.

🏗️ Self-Hosting vs. SaaS: The Infrastructure Trapdoor

When the monthly usage bills from managed platforms start scaling up, teams face a choice: stay on a managed platform or spin up an open-source framework.

At first glance, replacing Zapier with an open automation tool like n8n looks like an open-and-shut financial win. You replace unpredictable execution tiers with a predictable monthly server invoice.

At first, self-hosting honestly feels great. The dashboard looks clean, everything is fast, and you stop thinking about usage-based pricing for a while. I remember feeling weirdly proud the first time I had n8n running properly on a tiny AWS instance.

Then a few weeks later something broke at 1 AM and suddenly I was reading Redis documentation instead of sleeping.

We’d had a spike in webhook traffic, the node ran out of memory, and because I hadn’t configured a durable queue, we lost a bunch of active state data. You realize pretty quickly that when you self-host, you’re not just saving on software fees. You are essentially volunteering for a DevOps role to patch server vulnerabilities and manage uptime alerts. For some teams, that makes sense. For me, it was exhausting.

The Three Big Choices:

  1. Managed iPaaS (Zapier, Make, n8n): Best for rapid prototyping and isolated tasks. The catch is that per-execution billing scales aggressively, and visual logic becomes unmanageable spaghetti past 20 nodes. The scalability ceiling is low.

  2. Self-Hosted Tools: Best for high-volume pipelines managed by technical teams who are willing to run their own infrastructure. Across the AI workflow automation tools available in 2026, self-hosting can save money at high scale, but you pay for it in maintenance overhead.

  3. Code-First Frameworks (LangChain, LangGraph): Best for multi-stage reasoning agents and core product features. If you are building, for example, a RAG system with pgvector and LangChain, code gives you absolute control over state machines and fallback logic.

🛑 Why “Human-in-the-Loop” Often Breaks Down

When a workflow requires high accuracy, the instinctive response is to introduce a human approval stage. The AI handles the messy work and stages the output as a draft. A human operator reads it and clicks “execute.”

In theory, this gives you the efficiency of automation with the safety net of human judgment. In practice, it frequently creates Alert Fatigue.

The pattern loops like this: High Output Volume leads to Repetitive Safe Approvals, which causes Attention Drops, resulting in the Blind Approval of Garbage.

By week three of operating the pipeline, operators are no longer reading the text critically. They are simply click-approving hundreds of staged payloads a day to clear their queue. The human presence stops being an active safety filter and becomes a passive rubber stamp. You haven’t bought meaningful leverage; you’ve just assigned an employee a massive, daily proofreading chore. When companies work out how to build an AI agent for their business, managing human override fatigue is consistently the hardest piece of user experience to design.

💻 Stop Making Everything an Agent

The industry currently has an obsession with making everything agentic.

If the logic of your business process can be mapped out using clear, conditional rules (if X data is present, route to Y database), you should not be using an AI agent. Deterministic code is cheap to scale, fast, and entirely predictable. Look for real automations that save time with simple scripts before forcing an LLM to guess the next step.
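To make the "if X data is present, route to Y database" case concrete, here is a minimal sketch. The field names and destination names are hypothetical; the only point is that the routing is a handful of explicit rules with no model call anywhere.

```python
def route_record(record):
    """Pick a destination using explicit rules; field and DB names are made up."""
    if record.get("invoice_id"):
        return "billing_db"
    if record.get("ticket_id"):
        return "support_db"
    if record.get("email"):
        return "crm_db"
    # Nothing matched: fail visibly instead of guessing.
    return "manual_review"
```

The same input always produces the same route, and the fallback is explicit, which is exactly what a guessing agent cannot promise.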

Save agentic frameworks for highly unstructured problems, like interpreting natural human sentiment or normalizing chaotic, multi-source text summaries. If you’re building production AI systems, it’s worth reading up on the Model Context Protocol (MCP) to see how models can clean up their tool interactions instead of relying on open-ended logic loops.

🛠️ The Real Engineering

AI automations simply do not behave like standard software licenses. They behave far more like human operational hires. They are flexible, capable of handling incredible complexity, and occasionally brilliant—but they require ongoing management, structural constraints, and deliberate oversight to keep them from wandering off course.

The API call is always the cheapest part of your architecture. Managing the unpredictability that surrounds it is where the real engineering begins.

Backtesting an ICT strategy at 184× speed: timezone-cache + bisect lookup

I have been running an ICT-based reversal strategy live on US500 for a few months. The strategy itself is fine, but the bottleneck was nowhere near the strategy logic. It was in the backtest harness. A 30-day single-instrument simulation took 27 minutes when I wrote the first version. Iterating on parameters was painful, exploring alternative setups was effectively impossible.

After two evenings of profiling and one targeted change, the same 30-day backtest now runs in 8.9 seconds. That is a 184× speedup, and the change was almost embarrassingly small.

This is the story of what was slow, why it was slow, and the cache-plus-bisect pattern that fixed it. If you write your own backtesting code in Python, you are very probably leaving a similar speedup on the table.

The setup

The strategy is a Smart Money Reversal style entry with LRB (liquidity-run break) re-entries. The harness is a fairly standard event-driven loop. For each minute bar in the historical data, we evaluate signal conditions, manage open positions, check pyramid re-entries, and update P&L. The data is roughly 7000 minute bars per US500 trading day, multiplied across 30 days gives around 210k bars per simulation.

210k bars in 27 minutes is 130 bars per second, which is laughable for what is essentially a tight numeric loop in Python. Even with pandas overhead I expected 10× better. Time to profile.

The profiler told a clear story

I dropped cProfile in front of the harness and got the breakdown. The top function by cumulative time was not the strategy evaluator or the order manager. It was pandas.tslib.tz_convert, called from inside the bar iterator. Specifically:

for ts, bar in bars.iterrows():
    local_ts = ts.tz_convert('America/New_York')
    if is_in_session(local_ts):
        ...

The naive code converts the bar timestamp to NY time on every single iteration. pandas timestamp conversion is not free. It runs through tzdata lookup, calculates DST offsets, allocates new Timestamp objects. On a single conversion call that is microseconds, no problem. Called 210k times per backtest, suddenly you are spending eight or nine minutes inside pandas internal C extension before even hitting your own code.

The second-slowest function was a lookup into a sorted list of session boundaries, which I had written naively as a linear scan rather than a binary search. That was eating another four minutes per simulation. The third was unnecessary DataFrame slicing to find the previous N bars, which I had written as df.loc[prev_ts:ts], so every bar paid for an index lookup and a fresh DataFrame slice.

So three independent issues, all rooted in the same mistake: I was doing in the hot loop what should have been done once at the start.
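If you have not used cProfile this way before, the profiling step can be sketched roughly as below. This is not my actual harness: `run_backtest` is a stand-in loop and `profile_call` is a hypothetical helper name. It uses only the standard library and sorts by cumulative time, which is the view that surfaced the timezone conversion for me.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, buffer.getvalue()


def run_backtest(n_bars):
    # Stand-in for the real event loop; purely illustrative.
    total = 0
    for i in range(n_bars):
        total += i % 7
    return total


result, report = profile_call(run_backtest, 10_000)
print(report)
```

Read the report from the top down and ignore anything that is not called from inside the bar loop.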

The fix, part one: timezone cache

Instead of converting every bar timestamp on the fly, I precomputed a single column of NY-local timestamps when loading the historical data, and dropped the conversion entirely from the hot loop.

# Before (per-iteration conversion, killing perf)
for ts, bar in bars.iterrows():
    local_ts = ts.tz_convert('America/New_York')
    minute = local_ts.hour * 60 + local_ts.minute
    if NY_OPEN <= minute <= NY_CLOSE:
        ...

# After (one-shot conversion at load, then plain int comparison)
bars['ny_minute'] = (
    bars.index
    .tz_convert('America/New_York')
    .map(lambda ts: ts.hour * 60 + ts.minute)
)

NY_MINUTES = bars['ny_minute'].to_numpy()
# In the hot loop:
for i in range(len(bars)):
    if NY_OPEN <= NY_MINUTES[i] <= NY_CLOSE:
        ...

The session-check becomes a single integer comparison against a numpy int. Zero pandas overhead, zero timezone object allocation, zero string lookup. The pre-computation cost is essentially free, it runs once at the start of the simulation in under 200ms for a month of data.

This change alone took the backtest from 27 minutes down to about 4 minutes. A nice 7× speedup, but I was not done.

The fix, part two: bisect over sorted boundaries

The strategy uses session-relative reference points (NY session open, midnight UTC, last hour of trading, etc.). My naive implementation rebuilt these references for every bar by walking back through the data. The right fix is to precompute boundary timestamps as a sorted array and bisect into them.

import bisect

# Precompute once
ny_session_starts = bars[bars['ny_minute'] == NY_OPEN].index.to_list()

# In the hot loop, find the most recent session start
def session_start_for(ts):
    idx = bisect.bisect_right(ny_session_starts, ts) - 1
    return ny_session_starts[idx] if idx >= 0 else None

bisect_right is O(log n) where n is the number of session-starts. For 30 days that is around 22 (US500 trading days). log2(22) is about 4.5 comparisons per lookup. Compare to the original linear walk which averaged 11 comparisons per lookup. The win per call is modest, but the constant factor (bisect is C-level builtin, my original Python loop was interpreter-level) is large.
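A standalone version of the same lookup, for anyone who wants to check the edge cases. It is a sketch using naive datetime objects and three made-up session opens instead of the real pandas index, with a linear walk next to it as a reference.

```python
import bisect
from datetime import datetime, timedelta

# Three made-up session opens at 09:30 on consecutive days (naive datetimes for brevity).
session_starts = [datetime(2024, 1, 2, 9, 30) + timedelta(days=d) for d in range(3)]


def session_start_for(ts, starts=session_starts):
    """Most recent session start at or before ts, or None if ts precedes all of them."""
    idx = bisect.bisect_right(starts, ts) - 1
    return starts[idx] if idx >= 0 else None


def session_start_linear(ts, starts=session_starts):
    """Reference implementation: the naive linear walk."""
    found = None
    for start in starts:
        if start <= ts:
            found = start
    return found
```

The choice of bisect_right over bisect_left matters at the boundary: a bar stamped exactly at the open belongs to the session that is starting, and bisect_left would assign it to the previous one.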

This brought the backtest down to about 45 seconds. 36× total speedup. Still not done.

The fix, part three: numpy-native bar windows

The strategy needs to evaluate features over rolling windows of recent bars (last 5, last 20, last 60). My original code was doing bars.loc[prev_ts:ts] for each window for each bar, which does an index lookup and returns a DataFrame slice. DataFrame slicing has noticeable per-call overhead in pandas.

The fix was to precompute the entire OHLC data as numpy arrays at load time, and then slice them by integer index in the hot loop:

# Precompute
OPENS = bars['open'].to_numpy()
HIGHS = bars['high'].to_numpy()
LOWS = bars['low'].to_numpy()
CLOSES = bars['close'].to_numpy()

# In the hot loop (i is the current bar index)
last_20_highs = HIGHS[max(0, i-20):i]
last_20_lows = LOWS[max(0, i-20):i]

Numpy slicing is O(1) view creation, no copy. Pandas slicing on a DatetimeIndex with the same intent allocates intermediate objects. The difference for a single call is small. Multiplied by 210k bars across multiple window sizes per bar, the difference is dramatic.

This last fix brought the final number to 8.9 seconds. From 27 minutes start to 8.9 seconds end, the total speedup is 182×, or 184× depending on how you round the original measurement.

What this unlocks

A 184× speedup is not just nice to have. It changes what is possible in strategy research. With a 27-minute baseline, exploring a parameter grid of 20 combinations took 9 hours. You think hard before launching the run, you wait until next morning, you batch experiments carefully. With a 9-second baseline, the same 20-combination grid finishes in 3 minutes. You explore freely, you try ideas that would have been too expensive to test before, you actually see the parameter landscape.
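The arithmetic behind these figures is easy to reproduce. The sketch below only restates the timings quoted in this post (the variable names are mine) and works out the speedup after each fix plus the grid-search wall time before and after.

```python
baseline_s = 27 * 60  # first version: 27 minutes

# Wall time after each fix, as quoted above.
stage_times_s = {
    "timezone cache": 4 * 60,  # about 4 minutes
    "bisect lookup": 45,       # about 45 seconds
    "numpy windows": 8.9,
}

speedups = {stage: baseline_s / seconds for stage, seconds in stage_times_s.items()}
for stage, factor in speedups.items():
    print(f"{stage:>15}: {factor:6.1f}x vs baseline")

grid = 20  # parameter combinations
grid_before_h = grid * baseline_s / 3600
grid_after_min = grid * stage_times_s["numpy windows"] / 60
print(f"grid before: {grid_before_h:.1f} h")
print(f"grid after:  {grid_after_min:.1f} min")
```

The last stage works out to roughly 182× against a flat 27-minute baseline, which is where the 182 versus 184 ambiguity comes from.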

For me, the practical consequence has been a faster cycle on the live strategy that runs at tgsignals.com, the production system I run on US500 NY session. Strategy ideas that would have taken a week of backtest babysitting now take an afternoon. That difference compounds.

The general lesson

The bigger pattern here is that Python performance bottlenecks for backtesting almost always live in the same three places: timezone handling, slow lookups inside hot loops, and pandas slicing where numpy slicing would do. None of these are exotic. Any decent Python developer profiling the code would find them. The reason they survive in real codebases is that the first version of a backtest is written to be correct, not fast, and once it is correct nobody bothers to optimize.

Profile your hot loop. Convert timezones once. Bisect into sorted arrays. Use numpy slicing instead of pandas slicing when you can. None of these are hard, and any one of them might give you the 10× that turns “I will run this overnight” into “I will run it now.”

The 184× I got was the lucky combination of all three landing on the same codebase. Your mileage will vary, but most backtest harnesses I have seen have at least one of these wins waiting to be picked up.

aion-indian-market-calendar: Python market calendar for NSE, BSE, MCX, and is-market-open checks in India.

If you build for Indian markets, market timing should not live as hardcoded if/else logic inside bots, cron jobs, or trading scripts.

aion-indian-market-calendar is a Python package for Indian market holidays, NSE trading calendar checks, BSE trading session checks, MCX evening session handling, and simple "is the market open today in India?" validation for developers.

## Install


```bash
pip install aion-indian-market-calendar
```

  ## Import

  ```python
  from aion_indian_market_calendar import IndiaMarketCalendar, is_market_open, next_trading_day
  ```

  ## What problem this solves

  A lot of trading systems start with a few hardcoded dates and standard market hours.

  That usually breaks for Indian markets because:

  - holidays change year to year
  - NSE, BSE, and MCX do not behave the same way
  - partial sessions matter
  - commodity sessions and evening sessions need separate handling
  - execution systems need a timing layer before they touch broker or order logic

  This package exists to make market session validation a reusable infrastructure layer instead of a fragile script-level shortcut.

  ## What developers get here

  This is not a generic exchange-calendar post.

  This package is specifically useful when you need:

  - NSE holidays in Python
  - BSE trading calendar checks
  - MCX trading hours and evening-session handling
  - Indian stock market calendar logic for bots and schedulers
  - "is the market open today in India?" checks before execution
  - next trading day lookup for Indian markets

  For developers, the main difference is practical.

  aion-indian-market-calendar gives you a more direct India-focused execution surface:

  - simple is_market_open() helper
  - next_trading_day() helper
  - session-aware get_session() lookups
  - support for Indian market aliases like NFO and FNO
  - support for instrument-style inputs like NIFTY, BANKNIFTY, and SENSEX
  - bundled offline calendar data
  - optional live event refresh for schedule changes and deltas
  - explicit Asia/Kolkata handling for Indian-market workflows

  ## Example

  ```python
  from aion_indian_market_calendar import is_market_open

  if is_market_open("NSE"):
      print("NSE is open")
  else:
      print("NSE is closed")
  ```

  And if you need more than a yes/no answer:

  ```python
  from datetime import datetime
  from aion_indian_market_calendar import IndiaMarketCalendar

  cal = IndiaMarketCalendar.bundled(2026)
  probe = datetime.fromisoformat("2026-01-27T09:05:00+05:30")

  print(cal.is_market_open(probe, "NSE_EQUITY"))
  print(cal.get_session(probe, "NFO"))
  ```

  ## Why I built this

  I kept seeing the same issue in Indian-market tooling:

  developers were forced to mix strategy logic with exchange-timing logic. I also got tired of hardcoding dates from the calendars published by NSE, BSE, and MCX.

  That is a bad boundary.

  Your strategy should decide what to do.
  Your calendar layer should decide whether the market session is actually valid.

  That separation matters even more in algorithmic trading, quantitative finance, execution schedulers, and pre-trade validation systems.
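One way to keep that boundary explicit in code is a small gate that takes the calendar check as an argument. This is a sketch of the pattern, not part of the package: `run_if_session_valid` and `strategy_step` are hypothetical names, and the calendar check is injected so the gate can be tested without a live calendar.

```python
from typing import Callable


def run_if_session_valid(
    is_open: Callable[[str], bool],
    exchange: str,
    strategy_step: Callable[[], str],
) -> str:
    """Calendar layer decides whether to run; strategy layer decides what to do."""
    if not is_open(exchange):
        return f"skipped: {exchange} session not open"
    return strategy_step()


# With the package installed, the real helper can be passed in directly:
#   from aion_indian_market_calendar import is_market_open
#   run_if_session_valid(is_market_open, "NSE", place_orders)  # place_orders is hypothetical
```

The strategy function never sees a date or a holiday list, which is the whole point of the separation.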

  ## What this is not

  This package is not:

  - a broker API
  - an order-routing tool
  - a data feed
  - a strategy engine
  - a compliance substitute for exchange circulars

  It is a focused India financial market calendar layer for developers.

  ## Download

  - PyPI: pip install aion-indian-market-calendar
  - Package index: https://pypi.org/project/aion-indian-market-calendar/

  If you build for NSE, BSE, MCX, or India-specific trading infrastructure, this package is meant to remove one boring but expensive class of bugs: running execution logic when the session assumptions are wrong.

As one user put it: "This is the kind of infra tooling that quietly saves people from painful production bugs later. Timing logic around Indian markets gets messy fast once you mix NSE, BSE, MCX, holidays, and evening sessions.

Also very true about LLM-generated trading bots defaulting to generic market calendars. AI is great at scaffolding systems, but regional operational details like this are where specialized tooling still matters a lot."

Cheers,
Lokesh (AION ANALYTICS)

Ten Data-Backed Truths Of User Experience ROI

In the high-stakes economy of today, the cost of a friction-heavy interface is no longer just “lost clicks”, but potentially millions in wasted engineering spend and lost business value. As a veteran UX designer who has helped build digital products since the early mobile-first era, I’ve watched business leaders shift from viewing design as a “cosmetic preference” to recognising that user experience is actually the primary engine of business survival.

A UX design role is as much about research and analytics as it is about pixels, and I believe that hard data is the only tool powerful enough to bridge the gap between design and the boardroom. Facts don’t just advocate for the user; they prove that UX is a non-negotiable requirement for a healthy bottom line. Even in the rooms where decisions are made, UX is frequently undervalued as a ‘visual’ role. I’ve learned that the most effective way to dismantle this myth is through data.

The following ten facts represent the current reality of the digital world. These are not just “design tips”; they are the clinical, data-backed pillars for financial growth in a saturated market. Some of these facts are also commonly used by designers as best practices.

For example, I once led a B2C mobile design project, where I was able to strip 1.2 seconds off the mobile load time by reducing and removing some of the visual assets. The result was an immediate 12% lift in completed transactions, proving that in UX, every tenth of a second is a direct lever for revenue.

1. Fixing Issues In The Design Phase Is 100 Times Cheaper

One of the most compelling financial arguments for UX is the 1:100 rule. Modern studies, such as those from the IBM Systems Sciences Institute and Sugue Technologies, show that fixing an error after a product has been developed and launched can be up to 100 times more expensive than fixing it during the initial design and prototyping phase.

Think of UX as “engineering insurance.” By the time a developer touches the code, every interaction should have been validated. If you discover a fundamental navigation flaw after launch, you aren’t just paying for the fix; you’re paying for technical debt, lost developer time, and the revenue lost while users struggle with a broken flow.

2. Performance Impacts User Experience

In the current landscape, performance is the essential foundation of user experience. A beautiful interface is worthless if the user bounces before it renders. The data is uncompromising: 47% of users expect a page to load in two seconds or less, and missing this window is a financial catastrophe. A mere one-second delay can reduce conversions by 20% and satisfaction by 16%, while retail businesses lose an estimated $2.6 billion annually to slow load times. When mobile load time moves from one to three seconds, the bounce rate spikes by 32%, and by the third second, conversion rates typically plummet from 40% to 29%.

However, this volatility offers a massive lever for growth. Even a microscopic 0.1-second improvement can lift retail conversions by 8.4%, and travel site conversions by 10.1%. Improving your Largest Contentful Paint (LCP) by 31% — a benchmark 67% of websites achieved as of June 2025 — can drive a direct 8% increase in sales. As a long-time designer, I treat speed as a primary design element.

If the site isn’t instantaneous, the design hasn’t just failed — it effectively doesn’t exist.

3. Your Site Has 50 Milliseconds to Impress Your Customers

First impressions are both visceral and aesthetic. Research indicates that users form an opinion about a website’s visual appeal in approximately 50 milliseconds (0.05 seconds). That’s not a lot of time! This split-second “gut-feeling” is a survival mechanism that dictates whether a user stays to explore your value proposition or bounces immediately.

In the current market, 94% of first impressions are strictly design related. If your interface feels “off” or dated, users subconsciously project that lack of quality onto your entire product or service. Your content effectively doesn’t exist if your design hasn’t earned the five seconds of attention required to read it.

4. Hick’s Law: The Cost of Overwhelm

Stakeholders often think “more options” equals “more value.” Psychology proves the opposite. Hick’s Law states that the time it takes to make a decision increases with the number of options available.
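In its usual form, Hick's Law models decision time as T = a + b · log2(n + 1), where n is the number of equally likely options. The sketch below uses made-up constants for a and b (real values have to be measured for a given interface), so only the shape of the curve is meaningful.

```python
import math


def hick_decision_time(n_choices, a=0.2, b=0.15):
    """Hick's Law: T = a + b * log2(n + 1). a and b are made-up constants in seconds."""
    return a + b * math.log2(n_choices + 1)


# Compare a few menu sizes; the absolute numbers are illustrative only.
for n in (3, 7, 15, 31):
    print(f"{n:>2} options -> {hick_decision_time(n):.2f} s")
```

The growth is logarithmic, so each doubling of options adds a roughly constant delay. That is why trimming a long menu helps, and why trimming an already short one helps less.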

Every extra menu item or form field is a “tax” on the user’s brain. As noted by Landbase, top-performing sites now achieve conversion rates exceeding 11%, while average performers struggle below 3%. Those performing well have applied personalization and optimization strategies to simplify the experience.

If you want to increase your revenue by tomorrow, find one field to delete from your checkout flow today.

5. White Space Improves Comprehension

“White space” is often viewed as wasted real estate by non-designers. In reality, it is a tool for focus. Strategic use of white space can increase a user’s content comprehension by up to 20%.

White space prevents “cognitive load” from peaking. By giving the user’s eyes a place to rest, you guide them toward the most important elements, usually your “Buy” or “Sign Up” button. In 2026, as attention spans have dropped to roughly 8 seconds, simplicity is the ultimate luxury and a major driver of engagement.

For example, in a fintech dashboard I worked on, analyst users were feeling overwhelmed by a ‘data dump’ layout in some of the dashboard components. I applied more white space around the data to lower their cognitive load. Simply giving the data room to breathe led to a 25% decrease in time-on-task and a significant boost in trial-to-paid conversions.

6. The Power Of “Fake” Progress

One of the most surprising psychological hacks in UX is that users will complete a task faster if they believe they have already made progress. This is known as the Goal Gradient Effect.

In a classic study, researchers found that a 10-stamp coffee card with two stamps already “pre-filled” was completed significantly faster than an 8-stamp card with zero pre-fills, even though the total spend required was identical. In digital design, showing a progress bar that starts at 15% (simply for creating an account) increases completion rates for onboarding by over 40%. We aren’t just designing screens — we are managing the user’s dopamine and sense of momentum.

7. Make Your Content Readable

Many stakeholders believe that cramming more text “above the fold” increases value. Data proves the opposite. Proper typography, specifically line spacing (leading) and paragraph width, can increase content comprehension and reading speed by up to 20%.

Optimal line height (generally 1.5x the font size) reduces “visual noise,” allowing the brain to process information with less cognitive effort. When users struggle to read your text due to tight spacing or small fonts, their “perceived effort” increases, leading to a higher bounce rate. Legibility is a conversion tool: if it’s hard to read, it’s hard to buy.

Legibility depends on several factors at once. For example, line spacing (leading) that is too tight or a font weight that is too heavy will each hurt readability.

8. Your Users Only Read 20% of Your Content

This truth meshes well with the previous one. Users do not read your website; they scan it. On a typical web page, users read only about 20% to 28% of the text.

Because modern users scan in an F-pattern or Spotted pattern, designing for reading is a tactical error. We must design for scanning.

This requires the following:

  • Bold headers that narrate the value proposition.
  • Bullet points for key benefits.
  • White space to connect users to key information (discussed in the previous truth).
  • High-contrast call-to-action (CTA) buttons.

If your core message is buried in a paragraph, it is invisible to nearly 80% of your audience.

9. Why User Testing With 5 People Is the Magic Number

I have heard of companies that waste six-figure budgets on massive user studies with 100 people, only to get buried in noise. The reality is that testing with just 5 users typically uncovers 85% of usability problems.

This is a mathematical sweet spot. After the fifth user, you reach the point of diminishing returns — you spend more money to find fewer new bugs. The competitive advantage belongs to small and frequent user testing activities. Test with 5 people, iterate, and test with 5 more. It is the most cost-effective way to build a bulletproof product.
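The 85% figure comes from a simple model usually credited to Nielsen and Landauer: if a single test user uncovers a share L of the problems, n users are expected to uncover 1 - (1 - L)^n of them. The sketch below uses their commonly quoted average of L = 31%; your own product may have a different rate, so treat the default as an assumption.

```python
def share_of_problems_found(n_users, per_user_rate=0.31):
    """Expected share of usability problems found by n users: 1 - (1 - L)^n."""
    return 1 - (1 - per_user_rate) ** n_users


# How the expected coverage grows as more users are added to one round of testing.
for n in (1, 3, 5, 10, 15):
    print(f"{n:>2} users -> {share_of_problems_found(n):.0%}")
```

Five users lands at about 84%, and each additional user after that adds less than the one before, which is the diminishing-returns curve in numbers.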

Personally, I have followed this guideline many times during user testing activities, and I can confidently say that testing with 5 people does surface the majority of issues in your design.

10. The Financial ROI of 9,900%

Last, but definitely not least, the most staggering statistic in our industry remains consistent. On average, every $1 invested in UX returns $100. This 9,900% ROI isn’t magic, but the sum of increased conversion and reduced support.
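The reason a $100 return on $1 is quoted as 9,900% and not 10,000% is that ROI counts only the net gain. A quick sketch of the calculation, with a helper name of my own choosing:

```python
def roi_percent(invested, returned):
    """ROI as a percentage: net gain divided by the amount invested."""
    return (returned - invested) / invested * 100


ux_roi = roi_percent(invested=1, returned=100)
print(f"ROI: {ux_roi:,.0f}%")
```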

A fully optimised UX design can improve conversion rates by up to 400%. Furthermore, intuitive design significantly lowers customer support requirements. When a product is self-explanatory, you don’t need a massive call centre to explain how to use it.

The Depth of UX Investment

Beyond these individual statistics, we must address the cumulative effect of a mature UX practice. In my years of practising, the most successful firms are those that treat UX as a continuous improvement loop rather than a one-off project. The data shows that companies with high design maturity see 32% higher revenue growth and 56% higher total returns to shareholders compared to their less design-focused peers.

This discrepancy exists because mature UX organisations move beyond “user delight” and into “user efficiency.” When you shave 30 seconds off a workflow for a team of 1,000 employees, you aren’t just making them happier; you are reclaiming hundreds of thousands of dollars in annual productivity. This internal ROI is often overlooked, but it is just as vital as consumer-facing conversion rates.
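It is worth running the numbers on this for your own team. The sketch below is a back-of-envelope model. Every input other than the 30 seconds and the 1,000 employees is an assumption I have picked for illustration: how many times a day the workflow runs, the number of working days, and the loaded hourly cost.

```python
def annual_savings(seconds_saved, employees, runs_per_day, workdays, hourly_cost):
    """Yearly value of time saved, in the same currency as hourly_cost."""
    hours = seconds_saved * employees * runs_per_day * workdays / 3600
    return hours * hourly_cost


# Assumed inputs: 4 runs per day, 230 working days, 40 per hour loaded cost.
frequent = annual_savings(30, 1000, runs_per_day=4, workdays=230, hourly_cost=40)
# Same workflow used only once a day.
rare = annual_savings(30, 1000, runs_per_day=1, workdays=230, hourly_cost=40)
print(f"4 runs/day: {frequent:,.0f} per year")
print(f"1 run/day:  {rare:,.0f} per year")
```

Under these assumptions the figure only reaches the hundreds of thousands when the workflow is used several times a day, so frequency of use is the first thing to measure.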

Furthermore, the “experience gap” is real. 80% of companies believe they deliver a “superior experience,” but only 8% of customers agree. This massive disconnect represents a significant market opportunity for those willing to look at the hard data. By bridging this gap through continuous user testing and performance optimisation, you aren’t just improving a product; you are capturing market share that your competitors are leaving on the table.

The Impact of AI

Today, we cannot talk about UX without talking about AI. AI hasn’t replaced these 10 facts, but it has accelerated the solutions to some of them.

  • Agentic UX
    60% of designers are now building “AI agents” that take actions on behalf of the user, drastically reducing the impact of Hick’s Law by narrowing down choices before the user even sees them.
  • Real-Time Personalisation
    32% of teams use AI to personalise interfaces in real time, catering to F-Pattern scanning habits by moving the most relevant content to exactly where that specific user’s eyes are likely to land.
  • Automated ROI
    93% of designers are using generative AI tools to prototype faster, which brings the 1:100 Cost Ratio even lower by allowing us to find and fix errors before a single line of production code is written.

AI has turned UX from a static map into a living, breathing guide for users. But the fundamental rules of human psychology, such as our 50ms judgments and our need for white space, remain unchanged.

Conclusion

In summary, here is a list of the key truths to remember:

  1. Fixing issues in the design phase is 100 times cheaper.
  2. Performance impacts user experience.
  3. Your site has 50 milliseconds to impress your customers.
  4. Hick’s Law: The cost of overwhelm.
  5. White space improves comprehension.
  6. The power of “fake” progress.
  7. Make your content readable.
  8. Your users only read 20% of your content.
  9. Why user testing with 5 people is the magic number.
  10. The financial ROI of 9,900%.

As we move deeper into the late 2020s, the line between “design” and “business strategy” has vanished. The data is in, and companies that lead in design outperform their competitors by 1.7x in revenue growth.

UX design is no longer a team you hire to “make things look nice.” It is the research-driven, data-backed discipline that ensures your digital product isn’t just a cost centre, but a revenue-generating machine.

In fact, this has always been the case, but I hope that presenting these cold, hard truths helps make it a reality for your business.

As I have found over the years, implementing factual design improvements does make a difference that intuition alone can’t replicate. We are past the era of subjective opinions. The data is clear, the psychology is proven, and the ROI is undeniable. The only question left is whether you’re ready to let the facts lead your design, or if you’ll let your competitors do it first.

Pyrefly LSP Integration with Type Engine in PyCharm 2026.1.2

In PyCharm 2026.1.2, you can enable Pyrefly as an external type provider, dramatically increasing the speed of the IDE’s code insight features.

What is the Pyrefly LSP?

“LSP” stands for the Language Server Protocol – a standardized protocol that allows code editors and IDEs to communicate with language servers. The LSP enables language servers to provide code intelligence features, such as:

  • Code completion
  • Information on hover (for example, quick documentation)
  • Go to definition and other actions
  • Error checking and type-related diagnostics

The key benefit of the LSP is that it allows a single language server to be used across multiple tools. This means that language-specific intelligence does not have to be implemented separately in every editor, IDE, or CI pipeline.
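
To give a feel for what travels over the protocol, the sketch below builds a single hover request the way an editor might send it: a JSON-RPC 2.0 payload preceded by a Content-Length header. The file URI, the cursor position, and the helper name are made up for illustration, and this is not how PyCharm talks to Pyrefly internally.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Serialize a JSON-RPC payload and prepend the LSP Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# A hover request: "what do you know about the symbol at this position?"
hover_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/hover",
    "params": {
        "textDocument": {"uri": "file:///project/app.py"},  # hypothetical file
        "position": {"line": 10, "character": 4},  # zero-based line and column
    },
}

message = frame_lsp_message(hover_request)
print(message.decode("utf-8"))
```

Because every client sends the same message shape, a language server only has to implement the request once to serve any editor that speaks the protocol.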

Pyrefly is Meta’s next-generation Python type checker, engineered from the ground up in Rust to replace its predecessor, Pyre (written in OCaml). With the move to Rust, Pyrefly achieves significantly faster performance and improved cross-platform portability. More than just a rewrite, it is designed to be more capable and robust, offering an efficient toolset for maintaining large-scale Python codebases with high precision and minimal overhead.

Pyrefly provides the following benefits:

  • Higher performance and efficiency – Thanks to its Rust-based architecture, Pyrefly achieves significantly faster speeds and improves cross-platform portability. 
  • Enhanced code intelligence – As an external type provider, Pyrefly powers essential code insight features in the IDE, including type inference, type-related diagnostics, quick documentation, and inlay hints.
  • Scalability – Pyrefly is designed to handle large-scale Python codebases with high precision and minimal overhead.

Pyrefly is highly beneficial for projects and developers dealing with large, complex Python codebases that prioritize performance and robust typing. Integrating Pyrefly via the LSP is part of our ongoing work to enhance code insight performance in PyCharm.

Using Pyrefly in PyCharm

Once enabled, Pyrefly powers all code insight functionality in PyCharm, including type inference and type-related diagnostics, quick documentation, and inlay hints. Delegating analysis to this faster engine delivers significantly improved performance.

To start using Pyrefly in your PyCharm project, go to the Type widget at the bottom of the window. By default, the IDE uses the built-in type engine. Click on the widget and select the option to use Pyrefly. If you do not have Pyrefly installed yet, PyCharm will install it automatically. 

Once you’ve switched to the Pyrefly type engine, you will see a Pyrefly icon at the bottom, which you can hover over to check the version being used.

Please note that the integration currently works for local interpreter configurations. Support for Docker, Docker Compose, WSL, SSH, and multi-module projects is planned for future releases.

Pyrefly vs. the built-in type engine

Now let’s look at how Pyrefly and the built-in type engine behave in a complex Python project. In our FastAPI example, multiple files are typed, but in one file the variable ref is incorrectly typed, causing four errors. When using the built-in type engine, the IDE identifies that something is wrong, but it suggests running further analysis to fix the problem, which requires an extra step.

Using Pyrefly as the type engine, the IDE reports errors immediately and highlights where they originate. However, it is worth noting that, in our example, there are four errors, but Pyrefly picks up only three of them. It misses the one in self._storage[ref].
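
For readers without the project at hand, here is a small, hypothetical sketch of the kind of mistake being discussed. It is not the code from the example above, and the class and method names are invented. It only shows why a wrongly annotated ref matters: Python ignores annotations at runtime, so the bug stays silent unless a type engine reports it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    name: str


class ItemStore:
    def __init__(self) -> None:
        # Keys are declared as integers.
        self._storage: dict[int, Item] = {}

    def add(self, ref: int, item: Item) -> None:
        self._storage[ref] = item

    def find(self, ref: int) -> Optional[Item]:
        return self._storage.get(ref)


store = ItemStore()
store.add(1, Item("first"))

# `ref` is annotated as str, but the store expects int keys. Python does not
# enforce annotations at runtime, so this runs and silently finds nothing.
# A static type checker is what should catch the mismatch.
ref: str = "1"
missing = store.find(ref)
print(missing)  # None
```

Which of these mismatches a given type engine reports, and where it places the highlight, depends on the engine. That difference is what the comparison above is about.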

Download the latest version of PyCharm and try it out

Ready to experience a dramatic leap in Python development performance? The Pyrefly type engine in PyCharm 2026.1.2 delivers the next generation of type checking. Engineered in Rust for unparalleled speed, it resolves files in as little as 0.5–1 seconds, significantly faster than the built-in engine. If you maintain large, complex Python codebases and prioritize robust typing, this feature is essential, as it allows you to delegate analysis to a faster engine and receive immediate type-related diagnostics. Download the latest version of PyCharm (2026.1.2) to unlock superior efficiency, scalability, and code insight.

Help Shape the Future of Kotlin in the Age of AI

AI is rapidly changing the way developers write, review, learn, and maintain code. Code completion, AI chat assistants, autonomous coding agents, and other tools are giving rise to new workflows almost every month.

But one important question remains:

How well do these tools actually work with Kotlin?

We want to better understand how Kotlin developers use AI today, what works well, where the friction points are, and what opportunities lie ahead. That’s why we’re launching a short community survey focused on AI-assisted development with Kotlin.

Share your perspective

AI is already reshaping software development. By participating, you can help ensure that the Kotlin community’s real experiences, expectations, and needs are part of that conversation.

Complete the survey to have the chance to win a prize of your choice:

  • USD 50 Amazon Gift Card
  • Six-month JetBrains All Products Pack subscription

We’re looking forward to hearing your thoughts.

Taking part in KotlinConf 2026?

If you’re attending KotlinConf 2026, make sure to visit the registration counter after completing the survey to chat with our team and receive a small thank-you gift.

A New Default Project Structure for Kotlin Multiplatform

We are updating the default project structure for Kotlin Multiplatform projects to give modules clearer responsibilities, better align with conventions used by other build systems and frameworks, and reflect the changes in Android Gradle Plugin 9.0.

You’ll see this project structure in newly created projects generated by our tools, in the official documentation, and in samples for Kotlin Multiplatform.

These changes are already live in the KMP wizard, both in your IDE (with the Kotlin Multiplatform plugin installed) and on kmp.jetbrains.com. We’re also working on updating our sample projects and other learning materials to match this new structure. You can already check out kotlinconf-app, KMP-App-Template, or RSS Reader as a reference.

To get support for AGP 9.0 in IntelliJ IDEA, update to 2026.1.2 or newer, and use the latest version of the Android plugin.

This post explains the changes that we’re making, why we’re changing the structure, and how you can update existing projects.

What’s changing

With our previous structure, most projects had a single composeApp Gradle module that contained a Kotlin Multiplatform library and also acted as an application for one or more platforms, containing their entry points and other related configuration.

In the Project view, this showed up as a single composeApp module next to the iosApp folder.

Our new default structure has a shared module with a single, clear responsibility: It’s a Kotlin Multiplatform library containing the shared code. Then, for each platform where you want to build a runnable application on top of the shared library code, you’ll have separate application modules such as androidApp, desktopApp, and webApp.

In the Project view, the new structure shows the shared module next to the separate application modules.

Why we’re making changes

The composeApp module in the old structure had several different responsibilities. As a result, it contained a lot of configuration, including platform-specific packaging details for all platforms. This could make it difficult to tell which parts were setting up a Kotlin Multiplatform library and which parts were setting up the applications themselves.

If you chose not to share UI on a client platform (for example, to use SwiftUI for your iOS application), the old structure included an additional shared module besides composeApp. This was a significant change to the module structure, but it only happened in certain configurations.

There was also asymmetry when it came to iOS apps. Because they require an Xcode project that consumes the shared code, iOS applications were already in a separate iosApp folder, while the rest of the applications built on the shared code were all co-located in composeApp.

Android Gradle Plugin 9.0 requires the entry point of the Android application to be in a separate module from the shared code, as it no longer supports applying the Android application Gradle plugin in a multiplatform module.

Finally, we previously had a different structure for Gradle-based and Amper-based projects. While Gradle supports multiple applications configured in a single module, Amper allows only one product per module, so Amper-based projects already used separate modules for each application.

Goals of the new structure

Based on the points above, we created the new structure with these goals in mind:

  • Providing an initial setup for projects where each module has a clear responsibility and single purpose. It should always be clear where a given piece of code or build configuration should be placed in the project.
  • Keeping the structure as consistent as possible across the different configurations that the wizard allows: different sets of target platforms, having a server application or not, and choosing native or shared UI for clients.
  • Making it easy to modularize the project further, to go from a single multiplatform module to several as desired.

Adapting to other configurations

The examples above show the new project structure for a Compose Multiplatform application that shares its UI across Android, iOS, desktop, and web platforms. In other configurations, the structure will adapt as required, with minimal changes.

Configurations with native UI

Kotlin Multiplatform supports using native UI on top of shared Kotlin code. For example, you can choose to use SwiftUI for your iOS app while using Compose Multiplatform for other platforms. In this case, you’ll write shared business logic code that’s used by all platforms, and shared UI code that’s only used by certain platforms.

In this configuration, the new structure will have two shared modules instead of one: sharedLogic and sharedUI. While sharedLogic is consumed by all applications and doesn’t have Compose dependencies, sharedUI is only consumed by those that use Compose Multiplatform for their UI.

It’s still easy to decide which module to write your shared code in: If all your platforms will use it, including those with native UI implementations, it should go in sharedLogic. If only platforms using Compose Multiplatform need that code, it should go in sharedUI.

Configurations with a server included

For projects that also target server-side Kotlin, the new structure adds a server module and moves all client-side modules into a nested app folder. An additional core module in the project root lets you share code between server-side and client-side code, such as models and validation logic.

Updating existing projects

While we’re changing the default structure for newly created projects, existing projects aren’t required to adopt the same exact structure. If you want to migrate an existing project to match this new default structure, you can use the migration guide, which shows you how to introduce new modules for each entry point.

Note that the changes related to Android Gradle Plugin 9.0, however, are mandatory for all existing multiplatform projects that target Android. You can learn more about these changes and how to update your projects in this blog post.

Get started with KMP today

To create a new project with the updated structure, go to kmp.new or use the Kotlin Multiplatform wizard in your IDE (available in both IntelliJ IDEA and Android Studio with the Kotlin Multiplatform plugin installed).

If you’re looking for examples of the new structure in action, take a look at kotlinconf-app, KMP-App-Template, or RSS Reader.

IntelliJ IDEA 2026.1.2 Is Out!

IntelliJ IDEA 2026.1.2 has arrived with several valuable fixes.

You can update to this version from inside the IDE, using the Toolbox App, or using snaps if you are an Ubuntu user. You can also download it from our website.

Here are the most notable updates included in this version:

  • Projects can now be opened correctly via .ipr files generated by the Gradle idea task. [IJPL-242321]
  • The indentation for Java ternary expressions with chained method calls has been fixed. [IDEA-387867] 
  • Pressing the Alt+Enter key combination on Windows no longer opens the context menu unexpectedly. [IJPL-47743] 
  • Live templates with groovyScript now work as expected again. [IJPL-241581] 
  • Dragging and dropping selected code with the mouse onto its original position no longer causes the code to disappear. [IJPL-235895] 
  • It is once again possible to open a diff in an external tool by double-clicking a file in the Commit tool window. [IJPL-241256] 
  • The MCP Server no longer reports illegal character errors for projects with spaces in their paths. [IJPL-241803]
  • Workspaces once again function as expected. [IDEA-388445] 
  • Several IDE freezes have been resolved. [IJPL-235455] [IJPL-224542] [IJPL-203153]

To find out more details about the issues resolved, please refer to the release notes.

If you encounter any bugs, please report them to our issue tracker.

Happy developing!

Compose Multiplatform 1.11.0 Is Now Available

A new release of Compose Multiplatform has landed, with improvements to the iOS and web experience and a refreshed approach to UI testing. Read on for the highlights, or for the complete list of changes, check out the What’s New.

Native text input on iOS

If you’ve wanted text fields in your Compose iOS app to feel a little more native, this one’s for you. Compose Multiplatform 1.11.0 introduces an experimental native text input implementation built on top of UIView.

This makes caret movement more precise, offers native gestures and selection handles, and provides the familiar system context menu – including Autofill, Translate, and Search. The existing text input remains the stable, cross-platform choice, but if you want the most native feel on iOS, you can now opt in.

Another iOS improvement: Concurrent rendering, introduced as an opt-in feature in version 1.8.0, is now enabled by default. Rendering tasks are now offloaded to a dedicated render thread out of the box, so your apps get the performance benefits without any extra configuration.

Compose UI testing, v2

Testing on non-Android targets gets an upgrade with support for the v2 ComposeUiTest APIs. The default dispatcher is now StandardTestDispatcher, so coroutines run in the order in which they’re queued. This makes tests more predictable and brings them closer to production behavior.

The v2 APIs also accept an effectContext parameter for passing a custom coroutine context into your compositions, which is useful for things like overriding the motion duration scale or supplying your own test dispatcher.

The previous APIs (runComposeUiTest, runSkikoComposeUiTest, and runDesktopComposeUiTest) are now deprecated in favor of their v2 counterparts.

Smoother scrolling on web targets

Scrolling performance on Compose web has been trailing that of native targets for a while. With 1.11.0, touch processing has been substantially reworked, and scrolling in Compose web apps now feels much closer to what you get on other platforms.

You can see it in action in the latest web version of the KotlinConf App. For all the gritty details, as well as demos and the list of fixes, head over to CMP-9727.


That’s the overview of 1.11.0. Update your dependencies, try out the new APIs, and let us know what you think. For everything that didn’t make it into this post, check out the full release notes or What’s New.

The SpaceX-Anthropic Deal Shows AI Is Becoming a Fight Over GPUs and Power

Note: I originally wrote this post in Korean on May 7, 2026. This is a lightly edited English version for dev.to.

TL;DR

SpaceX and Anthropic have signed a large-scale compute infrastructure deal.

By gaining access to SpaceX’s computing capacity, Anthropic can raise usage limits for Claude Code and the Claude API. This is not just a routine product update. It shows a broader shift in AI competition: from model performance alone to GPU access, power capacity, and the ability to run AI systems reliably at scale.

1. A Usage Limit Announcement With an Unusual Backstory

In the early hours of May 7, 2026, I came across a short announcement about Claude.

The summary was simple: Claude’s usage limits were going up.

But what caught my attention was not just the limit increase. It was the reason behind it.

Anthropic had announced a new compute partnership with SpaceX.

Anthropic’s official announcement explained that the company had raised Claude’s usage limits and agreed to a new compute deal with SpaceX to substantially increase capacity in the near term.

According to the announcement, Claude Code’s 5-hour usage limit would double for Pro, Max, Team, and seat-based Enterprise plans. Peak-hour limit reductions for Pro and Max accounts would be removed. API rate limits for Claude Opus would also increase significantly.

My first reaction was simple:

Why is SpaceX showing up in a Claude announcement?

On the surface, this looks like a normal capacity upgrade notice. Claude Code gets higher limits. Claude API gets better rate limits. Users get more room to work.

But underneath that announcement is something much bigger: a large-scale infrastructure deal that gives Anthropic access to SpaceX’s compute capacity.

This is not really a product collaboration. SpaceX is not suddenly building Claude features. Anthropic is not launching rockets.

It is a compute partnership.

And that distinction matters.

Because it shows that AI competition is no longer just about who has the best model. It is also about who can secure enough GPUs, power, and data center capacity to actually run that model for millions of users.

2. What Actually Changes for Users

The practical impact is pretty clear.

According to Anthropic’s May 6 announcement, Claude Code’s 5-hour usage limit doubles for Pro, Max, Team, and seat-based Enterprise plans.

For Pro and Max users, the peak-hour reductions also disappear. If you have ever felt like your Claude usage limit drained suspiciously fast during busy hours, this is the kind of change you would actually notice.

The Claude Opus API also gets a significant rate limit increase.

In other words, this is not just “we bought more servers.”

For people who use Claude Code every day, or developers who rely on the Opus API, these are immediate quality-of-life improvements.

There is one caveat: the announcement does not directly say that free-tier limits are increasing.

So free users may not see a dramatic change right away. But infrastructure expansions like this can still matter over time. More compute capacity can improve service stability, reduce pressure during peak hours, and make future limit increases more realistic.

Whether free-tier users will eventually benefit directly remains unclear.

3. Why Claude Needed More Compute

This announcement makes one thing very clear:

Anthropic’s challenge was not only building a smarter model. It was also running that model at scale.

That sounds obvious, but it becomes much more important when you look at Claude Code.

Claude Code is not just a simple autocomplete tool that suggests one or two lines of code. It can read a codebase, understand multiple files, edit code, follow instructions, and assist with longer development workflows.

That kind of tool needs much more context and much more compute than a short chatbot conversation.

When you use AI tools seriously, this becomes very visible.

Model quality matters, of course. But usability matters too.

A model is not very helpful if:

  • the usage cap is too tight,
  • peak-hour limits interrupt your workflow,
  • long tasks get cut off halfway through,
  • or API rate limits make the system hard to rely on.

For a coding tool like Claude Code, this friction adds up quickly.

Developers do not just need a smart model. They need a model that stays available long enough to finish the task.

That is why this deal feels important. It looks like Anthropic’s direct answer to one of the biggest bottlenecks in AI products today: compute.

4. The Unexpected Partner: SpaceX

The most interesting part of this story is the partner.

SpaceX is not the first company people usually associate with Claude.

Anthropic and Elon Musk have not exactly had a simple public relationship. Musk had previously criticized Anthropic, including comments about the company’s values and direction. CNBC covered some of those remarks in its reporting on the deal.

Then, around the time the deal was announced, Musk said he had spent time with senior Anthropic team members and came away deeply impressed.

And now SpaceX’s computing infrastructure is helping power Claude.

Several outlets, including Business Insider, covered the partnership as an unexpected pairing.

What makes this interesting is not just the drama.

It is what the situation reveals.

No matter how intense the public criticism or competition gets in AI, large-scale AI services still need compute.

Philosophy does not run inference.

GPUs do.

According to reporting, Anthropic is gaining access to SpaceX’s Colossus 1 compute capacity, including more than 300 megawatts of power and over 220,000 NVIDIA GPUs. That additional capacity is expected to support Claude availability and usage improvements.

This also changes how we think about SpaceX.

Most people think of SpaceX as a rocket and satellite company. But in this context, SpaceX is also becoming a compute infrastructure provider for AI companies.

That is a huge shift.

AI may look like software on the surface. We interact with it through chat windows, APIs, code editors, and web apps.

But behind those interfaces is a very physical industry:

  • GPUs
  • power
  • cooling
  • land
  • data centers
  • network infrastructure

Every Claude Code session, every API request, and every long-context coding task depends on that physical infrastructure.

The SpaceX-Anthropic deal makes that reality hard to ignore.

5. Cursor Went the Same Route

This is not only a Claude story.

In April 2026, Cursor also announced a model training partnership with SpaceX.

In its blog post, Cursor explained that compute had become a bottleneck for its model training ambitions. By partnering with SpaceX and using xAI’s Colossus infrastructure, Cursor said it could scale up its model intelligence more aggressively.

When you put the Claude and Cursor cases together, a pattern becomes clear.

AI coding tools are no longer small side utilities.

They are becoming deeply embedded in how developers work.

That means they need:

  • stronger models,
  • longer context windows,
  • more inference capacity,
  • more training capacity,
  • and more stable usage quotas.

A few years ago, the main question was:

Who has the better model?

Now the question is becoming:

Who can actually run the better model at scale?

That second question is becoming just as important as the first one.

6. The Further-Out Story: Orbital AI Infrastructure

There is one part of this announcement that sounds almost like science fiction.

Anthropic also mentioned interest in developing gigawatt-scale orbital AI computing capacity with SpaceX.

In simpler terms, this means that long-term discussions may even include AI compute infrastructure in space.

To be clear, this is not the same as saying that SpaceX and Anthropic are definitely building orbital data centers right now.

It sounds more like an open door than a confirmed construction plan.

But the idea is not completely random either.

AI infrastructure is becoming increasingly tied to physical constraints:

  • power supply,
  • cooling,
  • land availability,
  • local regulation,
  • grid capacity,
  • and data center expansion.

As models grow larger and AI tools become more widely used, the bottlenecks are not only algorithmic.

They are physical.

More intelligence requires more compute. More compute requires more chips. More chips require more power and cooling.

So even if orbital AI data centers still sound distant, the direction makes sense.

AI competition is no longer confined to what happens on a screen.

It is moving into energy systems, physical infrastructure, and maybe eventually even beyond Earth.

Closing: A Good AI Has to Be Usable

Reading this news, I kept coming back to one thought:

The center of gravity in AI competition is shifting.

At first, the conversation was mostly about model quality.

Which model writes better?
Which model codes better?
Which model reasons better?
Which model feels more creative?

Those things still matter.

But from a user’s perspective, performance alone is not enough.

A good AI model has to be usable.

It has to be available when you need it. It has to last through long tasks. It should not stop halfway through a coding session because a limit was hit. For developers using an API, rate limits and usage caps need to be predictable.

The SpaceX-Anthropic deal is a concrete example of that reality.

The next phase of AI competition is not only about building better models.

It is also about securing the infrastructure needed to run those models.

That is why this story does not end at “Anthropic signed a deal with SpaceX.”

AI is becoming a massive physical industry.

Every time we ask Claude to work on a codebase, ask ChatGPT to summarize a document, or ask Gemini to analyze a spreadsheet, enormous computational resources are moving in the background.

What it takes to build great AI is no longer just algorithms.

It is GPUs, power, data centers, and maybe, eventually, orbit.