Author Archives: DevegygiebyOL


How I Built a Treasure-Run Game Where Australia Saves the Sun

This is a submission for the June Solstice Game Jam

What I Built

For this game jam, I built Dawn Dashers, an adventure runner set during the June Solstice, the longest night of the year in Australia.

The idea started with a simple question:

What if the longest night didn’t end?

In Dawn Dashers, a rogue machine called the Turing Engine has stolen the sunrise and scattered seven Sun Fragments across Australia. Players race through deserts, bushland trails, coastal cliffs, and hidden ruins to recover the fragments and bring daylight back.

The game takes inspiration from the June solstice, but with an Australian treasure-hunting twist. Along the way, players dodge obstacles, collect relics, unlock animal abilities, and solve quick logic puzzles inspired by Alan Turing.

One thing I wanted to avoid was making the solstice just part of the story. Instead, it’s part of the gameplay itself. At the start, the world is trapped in darkness. As players recover Sun Fragments, the environment gradually becomes brighter, warmer, and more alive until the final sunrise returns.

The goal was simple:

Build something fun first, while using the June Solstice as the heart of the adventure.

Video Demo

You can also try all of the following directly in the deployed game (hosted on Vercel):

  • Running through the Australian Outback
  • Discovering treasure clues
  • Solving Turing-inspired logic challenges
  • Collecting Sun Fragments
  • Unlocking animal abilities
  • Watching the world transform from the longest night back into daylight

dawn-dashers.vercel.app

Code

GitHub Repository

Built with:

  • Unity 6
  • C#
  • Universal Render Pipeline (URP)
  • WebGL
  • GitHub Copilot

How I Built It

I started by focusing on the core gameplay loop:

Run → Dodge → Collect → Discover → Restore Light

Once movement felt fun, I started layering in everything else.

Making the Solstice Matter

The biggest design decision was making the June Solstice an actual game mechanic rather than just background lore.

The game begins during the longest night of the year. Every Sun Fragment recovered pushes the world closer to dawn.

As players progress:

  • Darkness slowly recedes
  • New areas become visible
  • Lighting becomes warmer
  • The world feels more hopeful

By the end of the game, the player has literally restored the sunrise.
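The fragment-to-daylight progression can be sketched as a simple interpolation from a night tint to a dawn tint. This is a minimal, language-neutral illustration in Python, not code from the game (which is written in C# for Unity); the night and dawn colours are hypothetical placeholders.

```python
# Minimal sketch: map recovered Sun Fragments to a night-to-dawn lighting blend.
# The colours are hypothetical placeholders, not values from Dawn Dashers.
TOTAL_FRAGMENTS = 7  # the story scatters seven Sun Fragments

NIGHT_RGB = (20, 24, 60)    # hypothetical deep-blue night tint
DAWN_RGB = (255, 196, 120)  # hypothetical warm sunrise tint


def dawn_progress(fragments: int) -> float:
    """Return 0.0 at the start of the longest night and 1.0 at full sunrise."""
    clamped = max(0, min(fragments, TOTAL_FRAGMENTS))
    return clamped / TOTAL_FRAGMENTS


def ambient_colour(fragments: int) -> tuple[int, int, int]:
    """Linearly interpolate the ambient tint from night towards dawn."""
    t = dawn_progress(fragments)
    return tuple(round(n + (d - n) * t) for n, d in zip(NIGHT_RGB, DAWN_RGB))


for collected in (0, 3, 7):
    print(collected, ambient_colour(collected))
```

In an engine, the same progress value could also drive fog density or post-processing weights so that every fragment visibly pushes the world towards dawn.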

Building an Australian Adventure

Rather than using generic fantasy locations, I wanted the game to feel distinctly Australian.

The adventure takes players through:

  • Outback ruins
  • Bushland trails
  • A quirky roadside servo
  • Coastal lighthouse cliffs
  • Aurora-lit Tasmania

Each region introduces new obstacles and visual themes while keeping the action moving.

A Nod to Alan Turing

Since June is also Alan Turing’s birth month, I wanted to include a tribute without turning the game into a history lesson.

The game’s antagonist, the Turing Engine, believes that endless night is the most logical state of existence.

Players encounter quick puzzles based on:

  • Pattern recognition
  • Binary switches
  • Signal routing
  • Light circuits

The puzzles are short and designed to support the action rather than interrupt it.

Art Direction

One of the biggest pivots during development was moving away from a bright arcade aesthetic and leaning into a treasure-hunting adventure style.

The world uses:

  • Warm desert colours
  • Ancient ruins
  • Dust particles
  • Golden sunlight effects
  • Treasure-map inspired UI

This direction helped tie together the themes of the solstice, exploration, and treasure hunt.

Technology

The game was built using:

  • Unity 6
  • C#
  • URP
  • WebGL deployment
  • GitHub Copilot for development acceleration

The focus throughout was on keeping the project lightweight, responsive, and playable directly in the browser.

Prize Category

Best Ode to Alan Turing

Dawn Dashers is my tribute to Alan Turing’s legacy.

Rather than focusing on historical events, I wanted to celebrate the ideas that made his work so influential: logic, patterns, problem-solving, and computation.

The Turing Engine serves as the main antagonist, and players overcome its challenges through quick puzzles inspired by computational thinking.

Best Google AI Usage

I’m also submitting for Best Google AI Usage.

I’ll be honest: I’m much more comfortable building systems and writing code than designing beautiful interfaces.

One of the challenges during this project was figuring out how to make Dawn Dashers feel visually cohesive. I had a rough idea of the atmosphere I wanted (an Australian treasure-hunting adventure), but translating that into colours, layouts, typography, and UI design isn’t my strongest skill.

That’s where Google Stitch helped.

I used Stitch to rapidly explore:

  • UI layouts and screen structure
  • Menu and HUD designs
  • Colour palettes
  • Typography combinations
  • Visual hierarchy
  • Overall art direction

What impressed me most was how easy it was to iterate. Instead of spending hours moving elements around in a design tool, I could describe the feeling I wanted and quickly generate multiple directions to evaluate.

For example, I started with a fairly generic arcade-style look, but through a few iterations in Stitch, I landed on a much stronger visual identity built around:

  • Warm desert colours
  • Treasure-map inspired UI
  • Gold accents
  • Weathered parchment styling
  • Adventure-game aesthetics

Those explorations directly influenced the final design language of the game.

I also used Gemini extensively during the design phase.

One challenge was incorporating Alan Turing-inspired puzzles without making the game feel like a classroom exercise. I wanted puzzles based on logic, patterns, and computation, but they also needed to feel like they belonged in an Australian treasure-hunting adventure.

Gemini helped me brainstorm and refine puzzle concepts such as:

  • Pattern recognition challenges
  • Binary switch puzzles
  • Signal routing mechanics
  • Treasure clues tied to Australian locations and wildlife
  • Turing-inspired logic challenges that could be solved quickly during gameplay

For example, I used Gemini to take abstract puzzle ideas and re-theme them into Australian contexts, turning generic logic puzzles into clues involving lighthouses, Outback landmarks, wildlife, and hidden Sun Fragments.

The combination of Stitch and Gemini helped bridge two areas where I typically spend a lot of time: design exploration and content creation. Instead of starting from a blank page, I could rapidly iterate on ideas, evaluate options, and focus more of my time on building and polishing the game itself.

The AI tools didn’t replace the creative decisions—they helped me reach better ones faster.

As a solo developer, having a tool that could help bridge the gap between “I know what I want it to feel like” and “I know how to design it” was incredibly valuable. It let me spend more time building gameplay while still ending up with a much more polished visual experience.

Google Stitch didn’t replace the design decisions—it helped me discover and refine them much faster.

Happy Dashing!!!

Thanks for checking out Dawn Dashers 🌅

It was a lot of fun combining the June Solstice, Australian landscapes, treasure hunting, arcade runners, and a small nod to Alan Turing into one adventure.

2026. Week 23: a UI task that stopped being small

I Thought This Would Be a Local UI Task

This week, I thought I was solving a fairly narrow task: how to show group settings more neatly in the new checklist editor. The question looked local enough: how to read a group’s state right in the item row, and how to provide a convenient entry point for editing.

New editor: the Morning chaos row is active, and the group of indicators on the right takes up a significant part of the row’s space.

At first, this looked like a normal UX improvement. I needed to find a form of presentation that would not force the user to open the detail panel too often, while also not overloading the item row.

The First Version Turned Out Too Noisy

My first move was toward a richer inline representation. I wanted to show a set of signals directly in the group row, so that its state could be read quickly.

It became clear quite fast that this was the wrong direction. This UI did not make reading easier; it added noise. Instead of a more “talkative” interface, I had to move in the opposite direction: remove extra elements and compress the state representation.

In the end, the solution narrowed down to one summary pill in the group row and one shared popover for editing. That became the main UX lesson of this part of the week: in a dense editor UI, extra signals quickly stop helping.

Then a Deeper Problem Surfaced

The story did not end there. While I was working on the interface, it became clear that the problem was not only about how it looked, but also about how it worked at all.

During the work, an ambiguity in the semantics of visibility surfaced unexpectedly. Initially, I had made it so that if an item was invisible, conditions like “if another item has such-and-such value” could make it visible. But the new interface exposed how hard this approach was to read: even I, the author, could not quickly understand what was going on when looking at the editor.

After several experiments, I realized that the problem was not in the editor UX but in the semantics. In the end, I had to invert the rule: effective_visible = is_visible && conditions_pass. In other words, no condition can make an item that is invisible by default visible, but conditions can hide an item that is visible by default.

A typical breaking change, and good that it happened early.
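The new rule, effective_visible = is_visible && conditions_pass, can be illustrated with a small truth-table check. This is a minimal sketch, not the editor’s actual code: the Item shape, the answers mapping, and the assumption that multiple conditions combine with AND are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping

# Current answers in the checklist, keyed by item id (hypothetical shape).
Values = Mapping[str, object]


@dataclass
class Item:
    item_id: str
    is_visible: bool = True
    # Hypothetical: each condition is a predicate over the current answers,
    # and all conditions must pass (AND semantics assumed).
    conditions: list[Callable[[Values], bool]] = field(default_factory=list)


def effective_visible(item: Item, values: Values) -> bool:
    # all() over an empty list is True, so an item without conditions
    # simply keeps its default visibility.
    conditions_pass = all(cond(values) for cond in item.conditions)
    # New rule: conditions can only hide a visible item, never reveal a hidden one.
    return item.is_visible and conditions_pass


def answered_yes(values: Values) -> bool:
    return values.get("coffee") == "yes"


hidden_by_default = Item("grinder", is_visible=False, conditions=[answered_yes])
visible_with_rule = Item("filter", conditions=[answered_yes])

print(effective_visible(hidden_by_default, {"coffee": "yes"}))  # False
print(effective_visible(visible_with_rule, {"coffee": "yes"}))  # True
print(effective_visible(visible_with_rule, {"coffee": "no"}))   # False
```

Because the hidden item stays hidden even when its condition passes, the editor only has to explain one direction of change: a condition can take a visible item away.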

After That, the Most Honest Part of the Work Began

Once the rule became clearer, the work was not finished. After all the edits, a more unpleasant but also more honest phase began: I had to verify that the system really behaved the way it was now described.

This is where the E2E part began. It looked much less elegant than the idea of the solution itself. There were failures like Expected: "hidden" Received: "visible", there were timeouts around the drawer and popover, and there were situations where everything already looked fine locally, but the tests answered: no, the behavior is still not fully assembled.

In the end, E2E became the real finish line of this task. Not the moment when the interface already looked convincing, but the moment when the target scenarios started to converge in a verifiable form.

In Parallel, Another Shift Was Taking Shape

There was another line of work this week as well — backend cleanup. I do not want to expand on it here: it will get a separate text.

Against that background, and after publishing LLM-Assisted Deploy: You Save Typing, Not Thinking, another thought came into sharper focus for me. A meaningful part of the work should not disappear into local edits, commits, and one-off sessions. It should be turned into a public artifact: a text, a case, a note, something from which one can later read not only the result, but also the line of thought, the constraints, and the way it was verified.

What I Take Away From This Week

In short, this was a week in which a small UI task refused to stay small.

First, it ran into the need to simplify my own solution. Then it brought out behavior that had not been fully thought through or clearly stated. Then it demanded that this behavior be proved and shown separately through tests.

That is probably why the week feels coherent. Its center was not that I simply made one more product improvement, but a more general movement: from a local change to an explicit contract, from an explicit contract to verification, and then further to proving and showing the work so that it would not dissolve inside the code.

Introducing TokenCap — Context Engineering for Modern Codebases

One of the biggest challenges in AI-assisted development isn’t choosing the right model.

It’s providing the right context.

As codebases grow, important information gets scattered across files, commits, architecture decisions, documentation, and debugging history. Developers spend valuable time manually gathering context before they can effectively use AI tools.

That’s why I built TokenCap.

TokenCap helps developers generate structured, AI-ready project context from their codebase, making it easier for AI assistants to understand projects without consuming unnecessary tokens.

Current Features

  1. Project Knowledge Graph
    Visualizes relationships between files, components, services,
    routes, and dependencies.
  2. Context Memory
    Captures project decisions, architecture notes, constraints, and
    development history.
  3. Change Intelligence (tokencap diff)
    Transforms raw git diffs into impact analysis, risk assessment,
    testing recommendations, and AI review prompts.
  4. Context Packing
    Prioritizes and compresses relevant project information into
    token-budgeted context packs optimized for AI workflows.
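The general idea behind token-budgeted context packing can be sketched as a greedy selection: sort candidate chunks by priority and keep each one that still fits the budget. This is not TokenCap’s implementation; it covers only the prioritize-and-budget step (no compression), and the Chunk fields, the priority scores, and the 4-characters-per-token estimate are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str      # e.g. a file path or a decision note
    text: str
    priority: float  # higher means more relevant; the scoring itself is assumed


def estimate_tokens(text: str) -> int:
    # Rough rule of thumb (~4 characters per token); real tokenizers differ.
    return max(1, len(text) // 4)


def pack_context(chunks: list[Chunk], budget: int) -> tuple[list[Chunk], int]:
    """Greedily keep the highest-priority chunks that fit in the token budget."""
    packed: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.priority, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget:
            packed.append(chunk)
            used += cost
    return packed, used


chunks = [
    Chunk("docs/architecture.md", "a" * 40, priority=0.9),
    Chunk("src/legacy_utils.py", "b" * 400, priority=0.5),
    Chunk("src/api/routes.py", "c" * 80, priority=0.7),
]
packed, used = pack_context(chunks, budget=50)
print([c.source for c in packed], used)
```

Greedy selection is simple and predictable; a real packer might also summarize low-priority chunks instead of dropping them.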

Why TokenCap?

  • Reduce context-switching overhead
  • Improve AI response quality
  • Save tokens and costs
  • Understand large codebases faster
  • Generate structured project knowledge automatically

I’m actively working on the next generation of TokenCap, including an Obsidian-style interactive graph and advanced context management capabilities.

Would love feedback from developers building with AI.

Website: https://tokencap.vansharora.app/

Agentic AI Governance: Designing for Accountability and Control

Many organizations are already deploying agentic workflows. Some are still experimental, while others are running in production.

Once an AI agent can take action on behalf of a business, the question is no longer whether it’s useful, but what happens when something goes wrong.

It’s tempting to focus on blame: the AI vendor, the manager, the engineer, or the employee whose data informed the model. But you can’t wait until after a failure to start governing. Accountability needs to be designed into the system from the start through permissions, boundaries, monitoring, and traceability.

Enterprises are not only buying AI capability. They are buying trust and operational control. 

Think about the chain of command

Agentic systems need a defined place within an organization’s operating model. When an AI agent approves a purchase order or updates a customer record, it acts on behalf of a specific person or function, such as marketing or IT.

That ownership matters. Someone needs authority over the outcome: approving the business logic, monitoring behavior, and intervening when the system drifts. Governance does not mean watching every API call. It means clear accountability. Without it, responsibility disappears across the org chart.

Consider your boundary conditions

The flexibility of cloud LLMs makes it tempting to grant broad permissions upfront. In practice, that is where risk begins. A key governance question is not “Who is at fault if something leaks?”, but “Should this agent ever have been allowed to access this system at all?” Over-permissioning creates unnecessary exposure.

Governance at scale requires a consistent approach to guardrails, access management, and control across agents and workflows, one that keeps working as the number of agents, teams, and systems grows. JetBrains Central was built to address this: bringing governance into the development infrastructure itself, rather than treating it as something bolted on after AI workflows are already in production.

Treat agents like new hires. Don’t let an AI agent improvise on the refund policy or access HR systems without authorization. Instead, grant autonomy in increments. Make the agent adhere to narrow scopes and hard “never” rules until you’re sure it can handle more responsibility.
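The “new hire” approach can be sketched as a deny-by-default permission set that grows one scope at a time, with a hard “never” list that no grant can override. This is a minimal illustration, not JetBrains Central’s API; the scope names and the AgentPolicy class are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical scope names; hard "never" rules that no grant can override.
NEVER_SCOPES = frozenset({"hr.records.read", "refunds.policy.override"})


@dataclass
class AgentPolicy:
    allowed_scopes: set[str] = field(default_factory=set)

    def grant(self, scope: str) -> None:
        """Expand autonomy one scope at a time, refusing anything on the never list."""
        if scope in NEVER_SCOPES:
            raise PermissionError(f"{scope!r} is a hard 'never' rule")
        self.allowed_scopes.add(scope)

    def is_allowed(self, scope: str) -> bool:
        # Deny by default: only explicitly granted scopes pass, and 'never' always wins.
        return scope not in NEVER_SCOPES and scope in self.allowed_scopes


policy = AgentPolicy({"crm.read"})     # start with a narrow scope
print(policy.is_allowed("crm.write"))  # False until granted
policy.grant("crm.write")              # autonomy expanded in one increment
print(policy.is_allowed("crm.write"))  # True
```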

Build an audit trail that works

Traditional applications follow deterministic code paths. When something breaks, logs tell the story. LLM-based agents don’t behave that way. The same input can produce different outputs depending on context, the model, the system state, and even timing, making traceability essential.

A meaningful audit trail should capture: who initiated the action, the intent or workflow that triggered it, which systems and data were touched, what the agent returned or changed, whether policy was violated, the duration and the cost.
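One way to make those fields concrete is a structured record written as one JSON object per line. This is a sketch under assumptions: the field names are hypothetical, and a production audit trail would also need tamper-resistant storage and retention rules.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AuditRecord:
    initiated_by: str           # who initiated the action
    workflow: str               # the intent or workflow that triggered it
    systems_touched: list[str]  # which systems were touched
    data_touched: list[str]     # which records or datasets were touched
    outcome: str                # what the agent returned or changed
    policy_violation: bool      # whether policy was violated
    duration_ms: int
    cost_usd: float


def to_log_line(record: AuditRecord) -> str:
    # One JSON object per line (JSON Lines) is easy to append to and query later.
    return json.dumps(asdict(record), sort_keys=True)


record = AuditRecord(
    initiated_by="ap-team@example.com",
    workflow="invoice-approval",
    systems_touched=["erp"],
    data_touched=["invoice:1042"],
    outcome="approved invoice 1042",
    policy_violation=False,
    duration_ms=1830,
    cost_usd=0.004,
)
print(to_log_line(record))
```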

This is where tooling matters. At JetBrains, we treat this as a concrete product problem. An AI audit dashboard should enable inspection of behavior at the level of individual actions and workflows, without guesswork.

Keep a human in the strategic loop

Human review matters, but some approaches are better than others. Blanket approval isn’t the way to go, nor is requiring manual sign-off for every action. For example, an agent that auto-approves invoices over $10k should surface each approval with a risk signal, the policy rule it matched, and a reviewer link, not just a timestamp in a log file.

The solution is to design workflows with intentional checkpoints and risk scoring. Let the agent handle routine work autonomously, but flag high-impact actions for human review.
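A checkpoint of this kind can be sketched as a small routing function that returns both a decision and the rule that produced it, so every outcome is explainable. The thresholds, the risk score, and the decision labels are hypothetical; per the text, real thresholds should come from evidence.

```python
from dataclasses import dataclass

# Hypothetical thresholds; per the text, tune them from evidence, not instinct.
AMOUNT_REVIEW_THRESHOLD_USD = 10_000
RISK_REVIEW_THRESHOLD = 0.7


@dataclass
class ProposedAction:
    kind: str
    amount_usd: float
    risk_score: float  # 0.0 = routine, 1.0 = high impact; the scorer is assumed


def route(action: ProposedAction) -> tuple[str, str]:
    """Return (decision, matched_rule): routine work runs, high-impact work is flagged."""
    if action.risk_score >= RISK_REVIEW_THRESHOLD:
        return "flag_for_review", f"risk_score >= {RISK_REVIEW_THRESHOLD}"
    if action.amount_usd > AMOUNT_REVIEW_THRESHOLD_USD:
        return "flag_for_review", f"amount > ${AMOUNT_REVIEW_THRESHOLD_USD:,}"
    return "auto", "routine"


print(route(ProposedAction("invoice", 450.0, 0.1)))     # routine: handled autonomously
print(route(ProposedAction("invoice", 12_500.0, 0.2)))  # high impact: surfaced for review
```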

Organizations can gradually expand an agent’s autonomy, but only when there is clear evidence that controls are effective and the system continues to operate within policy. Thresholds should be driven by evidence, not instinct. This keeps humans involved where judgment matters, while allowing the system to scale.

Reduce blast radius and define responsibility

Two additional aspects are becoming central to enterprise trust:

  • Isolation: Agents should operate within constrained environments: scoped credentials, limited blast radius, and rollback capability. If something goes wrong, the damage should be contained. This is classic fault isolation applied to autonomous systems, and it matters more, not less, when the actor is non-deterministic.
  • Indemnification: The other question enterprises consistently raise is accountability when things break, especially around IP. A trusted vendor doesn’t just offer tools; it offers contractual and technical assurances that liability is scoped and risks are managed.

Governance is a product decision

Governance is not a bolt-on. It belongs in the architecture, the workflows, and the relationships a product creates. Organizations that treat governance as a core feature will move faster, resolve issues more cleanly, operate with clearer boundaries, and have the confidence to let AI agents do useful work without constant supervision.

Designing for accountability means that when something goes wrong (and eventually something will), you already know who’s responsible, what the agent did, and how to fix it. That’s what makes agentic AI viable in the enterprise. And that’s where the real work begins.

We’re working with a select group of organizations to explore these challenges in practice. Become a JetBrains Central Design Partner here.

Gemini 3.5 Flash vs Claude Haiku 4.5 vs MAI-Code-1-Flash for Coding

TL;DR

Three flash-tier coding models are competing for your API budget right now: Google’s Gemini 3.5 Flash (May 19, 2026), Anthropic’s Claude Haiku 4.5 (the reigning budget pick since October 2025), and Microsoft’s MAI-Code-1-Flash (June 2, 2026). Of the two models you can call through an API, Haiku wins on output cost at $5/M tokens and on structured output reliability. Gemini 3.5 Flash leads on agentic benchmarks (76.2% Terminal-Bench 2.1) and offers a 1M-token context window. MAI-Code-1-Flash beats Haiku on SWE-Bench Pro by 16 points (51.2% vs 35.2%), but you can only use it inside GitHub Copilot. Pick based on where you actually build: Copilot users get MAI-Code-1-Flash built in, API builders choose between Haiku’s cost and Flash’s context, and anyone running agent loops with tool calls should benchmark Flash first.

Three Flash Models, Three Different Bets

I’ve spent the last three weeks routing coding tasks through all three of these models: code reviews in Copilot, agent loops via API, and batch refactors across a 40-file Python project. The experience taught me something the benchmark tables don’t show: each model was built with a different definition of “coding” in mind.

Google optimized Gemini 3.5 Flash for agents that run in terminals, call tools, and iterate. Anthropic built Haiku 4.5 for developers who need a cheap, fast model that follows instructions precisely and returns clean JSON. Microsoft trained MAI-Code-1-Flash end-to-end inside the GitHub Copilot harness, so it knows how VS Code works, what diffs look like, and how to stay concise in inline completions.

Each model answers a different version of the same question: “What should a small coding model be good at?”

Benchmark Comparison

Benchmarks don’t capture everything, but they’re measurable. Start here.

Benchmark | Gemini 3.5 Flash | Claude Haiku 4.5 | MAI-Code-1-Flash
SWE-Bench Verified | not reported | 73.3% (Anthropic) / 66.6% (Microsoft’s eval) | 71.6%
SWE-Bench Pro | 55.1% | 35.2% | 51.2%
Terminal-Bench 2.1 | 76.2% | 41.6%* | 54.8%*
MCP Atlas (tool use) | 83.6% | not reported | not reported
IF-Bench (instruction following) | not reported | not reported | +28.9 pts over Haiku

*Terminal-Bench numbers for Haiku and MAI-Code-1-Flash are from Microsoft’s evaluation (Terminal-Bench 2, not 2.1). Direct comparison to Flash’s 76.2% on v2.1 should be taken with a grain of salt.

A few things jump out from this table.

The SWE-Bench Verified discrepancy for Haiku is real and worth flagging. Anthropic reports 73.3%, Microsoft reports 66.6% when benchmarking against MAI-Code-1-Flash. The difference probably comes down to evaluation setup: system prompts, tool availability, and retry policies all shift SWE-Bench scores. I wouldn’t treat either number as gospel. The relative ranking across SWE-Bench Pro, where the gap is enormous (51.2% vs 35.2%), is more informative.

Gemini 3.5 Flash dominates the agentic benchmarks. Terminal-Bench 2.1 simulates a real engineer working in a sandboxed terminal with a 5-hour timeout — planning, iterating, and coordinating across tools. Flash’s 76.2% puts it above Gemini 3.1 Pro and close to GPT-5.5 territory. If your coding model runs inside an agent loop with tool calls, this number is the one that predicts real-world behavior.

MAI-Code-1-Flash’s instruction following is the other number worth reading. The +28.9 point lead over Haiku on IF-Bench suggests Microsoft’s harness-native training paid off. The model handles structured requests (“edit only lines 14-22”, “don’t touch the imports”, “return a unified diff”) well, presumably because it learned from Copilot’s actual production request patterns.

Pricing: What You’ll Actually Pay

Flash models live or die on cost. If price didn’t matter, you’d use Claude Opus 4.7 or GPT-5.5. Per-million-token pricing:

Pricing | Gemini 3.5 Flash | Claude Haiku 4.5 | MAI-Code-1-Flash
Input (per 1M tokens) | $1.50 | $1.00 | $0.75
Output (per 1M tokens) | $9.00 | $5.00 | $4.50
Cached input (per 1M tokens) | $0.15 | $0.10 | $0.075
Context window | 1,000,000 | 200,000 | Not disclosed
Output limit | 65,536 | 64,000 | Not disclosed
Availability | API, Google AI Studio | API, Anthropic Console | GitHub Copilot only

The output price gap is the one that bites you. Code generation is output-heavy. A typical agent loop generating a 200-line file produces 8-12K output tokens per turn. At those volumes:

  • Haiku: $0.04-0.06 per turn
  • MAI-Code-1-Flash: $0.036-0.054 per turn
  • Gemini 3.5 Flash: $0.072-0.108 per turn

Across a full day of heavy coding (say 200 agent turns at the 10K-token midpoint), that’s about $10 for Haiku, $9 for MAI-Code-1-Flash, and $18 for Gemini Flash. The gap compounds fast.
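The per-turn and per-day figures are simple arithmetic on the output prices from the pricing table above. A minimal sketch that reproduces them (it counts output tokens only and ignores input cost):

```python
# Output prices in USD per 1M tokens, taken from the pricing table above.
OUTPUT_PRICE_PER_M = {
    "Claude Haiku 4.5": 5.00,
    "MAI-Code-1-Flash": 4.50,
    "Gemini 3.5 Flash": 9.00,
}


def output_cost(tokens: int, price_per_m: float) -> float:
    """Dollar cost of generating `tokens` output tokens at a per-million price."""
    return tokens * price_per_m / 1_000_000


for model, price in OUTPUT_PRICE_PER_M.items():
    low = output_cost(8_000, price)
    high = output_cost(12_000, price)
    daily = 200 * output_cost(10_000, price)  # 200 turns at the 10K-token midpoint
    print(f"{model}: ${low:.3f}-${high:.3f} per turn, about ${daily:.0f}/day")
```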

But MAI-Code-1-Flash has a catch: those prices are from GitHub’s model picker listing. You can’t hit the model through a standalone API endpoint. It only runs inside Copilot. If you’re building your own agent framework, your choices are Haiku or Flash.

Caching is the other cost lever: Flash’s cached input costs $0.15/M. If your agent loop sends the same system prompt and codebase context on every turn (most do), you pay 90% less for that repeated input after the first call. That discount can cut Flash’s bill substantially in long-running agent sessions, but Haiku’s cached rate ($0.10/M) is lower still, so caching narrows the absolute gap with Haiku rather than erasing it.
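To see how caching interacts with the output premium, here is a deliberately simplified session-cost model using the prices from the pricing table above. The assumptions are loud ones: the whole shared context is resent every turn, it is billed at the full rate once and at the cached rate afterwards, and cache-write surcharges, cache expiry, and new per-turn input are ignored.

```python
# USD per 1M tokens from the pricing table above: (uncached input, cached input, output).
PRICES = {
    "Gemini 3.5 Flash": (1.50, 0.15, 9.00),
    "Claude Haiku 4.5": (1.00, 0.10, 5.00),
}


def session_cost(model: str, context_tokens: int, output_per_turn: int, turns: int,
                 use_cache: bool = True) -> float:
    """Simplified agent-session cost in USD.

    Assumes the same shared context is resent every turn and, with caching, is
    billed at the full rate once and at the cached rate afterwards. Ignores
    cache-write surcharges, cache expiry, and new per-turn input.
    """
    uncached, cached, output = PRICES[model]
    later_rate = cached if use_cache else uncached
    input_cost = context_tokens * (uncached + (turns - 1) * later_rate)
    output_cost = turns * output_per_turn * output
    return (input_cost + output_cost) / 1_000_000


for model in PRICES:
    no_cache = session_cost(model, 100_000, 10_000, 50, use_cache=False)
    with_cache = session_cost(model, 100_000, 10_000, 50)
    print(f"{model}: ${no_cache:.2f} without caching, ${with_cache:.2f} with caching")
```

With these illustrative session parameters (100K shared context, 10K output per turn, 50 turns), caching cuts Flash’s bill by more than half, yet Haiku stays cheaper because its cached rate and output price are both lower.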

Token Efficiency: MAI-Code-1-Flash’s 60% Claim

Microsoft claims MAI-Code-1-Flash “solves harder problems with up to 60% fewer tokens” on SWE-Bench Verified. That’s a big number: if it holds, the model costs less per token AND uses fewer tokens to reach the same solution.

I tested this informally on my own codebase. I asked all three models to add input validation to a FastAPI endpoint. Same prompt, same context, same expected output.

# Prompt: Add Pydantic validation to this endpoint.
# Validate: name (str, 2-50 chars), email (valid format), age (18-120)

@app.post("/users")
async def create_user(request: Request):
    data = await request.json()
    # ... existing logic

The results:

  • Haiku 4.5: 847 output tokens. Clean solution, used EmailStr from Pydantic, added a proper error handler. Correct on first try.
  • Gemini 3.5 Flash: 1,241 output tokens. Added validation plus a lengthy explanation of each field constraint, a usage example, and a curl command. The code was correct but I didn’t ask for the tutorial.
  • MAI-Code-1-Flash (via Copilot): 512 output tokens. Returned only the modified function with a minimal Pydantic model. No explanation, no example. Correct and concise.

This single test isn’t a benchmark. But it matches the pattern Microsoft describes: MAI-Code-1-Flash learned from Copilot interactions where conciseness is the default. It doesn’t explain unless you ask.

Flash’s verbosity isn’t always a downside. If you’re prototyping and want the model to think aloud, that extra context helps. But for batch operations and agent loops where you’re parsing structured output, fewer tokens means faster iteration and lower cost.

Context Windows: The 1M Advantage

This is where Gemini 3.5 Flash separates itself from the other two.

Model | Context Window | Output Limit
Gemini 3.5 Flash | 1,000,000 tokens | 65,536 tokens
Claude Haiku 4.5 | 200,000 tokens | 64,000 tokens
MAI-Code-1-Flash | Not disclosed | Not disclosed

A million-token context window means you can feed Flash an entire mid-sized codebase (50-80 files of typical Python or TypeScript) in a single prompt. Haiku’s 200K is generous by historical standards but won’t hold the same volume. If you’re doing codebase-wide analysis, architecture reviews, or cross-file refactors, Flash is the only flash-tier option that won’t force you to chunk.
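Before deciding whether to chunk, a rough size check helps. The sketch below estimates a repository’s token count with the common ~4-characters-per-token rule of thumb and compares it with the two published context windows; real tokenizers vary, and the file suffixes and dictionary labels are assumptions.

```python
from pathlib import Path

# Context windows from the table above; the dictionary keys are just labels.
CONTEXT_WINDOWS = {
    "Gemini 3.5 Flash": 1_000_000,
    "Claude Haiku 4.5": 200_000,
}
CHARS_PER_TOKEN = 4  # rough rule of thumb; real tokenizers vary by language and content


def estimate_repo_tokens(root: str, suffixes: tuple[str, ...] = (".py", ".ts")) -> int:
    """Approximate the token count of all matching source files under `root`."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(root: str) -> dict[str, bool]:
    """Report, per model, whether the estimated source size fits its context window."""
    tokens = estimate_repo_tokens(root)
    return {model: tokens <= window for model, window in CONTEXT_WINDOWS.items()}


# Example: fits_in_context("path/to/your/repo")
```

Leave headroom for the system prompt, instructions, and the model’s output; a codebase that only just fits will not leave room for a useful answer.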

Both Haiku and Flash now support large output limits (64K and 65K respectively), so you won’t hit output ceilings on most tasks. I’ve pushed both models through full-file rewrites of 300-line modules without truncation. The context input limit is the real differentiator: Flash’s 1M lets you include far more codebase context per request.

For Copilot workflows where MAI-Code-1-Flash operates, the context window is less of an issue. Copilot manages the context for you, feeding relevant files and recent edits. You don’t directly control the prompt size.

Where Each Model Wins

After three weeks of testing, I’d route tasks like this:

Gemini 3.5 Flash — agent loops and long-context analysis

Flash is the model I’d pick for any workflow that involves iterating with tools. Write a failing test, run it, read the error, fix the code, run again. Flash handles that loop better than the other two. Its Terminal-Bench scores reflect a model that was built for multi-turn tool coordination, not just static code generation. The 1M context window makes it the default choice for “analyze this whole codebase” tasks.

Claude Haiku 4.5 — structured output, code review, and high-volume batch work

Haiku returns the cleanest structured output of the three. If you’re calling the model 10,000 times a day for code review comments, PR summaries, or JSON-formatted analysis, Haiku’s combination of reliable instruction following and the cheapest output tokens among the API-accessible models makes it the rational choice. It’s also the model I trust most for diff generation and structured editing tasks.

MAI-Code-1-Flash — inline completions and Copilot-native workflows

If you live in VS Code and use GitHub Copilot, MAI-Code-1-Flash is the model that feels most native. It knows the environment: when to suggest a single line vs. a full function, it handles diffs cleanly, and it stays concise. In my informal testing, the 60% token-efficiency claim looked plausible for the kinds of tasks Copilot handles: inline edits, small refactors, and completion suggestions.
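If you script your own tooling, the routing above can be captured in a small lookup table. This is a hypothetical sketch: the task-type labels are invented, the Gemini and Haiku IDs are the ones used in the API examples in this post, and "mai-code-1-flash" is only a label because that model is chosen in the Copilot model picker rather than through an API.

```python
# Hypothetical routing table that encodes the recommendations above.
ROUTES = {
    "agent_loop": "gemini-3.5-flash",
    "long_context_analysis": "gemini-3.5-flash",
    "structured_output": "claude-haiku-4-5-20251001",
    "code_review": "claude-haiku-4-5-20251001",
    "batch": "claude-haiku-4-5-20251001",
}
COPILOT_TASKS = {"inline_completion", "small_refactor"}


def pick_model(task_type: str, in_copilot: bool = False) -> str:
    """Return a model id (or Copilot label) for a task type, following the routing above."""
    if in_copilot and task_type in COPILOT_TASKS:
        return "mai-code-1-flash"  # a label only: selected in the Copilot model picker
    if task_type not in ROUTES:
        raise ValueError(f"no route defined for {task_type!r}")
    return ROUTES[task_type]


print(pick_model("agent_loop"))
print(pick_model("inline_completion", in_copilot=True))
```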

Availability and Integration

This is the practical differentiator most comparisons skip. It doesn’t matter how good a model is if you can’t access it from your stack.

Access | Gemini 3.5 Flash | Claude Haiku 4.5 | MAI-Code-1-Flash
Standalone API | Yes (Gemini API) | Yes (Anthropic API) | No
Google AI Studio | Yes | No | No
AWS Bedrock | No | Yes | No
GitHub Copilot | No | No | Yes
VS Code (direct) | Via extension/API | Via extension/API | Built-in via Copilot
OpenRouter | Yes | Yes | No
Self-hosting | No | No | No

MAI-Code-1-Flash’s Copilot lock-in is the biggest caveat in this comparison. Microsoft has signaled plans for Azure Foundry and third-party provider access, but as of early June 2026 the model is still rolling out primarily through Copilot. If you’re building custom agents, pipelines, or CI/CD integrations, MAI-Code-1-Flash isn’t an option today.

For API access, both Flash and Haiku work through OpenRouter too, so you can swap between them without changing your client code. If you’re also evaluating open-source alternatives, DeepSeek V4 Pro punches above its weight at a fraction of the cost.
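OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so switching between Flash and Haiku comes down to changing the model string. The sketch below builds the request with the standard library only; the two OpenRouter model slugs are assumptions, so check OpenRouter’s model list before relying on them.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

# Assumed OpenRouter model slugs; verify them against OpenRouter's model list.
MODELS = {
    "flash": "google/gemini-3.5-flash",
    "haiku": "anthropic/claude-haiku-4.5",
}


def build_request(model_key: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a chat-completions request; only the model slug changes between providers."""
    body = {"model": MODELS[model_key], "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(build_request("haiku", "Hello", "YOUR_KEY")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```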

API Quick Start

Calling each model from Python:

Gemini 3.5 Flash:

from google import genai

client = genai.Client(api_key="YOUR_KEY")

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Refactor this function to use list comprehension instead of a for loop:\n\ndef filter_active(users):\n    result = []\n    for u in users:\n        if u.is_active:\n            result.append(u.name)\n    return result"
)
print(response.text)

Output:

def filter_active(users):
    return [u.name for u in users if u.is_active]

Claude Haiku 4.5:

import anthropic

client = anthropic.Anthropic(api_key="YOUR_KEY")

message = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Refactor this function to use list comprehension instead of a for loop:\n\ndef filter_active(users):\n    result = []\n    for u in users:\n        if u.is_active:\n            result.append(u.name)\n    return result"
    }]
)
print(message.content[0].text)

Output:

def filter_active(users):
    return [u.name for u in users if u.is_active]

MAI-Code-1-Flash (via Copilot — no standalone API):

Open the file in VS Code with GitHub Copilot enabled, select the function, and run Copilot Chat with the prompt. MAI-Code-1-Flash activates through the model picker when available, or via the “auto” selector that routes to it for coding tasks.

FAQ

Which is better for coding, Gemini 3.5 Flash or Claude Haiku 4.5?

It depends on the task shape. Gemini 3.5 Flash outperforms on agentic coding, meaning multi-step workflows with tool calls and terminal interaction (76.2% Terminal-Bench 2.1). Claude Haiku 4.5 scores 73.3% on SWE-Bench Verified (Anthropic’s figure) and costs 44% less on output tokens. For high-volume batch code tasks, Haiku’s price wins. For agent loops, Flash’s quality wins.

How much does Gemini 3.5 Flash cost compared to Claude Haiku 4.5?

Gemini 3.5 Flash charges $1.50/$9.00 per million input/output tokens. Claude Haiku 4.5 charges $1.00/$5.00. Flash is 50% more expensive on input and 80% more on output. Flash’s cached input rate ($0.15/M) narrows the difference in long agent sessions where you repeat the same context, although Haiku’s own cached rate ($0.10/M) is lower still.

Is MAI-Code-1-Flash better than Claude Haiku 4.5?

On Microsoft’s own benchmarks, yes — particularly SWE-Bench Pro (51.2% vs 35.2%) and instruction following (+28.9 points). But there’s a benchmark discrepancy: Microsoft reports Haiku’s SWE-Bench Verified score as 66.6%, while Anthropic reports 73.3%. And MAI-Code-1-Flash is only available inside GitHub Copilot, not via API. If you need a standalone API model, Haiku is your pick regardless of benchmark numbers.

Which flash model is cheapest for coding?

MAI-Code-1-Flash has the lowest per-token cost ($0.75/$4.50 per million) AND uses up to 60% fewer tokens per task. But it’s locked to GitHub Copilot. For API users, Claude Haiku 4.5 at $1.00/$5.00 is the cheapest option. Gemini 3.5 Flash is the most expensive at $1.50/$9.00, though its prompt caching drops repeated-context costs to $0.15/M input.

Can I use MAI-Code-1-Flash outside of GitHub Copilot?

Not currently. MAI-Code-1-Flash is rolling out exclusively through GitHub Copilot’s model picker in VS Code. Microsoft hasn’t announced an Azure AI endpoint or standalone API. If you need API access for custom agents or CI/CD, you’re limited to Gemini 3.5 Flash and Claude Haiku 4.5.

Sources

  • Gemini 3.5 Flash Model Card — Google DeepMind — official specs, benchmark numbers, and pricing
  • Introducing MAI-Code-1-Flash — Microsoft AI — official announcement with SWE-Bench and IF-Bench comparisons
  • Introducing Claude Haiku 4.5 — Anthropic — official announcement with SWE-Bench Verified score
  • Microsoft AI on X: benchmark comparison tweet — the SWE-Bench Verified 71.6 vs 66.6 numbers
  • Gemini 3.5 Flash vs Claude Haiku 4.5: Pricing & Production Fit — Evolink — independent pricing and performance comparison

Bottom Line

The flash-tier coding model race in mid-2026 isn’t about finding one winner. It’s about matching models to workflows.

If you build custom agents and need a model that handles tool calls and long context, Gemini 3.5 Flash is the leader. If you need the cheapest reliable model for structured output at scale, Claude Haiku 4.5 is the safe bet. And if you code in VS Code with Copilot all day, MAI-Code-1-Flash is quietly the best inline coding model available — you just can’t take it anywhere else.

The lock-in question matters more than the benchmarks. Google and Anthropic sell tokens; Microsoft sells a workflow. Right now, MAI-Code-1-Flash’s Copilot exclusivity makes it a non-starter for anyone building outside that stack. If Microsoft opens API access — and the GitHub Copilot AI credits system suggests they’re heading that direction — the pricing math changes for everyone.

Fewer False Alarms, Better Coding Flow in RustRover 2026.2

False positives interrupt your workflow. In RustRover 2026.1, we reduced them by up to 25% in real projects, so you’ll see fewer misleading warnings, more relevant suggestions, and smoother completion. Read on to learn what causes false positives and how we’ve been fixing them.

What are false positives?

The term false positive is used in many fields, including healthcare, finance, cybersecurity, and software development. A false positive occurs when a system incorrectly reports a problem that does not actually exist. In software development, this usually means that a tool reports an error, warning, or threat even though the code or system is functioning correctly.

What are false positives in RustRover?

In RustRover, a false positive happens when the IDE highlights something as an error even though the project compiles and runs successfully. For example, you may see red code in the editor while cargo build and cargo check report no issues. This can make it harder to trust IDE diagnostics and may interrupt your development flow.


Why does RustRover have false positives?

By default, RustRover highlights problems in two different ways:

  1. After editing code, RustRover runs cargo check (or optionally cargo clippy) to detect compiler errors and warnings.
  2. RustRover also has its own code analysis engine. This engine parses the code, performs name resolution and type inference, and provides editor features such as highlighting, completion, navigation, inspections, and quick-fixes.

Sometimes RustRover’s internal analysis engine behaves differently from the compiler. When its logic does not perfectly match the compiler’s behavior, false positives can appear.

Why does RustRover have its own code analysis engine?

If RustRover’s code analysis can produce false positives, why not rely entirely on cargo check? There are several important reasons.

IDE features require deep code understanding

RustRover needs code analysis for much more than error highlighting. Features such as code completion, Go to Definition, Find Usages, refactorings, quick-fixes, type hints, and macro expansion support all require a deep understanding of the code structure.

To provide these features, RustRover parses the source code into a syntax tree, resolves identifiers, and infers types. The Rust compiler can report errors and warnings, but it’s not designed to power interactive IDE features. That’s why IDEs such as RustRover and rust-analyzer require their own analysis engines. RustRover’s analysis engine is developed independently and is not based on rust-analyzer.

The IDE must work on broken code

The compiler often stops after encountering an error and may not continue analyzing unrelated parts of the project. IDE analysis must be more resilient. It needs to continue understanding the codebase even when some parts are incomplete or incorrect.

IDEs and compilers have different priorities

RustRover’s analysis needs to react instantly while you type. To achieve this, the IDE often analyzes only the affected parts of the project instead of rebuilding entire crates. This difference in priorities explains why IDE analysis and compiler behavior are sometimes not perfectly aligned.

How do we find false positives?

False positives are essentially bugs in the IDE’s analysis engine. We identify them in several ways:

User reports

Many false positives are reported by users in our issue tracker. This is extremely valuable because it helps us identify problems affecting real-world workflows. However, some reports can be difficult to reproduce, which makes debugging more challenging.

Anonymous usage statistics

RustRover can compare the output of our analysis engine with the output of cargo. This helps us measure how often false positives appear, although it does not provide enough information to reproduce the issues directly.

Running the Rust compiler’s test suite

We run the Rust compiler’s test suite against our own analysis engine. These tests are often small and isolated, making them useful for identifying specific problems. However, many compiler tests focus on edge cases that rarely appear in production code.

Testing open-source crates

One of the most effective approaches is running RustRover’s analysis on large collections of open-source crates and comparing the results with cargo. This allows us to:

  • Reproduce real issues.
  • Estimate the impact of each bug.
  • Test popular ecosystems.
  • Verify that fixes do not introduce regressions.

Fixing false positives in RustRover

Reducing false positives is one of the most common requests we receive from users. After the release of RustRover 2025.2, we assembled a dedicated task force focused specifically on identifying and fixing false positives. We concentrated on two main sources: user-reported issues and large-scale open-source crate analysis.

Inspired by the Rust compiler’s crater project, we built an internal system that runs RustRover’s analysis against thousands of open-source crates and reports mismatches between IDE diagnostics and compiler output. This system helps us identify high-impact problems much faster and continuously improve the accuracy of RustRover’s code insight.
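The core comparison such a system performs can be sketched in a few lines. The TypeScript below is a hypothetical illustration, not JetBrains' implementation. It assumes each diagnostic is reduced to a file, line, and error code. An error reported only by the IDE is treated as a likely false positive, and one reported only by the compiler as a missed error.

```typescript
// Hypothetical, simplified diagnostic shape; real IDE and compiler output carry far more detail.
interface Diagnostic {
  file: string;
  line: number;
  code: string; // e.g. a rustc error code such as "E0308" (mismatched types)
}

interface Mismatch {
  falsePositives: Diagnostic[]; // reported by the IDE, not by the compiler
  missed: Diagnostic[]; // reported by the compiler, not by the IDE
}

// Two diagnostics are considered "the same" if file, line, and code all match.
const keyOf = (d: Diagnostic): string => `${d.file}:${d.line}:${d.code}`;

function compareDiagnostics(ide: Diagnostic[], compiler: Diagnostic[]): Mismatch {
  const ideKeys = new Set(ide.map(keyOf));
  const compilerKeys = new Set(compiler.map(keyOf));
  return {
    falsePositives: ide.filter((d) => !compilerKeys.has(keyOf(d))),
    missed: compiler.filter((d) => !ideKeys.has(keyOf(d))),
  };
}

// Example: the crate compiles cleanly, but the IDE flags one type mismatch.
const ideReport: Diagnostic[] = [{ file: "src/lib.rs", line: 42, code: "E0308" }];
const cargoReport: Diagnostic[] = [];

const result = compareDiagnostics(ideReport, cargoReport);
console.log(result.falsePositives.length, result.missed.length); // prints 1 0
```

Matching on exact lines is a simplification. Real tooling has to tolerate differing spans and messages between the two sources before counting mismatches across many crates.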

Our goal is simple – make IDE diagnostics more trustworthy so developers can stay focused on writing Rust code. 

How you can help

While we’ve made good progress, there is still a long way to go, and our work on reducing false positives will continue in future releases. If you notice any false positives, please report them in our issue tracker so we can continue improving RustRover’s code insight.

JetBrains Plugin Developer Conf 2026 – Call for Speakers

We’re excited to be preparing the third edition of Plugin Developer Conf, coming this November 2026 — a free community event focused on developing and managing plugins for JetBrains products.

Over the past two years, Plugin Developer Conf has brought together experts and practitioners from across the ecosystem to share their experience and insights. From plugin testing and localization to handling user feedback and scaling projects, each session has offered practical takeaways for developers at every stage.

Last year, we welcomed two thousand developers who joined us live, asked questions, and shared their own perspectives — making the event not just a conference, but a truly interactive community experience.

What to expect in 2026

In 2026, we’re continuing this tradition by bringing the community together once again to explore the evolving plugin development landscape — sharing knowledge, real-world stories, and lessons learned to help plugin developers build better tools.

📅 November 10, 2026
📍 Online

We’re currently shaping the agenda and looking for speakers to join us!

Whether you want to talk about building and maintaining your plugin, overcoming challenges, or sharing lessons learned and best practices — we’d love to hear from you. Your experience could help other plugin authors on their journey.

We’ll highlight you as a speaker and any resources you may want to share, such as your blog, open-source projects, or online courses. All accepted speakers will also receive a complimentary 1-year personal subscription to the JetBrains All Products Pack.

The Call for Speakers will remain open until July 20, 2026.

JetBrains Plugin Developer Conf at a glance

  • Talks are in English.
  • Talks are presented live or pre-recorded.
  • Talks are 30 or 45 minutes, followed by an optional 5–10 minute Q&A session.
  • Talks are scheduled during business hours in your time zone.
  • All talks are recorded, published on YouTube, and shared in our newsletters, blogs, and other channels.
  • You can present a slideshow, do a live demo, etc. – whatever you think works best for your content!
  • If selected, you’ll get a one-year personal All Products Pack subscription as a gift from us.
  • If you’re selected, we’ll be happy to help you with your preparation by facilitating dry-runs, giving feedback on your talk, and helping you smooth out any demos.
  • Please make sure you read and adhere to the Code of Conduct prior to submitting.
  • We look forward to receiving your talk submissions!

Call closes on July 20, 2026

What type of talks are we looking for?

While you’re welcome to submit any talk that you may find interesting (as long as it’s related to plugin development and JetBrains Marketplace, of course), some areas we’d love to hear about are:

  • Finding a plugin idea
  • Developing a plugin
  • Navigating the approval process with the Marketplace team
  • Marketing and monetizing a plugin
  • Managing user feedback and support
  • Maintaining plugin compatibility and handling technical issues
  • Lessons learned and challenges you’ve overcome
  • Best practices for plugin developers
  • Anything else the community can learn from

To give you an idea, here are some talks from 2025:

  • From Template to Marketplace: Creating Your First Plugin
  • AI-powered Test Generation 
  • How to Investigate UI Freezes

Not sure if your talk idea will fit? Reach out to elena.kerpeleva@jetbrains.com and we can discuss!

We look forward to your talk submissions!

Why Object-Oriented Programming Was Introduced – Objects and Classes

In the previous article, we examined why software design matters and how software engineering principles exist to solve problems rather than create more complexity.

In this article, we will begin with one of the most significant programming paradigms: Object-Oriented Programming (OOP).

But before moving into definitions, let’s first look at the problem it solves.

🤯 The Problem

Imagine you are building a user system.

Without OOP:

const user1Name = "Ashay";
const user1Email = "ashay@gmail.com";

function loginUser1() {}
function logoutUser1() {}

Now imagine:

  • 10 users
  • 100 users
  • admins
  • customers
  • sellers

Everything becomes:

  • duplicated
  • disconnected
  • hard to manage

You have:

  • data scattered everywhere
  • behavior scattered everywhere

This is the core problem OOP tries to solve.

🧠 The Core Idea of OOP

OOP says:

“A real-world entity should keep its own data and behavior together.”

Example:

A User should contain:

  • its own properties (name, email)
  • its own capabilities (login, logout)

That combination becomes an object.

✨ What is an Object REALLY?

An object is:

A self-contained unit of state + behavior.

State = data
Behavior = actions/functions

Example:

const user = {
  name: "Ashay",
  email: "ashay@gmail.com",

  login() {
    console.log(`${this.name} logged in`);
  }
};

Here:

  • name, email → state
  • login() → behavior
  • both combined together → object

This is the true meaning of an object.

Then Why Do We Need Classes?

Now imagine creating 10,000 users.

You DON’T want:

const user1 = { ... }
const user2 = { ... }
const user3 = { ... }

You need a reusable structure.
That reusable structure is a class.

What is a Class REALLY?

A class is:

A factory/template that defines what an object should contain.

Example:

class User {
  name: string;
  email: string;

  constructor(name: string, email: string) {
    this.name = name;
    this.email = email;
  }

  login() {
    console.log(`${this.name} logged in`);
  }
}

Now you can create objects easily:

const user1 = new User("Ashay", "a@gmail.com");
const user2 = new User("Rahul", "r@gmail.com");

Deep Understanding of new

This is VERY important.

When you do:

const user1 = new User(...)

new does several things internally:

  1. Creates an empty object {}
  2. Links that object to the class’s prototype
  3. Binds this to the new object
  4. Runs the constructor
  5. Returns the object
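To see those five steps as code, here is a minimal TypeScript sketch that reproduces what `new` does by hand. It uses a plain constructor function, the hypothetical `UserCtor`, because ES class constructors throw a TypeError when called without `new`. `simulateNew` is also a name invented for illustration. Real engines perform these steps internally, not through user code.

```typescript
// Hypothetical constructor function mirroring the User class above.
// (Class constructors cannot be called without `new`, so a plain function is used.)
function UserCtor(this: { name: string; email: string }, name: string, email: string) {
  this.name = name;
  this.email = email;
}
UserCtor.prototype.login = function (this: { name: string }): string {
  return `${this.name} logged in`;
};

// Hand-rolled version of the steps `new` performs.
function simulateNew<A extends unknown[]>(ctor: (this: any, ...args: A) => unknown, ...args: A): any {
  // Steps 1 and 2: create an empty object linked to the constructor's prototype.
  const obj = Object.create(ctor.prototype);
  // Steps 3 and 4: run the constructor with `this` bound to the new object.
  const returned = ctor.apply(obj, args);
  // Step 5: return the new object (unless the constructor explicitly returned an object).
  const isObject = returned !== null && (typeof returned === "object" || typeof returned === "function");
  return isObject ? returned : obj;
}

const user1 = simulateNew(UserCtor, "Ashay", "a@gmail.com");
console.log(user1.login()); // prints: Ashay logged in
console.log(Object.getPrototypeOf(user1) === UserCtor.prototype); // prints: true
```

The final check in step 5 mirrors a real rule of `new`. If a constructor returns an object of its own, that object is used instead of the freshly created one.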

💡 Important Mental Model

A class is NOT the actual thing.

It is only:

  • definition
  • structure
  • contract

The object is the real runtime entity.

Example:

Real World → OOP
  • Building blueprint → Class
  • Actual house → Object
  • Human DNA structure → Class
  • Actual person → Object
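A quick way to check this mental model in TypeScript: one class definition can produce many objects. Each object holds its own state, while the behavior defined in the class lives once on the shared prototype. The sketch reuses the User class from above, with `login()` returning its message instead of logging it, so the result is easy to inspect.

```typescript
class User {
  name: string;
  email: string;

  constructor(name: string, email: string) {
    this.name = name;
    this.email = email;
  }

  login(): string {
    return `${this.name} logged in`;
  }
}

// One definition (the class), two runtime entities (the objects).
const ashay = new User("Ashay", "a@gmail.com");
const rahul = new User("Rahul", "r@gmail.com");

console.log(ashay.name, rahul.name); // prints: Ashay Rahul (separate state)
console.log(ashay.login === rahul.login); // prints: true (one shared method on User.prototype)
console.log(ashay instanceof User); // prints: true
```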

Common Beginner Mistake

People think:

class User {}

means “doing OOP”.

No.

Real OOP begins when you think:

  • What responsibility belongs here?
  • What data should this object own?
  • What should be hidden?
  • How should objects communicate?
  • Who controls what?

That leads to:

  • Encapsulation
  • Abstraction
  • Inheritance
  • Polymorphism
  • Dependency Injection
  • SOLID principles

which we’ll cover one by one.

⏭️ What’s Next?

Now that we understand what OOP is and how classes and objects help organize state and behavior, an interesting question naturally arises, especially for JavaScript developers.

If JavaScript already allows us to create objects using plain object literals and factory functions, why do we need classes at all?

Are classes simply syntactic sugar, or do they solve a different set of problems?

Before we dive into the core principles of OOP such as Encapsulation, Abstraction, Inheritance, and Polymorphism, we’ll take a small detour to explore one of the most common debates in the JavaScript ecosystem:

Factory Functions vs Classes

We’ll compare both approaches, understand their trade-offs, and discuss when each one makes sense in real-world applications.

Because before learning how to design good objects, it’s worth understanding the different ways we can create them.
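As a small preview, here is a minimal TypeScript sketch of the two approaches side by side. The names `createUser` and `UserClass` are illustrative only. Both produce an object with the same state and a `login()` method; the trade-offs between them are left for the next article.

```typescript
// Factory function: a plain function that builds and returns an object literal.
function createUser(name: string, email: string) {
  return {
    name,
    email,
    login(): string {
      return `${this.name} logged in`;
    },
  };
}

// Class: the same shape, created with `new`.
class UserClass {
  name: string;
  email: string;

  constructor(name: string, email: string) {
    this.name = name;
    this.email = email;
  }

  login(): string {
    return `${this.name} logged in`;
  }
}

const fromFactory = createUser("Ashay", "a@gmail.com");
const fromClass = new UserClass("Ashay", "a@gmail.com");

console.log(fromFactory.login()); // prints: Ashay logged in
console.log(fromClass.login()); // prints: Ashay logged in
console.log(fromFactory instanceof UserClass); // prints: false (a plain object, no class prototype)
```

One difference is already visible: every call to `createUser` creates a fresh copy of `login`, whereas class instances share a single method through the prototype.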

I Wrote 10 AI Stories in 10 Days. My Keyboard Started Smoking on Day 4.

Biggest thing I learned writing the AI, Ego & Regret series: I argue with myself way more than I thought.

Every post goes through the same loop:

10 PM: “This story’s fire. Gonna blow up tomorrow.”

1 AM: “Wait — did I make it clear that 450ms wasn’t just a random number?” → Scrolls back to check. Yes. OK. Move on.

Next morning: “What was I thinking? Scrap it. Rewrite from scratch.”

The cover images were the worst part. One article went through 6 different backgrounds before circling back to the first one. 45 minutes I’ll never get back.

Then there’s that one line: “It was right about yesterday — and yesterday wasn’t running anymore.” Rewrote it 11 times. My wife walked by and said, “I thought you were writing code, not poetry.”

Ben Halpern hit me with a 5-reaction combo while I was eating instant noodles. Almost choked.

Waking up at 3 AM. First instinct: check comments. Nothing. Go back to sleep. 5 minutes later: check again. Still nothing.

Writing code? I’m normal. Writing stories? I’m the guy verifying his own made-up RabbitMQ number at 1 AM.

Would I do it again? Yeah. Probably. But I’d get a better keyboard this time.

This coffee’s about to run out — and I’m not done typing yet. If these stories made you smile, chuckle, or roll your eyes, buy me a coffee and keep the keys smoking ☕🔥

Also — if you’ve got a story that’s been sitting in your head, something that made you laugh, cringe, or question every life decision that led to that moment — send it over. I’ll turn it into a story. Yours could be the next one.

No pressure. Just a keyboard that’s already warm.