[FabCon Atlanta 2026 Report] My Take on Fabric IQ Ontology

I attended FabCon Atlanta 2026.

I also created a few short videos that show the atmosphere of the venue, so feel free to check them out first.

FabCon Atlanta 2026 Day1-Day3 morning: workshops, Keynote

FabCon Atlanta 2026 Day3 noon-Day5: CoreNote, sessions, Power Hour

In this article, based on what I saw and heard at FabCon, I would like to focus especially on Ontology within Fabric IQ and share how I think we should understand it at this point in time.

Fabric IQ is described as a workload that organizes data in OneLake using business language, enabling analytics and AI agents to use that data with consistent meaning.

The Fabric IQ workload includes semantic models and Data Agents, and Ontology is one part of it.

I think many people may currently understand “Fabric IQ” as almost the same thing as “Ontology.”

That is not completely wrong. However, Fabric IQ is a broader term, so in this article I will mainly use the word “Ontology” to avoid confusion.

What is Fabric IQ (preview)?

The Atmosphere Around Fabric IQ at FabCon

At FabCon, I felt that attendees were also highly interested in Fabric IQ.

At the same time, some of the questions were very basic, such as “What is IQ?”

In other words, my honest impression was that Fabric IQ is attracting a lot of attention, but even in the United States, understanding of it has not yet become widespread.

I also attended several IQ-related sessions. Based on the sessions I joined, I cannot say that I clearly saw exactly which real-world projects should use it and how.

Of course, there were Ontology demos, and there were discussions about how AI will be able to understand business meaning more easily and how the semantic layer will become more important. Officially, Ontology is also described as a way to represent a business in a machine-readable form through entities, properties, relationships, and rules.

However, to be honest, my current impression is that the concept itself is very attractive, but common implementation patterns are not yet widely understood.

My Conclusion First: I Would Still Take a Wait-and-See Approach for Production Use

My conclusion is that, at this point, I would still take a wait-and-see approach before placing Ontology at the center of a production environment.

The reason is simple.

First, it is still officially in preview.

Second, when it comes to improving the accuracy of Data Agents by giving them business context, I feel that many use cases can already be covered quite well by using semantic models.

A Common Misunderstanding

Ontology allows you to create entities as business objects and define relationships using natural language to represent business meaning.

On the other hand, based on the current specification, you cannot simply write natural-language descriptions for tables and columns inside Ontology in the same way you can with semantic model properties.


Of course, I am not saying that Ontology is unnecessary.

Rather, I believe Microsoft will continue to invest heavily in this area, and I personally have high expectations for it.

However, at least for now, I think the right stage is:

  • Development teams should try it in a test environment
  • Organizations should watch it as a future architecture option

On the other hand, I think it is still a little early to talk about adopting it broadly in production right away.

Semantic Models Will Continue to Be Important for AI

So, does that mean the semantic layer is still something for the future?

I do not think so.

Rather, even right now, building a well-designed semantic model is very effective. I also believe that even after Ontology becomes generally available in the future, the importance of semantic models will not disappear.

Officially, Ontology can be generated from semantic models. In other words, it feels more natural to see Ontology not as something that replaces semantic models, but as something that extends business meaning and relationships on top of semantic models as one of its foundations.

What Semantic Models Can Already Do Today

With the arrival of Data Agent, semantic models are no longer just models for BI.

You can specify a semantic model as a data source for a Data Agent, and through Data Agent customization, you can provide business metadata to AI.

For example:

  • Semantic model

    • Use the “Prep for AI” feature
    • Write the business meaning of tables and columns in properties such as table names, column names, table descriptions, and column descriptions
    • Predefine calculations and business logic with DAX
  • Data Agent

    • Clarify the role of the agent through instructions
    • Add descriptions for data sources so that the agent can choose the right source depending on the question
    • Use example query sets for expected questions
      • Note: example queries are not available when the data source is a semantic model

For more details, I recommend starting with the following documentation.

Semantic model best practices for data agent

Best practices for configuring your data agent

Also, a Data Agent does not necessarily need to have only one data source.

When the data volume is large, or when you want to use example query sets, combining a semantic model with a lakehouse or warehouse can be a very realistic design.

For example:

  • Store large volumes of data in a lakehouse or warehouse
  • Organize the metrics and definitions you want AI to use in a semantic model

If you want to add business metadata to each table or column, my personal recommendation at this point is to write it in the semantic model properties.

Data Agent can refer to semantic model properties.

Related article:

Editing Semantic Model Metadata Properties from a Notebook with Semantic Link in Fabric

When Would Ontology Become Necessary?

At this point, you might think, “Then isn’t a semantic model enough?”

In fact, I think semantic models can cover a large part of many use cases.

That said, based on my current understanding, I feel that Ontology becomes especially useful in the following two scenarios.

In other words, if your use case does not fall into these two patterns, a semantic model may be enough for now.

1. When You Want to Query Across Multi-Layered Relationships Like a Graph

The first case is when you want to ask questions that go across multiple layers of relationships.

Semantic models can also express relationships. However, as the relationships become more complex, the thinking tends to become more JOIN-oriented.

Ontology, on the other hand, uses a graph-based approach, so it seems better suited to graph-like operations such as path exploration.

For example, imagine you have the following tables:

  • Customers
  • Orders
  • Products
  • Contracts
  • Support history
  • Responsible organizations
  • Related events

If you want to ask, “What is related to this customer?” across multiple business domains, Ontology seems like a more natural way to express that.

In other words, Ontology becomes meaningful when the relationships themselves are valuable, rather than when you only need simple aggregations or KPI questions.

2. When You Want to Treat Historical Data and Real-Time Data as One Business Entity

The second case is when you want to treat historical data and real-time data not as separate systems, but as the same business object.

Officially, Fabric IQ is described as a way to unify data in OneLake using business language and give consistent meaning to analytics and AI agents.

For example:

  • Recent order events stored in Eventhouse
  • Historical order data accumulated in Lakehouse

If you want to handle these together in the context of a single business entity such as “Order,” the idea of Ontology seems to be a very good fit.

This feels less like a BI model tied to a physical schema and more like a logical model: a foundation that helps AI understand how the business itself is structured.

We Do Not Need to Rush Ontology. For Now, This Is a Preparation Phase

As I have written so far, I believe Ontology has great potential.

However, I personally do not think it is something that must be introduced as the highest priority right now.

Ontology can be seen as a mechanism for strengthening the business meaning layer afterward.

Therefore, rather than seeing it as a foundation that must be introduced from the beginning, it feels more natural to think of it as something that organizations can add after their data platform and semantic organization have reached a certain level of maturity.

In fact, even if you want to use Ontology, there will likely be many cases where the organization’s data itself is not yet ready.

For example:

  • Required tables do not exist
  • Key definitions and meanings differ across systems
  • Tables that should be related cannot be connected cleanly through relationships

In such a state, the problem exists before Ontology can even be built.

That is why I believe the most important thing right now is to prepare and organize the organization’s data so that it can take advantage of Ontology in the future.

Microsoft will likely continue to invest heavily in this area, and the concept of Ontology itself will become increasingly important.

In that sense, I think we should see the current phase not as “the time to rush Ontology into production,” but as a preparation period for creating the conditions where Ontology can be used effectively.

In addition, I also feel that building Ontology requires a surprisingly high level of skill.

It is not enough to have only data modeling knowledge.

You need both:

  • An understanding of the business meaning behind the organization’s operations and data
  • The data modeling knowledge required to turn that meaning into a structure

In other words, Ontology cannot be built only by the IT department.

At the same time, it also cannot be fully defined only by the business department.

Collaboration between IT and business will be important, and people who understand both sides to some extent will become increasingly valuable.

Bonus 1: Foundry IQ Already Feels More Practical

As a side note, based on my experience, Foundry IQ felt more practical at this point.

For example, use cases such as the following are relatively easy to imagine even now:

  • Using OneLake as a knowledge source
  • Using SharePoint as a knowledge source

Fabric Ontology still looks like something that may become very interesting in the future.

On the other hand, Foundry IQ already feels easier to connect to concrete use cases.

Of course, these two are not competitors. I believe they will become more connected over time.

Bonus 2: Data Agent Development Works Well with CI/CD and Should Use Git Integration

This is slightly separate from Ontology, but through FabCon, I was reminded again that Data Agent works very well with CI/CD.

Are you using Git integration in Fabric?

As mentioned earlier, when developing a Data Agent, you define items such as instructions, data source descriptions, and example query sets.

Among these, data source descriptions may not change very frequently.

However, I feel that instructions and example query sets are things that will continue to evolve once the agent starts being used.

For example, in actual operation, the following situations are likely to happen:

  • A user asks an unexpected question, and you want to add a query set for that pattern
  • You adjust the instruction prompt, but the accuracy becomes worse
  • You want to roll back to a previous version and check the behavior
  • You want to compare the previous version and the latest version while testing

In other words, a Data Agent is not something you configure once and forget.

It is something that should be continuously improved during operation.

That is why it works very well with Git integration, where you can manage change history, track differences, and roll back when necessary.

If you want to use Data Agent seriously in Fabric, I believe it is important not only to create the agent, but also to grow it with Git integration in mind.

Related articles:

Microsoft Fabric Git Integration × Azure DevOps: How to Release Fabric Items Across Different Tenants

How to Reflect Changes to Another Repository with Azure DevOps Pipeline: A Minimal Memo for Repo A → Repo B

Summary

Finally, here is my current understanding.

  • Expectations for Fabric IQ / Ontology are high
  • However, it is still in preview, so I would be cautious about using it in production at this stage
  • In many cases, the combination of semantic models and Data Agent is already quite effective
  • Ontology will become especially useful in scenarios such as:
    • Queries across multi-layered relationships
    • Use cases where accumulated data and real-time data need to be handled in one business context

I believe this is definitely an area where Microsoft will continue to invest.

Therefore, now is a good time to catch up on Ontology and prepare your organization’s data platform so that you can adopt it quickly when the right timing comes.

Thank you for reading this long article!

I Also Have a YouTube Channel!

https://www.youtube.com/@msfabricreijiotake

Introducing Cossmology: a Map of the Commercial OSS Universe

Chinstrap Community is proud to introduce Cossmology, a comprehensive, worldwide directory of over 1,000 commercial open source software (COSS) companies.

If you’re working on an OSS project around which you’ve built, or plan to build, a commercial offering, tell us about it by using our Submit feature.

We’ve also launched COSS Weekly, a newsletter that delivers all the latest COSS news, funding rounds, acquisitions, and other headlines to your inbox. No sales pitches, no ads, just all of the week’s most relevant news from the COSS universe (check out our COSS Weekly archive).

We’ve mirrored much of the Cossmology dataset on GitHub (repository, searchable index) so be sure to star us.

Feedback welcome!


Build a Secure API with Rails 8 – Part-1

Hi folks👋!

In this post I want to share something I wish I had when I started building APIs with Ruby on Rails: a practical guide that takes security seriously from the beginning.

When I built my first REST API, most tutorials I found were focused on getting something running quickly. They were great for learning the basics, but they usually skipped important topics like API versioning, authentication strategy, authorization, and security.

Even when using AI tools to generate a “secure API”, the result is often still insecure unless you already understand the threats you are trying to protect against. Security is not something you get automatically. You need to know what problems you are solving and why the protections matter.

I ended up reading API design books, OWASP documentation, and real-world breach reports before I finally felt like I understood what I was building, and I have since put it all into practice. This post is the guide I wish I had back then.

In this series we are going to build a production-ready Rails 8 API with authentication, authorization, rate limiting, secure cookies, security headers, and other important protections. I also want to explain the reasoning behind each decision, not just copy-paste code without context.

Before writing any code, let’s first understand the main attack vectors we need to defend against.

The attack vectors we are defending against

1. XSS (Cross-Site Scripting)

🚨 Threat:
XSS happens when an attacker injects malicious JavaScript into content that later gets rendered in another user’s browser. In API-driven applications, one of the biggest risks is token theft. If JWTs are stored in localStorage, a malicious script can read and steal them immediately.

🛡️ Mitigation:
Avoid storing authentication tokens in localStorage or other browser-accessible storage. Instead, store them in secure HttpOnly cookies so JavaScript cannot access them. Cookies should also use the Secure and SameSite attributes. Any user-generated content rendered in the frontend should be properly escaped or sanitized.
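To make the cookie attributes concrete, here is a minimal plain-Ruby sketch of the Set-Cookie header value an API could send for an auth token. The helper name auth_cookie_header, the cookie name, and the 15-minute lifetime are assumptions for illustration; in a Rails app you would normally set these attributes through the framework's cookie options rather than building the string by hand.

```ruby
# Illustrative helper (not a Rails API): builds the Set-Cookie header value
# for an authentication token with the attributes discussed above.
def auth_cookie_header(name, value, same_site: "Lax", max_age: 900)
  [
    "#{name}=#{value}",
    "Path=/",
    "Max-Age=#{max_age}",      # assumed lifetime: 15 minutes
    "HttpOnly",                # not readable through document.cookie
    "Secure",                  # only sent over HTTPS
    "SameSite=#{same_site}"    # limits cross-site sending
  ].join("; ")
end

puts auth_cookie_header("access_token", "abc123")
```

With HttpOnly set, a script injected through XSS can still make requests from the page, but it can no longer read the token and send it elsewhere.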

2. SQL Injection

🚨 Threat:
SQL Injection happens when user input is inserted directly into a SQL query without proper sanitization. An attacker can manipulate the query to bypass authentication, read sensitive data, or modify the database.

🛡️ Mitigation:
Avoid interpolating user input directly into SQL queries. In Rails, prefer Active Record methods like where and find_by with hash conditions or placeholders, which pass values as bound or properly quoted parameters. Passing an already interpolated string to where is still vulnerable. If raw SQL is unavoidable, use bound parameters instead of string interpolation. You should also validate input, use strong parameters, and follow the principle of least privilege for database accounts.
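The difference is easiest to see by looking at the SQL text itself. The sketch below uses plain Ruby strings and no database, so it only shows the shape of the problem: with interpolation the input becomes part of the query, while with a placeholder the query text stays fixed and the value travels separately. In Active Record the same idea appears as where(email: value) or where("email = ?", value).

```ruby
# Attacker-controlled input for a login lookup.
email = "' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes its meaning.
unsafe_sql = "SELECT * FROM users WHERE email = '#{email}'"

# Safer shape: the SQL text is a fixed template, and the value is handed
# over separately as a bound parameter instead of being parsed as SQL.
safe_sql    = "SELECT * FROM users WHERE email = ?"
safe_params = [email]

puts unsafe_sql
puts safe_sql
```

The first line printed ends with a condition that is always true, which is exactly how login checks get bypassed.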

3. CSRF (Cross-Site Request Forgery)

🚨 Threat:
CSRF happens when a malicious website tricks a logged-in user’s browser into sending authenticated requests to your application using automatically attached cookies.

This is especially important in Rails APIs using session cookies or JWTs stored in HttpOnly cookies. Even though JavaScript cannot read those cookies, the browser still sends them automatically with requests.

An attacker could potentially trigger actions like changing account settings, creating resources, or deleting data without the user realizing it.

🛡️ Mitigation:
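Enable CSRF protection for any cookie-based authentication flow. In Rails, use protect_from_forgery and require valid CSRF tokens for state-changing requests like POST, PUT, PATCH, and DELETE. Note that API-only controllers inherit from ActionController::API, which does not include this protection by default, so you need to include ActionController::RequestForgeryProtection first.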

Authentication cookies should also use:

  • HttpOnly

  • Secure

  • SameSite=Lax or SameSite=Strict

You should also validate Origin and Referer headers and keep CORS restricted to trusted frontend domains.

If the browser automatically sends authentication, CSRF protection still matters, even if the API itself is technically stateless.
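As a rough illustration of Origin validation, the following plain-Ruby sketch compares the Origin header against an allowlist using an exact scheme, host, and port match. The constant TRUSTED_ORIGINS, the helper trusted_origin?, and the example domain are hypothetical; this is a complement to CSRF tokens, not a replacement for them.

```ruby
require "uri"

# Hypothetical allowlist of trusted frontend origins.
TRUSTED_ORIGINS = ["https://app.example.com"].freeze

# Returns true only when the Origin header exactly matches a trusted
# scheme + host (+ non-default port) combination.
def trusted_origin?(origin_header)
  return false if origin_header.nil? || origin_header.empty?

  uri = URI.parse(origin_header)
  return false unless uri.scheme && uri.host

  normalized = "#{uri.scheme}://#{uri.host}"
  normalized += ":#{uri.port}" unless uri.port == uri.default_port
  TRUSTED_ORIGINS.include?(normalized)
rescue URI::InvalidURIError
  false
end

puts trusted_origin?("https://app.example.com")
puts trusted_origin?("https://app.example.com.evil.test")
```

Exact matching matters here: prefix or substring checks would accept look-alike hosts such as the second example.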

4. Brute Force

🚨 Threat:
Brute force attacks happen when an attacker repeatedly tries large numbers of username and password combinations against your login endpoint.

This commonly targets login forms, password reset endpoints, and authentication APIs. Successful attacks can lead to account compromise, credential stuffing, and unnecessary server load.

🛡️ Mitigation:
Use rate limiting on authentication-related endpoints. In Rails, tools like Rack::Attack can throttle repeated requests by IP address, email, or both.

You should also:

  • temporarily lock accounts after repeated failures

  • require strong passwords

  • detect suspicious login activity

  • avoid revealing whether an account exists

  • consider CAPTCHA or step-up verification after suspicious behavior
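To show what throttling means mechanically, here is a minimal in-memory fixed-window counter in plain Ruby. It is not Rack::Attack and not production-ready (counters are never cleaned up and are not shared between processes); the class name LoginThrottle and its parameters are assumptions for this sketch.

```ruby
# Minimal fixed-window throttle (illustration only).
class LoginThrottle
  def initialize(limit:, period:)
    @limit = limit            # max attempts per window
    @period = period          # window length in seconds
    @counters = Hash.new(0)   # [key, window] => attempts seen
  end

  # Returns true if this attempt is allowed, false if it should be rejected.
  # The key can be an IP address, an email, or a combination of both.
  def allow?(key, now: Time.now)
    window = now.to_i / @period
    count = (@counters[[key, window]] += 1)
    count <= @limit
  end
end

throttle = LoginThrottle.new(limit: 3, period: 60)
4.times { puts throttle.allow?("203.0.113.7") }
```

Throttling by both IP and email is useful because attackers can rotate IP addresses while still hammering one account.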

5. User Enumeration

🚨 Threat:
User enumeration happens when an application reveals whether an account exists through different error messages.

For example:

  • “Email not found”

  • “Incorrect password”

An attacker can use these differences to discover valid accounts and later target them with brute force attacks, phishing, or credential stuffing.

🛡️ Mitigation:
Return consistent responses during login, password reset, and account recovery flows.

Instead of exposing whether the email exists, use generic responses such as:

  • “Invalid credentials”

  • “If an account exists, instructions have been sent”

You should also rate limit these endpoints and monitor repeated probing attempts.
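A small plain-Ruby sketch of a login check that answers identically for an unknown email and a wrong password. The user store, the login method, and the response hashes are hypothetical, and SHA-256 is used only to keep the example dependency-free; a real application would use a dedicated password hash such as bcrypt.

```ruby
require "digest"
require "openssl"

# Demo user store. SHA-256 keeps the sketch dependency-free; it is NOT
# an appropriate password hash for real applications.
USERS = { "ana@example.com" => Digest::SHA256.hexdigest("correct horse") }.freeze
DUMMY_DIGEST = Digest::SHA256.hexdigest("dummy")
GENERIC_ERROR = { status: 401, error: "Invalid credentials" }.freeze

def login(email, password)
  # Always compare against some digest so unknown emails follow the same path.
  stored = USERS.fetch(email, DUMMY_DIGEST)
  candidate = Digest::SHA256.hexdigest(password)
  matches = OpenSSL.secure_compare(stored, candidate)

  return { status: 200, message: "Logged in" } if matches && USERS.key?(email)

  GENERIC_ERROR
end

puts login("ana@example.com", "wrong") == login("nobody@example.com", "wrong")
```

Doing the comparison even when the account does not exist also keeps response times closer together, which removes another signal an attacker could measure.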

6. IDOR (Insecure Direct Object Reference)

🚨 Threat:
IDOR happens when users can access resources they do not own by changing identifiers in URLs or request parameters.

For example:


User.find(params[:id])

If ownership checks are missing, changing /users/42 to /users/43 could expose another user’s data.

🛡️ Mitigation:
Always scope records through the authenticated user or an authorization policy.

Instead of:


Post.find(params[:id])

Prefer:


current_user.posts.find(params[:id])

Authorization libraries like Pundit or CanCanCan also help enforce access rules consistently across the application. I also avoid exposing raw database IDs directly to the frontend. Instead, I use Sqids to generate less predictable public IDs, which makes simple enumeration harder, although obfuscated IDs are not a substitute for ownership checks.
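The scoping idea can be shown without Rails. In this plain-Ruby sketch, the Post struct, the sample data, and both lookup helpers are hypothetical stand-ins for an Active Record model and its queries.

```ruby
Post = Struct.new(:id, :user_id, :title)

POSTS = [
  Post.new(1, 42, "Ana's draft"),
  Post.new(2, 43, "Ben's draft")
].freeze

class NotFound < StandardError; end

# Unscoped lookup: any caller can read any post by guessing the id.
def find_post(id)
  POSTS.find { |post| post.id == id } || raise(NotFound)
end

# Scoped lookup: the owner is part of the query, so another user's post
# looks exactly like a missing record.
def find_post_for(current_user_id, id)
  POSTS.find { |post| post.id == id && post.user_id == current_user_id } || raise(NotFound)
end

puts find_post(2).title            # readable by anyone who guesses the id
puts find_post_for(43, 2).title    # readable only by its owner
```

Raising the same "not found" error for both missing and foreign records also avoids confirming that a given id exists.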

7. Mass Assignment

🚨 Threat:
Mass assignment happens when the application accepts user input and blindly assigns it to model attributes.

An attacker could submit unexpected fields such as:


{
  "admin": true
}

If those fields are not filtered properly, the attacker may gain elevated privileges or modify protected data.

🛡️ Mitigation:
Use strong parameters in every controller.

In Rails, always whitelist allowed attributes using:


params.require(:user).permit(:email, :password)

Never pass raw params directly into create or update.

Sensitive fields like roles, permissions, ownership fields, or account status flags should never be user-assignable.
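Conceptually, strong parameters behave like an allowlist filter over the incoming hash. This plain-Ruby sketch mimics that behavior with Hash#slice; the constant PERMITTED_USER_FIELDS and the helper permitted_user_params are hypothetical and only illustrate the idea.

```ruby
PERMITTED_USER_FIELDS = %w[email password].freeze

# Keep only explicitly allowed keys; everything else is dropped.
def permitted_user_params(raw)
  raw.slice(*PERMITTED_USER_FIELDS)
end

payload = { "email" => "ana@example.com", "password" => "s3cret!", "admin" => true }
puts permitted_user_params(payload).keys.inspect
```

Because the filter is an allowlist, a newly added sensitive column stays unassignable until someone deliberately permits it.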

8. Excessive Data Exposure

🚨 Threat:
Excessive data exposure happens when an API returns more information than the client actually needs.

This often happens when entire Active Record objects are rendered directly into JSON responses.

Sensitive data such as password digests, internal IDs, permissions, API keys, or private metadata may accidentally leak through the API.

🛡️ Mitigation:
Only return the fields the client actually needs.

Instead of blindly rendering full objects:


render json: @user

Use serializers or custom JSON responses that explicitly define safe attributes.

Sensitive fields should never appear in API responses.

You should also regularly review serialized responses to make sure no internal data is leaking unintentionally.
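A minimal sketch of an explicit serializer in plain Ruby. The User struct, its fields, and serialize_user are assumptions for illustration; serializer libraries offer the same allowlist idea with more features.

```ruby
require "json"

User = Struct.new(:id, :email, :name, :password_digest, :api_key, keyword_init: true)

# Explicit allowlist: a new attribute never reaches clients unless it is
# deliberately added here.
def serialize_user(user)
  { id: user.id, email: user.email, name: user.name }
end

user = User.new(id: 1, email: "ana@example.com", name: "Ana",
                password_digest: "hashed-value", api_key: "secret-key")
puts JSON.generate(serialize_user(user))
```

Only id, email, and name appear in the output, even though the object itself carries the digest and the API key.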

9. MITM (Man-in-the-Middle)

🚨 Threat:
A Man-in-the-Middle attack happens when an attacker intercepts traffic between the client and server.

Without HTTPS, credentials, tokens, cookies, and other sensitive data can travel in plain text and be stolen or modified.

Attackers on the same network, malicious proxies, or compromised routers can hijack sessions or impersonate users.

🛡️ Mitigation:
Always enforce HTTPS.

In Rails, enable:


config.force_ssl = true

This redirects HTTP requests to HTTPS, marks cookies as Secure, and sends HSTS headers.

Authentication cookies should also use the Secure and HttpOnly flags.

You should additionally review the HSTS settings (for example, whether subdomains are included) and avoid loading insecure mixed-content resources.

10. Token Theft

🚨 Threat:
Token theft happens when an attacker gains access to a valid authentication token and uses it to impersonate a user.

Stolen JWTs can come from XSS attacks, insecure storage, leaked logs, browser extensions, compromised devices, or intercepted traffic.

If tokens remain valid for a long time, the attacker may keep access even after the user notices something is wrong.

🛡️ Mitigation:
Reduce token exposure and keep token lifetimes short.

Prefer storing tokens in secure HttpOnly cookies instead of localStorage.

Use:

  • short-lived access tokens

  • refresh token rotation

  • token revocation mechanisms

You should also avoid exposing tokens in logs or URLs and protect the application against XSS vulnerabilities.
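Refresh token rotation is easier to reason about with a small model. The plain-Ruby sketch below keeps tokens in memory; the class RefreshTokens and its reuse-detection policy (revoking all of a user's tokens when an already-rotated token is replayed) are assumptions for illustration, not a specific library's behavior.

```ruby
require "securerandom"

# In-memory refresh-token store with rotation (illustration only).
class RefreshTokens
  def initialize
    @active = {}   # token => user_id
    @used = {}     # already-rotated token => user_id
  end

  def issue(user_id)
    token = SecureRandom.hex(32)
    @active[token] = user_id
    token
  end

  # Exchanges a refresh token for a new one; each token works once.
  # Replaying a rotated token revokes every active token for that user.
  # Returns nil when the token is unknown, reused, or revoked.
  def rotate(token)
    if (user_id = @active.delete(token))
      @used[token] = user_id
      issue(user_id)
    elsif (user_id = @used[token])
      revoke_all(user_id)
      nil
    end
  end

  def revoke_all(user_id)
    @active.delete_if { |_, owner| owner == user_id }
  end
end
```

If an attacker steals a refresh token and the legitimate client later uses the same token, the replay is detected and both parties are forced to log in again.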

11. Verbose Error Messages

🚨 Threat:
Verbose error messages expose internal application details to attackers.

Stack traces, database errors, framework versions, SQL queries, and file paths can all help attackers understand how the system works and make exploitation easier.

🛡️ Mitigation:
Production applications should return generic and safe error responses.

Instead of exposing internal exceptions, return messages such as:

  • Internal Server Error

  • Invalid request

Detailed errors should only be logged internally for debugging.

In Rails, make sure debug pages and detailed exceptions are disabled in production.
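The pattern of logging details internally while returning a generic body can be sketched in plain Ruby. The handle_request helper and the idea of including a short reference id in the response are assumptions for this example; in Rails the equivalent logic usually lives in an exception handler on the base controller.

```ruby
require "json"
require "logger"
require "securerandom"
require "stringio"

LOG_OUTPUT = StringIO.new          # stands in for the real log destination
LOGGER = Logger.new(LOG_OUTPUT)

# Runs a request handler. On failure, logs the details internally and
# returns a generic body plus a reference id for support requests.
def handle_request
  [200, JSON.generate(yield)]
rescue StandardError => e
  error_id = SecureRandom.hex(4)
  LOGGER.error("[#{error_id}] #{e.class}: #{e.message}")
  [500, JSON.generate({ error: "Internal Server Error", reference: error_id })]
end

status, body = handle_request { raise "PG::UndefinedColumn: users.secret" }
puts status
puts body
```

The client sees only the generic message, while the log entry with the same reference id still contains everything needed for debugging.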

Final Thoughts

These are some of the most important security risks to think about when building APIs, and we will revisit them throughout this series as we implement each feature step by step.

In Part 2 we will start building the Rails 8 API from scratch and set up the project foundation correctly from the beginning, including authentication, secure configuration, and API structure.

Follow along if you want to get notified when the next part is published.

The week your AI coding tier got smaller

In 48 hours this week, two of the biggest AI coding platforms confirmed the same thing: your unlimited subscription was never sustainable for how you actually use it. The provider will be the one who decides when to cut you off.

Anthropic silently removed Claude Code from Pro on a “2% A/B test” (later reversed). Their Head of Growth justified it saying “usage has changed a lot and our current plans weren’t built for this.” GitHub paused new Copilot Pro signups and dropped Opus from Pro entirely.

One dev on HN said sending 3-4 messages to Opus 4.7 blew through their $20 plan limits and consumed $10 of extra usage.

Simon Willison framed the trust break: “Should I be taking a bet on Claude Code if I know that they might 5x the minimum price of the product?”

The structural takeaway for any team shipping AI features: the invoice is the governance boundary, not the plan page. The provider’s unit economics are now public. Every user is a small loss when they exceed the pricing assumption, and no vendor has found the pricing floor yet.

Teams that cannot meter their own spend per-customer, per-agent, per-task are now one pricing memo away from being unprofitable overnight.

The concrete fix:

  1. track your tokens (not the invoice’s)
  2. use per-customer attribution (so you know whose usage is killing you)
  3. implement hard budget caps at the agent level. Alerts don’t stop a runaway loop.
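The three steps above can be sketched as a small guard that runs before every model call. This is a minimal Ruby sketch under stated assumptions: the class TokenBudget, the BudgetExceeded error, and the per-customer cap are hypothetical and are not part of any product named in this post.

```ruby
class BudgetExceeded < StandardError; end

# Tracks token spend per customer and refuses calls past a hard cap.
class TokenBudget
  attr_reader :spent

  def initialize(cap_per_customer)
    @cap = cap_per_customer
    @spent = Hash.new(0)   # customer_id => tokens used so far
  end

  # Call before each model request with the estimated token usage.
  # Raises instead of alerting, so a runaway loop stops immediately.
  def charge!(customer_id, tokens)
    if @spent[customer_id] + tokens > @cap
      raise BudgetExceeded, "#{customer_id} would exceed #{@cap} tokens"
    end

    @spent[customer_id] += tokens
  end
end

budget = TokenBudget.new(1_000)
budget.charge!("cust_883", 600)
puts budget.spent["cust_883"]
```

Raising before the call is made is the point of a hard cap: the request that would break the budget never reaches the provider.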

This is exactly what LLM Budget Guard is being built for.

Here is how a wrapper around the SDK produces per-customer token attribution without waiting for invoice day:

import { wrapOpenAI } from 'llmeter';
import OpenAI from 'openai';

const openai = wrapOpenAI(new OpenAI(), {
  projectId: 'prod-cluster',
  tenantId: 'cust_883'
});

// Cost is now tracked per customer automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Generate report' }]
});

Track your costs early. Check out LLMeter to get started with attribution.

Developer Ecosystem Survey 2026 – Take Part in One of the Largest Developer Studies

Since 2017, we’ve been checking in with developers around the world to better understand how the industry is evolving and where software development is headed next.

This year marks the tenth edition of the Developer Ecosystem Survey, and we’d love for you to take part.

When we launched the first survey, Kotlin was just emerging, and AI coding tools were still years away. Today, they are part of everyday development.

Every year, tens of thousands of developers share their experiences, helping create one of the most comprehensive pictures of the tools, technologies, and challenges shaping modern development. The survey insights are widely used across the developer community – from researchers and industry analysts to teams building developer tools.

Whether you’re building large-scale systems, mobile apps, games, or experimenting with side projects, your perspective matters.

Set aside about 30 minutes, grab a drink, get comfortable, and tell us about your experience as a developer.

TAKE THE SURVEY

Have your say and get a chance to win one of these prizes:

  • MacBook Pro 16″
  • USD 1,000 Amazon Gift Card or alternative
  • USD 150 JetBrains Merchandise Store voucher
  • One-year JetBrains All Products Pack subscription
  • A guaranteed 30% discount for an individual JetBrains license

The more developers who participate, the clearer the picture we can build of today’s software development ecosystem. When you’re done, you’ll receive a personal referral link to share with friends and colleagues. The participants who bring in the most responses via their referral link will receive an additional prize.

As always, we’ll publish the results in detailed infographics and reports, and we’ll release the anonymized raw data for anyone who wants to explore the findings further.

Thank you for helping us capture a snapshot of where development is headed in 2026 – and for being part of the global developer community that has supported this initiative for the past decade.

IntelliJ IDEA 2025.3.5 is Out!

We’ve just released IntelliJ IDEA 2025.3.5. This version includes performance improvements for Spring projects – specifically for users who haven’t yet updated to v2026.1:

  • Searches for declared Spring beans are no longer triggered during typing or completion, ensuring code completion works smoothly in Spring-based projects. [IDEA-378966]

You can update to this version from inside the IDE, using the Toolbox App, or using snaps if you are an Ubuntu user. You can also download it from our website.

For a comprehensive overview of the fixes, see the release notes. If you spot any issues, let us know via the issue tracker.

Happy developing!

Rethinking The Experience Of System Tools

This article is sponsored by MacPaw

Your grandmother’s vacuum was a trusty but ugly workhorse hidden in a dark closet. Dyson turned that practical tool into an aspirational product, one you love leaving out even when guests come over. Dish soap was just dish soap until Method put it in a glass container, and it became an addition to, not a distraction from, the aesthetics of your kitchen. Physical product brands spent the last two decades transforming mundane, practical items like soap and vacuums into must-have experiences.

But utility software — especially maintenance tools, a type of system software designed to analyze, configure, optimize, and maintain a computer — hasn’t made that leap from something you open as a chore to an experience you choose with excitement. And that means those brands are missing an interesting design opportunity: these tools are well overdue for a more intelligent, more human, and less emotionally flat approach.

“The Most Underexplored Frontier In UX Is The Maintenance Layer.”

Utility software still feels like a chore. Using it has all the excitement of pulling out that dusty old vacuum from the back of the closet. These four common software design assumptions illustrate why the category hasn’t yet transcended its chore status.

  • Assuming the user already resents the task: they’re here because something is wrong, not because they chose to open this tool. Designing accordingly means assuming they want the software to be fast, clinical, invisible, and something to get out of the way, not get into. But a design built for resentment produces tools that deserve it. If you expect your users to want to get out of the product as fast as possible, they’ll feel it in the design.
  • Assuming function is enough and feelings are for consumer apps: emotion in interface design is decoration. The maintenance layer is infrastructure, and nobody decorates infrastructure. But nobody decorated dish soap either, until Method. They didn’t change the product, just the user’s relationship to the tool they use to accomplish a task.
  • Assuming your users are not your fans because nobody cares about maintenance tools: utility tools don’t build communities, and nobody posts about running a disk cleanup. But people care deeply about tools that respect their time and make complex things simple for them to use. The MacPaw team listens to our community and implements many of the features they ask for, because we know users can be fans too, and they should shape how our products work.
  • Assuming that designers shouldn’t waste pixels on personality: you need to hide complexity and show minimal UI. Utility software should look neutral, technical, and forgettable.

But when software hides the system, people lose trust in it.

Design always starts with function — function shapes form. But if that function can’t be made completely invisible and people still have to interact with it, it inevitably becomes part of their experience. In that case, people expect it not just to work, but to match their environment, influence their mood, and contribute to their overall experience.

A good example is a watch. Its core function is simple: show the time. But because a watch occupies physical space in a person’s world, you want more from it than just functionality. It needs to play an aesthetic role and complement the environment.

“The Maintenance Layer Is A Behavioral Problem, Not Just A UX One.”

The user experience in utility software matters more than the industry tends to admit. In utility software, experience is not something added on top of function. It emerges from how the function is structured, explained, and interacted with. If you think you can design the most functional app on the market without considering how users understand and experience the process, you’re missing an opportunity to build a relationship with that user.

Part of that ignored UX element is a behavioral problem: users don’t avoid utility software because using it is hard, but because it produces no positive emotional signal at any point. The problem is rarely complexity. It’s the absence of meaningful interaction during the process of using the app.

Another issue is focusing solely on function. The aesthetic-usability effect shows that when something looks better, people perceive it as easier to use: in a 1995 study, ATM screens were judged easier to use when the screen layout was more attractive. Even something as purely functional as an ATM screen display needs attention to how the function is structured, presented, and perceived.

And then there’s the memory problem. People remember the emotional peak and the ending of an experience, not the average. A completed process that ends with a clear “done” is remembered more positively than one that just fades out, even if the end task is completed successfully in both cases. System tools rarely intentionally design the ending of an interaction — they just stop running.

“Thoughtful System Design Can Transform Maintenance From A Technical Chore Into A Seamless User Experience.”

What does emotional design actually mean, then, in utility UX? Here are three principles the MacPaw team follows to design its products against the category norm.

Translating system complexity into human language

Maintenance tools deal with storage, task management, and background processes. Good design explains what’s happening, avoids system jargon, and communicates outcomes clearly.

Linear illustrates this principle: it settled on straightforward units of work, like projects and teams, that any new user can immediately understand. That helps them spend less time ramping up and more time building.

Make the process clear and show progress

System tools run complex processes. Design should show progress, impact, and system change to create trust and control.

Vercel’s deployment infrastructure is an excellent example here. When you trigger a build, the browser tab favicon changes — a spinner while building, a green checkmark when done, a red X if it fails. It’s ruthlessly functional, not visual or warm, but it’s emotionally intelligent: it exists purely to reduce the low-level anxiety of waiting for a build to finish.

Design the moment of completion

Maintenance tasks often end quietly. But completion is the emotional payoff. Design should emphasize clarity of results, a sense of resolution, and visible improvement so users remember a positive and distinct ending.

Take the new CleanMyMac by MacPaw after its 2024 major update. Unlike the maintenance utility category norm, CleanMyMac uses visual language, including color, depth, motion, icons, and 3D illustrations, to shift the focus from diagnosing problems to showing progress: space cleared, threats removed, time saved. Instead of confronting the user with what’s wrong, the interface closes with a picture of a machine that’s already working better.

The task is the same, but the ending tells a different story.

“Even if you don’t care about emotional design as a principle, the change is coming anyway.”

The market is forcing this issue even for those who don’t find the argument I’ve made here compelling.

That’s partly generational: designers and users who grew up with Linear, Figma, and Notion have a completely different baseline for the tools they use. For them, good software is not a happy accident but a given. That generation is now the primary audience for maintenance software, so the old “it’s fine, it’s just a utility” excuse no longer works philosophically or commercially. Just as Dyson and Method changed how entire product categories approached design, utility software is shifting for good.

And digital fatigue is the current cultural state. The resurgence of vinyl records, film cameras, and dumbphones is not merely nostalgia, but a signal that the emotional relationship between people and their tools is changing.

The question has shifted from whether your utility software should feel better to use to whether it can afford not to.

# How to Run Qwen3.6-35B on Your Mac at 77 tok/s

Level: intermediate

Estimated time: 20-40 minutes (most of it is the model download)

Minimum requirements: Mac with Apple Silicon (M1/M2/M3/M4) and 48 GB of unified RAM

What are we setting up?

A local server compatible with the OpenAI API that runs the Qwen3.6-35B-A3B model (quantized to 4 bits) using MLX, Apple’s machine learning framework for Apple Silicon. When you’re done, you’ll have an endpoint at http://127.0.0.1:7979 that you can point any OpenAI-compatible client at (OpenCode, Continue, Cursor, etc.).

| Metric | Measured value |
| --- | --- |
| Generation throughput | ~77 tok/s |
| TTFT (time-to-first-token) | ~0.25 s |
| Context window | 65 536 – 131 072 tokens |
| RAM required | ~20 GB model + ~12 GB KV cache |

Prerequisites

Hardware

  • Mac with Apple Silicon chip (M1 Pro/Max/Ultra or M2/M3/M4 equivalents)
  • Minimum 48 GB of unified RAM (the quantized model takes ~20 GB; the KV cache needs up to 12 GB additional)

Software

# Check Python version (you need 3.11+)
python3 --version

# Check that you have git
git --version

If you don’t have Python 3.11, install it with Homebrew:

brew install python@3.11

Step 1 — Create the virtual environment

From the folder where you want to install everything:

mkdir mlx-server && cd mlx-server
python3.11 -m venv .venv
source .venv/bin/activate

Step 2 — Install dependencies

pip install --upgrade pip

# MLX and the OpenAI API-compatible server
pip install mlx-lm
pip install mlx-openai-server

Verify the installation:

mlx-openai-server --help

Step 3 — Download the model

The model is automatically downloaded from Hugging Face the first time you run it. It takes approximately 20 GB of disk space.

# Optional pre-download (recommended to track progress)
python3 -c "
from mlx_lm import load
model, tokenizer = load('mlx-community/Qwen3.6-35B-A3B-4bit')
print('Model downloaded successfully')
"

Note: Some Hugging Face repositories require a huggingface.co account and acceptance of the model’s terms before download. This model does not.

Step 4 — Start the server

Option A — Direct command (simpler)

mlx-openai-server launch \
  --model-path mlx-community/Qwen3.6-35B-A3B-4bit \
  --model-type lm \
  --host 127.0.0.1 \
  --port 7979 \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3_5 \
  --enable-auto-tool-choice \
  --context-length 65536 \
  --temperature 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0.0 \
  --repetition-penalty 1.05 \
  --max-bytes 12884901888 \
  --prompt-cache-size 3 \
  --log-level INFO

Option B — Startup script (recommended)

Save the following script as start-mlx-server.sh:

#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV="$SCRIPT_DIR/.venv"

# Default profile: high_context
# Change with: MLX_PROFILE=baseline ./start-mlx-server.sh
PROFILE="${MLX_PROFILE:-high_context}"

MODEL_PATH="mlx-community/Qwen3.6-35B-A3B-4bit"
HOST="127.0.0.1"
PORT="7979"

TOOL_CALL_PARSER="qwen3_coder"
REASONING_PARSER="qwen3_5"

TEMPERATURE="0.7"
TOP_P="0.8"
TOP_K="20"
MIN_P="0.0"
REPETITION_PENALTY="1.05"
MAX_CACHE_BYTES="12884901888"  # 12 GB

DRAFT_MODEL="mlx-community/Qwen3.5-0.8B-MLX-4bit"
NUM_DRAFT_TOKENS="${MLX_NUM_DRAFT_TOKENS:-4}"

case "$PROFILE" in
    baseline)
        CONTEXT_LENGTH="65536"
        PROMPT_CACHE_SIZE="3"
        EXTRA_ARGS=""
        ;;
    high_context)
        CONTEXT_LENGTH="131072"
        PROMPT_CACHE_SIZE="5"
        EXTRA_ARGS=""
        ;;
    speculative)
        CONTEXT_LENGTH="65536"
        PROMPT_CACHE_SIZE="3"
        EXTRA_ARGS="--draft-model-path ${DRAFT_MODEL} --num-draft-tokens ${NUM_DRAFT_TOKENS}"
        ;;
    speculative_high)
        CONTEXT_LENGTH="131072"
        PROMPT_CACHE_SIZE="5"
        EXTRA_ARGS="--draft-model-path ${DRAFT_MODEL} --num-draft-tokens ${NUM_DRAFT_TOKENS}"
        ;;
    *)
        echo "Unknown PROFILE: $PROFILE"
        echo "Options: baseline, high_context, speculative, speculative_high"
        exit 1
        ;;
esac

# EXTRA_ARGS is left unquoted on purpose so it expands to separate flags (or to nothing).
exec "$VENV/bin/mlx-openai-server" launch \
    --model-path "$MODEL_PATH" \
    --model-type lm \
    --host "$HOST" \
    --port "$PORT" \
    --tool-call-parser "$TOOL_CALL_PARSER" \
    --reasoning-parser "$REASONING_PARSER" \
    --enable-auto-tool-choice \
    --context-length "$CONTEXT_LENGTH" \
    --temperature "$TEMPERATURE" \
    --top-p "$TOP_P" \
    --top-k "$TOP_K" \
    --min-p "$MIN_P" \
    --repetition-penalty "$REPETITION_PENALTY" \
    --max-bytes "$MAX_CACHE_BYTES" \
    --prompt-cache-size "$PROMPT_CACHE_SIZE" \
    --log-level INFO \
    $EXTRA_ARGS

Make the script executable and run it:

chmod +x start-mlx-server.sh
./start-mlx-server.sh

Usage examples:

./start-mlx-server.sh                                      # high_context (default)
MLX_PROFILE=baseline ./start-mlx-server.sh                # maximum throughput
MLX_PROFILE=speculative ./start-mlx-server.sh             # speculative decoding
MLX_PROFILE=speculative MLX_NUM_DRAFT_TOKENS=6 ./start-mlx-server.sh

Step 5 — Verify it works

In another terminal, send a test request:

curl http://127.0.0.1:7979/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen3.6-35B-A3B-4bit",
    "messages": [{"role": "user", "content": "Hello, what is 2+2?"}],
    "max_tokens": 100
  }'

You should see a JSON response with the choices[0].message.content field.
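The same check can be done from Python with only the standard library. This is a minimal sketch that mirrors the curl request above; the helper names (`build_payload`, `extract_content`, `ask`) are illustrative and not part of mlx-openai-server.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7979/v1"  # endpoint configured in this guide
MODEL = "mlx-community/Qwen3.6-35B-A3B-4bit"


def build_payload(prompt: str, max_tokens: int = 100) -> dict:
    """Build the same chat-completions body as the curl example."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def extract_content(response: dict) -> str:
    """Pull choices[0].message.content out of a parsed response."""
    return response["choices"][0]["message"]["content"]


def ask(prompt: str, timeout: float = 120.0) -> str:
    """POST the prompt to the local server and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_content(json.load(reply))
```

With the server running, `print(ask("Hello, what is 2+2?"))` should print the model's reply. A connection error here usually means the server is not listening on port 7979.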

Stopping the server

pkill -f mlx-openai-server

Or save the following as stop-mlx-server.sh and run it:

#!/usr/bin/env bash
pkill -f mlx-openai-server && echo "Server stopped."

Connect with your favorite client

The server exposes an OpenAI-compatible API. Just point the base_url to your local server.

OpenCode

Create or edit the opencode.json file in the root of your project:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "mlx-local": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "MLX Local (Qwen3.6-35B)",
      "options": {
        "baseURL": "http://127.0.0.1:7979/v1"
      },
      "models": {
        "mlx-community/Qwen3.6-35B-A3B-4bit": {
          "name": "Qwen3.6-35B-A3B-4bit (local MLX)",
          "limit": {
            "context": 65536,
            "output": 32768
          }
        }
      }
    }
  }
}

Continue / Cursor

Base URL: http://127.0.0.1:7979/v1
API Key:  any-value  (the server does not validate it)
Model:    mlx-community/Qwen3.6-35B-A3B-4bit

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:7979/v1",
    api_key="local"
)

response = client.chat.completions.create(
    model="mlx-community/Qwen3.6-35B-A3B-4bit",
    messages=[{"role": "user", "content": "Explain what a transformer is"}]
)
print(response.choices[0].message.content)

Configuration profiles

| Profile | Context | Cache | tok/s measured | When to use |
| --- | --- | --- | --- | --- |
| baseline | 65 536 | 3 entries | 77.4 | Maximum throughput |
| high_context | 131 072 | 5 entries | 75.7 | Long documents, extended contexts (default) |

The performance difference between the two profiles (~2%) is within the noise margin. Use high_context if you work with large files or very long conversations.
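To make the ~2% figure concrete, here is a short worked calculation using the two measured throughputs from the table. The 1,000-token answer length is an arbitrary illustration, not a benchmark setting.

```python
baseline_tps = 77.4       # tok/s, baseline profile (from the table)
high_context_tps = 75.7   # tok/s, high_context profile (from the table)

# Relative throughput drop when moving from baseline to high_context.
drop = (baseline_tps - high_context_tps) / baseline_tps
print(f"Throughput drop: {drop:.1%}")

# Wall-clock time to generate a 1,000-token answer under each profile.
for name, tps in [("baseline", baseline_tps), ("high_context", high_context_tps)]:
    print(f"{name}: {1000 / tps:.1f} s per 1,000 tokens")
```

In practice the gap is about a third of a second per 1,000 generated tokens.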

Key parameters explained

| Parameter | Value | Why it matters |
| --- | --- | --- |
| --max-bytes | 12884901888 (12 GB) | Critical. Without this limit the model’s KV cache (MoE architecture with ArraysCache) grows unchecked until it exhausts RAM on contexts >30k tokens |
| --prompt-cache-size | 3 (LRU entries) | Limits how many conversations the prefix cache keeps in memory |
| --context-length | 65536 (64k tokens) | Maximum context window per request |
| --temperature | 0.7 | Balance between creativity and coherence |
| --repetition-penalty | 1.05 | Reduces repetitions in long responses |
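The --max-bytes value is simply 12 GiB written out in bytes. The sketch below shows the conversion and a rough memory budget for a 48 GB machine, using the figures quoted in this guide. The helper name `gib_to_bytes` is illustrative, and the budget treats GB and GiB as interchangeable, so read it as an estimate.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to the byte count expected by --max-bytes."""
    return gib * GIB


max_bytes = gib_to_bytes(12)
print(max_bytes)  # value to pass to --max-bytes for a 12 GiB cache limit

# Rough budget from this guide: 48 GB total, ~20 GB weights, up to 12 GB cache.
total_gb, model_gb, cache_gb = 48, 20, 12
headroom_gb = total_gb - model_gb - cache_gb
print(f"Left for macOS and other apps: ~{headroom_gb} GB")
```

If you want a smaller cache on a tighter machine, pass a lower number (for example `gib_to_bytes(8)`) and expect less long-context headroom.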

Troubleshooting

The server disconnects after 30,000 tokens

This was a known bug with the Qwen3.6-35B-A3B model due to its hybrid MoE architecture. The fix is to make sure you pass --max-bytes 12884901888. With this parameter the server works correctly up to 60,000+ tokens (verified).

Architecture notes (for the curious)

Qwen3.6-35B-A3B is a hybrid MoE (Mixture of Experts) model. Instead of activating all parameters per token, it only activates a subset of “experts”, making it efficient for its size. The 4bit version quantizes the weights to 4 bits, reducing RAM usage from ~70 GB to ~20 GB with minimal quality loss.
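The ~70 GB and ~20 GB figures follow from a back-of-the-envelope estimate: parameter count times bits per weight. The sketch below assumes "35B" means 35 billion total parameters and ignores quantization metadata (scales, biases) and any layers kept at higher precision, which is presumably why the real 4-bit download is closer to 20 GB than to the raw estimate.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring overhead."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 35e9  # assumption: "35B" taken as 35 billion total parameters

fp16_gb = weight_memory_gb(N_PARAMS, 16)
q4_gb = weight_memory_gb(N_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{q4_gb:.1f} GB (before quantization overhead)")
```

Note that all experts must be resident in memory even though only a subset is active per token, so MoE reduces compute per token, not the size of the weights.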

MLX leverages Apple Silicon’s unified memory: the GPU and CPU share the same RAM pool, eliminating the transfer bottleneck that exists in systems with a dedicated GPU. That’s why a Mac with 48 GB can hold the ~20 GB of weights and the 12 GB KV cache in a single pool, whereas on a PC the same model would need a discrete GPU with enough VRAM to hold both.

References

  • MLX on GitHub
  • mlx-lm
  • Model on Hugging Face
  • mlx-openai-server

[Day 3] I Had a Local LLM Analyze a Year of My Credit Card Statements


Intro

Day 3: I’m going to hand a year of credit card statements over to a local LLM and see what it can do.

This is experiment #3.

What I’m using today: DGX Spark + Ollama + Qwen2.5 (comparing 7B vs 72B). Ollama is the de-facto local-LLM runtime, and Qwen2.5 is a multilingual model from Alibaba (China) that handles Japanese reasonably well, apparently.

Today’s setup

  • Data: 12 months of credit card statements from a single card.
  • Volume: 383 transactions, ¥2,761,555 in total spend.
  • Goal: get the AI to spot waste patterns and propose savings.
  • Comparison axes:

    • Model size: 7B (light) vs 72B (heavy)
    • Input format: raw CSV vs pandas-aggregated summary
    • 4 patterns total

Takeaway: “If you ask an AI to aggregate raw data, the numbers come out way off.” / “If you pre-aggregate with a spreadsheet tool first and then feed the AI, you get fast and accurate results.” A small but practical finding.

1. Get the CSVs onto the DGX

Log into the credit card company’s web statements page on myPC1 (my Windows laptop), download 12 months of CSVs, then push them to the DGX.

I deliberately skipped GitHub for the transfer this time — once you push something, it’s in the history forever, and credit card data shouldn’t be there even briefly. Instead, I used direct PC-to-PC transfer over SSH (one command, finishes in seconds; details in the collapsibles at the end). The .gitignore excludes private-data/ too, so accidental commits are ruled out.

2. Install Ollama

Ollama is the de-facto runtime for local LLMs. One command should be enough.

There was a small password hiccup during install (details below), but eventually it was up and running.

The DGX Spark specs really show through:

  • Memory: 121 GB
  • Default context window: ~262,144 tokens

In other words: “throw a whole book at it, no problem” territory. Reassuring.

3. Two model sizes: Qwen2.5 7B vs 72B

The strategy: same model family, different sizes. That way the differences come from size, not architecture.

  • 7B (light): ~4.7 GB, downloads in 5 minutes. Fast.
  • 72B (heavy): ~47 GB, 25 minutes to download. Slow but smart.

What does “B” mean? Short for Billion. It’s the number of “weights” inside the AI — more weights, more it remembers, basically. So 7B has 7 billion weights, 72B has 72 billion.

Loading both onto the DGX simultaneously, memory usage looks like:

| AI model | Memory occupied |
| --- | --- |
| qwen2.5:72b | 61 GB |
| qwen2.5:7b | 8.2 GB |
| Total | 69 GB |

69 GB. Spacious!

4. Prepping the CSVs

Once I had the CSVs in hand, three small headaches before they were ready for the AI:

  • Headache 1: An older encoding (Windows Japanese flavor) → needs converting to modern UTF-8
  • Headache 2: Some merchant names contain commas, which breaks naive CSV parsing
  • Headache 3: Each file has a “monthly total” line at the end that isn’t actually data

Details in the collapsible. After cleanup, the 12 files merge into a single dataset:

| Item | Value |
| --- | --- |
| Transactions | 383 |
| Period | 12 months (1 year) |
| Total spend | ¥2,761,555 |
| Avg per tx | ¥7,210 |
| Median per tx | ¥3,000 |
| Largest single tx | ¥209,283 (overseas flight) |
| Smallest | ¥-3,980 (refund) |
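For reference, summary figures like these take only a few lines to compute. The sketch below uses the standard library rather than pandas so it runs anywhere, and the `sample` amounts are made up for illustration (only the largest and smallest values echo the table).

```python
from statistics import mean, median


def summarize(amounts: list[int]) -> dict:
    """Compute the same summary rows as the table from a list of yen amounts."""
    return {
        "transactions": len(amounts),
        "total": sum(amounts),
        "mean": round(mean(amounts)),
        "median": median(amounts),
        "largest": max(amounts),
        "smallest": min(amounts),  # negative values are refunds
    }


# Hypothetical amounts, not the real statement data.
sample = [1200, 3000, -3980, 209283, 980]
print(summarize(sample))

# Sanity check on the real totals: total spend / transaction count.
print(round(2_761_555 / 383))  # average per transaction in yen
```

The mean sitting far above the median is what a handful of large charges (like the flight) does to the average, which is why both are worth reporting.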

Now to feed this to 7B and 72B and see what each of them says.

5. Experiment 1: Throw the raw CSV at the AI

No tricks: all 383 rows, straight at the AI. Prompt is the full ask: “As a household budget consultant, output category breakdown / monthly trend / waste patterns / savings suggestions / lifestyle hypothesis.”

7B’s answer (75 seconds)

…this is where the numbers go wildly off.

| Item | What 7B said | Real data | Match? |
| --- | --- | --- | --- |
| Amazon total | ¥2,014,386 (257 tx) | ¥693,663 (166 tx) | No |
| Amazon Downloads | ¥2,014,386 (257 tx) | ¥80,323 (50 tx) | No |
| Outdoor brand | ¥495,740 | ¥154,820 | No |
| A local recreation venue | “¥49,574” cited | (a different small charge actually exists) | No |

None of the numbers line up. Amazon total is roughly 3× off, Amazon Downloads about 25× off, and the cited venue context is a different charge entirely.

Reading 383 rows of CSV and computing totals turned out to be a heavy lift for the 7B model.

72B’s answer (12m 9s)

What if we throw size at the problem? After 12 minutes of patience:

| Item | What 72B said | Real data | Match? |
| --- | --- | --- | --- |
| Amazon total | ¥635,792 (104 tx) | ¥693,663 (166 tx) | Not exact |
| AI/dev tools | ¥193,629 (21 tx) | ¥176,850 (24 tx) | Not exact |
| Travel | ¥487,555 (43 tx) | ¥416,268 (8 tx) | Not exact |

Not exact, but the amounts are off by roughly 8–17% rather than by multiples, and there are no fabricated venues. A real improvement.

However — when asked about the monthly trend, here’s what 72B said:

Month 1: ¥316,789 → Month 2: ¥229,600 → Month 3: ¥237,500 → … → Month 12: ¥291,500
(Gradually increasing.)

The actual range is ¥69,961 (low) to ¥493,072 (high) — a chaotic up-and-down waveform. “Gradually increasing” isn’t quite right. Even 72B isn’t great at aggregating distributed data over a long CSV.

6. Experiment 2: Aggregate first, then feed the AI

If the AI struggles with aggregation, do the aggregation in a different tool first and only hand the AI the result.

The flow:

📥 Raw CSV (22,132 chars, 383 rows)
       ↓
🔧 Pre-aggregate with a spreadsheet tool (Python's pandas)
       ↓
📋 Aggregate summary (1,884 chars, ~90% smaller)
       ↓
🤖 Hand it to the AI (let it interpret and propose)

Python’s pandas = a library for tabular data analysis. Think of it as a scriptable spreadsheet that is far more capable than Excel formulas.
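The pre-aggregation step can be sketched as follows. The article uses pandas; this version uses only the standard library so it runs without extra installs, and the rows, merchant names, and the `build_summary` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cleaned rows: (date "YYYY/MM/DD", merchant, category, yen).
transactions = [
    ("2025/01/05", "Amazon", "Online (Amazon)", 4200),
    ("2025/01/19", "Airline X", "Travel/airline", 58000),
    ("2025/02/02", "Amazon", "Online (Amazon)", 1800),
    ("2025/02/11", "App Store", "App/subscription", 980),
]


def build_summary(rows) -> str:
    """Collapse raw rows into a short text summary to paste into the prompt."""
    by_month = defaultdict(int)
    by_category = defaultdict(int)
    for date, _merchant, category, amount in rows:
        by_month[date[:7]] += amount  # "YYYY/MM" prefix identifies the month
        by_category[category] += amount
    total = sum(by_month.values())

    lines = [f"Total spend: {total} yen over {len(rows)} transactions"]
    lines.append("Monthly totals:")
    lines += [f"  {month}: {value}" for month, value in sorted(by_month.items())]
    lines.append("Category totals (share of total):")
    for category, value in sorted(by_category.items(), key=lambda kv: -kv[1]):
        lines.append(f"  {category}: {value} ({value / total:.1%})")
    return "\n".join(lines)


summary = build_summary(transactions)
print(summary)
```

The model then only has to quote and interpret these numbers, which is the part it is good at; every sum and percentage was computed deterministically before the prompt was built.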

7B + pre-aggregated input (50 seconds)

Numbers are fully accurate now.

| Item | What 7B said | Real data | Match? |
| --- | --- | --- | --- |
| Amazon total | ¥693,663 | ¥693,663 | Yes |
| AI/dev tools | ¥176,850 | ¥176,850 | Yes |
| Monthly max | ¥493,072 | ¥493,072 | Yes |
| Monthly min | ¥69,961 | ¥69,961 | Yes |

Quoting straight from the pre-aggregated numbers, the hallucinations vanished.

And 7B did this in 50 seconds — better quality than the 72B + raw CSV at 12 minutes. Quietly remarkable.

| | Before (raw CSV) | After (aggregated) |
| --- | --- | --- |
| Time | 75s | 50s |
| Numbers | wildly off | exact |
| Verdict | not usable as-is | quote directly |

72B + pre-aggregated input (12m 13s)

72B’s numbers also match exactly (well, since they’re being quoted from pre-aggregated data, that’s expected). The proposal quality was the strongest of the four patterns:

Reduce Amazon dependency

  • Current: online shopping (Amazon family) is 25.1% of total (¥693,663).
  • Suggestion: stick to essentials only, regular review, avoid impulse buys.
  • Expected savings: ¥57,805/month average (25% reduction) → ¥693,660/year

…wait, hold on. Annual Amazon spend was ¥693,663. The “savings” 72B suggests is ¥693,660. That’s basically the same number. So the proposal is effectively “stop buying on Amazon entirely (100%)” — definitely not 25%. Apparently 72B’s percentage arithmetic isn’t bulletproof either.
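Here is the arithmetic behind that double-take, as a quick check anyone can rerun. All inputs are the figures quoted above; only the variable names are mine.

```python
annual_amazon = 693_663          # yen/year, from the aggregated data
claimed_monthly_saving = 57_805  # yen/month, from 72B's proposal
claimed_rate = 0.25              # the "25% reduction" 72B stated

# What the proposed monthly saving actually adds up to over a year.
claimed_annual = claimed_monthly_saving * 12
implied_rate = claimed_annual / annual_amazon
print(f"Claimed saving: {claimed_annual} yen/year = {implied_rate:.0%} of Amazon spend")

# What a genuine 25% cut would look like.
expected_annual = annual_amazon * claimed_rate
expected_monthly = expected_annual / 12
print(f"A real 25% cut: {expected_annual:.0f} yen/year, {expected_monthly:.0f} yen/month")
```

It looks like the model divided the full annual figure by 12 and labeled the result as a 25% saving. A check like this is cheap to run on any number the model derives rather than quotes.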

That aside, the lifestyle hypothesis section was kind of striking. Here’s what 72B observed:

  • Heavy reliance on apps and subscriptions: “App/subscription” category is 10.5% of total
  • Frequent international travel: “Travel/airline” is 15.1%, with notable overseas charges
  • Frequent online shopping: “Online (Amazon)” is 25.1% of total

It’s just one card’s data, so this isn’t a complete picture — but if I fed an AI my full household financials, the analysis and advice would probably go a lot deeper.

Summary: 4 patterns

| # | Model | Input | Time | Numerical accuracy | Proposal quality |
| --- | --- | --- | --- | --- | --- |
| 1 | 7B | Raw CSV | 75s | ❌ Numbers way off | |
| 2 | 72B | Raw CSV | 12m 9s | △ Misread monthly trend | |
| 3 | 7B | Aggregated | 50s | ✅ Exact | ○ Some repetition |
| 4 | 72B | Aggregated | 12m 13s | ✅ Exact | ◎ Best (mind the % math) |

Quietly notable: 72B takes ~12 minutes regardless of input size (shrinking the prompt didn’t change wall-clock time much). Output generation is the bottleneck. Which strengthens the case for “small model + pre-aggregate” as the cost-effective default.

7. Cross-check: the actual graphs

Before trusting any of the AI output, let me put the real numbers on charts using the spreadsheet tool (pandas).

Monthly spending

[Figure: Monthly spending]

Average ¥230,130/month, but the range is ¥69,961 (lowest) to ¥493,072 (highest) — about a 7× spread. The 72B’s “gradually increasing” claim was a bit off the mark; the reality is bouncy.

Category share

[Figure: Categories]

“Other” being 32% is because my categorization rule is sloppy. I just wrote a simple “if the merchant name contains keyword X, bucket Y” rule, and lots of merchants didn’t match any keyword and ended up in “Other.” Reading meaning from a merchant name is exactly the kind of thing AI is good at, so next time I’ll let the AI do the categorization itself.
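A keyword rule of the kind described looks roughly like this. The keywords and merchant names are hypothetical (the actual rule set is not shown in this post); the category labels reuse the ones that appear in the analysis above.

```python
# Hypothetical keyword -> category rules, checked in order.
RULES = [
    ("amazon", "Online (Amazon)"),
    ("airline", "Travel/airline"),
    ("app store", "App/subscription"),
]


def categorize(merchant: str) -> str:
    """Return the first category whose keyword appears in the merchant name."""
    name = merchant.lower()
    for keyword, category in RULES:
        if keyword in name:
            return category
    return "Other"  # anything no keyword matched lands here


# Made-up merchant names to show how quickly "Other" fills up.
merchants = ["AMAZON.CO.JP", "Sky Airline Tokyo", "Corner Bakery", "City Parking"]
labels = [categorize(m) for m in merchants]
other_share = labels.count("Other") / len(labels)
print(labels)
print(f"Share in Other: {other_share:.0%}")
```

Every merchant the rule author did not anticipate falls through to "Other", so the bucket grows with the variety of merchants rather than with their importance.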

Top 15 merchants

[Figure: Top merchants]

Amazon at ¥421,978 (105 tx) is far and away #1. Amazon really is too convenient…

Weekday rhythm

[Figure: Weekday pattern]

Tuesday alone is ¥692,549 — way above the rest. Probably because that’s when most of the subscription auto-charges land.

8. Today’s takeaways

Separate “aggregation” from “interpretation”

| AI is bad at | AI is good at |
| --- | --- |
| Multi-row sum/average (numbers go wildly off) | Categorization (interpreting fuzzy meaning) |
| Percentage math (saw “25% off → 100% off”) | Pattern recognition / hypothesis generation |
| Distributed aggregation like monthly totals | Narrative interpretation, savings proposals |

Aggregation is the spreadsheet tool’s job; interpretation is the AI’s. When you split the work, things go fast and accurate. “Data prep matters before analysis” — yeah, that old saying really is true. Note to self.

Sometimes input quality beats raw size

“7B + pre-aggregated input in 50 seconds” outperformed “72B + raw CSV in 12 minutes”. Sometimes you don’t need a bigger model — you need cleaner input. Felt that one today.

The local-LLM angle

Feeding 12 months of raw credit card data to an AI without a single byte going to the cloud — it was surprisingly stress-free. This is one of the spots local LLMs really shine. Got personal info, or anything cloud-uncomfortable? This is the place for them.

9. Tech details (Claude explains)

The technical bits, written up by my AI pair.

  1. SCP transfer to the DGX (mDNS, no IP needed)

NVIDIA Sync auto-configures a Host alias in ~/AppData/Local/NVIDIA Corporation/Sync/config/ssh_config:

Host spark-XXXX.local
  Hostname spark-XXXX.local
  User [user]
  Port 22
  IdentityFile "...\nvsync.key"

Which means I can SSH/SCP using spark-XXXX.local without ever looking up an IP. The .local suffix uses mDNS (Multicast DNS) for hostname resolution within the LAN.

Transfer command (one line, from PowerShell on the Windows side):

scp -r "C:\Users\[user]\Desktop\docs\dgx\csv" spark-XXXX.local:/home/[user]/personal/dgx-100-experiments/private-data/credit-card-csv

  2. Ollama install + the sudo-TTY catch + GPU detection log

Ollama install:

curl -fsSL https://ollama.com/install.sh | sh

Running this through Claude Code’s Bash, it errored at the sudo password prompt — an interactive TTY is required there:

sudo: a terminal is required to read the password

Reopened a separate SSH session, ran the same command manually, and it went through.

Once installed, systemd auto-starts the service. The GPU detection log via journalctl -u ollama:

inference compute id=GPU-986c194b... name=CUDA0 description="NVIDIA GB10"
total="121.7 GiB" available="79.0 GiB"
default_num_ctx=262144

  • VRAM (DGX Spark unified memory): 121.7 GiB
  • Default context: 262,144 tokens

Compared with a typical RTX 4090 (24 GB VRAM, 8K–32K default context), the gap is significant.

  3. Loading both models simultaneously

ollama pull qwen2.5:7b   # 4.7 GB
ollama pull qwen2.5:72b  # 47 GB

After loading both, ollama ps shows:

NAME           SIZE      PROCESSOR    CONTEXT    
qwen2.5:72b    61 GB     100% GPU     32768
qwen2.5:7b     8.2 GB    100% GPU     32768

Total ~69 GB used out of 79 GB available. Both models stay resident, so switching between them is instant.

  4. Custom CSV parser for the credit card data

Three quirks needed handling: CP932 encoding, no quotes (commas in some merchant names break parsing), and a trailing summary row in each file.

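from pathlib import Path

import pandas as pd

# COLUMNS (defined elsewhere in the script) lists the 7 column names in file
# order; it must include "利用日" (transaction date) and "利用金額" (amount).


def parse_line(line: str) -> list[str] | None:
    """Split one unquoted CSV line into exactly 7 fields, or return None."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) < 7 or not fields[0]:
        return None  # skip blank/summary rows
    if len(fields) > 7:
        # Extra commas belong to the merchant name: rejoin everything between
        # the first field and the last five.
        merchant = ",".join(fields[1:-5])
        fields = [fields[0], merchant] + fields[-5:]
    return fields


def load_one(path: Path) -> pd.DataFrame:
    """Load one monthly statement (CP932-encoded) into a typed DataFrame."""
    rows = []
    with path.open(encoding="cp932") as f:
        next(f)  # skip header (cardholder metadata)
        for line in f:
            parsed = parse_line(line)
            if parsed is not None:
                rows.append(parsed)
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["利用日"] = pd.to_datetime(df["利用日"], format="%Y/%m/%d")
    df["利用金額"] = df["利用金額"].astype(int)
    return df

  5. Japanese fonts in matplotlib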

japanize-matplotlib doesn’t work on Python 3.12 — it imports distutils, which was removed from the standard library.

The modern replacement is matplotlib-fontja:

pip install matplotlib-fontja

import matplotlib_fontja  # noqa: F401  ← just importing it sets up IPAexGothic

  6. Calling Ollama from Python

The official ollama Python client is straightforward:

import ollama

client = ollama.Client()
stream = client.chat(
    model="qwen2.5:72b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ],
    options={"temperature": 0.3},
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)

Streaming makes long generation easier to watch unfold.

Tomorrow: Day 4

Day 4 plan: let a local AI sort 20,000 iPhone photos.

The actual goal is to have a local image-recognition model (CLIP family?) clean up my photo library so I can stop paying iCloud for storage upgrades…!

#100ExperimentsWithDGX #LocalLLM #Ollama

The Hidden TCO of Self-Hosting Your EC Revenue Dashboard in 2026

“If we self-host Matomo or Umami, the revenue dashboard is free, right?” That’s one of the most common questions I hear from SMB EC operators in Japan. The short answer: license-fee-free is not TCO-free. After laying four options side-by-side at Japanese freelance rates, self-hosted dashboards land at ¥460K-880K per year — 4-7x the cost of a focused SaaS like the one I’m building.

I’ve been building RevenueScope for the Japan SMB EC market, so I have a stake in this comparison. But the math here is structural, not promotional: when you account for build hours, ongoing ops, server costs, and learning curve, “free” OSS quietly becomes one of the most expensive choices on the table.

This post walks through the TCO breakdown, the three hidden cost layers most operators miss, and a 3-question framework that decides between self-hosted and SaaS in about 60 seconds.

TL;DR

  1. Self-hosting Matomo, Umami, or rolling your own GA4+Looker Studio dashboard runs ¥460K-880K per year at a ¥5,000/hour Japanese freelance rate (industry-average estimates, not measured ground truth).
  2. A focused SaaS option for SMB EC — RevenueScope Growth at ¥9,800/month (~¥117K/year) — sits 4-7x lower on TCO.
  3. The hidden cost is a three-layer stack: opportunity cost (40 build hours not spent on revenue work), learning curve (Matomo configs, GA4 event design, Looker Studio calculated fields), and upgrade churn (OSS major versions, GA4 API breaks). License-fee-zero hides all three.

Why “TCO” matters more than license fees

The “OSS is free” intuition only counts software license cost. Real total cost of ownership for an EC operator pulls in at least four other line items:

  • Initial build hours — server setup, tracking install, dashboard build, first-pass QA
  • Monthly ops hours — data quality checks, tracking fixes, new metric requests, incident response
  • Server cost — VPS / cloud / storage
  • Learning curve — docs, Stack Overflow, internal wiki, knowledge transfer

Counted as labor at ¥5,000/hour (the Japanese freelance marketing/data-analyst median), self-hosted TCO climbs into the high hundreds of thousands of yen — quickly.

The second concept that operators tend to miss is opportunity cost. Forty hours spent building a Matomo dashboard is forty hours not spent on creative A/B tests, LP iterations, or email segmentation work. For a JPY-10M-monthly EC, those forty hours represent roughly 25% of a working month — directly tradeable against revenue work.

License-fee-zero and TCO-zero are different numbers. That’s the starting point for any honest comparison.

One-year TCO across four options

I lined up Matomo On-Premise, Umami v3, GA4 + Looker Studio, and RevenueScope Growth at industry-average estimates (not measured ground truth — your numbers will vary).

One-Year TCO Comparison — Matomo / Umami / GA4+Looker / RevenueScope

The annual numbers (rounded, ¥5,000/hr labor):

  • Matomo self-hosted — ~¥880K (40h build + 8h/mo ops + ¥3K/mo hosting + 16h learning)
  • Umami self-hosted — ~¥460K (20h build + 4h/mo ops + ¥2K/mo hosting + 8h learning)
  • GA4 + Looker Studio — ~¥500K (16h build + 6h/mo ops + 12h learning; product is free, your time isn’t)
  • RevenueScope Growth — ~¥117K (¥9,800/mo plan + ~0.5h/mo to actually look at the dashboard)

The “free” intuition collapses the moment you add 40 build hours and 6-8 ongoing ops hours per month at Japanese freelance rates. The license is the small line item; labor is everything else.
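If you want to plug in your own hours, the calculation is easy to script. The sketch below sums only the components itemized in the list above at ¥5,000/hour; the `Option` class and field names are illustrative. Its totals will not exactly match the rounded headline figures, which may include items that are not itemized in the list.

```python
from dataclasses import dataclass

HOURLY_RATE = 5_000  # yen/hour, the labor rate used in this post


@dataclass
class Option:
    name: str
    build_hours: float = 0
    ops_hours_per_month: float = 0
    learning_hours: float = 0
    monthly_fee: int = 0  # hosting or plan price, yen/month

    def labor_hours(self) -> float:
        """First-year labor: one-off build and learning plus 12 months of ops."""
        return self.build_hours + 12 * self.ops_hours_per_month + self.learning_hours

    def annual_tco(self) -> float:
        """First-year cost in yen: labor at HOURLY_RATE plus recurring fees."""
        return self.labor_hours() * HOURLY_RATE + 12 * self.monthly_fee


OPTIONS = [
    Option("Matomo self-hosted", 40, 8, 16, 3_000),
    Option("Umami self-hosted", 20, 4, 8, 2_000),
    Option("GA4 + Looker Studio", 16, 6, 12, 0),
    Option("RevenueScope Growth", 0, 0.5, 0, 9_800),
]

for option in OPTIONS:
    print(f"{option.name}: {option.labor_hours():.0f} h, ¥{option.annual_tco():,.0f}")
```

Changing `HOURLY_RATE` or the ops hours shows how sensitive the comparison is to labor assumptions, which is the point of the exercise: the recurring fee is the smallest input for every self-hosted row.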

The three hidden cost layers

Beyond the headline numbers, three layers of hidden cost stack on top of self-hosting and account for most of the gap between OSS and SaaS economics.

Annual Operations Hours — 4-Option Comparison (the symbol of hidden labor)

Layer 1 — Opportunity cost. Forty hours building Matomo is forty hours not running creative A/B tests or shipping LP improvements. For JPY-10M-monthly EC, that’s roughly 25% of a working month redirected away from revenue work. The TCO row “build hours = ¥200K” is the direct cost; the indirect cost (campaigns not run, pages not improved) is often larger.

Layer 2 — Learning curve. Matomo’s admin surface is dense; custom report authoring is close to writing SQL by hand. GA4 demands real care around event design, custom dimensions, and the data layer. Looker Studio adds calculated-field syntax (DAX-adjacent) plus BigQuery SQL knowledge if you take the connector route. Each one has a real ramp before the dashboard becomes operational.

Layer 3 — Upgrade churn. OSS ships major versions; GA4 breaks API contracts; Looker Studio re-skins UIs. Matomo schema migrations, GA4 export schema changes that retroactively break your queries, Looker chart configs that need re-doing — these arrive a few times a year and don’t fit cleanly into the “monthly ops hours” budget. SaaS providers absorb this churn on your behalf.

Stack the three layers together and the gap between “Matomo at ¥880K” and “RevenueScope at ¥117K” stops looking like a marginal price difference. It looks like a structurally different cost model.

A 3-question decision framework

For SMB EC operators wondering which side they fall on, three quick questions resolve it in about 60 seconds.

Self-Build vs SaaS — Decision Flow

Q1 — Are you a JPY-10M-50M monthly Shopify / BASE / STORES / EC-CUBE operator? If yes, continue. If under JPY-10M, GA4 + Looker Studio with a hand-built dashboard is usually proportionate. If over JPY-1B, you’re in BI-tool territory (Tableau, Looker, Mode).

Q2 — Do you want engineering and ops hours pointed at revenue work, or at dashboard maintenance? If revenue work, lean toward SaaS. If dashboard work is part of how you want to spend the team, OSS makes sense — and it’s a legitimate choice when you have an in-house philosophy around tooling ownership.

Q3 — Are Revenue, AOV, RPS, CVR, plus Sessions enough? If yes — that’s RevenueScope’s deliberate scope cap (4 core metrics + Sessions = 5 KPI cards). If you need MMM, MTA, margin, LTV, inventory, or in-app ROAS computation, look at full-stack tools (Triple Whale category) instead.

Note: RevenueScope intentionally does not compute ad-spend ROAS in-app. Ad consoles (Meta, Google, TikTok) already surface ROAS natively; calculating it again in a separate tool just doubles the surface area to maintain. Delegating ROAS to the tool best positioned to compute it is a deliberate scope decision.
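The three questions above can be sketched as a small function. This is one possible encoding, not an official tool: the function name, parameter names, and return strings are illustrative, the platform check in Q1 (Shopify / BASE / STORES / EC-CUBE) is omitted, and the JPY-50M to JPY-1B band is reported as uncovered because the framework as written does not address it.

```python
MILLION = 1_000_000


def recommend(monthly_revenue_jpy: int,
              hours_for_revenue_work: bool,
              core_kpis_enough: bool) -> str:
    """Walk Q1 -> Q2 -> Q3 and return a suggested tool category."""
    # Q1: scale check.
    if monthly_revenue_jpy < 10 * MILLION:
        return "GA4 + Looker Studio (hand-built dashboard)"
    if monthly_revenue_jpy > 1_000 * MILLION:
        return "BI tool (Tableau, Looker, Mode)"
    if monthly_revenue_jpy > 50 * MILLION:
        # Between JPY-50M and JPY-1B the article gives no guidance.
        return "not covered by the framework as written"

    # Q2: where should engineering and ops hours go?
    if not hours_for_revenue_work:
        return "self-hosted OSS (Matomo / Umami)"

    # Q3: are Revenue, AOV, RPS, CVR plus Sessions enough?
    if not core_kpis_enough:
        return "full-stack tool (Triple Whale category)"
    return "focused SaaS (RevenueScope)"


if __name__ == "__main__":
    print(recommend(20 * MILLION, True, True))
    print(recommend(5 * MILLION, True, True))
    print(recommend(30 * MILLION, False, True))
```

Returning early at each question mirrors the flow: a later question is only asked when the earlier ones have not already settled the answer.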

When self-hosting genuinely makes sense

To be clear about when OSS or DIY is the right answer:

  • Compliance-driven: when first-party customer-data residency on your own servers is a hard requirement (large enterprise, regulated industries)
  • Engineering-rich teams: when in-house engineers are already comfortable with Linux server ops and treat tooling as part of the platform
  • Bespoke metrics: when you need indicators no SaaS will model out of the box, and you want full control over the schema
  • Above JPY-1B/month: at large scale, SaaS per-event pricing can flip the comparison, and self-hosting can become the cheaper option

For SMB EC with marketing teams of 1-3 and no dedicated engineer, none of these usually apply. That’s the population where the TCO gap matters most — and where a focused SaaS earns its keep by removing the three hidden cost layers entirely.

Closing

“OSS is free” is technically true and operationally misleading. A zero license fee stops mattering once you count 40 build hours and 6-8 monthly ops hours at ¥5,000/hour. The real question for an SMB EC operator isn’t “free vs paid”; it’s “do I want my team’s hours pointed at revenue work or at dashboard maintenance?”

The full breakdown — per-option TCO math, suitability profiles, and references — is at Matomo / Umami / GA4+Looker Studio Self-Build vs RevenueScope: 1-Year TCO for EC Revenue Dashboards. For the prior post in this series (full-feature SaaS comparison, not self-build), see Triple Whale vs RevenueScope.