Why Polish Small Businesses Don’t Need Websites (And Why I’m Building Them Anyway)

I’ve spent the last month cold-prospecting nail salons and barbers in Częstochowa. 100+ businesses researched. Maybe 15 have proper websites. The rest? Booksy profiles and Instagram accounts. That’s it.

When I started building an AI system to find these businesses, I thought the lack of websites was laziness or budget constraints. It’s not. It’s a deliberate choice rooted in a very specific psychology.

Polish small business owners genuinely believe websites are unnecessary. And I’m starting to understand why.

The Booksy Fortress

Booksy owns the Polish beauty and wellness market. Not “has a presence.” Owns.

If you’re a nail salon, barber, or massage therapist in Poland, you’re on Booksy. It’s not optional. Your clients book through Booksy. They discover you through Booksy. Your calendar lives in Booksy. Your payments run through Booksy.

Why would you need a website when Booksy already handles discovery, booking, payments, and reviews? The platform does everything a website would do, except you don’t have to build it or maintain it.

From the business owner’s perspective, a website is redundant infrastructure. I’ve read this exact sentiment in my research notes at least 20 times. “Already on Booksy.”

The logic is sound. The conclusion is still wrong, but the logic is sound.

Instagram as the Second Pillar

The businesses that aren’t beauty or wellness based, the restaurants and bars, live on Instagram and Facebook.

They post daily. Photos of dishes, interior shots, weekend specials. Stories with live updates. DMs for reservations. The engagement is real. People comment, tag friends, share posts.

For these owners, Instagram is their website. Why pay for something static when you can post for free and reach customers where they already spend their time?

Again, the logic holds. A restaurant doesn’t need online booking. They need people to show up. Instagram drives that better than a landing page buried on page 3 of Google.

The Trust Gap

There’s a third layer that took me longer to notice. Websites carry a credibility problem in Poland’s SMB market.

Older demographics, which dominate small business ownership here, associate websites with either big corporations or scams. A local barber with a sleek website feels suspicious. Too corporate. Not authentic.

Instagram feels personal. Booksy feels utilitarian. A website feels like someone is trying too hard or hiding something.

I didn’t expect this. In Western markets, no website is the red flag. In Poland’s local service economy, having one can raise questions. “Why do you need this? What are you selling me?”

Why They’re Wrong (But Also Right)

Here’s the thing. They’re not entirely wrong.

If you’re a nail salon with 200 regular clients who all book through Booksy, and your schedule is full most weeks, why burn money on a website? The return on investment is unclear. The effort to maintain it is real.

But here’s what they’re missing.

Booksy owns the customer relationship. Not them. If Booksy raises fees, they pay. If Booksy changes the algorithm, they adapt. If Booksy shuts down tomorrow, they lose their entire discovery channel overnight.

Instagram is even worse. You’re building an audience on rented land. Algorithm changes, shadowbans, account suspensions. You have zero control.

A website is the only piece of digital infrastructure you actually own. It’s insurance against platform dependency. It’s leverage when Booksy tries to squeeze margins. It’s the foundation for everything else: email lists, direct booking, content marketing, local SEO dominance.

Most importantly, it separates you from every other business stuck in the same Booksy/Instagram loop. When someone Googles “nail salon Częstochowa,” the businesses with proper websites win. The rest don’t even appear.

The Opportunity

This is why I’m building them anyway.

The fact that Polish SMBs don’t see the value is exactly why there’s value in showing them. The market is underserved because the market doesn’t know it needs serving.

My approach isn’t to argue. It’s to show. I build the website first, using photos from their Instagram and services from their Booksy profile. Then I show them what they could own instead of rent.

Some will ignore it. Some will dismiss it. But some will see it and realize they’ve been thinking too small.

That’s the opportunity. Not convincing skeptics. Finding the 10% who are ready to see what ownership looks like.

What I’m Learning

Prospecting these businesses taught me more about market psychology than any course or framework ever could.

People don’t resist websites because they’re uninformed. They resist because their current setup works well enough, and change introduces risk with unclear reward.

The businesses that will adopt websites aren’t the ones doing poorly. They’re the ones doing well and starting to feel the ceiling. The owner who wants to expand but realizes Booksy doesn’t scale beyond one location. The restaurant that maxed out Instagram reach and needs another channel.

Understanding why they don’t need websites is more valuable than explaining why they do. It changes how I pitch, what I build, and who I target.

Where This Goes

I’m still early in this process. The AI system I wrote about in my previous post finds the prospects. But converting them requires understanding the mindset first.

Polish SMBs aren’t behind on digital marketing. They’ve optimized for the platforms available to them. Booksy and Instagram work. Websites don’t obviously improve on that equation.

My job isn’t to fight that logic. It’s to show what becomes possible when you own your infrastructure instead of rent it.

I’ll write more as this evolves.

This article is also available on maxmendes.dev.

Questions? Reach out — I reply within 24 hours.

Microservices Architecture for Modular EdTech File Processing

If you’re building or scaling a learning management system, you’ve probably seen this: exam week arrives, thousands of students upload assignments at once, and the system starts to slow down or crash.

Video processing delays document uploads. A failed virus scan blocks everything behind it. One bad file affects other students. When everything runs inside one big system, a small problem can impact everyone.

The fix isn’t just better servers. It’s a better architecture.

With a microservices approach, each task runs independently. You can scale specific parts, prevent failures from spreading, and meet strict education compliance requirements more easily. This guide is an architectural blueprint for technical decision-makers who need to build that system.

Key Takeaways

  • EdTech platforms need a smarter architecture to handle deadline spikes, many file types, and strict privacy rules.

  • Break the system into six clear services: Ingestion, Validation, Transformation, OCR, Metadata, and Delivery.

  • Use events (like Kafka) so each service works independently, and failures don’t affect everyone.

  • Keep files secure with limited access, encryption, audit logs, and regional data controls.

  • Plan for monitoring, error handling, and build-vs-buy decisions from the start.

To understand why this architecture matters, we first need to understand what makes EdTech file processing fundamentally different from other platforms.

The EdTech File Processing Problem Is Different

Most file processing guides are written for e-commerce or general SaaS products. Education platforms operate under very different pressures, and those differences shape how the system must be designed.

Content types vary a lot. One course might include PDFs, .ipynb notebooks, MP4 lectures, DOCX essays, audio exams, and image-based lab reports. Each format needs different processing, storage rules, and delivery methods, yet they all pass through the same platform.

Traffic is unpredictable and spiky. Uploads often surge right before deadlines. A platform with 50,000 students might receive most weekly submissions in just a few hours. The system must handle these bursts smoothly without slowing down or losing data.

Compliance is foundational. The Family Educational Rights and Privacy Act (FERPA) protects student education records, and the Children’s Online Privacy Protection Act (COPPA) applies to platforms serving children under 13. In many regions, data residency rules also control where student files can be stored or processed. These aren’t details to fix later; they must shape the architecture from the start.

Accessibility directly affects grading. Teachers need to clearly review student work. That may require OCR for handwritten submissions, transcription for audio responses, and alt-text for images. These steps aren’t just user experience improvements; they directly support fair evaluation and learning outcomes.

These pressures are exactly why a monolithic system struggles. The solution is to break the lifecycle into clear, independent stages.

That’s where service decomposition comes in.

Service Decomposition: Six Focused Services

The main idea behind microservices is simple: decompose the file processing lifecycle into clear stages. Each stage is handled by a separate service. Services communicate through events, and each service owns only its own data.

Below is a typical way to divide responsibilities in an EdTech file pipeline:

1. Ingestion Service

The Ingestion Service is the single entry point for all file uploads. Whether a student uploads from a web app, mobile app, or an LMS like Canvas, Blackboard, or Moodle (via LTI), every file comes through this service first.

Its job is simple: receive the file, not process it. It assigns a unique ID (UUID), stores the raw file in object storage, and sends out a file.received event so other services know a new file is ready.

Keeping this service separate has big advantages. You can scale it during deadline rush hours, change your upload provider without breaking other services, and handle batch student submissions without touching validation or processing logic.

Key responsibilities:

  • Handle large and chunked uploads.

  • Convert different upload sources into a consistent internal format.

  • Avoid duplicates using content hashing before saving.

  • Emit a file.received event with UUID, source metadata, and raw storage reference.

Example file.received event:

{
  "event": "file.received",
  "file_id": "f3a1b2c4-d5e6-7890-abcd-ef1234567890",
  "source": "web_upload",
  "uploader_id": "student_88421",
  "course_id": "cs101_fall_2025",
  "assignment_id": "hw3",
  "original_filename": "submission_final_v2.pdf",
  "raw_storage_ref": "s3://edtech-raw/f3a1b2c4...",
  "received_at": "2025-11-15T22:14:03Z",
  "size_bytes": 2048744
}

Once a file enters the system, the next concern isn’t formatting; it’s safety and policy enforcement.

2. Validation Service

The Validation Service listens for file.received events. Before any processing happens, it checks whether the file is safe and allowed.

It verifies the real file type (not just the extension), runs antivirus scans, checks file size limits, and ensures the format matches the assignment rules. This prevents harmful or unsupported files from moving further in the pipeline.

If a file fails validation, the service emits a file.rejected event with a reason code. The system can then quickly notify the student. Importantly, this service never edits or converts files; it only approves or rejects them.

For security implementation details on protecting student-facing upload surfaces, see protecting educational platforms from malicious uploads.

Example internal API (OpenAPI fragment):

paths:
  /validate:
    post:
      summary: Trigger validation for a received file
      requestBody:
        content:
          application/json:
            schema:
              type: object
              required: [file_id, raw_storage_ref, assignment_policy_id]
              properties:
                file_id:
                  type: string
                  format: uuid
                raw_storage_ref:
                  type: string
                assignment_policy_id:
                  type: string
      responses:
        '202':
          description: Validation accepted, result delivered via event

Only after a file is approved should heavy processing begin.

3. Transformation Service

After a file passes validation, the Transformation Service prepares it for use. Its job is to standardise and optimise files so instructors and students can access them easily.

This may include converting DOC files to PDF for consistent grading, transcoding videos into adaptive streaming formats (like HLS), compressing and resizing images, or safely running and formatting code submissions in isolated containers.

This service usually requires the most computing power, so it’s a strong candidate for horizontal scaling (adding more instances during peak load). It may also rely on external processing tools or APIs, but those should be wrapped behind an internal interface so providers can be changed without affecting the rest of the system.

One important rule: transformation should be idempotent. If a job runs twice because of a retry or temporary failure, it should produce the same result. This can be done by generating output based on the file_id and transformation settings. If the processed file already exists in storage, the service simply returns its reference instead of processing it again.
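As a minimal sketch of this idea, assuming S3-compatible object storage and a hypothetical runTransformation helper (bucket and key names are illustrative):

import { createHash } from "node:crypto";
import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Hypothetical helper: performs the actual conversion and uploads the result under the given key.
declare function runTransformation(fileId: string, settings: Record<string, unknown>, key: string): Promise<void>;

// Derive a deterministic output key from the file ID and the transformation settings.
function outputKey(fileId: string, settings: Record<string, unknown>): string {
  const digest = createHash("sha256").update(JSON.stringify(settings)).digest("hex").slice(0, 16);
  return `processed/${fileId}/${digest}`;
}

async function transformOnce(fileId: string, settings: Record<string, unknown>) {
  const key = outputKey(fileId, settings);
  try {
    // If the processed object already exists, skip the work and return its reference.
    await s3.send(new HeadObjectCommand({ Bucket: "edtech-processed", Key: key }));
    return { storageRef: `s3://edtech-processed/${key}`, reused: true };
  } catch {
    // Treat any miss as not-yet-processed (a production version would inspect the error code).
    await runTransformation(fileId, settings, key);
    return { storageRef: `s3://edtech-processed/${key}`, reused: false };
  }
}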

Some content requires deeper extraction beyond simple conversion.

4. OCR / Text Extraction Service

The OCR / Text Extraction Service is separate because it behaves differently from other processing steps. It’s slower, more CPU-heavy, and often needs specialised models, especially for handwritten answers, math equations, or multiple languages.

This service listens for file.validated events (for supported document types). It extracts text from the file and then emits a file.text_extracted event that includes the extracted content and a confidence score.
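An illustrative file.text_extracted event, following the same pattern as the file.received example above (the field names are assumptions rather than a fixed schema):

{
  "event": "file.text_extracted",
  "file_id": "f3a1b2c4-d5e6-7890-abcd-ef1234567890",
  "source_storage_ref": "s3://edtech-raw/f3a1b2c4...",
  "language": "en",
  "extracted_text_ref": "s3://edtech-processed/f3a1b2c4.../text.json",
  "confidence": 0.87,
  "page_count": 4,
  "extracted_at": "2025-11-15T22:18:45Z"
}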

Other services use this output. The Metadata Service can index the text for search. Accessibility tools can improve readability. In the future, an AI grading assistant could also analyse the extracted text.

Because OCR has unique performance and reliability challenges, keeping it isolated makes scaling and troubleshooting much easier.

For a deeper look at what’s possible with modern OCR in educational contexts, including handwriting recognition and equation parsing, see modern OCR capabilities for educational content.

Now that the file has been processed and analysed, the system needs a structured record of its state.

5. Metadata Service

The Metadata Service collects information from other services and builds a complete record for each file. It listens to events from validation, transformation, and OCR, then stores details like file type, processing status, extracted text, word count, video duration, and compliance labels.
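An illustrative metadata record assembled from those events (fields are assumptions):

{
  "file_id": "f3a1b2c4-d5e6-7890-abcd-ef1234567890",
  "status": "ready",
  "mime_type": "application/pdf",
  "word_count": 1850,
  "ocr_confidence": 0.87,
  "compliance": { "residency": "eu", "ferpa_protected": true },
  "processed_refs": { "pdf": "s3://edtech-processed/f3a1b2c4.../document.pdf" },
  "updated_at": "2025-11-15T22:20:11Z"
}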

This service owns the metadata database. No other service can directly read or write to it; all queries go through its API. That’s what allows advanced searches like “all handwritten submissions for Assignment 3 that are still ungraded” without accessing raw file storage.

It also handles sensitive student information. Fields like student name, ID, and submission data must be protected at rest. Use field-level encryption for sensitive data and ensure only authorised roles can retrieve specific metadata records.

With processing complete and metadata stored, the final step is secure delivery.

6. Delivery Service

The Delivery Service controls who can access files and for how long. It generates signed URLs for instructors reviewing submissions and time-limited links for students viewing graded work. It also handles CDN cache updates when files change or access is revoked.

This service does not store or move files. It simply creates secure access paths to files already stored in object storage.

Because it’s isolated, you can change your CDN provider or update access control rules without affecting validation, transformation, or any other processing steps.
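As a small sketch, assuming S3-compatible object storage and the AWS SDK v3 presigner (the bucket name and expiry are illustrative):

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({});

// Generate a short-lived link to a processed file for an authorised viewer.
async function signedDownloadUrl(processedKey: string, expiresInSeconds = 900): Promise<string> {
  const command = new GetObjectCommand({
    Bucket: "edtech-processed",
    Key: processedKey,
  });
  // The URL is only valid until it expires; revocation beyond that is handled via CDN cache invalidation.
  return getSignedUrl(s3, command, { expiresIn: expiresInSeconds });
}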

Breaking the system into services solves one problem. But how those services communicate determines whether the architecture truly scales.

Event-Driven Communication

All services communicate through a message broker instead of calling each other directly. This keeps them loosely coupled and easier to scale.

For large EdTech platforms, Apache Kafka is often preferred over RabbitMQ. Kafka stores durable, replayable event logs. That’s important for auditing file histories, meeting compliance requirements, and replaying events after outages.

Core file processing event lifecycle:

  • file.received: produced by Ingestion; consumed by Validation

  • file.validated: produced by Validation; consumed by Transformation, OCR, Metadata

  • file.rejected: produced by Validation; consumed by Ingestion (notify student)

  • file.transformed: produced by Transformation; consumed by Metadata, Delivery

  • file.text_extracted: produced by OCR; consumed by Metadata

  • file.ready: produced by Metadata; consumed by Delivery, Instructor notifications

  • file.processing_failed: produced by any service; consumed by DLQ monitor, Ops alerts

Dead Letter Queues (DLQs) are essential.

If a message fails after several retries, it should move to a DLQ with full context. Operations teams need tools to inspect failed files, retry processing, or notify students if something went wrong. During exam periods, losing a submission silently is both an academic and legal risk, so failure handling must be deliberate and visible.
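A sketch of the retry-then-DLQ pattern using the kafkajs client (topic names, the retry limit, and the handleFileValidated helper are assumptions):

import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "transformation-service", brokers: ["kafka:9092"] });
const consumer = kafka.consumer({ groupId: "transformation" });
const producer = kafka.producer();

// Hypothetical handler: runs the transformation for a validated file.
declare function handleFileValidated(payload: unknown): Promise<void>;

const MAX_ATTEMPTS = 3;

async function run() {
  await Promise.all([consumer.connect(), producer.connect()]);
  await consumer.subscribe({ topic: "file.validated" });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const payload = JSON.parse(message.value?.toString() ?? "{}");
      const attempt = Number(message.headers?.attempt?.toString() ?? "0") + 1;
      try {
        await handleFileValidated(payload);
      } catch (err) {
        if (attempt < MAX_ATTEMPTS) {
          // Re-publish to the same topic with an incremented attempt counter.
          await producer.send({
            topic: "file.validated",
            messages: [{ value: message.value, headers: { attempt: String(attempt) } }],
          });
        } else {
          // Retries exhausted: move to the DLQ with full context for the ops team.
          await producer.send({
            topic: "file.validated.dlq",
            messages: [{ value: JSON.stringify({ payload, error: String(err), attempt }) }],
          });
        }
      }
    },
  });
}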

Events coordinate behaviour. Storage handles the actual file bytes. Both must be designed carefully.

File Storage Strategy

All services use the same object storage system (like S3, GCS, or Azure Blob). But they don’t all get full access. Each service has limited permissions based on what it needs.

  • Ingestion Service → can write files to raw/

  • Transformation Service → can read from raw/ and write to processed/

  • Delivery Service → can create secure read links for processed/

  • No service has full access to everything

Use a UUID for every file. When a file is uploaded, it is assigned a UUID, and that UUID becomes the file’s identity across the entire system.

  • Storage paths include the UUID

  • Services communicate using the UUID

  • No service depends on the storage folder paths directly

This makes it easy to switch storage providers or reorganise buckets later without breaking anything.
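A small illustration of that convention in TypeScript; the bucket names and prefixes are assumptions, and every service resolves keys through helpers like these instead of hard-coding paths:

// Central convention: every storage key is derived from the file's UUID.
const RAW_BUCKET = "edtech-raw";
const PROCESSED_BUCKET = "edtech-processed";

export function rawKey(fileId: string): { bucket: string; key: string } {
  return { bucket: RAW_BUCKET, key: `raw/${fileId}` };
}

export function processedKey(fileId: string, variant: string): { bucket: string; key: string } {
  // "variant" distinguishes outputs such as "pdf", "hls", or "thumbnail".
  return { bucket: PROCESSED_BUCKET, key: `processed/${fileId}/${variant}` };
}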

File retention rules:

  • Files in raw/ can be deleted after about 30 days (or once processing is confirmed).

  • Files in processed/ follow school or legal retention rules. Some content must be kept longer due to compliance requirements.

Service deployment example (Docker Compose fragment):

services:
  ingestion:
    image: edtech/ingestion-service:latest
    environment:
      - KAFKA_BROKER=kafka:9092
      - S3_BUCKET=edtech-raw
      - S3_PREFIX=raw/
    depends_on:
      - kafka
      - minio
  validation:
    image: edtech/validation-service:latest
    environment:
      - KAFKA_BROKER=kafka:9092
      - ANTIVIRUS_API_URL=http://clamav:3310
      - MAX_FILE_SIZE_MB=500
    depends_on:
      - kafka
  transformation:
    image: edtech/transformation-service:latest
    environment:
      - KAFKA_BROKER=kafka:9092
      - S3_RAW_BUCKET=edtech-raw
      - S3_PROCESSED_BUCKET=edtech-processed
      - PROCESSING_API_URL=http://filestack-adapter:8080
    deploy:
      replicas: 4  # Scale horizontally for peak load
    depends_on:
      - kafka

For handling the chunked upload mechanics that make large lecture video ingestion reliable, see techniques for large educational media files.

At this point, the architecture is internally complete. The next question is: which parts should you build yourself?

Integrating External Processing APIs

Not every feature needs to be built from scratch. Things like video transcoding, OCR models, and file format conversion are complex and expensive to maintain. Instead of building and managing that infrastructure yourself, you can use specialised external APIs.

The smart way to do this is to hide external APIs behind your own internal service layer.

How the integration works:

  • Internal Service: Your Transformation or OCR service triggers processing like normal.

  • Adapter Layer: This translates your internal format (events, UUIDs, metadata) into the format the external API expects. If you ever change providers, you only update the adapter, not the whole system.

  • Circuit Breaker: If the external API becomes slow or unavailable, this prevents failures from spreading through your system. It temporarily stops sending requests (a minimal sketch follows after these lists).

  • Fallback Strategy: If the external service fails, you can retry later, switch to another provider, or mark the file as “processing delayed” and notify the student.

Why this is important:

  • You avoid managing heavy infrastructure.

  • You keep your architecture clean and modular.

  • You prevent external outages from breaking your system.

  • You can swap providers without rewriting your services.
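A minimal circuit-breaker sketch around an external processing call; the thresholds, cooldown, and the callExternalOcr function are illustrative assumptions, not part of any specific provider’s SDK:

// Hypothetical external call, e.g. an OCR or transcoding provider behind the adapter layer.
declare function callExternalOcr(fileId: string): Promise<string>;

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures = 5, private cooldownMs = 30_000) {}

  async exec<T>(fn: () => Promise<T>): Promise<T> {
    // While open, fail fast instead of piling requests onto a struggling provider.
    if (this.failures >= this.maxFailures && Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error("circuit open: external provider temporarily bypassed");
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }
}

const ocrBreaker = new CircuitBreaker();

// Caller falls back to "processing delayed" when the breaker rejects the call.
async function extractTextOrDefer(fileId: string): Promise<{ status: string; text?: string }> {
  try {
    const text = await ocrBreaker.exec(() => callExternalOcr(fileId));
    return { status: "ok", text };
  } catch {
    return { status: "processing_delayed" };
  }
}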

💡 Rather than building every adapter yourself, Filestack’s file processing APIs are designed to plug directly into the transformation and delivery layers described above: handling format conversion, virus scanning, CDN delivery, and more through a single integration point, so your team can focus on the educational logic that actually differentiates your platform. Start your free trial with Filestack!

For pre-built processing workflows, this article explains how advanced workflows can fit into EdTech architectures.

Regardless of what you build or buy, security must wrap every layer of this system.

Security and Compliance Implementation

Security isn’t a separate service. It must be built into every layer of the system.

  • Service-to-service security: Each service should use short-lived JWT tokens to prove its identity. No request should be accepted without a valid token, even inside your private network. Adding mTLS gives extra protection between services (see the sketch after this list).

  • Audit logging: Every action on a student file (view, process, deliver, delete) must be recorded with who did it, when, and why. These logs should be permanent and stored according to institutional policy. Treat audit events as structured Kafka topics, not simple app logs.

  • Encryption: Use TLS (1.2 or higher) for all communication. Store files with AES-256 encryption. For highly sensitive documents, use stronger protections like envelope encryption.

  • Data residency: If a student’s data must stay in a specific region (like the EU), that rule should be added to the file’s metadata at ingestion. Processing services must respect this tag when choosing where to store or process the file. Adding this later is difficult; it should be designed in from the start.
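As a sketch of the service-to-service check, assuming RS256-signed tokens and the jsonwebtoken library (the issuer and audience values are illustrative):

import jwt, { JwtPayload } from "jsonwebtoken";

// Public key of the internal token issuer; in practice fetched from a JWKS endpoint or secret store.
const ISSUER_PUBLIC_KEY = process.env.ISSUER_PUBLIC_KEY ?? "";

// Reject any inter-service request that does not carry a valid, short-lived token.
export function verifyServiceToken(authorizationHeader: string | undefined): { service: string } {
  const token = authorizationHeader?.replace(/^Bearer /, "");
  if (!token) throw new Error("missing service token");

  const claims = jwt.verify(token, ISSUER_PUBLIC_KEY, {
    algorithms: ["RS256"],
    issuer: "https://auth.internal.example", // illustrative issuer
    audience: "metadata-service",            // the service this check protects
    maxAge: "5m",                            // short-lived tokens only
  }) as JwtPayload;

  return { service: String(claims.sub) };
}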

Even a secure system can fail. That’s why visibility is just as important as design.

Monitoring and Observability

When one student upload passes through multiple services, you need full visibility. Use distributed tracing (like OpenTelemetry) across all services, and use the file_id as the trace ID. This lets you track exactly what happened to any submission from start to finish.
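A sketch using the OpenTelemetry JS API; rather than literally overriding the trace ID, a simpler variant records the file_id as a span attribute so any submission can be found across services (span and attribute names are assumptions):

import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("validation-service");

// Wrap one stage of processing in a span tagged with the file ID.
export async function traceFileStage<T>(fileId: string, stage: string, work: () => Promise<T>): Promise<T> {
  return tracer.startActiveSpan(stage, async (span) => {
    span.setAttribute("edtech.file_id", fileId);
    try {
      const result = await work();
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}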

Important metrics to monitor:

  • Queue depth per service: If queues grow suddenly, something downstream is slow.

  • Processing time by file type: PDFs should be quick; videos can take longer but should stay within expected limits.

  • Dead Letter Queue (DLQ) rate: Spikes mean repeated failures.

  • Validation rejection rate: A sudden jump may signal a bug or a malicious upload attempt.

  • Signed URL generation time: Delays here mean students are waiting to access their graded work.

Set alerts before peak deadlines. If submissions are due at midnight, you want warnings hours earlier, not after students start complaining.

Finally, architecture isn’t just about design; it’s about strategic decisions.

When to Build vs. Buy

You don’t need to build everything yourself. Decide based on what truly makes your platform unique.

Build it if it directly affects your educational value: things like assignment rules, LMS integration, grading workflows, or compliance logic. These are part of your core product.

Buy or integrate if it’s standard infrastructure: file conversion, virus scanning, video transcoding, OCR, CDN delivery, or object storage. These are common problems with reliable third-party solutions.

Think carefully when it’s somewhere in between. For example, general OCR is easy to integrate. But if your platform specialises in chemistry equations or music notation, a custom OCR model might be worth building.

This architecture makes that boundary clear. External tools plug in through adapter layers. If a vendor changes pricing or performance drops, you replace the adapter, not your whole system.

When all these pieces come together (service isolation, event-driven flow, secure storage, external integration, and observability), you get a system built for long-term scale.

Conclusion

A strong EdTech file processing system is built around six focused services: Ingestion, Validation, Transformation, OCR, Metadata, and Delivery. These services communicate through durable events, use shared object storage with strict per-service permissions, and keep external processing tools behind internal adapters.

The benefit is clear: each stage can scale independently, failures stay isolated, audit trails are compliance-ready, and the system can grow without needing a complete redesign every time user numbers increase.

The real challenges aren’t in the basic upload flow. They’re in handling dead letter queues, maintaining FERPA-compliant audit logs, enforcing data residency rules, and setting alerts before deadline spikes. These should be designed from the beginning, not added later.

This article was published on the Filestack blog.

What Are Autonomous AI Agents? Complete Beginner Guide for Developers, Founders, and CTOs

Software is undergoing its biggest architectural shift since the rise of cloud computing. Instead of applications that simply respond to user input, we are now entering an era where software can operate independently. These systems are known as autonomous AI agents, and they are redefining how modern software and businesses function.

For developers, founders, and CTOs, understanding autonomous AI agents is quickly becoming essential knowledge. These systems are no longer experimental concepts. They are already being deployed in production environments to automate operations, monitor infrastructure, analyze data, and execute workflows without human supervision.

To understand why autonomous AI agents are so powerful, it helps to first understand the limitations of traditional software.

Traditional software operates based on predefined logic. Developers write explicit instructions that determine how software behaves in every scenario. This model works well for predictable workflows, but it breaks down when environments become complex or unpredictable.

For example, consider a traditional monitoring system. It can detect when CPU usage exceeds a threshold and send an alert. However, it cannot investigate the cause, determine the appropriate response, or execute corrective actions on its own. It depends entirely on human intervention.

Autonomous AI agents operate differently.

Instead of simply executing predefined instructions, autonomous AI agents can interpret goals, analyze context, make decisions, and execute actions independently. This allows software systems to operate continuously without requiring constant human supervision.

At the core of an autonomous AI agent is a reasoning engine, typically powered by a large language model. This reasoning engine enables the agent to understand instructions, analyze information, and determine appropriate actions.

However, reasoning alone is not enough. Autonomous agents also require memory.

Memory allows agents to store and retrieve information across interactions. This enables agents to maintain context, learn from past actions, and improve performance over time. Memory can include short-term working memory for active tasks, as well as long-term memory stored in vector databases or structured storage systems.

Another critical component of autonomous AI agents is tool integration.

Tools allow agents to interact with external systems such as APIs, databases, cloud services, and enterprise applications. For example, an AI agent can retrieve data from a database, send requests to an API, execute scripts, or update systems automatically.

This ability transforms AI agents from passive conversational tools into active operational systems.

Autonomous agents also operate within execution loops. These loops allow agents to continuously observe their environment, analyze information, execute actions, and evaluate outcomes. This creates a feedback cycle that enables continuous operation.
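A highly simplified sketch of such a loop; the observe, decide, and executeTool functions are placeholders standing in for the reasoning engine, memory, and tool integrations described above:

// Placeholder interfaces for the agent's building blocks.
declare function observe(): Promise<string>; // gather context from the environment
declare function decide(goal: string, context: string, memory: string[]): Promise<{ tool: string; input: string; done: boolean }>;
declare function executeTool(tool: string, input: string): Promise<string>;

async function runAgent(goal: string, maxSteps = 20): Promise<string[]> {
  const memory: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const context = await observe();
    const action = await decide(goal, context, memory);        // reasoning engine chooses the next action
    if (action.done) break;                                     // goal reached, exit the loop
    const outcome = await executeTool(action.tool, action.input);
    memory.push(`step ${step}: ${action.tool} -> ${outcome}`);  // feedback cycle: outcomes inform the next decision
  }
  return memory;
}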

This architecture enables agents to perform complex tasks such as:

  • Monitoring infrastructure and resolving performance issues

  • Analyzing business data and generating reports

  • Automating customer support workflows

  • Managing operational processes

  • Executing multi-step workflows across multiple systems

This capability fundamentally changes how software systems operate.

Instead of requiring humans to constantly monitor systems and execute tasks manually, organizations can deploy autonomous agents that perform these tasks continuously.

This has profound implications for businesses.

Organizations can operate more efficiently by reducing manual operational work. Engineers can focus on building new systems instead of maintaining existing ones. Founders can scale operations without increasing operational overhead.

For developers, this introduces a new software paradigm.

Instead of building static applications that execute predefined logic, developers are building dynamic systems capable of reasoning, decision-making, and autonomous execution.

This shift is similar in magnitude to the transition from on-premise infrastructure to cloud computing. Developers who understood cloud architecture early gained a significant advantage. The same is true for autonomous agent architecture today.

Autonomous agents are already being deployed across industries.

Technology companies use agents to monitor infrastructure and resolve incidents automatically.

Financial institutions use agents to analyze transactions and detect anomalies.

Customer support systems use agents to handle inquiries and resolve issues.

Marketing systems use agents to optimize campaigns and automate workflows.

This trend is accelerating rapidly as AI models become more capable and infrastructure becomes more accessible.

Understanding how autonomous AI agents work is becoming a foundational skill for modern software professionals.

However, building reliable autonomous agents requires understanding architectural patterns, memory systems, tool integration, and execution frameworks.

A complete, implementation-focused guide explaining how autonomous AI agents are designed and deployed in enterprise environments is available here:

https://gofortool.com/en/books/the-agentic-enterprise/

This guide explains real-world architecture patterns, system design strategies, and implementation approaches used by modern organizations.

As AI continues to evolve, autonomous agents will become a core component of software systems. Developers, founders, and organizations that understand and adopt this architecture early will be better positioned to build scalable, intelligent, and efficient systems.

The transition from static software to autonomous systems is already underway. Understanding this shift today provides a significant advantage for the future.

Introducing TeamCity’s New Design, Phase II: Creation Flow

This is the second part in a series that dives into how and why we’re redesigning TeamCity. In Part One, we shared navigational and admin changes. In Part Two, we’ll dive deeper into the creation flow redesign. We’ll also introduce you to the new UI and go into detail about the steps we’re taking to revamp TeamCity.

Introducing a more cohesive UI

We’re reimagining TeamCity’s design to meet the expectations of today’s developers. The goal is simple: help teams get from setup to their first successful build faster and with less friction.

Today, creating a first build can take more than 20 clicks, and only a fraction of users explore advanced features. By rethinking this experience, we’re making TeamCity more approachable for new users and more efficient for experts.

The creation flow is a key part of how people experience a product. It’s often the first thing users see, and it helps determine whether the product feels easy and welcoming or confusing and heavy.

A well-designed flow helps people get started without overthinking and guides them smoothly so they can focus on what they want to do, not how to do it.

Concept exploration

As outlined in Part One, our product interviews uncovered a recurring pain point: new users often get stuck when creating projects in TeamCity. The flow wasn’t intuitive, and creating or reusing connections was more complex than it should have been.

Our mission became simple: to make starting a new project effortless, straightforward, and enjoyable.

The most pressing problems were:

  1. A hidden entry point
  2. A cluttered UI with broken informational hierarchy
  3. Missing functionality that required the user to find workarounds

One major factor to keep in mind from the beginning is that we are bringing pipelines to TeamCity. So, the first challenge was to clearly communicate the difference between pipelines and build configurations and help users understand the distinct value each provides.

The second goal was to remove clutter and unnecessary information, guiding the user and displaying relevant settings when needed. TeamCity is one of the strongest CI tools on the market, but many of its strengths are hidden deep within the product.

During interview sessions, users mentioned pain points that had already been solved by functionality they were simply not aware of, such as templates or VCS reuse.

We started by drawing up a flow chart of what the new step-by-step process might look like:

Concept → Prototype → Action

The design underwent multiple iterations and rounds of guerrilla testing before being handed over to the first client for evaluation. We conducted UX prototype testing with 10 clients, iterating after each session to refine and develop the mockups.

Once the design was validated, we worked closely with the engineering team to review all existing and new scenarios, ensuring complete coverage. Finally, we structured the delivery into iterations – and the first version is now live for users to explore.

Before, concept, and after views of the creation flow

Features

Separate flow for project creation

In the old UI, users were never sure what would happen after triggering the creation process – would it create a project, a build configuration, or both? Separating them into distinct flows brought much-needed clarity and predictability.

Create projects from an existing repository URL

During the Journey Map study, we discovered the workarounds that users employed when attaching the VCS root. To streamline this process, we’ve added the option to create a project straight from a VCS root.

Easier VCS integration setup

Connecting TeamCity to your version control system is now simpler than ever. We’ve introduced a new connection interface that guides you through linking your GitHub, GitLab, or Bitbucket account before creating a VCS root.

Once the integration is set up, TeamCity can automatically find your repositories by name and help you configure the VCS root with just a few clicks.

We have added a setting to configure a new VCS connection in the creation flow. Setting up the connection speeds up onboarding and lets more of TeamCity’s functionality work for the user.

Create from template

Templates have always been one of TeamCity’s hidden gems. They simplify setup, reduce repetition, and make managing builds easier.

In the new design, we’ve made templates a visible part of the setup flow. Instead of digging through menus, users can opt to use a template in the early stages of build creation. This saves time and helps you get to your first successful build faster.

Welcome to the new TeamCity

We’re excited for you to get your hands on the new TeamCity and can’t wait to hear your thoughts!

Please feel free to share them here in the comments, and don’t hesitate to contact our Support team if you have any questions. We’re always here to help!

15 Things To Do Before, During, and After KotlinConf’26

So, you’re coming to KotlinConf’26? Maybe it’s your first time in Munich, or even your first time at KotlinConf. You have your tickets, you’ve learned the schedule by heart, but it might still feel a little overwhelming. Let me show you how you can have the best possible experience by guiding you through all the things you can do.

Believe me, KotlinConf is so much more than a conference!

1. Explore Munich (before things get busy)

If you’re arriving before the conference, try to set aside some time to explore the city. Munich has great food, interesting neighborhoods, and plenty of places to slow down before the schedule gets busy.

Not sure what to see? Keep an eye out for my social media posts, I’ll definitely have some tips for you! 😉

KotlinConf'26: Explore Munich

2. Skip the line. Register early

Standing in line on the first morning, half-awake and carrying a backpack, is not how you want to start KotlinConf.

Come to the venue the day before the conference kicks off on May 20 between 2:00 and 5:00 pm. You’ll avoid large queues, save time, and keep your energy for what really matters. And believe me, you’ll need it.

When you’re already checked in, you can walk in on Day 1, grab a coffee, find your seat, and focus on the talks instead of logistics. It’s a much better way to begin.

Save your spot at KotlinConf’26

3. Start with a workshop

If you want to get even more out of KotlinConf, start with a full day of hands-on workshops on May 20.

  • Build Shared UI With Compose Multiplatform.
  • Go Deeper Into Kotlin Multiplatform Architecture.
  • Master Coroutines and Asynchronous Programming.
  • Build High-Performance Backends With Spring Boot.
  • Create AI Agents in Kotlin.
  • Refactor Toward Functional Kotlin.

Workshops are practical, focused, and led by Kotlin experts. You’ll leave with skills you can apply right away.

Seats are limited, so make sure to save yours before they’re gone!

4. Don’t miss the Keynote!

Join us for the opening Keynote to see the big picture. This is where we set the stage, introduce the year’s most ambitious ideas, and bring the entire community together. There is a specific energy to experiencing these reveals live – don’t settle for the recap.

KotlinConf'25 Keynote

5. Take a selfie with me

If you spot me somewhere around the venue, don’t hesitate, come over and say “hello”. 
A quick selfie, a short chat, a shared laugh in between sessions, whatever, I’m down!
I’m always happy to meet you, hear where you’re from, what you’re working on, and how KotlinConf is going for you. And who knows, you might even end up in one of my posts. Let’s create some memories!

KotlinConf: Take a selfie with Kodee

6. Join the Coding Challenge

Engage in our Coding Challenge. It’s a good way to test yourself, learn something new, and see how you approach problems under a bit of pressure. It gets even more interesting when you are watching others code! And if you win? Well then, you’ve earned the right to brag a little.

7. Wander around the expo 

Don’t be shy, after you’ve met everything Kotlin, go out and meet some of our partners!
This is where companies, communities, and partners set up booths to connect with all of you!

See how Kotlin is used in practice, and talk directly with engineers. Ask some questions and discover new tools, libraries, and platforms. Of course, do some networking and meet other developers. And also, get some swag for yourself and colleagues back home, it’s always nice to have some souvenirs!

KotlinConf'25: Expo

8. Go to the party

On the evening of May 21, put on your fancy clothes (maybe even your dancing shoes) and come to the party.

This is where you can reconnect with friends you haven’t seen in a while, continue conversations that started earlier on in the day, and meet new people along the way. Come for the music, stay for the conversations, and leave with a few more names in your contacts list!

KotlinConf'25: Party

9. Wake up early for Day 2

After a night of partying, I know waking up isn’t always the easiest. But you have another full day of learning ahead, starting from the Day 2 Keynote at 9:00 am, where Lena Reinhard will take the stage to set the tone for the day. Set your alarm, grab a coffee (no worries if you don’t have time for one: The team and plenty of coffee will be waiting for you at the venue), and head down to the conference. You’ll be glad you did. 

10. Learn something new (and eat well)

The main reason you’re here is to learn – new tools, ideas, or ways of thinking.
Take notes, ask questions, and give yourself time to reflect. And don’t forget to eat. Good food keeps you focused and energized, and don’t worry, KotlinConf has plenty of delicious eats on offer!

KotlinConf: Food at the conference

11. See the future! 

Make a wish, write a prediction, and place it in a time capsule. We’re going to open it and reveal what’s inside in 2031, so you’ll have to be patient and wait a bit to see if you were right! 

12. Take part in the games

After all the learning and networking, it’s always good to relax. Use what you’ve just learned and play a game built with Compose Multiplatform for web, with a chance to win a prize. If I were you, I wouldn’t miss this! 

13. Attend the Golden Kodee Awards

The Golden Kodee Awards celebrate individuals and communities who make a real impact by sharing knowledge, organizing events, and inspiring others. This year is special because 2026 marks the very first time these awards are being presented. It’s a chance to recognize outstanding contributors and thank them for everything they do. And let’s be honest, gold looks great on me, don’t you agree?

14. Don’t forget to take a photo with me!

Did I already mention that?

KotlinConf'25: Photos with Kodee

15. Leave feedback and vote

Vote for all the sessions you attended. Leave your comments in the app about the talks. Reach out to us – don’t be shy, your feedback means a lot. We want to provide the best possible experience. The next KotlinConf might be even better than this one!

Save your spot at KotlinConf’26

+ 16. Mark your calendars

Before you head home and dive back into your inbox, take a moment to mark the dates for next year’s KotlinConf. It’s a small step, but it makes sure you won’t miss it once work gets busy.

I already miss you all, so let’s make sure we meet again next year.

Toolbox App 3.3: Introducing jetbrainsd, an Improved Linux Experience, and More

Toolbox App 3.3 introduces jetbrainsd – a lightweight background service that lays the foundation for cross-IDE features like protocol handling. This release also brings significant stability improvements for Linux users, smoother plugin updates, and a number of bug fixes across all platforms.

jetbrainsd

Toolbox App 3.3 ships with the jetbrainsd service – a new lightweight background process that starts automatically when you launch the Toolbox App.

What this means for you:

  • jetbrains:// links now route through the daemon instead of the Toolbox App itself, providing a more reliable experience when opening links from browsers, documentation, or external applications.
  • The service is managed automatically by the Toolbox App – it installs, starts, and updates alongside the Toolbox App.

Read more in the docs. 

Improved Linux experience

Linux users will notice several stability improvements in this release. Highlights include:

  • The Toolbox App widget no longer stays stuck on top of windows after the app restarts or updates on Linux.
  • The Toolbox App no longer shuts down instantly or fails to restart during the update process.
  • If you previously couldn’t log in to the Toolbox App on a fresh GNOME setup due to a missing secret collection, you can now authenticate successfully.
  • The .desktop file now uses relative icon paths instead of absolute ones, improving compatibility with different Linux setups.

Plugin updates without restart

Plugins can now be updated without restarting the application. Previously, after distributing a new plugin version to the plugin directory, you had to restart the Toolbox App to load the updated version. This is no longer necessary, as the Toolbox App will pick up plugin changes automatically.

Remote development fixes

  • The Toolbox App remote agent now properly stops when you select Close and Stop and disconnect via SSH, instead of remaining active in the background.
  • The app no longer crashes with an RpcClient was cancelled error when navigating to SSH remote development targets after restarting your machine.
Download Toolbox App

We’d love to hear your thoughts on Toolbox App 3.3! Your feedback helps us improve the product, so please share your experience in the comments.

The JetBrains Toolbox App team

Hardening the Open VSX Registry: Keeping it reliable at scale


Denis Roy, Head of Information Technology, Eclipse Foundation

As the Open VSX ecosystem continues to grow, keeping the registry stable is a top priority. Behind the scenes, we are strengthening the infrastructure so that even during peak loads or major provider outages, developer workflows remain uninterrupted.

In recent posts, we shared how the Open VSX Registry is strengthening supply-chain security with pre-publish checks and introducing operational guardrails through rate limiting to scale responsibly. As adoption and usage increase, the underlying infrastructure behind those improvements becomes just as important. This post focuses on that work: improving availability, reducing single points of failure, and making recovery faster and more predictable when incidents occur.

A hybrid, fail-safe architecture

We are currently transitioning to a hybrid infrastructure model, moving core services to AWS as our primary environment, while keeping our on-premise infrastructure fully operational as a secondary site.

This is deliberate architectural diversity. AWS provides scale and flexibility. Our on-premise environment provides an independent fallback. If a cloud region experiences an outage, services can shift to infrastructure under our direct control.

The objective is simple: keep the registry online even when part of the underlying environment is not.

High-availability storage

Compute alone does not keep a registry running. The data must be available wherever the service is active.

As part of our infrastructure improvement plan, we are adding a dedicated fallback storage cluster and synchronizing extension binaries and metadata across locations. This reduces reliance on any single storage layer and prevents situations where one environment is healthy but lacks the data it needs. 

If one storage layer becomes unreachable, the other is ready to step in.

Seeing issues before they become outages

Reducing downtime starts with visibility.

We are modernizing our observability stack across both cloud and on-prem environments, strengthening monitoring, centralized logging, and real-time alerting. This makes it easier to detect slowdowns, rising error rates, or unusual traffic patterns before they impact users.

Earlier detection leads to faster resolution and fewer user-visible incidents.

Faster recovery through clearer process

Technology improves reliability. Process makes it consistent.

We are formalizing incident response and recovery procedures for our multi-site architecture. Updated runbooks and rehearsed failover scenarios reduce mean time to recovery and remove uncertainty during high-pressure events.

When something does go wrong, clarity and speed make all the difference.

Why this work matters

The Open VSX Registry now supports a rapidly expanding ecosystem of developer platforms, CI systems, and AI-enabled tools. Growth brings higher expectations for uptime and reliability.

These infrastructure improvements are a long-term investment in keeping the Open VSX Registry stable, secure, and dependable as it scales.

Security builds trust. Operational guardrails support sustainability. Infrastructure upgrades ensure the service remains available when it matters most.

The Open VSX Registry is shared public infrastructure. Keeping it reliable requires continuous investment, thoughtful architecture, and disciplined operations. This work strengthens the registry so developers, publishers, and platform providers can rely on it with confidence, today and as the ecosystem continues to evolve.

It’s a team effort

This work reflects the effort of many people across the Eclipse Foundation and the broader Open VSX community. From the IT teams to Software Development, Security and beyond, including our community of users, developers, testers and integrators, all have contributed to making Open VSX a world‑class, high‑value extension registry that continues to grow through focused stewardship, open collaboration, and a commitment to empowering developers everywhere.

We also appreciate the collaboration of our cloud and infrastructure partners who continue to support the reliability and performance of the Open VSX Registry.

Denis Roy


Building Skill Align – Part 6 – Project Staffing Assistant (Backend)

I started with the first feature in this project: Project Staffing Assistant.

Project Staffing Assistant helps managers decide which candidates are suitable for a project based on actual project requirements.

I began with the backend, building the intelligence layer in Apex.

The Core Service – SkillEvaluatorService

public with sharing class SkillEvaluatorService 

Two important design decisions:

  • public → Required because LWC will call this Apex class

  • with sharing → Ensures record-level security is respected

I had previously configured roles, OWD, and sharing rules (Refer here).
Using with sharing ensures this evaluation logic follows those configurations.

Apex Sharing Behavior:

  • Apex runs in system context by default. Object-level and field-level permissions are not automatically enforced.

  • with sharing enforces record-level sharing rules only, ensuring queries and DML respect the current user’s access.

  • with sharing does not enforce object or field permissions. You must explicitly handle CRUD/FLS (e.g., WITH SECURITY_ENFORCED or Security.stripInaccessible()).

  • If no sharing keyword is defined, the class inherits sharing from its caller, so behavior may vary depending on how it is invoked.

  • Triggers run in system context. Even if a helper class is marked with sharing, the trigger executes in system mode.

Designing Data Transfer Objects

Instead of returning raw Employee__c or Employee_Skill__c records, I created Data Transfer Objects or DTOs.

DTOs define the structured connection between backend and UI. They wrap only the fields required by the frontend, preventing unnecessary exposure of internal data.

For this feature, the UI needed:

  1. Detailed skill gap information (for manager-level decision making)

  2. Candidate-level summary information

Note: @AuraEnabled is required for the LWC (UI) to access Apex properties and methods.

Skill-Level DTO

public class SkillGapDetail {
    @AuraEnabled public String skillName;
    @AuraEnabled public Integer requiredLevel;
    @AuraEnabled public Integer impact;
}

Represents a single skill gap for a candidate.

Advantages:

  • All evaluation logic runs in Apex, so UI performs no calculations

  • Business logic stays in the backend

  • UI remains lightweight

  • Future logic changes don’t affect frontend code

Candidate-Level DTO

public class CandidateResult {
    @AuraEnabled public String employeeName;
    @AuraEnabled public Decimal gapScore;
    @AuraEnabled public Boolean isProjectReady;
    @AuraEnabled public SkillGapDetail detail;
}

For each evaluated employee, the UI receives:

  • Employee name

  • Final gap score

  • Ready / Not Ready flag

  • Skill Gap Detail

This keeps the response clean and structured.

Entry Point – evaluateProject()

@AuraEnabled
public static List<CandidateResult> evaluateProject(Id projectId, Integer topN)

Responsibilities:

  • Accept a Project Id

  • Evaluate unallocated employees

  • Rank them

  • Return top N candidates

  • Persist evaluation results

Guard Clause

  • Guard clauses help prevent unnecessary processing and avoid unexpected or confusing UI behavior.

if (projectId == null) return new List<CandidateResult>();

If no project is provided, evaluation stops.

Prevents:

  • Null pointer exceptions

  • Unexpected UI errors

  • Wasted governor limits

Load Project Requirements

List<Project_Skill_Requirement__c> reqs = [
    SELECT Skill__c, Required_Level__c,
           Importance__c, Weight__c
    FROM Project_Skill_Requirement__c
    WHERE Project__c = :projectId
];

Each requirement contains:

  • Skill

  • Required Level

  • Importance (Required / Nice-to-have)

  • Weight

After fetching, I converted them into Maps for fast access.

Why Maps?

Governor limits restrict queries per transaction. Querying inside loops risks hitting limits. By storing data in Maps:

  • Avoid repeated SOQL calls

  • Ensure constant-time lookups (O(1))

  • Keep code bulk-safe

Maps are essential in Apex for this reason.

Weighted Impact Formula

This is the heart of the evaluation engine.

I first compute the deficit to rank candidates:

Integer deficit = requiredLevel - employeeLevel;

By itself, this treats all skills equally. To make evaluations more realistic, I introduced weighted scoring:

Integer impact = deficit * importanceMultiplier * weight;

Where:

  • Required skill → multiplier = 2

  • Nice-to-have → multiplier = 1

  • Weight → configurable per skill

From this, I ensured that:

  • Missing a critical skill has higher impact

  • Minor skills don’t disproportionately penalize a candidate

The result is a system that is realistic and flexible rather than rigid.

Effective Level – Making It Smarter

Raw skill levels aren’t always reliable. To improve accuracy, I introduced two adjustments:

  1. Confidence adjustment
  2. Staleness adjustment

1. Confidence Adjustment

Boolean isTrusted = (src == 'Manager-assessed');
Integer confidenceAdjust = isTrusted ? 0 : 1;
Integer afterConfidence = rawLevel - confidenceAdjust;

  • Self-assessed → reduce slightly
  • Manager-assessed → keep unchanged

2. Staleness Adjustment

Date staleCutoff = Date.today().addMonths(-12);

if (lastVerified == null) {
    stalenessAdjust = 2;
} else if (lastVerified <= staleCutoff) {
    stalenessAdjust = 1;
}

  • Never verified → larger reduction
  • Verified >12 months ago → slight reduction

Finally, the effective level is computed as:

Integer effectiveLevel = afterConfidence - stalenessAdjust;
if (effectiveLevel < 0) effectiveLevel = 0;

This makes the evaluation time and credibility aware, preventing outdated or inflated skill ratings from misleading staffing decisions.

Ranking Candidates

results.sort(new CandidateComparator());

Custom comparator:

private class CandidateComparator implements System.Comparator<CandidateResult> {
    public Integer compare(CandidateResult x, CandidateResult y) {
        if (x.gapScore != y.gapScore) {
            return (x.gapScore < y.gapScore) ? -1 : 1;
        }
        return x.employeeName.toLowerCase()
               .compareTo(y.employeeName.toLowerCase());
    }
}

Sorting priority:

  1. Lowest gap score

  2. Alphabetical order as tie-breaker

Using this comparator ensures deterministic sorting, providing consistent results across repeated evaluations.

Project Ready Logic

cr.isProjectReady = (requiredImpact == 0);

If all required skills have zero impact, the candidate is ready.

Nice-to-have gaps don’t block readiness, preventing unnecessary hiring when existing employees are suitable.

Persisting Recommendations

The evaluation results are stored in the Project_Candidate__c object.

A composite key is used to uniquely identify each candidate for a project:

pc.Project_Employee_Key__c =
    String.valueOf(projectId) + '|' + String.valueOf(employeeId);

Note: The Project_Employee_Key__c is a Text field marked Unique and Required.

The records are then saved using:

upsert candidates Project_Employee_Key__c;

Using upsert with this key:

  • Inserts the record if it doesn't exist

  • Updates the record if it already exists

  • Prevents duplicate records

  • Lets a re-evaluation overwrite previous scores
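
Putting it together, the persistence step looks roughly like this (Project__c, Employee__c, Gap_Score__c, and Is_Project_Ready__c are assumed field names for illustration):

List<Project_Candidate__c> candidates = new List<Project_Candidate__c>();

for (CandidateResult cr : results) {
    Project_Candidate__c pc = new Project_Candidate__c();
    pc.Project__c = projectId;                  // assumed lookup field
    pc.Employee__c = cr.employeeId;             // assumed lookup field
    pc.Gap_Score__c = cr.gapScore;              // assumed number field
    pc.Is_Project_Ready__c = cr.isProjectReady; // assumed checkbox field
    pc.Project_Employee_Key__c =
        String.valueOf(projectId) + '|' + String.valueOf(cr.employeeId);
    candidates.add(pc);
}

// External ID upsert: re-running the evaluation updates existing rows
upsert candidates Project_Employee_Key__c;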

[AutoBe] We Built an AI That Writes Full Backend Apps — Then Broke Its 100% Success Rate on Purpose with Weak Local LLMs

TL;DR

  • Github Repository: https://github.com/wrtnlabs/autobe
  • Generated Examples: https://github.com/wrtnlabs/autobe-examples

AutoBe is an open-source AI agent that generates complete backend applications (TypeScript + NestJS + Prisma) from natural language.

  • We adopted Korean SI methodology (no code reuse) and hit 100% compilation + near-100% runtime success
  • Real-world use exposed it as unmaintainable, so we rebuilt everything around modular code generation
  • Success rate cratered to 40% — we clawed it back by:
    • RAG optimization for context management
    • Stress-testing with weak local LLMs (30B, 80B) to discover edge cases
    • Killing the system prompt — replacing prose instructions with strict function calling schemas and validation feedback
  • A 6.75% raw function calling success rate becomes 100% through validation feedback alone
  • With GLM v5 (local LLM), we’re back to 100% compilation success
  • AutoBe is no longer a one-shot prototype builder — it now supports incremental feature addition, removal, and modification on completed projects
  • Runtime success (E2E tests) has not recovered yet — that’s next

1. The Original Success (And Its Hidden Problem)

We achieved 100% compilation success. Every generated application compiled without errors, every E2E test passed, every API returned correct results. By every metric, the system was perfect.

Then we threw it all away and rebuilt from scratch.

AutoBe is an open-source AI agent, developed by Wrtn Technologies, that generates production-ready backend applications from natural language. You describe what you need in a chat interface, and AutoBe produces a complete TypeScript + NestJS + Prisma codebase — database schema, API specification, E2E tests, and fully typed implementation code.

With GLM v5 — a local LLM — we’ve clawed our way back to 100%. Smaller models aren’t there yet. This is the story of why we broke it, and what it took to start recovering.

When we first built AutoBe, we looked at how Korean SI (System Integration) projects are developed — government SI, financial SI, healthcare SI.

Their methodology is strict waterfall, and it enforces one distinctive principle: each API function and test function must be developed completely independently.

This means:

  • No shared utility functions
  • No code reuse between API endpoints
  • Every operation is self-contained

flowchart LR
  subgraph "Original Architecture"
    API1["POST /users"] --> Impl1["Complete Implementation A"]
    API2["GET /users/:id"] --> Impl2["Complete Implementation B"]
    API3["PUT /users/:id"] --> Impl3["Complete Implementation C"]
  end

We considered this the most orthodox, battle-tested approach to backend development — and adopted it wholesale.

And it worked. We achieved 100% compilation success and near-100% runtime success — meaning not only did every generated application compile without errors, but the E2E tests actually passed and the APIs returned correct results.

Each API had its own complete implementation. No dependencies. No shared code. The AI generated each function in isolation, and the compiler validated them independently.

E2E Test Code Example

(Screenshot: generated E2E test results showing all tests passing.)

Every API and test function was written independently. And it worked surprisingly well.

1.1. Why This Methodology Exists

The logic behind this approach isn’t arbitrary. In Korean SI projects:

  • Separation of responsibility: Each developer is accountable for their specific functions
  • Regulatory compliance: Auditors need to trace exactly which code handles which data
  • Conservative stability: Changing shared code risks cascading failures

I once reviewed code written by bank developers. They had a function to format numbers with thousand separators (e.g., 3,000,000) — duplicated identically across dozens of API endpoints.

From their perspective, this was correct: no shared dependencies means no shared risk.

1.2. The Real-World Problem

Then we tried to use AutoBe for actual commercial projects.

Requirements changed.

In a waterfall approach, changing requirements should be handled at the specification phase. But reality doesn’t follow textbooks. Clients change their minds. Market conditions shift. What seemed like a final specification evolves.

And with our “no code reuse” architecture, every small change was amplified across the entire codebase.

“Can you add a created_by field to track who created each record?”

Simple request. But with 50 endpoints that handle record creation, we had to regenerate 50 completely independent implementations. Each one needed the exact same change. Each one had to be validated independently.

It was hell.

But the deeper problem wasn’t just the cost of changes — it was that AutoBe had no concept of maintenance at all. It was a one-shot prototype builder. You described what you wanted, it generated a complete application, and that was it.

Want to add a notification system three weeks later? Start over. Want to remove the comment feature? Start over. Want to change how user permissions work? Start over.

We had built an impressively thorough generation pipeline — requirements analysis, database design, API specification, E2E tests, implementation — but it produced disposable code.

In the real world, software is never finished. Requirements evolve continuously. An AI agent that can’t evolve with them is a toy, not a tool.

We understood why SI development enforces these patterns. But we weren’t building applications for 20-year maintenance cycles with teams of specialized maintainers.

We needed an agent that could grow with a project — and our architecture made that fundamentally impossible.

flowchart
subgraph "Backend Coding Agent"
  coder("Facade Controller")
end
subgraph "Functional Agents"
  coder --"Requirements Analysis"--> analyze("Analyze")
  coder --"ERD"--> database("Database")
  coder --"API Design"--> interface("Interface")
  coder --"Test Codes" --> test("Test")
  coder --"Main Program" --> realize("Realize")
end
subgraph "Compiler Feedback"
  database --"validates" --> prismaCompiler("Prisma Compiler")
  interface --"validates" --> openapiValidator("OpenAPI Validator")
  interface --"generates" --> tsCompiler("TypeScript Compiler")
  test --"validates" --> tsCompiler("TypeScript Compiler")
  realize --"validates" --> tsCompiler("TypeScript Compiler")
end

2. The Decision: Embrace Modularity

We made a radical choice: rebuild AutoBe to generate modular, reusable code — not just for cleaner output, but because modularity is the prerequisite for maintainability.

If the generated code has stable module boundaries, then adding a feature means generating new modules and updating affected ones. Not starting over.

flowchart TB
  subgraph "New Architecture"
    subgraph "Reusable Modules"
      Collector["Collectors<br/>(DTO → Prisma)"]
      Transformer["Transformers<br/>(Prisma → DTO)"]
    end
    subgraph "Operations"
      POST["POST /users"]
      GET["GET /users/:id"]
      PUT["PUT /users/:id"]
    end
    POST --> Collector
    POST --> Transformer
    GET --> Transformer
    PUT --> Collector
    PUT --> Transformer
  end

The new architecture separates concerns into three layers:

  1. Collectors: Transform request DTOs into Prisma create/update inputs
  2. Transformers: Convert Prisma query results back to response DTOs
  3. Operations: Orchestrate business logic using collectors and transformers

When requirements change, you update the collector or transformer once, and all dependent operations automatically get the fix.
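
A simplified sketch of what that layering might look like in the generated TypeScript (entity names, field names, and row shapes are hypothetical; the real output builds on Prisma's generated types):

// Hypothetical DTOs and row shape, standing in for Prisma-generated types.
interface ArticleCreateDto { title: string; body: string; authorId: string; }
interface ArticleDto { id: string; title: string; body: string; createdAt: string; }
interface ArticleRow { id: string; title: string; body: string; created_at: Date; author_id: string; }

// Collector: request DTO -> database create input, shared by every create/update operation.
function collectArticle(dto: ArticleCreateDto) {
  return { title: dto.title, body: dto.body, author_id: dto.authorId };
}

// Transformer: database row -> response DTO, shared by every read operation.
function transformArticle(row: ArticleRow): ArticleDto {
  return { id: row.id, title: row.title, body: row.body, createdAt: row.created_at.toISOString() };
}

// Operations stay thin: adding a created_by field later means touching the
// collector and transformer once instead of regenerating 50 endpoints.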

2.1. The Immediate Consequence

Compilation success dropped to under 40%.

The moment we introduced code dependencies between modules, everything became harder:

  • Circular dependency detection
  • Import ordering validation
  • Type inference across module boundaries
  • Interface compatibility between generated modules

Our AI agents, optimized for isolated function generation, suddenly had to understand relationships: that one module's output must be compatible with another module's input, and that the interfaces between modules must match exactly.

The margin for error vanished.

The self-healing feedback loops we relied on — compiler diagnostics feeding back to AI agents — were overwhelmed by cascading errors. Fix one module, break three others.

3. The Road Back to 100%

We spent months rebuilding. Here’s what it took.

3.1. RAG Optimization for Context Management

The first breakthrough was realizing our AI agents were drowning in context. With modular code, they needed to understand:

  • The database schema
  • All related collectors
  • All related transformers
  • The OpenAPI specification
  • Business requirements

Passing all of this in every prompt was noisy. The AI couldn’t find the relevant information in the sea of context.

Commercial models like GPT-4.1 or Claude could muscle through a bloated context window — their sheer capacity compensated for the noise. Local LLMs couldn’t. A 30B model fed the entire specification would lose track of what it was generating and hallucinate wildly.

We implemented a hybrid RAG system combining vector embeddings (cosine similarity) with BM25 keyword matching. Now, when generating a module, the system retrieves only the relevant requirement sections — not the entire 100-page specification.

Local LLMs that previously failed on anything beyond a toy project started handling complex, multi-entity backends — the same tasks that used to require commercial API calls.
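
As a toy illustration of the retrieval idea (not AutoBe's actual code; the blend weight and the keyword scoring are stand-ins for a real BM25 implementation):

// Toy hybrid retrieval: blend embedding similarity with keyword overlap.
interface Section { id: string; text: string; embedding: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Crude keyword score: fraction of query terms found in the section text.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const haystack = text.toLowerCase();
  const hits = terms.filter((t) => haystack.includes(t)).length;
  return terms.length ? hits / terms.length : 0;
}

// Return only the top-K most relevant requirement sections for the current module.
function retrieve(queryEmbedding: number[], query: string, sections: Section[], topK = 5): Section[] {
  return sections
    .map((s) => ({ s, score: 0.6 * cosine(queryEmbedding, s.embedding) + 0.4 * keywordScore(query, s.text) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((x) => x.s);
}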

3.2. Stress-Testing with Intentionally Weak Models

AutoBe’s core philosophy is not about making smarter prompts or more sophisticated orchestration — it’s about hardening the schemas and feedback loops that surround the LLM.

The AI can hallucinate, misinterpret, or produce malformed output. Our job is to catch every failure mode and feed precise diagnostics back so the next attempt succeeds.

The question was: how do you find edge cases you don’t know exist?

Our answer: use intentionally weak models as stress testers. A strong model like GPT-4.1 papers over ambiguities in your schemas — it guesses what you meant and gets it right. A weak model exposes every gap mercilessly.

We ran two local LLMs against the same generation tasks:

Model | Success Rate | What It Exposed
qwen3-30b-a3b-thinking | ~10% | Fundamental AST schema ambiguities, malformed output structures, missing required fields
qwen3-next-80b-a3b-instruct | ~20% | Subtle type mismatches and edge cases that only surface in complex nested relationships

The ~10% success rate with qwen3-30b-a3b-thinking was the most valuable result. Every failure pointed to a place where our AST schema was ambiguous, our compiler diagnostics were vague, or our validation logic had a blind spot.

Each fix didn’t just help the weak model — it tightened the entire system. When a schema is precise enough that even a 30B model can’t misinterpret it, a strong model will never get it wrong.

This is also why local LLMs matter for cost reasons: discovering these edge cases requires hundreds of generation-compile-diagnose cycles. At cloud API prices, that’s prohibitive.

Running locally, we could iterate relentlessly until every failure mode was catalogued and addressed.

3.3. Killing the System Prompt

We made a counterintuitive decision: minimize the system prompt to almost nothing.

Most AI agent projects pour effort into elaborate system prompts — long, detailed instructions telling the model exactly how to behave. Inevitably, this leads to prohibition rules: “do NOT generate utility functions,” “NEVER use any type,” “do NOT create circular dependencies.”

The problem is that prohibition rules often backfire. When you tell a language model “do not do X,” you’re placing X front and center in its attention. The model now has to represent the forbidden pattern to avoid it — and in practice, this increases the probability of producing exactly what you prohibited.

It’s the “don’t think of a pink elephant” problem, baked into token prediction.

We went the opposite direction. To build an agent that works consistently across different LLMs, we stripped the system prompt down to bare essentials: only the minimum rules and principles, stated with maximum clarity and brevity. No verbose explanations. No prohibition lists.

Instead, we moved the “prompting” into two places where ambiguity doesn’t survive — and where prohibition rules simply aren’t needed:

1. Function calling schemas — strict type definitions with precise annotations on every type and property. A JSON Schema with a well-named field and a clear description is unambiguous in a way that natural language instructions never are.

AutoBe defines dedicated AST types for every generation phase. The AI doesn’t produce raw code — it fills in typed structures that our compilers convert to code:

  • Database schema AST — Prisma models, fields, relations, indexes
  • API specification AST — OpenAPI schemas, endpoints, DTOs
  • Test function AST — E2E test expressions, assertions, random generators

// DTO types: the AI defines request/response schemas from a closed set of AST nodes
export namespace AutoBeOpenApi {
  export type IJsonSchema =
    | IJsonSchema.IConstant
    | IJsonSchema.IBoolean
    | IJsonSchema.IInteger
    | IJsonSchema.INumber
    | IJsonSchema.IString
    | IJsonSchema.IArray
    | IJsonSchema.IObject
    | IJsonSchema.IReference
    | IJsonSchema.IOneOf
    | IJsonSchema.INull;
}

// Test functions: 30+ expression types forming a complete test DSL
export namespace AutoBeTest {
  export type IExpression =
    | IBooleanLiteral   | INumericLiteral    | IStringLiteral
    | IArrayLiteralExpression   | IObjectLiteralExpression
    | ICallExpression   | IArrowFunction     | IBinaryExpression
    | IArrayMapExpression       | IArrayFilterExpression
    | IFormatRandom     | IPatternRandom     | IIntegerRandom
    | IEqualPredicate   | IConditionalPredicate
    | ...  // 30+ variants in total
}

Every variant is a discriminated union with annotated properties. The model can’t produce an invalid shape — the type system physically prevents it, and validation catches anything that slips through.
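
For a sense of what "annotated properties" means in practice, an individual variant might look like this (a simplified sketch, not the exact AutoBe definition):

// Simplified sketch of discriminated-union variants with documented properties.
export namespace AutoBeTestSketch {
  /** String literal expression, e.g. "hello". */
  export interface IStringLiteral {
    /** Discriminator: tells the validator (and the model) which variant this is. */
    type: "stringLiteral";
    /** The literal value to emit into the generated test code. */
    value: string;
  }

  /** Binary expression, e.g. left === right. */
  export interface IBinaryExpression {
    type: "binaryExpression";
    operator: "===" | "!==" | "<" | "<=" | ">" | ">=";
    left: IExpression;
    right: IExpression;
  }

  export type IExpression = IStringLiteral | IBinaryExpression;
}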

2. Validation feedback messages — when the compiler catches an error, the diagnostic message itself becomes the guide. Each message is crafted to tell the model exactly what went wrong and what the correct form looks like.

To put this in perspective: qwen3-coder-next's raw function calling success rate for DTO schema generation is just 15% on a Reddit-scale project. For a shopping mall backend, where the project is larger and more complex, that drops to 6.75%.

That means roughly 93 out of 100 function calls produce invalid output.

Yet the interface phase finishes with 100% success. Every single DTO schema is generated correctly.

Validation feedback turns a 6.75% raw success rate into 100% — not 92%, not 96%, but 100%. Every failed call gets a structured diagnostic — exact file, exact field, exact problem — and the model corrects itself on the next attempt.

This is the loop we hardened by stress-testing with local LLMs: every edge case we discovered became a more precise feedback message, and every more precise message pushed the correction rate higher.
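
Conceptually, the loop looks something like this (a minimal sketch; the LLM call and validator are stand-ins, not AutoBe's actual interfaces):

// Minimal sketch of a validation-feedback retry loop.
type Diagnostic = { path: string; message: string };

interface ValidationResult<T> {
  ok: boolean;
  value?: T;
  diagnostics: Diagnostic[];
}

// Stand-ins for a structured-output LLM call and a schema validator.
declare function callLlm(prompt: string): Promise<unknown>;
declare function validate<T>(raw: unknown): ValidationResult<T>;

async function generateWithFeedback<T>(basePrompt: string, maxAttempts = 5): Promise<T> {
  let prompt = basePrompt;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callLlm(prompt);
    const result = validate<T>(raw);
    if (result.ok && result.value !== undefined) return result.value;

    // Feed exact, structured diagnostics back: which field, what was wrong,
    // and what the correct form looks like.
    const feedback = result.diagnostics.map((d) => `- ${d.path}: ${d.message}`).join("\n");
    prompt = `${basePrompt}\n\nYour previous output was invalid:\n${feedback}\nFix these issues and try again.`;
  }
  throw new Error("Generation failed even with validation feedback");
}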

(Chart: Qwen3-Coder-Next's raw function calling success rate for constructing DTO schemas drops as low as 6.75%, yet validation feedback turns that into a 100% completion rate.)

You could say the system prompt didn’t disappear — it migrated from free-form text into schemas and feedback loops.

The result surprised us. When instructions live in type definitions and validation messages rather than prose, model variance nearly vanishes.

We didn’t need to write different prompts for different models. A type is a type. A schema is a schema. Every model reads them the same way.

How strong is this effect? On more than one occasion, we accidentally shipped agent builds with the system prompt completely missing — no instructions at all, just the bare function calling schemas and validation logic.

Nobody noticed. The output quality was indistinguishable.

That’s when we knew: types and schemas turned out to be the best prompt we ever wrote, and validation feedback turned out to be better guidance than any orchestration logic.

4. The Results

After months of work, here’s where we stand — local LLMs only.

Every model passes all prior phases (requirements analysis, database schema, API specification, E2E tests) with 100% success. The only remaining errors occur in the final realize phase, where the generated code must compile. The scores below show the compilation success rate (error-free functions / total generated functions):

Model | todo | bbs | reddit | shopping
z-ai/glm-5 | ✅ 100 | ✅ 100 | ✅ 100 | ✅ 100
deepseek/deepseek-v3.1-terminus-exacto | ✅ 100 | 🔴 87 | 🟢 99 | ✅ 100
qwen/qwen3-coder-next | ✅ 100 | ✅ 100 | 🟡 96 | 🟡 92
qwen/qwen3-next-80b-a3b-instruct | 🟡 95 | 🟡 94 | 🔴 88 | 🟡 91
qwen/qwen3-30b-a3b-thinking | 🟡 96 | 🟡 90 | 🔴 71 | 🔴 79

To be honest: runtime success has not recovered yet. The original architecture achieved near-100% E2E test pass rates. With the new modular architecture, we’re not there.

Compilation is a necessary condition, not a sufficient one — code that compiles doesn’t guarantee correct business logic. Runtime recovery is our next frontier.

But more importantly, the generated code is now maintainable:

// Before: 50 endpoints × duplicated logic
// After: 1 collector, 1 transformer, 50 thin operations

// When requirements change:
// Before: Modify 50 files
// After: Modify 1 file

4.1. Developer Experience

We felt the difference firsthand when building an administrative organization management system. Requirements changed constantly — not just field additions, but structural changes.

The client restructured the entire department hierarchy from a flat list to a tree. Then they bolted on a multi-level approval workflow that cut across departments. Then they changed permission scopes from role-based to position-based — twice.

With the old architecture, each of those changes would have meant regenerating the entire application from scratch.

With the modular architecture, restructuring the department hierarchy meant regenerating only the modules responsible for department data — every API that consumed them just worked with the updated structure. Adding the approval workflow meant generating new modules without touching existing ones.

The system grew incrementally instead of being rebuilt from zero each time.

4.2. From Prototype Builder to Living Project

There’s another result that doesn’t show up in the benchmark table.

Remember the core problem from Section 1: the old AutoBe was a one-shot prototype builder. Generation was impressive, but the moment you needed to change anything, you started over. That made AutoBe a demo, not a development tool.

With the modular architecture, that limitation is gone. AutoBe now supports incremental development on completed projects:

  • Add a feature: “Add a notification system” → AutoBe generates new notification collectors, transformers, and operations. Existing user, article, and comment modules stay untouched.
  • Remove a feature: “Remove the comment system” → AutoBe removes comment-related modules and updates the operations that referenced them. Everything else remains intact.
  • Modify behavior: “Change permissions from role-based to attribute-based” → AutoBe regenerates the permission modules and the operations that depend on them. The rest of the codebase is unaffected.

This is possible because the generated modules form stable boundaries. Each module has a well-defined interface.

When requirements evolve, AutoBe identifies which modules are affected, regenerates only those, and validates that the updated modules still integrate correctly with the rest.

The old AutoBe generated code. The new AutoBe maintains code. That’s the difference between a toy and a tool.

5. Lessons Learned

5.1. Success Metrics Can Mislead

We had 100% compilation success. By every metric, the system was working. But metrics don’t capture maintainability. They don’t measure how painful it is to change things.

Choosing to sacrifice a “perfect” metric to solve a real problem was the hardest decision we made.

5.2. Weak Models Are Your Best QA Engineers

Not for production — but for hardening your system. A strong model compensates for your mistakes. A weak model refuses to. Every edge case we discovered with qwen3-30b-a3b-thinking was a gap in our schemas or validation logic that would have silently degraded output quality for all models.

If you’re building an AI agent, test it with the worst model you can find.

5.3. Types Beat Prose

We spent months perfecting system prompts. Then we stripped them to almost nothing and moved the instructions into function calling schemas and validation feedback messages.

The result was better — and model-agnostic. Natural language is ambiguous. Types are not. If you can express a constraint as a type, don’t express it as a sentence.

5.4. RAG Isn’t Just About Retrieval

Our RAG system doesn’t just retrieve documents. It curates context. The AI needs to see the right information at the right time, not everything all at once.

5.5. Modularity Compounds

The short-term cost of modularity (40% success rate, months of rebuilding) was high. But modularity compounds. Each improvement to our compilers, our schemas, our validation logic benefits every module generated from now on.

6. What’s Next

We’re not done. Current goals:

  • 100% runtime success: Compilation success doesn’t guarantee business logic correctness. Runtime recovery is our top priority.
  • Multi-language support: The modular architecture makes this feasible. Collectors and transformers can compile to different target languages.
  • Incremental regeneration: Only regenerate modules affected by requirement changes, not the entire codebase.

7. Conclusion

The journey from 100% → 40% → and climbing back taught us something important: the right architecture matters more than the right numbers.

We could have kept our original success rates. The code would compile. The tests would pass. But every requirement change would be painful, and the generated code would remain disposable — use once, throw away, regenerate from scratch.

The rebuild cost us months and a perfect scorecard.

What it gave us was stronger schemas, model-agnostic validation loops, and an architecture where the agent can grow with a project instead of starting over every time.

We’re not at 100% across all models yet. But the gap is small, the trajectory is clear, and every fix we make to our schemas and validation logic closes it for every model at once.

That’s the power of building on types instead of prompts.

Sometimes you have to break what works to build what’s actually useful.

In the next article, we’ll break down exactly how validation feedback turns a 6.75% raw success rate into 100% — how to design function calling schemas for structures as complex as a compiler’s AST with 30+ node types, and how to build the feedback loops that make even weak models self-correct.

We’ll make it practical enough that you can apply it to your own AI agents.

About AutoBe: AutoBe is an open-source AI agent developed by Wrtn Technologies that generates production-ready backend applications from natural language.

Through strict type schemas, compiler-driven validation, and modular code generation, we’re pushing compilation success toward 100% across all models — while producing maintainable, production-ready code.

https://github.com/wrtnlabs/autobe