Ujorm3: A New Lightweight ORM for JavaBeans and Records

“Do the simplest thing that could possibly work.”
— Kent Beck, creator of Extreme Programming and pioneer of Test-Driven Development.

I believe the Java language architects didn’t exactly hit the mark when designing the API for the original JDBC library for database operations. As a result, a significant number of various libraries and frameworks have emerged in the Java ecosystem, differing in their approach, level of complexity, and quality. I would like to introduce you to a brand new lightweight ORM library, Ujorm3, which I believe beats its competitors with its simplicity, transparent behavior, and low overhead. The goal of this project is to offer a reliable, safe, efficient, and easy-to-understand tool for working with relational databases without hidden magic and complex abstractions that often complicate both debugging and performance. The first release candidate (RC1) is now available in the Maven Central Repository, released under the free Apache License 2.0.

The library builds on the familiar principles of JDBC but adds a thin layer of a user-friendly API on top of them. It works with clean, stateless objects and native SQL, so the developer has full control over what is actually executed in the database. Ujorm3 deliberately avoids implementing SQL dialects and instead uses native SQL complemented by type-safe tools for mapping database results to Java objects. It does not cache the results of any user queries. To achieve maximum speed, however, Ujorm3 retains certain metadata.

Application API Classes

The core class for database operations is SqlQuery (originally named SqlParamBuilder), which acts as a facade over PreparedStatement. The object supports named parameters for SQL statements, eliminates checked exceptions, and provides the result of a SELECT operation as an efficient Stream<ResultSet>. The mapping of data from a ResultSet to domain objects is then handled by a separate class called ResultSetMapper<DOMAIN>. Its instance prepares the mapping model upon first use and subsequently reuses it, which significantly reduces the overhead when processing a large volume of queries.

Mapping class attributes to database columns can be specified using annotations from the jakarta.persistence package (@Table, @Column, @Id), but the library can infer some properties even without them. Both mutable JavaBeans and immutable Records are fully supported. Ujorm3 only works with M:1 relations—1:M collections are intentionally omitted to prevent the generation of hidden queries and N+1 problems. Relational attributes of a SELECT statement can be mapped using column labels in the format "city.name", or preferably using a type-safe metamodel.

Automatically generated Meta* classes enable safe column mapping without the use of typo-prone text strings. The use of a SELECT statement can then look like this, for example:

static final ResultSetMapper<Employee> EMPLOYEE_MAPPER =
        ResultSetMapper.of(Employee.class);

void select() {
    var sql = """
            SELECT ${COLUMNS}
            FROM employee e
            JOIN city c ON c.id = e.city_id
            LEFT JOIN employee b ON b.id = e.boss_id
            WHERE e.id > :employeeId
            """;

    var employees = SqlQuery.run(connection(), query -> query
            .sql(sql)
            .column("e.id", MetaEmployee.id)
            .column("e.name", MetaEmployee.name)
            .column("c.name", MetaEmployee.city, MetaCity.name)
            .column("c.country_code", MetaEmployee.city, MetaCity.countryCode)
            .column("b.name", MetaEmployee.boss, MetaEmployee.name)
            .bind("employeeId", 0L)
            .streamMap(EMPLOYEE_MAPPER.mapper())
            .toList());
}

Please note that the domain class does not need to be registered anywhere in advance. For efficient work, however, I recommend creating a static mapper, whose implementation is prepared for multithreaded access. The column() method adds a database column with a label to the SQL template at the position of the ${COLUMNS} placeholder. An alternative label() method is also supported, allowing you to explicitly declare only column labels, thereby keeping the SQL query in the Java code closer to its native notation. However, these two approaches cannot be combined in a single query.

The EntityManager is used for working with entities, providing simple CRUD operations—including batch commands—through a Crud object. An interesting feature is the possibility of partial updates—the developer can specify an enumeration of columns to be updated, or pass the original object to the library, from which it will infer the changes itself. The mentioned classes are illustrated in a simplified class diagram. All listed methods are public:

Class diagram

Performance

Ujorm3 achieves very good results in benchmark tests comparing it with several popular ORM libraries. The mechanism for writing values to domain objects also contributes to the good score: instead of the traditional approach based on Java reflection, the library generates and compiles its own classes at runtime. This approach generally reduces memory requirements, minimizes overhead, and saves work for the Garbage Collector. The library has no external dependencies, and the compiled benchmark module (including Ujorm3 itself) is under 3 MB, which is advantageous for microservices and embedded environments. Keep in mind, however, that in a production environment with slower databases, the performance differences may partly even out.

Getting Started

To try the library in your Java 17+ project, simply add the dependency to your Maven configuration:

<dependency>
    <groupId>org.ujorm</groupId>
    <artifactId>ujorm-orm</artifactId>
    <version>3.0.0-RC1</version>
</dependency>

To automatically generate metamodel classes, add the optional APT configuration to the build element:

<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.14.1</version>
        <configuration>
            <annotationProcessorPaths>
                <path>
                    <groupId>org.ujorm</groupId>
                    <artifactId>ujorm-meta-processor</artifactId>
                    <version>3.0.0-RC1</version>
                </path>
            </annotationProcessorPaths>
        </configuration>
    </plugin>
</plugins>

The Ujorm module from the Benchmark project can be used as a template for a sample implementation. The library’s codebase is currently covered by JUnit tests that utilize an in-memory H2 database (in addition to mocked objects). Before releasing the final version, I plan to add integration tests for PostgreSQL, MySQL, Oracle, and MS SQL Server databases.

When to Choose the Ujorm3 Library?

If you are working for a corporate client expecting standards or portability of abstractions between databases, use JPA/Hibernate instead. If you have already found an ORM framework that meets your expectations and needs, stick with it. However, if you are looking for a fast and transparent alternative without hidden mechanisms for your new project, the Ujorm3 library is definitely worth a try.

Useful Links:

  • Project Homepage
  • More Examples as a JUnit Test
  • Benchmark Tests

Reduce Errors and Token Costs in Agents with Semantic Tool Selection

When AI agents have many similar tools, they often select the wrong one and consume excessive tokens processing all of the tool descriptions.

This article demonstrates how semantic tool selection filters tools before agent processing, improving accuracy and reducing token costs. The demo uses Strands Agents and FAISS to filter 29 tools down to the 3 most relevant ones.

This demo uses Strands Agents. Similar patterns can be applied in LangGraph, AutoGen, or other agent frameworks.

Series overview

This is Part 2 of a series on stopping hallucinations in AI agents:

  1. RAG vs Graph-RAG — Knowledge graphs to prevent hallucinations
  2. Semantic tool selection (this article) — Vector-based tool filtering
  3. Multi-agent validation — Team-based hallucination detection
  4. AI Agent Guardrails — Symbolic reasoning enforcement
  5. Runtime Guardrails — Self-correcting controls

Setup

git clone https://github.com/aws-samples/sample-why-agents-fail
cd stop-ai-agent-hallucinations/02-semantic-tools-demo
uv venv && uv pip install -r requirements.txt

The dual problem: errors + token waste

Problem scenario

A travel agent with 29 similar tools (search_hotels, search_flights, search_hotel_reviews, etc.) receives the query: “How much does Hotel Marriott cost?”

The agent may select get_hotel_details() instead of get_hotel_pricing() — an incorrect tool selection. This mistake consumes 4,500 tokens processing the descriptions of all 29 tools.

Root causes

  • Similar tool names cause confusion
  • Generic tools get overused
  • More tools increase the probability of hallucination

Research context

“Tool-selection hallucinations increase with the number of tools. Production systems report an 89% token reduction with semantic tool selection.”

The solution: semantic tool selection with FAISS

Agent failure modes at scale

Research identifies five critical failure modes:

  1. Function selection errors — Calling nonexistent tools
  2. Parameter errors — Malformed arguments
  3. Completeness errors — Missing required parameters
  4. Tool bypass behavior — Generating answers instead of calling tools
  5. Context overflow — Token waste from processing every description

Cost-benefit analysis

  • Every LLM call sends the descriptions of all 29 tools
  • In a 50-step workflow: 29 tools x 50 calls
  • This produces significant token waste and processing delays

Research shows up to 86.4% accuracy in preventing tool-selection hallucinations in production systems.

Demo: three approaches

Test 1: Traditional approach (all 29 tools)

from strands import Agent
# Using OpenAI-compatible interface via Strands SDK (not direct OpenAI usage)
from strands.models.openai import OpenAIModel
from enhanced_tools import ALL_TOOLS

# TESTS (query/expected pairs), PROMPT, and MODEL come from the demo harness
for query, expected in TESTS:
    agent = Agent(tools=ALL_TOOLS, system_prompt=PROMPT, model=MODEL)
    tools, tokens = run_and_capture_with_tokens(agent, query)

Result: ~1,557 tokens on average per query, with variable accuracy.

Test 2: Semantic approach (top-3 filtered tools)

The agent receives only the 3 most relevant tools per query.

Building the FAISS index

import faiss
from sentence_transformers import SentenceTransformer
from enhanced_tools import ALL_TOOLS

model = SentenceTransformer('all-MiniLM-L6-v2')

def build_index(tools):
    """Build FAISS index from tool names and docstrings"""
    texts = [f"{t.__name__}: {t.__doc__}" for t in tools]
    embeddings = model.encode(texts)

    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings.astype('float32'))
    return index

# Module-level state shared by search_tools()
tools = ALL_TOOLS
index = build_index(tools)

def search_tools(query: str, top_k: int = 3):
    """Find most relevant tools using FAISS"""
    emb = model.encode([query])
    _, indices = index.search(emb.astype('float32'), top_k)
    return [tools[i] for i in indices[0]]

The all-MiniLM-L6-v2 model is lightweight (22M parameters, 384 dimensions), optimized for semantic similarity, and runs efficiently on CPUs.

Runtime filtering

from strands import Agent
# Using OpenAI-compatible interface via Strands SDK (not direct OpenAI usage)
from strands.models.openai import OpenAIModel

for query, expected in TESTS:
    selected = search_tools(query, top_k=3)
    selected_names = [t.__name__ for t in selected]

    agent = Agent(tools=selected, system_prompt=PROMPT, model=MODEL)
    tools, tokens = run_and_capture_with_tokens(agent, query)

Result: ~275 tokens on average per query — a reduction from 1,557 tokens.

Key insight: the agent never processes the other 26 tools. They remain in the system but never enter the agent's context window.

Test 3: Semantic + memory (single agent)

For multi-turn conversations in production: keep the agent's memory while swapping tools dynamically.

Dynamic tool swapping

def swap_tools(agent, new_tools):
    """Swap tools in a live agent without losing conversation memory."""
    reg = agent.tool_registry
    reg.registry.clear()
    reg.dynamic_tools.clear()
    for t in new_tools:
        reg.register_tool(t)

Full implementation

from strands import Agent
# Using OpenAI-compatible interface via Strands SDK (not direct OpenAI usage)
from strands.models.openai import OpenAIModel
from enhanced_tools import ALL_TOOLS
from registry import build_index, search_tools, swap_tools

initial_tools = search_tools(TESTS[0][0], top_k=3)
memory_agent = Agent(tools=initial_tools, system_prompt=PROMPT, model=MODEL)

for query, expected in TESTS:
    selected = search_tools(query, top_k=3)
    swap_tools(memory_agent, selected)
    tools, tokens = run_and_capture_with_tokens(memory_agent, query)

Result: More tokens than the semantic-only approach (due to accumulated context), but significantly fewer than the traditional method, while keeping the full conversation history.

Why it works

Strands calls tool_registry.get_all_tools_config() on every event-loop cycle, automatically picking up runtime changes.

Advantages:

  • Zero conversation loss: agent.messages is preserved across tool swaps
  • No need to recreate the agent
  • Runtime flexibility for dynamic tool needs
  • Production-ready for long conversations

Research context and real-world performance

The controlled demo achieved perfect accuracy on 13 queries. However, real production systems with hundreds of tools and ambiguous queries show different results.

“Research shows that in production systems with hundreds of tools, semantic tool selection reaches up to 86.4% accuracy in detecting and preventing tool-selection hallucinations.”

That is a significant error reduction compared to traditional approaches (which drop below 50% accuracy with more than 100 tools), but it remains challenging in domains with overlapping tool semantics.

Next step

Semantic tool selection reduces tool-selection errors and token costs. However, agents can still hallucinate about the success of operations (for example, confirming reservations without processing payments, or ignoring business rules).

Part 3: Multi-agent validation shows how specialized agent teams (Executor -> Validator -> Critic) catch hallucinations before they reach users.

This capability is available natively through Amazon Bedrock AgentCore Gateway.

Key takeaways

  1. Dual problem: Tool-selection errors AND token waste
  2. Significant error reduction: Fewer tools = fewer wrong selections
  3. 89% token savings: Cutting from 29 tools to 3 per call
  4. Simple implementation: ~20 lines of code with FAISS
  5. Strands integration: The @tool decorator and dynamic loading enable semantic filtering with runtime tool swapping

Run it yourself

git clone https://github.com/aws-samples/sample-why-agents-fail
cd stop-ai-agent-hallucinations/02-semantic-tools-demo
uv venv && uv pip install -r requirements.txt
uv run test_semantic_tool_selection.py

You can switch to any provider supported by Strands — see Strands Model Providers for configuration.

References

Research

  • Internal Representations as Indicators of Hallucinations
  • Solving Context Window Overflow — 7x token reduction
  • Semantic Tool Selection in Practice — 89% reduction

Strands Agents

  • Strands Agents Documentation
  • Strands Meta-Tooling
  • Strands Model Providers

Code

  • Code repository

Thanks!


We Gave Our Coding Tutor a Thinking Style Detector — This Is What We Found

By Aman | Team Clarion | Code Mentor AI

“Six of us. Same loop problem. Same wrong answer. And the platform (LeetCode, Scrimba, take your pick) gave us all the same hint. The issue wasn’t just that these platforms forgot our mistakes. It was that they never tried to understand how we think. Code Mentor AI started as our attempt to fix that. My job was building the AI that actually models how a student approaches a problem, not just what they got wrong.”

Beyond Mistake Logging: Understanding Thinking Patterns
Most AI tutoring features focus on outputs: wrong answers, failed tests, error messages. I worked on something earlier in the pipeline: the Thinking Pattern Report, which analyses how a student approaches a problem based on their behaviour in the editor.

Using Monaco editor event listeners, we capture keydown/keyup events, detect pauses longer than 3 seconds, and track deletion-to-keystroke ratios. These raw events get batched and sent to a FastAPI endpoint every few seconds, then aggregated into features that describe thinking style, not just correctness.
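
A minimal sketch of that ingestion endpoint, assuming a simplified event shape (the field and feature names here are illustrative, not our production schema):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class EditorEvent(BaseModel):
    # Illustrative event shape, not the production schema
    type: str              # "keydown", "keyup", or "pause"
    timestamp_ms: int
    is_deletion: bool = False
    pause_ms: int = 0

@app.post("/events/batch")
def ingest_batch(events: list[EditorEvent]) -> dict:
    """Aggregate a batch of raw editor events into thinking-style features."""
    keystrokes = sum(1 for e in events if e.type == "keydown")
    deletions = sum(1 for e in events if e.is_deletion)
    long_pauses = sum(1 for e in events if e.type == "pause" and e.pause_ms > 3000)
    return {
        "keystrokes": keystrokes,
        "deletion_to_keystroke_ratio": deletions / keystrokes if keystrokes else 0.0,
        "long_pauses": long_pauses,  # pauses longer than 3 seconds
    }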

The NLP Pipeline
I designed and built the full NLP pipeline that runs across Code Mentor AI’s AI features. Every mistake classification, every Socratic hint, every skill gap summary runs through Groq’s API. The choice of Groq was deliberate: latency matters in a live coding environment, and Groq’s inference speed made the difference between hints that feel responsive and hints that feel like the system is thinking too hard.

The 120-token cap on hints was a deliberate choice. Longer responses became mini-lectures. Short, pointed questions kept students in problem-solving mode.

Dynamic AI Models
One of the more interesting engineering decisions was making the AI model selection dynamic. Different features have different latency and quality requirements: the Socratic hint needs to be fast and sharp, while the Skill Gap Summary can be slower and more thorough. We built a model routing layer that selects the right Groq model variant per feature based on these requirements, as sketched below.
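
Roughly what such a routing layer can look like; the model names and token budgets below are illustrative assumptions, not our exact production mapping:

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Hypothetical feature -> model routes: a fast variant for live hints,
# a larger variant for slower, more thorough summaries.
MODEL_ROUTES = {
    "socratic_hint": {"model": "llama-3.1-8b-instant", "max_tokens": 120},
    "skill_gap_summary": {"model": "llama-3.3-70b-versatile", "max_tokens": 512},
}

def run_feature(feature: str, prompt: str) -> str:
    """Route a request to the Groq model variant configured for this feature."""
    route = MODEL_ROUTES[feature]
    response = client.chat.completions.create(
        model=route["model"],
        max_tokens=route["max_tokens"],  # the 120-token cap keeps hints short
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content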

The Thinking Pattern That Surprised Us
When we ran the Thinking Pattern Report on simulated sessions, the most common pattern wasn’t ‘trial-and-error’ as we expected. It was what we called ‘hesitant-systematic’: students who paused long before writing anything, wrote carefully, but then deleted large chunks when they hit an error. That pattern correlated strongly with logic errors, not syntax errors. We hadn’t expected a thinking style to map so cleanly to a mistake type.

The Lesson
Groq’s speed changed what was possible. We had initially planned to run hint generation asynchronously — show the hint after a delay. With Groq, we could make it feel synchronous. Never underestimate how much latency shapes user behaviour in a learning product.

The Hindsight agent memory layer is what made past mistake context available to every Groq call. Without Hindsight, each LLM call would be stateless. With it, every call is grounded in the student’s real history.

Team Clarion
• Aanchal & Pranati — Backend architecture & database
• Lakshay — Full frontend integration with Next.js
• Aman, Kinjal & Priyanshu — Dynamic AI models & AI assistance features

How to Generate an Audit Trail for AI Agent Actions (With Visual Proof)

You’ve deployed an AI agent to handle customer refunds. It works perfectly in testing.

But your compliance officer asks: “How do we prove what the agent actually did in the browser?”

You show them text logs from LangSmith or Langfuse. They’re not satisfied.

Text logs tell you what the agent claimed to do. Visual proof shows what it actually did.

This is the gap between logs and compliance.

The Problem: Text Logs Aren’t Audit Proof

Observability platforms (LangSmith, Langfuse, OpenTelemetry) capture:

  • Agent decisions
  • Tool calls and responses
  • Token usage
  • Latency metrics

But they don’t capture what the agent actually saw or clicked.

Example: Your agent logs say “clicked refund button.” But did it? What was on screen? Did the page load correctly?

For compliance (HIPAA, SOC 2, PCI-DSS, EU AI Act), you need visual evidence.

The Solution: Screenshot After Each Agent Step

Add a screenshot after every agent action:

import os

import anthropic
from pathlib import Path
from datetime import datetime

client = anthropic.Anthropic()

def agent_with_visual_proof(task: str):
    """Run agent and capture screenshot proof after each step."""

    audit_trail = {
        "task": task,
        "timestamp": datetime.now().isoformat(),
        "steps": []
    }

    # Define tools with screenshot capture
    tools = [
        {
            "name": "take_screenshot",
            "description": "Capture current page state for audit trail",
            "input_schema": {
                "type": "object",
                "properties": {
                    "url": {"type": "string", "description": "URL to screenshot"},
                    "reason": {"type": "string", "description": "Why this screenshot matters"}
                },
                "required": ["url", "reason"]
            }
        },
        {
            "name": "process_refund",
            "description": "Process customer refund",
            "input_schema": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "amount": {"type": "number"}
                },
                "required": ["order_id", "amount"]
            }
        }
    ]

    messages = [
        {
            "role": "user",
            "content": task
        }
    ]

    step_count = 0

    while True:
        response = client.messages.create(
            model="claude-opus-4-5-20251101",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Check if agent is done
        if response.stop_reason == "end_turn":
            break

        # Process tool calls
        if response.stop_reason == "tool_use":
            step_count += 1

            for content_block in response.content:
                if content_block.type == "tool_use":
                    tool_name = content_block.name
                    tool_input = content_block.input

                    print(f"Step {step_count}: {tool_name}")
                    print(f"  Input: {tool_input}")

                    # Capture screenshot for audit trail
                    if tool_name == "take_screenshot":
                        screenshot_result = capture_screenshot(
                            tool_input["url"],
                            f"step-{step_count}",
                            tool_input["reason"]
                        )
                        tool_result = screenshot_result

                    elif tool_name == "process_refund":
                        # Process refund and capture proof
                        refund_result = {
                            "status": "approved",
                            "order_id": tool_input["order_id"],
                            "amount": tool_input["amount"],
                            "reference": f"REF-{step_count}-{tool_input['order_id']}"
                        }

                        # Screenshot after refund
                        screenshot_path = capture_screenshot(
                            "https://app.example.com/refunds",
                            f"step-{step_count}-refund-proof",
                            f"Refund {refund_result['reference']} processed"
                        )

                        refund_result["proof_screenshot"] = screenshot_path
                        tool_result = refund_result

                    # Record in audit trail
                    audit_trail["steps"].append({
                        "step": step_count,
                        "action": tool_name,
                        "input": tool_input,
                        "result": tool_result,
                        "timestamp": datetime.now().isoformat()
                    })

                    # Add the assistant turn and tool result to the conversation
                    # (assumes a single tool_use block per response, as in this demo)
                    messages.append({
                        "role": "assistant",
                        "content": response.content
                    })

                    messages.append({
                        "role": "user",
                        "content": [
                            {
                                "type": "tool_result",
                                "tool_use_id": content_block.id,
                                "content": str(tool_result)
                            }
                        ]
                    })

    return audit_trail

def capture_screenshot(url: str, step_id: str, reason: str) -> dict:
    """Capture screenshot via PageBolt API."""
    import requests

    response = requests.post(
        "https://api.pagebolt.dev/v1/screenshot",
        headers={
            "Authorization": f"Bearer {os.getenv('PAGEBOLT_API_KEY')}",
            "Content-Type": "application/json"
        },
        json={
            "url": url,
            "format": "png",
            "width": 1280,
            "height": 720,
            "fullPage": True,
            "blockBanners": True
        }
    )

    if response.status_code != 200:
        return {"error": f"Screenshot failed: {response.status_code}"}

    # Save screenshot
    filename = f"audit-trail/{step_id}-{datetime.now().timestamp()}.png"
    Path(filename).parent.mkdir(parents=True, exist_ok=True)

    with open(filename, "wb") as f:
        f.write(response.content)

    return {
        "screenshot_path": filename,
        "reason": reason,
        "url": url
    }

# Run agent with visual audit trail
if __name__ == "__main__":

    audit = agent_with_visual_proof(
        "Process refund for order ORDER-12345 with amount $50"
    )

    # Save audit trail as JSON with screenshot references
    import json
    with open("audit-trail.json", "w") as f:
        json.dump(audit, f, indent=2)

    print(f"Audit trail saved with {len(audit['steps'])} steps")
    for step in audit["steps"]:
        print(f"  Step {step['step']}: {step['action']}{step.get('result', {}).get('screenshot_path', 'N/A')}")

Real Use Case: Autonomous Customer Service Refund

Customer initiates refund request. Agent:

  1. Screenshot initial state — customer data page
  2. Retrieve order details — agent calls order API
  3. Screenshot order confirmation — verify customer info
  4. Process refund — submit refund form
  5. Screenshot refund confirmation — proof of success
  6. Send customer notification — email with refund ID

Each step has:

  • Tool call with input
  • Result (API response or form submission)
  • Screenshot evidence of what happened on screen

This creates a complete visual audit trail for compliance audits.

Compliance Frameworks: What They Require

Framework  | Requirement              | Solution
-----------|--------------------------|----------------------------------------
HIPAA      | Audit logs with evidence | Screenshots of patient data access
SOC 2      | Detailed access logs     | Before/after screenshots of changes
PCI-DSS    | Transaction proof        | Screenshots of payment processing
EU AI Act  | Decision transparency    | Screenshots of agent actions/reasoning
GDPR       | Data handling proof      | Screenshots of data deletion/handling

Visual proof satisfies all of them.

Architecture: Visual Audit Trail System

┌─────────────────────────────────────────────────────────┐
│ AI Agent (Claude)                                       │
│ ├─ Observability (LangSmith/Langfuse)                 │
│ ├─ Text logs: "clicked refund button"                 │
│ └─ Visual proof: screenshot after each step            │
└──────────┬──────────────────────────────────────────────┘
           │
           ├─ Store logs in observability platform
           │
           └─ Capture screenshots via PageBolt
              ├─ Screenshot after tool calls
              ├─ Screenshot after form submissions
              └─ Screenshot after navigation
                 │
                 ▼
           ┌──────────────────────────┐
           │ Audit Trail Storage      │
           ├─ step-1.png             │
           ├─ step-2.png             │
           ├─ step-3.png             │
           └─ audit-trail.json       │
                 │
                 ▼
           ┌──────────────────────────┐
           │ Compliance Report        │
           │ - Text logs              │
           │ - Screenshots            │
           │ - Timeline               │
           │ - Decision points        │
           └──────────────────────────┘

Generating Compliance Reports

def generate_audit_report(audit_trail: dict) -> str:
    """Generate HTML report with screenshots for auditors."""

    html = f"""
    <html>
    <head><title>AI Agent Audit Trail</title></head>
    <body>
        <h1>Audit Trail Report</h1>
        <p><strong>Task:</strong> {audit_trail['task']}</p>
        <p><strong>Timestamp:</strong> {audit_trail['timestamp']}</p>

        <h2>Agent Actions</h2>
    """

    for step in audit_trail["steps"]:
        html += f"""
        <div style="border: 1px solid #ccc; margin: 20px 0; padding: 10px;">
            <h3>Step {step['step']}: {step['action']}</h3>
            <p><strong>Input:</strong> {step['input']}</p>
            <p><strong>Result:</strong> {step['result']}</p>
            <p><strong>Time:</strong> {step['timestamp']}</p>
        """

        if isinstance(step['result'], dict) and 'screenshot_path' in step['result']:
            html += f"""
            <h4>Visual Proof</h4>
            <img src="{step['result']['screenshot_path']}" style="max-width: 100%; border: 1px solid #ddd;">
            """

        html += "</div>"

    html += "</body></html>"
    return html

# Generate report
report_html = generate_audit_report(audit)
with open("audit-report.html", "w") as f:
    f.write(report_html)

print("Audit report generated: audit-report.html")

Pricing

Plan    | Requests/Month | Cost | Best For
--------|----------------|------|---------------------------
Free    | 100            | $0   | Testing, low-volume agents
Starter | 5,000          | $29  | 10–50 agent runs/month
Growth  | 25,000         | $79  | 100–500 agent runs/month
Scale   | 100,000        | $199 | 1000+ agent runs/month

At 5 screenshots per agent run, Starter covers 1,000 agent executions.

Summary

  • ✅ Text logs from LangSmith/Langfuse document agent decisions
  • ✅ Screenshots from PageBolt document agent actions
  • ✅ Together they create compliance-ready audit trails
  • ✅ Visual proof satisfies HIPAA, SOC 2, PCI-DSS, EU AI Act
  • ✅ Generate HTML reports with embedded screenshots
  • ✅ Store alongside observability logs for complete evidence

Get started free: pagebolt.dev — 100 requests/month, no credit card required.

IP address and Subnet

What does an IP address mean?
Every device connected to a computer network has a unique number called an IP address (Internet Protocol address). An IP address works like a postal address: it tells the network exactly where to deliver data, just as a postal address tells the courier where to drop off your package.
There are two versions in use today:

IPv4 is the older and most common format, written as four numbers separated by dots, like 192.168.1.1. Each number ranges from 0 to 255, giving about 4.3 billion (2^32) possible addresses.
IPv6 is the newer format, created because IPv4 addresses are running out. It is written in hexadecimal, like 2001:0db8:85a3:0000:0000:8a2e:0370:7334, and provides a practically inexhaustible supply of addresses (2^128).
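
Python's standard ipaddress module parses both formats, which is handy for quick checks (a small sketch):

import ipaddress

v4 = ipaddress.ip_address("192.168.1.1")
v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
print(v4.version)     # 4
print(v6.version)     # 6
print(v6.compressed)  # 2001:db8:85a3::8a2e:370:7334 (shortened form)
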
Public IP Address vs. Private IP Address

Public IP address:
- Assigned by the Internet Service Provider (ISP)
- Used on the public internet
- Globally unique

Private IP address:
- Used within local networks
- Not directly reachable from the internet

Private IP ranges:

10.0.0.0 to 10.255.255.255
172.16.0.0 to 172.31.255.255
192.168.0.0 to 192.168.255.255
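
The same ipaddress module can check these ranges directly:

import ipaddress

for addr in ("10.1.2.3", "172.20.0.5", "192.168.1.10", "8.8.8.8"):
    ip = ipaddress.ip_address(addr)
    print(addr, "private" if ip.is_private else "public")
# 10.1.2.3 private
# 172.20.0.5 private
# 192.168.1.10 private
# 8.8.8.8 public
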
The Parts of an IPv4 Address
An IPv4 address has 32 bits, grouped into four octets of 8 bits each.

192.168.1.10 → 11000000.10101000.00000001.00001010

Every IP address has two parts:

Network part: This tells you which network the device is on.
Host part: This identifies the exact device on that network.
What is a Subnet?

A Subnet (Subnetwork) is a smaller division of a large network.

Instead of running one large network with thousands of devices, we divide it into smaller, easier-to-manage networks.

The Subnet Mask
A subnet mask is a 32-bit number that tells devices which part of an IP address belongs to the network and which part belongs to the host.
For example:
192.168.1.10 is the IP address.
255.255.255.0 is the subnet mask.
In binary:
IP:   11000000.10101000.00000001.00001010
Mask: 11111111.11111111.11111111.00000000
The 1s in the mask mark the network part.
The 0s in the mask mark the host part.
The first three octets (192.168.1) identify the network, and the last octet (.10) identifies the host.

CIDR Notation
CIDR (Classless Inter-Domain Routing) notation is a shorthand for the subnet mask: it simply counts the number of 1-bits.
192.168.1.0/24 → 255.255.255.0 (24 one-bits)
10.0.0.0/8 → 255.0.0.0 (8 one-bits)
172.16.0.0/16 → 255.255.0.0 (16 one-bits)

Note: Two addresses in every subnet are always reserved — one for the network address and one for the broadcast address — which is why usable hosts = total addresses minus 2.

Network Address — First address of a subnet (not assignable to a host)
Broadcast Address — Last address of a subnet (sends data to all hosts in the subnet)
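
All of the above can be verified with a few lines of Python:

import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")
print(net.netmask)            # 255.255.255.0
print(net.network_address)    # 192.168.1.0   (reserved: network address)
print(net.broadcast_address)  # 192.168.1.255 (reserved: broadcast address)
print(net.num_addresses - 2)  # 254 usable host addresses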

OpenAI Acquires Astral: What It Means for PyCharm Users

On March 19, OpenAI announced that it would acquire Astral, the company behind uv, Ruff, and ty. The Astral team, led by founder Charlie Marsh, will join OpenAI’s Codex team. The deal is subject to regulatory approval.

First and foremost: congratulations to Charlie Marsh and the entire Astral team. They shipped some of the most beloved tools in the Python ecosystem and raised the bar for what developer tooling can be. This acquisition is a reflection of the impact they’ve had.

This is big news for the Python ecosystem, and it matters to us at JetBrains. Here’s our perspective.

What Astral built

In just two years, Astral transformed Python tooling. Their tools now see hundreds of millions of downloads every month, and for good reason:

  • uv is a blazing-fast package and environment manager that unifies functionality from pip, venv, pyenv, pipx, and more into a single tool. With around 124 million monthly downloads, it has quickly become the default choice for many Python developers.
  • Ruff is an extremely fast linter and formatter, written in Rust. For many teams it has replaced flake8, isort, and black entirely.
  • ty is a new type checker for Python. It’s still early, but we’re already working with it in PyCharm, and it’s showing promise.

This is foundational infrastructure that millions of developers rely on every day. We’ve integrated both Ruff and uv into PyCharm because they make Python development substantially better.

The risks are real, but manageable

Change always carries risk, and acquisitions are no exception. The main concern here is straightforward: if Astral’s engineers get reassigned to OpenAI’s more commercial priorities, these tools could stagnate over time.

The good news is that Astral’s tools are open-source under permissive licenses. The community can fork them if it ever comes to that. As Armin Ronacher has noted, uv is “very forkable and maintainable.” There’s no possible future where these tools go backwards.

Both OpenAI and Astral have committed to continued open-source development. We take them at their word, and we hope for the best.

Our commitment hasn’t changed

JetBrains already has great working relationships with both the Astral and the Codex teams. We’ve been integrating Ruff and uv into PyCharm, and we will continue to do so. We’ve submitted some upstream improvements to ty. Regardless of who owns these tools, our commitment to supporting the best Python tooling for our users stays the same. We’ll keep working with whoever maintains them.

The Python ecosystem is stronger because of the work Astral has done. We hope this acquisition amplifies that work, not diminishes it. We’ll be watching closely, and we’ll keep building the best possible experience for Python developers in PyCharm.

The New Role of Data Teams in the Agentic Analytics Era

Last week, in the first part of this series, we explained why two analysts can produce two different answers from the same data, and why that problem gets worse with AI agents. Without a shared semantic layer defining metrics and business logic, AI will generate answers faster, but not more reliably. Today we’ll focus on the practical consequences of this shift.

The data analyst role is changing fast. Data analytics as a discipline won’t disappear, but the center of gravity is shifting.

In the dashboards era, being a good data analyst meant writing queries, building charts, and pulling numbers quickly. In modern times, excelling in the field means defining metrics clearly, building semantic contracts, setting governance and versioning standards, designing guardrails, and controlling a system that delivers reliable and repeatable results.

You won’t be writing the story anymore – you’ll be defining the rules of the universe in which the story takes place. And by doing so, you’ll stop paying the trust tax.

This isn’t about buying a smarter model. It’s about building a foundation strong enough that even a weak model can’t produce a weak outcome.

Because here’s the truth: An AI system is only as trustworthy as the meaning you give it. 

The 2026 must-have agentic analytics stack (if you want to keep your sanity)

To ensure AI-driven analytics are reliable, you need three foundational elements in place:

Metrics as code

Your metric definitions can’t live in someone’s head, a screenshot, or a generic dashboard that everyone uses.

They need to be standardized in code, in a system designed to define and enforce metrics consistently. Examples include dbt’s or Cube’s semantic layer approaches, LookML-style modeling, and similar patterns. The point is the methodology, not the vendor. The business definition must be executable.
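
As an illustration of the idea (a sketch of the concept, not any particular vendor's format), a metric definition can be a small, versioned, testable piece of code:

from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    sql: str       # the executable business definition, not a screenshot
    owner: str
    version: str

# One reviewable definition of "revenue" for the whole company
NET_REVENUE = Metric(
    name="net_revenue",
    sql="SUM(amount) FILTER (WHERE status = 'recognized') - SUM(refund_amount)",
    owner="data-team",
    version="2.1.0",
)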

Git-based everything

If you can’t answer questions like “When did we see a change in revenue?”, “Who’s responsible for this change?”, and “What else did it affect?”, then this isn’t a system you can trust – it’s guesswork.

Put metric definitions in Git, and require every change to go through pull requests and reviews. Yes, it may feel bureaucratic and tedious – until the day it saves you from presenting the wrong numbers to the board.

Hard guardrails

Agents need boundaries. They require real guardrails, not vague instructions like “Please follow the rules.”

Only use approved metrics and joins. If a metric doesn’t exist, don’t invent it! Escalate it to the data team for review. That’s what guardrails are for. They force LLM systems to operate within defined constraints, not improvise.
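
A minimal sketch of such a hard guardrail (the names are illustrative): rather than asking the model to behave, the system rejects anything outside the approved catalog:

APPROVED_METRICS = {"net_revenue", "active_users", "churn_rate"}

class UnknownMetricError(Exception):
    """Raised when an agent proposes a metric outside the approved catalog."""

def validate_metric_request(metric_name: str) -> str:
    # Hard constraint: the agent cannot invent metrics, only escalate
    if metric_name not in APPROVED_METRICS:
        raise UnknownMetricError(
            f"'{metric_name}' is not an approved metric; "
            "escalate to the data team for review."
        )
    return metric_name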

The next model is small teams of agents, not one big chatbot

A pattern is emerging in systems that actually work: not a single agent that does everything, but a small set of agents, each with a clear role, checking one another. Examples of such agents include:

The discovery agent
(Or: “Wait – what do you mean by revenue?”)

This agent speaks with the business user and clarifies intent before touching the data. It asks the questions humans often forget to ask:

“Booked or recognized?”
“Gross or net?”
“Include refunds?”
“Which currency?”
“Which region?”
“Which date field?”
“Month-to-date or full month?”

The semantic layer authoring agent 
(Or: “Find the official meaning.”)

This agent consults the semantic layer and maps the request to approved metrics.

If the metric exists, it selects it. If it doesn’t, it proposes a new metric based on available data and metadata. It never silently invents it. It produces a diff to be reviewed by a human.

The auditor agent 
(Or: “Try to break it.”)

This agent acts as an independent reviewer. It inspects the generated query or metric usage and looks for missing filters, incorrect joins, double counting, time zone errors, or mismatches between the requested and delivered meaning.

In other words, it reads adversarially. This alone can prevent a surprising number of “looks right” failures.

The human-in-the-loop
(Human is still the boss)

And then a human signs off – not on every exploratory question, but on anything that becomes a metric or a shared report.

The workflow becomes simple: AI proposes the semantic layer change, and a data team member approves or rejects it. The system moves faster without losing control.

That’s how you reduce the trust tax without turning your analytics team into full-time babysitters.

Market reality: Text-to-insight is splitting into camps

Today, you can already see the industry dividing into three broad directions:

End-to-end conversational analytics platforms

They aim to do everything: connect data, define meaning, answer questions, and generate insights.

While fast, these systems are risky if you need deep customization or strict governance. And before long, you may find yourself vendor-locked – a risk not to underestimate.

Enterprise BI + AI add-ons

You already have a BI ecosystem, so you bolt AI onto it. This approach works well if your semantic layer is mature. However, it’s painful if your definitions are fragmented.

Headless semantic infrastructure

You build a semantic layer as an engine, then plug in different UIs and different agents. This requires more upfront work, but it gives you control, portability, and the ability to evolve your truth layer without being locked into a single frontend.

If you care about trust at scale, this third path becomes more compelling over time, because it treats meaning as infrastructure, not a feature.

Databao is built on this paradigm, providing the building blocks for modern semantic infrastructure and enabling company-wide self-service analytics.

The big shift to watch: The Open Semantic Interchange (OSI)

A clear signal that the industry has finally acknowledged the problem in semantics is the launch, in September 2025, of the Open Semantic Interchange (OSI) – an initiative led by Snowflake and other industry partners to define a vendor-neutral standard for describing and exchanging semantic models (metrics, dimensions, relationships) across tools.

If OSI succeeds, we’ll move toward a world where your definition of revenue is portable, flowing from BI to agents to warehouses without being rewritten or trapped in a single tool’s private model format.

You can debate standards all day (and people will), but the direction is what’s significant. The meaning layer is becoming a first-class citizen.

This matters because agents aren’t going away, and dashboards are no longer the only consumers of analytics. Agents are consumers too, and they require a shared language even more than humans do.

About Databao

Databao is a new data product from JetBrains that helps data teams create and maintain a shared semantic context and build their own data agents on top of it. Our goal is to provide an AI-native analytics experience that business users can trust, enabling them to query and analyze data in plain language.

Databao’s modular components, the context engine and data agent, can run independently, either locally or within your existing infrastructure, using your own API keys.

We are also inviting data teams to build a proof of concept with us: we’ll explore your use case, define a context-building process, and grant agent access to a selected group of business users. Together, we will then evaluate the quality of responses and the overall value.

TALK TO THE TEAM

KotlinConf’26 Speakers: In Conversation with Josh Long

“There’s never been a better time to be a JVM or Spring developer.”

Josh Long, Spring Developer Advocate

Josh Long is the first Spring Developer Advocate, a role he has held since 2010. Josh is a Java Champion, the author of 7 books (including Reactive Spring) and numerous bestselling video trainings (including Building Microservices with Spring Boot LiveLessons, with Spring Boot co-founder Phil Webb), an open-source contributor (Spring Boot, Spring Integration, Axon, Spring Cloud, Activiti, Vaadin, and others), a YouTuber (Coffee + Software with Josh Long and his Spring Tips series), and a podcaster (A Bootiful Podcast).

The Spring ecosystem has evolved dramatically over the past decade, from traditional enterprise applications to microservices, distributed systems, and now AI-powered services. Few people have witnessed that evolution as closely as Josh Long, who has served as Spring’s first Developer Advocate since 2010.

Ahead of KotlinConf’26, we spoke with Josh about how the Spring community has grown, why Kotlin has become such a natural fit for Spring developers, and why he believes there’s never been a better time to build on the JVM.

Meet Josh Long at KotlinConf’26

Q: You were the first Spring Developer Advocate, starting in 2010. How has the community around Spring changed during that time?

Josh Long: Back then, most of the things people built were basically web applications. Nowadays, there are web services and backend server-side applications, and those applications are expected to do many more things.

So, the use cases that people introduce into their applications have grown. Before, Spring was very narrowly focused on the enterprise server-side world. Today, we talk about microservices, distributed computing systems, batch processing, integration, and all kinds of security.

And now we talk about AI.

These used to be different jobs and different career paths, but today they can all be done with Spring very naturally – quite elegantly in a lot of cases, compared to some of the alternatives.

So, the community has changed accordingly. The kinds of things people are doing have expanded, and the community around that has grown as well.

Put another way, it’s not that the people who were doing things in 2010 stopped doing those things. It’s more that people who were doing other kinds of work joined the community.

You see this represented in open-source projects, in language choices (Kotlin, for example), and across the whole ecosystem. It’s this wonderful open-source diaspora.

The galaxy of things people need to do has grown, and so has the community.

Q: As you mentioned, the Spring and Kotlin teams have worked hard to make sure that Kotlin and Spring Boot are a first-class experience. From your perspective, what makes a language truly first-class within a framework ecosystem?

Josh: Spring is a framework built on top of the JVM. Most of Spring itself is written in Java, because Java was the only language people used when we created Spring back in 2001.

But we’ve always tried to excel at integration. We want Spring to be a well-behaved citizen on top of the JVM and the languages that run on it.

If you’re a Java developer, we want Spring to feel natural and idiomatic. Someone who understands Java should look at Spring code and immediately understand what’s going on.

The same is true for all integrations. Spring works with dozens of libraries and technologies, and we want those integrations to feel coherent and consistent.

The same principle applies to languages.

If we support a language, we want Spring to feel natural for people who already use that language. That’s definitely true for Kotlin.

For the longest time, our goal was simply to be a good citizen on top of these languages. We didn’t expect the languages to adapt to us.

When the relationship between the Spring and Kotlin teams began developing more than ten years ago, we discovered that they were incredibly pragmatic and collaborative. They genuinely wanted Kotlin to work well for Spring developers.

That partnership has been a real honor.

One of my favorite examples is the Kotlin all-open plugin.

In Kotlin, classes are final by default. But frameworks like Spring and Hibernate rely on subclassing.

So normally you’d have to declare everything as open. The Kotlin team solved this by creating a compiler plugin. When you use Spring annotations, the classes are implicitly open behind the scenes.

Developers don’t have to change anything – if you go to start.spring.io, it’s already configured.

It’s a thousand small changes like this that make it clear the language wants to make Spring developers feel comfortable. I feel warm, grateful, and happy thinking about this wonderful teamwork.

Q: When you’re actually building a Spring application in Kotlin, where does it feel noticeably different from building it in Java?

Josh: Spring has DSLs. These DSLs are about as elegant as they can be in Java, but Kotlin has a much more expressive language for designing DSLs. That’s not a controversial thing to suggest – it’s just empirically true.

The Spring team has embraced Kotlin. We actually have Kotlin code in Spring itself. We’ve written parts of Spring in Kotlin.

There are several DSLs that we provide in Java that also have sister DSLs written in Kotlin, and those Kotlin DSLs are much nicer.

For example, Spring Cloud Gateway, functional HTTP routes, and the new BeanRegistrar API in Spring Framework 7. There are lots of them. Spring Security has one as well. They’re everywhere.

It’s just a really nice, elegant little language.

And we’ve essentially done the work of building DSLs twice – once for Java and then again in Kotlin – because we wanted the Kotlin version to feel nice and idiomatic and natural. It feels really good.

Join us at KotlinConf’26

Q: For Kotlin developers who are new to Spring, what’s one misconception they often have, and what’s one feature that usually wins them over? For those who haven’t tried Kotlin yet but are big fans of Spring, why should they give it a shot?

Josh: I imagine the misconceptions those developers might have are the same ones anyone might have.

If you’re using Spring, you certainly don’t have to use just Java. Spring has always tried to embrace different languages.

Kotlin is by far the best story we’ve had there.

People may not realize this, but we had a Spring for Scala project about fifteen years ago. We also tried Groovy. You can still use Groovy today, although I personally never do.

Kotlin is just a really natural fit.

The only language other than Java that we’ve actually added to the Spring Framework codebase itself is Kotlin.

So, I use Kotlin all the time.

Q: You’ve spent years helping developers navigate new technologies. What excites you most right now about building on the JVM?

Josh: First of all, the languages we have on the JVM today are very competitive.

If you’re building something today, languages like Kotlin are just as concise, small, and efficient as many other modern languages. They’re easy to reason about, but they also come with a lot of additional benefits because they run on the JVM.

The JVM itself is incredibly fast and very scalable. It’s one of the few places where you can have a program that is both very small and very fast.

There used to be a kind of trade-off. If you wanted to write something quickly, you used a scripting language like Python or Perl, at least when I first started working. If you wanted performance, you used something like C++.

But now, with languages like Kotlin running on the JVM, you can have both. You can write programs that are as concise as scripting languages while performing close to native languages.

We live in an amazing time.

There’s a second part to this. Because we have this amazing runtime infrastructure and language ecosystem, people are building incredible tools and frameworks on top of it to support new kinds of applications.

For example, I spend a lot of time talking to people about AI. Spring AI is a really nice way to build AI integrations, agentic systems, and integrations with AI models.

If you had told me 10 or 15 years ago that we would be writing five-line Spring applications in Kotlin that talk to AI models and do interesting things, I would have laughed. That would have sounded impossible. The world is very different now. There has also never been a better time to be a JVM developer or a Spring developer.

Q: With rapid growth in AI-driven applications, what does building AI-powered systems on the JVM look like today, and where do Kotlin and Spring play a role?

Josh: I think people are sometimes misguided about AI.

When people talk about AI, they often mix up two very different use cases. 

One involves building and training models. That kind of work often uses tools that don’t really exist on the JVM today.

Building your own models, training them, and doing data science is a rare and small use case compared to what 99% of the ecosystem will be doing, which is integrating these models into their business applications. Most of these models are exposed as REST APIs, which means this is an integration problem.

For most applications, leveraging AI is about integrating those models into existing systems, and the JVM has always been extremely good at that.

That’s why enterprises use it – it can talk to anything.

Today, we have Spring AI, which makes it easier to build these integrations. Of course, there are other ecosystems on the JVM that have their own approaches to building AI-based applications.

There are lots of good options.

But the important thing is that the JVM is not just as good as something like Python or TypeScript for building these systems. In many cases, it’s actually much better.

There was a benchmark that came out recently looking at the performance of Model Context Protocol implementations. The JVM came out on top. Spring Boot and Spring AI had the best performance.

They compared implementations in Go, Python, and TypeScript, and the JVM performed the best.

So it’s not just a question of whether you can do this work on the JVM. In many cases, it’s much more performant. We also have better security and stronger integration with existing systems.

It’s a really big opportunity for developers in this ecosystem.

Another thing people often miss is that many AI projects fail because they don’t integrate properly with existing systems.

There was an MIT study that suggested something like 90% of AI integrations fail.

That’s not surprising – many teams build AI workflows as completely separate systems, often in Python, that don’t integrate well with the rest of their infrastructure.

But if you extend the systems where the business logic already lives, which is often on the JVM, things tend to work much better.

If you extend those existing services with tools like Spring AI and Kotlin, you’ll usually have a much better experience.

So it’s not just about being as good as other ecosystems. In many cases, the JVM is simply better for this kind of work.

As Josh notes, many new technologies, including AI, ultimately come down to how well they integrate with the systems developers already use.

Josh will dive deeper into how Kotlin and Spring Boot work together to create a cleaner, more productive developer experience in his KotlinConf’26 talk “Bootiful Kotlin.”

Don’t miss Josh Long at KotlinConf’26!

Join us at KotlinConf’26