LangChain Python Tutorial: 2026’s Complete Guide

LangChain Python Tutorial

If you’ve read the blog post How to Build Chatbots With LangChain, you may want to know more about LangChain. This blog post will dive deeper into what LangChain offers and guide you through a few more real-world use cases. And even if you haven’t read the first post, you might still find the info in this one helpful for building your next AI agent.

LangChain fundamentals

Let’s have a look at what LangChain is. LangChain provides a standard framework for building AI agents powered by LLMs, like the ones offered by OpenAI, Anthropic, Google, etc., and is therefore the easiest way to get started. LangChain supports most of the commonly used LLMs on the market today.

LangChain is a high-level tool built on LangGraph, which provides a low-level framework for orchestrating the agent and runtime and is suitable for more advanced users. Beginners and those who only need a simple agent build are definitely better off with LangChain.

We’ll start by taking a look at several important components in a LangChain agent build.

Agents

Agents are what we are building. They combine LLMs with tools to create systems that can reason about tasks, decide which tools to use for which steps, analyze intermediate results, and work towards solutions iteratively.

Creating an agent is as simple as using the `create_agent` function with a few parameters:

from langchain.agents import create_agent

agent = create_agent(
    "gpt-5",
    tools=tools
)

In this example, the LLM used is GPT-5 by OpenAI. In most cases, the provider of the LLM can be inferred. To see a list of all supported providers, head over here.
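If the provider can't be inferred, or you'd rather be explicit, you can prefix the model name with the provider. A small sketch (the Anthropic model name is simply reused from an example later in this post):

from langchain.chat_models import init_chat_model

# Explicit "provider:model" strings leave no ambiguity about which provider is used
openai_model = init_chat_model("openai:gpt-5")
anthropic_model = init_chat_model("anthropic:claude-3-5-sonnet-20241022")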

LangChain Models: Static and Dynamic

There are two types of agent models that you can build: static and dynamic. Static models, as the name suggests, are straightforward and more common. The agent is configured in advance during creation and remains unchanged during execution.

import os
from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("gpt-5")
print(model.invoke("What is PyCharm?"))

Dynamic models allow you to build an agent that can switch models during runtime based on customized logic. Different models can then be picked based on the current state and context. For example, we can use ModelFallbackMiddleware (described in the Middleware section below) to have a backup model in case the default one fails.

from langchain.agents import create_agent
from langchain.agents.middleware import ModelFallbackMiddleware

agent = create_agent(
    model="gpt-4o",
    tools=[],
    middleware=[
        ModelFallbackMiddleware(
            "gpt-4o-mini",
            "claude-3-5-sonnet-20241022",
        ),
    ],
)

Tools

Tools are important parts of AI agents. They make AI agents effective at carrying out tasks that involve more than just text as output, which is a fundamental difference between an agent and an LLM. Tools allow agents to interact with external systems – such as APIs, databases, or file systems. Without tools, agents would only be able to provide text output, with no way of performing actions or iteratively working their way toward a result.

LangChain provides decorators for systematically creating tools for your agent, making the whole process more organized and easier to maintain. Here are a couple of examples:

Basic tool

from langchain_core.tools import tool

@tool
def search_db(query: str, limit: int = 10) -> str:
    """Search the customer database for records matching the query."""
    ...
    return f"Found {limit} results for '{query}'"

Tool with a custom name

@tool("pycharm_docs_search", return_direct=False)

def pycharm_docs_search(q: str) -> str:

   """Search the local FAISS index of JetBrains PyCharm documentation and return relevant passages."""

...

   docs = retriever.get_relevant_documents(q)

   return format_docs(docs)

Middleware

Middleware provides ways to define the logic of your agent and customize its behavior. For example, there is middleware that can monitor the agent during runtime, assist with prompting and selecting tools, or even help with advanced use cases like guardrails, etc.

Here are a few examples of built-in middleware. For the full list, please refer to the LangChain middleware documentation.

  • Summarization: Automatically summarizes the conversation history when approaching token limits.
  • Human-in-the-loop: Pauses execution for human approval of tool calls.
  • Context editing: Manages conversation context by trimming or clearing tool uses.
  • PII detection: Detects and handles personally identifiable information (PII).
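To give a rough idea of how a built-in middleware plugs into an agent, here is a minimal sketch using the summarization middleware. The class name comes from the middleware documentation referenced above, but treat the exact constructor parameters as assumptions and check the docs for the real API.

from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware

# Attach summarization middleware so long conversations are compressed
# before they exceed the model's context window. The parameter names below
# are assumptions; consult the middleware documentation for the exact API.
agent = create_agent(
    model="gpt-4o",
    tools=tools,
    middleware=[
        SummarizationMiddleware(
            model="gpt-4o-mini",            # cheaper model used to write the summaries
            max_tokens_before_summary=4000, # assumed name for the summarization threshold
        ),
    ],
)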

Real-world LangChain use cases

LangChain use cases cover a varied range of fields, with common instances including: 

  1. AI-powered chatbots
  2. Document question answering systems
  3. Content generation tools

AI-powered chatbots

When we think of AI agents, we often think of chatbots first. If you’ve read the How to Build Chatbots With LangChain blog post, then you’re already up to speed about this use case. If not, I highly recommend checking it out.

Document question answering systems

Another real-world use case for LangChain is a document question answering system. For example, companies often have internal documents and manuals that are rather long and unwieldy. A document question answering system provides a quick way for employees to find the info they need within the documents, without having to manually read through each one.

To demonstrate, we’ll create a script to index the PyCharm documentation. Then we’ll create an AI agent that can answer questions based on the documents we indexed. First let’s take a look at our tool:

@tool("pycharm_docs_search")

def pycharm_docs_search(q: str) -> str:

   """Search the local FAISS index of JetBrains PyCharm documentation and return relevant passages."""

   # Load vector store and create retriever

   embeddings = OpenAIEmbeddings(

       model=settings.openai_embedding_model, api_key=settings.openai_api_key

   )

   vector_store = FAISS.load_local(

       settings.index_dir, embeddings, allow_dangerous_deserialization=True

   )

   k = 4

   retriever = vector_store.as_retriever(

       search_type="mmr", search_kwargs={"k": k, "fetch_k": max(k * 3, 12)}

   )

   docs = retriever.invoke(q)

We are using a vector store to perform similarity search with embeddings provided by OpenAI. The documentation is embedded ahead of time so that the search tool can fetch the most relevant passages when it is called.
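The indexing script itself is not shown in the excerpt above. A minimal sketch of what it could look like, with assumed paths, chunk sizes, and model names rather than the project's actual settings:

# index_docs.py: hypothetical sketch of building the FAISS index
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the downloaded PyCharm documentation pages from a local folder (assumed path)
loader = DirectoryLoader("docs/pycharm", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Split long pages into overlapping chunks so retrieval stays focused
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(documents)

# Embed the chunks and persist the FAISS index for the search tool to load later
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(chunks, embeddings)
vector_store.save_local("faiss_index")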

def main():
    parser = argparse.ArgumentParser(
        description="Ask PyCharm docs via an Agent (FAISS + GPT-5)"
    )
    parser.add_argument("question", type=str, nargs="+", help="Your question")
    parser.add_argument(
        "--k", type=int, default=6, help="Number of documents to retrieve"
    )
    args = parser.parse_args()
    question = " ".join(args.question)

    system_prompt = """You are a helpful assistant that answers questions about JetBrains PyCharm using the provided tools.
    Always consult the 'pycharm_docs_search' tool to find relevant documentation before answering.
    Cite sources by including the 'Source:' lines from the tool output when useful. If information isn't found, say you don't know."""

    agent = create_agent(
        model=settings.openai_chat_model,
        tools=[pycharm_docs_search],
        system_prompt=system_prompt,
        response_format=ToolStrategy(ResponseFormat),
    )

    result = agent.invoke({"messages": [{"role": "user", "content": question}]})
    print(result["structured_response"].content)

System prompts are provided to the LLM together with the user’s input prompt. We are using OpenAI as the LLM provider in this example, so we’ll need an API key from them. Head to this page to check out OpenAI’s integration documentation. When creating the agent, we configure `model`, `tools`, and `system_prompt`.
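The `ResponseFormat` model passed to `ToolStrategy` comes from the full project and isn’t shown here. As a rough idea, it could be a Pydantic model along the following lines; the field names are assumptions, chosen only to match the `result["structured_response"].content` access above.

from pydantic import BaseModel, Field

class ResponseFormat(BaseModel):
    """Hypothetical structured response schema for the docs-answering agent."""

    content: str = Field(description="The answer to the user's question")
    sources: list[str] = Field(
        default_factory=list,
        description="'Source:' lines copied from the retrieved passages",
    )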

For the full scripts and project, see here.

Content generation tools

Another example is an agent that generates text based on content fetched from other sources. For instance, we might use this when we want to generate marketing content with info taken from documentation. In this example, we’ll pretend we’re doing marketing for Python and creating a newsletter for the latest Python release.

In tools.py, a tool is set up to fetch the relevant information, parse it into a structured format, and extract the necessary information.

@tool("fetch_python_whatsnew", return_direct=False)

def fetch_python_whatsnew() -> str:

   """

   Fetch the latest "What's New in Python" article and return a concise, cleaned

   text payload including the URL and extracted section highlights.

   The tool ignores the input argument.

   """

   index_html = _fetch(BASE_URL)

   latest = _find_latest_entry(index_html)

   if not latest:

       return "Could not determine latest What's New entry from the index page."

   article_html = _fetch(latest.url)

   highlights = _extract_highlights(article_html)

   return f"URL: {latest.url}nVERSION: {latest.version}nn{highlights}"

As for the agent, it is defined in agent.py:

SYSTEM_PROMPT = (
    "You are a senior Product Marketing Manager at the Python Software Foundation. "
    "Task: Draft a clear, engaging release marketing newsletter for end users and developers, "
    "highlighting the most compelling new features, performance improvements, and quality-of-life "
    "changes in the latest Python release.\n\n"
    "Process: Use the tool to fetch the latest 'What's New in Python' page. Read the highlights and craft "
    "a concise newsletter with: (1) an attention-grabbing subject line, (2) a short intro paragraph, "
    "(3) 4–8 bullet points of key features with user benefits, (4) short code snippets only if they add clarity, "
    "(5) a 'How to upgrade' section, and (6) links to official docs/changelog. Keep it accurate and avoid speculation."
)

...

def run_newsletter() -> str:
    load_dotenv()
    agent = create_agent(
        model=os.getenv("OPENAI_MODEL", "gpt-4o"),
        tools=[fetch_python_whatsnew],
        system_prompt=SYSTEM_PROMPT,
        # response_format=ToolStrategy(ResponseFormat),
    )

...

As before, we provide a system prompt and the API key for OpenAI to the agent.

For the full scripts and project, see here.

Advanced LangChain concepts

LangChain’s more advanced features can be extremely useful when you’re building a more sophisticated AI agent. Not all AI agents require these extra elements, but they are commonly used in production. Let’s look at some of them.

MCP adapter

The MCP (Model Context Protocol) allows you to add extra tools or functionalities to an AI agent, making it increasingly popular among active AI agent users and AI enthusiasts alike. 

LangChain’s MCP adapters package (langchain-mcp-adapters) provides a MultiServerMCPClient class that lets the agent connect to one or more MCP servers. For example:

from langchain_mcp_adapters.client import MultiServerMCPClient

client = MultiServerMCPClient(
    {
        "postman-server": {
            "type": "http",
            "url": "https://mcp.eu.postman.com",
            "headers": {
                "Authorization": "Bearer ${input:postman-api-key}"
            }
        }
    }
)

all_tools = await client.get_tools()

The above connects to the Postman MCP server in the EU with an API key.
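Tools returned by the MCP client are regular LangChain tools from the agent’s point of view, so they can be passed straight into `create_agent`. A minimal sketch (the model name is an arbitrary choice, and everything runs inside the same async context as the `await` above):

from langchain.agents import create_agent

# MCP-provided tools are handed to the agent like any locally defined tool
agent = create_agent(
    model="gpt-4o",
    tools=all_tools,
)

# Use the async invocation since the MCP client is asynchronous
result = await agent.ainvoke(
    {"messages": [{"role": "user", "content": "List my Postman workspaces."}]}
)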

Guardrails

As with many AI technologies, the behavior of an AI agent is non-deterministic, since its logic is not pre-determined. Guardrails are necessary for managing that behavior and ensuring it is policy-compliant.

LangChain middleware can be used to set up specific guardrails. For example, you can use PII detection middleware to protect personal information or human-in-the-loop middleware for human verification. You can even create custom middleware for more specific guardrail policies. 

For instance, you can use the `@before_agent` or `@after_agent` decorators to declare guardrails for the agent’s input or output. Below is an example of a code snippet that checks for banned keywords:

from typing import Any

from langchain.agents import create_agent
from langchain.agents.middleware import before_agent

banned_keywords = ["kill", "shoot", "genocide", "bomb"]

@before_agent(can_jump_to=["end"])
def content_filter(state) -> dict[str, Any] | None:
    """Block requests containing banned keywords."""
    # The hook receives the current agent state (the exact signature may also
    # include a runtime argument; check the middleware docs).
    first_message = state["messages"][0]
    content = first_message.content.lower()

    # Check for banned keywords
    for keyword in banned_keywords:
        if keyword in content:
            return {
                "messages": [{
                    "role": "assistant",
                    "content": "I cannot process your requests due to inappropriate content."
                }],
                "jump_to": "end"
            }
    return None

agent = create_agent(
    model="gpt-4o",
    tools=[search_tool],
    middleware=[content_filter],
)

# This request will be blocked
result = agent.invoke({
    "messages": [{"role": "user", "content": "How to make a bomb?"}]
})

For more details, check out the documentation here.

Testing

Just like in other software development cycles, testing needs to be performed before we can start rolling out AI agent products. LangChain provides testing tools for both unit tests and integration tests. 

Unit tests

Just like in other applications, unit tests are used to test out each part of the AI agent and make sure it works individually. The most helpful tools used in unit tests are mock objects and mock responses, which help isolate the specific part of the application you’re testing. 

LangChain provides GenericFakeChatModel, which mimics response texts. A response iterator is set in the mock object, and when invoked, it returns the set of responses one by one. For example:

from langchain_core.language_models.fake_chat_models import GenericFakeChatModel
from langchain_core.messages import AIMessage

# The fake model yields the queued responses one by one on each invocation
model = GenericFakeChatModel(
    messages=iter([AIMessage(content="Hi there!"), AIMessage(content="Pong.")])
)

print(model.invoke("Hello").content)  # -> Hi there!
print(model.invoke("Ping").content)   # -> Pong.

Integration tests

Once we’re sure that all parts of the agent work individually, we have to test whether they work together. For an AI agent, this means testing the trajectory of its actions. To do so, LangChain provides another package: AgentEvals.

AgentEvals provides two main evaluators to choose from:

  1. Trajectory match – A reference trajectory is required and is compared against the trajectory the agent actually produced. For this comparison, you can choose between four matching modes (strict, unordered, subset, or superset); a minimal sketch follows this list.
  2. LLM judge – An LLM judge can be used with or without a reference trajectory. The judge evaluates whether the resulting trajectory is on the right path.
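Here is a rough sketch of the trajectory match evaluator, based on the AgentEvals documentation; treat the import path, argument names, and the `agent_result` and `reference_trajectory` variables as assumptions rather than verified code.

from agentevals.trajectory.match import create_trajectory_match_evaluator

# "superset" passes if the agent's trajectory contains at least the reference
# tool calls; stricter modes also check ordering and forbid extra calls.
evaluator = create_trajectory_match_evaluator(trajectory_match_mode="superset")

result = evaluator(
    outputs=agent_result["messages"],        # messages produced by the agent run
    reference_outputs=reference_trajectory,  # the trajectory we expect to see
)
print(result["score"])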

LangChain support in PyCharm

With LangChain, you can develop an AI agent that suits your needs in no time. However, to use LangChain effectively in your application, you also need a capable debugger. In PyCharm, the AI Agents Debugger plugin powers up your LangChain development experience.

If you don’t yet have PyCharm, you can download it here.

Using the AI Agents Debugger is very straightforward. Once you install the plug-in, it will appear as an icon on the right-hand side of the IDE.

When you click on this icon, a side window will open with text saying that no extra code is needed – just run your agent and traces will be shown automatically.

As an example, we will run the content generation agent that we built above. If you need a custom run configuration, you will have to set it up now by following this guide on custom run configurations in PyCharm.

Once it is done, you can review all the input prompts and output responses at a glance. To inspect the LangGraph, click on the Graph button in the top-right corner.

The LangGraph view is especially useful if you have an agent that has complicated steps or a customized workflow.

Summing up

LangChain is a powerful tool for building AI agents that work for many use cases and scenarios. It’s built on LangGraph, which provides low-level orchestration and runtime customization, as well as compatibility with a vast variety of LLMs on the market. Together, LangChain and LangGraph set a new industry standard for developing AI agents.

Why ecosystem-specific trust frameworks don’t scale across data spaces

As long as an organisation participates in a single data space, ecosystem-specific trust frameworks work reasonably well: rules are defined, compliance is checked, and trust decisions stay inside a bounded context. The challenge begins when organisations need to operate across multiple data spaces at the same time, a scenario that is becoming the norm rather than the exception.

That is the practical reality Christoph Strnadl will address in his talk, Architecture and implementation of a cross-ecosystem trust framework: Concepts & Experiences, at OCX. As CTO of Gaia-X AISBL, and with more than 30 years of experience in IT consulting and technology leadership, he has seen how trust architectures behave once data spaces are required to interoperate.

Many data spaces prioritise speed in their early phases by standardising on a shared software stack. While effective initially, this choice imposes a hard limit later. As Strnadl explains in an interview, 

“The very same software packages you use, which speed up the implementation in the very first phases, may totally limit, if not block, the interoperability with the next, the adjacent dataspace, which happens to use another piece of software.”

In 2026, this limitation becomes critical. Regulatory and industry requirements, such as Digital Product Passports and carbon footprint reporting, are extending trust obligations across entire supply chains. Organisations, particularly small and medium-sized enterprises, are increasingly expected to participate in multiple ecosystems spanning different legal and technical domains.

A common response is to mandate a single reference implementation or compliance engine. Strnadl argues this is a mistake. 

“Standardisation has to be done on the level of protocols and standards, and not on which particular software component everyone has to choose.”

His session at the Open Community Experience 2026 introduces ecosystem trust profiles and a federated trust architecture that allow autonomous data spaces to interoperate without a central authority or shared software stack. Trust remains sovereign, while compliance checking becomes automatable across ecosystems.

Crucially, this approach is already in use. 

“This is operational. This is not, like, fantasy, or theoretical PhD work, or something like that.”

The concepts are being demonstrated in a live, cross-border manufacturing use case involving Europe, Canada, and other non-European actors.

At OCX, Christoph Strnadl will break down the architecture behind ecosystem trust profiles and a federated clearing house model, including what worked and what didn’t, in a real multi-ecosystem deployment. 

If you are designing or integrating data spaces in 2026, register for OCX 26 and attend this session in Brussels to take away a concrete, standards-based interoperability approach you can apply.

 


Daniela Nastase


Designing A Streak System: The UX And Psychology Of Streaks

I’m sure you’ve heard of streaks or used an app with one. But ever wondered why streaks are so popular and powerful? Well, there is the obvious one: apps want as much of your attention as possible. But aside from that, did you know that when the popular learning app Duolingo introduced iOS widgets to display streaks, user commitment surged by 60%? Sixty percent is a massive shift in behaviour and demonstrates how “streak” patterns can be used to increase engagement and drive usage.

At its most basic, a streak is the number of consecutive days that a user completes a specific activity. Some people also define it as a “gamified” habit or a metric designed to encourage consistent usage.

But streaks go beyond being a metric or a record in an app; they are more psychological than that. Human instincts are easy to influence with the right factors. Look at these three factors: progress, pride, and fear of missing out (commonly called FOMO). What do all these have in common? Effort. The more effort you put into something, the more it shapes your identity, and that is how streaks cross into the world of behavioural psychology.

Now, with great power comes great responsibility, and because of that, there’s a dark side to streaks.

In this article, we’ll be going into the psychology, UX, and design principles behind building an effective streak system. We’ll look at (1) why our brains almost instinctively respond to streak activity, (2) how to design streaks in ways that genuinely help users, and (3) the technical work involved in building a streak pattern.

The Psychology Behind Streaks

To design and build an effective streak system, we need to understand how it aligns with how our brains are wired. Like, what makes it so effective that we feel such intense dedication to protecting our streaks?

There are three interesting, well-documented psychology principles that support what makes streaks so powerful and addictive.

Loss Aversion

This is probably the strongest force behind streaks. I say this because most times, you almost can’t avoid this in life.

Think of it this way: If a friend gives you $100, you’d be happy. But if you lost $100 from your wallet, that would hurt way more. The emotional weight of those situations isn’t equal. Loss hurts way more than gain feels good.

Let’s take it further and say that I give you $100 and ask you to play a gamble. There’s a 50% chance you win another $100 and a 50% chance you lose the original $100. Would you take it? I wouldn’t. Most people wouldn’t. That’s loss aversion.

If you think about it, it is logical, it is understandable, it is human.

The concept behind loss aversion is that we feel the pain of losing something twice as much as the pleasure of gaining something of equal value. In psychological terms, loss lingers more than gains do.

You probably see how this relates to streaks. Building a noticeable streak requires effort; as a streak grows, the motivation behind it begins to fade, or, more accurately, becomes secondary.

Here’s an example: Say your friend has a three-day streak closing their “Move Rings” on their Apple Watch. They have almost nothing to lose beyond wanting to achieve their goal and be consistent. At the same time, you have an impressive 219-day streak going. Chances are that you are trapped by the fear of losing it. You most likely aren’t thinking about the achievement at this point; it’s more about protecting your invested effort, and that is loss aversion.

Duolingo explains how loss aversion contributes to a user’s reluctance to break a long streak, even on their laziest days. In a way, a streak can turn into a habit when loss aversion settles in.

The Fogg Behaviour Model (B = MAP)

Now that we understand the fear of losing the effort invested in longer streaks, another question is: What makes us do the thing in the first place, day after day, even before the streak gets big?

That’s what the Fogg Behaviour Model is about. It is relatively simple. A behaviour (B) only occurs when three factors — Motivation (M), Ability (A), and Prompt (P) — align at the same moment. Thus, the equation B=MAP.

If any of these factors, even one, is missing at that moment, the behaviour won’t happen.

So, for a streak system to be efficient and recurring, all three factors must be present:

Motivation
This is fragile and not something that is consistently present. There are days when you’re pumped to learn Spanish, and days you don’t even feel an iota of willpower to learn the language. Motivation by itself to build a habit is unreliable and a losing battle from day one.

Ability
To compensate for the limitations of motivation, ability is critical. In this context, ability means ease of action, i.e., the effort required is so small that it’s unrealistic to claim the action isn’t possible. Most apps intentionally use this. Apple Fitness just needs you to stand for one minute in an hour to earn a tick towards your Stand goal. Duolingo only needs one completed lesson. These tasks do not require all that much effort. The barrier is so low that even on your worst days, you can do it. But the combined effort of an ongoing streak is where the fear of losing that streak kicks in.

Prompt
This is what completes the equation. Humans are naturally forgetful, so yes, ability can get us 90% there, but a prompt reminds us to act. Streaks are persistent by design, so users need to be constantly reminded to act. To see how powerful a prompt can be, look at the A/B test Duolingo ran to check whether a little red badge on the app’s icon increased consistent usage. It produced a 6% increase in daily active users. Just a red badge.

Model Limitations

All this being said, the Fogg model has a limitation: critics and modern research have noted that a design that relies too heavily on prompts, like aggressive notifications, risks creating mental fatigue. Constant notifications can, over time, cause users to churn. So, watch out for that.

The Zeigarnik Effect

How do you feel when you leave a task or project half-done? That irritates many people because unfinished tasks occupy more mental space than the things we complete. When something is done and gone, we tend to forget it. When something is left undone, it tends to weigh on our minds.

This is exactly why digital products use artificial progress indicators, like Upwork’s profile completion bar, to let a user know that their profile is only “60% complete”. It nudges the user to finish what they started.

Let’s look at another example. You have five tasks in a to-do list app, and at the end of the day, you only check four of them as completed. Many of us will feel unaccomplished because of that one unfinished task. That, right there, is the Zeigarnik effect.

The Zeigarnik effect was demonstrated by psychologist Bluma Zeigarnik, who showed that we tend to keep incomplete tasks active in our memory longer than completed tasks.

A streak pattern naturally taps into this in UX design. Let’s say you are on day 63 of a learning streak. At that point, you’re in an ongoing pattern of unfinished business. Your brain would rarely forget about it as it sits in the back of your mind. At this point, your brain becomes the one sending you notifications.

When you put these psychological forces together, you begin to truly understand why streaks aren’t just a regular app feature; they are capable of reshaping human behaviour.

But somewhere along the line — I can’t say exactly when, as it differs for everyone — things reach a point where a streak shifts from “fun” to something you feel you can’t afford to lose. You don’t want 58 days of effort to go to waste, do you? That is what makes a streak system effective. If done right, streaks help users build astounding habits that accomplish a goal. It could be reading daily or hitting the gym consistently.

These repeated actions (sometimes small) compound over time and become evident in our daily lives. But there are two sides to every coin.

The Thin Line Between Habit And Compulsion

If you have been following along, you can already tell there’s a dark side to streak systems. Habit formation is about consistency with a repeated goal. Compulsion, however, is the consistency of working on a goal that is no longer needed but held onto out of fear or pressure. It is a razor-thin line.

You brush your teeth every morning without thinking; it is automatic and instinctive, with a clear goal of having good breath. That’s a streak that forms a good habit. An ethical streak system gives users space to breathe. If, for some reason, you don’t brush in the morning, you can brush at noon. Imperfection is allowed without fear of losing a long effort.

Compulsion takes the opposite route, whereby a streak makes you anxious, you feel guilty or even exhausted, and sometimes, it feels like you haven’t accomplished anything, despite all your work. You act not because you want to, but because you’re subconsciously terrified of seeing your progress reset to zero.

Someone even described this perfectly, “I felt that I was cheating, but simply did not care. I am nothing without my streak”. This shows the extreme hold streaks can have on an individual. To the extent that users begin to tie their self-worth to an arbitrary metric rather than the original goal or reason they started the streak in the first place. The streak becomes who they are, not just what they do.

A well-designed ethical streak system should feel like encouragement to the user, not pressure or obligation. This relates to the balance of intrinsic and extrinsic motivation. Extrinsic motivation (external rewards, avoiding punishment) might get users started, but intrinsic motivation (doing the task for a personal goal like learning Spanish because you genuinely want to communicate with a loved one) is stronger for long-term engagement.

A good system should gravitate towards intrinsic motivation with careful use of extrinsic elements, i.e., remind users of how far they have come, not threaten them with what they might lose. Again, it is a fine line.

A simple test when designing a streak system is to take some time and think about whether your product makes money by selling solutions to the anxiety your product created. If yes, there’s a high chance you are exploiting users.

So the next question becomes: If I choose to use streaks, how do I design them in a way that genuinely helps users achieve their goals?

The UX of Good Streak System Design

I believe this is where most projects either nail an effective streak system or completely mess it up. Let’s go through some UX principles of a good streak design.

Keep It Effortless

You’ve probably heard this before, maybe from books like Atomic Habits, but it’s worth mentioning that one of the easiest ways habits can be formed is by making the action tiny and easy. This is similar to the ability factor we discussed from the Fogg Behaviour Model.

The first rule of any streak design should be making the required action as small as humanly possible while still achieving progress.

If a daily action requires willpower to complete, that action won’t make it past five days. Why? You can’t be motivated five days in a row.

Case in point: If you run a meditation app, you don’t need to make users go through a 20-minute session just to maintain the streak. Try a single minute, maybe even something as small as thirty seconds, instead.

As the saying goes, little drops of water make the mighty ocean. Small efforts compound into big achievements over time. That should be the goal: remove friction, especially when the moment might be difficult. When users are stressed or overwhelmed, let them know that simply showing up, even for a few seconds, counts as effort.

Provide Clear Visual Feedback

Humans are visual by nature. Most times, we need to see something to believe it; there’s a need to visualize things to understand them better and put them into perspective.

This is why streak patterns often use visual elements, like graphs, checkmarks, progress rings, and grids, to visualize effort. Look at GitHub’s contribution graph. It is a simple visualization of consistency. Yet developers breathe it in like oxygen.

The key is not to make a streak system feel abstract. It should feel real and earned. For instance, Duolingo and Apple’s Fitness activity rings use clean animation designs on completion of a streak, and GitHub shows historical data of a user’s consistency over time.

Use Good Timing

I mentioned earlier that humans are generally forgetful by nature, and that prompts can help maintain forward momentum. Without prompts, most new users forget to keep going. Life can get busy, motivation disappears, and things happen. Even long-time users benefit from prompts, though most times they are already locked into the habit loop. Nevertheless, even the most committed person can accidentally miss a day.

Your streak system most definitely needs reminders. The most-used prompts are push notifications, and timing really matters when working with them. The type of app matters, too. Sending a notification at 9 a.m. saying “You haven’t practiced today” is just weird for a learning app, because many people have things to do during the day before they even think about completing a lesson. If we’re talking about a fitness app, though, it is reasonable and maybe even expected to be reminded earlier in the day.

Push notification engagement varies significantly by app category. Fitness apps, for instance, see higher engagement with early morning notifications (7–8 AM), while productivity apps might perform better around midday. The key is to A/B test your app’s timing based on your users’ behaviours rather than assuming things are one-size-fits-all. What works for a meditation app might not work for a coding tracker.

Other prompt methods are red dots on the app icon and even app widgets. Studies vary, but the average person unlocks their device between 50 and 150 times a day. If a user sees a red dot on an app, or a widget showing a current streak, every time they unlock their phone, it increases commitment.

Just don’t overdo it; the prompt should serve as a reminder, not a nag.

Celebrate Milestones

A streak system should try to celebrate milestones to reignite emotions, especially for users deep into a streak.

When a user hits Day 7, Day 30, Day 50, Day 100, Day 365, you should make a big deal out of it. Acknowledge achievements — especially for long-time users.

As we saw earlier, Duolingo figured this out and implemented an animated graphic that celebrates milestones with confetti. Some platforms even give substantial bonus rewards that validate users’ efforts. This can also benefit the app itself, since users tend to share their milestones publicly on social media.

Another benefit is the anticipation that comes before reaching milestones. It isn’t just keeping the streak alive endlessly; users have something to look forward to.

Use Grace Mechanisms

Life is unpredictable. People get distracted. Any good streak system should expect imperfection. One of the biggest psychological threats to a streak system is the hard reset to zero after just a single missed day.

An “ethical” streak system should provide the user with some slack. Let’s say you have a 90-day chess learning streak. You have been consistent for three good months, and one day, your phone dies while traveling, and just like that, 90 becomes 0 — everything, all that effort, is erased, and progress vanishes. The user might be completely devastated. The thought of rebuilding it from scratch is so demoralizing that the effort isn’t worth it. At worst, a user might abandon the app after feeling like a failure.

Consider adding a “grace” mechanism to your streak system:

  • Streak Freeze
    Allow users to intentionally miss a day without penalties.
  • Extra Time
    Allow a few hours (2–3) past the usual deadline before triggering a reset.
  • Decay Models
    Instead of a hard reset, the streak decreases by a small amount, e.g., 10 days is deducted from the streak per missed day.

Use An Encouraging Tone

Let’s compare two messages shown to users when a streak breaks:

  1. “You lost your 42-day streak. Start over.”
  2. “You showed up for 42 days straight. That’s incredible progress! Wanna give it another try?”

Both convey the same information, but the emotional impact is different. The first message would most likely make a user feel demoralized and cause them to quit. The second message celebrates what has already been achieved and gently encourages the user to try again.

Streak Systems Design Challenges

Before we go into the technical specifics of building a streak system, you should be aware of the challenges that you might face. Things can get complicated, as you might expect.

Handling Timezones

There is a reason why handling time and date is among the most difficult concepts developers deal with. There’s formatting, internationalization, and much more to consider.

Let me ask you this: What counts as a day?

We know the world runs on different time zones, and as if that is not enough, some regions have Daylight Saving Time (DST) that happens twice a year. Where do you even begin handling these edge cases? What counts as the “start” of tomorrow?

Some developers try to avoid this by using one central timezone, like UTC. For some users, this would yield correct results, but for others, it could be off by an hour, two hours, or more. This inconsistency ruins the user experience. Users don’t care how you handle time behind the scenes; all they expect is that if they perform a streak action at 11:40 p.m., it registers at that exact time, in their context. You should define “one day” based on the user’s local timezone, not the server time.

Sure, you can take the easy route and reset streaks globally for all users at midnight UTC, but you are very much creating unfairness. Someone in California always has eight extra hours to complete their task compared to someone living in London. That’s an unjust design flaw that punishes certain users because of their location. And what if that person in London is only visiting, completes a task, then returns to another timezone?

One effective solution to all these is to ask users to explicitly set their timezone during onboarding (preferably after first authentication). It’s a good idea to include a subtle note that providing timezone information is only used for the app to accurately track progress, rather than being used as personally identifiable data. And it’s another good idea to make that a changeable setting.

I suggest avoiding hand-rolling timezone logic in an app. Use tried-and-true date libraries, like Moment.js or pytz (Python). There’s no need to reinvent the wheel for something as complex as this.

Missed Days And Edge Cases

Another challenge you should worry about is uncontrollable edge cases like users oversleeping, server downtime, lag, network failures, and so on. Using the idea of grace mechanisms, like the ones we discussed earlier, can help.

A grace window of two hours might help both user and developer, in the sense that users are not rigidly punished for uncontrollable life circumstances. For developers, grace windows are helpful in those uncontrollable moments when the server goes down in the middle of the night.

Above all, never trust the client. Always validate on the server-side. The server should be the single source of truth.

Cheating Prevention

Again, I cannot stress this enough: Make sure to validate everything server-side. Users are humans, and humans might cheat if given the opportunity. It is unavoidable.

You might try:

  • Storing all actions with UTC timestamps.
    The client can send its local time, but the server can immediately convert that to UTC and validate it against the server time. That way, if the client’s timestamp is suspiciously far from the server’s, the system can reject it as an error, and the UI can respond accordingly.
  • Using event-based tracking.
    In other words, store a record of each action with metadata such as the user’s ID, the type of action performed, and the timestamp and timezone. This helps with validation; a minimal sketch follows this list.
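A minimal Python sketch of that server-side check; the drift tolerance, field names, and the `save_event` helper are illustrative assumptions, not a prescribed design.

from datetime import datetime, timezone, timedelta

MAX_CLOCK_DRIFT = timedelta(minutes=5)  # arbitrary tolerance for client clock skew

def validate_event(event: dict) -> bool:
    """Reject events whose client-reported time is suspiciously far from server time."""
    client_time = datetime.fromisoformat(event["client_timestamp"])  # ISO string with offset
    server_time = datetime.now(timezone.utc)

    drift = abs(server_time - client_time.astimezone(timezone.utc))
    if drift > MAX_CLOCK_DRIFT:
        return False  # suspicious timestamp: reject and let the UI respond accordingly

    # Store the action with the server's UTC time as the source of truth
    record = {
        "user_id": event["user_id"],
        "action": event["action"],
        "occurred_at_utc": server_time.isoformat(),
        "client_timezone": event["timezone"],
    }
    save_event(record)  # hypothetical persistence helper
    return True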

Building A Streak System Engine

This isn’t a code tutorial, so I will avoid dumping a bunch of code on you. I’ll keep it practical and describe how a streak system engine generally operates in terms of architecture, flow, and reliability.

Core Architecture

As I’ve said several times, make the server the single source of truth for streak data. The architecture can go something like this on the server:

  • Store each user’s data in a database.
  • Store the current streak count (defaulting to 0) as an integer.
  • Store the timezone preference as an IANA timezone string (captured either implicitly from local timestamps or explicitly by asking the user to select their timezone). For example, “America/New_York”.
  • Handle all the logic that determines whether the streak continues or breaks, with the day boundary calculated relative to the user’s local timezone.

Meanwhile, on the client-side:

  • Display the current streak, normally fetched from the server.
  • Send completed actions to the server as metadata so it can validate whether the user actually performed a qualifying streak action.
  • Provide visual feedback based on the server’s responses.

So, in short, the brain is on the server, and the client is for display purposes and submitting events. This saves you a lot of failures and edge cases, plus makes updates and fixes easier.

The Logical Flow

Let’s walk through how a minimal, efficient streak system engine handles a user completing an action:

  1. The user completes a qualifying streak action.
  2. The client sends an event to the server as metadata. This could be “User X completed action Y at timestamp Z”.
  3. The server receives this event and does basic validation. Is this a real user? Are they authenticated? Is the action valid? Is the timezone consistent?
  4. If this passes, the server retrieves the user’s streak data from the database.
  5. Then, convert the received action timestamp to the user’s local timezone.
  6. Let the server compare the calendar dates (not timestamps) in the user’s local timezone (a minimal sketch of this step follows the list):
    • If it is the same day, then the action is redundant and there is no change in the streak.
    • If it is the next day, then the streak extends and increments by 1.
    • If there is a gap of more than one day, the streak breaks. However, this is where you might apply grace mechanics.
    • If the grace mechanism is missed, then reset the streak to 1.
  7. If you choose to save historical data for milestone achievements, then update variables like “longest streak” or “total active days”.
  8. The server then updates the database and responds to the client. Something like this:
{
  "current_streak": 48,
  "longest_streak": 50,
  "total_active_days": 120,
  "streak_extended": true
}
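To make step 6 concrete, here is a minimal Python sketch of the date comparison and update. The user record fields, the single streak-freeze allowance, and the grace handling are illustrative assumptions, not the only way to do it.

from datetime import datetime
from zoneinfo import ZoneInfo

def apply_streak_action(user: dict, action_utc: datetime) -> dict:
    """Update a user's streak based on the calendar date of the action in their timezone."""
    tz = ZoneInfo(user["timezone"])                 # e.g. "America/New_York"
    action_date = action_utc.astimezone(tz).date()  # compare calendar dates, not timestamps
    last_date = user.get("last_action_date")        # date of the previous qualifying action

    if last_date == action_date:
        pass                                        # same day: redundant action, no change
    elif last_date is not None and (action_date - last_date).days == 1:
        user["current_streak"] += 1                 # next day: extend the streak
    elif last_date is not None and (action_date - last_date).days == 2 and user.get("freezes", 0) > 0:
        user["freezes"] -= 1                        # one missed day covered by a streak freeze
        user["current_streak"] += 1
    else:
        user["current_streak"] = 1                  # first action, or gap too large: reset to 1

    user["last_action_date"] = action_date
    user["longest_streak"] = max(user.get("longest_streak", 0), user["current_streak"])
    return user

This is also the spot where a decay model would slot in: instead of the hard reset in the final branch, subtract a penalty from the current streak.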

As a further measure, the server should either retry or reject and notify the client when anything fails during the process.

Building For Resilience

As mentioned before, users losing a streak due to bugs or server downtime is terrible UX, and users don’t expect to take the fall for it. Thus, your streak system should have safeguards for those scenarios.

If the server is down for maintenance (or whatever reason), consider allowing a temporary window of additional hours to get it fixed so actions can be submitted late and still count. You can also choose to notify users, especially if the situation is capable of affecting an ongoing streak.

Note: Establish an admin backdoor where data can be manually restored. Bugs are inevitable, and some users will call your app out or reach out to support saying their streak broke for a reason they could not control. You should be able to manually restore a streak if, after investigation, the user is right.

Conclusion

One thing remains clear: Streaks are really powerful because of how human psychology works on a fundamental level.

The best streak system out there is the one users don’t think about consciously. It has become a routine with immediate results or visible progress, like brushing your teeth, that turns into a regular habit.

And I’m just gonna say it: Not all products need a streak system. Should you really force consistency just because you want daily active users? The answer may very well be “no”.

5 DataFrame Operations LLMs Handle Better Than Code

There are things I do with DataFrames all the time that pandas was never built for. Filtering by subjective criteria. Joining tables that don’t share a key. Looking up information that only exists on the web. Recently I’ve been using LLMs, and the results have been surprisingly cheap and accurate.

Here are five operations I now handle with LLMs (with working code).

1. Filter by Qualitative Criteria

You have 3,616 job postings and want only the ones that are remote-friendly, senior-level, AND disclose salary. df[df['posting'].str.contains('remote')] matches “No remote work available.”

Cost: $4.24 for 3,616 rows (9.9 minutes)

from everyrow.ops import screen
from pydantic import BaseModel, Field

class JobScreenResult(BaseModel):
    qualifies: bool = Field(description="True if meets ALL criteria")

result = await screen(
    task="""
    A job posting qualifies if it meets ALL THREE criteria:
    1. Remote-friendly: Explicitly allows remote work
    2. Senior-level: Title contains Senior/Staff/Lead/Principal
    3. Salary disclosed: Specific compensation numbers mentioned
    """,
    input=jobs,
    response_model=JobScreenResult,
)

216 of 3,616 passed (6%). Interestingly, the pass rate has climbed from 1.7% in 2020 to 14.5% in 2025 as more companies are offering remote work and disclosing salaries.

Full guide with dataset · See it applied to real job postings: Screening job postings by criteria

2. Classify Rows Into Categories

You need to label 200 job postings into categories (backend, frontend, data, ML/AI, devops, etc.). Keyword matching misses anything that’s not an exact match, but training a classifier is overkill for a one-off task like this.

Cost: $1.74 for 200 rows (2.1 minutes). At scale: ~$9 for 1,000 rows, ~$90 for 10,000.

from everyrow.ops import agent_map
from pydantic import BaseModel, Field
from typing import Literal

class JobClassification(BaseModel):
    category: Literal[
        "backend", "frontend", "fullstack", "data",
        "ml_ai", "devops_sre", "mobile", "security", "other"
    ] = Field(description="Primary role category")
    reasoning: str = Field(description="Why this category was chosen")

result = await agent_map(
    task="Classify this job posting by primary role...",
    input=jobs,
    response_model=JobClassification,
)

The Literal type constrains the LLM to your predefined set, so there’s no post-processing needed. You can add confidence scores and multi-label support by extending the Pydantic model.

Full guide with dataset

3. Add a Column Using Web Research

You have a list of 246 SaaS products and need the annual price of each one’s lowest paid tier. There’s no API for this kind of problem because it requires visiting pricing pages that all present information differently.

Cost: $6.68 for 246 rows (15.7 minutes), 99.6% success rate

from everyrow.ops import agent_map
from pydantic import BaseModel, Field

class PricingInfo(BaseModel):
    lowest_paid_tier_annual_price: float = Field(
        description="Annual price in USD for the lowest paid tier"
    )
    tier_name: str = Field(description="Name of the tier")

result = await agent_map(
    task="""
    Find the pricing for this SaaS product's lowest paid tier.
    Visit the product's pricing page.
    Report the annual price in USD and the tier name.
    """,
    input=df,
    response_model=PricingInfo,
)

Each result comes with a research column showing how the agent found the answer, with citations. For example, Slack’s entry references slack.com/pricing/pro and shows the math: $7.25/month × 12 = $87/year.

Full guide with dataset · See it applied to vendor matching: Matching software vendors to requirements

4. Join DataFrames Without a Shared Key

You have two tables of S&P 500 data — one with company names and market caps, the other with stock tickers and fair values. Without a shared column across both datasets, pd.merge() is useless.

Cost: $1.00 for 438 rows (~30 seconds), 100% accuracy

from everyrow.ops import merge

result = await merge(
    task="Match companies to their stock tickers",
    left_table=companies,   # has: company, price, mkt_cap
    right_table=valuations,  # has: ticker, fair_value
)
# 3M → MMM, Alphabet Inc. → GOOGL, etc.

Under the hood, it uses a cascade: exact match → fuzzy match → LLM reasoning → web search. The results show 99.8% of rows matched via LLM alone. And even with 10% character-level noise (“Alphaeet Iqc.” instead of “Alphabet Inc.”), it hit 100% accuracy at $0.44. I’d much prefer having to manually review the unmatched rows than deal with false positives.

Full guide with dataset · See it applied at scale: LLM-powered merging at scale

5. Rank by a Metric That’s Not in Your Data

You have 300 PyPI packages and want to rank them by days since last release and number of GitHub contributors. This data is on PyPI and GitHub (not in your DataFrame).

Cost: $3.90 for days-since-release, $4.13 for GitHub contributors (300 rows each, ~5 minutes)

from everyrow.ops import rank

result = await rank(
    task="Rank by number of days since the last PyPI release",
    input=packages,
)

The SDK sends a web research agent per row to look up the metric, then ranks by the result. And it works for any metric you can describe in natural language, as long as it’s findable on the web.

Full guide with dataset

Cost Summary

Operation                    Rows    Cost    Time
Filter job postings          3,616   $4.24   9.9 min
Classify into categories     200     $1.74   2.1 min
Web research (pricing)       246     $6.68   15.7 min
Fuzzy join (no key)          438     $1.00   30 sec
Rank by external metric      300     $3.90   4.3 min

All of these are one function call on a pandas DataFrame. The orchestration (batching, parallelism, retries, rate limiting, model selection) is handled by everyrow, an open-source Python SDK. New accounts get $20 in free credit, which covers all five examples above with room to spare.

The full code and datasets for each example are linked above.

How I Built a Side-by-Side Font Comparison Tool (And Accidentally Learned Way Too Much About Browser APIs)

I wanted to let people compare system fonts with Google Fonts.

Sounds simple, right?

It was not simple.

But after a lot of trial, error, and yelling at my browser console, I got it working. Here’s how it works — including the parts that almost made me give up.

The Idea

I built FontPreview because I was tired of guessing how fonts would look with real text. But the thing designers kept asking was: “Can I compare this Google Font with the font already on my computer?”

Turns out, that’s harder than it sounds.

Browsers don’t really want you poking around someone’s system fonts. For good reason — imagine every website you visit getting a list of everything installed on your computer. That’s a privacy nightmare.

But there’s a newer API that lets you do this, if the user says it’s okay.

The Local Font Access API

There’s this thing called the Local Font Access API. It’s relatively new, and not every browser supports it yet (looking at you, Safari). But in Chrome and Edge, you can do this:

const fonts = await window.queryLocalFonts();

That one line of code returns an array of every font installed on the user’s system.

Except — and this is important — the browser will ask the user for permission first. A little popup shows up saying “This site wants to see your fonts.” If the user says no, you get nothing.

This is good. We don’t want random sites scraping font lists without permission.

The Permission Dance

Here’s how I handle it:

function checkSystemFontsPermission() {
  if (!window.queryLocalFonts) {
    showToast('Local Font Access API not supported in this browser');
    useFallbackFonts();
    return;
  }

  // Show the permission modal
  document.getElementById('permissionModal').style.display = 'flex';
}

If the browser doesn’t support the API, I fall back to a list of popular system fonts. Not perfect, but better than nothing.

If it does support it, I show a little modal explaining why I’m asking and what I’ll do with the data. (Spoiler: nothing. I just show them in a dropdown.)

What Happens When They Say Yes

When the user clicks “Allow”, this runs:

async function requestFontPermission() {
  try {
    const fonts = await window.queryLocalFonts();

    // Clean up font names (remove "Regular", "Bold", etc.)
    const fontMap = new Map();

    fonts.forEach(f => {
      let name = f.family;
      const suffixes = [' Regular', ' Bold', ' Italic', ' Light', ' Medium'];

      suffixes.forEach(suffix => {
        if (name.endsWith(suffix)) {
          name = name.substring(0, name.length - suffix.length);
        }
      });

      if (!fontMap.has(name)) {
        fontMap.set(name, {
          family: name,
          fullName: f.family,
          style: f.style,
          weight: f.weight
        });
      }
    });

    // Sort and store
    allSystemFonts = Array.from(fontMap.values()).sort((a, b) => 
      a.family.localeCompare(b.family)
    );

    showToast(`Loaded ${allSystemFonts.length} system fonts`);

  } catch (error) {
    if (error.name === 'NotAllowedError') {
      showToast('Permission denied. Using fallback fonts.');
    } else {
      showToast('Error loading fonts. Using fallback.');
    }
    useFallbackFonts();
  }
}

The Map thing is important. A lot of fonts come back with multiple entries for different styles — “Arial Regular”, “Arial Bold”, “Arial Italic”. I just want “Arial” once. The Map deduplicates them.

The Big Problem I Didn’t Expect

Once I had the font list, I needed to actually use those fonts in the preview.

In CSS, you can just do:

font-family: 'Arial', sans-serif;

And if the user has Arial installed, it works. Great.

But here’s the thing — I also wanted to let users compare system fonts with Google Fonts side by side. So if someone picks Arial on the left and Roboto on the right, I need to load Roboto from Google Fonts.

That means dynamically injecting a <link> tag:

function loadGoogleFontForPanel(fontName, panel) {
  const fontFamily = fontName.replace(/ /g, '+');

  const linkId = `panel-font-${panel}`;
  const oldLink = document.getElementById(linkId);
  if (oldLink) oldLink.remove();

  const link = document.createElement('link');
  link.id = linkId;
  link.rel = 'stylesheet';
  link.href = `https://fonts.googleapis.com/css2?family=${fontFamily}&display=swap`;

  document.head.appendChild(link);
}

This way, each panel can load its own font independently. Left panel can load a system font (which doesn’t need a stylesheet), right panel can load a Google Font (which does).

The Part That Still Bugs Me

When you load a Google Font dynamically, there’s a tiny delay before it’s available. During that moment, the text shows up in the default font, then swaps to the chosen font.

It’s a flash of unstyled text. FOUT, if you want the fancy term.

I tried a bunch of fixes. Preloading, font-display: swap, even hiding the text until the font loads. Everything felt janky.

Eventually I just left it. The flash is brief, and most users don’t notice. But I notice. Every time.

What I’d Do Differently

If I built this again from scratch:

  • Cache the permission status. Right now, the modal shows every time you click “System”. That’s annoying. I should store their choice in localStorage.
  • Better error handling. Sometimes the API just… fails. No error, no nothing. The code just stops. I need to catch that better.
  • Fallback for Safari. Safari doesn’t support this API at all. I should build a real fallback instead of just a list of popular fonts.

The Code (If You Want It)

The whole thing is on FontPreview if you want to see it in action. View source — it’s all client-side, no backend, no tracking, just HTML, CSS, and JavaScript.

I’m not a great developer. I learned most of this by breaking things and Googling errors. But it works, and people use it, and that’s enough for me.

If you’re building something with the Local Font Access API, hit me up. I’ve probably already hit the same bugs you’re hitting.

Originally published on dev.to. Try the tool here: FontPreview

Tags: javascript, webdev, tutorial, showdev, api

Secure Your AWS Environment with GuardDuty and Inspector

Introduction:

In today’s cloud-native world, security isn’t just a checkbox; it’s a continuous process that needs to be embedded throughout your development lifecycle. AWS provides two powerful security services that work together to protect your cloud infrastructure: Amazon GuardDuty for intelligent threat detection and Amazon Inspector for comprehensive vulnerability management. This guide explores how to leverage both services to implement a robust DevSecOps strategy that secures your applications from code to runtime.

Part 1: Amazon GuardDuty – Your 24/7 Threat Detection Guardian

What is Amazon GuardDuty?
Amazon GuardDuty is an intelligent threat detection service that continuously monitors your AWS environment for malicious activity and unauthorized behavior. Think of it as your cloud security guard that never sleeps and analyzes billions of events across multiple data sources using machine learning, anomaly detection, and integrated threat intelligence from AWS and industry-leading third parties.
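GuardDuty is enabled per account and per Region by creating a detector. A minimal boto3 sketch; the Region and publishing frequency are arbitrary example values.

import boto3

# Enable GuardDuty in one Region by creating a detector (one detector per account per Region)
guardduty = boto3.client("guardduty", region_name="us-east-1")

response = guardduty.create_detector(
    Enable=True,
    FindingPublishingFrequency="FIFTEEN_MINUTES",  # how often findings are exported
)
print("Detector ID:", response["DetectorId"])

# List any findings the detector has produced so far
findings = guardduty.list_findings(DetectorId=response["DetectorId"])
print(findings["FindingIds"])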

Key GuardDuty Capabilities:

Expanded Workload Runtime Protection

GuardDuty now monitors EC2 instances, Amazon EKS containers, and AWS Fargate workloads at runtime to detect:

  • Suspicious processes and unauthorized executables
  • Reverse shells indicating remote access attempts
  • Cryptocurrency mining malware
  • Backdoor behavior and persistence mechanisms
  • Defense evasion tactics and unusual file access patterns

This agent-based monitoring provides deep visibility into operating system-level activity, generating over 30 different runtime security findings to help protect your workloads.

Enhanced Malware Detection Capability

GuardDuty Malware Protection now offers comprehensive malware scanning across multiple AWS services:

1. EC2 and EBS Volume Scanning:

  • Agentless scanning of EBS volumes attached to EC2 instances.
  • GuardDuty initiated scans triggered by suspicious behavior.
  • On-demand scans you can initiate manually.
  • Detects trojans, ransomware, botnets, webshells, and cryptominers.

2. S3 Malware Protection:

  • Automatic scanning of newly uploaded objects to S3 buckets.
  • Uses AWS-developed and industry-leading third-party scan engines.
  • Tagging of scanned objects with scan status (NO_THREATS_FOUND, THREATS_FOUND, etc.)
  • Policy-based prevention of accessing malicious files.

3. AWS Backup Malware Protection (New):

  • Extends malware detection to EC2, EBS, and S3 backups.
  • Automatic scanning of new backups.
  • On-demand scanning of existing backups.
  • Verification that backups are clean before restoration.
  • Incremental scanning to analyze only changed data, reducing costs.
  • Helps identify your last known clean backup to minimize business disruption.

Broader Service Coverage

GuardDuty now protects an expanded range of AWS services beyond EC2:

Amazon S3 Protection: Detects unusual access patterns, data exfiltration attempts, disabling of S3 Block Public Access, and API patterns indicating misconfigured bucket permissions.
Amazon RDS Protection: Monitors RDS and Aurora databases for anomalous login behavior, brute force attacks, and suspicious database access patterns.
AWS Lambda Protection: Detects malicious execution behavior in serverless functions, including invocations from suspicious locations and unusual VPC network activity.
Amazon EKS Protection: Monitors Kubernetes audit logs to detect suspicious API activity, unauthorized access attempts, and policy violations in your EKS clusters.

Smarter Threat Intelligence & Advanced Finding Types

GuardDuty’s enhanced machine learning models, combined with threat intelligence from AWS and industry-leading third parties, enable detection of sophisticated attack patterns:

Credential Compromise: Detects IAM credentials being used from unusual locations or by compromised instances
Persistence Techniques: Identifies attackers establishing backdoors and maintaining access
Privilege Escalation: Flags attempts to gain higher-level permissions within your environment
Command-and-Control Traffic: Detects EC2 instances communicating with known malicious domains and C2 servers
Cryptomining Activity: Identifies unauthorized cryptocurrency mining using your resources
Extended Threat Detection: Uses AI/ML to automatically correlate multiple security signals across network activity, process runtime behavior, malware execution, and API activity to detect multi-stage attacks that might otherwise go unnoticed

GuardDuty now generates critical severity findings like AttackSequence:EC2/CompromisedInstanceGroup that provide attack sequence information, complete timelines, MITRE ATT&CK mappings, and remediation recommendations, allowing you to spend less time on analysis and more time responding to threats.

How GuardDuty Works
GuardDuty analyzes and processes data from multiple sources:

  • VPC Flow Logs: Network traffic patterns and communication with malicious IPs.
  • AWS CloudTrail Management Events: API calls and account activity for detecting credential misuse.
  • CloudTrail S3 Data Events: S3 object-level API activity.
  • DNS Query Logs: DNS queries to detect malicious domain communications.
  • EKS Audit Logs: Kubernetes control plane activity.
  • RDS Login Activity: Database authentication events.
  • Lambda Network Activity: Function execution behavior and network connections.
  • Runtime Monitoring: Operating system-level process and file activity.

All this happens without requiring you to deploy or manage any security software. GuardDuty operates entirely through AWS service integrations.

Practical GuardDuty Demo: Detecting Real Threats

Use Case: Detecting a Compromised EC2 Instance with Cryptomining Activity

Let’s walk through a real-world scenario where GuardDuty detects and alerts on a compromised EC2 instance that’s been infected with cryptocurrency mining malware.

Step 1: Enable GuardDuty

Navigate to AWS Console → GuardDuty → Get Started

  • Click “Enable GuardDuty” (30-day free trial available)

  • Enable protection plans: Foundational, Runtime Monitoring, and Malware Protection.
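
If you prefer scripting this step, a rough sketch with the AWS SDK for JavaScript v3 might look like the following; the feature names match the runtime monitoring and EBS malware protection plans, but double-check them against the current API reference:

const { GuardDutyClient, CreateDetectorCommand } = require('@aws-sdk/client-guardduty');

async function enableGuardDuty() {
  const guardduty = new GuardDutyClient({});
  // Create (enable) a detector in this account/region with extra protection plans turned on.
  await guardduty.send(new CreateDetectorCommand({
    Enable: true,
    Features: [
      { Name: 'RUNTIME_MONITORING', Status: 'ENABLED' },
      { Name: 'EBS_MALWARE_PROTECTION', Status: 'ENABLED' },
    ],
  }));
}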

Step 2: Simulate a Compromised Instance
Launch an EC2 instance and simulate suspicious activity:

  • SSH into your EC2 instance.
  • Make DNS queries to known malicious test domains (provided by GuardDuty for testing).
  • Generate unusual network traffic patterns.

Step 3: Review GuardDuty Findings
Within 15-30 minutes, GuardDuty will generate findings such as:

  • CryptoCurrency:EC2/BitcoinTool.B!DNS (indicates your EC2 instance is querying a domain associated with Bitcoin mining).
  • UnauthorizedAccess:EC2/MaliciousIPCaller.Custom (the EC2 instance is communicating with a known malicious IP).
  • Runtime:EC2/SuspiciousProcess (a suspicious process detected at the OS level).

Each finding includes:

  • Severity level (Low, Medium, High, Critical)
  • Affected resource details
  • Action details showing what triggered the alert
  • Recommended remediation steps
  • MITRE ATT&CK technique mappings

Step 4: Investigate with Malware Protection

When GuardDuty detects suspicious behavior, it can automatically trigger a malware scan:

  • Navigate to GuardDuty → Malware scans

  • View the scan results for your EC2 instance
  • If malware is detected, GuardDuty generates an Execution:EC2/MaliciousFile finding
  • Finding details include the file hash, file path, and threat name


Step 5: Automated Response

Set up automated remediation using EventBridge and Lambda:

  • Create an EventBridge rule that triggers on GuardDuty findings
  • Connect it to a Lambda function that:
    - Isolates the compromised instance (modifies its security group)
    - Creates a snapshot for forensics
    - Sends notifications to your security team
    - Tags the resource for investigation
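
As a rough illustration (not a production-ready runbook), such a Lambda function could look like this with the AWS SDK for JavaScript v3; the quarantine security group and SNS topic referenced via environment variables are placeholders you would create yourself:

const { EC2Client, ModifyInstanceAttributeCommand, CreateSnapshotCommand,
        CreateTagsCommand, DescribeInstancesCommand } = require('@aws-sdk/client-ec2');
const { SNSClient, PublishCommand } = require('@aws-sdk/client-sns');

const ec2 = new EC2Client({});
const sns = new SNSClient({});

exports.handler = async (event) => {
  const finding = event.detail;
  const instanceId = finding?.resource?.instanceDetails?.instanceId;
  if (!instanceId) return;

  // 1. Isolate the instance by swapping its security groups for a deny-all "quarantine" group.
  await ec2.send(new ModifyInstanceAttributeCommand({
    InstanceId: instanceId,
    Groups: [process.env.QUARANTINE_SG_ID],   // placeholder: a pre-created quarantine security group
  }));

  // 2. Snapshot the attached EBS volumes for forensics.
  const { Reservations } = await ec2.send(new DescribeInstancesCommand({ InstanceIds: [instanceId] }));
  const volumeIds = Reservations.flatMap(r => r.Instances)
    .flatMap(i => i.BlockDeviceMappings || [])
    .map(m => m.Ebs?.VolumeId)
    .filter(Boolean);
  for (const volumeId of volumeIds) {
    await ec2.send(new CreateSnapshotCommand({
      VolumeId: volumeId,
      Description: `Forensic snapshot for GuardDuty finding ${finding.id}`,
    }));
  }

  // 3. Notify the security team.
  await sns.send(new PublishCommand({
    TopicArn: process.env.SECURITY_TOPIC_ARN,  // placeholder
    Subject: `GuardDuty finding (severity ${finding.severity}) on ${instanceId}`,
    Message: JSON.stringify(finding, null, 2),
  }));

  // 4. Tag the instance so responders can find it.
  await ec2.send(new CreateTagsCommand({
    Resources: [instanceId],
    Tags: [{ Key: 'security-status', Value: 'quarantined' }],
  }));
};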


This demo shows how GuardDuty provides continuous, intelligent monitoring with minimal configuration, detecting threats in real time and enabling rapid response to protect your AWS environment.

Part 2: Amazon Inspector – Comprehensive Vulnerability Management

What is Amazon Inspector?
Amazon Inspector is an automated vulnerability management service that continuously scans your AWS workloads for software vulnerabilities and network exposures. While GuardDuty detects active threats, Inspector identifies weaknesses before they can be exploited. It’s your proactive security assessor that helps you implement a “shift-left” security approach by catching vulnerabilities early in the development lifecycle.

Key Inspector Capabilities (Enhanced Features):

Code Security Scanning: Shift-Left DevSecOps

Inspector now supports application dependency and source code scanning, enabling true shift-left security:

  • Software Composition Analysis (SCA): Scans open-source library vulnerabilities in your dependencies.
  • Static Application Security Testing (SAST): Analyzes your source code for security flaws.
  • Secrets Detection: Identifies hardcoded credentials, API keys, and sensitive data in code.
  • Infrastructure as Code (IaC) Scanning: Detects misconfigurations in Terraform, CloudFormation, and CDK templates.

Supported Package Managers & Languages:
JavaScript/Node.js: package.json, package-lock.json, yarn.lock
Python: requirements.txt, Pipfile.lock, poetry.lock
Java: pom.xml (Maven), build.gradle (Gradle)
Ruby: Gemfile.lock
Go: go.mod, go.sum

Continuous Scanning

Unlike traditional security tools that run on schedules, Inspector provides continuous, event-driven scanning:

  • Automatic scanning on every code commit to connected repositories.
  • Immediate scanning when new container images are pushed to ECR.
  • Instant scanning when Lambda functions are created or updated.
  • Continuous monitoring of running EC2 instances.
  • Real-time rescanning when new CVEs are published.

Network Exposure Detection

Inspector detects network reachability issues that could expose your workloads:

  • Open ports accessible from the internet.
  • Overly permissive security groups.
  • Instances with public IP addresses.
  • Vulnerable services exposed to untrusted networks.

Complete Code → Container → Compute Lifecycle Coverage

Inspector provides end-to-end security across your entire application lifecycle:

  • Code Stage: Scan source code repositories (GitHub, GitLab) for vulnerabilities and secrets before deployment
  • Container Stage: Scan container images in Amazon ECR for CVEs in packages and base images
  • Compute Stage: Monitor running EC2 instances and Lambda functions for package vulnerabilities

DevSecOps Integration: Shift-Left Security

Inspector enables true DevSecOps by shifting security earlier in the Software Development Lifecycle (SDLC):

CI/CD Pipeline Integration:

  • Scan code before merging pull requests
  • Block deployments containing critical vulnerabilities
  • Integrate findings into developer workflows via GitHub/GitLab
  • Automated security gates in deployment pipelines

Early Detection Benefits:

  • Catch vulnerabilities during development, not in production
  • Reduce remediation costs by finding issues early
  • Empower developers with immediate security feedback
  • Maintain security compliance throughout the SDLC

What Inspector Scans

  • EC2 Instances: Operating system packages and applications, Common Vulnerabilities and Exposures (CVEs), Center for Internet Security (CIS) benchmark compliance
  • Container Images (ECR): Base image vulnerabilities, installed packages, dependency vulnerabilities
  • Lambda Functions: Application code vulnerabilities, package dependencies, layer vulnerabilities, hardcoded secrets
  • Source Code Repositories: Security vulnerabilities in application code, dependency vulnerabilities, IaC misconfigurations, exposed secrets

Practical Inspector Demo: Securing Your Application from Network Vulnerabilities

This demo shows Inspector’s ability to detect and address network vulnerabilities within your deployed infrastructure, helping secure the network layer across the application lifecycle.

Step 1: Enable Amazon Inspector

  • Navigate to AWS Console → Inspector → Get Started

  • Select “Activate Inspector.”
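
If you want to activate it programmatically instead, here is a minimal sketch with the AWS SDK for JavaScript v3 (the resource type values are worth verifying against the inspector2 API reference):

const { Inspector2Client, EnableCommand } = require('@aws-sdk/client-inspector2');

async function activateInspector() {
  const inspector = new Inspector2Client({});
  // Turn on scanning for EC2 instances, ECR container images, and Lambda functions in this account.
  await inspector.send(new EnableCommand({ resourceTypes: ['EC2', 'ECR', 'LAMBDA'] }));
}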

Step 2: Deploy a Vulnerable Infrastructure

Launch an EC2 instance with intentional misconfigurations:

  • Use an outdated AMI (e.g., an old Amazon Linux 2 image).
  • Create a security group with port 22 (SSH) open to 0.0.0.0/0 (public access).
  • Install outdated packages to simulate a vulnerable environment.

Step 3: View Network Vulnerability Findings
After deploying your vulnerable infrastructure, Inspector will scan for network-related issues and generate findings:

Network Exposure:

  • Finding: Port 22 (SSH) is open to the internet.
  • Severity: Medium
  • Remediation: Restrict access to specific IP ranges or use a bastion host for secure SSH access.

Package Vulnerabilities:

  • Multiple CVEs in system packages
  • Outdated kernel version
  • Suggested package updates

Step 4: Remediate and Rescan
Fix the identified issues and observe the continuous monitoring:

  • Inspector automatically rescans and closes remediated findings.

This demo focuses on identifying and remediating network vulnerabilities within your infrastructure using Amazon Inspector.

GuardDuty + Inspector: Better Together

While GuardDuty and Inspector serve different purposes, they complement each other perfectly to provide comprehensive AWS security:

  • GuardDuty: Detects active threats and malicious activity in real-time (“something bad is happening”)
  • Inspector: Identifies vulnerabilities and misconfigurations proactively (“something could be exploited”)

Integration Best Practices

  • Centralize with Security Hub: Aggregate findings from both GuardDuty and Inspector in AWS Security Hub for a unified security dashboard
  • Automate Responses: Use EventBridge to trigger Lambda functions for automated remediation based on finding severity
  • Enable Organization-Wide: Deploy both services across all AWS accounts using AWS Organizations for comprehensive coverage
  • Integrate with SIEM: Export findings to your Security Information and Event Management system for correlation with other security data
  • Track Metrics: Monitor mean time to detect (MTTD) and mean time to remediate (MTTR) to measure security posture improvements.

Conclusion:
Securing your AWS environment requires a multi-layered approach. Amazon GuardDuty provides intelligent, continuous threat detection across your entire AWS infrastructure, while Amazon Inspector enables proactive vulnerability management from code to production. Together, they form a comprehensive security solution that:

  • Implements shift-left security by catching vulnerabilities during development
  • Continuously monitors for threats and vulnerabilities across your entire environment
  • Detects malware, cryptomining, and sophisticated multi-stage attacks
  • Provides actionable findings with remediation guidance
  • Integrates seamlessly into DevSecOps workflows and CI/CD pipelines
  • Enables automated security responses and compliance reporting

By enabling both GuardDuty and Inspector, you create a robust security foundation that protects your AWS workloads throughout their entire lifecycle, from the first line of code to running production infrastructure. Start your security journey today by enabling both services and implementing the best practices outlined in this guide.

Building Voice Agents That Adapt to Context: Personality Layers for AI Assistants

The Problem: Generic Voice Agents Sound Like Robots

Every voice agent sounds the same. Your customer support bot uses the same cadence as your fitness coach, which uses the same tone as your technical assistant. Users notice. They bounce.

The naive solution: train separate models for each personality. That’s expensive, maintenance hell, and doesn’t scale.

The better solution: one core agent with a personality layer that adapts on the fly. When a user switches contexts or the agent’s role changes, the output shifts without retraining.

This is where personality adaptation becomes your competitive advantage.

How Personality Layers Work

A personality layer isn’t magic. It’s a small, composable module that:

  1. Receives the current context (who is the user, what is their preference, what is the task)
  2. Selects or synthesizes a personality profile (formality level, tone, speed, accent characteristics)
  3. Modulates the agent’s output before sending it to speech synthesis
  4. Feeds back — if the user corrects the tone, the layer learns and adjusts

Think of it like prompt engineering for voice. Instead of:

"Be helpful and friendly."

You’re passing:

{
  "tone": "conversational",
  "formality": 0.3,
  "pace": "moderate",
  "enthusiasm": 0.7,
  "technical_depth": 0.4
}

Your voice synthesis engine (TTS) reads these attributes and generates speech that matches the profile.
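
As an illustration, one simple way to consume such a profile is to translate it into SSML prosody settings before handing the text to the TTS engine. The mapping values below are arbitrary starting points, not a recommendation:

// Illustrative mapping from a personality profile to SSML prosody settings.
function profileToSsml(text, profile) {
  const rate = { slow: '85%', moderate: '100%', fast: '115%' }[profile.pace] || '100%';
  // Higher enthusiasm nudges the pitch up a little; the scale here is made up.
  const pitchShift = Math.round((profile.enthusiasm - 0.5) * 10);
  const pitch = `${pitchShift >= 0 ? '+' : ''}${pitchShift}%`;
  return `<speak><prosody rate="${rate}" pitch="${pitch}">${text}</prosody></speak>`;
}

// Using the profile from the example above:
const ssml = profileToSsml('Sure, I can help with that.', {
  tone: 'conversational', formality: 0.3, pace: 'moderate', enthusiasm: 0.7, technical_depth: 0.4,
});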

Building This With Claude Code + Adaptation

Here’s where Claude Code agents shine. You can use Claude Code to:

  1. Generate the personality profile from user context in real-time
  2. Test variations without retraining anything
  3. Log and learn which profiles work best for which use cases

Example flow:

User Input → Claude Agent → Personality Layer → TTS → Audio Output

The Claude agent doesn’t just generate text. It generates:

  • The text response
  • The personality metadata (tone, pace, formality)
  • Optional: a summary of why this personality was chosen

Your TTS engine consumes both and produces voice that matches intent and context.
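
A hypothetical sketch of that single call with the Anthropic SDK for Node.js; the model id, the JSON response contract, and the lack of validation are all simplifications:

const { Anthropic } = require('@anthropic-ai/sdk');

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function respondWithPersonality(userMessage, userContext) {
  const msg = await client.messages.create({
    model: 'claude-sonnet-4-5',   // placeholder model id; use whatever you actually run
    max_tokens: 1024,
    system:
      'Reply with JSON only: {"text": "...", "personality": {"tone": "...", "formality": 0-1, ' +
      '"pace": "...", "enthusiasm": 0-1}}. Adapt the personality to this user context: ' +
      JSON.stringify(userContext),
    messages: [{ role: 'user', content: userMessage }],
  });
  // The first content block is the model's text; in real code you'd validate before parsing.
  return JSON.parse(msg.content[0].text); // => { text, personality } for the TTS layer
}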

Why This Matters for Your Product

Case 1: Customer Support
A frustrated customer needs quick, direct answers (high formality, moderate pace, low enthusiasm). A first-time user needs encouragement and clarity (lower formality, slower pace, higher enthusiasm). Same agent. Different personalities.

Case 2: Education
A student reviewing basics needs patient, encouraging voice. An advanced student needs crisp, technical delivery. Personality layer switches in milliseconds.

Case 3: Enterprise
Executive briefing? Corporate tone. Developer onboarding? Casual and approachable. Personality layer makes your bot adapt to the room.

The Architecture

Here’s a minimal implementation:

  1. Context Parser (Claude)

    • Reads user profile, task type, conversation history
    • Outputs a personality vector
  2. Response Generator (Claude)

    • Generates text response + personality metadata
    • No separate model needed
  3. TTS with Modulation (Your chosen TTS)

    • Applies pitch, pace, emphasis based on personality vector
    • Tools like Nvidia’s Personaplex can handle this modulation efficiently
  4. Feedback Loop (Optional but powerful)

    • User feedback on voice quality → stored as training signal
    • Claude agent learns which personalities work best

The entire system is lightweight. No massive retraining. No separate models. One agent with adaptive output.

Real-World Numbers

  • Cost: Run entirely on Claude API. No custom TTS models to train or host.
  • Latency: Personality layer adds <50ms to response time (Claude generates metadata in the same call as text).
  • Scalability: One agent handles unlimited personality variations.
  • Maintenance: When you improve the core agent, all personality variants improve automatically.

What to Do Next

  1. Pick one use case where personality matters (support, education, or internal tools)
  2. Define 3-5 personality profiles for that use case (excited, serious, casual, technical, friendly)
  3. Build a Claude agent that takes context and outputs both response + personality metadata
  4. Connect it to a TTS engine that respects the metadata (Nvidia Personaplex, Google Cloud Text-to-Speech, or similar)
  5. Log which personalities work for different user types. Let the data guide you.

Start small. One use case. Three personalities. Measure engagement. Scale from there.

The future of voice agents isn’t smarter models. It’s smarter routing and adaptation. Personality layers let you build that today.

Migrating from Jekyll to Hugo… or not

Most of my blog posts are lessons learned. I’m trying to achieve something, and I document the process I used to do it. This one is one of the few where, in the end, I didn’t achieve what I wanted. In this post, I aim to explain what I learned from trying to migrate from Jekyll to Hugo, and why, in the end, I didn’t take the final step.

Context

I started this blog on WordPress. After several years, I decided to migrate to Jekyll. I have been happy with Jekyll so far. It’s based on Ruby, and though I’m no Ruby developer, I was able to create a few plugins.

I’m hosting the codebase on GitLab, with GitLab CI, and I have configured Renovate to create a PR when a Gem is outdated. This way, I pay down technical debt as I go, rather than letting it accrue over the years. Last week, I got a PR to update the parent Ruby Docker image from 3.4 to 4.0.

I checked if Jekyll was ready for Ruby 4. It isn’t, though there’s an open issue. However, it’s not only Jekyll: the Gemfile uses gems whose versions aren’t compatible with Ruby 4.

Worse, I checked the general health of the Jekyll project. The most recent commits were weeks old and came from the Continuous Integration bot. I thought perhaps it was time to look for an alternative.

Hugo

Just like Jekyll, Hugo is a static site generator.

Hugo is one of the most popular open-source static site generators. With its amazing speed and flexibility, Hugo makes building websites fun again.

Contrary to Jekyll, Hugo builds upon Go. It touts itself as “amazingly fast”. As icing on the cake, the codebase sees much more activity than Jekyll’s. Though I’m not a Go fan, I decided Hugo was a good migration target.

Jekyll to Hugo

Migrating from Jekyll to Hugo follows the Pareto principle: most of the migration is straightforward, and the small remainder is where all the trouble hides.

Migrating content

Hugo provides the following main folders:

  • content for content that needs to be processed
  • static for resources that are copied as is
  • layouts for templates
  • data for datasources

Check the documentation for the full list.

Jekyll distinguishes between posts and pages. The former have a date, the latter don’t. Thus, posts are the foundation of a blog. Pages are stable and structure the site. Hugo doesn’t make this distinction.

Jekyll’s folder structure maps as follows:

Jekyll            Hugo
_posts            content/posts
_pages/<foo.md>   content/posts/<foo.md>
_data             data
_layouts          layouts
assets            static

When mapping isn’t enough

Jekyll offers plugins. Plugins come in several categories:

  • Generators – Create additional content on your site
  • Converters – Change a markup language into another format
  • Commands – Extend the jekyll executable with subcommands
  • Tags – Create custom Liquid tags
  • Filters – Create custom Liquid filters
  • Hooks – Fine-grained control to extend the build process

On Jekyll, I use generators, tags, filters, and hooks. Some I use through existing gems, such as the Twitter plugin; others are custom-developed for my own needs.

Jekyll tags translate to shortcodes in Hugo:

A shortcode is a template invoked within markup, accepting any number of arguments. They can be used with any content format to insert elements such as videos, images, and social media embeds into your content.

There are three types of shortcodes: embedded, custom, and inline.

Hugo offers quite a collection of shortcodes out-of-the-box, but you can roll out your own.

Unfortunately, generators don’t have any equivalent in Hugo. I have developed generators to create newsletters and talk pages. The generator plugin automatically generates a page per year according to my data. In Hugo, I had to manually create one page per year.

Migrating the GitLab build

The Jekyll build consists of three steps:

  1. Detect whether any of Gemfile.lock, Dockerfile, or .gitlab-ci.yml has changed, and build the Docker image if so
  2. Use that Docker image to actually build the site
  3. Deploy the site to GitLab Pages

The main change obviously happens in the Dockerfile. Here’s the new Hugo version for reference:

FROM docker.io/hugomods/hugo:exts

ENV JAVA_HOME=/usr/lib/jvm/java-21-openjdk
ENV PATH=$JAVA_HOME/bin:$PATH

WORKDIR /builds/nfrankel/nfrankel.gitlab.io

RUN apk add --no-cache openjdk21-jre graphviz \
 && gem install --no-document asciidoctor-diagram asciidoctor-diagram-plantuml rouge

  1. Packages for PlantUML (the apk line)
  2. Gems for Asciidoctor diagrams and syntax highlighting (the gem line)

At this point, I should have smelled something fishy, but it worked, so I continued.

The deal breaker

I migrated with the help of Claude Code and Copilot CLI. It took me a few sessions, spread over a week, mostly during the evenings and on the weekend. During migration, I regularly requested one-to-one comparisons to avoid regressions. My idea was to build the Jekyll and Hugo sites side-by-side, deploy them both on GitLab Pages, and compare both deployed versions for final gaps. I updated the build to do that, and I triggered a build: the Jekyll build took a bit more than two minutes, while the Hugo build took more than ten! I couldn’t believe it, so I triggered the build again. Results were consistent.

Builds screenshot

I analyzed the logs to better understand the issue. Besides a couple of warnings, I saw nothing explaining where the slowness came from.

                  │  EN  
──────────────────┼──────
 Pages            │ 2838 
 Paginator pages  │  253 
 Non-page files   │    5 
 Static files     │ 2817 
 Processed images │    0 
 Aliases          │  105 
 Cleaned          │    0 
Total in 562962 ms

When I asked Claude Code, it pointed out my usage of Asciidoc in my posts. While Hugo perfectly supports Asciidoc (and other formats), it delegates formats other than Markdown to an external engine. For Asciidoc, it’s asciidoctor. It turns out that this approach works well for a couple of Asciidoc documents, not so much for more than 800. I searched and quickly found that I wasn’t the first one to hit this wall: this thread spans five years.

To say I was disappointed is an understatement. I left the work on a branch. I’ll probably delete it in the future, once I’ve cooled down.

Conclusion

Before working on the migration, I did my due diligence and assessed the technical feasibility of the work. I did that by reading the documentation and chatting with an LLM. Yet, I wasted time doing the work before rolling back. I’m moderately angry at the Hugo documentation for not clearly mentioning the behavior and the performance hit in bold red letters. Still, it’s a good lesson to remember to check for such issues before spending that much time, even on personal projects.

Go further:

  • Hugo shortcodes
  • Hugo functions

Originally published at A Java Geek on February 15th, 2026

JakartaOne by Jozi-JUG 2026


When I Code Java was cancelled at short notice, Phillip, Buhake, and I scrambled and created a substitute event. With funds from the Eclipse Foundation’s Open Community Meetup concept and the organisation of Jozi-JUG, we created JakartaOne by Jozi-JUG, where Phillip and I presented. The event had 208 registered attendees.

I started the evening by presenting The Past, Present, and Future of Enterprise Java. Phillip took over after me and presented AI with (and in) Quarkus. We were hosted by Investec, who also provided food and drinks. After the talks, prizes and swag were raffled off to the attendees. This is always very appreciated.

This was the first JakartaOne by [JUG] we have done, but it certainly won’t be the last. We even discussed turning it into a half-day or full-day conference and branding it as JakartaOne South Africa or JakartaOne Johannesburg next year.

Ivar Grimstad


Codebase Intelligence

Navigating a new repository can be overwhelming. I built a tool called “Codebase Intelligence” to turn a static codebase into an interactive knowledge base using Retrieval-Augmented Generation. Instead of the AI guessing what your code does, it reads the relevant files before answering.

By using semantic search and vector embeddings, you can ask questions like:
“How is the authentication flow handled?”
“Where are the API routes defined?”
and get a context-aware answer backed by your actual code.

I reached some key milestones while building this tool:

  • Automated an ingestion pipeline using LangChain and an OpenAI embedding model to fetch, chunk, and embed GitHub repos.
  • Leveraged the Pinecone vector database for high-performance semantic search and metadata filtering.
  • Integrated GPT-4 and the Vercel AI SDK to manage the conversation flow.
  • Implemented GitHub Actions to handle automated daily maintenance and cleanup of the database.
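
To make the ingestion step more concrete, here’s a rough sketch (not the project’s actual code) of chunking and embedding repository files with LangChain.js and Pinecone; the index name, chunk sizes, and file shape are placeholders:

const { RecursiveCharacterTextSplitter } = require('@langchain/textsplitters');
const { OpenAIEmbeddings } = require('@langchain/openai');
const { PineconeStore } = require('@langchain/pinecone');
const { Pinecone } = require('@pinecone-database/pinecone');

// files: [{ path, content }] fetched from a GitHub repository beforehand.
async function ingest(files) {
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 150 });
  const docs = await splitter.createDocuments(
    files.map(f => f.content),
    files.map(f => ({ source: f.path })), // metadata used later for filtering and citations
  );

  const pinecone = new Pinecone();           // reads PINECONE_API_KEY from the environment
  const index = pinecone.Index('codebase');  // placeholder index name

  // Embed each chunk with OpenAI and upsert the vectors into Pinecone.
  await PineconeStore.fromDocuments(docs, new OpenAIEmbeddings(), { pineconeIndex: index });
}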

Check it out here: https://codebase-intelligence-nu.vercel.app/

Open Source and Contributions 🌟
I’ve made this tool open source! Whether you want to use it for your own repos or help improve the ingestion logic, feel free to check out the code or create an Issue.

Github Repository Link: https://github.com/nancy-kataria/codebase-intelligence