TaskDev – a task runner for AI coding agents (MCP)

One place for your dev tasks. One place for your logs. And your AI agent sees them too.

Like most developers working on web apps, I usually have a few long-running processes open during the day:

  • the API server
  • the frontend dev server
  • a build watcher

Usually one terminal each. That works, but it is not the handiest setup – you end up jumping between tabs to check what is running and where the logs are.

TaskDev puts them in one place – and makes them visible to your AI agent over MCP.

TaskDev sidebar showing a project node with two tasks

Why I built TaskDev

Agents can read output, but they can’t manage processes.

AI coding agents – Codex, Claude Code, Windsurf Cascade, Cursor – write code well and can read terminal output. What they lack is a stable interface for starting, stopping, and tracking long-running processes. So they spawn duplicates, lose track of what is running, fight stuck ports, and retry until the developer takes over.

The Model Context Protocol (MCP) makes a unified solution possible: one task list that both the developer and the agent can drive.

That is TaskDev:

  • a sidebar for the developer
  • an MCP server for the agent
  • one source of truth – same tasks, same processes, same logs
  • agent commands are sandboxed (see Trust and safety below)

The agent problem, in detail

Long-running tasks like a web service are the worst case:

  • the agent forgets a task is already running and starts it again – and again
  • the previous process still holds the port, so the new one fails
  • it sometimes takes several attempts to stop a task, burning tokens for no reason
  • some agents spawn tasks in hidden terminals or redirect the console output, and the developer doesn’t see what is going on
  • the agent waits forever on a command that never returns

The result: failed attempts, wasted tokens, and a developer forced to intervene.

The agent itself is not the issue. It just doesn’t have a reliable control interface to manage tasks.

TaskDev is a lightweight process supervisor that provides exactly that interface – start, stop, restart, status, logs.

What it is

A small extension for VS Code-based editors (VS Code, Cursor, Windsurf).

  • plain JSON config
  • local processes
  • local logs
  • no telemetry

Tasks are defined in taskdev.json at the root of the workspace.

Install TaskDev

Repository: github.com/tolbxela/taskdev – MIT license.

Install TaskDev from the Extensions panel – search for TaskDev:

  • VS Code → Visual Studio Marketplace
  • Cursor and Windsurf → Open VSX Registry

Then drop a taskdev.json in your workspace and run TaskDev: Install MCP config to wire up the agent side.

Configuration

Example for an ASP.NET Core + Vue.js project:

{
  "project": "My App",
  "tasks": [
    {
      "name": "api",
      "command": "dotnet run --project src/Api",
      "detail": "Starts the backend API",
      "icon": "server-process"
    },
    {
      "name": "ui",
      "type": "npm",
      "command": "npm run dev",
      "cwd": "ui",
      "detail": "Starts the Vite dev server",
      "icon": {
        "id": "globe",
        "color": "terminal.ansiBlue"
      }
    }
  ]
}

Each task needs a name and a command. Everything else is optional:

  • cwd – working directory for the command
  • env – extra environment variables
  • detail – short description shown in the sidebar
  • icon – a codicon id, or { id, color }
  • type – a free-form label like npm or dotnet
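For illustration, here is a hypothetical task that combines several of these optional fields (the names, paths, and values are made up for the example):

```json
{
  "name": "worker",
  "type": "dotnet",
  "command": "dotnet run --project src/Worker",
  "cwd": "services",
  "env": { "ASPNETCORE_ENVIRONMENT": "Development" },
  "detail": "Runs the background worker",
  "icon": "gear"
}
```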

Add as many tasks as you want. Two shapes fit naturally:

  • long-running – dev server, build watcher, worker, tunnel, test watcher
  • repetitive – test run, lint, type-check, one-off build, data seed

Both end up in the same sidebar with the same logs, and the agent can start either one on demand.

Multi-root workspaces are supported: each folder can have its own taskdev.json.

Sidebar with the title-bar Open taskdev.json button next to the open config

The sidebar

Click the TaskDev icon in the Activity Bar. You get a tree grouped by project – one node per workspace folder that has a taskdev.json. The project header shows the task count and how many are running.

Each task row shows:

  • an icon (auto-picked from the name, or whatever you set in icon) that turns green while the task is running
  • the task name, plus either the first line of detail or running · 12m once started
  • a rich tooltip on hover with status, command, cwd, PID, uptime, and log path

Inline buttons appear on the task row:

  • play when the task is stopped
  • stop when it is running
  • log to open the current log file in the editor

Hovering a task row reveals Start task and Show log buttons

Clicking log opens the current run in a regular editor tab – searchable, scrollable, and the same file the agent reads over MCP.

Task log open beside the sidebar

The view title has three more actions:

  • Install MCP config – wire up agents (see below)
  • Open taskdev.json – jump to the config, or create one if it is missing
  • Refresh – re-read the config


The sidebar refreshes itself every 10 seconds while at least one task is running, every 60 seconds otherwise, and immediately when you edit taskdev.json. Multi-root workspaces show each project side by side.

MCP integration

Run TaskDev: Install MCP config from the command palette and pick which agents to wire up. Detected config files are pre-checked.

Install MCP config picker listing Windsurf, Claude Code, Cursor, Codex, and workspace-scoped configs

The MCP config is only written when this command runs. Nothing happens implicitly.

One necessary drawback is that the MCP config stores the installed extension path, which changes with each new TaskDev version. So you need to re-run TaskDev: Install MCP config after each update. TaskDev will prompt you after an upgrade, but the configs are only rewritten when you confirm in the picker.

The agent gets eight tools:

  • taskdev_list – list tasks with status, PID, command, cwd, log path
  • taskdev_status – status of one task or all
  • taskdev_control – start or stop a task
  • taskdev_restart – stop and start
  • taskdev_logs – read recent log lines (current run, or an older run by file)
  • taskdev_logs_history – list previous log files for a task
  • taskdev_add – add a task (with confirmation)
  • taskdev_remove – remove a stopped task (with confirmation)

Agents communicate with TaskDev over MCP and can manage tasks efficiently.

Typical agent loop: change code → taskdev_restart api → taskdev_logs api → read the error → fix or report.

No retry loops. No hung commands. No wasted tokens.

Trust and safety

Commands in your own taskdev.json are normal shell commands – treat the file like code, and only run it in trusted workspaces.

Agent-added tasks (taskdev_add) are sandboxed:

  • no shell chaining, redirects, variables, or subshells
  • no path traversal or arguments outside the project
  • no risky env overrides (PATH, NODE_OPTIONS, dynamic-loader vars, …)
  • only known dev command shapes – npm / pnpm / yarn scripts, dotnet, cargo, go
  • explicit confirmation before any add or remove

The agent can spin up dotnet test. It cannot invent curl ... | sh.

For the exact allow-list, env rules, runtime layout, and MCP tool reference, see security-and-config.md. For setup, see the extension README.

Feedback

Found a bug or have an idea? Open an issue at github.com/tolbxela/taskdev/issues.

Generation 1 — Standalone Models (2018–2022)

The Foundation of Modern AI Systems
When people think of tools like ChatGPT, they often assume the intelligence comes from a single powerful system that “remembers,” “reasons,” and “understands context.”

That intuition is misleading. To truly understand how modern AI systems evolved, we need to go back to Generation 1 — the era of Standalone Models, where everything began. Generation 1 (2018–2022) refers to the period defined by:

  • Large pre‑trained models like GPT, GPT‑2, and GPT‑3
  • Minimal system design around them, with no real external memory or tool integration

These models were powerful—but fundamentally isolated. They could generate text, but they couldn’t access information, retrieve knowledge, or take actions beyond what was encoded in their training data.

The Core Idea: AI as a Stateless Engine

At the heart of Generation 1 is a critical concept: the model is stateless. Every time you send a prompt, the model processes it independently. It does not remember previous interactions, and it does not learn in real time. This is true for GPT-3, Claude, Gemini, Grok. Different vendors, same architectural truth.

The 3-Layer Architecture (Simplified Mental Model)
Even in Generation 1, what you interact with (like ChatGPT) is not just a model.

It can be understood as three distinct layers:

➡️Layer 1 — The UI Layer (Interaction Surface)
This is everything the user directly touches. It includes the chat window, the input box, the streaming response area, the conversation sidebar, the “regenerate” button, and even small touches like the copy‑to‑clipboard icon.

You see this layer in tools like ChatGPT, Claude.ai, Perplexity, Gemini, and chat panels inside apps like Cursor or Slack.

Core responsibilities

  • Capture user intent — text input, file uploads, voice, images, tool toggles, model selection
  • Render model output — token‑by‑token streaming, markdown, code blocks, math, citations
  • Create continuity — the illusion that the AI “remembers” the conversation
  • Manage session state — active chat, history navigation, drafts, error recovery
  • Surface controls — stop, regenerate, edit message, branch conversation, share, export

The non‑obvious insight
A great UI layer is what makes ChatGPT feel magical.
Under the hood, it’s the same model you could call with a simple API request.
But the experience is completely different.

➡️Layer 2 — The Orchestration Layer (The Hidden Middleware)
This is the layer most beginners never notice — and it’s the reason many “ChatGPT clones” feel broken or low‑quality. It sits between the UI and the model, quietly doing a huge amount of work the user never sees but always feels. When you send a message to ChatGPT, the text that reaches the model is not the raw message you typed. The orchestration layer transforms it first.

What this layer does

  • System prompt injection — Adds a long, carefully written instruction set that defines the assistant’s personality, tone, abilities, and safety rules.
  • Conversation history management — Decides which past messages to include, which to summarize, and which to drop as the context window fills.
  • Context window budgeting — Tracks token usage across system prompt + history + user message + expected output.
  • Safety and policy filtering — Checks your message before it reaches the model, and checks the model’s output before it reaches you.
  • Rate limiting and quotas — Enforces usage limits that show up as “You’ve reached your limit.”
  • Routing logic — Sends simple queries to cheaper models and complex ones to stronger models.
  • Telemetry and evaluation — Logging, A/B tests, quality checks, and feedback loops.

The non-obvious part: This is where AI products truly differentiate themselves. Two companies can use the same base model, yet one feels magical and the other feels clunky. Why?

Because most of the perceived quality comes from the orchestration layer — not the model.

Why “stateless model + stateful product” matters

The model behind ChatGPT is stateless. Every request is a fresh start.
It doesn’t remember your name, your last message, or that you said “use Python” earlier.

The illusion of memory and continuity is created by the orchestration layer, which replays the relevant parts of your conversation every single time.

This is the most important idea for beginners to understand:

Continuity is created by the UI + orchestration layer, not by the model.

Even today, “memory” features are built on top of the model — the model itself still forgets everything between calls.
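The replay mechanism can be sketched in a few lines of Python. Everything here is illustrative: `fake_model` stands in for a real LLM API call and only reports how much context it was handed, which is enough to expose the statelessness.

```python
# Illustrative sketch: how an orchestration layer simulates memory
# over a stateless model. `fake_model` stands in for a real LLM call;
# it only ever sees the single prompt passed to it.
def fake_model(prompt: str) -> str:
    # A real model would generate text; this stub just reports how
    # much conversation context it received on this one call.
    return f"[model saw {prompt.count('user:')} user message(s)]"

SYSTEM_PROMPT = "system: You are a helpful assistant."
history = []  # lives in the orchestration layer, never in the model

def send(user_message: str) -> str:
    history.append(f"user: {user_message}")
    # Replay the system prompt plus the full history on EVERY call.
    # This replay is the only "memory" the model ever gets.
    prompt = "\n".join([SYSTEM_PROMPT] + history)
    reply = fake_model(prompt)
    history.append(f"assistant: {reply}")
    return reply

print(send("My name is Sam."))   # the model sees 1 user message
print(send("What is my name?"))  # now it sees 2: the history was replayed
```

Delete the `history` list and the model instantly "forgets" everything: continuity was never in the model at all.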

➡️Layer 3 — The Model Layer (The Engine That Generates the Output)
This is the part everyone thinks they’re interacting with — the actual AI model. In reality, it’s only one piece of the system, but it’s the piece that does the core job: turning text in → generating text out.
At this layer, things are surprisingly simple.
What the model actually does: it takes the final prompt created by the orchestration layer and predicts the next token. Then the next, and the next, until it forms a complete response. That’s it.

  • No memory.
  • No awareness.
  • No understanding of past conversations unless they’re replayed to it.

What the model doesn’t do

  • It doesn’t remember previous chats
  • It doesn’t store facts about you
  • It doesn’t know the “session” you’re in
  • It doesn’t know what it said 10 minutes ago
  • It doesn’t know what tools the product has

All of that lives in Layer 2, not here.

Why this layer still matters Even though the model is “just” a prediction engine, it defines the system’s raw capabilities:

  • Language fluency
  • Reasoning ability
  • Knowledge encoded during training
  • Creativity and style
  • Generalization

A stronger model gives the orchestration layer more to work with — but the model alone is never the full product.

The key beginner insight
The model is stateless. Every request is a blank slate. It only knows what’s inside the prompt it receives right now. This is why the orchestration layer is so important: it builds the illusion of memory, personality, and continuity. The model simply reacts to whatever text it’s given.

Putting it all together

  1. Layer 1 (UI) makes the experience feel smooth
  2. Layer 2 (Orchestration) makes the experience feel intelligent
  3. Layer 3 (Model) generates the actual words

Most people think they’re talking to Layer 3.
In reality, they’re experiencing all three layers working together.

But the foundation remains:

UI + Orchestration + Model

Key Takeaway for Developers

If you remember one thing, make it this: LLMs don’t remember—they are made to simulate memory through prompt construction.

This insight is essential when:

  • Designing AI applications
  • Debugging responses
  • Optimizing prompts
  • Building scalable systems
What Comes Next?

Generation 1 solved text generation. But it couldn’t:

  • Fetch real-time data
  • Ground responses in facts

That led to the next evolution:

➡️ Generation 2 — RAG (Retrieval-Augmented Generation)
Where models are no longer isolated—but connected to knowledge.

Final Thought
Generation 1 was not about building “smart assistants.”
It was about discovering that a stateless probabilistic model, when scaled, can simulate intelligence. Everything that followed—RAG, agents, multi-agent systems—is built on top of this simple but powerful idea.

What Building a SAST Tool Taught Me About AppSec That 13 Years of Software Engineering Didn’t

I’ve been writing software professionally since 2011.

Java, C#, Kotlin, Node.js. Enterprise backends, microservices, APIs, data pipelines. I’ve shipped production code that millions of people have used without knowing it. I’ve led teams, reviewed architectures, mentored junior engineers, and done all the things that accumulate into what people call “senior software engineer.”

And yet, when I decided to transition into application security, I realised I had significant blind spots — not about how software works, but about how software fails. Specifically, how it fails in ways that attackers can exploit.

This is the final article in a series about building a SAST scanner from scratch, embedding it in CI/CD pipelines, writing custom detection rules, and managing false positives. But it’s really about what that whole process taught me about application security as a discipline — and what I wish I’d understood earlier.

I Knew How to Write Secure Code. I Didn’t Know Why It Was Secure.

Here’s an embarrassing admission: I’ve been using parameterised queries for SQL for at least a decade. I knew you were supposed to use them. I used them every time. I would have told you confidently that they prevent SQL injection.

But if you’d asked me, before I started studying AppSec seriously, to explain why they prevent SQL injection — the actual mechanism — I would have given you a hand-wavy answer about “the database handling it separately.”

Building the SQL injection detection rule forced me to get precise. I had to understand exactly what makes "SELECT * FROM users WHERE id = " + userId dangerous, what makes SELECT * FROM users WHERE id = ? with a bound parameter safe, and why the difference matters at the level of how the database parses and executes the statement.

The answer — that parameterised queries send the query structure and the data in separate messages, so the database never attempts to parse the data as SQL syntax — is not complicated. But I didn’t actually know it at that level of precision until I had to write a rule that distinguishes between the two patterns.
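The difference is easy to demonstrate with Python’s built-in sqlite3 module. The toy table and the malicious input below are invented purely for illustration:

```python
import sqlite3

# Toy schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, name TEXT)")
conn.execute("INSERT INTO users VALUES ('1', 'alice')")

malicious = "1' OR '1'='1"

# Vulnerable: the input is spliced into the SQL text, so the database
# parses the attacker's quotes and OR clause as SQL syntax.
vulnerable = conn.execute(
    "SELECT * FROM users WHERE id = '" + malicious + "'"
).fetchall()

# Safe: the statement is parsed first with a placeholder; the value is
# bound afterwards and is never interpreted as SQL.
safe = conn.execute(
    "SELECT * FROM users WHERE id = ?", (malicious,)
).fetchall()

print(len(vulnerable))  # 1 -- the OR '1'='1' matched every row
print(len(safe))        # 0 -- the whole string was treated as a literal id
```

The two queries differ by a handful of characters in the source, which is exactly why a detection rule has to distinguish string concatenation from placeholder binding rather than just spotting the word SELECT.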

This was a theme throughout the project. I knew the what of secure coding from years of following conventions and best practices. Building detection rules forced me to learn the why — the actual attack mechanics that the conventions are defending against.

The lesson: Knowing the secure pattern is not the same as understanding the vulnerability. For a software engineer, the secure pattern is enough to write safe code. For an AppSec engineer, you need to understand the attack, because your job is to find it when someone else didn’t write the safe pattern.

Security Is an Adversarial Discipline

Software engineering is largely a collaborative discipline. You’re building something. The goal is for it to work. Your mental model of the system is oriented around the happy path — the flow where inputs are valid, networks are reliable, and users do what you expect.

AppSec is adversarial. The mental shift required is genuinely disorienting at first.

When I was building the JWT algorithm none rule, I had to think like someone who wants to forge authentication tokens. Not because I want to do that, but because unless I understand exactly how the attack works — what the attacker controls, what assumptions the vulnerable code makes, what the exploit chain looks like — I can’t write a rule that reliably detects it.
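To make those mechanics concrete, here is a deliberately vulnerable verifier sketched with only the Python standard library. The helper names and the SECRET are invented for the example; real code should use a vetted JWT library with an explicit algorithm allow-list:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-secret"  # invented for the example

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_vulnerable(token: str):
    """The bug: trusts the attacker-controlled header to pick the algorithm."""
    header_b64, payload_b64, sig = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header["alg"] == "none":  # 'none' skips signature verification entirely
        return json.loads(b64url_decode(payload_b64))
    expected = b64url(hmac.new(
        SECRET, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest())
    if hmac.compare_digest(sig, expected):
        return json.loads(b64url_decode(payload_b64))
    return None

# The attacker forges a token with alg=none and an empty signature:
forged = "{}.{}.".format(
    b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode()),
    b64url(json.dumps({"sub": "admin"}).encode()),
)
print(verify_vulnerable(forged))  # accepted with no key at all
```

The assumption the vulnerable code makes is that the header is trustworthy; the attacker controls the header, so the attacker chooses the verification algorithm.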

This is the skill that 13 years of software engineering didn’t develop: adversarial thinking. The question isn’t “does this code do what it’s supposed to do?” It’s “how could someone make this code do something it’s not supposed to do?”

The OWASP Top 10 is, at its core, a catalogue of the assumptions developers make that attackers exploit. A03 — Injection assumes that input is data, not instructions. A07 — Authentication Failures assumes that the code correctly validates identity. A02 — Cryptographic Failures assumes that encryption means the data is protected.

Every category is a place where the developer’s mental model of the system diverges from what an attacker can actually do to it. Understanding OWASP deeply means understanding those divergences — not as a checklist, but as a way of thinking.

The lesson: You can’t find vulnerabilities you can’t imagine. Developing adversarial thinking — the habit of asking “how could this go wrong for someone who wants it to go wrong” — is the most important cognitive shift in the AppSec transition.

Tools Are Amplifiers, Not Answers

Before I built my own SAST tool, I used SAST tools. And I treated them roughly like a compiler warning: something fires, I look at it, I decide whether to fix it or ignore it.

Building one changed how I think about what a SAST tool actually is.

A SAST tool is a codified set of heuristics about what vulnerable code looks like. Those heuristics are written by humans, based on human understanding of vulnerability patterns, with human decisions about confidence levels and severity ratings. The tool doesn’t know your codebase. It doesn’t know your threat model. It doesn’t know whether the finding it just generated is actually exploitable in your specific deployment context.

This sounds like a criticism. It isn’t. It’s a description of a tool’s appropriate role.

When I run Snyk or Semgrep now, I engage with the results differently than I did before. I ask: what pattern is this rule trying to catch? Is that pattern present in my code for the reason the rule assumes? Does the vulnerability the rule targets actually apply in my context? What would an attacker need to control to exploit this?

Those are AppSec questions, not DevOps questions. A DevOps mindset treats SAST output as a compliance gate. An AppSec mindset treats it as a starting point for analysis.

The lesson: A SAST scanner is a signal generator, not an oracle. The value it provides is proportional to the quality of thinking applied to its output — not to the number of findings it generates or suppresses.

False Positives Taught Me About Risk Tolerance

Every time I suppressed a finding in my own scanner, I had to make a decision: is this actually safe, and how confident am I?

That turns out to be the central skill of AppSec: structured risk assessment under uncertainty.

You almost never have complete information. You can’t always trace every data flow through a complex system. You can’t always know whether a finding is exploitable without building a proof of concept. You have to make a judgment call about whether the risk is acceptable given what you know.

What I learned from managing false positives is that risk tolerance is not a feeling — it’s a position that needs to be documented and defensible. “I suppressed this because it looked fine” is not a risk assessment. “I suppressed this because the data being processed is always from our internal configuration system and never from user input, as confirmed by tracing the call stack in lines 42–67” is a risk assessment.

The difference matters when something goes wrong. And in security, things go wrong.

The lesson: Risk assessment is a core AppSec competency, not a soft skill. Developing a structured, documented approach to risk decisions — even informal ones — is more valuable than any specific technical knowledge.

The Gap Between Writing Secure Code and Finding Insecure Code

These are related skills. They are not the same skill.

Writing secure code is a constructive activity. You know what you’re building. You apply secure patterns. You follow established conventions. The feedback loop is relatively tight — if you use parameterised queries, you know you’re not vulnerable to SQL injection there.

Finding insecure code is a forensic activity. You’re examining code you didn’t write, often without full context, looking for patterns that indicate vulnerability. The feedback loop is loose — you might flag something, triage it, determine it’s a false positive, and never know whether your triage was correct.

The cognitive skills are different. Construction requires knowing the secure pattern. Detection requires knowing the vulnerable pattern and all its variations. It requires understanding which variations are genuinely dangerous and which are contextually safe. It requires maintaining a mental model of an attacker’s perspective while reading code that was written from a developer’s perspective.

I’ve spent 13 years getting good at construction. Building this scanner was the first systematic exercise I did in detection. It was harder than I expected — not technically, but cognitively. Shifting from “I’m building this thing to work” to “I’m looking for ways this thing could be exploited” is a genuine gear change.

The lesson: AppSec is not “software engineering plus security knowledge.” It’s a different cognitive discipline that happens to use the same raw material. Senior software engineers making this transition should expect a genuine learning curve, not just a knowledge gap.

What I’d Tell Someone Starting This Transition

If you’re a software engineer moving into AppSec — or considering it — here’s what I’d tell you based on this project and the broader transition.

Build something. Reading about OWASP is useful. Reading CVE writeups is useful. Neither teaches you what building a detection rule teaches you. The act of translating “this is a vulnerability” into “this is what the vulnerable code looks like in text” forces a precision of understanding that passive learning doesn’t produce.

Study the attacks, not just the defences. Most of your software engineering career was spent learning defences — secure patterns, safe APIs, frameworks that handle the dangerous parts for you. AppSec requires understanding the attacks those defences are designed against. Read exploit writeups. Understand how CVEs actually work. Build your own vulnerable applications and attack them.

Get comfortable with ambiguity. Software engineering has right answers. Does this code compile? Does this test pass? Does this function return the correct value? AppSec often doesn’t. Is this finding exploitable? Is this suppression justified? Is this risk acceptable? These questions frequently don’t have clean answers, and developing comfort with that ambiguity is part of the transition.

Use your engineering background as a superpower, not a crutch. The thing that makes engineers valuable in AppSec is the ability to read code at scale, understand system architecture, and reason about data flows — skills most pure security professionals develop slowly. Use that. But don’t assume that understanding how the code is supposed to work means you understand how it can be broken.

Write about what you’re learning. This series started as a way to document my own thinking. Every article forced me to be more precise about something I thought I understood. The act of explaining something to someone else reveals the gaps in your own understanding faster than almost anything else.

Where This Goes Next

Building this scanner and writing this series was one project. The transition is ongoing.

The next project is taking an old Java service and doing something I haven’t done yet in this series: running Snyk against a real dependency tree on real legacy code, remediating real CVEs, and measuring the before-and-after security posture with actual metrics.

That’s a different kind of AppSec work — Software Composition Analysis rather than static analysis, dependency vulnerabilities rather than code vulnerabilities, Snyk’s recommendations rather than my own rules. But the underlying skills are the same: understand the attack, assess the risk, make a defensible decision, measure the outcome.

The transition from software engineer to AppSec engineer is not a destination. It’s an ongoing process of developing adversarial thinking, structured risk assessment, and the forensic discipline of finding what’s broken rather than building what works.

Thirteen years in, I’m still learning. That’s the right state to be in.

The full SAST tool that this series was built around is at github.com/pgmpofu/sast-tool.

If this series was useful to you — or if you’re making a similar transition and want to compare notes — I’d genuinely like to hear from you. Find me here on dev.to or connect on LinkedIn.

Python argparse: Build CLI Tools in 10 Minutes

🎁 Free: AI Publishing Checklist — 7 steps in Python · Full pipeline: germy5.gumroad.com/l/xhxkzz (pay what you want, min $9.99)

The Problem with sys.argv[1]

You’ve been there. You write a quick script, hardcode a filename, then immediately need to change it. So you reach for sys.argv:

import sys

filename = sys.argv[1]
count = int(sys.argv[2])

This works — until it doesn’t. Run it without arguments and you get an IndexError. Pass a string where you expected an integer and it crashes. There’s no help text, no validation, no defaults. Anyone else who picks up your script has to read the source code to know how to run it.

argparse solves all of this. It’s in the standard library, requires no installation, and turns your script into a proper CLI tool in minutes.

The Basics: ArgumentParser

Every argparse script starts with a parser:

import argparse

parser = argparse.ArgumentParser(
    description="My CLI tool — does useful things."
)
args = parser.parse_args()

That one call to parse_args() handles everything: reading sys.argv, validating inputs, and printing help when the user passes --help.

Positional Arguments

Positional arguments are required and identified by position, not name:

parser.add_argument("filename", help="Path to the input file")
parser.add_argument("count", help="Number of items to process")

Optional Arguments (--flag and -f)

Optional arguments use -- prefix and can have short aliases:

parser.add_argument("--output", "-o", help="Output file path", default="output.txt")
parser.add_argument("--verbose", "-v", help="Enable verbose logging", action="store_true")

Type Validation: No More Manual Casting

Instead of int(sys.argv[1]) wrapped in a try/except, let argparse handle it:

parser.add_argument("--count", type=int, default=10, help="Number of items")
parser.add_argument("--rate", type=float, default=1.5, help="Processing rate")
parser.add_argument(
    "--format",
    choices=["json", "csv", "txt"],
    default="json",
    help="Output format"
)

If a user passes --count hello, argparse prints a clean error message and exits — no stack trace, no confusion.
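You can see this behaviour without touching a shell by passing an explicit argument list to parse_args(), which is also a handy trick for testing parsers:

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--count", type=int, default=10, help="Number of items")

# Valid input is converted for you:
args = parser.parse_args(["--count", "5"])
print(args.count + 1)  # 6 -- already an int, no manual casting

# Invalid input triggers a clean usage error (SystemExit), not a traceback:
try:
    parser.parse_args(["--count", "hello"])
except SystemExit:
    print("argparse rejected the non-integer value")
```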

Required Arguments, nargs, and Lists

Required Optional Arguments

parser.add_argument("--title", required=True, help="Article title (required)")

Accepting Multiple Values

# One or more values: --tags python beginner tutorial
parser.add_argument("--tags", nargs="+", help="One or more tags")

# Zero or more values: --tags (empty is fine)
parser.add_argument("--tags", nargs="*", help="Zero or more tags")

The result is a Python list you can iterate directly:

args = parser.parse_args()
for tag in args.tags:
    print(tag)

Boolean Flags: store_true and store_false

Boolean flags don’t take a value — their presence or absence is the value:

parser.add_argument("--dry-run", action="store_true", help="Simulate without writing")
parser.add_argument("--no-color", action="store_false", dest="color", help="Disable color output")

Usage:

python publish.py --dry-run        # args.dry_run is True
python publish.py                  # args.dry_run is False
python publish.py --no-color       # args.color is False

Subcommands: One Tool, Many Commands

Real CLI tools like git, docker, and pip use subcommands. add_subparsers() gives you the same structure.

parser = argparse.ArgumentParser(description="Publish queue manager")
subparsers = parser.add_subparsers(dest="command", required=True)

# `publish` subcommand
publish_parser = subparsers.add_parser("publish", help="Publish the next article in queue")
publish_parser.add_argument("--dry-run", action="store_true", help="Simulate without publishing")

# `list` subcommand
list_parser = subparsers.add_parser("list", help="Show the publish queue")
list_parser.add_argument("--format", choices=["table", "json"], default="table")

Now args.command tells you which subcommand was chosen, and each subcommand has its own arguments.
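A common follow-on pattern is to attach a handler to each subcommand with set_defaults(func=...), so the main script dispatches without an if/elif chain on args.command. The subcommands and handlers below are illustrative:

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
subparsers = parser.add_subparsers(dest="command", required=True)

# Each subparser stores its own handler under args.func.
greet = subparsers.add_parser("greet", help="Say hello")
greet.add_argument("name")
greet.set_defaults(func=lambda a: f"Hello, {a.name}!")

shout = subparsers.add_parser("shout", help="Say hello loudly")
shout.add_argument("name")
shout.set_defaults(func=lambda a: f"HELLO, {a.name.upper()}!")

args = parser.parse_args(["greet", "world"])
print(args.func(args))  # Hello, world!
```

Adding a new subcommand then only requires a new parser and handler; the dispatch line never changes.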

The --verbose / -v Pattern

A common pattern is using --verbose to set the logging level at runtime:

import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("--verbose", "-v", action="store_true", help="Enable debug logging")
args = parser.parse_args()

logging.basicConfig(
    level=logging.DEBUG if args.verbose else logging.INFO,
    format="%(levelname)s: %(message)s"
)

log = logging.getLogger(__name__)
log.info("Starting...")
log.debug("This only shows with --verbose")

Complete Example: Publish Queue CLI

Here’s a working CLI for managing an article publish queue — the same pattern used in the full pipeline.

#!/usr/bin/env python3
"""
publish_queue.py — CLI for managing the article publish queue.
Usage: python publish_queue.py <command> [options]
"""

import argparse
import json
import logging
import sys
from pathlib import Path

QUEUE_FILE = Path("queue.json")


def load_queue() -> list[dict]:
    if not QUEUE_FILE.exists():
        return []
    return json.loads(QUEUE_FILE.read_text())


def save_queue(queue: list[dict]) -> None:
    QUEUE_FILE.write_text(json.dumps(queue, indent=2))


def cmd_list(args: argparse.Namespace) -> None:
    queue = load_queue()
    if not queue:
        print("Queue is empty.")
        return
    for i, article in enumerate(queue, 1):
        status = "[published]" if article.get("published") else "[pending]  "
        print(f"{i}. {status} {article['title']} ({', '.join(article.get('tags', []))})")


def cmd_add(args: argparse.Namespace) -> None:
    queue = load_queue()
    article = {
        "title": args.title,
        "tags": args.tags or [],
        "published": False,
    }
    queue.append(article)
    save_queue(queue)
    logging.info("Added: %s", args.title)
    print(f"Added '{args.title}' to queue. Total: {len(queue)} articles.")


def cmd_publish(args: argparse.Namespace) -> None:
    queue = load_queue()
    pending = [a for a in queue if not a.get("published")]
    if not pending:
        print("No pending articles.")
        return
    next_article = pending[0]
    if args.dry_run:
        print(f"[DRY RUN] Would publish: {next_article['title']}")
        return
    next_article["published"] = True
    save_queue(queue)
    print(f"Published: {next_article['title']}")
    logging.info("Published: %s", next_article["title"])


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="publish_queue",
        description="Manage your article publish queue.",
    )
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable debug logging",
    )

    subparsers = parser.add_subparsers(dest="command", required=True)

    # list
    list_parser = subparsers.add_parser("list", help="Show the publish queue")
    list_parser.set_defaults(func=cmd_list)

    # add
    add_parser = subparsers.add_parser("add", help="Add an article to the queue")
    add_parser.add_argument("--title", required=True, help="Article title")
    add_parser.add_argument("--tags", nargs="*", help="Tags for the article")
    add_parser.set_defaults(func=cmd_add)

    # publish
    publish_parser = subparsers.add_parser("publish", help="Publish the next pending article")
    publish_parser.add_argument("--dry-run", action="store_true", help="Simulate without publishing")
    publish_parser.set_defaults(func=cmd_publish)

    return parser


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(levelname)s: %(message)s",
    )

    args.func(args)


if __name__ == "__main__":
    main()

--help Output

$ python publish_queue.py --help
usage: publish_queue [-h] [--verbose] {list,add,publish} ...

Manage your article publish queue.

positional arguments:
  {list,add,publish}
    list              Show the publish queue
    add               Add an article to the queue
    publish           Publish the next pending article

options:
  -h, --help          show this help message and exit
  --verbose, -v       Enable debug logging

$ python publish_queue.py add --help
usage: publish_queue add [-h] --title TITLE [--tags [TAGS ...]]

options:
  -h, --help           show this help message and exit
  --title TITLE        Article title
  --tags [TAGS ...]    Tags for the article

Running It

# Add articles to the queue
python publish_queue.py add --title "Python argparse guide" --tags python beginners tutorial
python publish_queue.py add --title "Automate your workflow" --tags python automation

# List the queue
python publish_queue.py list
# 1. [pending]   Python argparse guide (python, beginners, tutorial)
# 2. [pending]   Automate your workflow (python, automation)

# Publish next (dry run first)
python publish_queue.py publish --dry-run
# [DRY RUN] Would publish: Python argparse guide

python publish_queue.py publish
# Published: Python argparse guide

# Check updated queue with debug logging
python publish_queue.py list --verbose

Key Patterns to Remember

Pattern                   When to use it
type=int / type=float     Any numeric input
choices=[...]             Fixed set of valid values
required=True             Mandatory optional args
nargs="+" / nargs="*"     Lists of values
action="store_true"       Boolean flags
add_subparsers()          Multi-command tools
set_defaults(func=...)    Dispatch to subcommand functions
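Most of these patterns fit comfortably in one small parser. A quick sketch (the flag names are illustrative, not from the queue CLI above):

```python
import argparse

# One parser exercising several patterns from the table above.
parser = argparse.ArgumentParser()
parser.add_argument("--retries", type=int, default=3)                        # type=int
parser.add_argument("--format", choices=["table", "json"], default="table")  # choices
parser.add_argument("--tags", nargs="*", default=[])                         # list of values
parser.add_argument("--dry-run", action="store_true")                        # boolean flag

args = parser.parse_args(["--retries", "5", "--tags", "python", "cli"])
print(args.retries, args.format, args.tags, args.dry_run)
# → 5 table ['python', 'cli'] False
```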

What You Get for Free

Every argparse-based script automatically has:

  • --help / -h — generated from your help= strings
  • Type validation — with clear error messages, no tracebacks
  • Default values — documented in the help output
  • Usage line — auto-generated from your argument definitions

No third-party libraries. No pip install. Just the standard library.

The publish queue CLI in the full pipeline uses argparse for its list, add, and publish subcommands: germy5.gumroad.com/l/xhxkzz (pay what you want, min $9.99).

Further Reading

  • Your First Automated Python Script That Validates and Runs Itself
  • Python logging: Stop Using print() in Your Automation Scripts
  • How to Schedule Python Scripts with Cron: A Beginner’s Complete Guide

Build your own AI-powered Voice To-Do Assistant using a Waveshare 1.75″ display + Cursor + DuckyClaw — from setup to full feature implementation

As a developer, I recently built a custom voice-enabled to-do assistant using the Waveshare 1.75″ display, Cursor IDE, and DuckyClaw framework. This guide breaks down my step-by-step implementation, with practical tips and pitfalls to avoid—no fluff, just actionable steps for fellow makers. No advanced embedded experience is needed, but basic familiarity with Git and hardware flashing will help.

🧭 Step-by-step Implementation Guide
Step 1 – Clone the DuckyClaw repo

  1. Navigate to the DuckyClaw official documentation and locate the Waveshare dev board quick start section.
  2. Find the “Clone the repo” step, copy the official repository URL (https://github.com/tuya/DuckyClaw.git).
  3. Open Cursor IDE, use the built-in Git integration to clone the repo. Cursor automatically installs required dependencies, eliminating manual package management—this saves time and avoids version conflicts.

Step 2 – Install TuyaOpen Dev Skills (workflow)

  1. Visit the TuyaOpen website and navigate to the developer tools section to find the TuyaOpen Dev Skills workflow installation prompt.
  2. Copy the exact prompt provided (it’s tailored for DuckyClaw integration) and paste it into the Cursor chat panel.
  3. The workflow installs automatically, establishing a direct connection between your project and TuyaOpen’s SDK—critical for accessing cloud services and hardware drivers later.

Step 3 – Create product & get credentials (PID / UUID / AuthKey)

  1. Follow the DuckyClaw quick start guide to create a new product on the Tuya Developer Platform (select “AI Agent” as the product type for seamless DuckyClaw integration).
  2. From the product dashboard, retrieve your Product ID (PID)—this identifies your custom device in the Tuya ecosystem.
  3. Navigate to the “Hardware Development” tab to download your UUID and AuthKey. These credentials are non-negotiable—store them securely, as they authenticate your board with Tuya Cloud and DuckyClaw.

Step 4 – Build & flash with Cursor

  1. In Cursor, use this precise prompt to ensure proper compilation and flashing:
    Build and flash DuckyClaw firmware for Waveshare 1.75" display, using the PID, UUID, and AuthKey I retrieved from Tuya Developer Platform.
  2. Cursor detects your connected Waveshare board automatically, compiles the firmware with your credentials, and flashes it—no manual CLI commands or makefiles required. I tested this with three different Waveshare boards, and it worked consistently.

Step 5 – Activate in Smart Life app

  1. Download the Smart Life app (iOS/Android) and create an account if you don’t already have one.
  2. Follow the app’s “Add Device” flow to complete Wi-Fi provisioning—ensure your phone and Waveshare board are on the same Wi-Fi network for a smooth pairing process.
  3. Complete the pairing and activation steps. Once done, your board is connected to Tuya Cloud and ready to interact with DuckyClaw.

Step 6 – Add To-Do List feature
To implement the to-do functionality, I used Cursor to generate and integrate the code with DuckyClaw’s skill system. Use this specific prompt to avoid missing key features:

  Implement a To-Do system for DuckyClaw + Waveshare 1.75" display: swipe left to access the To-Do List, swipe right for Scheduled tasks, UI styled after Apple Reminders, and smooth scrolling using the lv_example_scroll_6 component. Integrate with DuckyClaw’s CRON skill for task scheduling and the heartbeat skill for reminders.

Cursor generates clean, framework-compatible code. Review it briefly to ensure the display dimensions match the 1.75″ screen, then adjust any UI elements if needed.

Step 7 – Build & flash again
Re-run the build and flash process in Cursor (use the same prompt as Step 4) to push the to-do feature to your board. The flash process takes 30-60 seconds—do not disconnect the board during this time. I recommend testing the UI immediately after flashing to catch any display alignment issues early.

Step 8 – Final Testing & Debugging
After flashing, test all core features to ensure stability. Here’s what to verify:
● 🎙️ Voice input: Test DuckyClaw’s hardware ASR (ensure your board has a built-in mic or external mic connected) – it should recognize voice commands to add to-dos.
● ✅ To-Do management: Add, edit, and mark tasks as complete—verify UI responsiveness and swipe navigation.
● ⏰ Scheduled tasks: Set a test reminder to confirm the CRON skill triggers notifications (check the display and any connected speaker).
● 📱 Display functionality: Ensure smooth scrolling and no UI glitches on the 1.75″ screen.
If you encounter issues, check the Cursor output log for compilation errors or the Tuya Developer Platform for device connection status.

💡 Developer Notes & Key Takeaways
This project is a practical example of combining AI, IoT, and low-code development to build a useful hardware product. Here’s what I learned during implementation:

  • DuckyClaw’s TuyaOpen foundation simplifies hardware integration—its built-in drivers for displays and ASR save hours of custom coding.
  • Cursor’s low-code approach accelerates feature development, but always review generated code to ensure compatibility with DuckyClaw’s skill system.
  • Credential management is critical—never hardcode PID/UUID/AuthKey in public repos; use DuckyClaw’s config files for secure storage.
  • Extensibility is a strong point: you can easily add more features (e.g., IoT device control, voice TTS) using DuckyClaw’s modular skills.
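To make the credential point concrete, one generic pattern is a git-ignored JSON file loaded by your build tooling. The file name and key names here are my own convention, not a DuckyClaw API:

```python
import json
from pathlib import Path

# Hypothetical file name; list it in .gitignore so credentials never reach the repo.
CREDS_FILE = Path("tuya_credentials.json")

def load_credentials() -> dict:
    """Load PID/UUID/AuthKey from a local, untracked JSON file."""
    if not CREDS_FILE.exists():
        raise FileNotFoundError(
            f"{CREDS_FILE} not found: create it with the values from your Tuya dashboard"
        )
    creds = json.loads(CREDS_FILE.read_text())
    missing = [key for key in ("pid", "uuid", "auth_key") if not creds.get(key)]
    if missing:
        raise ValueError(f"Credentials file is missing fields: {missing}")
    return creds
```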

🔗 Resources & Contribution
Official Docs: Step-by-step hardware setup, SDK guides, and skill development tutorials — https://tuyaopen.ai/duckyclaw

GitHub Repo: tuya/DuckyClaw – Edge-Hardware (SoC/MCU) oriented Claw 🦞 (check TODOs.md for upcoming features)

Discord Community: https://discord.com/invite/yPPShSTttG

If you build this project, share your tweaks and improvements—I’d love to see how fellow developers extend the to-do functionality or integrate additional DuckyClaw skills. Feel free to drop a comment with questions or your build details! 🦆✨