I built persistent AI memory for Claude on Cloudflare’s free tier

Every Claude session starts fresh. You copy context, explain your setup, reintroduce your project, and then do it all over again the next day. I got tired of this and created a solution.

second-brain-cloudflare is a self-hosted MCP server that gives Claude, ChatGPT, Cursor, and any other MCP-compatible client persistent memory across sessions. It runs entirely on Cloudflare’s free tier. Here’s how it works.

The stack

Cloudflare Workers: MCP server, REST API, and web UI, all from one wrangler deploy
D1 (SQLite): stores entry content, tags, source, timestamps, and vector chunk IDs
Vectorize: the vector index (bge-small-en-v1.5, 384 dimensions)
Workers AI: bge-small-en-v1.5 for embeddings, @cf/meta/llama-4-scout-17b-16e-instruct for web UI synthesis

One deployment. No external databases. No API keys needed beyond your Cloudflare account token.

Tag-based time-decay reranking

Pure vector similarity has a drawback: a memory from three months ago can outrank something you saved yesterday if it’s semantically closer. The solution is to fetch three times as many candidates as needed (topK=5 pulls 15), then score each one using a tag-aware half-life:

Tasks: 7-day half-life
Work: 3-month half-life
Context: 6-month half-life
Default: 30-day half-life

adjusted_score = cosine_similarity × e^(-age_in_days / half_life)
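Here is a minimal sketch of that scoring in TypeScript. It is not lifted from the repo: the `Candidate` shape, the function names, the 90- and 180-day approximations for "3-month" and "6-month", and the rule for entries with several tags are all assumptions made for illustration. Note that with e^(-age / half_life) a score falls to about 37% (1/e) after one half-life rather than 50%; the sketch keeps the formula exactly as written above.

```typescript
// Half-lives in days per tag, from the list above.
// Assumption: "3-month" ≈ 90 days and "6-month" ≈ 180 days.
const HALF_LIFE_DAYS = new Map<string, number>([
  ["tasks", 7],
  ["work", 90],
  ["context", 180],
]);
const DEFAULT_HALF_LIFE_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical shape of one over-fetched match from the vector index.
interface Candidate {
  id: string;
  similarity: number; // cosine similarity reported by the index
  createdAt: number;  // epoch milliseconds
  tags: string[];
}

// Assumption: if several known tags are present, the longest half-life wins.
function halfLifeFor(tags: string[]): number {
  let best: number | undefined;
  for (const tag of tags) {
    const days = HALF_LIFE_DAYS.get(tag.toLowerCase());
    if (days !== undefined && (best === undefined || days > best)) best = days;
  }
  return best === undefined ? DEFAULT_HALF_LIFE_DAYS : best;
}

// adjusted_score = cosine_similarity × e^(-age_in_days / half_life)
function adjustedScore(c: Candidate, now: number): number {
  const ageInDays = Math.max(0, (now - c.createdAt) / MS_PER_DAY);
  return c.similarity * Math.exp(-ageInDays / halfLifeFor(c.tags));
}

// `candidates` should hold about 3 × topK matches; the best topK come back.
function rerank(candidates: Candidate[], topK: number, now: number = Date.now()): Candidate[] {
  return [...candidates]
    .sort((a, b) => adjustedScore(b, now) - adjustedScore(a, now))
    .slice(0, topK);
}
```

With these numbers, an untagged 90-day-old memory at 0.9 similarity scores about 0.04, while a day-old one at 0.8 scores about 0.77, so the recent entry wins.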

Duplicate detection

Before storing anything, embed the incoming content and query Vectorize for its nearest neighbor:

Score ≥ 95%: block
Score from 85% up to (but not including) 95%: store with a duplicate-candidate tag
Score < 85%: store normally

Without this step, Claude creates 20–30 nearly identical entries for the same decision.
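The threshold logic is small enough to sketch in full. The function and type names below are hypothetical, and the sketch assumes the nearest-neighbor score arrives as a number between 0 and 1 (or `null` when the index is still empty); the embedding and Vectorize query themselves are left out.

```typescript
type DuplicateDecision = "block" | "store-flagged" | "store";

const BLOCK_THRESHOLD = 0.95; // 95% and above: reject the write
const FLAG_THRESHOLD = 0.85;  // 85% up to 95%: store, but tag for review

// `nearestScore` is the similarity of the closest existing entry,
// or null when nothing has been stored yet (an assumption of this sketch).
function classifyDuplicate(nearestScore: number | null): DuplicateDecision {
  if (nearestScore === null) return "store";
  if (nearestScore >= BLOCK_THRESHOLD) return "block";
  if (nearestScore >= FLAG_THRESHOLD) return "store-flagged";
  return "store";
}

// Adds the duplicate-candidate tag only for flagged entries, without repeating it.
function tagsForStorage(tags: string[], decision: DuplicateDecision): string[] {
  if (decision !== "store-flagged" || tags.indexOf("duplicate-candidate") !== -1) {
    return tags;
  }
  return [...tags, "duplicate-candidate"];
}
```

Keeping the decision separate from the write makes it easy to log blocked entries or tune the two thresholds later.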

Smart chunking

Long notes are split at sentence boundaries, with a 200-character overlap between consecutive chunks. Each chunk gets its own vector. Chunk IDs are stored in D1, so forget() reliably removes every vector that belongs to an entry.
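A rough sketch of sentence-boundary chunking with overlap might look like this. The 1,000-character chunk size is an assumption (only the 200-character overlap comes from the description above), and both function names are hypothetical. The sketch is deliberately simple: a single sentence longer than the limit is kept whole, and the overlap is cut by character count, so it can start mid-word.

```typescript
// Splits on ., !, or ? when followed by whitespace or the end of the text,
// so a token such as "v1.5" is not treated as a sentence end.
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const isTerminator = ".!?".indexOf(text.charAt(i)) !== -1;
    const atBoundary = i + 1 === text.length || /\s/.test(text.charAt(i + 1));
    if (isTerminator && atBoundary) {
      const sentence = text.slice(start, i + 1).trim();
      if (sentence) sentences.push(sentence);
      start = i + 1;
    }
  }
  const tail = text.slice(start).trim();
  if (tail) sentences.push(tail);
  return sentences;
}

// Packs whole sentences into chunks of at most `maxChars` where possible.
// Each new chunk is seeded with the last `overlap` characters of the previous one.
function chunkText(text: string, maxChars: number = 1000, overlap: number = 200): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const candidate = current ? current + " " + sentence : sentence;
    if (current && candidate.length > maxChars) {
      chunks.push(current);
      current = current.slice(-overlap) + " " + sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned chunk would then be embedded separately, with its vector ID recorded against the parent entry so that deleting the entry can remove every chunk.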

Temporal recall (v1.2.0)

Queries now support time limits:

recall("API decisions", after="7 days ago")
recall("standup notes", after="2026-05-12")
Supports: “today”, “yesterday”, “last week”, “this month”, ISO dates, and epoch timestamps.
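To give a feel for how expressions like these can be resolved into a cutoff timestamp, here is a small sketch. It is an illustration, not the project's parser: the `parseAfter` name is hypothetical, everything is computed in UTC, "last week" is treated as a rolling seven days, and all-digit inputs below 1e12 are assumed to be epoch seconds rather than milliseconds.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

function startOfUtcDay(ms: number): number {
  const d = new Date(ms);
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
}

// Returns a cutoff in epoch milliseconds, or null if the expression is not understood.
function parseAfter(expr: string, now: number = Date.now()): number | null {
  const s = expr.trim().toLowerCase();

  if (s === "today") return startOfUtcDay(now);
  if (s === "yesterday") return startOfUtcDay(now) - DAY_MS;
  // Assumption: "last week" means the past seven days, not the previous calendar week.
  if (s === "last week") return startOfUtcDay(now) - 7 * DAY_MS;
  if (s === "this month") {
    const d = new Date(now);
    return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1);
  }

  // Relative form such as "7 days ago" or "1 day ago".
  const relative = /^(\d+)\s+days?\s+ago$/.exec(s);
  if (relative) return now - Number(relative[1]) * DAY_MS;

  // Epoch timestamps. Assumption: small values are seconds, large ones milliseconds.
  if (/^\d+$/.test(s)) {
    const n = Number(s);
    return n < 1e12 ? n * 1000 : n;
  }

  // ISO dates such as 2026-05-12 or 2026-05-12T09:00:00Z.
  if (/^\d{4}-\d{2}-\d{2}/.test(s)) {
    const parsed = Date.parse(expr.trim());
    return isNaN(parsed) ? null : parsed;
  }

  return null;
}
```

The resulting cutoff can then be compared against each entry's stored timestamp, so that only memories created after it are returned.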

AI synthesis in the web UI

Queries flow through @cf/meta/llama-4-scout-17b-16e-instruct before being rendered. Answers stream in real time, with the source memories listed underneath in a collapsible section, alongside Append and Forget buttons. All of this runs on your own Cloudflare account.

Why the free tier works

D1: 5GB storage, 5 million row reads per day
Vectorize: 5 million stored vector dimensions (roughly 13,000 vectors at 384 dimensions) and 30 million queried vector dimensions per month (tight for team scale but fine for personal use)
Workers AI: 10,000 Neurons per day

Try it

Deploy: https://thesecondbrain.dev
GitHub: https://github.com/rahilp/second-brain-cloudflare

If this was helpful, please give it a star.
