Building Sentry: A Distributed Message Broker in Go (From Scratch)
I’m learning systems engineering and system design by doing —
not by watching tutorials, not by drawing diagrams, but by actually building infrastructure.
This blog documents my journey building Sentry, a Kafka-inspired distributed message broker written in Go, from scratch.
Not a wrapper.
Not a framework experiment.
Not a CRUD backend.
A real broker: raw TCP, binary wire protocol, append-only logs, offset indexing, crash recovery, and concurrency at scale.
Why I Started This Project
Most backend engineers (including me, earlier) live behind abstractions:
HTTP frameworks
ORMs
Message queues as black boxes
Managed services that “just work”
But at some point I realized something uncomfortable:
I could use Kafka, Redis, and RabbitMQ…
but I had no real idea how they worked internally.
So I asked myself:
How does a broker actually store messages on disk?
How are offsets tracked?
What happens when the process crashes mid-write?
How do consumers resume safely?
How does a binary protocol work over raw TCP?
What breaks under network lag or partial writes?
And that’s where Sentry was born.
What Is Sentry?
Sentry is a high-performance, Kafka-like distributed message broker written in Go.
It focuses on:
Simplicity over features
Deterministic behavior
Low-level correctness
Failure-first design
Observability via logs
Learning by implementation
The goal is not to replace Kafka.
The goal is to understand Kafka-class systems by building one.
High-Level Architecture
At a high level, Sentry has these core layers:
Network Layer
Raw TCP server
Per-connection goroutines
Binary protocol decoding
Protocol Layer
Custom wire protocol
Length-prefixed frames
Correlation IDs
Versioning support
Broker Core
Message routing
Partition selection
Offset assignment
Persistence Layer
Append-only log segments
Offset → byte index files
Time-based index files
Crash-safe replay
Consumer Layer
Offset-based reads
Replay semantics
Deterministic fetch order
Each layer is explicit.
Nothing is hidden behind magic.
Custom Binary Wire Protocol
One of the first things I built was the wire protocol.
Instead of HTTP or gRPC, Sentry uses a custom binary protocol over raw TCP.
Why?
Because production brokers don’t speak JSON over HTTP.
They speak:
Framed binary messages
Big-endian encoded integers
Compact payloads
Deterministic layouts
Versioned schemas
Protocol Design
Each request frame looks like:

```
| Frame Length   (4 bytes) |
| Message Type   (2 bytes) |
| Correlation ID (4 bytes) |
| Payload        (N bytes) |
```
Key features:
Length-prefixed framing
So partial TCP reads can be reassembled correctly.
Correlation IDs
So responses match requests in concurrent connections.
Big-endian encoding
For deterministic cross-platform decoding.
Version field (planned)
For forward compatibility.
Concurrency Model
Sentry uses Go’s strengths:
Goroutines
Channels
Mutexes
Worker pools
Ingress Model
One goroutine per TCP connection
Each connection reads frames
Frames are pushed into a bounded worker pool
Workers decode and route requests
This prevents:
Unbounded goroutine growth
Memory pressure
Head-of-line blocking
Backpressure
If the worker pool queue is full:
New requests block
Producers slow down
The system protects itself
No silent overload.
No hidden memory leaks.
Persistence: Append-Only Log Segments
This is the heart of the system.
Every topic partition in Sentry is backed by:
A .log file → message bytes
A .index file → offset → byte position
A .timeindex file → timestamp → offset
Why Append-Only?
Because append-only logs give you:
Crash safety
Sequential disk writes
High throughput
Simple replay
Deterministic ordering
Segment Structure
Each segment is defined by:
```go
type Segment struct {
	baseOffset uint64   // first offset stored in this segment
	log        *os.File // append-only message bytes (.log)
	index      *os.File // offset -> byte position entries (.index)
	timeIndex  *os.File // timestamp -> offset entries (.timeindex)
	topic      string
	partition  uint32
	active     bool // only the active segment accepts writes
}
```
Writing a Message
The write path:
Append message bytes to .log
Record (relativeOffset, bytePosition) in .index
Record (timestamp, relativeOffset) in .timeindex
fsync the log file
Return offset to producer
This ensures:
Durability
Deterministic offsets
Replay safety
Offset Indexing
To avoid scanning entire log files:
Each message write updates an index entry:
```
| Relative Offset (4 bytes) |
| Byte Position   (4 bytes) |
```
Stored in .index.
This allows:
O(1) offset → file seek
Fast consumer reads
Predictable latency
Time-Based Indexing
Each message also updates a .timeindex file:
```
| Timestamp       (8 bytes) |
| Relative Offset (4 bytes) |
```
This enables future features like:
Fetch by timestamp
Log compaction
Retention policies
Crash Recovery
This is where systems engineering actually begins.
When the broker restarts:
It scans the partition directory
Discovers all segment files
Loads .index files
Replays .log files if needed
Rebuilds in-memory offsets
Marks last segment as active
This ensures:
No data loss
No duplicate offsets
Safe resume for consumers
Failure-First Design
Everything in Sentry is built assuming failure:
Disk write failures
Partial TCP reads
Process crashes
Power loss
Client disconnects
Examples:
Writes check active segment state
Index lookups validate bounds
Reads handle EOF explicitly
Recovery paths are first-class
Consumer Fetch Logic
Consumers fetch messages using:
Topic
Partition
Offset
Max batch size
The broker:
Looks up offset in .index
Seeks to byte position
Reads message bytes
Returns batch to consumer
Advances consumer offset
This allows:
Replay from any offset
Exactly-once processing, when clients track offsets and deduplicate
Crash-safe resumes
Observability via Logs
Every major action logs explicitly:
```
[INFO] Sentry Broker Starting…
[INFO] Data directory ready: ./data
[INFO] Loading log segments from disk…
[INFO] Loaded 3 log segments from disk
[INFO] Recovered offsets up to 10432
[INFO] Listening for producer connections…
```
Why this matters:
Debuggability
Replay verification
Crash analysis
Trust in the system
What This Project Taught Me

1. Abstractions Hide Complexity
Before Sentry, “Kafka stores messages” was a black box.
Now I know:
How bytes hit disk
How offsets are assigned
How indexes are structured
How recovery actually works
2. Concurrency Is Not Free
Goroutines are cheap, but not infinite.
Without:
Worker pools
Backpressure
Bounded queues
your system will explode under load.
3. Failure Is the Default
If you don’t explicitly design for:
crashes
restarts
partial writes
corrupt state
your system is already broken.
4. Performance Is a Trade-Off
Every choice has consequences:
fsync = durability vs throughput
memory buffers = speed vs safety
batching = latency vs efficiency
There is no “best” choice.
Only conscious trade-offs.
What’s Next for Sentry
This is only the beginning.
Planned improvements:
Consumer groups
Replication across nodes
Leader election
Segment compaction
Retention policies
Snapshotting
Metrics & dashboards
Raft-based metadata layer
Why I’m Building This in Public
Because:
It keeps me accountable
I get feedback from real engineers
It forces me to document decisions
It creates a learning trail others can follow
Final Thoughts
Building Sentry taught me something critical:
Building features is easy.
Designing systems that don’t break is the real challenge.
This project took me far beyond CRUD apps and APIs.
It forced me to think in:
bytes
offsets
failure modes
recovery paths
concurrency limits
durability guarantees
And honestly?
This has been the most educational project I’ve ever built.
Repo & Resources
GitHub:
https://github.com/tejas2428cse990-svg/Sentry.git
Blog series (coming soon):
👉 Dev.to link
Feedback Welcome