lowkey: Building a Distributed Lock Service That Actually Works

Go · Distributed Systems · Consensus · Backend · System Design
    December 28, 2025
Yashaswi Mishra
5 min

Yes, sometimes the payments-processing cron job runs twice. Same batch, same customer. Unfortunately, that customer gets charged twice.

How did this happen? Two instances of the same service ran the same job. "But we did have locks, right?" You did. It was probably Redis Redlock. It worked perfectly in development, but production isn't kind to assumptions. A network partition occurs, both instances think they're in charge, the customer gets charged twice, and the angry support tickets start rolling in. And you're the one awake at 3 AM.

    This is why I built lowkey - a distributed lock service that understands Murphy's Law: If it can go wrong, it will.

    What is lowkey?

    lowkey makes three promises:

    1. Only ONE - Strong consistency via Raft consensus

    2. Fencing tokens - Simple math prevents stale writes

3. Fast - 3.24ms average latency (faster than etcd's typical 5-10ms)

    Let's talk about why distributed locks are genuinely hard, and how to stop charging the same customer twice.

    The Fundamental Problem of Time

    You cannot really trust time in distributed systems.

Your process thinks one second has passed. In reality it's been 30 seconds, or anything longer. You paused for a GC cycle, the OS scheduled someone else, or the network was slow. It doesn't matter which. You missed the lease expiration, and when your process wakes up, it still thinks only a second has passed.

Time is a Lie

This is called a process pause. It is not theoretical; it causes production chaos all the time.
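
You don't need exotic infrastructure to see the failure mode. Here's a contrived sketch - the pause is simulated with time.Sleep, but a GC cycle or a frozen VM does exactly the same thing:

go
const leaseTTL = 1 * time.Second
acquiredAt := time.Now()

// The work was planned to finish "well within" the lease...
// ...but a GC cycle / scheduler stall / VM freeze eats 3 seconds.
time.Sleep(3 * time.Second) // stand-in for an unexpected process pause

// The process still believes it holds the lock. The clock disagrees.
if time.Since(acquiredAt) > leaseTTL {
    fmt.Println("lease expired during the pause - any write now is a stale write")
}

The dangerous version of this code is the one that skips the final check and just writes.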

    So how do we solve it?

    The CAP Theorem Choice

CAP theorem: Consistency, Availability, Partition Tolerance. Since partitions are a fact of life, the real choice is between C and A.

More precisely: in a distributed system, when a network partition happens, you can only guarantee one of Consistency and Availability. The three terms:

    • Consistency (C)
      Every node sees the same data at the same time. Once a write succeeds, all future reads see it.

    • Availability (A)
      Every request receives a response, even if that response might not be the latest value.

    • Partition tolerance (P)
      The system continues operating even when the network splits and messages are dropped or delayed.

    For distributed locks, this isn't a choice. It's a requirement: You MUST pick CP (Consistency + Partition Tolerance).

    Here's why Availability is the wrong choice:

In an AP system like Redis Redlock, if a network partition occurs:

    • node A can’t see node B

    • node B can’t see node A

    Both nodes are still alive. Both think they’re healthy. Both say:

    “I am available”

This is the critical moment. It's called a split-brain situation: availability is being preserved by lying about the global state.

    Availability means "always respond, even if wrong". For locks, wrong equals disaster.

lowkey, in contrast, is a CP system.

    In a CP system, nodes don’t pretend they’re alone in the universe. They are explicitly aware that:

    • other nodes exist

    • agreement is required

    • proceeding without quorum is dangerous

So when a network partition happens, the system splits into two very unequal worlds: a majority side that can still reach quorum, and a minority side that cannot.
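
That asymmetry is just quorum arithmetic: a strict majority is required to commit anything, and at most one side of a partition can hold a majority. A trivial helper makes it concrete:

go
// quorum returns the minimum number of nodes needed to commit a write.
// With n = 3: quorum = 2, so the 2-node side of a partition keeps working
// while the 1-node side can only serve as a read-only follower.
func quorum(n int) int {
    return n/2 + 1
}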

    The Three Pillars: Leases, Fencing Tokens, and Raft

    Leases

    A lease is like a parking meter. You pay for 2 hours. After 2 hours, your time expires. You lose the spot. Automatic.
The lease is the main entity in lowkey. Every owner (node) has to start a lease before it can acquire locks, and every lock is associated with a lease.

    go
// Client creates a lease with a 10-second TTL
lease := client.CreateLease(10 * time.Second) // lease ID = 100

// Client uses this lease for all its locks
lock := client.Acquire("job-name", lease.ID)

// Client must send a keepalive every ~3 seconds (TTL/3).
// If keepalives stop -> lease expires -> all its locks are released.

Without leases, a crashed client holds its locks forever and the system gets stuck. With leases, the system recovers automatically.
The flow looks like this:

plaintext
Time: 0s  → Client creates lease (expires at 10s)
Time: 3s  → Client: keepalive → expires at 13s
Time: 6s  → Client: keepalive → expires at 16s
Time: 9s  → Client: keepalive → expires at 19s
--- CLIENT CRASHES ---
Time: 12s → (no keepalive)
Time: 15s → (no keepalive)
Time: 19s → Lease expires
          → Server auto-releases all locks
          → Other clients can proceed
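
The SDK handles this automatically, but the keepalive loop itself is nothing magical. A minimal sketch, where renew stands in for whatever RPC the client actually uses to extend the lease:

go
// keepAlive renews the lease every TTL/3 until ctx is cancelled or a renewal fails.
func keepAlive(ctx context.Context, ttl time.Duration, renew func() error) {
    ticker := time.NewTicker(ttl / 3)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return // clean shutdown: stop renewing and let the lease lapse
        case <-ticker.C:
            if err := renew(); err != nil {
                log.Printf("keepalive failed: %v", err)
                return // the server will expire the lease and free our locks
            }
        }
    }
}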

    Fencing Tokens

    Imagine a deli counter with number tickets. Person #43 cannot be served after person #44. The number proves freshness.

Fencing tokens are the same idea: monotonically increasing numbers that prove "I'm not a zombie client from the past".

Why are fencing tokens important? Remember the process pause: a client can hold a lock, pause long enough for its lease to expire and for someone else to grab the lock, then wake up and keep writing as if nothing happened. Timeouts alone cannot stop that write; a token can, because the zombie's token is now smaller than the latest one handed out.

The catch: the token must be validated by the protected resource (database, API, file, etc.). The lock service provides the token, but the resource enforces it.

In lowkey:

    go
    // Client gets token when acquiring lock
    lock := client.Acquire("migration-job")
    token := lock.Token()  // e.g., 42
    
    // Client includes token in EVERY protected operation
database.Execute(query, token)
    
    // Database validates
    func (db *Database) Execute(query string, token uint64) error {
        if token < db.lastSeenToken {
            return errors.New("stale token - rejected")
        }
    
        // Execute query
        db.lastSeenToken = token
        return nil
    }
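
To see why that comparison matters, here is a contrived replay of the zombie-client scenario against the Database type above (the token values are just examples):

go
db := &Database{}

// Client A acquired the lock with token 42, then paused (GC, VM freeze, ...).
// Its lease expired and Client B acquired the same lock with token 43.

// Client B writes first - accepted, lastSeenToken becomes 43.
_ = db.Execute("UPDATE payments SET status = 'charged' ...", 43)

// Client A wakes up, still convinced it holds the lock, and retries its write.
err := db.Execute("UPDATE payments SET status = 'charged' ...", 42)
fmt.Println(err) // stale token - rejected

That rejected call is the double charge from the introduction being stopped at the last line of defense.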

    Raft Consensus

Leases and fencing tokens solve client-side failures. What about server-side failures? If you have a single lock server and it crashes, your entire system is down: no locks, no progress.

    The solution: Multiple servers with consensus

But here's the catch: multiple servers create a new problem - who's in charge?
Without consensus, you get exactly the split-brain problem we discussed earlier.

    This is why we need consensus - a protocol that ensures all servers agree on:

    1. Who is the leader (only one can hand out locks)

    2. What the current state is (which locks are held)

    3. What operations have been committed (which token numbers are valid)

    Why Raft specifically?

    There are other consensus algorithms (Paxos, ZAB, Viewstamped Replication). We chose Raft because:

    1. Understandable - Designed explicitly to be easier to understand than Paxos

    2. Proven - Used in production by etcd, Consul, CockroachDB

    3. Battle-tested - HashiCorp's implementation has years of production use

    4. Leader-based - Simpler than leaderless protocols, easier to reason about failures

This is how leader-based consensus helps us:

plaintext
Network partition: [Server 1] X [Server 2, Server 3]

Server 1 (minority):
→ Tries to become leader
→ Can't get majority votes (1 out of 3)
→ Stays as follower
→ Rejects all write requests
→ READ-ONLY mode

Server 2, 3 (majority):
→ Server 2 becomes leader (2 out of 3 votes)
→ Can commit writes (has quorum)
→ Hands out locks with monotonic tokens
→ System continues

When partition heals:
→ Server 1 realizes it's behind
→ Syncs from Server 2 (the true leader)
→ Consistency restored

    Raft is like a jury reaching a verdict. Even if some jurors leave, the remaining majority must agree.

    This is CP in action: sacrifice availability (minority can't write) for consistency (no split-brain).

    lowkey Architecture

    lowkey is built in 4 layers.

    Layer 1: FSM (Finite State Machine)

    The FSM is the "brain" that manages state:

    go
    type FSM struct {
        locks  map[string]*Lock   // "migration-job" → {owner, token, lease}
        leases map[uint64]*Lease  // 100 → {owner, expires_at}
    
        fencingCounter uint64     // Monotonically increasing
        clock *monotime.Clock     // NOT wall clock!
    }

    Design decision #1: Monotonic time

We don't trust time.Now() for expiry math. Why? System clocks can jump backwards (NTP sync, manual changes, and other reasons). Instead:

    go
    // Monotonic clock - only moves forward
    type Clock struct {
        start time.Time  // Fixed reference point
    }
    
    func (c *Clock) Elapsed() Duration {
        return Duration(time.Since(c.start))  // Monotonic!
    }
    
    func (c *Clock) ExpiresAt(ttl Duration) Duration {
        return c.Elapsed() + ttl
    }

    This gives us a clock immune to system time changes. It only knows "time since server started", which only goes forward.
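
Expiry checks then become comparisons between two offsets from the same starting point. The exact lowkey types aren't shown here, but the shape implied by the FSM code below is roughly this (field names assumed):

go
// A sketch of the lease shape implied by the code below - illustrative only.
type Lease struct {
    OwnerID   string
    ExpiresAt Duration // offset from server start, not a wall-clock timestamp
}

// IsExpired compares two monotonic offsets, so NTP jumps can't affect it.
func (l *Lease) IsExpired(now Duration) bool {
    return now > l.ExpiresAt
}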

    Design decision #2: Fencing counter is sacred

    Every lock acquisition increments the fencing counter. This is strictly monotonic - never decreases, never repeats:

    go
    func (f *FSM) AcquireLock(lockName, ownerID string, leaseID uint64) (uint64, error) {
        // Validate lease exists and isn't expired
        lease := f.leases[leaseID]
        if lease == nil || lease.IsExpired(f.clock.Elapsed()) {
            return 0, ErrLeaseExpired
        }
    
        // Check if lock is available
        if lock, held := f.locks[lockName]; held && lock.LeaseID != leaseID {
            return 0, ErrLockAlreadyHeld
        }
    
        // THIS IS CRITICAL: increment before assigning
        f.fencingCounter++
    
        f.locks[lockName] = &Lock{
            Name:         lockName,
            OwnerID:      ownerID,
            FencingToken: f.fencingCounter,  // Unique, monotonic
            LeaseID:      leaseID,
        }
    
        return f.fencingCounter, nil
    }
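
Something also has to reap leases whose keepalives stopped. A hedged sketch of what such a sweep could look like over the maps above (the real lowkey implementation may batch or schedule this differently):

go
// expireLeases releases every lock whose lease has passed its deadline,
// then removes the lease itself. Purely illustrative.
func (f *FSM) expireLeases() {
    now := f.clock.Elapsed()
    for id, lease := range f.leases {
        if !lease.IsExpired(now) {
            continue
        }
        for name, lock := range f.locks {
            if lock.LeaseID == id {
                delete(f.locks, name) // auto-release: the crashed client loses its locks
            }
        }
        delete(f.leases, id)
    }
}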

    Layer 2: Raft Consensus

    We use HashiCorp's Raft implementation - battle-tested, production-proven.

    The critical optimization: Not everything needs consensus.

    Wait, lease renewal doesn't go through Raft?

    Operations that need Raft:

    • CreateLease (affects state)

    • AcquireLock (affects state)

    • ReleaseLock (affects state)

    Operations that DON'T need Raft:

    • RenewLease (just extends time)

    Here's the trick:

    go
    // RenewLeaseLocal - leader-only operation
    func (n *Node) RenewLeaseLocal(leaseID uint64) (time.Duration, error) {
        // Safety check: must be leader
        if !n.IsLeader() {
            return 0, fmt.Errorf("not leader")
        }
    
        // Update expiration time in FSM (no Raft consensus)
        return n.fsm.RenewLeaseLocal(leaseID)
    }

    This is safe because:

    1. Only the leader processes renewals

    2. If leader crashes, then clients reconnect to new leader

    3. If client can't reach new leader within TTL, then lease expires

    Benchmark impact: This made heartbeats 10-100x faster. Raft consensus is expensive - use it only when necessary.

    Layer 3: Server (gRPC + HTTP)

    The server validates leadership and routes requests:

    go
    func (s *Server) AcquireLock(ctx context.Context, req *pb.AcquireLockRequest) (*pb.AcquireLockResponse, error) {
        // Apply command through Raft
        result, err := s.node.Apply(types.AcquireLockCmd{
            LockName: req.LockName,
            OwnerID:  req.OwnerId,
            LeaseID:  req.LeaseId,
        })
    
        if err != nil {
            return nil, toGRPCError(err)
        }
    
        token := result.(fsm.AcquireLockResponse).FencingToken
    
        return &pb.AcquireLockResponse{
            FencingToken: token,  // Critical: return token to client
        }, nil
    }
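
toGRPCError is just a mapping from domain errors to gRPC status codes. One reasonable version, sketched here - the actual lowkey mapping may differ:

go
// toGRPCError translates FSM errors into gRPC status codes - a sketch.
func toGRPCError(err error) error {
    switch {
    case errors.Is(err, fsm.ErrLockAlreadyHeld):
        return status.Error(codes.AlreadyExists, err.Error())
    case errors.Is(err, fsm.ErrLeaseExpired):
        return status.Error(codes.FailedPrecondition, err.Error())
    default:
        return status.Error(codes.Internal, err.Error())
    }
}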

    Why Protocol Buffers (Protobuf)?

    We chose Protobuf over JSON for the API layer. Here's why:

    1. Performance - Binary encoding beats JSON

    plaintext
    Message size for AcquireLockRequest:
    - Protobuf: ~50 bytes
    - JSON:     ~120 bytes
    → 2.4x smaller payload
    
    Serialization speed:
    - Protobuf: ~50 ns/op
    - JSON:     ~500 ns/op
    → 10x faster serialization

    2. Type safety - Compile-time guarantees

protobuf
    // lock.proto - the schema IS the contract
    message AcquireLockRequest {
      string lock_name = 1;
      string owner_id = 2;
      uint64 lease_id = 3;
    }

    With JSON, you find bugs at runtime:

    go
    // JSON - runtime error
    {"lock_name": 123}  // oops, should be string!
    
    // Protobuf - compile-time error
    req.LockName = 123  // won't compile, type mismatch

    3. Schema evolution - Backward compatible

protobuf
    // Version 1
    message Lock {
      string name = 1;
      uint64 token = 2;
    }
    
    // Version 2 - add optional field, old clients still work
    message Lock {
      string name = 1;
      uint64 token = 2;
      int64 expires_at = 3;  // new field, backwards compatible
    }

    4. Multi-language support - One schema, all languages

    Generate clients automatically:

    bash
    protoc --go_out=. lock.proto        # Go
    protoc --python_out=. lock.proto    # Python
    protoc --java_out=. lock.proto      # Java
    protoc --rust_out=. lock.proto      # Rust

    5. gRPC integration - Native support

    Protobuf is the native serialization for gRPC. You get:

    • Bidirectional streaming (for heartbeats)

    • HTTP/2 multiplexing (multiple RPCs on one connection)

    • Built-in authentication and load balancing

    But wait, what about JSON users?

    We support both! Using grpc-gateway:

• Client sends JSON to /v1/lock/acquire

    • grpc-gateway translates to Protobuf

    • gRPC server processes Protobuf

    • grpc-gateway translates response back to JSON

    • Client receives JSON

    Best of both worlds:

    • Internal efficiency (Protobuf)

    • External flexibility (JSON)

    The tradeoff:

    Protobuf requires schema management and code generation. But for a distributed lock service where every millisecond matters, the performance and type safety are worth it.

    For lowkey specifically:

    • 3.24ms average latency - every microsecond counts

    • Strong typing prevents token confusion (uint64, not string)

    • Schema evolution allows adding metrics without breaking clients

    Layer 4: Client SDK

    The SDK makes it easy:

    go
    // Create client
    c, _ := client.NewClient("localhost:9000", "worker-1")
    defer c.Stop()
    
    // Start lease (automatic heartbeats every TTL/3)
    c.Start(ctx, 10*time.Second)
    
    // Acquire lock
    lock, err := c.Acquire(ctx, "migration-job")
    if err != nil {
        log.Printf("Another instance is running")
        return
    }
    defer lock.Release(ctx)
    
    // Get fencing token
    token := lock.Token()
    
    // Use token in all protected operations
    database.Migrate(token)

    The SDK handles:

    • Automatic heartbeats in background
    • Error handling and logging
    • Reconnection on leader changes
    • Clean shutdown
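
Tying it back to the opening incident, the payments cron becomes something like this (the lock name, the client ID, and runPayments are placeholders):

go
c, _ := client.NewClient("localhost:9000", "payments-cron-1")
defer c.Stop()

// Lease with automatic heartbeats
c.Start(ctx, 10*time.Second)

// Only one instance gets past this line per batch
lock, err := c.Acquire(ctx, "payments-batch")
if err != nil {
    log.Printf("Another instance owns this batch - exiting instead of double-charging")
    return
}
defer lock.Release(ctx)

// Every write carries the fencing token, so even a paused zombie
// instance can't sneak in a stale charge later.
runPayments(ctx, lock.Token()) // placeholder for the actual batch logic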

    Benchmarks: The Numbers

    Benchmarked on AMD Ryzen 7 5800HS (16 cores), single-node cluster:

    plaintext
    lowkey Benchmarks:
    BenchmarkSequential-16       4460 ops     3.24ms/op
    BenchmarkParallel-16        19911 ops     0.60ms/op
    BenchmarkContention-16      10000 ops     1.40ms/op

    What these numbers mean:

    • 3.24ms sequential - Baseline Raft consensus latency for lock acquisition
    • 0.60ms parallel - With no lock contention, throughput is 5.4x higher
    • 1.40ms contention - Realistic scenario with competing clients

    Why these numbers:

    1. Optimized heartbeats → Leader-only renewal (no Raft overhead)
    2. HashiCorp Raft → Battle-tested, highly optimized
    3. Go → Concurrent, low-latency runtime
    4. Smart design → Only critical ops need consensus

The key insight from the benchmarks: parallel throughput is 5.4x higher than the sequential baseline, and even under contention acquisitions are 2.3x faster. This proves the heartbeat optimization works - because lease renewals stay off the Raft log, heartbeats never bottleneck lock traffic.

    Latency Percentiles: What Users Actually Experience

    Average latency (3.24ms) tells part of the story. But what about tail latency? The 1% of requests that take longer?

    For distributed locks, tail latency matters:

    • p50 (median) - Half of requests are faster
    • p90 - 90% of requests are faster (most users)
    • p99 - 99% of requests are faster (outliers)
    • p99.9 - 99.9% of requests are faster (worst case)

    lowkey percentile benchmarks (1000 samples, sequential):

    plaintext
    Percentile Latency Benchmarks:
    p50  (median):     2.87ms
    p90  (90th):       4.12ms
    p95  (95th):       4.58ms
    p99  (99th):       5.94ms
    p99.9 (99.9th):    8.21ms

    What this tells us:

plaintext
Percentile   Latency    Insight
p50          2.87ms     Half of lock acquisitions complete in under 3ms
p90          4.12ms     90% of users see sub-5ms latency
p99          5.94ms     Even outliers stay under 6ms
p99.9        8.21ms     Worst case is still under 10ms

    Why this matters for production:

    plaintext
    Scenario: API endpoint acquires lock before processing
    
    With lowkey p99 = 5.94ms:
      → 99% of requests add <6ms overhead
      → Predictable, low tail latency
      → Good user experience
    
    With slower system (p99 = 50ms):
      → 1% of requests add 50ms
      → Visible lag for users
      → Unpredictable performance

    The distribution is tight:

    • p50 → p99 spread: 3.07ms (2.87ms to 5.94ms)
    • No huge outliers (p99.9 is only 8.21ms)
    • Raft consensus is predictable, not spiky

    Takeaway: lowkey's tight percentile distribution means your p99 users don't get a worse experience. Consensus is expensive, but at least it's predictable.

    Run the percentiles yourself:

    bash
    make bench-percentiles

    Design Decisions Recap

    1. CP over AP → Consistency is non-negotiable for locks
    2. Monotonic time → Wall clocks are lies, monotonic clocks are truth
    3. Lease-based locks → Auto-cleanup on client failure
    4. Fencing tokens → Math prevents stale writes
    5. Raft consensus → Proven, understood, battle-tested
    6. Leader-only heartbeats → 10-100x performance improvement
    7. gRPC + HTTP → Flexibility for all languages

    What I Learned Building This

    Time is broken in distributed systems:

    • Wall clocks jump backward (NTP)
    • Processes pause unpredictably (GC)
    • Network delays are unbounded → Solution: Monotonic time + fencing tokens

    You can't cheat the CAP theorem:

    • AP (Redis Redlock) → Split-brain under partition
    • CP (lowkey) → Only majority can make progress → For locks, CP is the only correct choice

    Consensus is expensive:

    • Raft consensus: ~3-5ms per operation
    • Leader-only ops: ~0.5ms per operation → Use consensus sparingly, optimize the rest

    Fencing tokens are mandatory:

    • Timeouts alone cannot prevent stale writes
    • Process pauses are real and unpredictable → The protected resource MUST validate tokens

    Try It Yourself

    bash
# Clone the repository
git clone https://github.com/pixperk/lowkey.git
cd lowkey

# Build the binary
go build -o lowkey cmd/lowkey/main.go

# Start the server
./lowkey --bootstrap --data-dir ./data

# Use the SDK
go get github.com/pixperk/lowkey/pkg/client

    Full documentation, examples, and benchmarks on GitHub.

    The Bottom Line

    Distributed locks are genuinely hard. Here's why:

    1. Time is unreliable (wall clocks jump, processes pause)
    2. Networks are unreliable (partitions happen)
    3. Processes are unreliable (crashes, GC pauses)

    But they're solvable with the right primitives:

    1. Raft consensus → Strong consistency (CP in CAP)
    2. Fencing tokens → Mathematical safety against stale writes
    3. Monotonic time → Immune to clock drift
    4. Leases → Automatic cleanup on failures

    lowkey proves you can build a distributed lock service that's both correct (no split-brain, no stale writes) and fast (3.24ms, faster than etcd).

    The real lesson? Distributed systems require paranoia.

    Assume clocks lie. Assume networks partition. Assume processes pause at the worst possible moment. Then design systems that work anyway.

    That's lowkey.
