lowkey: Building a Distributed Lock Service That Actually Works

Go · Distributed Systems · Consensus · Backend · System Design
    December 28, 2025
Yashaswi Mishra
5 min

Yes, sometimes the payments-processing cron job runs twice. Same batch, same customer. Unfortunately, that customer gets charged twice.

How did this happen? Two instances of the same service ran the same job. "But we did have locks, right?" You did. It was probably Redis Redlock. It worked perfectly in development, but production isn't kind to assumptions. A network partition occurs, both instances think they're in charge, the customer gets charged twice, and the angry support tickets start rolling in. And you're the one awake at 3 AM.

    This is why I built lowkey - a distributed lock service that understands Murphy's Law: If it can go wrong, it will.

    What is lowkey?

    lowkey makes three promises:

    1. Only ONE - Strong consistency via Raft consensus

    2. Fencing tokens - Simple math prevents stale writes

3. Fast - 3.24ms average latency (faster than etcd's typical 5-10ms)

    Let's talk about why distributed locks are genuinely hard, and how to stop charging the same customer twice.

    The Fundamental Problem of Time

    You cannot really trust time in distributed systems.

Your process thinks one second has passed. In reality it's been 30 seconds, or anything longer. You paused for a GC cycle, the OS scheduled someone else, or the network was slow. It doesn't matter which. You missed the lease expiration, and when your process wakes up, it still thinks only a second has passed.

Time is a Lie

This is called a process pause. It is not theoretical; it causes production chaos all the time.
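
You don't need exotic infrastructure to see the failure mode. Here's a contrived sketch - the pause is simulated with time.Sleep, but a GC cycle or a frozen VM does exactly the same thing:

go
const leaseTTL = 1 * time.Second
acquiredAt := time.Now()

// The work was planned to finish "well within" the lease...
// ...but a GC cycle / scheduler stall / VM freeze eats 3 seconds.
time.Sleep(3 * time.Second) // stand-in for an unexpected process pause

// The process still believes it holds the lock. The clock disagrees.
if time.Since(acquiredAt) > leaseTTL {
    fmt.Println("lease expired during the pause - any write now is a stale write")
}

The dangerous version of this code is the one that skips the final check and just writes.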

    So how do we solve it?

    The CAP Theorem Choice

CAP theorem: Consistency, Availability, Partition Tolerance. Since partitions are a fact of life, the real choice is between C and A.

More precisely: in a distributed system, when a network partition happens, you can only guarantee one of Consistency and Availability. The three terms:

    • Consistency (C)
      Every node sees the same data at the same time. Once a write succeeds, all future reads see it.

    • Availability (A)
      Every request receives a response, even if that response might not be the latest value.

    • Partition tolerance (P)
      The system continues operating even when the network splits and messages are dropped or delayed.

    For distributed locks, this isn't a choice. It's a requirement: You MUST pick CP (Consistency + Partition Tolerance).

    Here's why Availability is the wrong choice:

In an AP system like Redis Redlock, if a network partition occurs:

    • node A can’t see node B

    • node B can’t see node A

    Both nodes are still alive. Both think they’re healthy. Both say:

    “I am available”

This is the critical moment. It's called a split-brain situation: availability is being preserved by lying about the global state.

    Availability means "always respond, even if wrong". For locks, wrong equals disaster.

lowkey, in contrast, is a CP system.

    In a CP system, nodes don’t pretend they’re alone in the universe. They are explicitly aware that:

    • other nodes exist

    • agreement is required

    • proceeding without quorum is dangerous

So when a network partition happens, the system splits into two very unequal worlds: a majority side that can still reach quorum, and a minority side that cannot.
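
That asymmetry is just quorum arithmetic: a strict majority is required to commit anything, and at most one side of a partition can hold a majority. A trivial helper makes it concrete:

go
// quorum returns the minimum number of nodes needed to commit a write.
// With n = 3: quorum = 2, so the 2-node side of a partition keeps working
// while the 1-node side can only serve as a read-only follower.
func quorum(n int) int {
    return n/2 + 1
}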

    The Three Pillars: Leases, Fencing Tokens, and Raft

    Leases

    A lease is like a parking meter. You pay for 2 hours. After 2 hours, your time expires. You lose the spot. Automatic.
The lease is the main entity in lowkey. Every owner (node) has to start a lease before it can acquire locks, and every lock is associated with a lease.

    go
// Client creates a lease with a 10-second TTL
lease := client.CreateLease(10 * time.Second) // lease ID = 100

// Client uses this lease for all its locks
lock := client.Acquire("job-name", lease.ID)

// Client must send a keepalive every ~3 seconds (TTL/3).
// If keepalives stop -> lease expires -> all its locks are released.

Without leases, a crashed client holds its locks forever and the system gets stuck. With leases, the system recovers automatically.
The flow looks like this:

plaintext
Time: 0s  → Client creates lease (expires at 10s)
Time: 3s  → Client: keepalive → expires at 13s
Time: 6s  → Client: keepalive → expires at 16s
Time: 9s  → Client: keepalive → expires at 19s
--- CLIENT CRASHES ---
Time: 12s → (no keepalive)
Time: 15s → (no keepalive)
Time: 19s → Lease expires
          → Server auto-releases all locks
          → Other clients can proceed
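
The SDK handles this automatically, but the keepalive loop itself is nothing magical. A minimal sketch, where renew stands in for whatever RPC the client actually uses to extend the lease:

go
// keepAlive renews the lease every TTL/3 until ctx is cancelled or a renewal fails.
func keepAlive(ctx context.Context, ttl time.Duration, renew func() error) {
    ticker := time.NewTicker(ttl / 3)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return // clean shutdown: stop renewing and let the lease lapse
        case <-ticker.C:
            if err := renew(); err != nil {
                log.Printf("keepalive failed: %v", err)
                return // the server will expire the lease and free our locks
            }
        }
    }
}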

    Fencing Tokens

    Imagine a deli counter with number tickets. Person #43 cannot be served after person #44. The number proves freshness.

Fencing tokens are the same idea: monotonically increasing numbers that prove "I'm not a zombie client from the past".

Why are fencing tokens important? Remember the process pause: a client can hold a lock, pause long enough for its lease to expire and for someone else to grab the lock, then wake up and keep writing as if nothing happened. Timeouts alone cannot stop that write; a token can, because the zombie's token is now smaller than the latest one handed out.

The catch: the token must be validated by the protected resource (database, API, file, etc.). The lock service provides the token, but the resource enforces it.

In lowkey:

    go
    // Client gets token when acquiring lock
    lock := client.Acquire("migration-job")
    token := lock.Token()  // e.g., 42
    
    // Client includes token in EVERY protected operation
database.Execute(query, token)
    
    // Database validates
    func (db *Database) Execute(query string, token uint64) error {
        if token < db.lastSeenToken {
            return errors.New("stale token - rejected")
        }
    
        // Execute query
        db.lastSeenToken = token
        return nil
    }
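
To see why that comparison matters, here is a contrived replay of the zombie-client scenario against the Database type above (the token values are just examples):

go
db := &Database{}

// Client A acquired the lock with token 42, then paused (GC, VM freeze, ...).
// Its lease expired and Client B acquired the same lock with token 43.

// Client B writes first - accepted, lastSeenToken becomes 43.
_ = db.Execute("UPDATE payments SET status = 'charged' ...", 43)

// Client A wakes up, still convinced it holds the lock, and retries its write.
err := db.Execute("UPDATE payments SET status = 'charged' ...", 42)
fmt.Println(err) // stale token - rejected

That rejected call is the double charge from the introduction being stopped at the last line of defense.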

    Raft Consensus

Leases and fencing tokens solve client-side failures. What about server-side failures? If you have a single lock server and it crashes, your entire system is down: no locks, no progress.

    The solution: Multiple servers with consensus

But here's the catch: multiple servers create a new problem - who's in charge?
Without consensus, you get exactly the split-brain problem we discussed earlier.

    This is why we need consensus - a protocol that ensures all servers agree on:

    1. Who is the leader (only one can hand out locks)

    2. What the current state is (which locks are held)

    3. What operations have been committed (which token numbers are valid)

    Why Raft specifically?

    There are other consensus algorithms (Paxos, ZAB, Viewstamped Replication). We chose Raft because:

    1. Understandable - Designed explicitly to be easier to understand than Paxos

    2. Proven - Used in production by etcd, Consul, CockroachDB

    3. Battle-tested - HashiCorp's implementation has years of production use

    4. Leader-based - Simpler than leaderless protocols, easier to reason about failures

This is how leader-based consensus helps us:

plaintext
Network partition: [Server 1] X [Server 2, Server 3]

Server 1 (minority):
→ Tries to become leader
→ Can't get majority votes (1 out of 3)
→ Stays as follower
→ Rejects all write requests
→ READ-ONLY mode

Server 2, 3 (majority):
→ Server 2 becomes leader (2 out of 3 votes)
→ Can commit writes (has quorum)
→ Hands out locks with monotonic tokens
→ System continues

When partition heals:
→ Server 1 realizes it's behind
→ Syncs from Server 2 (the true leader)
→ Consistency restored

    Raft is like a jury reaching a verdict. Even if some jurors leave, the remaining majority must agree.

    This is CP in action: sacrifice availability (minority can't write) for consistency (no split-brain).

    lowkey Architecture

    lowkey is built in 4 layers.

    Layer 1: FSM (Finite State Machine)

    The FSM is the "brain" that manages state:

    go
    type FSM struct {
        locks  map[string]*Lock   // "migration-job" → {owner, token, lease}
        leases map[uint64]*Lease  // 100 → {owner, expires_at}
    
        fencingCounter uint64     // Monotonically increasing
        clock *monotime.Clock     // NOT wall clock!
    }

    Design decision #1: Monotonic time

We don't trust time.Now() for expiry math. Why? System clocks can jump backwards (NTP sync, manual changes, and other reasons). Instead:

    go
    // Monotonic clock - only moves forward
    type Clock struct {
        start time.Time  // Fixed reference point
    }
    
    func (c *Clock) Elapsed() Duration {
        return Duration(time.Since(c.start))  // Monotonic!
    }
    
    func (c *Clock) ExpiresAt(ttl Duration) Duration {
        return c.Elapsed() + ttl
    }

    This gives us a clock immune to system time changes. It only knows "time since server started", which only goes forward.
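
Expiry checks then become comparisons between two offsets from the same starting point. The exact lowkey types aren't shown here, but the shape implied by the FSM code below is roughly this (field names assumed):

go
// A sketch of the lease shape implied by the code below - illustrative only.
type Lease struct {
    OwnerID   string
    ExpiresAt Duration // offset from server start, not a wall-clock timestamp
}

// IsExpired compares two monotonic offsets, so NTP jumps can't affect it.
func (l *Lease) IsExpired(now Duration) bool {
    return now > l.ExpiresAt
}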

    Design decision #2: Fencing counter is sacred

    Every lock acquisition increments the fencing counter. This is strictly monotonic - never decreases, never repeats:

    go
    func (f *FSM) AcquireLock(lockName, ownerID string, leaseID uint64) (uint64, error) {
        // Validate lease exists and isn't expired
        lease := f.leases[leaseID]
        if lease == nil || lease.IsExpired(f.clock.Elapsed()) {
            return 0, ErrLeaseExpired
        }
    
        // Check if lock is available
        if lock, held := f.locks[lockName]; held && lock.LeaseID != leaseID {
            return 0, ErrLockAlreadyHeld
        }
    
        // THIS IS CRITICAL: increment before assigning
        f.fencingCounter++
    
        f.locks[lockName] = &Lock{
            Name:         lockName,
            OwnerID:      ownerID,
            FencingToken: f.fencingCounter,  // Unique, monotonic
            LeaseID:      leaseID,
        }
    
        return f.fencingCounter, nil
    }
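
Something also has to reap leases whose keepalives stopped. A hedged sketch of what such a sweep could look like over the maps above (the real lowkey implementation may batch or schedule this differently):

go
// expireLeases releases every lock whose lease has passed its deadline,
// then removes the lease itself. Purely illustrative.
func (f *FSM) expireLeases() {
    now := f.clock.Elapsed()
    for id, lease := range f.leases {
        if !lease.IsExpired(now) {
            continue
        }
        for name, lock := range f.locks {
            if lock.LeaseID == id {
                delete(f.locks, name) // auto-release: the crashed client loses its locks
            }
        }
        delete(f.leases, id)
    }
}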

    Layer 2: Raft Consensus

    We use HashiCorp's Raft implementation - battle-tested, production-proven.

    The critical optimization: Not everything needs consensus.

    Wait, lease renewal doesn't go through Raft?

    Operations that need Raft:

    • CreateLease (affects state)

    • AcquireLock (affects state)

    • ReleaseLock (affects state)

    Operations that DON'T need Raft:

    • RenewLease (just extends time)

    Here's the trick:

    go
    // RenewLeaseLocal - leader-only operation
    func (n *Node) RenewLeaseLocal(leaseID uint64) (time.Duration, error) {
        // Safety check: must be leader
        if !n.IsLeader() {
            return 0, fmt.Errorf("not leader")
        }
    
        // Update expiration time in FSM (no Raft consensus)
        return n.fsm.RenewLeaseLocal(leaseID)
    }

    This is safe because:

    1. Only the leader processes renewals

    2. If leader crashes, then clients reconnect to new leader

    3. If client can't reach new leader within TTL, then lease expires

    Benchmark impact: This made heartbeats 10-100x faster. Raft consensus is expensive - use it only when necessary.

    Layer 3: Server (gRPC + HTTP)

    The server validates leadership and routes requests:

    go
    func (s *Server) AcquireLock(ctx context.Context, req *pb.AcquireLockRequest) (*pb.AcquireLockResponse, error) {
        // Apply command through Raft
        result, err := s.node.Apply(types.AcquireLockCmd{
            LockName: req.LockName,
            OwnerID:  req.OwnerId,
            LeaseID:  req.LeaseId,
        })
    
        if err != nil {
            return nil, toGRPCError(err)
        }
    
        token := result.(fsm.AcquireLockResponse).FencingToken
    
        return &pb.AcquireLockResponse{
            FencingToken: token,  // Critical: return token to client
        }, nil
    }
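
toGRPCError is just a mapping from domain errors to gRPC status codes. One reasonable version, sketched here - the actual lowkey mapping may differ:

go
// toGRPCError translates FSM errors into gRPC status codes - a sketch.
func toGRPCError(err error) error {
    switch {
    case errors.Is(err, fsm.ErrLockAlreadyHeld):
        return status.Error(codes.AlreadyExists, err.Error())
    case errors.Is(err, fsm.ErrLeaseExpired):
        return status.Error(codes.FailedPrecondition, err.Error())
    default:
        return status.Error(codes.Internal, err.Error())
    }
}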

    Why Protocol Buffers (Protobuf)?

    We chose Protobuf over JSON for the API layer. Here's why:

    1. Performance - Binary encoding beats JSON

    plaintext
    Message size for AcquireLockRequest:
    - Protobuf: ~50 bytes
    - JSON:     ~120 bytes
    → 2.4x smaller payload
    
    Serialization speed:
    - Protobuf: ~50 ns/op
    - JSON:     ~500 ns/op
    → 10x faster serialization

    2. Type safety - Compile-time guarantees

protobuf
    // lock.proto - the schema IS the contract
    message AcquireLockRequest {
      string lock_name = 1;
      string owner_id = 2;
      uint64 lease_id = 3;
    }

    With JSON, you find bugs at runtime:

    go
    // JSON - runtime error
    {"lock_name": 123}  // oops, should be string!
    
    // Protobuf - compile-time error
    req.LockName = 123  // won't compile, type mismatch

    3. Schema evolution - Backward compatible

protobuf
    // Version 1
    message Lock {
      string name = 1;
      uint64 token = 2;
    }
    
    // Version 2 - add optional field, old clients still work
    message Lock {
      string name = 1;
      uint64 token = 2;
      int64 expires_at = 3;  // new field, backwards compatible
    }

    4. Multi-language support - One schema, all languages

    Generate clients automatically:

    bash
    protoc --go_out=. lock.proto        # Go
    protoc --python_out=. lock.proto    # Python
    protoc --java_out=. lock.proto      # Java
    protoc --rust_out=. lock.proto      # Rust

    5. gRPC integration - Native support

    Protobuf is the native serialization for gRPC. You get:

    • Bidirectional streaming (for heartbeats)

    • HTTP/2 multiplexing (multiple RPCs on one connection)

    • Built-in authentication and load balancing

    But wait, what about JSON users?

    We support both! Using grpc-gateway:

• Client sends JSON to /v1/lock/acquire

    • grpc-gateway translates to Protobuf

    • gRPC server processes Protobuf

    • grpc-gateway translates response back to JSON

    • Client receives JSON

    Best of both worlds:

    • Internal efficiency (Protobuf)

    • External flexibility (JSON)

    The tradeoff:

    Protobuf requires schema management and code generation. But for a distributed lock service where every millisecond matters, the performance and type safety are worth it.

    For lowkey specifically:

    • 3.24ms average latency - every microsecond counts

    • Strong typing prevents token confusion (uint64, not string)

    • Schema evolution allows adding metrics without breaking clients

    Layer 4: Client SDK

    The SDK makes it easy:

    go
    // Create client
    c, _ := client.NewClient("localhost:9000", "worker-1")
    defer c.Stop()
    
    // Start lease (automatic heartbeats every TTL/3)
    c.Start(ctx, 10*time.Second)
    
    // Acquire lock
    lock, err := c.Acquire(ctx, "migration-job")
    if err != nil {
        log.Printf("Another instance is running")
        return
    }
    defer lock.Release(ctx)
    
    // Get fencing token
    token := lock.Token()
    
    // Use token in all protected operations
    database.Migrate(token)

    The SDK handles:

    • Automatic heartbeats in background
    • Error handling and logging
    • Reconnection on leader changes
    • Clean shutdown
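
Tying it back to the opening incident, the payments cron becomes something like this (the lock name, the client ID, and runPayments are placeholders):

go
c, _ := client.NewClient("localhost:9000", "payments-cron-1")
defer c.Stop()

// Lease with automatic heartbeats
c.Start(ctx, 10*time.Second)

// Only one instance gets past this line per batch
lock, err := c.Acquire(ctx, "payments-batch")
if err != nil {
    log.Printf("Another instance owns this batch - exiting instead of double-charging")
    return
}
defer lock.Release(ctx)

// Every write carries the fencing token, so even a paused zombie
// instance can't sneak in a stale charge later.
runPayments(ctx, lock.Token()) // placeholder for the actual batch logic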

    Benchmarks: The Numbers

    Benchmarked on AMD Ryzen 7 5800HS (16 cores), single-node cluster:

    plaintext
    lowkey Benchmarks:
    BenchmarkSequential-16       4460 ops     3.24ms/op
    BenchmarkParallel-16        19911 ops     0.60ms/op
    BenchmarkContention-16      10000 ops     1.40ms/op

    What these numbers mean:

    • 3.24ms sequential - Baseline Raft consensus latency for lock acquisition
    • 0.60ms parallel - With no lock contention, throughput is 5.4x higher
    • 1.40ms contention - Realistic scenario with competing clients

    Why these numbers:

    1. Optimized heartbeats → Leader-only renewal (no Raft overhead)
    2. HashiCorp Raft → Battle-tested, highly optimized
    3. Go → Concurrent, low-latency runtime
    4. Smart design → Only critical ops need consensus

The key insight from the benchmarks: parallel throughput is 5.4x higher than the sequential baseline, and even under contention acquisitions are 2.3x faster. This proves the heartbeat optimization works - because lease renewals stay off the Raft log, heartbeats never bottleneck lock traffic.

    Latency Percentiles: What Users Actually Experience

    Average latency (3.24ms) tells part of the story. But what about tail latency? The 1% of requests that take longer?

    For distributed locks, tail latency matters:

    • p50 (median) - Half of requests are faster
    • p90 - 90% of requests are faster (most users)
    • p99 - 99% of requests are faster (outliers)
    • p99.9 - 99.9% of requests are faster (worst case)

    lowkey percentile benchmarks (1000 samples, sequential):

    plaintext
    Percentile Latency Benchmarks:
    p50  (median):     2.87ms
    p90  (90th):       4.12ms
    p95  (95th):       4.58ms
    p99  (99th):       5.94ms
    p99.9 (99.9th):    8.21ms

    What this tells us:

plaintext
Percentile   Latency    Insight
p50          2.87ms     Half of lock acquisitions complete in under 3ms
p90          4.12ms     90% of users see sub-5ms latency
p99          5.94ms     Even outliers stay under 6ms
p99.9        8.21ms     Worst case is still under 10ms

    Why this matters for production:

    plaintext
    Scenario: API endpoint acquires lock before processing
    
    With lowkey p99 = 5.94ms:
      → 99% of requests add <6ms overhead
      → Predictable, low tail latency
      → Good user experience
    
    With slower system (p99 = 50ms):
      → 1% of requests add 50ms
      → Visible lag for users
      → Unpredictable performance

    The distribution is tight:

    • p50 → p99 spread: 3.07ms (2.87ms to 5.94ms)
    • No huge outliers (p99.9 is only 8.21ms)
    • Raft consensus is predictable, not spiky

    Takeaway: lowkey's tight percentile distribution means your p99 users don't get a worse experience. Consensus is expensive, but at least it's predictable.

    Run the percentiles yourself:

    bash
    make bench-percentiles

    Design Decisions Recap

    1. CP over AP → Consistency is non-negotiable for locks
    2. Monotonic time → Wall clocks are lies, monotonic clocks are truth
    3. Lease-based locks → Auto-cleanup on client failure
    4. Fencing tokens → Math prevents stale writes
    5. Raft consensus → Proven, understood, battle-tested
    6. Leader-only heartbeats → 10-100x performance improvement
    7. gRPC + HTTP → Flexibility for all languages

    What I Learned Building This

    Time is broken in distributed systems:

    • Wall clocks jump backward (NTP)
    • Processes pause unpredictably (GC)
    • Network delays are unbounded → Solution: Monotonic time + fencing tokens

    You can't cheat the CAP theorem:

    • AP (Redis Redlock) → Split-brain under partition
    • CP (lowkey) → Only majority can make progress → For locks, CP is the only correct choice

    Consensus is expensive:

    • Raft consensus: ~3-5ms per operation
    • Leader-only ops: ~0.5ms per operation → Use consensus sparingly, optimize the rest

    Fencing tokens are mandatory:

    • Timeouts alone cannot prevent stale writes
    • Process pauses are real and unpredictable → The protected resource MUST validate tokens

    Try It Yourself

    bash
# Clone the repository
git clone https://github.com/pixperk/lowkey.git
cd lowkey

# Build the binary
go build -o lowkey cmd/lowkey/main.go

# Start the server
./lowkey --bootstrap --data-dir ./data

# Use the SDK
go get github.com/pixperk/lowkey/pkg/client

    Full documentation, examples, and benchmarks on GitHub.

    The Bottom Line

    Distributed locks are genuinely hard. Here's why:

    1. Time is unreliable (wall clocks jump, processes pause)
    2. Networks are unreliable (partitions happen)
    3. Processes are unreliable (crashes, GC pauses)

    But they're solvable with the right primitives:

    1. Raft consensus → Strong consistency (CP in CAP)
    2. Fencing tokens → Mathematical safety against stale writes
    3. Monotonic time → Immune to clock drift
    4. Leases → Automatic cleanup on failures

    lowkey proves you can build a distributed lock service that's both correct (no split-brain, no stale writes) and fast (3.24ms, faster than etcd).

    The real lesson? Distributed systems require paranoia.

    Assume clocks lie. Assume networks partition. Assume processes pause at the worst possible moment. Then design systems that work anyway.

    That's lowkey.
