Real-time Communication

Overview

The platform delivers real-time features across micro frontends, including:

  • Live updates -- dashboard metrics, feed items, and status indicators that reflect server-side changes within milliseconds.
  • Collaborative editing -- multiple users can view and modify the same document or configuration simultaneously, with operational transforms or CRDT-based conflict resolution.
  • Notifications -- push-style alerts delivered instantly to connected clients without polling.
  • Presence -- awareness of who is online, what page they are viewing, and whether they are actively typing or idle.

Architecture

The data flow for all real-time features follows a consistent path:

Real-time data flow (diagram): the browser (React app with a WebSocket client) connects over WSS to a Worker (edge routing, JWT authentication), which forwards via stub.fetch to a Durable Object (stateful coordination, single-writer consistency) holding a Hibernatable WebSocket (persistent, bidirectional). Everything beyond the client runs on Cloudflare's edge, with zero infrastructure to manage.
  1. The browser opens a WebSocket connection to a Cloudflare Worker endpoint.
  2. The Worker authenticates the request, resolves the correct Durable Object ID, and forwards the upgrade request.
  3. The Durable Object accepts the WebSocket using the Hibernatable WebSocket API, attaches metadata, and manages all message routing.
  4. The Hibernatable WebSocket keeps the connection alive even when the Durable Object has no active work, dramatically reducing cost.

Why Durable Objects Over Traditional WebSocket Servers

| Concern | Traditional Server | Durable Objects |
| --- | --- | --- |
| Deployment | Provision and manage VMs or containers | Edge-native, zero infrastructure |
| Scaling | Manual horizontal scaling + sticky sessions | Automatic per-object scaling across Cloudflare's network |
| Server management | OS patching, load balancers, health checks | Fully managed by Cloudflare |
| Cost during idle | Pay for always-on servers | Hibernation eliminates CPU charges during idle periods |
| Global latency | Single-region or multi-region replication | Runs on the edge closest to the coordinating user |
| State consistency | External databases + cache invalidation | Single-threaded per-object with co-located storage |

Durable Objects eliminate the operational overhead of running WebSocket infrastructure while providing strong single-writer consistency guarantees per coordination boundary.


Hibernatable WebSocket API

Hibernation lifecycle (diagram): Connected (active processing) → idle → Hibernating (no CPU charges) → an incoming message wakes the isolate → webSocketMessage() processes → hibernates again once all connections are quiet. Tags and serialized attachments survive hibernation; in-memory state is lost, so persist anything important via ctx.storage.

What is Hibernation

Durable Objects support a hibernation mechanism that fundamentally changes the cost model for long-lived WebSocket connections:

  • During active processing, the Durable Object runs normally, executes JavaScript, and is billed for CPU time.
  • When all WebSocket connections are idle (no incoming messages, no pending alarms), the Durable Object can enter hibernation.
  • During hibernation:
    • No CPU charges are incurred.
    • All WebSocket connections remain open and functional from the client's perspective.
    • In-memory state is discarded. Any state that must survive hibernation needs to be stored via this.ctx.storage or serialized into WebSocket attachments.
    • The Durable Object's JavaScript isolate is evicted from memory.
  • The Durable Object wakes up when:
    • An incoming WebSocket message arrives (triggers webSocketMessage()).
    • A previously scheduled alarm fires (triggers alarm()).
    • A new HTTP request arrives (triggers fetch()).

Billing during hibernation: While the Durable Object is hibernating, no CPU or duration charges are incurred. You are only billed for storage and for any incoming WebSocket messages that wake the DO. This makes it economically viable to maintain thousands of idle WebSocket connections for presence and notification channels.

API Pattern

The Hibernatable WebSocket API uses event handlers on the Durable Object class instead of attaching listeners to individual WebSocket objects. This is what allows Cloudflare to evict the isolate from memory while keeping connections open.

import { DurableObject } from 'cloudflare:workers';

interface Env {
  CHAT_ROOMS: DurableObjectNamespace<ChatRoom>;
}

interface WebSocketAttachment {
  userId: string;
  displayName: string;
  role: 'admin' | 'member' | 'viewer';
  joinedAt: number;
}

interface IncomingMessage {
  type: 'chat' | 'typing' | 'presence' | 'ping';
  payload: Record<string, unknown>;
}

interface OutgoingMessage {
  type: string;
  payload: Record<string, unknown>;
  sender: string;
  timestamp: number;
}

export class ChatRoom extends DurableObject<Env> {
  private messageHistory: OutgoingMessage[] = [];

  /**
   * Handles the initial HTTP request to upgrade to a WebSocket connection.
   * The Worker forwards the upgrade request here after authentication.
   */
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    // Verify this is a WebSocket upgrade request
    if (request.headers.get('Upgrade') !== 'websocket') {
      return new Response('Expected WebSocket upgrade', { status: 426 });
    }

    // Extract authenticated user info (set by the Worker after JWT validation)
    const userId = url.searchParams.get('userId');
    const displayName = url.searchParams.get('displayName') ?? 'Anonymous';
    const role = (url.searchParams.get('role') as WebSocketAttachment['role']) ?? 'member';

    if (!userId) {
      return new Response('Missing userId', { status: 400 });
    }

    // Create the WebSocket pair
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);

    // Accept the server-side WebSocket using the Hibernatable API.
    // Tags enable efficient filtered broadcasting later.
    this.ctx.acceptWebSocket(server, [`user:${userId}`, `role:${role}`]);

    // Attach metadata that survives hibernation
    const attachment: WebSocketAttachment = {
      userId,
      displayName,
      role,
      joinedAt: Date.now(),
    };
    server.serializeAttachment(attachment);

    // Notify existing participants of the new connection
    this.broadcast({
      type: 'user-joined',
      payload: { userId, displayName },
      sender: 'system',
      timestamp: Date.now(),
    }, server);

    // Send recent message history to the new client
    const recentMessages = await this.getRecentMessages(50);
    server.send(JSON.stringify({
      type: 'history',
      payload: { messages: recentMessages },
      sender: 'system',
      timestamp: Date.now(),
    }));

    // Return the client-side WebSocket to the Worker, which returns it to the browser
    return new Response(null, { status: 101, webSocket: client });
  }

  /**
   * Called when any connected WebSocket sends a message.
   * This is the core handler that wakes the DO from hibernation.
   */
  async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer): Promise<void> {
    const attachment = ws.deserializeAttachment() as WebSocketAttachment;

    // Handle binary messages if needed
    if (message instanceof ArrayBuffer) {
      console.warn(`Binary message from ${attachment.userId}, ignoring`);
      return;
    }

    let data: IncomingMessage;
    try {
      data = JSON.parse(message);
    } catch {
      ws.send(JSON.stringify({ type: 'error', payload: { message: 'Invalid JSON' } }));
      return;
    }

    switch (data.type) {
      case 'chat': {
        const outgoing: OutgoingMessage = {
          type: 'chat',
          payload: {
            ...data.payload,
            messageId: crypto.randomUUID(),
          },
          sender: attachment.userId,
          timestamp: Date.now(),
        };

        // Persist the message
        await this.storeMessage(outgoing);

        // Broadcast to all connected clients including the sender (for confirmation)
        this.broadcast(outgoing);
        break;
      }

      case 'typing': {
        // Typing indicators are transient -- broadcast but do not persist
        this.broadcast({
          type: 'typing',
          payload: {
            userId: attachment.userId,
            displayName: attachment.displayName,
            isTyping: data.payload.isTyping ?? false,
          },
          sender: attachment.userId,
          timestamp: Date.now(),
        }, ws); // Exclude the sender
        break;
      }

      case 'presence': {
        // Update presence state in storage
        await this.ctx.storage.put(`presence:${attachment.userId}`, {
          status: data.payload.status,
          lastSeen: Date.now(),
        });

        this.broadcast({
          type: 'presence',
          payload: {
            userId: attachment.userId,
            status: data.payload.status,
          },
          sender: 'system',
          timestamp: Date.now(),
        });
        break;
      }

      case 'ping': {
        ws.send(JSON.stringify({ type: 'pong', payload: {}, sender: 'system', timestamp: Date.now() }));
        break;
      }

      default: {
        ws.send(JSON.stringify({
          type: 'error',
          payload: { message: `Unknown message type: ${data.type}` },
          sender: 'system',
          timestamp: Date.now(),
        }));
      }
    }
  }

  /**
   * Called when a WebSocket connection closes cleanly or abnormally.
   */
  async webSocketClose(ws: WebSocket, code: number, reason: string): Promise<void> {
    const attachment = ws.deserializeAttachment() as WebSocketAttachment;

    // Close the server side of the connection
    ws.close(code, reason);

    // Remove presence data
    await this.ctx.storage.delete(`presence:${attachment.userId}`);

    // Notify remaining participants
    this.broadcast({
      type: 'user-left',
      payload: {
        userId: attachment.userId,
        displayName: attachment.displayName,
      },
      sender: 'system',
      timestamp: Date.now(),
    });

    // If no connections remain, schedule cleanup
    if (this.ctx.getWebSockets().length === 0) {
      await this.ctx.storage.setAlarm(Date.now() + 60_000); // 1 minute
    }
  }

  /**
   * Called when a WebSocket connection encounters an error.
   */
  async webSocketError(ws: WebSocket, error: unknown): Promise<void> {
    const attachment = ws.deserializeAttachment() as WebSocketAttachment | null;
    const userId = attachment?.userId ?? 'unknown';
    console.error(`WebSocket error for user ${userId}:`, error);

    // Attempt to close the errored connection
    try {
      ws.close(1011, 'Internal error');
    } catch {
      // Connection may already be closed
    }
  }

  /**
   * Alarm handler for scheduled tasks (cleanup, state compaction, etc.).
   */
  async alarm(): Promise<void> {
    const connections = this.ctx.getWebSockets();

    if (connections.length === 0) {
      // No connections for the full cleanup window -- purge old messages
      const entries = await this.ctx.storage.list({ prefix: 'msg:' });
      const cutoff = Date.now() - 24 * 60 * 60 * 1000; // 24 hours

      const toDelete: string[] = [];
      for (const [key, value] of entries) {
        const msg = value as OutgoingMessage;
        if (msg.timestamp < cutoff) {
          toDelete.push(key);
        }
      }

      if (toDelete.length > 0) {
        await this.ctx.storage.delete(toDelete);
      }
    }
  }

  // --- Private helpers ---

  private broadcast(message: OutgoingMessage, exclude?: WebSocket): void {
    const raw = JSON.stringify(message);
    for (const ws of this.ctx.getWebSockets()) {
      if (ws !== exclude) {
        try {
          ws.send(raw);
        } catch {
          // Connection may be closing; webSocketClose will handle cleanup
        }
      }
    }
  }

  private async storeMessage(message: OutgoingMessage): Promise<void> {
    const key = `msg:${message.timestamp}:${(message.payload as { messageId: string }).messageId}`;
    await this.ctx.storage.put(key, message);

    // Also keep in memory for quick access (reset on hibernation)
    this.messageHistory.push(message);
    if (this.messageHistory.length > 200) {
      this.messageHistory = this.messageHistory.slice(-100);
    }
  }

  private async getRecentMessages(count: number): Promise<OutgoingMessage[]> {
    // Try in-memory cache first (empty after hibernation)
    if (this.messageHistory.length > 0) {
      return this.messageHistory.slice(-count);
    }

    // Fall back to storage
    const entries = await this.ctx.storage.list<OutgoingMessage>({
      prefix: 'msg:',
      reverse: true,
      limit: count,
    });

    const messages = [...entries.values()].reverse();
    this.messageHistory = messages;
    return messages;
  }
}

The corresponding Worker that routes requests to the Durable Object:

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Example: /ws/room/:roomId
    const match = url.pathname.match(/^\/ws\/room\/(.+)$/);
    if (!match) {
      return new Response('Not found', { status: 404 });
    }

    const roomId = match[1];

    // Authenticate the request (JWT, cookie, etc.)
    const auth = await authenticateRequest(request);
    if (!auth) {
      return new Response('Unauthorized', { status: 401 });
    }

    // Resolve the Durable Object by a deterministic name
    const id = env.CHAT_ROOMS.idFromName(`room:${roomId}`);
    const stub = env.CHAT_ROOMS.get(id);

    // Forward the upgrade request with auth context
    const doUrl = new URL(request.url);
    doUrl.searchParams.set('userId', auth.userId);
    doUrl.searchParams.set('displayName', auth.displayName);
    doUrl.searchParams.set('role', auth.role);

    return stub.fetch(new Request(doUrl.toString(), request));
  },
} satisfies ExportedHandler<Env>;

Durable Object Structure

One DO Per Coordination Boundary

Each Durable Object instance represents a single coordination boundary. All participants within that boundary connect to the same DO instance, which provides single-threaded consistency:

| Use Case | DO Boundary | Example ID |
| --- | --- | --- |
| Chat room | One DO per room | room:engineering-general |
| Document editing | One DO per document | doc:invoice-template-42 |
| Presence tracking | One DO per organization | presence:org-acme-corp |
| User notifications | One DO per user | notifications:user-abc123 |
| Dashboard updates | One DO per dashboard | dashboard:sales-overview |

Naming Convention

Use a consistent prefix-based naming scheme for Durable Object IDs:

// In the Worker, resolve the DO ID by name
function getDurableObjectId(
  namespace: DurableObjectNamespace,
  type: string,
  identifier: string,
): DurableObjectId {
  return namespace.idFromName(`${type}:${identifier}`);
}

// Examples:
const roomId = getDurableObjectId(env.CHAT_ROOMS, 'room', roomSlug);
const presenceId = getDurableObjectId(env.PRESENCE, 'presence', orgId);
const notifId = getDurableObjectId(env.NOTIFICATIONS, 'notifications', userId);

Message Routing

Inside the Durable Object, route incoming messages to specialized handlers based on message type. This keeps the webSocketMessage handler clean and makes it straightforward to add new message types:

type MessageHandler = (ws: WebSocket, payload: Record<string, unknown>, attachment: WebSocketAttachment) => Promise<void>;

private handlers: Record<string, MessageHandler> = {
  'chat': this.handleChat.bind(this),
  'typing': this.handleTyping.bind(this),
  'presence': this.handlePresence.bind(this),
  'cursor-move': this.handleCursorMove.bind(this),
  'selection-change': this.handleSelectionChange.bind(this),
  'ping': this.handlePing.bind(this),
};

async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer): Promise<void> {
  const attachment = ws.deserializeAttachment() as WebSocketAttachment;
  const data = JSON.parse(message as string) as IncomingMessage;

  const handler = this.handlers[data.type];
  if (handler) {
    await handler(ws, data.payload, attachment);
  } else {
    ws.send(JSON.stringify({ type: 'error', payload: { message: `Unknown type: ${data.type}` } }));
  }
}

State Management

Durable Objects offer two tiers of state:

Persistent state via this.ctx.storage -- survives hibernation, restarts, and redeployments:

// Key-value storage operations
await this.ctx.storage.put('room:metadata', { name: 'General', createdAt: Date.now() });
const metadata = await this.ctx.storage.get<RoomMetadata>('room:metadata');

// Batch operations for efficiency (counts as a single storage operation)
await this.ctx.storage.put({
  'user:alice:presence': { status: 'online', lastSeen: Date.now() },
  'user:bob:presence': { status: 'away', lastSeen: Date.now() },
});

// List with prefix scan
const allPresence = await this.ctx.storage.list({ prefix: 'user:', limit: 100 });

Transient in-memory state via class properties -- fast but lost on hibernation:

export class ChatRoom extends DurableObject<Env> {
  // In-memory caches (rebuilt on wake from hibernation)
  private messageCache: OutgoingMessage[] = [];
  private typingUsers: Map<string, number> = new Map();
  private connectionCount = 0;

  // These reset to their initial values after hibernation.
  // Rebuild them in the first handler call after waking up.
}

Use persistent storage for data that must not be lost (messages, user presence state, configuration). Use in-memory state for caches and frequently changing transient data (typing indicators, cursor positions) that can be derived from persistent state or reconstructed from connected clients.
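For the transient tier, typing indicators are a good example: they can live in a plain in-memory map from userId to the last typing timestamp, pruned on read. Losing the map to hibernation is harmless because clients keep re-emitting typing events. A minimal sketch (the helper names and the 3-second TTL are illustrative assumptions, not platform APIs):

```typescript
// Transient typing state: userId -> timestamp of that user's last 'typing' event.
// Safe to lose on hibernation; clients re-emit typing events continuously.
const TYPING_TTL_MS = 3_000; // illustrative: consider a user "typing" for 3s

function recordTyping(typingUsers: Map<string, number>, userId: string, now: number): void {
  typingUsers.set(userId, now);
}

// Derive the current set of actively typing users, pruning stale entries in place.
function activeTypers(typingUsers: Map<string, number>, now: number): string[] {
  const active: string[] = [];
  for (const [userId, last] of typingUsers) {
    if (now - last <= TYPING_TTL_MS) {
      active.push(userId);
    } else {
      typingUsers.delete(userId); // deleting during Map iteration is safe in JS
    }
  }
  return active;
}
```

Because the set of typers is derived on read, there is nothing to rebuild after a wake: the map simply starts empty and refills from the next typing events.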


Connection Management

serializeAttachment / deserializeAttachment

Every WebSocket accepted via the Hibernatable API can carry an attachment -- a JSON-serializable object that persists across hibernation cycles. This is the primary mechanism for identifying connections.

interface WebSocketAttachment {
  userId: string;
  displayName: string;
  role: 'admin' | 'member' | 'viewer';
  orgId: string;
  joinedAt: number;
  lastMessageId?: string;
}

// On connection accept:
async fetch(request: Request): Promise<Response> {
  const pair = new WebSocketPair();
  const [client, server] = Object.values(pair);

  this.ctx.acceptWebSocket(server);

  // Serialize metadata onto the WebSocket
  const attachment: WebSocketAttachment = {
    userId: 'user-abc',
    displayName: 'Alice',
    role: 'admin',
    orgId: 'org-acme',
    joinedAt: Date.now(),
  };
  server.serializeAttachment(attachment);

  return new Response(null, { status: 101, webSocket: client });
}

// In any message handler -- even after hibernation:
async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer): Promise<void> {
  const attachment = ws.deserializeAttachment() as WebSocketAttachment;

  console.log(`Message from ${attachment.displayName} (${attachment.userId})`);

  // Update the attachment if needed
  attachment.lastMessageId = 'msg-xyz';
  ws.serializeAttachment(attachment);
}

Key properties of attachments:

  • Survive hibernation: the attachment is serialized to disk when the DO hibernates and restored when it wakes.
  • Per-connection: each WebSocket has its own independent attachment.
  • Mutable: call serializeAttachment() again to update the stored data.
  • Size limit: keep attachments small (under a few KB) as they are serialized on every hibernation cycle.

Connection Lifecycle

The full lifecycle of a WebSocket connection through the system:

1. Client initiates connection
   Browser: new WebSocket('wss://api.example.com/ws/room/general')

2. Worker authenticates and routes
   Worker: validate JWT → resolve DO ID → forward upgrade request

3. Durable Object accepts connection
   DO: acceptWebSocket(server, tags) → serializeAttachment(metadata)
       → broadcast user-joined → send message history

4. Bidirectional communication
   Client → webSocketMessage() → process → broadcast/respond
   DO → broadcast/targeted send → Client

5. Hibernation (when idle)
   All connections quiet → isolate evicted → connections stay open
   Incoming message → isolate restored → webSocketMessage() fires
   Attachments and tags are preserved

6. Client disconnects
   Browser closes tab / network drops / explicit close
   → webSocketClose(ws, code, reason) fires
   → cleanup presence → broadcast user-left
   → if no connections remain, schedule alarm for cleanup

7. Scheduled cleanup
   alarm() fires → purge stale data → compact storage

Broadcasting Patterns

Broadcasting patterns (diagram): broadcast to all (fan-out) uses getWebSockets() so every connected client receives the message; broadcast with tags (filtered) uses getWebSockets('role:admin') so only matching connections (e.g. admins) receive it.

Broadcast to All

The most common pattern: send a message to every connected WebSocket in the Durable Object.

/**
 * Broadcast a message to all connected WebSockets.
 * Optionally exclude a specific connection (e.g., the sender).
 */
private broadcast(
  message: OutgoingMessage,
  options?: { exclude?: WebSocket; onlyTo?: string[] },
): void {
  const raw = JSON.stringify(message);
  const sockets = this.ctx.getWebSockets();

  for (const ws of sockets) {
    if (options?.exclude && ws === options.exclude) {
      continue;
    }

    if (options?.onlyTo) {
      const attachment = ws.deserializeAttachment() as WebSocketAttachment;
      if (!options.onlyTo.includes(attachment.userId)) {
        continue;
      }
    }

    try {
      ws.send(raw);
    } catch {
      // The connection may be in a closing state.
      // webSocketClose() will handle cleanup.
    }
  }
}

// Usage examples:

// Broadcast to everyone
this.broadcast({ type: 'chat', payload: { text: 'Hello' }, sender: 'alice', timestamp: Date.now() });

// Broadcast to everyone except the sender
this.broadcast(message, { exclude: senderWs });

// Send to specific users only
this.broadcast(notification, { onlyTo: ['user-bob', 'user-carol'] });

Broadcast with Tags

Tags provide a more efficient way to filter WebSocket connections without deserializing every attachment. Tags are set when accepting the WebSocket and can be used to retrieve a filtered subset.

// Accept a WebSocket with tags
async fetch(request: Request): Promise<Response> {
  const pair = new WebSocketPair();
  const [client, server] = Object.values(pair);

  const userId = 'user-abc';
  const role = 'admin';
  const teamId = 'team-frontend';

  // Tags allow efficient filtering via getWebSockets(tag)
  this.ctx.acceptWebSocket(server, [
    `user:${userId}`,
    `role:${role}`,
    `team:${teamId}`,
  ]);

  server.serializeAttachment({ userId, role, teamId });

  return new Response(null, { status: 101, webSocket: client });
}

// Broadcast only to admins
private broadcastToAdmins(message: OutgoingMessage): void {
  const raw = JSON.stringify(message);
  const adminSockets = this.ctx.getWebSockets('role:admin');

  for (const ws of adminSockets) {
    try {
      ws.send(raw);
    } catch {
      // handled by webSocketClose
    }
  }
}

// Send a direct message to a specific user
private sendToUser(userId: string, message: OutgoingMessage): void {
  const raw = JSON.stringify(message);
  const userSockets = this.ctx.getWebSockets(`user:${userId}`);

  for (const ws of userSockets) {
    try {
      ws.send(raw);
    } catch {
      // handled by webSocketClose
    }
  }
}

// Broadcast to a team channel
private broadcastToTeam(teamId: string, message: OutgoingMessage): void {
  const raw = JSON.stringify(message);
  const teamSockets = this.ctx.getWebSockets(`team:${teamId}`);

  for (const ws of teamSockets) {
    try {
      ws.send(raw);
    } catch {
      // handled by webSocketClose
    }
  }
}

Tag limits and guidelines:

  • Each WebSocket can have up to 10 tags.
  • Each tag can be up to 256 bytes.
  • Tags are immutable after acceptWebSocket() -- to change tags, the client must reconnect.
  • Use tags for coarse, stable groupings (role, team, room). Use attachment metadata for fine-grained, mutable filtering.
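These limits can be enforced up front with a small validation helper rather than letting acceptWebSocket() reject the connection. A sketch under the limits stated above (buildTags is an illustrative helper, not a platform API):

```typescript
const MAX_TAGS = 10;       // per-WebSocket tag limit
const MAX_TAG_BYTES = 256; // per-tag size limit

/**
 * Build a validated `prefix:value` tag list for ctx.acceptWebSocket().
 * Throws early, before the connection is accepted, if a limit is exceeded.
 */
function buildTags(entries: Record<string, string>): string[] {
  const tags = Object.entries(entries).map(([prefix, value]) => `${prefix}:${value}`);
  if (tags.length > MAX_TAGS) {
    throw new Error(`Too many tags: ${tags.length} (max ${MAX_TAGS})`);
  }
  for (const tag of tags) {
    // Measure bytes, not characters -- tags may contain multi-byte UTF-8.
    if (new TextEncoder().encode(tag).length > MAX_TAG_BYTES) {
      throw new Error(`Tag exceeds ${MAX_TAG_BYTES} bytes: ${tag}`);
    }
  }
  return tags;
}
```

Usage in the accept path might look like `this.ctx.acceptWebSocket(server, buildTags({ user: userId, role, team: teamId }))`.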

Message Batching for High-Frequency Updates

Problem

Features like collaborative cursor tracking, live typing indicators, or real-time dashboard metrics can generate dozens of updates per second per user. Sending each update as an individual WebSocket message creates unnecessary overhead:

  • Each send() call incurs serialization cost.
  • Receiving clients process many small messages instead of fewer batched updates.
  • Network overhead from framing many tiny messages.
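The overhead above can be put into rough numbers. This is an illustrative back-of-envelope model; the user counts, update rates, and flush interval are assumptions, not measurements:

```typescript
// Unbatched: every update from each user is fanned out to every other
// connected client as its own send() call.
function sendsPerSecond(users: number, updatesPerUserPerSec: number): number {
  return users * updatesPerUserPerSec * (users - 1);
}

// Batched: the DO flushes once per interval, and each connected client
// receives a single coalesced message per flush.
function batchedSendsPerSecond(users: number, flushIntervalMs: number): number {
  return (1000 / flushIntervalMs) * users;
}

// 20 users each moving their cursor ~30 times/sec, flushed every 50ms:
const unbatched = sendsPerSecond(20, 30);      // 11,400 sends/sec
const batched = batchedSendsPerSecond(20, 50); // 400 sends/sec
```

Under these assumptions, batching cuts server-side send volume by roughly 30x while adding at most 50ms of latency per cursor update.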

Solution

Buffer incoming high-frequency messages and flush them on a short interval using Durable Object alarms:

import { DurableObject } from 'cloudflare:workers';

interface CursorUpdate {
  userId: string;
  x: number;
  y: number;
  timestamp: number;
}

interface BatchedMessage {
  type: 'cursor-batch';
  payload: {
    cursors: CursorUpdate[];
  };
  timestamp: number;
}

export class CollaborativeDocument extends DurableObject<Env> {
  /**
   * Buffer for high-frequency updates.
   * Maps userId to their latest cursor position (only the latest matters).
   */
  private cursorBuffer: Map<string, CursorUpdate> = new Map();

  /**
   * Whether a flush alarm is already scheduled.
   */
  private flushScheduled = false;

  /**
   * Flush interval in milliseconds.
   * 50ms gives a good balance between latency and batching efficiency.
   */
  private static readonly FLUSH_INTERVAL_MS = 50;

  async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer): Promise<void> {
    const data = JSON.parse(message as string);

    switch (data.type) {
      case 'cursor-move': {
        const attachment = ws.deserializeAttachment() as WebSocketAttachment;

        // Buffer the cursor update (overwrite previous position for this user)
        this.cursorBuffer.set(attachment.userId, {
          userId: attachment.userId,
          x: data.payload.x,
          y: data.payload.y,
          timestamp: Date.now(),
        });

        // Schedule a flush if one is not already pending
        await this.scheduleFlush();
        break;
      }

      case 'document-edit': {
        // Critical updates bypass batching and are sent immediately
        this.broadcast({
          type: 'document-edit',
          payload: data.payload,
          sender: (ws.deserializeAttachment() as WebSocketAttachment).userId,
          timestamp: Date.now(),
        }, ws);
        break;
      }
    }
  }

  async alarm(): Promise<void> {
    // Clear the pending-flush flag first, so a later scheduleFlush() can
    // always set a new alarm -- even if this run finds the buffer empty.
    this.flushScheduled = false;

    // Flush the cursor buffer
    if (this.cursorBuffer.size > 0) {
      const cursors = Array.from(this.cursorBuffer.values());
      this.cursorBuffer.clear();

      const batchedMessage: BatchedMessage = {
        type: 'cursor-batch',
        payload: { cursors },
        timestamp: Date.now(),
      };

      const raw = JSON.stringify(batchedMessage);
      for (const ws of this.ctx.getWebSockets()) {
        const attachment = ws.deserializeAttachment() as WebSocketAttachment;
        // Filter out the user's own cursor and skip if nothing remains
        const relevantCursors = cursors.filter((c) => c.userId !== attachment.userId);
        if (relevantCursors.length > 0) {
          const filtered = JSON.stringify({
            ...batchedMessage,
            payload: { cursors: relevantCursors },
          });
          try {
            ws.send(filtered);
          } catch {
            // handled by webSocketClose
          }
        }
      }
    }

    // Reschedule if new data arrived during the flush
    if (this.cursorBuffer.size > 0) {
      await this.scheduleFlush();
    }
  }

  private async scheduleFlush(): Promise<void> {
    if (!this.flushScheduled) {
      this.flushScheduled = true;

      const currentAlarm = await this.ctx.storage.getAlarm();
      if (currentAlarm === null) {
        await this.ctx.storage.setAlarm(
          Date.now() + CollaborativeDocument.FLUSH_INTERVAL_MS,
        );
      }
    }
  }

  private broadcast(message: OutgoingMessage, exclude?: WebSocket): void {
    const raw = JSON.stringify(message);
    for (const ws of this.ctx.getWebSockets()) {
      if (ws !== exclude) {
        try { ws.send(raw); } catch { /* handled by webSocketClose */ }
      }
    }
  }
}

Batching Guidelines

| Update Type | Recommended Interval | Strategy |
| --- | --- | --- |
| Cursor positions | 50ms | Keep only latest per user |
| Typing indicators | 100ms | Deduplicate by user |
| Dashboard metrics | 200-500ms | Aggregate/average values |
| Presence heartbeats | 5,000-10,000ms | Coalesce per user |
| Chat messages | No batching | Send immediately |
| Document edits | No batching | Send immediately (order matters) |

The key principle: batch observational data, send operational data immediately. Users tolerate 50-100ms latency on cursor positions, but chat messages and document edits should be delivered as fast as possible.
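On the client side, a cursor-batch message can be folded into local state in a single pass. This sketch mirrors the CursorUpdate shape used by the server above; the merge keeps only the newest position per user, since batches can arrive out of order after a reconnect:

```typescript
interface CursorUpdate {
  userId: string;
  x: number;
  y: number;
  timestamp: number;
}

/**
 * Merge a batch of cursor updates into local client state,
 * keeping only the newest position per user.
 */
function applyCursorBatch(
  state: Map<string, CursorUpdate>,
  cursors: CursorUpdate[],
): Map<string, CursorUpdate> {
  for (const update of cursors) {
    const existing = state.get(update.userId);
    // Discard updates older than what we already have for this user.
    if (!existing || update.timestamp >= existing.timestamp) {
      state.set(update.userId, update);
    }
  }
  return state;
}
```

The rendering layer can then diff this map against the previous frame instead of reacting to each individual cursor message.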


Reconnection Strategy

Reconnection lifecycle (diagram): disconnect (close event) → exponential backoff with full jitter (0.5s, 1s, 2s, 4s, ... between attempts) → reconnect (WebSocket open) → restore state and replay missed messages. Delays are capped at maxDelay (30s), full jitter prevents a thundering herd, and the client gives up after 20 attempts.

Why Reconnection is Mandatory

WebSocket connections will drop. There is no way around this. Every client must implement reconnection logic because:

  1. Cloudflare deployments: When a new version of the Worker or Durable Object is deployed, Cloudflare terminates all existing WebSocket connections to that DO. This happens on every wrangler deploy.
  2. Network interruptions: WiFi switches, cellular handoffs, VPN reconnects, and ISP blips.
  3. Mobile backgrounding: iOS and Android aggressively close WebSocket connections when apps move to the background.
  4. Idle timeouts: While Hibernatable WebSockets keep the server side alive, clients or intermediary proxies may close idle connections.
  5. DO eviction: In rare cases, Cloudflare may need to migrate a Durable Object to a different machine.

Treating reconnection as an exceptional error case rather than a normal operating condition is the most common source of bugs in real-time features.

Client-side Reconnection

Implement exponential backoff with jitter to avoid thundering-herd reconnection storms:

interface ReconnectConfig {
  /** Initial delay before first reconnection attempt (ms) */
  initialDelay: number;
  /** Maximum delay between reconnection attempts (ms) */
  maxDelay: number;
  /** Multiplier applied to delay after each failed attempt */
  backoffMultiplier: number;
  /** Maximum number of reconnection attempts before giving up */
  maxAttempts: number;
  /** Maximum number of messages to queue while disconnected. When exceeded, oldest messages are dropped. */
  maxQueueSize: number;
}

const DEFAULT_RECONNECT_CONFIG: ReconnectConfig = {
  initialDelay: 500,
  maxDelay: 30_000,
  backoffMultiplier: 2,
  maxAttempts: 20,
  maxQueueSize: 100,
};

class ReconnectingWebSocket {
  private ws: WebSocket | null = null;
  private attempt = 0;
  private lastMessageId: string | null = null;
  private reconnectTimer: ReturnType<typeof setTimeout> | null = null;
  private messageQueue: unknown[] = [];
  private config: ReconnectConfig;

  constructor(
    private url: string,
    config?: Partial<ReconnectConfig>,
  ) {
    this.config = { ...DEFAULT_RECONNECT_CONFIG, ...config };
    this.connect();
  }

  private connect(): void {
    // Append lastMessageId for server-side replay on reconnect
    const connectUrl = this.lastMessageId
      ? `${this.url}${this.url.includes('?') ? '&' : '?'}lastMessageId=${this.lastMessageId}`
      : this.url;

    this.ws = new WebSocket(connectUrl);

    this.ws.addEventListener('open', () => {
      console.log('WebSocket connected');
      this.attempt = 0; // Reset backoff on successful connection
      this.onStatusChange?.('connected');
      this.flushQueue();
    });

    this.ws.addEventListener('message', (event) => {
      const data = JSON.parse(event.data);

      // Track the latest message ID for reconnection replay
      if (data.payload?.messageId) {
        this.lastMessageId = data.payload.messageId;
      }

      this.onMessage?.(data);
    });

    this.ws.addEventListener('close', (event) => {
      console.log(`WebSocket closed: ${event.code} ${event.reason}`);
      this.ws = null;
      this.scheduleReconnect();
    });

    this.ws.addEventListener('error', () => {
      // The close event always follows an error event; reconnection is handled there.
      console.error('WebSocket error');
    });
  }

  private scheduleReconnect(): void {
    // Never reconnect after an intentional close()
    if (this.closed) return;

    if (this.attempt >= this.config.maxAttempts) {
      console.error(`Giving up after ${this.attempt} reconnection attempts`);
      this.onStatusChange?.('failed');
      return;
    }

    this.onStatusChange?.('reconnecting');

    // Exponential backoff with ceiling and full jitter to prevent thundering herd.
    // The ceiling (maxDelay) caps the exponential growth.
    // Full jitter randomizes the actual delay between [0, capped_delay] so that
    // clients that reconnect simultaneously spread their retries over time.
    const capped = Math.min(
      this.config.initialDelay * Math.pow(this.config.backoffMultiplier, this.attempt),
      this.config.maxDelay,
    );
    const delay = Math.random() * capped;

    console.log(`Reconnecting in ${Math.round(delay)}ms (attempt ${this.attempt + 1}/${this.config.maxAttempts})`);

    this.reconnectTimer = setTimeout(() => {
      this.attempt++;
      this.connect();
    }, delay);
  }

  send(data: unknown): void {
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(JSON.stringify(data));
    } else {
      // Enforce max queue size by dropping oldest messages when the limit is reached
      if (this.messageQueue.length >= this.config.maxQueueSize) {
        const dropped = this.messageQueue.shift();
        console.warn(
          `Message queue full (${this.config.maxQueueSize}). Dropping oldest message:`,
          dropped,
        );
      }
      this.messageQueue.push(data);
    }
  }

  /** Set by close() so the ensuing close event does not schedule a reconnect. */
  private closed = false;

  close(): void {
    this.closed = true;
    if (this.reconnectTimer) {
      clearTimeout(this.reconnectTimer);
      this.reconnectTimer = null;
    }
    this.ws?.close(1000, 'Client closing');
  }

  // Callbacks
  onMessage?: (data: unknown) => void;
  onStatusChange?: (status: 'connecting' | 'connected' | 'reconnecting' | 'failed') => void;

  private flushQueue(): void {
    while (this.messageQueue.length > 0 && this.ws?.readyState === WebSocket.OPEN) {
      const msg = this.messageQueue.shift();
      this.ws.send(JSON.stringify(msg));
    }
  }
}
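As a sanity check on the backoff parameters above, the per-attempt delay ceiling can be tabulated directly. The helper below is illustrative, not part of the class:

```typescript
// Worked example of the backoff ceiling used by scheduleReconnect():
// with initialDelay=500, backoffMultiplier=2, and maxDelay=30_000 the cap
// grows 500, 1000, 2000, ... and clamps at 30s from the 7th attempt on.
// The actual delay is a uniformly random fraction of this cap (full jitter).
function backoffCap(
  attempt: number,
  initialDelay = 500,
  multiplier = 2,
  maxDelay = 30_000,
): number {
  return Math.min(initialDelay * Math.pow(multiplier, attempt), maxDelay);
}

const caps = Array.from({ length: 8 }, (_, i) => backoffCap(i));
console.log(caps.join(', '));
// → 500, 1000, 2000, 4000, 8000, 16000, 30000, 30000
```

With the default config, a client that stays offline reaches the 30-second ceiling after about six failed attempts, so even a long outage produces at most two reconnection probes per minute per client.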

Server-side Support

The Durable Object must support reconnection by storing recent messages and replaying missed ones:

export class ChatRoom extends DurableObject<Env> {
  /**
   * Handle WebSocket upgrade with optional reconnection replay.
   */
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const lastMessageId = url.searchParams.get('lastMessageId');

    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);

    this.ctx.acceptWebSocket(server);
    server.serializeAttachment({
      userId: url.searchParams.get('userId'),
      joinedAt: Date.now(),
    });

    // If reconnecting, replay missed messages
    if (lastMessageId) {
      const missed = await this.getMessagesSince(lastMessageId);
      if (missed.length > 0) {
        server.send(JSON.stringify({
          type: 'replay',
          payload: {
            messages: missed,
            fromMessageId: lastMessageId,
          },
          sender: 'system',
          timestamp: Date.now(),
        }));
      }
    } else {
      // New connection -- send recent history
      const recent = await this.getRecentMessages(50);
      server.send(JSON.stringify({
        type: 'history',
        payload: { messages: recent },
        sender: 'system',
        timestamp: Date.now(),
      }));
    }

    return new Response(null, { status: 101, webSocket: client });
  }

  /**
   * Retrieve all messages stored after the given message ID.
   * Messages are stored under keys like `msg:<timestamp>:<messageId>`,
   * which sort chronologically, so scanning the prefix in order and
   * collecting entries after the last known ID returns everything the
   * client missed.
   */
  private async getMessagesSince(lastMessageId: string): Promise<OutgoingMessage[]> {
    const allMessages = await this.ctx.storage.list<OutgoingMessage>({
      prefix: 'msg:',
    });

    let found = false;
    const missed: OutgoingMessage[] = [];

    for (const message of allMessages.values()) {
      if (found) {
        missed.push(message);
      }
      if ((message.payload as { messageId?: string }).messageId === lastMessageId) {
        found = true;
      }
    }

    // If the message was not found (pruned), send recent history instead
    if (!found) {
      return this.getRecentMessages(50);
    }

    return missed;
  }

  private async getRecentMessages(count: number): Promise<OutgoingMessage[]> {
    const entries = await this.ctx.storage.list<OutgoingMessage>({
      prefix: 'msg:',
      reverse: true,
      limit: count,
    });
    return [...entries.values()].reverse();
  }
}
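The replay logic relies on storage keys that sort chronologically. A sketch of a key builder consistent with the `msg:<timestamp>:<messageId>` scheme (the helper name is an assumption, not part of the class above):

```typescript
// Zero-pad the timestamp so lexicographic key order matches chronological
// order; Durable Object storage lists keys lexicographically, which is what
// lets getMessagesSince() walk the prefix in send order.
function messageKey(timestamp: number, messageId: string): string {
  // 15 digits covers Unix millisecond timestamps well past the year 5000
  return `msg:${timestamp.toString().padStart(15, '0')}:${messageId}`;
}

const keys = [
  messageKey(1_700_000_000_123, 'a1'),
  messageKey(1_700_000_000_456, 'b2'),
  messageKey(1_700_000_001_000, 'c3'),
];
// Sorting lexicographically preserves the original chronological order
console.log([...keys].sort().join(',') === keys.join(','));
// → true
```

Without the padding, a key written at timestamp 999 would sort after one written at 1000, breaking the prefix-scan ordering assumption.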

React WebSocket Hook

The following production-quality custom React hook manages WebSocket connections within micro frontends. It lives in the shared-utils package and is consumed by every MFE that needs real-time communication.

import { useCallback, useEffect, useRef, useState } from 'react';

// --- Types ---

type WebSocketStatus = 'connecting' | 'connected' | 'disconnected' | 'reconnecting' | 'failed';

interface WebSocketOptions<TIncoming = unknown, TOutgoing = unknown> {
  /** Called when a parsed message is received */
  onMessage?: (message: TIncoming) => void;
  /** Called when the connection status changes */
  onStatusChange?: (status: WebSocketStatus) => void;
  /** Whether to connect automatically on mount. Defaults to true. */
  autoConnect?: boolean;
  /** Reconnection configuration */
  reconnect?: {
    enabled?: boolean;
    initialDelay?: number;
    maxDelay?: number;
    backoffMultiplier?: number;
    maxAttempts?: number;
    /** Maximum number of messages to queue while disconnected. Oldest messages are dropped when exceeded. Defaults to 100. */
    maxQueueSize?: number;
  };
  /** Protocols to request during the WebSocket handshake */
  protocols?: string | string[];
  /** Custom message serializer. Defaults to JSON.stringify. */
  serialize?: (message: TOutgoing) => string;
  /** Custom message deserializer. Defaults to JSON.parse. */
  deserialize?: (data: string) => TIncoming;
}

interface WebSocketReturn<TIncoming = unknown, TOutgoing = unknown> {
  /** Current connection status */
  status: WebSocketStatus;
  /** Send a typed message. Queues if not connected. */
  send: (message: TOutgoing) => void;
  /** The most recently received message */
  lastMessage: TIncoming | null;
  /** Manually trigger a reconnection */
  reconnect: () => void;
  /** Manually disconnect (disables auto-reconnect) */
  disconnect: () => void;
}

// --- Hook Implementation ---

// Stable, module-scope default deserializer. An inline default would get a
// new identity on every render, which would rebuild `connect` (it depends on
// `deserialize`) and tear down and reopen the socket each render. Callers
// passing a custom serialize/deserialize should memoize it with useCallback
// for the same reason.
const defaultDeserialize = (data: string): unknown => JSON.parse(data);

export function useWebSocket<
  TIncoming = unknown,
  TOutgoing = unknown,
>(
  url: string | null,
  options: WebSocketOptions<TIncoming, TOutgoing> = {},
): WebSocketReturn<TIncoming, TOutgoing> {
  const {
    onMessage,
    onStatusChange,
    autoConnect = true,
    reconnect: reconnectConfig = {},
    protocols,
    serialize = JSON.stringify,
    deserialize = defaultDeserialize as (data: string) => TIncoming,
  } = options;

  const {
    enabled: reconnectEnabled = true,
    initialDelay = 500,
    maxDelay = 30_000,
    backoffMultiplier = 2,
    maxAttempts = 20,
    maxQueueSize = 100,
  } = reconnectConfig;

  // --- State ---
  const [status, setStatus] = useState<WebSocketStatus>('disconnected');
  const [lastMessage, setLastMessage] = useState<TIncoming | null>(null);

  // --- Refs (stable across renders) ---
  const wsRef = useRef<WebSocket | null>(null);
  const attemptRef = useRef(0);
  const reconnectTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
  const messageQueueRef = useRef<TOutgoing[]>([]);
  const intentionalCloseRef = useRef(false);
  const mountedRef = useRef(true);

  // Keep callback refs fresh without triggering reconnects
  const onMessageRef = useRef(onMessage);
  onMessageRef.current = onMessage;
  const onStatusChangeRef = useRef(onStatusChange);
  onStatusChangeRef.current = onStatusChange;

  // --- Status updater ---
  const updateStatus = useCallback((newStatus: WebSocketStatus) => {
    if (!mountedRef.current) return;
    setStatus(newStatus);
    onStatusChangeRef.current?.(newStatus);
  }, []);

  // --- Flush queued messages ---
  const flushQueue = useCallback(() => {
    const ws = wsRef.current;
    if (!ws || ws.readyState !== WebSocket.OPEN) return;

    while (messageQueueRef.current.length > 0) {
      const msg = messageQueueRef.current.shift()!;
      try {
        ws.send(serialize(msg));
      } catch (err) {
        console.error('Failed to send queued message:', err);
        // Put it back at the front
        messageQueueRef.current.unshift(msg);
        break;
      }
    }
  }, [serialize]);

  // --- Connect ---
  const connect = useCallback(() => {
    if (!url) return;

    // Clean up any existing connection
    if (wsRef.current) {
      wsRef.current.close(1000, 'Reconnecting');
      wsRef.current = null;
    }

    updateStatus('connecting');
    intentionalCloseRef.current = false;

    const ws = new WebSocket(url, protocols);
    wsRef.current = ws;

    ws.addEventListener('open', () => {
      if (!mountedRef.current) {
        ws.close();
        return;
      }

      attemptRef.current = 0;
      updateStatus('connected');
      flushQueue();
    });

    ws.addEventListener('message', (event: MessageEvent) => {
      if (!mountedRef.current) return;

      try {
        const parsed = deserialize(event.data as string);
        setLastMessage(parsed);
        onMessageRef.current?.(parsed);
      } catch (err) {
        console.error('Failed to parse WebSocket message:', err);
      }
    });

    ws.addEventListener('close', (event: CloseEvent) => {
      wsRef.current = null;

      if (!mountedRef.current) return;

      if (intentionalCloseRef.current) {
        updateStatus('disconnected');
        return;
      }

      // Attempt reconnection
      if (reconnectEnabled && attemptRef.current < maxAttempts) {
        updateStatus('reconnecting');

        // Exponential backoff with ceiling and full jitter to prevent thundering herd
        const capped = Math.min(
          initialDelay * Math.pow(backoffMultiplier, attemptRef.current),
          maxDelay,
        );
        const delay = Math.random() * capped;

        console.log(
          `WebSocket closed (${event.code}). Reconnecting in ${Math.round(delay)}ms ` +
          `(attempt ${attemptRef.current + 1}/${maxAttempts})`,
        );

        reconnectTimerRef.current = setTimeout(() => {
          if (mountedRef.current) {
            attemptRef.current++;
            connect();
          }
        }, delay);
      } else if (attemptRef.current >= maxAttempts) {
        console.error(`WebSocket reconnection failed after ${maxAttempts} attempts`);
        updateStatus('failed');
      } else {
        updateStatus('disconnected');
      }
    });

    ws.addEventListener('error', () => {
      // The close event will fire after this; reconnection is handled there.
      console.error('WebSocket error');
    });
  }, [
    url, protocols, reconnectEnabled, initialDelay, maxDelay,
    backoffMultiplier, maxAttempts, updateStatus, flushQueue, deserialize,
  ]);

  // --- Send ---
  const send = useCallback((message: TOutgoing) => {
    const ws = wsRef.current;
    if (ws && ws.readyState === WebSocket.OPEN) {
      try {
        ws.send(serialize(message));
      } catch (err) {
        console.error('Failed to send message:', err);
        enqueueMessage(message);
      }
    } else {
      // Queue the message for delivery after reconnection
      enqueueMessage(message);
    }

    function enqueueMessage(msg: TOutgoing): void {
      if (messageQueueRef.current.length >= maxQueueSize) {
        const dropped = messageQueueRef.current.shift();
        console.warn(`Message queue full (${maxQueueSize}). Dropping oldest message:`, dropped);
      }
      messageQueueRef.current.push(msg);
    }
  }, [serialize, maxQueueSize]);

  // --- Manual reconnect ---
  const manualReconnect = useCallback(() => {
    if (reconnectTimerRef.current) {
      clearTimeout(reconnectTimerRef.current);
      reconnectTimerRef.current = null;
    }
    attemptRef.current = 0;
    connect();
  }, [connect]);

  // --- Disconnect ---
  const disconnect = useCallback(() => {
    intentionalCloseRef.current = true;

    if (reconnectTimerRef.current) {
      clearTimeout(reconnectTimerRef.current);
      reconnectTimerRef.current = null;
    }

    if (wsRef.current) {
      wsRef.current.close(1000, 'Client disconnecting');
      wsRef.current = null;
    }

    updateStatus('disconnected');
  }, [updateStatus]);

  // --- Auto-connect on mount ---
  useEffect(() => {
    mountedRef.current = true;

    if (autoConnect && url) {
      connect();
    }

    return () => {
      mountedRef.current = false;

      if (reconnectTimerRef.current) {
        clearTimeout(reconnectTimerRef.current);
      }

      if (wsRef.current) {
        intentionalCloseRef.current = true;
        wsRef.current.close(1000, 'Component unmounting');
        wsRef.current = null;
      }
    };
  }, [url, autoConnect, connect]);

  return {
    status,
    send,
    lastMessage,
    reconnect: manualReconnect,
    disconnect,
  };
}

Usage Example

import { useState } from 'react';
import { useWebSocket } from '@platform/shared-utils';

// Define typed message contracts
interface ChatIncoming {
  type: 'chat' | 'typing' | 'user-joined' | 'user-left' | 'history' | 'replay';
  payload: Record<string, unknown>;
  sender: string;
  timestamp: number;
}

interface ChatOutgoing {
  type: 'chat' | 'typing' | 'ping';
  payload: Record<string, unknown>;
}

function ChatPanel({ roomId }: { roomId: string }) {
  const [messages, setMessages] = useState<ChatIncoming[]>([]);

  const { status, send, reconnect } = useWebSocket<ChatIncoming, ChatOutgoing>(
    `wss://api.example.com/ws/room/${roomId}`,
    {
      onMessage: (msg) => {
        switch (msg.type) {
          case 'chat':
            setMessages((prev) => [...prev, msg]);
            break;
          case 'history':
          case 'replay':
            setMessages((prev) => [
              ...prev,
              ...(msg.payload.messages as ChatIncoming[]),
            ]);
            break;
        }
      },
      onStatusChange: (newStatus) => {
        console.log(`Chat connection: ${newStatus}`);
      },
      reconnect: {
        enabled: true,
        maxAttempts: 30,
      },
    },
  );

  const sendMessage = (text: string) => {
    send({ type: 'chat', payload: { text } });
  };

  return (
    <div>
      <ConnectionBadge status={status} onReconnect={reconnect} />
      <MessageList messages={messages} />
      <MessageInput onSend={sendMessage} disabled={status !== 'connected'} />
    </div>
  );
}

Integration with Module Federation

Shared Utils Package

The useWebSocket hook and related utilities live in the shared-utils package within the monorepo. This package is exposed via Module Federation so that every micro frontend can import the same implementation without bundling it separately.

packages/
  shared-utils/
    src/
      hooks/
        useWebSocket.ts        # The hook described above
        usePresence.ts         # Presence-specific hook built on useWebSocket
        useNotifications.ts    # Notification channel hook
      websocket/
        types.ts               # Shared message type definitions
        constants.ts           # Endpoints, reconnect defaults
      index.ts                 # Public API

Module Federation configuration in the shell:

// rsbuild.config.ts (shell)
new ModuleFederationPlugin({
  name: 'shell',
  shared: {
    '@platform/shared-utils': {
      singleton: true,
      requiredVersion: '^1.0.0',
    },
    react: { singleton: true, requiredVersion: '^19.2.4' },
    'react-dom': { singleton: true, requiredVersion: '^19.2.4' },
  },
});

MFE Independence

Each micro frontend imports and uses the WebSocket hook independently. There is no centralized WebSocket manager that all MFEs must communicate through:

// In the "orders" MFE
import { useWebSocket } from '@platform/shared-utils';

function OrderTracker({ orderId }: { orderId: string }) {
  const { status, lastMessage } = useWebSocket(
    `wss://api.example.com/ws/orders/${orderId}`,
  );
  // ...
}

// In the "analytics" MFE
import { useWebSocket } from '@platform/shared-utils';

function LiveDashboard({ dashboardId }: { dashboardId: string }) {
  const { status, lastMessage } = useWebSocket(
    `wss://api.example.com/ws/dashboard/${dashboardId}`,
  );
  // ...
}

Shell-Managed Connections

The shell application manages platform-wide WebSocket connections that span all MFEs -- specifically presence and notifications. These connections are established once when the user logs in and are shared with MFEs through the browser's CustomEvent API:

// Shell app -- establishes global connections
function ShellApp() {
  const { status: presenceStatus } = useWebSocket(
    `wss://api.example.com/ws/presence/${orgId}`,
    {
      onMessage: (msg) => {
        // Dispatch presence updates as CustomEvents for MFEs to consume
        window.dispatchEvent(
          new CustomEvent('platform:presence', { detail: msg }),
        );
      },
    },
  );

  const { status: notifStatus } = useWebSocket(
    `wss://api.example.com/ws/notifications/${userId}`,
    {
      onMessage: (msg) => {
        window.dispatchEvent(
          new CustomEvent('platform:notification', { detail: msg }),
        );
      },
    },
  );

  return <MicroFrontendContainer />;
}

// Any MFE can listen for shell-managed events
function usePresenceFromShell() {
  const [presenceMap, setPresenceMap] = useState<Map<string, PresenceInfo>>(new Map());

  useEffect(() => {
    const handler = (event: CustomEvent) => {
      const { userId, status } = event.detail.payload;
      setPresenceMap((prev) => {
        const next = new Map(prev);
        next.set(userId, { status, lastSeen: Date.now() });
        return next;
      });
    };

    window.addEventListener('platform:presence', handler as EventListener);
    return () => {
      window.removeEventListener('platform:presence', handler as EventListener);
    };
  }, []);

  return presenceMap;
}

Connection Ownership Summary

| Connection Type | Managed By | Scope | Example |
|---|---|---|---|
| Presence | Shell | Organization-wide | ws/presence/:orgId |
| Notifications | Shell | Per-user | ws/notifications/:userId |
| Chat rooms | MFE (messaging) | Per-room | ws/room/:roomId |
| Document editing | MFE (docs) | Per-document | ws/doc/:docId |
| Dashboard updates | MFE (analytics) | Per-dashboard | ws/dashboard/:dashboardId |
| Order tracking | MFE (orders) | Per-order | ws/orders/:orderId |
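The endpoint patterns in the table can be centralized in one place (the shared-utils constants.ts mentioned earlier is a natural home). The helper object below is a sketch; the base URL is an example, not a real endpoint:

```typescript
// Hypothetical URL builders matching the ownership table's patterns
const wsBase = 'wss://api.example.com/ws';

const wsUrl = {
  presence: (orgId: string) => `${wsBase}/presence/${orgId}`,
  notifications: (userId: string) => `${wsBase}/notifications/${userId}`,
  room: (roomId: string) => `${wsBase}/room/${roomId}`,
  doc: (docId: string) => `${wsBase}/doc/${docId}`,
  dashboard: (dashboardId: string) => `${wsBase}/dashboard/${dashboardId}`,
  orders: (orderId: string) => `${wsBase}/orders/${orderId}`,
};

console.log(wsUrl.room('general'));
// → wss://api.example.com/ws/room/general
```

Keeping the builders in shared-utils means a route change is a one-line edit rather than a hunt across every MFE.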

Connection Count Monitoring and Alerting

Monitoring WebSocket connection counts is critical for detecting capacity issues, abnormal reconnection storms, and memory pressure before they affect users.

Server-side Monitoring

Expose connection metrics from the Durable Object by publishing counts on every connection and disconnection event. Use Cloudflare Workers Analytics Engine for time-series storage:

export class MonitoredChatRoom extends DurableObject<Env> {
  /**
   * Report the current connection count to the Analytics Engine.
   * Call this on every connect, disconnect, and periodically via alarm.
   */
  private reportConnectionMetrics(): void {
    const connections = this.ctx.getWebSockets();
    const totalConnections = connections.length;

    // Write to Analytics Engine for dashboarding and alerting
    this.env.WEBSOCKET_METRICS.writeDataPoint({
      blobs: [this.ctx.id.toString()],
      doubles: [totalConnections],
      indexes: ['connection_count'],
    });

    // Log a warning when approaching the recommended ceiling
    if (totalConnections > 4_000) {
      console.warn(
        `DO ${this.ctx.id.toString()} has ${totalConnections} connections — ` +
        `approaching the 5,000 recommended limit. Consider sharding.`,
      );
    }
  }

  async fetch(request: Request): Promise<Response> {
    // ... accept WebSocket ...
    this.reportConnectionMetrics();
    return new Response(null, { status: 101, webSocket: client });
  }

  async webSocketClose(ws: WebSocket, code: number, reason: string): Promise<void> {
    ws.close(code, reason);
    this.reportConnectionMetrics();
    // ... rest of cleanup ...
  }

  async alarm(): Promise<void> {
    // Periodically report metrics even when idle, so gaps in the time series
    // indicate the DO was hibernating rather than simply unreported.
    this.reportConnectionMetrics();
    // ... rest of alarm logic ...
  }
}

Alerting Thresholds

Configure alerts on the following conditions using Cloudflare Notifications or an external system consuming Analytics Engine data:

| Condition | Threshold | Action |
|---|---|---|
| High connection count | > 4,000 per DO | Prepare to shard; investigate whether a single coordination boundary is too broad |
| Rapid reconnection surge | > 500 new connections/min to a single DO | Likely deployment or network event; verify clients are using backoff with jitter |
| Connection count drop to zero | All DOs in a namespace simultaneously | Possible deployment or infrastructure incident; verify Worker routing |
| Sustained growth trend | Steady daily increase | Capacity plan for sharding before hitting the 5,000 ceiling |
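The first two conditions lend themselves to a mechanical check once per-DO counts and connect rates have been read back from Analytics Engine. The evaluator below is a hypothetical sketch; the sample shape and alert names are assumptions:

```typescript
interface DoSample {
  doId: string;
  connections: number;          // current connection count
  newConnectionsPerMin: number; // connections opened in the last minute
}

// Hypothetical evaluator for the first two alert conditions in the table
function evaluateAlerts(samples: DoSample[]): string[] {
  const alerts: string[] = [];
  for (const s of samples) {
    if (s.connections > 4_000) {
      alerts.push(`high-connection-count:${s.doId}`);
    }
    if (s.newConnectionsPerMin > 500) {
      alerts.push(`reconnection-surge:${s.doId}`);
    }
  }
  return alerts;
}

console.log(evaluateAlerts([
  { doId: 'room-a', connections: 4_500, newConnectionsPerMin: 10 },
  { doId: 'room-b', connections: 1_200, newConnectionsPerMin: 650 },
]));
// → [ 'high-connection-count:room-a', 'reconnection-surge:room-b' ]
```

The remaining two conditions (simultaneous drop to zero, sustained growth) need time-series context and are better expressed as dashboard queries than point-in-time checks.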

Client-side Monitoring

Track reconnection events on the client to surface connectivity issues to observability backends:

onStatusChange: (status) => {
  // Emit to your telemetry pipeline (e.g., OpenTelemetry, Datadog, custom)
  telemetry.trackEvent('websocket_status_change', {
    status,
    url: wsUrl,
    timestamp: Date.now(),
  });

  if (status === 'failed') {
    telemetry.trackEvent('websocket_reconnection_exhausted', {
      url: wsUrl,
      timestamp: Date.now(),
    });
  }
},

Aggregate client-side reconnection metrics to detect patterns such as:

  • A spike in reconnecting events across many clients at the same time (indicates a server-side event).
  • Persistent failed status for specific regions or ISPs (indicates a network path issue).
  • High reconnection frequency for individual users (indicates flaky local network conditions).
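The first pattern, a simultaneous reconnect spike, can be detected by counting distinct clients that report a reconnecting status inside a short window. A sketch, with an assumed event shape and illustrative thresholds:

```typescript
interface StatusEvent {
  clientId: string;
  status: string;
  timestamp: number; // ms since epoch
}

// Flag a likely server-side event when many distinct clients report
// 'reconnecting' within the same window
function detectReconnectSpike(
  events: StatusEvent[],
  windowMs = 60_000,
  minClients = 50,
): boolean {
  if (events.length === 0) return false;
  const latest = Math.max(...events.map((e) => e.timestamp));
  const recent = events.filter(
    (e) => e.status === 'reconnecting' && latest - e.timestamp <= windowMs,
  );
  return new Set(recent.map((e) => e.clientId)).size >= minClients;
}

const t = Date.now();
console.log(detectReconnectSpike(
  [
    { clientId: 'a', status: 'reconnecting', timestamp: t },
    { clientId: 'b', status: 'reconnecting', timestamp: t },
    { clientId: 'c', status: 'reconnecting', timestamp: t },
  ],
  60_000,
  3,
));
// → true
```

Counting distinct clients rather than raw events avoids false positives from one flaky client retrying many times.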

Pricing and Limits

Understanding the resource limits and cost model is essential for designing the Durable Object topology. The following values reflect Cloudflare's Workers Paid plan.

Durable Object Limits

| Resource | Limit | Notes |
|---|---|---|
| Incoming requests per DO | ~1,000/sec | Soft limit; can be raised on Enterprise |
| WebSocket message size | 32 MiB | Maximum per-message payload size |
| Durable Object storage | 10 GB per DO | Key-value storage co-located with the DO |
| Storage read operations | 1,000/sec per DO | Batch reads count as one operation |
| Storage write operations | 1,000/sec per DO | Batch writes count as one operation |
| Subrequest limit | 1,000 per invocation | Outbound fetch calls from the DO |
| CPU time per request | 30 seconds | Per-invocation CPU time (not wall-clock) |
| Tags per WebSocket | 10 | Set at accept time; immutable |
| Tag size | 256 bytes | Per tag |
| Alarm precision | ~30 seconds minimum | Not suitable for sub-second intervals |

Hibernation Billing

| Billing Component | Active | Hibernating |
|---|---|---|
| CPU time charges | Full rate | Not charged |
| Duration charges | Full rate | Not charged |
| Storage charges | Standard | Standard (always applies) |
| WebSocket message charges | Per message | Per message (wakes the DO) |

Note on Durable Object SQLite billing: Durable Object SQLite storage billing has been active since January 2026. If your Durable Objects use the SQLite-backed storage backend (the default for newly created DOs), be aware that reads and writes are billed per row read/written rather than per key-value operation. Review the Durable Objects Pricing page for the current SQLite rate card.

Cost Implications for Architecture

  • Presence channels (mostly idle, occasional heartbeat): Hibernate 95%+ of the time. Extremely cost-effective.
  • Notification channels (idle until a notification arrives): Nearly zero cost during idle periods.
  • Chat rooms (sporadic activity): Cost-effective for rooms with bursts of activity followed by quiet periods.
  • Collaborative editing (continuous high-frequency updates): Higher cost due to frequent messages preventing hibernation. Use message batching to reduce wake-ups.
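Batching for the collaborative-editing case can be sketched as a small client-side coalescer: edits accumulate locally and are flushed as one WebSocket message per interval, so the DO wakes once per flush rather than once per keystroke. The class name and flush interval below are illustrative:

```typescript
// Coalesce high-frequency messages into periodic batches
class MessageBatcher<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private flush: (batch: T[]) => void,
    private intervalMs = 50,
  ) {}

  push(msg: T): void {
    this.buffer.push(msg);
    if (!this.timer) {
      // First message in a window starts the flush timer
      this.timer = setTimeout(() => this.flushNow(), this.intervalMs);
    }
  }

  flushNow(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.flush(batch);
  }
}

const batches: string[][] = [];
const batcher = new MessageBatcher<string>((batch) => batches.push(batch), 50);
batcher.push('insert:a');
batcher.push('insert:b');
batcher.flushNow(); // force a flush instead of waiting for the timer
console.log(batches);
// → [ [ 'insert:a', 'insert:b' ] ]
```

A caller would wire `flush` to something like `send({ type: 'edit-batch', payload: { edits } })`, trading up to one interval of latency for far fewer DO wake-ups.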

Scaling Guidelines

  • Keep each Durable Object under 5,000 concurrent WebSocket connections for reliable performance.
  • For organizations with more than 5,000 concurrent users, shard presence across multiple DOs (e.g., presence:org-acme:shard-0, presence:org-acme:shard-1).
  • Use this.ctx.storage batch operations to stay within the 1,000 ops/sec limit.
  • Monitor DO CPU time per request to avoid hitting the 30-second limit during alarm-based cleanup.
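Sharded presence needs a deterministic mapping from user to shard so the same user always lands on the same DO. A sketch using FNV-1a, following the `presence:<orgId>:shard-<n>` naming from the guideline above (the hash choice is an assumption; any stable hash works):

```typescript
// Deterministically pick a presence shard for a user. FNV-1a is fast and
// spreads short string IDs reasonably well.
function presenceShardName(orgId: string, userId: string, shardCount: number): string {
  let hash = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return `presence:${orgId}:shard-${hash % shardCount}`;
}

// The Worker would then route with something like:
//   env.PRESENCE.idFromName(presenceShardName(orgId, userId, SHARD_COUNT))
console.log(presenceShardName('org-acme', 'user-123', 4));
```

Note that changing shardCount remaps most users to different shards; pick a count with headroom up front, or plan a migration path before resizing.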

References