Version Management

Overview

Runtime version management allows platform operators to control exactly which version of each micro frontend (MFE) is loaded in production, without redeploying the shell application or any other part of the infrastructure. The Admin UI provides a centralized interface for pinning, promoting, and rolling back MFE versions across environments.

Core Capabilities

  • No-redeploy version changes: Update a configuration value and every subsequent page load picks up the new MFE version. The shell application reads the active version config at bootstrap time and instructs Module Federation to load remotes from the corresponding CDN URLs.
  • Instant rollback: Previous build artifacts remain on R2/CDN indefinitely. Rolling back means pointing the version config to an older URL --- no rebuild, no redeployment.
  • Canary deployments: Route a percentage of users to a new MFE version using consistent hashing on user identity. Gradually increase the percentage as confidence grows.
  • Environment-specific pinning: Maintain independent version configs for dev, staging, and production. Promote versions through environments with explicit approval gates.
  • Full audit trail: Every version change is recorded in D1 with the identity of the actor, timestamp, and event type, enabling compliance and post-incident analysis.

Architecture

Figure: Version management architecture. The Admin UI calls the Version Config Service, which writes to D1 (persistent store) and syncs to KV (edge cache) on each write. The Shell App reads the config from KV and hands it to the Module Federation Runtime, which loads the versioned remotes.

Data Flow

The version management system follows a write-through caching pattern where D1 serves as the persistent source of truth and KV provides low-latency edge reads.

1. Admin pins MFE version via UI

The operator selects an MFE, chooses a registered version from the dropdown, and clicks "Activate". The Admin UI sends an authenticated POST request to the Version Config Service.

2. Config Service writes to D1 (audit) + KV (fast reads)

The Version Config Service Worker performs the following writes in order (the D1 statements run as one batch; the KV write follows):

  • Inserts a deployment_events row in D1 recording the activation.
  • Updates the version_configs table: sets is_active = false on the previously active row for that MFE/environment, and is_active = true on the newly activated row.
  • Writes the aggregated version config JSON to KV under the key version-config:{environment}.

Because the D1 write and the KV write happen in the same Worker invocation, the two stores normally agree by the time the request returns. This is not a transactional guarantee: D1 and KV are separate storage systems, so one write can succeed while the other fails, and KV itself is eventually consistent across edge locations. See KV/D1 Write Atomicity Gap for failure modes and mitigations.

3. Shell app fetches version config from KV on page load

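When a user navigates to the application, the shell app's bootstrap logic issues a GET request to the Version Config Service endpoint (e.g., GET /api/v1/version-config?env=production). This endpoint reads directly from KV, so the response is served from a Cloudflare edge location near the user. Reads of frequently requested keys are typically fast because KV caches them at the edge, but KV is eventually consistent: a newly written config can take time to become visible at every location (Cloudflare's documentation cites up to 60 seconds or more).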

4. Module Federation runtime loads remotes from versioned CDN URLs

The shell passes the entry URLs from the version config into Module Federation's init() call. The runtime fetches each MFE's mf-manifest.json from the versioned CDN path, then loads the necessary chunks on demand as the user navigates to each route.

Figure: Version config resolution flow. The user navigates to /dashboard; the shell bootstrap issues GET /api/v1/version-config; the service reads the KV key version-config:production and returns JSON with MFE entries and versions; the shell calls Module Federation init() with the mfe_dashboard remote; mfe_dashboard is lazy-loaded from CDN/R2 at mfe-dashboard/v2.3.1/mf-manifest.json; the exposed modules load and render.

Version Config Schema

TypeScript Interface

interface MfeVersionEntry {
  /** Semver version string, e.g. "2.3.1" */
  version: string;

  /** Full URL to the mf-manifest.json for this version */
  entry: string;

  /** Subresource Integrity hash for the manifest. Optional in this schema, but the shell bootstrap shown later rejects any remote that lacks one. */
  integrity?: string;

  /** ISO 8601 timestamp of when this version was activated */
  updatedAt: string;

  /** Email or identity of the user who activated this version */
  updatedBy: string;

  /** Canary configuration (present only during gradual rollouts) */
  canary?: {
    /** Version being rolled out */
    version: string;
    /** Entry URL for the canary version */
    entry: string;
    /** Percentage of traffic routed to canary (0-100) */
    percentage: number;
    /** Integrity hash for canary manifest */
    integrity?: string;
  };
}

interface VersionConfig {
  [mfeName: string]: MfeVersionEntry;
}
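
Because the shell receives this structure as untyped JSON, a runtime check can catch a malformed config before it reaches Module Federation. The following is a minimal sketch and not part of the original service: isVersionConfig and its helpers are hypothetical names, and the interfaces are repeated only so the block is self-contained.

```typescript
interface CanaryEntry {
  version: string;
  entry: string;
  percentage: number;
  integrity?: string;
}

interface MfeVersionEntry {
  version: string;
  entry: string;
  integrity?: string;
  updatedAt: string;
  updatedBy: string;
  canary?: CanaryEntry;
}

type VersionConfig = Record<string, MfeVersionEntry>;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function isOptionalString(value: unknown): boolean {
  return value === undefined || typeof value === 'string';
}

// Hypothetical helper: checks the shape of the nested canary object.
function isCanaryEntry(value: unknown): value is CanaryEntry {
  if (!isRecord(value)) return false;
  const percentage = value['percentage'];
  return (
    typeof value['version'] === 'string' &&
    typeof value['entry'] === 'string' &&
    typeof percentage === 'number' &&
    percentage >= 0 &&
    percentage <= 100 &&
    isOptionalString(value['integrity'])
  );
}

// Hypothetical helper: checks one MFE entry, including an optional canary.
function isMfeVersionEntry(value: unknown): value is MfeVersionEntry {
  if (!isRecord(value)) return false;
  const canary = value['canary'];
  return (
    typeof value['version'] === 'string' &&
    typeof value['entry'] === 'string' &&
    typeof value['updatedAt'] === 'string' &&
    typeof value['updatedBy'] === 'string' &&
    isOptionalString(value['integrity']) &&
    (canary === undefined || isCanaryEntry(canary))
  );
}

// True only when every top-level value is a well-formed MFE entry.
function isVersionConfig(value: unknown): value is VersionConfig {
  return isRecord(value) && Object.values(value).every(isMfeVersionEntry);
}
```

A shell could call isVersionConfig on the parsed response and fall back to its error UI when the check fails, rather than passing a partial object to init().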

Example KV Value

The following JSON is stored in KV under the key version-config:production:

{
  "mfe_dashboard": {
    "version": "2.3.1",
    "entry": "https://cdn.example.com/mfe-dashboard/v2.3.1/mf-manifest.json",
    "integrity": "sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC",
    "updatedAt": "2025-03-15T10:30:00Z",
    "updatedBy": "admin@example.com"
  },
  "mfe_settings": {
    "version": "1.8.0",
    "entry": "https://cdn.example.com/mfe-settings/v1.8.0/mf-manifest.json",
    "integrity": "sha384-Li9vy3DqF8tnTXuiaAJuML3ky+er10rcgNR/VqsVpcw+ThHmYcwiB1pbOxEb2V2I",
    "updatedAt": "2025-03-14T16:45:00Z",
    "updatedBy": "admin@example.com"
  },
  "mfe_analytics": {
    "version": "3.1.0",
    "entry": "https://cdn.example.com/mfe-analytics/v3.1.0/mf-manifest.json",
    "updatedAt": "2025-03-13T09:00:00Z",
    "updatedBy": "deploy-bot@example.com",
    "canary": {
      "version": "3.2.0-rc.1",
      "entry": "https://cdn.example.com/mfe-analytics/v3.2.0-rc.1/mf-manifest.json",
      "percentage": 10
    }
  }
}

KV Key Naming Convention

Key Pattern | Description
--- | ---
version-config:production | Active version config for production
version-config:staging | Active version config for staging
version-config:dev | Active version config for dev
version-history:{env}:{mfe_name} | Last 50 versions for quick history lookup
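
Building these keys in one place keeps the Worker handlers and the reconciliation job from drifting apart on naming. A small sketch, assuming the three environments listed above; the helper names versionConfigKey and versionHistoryKey are illustrative and do not appear in the original code.

```typescript
// The environments named in this document.
type Environment = 'dev' | 'staging' | 'production';

/** Key holding the aggregated active config for one environment. */
function versionConfigKey(environment: Environment): string {
  return `version-config:${environment}`;
}

/** Key holding the recent version history for one MFE in one environment. */
function versionHistoryKey(environment: Environment, mfeName: string): string {
  return `version-history:${environment}:${mfeName}`;
}

console.log(versionConfigKey('production'));
console.log(versionHistoryKey('staging', 'mfe_dashboard'));
```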

Shell App Bootstrap

The shell application reads the version config at startup and dynamically configures Module Federation remotes. This means the shell never hardcodes remote URLs --- it discovers them at runtime.

Bootstrap Sequence

  1. Fetch version config from the Version Config Service (backed by KV at the edge).
  2. Resolve canary assignments --- if any MFE has a canary config, determine whether the current user should receive the canary version based on consistent hashing.
  3. Initialize Module Federation runtime with dynamic remotes derived from the config.
  4. Register routes that lazy-load each MFE's exposed components.
  5. Render the application.

Implementation

// src/bootstrap.tsx

import { init } from '@module-federation/enhanced/runtime';
import { createRoot } from 'react-dom/client';
import { App } from './App';
import type { VersionConfig, MfeVersionEntry } from './types/version-config';

const VERSION_CONFIG_URL = 'https://config.example.com/api/v1/version-config';

/**
 * Fetches the active version config from the Version Config Service.
 * The service reads from KV, so this is fast at any edge location.
 */
async function fetchVersionConfig(): Promise<VersionConfig> {
  const environment = import.meta.env.VITE_ENVIRONMENT ?? 'production';
  const response = await fetch(`${VERSION_CONFIG_URL}?env=${environment}`, {
    headers: { 'Accept': 'application/json' },
  });

  if (!response.ok) {
    throw new Error(
      `Failed to fetch version config: ${response.status} ${response.statusText}`
    );
  }

  return response.json();
}

/**
 * Determines which entry URL to use for an MFE, accounting for canary config.
 * Uses consistent hashing on the user ID so the same user always gets the
 * same version within a canary window.
 */
function resolveEntry(
  mfeName: string,
  config: MfeVersionEntry,
  userId: string | null
): string {
  if (!config.canary || !userId) {
    return config.entry;
  }

  const hash = simpleHash(`${userId}:${mfeName}`);
  const bucket = hash % 100;

  if (bucket < config.canary.percentage) {
    return config.canary.entry;
  }

  return config.entry;
}

/**
 * Simple deterministic hash for canary bucketing.
 * Not cryptographic --- just needs to be consistent and well-distributed.
 */
function simpleHash(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    const char = input.charCodeAt(i);
    hash = ((hash << 5) - hash + char) | 0;
  }
  return Math.abs(hash);
}

/**
 * Main bootstrap function.
 */
const bootstrap = async (): Promise<void> => {
  try {
    const config = await fetchVersionConfig();
    const userId = localStorage.getItem('user_id');

    // Resolve the entry URL and SRI integrity hash for each remote, picking
    // the canary values when the user falls into the canary bucket.
    // Note: the hash is only checked for presence below. init() receives just
    // the name and entry, so verifying the manifest bytes is a separate step.
    const resolvedRemotes = Object.entries(config).map(([name, mfeEntry]) => {
      const isCanary =
        mfeEntry.canary && userId
          ? simpleHash(`${userId}:${name}`) % 100 < mfeEntry.canary.percentage
          : false;

      return {
        name,
        entry: isCanary ? mfeEntry.canary!.entry : mfeEntry.entry,
        integrity: isCanary
          ? mfeEntry.canary!.integrity
          : mfeEntry.integrity,
      };
    });

    // Enforce SRI: reject any remote that was registered without an integrity hash.
    // This prevents loading MFE bundles that cannot be verified against tampering.
    const remotesWithoutIntegrity = resolvedRemotes.filter((r) => !r.integrity);
    if (remotesWithoutIntegrity.length > 0) {
      console.error(
        '[Shell] SRI integrity hash missing for remotes:',
        remotesWithoutIntegrity.map((r) => r.name)
      );
      throw new Error(
        `SRI integrity hash is required for all MFE remotes. ` +
        `Missing: ${remotesWithoutIntegrity.map((r) => r.name).join(', ')}`
      );
    }

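    init({
      name: 'shell',
      // The runtime API expects remotes as an array of { name, entry } objects.
      remotes: resolvedRemotes.map(({ name, entry }) => ({ name, entry })),
      shared: {
        react: {
          version: '19.2.4',
          scope: 'default',
          // get() is the asynchronous loader; lib() would have to return the
          // module synchronously, which a dynamic import() cannot do.
          get: () => import('react').then((mod) => () => mod),
          shareConfig: { singleton: true, requiredVersion: '^19.2.4' },
        },
        'react-dom': {
          version: '19.2.4',
          scope: 'default',
          get: () => import('react-dom').then((mod) => () => mod),
          shareConfig: { singleton: true, requiredVersion: '^19.2.4' },
        },
      },
    });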

    const root = createRoot(document.getElementById('root')!);
    root.render(<App versionConfig={config} />);
  } catch (error) {
    console.error('[Shell] Bootstrap failed:', error);
    // Render a fallback error UI so the user is not left with a blank screen
    const root = createRoot(document.getElementById('root')!);
    root.render(
      <div role="alert">
        <h1>Application failed to load</h1>
        <p>Please refresh the page or contact support.</p>
      </div>
    );
  }
};

bootstrap();
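
The bootstrap above checks that an integrity value is present but passes only the name and entry to init(), so nothing in that code compares the hash with the bytes actually served. One way to close that gap is to fetch the manifest and compare its SHA-384 digest before calling init(). The sketch below uses the standard Web Crypto API; sha384Integrity and verifyManifestIntegrity are hypothetical helpers, and the approach is an assumption rather than the platform's documented method.

```typescript
/** Computes an SRI-style "sha384-<base64>" string over raw bytes. */
async function sha384Integrity(data: ArrayBuffer): Promise<string> {
  const digest = new Uint8Array(await crypto.subtle.digest('SHA-384', data));
  let binary = '';
  for (let i = 0; i < digest.length; i++) {
    binary += String.fromCharCode(digest[i]);
  }
  return `sha384-${btoa(binary)}`;
}

/**
 * Fetches the manifest and reports whether its digest matches the expected
 * integrity value from the version config. fetchImpl is injectable so the
 * function can be exercised without network access.
 */
async function verifyManifestIntegrity(
  entry: string,
  expectedIntegrity: string,
  fetchImpl: (url: string) => Promise<Response> = (url) => fetch(url)
): Promise<boolean> {
  const response = await fetchImpl(entry);
  if (!response.ok) {
    return false;
  }
  const actual = await sha384Integrity(await response.arrayBuffer());
  return actual === expectedIntegrity;
}
```

This only covers mf-manifest.json, which is also the only file the CI workflow hashes. The chunks referenced by the manifest would need their own hashes to be verified the same way.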

Route Registration with Lazy-Loaded MFEs

// src/routes.tsx

import { lazy, Suspense, type ComponentType } from 'react';
import { loadRemote } from '@module-federation/enhanced/runtime';
import type { RouteObject } from 'react-router-dom';
import { LoadingSpinner } from './components/LoadingSpinner';
import { MfeErrorBoundary } from './components/MfeErrorBoundary';

/**
 * Creates a lazy React component backed by a Module Federation remote.
 * The remote module must expose the component as its default export.
 */
function createRemoteComponent(remoteName: string, exposedModule: string) {
  return lazy(async () => {
    const module = await loadRemote<{ default: ComponentType }>(
      `${remoteName}/${exposedModule}`
    );
    if (!module) {
      throw new Error(`Failed to load remote module: ${remoteName}/${exposedModule}`);
    }
    return module;
  });
}

const Dashboard = createRemoteComponent('mfe_dashboard', 'DashboardPage');
const Settings = createRemoteComponent('mfe_settings', 'SettingsPage');
const Analytics = createRemoteComponent('mfe_analytics', 'AnalyticsPage');

export const routes: RouteObject[] = [
  {
    path: '/dashboard',
    element: (
      <MfeErrorBoundary mfeName="mfe_dashboard">
        <Suspense fallback={<LoadingSpinner />}>
          <Dashboard />
        </Suspense>
      </MfeErrorBoundary>
    ),
  },
  {
    path: '/settings/*',
    element: (
      <MfeErrorBoundary mfeName="mfe_settings">
        <Suspense fallback={<LoadingSpinner />}>
          <Settings />
        </Suspense>
      </MfeErrorBoundary>
    ),
  },
  {
    path: '/analytics/*',
    element: (
      <MfeErrorBoundary mfeName="mfe_analytics">
        <Suspense fallback={<LoadingSpinner />}>
          <Analytics />
        </Suspense>
      </MfeErrorBoundary>
    ),
  },
];

Deployment Pipeline

When a developer merges a change to an MFE, the CI pipeline builds the MFE, uploads artifacts to R2, and registers the new version with the Version Config Service. Crucially, registration does not mean activation --- the new version sits in the registry until an admin (or an automated promotion rule) explicitly activates it.

End-to-End Flow

Figure: Deployment pipeline. (1) Build with Rsbuild + Module Federation; (2) upload artifacts to R2/CDN; (3) register the version record in D1; (4) promote, which updates the KV edge cache; (5) live on the next page load. Registration does not mean activation: an admin or automation promotes explicitly.

CI Script

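The following GitHub Actions workflow covers steps 1 through 4 for the dev environment: it builds the MFE, uploads the artifacts to R2, registers the version, and auto-activates it in dev. Promotion to staging and production remains a separate, explicit action.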

# .github/workflows/deploy-mfe.yml

name: Deploy MFE
on:
  push:
    branches: [main]

env:
  MFE_NAME: mfe-dashboard
  R2_BUCKET: mfe-artifacts
  CONFIG_SERVICE_URL: https://config.example.com/api/v1

jobs:
  build-and-register:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v6

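      - name: Setup pnpm
        # Must run before setup-node so that the pnpm cache can be resolved.
        # Assumes the pnpm version is pinned via "packageManager" in package.json.
        uses: pnpm/action-setup@v4

      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
          node-version: '22'
          cache: 'pnpm'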

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Determine version
        id: version
        run: |
          VERSION=$(node -p "require('./package.json').version")
          SHORT_SHA=$(git rev-parse --short HEAD)
          FULL_VERSION="${VERSION}+${SHORT_SHA}"
          echo "version=${FULL_VERSION}" >> "$GITHUB_OUTPUT"

      - name: Build MFE with Rsbuild
        run: pnpm rsbuild build
        env:
          MFE_VERSION: ${{ steps.version.outputs.version }}
          PUBLIC_PATH: https://cdn.example.com/${{ env.MFE_NAME }}/${{ steps.version.outputs.version }}/

      - name: Compute integrity hash
        id: integrity
        run: |
          HASH=$(shasum -b -a 384 dist/mf-manifest.json | awk '{ print $1 }' | xxd -r -p | base64)
          echo "hash=sha384-${HASH}" >> "$GITHUB_OUTPUT"

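      - name: Upload to R2
        # "wrangler r2 object put" uploads one object per call, so walk dist/
        # and upload each file. Assumes wrangler is a devDependency and that a
        # CLOUDFLARE_ACCOUNT_ID secret exists. --remote targets the real bucket
        # in Wrangler v4; drop it if your Wrangler version does not accept it.
        run: |
          find dist -type f | while read -r file; do
            key="${MFE_NAME}/${VERSION}/${file#dist/}"
            pnpm exec wrangler r2 object put "${R2_BUCKET}/${key}" --file="$file" --remote < /dev/null
          done
        env:
          VERSION: ${{ steps.version.outputs.version }}
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}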

      - name: Register version with Config Service
        run: |
          curl -sf -X POST "${{ env.CONFIG_SERVICE_URL }}/versions" \
            -H "Authorization: Bearer ${{ secrets.CONFIG_SERVICE_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "mfeName": "${{ env.MFE_NAME }}",
              "version": "${{ steps.version.outputs.version }}",
              "entryUrl": "https://cdn.example.com/${{ env.MFE_NAME }}/${{ steps.version.outputs.version }}/mf-manifest.json",
              "integrityHash": "${{ steps.integrity.outputs.hash }}",
              "environment": "dev",
              "createdBy": "ci-bot@example.com"
            }'

      - name: Auto-activate in dev environment
        if: success()
        run: |
          curl -sf -X POST "${{ env.CONFIG_SERVICE_URL }}/versions/activate" \
            -H "Authorization: Bearer ${{ secrets.CONFIG_SERVICE_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "mfeName": "${{ env.MFE_NAME }}",
              "version": "${{ steps.version.outputs.version }}",
              "environment": "dev",
              "activatedBy": "ci-bot@example.com"
            }'

Version Config Service --- Registration Endpoint

// workers/version-config-service/src/handlers/register-version.ts

import type { D1Database, KVNamespace } from '@cloudflare/workers-types';

interface RegisterVersionRequest {
  mfeName: string;
  version: string;
  entryUrl: string;
  integrityHash?: string;
  environment: string;
  createdBy: string;
}

export async function handleRegisterVersion(
  request: Request,
  env: { DB: D1Database; VERSION_KV: KVNamespace }
): Promise<Response> {
  const body: RegisterVersionRequest = await request.json();

  // Validate that the manifest is actually accessible before registering
  const manifestCheck = await fetch(body.entryUrl, { method: 'HEAD' });
  if (!manifestCheck.ok) {
    return Response.json(
      { error: `Manifest not accessible at ${body.entryUrl}: ${manifestCheck.status}` },
      { status: 400 }
    );
  }

  // Check for duplicate registration
  const existing = await env.DB.prepare(
    'SELECT id FROM version_configs WHERE environment = ? AND mfe_name = ? AND version = ?'
  )
    .bind(body.environment, body.mfeName, body.version)
    .first();

  if (existing) {
    return Response.json(
      { error: 'Version already registered', existingId: existing.id },
      { status: 409 }
    );
  }

  // Insert the version record (is_active defaults to false) and the audit
  // event in one batch, so a registration never exists without its event.
  // The UNIQUE constraint on the table remains the final guard against two
  // concurrent registrations of the same version.
  const [insertResult] = await env.DB.batch([
    env.DB.prepare(
      `INSERT INTO version_configs (environment, mfe_name, version, entry_url, integrity_hash, created_by)
       VALUES (?, ?, ?, ?, ?, ?)`
    ).bind(
      body.environment,
      body.mfeName,
      body.version,
      body.entryUrl,
      body.integrityHash ?? null,
      body.createdBy
    ),
    env.DB.prepare(
      `INSERT INTO deployment_events (environment, mfe_name, version, event_type, created_by)
       VALUES (?, ?, ?, 'registered', ?)`
    ).bind(body.environment, body.mfeName, body.version, body.createdBy),
  ]);

  return Response.json(
    { id: insertResult.meta.last_row_id, status: 'registered' },
    { status: 201 }
  );
}
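
The handler above trusts the request body as-is and then fetches whatever entryUrl it contains. A validation step before that fetch would reject malformed payloads and URLs outside the CDN. The sketch below is an assumption about what such a check might look like: validateRegisterRequest, the allowed host list (taken from the example URLs), and the environment names are illustrative, not part of the original service.

```typescript
interface RegisterVersionRequest {
  mfeName: string;
  version: string;
  entryUrl: string;
  integrityHash?: string;
  environment: string;
  createdBy: string;
}

type ValidationResult =
  | { ok: true; value: RegisterVersionRequest }
  | { ok: false; errors: string[] };

// Assumption: artifacts are only served from this host (from the examples).
const ALLOWED_CDN_HOSTS = new Set(['cdn.example.com']);
const KNOWN_ENVIRONMENTS = new Set(['dev', 'staging', 'production']);
const SRI_PATTERN = /^sha(256|384|512)-[A-Za-z0-9+/]+={0,2}$/;

function validateRegisterRequest(input: unknown): ValidationResult {
  if (typeof input !== 'object' || input === null || Array.isArray(input)) {
    return { ok: false, errors: ['body must be a JSON object'] };
  }
  const body = input as Record<string, unknown>;
  const errors: string[] = [];

  // Every required field must be a non-empty string.
  for (const field of ['mfeName', 'version', 'entryUrl', 'environment', 'createdBy']) {
    const value = body[field];
    if (typeof value !== 'string' || value.length === 0) {
      errors.push(`${field} must be a non-empty string`);
    }
  }

  // The entry URL must be an https manifest URL on an allowed host.
  const entryUrl = body['entryUrl'];
  if (typeof entryUrl === 'string' && entryUrl.length > 0) {
    try {
      const url = new URL(entryUrl);
      if (url.protocol !== 'https:') errors.push('entryUrl must use https');
      if (!ALLOWED_CDN_HOSTS.has(url.hostname)) errors.push('entryUrl host is not allowed');
      if (!url.pathname.endsWith('/mf-manifest.json')) {
        errors.push('entryUrl must point at mf-manifest.json');
      }
    } catch {
      errors.push('entryUrl is not a valid URL');
    }
  }

  const environment = body['environment'];
  if (typeof environment === 'string' && environment.length > 0 && !KNOWN_ENVIRONMENTS.has(environment)) {
    errors.push('environment is not recognized');
  }

  const integrity = body['integrityHash'];
  if (integrity !== undefined && (typeof integrity !== 'string' || !SRI_PATTERN.test(integrity))) {
    errors.push('integrityHash must look like sha384-<base64>');
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: body as unknown as RegisterVersionRequest };
}
```

In the handler, a failed result would map naturally to a 400 response that lists the errors, before any outbound HEAD request is made.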

Rollback

Rollback is one of the most critical operational capabilities. Because all previously deployed MFE bundles remain on R2/CDN, a rollback is simply a configuration change that points the version config back to a prior version's URL. No rebuild or redeployment is involved.

Rollback Flow

Figure: Rollback flow. The admin clicks Rollback; the Config Service updates D1 and writes KV; the KV update becomes visible at the edge; the next page load gets the old version. No rebuild is needed because old bundles remain on R2/CDN.

Rollback Handler

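// workers/version-config-service/src/handlers/activate-version.ts

import type { D1Database, KVNamespace } from '@cloudflare/workers-types';
// MfeVersionEntry is the interface from the Version Config Schema section;
// this import path is an assumption.
import type { MfeVersionEntry } from '../types/version-config';

// Row shape inferred from the columns this handler reads and writes.
interface VersionConfigRow {
  id: number;
  environment: string;
  mfe_name: string;
  version: string;
  entry_url: string;
  integrity_hash: string | null;
  is_active: number; // SQLite stores booleans as 0/1
  created_at: string;
  created_by: string;
  activated_at: string | null;
  activated_by: string | null;
}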

interface ActivateVersionRequest {
  mfeName: string;
  version: string;
  environment: string;
  activatedBy: string;
  isRollback?: boolean;
}

export async function handleActivateVersion(
  request: Request,
  env: { DB: D1Database; VERSION_KV: KVNamespace }
): Promise<Response> {
  const body: ActivateVersionRequest = await request.json();

  // 1. Verify the target version exists and its bundle is accessible
  const targetVersion = await env.DB.prepare(
    'SELECT * FROM version_configs WHERE environment = ? AND mfe_name = ? AND version = ?'
  )
    .bind(body.environment, body.mfeName, body.version)
    .first<VersionConfigRow>();

  if (!targetVersion) {
    return Response.json({ error: 'Version not found' }, { status: 404 });
  }

  const manifestCheck = await fetch(targetVersion.entry_url, { method: 'HEAD' });
  if (!manifestCheck.ok) {
    return Response.json(
      { error: `Bundle no longer accessible at ${targetVersion.entry_url}` },
      { status: 400 }
    );
  }

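  // Look up the version that is active right now so the audit event can
  // record what was replaced.
  const previousActive = await env.DB.prepare(
    `SELECT version FROM version_configs
     WHERE environment = ? AND mfe_name = ? AND is_active = true`
  )
    .bind(body.environment, body.mfeName)
    .first<{ version: string }>();

  // 2-4. Wrap deactivation, activation, and audit event in a D1 batch
  // transaction to prevent race conditions (e.g., two simultaneous
  // activations leaving multiple rows with is_active = true).
  const eventType = body.isRollback ? 'rollback' : 'activated';

  const stmtDeactivate = env.DB.prepare(
    `UPDATE version_configs SET is_active = false
     WHERE environment = ? AND mfe_name = ? AND is_active = true`
  ).bind(body.environment, body.mfeName);

  // strftime produces an ISO 8601 UTC timestamp, matching the updatedAt
  // format in the version config schema (datetime('now') omits the T and Z).
  const stmtActivate = env.DB.prepare(
    `UPDATE version_configs
     SET is_active = true, activated_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now'), activated_by = ?
     WHERE id = ?`
  ).bind(body.activatedBy, targetVersion.id);

  const stmtEvent = env.DB.prepare(
    `INSERT INTO deployment_events (environment, mfe_name, version, event_type, metadata, created_by)
     VALUES (?, ?, ?, ?, ?, ?)`
  ).bind(
    body.environment,
    body.mfeName,
    body.version,
    eventType,
    // previousVersion is the version being replaced, not the target version.
    JSON.stringify({ previousVersion: previousActive?.version ?? null }),
    body.activatedBy
  );

  // D1 batch() executes all statements in a single transaction.
  // If any statement fails, the entire batch is rolled back.
  await env.DB.batch([stmtDeactivate, stmtActivate, stmtEvent]);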

  // 5. Rebuild and write the aggregated version config to KV
  await syncVersionConfigToKV(env, body.environment);

  return Response.json({ status: eventType, version: body.version });
}

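/**
 * Reads all active versions for an environment from D1 and writes the
 * aggregated config to KV. Only active rows are read, so a canary config
 * that exists only in KV is overwritten by this sync; to survive it, canary
 * state would also have to be stored in D1 and merged here.
 */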
async function syncVersionConfigToKV(
  env: { DB: D1Database; VERSION_KV: KVNamespace },
  environment: string
): Promise<void> {
  const activeVersions = await env.DB.prepare(
    'SELECT * FROM version_configs WHERE environment = ? AND is_active = true'
  )
    .bind(environment)
    .all<VersionConfigRow>();

  const config: Record<string, MfeVersionEntry> = {};

  for (const row of activeVersions.results) {
    config[row.mfe_name] = {
      version: row.version,
      entry: row.entry_url,
      integrity: row.integrity_hash ?? undefined,
      updatedAt: row.activated_at ?? row.created_at,
      updatedBy: row.activated_by ?? row.created_by,
    };
  }

  await env.VERSION_KV.put(
    `version-config:${environment}`,
    JSON.stringify(config),
    { metadata: { updatedAt: new Date().toISOString() } }
  );
}

Rollback Considerations

Concern | Mitigation
--- | ---
KV propagation delay | For urgent rollbacks, also purge the Cloudflare cache on the version config endpoint using the Cache API. See Cache Invalidation Strategy.
Session continuity | Users with an active session keep running the MFE code they already loaded (in a rollback, the version being rolled back) until they refresh or navigate. The shell can detect version mismatches and show a soft prompt: "A new version is available. Click to refresh."
Shared state compatibility | If the new version changed the shape of persisted state (e.g., localStorage, IndexedDB), rolling back to the old version may encounter unexpected data. MFEs should use versioned storage keys or schema migrations.
CDN cache on MFE bundles | MFE bundles are served from versioned paths (/v2.3.1/), so they are effectively immutable. Rolling back does not require purging bundle caches because the old bundles are already cached under their own paths.
Bundle retention | Do not delete a bundle that may still be needed for rollback. If storage costs require pruning, apply a retention policy (e.g., keep the last 20 versions per MFE) and never prune a version that is active or in canary in any environment. Rollback is only possible to versions that are still retained.
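
The session-continuity mitigation depends on the shell noticing that the versions it loaded at bootstrap no longer match the active config. A minimal sketch of that comparison follows; findStaleMfes is a hypothetical helper, and it deliberately ignores canary assignments, so a canary user would need the canary version compared instead of the stable one.

```typescript
/** Maps MFE name to the version the shell loaded at bootstrap. */
type LoadedVersions = Record<string, string>;

/** The subset of the version config needed for the comparison. */
type ActiveVersions = Record<string, { version: string }>;

/**
 * Returns the names of MFEs whose loaded version differs from the version
 * that is currently active. MFEs missing from the latest config are skipped.
 */
function findStaleMfes(loaded: LoadedVersions, latest: ActiveVersions): string[] {
  return Object.keys(loaded)
    .filter((name) => {
      const active = latest[name];
      return active !== undefined && active.version !== loaded[name];
    })
    .sort();
}

const stale = findStaleMfes(
  { mfe_dashboard: '2.4.0', mfe_settings: '1.8.0' },
  { mfe_dashboard: { version: '2.3.1' }, mfe_settings: { version: '1.8.0' } }
);
console.log(stale); // mfe_dashboard was rolled back from 2.4.0 to 2.3.1
```

A shell could run this against a periodically refetched config and show the refresh prompt whenever the result is non-empty.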

KV/D1 Write Atomicity Gap

D1 (SQLite) and KV are independent storage systems. There is no distributed transaction spanning both, which means a write can succeed in one system and fail in the other. This section documents the failure modes and recommended mitigations.

Failure Scenarios

Scenario | Symptom | Impact
--- | --- | ---
D1 write succeeds, KV write fails | The audit trail and version_configs table reflect the new active version, but the shell continues loading the old version because KV still holds the previous config. | Users see stale MFE versions. The Admin UI shows the version as "active" even though it is not being served.
D1 write fails, KV write succeeds | Unlikely in practice because the code writes to D1 first and writes KV only afterwards. It can still occur across a retry: the first attempt commits to D1 and the Worker crashes before the KV write, then the retry's D1 write is rejected (e.g., by a UNIQUE constraint) while its KV write goes through. | KV may serve a config that does not match the D1 source of truth. A subsequent full sync from D1 to KV overwrites the stale KV value.
D1 batch succeeds, KV write fails (transient error or timeout) | The D1 batch transaction (deactivate + activate + event) commits atomically, but the subsequent KV write fails due to a transient KV error or Worker timeout. | Same as the first scenario: D1 is correct, KV is stale.

Mitigation Strategies

1. Retry with Idempotency

The KV write is inherently idempotent (a PUT with the same key and value is safe to repeat). If the KV write fails, the Config Service should retry it a bounded number of times before returning an error to the caller.

async function syncVersionConfigToKVWithRetry(
  env: { DB: D1Database; VERSION_KV: KVNamespace },
  environment: string,
  maxRetries = 3
): Promise<void> {
  const config = await buildVersionConfigFromD1(env, environment);
  const payload = JSON.stringify(config);

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await env.VERSION_KV.put(
        `version-config:${environment}`,
        payload,
        { metadata: { updatedAt: new Date().toISOString() } }
      );
      return; // Success
    } catch (error) {
      console.error(
        `[syncKV] Attempt ${attempt}/${maxRetries} failed for env=${environment}:`,
        error
      );
      if (attempt === maxRetries) throw error;
      // Brief delay before retry
      await new Promise((resolve) => setTimeout(resolve, 100 * attempt));
    }
  }
}
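
The retry function above calls buildVersionConfigFromD1, which is not shown in this document. It is presumably the read half of syncVersionConfigToKV. The following is a sketch under that assumption: the StatementLike and DatabaseLike interfaces are a hypothetical structural subset of the D1 types so the block stands alone, and the ORDER BY clause is added so that repeated builds serialize identically.

```typescript
interface VersionConfigRow {
  mfe_name: string;
  version: string;
  entry_url: string;
  integrity_hash: string | null;
  created_at: string;
  created_by: string;
  activated_at: string | null;
  activated_by: string | null;
}

interface MfeVersionEntry {
  version: string;
  entry: string;
  integrity?: string;
  updatedAt: string;
  updatedBy: string;
}

// Hypothetical minimal subset of the D1 prepared-statement API.
interface StatementLike {
  bind(...values: unknown[]): StatementLike;
  all<T>(): Promise<{ results: T[] }>;
}

interface DatabaseLike {
  prepare(query: string): StatementLike;
}

/** Reads the active rows for one environment and shapes them for KV. */
async function buildVersionConfigFromD1(
  env: { DB: DatabaseLike },
  environment: string
): Promise<Record<string, MfeVersionEntry>> {
  const active = await env.DB.prepare(
    // ORDER BY keeps key order stable so string comparisons are meaningful.
    'SELECT * FROM version_configs WHERE environment = ? AND is_active = true ORDER BY mfe_name'
  )
    .bind(environment)
    .all<VersionConfigRow>();

  const config: Record<string, MfeVersionEntry> = {};
  for (const row of active.results) {
    config[row.mfe_name] = {
      version: row.version,
      entry: row.entry_url,
      integrity: row.integrity_hash ?? undefined,
      updatedAt: row.activated_at ?? row.created_at,
      updatedBy: row.activated_by ?? row.created_by,
    };
  }
  return config;
}
```

Like the original sync function, this sketch reads only active rows and therefore carries no canary information.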

2. Periodic Reconciliation Job

A scheduled Worker (Cron Trigger) runs every few minutes and reconciles KV with D1. For each environment, it reads the active versions from D1, builds the expected KV value, and overwrites KV if the values differ.

// workers/version-config-service/src/scheduled.ts

export default {
  async scheduled(
    _event: ScheduledEvent,
    env: { DB: D1Database; VERSION_KV: KVNamespace }
  ): Promise<void> {
    for (const environment of ['dev', 'staging', 'production']) {
      const expected = await buildVersionConfigFromD1(env, environment);
      const current = await env.VERSION_KV.get(`version-config:${environment}`);

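      // This comparison is byte-for-byte, so buildVersionConfigFromD1 must
      // emit keys in a stable order to avoid false drift reports.
      if (JSON.stringify(expected) !== current) {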
        console.warn(
          `[Reconciliation] KV drift detected for env=${environment}. Resyncing.`
        );
        await env.VERSION_KV.put(
          `version-config:${environment}`,
          JSON.stringify(expected),
          { metadata: { updatedAt: new Date().toISOString(), reconciledBy: 'cron' } }
        );
      }
    }
  },
};

3. Response Indicates Partial Failure

If D1 succeeds but KV fails (even after retries), the activation endpoint should return a response that clearly indicates partial success so the caller can take corrective action.

// In handleActivateVersion, after the D1 batch succeeds:
try {
  await syncVersionConfigToKVWithRetry(env, body.environment);
} catch (kvError) {
  console.error('[Activate] KV sync failed after D1 commit:', kvError);
  return Response.json(
    {
      status: eventType,
      version: body.version,
      warning: 'D1 updated successfully but KV sync failed. '
        + 'The reconciliation job will correct this within minutes. '
        + 'You may also retry the activation.',
    },
    { status: 207 } // 207 Multi-Status
  );
}

Design Principle

D1 is the source of truth. KV is a derived, eventually-consistent cache. Any divergence should be treated as a KV staleness issue and resolved by re-deriving KV from D1. The reconciliation job provides the safety net that ensures divergence is always temporary and bounded.


Canary / Gradual Rollout

Percentage-Based Rollout

Figure: Canary rollout. On page load, a consistent hash of userId + mfeName yields a bucket. If the bucket is below the canary percentage, the user gets the canary version (e.g., v3.0.0-rc for 10% of traffic); otherwise the user gets the stable version (e.g., v2.3.1 for 90% of traffic).

Canary deployments allow a new MFE version to be tested with a fraction of real production traffic before full activation. The version config supports an optional canary field on any MFE entry.

Canary Config Schema

interface CanaryConfig {
  /** The canary version string */
  version: string;

  /** Entry URL for the canary mf-manifest.json */
  entry: string;

  /** Integrity hash for the canary manifest */
  integrity?: string;

  /** Percentage of users who should receive the canary (0-100) */
  percentage: number;

  /** When the canary was started */
  startedAt: string;

  /** Who initiated the canary */
  startedBy: string;
}

Admin UI Canary Workflow

  1. Admin navigates to the MFE's version list and selects a registered version.
  2. Instead of "Activate", clicks "Start Canary".
  3. Sets the initial traffic percentage (e.g., 5%).
  4. Config Service writes the canary config nested inside the MFE's entry in KV.
  5. Admin monitors error rates and performance in the dashboard.
  6. Admin increases percentage incrementally (5% -> 25% -> 50% -> 100%).
  7. At 100%, admin clicks "Promote" to fully activate and remove the canary config.
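
The workflow above amounts to three transitions on a single MFE entry: start a canary, change its percentage, and promote it. The sketch below models them as pure functions over the entry shape from the Version Config Schema. The function names are hypothetical, and how the Config Service persists the result is left out.

```typescript
interface CanaryEntry {
  version: string;
  entry: string;
  percentage: number;
  integrity?: string;
}

interface MfeVersionEntry {
  version: string;
  entry: string;
  integrity?: string;
  updatedAt: string;
  updatedBy: string;
  canary?: CanaryEntry;
}

function assertPercentage(percentage: number): void {
  if (!Number.isFinite(percentage) || percentage < 0 || percentage > 100) {
    throw new RangeError(`percentage must be between 0 and 100, got ${percentage}`);
  }
}

/** Steps 2-4: attach a canary to the stable entry at an initial percentage. */
function startCanary(
  stable: MfeVersionEntry,
  candidate: Omit<CanaryEntry, 'percentage'>,
  percentage: number
): MfeVersionEntry {
  assertPercentage(percentage);
  return { ...stable, canary: { ...candidate, percentage } };
}

/** Step 6: change the share of traffic routed to the canary. */
function setCanaryPercentage(current: MfeVersionEntry, percentage: number): MfeVersionEntry {
  if (!current.canary) throw new Error('no canary in progress');
  assertPercentage(percentage);
  return { ...current, canary: { ...current.canary, percentage } };
}

/** Step 7: the canary becomes the stable version and the canary field is removed. */
function promoteCanary(
  current: MfeVersionEntry,
  promotedBy: string,
  now: Date = new Date()
): MfeVersionEntry {
  if (!current.canary) throw new Error('no canary in progress');
  const { version, entry, integrity } = current.canary;
  const promoted: MfeVersionEntry = {
    version,
    entry,
    updatedAt: now.toISOString(),
    updatedBy: promotedBy,
  };
  if (integrity !== undefined) promoted.integrity = integrity;
  return promoted;
}
```

Keeping the transitions pure makes them easy to test separately from the D1 and KV writes that would store their results.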

Shell-Side Canary Logic

// src/canary.ts

/**
 * Resolves the effective entry URL for an MFE, accounting for canary routing.
 *
 * Uses consistent hashing so the same user always lands in the same bucket
 * for a given MFE. This prevents users from flipping between versions on
 * successive page loads.
 */
export function resolveCanaryEntry(
  mfeName: string,
  config: MfeVersionEntry,
  userId: string | null
): { entry: string; version: string; isCanary: boolean } {
  // No canary config or no user ID: always serve the stable version.
  // Anonymous users never get canary to avoid inconsistent experiences.
  if (!config.canary || !userId) {
    return { entry: config.entry, version: config.version, isCanary: false };
  }

  const bucket = consistentBucket(userId, mfeName);

  if (bucket < config.canary.percentage) {
    return {
      entry: config.canary.entry,
      version: config.canary.version,
      isCanary: true,
    };
  }

  return { entry: config.entry, version: config.version, isCanary: false };
}

/**
 * Produces a stable integer in [0, 100) for a given user + MFE pair.
 * Uses FNV-1a for speed and good distribution.
 */
function consistentBucket(userId: string, mfeName: string): number {
  const input = `${userId}:${mfeName}`;
  let hash = 0x811c9dc5; // FNV offset basis

  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }

  return ((hash >>> 0) % 100);
}
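
Comparing the bucket against a threshold has a useful property: raising the percentage only adds users to the canary and never moves an existing canary user back to stable. The sketch below repeats consistentBucket from above so it can run on its own and adds two hypothetical helpers, isInCanary and canaryShare, for checking that property and the observed share on synthetic user IDs.

```typescript
// Same FNV-1a bucketing as shown above, repeated so this block is self-contained.
function consistentBucket(userId: string, mfeName: string): number {
  const input = `${userId}:${mfeName}`;
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return (hash >>> 0) % 100;
}

/** True when the user's bucket falls below the canary percentage. */
function isInCanary(userId: string, mfeName: string, percentage: number): boolean {
  return consistentBucket(userId, mfeName) < percentage;
}

/** Fraction of the given users who would receive the canary. */
function canaryShare(userIds: readonly string[], mfeName: string, percentage: number): number {
  if (userIds.length === 0) return 0;
  const hits = userIds.filter((id) => isInCanary(id, mfeName, percentage)).length;
  return hits / userIds.length;
}

// Synthetic IDs for illustration only; real user IDs may distribute differently.
const sampleIds = Array.from({ length: 10000 }, (_, i) => `user-${i}`);
console.log('share at 10%:', canaryShare(sampleIds, 'mfe_analytics', 10));

// Ramp-up check: nobody in the 10% cohort is missing from the 25% cohort.
const dropped = sampleIds.filter(
  (id) => isInCanary(id, 'mfe_analytics', 10) && !isInCanary(id, 'mfe_analytics', 25)
);
console.log('users dropped when ramping 10% -> 25%:', dropped.length);
```

Because the MFE name is part of the hash input, a user's bucket for one MFE does not determine their bucket for another, so canaries on different MFEs reach different sets of users.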

Canary Observability

The shell reports the active version of each MFE to the observability stack so that error rates and performance can be segmented by version.

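// src/telemetry.ts

// The original snippet uses `analytics` without importing it. This declaration
// is a placeholder for the platform's telemetry client (an assumption).
declare const analytics: { setGlobalTag(key: string, value: string): void };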

export function reportMfeVersions(
  resolvedVersions: Record<string, { version: string; isCanary: boolean }>
): void {
  // Tag all subsequent telemetry with MFE versions
  for (const [mfeName, { version, isCanary }] of Object.entries(resolvedVersions)) {
    analytics.setGlobalTag(`mfe.${mfeName}.version`, version);
    analytics.setGlobalTag(`mfe.${mfeName}.canary`, String(isCanary));
  }
}

Environment-Based Promotion

In addition to canary rollouts within a single environment, the platform supports promoting version configs across environments. This provides a structured path from development to production.

Promotion Flow

[Figure: Environment promotion flow. dev (auto-activate on CI push) → staging (manual promote from dev) → production (manual activate or canary); each promotion step is manual.]

Stage | Activation Policy
dev | Auto-activated by CI on every push to main.
staging | Manually promoted from dev via Admin UI or API.
production | Manually activated or canary-deployed from staging.
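The promotion path above can be expressed as a small lookup that a handler might consult before accepting a request. This is only a sketch: the names `NEXT_ENVIRONMENT` and `isAllowedPromotion` are hypothetical, and it assumes that skipping an environment (for example dev straight to production) should be rejected, which the policy table implies but does not state outright.

```typescript
type Environment = 'dev' | 'staging' | 'production';

// Each environment may only promote to the next one in the chain (assumption).
const NEXT_ENVIRONMENT: Record<Environment, Environment | null> = {
  dev: 'staging',
  staging: 'production',
  production: null, // end of the chain
};

// Narrow an arbitrary string (e.g. from a request body) to a known environment.
function isEnvironment(value: string): value is Environment {
  return value === 'dev' || value === 'staging' || value === 'production';
}

// True only for dev -> staging and staging -> production.
function isAllowedPromotion(from: string, to: string): boolean {
  return isEnvironment(from) && isEnvironment(to) && NEXT_ENVIRONMENT[from] === to;
}

console.log(isAllowedPromotion('dev', 'staging'));    // true
console.log(isAllowedPromotion('dev', 'production')); // false
```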

Promotion Endpoint

// workers/version-config-service/src/handlers/promote-version.ts

interface PromoteRequest {
  mfeName: string;
  version: string;
  fromEnvironment: string;  // e.g., "staging"
  toEnvironment: string;    // e.g., "production"
  promotedBy: string;
}

export async function handlePromoteVersion(
  request: Request,
  env: { DB: D1Database; VERSION_KV: KVNamespace }
): Promise<Response> {
  const body: PromoteRequest = await request.json();

  // Verify the version is active in the source environment
  const sourceVersion = await env.DB.prepare(
    `SELECT * FROM version_configs
     WHERE environment = ? AND mfe_name = ? AND version = ? AND is_active = true`
  )
    .bind(body.fromEnvironment, body.mfeName, body.version)
    .first<VersionConfigRow>();

  if (!sourceVersion) {
    return Response.json(
      { error: `Version ${body.version} is not active in ${body.fromEnvironment}` },
      { status: 400 }
    );
  }

  // Check if this version is already registered in the target environment
  const existingInTarget = await env.DB.prepare(
    `SELECT id FROM version_configs
     WHERE environment = ? AND mfe_name = ? AND version = ?`
  )
    .bind(body.toEnvironment, body.mfeName, body.version)
    .first();

  if (!existingInTarget) {
    // Register the version in the target environment
    await env.DB.prepare(
      `INSERT INTO version_configs (environment, mfe_name, version, entry_url, integrity_hash, created_by)
       VALUES (?, ?, ?, ?, ?, ?)`
    )
      .bind(
        body.toEnvironment,
        body.mfeName,
        body.version,
        sourceVersion.entry_url,
        sourceVersion.integrity_hash,
        body.promotedBy
      )
      .run();
  }

  // Activate it in the target environment (reuses the activation handler logic)
  const activateRequest = new Request(request.url, {
    method: 'POST',
    headers: request.headers,
    body: JSON.stringify({
      mfeName: body.mfeName,
      version: body.version,
      environment: body.toEnvironment,
      activatedBy: body.promotedBy,
    }),
  });

  return handleActivateVersion(activateRequest, env);
}

Cache Invalidation Strategy

The version config sits behind multiple caching layers. Understanding propagation delays is essential for predictable operational behavior.

Caching Layers

Admin writes to Config Service
        │
        ▼
D1 (immediate consistency within the same colo)
        │
        ▼
KV write (propagation: ~60 seconds to all edge locations)
        │
        ▼
Cloudflare CDN cache on /api/v1/version-config endpoint
        │
        ▼
Browser HTTP cache (if Cache-Control headers allow)

Propagation Timeline

Layer | Typical Delay | Worst Case
D1 write | Immediate | < 100 ms
KV global propagation | ~60 seconds | Up to 60 seconds
CDN edge cache (if enabled) | Depends on TTL | Up to TTL duration
Browser cache | Depends on headers | Until expiration or manual refresh
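For planning purposes, the layers can be treated as stacking: a change is not visible to a given browser until KV has propagated, the CDN copy has expired, and the browser's own copy has expired. The sketch below adds those delays into a rough upper bound. It is a simplified model rather than a measurement; the function name is hypothetical, the 15-second and 30-second TTLs are illustrative inputs, and it ignores effects such as `stale-while-revalidate`.

```typescript
interface CacheLayerDelays {
  kvPropagationSeconds: number; // time for a KV write to reach all edge locations
  cdnTtlSeconds: number;        // how long the CDN may serve its cached response
  browserTtlSeconds: number;    // how long the browser may reuse its cached copy
}

// Rough upper bound, assuming each layer refreshes just before the change arrives upstream.
function worstCaseStalenessSeconds(delays: CacheLayerDelays): number {
  return delays.kvPropagationSeconds + delays.cdnTtlSeconds + delays.browserTtlSeconds;
}

const estimate = worstCaseStalenessSeconds({
  kvPropagationSeconds: 60,
  cdnTtlSeconds: 15,     // illustrative value
  browserTtlSeconds: 30, // illustrative value
});

console.log(estimate); // 105
```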

Mitigation Strategies

1. Short TTL on the Version Config Endpoint

The Version Config Service endpoint sets conservative cache headers to ensure freshness.

// workers/version-config-service/src/handlers/get-config.ts

export async function handleGetConfig(
  request: Request,
  env: { VERSION_KV: KVNamespace }
): Promise<Response> {
  const url = new URL(request.url);
  const environment = url.searchParams.get('env') ?? 'production';

  const config = await env.VERSION_KV.get(`version-config:${environment}`);

  if (!config) {
    return Response.json({}, { status: 200 });
  }

  return new Response(config, {
    headers: {
      'Content-Type': 'application/json',
      // Short TTL: browsers and CDN edge will re-validate frequently
      'Cache-Control': 'public, max-age=30, s-maxage=15, stale-while-revalidate=60',
      // ETag for conditional requests
      'ETag': `"${await hashContent(config)}"`,
    },
  });
}
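The handler above emits an `ETag` but does not show how a conditional request would be answered. One way to do that is to compare the incoming `If-None-Match` header against the current tag and return `304 Not Modified` on a match. The helper below is a minimal sketch under that assumption; `etagMatches` is a hypothetical name, and splitting on commas is a simplification that assumes tags do not themselves contain commas.

```typescript
// Returns true when the client's If-None-Match header matches the current ETag.
// If-None-Match uses weak comparison, so a leading W/ is ignored on both sides.
function etagMatches(ifNoneMatch: string | null, etag: string): boolean {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;

  const normalize = (tag: string): string => tag.trim().replace(/^W\//, '');
  const current = normalize(etag);

  return ifNoneMatch.split(',').some((candidate) => normalize(candidate) === current);
}

console.log(etagMatches('"abc123"', '"abc123"')); // true
console.log(etagMatches('"old"', '"abc123"'));    // false
```

In the handler, a match could short-circuit with `new Response(null, { status: 304, headers: { ETag: etag } })` before the body is serialized.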

2. Cache API Purge for Urgent Rollbacks

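When an urgent rollback is performed, the Config Service proactively purges the cached config response. Note that `cache.delete()` only evicts the entry from the data center where the Worker is running, so other edge locations keep serving their cached copy until it expires.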

// workers/version-config-service/src/cache.ts

export async function purgeVersionConfigCache(environment: string): Promise<void> {
  const cache = caches.default;
  const cacheKey = new Request(
    `https://config.example.com/api/v1/version-config?env=${environment}`
  );
  await cache.delete(cacheKey);
}
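To clear every edge location rather than only the local one, a zone-level purge through Cloudflare's REST API is one option. The sketch below only builds the request for the `purge_cache` endpoint; `buildPurgeRequest` is a hypothetical helper, and the zone ID and API token are placeholders that would come from Worker secrets in practice.

```typescript
interface PurgeRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds a purge-by-URL request for the given zone. Nothing is sent here.
function buildPurgeRequest(zoneId: string, apiToken: string, files: string[]): PurgeRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      // "files" lists the exact URLs whose cached copies should be dropped.
      body: JSON.stringify({ files }),
    },
  };
}

// Placeholder zone ID and token, for illustration only.
const purge = buildPurgeRequest('ZONE_ID', 'API_TOKEN', [
  'https://config.example.com/api/v1/version-config?env=production',
]);
console.log(purge.url);
```

The result can be passed straight to `fetch(purge.url, purge.init)`; checking the response status before reporting the rollback as complete would be a sensible addition.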

3. Shell-Side Polling

For long-lived sessions, the shell polls for config updates on a timer so users eventually receive the latest versions without a full page reload.

// src/version-poller.ts

const POLL_INTERVAL_MS = 5 * 60 * 1000; // 5 minutes

export function startVersionPoller(
  currentConfig: VersionConfig,
  onVersionChange: (newConfig: VersionConfig) => void
): () => void {
  // Track the last config we notified about so that each change fires only once.
  let knownConfig = currentConfig;

  const intervalId = setInterval(async () => {
    try {
      const latestConfig = await fetchVersionConfig();

      const hasChanges = Object.entries(latestConfig).some(
        ([name, entry]) => knownConfig[name]?.version !== entry.version
      );

      if (hasChanges) {
        knownConfig = latestConfig;
        onVersionChange(latestConfig);
      }
    } catch (error) {
      // Log and otherwise ignore polling errors; the user is still running a valid version.
      console.warn('[VersionPoller] Failed to check for updates:', error);
    }
  }, POLL_INTERVAL_MS);

  return () => clearInterval(intervalId);
}
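The poller reports that something changed, but a notification usually needs to know which MFEs changed. A small diff helper could supply that list. The sketch below assumes `VersionConfig` maps an MFE name to an object with a `version` field (other fields omitted); `changedMfes` is a hypothetical name. Unlike the check in the poller, it also reports MFEs that were added or removed.

```typescript
// Assumed shape: MFE name -> active version entry (other fields omitted).
type VersionConfig = Record<string, { version: string }>;

// Returns the sorted names of MFEs whose version differs between the two configs,
// including MFEs present in only one of them.
function changedMfes(current: VersionConfig, latest: VersionConfig): string[] {
  const names = new Set<string>([...Object.keys(current), ...Object.keys(latest)]);
  return Array.from(names)
    .filter((name) => current[name]?.version !== latest[name]?.version)
    .sort();
}

const updated = changedMfes(
  { mfe_dashboard: { version: '2.3.0' }, mfe_settings: { version: '1.4.0' } },
  {
    mfe_dashboard: { version: '2.3.1' },
    mfe_settings: { version: '1.4.0' },
    mfe_reports: { version: '0.1.0' },
  }
);

console.log(updated); // ['mfe_dashboard', 'mfe_reports']
```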

4. WebSocket Push for Immediate Notification

For scenarios where even a 5-minute delay is unacceptable (e.g., critical security patches), the shell can maintain a WebSocket connection to a Durable Object that broadcasts version change events.

// src/version-websocket.ts

export function connectVersionWebSocket(
  onVersionChange: (newConfig: VersionConfig) => void
): WebSocket {
  const ws = new WebSocket('wss://config.example.com/ws/version-updates');

  ws.addEventListener('message', (event) => {
    try {
      const message = JSON.parse(event.data);
      if (message.type === 'version-changed') {
        onVersionChange(message.config);
      }
    } catch {
      // Ignore malformed messages
    }
  });

  ws.addEventListener('close', () => {
    // Reconnect after a fixed 5-second delay (no backoff is applied here)
    setTimeout(() => connectVersionWebSocket(onVersionChange), 5000);
  });

  return ws;
}
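If many shells lose their connection at once, reconnecting on a fixed timer can produce a burst of simultaneous retries. A common alternative is exponential backoff with jitter. The helper below is a sketch of that idea, not the shell's actual reconnect logic; the name `backoffDelayMs`, the 1-second base, and the 60-second cap are assumptions. The random source is injectable so the behavior can be tested deterministically.

```typescript
// Computes the delay before reconnect attempt N (0-based).
// The delay doubles per attempt, is capped at maxMs, and is then spread
// uniformly over the upper half of that window ("equal jitter").
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 60_000,
  random: () => number = Math.random
): number {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.round(exponential / 2 + random() * (exponential / 2));
}

// With jitter pinned to its maximum, the delays follow the doubling sequence.
const noJitter = () => 1;
console.log(backoffDelayMs(0, 1000, 60_000, noJitter)); // 1000
console.log(backoffDelayMs(3, 1000, 60_000, noJitter)); // 8000
```

A reconnect loop would track the attempt count, pass it to this helper inside the `close` handler, and reset it to zero once the socket's `open` event fires.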

5. Soft Reload Prompt

When the shell detects a version change (via polling or WebSocket), it does not force a reload. Instead, it shows a non-intrusive notification.

// src/components/VersionUpdateBanner.tsx

import { useState } from 'react';

interface VersionUpdateBannerProps {
  updatedMfes: string[];
}

export function VersionUpdateBanner({ updatedMfes }: VersionUpdateBannerProps) {
  const [dismissed, setDismissed] = useState(false);

  if (dismissed) return null;

  return (
    <div role="status" className="version-update-banner">
      <p>
        Updated versions available for: {updatedMfes.join(', ')}.
      </p>
      <button onClick={() => window.location.reload()}>
        Refresh now
      </button>
      <button onClick={() => setDismissed(true)}>
        Dismiss
      </button>
    </div>
  );
}

Admin UI Features

The Admin UI is a dedicated MFE (or a standalone application) that provides operational control over MFE version management. It communicates with the Version Config Service via authenticated API calls.

Dashboard

  • Displays the currently active version of each MFE, grouped by environment.
  • Shows a health indicator (green/yellow/red) based on the last health check result.
  • Highlights MFEs with active canary deployments and their current rollout percentage.

Version History with Audit Trail

  • Full chronological log of every version event: registration, activation, deactivation, rollback.
  • Each entry shows: timestamp, version, event type, actor (who), and optional metadata.
  • Filterable by MFE name, environment, event type, and date range.
  • Data sourced from the deployment_events table in D1.

One-Click Rollback

  • From the version history view, each previously active version has a "Rollback to this version" button.
  • Before executing the rollback, the system performs a pre-flight health check to confirm the bundle is still accessible on CDN.
  • A confirmation dialog shows the current version and the target version side by side.
  • After rollback, the event is logged and appears immediately in the audit trail.

Canary Configuration

  • "Start Canary" button on any registered (inactive) version.
  • Slider or input for setting the traffic percentage.
  • Real-time metrics panel showing error rates for the canary vs. stable version.
  • "Promote" button to graduate the canary to full activation.
  • "Abort Canary" button to immediately revert all traffic to the stable version.

Health Checks

Before activating any version, the system performs automated health checks.

// workers/version-config-service/src/health-check.ts

interface HealthCheckResult {
  mfeName: string;
  version: string;
  manifestAccessible: boolean;
  manifestValid: boolean;
  exposedModulesAccessible: boolean;
  responseTimeMs: number;
  checkedAt: string;
}

export async function performHealthCheck(
  entryUrl: string,
  mfeName: string,
  version: string
): Promise<HealthCheckResult> {
  const start = Date.now();
  const result: HealthCheckResult = {
    mfeName,
    version,
    manifestAccessible: false,
    manifestValid: false,
    exposedModulesAccessible: false,
    responseTimeMs: 0,
    checkedAt: new Date().toISOString(),
  };

  try {
    // Check that the manifest is accessible
    const manifestResponse = await fetch(entryUrl);
    result.manifestAccessible = manifestResponse.ok;

    if (!manifestResponse.ok) return result;

    // Validate manifest structure
    const manifest = await manifestResponse.json();
    result.manifestValid =
      manifest != null &&
      typeof manifest === 'object' &&
      'exposes' in manifest;

    if (!result.manifestValid) return result;

    // Spot-check that at least one exposed module's entry chunk is accessible
    const firstExpose = Object.values(manifest.exposes ?? {})[0] as
      | { path: string }
      | undefined;

    if (firstExpose?.path) {
      const baseUrl = entryUrl.replace(/\/[^/]+$/, '/');
      const chunkResponse = await fetch(`${baseUrl}${firstExpose.path}`, {
        method: 'HEAD',
      });
      result.exposedModulesAccessible = chunkResponse.ok;
    }
  } catch {
    // Leave all checks as false
  } finally {
    result.responseTimeMs = Date.now() - start;
  }

  return result;
}
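The result object can be reduced to the green/yellow/red indicator shown on the dashboard. The mapping below is only one plausible interpretation: the function name `summarizeHealth`, the rule that any failed check is red, and the 2-second slowness threshold are all assumptions rather than documented behavior.

```typescript
interface HealthCheckResult {
  mfeName: string;
  version: string;
  manifestAccessible: boolean;
  manifestValid: boolean;
  exposedModulesAccessible: boolean;
  responseTimeMs: number;
  checkedAt: string;
}

type HealthStatus = 'green' | 'yellow' | 'red';

// Assumed policy: any failed check is red; all checks passing but slow is yellow.
function summarizeHealth(result: HealthCheckResult, slowThresholdMs = 2000): HealthStatus {
  const allPassed =
    result.manifestAccessible && result.manifestValid && result.exposedModulesAccessible;

  if (!allPassed) return 'red';
  return result.responseTimeMs > slowThresholdMs ? 'yellow' : 'green';
}

// Hypothetical result, for illustration only.
const sample: HealthCheckResult = {
  mfeName: 'mfe_dashboard',
  version: '2.3.1',
  manifestAccessible: true,
  manifestValid: true,
  exposedModulesAccessible: true,
  responseTimeMs: 340,
  checkedAt: new Date().toISOString(),
};

console.log(summarizeHealth(sample)); // green
```

Under this policy an activation or rollback would proceed only when the status is not red.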

Role-Based Access Control (RBAC)

Access to version management operations is governed by roles managed through WorkOS.

Role | Permissions
viewer | View dashboard, version history, and health status
developer | All viewer permissions + activate versions in dev environment
release-manager | All developer permissions + activate/rollback in staging and production, configure canary deployments
admin | All permissions + manage RBAC roles, configure retention policies

// workers/version-config-service/src/middleware/auth.ts

import { WorkOS } from '@workos-inc/node';

const workos = new WorkOS(process.env.WORKOS_API_KEY);

type Permission = 'version:read' | 'version:activate:dev' | 'version:activate:staging' | 'version:activate:production' | 'canary:manage';

const ROLE_PERMISSIONS: Record<string, Permission[]> = {
  viewer: ['version:read'],
  developer: ['version:read', 'version:activate:dev'],
  'release-manager': [
    'version:read',
    'version:activate:dev',
    'version:activate:staging',
    'version:activate:production',
    'canary:manage',
  ],
  admin: [
    'version:read',
    'version:activate:dev',
    'version:activate:staging',
    'version:activate:production',
    'canary:manage',
  ],
};

export function requirePermission(permission: Permission) {
  return async (request: Request): Promise<Response | null> => {
    const sessionToken = request.headers.get('Authorization')?.replace('Bearer ', '');
    if (!sessionToken) {
      return Response.json({ error: 'Unauthorized' }, { status: 401 });
    }

    try {
      const session = await workos.userManagement.authenticateWithSessionToken({
        sessionToken,
      });

      const userRoles: string[] = session.organizationMemberships?.map(
        (m) => m.role?.slug ?? 'viewer'
      ) ?? [];

      const hasPermission = userRoles.some((role) =>
        ROLE_PERMISSIONS[role]?.includes(permission)
      );

      if (!hasPermission) {
        return Response.json({ error: 'Forbidden' }, { status: 403 });
      }

      return null; // null means "authorized, continue"
    } catch {
      return Response.json({ error: 'Invalid session' }, { status: 401 });
    }
  };
}
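Since activation permissions are scoped per environment, a handler needs to derive the required permission from the target environment before checking it. The sketch below shows that lookup in isolation, without the WorkOS session step. The helper names `lookup` and `canActivate` are hypothetical, and the role table is a trimmed copy of the one above. Own-property checks are used so that role or environment strings such as "constructor" cannot match inherited object properties.

```typescript
type Permission =
  | 'version:read'
  | 'version:activate:dev'
  | 'version:activate:staging'
  | 'version:activate:production'
  | 'canary:manage';

// Trimmed copy of the role table, repeated so the sketch is self-contained.
const ROLE_PERMISSIONS: Record<string, Permission[]> = {
  viewer: ['version:read'],
  developer: ['version:read', 'version:activate:dev'],
  'release-manager': [
    'version:read',
    'version:activate:dev',
    'version:activate:staging',
    'version:activate:production',
    'canary:manage',
  ],
};

const ACTIVATION_PERMISSIONS: Record<string, Permission> = {
  dev: 'version:activate:dev',
  staging: 'version:activate:staging',
  production: 'version:activate:production',
};

// Own-property lookup that ignores keys inherited from Object.prototype.
function lookup<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// True when at least one of the user's roles grants activation in the environment.
function canActivate(roles: string[], environment: string): boolean {
  const required = lookup(ACTIVATION_PERMISSIONS, environment);
  if (!required) return false;
  return roles.some((role) => lookup(ROLE_PERMISSIONS, role)?.includes(required) ?? false);
}

console.log(canActivate(['developer'], 'dev'));        // true
console.log(canActivate(['developer'], 'production')); // false
```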

D1 Schema

The D1 database provides persistent storage for version registrations and a complete audit trail of all deployment events.

Tables

-- Stores every version that has been registered for each MFE/environment pair.
-- Only one row per (environment, mfe_name) should have is_active = true at any time.
CREATE TABLE version_configs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  environment TEXT NOT NULL,       -- 'dev', 'staging', 'production'
  mfe_name TEXT NOT NULL,          -- e.g., 'mfe_dashboard'
  version TEXT NOT NULL,           -- semver, e.g., '2.3.1' or '2.3.1+abc1234'
  entry_url TEXT NOT NULL,         -- full URL to mf-manifest.json
  integrity_hash TEXT,             -- SRI hash (e.g., 'sha384-...')
  is_active BOOLEAN DEFAULT false, -- only one active version per (env, mfe_name)
  activated_at DATETIME,           -- when this version was activated
  activated_by TEXT,               -- who activated it
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  created_by TEXT NOT NULL         -- who registered it (usually CI)
);

-- Prevent duplicate version registrations at the database level.
-- The application also checks for duplicates before inserting, but this
-- constraint provides a hard guarantee against race conditions.
CREATE UNIQUE INDEX uq_version_configs_env_mfe_version
  ON version_configs (environment, mfe_name, version);

-- Indexes for common query patterns
CREATE INDEX idx_version_configs_active
  ON version_configs (environment, mfe_name, is_active)
  WHERE is_active = true;

CREATE INDEX idx_version_configs_lookup
  ON version_configs (environment, mfe_name, version);

CREATE INDEX idx_version_configs_history
  ON version_configs (environment, mfe_name, created_at DESC);

-- Records every state transition for full audit trail.
CREATE TABLE deployment_events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  environment TEXT NOT NULL,       -- 'dev', 'staging', 'production'
  mfe_name TEXT NOT NULL,          -- e.g., 'mfe_dashboard'
  version TEXT NOT NULL,           -- the version this event pertains to
  event_type TEXT NOT NULL,        -- 'registered', 'activated', 'deactivated', 'rollback'
  metadata TEXT,                   -- JSON blob for additional context
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
  created_by TEXT NOT NULL         -- who triggered the event
);

-- Index for querying events by MFE and environment
CREATE INDEX idx_deployment_events_lookup
  ON deployment_events (environment, mfe_name, created_at DESC);

-- Index for filtering by event type (useful for audit queries)
CREATE INDEX idx_deployment_events_type
  ON deployment_events (event_type, created_at DESC);

Query Examples

-- Get the currently active version for all MFEs in production
SELECT mfe_name, version, entry_url, activated_at, activated_by
FROM version_configs
WHERE environment = 'production' AND is_active = true;

-- Get version history for a specific MFE (most recent first)
SELECT version, is_active, activated_at, activated_by, created_at, created_by
FROM version_configs
WHERE environment = 'production' AND mfe_name = 'mfe_dashboard'
ORDER BY created_at DESC
LIMIT 20;

-- Get recent deployment events for audit
SELECT de.environment, de.mfe_name, de.version, de.event_type,
       de.metadata, de.created_at, de.created_by
FROM deployment_events de
WHERE de.environment = 'production'
ORDER BY de.created_at DESC
LIMIT 50;

-- Count deployments per MFE in the last 30 days
SELECT mfe_name, COUNT(*) as deploy_count
FROM deployment_events
WHERE environment = 'production'
  AND event_type = 'activated'
  AND created_at >= datetime('now', '-30 days')
GROUP BY mfe_name
ORDER BY deploy_count DESC;
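The audit view's filters (MFE name, environment, event type, date range) map naturally onto a parameterized query against `deployment_events`. The builder below is a sketch of how such a query might be assembled with bound parameters instead of string interpolation; `AuditFilters` and `buildAuditQuery` are hypothetical names, and the default limit of 50 mirrors the audit example above.

```typescript
interface AuditFilters {
  environment: string;
  mfeName?: string;
  eventType?: string;
  since?: string; // lower bound for created_at, in 'YYYY-MM-DD HH:MM:SS' form
  limit?: number;
}

// Builds the SQL text and the positional parameters to bind, in matching order.
function buildAuditQuery(filters: AuditFilters): { sql: string; params: (string | number)[] } {
  const clauses: string[] = ['environment = ?'];
  const params: (string | number)[] = [filters.environment];

  if (filters.mfeName) {
    clauses.push('mfe_name = ?');
    params.push(filters.mfeName);
  }
  if (filters.eventType) {
    clauses.push('event_type = ?');
    params.push(filters.eventType);
  }
  if (filters.since) {
    clauses.push('created_at >= ?');
    params.push(filters.since);
  }
  params.push(filters.limit ?? 50);

  const sql = [
    'SELECT environment, mfe_name, version, event_type, metadata, created_at, created_by',
    'FROM deployment_events',
    `WHERE ${clauses.join(' AND ')}`,
    'ORDER BY created_at DESC',
    'LIMIT ?',
  ].join('\n');

  return { sql, params };
}

const query = buildAuditQuery({ environment: 'production', eventType: 'rollback' });
console.log(query.sql);
console.log(query.params); // ['production', 'rollback', 50]
```

In a Worker the result would be executed with `env.DB.prepare(query.sql).bind(...query.params).all()`. Because `CURRENT_TIMESTAMP` stores text in `YYYY-MM-DD HH:MM:SS` form, the `since` value should use the same format for the string comparison to order correctly.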

References