ADR 005: WorkOS AuthKit for Authentication

Status: Accepted
Date: 2025-03-15

Context

We need authentication for a multi-tenant SaaS platform with enterprise SSO requirements. The solution must work with Cloudflare Workers (edge runtime, no Node.js APIs), support React SPAs, and handle organization-based multi-tenancy.

Decision

Use WorkOS AuthKit with its hosted OAuth flow and React SDK (@workos-inc/authkit-react@^1.3.0).

WorkOS AuthKit at a glance:

  Positive: Enterprise SSO (SAML/OIDC), 1M free MAU, JWT edge validation, hosted login UI
  Negative: third-party dependency, limited theming, vendor lock-in

Consequences

Positive

  • Enterprise SSO out of the box (SAML, OIDC) — critical for B2B customers
  • Organization-based multi-tenancy native to WorkOS
  • Generous free tier: 1M monthly active users
  • React SDK (@workos-inc/authkit-react@^1.3.0) with AuthKitProvider, useAuth hook
  • JWT-based tokens: validate on edge with jose@^6.1.3 library (no Node.js crypto needed)
  • Directory sync for user provisioning (SCIM)
  • Hosted login UI (AuthKit) — no need to build login/signup/password reset flows
  • RBAC with roles and permissions

Negative

  • Third-party dependency for critical auth infrastructure
  • Less customizable than building auth from scratch
  • AuthKit hosted UI has limited theming options
  • Vendor lock-in: switching auth providers requires significant migration
  • WorkOS SDK does not have a native Cloudflare Workers package, so JWT validation is done manually with jose@^6.1.3

Fallback Strategy (WorkOS Unavailability)

If WorkOS is temporarily unavailable (outage, network issues, rate limiting), the following fallback behavior applies:

Cached Sessions Continue

  • Existing authenticated sessions rely on locally validated JWTs using jose@^6.1.3. Token validation is performed entirely at the edge using the cached JWKS (JSON Web Key Set) and does not require a round-trip to WorkOS.
  • Sessions remain valid until the JWT expires (based on the exp claim). Users with active, unexpired tokens experience no disruption.
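  • The JWKS endpoint response is cached with a TTL (recommended: 1 hour). If WorkOS is down but the cached JWKS has not expired, token validation continues to work for every request that carries an unexpired token. If the cached JWKS expires while WorkOS is still down, validation cannot proceed until the key set can be fetched again.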
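
The TTL behavior described above can be sketched as a small cache wrapped around whatever function loads the key set. This is a minimal illustration, not the project's implementation: `createJwksCache`, the `Jwks` shape, and the injected `load` and `now` parameters are hypothetical names chosen so the sketch runs without network access. In practice, jose's `createRemoteJWKSet` accepts a `cacheMaxAge` option that may cover the same need.

```typescript
// Hypothetical, simplified shape of a JWKS document; real keys carry more fields.
interface Jwks {
  keys: Array<{ kid: string; kty: string }>;
}

const ONE_HOUR_MS = 60 * 60 * 1000; // TTL recommended in this ADR

// Caches the result of `load` for `ttlMs` milliseconds.
// `load` and `now` are injected (assumption) so the cache can be exercised
// without a network; in a Worker, `load` would fetch the JWKS endpoint.
function createJwksCache(
  load: () => Promise<Jwks>,
  ttlMs: number = ONE_HOUR_MS,
  now: () => number = Date.now,
): () => Promise<Jwks> {
  let cached: { value: Jwks; expiresAt: number } | undefined;

  return async () => {
    if (cached !== undefined && now() < cached.expiresAt) {
      return cached.value; // still fresh: no round-trip to the provider
    }
    const value = await load(); // rejects if the provider is unreachable
    cached = { value, expiresAt: now() + ttlMs };
    return value;
  };
}
```

With a cache like this, an outage shorter than the remaining TTL is invisible to token validation; a longer outage surfaces as a rejected promise on the first lookup after expiry, which the caller would need to handle.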

New Logins Fail Gracefully

  • New login attempts that require the WorkOS hosted OAuth flow will fail, since the AuthKit redirect depends on WorkOS availability.
  • The application must present a user-friendly error page explaining that login is temporarily unavailable. Avoid exposing raw error responses.
  • A clear "Try again" action should be provided so users can retry without navigating away.
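
One way to satisfy the points above is to build the error page in a single helper that never echoes upstream error details. The sketch below is an assumption-laden illustration: `loginUnavailablePage`, `PageResponse`, the 503 status, and the `Retry-After` value are choices made for the example and are not prescribed by this ADR.

```typescript
// Plain description of an HTTP response (hypothetical type for this sketch).
// In a Worker it could be turned into `new Response(page.body, { status: page.status, headers: page.headers })`.
interface PageResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Escapes text before embedding it in HTML content or attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds a user-facing page for failed logins. `retryPath` is the URL that
// restarts the login flow; no upstream error text is included in the output.
function loginUnavailablePage(retryPath: string, retryAfterSeconds = 30): PageResponse {
  const body = `<!doctype html>
<title>Login temporarily unavailable</title>
<h1>Login is temporarily unavailable</h1>
<p>Our sign-in provider is not responding right now. Please try again shortly.</p>
<p><a href="${escapeHtml(retryPath)}">Try again</a></p>`;

  return {
    status: 503,
    headers: {
      "Content-Type": "text/html; charset=utf-8",
      "Retry-After": String(retryAfterSeconds),
      "Cache-Control": "no-store", // avoid caching the outage page
    },
    body,
  };
}
```

The "Try again" link points back at the login entry point, so users can retry without leaving the page they landed on.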

Retry with Exponential Backoff

  • All API calls to WorkOS (token exchange, user info, organization lookups) must implement retry logic with exponential backoff.
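  • Recommended retry schedule: initial delay 1s, backoff factor 2x, maximum 5 retries, maximum delay 30s. With these values the nominal delays are 1s, 2s, 4s, 8s, and 16s, so the 30s cap only takes effect if the retry count or factor is raised. Add jitter (randomized +/- 20%) to each delay to avoid a thundering herd.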
  • After exhausting retries, fail open for read operations (serve cached data) and fail closed for write operations (deny the action and notify the user).
  • Circuit breaker pattern: after a configurable number of consecutive failures (e.g., 10), stop calling WorkOS for a cooldown period (e.g., 60s) before probing again. This protects against cascading failures.
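
The retry schedule and circuit breaker above could look roughly like the following. This is a sketch under stated assumptions: `backoffDelayMs`, `withRetry`, and `createCircuitBreaker` are hypothetical helpers, the `sleep`, `random`, and `now` parameters are injected only to make the behavior deterministic in tests, and the breaker reopens on a failed probe as one possible interpretation of "probing again".

```typescript
const INITIAL_DELAY_MS = 1_000;
const BACKOFF_FACTOR = 2;
const MAX_RETRIES = 5;
const MAX_DELAY_MS = 30_000;
const JITTER = 0.2; // +/- 20%

// Delay before retry number `retry` (0-based). `random` returns a value in [0, 1).
function backoffDelayMs(retry: number, random: () => number = Math.random): number {
  const base = INITIAL_DELAY_MS * BACKOFF_FACTOR ** retry;
  const jitterFactor = 1 + (random() * 2 - 1) * JITTER; // in [0.8, 1.2)
  return Math.min(Math.round(base * jitterFactor), MAX_DELAY_MS); // cap applied last
}

// Runs `call`, retrying up to MAX_RETRIES times with exponential backoff.
async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  random: () => number = Math.random,
): Promise<T> {
  for (let retry = 0; ; retry += 1) {
    try {
      return await call();
    } catch (err) {
      if (retry >= MAX_RETRIES) throw err; // retries exhausted: surface the error
      await sleep(backoffDelayMs(retry, random));
    }
  }
}

// Opens after `failureThreshold` consecutive failures and allows a probe
// once `cooldownMs` has elapsed since the breaker last opened.
function createCircuitBreaker(
  failureThreshold = 10,
  cooldownMs = 60_000,
  now: () => number = Date.now,
) {
  let consecutiveFailures = 0;
  let openedAt: number | undefined;

  return {
    canCall(): boolean {
      return openedAt === undefined || now() - openedAt >= cooldownMs;
    },
    recordSuccess(): void {
      consecutiveFailures = 0;
      openedAt = undefined; // close the breaker
    },
    recordFailure(): void {
      consecutiveFailures += 1;
      if (consecutiveFailures >= failureThreshold) {
        openedAt = now(); // open, or restart the cooldown after a failed probe
      }
    },
  };
}
```

A caller would check `canCall()` before invoking `withRetry`, then report the outcome with `recordSuccess()` or `recordFailure()`. Whether one exhausted retry sequence counts as a single failure or several is a design choice this ADR leaves open.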

Monitoring and Alerting

  • Health checks against WorkOS API endpoints should run on a periodic schedule. Alert the on-call team if WorkOS is unreachable for more than 5 minutes.
  • Log all WorkOS API failures with correlation IDs for post-incident analysis.
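
The 5-minute alert rule and correlation-ID logging can be sketched as a small tracker fed by periodic probes. Everything named here is hypothetical: `createOutageTracker`, the `ProbeResult` shape, the `workos_probe_failed` event name, and the injected `log` function are illustrative, and how probes are scheduled and how alerts reach the on-call team are left out.

```typescript
const ALERT_AFTER_MS = 5 * 60 * 1000; // threshold from this ADR

// Outcome of one health probe (hypothetical shape for this sketch).
interface ProbeResult {
  ok: boolean;
  at: number; // probe time in epoch milliseconds
  correlationId: string;
}

// Tracks consecutive failed probes and signals when an outage has lasted
// longer than `alertAfterMs`. `log` is injected so failures can be routed
// to whatever structured logger the platform uses.
function createOutageTracker(
  alertAfterMs: number = ALERT_AFTER_MS,
  log: (line: string) => void = (line) => console.error(line),
) {
  let downSince: number | undefined;
  let alerted = false;

  return {
    // Returns true at most once per outage: on the first failed probe
    // observed more than `alertAfterMs` after the outage began.
    record(probe: ProbeResult): boolean {
      if (probe.ok) {
        downSince = undefined;
        alerted = false;
        return false;
      }
      // Log every failure with its correlation ID for post-incident analysis.
      log(JSON.stringify({
        event: "workos_probe_failed",
        correlationId: probe.correlationId,
        at: probe.at,
      }));
      if (downSince === undefined) downSince = probe.at;
      if (!alerted && probe.at - downSince > alertAfterMs) {
        alerted = true;
        return true;
      }
      return false;
    },
  };
}
```

Returning a boolean keeps the tracker free of side effects beyond logging, so the caller decides how to page the on-call team.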

Alternatives Considered

Auth0

Mature and feature-rich but expensive at scale (pricing per MAU increases steeply). Enterprise SSO is an add-on with significant cost. More complex setup than WorkOS for B2B use cases.

Clerk

Excellent DX and React integration, beautiful pre-built UI components. But: less enterprise SSO focus, no SCIM/directory sync, higher cost for enterprise features. Better suited for B2C.

Custom JWT implementation

Maximum control and no vendor dependency. But: requires building login flows, session management, token refresh, JWKS endpoints, MFA, SSO integrations — enormous effort and security risk. Not justified for this team size.

Supabase Auth

Good integration with Supabase ecosystem but doesn't fit our Cloudflare-first architecture. Limited enterprise SSO support. Better for Supabase-native projects.