Deployment topology & sizing

How the Bedrock components combine into a deployment, how coverage cells map routers to geography, and what actually drives scaling — including a clear account of what needs HA (the Directory, and the web recorders that durably archive position and voice) versus what is relay infrastructure (the routers).

Components

A deployment is built from a handful of process types. Only the router scales horizontally; the Directory is a singleton; the remote-api bridge exists only where there's a web client.

| Component | Count | Scales? | Notes |
| --- | --- | --- | --- |
| Directory | 1 | No — singleton | The identity authority: mints tokens, signs the revocation list, holds the trust root. Single point of failure — there is no documented replication, standby, or HA. If it is down, new logins and token issuance stop; already-cached tokens on routers and clients keep working. |
| Server / router | 1+ | Yes — horizontal | Relays cell-scoped traffic. Redundancy comes from running federated peers, not from making one box highly available (see What needs HA). Carries a set of coverage cells and keeps per-router local state in RocksDB (chat with a 24 h TTL; drawings and channel definitions indefinitely), which is not replicated across routers. |
| Gateway | 0+ | Optional | Stateless interop bridge. Connects to one or more routers as a Directory-authenticated client (service IdentityToken) and subscribes. Not on the critical path — the mesh keeps working if a gateway is down. |
| Web / Android / Node | many | — | Clients, each pointed at a static router endpoint (no load balancer or service discovery). Android/Node speak the router's native transport (`quic`/`tls`); browsers can't — they reach the mesh only through a remote-api bridge (see below). The web tier also runs server-side recorders that durably archive to its database — the position recorder (`web:app/domains/tracks/position_zenoh_recorder.ts`) and the voice recorder (`VoiceZenohRecorder`); these are the one durable record of positions and of voice respectively (both are non-durable in the mesh transport; clients show only a ~10-minute live position trail and hear voice live only). |
| Feed ingestion (NiFi + HTTP seam) | 0+ | Optional | Two ways external producers push data in, neither on the core critical path. NiFi writes AIS/ADS-B/GeoJSON rows into `track_hits` directly via the least-privilege `nifi_ingest` Postgres role. Separately, the HTTP force-tracking ingest seam (`POST /api/feed/*`, HS256 JWT) lets external force-tracking producers push detections/targets into the web tier (see Machine Ingest API). Both are optional — a deployment with no external feeds runs without either. |
| Zenoh remote-api bridge | 1 per web | — | Browser-only. `@eclipse-zenoh/zenoh-ts` speaks the `zenoh-plugin-remote-api` WebSocket, not native zenoh transport, so the web VM runs a `zenoh-bridge-remote-api` sidecar: it joins the mesh as a client over native `tls/7447` and Caddy fronts its WebSocket as `wss` (port `7448`). One per deployment (mirrors the single web); it attaches to one or a few routers and receives the whole federated mesh via gossip — the N routers stay symmetric. Stateless relay of sealed bytes; not on the durability path. |

What needs HA

Durability is concentrated in two places; everything else is relay or client-owned. That is the whole reason "make the routers highly available" is not the goal.

Directory — the one true SPOF. It is the identity authority (token signing, revocation list, trust root) with no replication or standby in source. Down ⇒ no new logins or token issuance — but already-issued tokens on routers and clients keep working, so the live mesh keeps running. This is the component whose availability matters most. Source ships no HA mechanism; design redundancy around it (see Status & Roadmap).
Web recorders — the position and voice archive. The web tier runs two server-side recorders that durably persist to its database: the position recorder (web:app/domains/tracks/position_zenoh_recorder.ts) and the voice recorder (VoiceZenohRecorder). Both record streams that are otherwise non-durable in the mesh transport. Positions relay through the mesh, with clients showing only a ~10-minute live trail. Voice is live-only in the mesh (no router persists it); the web voice recorder is the only hop that durably archives it (to SQL, for replay/ORK). If position or voice history matters to you, these recorders are the thing worth protecting.
Routers — relays, redundant by federation, not by HA. A router forwards live traffic and keeps short-lived local buffers. For live continuity you don't make one router highly available — you run federated peers on the same cells and let clients reconnect. What a client is sending is covered by the client outbox (web/Android), durable on the client and replayed on reconnect, so a router restart never loses outbound messages.

The one caveat: per-router durable history is not replicated

A router does hold authoritative durable state — chat (24 h TTL), drawings (indefinite), channel definitions (indefinite) — in its local RocksDB, and there is no backfill between routers. The outbox is sender-side: it covers what you send, not what you receive. So a client that reconnects to a different peer after a failure sees that peer's history, not the dead router's — received chat/drawing history can be lost on failover. For short-lived tactical traffic this is an accepted tradeoff; if long-lived shared drawings matter to you, treat it as a real limitation (see Known limits & gaps).
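
The sender-side nature of the outbox can be illustrated with a minimal in-memory sketch. The `Outbox` class, its method names, the `send` callback, and the chat key used here are all hypothetical (they are not the Bedrock client API), and a real outbox would persist entries to durable client storage rather than an array.

```typescript
// Hypothetical sketch of a sender-side outbox. Names are illustrative,
// not the Bedrock client API; a real outbox persists to client storage.
type OutboxEntry = { id: number; keyexpr: string; payload: string };

class Outbox {
  private pending: OutboxEntry[] = [];
  private nextId = 1;

  // Queue a message before attempting delivery, so it survives a dropped link.
  enqueue(keyexpr: string, payload: string): number {
    const id = this.nextId++;
    this.pending.push({ id, keyexpr, payload });
    return id;
  }

  // Replay everything still pending through `send`. Entries are removed
  // only when `send` reports success, so failed sends stay queued.
  flush(send: (entry: OutboxEntry) => boolean): number {
    const stillPending: OutboxEntry[] = [];
    let delivered = 0;
    for (const entry of this.pending) {
      if (send(entry)) delivered++;
      else stillPending.push(entry);
    }
    this.pending = stillPending;
    return delivered;
  }

  size(): number {
    return this.pending.length;
  }
}

const outbox = new Outbox();
// The key below is an assumed example under the global namespace.
outbox.enqueue("waypoint/global/chat/demo", "hello");
outbox.enqueue("waypoint/global/chat/demo", "still there?");

// Router A is down: nothing is delivered, and nothing is dropped.
const deliveredWhileDown = outbox.flush(() => false);
// The client reconnects to router B: the queued messages replay.
const deliveredAfterReconnect = outbox.flush(() => true);
```

Note what the sketch does not contain: any record of messages received from other operators. That history lives on the router, which is why it is the part at risk on failover.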

Coverage cells

Routing in Bedrock is cell-first: every position, heartbeat, and drawing is tagged with the geographic cell it happened in, and the cell is the first segment of the message address (waypoint/<cell>/...).
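
To make the cell tag concrete, the sketch below computes a geohash-5 cell from a latitude/longitude pair using the standard public geohash algorithm, then builds a per-cell key in the `waypoint/<cell>/pos/**` shape. The function names `geohash5` and `cellPositionKey` are illustrative assumptions, not identifiers from the Bedrock source.

```typescript
// Standard geohash base32 alphabet (no a, i, l, o).
const GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz";

// Encode a coordinate as a 5-character geohash by repeatedly bisecting
// the longitude and latitude ranges (longitude takes the first bit).
function geohash5(lat: number, lon: number): string {
  let latLo = -90;
  let latHi = 90;
  let lonLo = -180;
  let lonHi = 180;
  let hash = "";
  let bits = 0; // bits collected for the current character
  let value = 0; // numeric value of the current character
  let useLongitude = true;

  while (hash.length < 5) {
    if (useLongitude) {
      const mid = (lonLo + lonHi) / 2;
      if (lon >= mid) {
        value = value * 2 + 1;
        lonLo = mid;
      } else {
        value = value * 2;
        lonHi = mid;
      }
    } else {
      const mid = (latLo + latHi) / 2;
      if (lat >= mid) {
        value = value * 2 + 1;
        latLo = mid;
      } else {
        value = value * 2;
        latHi = mid;
      }
    }
    useLongitude = !useLongitude;
    bits++;
    if (bits === 5) {
      hash += GEOHASH_ALPHABET.charAt(value);
      bits = 0;
      value = 0;
    }
  }
  return hash;
}

// Hypothetical helper: the per-cell position subscription shape.
function cellPositionKey(cell: string): string {
  return `waypoint/${cell}/pos/**`;
}

const exampleCell = geohash5(57.64911, 10.40744); // "u4pru"
const exampleKey = cellPositionKey(exampleCell); // "waypoint/u4pru/pos/**"
```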

Cells are geohash-5. A cell is exactly five characters from the geohash base32 alphabet (digits plus lowercase letters, minus a, i, l, o) — enforced by GEOHASH5_PATTERN in the Directory (directory:app/domains/shared/server_token_service.ts). A geohash-5 cell is roughly a few km on a side at mid-latitudes (the create-server runbook uses ~5 km × ~5 km as a rule of thumb).
Deny-by-default ACL. A router's transport access control denies everything, then adds exactly two kinds of allow rule: one for waypoint/global/** (auth, chat, voice, channels, key-rotation, liveliness, revocation relay) and one waypoint/<cell>/** rule per cell in the router's ServerToken.coverage_cells (server:src/router.rs). A router with no coverage cells drops all cell-routed traffic; it is misconfigured.
256-cell cap. A single ServerToken may carry at most 256 coverage cells (MAX_COVERAGE_CELLS = 256, directory:app/domains/shared/server_token_service.ts). The source comment notes this covers ~6,400 km² at the widest geohash-5 cell size (~5 km × ~5 km near the equator, so 256 × ~25 km² ≈ 6,400 km²).
Manual assignment at enrollment. Cells are chosen by the operator and baked into the ServerToken when the server is registered — there is no automatic geography-to-cell allocation tool. Two routers covering neighbouring areas simply own disjoint cell lists.
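
The rules in this list can be sketched as a pre-enrollment check. The regular expression below is an assumed equivalent of `GEOHASH5_PATTERN` (the Directory's exact pattern is not quoted on this page), and `validateCoverageCells` and `allowedKeyexprs` are hypothetical helper names. Rejecting an empty list is a choice made for this sketch, based on the note that a router without cells is misconfigured.

```typescript
// Assumed equivalent of GEOHASH5_PATTERN: five chars from the geohash alphabet.
const GEOHASH5_SKETCH = /^[0-9b-hjkmnp-z]{5}$/;
// Value quoted from the Directory source.
const MAX_COVERAGE_CELLS = 256;
// Rule-of-thumb cell area, from the ~5 km x ~5 km runbook figure.
const APPROX_CELL_AREA_KM2 = 25;

// Return a list of problems; an empty list means the cells look enrollable.
function validateCoverageCells(cells: string[]): string[] {
  const errors: string[] = [];
  if (cells.length === 0) {
    errors.push("no coverage cells: the router would drop all cell-routed traffic");
  }
  if (cells.length > MAX_COVERAGE_CELLS) {
    errors.push(`too many cells: ${cells.length} > ${MAX_COVERAGE_CELLS}`);
  }
  for (const cell of cells) {
    if (!GEOHASH5_SKETCH.test(cell)) {
      errors.push(`not a geohash-5 cell: ${cell}`);
    }
  }
  return errors;
}

// The allow rules a router ends up with: the global namespace plus one per cell.
function allowedKeyexprs(cells: string[]): string[] {
  return ["waypoint/global/**", ...cells.map((cell) => `waypoint/${cell}/**`)];
}

// Upper bound on area for one ServerToken under the rule of thumb.
const maxAreaKm2 = MAX_COVERAGE_CELLS * APPROX_CELL_AREA_KM2; // 6400
```

For example, `validateCoverageCells(["u4pra"])` reports one problem, because `a` is not in the geohash alphabet.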

Cells are set when you register a server. See Add a server (Step 2 "Decide what to register" and "How geofencing works").

Fog of war — and the trusted web client's exemption

Cell-first routing is fog of war: a field device (Android, a drone, a sensor) scopes its geocentric subscriptions to its own cell, so it only receives positions/drawings/sensors for where it is. That's deliberate — a captured handset shouldn't expose the whole operating picture.

The web client is the Common Operating Picture — an analyst/ops view that, by design, must see everything, with no fog of war. It is also a trusted tier: it runs server-side (Inertia render + the durable position recorder), authed by a deployment-scoped machine identity (the Directory web device — a service platform that holds an IdentityToken but is not a router; see create-web). So the web is treated differently from a field device on two axes:

It subscribes the cell wildcard. Instead of waypoint/<myCell>/pos/**, the web (and its server-side recorder) subscribe waypoint/*/pos/** (and the same for heartbeat/draw/sensor) — every cell, not one. This is the no-fog-of-war subscription; see web:inertia/features/transport/infra/key_expressions.ts (*_SUB_ALL).
Its broad read must be explicitly authorized; the wildcard is denied by default. Zenoh's ACL requires a subscription keyexpr to be included in an allow rule. The deny-by-default node rule only allows exact per-cell shapes (waypoint/<cell>/**), and none of those includes a cross-cell wildcard, so the web's waypoint/*/pos/** subscription is rejected even on a single router that covers the cells. This is not a federation-only edge case: the COP map is simply empty until the grant is in place. The fix is a subject-scoped allow rule granting the web bridge broad read (waypoint/**), bound to its mTLS cert common name (its session identity). This is the web's "special keys, valid only for that web's session": omniscient access that follows the bridge's authenticated identity, never the generic node subject and never field devices.
Config: list the bridge's cert CN(s) in transport.web_bridge_cert_names (env WAYPOINT_WEB_BRIDGE_CERT_NAMES, comma-separated) on each router. Empty = no grant. The CN is whatever the bridge cert was minted with (e.g. pki/generate.sh --add node --name web-<env>-zenoh-bridge). See the server repo DEPLOY.md → Browser / COP web client.
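
The decision described above can be modelled in a few lines. This is a deliberately simplified sketch: it inspects only the second chunk of the keyexpr rather than implementing Zenoh's real keyexpr inclusion, and the function names and the example CN `web-prod-zenoh-bridge` are assumptions for illustration.

```typescript
// Parse a comma-separated CN list such as the value of
// WAYPOINT_WEB_BRIDGE_CERT_NAMES. Empty or missing means no grant.
function parseBridgeCertNames(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

// Simplified model of the router's subscribe check (not Zenoh's matcher):
// a granted bridge CN may read anything under waypoint/**; any other subject
// may read only waypoint/global/** or a literal covered cell.
function subscriptionAllowed(
  subjectCn: string,
  keyexpr: string,
  coverageCells: string[],
  bridgeCertNames: string[],
): boolean {
  const [root, scope] = keyexpr.split("/");
  if (root !== "waypoint" || scope === undefined) return false;
  if (bridgeCertNames.indexOf(subjectCn) !== -1) return true;
  if (scope === "global") return true;
  // A wildcard chunk ("*" or "**") is never a literal covered cell.
  return coverageCells.indexOf(scope) !== -1;
}

const cells = ["u4pru"];
const granted = parseBridgeCertNames("web-prod-zenoh-bridge");

// Without the grant the COP wildcard is rejected, even though the cell is covered.
const beforeGrant = subscriptionAllowed("web-prod-zenoh-bridge", "waypoint/*/pos/**", cells, []);
// With the bridge CN listed, the same subscription is accepted.
const afterGrant = subscriptionAllowed("web-prod-zenoh-bridge", "waypoint/*/pos/**", cells, granted);
```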

Why a subject-scoped grant, not a per-session relay

Two designs were considered for trusted-web omniscience: (A) a subject-scoped ACL rule keyed on the web principal (the web keeps subscribing the cell wildcard it already uses), or (B) the router mirroring all cells into a dedicated per-session keyexpr (waypoint/websession/<id>/**). (A) was chosen — it needs no web change, binds access to the web's authenticated identity, and composes with federation (each router grants the web principal), whereas (B) duplicates traffic and adds a relay mechanism for no real gain. The geocentric publish path (web operators publishing their own position) is gated separately on browser-operator signing-key delivery.

Topology patterns

Single router

Smallest possible deployment: one Directory, one router holding all the coverage cells. Every client connects to the same endpoint. No federation, no redundancy.

flowchart TD
    D["Directory<br/>(identity authority)"]
    R["Router<br/>(all cells)"]
    C["Clients<br/>(web / android / node)"]
    D -. tokens .-> R
    D -. tokens .-> C
    C --> R

Use it for: a single site, a demo, or a pilot where one box comfortably carries the whole operating area.

Single-site HA

Multiple routers covering the same geography — i.e. identical coverage cells — with clients spread across them (round-robin DNS or a static endpoint list). Each router still routes the same cells.

flowchart TD
    D["Directory"]
    R1["Router A<br/>(cells X, Y, Z)"]
    R2["Router B<br/>(cells X, Y, Z)"]
    C1["Clients (group 1)"]
    C2["Clients (group 2)"]
    D -. tokens .-> R1
    D -. tokens .-> R2
    C1 --> R1
    C2 --> R2
    R1 <--> R2

Redundancy here is federation, not HA of one box

Two routers on the same cells give live continuity: if one fails, clients reconnect to the other and traffic keeps flowing. There is no standby promotion and no automatic failover; client redirection is operator-managed (DNS / static endpoint list). It is not state replication: received chat/drawing history is per-router and not backfilled, so a client reconnecting to the peer sees the peer's history, not the dead router's (see What needs HA). Positions are non-durable in the mesh: routers relay them and clients keep a ~10-minute live trail.

Use it for: a single site that wants more than one box live so one failure doesn't take everyone offline — accepting that received durable history is per-router.
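
With a static endpoint list, reconnection amounts to trying the next entry. The sketch below shows that idea only; `connectFirstAvailable`, the `tryConnect` callback, and the locator hostnames are hypothetical, and real clients may order or retry endpoints differently.

```typescript
// Walk a static endpoint list in order and return the first endpoint that
// accepts a connection, or undefined if every router is unreachable.
function connectFirstAvailable(
  endpoints: string[],
  tryConnect: (endpoint: string) => boolean,
): string | undefined {
  for (const endpoint of endpoints) {
    if (tryConnect(endpoint)) return endpoint;
  }
  return undefined;
}

// Assumed locators for two routers carrying identical cells.
const staticEndpoints = ["tls/router-a.example:7447", "tls/router-b.example:7447"];

// Simulate router A being down: only router B accepts the connection.
const routerADown = (endpoint: string): boolean => endpoint !== "tls/router-a.example:7447";
const connectedTo = connectFirstAvailable(staticEndpoints, routerADown);
```

Nothing in this loop carries history across: the client that lands on router B reads router B's local store.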

Multi-site / federated

One router per region, each carrying different coverage cells, linked into a mesh by an explicit static peer list. Cross-region traffic relays over the federation hop.

flowchart LR
    D["Directory"]
    RE["Router EU<br/>(EU cells)"]
    RU["Router US<br/>(US cells)"]
    CE["EU clients"]
    CU["US clients"]
    D -. tokens .-> RE
    D -. tokens .-> RU
    CE --> RE
    CU --> RU
    RE <==>|"transport.connect<br/>mTLS gossip"| RU

Use it for: a deployment spanning regions where each region has its own router and operators in one region need to see relevant traffic from another.

Federation

Routers federate router↔router. The link is not auto-discovered — the operator declares it:

Static peer list. Each router lists the peers it dials in transport.connect (server:src/config.rs), a list of Zenoh locators. Add a locator to connect, prune it to disconnect.
mTLS gossip with CN pinning. Federation links are mutually authenticated; an inbound peer's certificate CN is its router principal_id (server:src/active_peers.rs). Federation does not re-sign or re-verify per-hop application envelopes — payloads relay byte-identical across the hop — so the revocation audit on the peer's CN is the only enforcement, and an effectively-removed peer router is fully cut off only once its cert expires and the operator prunes it from transport.connect.
Operator chooses the shape. Because peering is explicit, the operator decides whether routers form a chain, a star, or a full mesh. There is no topology controller.
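
Since the shape is purely a function of who lists whom, it can help to generate the peer lists rather than hand-edit them. The sketch below is one way to do that, assuming each router lists every neighbour it should be linked to (as in the two-router case where each lists the other). `peerLists`, the `Shape` type, and the locator hostnames are illustrative, and the output is a plain mapping, not the actual `transport.connect` file format.

```typescript
type Shape = "chain" | "star" | "full-mesh";

// For each router locator, compute the locators it should dial.
// For "star", the first locator in the list is treated as the hub.
function peerLists(locators: string[], shape: Shape): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  locators.forEach((self, i) => {
    result[self] = locators.filter((_peer, j) => {
      if (j === i) return false; // never dial yourself
      if (shape === "full-mesh") return true;
      if (shape === "chain") return Math.abs(i - j) === 1;
      return i === 0 || j === 0; // star: every link touches the hub
    });
  });
  return result;
}

const locators = [
  "tls/router-a.example:7447",
  "tls/router-b.example:7447",
  "tls/router-c.example:7447",
];

const chain = peerLists(locators, "chain");
const star = peerLists(locators, "star");
const fullMesh = peerLists(locators, "full-mesh");
```

In the chain, router B dials both A and C while A and C dial only B; removing a router means regenerating the lists without its locator.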

Sizing

What drives scaling

There is no single "users per box" number in the source. Scale is driven by three independent pressures:

Geography → more routers / more cells. Wider or more-fragmented coverage means more geohash-5 cells, and (past the 256-cell cap per ServerToken, or past one box's reach) more routers.
Redundancy → more routers. Wanting more than one router live for a given area means duplicating its cells onto additional boxes (see Single-site HA) — bounded by the state caveat, not by a config limit.
Operators / devices → bigger revocation list + Directory load. More principals and devices grow the revocation list every router polls and the token-issuance load on the (single) Directory.
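
The first two pressures combine into a simple lower bound on router count. The sketch below computes it from the 256-cell cap alone; `routersNeeded` and its parameters are hypothetical names, and the result says nothing about throughput or client capacity, which have to be measured.

```typescript
// Value quoted from the Directory source.
const MAX_COVERAGE_CELLS = 256;

// Minimum routers implied by geography and redundancy alone:
// enough ServerTokens to hold every cell, times the number of
// routers you want live for each area.
function routersNeeded(cellCount: number, routersPerArea: number): number {
  if (cellCount <= 0 || routersPerArea <= 0) return 0;
  return Math.ceil(cellCount / MAX_COVERAGE_CELLS) * routersPerArea;
}

// 600 cells need at least 3 ServerTokens; two live routers per area doubles that.
const singleCopy = routersNeeded(600, 1); // 3
const redundantPair = routersNeeded(600, 2); // 6
```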

The hard, source-defined limits a deployment runs into:

| Limit | Value | Where |
| --- | --- | --- |
| Coverage cells per ServerToken | 256 (`MAX_COVERAGE_CELLS`) | `directory:app/domains/shared/server_token_service.ts` |
| ServerToken TTL | default 30 days, max 365 days | `directory:app/domains/shared/server_token_service.ts` (`DEFAULT_SERVER_TOKEN_TTL_DAYS`, `MAX_SERVER_TOKEN_TTL_DAYS`) |
| Chat retention (durable, per-router) | 24 h TTL, swept hourly | `server:src/store_ttl.rs` (`CHAT_TTL_MS`) |
| Key-rotation grace period | 1–60 min, default 10 | `server:src/config.rs` (`default_grace_minutes`, `MAX_GRACE_PERIOD_MINUTES`) |
| Revocation-list poll cadence | default 300 s | `server:src/config.rs` (`default_revocation_sync_interval_secs`) |
| Max invite size | 65,536 bytes | `server:src/invite_handler.rs` (`MAX_INVITE_BYTES`) |

e2-small is the demo/dev baseline, not a validated production size

The Terraform example environments provision a GCE e2-small (2 vCPU / 2 GB) with a 20 GB pd-balanced disk for both the server and directory modules (infrastructure:terraform/modules/server-environment/, .../directory-environment/). These are demo/dev examples — there is no autoscaling and no load balancer in the modules, and no production sizing has been validated in source. Treat e2-small as a starting point, not a recommendation.

Browser bridge fan-in

Every browser on a deployment funnels through one remote-api bridge on the web VM (see Components). This is a fan-in worth understanding before sizing the web tier.

flowchart LR
    B1["browser 1"] -->|WS| BR
    B2["browser 2"] -->|WS| BR
    BN["browser N"] -->|WS| BR
    BR["Bridge<br/>(one runtime,<br/>N sessions)"] -->|1 tls/mTLS link| R["Router → mesh"]

Each browser WebSocket is its own Zenoh session inside the bridge, but all N sessions share the bridge's single runtime and its one uplink to the router. That asymmetry sets the performance shape:

Upstream is deduplicated. Fifty browsers all subscribing waypoint/<cell>/pos/** declare roughly one aggregated subscription upstream; upstream interest scales with the number of distinct keyexprs, not the user count.
Downstream fans out per browser. Each inbound sample is copied to every matching browser WebSocket, all serialized and written on the single web VM. Downstream cost ≈ samples × subscribers. This is the web tier's scaling chokepoint, and voice is the worst case.
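
The asymmetry between the two directions can be put into numbers with a small model. This is a back-of-envelope sketch, not a measurement: keyexprs are compared as exact strings (no wildcard matching), the 10 samples/s rate is an assumed input, and `estimateBridgeLoad` is a hypothetical name.

```typescript
type BridgeLoad = {
  upstreamSubscriptions: number; // distinct keyexprs declared to the router
  downstreamDeliveriesPerSec: number; // samples x subscribers, summed
};

// browserSubscriptions[i] is the list of keyexprs browser i subscribes to;
// samplesPerSec maps a keyexpr to its assumed inbound sample rate.
function estimateBridgeLoad(
  browserSubscriptions: string[][],
  samplesPerSec: Record<string, number>,
): BridgeLoad {
  const distinct: Record<string, boolean> = {};
  let deliveries = 0;
  for (const subscriptions of browserSubscriptions) {
    for (const keyexpr of subscriptions) {
      distinct[keyexpr] = true;
      // Every matching browser gets its own copy of each sample.
      deliveries += samplesPerSec[keyexpr] ?? 0;
    }
  }
  return {
    upstreamSubscriptions: Object.keys(distinct).length,
    downstreamDeliveriesPerSec: deliveries,
  };
}

// Fifty browsers watching the same cell at an assumed 10 samples/s.
const fiftyBrowsers: string[][] = [];
for (let i = 0; i < 50; i++) fiftyBrowsers.push(["waypoint/u4pru/pos/**"]);
const bridgeLoad = estimateBridgeLoad(fiftyBrowsers, { "waypoint/u4pru/pos/**": 10 });
// bridgeLoad.upstreamSubscriptions === 1, bridgeLoad.downstreamDeliveriesPerSec === 500
```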

Voice is full-mesh — fan-out scales with channel size, not a mixer

Voice has no server-side mixing or SFU. Each speaker publishes its own Opus stream (waypoint/voice/<voiceId>/<principalId>/<sessionId>) at 50 frames/s (20 ms frames), and every participant subscribes to all of them (waypoint/voice/<voiceId>/**). So in an N-person channel the bridge fans each active speaker's 50 frames/s to the other N−1 browsers: roughly 50 × (concurrent speakers) × (N−1) frame deliveries per second, all on the one bridge / one web VM. Voice is push-to-talk, so steady state is usually 1–2 concurrent speakers; the absolute worst case (everyone keyed at once) is 50 × N × (N−1) — ~19,000 deliveries/s for N=20. Either way voice saturates the web tier long before map traffic does — size (and load-test) the web VM against your largest channel and a realistic concurrent-talker count, not against average load. (e2-small is a baseline, not a validated voice size — see the note above.)

The per-channel subscription is wildcard (waypoint/voice/<voiceId>/**), so it matches every speaker in the channel — including the operator's own published key. There is no application-level self-filter; whether a speaker receives its own frames back is governed by Zenoh subscriber locality. For sizing this ±1 is immaterial — treat the local fan as ~N subscribers per channel. Source: web:inertia/features/voice/voice_manager.ts (50 fps), …/transport/subscribers/voice_audio_subscriber.ts (per-channel ** subscription).
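
The voice arithmetic above is easy to turn into a sizing helper. The sketch below encodes only the formula from this note (50 frames/s per speaker, each delivered to the other N−1 browsers); `voiceDeliveriesPerSec` is a hypothetical name, and the result is a delivery count, not a CPU or bandwidth figure.

```typescript
// 20 ms Opus frames, as stated for the web voice manager.
const VOICE_FRAMES_PER_SEC = 50;

// Frame deliveries per second on the bridge for one channel:
// 50 x concurrent speakers x (participants - 1).
function voiceDeliveriesPerSec(participants: number, concurrentSpeakers: number): number {
  const speakers = Math.min(Math.max(concurrentSpeakers, 0), Math.max(participants, 0));
  const listenersPerSpeaker = Math.max(participants - 1, 0);
  return VOICE_FRAMES_PER_SEC * speakers * listenersPerSpeaker;
}

const typicalPtt = voiceDeliveriesPerSec(20, 2); // 1900: two talkers in a 20-person channel
const worstCase = voiceDeliveriesPerSec(20, 20); // 19000: everyone keyed at once
```

Running it for your largest channel at one, two, and N concurrent speakers gives the range to load-test against.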

Single point of failure for realtime. The bridge is per-web; if it restarts, every browser on that web drops its session at once and reconnects (the client retries automatically). The web app itself — HTTP/SSR, login — is unaffected, and the position recorder runs server-side, so the durable position archive never depends on the bridge.

Identity survives the multiplexing. The bridge holds one transport identity (its mTLS client cert) and presents one session to the router, so the router ACL only gates it coarsely by keyexpr (waypoint/<cell>/**). Per-operator authorization is unaffected because it lives at the app layer: each browser packs its own AuthEnvelope with its own IdentityToken and verifies inbound envelopes itself — the bridge only ferries sealed bytes. The tradeoff: you cannot apply per-browser transport-level ACL, since to the router they are one session.

Scaling the web tier. Vertically, give the web VM more CPU/bandwidth headroom for voice fan-out. Horizontally, run more than one web VM — each gets its own bridge and hostname, and to the mesh each bridge is just another client. There is no shared-bridge clustering; capacity grows by adding webs, not by pooling one.

Worked example

A 2-site, ~50-operator deployment: operators split across Europe and the US, with one router per region and a gateway feeding an external system.

Shape:

1 Directory — the single identity authority for both sites. Mints all operator/device tokens and signs the one revocation list both routers poll.
2 routers, peer-linked:
Router EU — coverage cells over the EU operating area.
Router US — coverage cells over the US operating area (a disjoint cell list).
Each lists the other in transport.connect, forming a 2-node mesh over mTLS gossip.
1 gateway — connects to one (or both) routers as a Directory-authenticated service client and bridges to the external system. If it dies, the mesh is unaffected.
Clients — web and Android operators, each pointed at the router for their region.

How it connects and what federates:

Before anything, the PKI and the Directory exist, and each box meets its prerequisites — see Before you begin.
Each router is registered against the Directory with its region's cells — see Add a server (Step 2 sets the coverage cells; Step 3 mints and installs the ServerToken). The EU and US routers get different cell lists.
An EU operator authenticates against the Directory, receives a token, and connects to Router EU. Their position/heartbeat/drawing traffic is tagged with EU cells and routes through Router EU.
A US operator does the same against Router US with US cells.
What federates: two kinds of traffic relay across the transport.connect hop. The first is global-namespace traffic (waypoint/global/**, which carries chat, voice, channels, liveliness, key-rotation, and revocation relay); the second is cell-scoped traffic for cells the receiving router covers. Because the cell lists are disjoint, an EU client's cell-scoped (position/drawing) traffic is only delivered on routers that cover EU cells, so the cross-region visibility you get is what the global namespace and any overlapping coverage carry. Durable chat is held on the router that received it (24 h TTL) and is not replicated to the peer.
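
The delivery rule in this example can be checked with a toy model of the two routers. The cell values, router names, and keys below are made up for illustration (they are valid geohash-5 strings, not a real deployment's coverage), and `routersDelivering` is a hypothetical helper that looks only at the second chunk of the key.

```typescript
type RouterCoverage = { name: string; cells: string[] };

// Illustrative, disjoint cell lists for the two regions.
const routers: RouterCoverage[] = [
  { name: "router-eu", cells: ["u4pru", "u4prv"] },
  { name: "router-us", cells: ["9q8yy", "9q8yz"] },
];

// Which routers deliver a key: every router for the global namespace,
// otherwise only routers whose coverage includes the key's cell.
function routersDelivering(keyexpr: string, mesh: RouterCoverage[]): string[] {
  const [root, scope] = keyexpr.split("/");
  if (root !== "waypoint" || scope === undefined) return [];
  return mesh
    .filter((router) => scope === "global" || router.cells.indexOf(scope) !== -1)
    .map((router) => router.name);
}

const euPosition = routersDelivering("waypoint/u4pru/pos/alice", routers); // ["router-eu"]
const usDrawing = routersDelivering("waypoint/9q8yy/draw/plan", routers); // ["router-us"]
const globalChat = routersDelivering("waypoint/global/chat/ops", routers); // both routers
```

Adding an EU cell to the US router's list in this model makes EU positions visible there too, which is the "overlapping coverage" case.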

Known limits & gaps

These are the current reality, not future plans. Plan around them; do not assume an HA mechanism the source doesn't implement. The framing is in What needs HA — these are the sharp edges of it.

Directory is a single point of failure. No replication, standby, or HA in source. Directory down ⇒ no new logins or token issuance (cached tokens keep working, so the live mesh continues). This is the availability priority.
Per-router durable history is not replicated (by design). Routers are relays; chat (24 h), drawings (indefinite), and channels (indefinite) live in each router's local RocksDB with no replication or backfill, so failover loses the dead router's received history. Acceptable for short-lived tactical traffic; a real limitation if long-lived shared drawings matter.
No automated failover. Nothing promotes a standby or reroutes clients automatically; client redirection is operator-managed (DNS / static endpoint lists). Live continuity comes from running federated peers, not from HA of one box.
No autoscaling and no load balancer. The Terraform modules provision fixed single instances; clients use static router endpoints.
No per-router capacity SLA. Source defines no "clients per router" or throughput model. Capacity must be measured, not read off a number.
No multi-region cell-assignment tooling. Coverage cells are chosen and assigned manually at enrollment; there is no algorithm or tool to allocate geohash cells across regions.

For where these sit relative to planned work, see Status & Roadmap.

Verified against server@ab688f0, directory@9c5e565, web@80e3ec2, infrastructure@b3849c0.