Gate traffic inside your app based on backend health. Not request counts. Not static rules. Your actual metrics, fleet-wide.
Traditional rate limiters count requests. They have no idea if your origin is healthy or on fire.
You set 100 req/s, but your database is responding in 8 seconds. The rate limiter keeps letting traffic through until the connection pool is exhausted.
The rate limiter reports "success" while your users see an error. It counts requests. It has no idea your backend is drowning.
A POST that triggers a complex join costs 100x a cached GET, but they both count as "1 request." Your limit has no idea which traffic is expensive.
You picked a number that worked in staging. The right limit changes every minute based on query complexity, connection pool state, and what else is running.
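Why a static number fails can be shown in a few lines. This is an illustrative AIMD-style controller, not WaitState's actual policy engine: when observed latency blows past target, the limit is cut multiplicatively; when the backend is healthy, it probes back up additively. All names and constants here are made up for the sketch.

```rust
/// Illustrative only: adapt a request limit to observed backend latency.
/// Cut hard when the backend is slow, probe up gently when it recovers.
fn adjust_limit(current_limit: f64, observed_p99_ms: f64, target_p99_ms: f64) -> f64 {
    if observed_p99_ms > target_p99_ms {
        // Backend is struggling: halve the limit instead of piling on.
        (current_limit * 0.5).max(1.0)
    } else {
        // Backend is healthy: creep the limit back up.
        current_limit + 10.0
    }
}

fn main() {
    let mut limit = 100.0; // the number that worked in staging
    // Database starts answering in 8 seconds against a 250 ms target.
    limit = adjust_limit(limit, 8000.0, 250.0);
    println!("after spike: {limit} req/s"); // 50 req/s
    // Backend recovers; the limit probes upward again.
    limit = adjust_limit(limit, 120.0, 250.0);
    println!("after recovery: {limit} req/s"); // 60 req/s
}
```

The point is not this particular controller: it is that the right limit is a moving target only your live metrics can track.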
When limits trigger, your enterprise customer's checkout gets the same rejection as a free-tier user browsing docs. No way to prioritize what matters.
Each instance enforces its own limit. Scale to 10 pods and your 100 req/s becomes 1,000. Scale down and surviving pods still think they can each do 100.
A closed loop between your backend health and your in-app gating. The SDK runs inside your application.
In-app SDK. No reverse proxy. Each gate call runs synchronously in your process with zero network hops on the hot path.
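What "zero network hops on the hot path" means, as a self-contained sketch: the gate is a plain in-memory check. A token bucket stands in here for the real policy (which the SDK receives from the control plane); the struct and method names are illustrative, not the SDK's API.

```rust
use std::time::Instant;

/// Illustrative in-process gate: a token bucket checked synchronously.
/// No socket, no proxy; the only syscall is reading the clock.
struct Gate {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl Gate {
    fn new(rps: f64) -> Self {
        Gate { capacity: rps, tokens: rps, refill_per_sec: rps, last: Instant::now() }
    }

    /// Returns true if the request may proceed, false if it should be shed.
    fn allow(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.last = now;
        // Refill proportionally to elapsed time, capped at capacity.
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut gate = Gate::new(2.0); // 2 req/s budget
    assert!(gate.allow());
    assert!(gate.allow());
    assert!(!gate.allow()); // third back-to-back request is shed in-process
}
```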
Fail-open by design. Short outages use cached policy. Prolonged outages gracefully step aside so your existing infrastructure takes over. WaitState adds intelligence, not a dependency.
Zero dependencies. Every SDK is compiled from a single Rust core. No runtime deps, no JNI classloaders, no C toolchains on your CI.
Gate before origin. Shed traffic at the edge before it reaches your backend. Fewer wasted compute cycles, lower tail latency.
Same Rust core. Compiled to WASM, runs in Cloudflare Workers, Fastly Compute, Vercel. One binary, every edge.
Sub-millisecond gate. WASM runs at near-native speed. No cold-start penalty on the hot path.
Any language. Talk to the agent from any stack. No SDK embedding required. Gate over localhost HTTP.
DaemonSet or sidecar. One agent per node, shared by all pods. Minimal resource footprint. 10m CPU, 16Mi memory.
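The footprint above can be pinned in the manifest. A minimal sketch only: the image name, labels, and metadata here are placeholders, not the shipped artifact.

```yaml
# Sketch: one agent per node via a DaemonSet, with the small
# resource requests noted above. Image name is a placeholder.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: waitstate-agent
spec:
  selector:
    matchLabels:
      app: waitstate-agent
  template:
    metadata:
      labels:
        app: waitstate-agent
    spec:
      containers:
        - name: agent
          image: example/waitstate-agent:latest # placeholder
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
```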
Apollo Router coprocessor. Built-in GraphQL integration. Tag by operation name, gate before execution.
Not all traffic is equal. Assign weights to your tags. When a reflex rule fires, low-weight traffic gets shed first.
Every gate call carries a tag and a weight, so prioritization is enforced on every single request.
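The shedding order can be sketched in a few lines. This is illustrative, not the SDK's actual algorithm: given per-tag traffic rates and a required reduction, whole tags are dropped lowest-weight-first. Tag names, weights, and rates are made up.

```rust
/// Illustrative: shed tags lowest-weight-first until enough load is dropped.
/// Each entry is (tag, weight, current req/s).
fn tags_to_shed(mut tags: Vec<(&'static str, u32, f64)>, mut reduce_rps: f64) -> Vec<&'static str> {
    // Low-weight traffic goes first.
    tags.sort_by_key(|&(_, weight, _)| weight);
    let mut shed = Vec::new();
    for (name, _, rps) in tags {
        if reduce_rps <= 0.0 {
            break;
        }
        shed.push(name);
        reduce_rps -= rps;
    }
    shed
}

fn main() {
    let tags = vec![
        ("free:docs", 1, 300.0),          // free-tier users browsing docs
        ("pro:api", 10, 500.0),           // paid API traffic
        ("enterprise:checkout", 100, 200.0), // the traffic that matters most
    ];
    // A rule demands dropping 400 req/s: free tier goes first, then pro.
    // Enterprise checkout is never touched.
    let shed = tags_to_shed(tags, 400.0);
    assert_eq!(shed, vec!["free:docs", "pro:api"]);
}
```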
Short outages use cached policy. If the lease expires, the SDK enters safe mode. You choose the strategy:
fixed_rps: throttle to a max req/sec you set
open: allow all traffic (full fail-open)
last_policy: keep enforcing the stale policy

Rules react to built-in metrics like latency and error rate. Report your own custom metrics and trigger rules on anything your backend can measure.
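A minimal model of the three safe-mode strategies, as a sketch rather than the SDK's actual types: once the lease expires, the strategy determines the effective limit, where `None` means unlimited.

```rust
/// Illustrative model of safe-mode behavior when the policy lease expires.
#[derive(Debug)]
enum SafeMode {
    FixedRps(f64), // throttle to a max req/sec you set
    Open,          // allow all traffic (full fail-open)
    LastPolicy,    // keep enforcing the stale policy
}

/// Effective limit during a prolonged outage. `stale_policy_rps` is the
/// last limit received before contact was lost; None means unlimited.
fn effective_limit(mode: &SafeMode, stale_policy_rps: Option<f64>) -> Option<f64> {
    match mode {
        SafeMode::FixedRps(rps) => Some(*rps),
        SafeMode::Open => None,
        SafeMode::LastPolicy => stale_policy_rps,
    }
}

fn main() {
    let stale = Some(200.0); // last policy before the outage
    assert_eq!(effective_limit(&SafeMode::FixedRps(50.0), stale), Some(50.0));
    assert_eq!(effective_limit(&SafeMode::Open, stale), None);
    assert_eq!(effective_limit(&SafeMode::LastPolicy, stale), Some(200.0));
}
```

Whichever strategy you pick, the failure mode is explicit and chosen by you, never an implicit hard dependency on the control plane.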
No rules target the enterprise tier: it passes through at full rate while lower tiers are shed.
A single Rust engine compiles to native SDKs for every major runtime. Same in-memory gate call. Zero-copy hot path. Use the waitstate crate directly in Rust.
Free tier. No credit card. Add the SDK in under 5 minutes.