
Edge or origin? A decision framework for latency-sensitive features

When to move compute to the edge, when to stay at origin, and the hybrid patterns that work in practice.

Anthra AI Team · Engineering Team · 8 min read
Table of contents
  1. What "edge" actually means
  2. The first question: latency budget
  3. A concrete budget
  4. The second question: what does the feature need?
  5. Strong data gravity → origin
  6. Read-heavy, cachable → edge shines
  7. Compute-heavy, stateless → edge works
  8. Heavy compute or long-running → origin
  9. The third question: architecture complexity
  10. The decision matrix
  11. Four patterns that actually work
  12. Pattern 1: Edge personalization with precomputed features
  13. Pattern 2: Edge auth + origin data
  14. Pattern 3: Edge content + origin commerce
  15. Pattern 4: Edge image transformation
  16. When edge is wrong
  17. Cost modeling
  18. The migration playbook
  19. Operational guardrails that prevent surprises
  20. Closing
  21. Related resources

Edge compute used to be for DNS and a little bit of Cloudflare Workers magic at the CDN layer. In 2026, it's a serious compute tier: Workers, Durable Objects, Deno Deploy, Vercel Edge, Fastly Compute. You can run real logic, real databases (sort of), and real ML at dozens of POPs around the world.

The question stops being "can we?" and becomes "should we?" Here's the framework we use.

What "edge" actually means

Before deciding, let's be specific. When we say "edge" we mean:

  • CDN edge / edge workers — V8 or WASM runtimes at CDN POPs (Cloudflare, Fastly, Vercel, AWS Lambda@Edge)
  • Regional compute — cloud regions close to users (AWS regions, multi-region deployments)
  • Client edge — the user's own device (browser, mobile app)
  • IoT edge — actual edge hardware (gateways, devices)

This post focuses on CDN edge and how it relates to origin. The patterns extend to the others.

The first question: latency budget

Every latency-sensitive feature has a budget. Interactive apps target ~100ms for responses to feel "instant" (Jakob Nielsen's classic research). Videos and games are tighter. Search autocomplete is ~150ms. AI chat is forgiving (~2s is acceptable).

For each feature, ask:

  1. What's the target end-to-end latency?
  2. What's unavoidable? Network RTT to user, TLS handshake, time to first byte from anywhere.
  3. What's left for compute?

If the math doesn't work at origin, edge becomes a candidate.

A concrete budget

A user in Mumbai, origin in us-east-1 (Virginia). Round-trip time ~200ms. If your target is 300ms end-to-end, you have 100ms for everything else (app server, database, rendering). That's tight.

Moving the same logic to a Mumbai edge POP: RTT ~15ms. You now have 285ms for app logic. Different universe.
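The budget arithmetic above is worth keeping as a one-liner. A minimal sketch with illustrative numbers; the cold-connection case models roughly two extra round trips for TCP + TLS 1.3 setup:

```typescript
// What's left for app logic after the unavoidable network cost?
function remainingBudgetMs(
  targetMs: number,
  rttMs: number,
  coldConnection = false,
): number {
  // A cold HTTPS connection costs roughly 2 extra round trips
  // (TCP handshake + TLS 1.3) on top of the request/response itself.
  const networkMs = coldConnection ? rttMs * 3 : rttMs;
  return targetMs - networkMs;
}

remainingBudgetMs(300, 200); // Mumbai → us-east-1: 100ms left
remainingBudgetMs(300, 15);  // Mumbai → Mumbai POP: 285ms left
```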

The rule of thumb

If your origin is more than 100ms away from a meaningful fraction of your users, edge is worth considering for latency-sensitive features.

The second question: what does the feature need?

Edge compute has constraints. Know them before deciding.

Strong data gravity → origin

If your feature needs to read or write data that lives in a specific region, edge doesn't help unless you replicate the data to the edge (complicated, expensive).

Examples:

  • Writing a purchase to the primary orders DB
  • Reading a user's full profile (if it's only in one region)
  • Joining data across several tables in a SQL database

For these, edge adds a hop instead of removing one. Stay at origin.

Read-heavy, cachable → edge shines

If the feature is mostly reads of data that can be cached or replicated cheaply, edge is excellent.

Examples:

  • Personalization with precomputed user features stored in KV
  • Content rendering from a CMS (the classic edge use case)
  • Feature flag evaluation
  • Rate limiting, bot detection
  • A/B test assignment
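To make the read-heavy shape concrete, here's a sketch of feature-flag evaluation at the edge against a KV-replicated flag set. The `FLAGS` binding name, the flag schema, and the hash are all hypothetical; the `KVNamespace` type is declared locally so the sketch stands alone:

```typescript
// Minimal stand-in for a Workers-style KV binding.
type KVNamespace = { get(key: string): Promise<string | null> };

// Deterministic bucketing: hash userId into [0, 100) so the same user
// always lands in the same variant, with no origin round trip.
function bucket(userId: string): number {
  let h = 0;
  for (let i = 0; i < userId.length; i++) {
    h = (h * 31 + userId.charCodeAt(i)) >>> 0;
  }
  return h % 100;
}

async function isEnabled(
  flags: KVNamespace,
  flag: string,
  userId: string,
): Promise<boolean> {
  const raw = await flags.get(flag); // e.g. '{"rolloutPercent": 25}'
  if (!raw) return false; // unknown flag: fail closed
  const { rolloutPercent } = JSON.parse(raw) as { rolloutPercent: number };
  return bucket(userId) < rolloutPercent;
}
```

The same bucketing trick covers A/B test assignment: the variant is a pure function of the user ID, so no state is written at the edge.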

Compute-heavy, stateless → edge works

Stateless or low-state compute near the user is a good fit.

Examples:

  • Image transformations
  • Lightweight ML inference (classifiers, embeddings)
  • Input validation, geo-based routing
  • Request/response rewriting

Heavy compute or long-running → origin

Edge runtimes have strict limits (indicative numbers; providers revise these, so check current docs):

  • Cloudflare Workers: ~10ms CPU time per request on the free plan (up to 30 sec on paid plans), 128MB memory
  • Vercel Edge Functions: ~25 sec before the response must begin streaming
  • AWS Lambda@Edge: 5 sec and 128MB for viewer triggers; 30 sec (and more memory) for origin triggers

For anything longer-running or memory-intensive, you're at origin.

The third question: architecture complexity

Edge adds an explicit tier to your architecture. That means:

  • Another place bugs can hide
  • Another deploy pipeline
  • Another monitoring surface
  • Another runtime to maintain proficiency in
  • Potentially different language (JS/TS for Workers, Rust/Go for some others)

For low-traffic apps, this complexity isn't worth the latency win. For high-traffic apps with aggressive latency targets, it's essential.

⚠️ Edge debugging is harder

You can't ssh into an edge node. Logs are often sampled. Replay is difficult. Instrument heavily before deploying anything non-trivial.

The decision matrix

Scenario → recommendation:

  • Reading precomputed/cachable data, latency-sensitive → Edge
  • Writing to a consistent primary DB → Origin
  • A/B test assignment, feature flags → Edge
  • Complex SQL joins across large tables → Origin
  • Personalization with edge KV → Edge
  • Heavy ML inference (> 500ms CPU) → Origin
  • Lightweight ML inference (< 50ms) → Edge
  • Image / video transformation → Edge (CDN)
  • User auth (JWT verification) → Edge
  • User auth (session lookup in Redis) → Origin (or replicated sessions)
  • Rate limiting → Edge
  • Fraud detection requiring historical data → Origin
  • Form submission → Origin (usually)
  • Real-time chat / WebSockets → Origin (or Durable Objects)

Four patterns that actually work

From our engagements, these are the edge patterns with consistent ROI:

Pattern 1: Edge personalization with precomputed features

Problem: Every homepage visit hits a personalization service at origin, adding 150-300ms.

Solution:

  • Batch-compute user features nightly (or in real-time with streaming).
  • Replicate to edge KV storage (Cloudflare KV, Vercel Edge Config).
  • Edge Worker reads user features, applies scoring logic, returns personalized HTML.
  • Origin is consulted only for cold-start or explicit cache bust.
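The read path above can be sketched in a few lines, assuming a Workers-style runtime. The KV binding, the feature schema, and the scoring rule are all hypothetical:

```typescript
// Minimal stand-in for a Workers-style KV binding.
type KVNamespace = { get(key: string): Promise<string | null> };

// Hypothetical precomputed-feature schema, written nightly by the batch job.
interface UserFeatures { topCategories: string[]; visits: number }

// Pure scoring step: rank content modules by overlap with the user's
// precomputed interests. Runs in well under a millisecond at the edge.
function rankModules(modules: string[], features: UserFeatures): string[] {
  const score = (m: string) => (features.topCategories.includes(m) ? 1 : 0);
  return [...modules].sort((a, b) => score(b) - score(a));
}

async function personalizedOrder(
  kv: KVNamespace,
  userId: string,
  defaultOrder: string[],
): Promise<string[]> {
  const raw = await kv.get(`features:${userId}`);
  if (!raw) return defaultOrder; // cold start: serve default, let origin backfill
  return rankModules(defaultOrder, JSON.parse(raw) as UserFeatures);
}
```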

Outcome: p95 latency drops from ~300ms to ~30ms. Origin traffic drops 90%+. We did exactly this for a media platform; the case study is linked at the end of this post.

Pattern 2: Edge auth + origin data

Problem: Every API request hits origin to validate a session, even if validation is trivial.

Solution:

  • Use JWTs with short expiry (15 minutes).
  • Edge Worker validates the JWT signature (fast, ~1ms).
  • Only requests that need actual user data proceed to origin.
  • Public/cached endpoints bypass origin entirely.
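The gate decision can be sketched as below. Signature verification (crypto.subtle.verify against the issuer's public key in a real Worker) and base64url decoding are elided so the sketch stays self-contained; `payload` stands for the already-verified JWT claims, using the standard `exp`/`sub` names:

```typescript
// Hypothetical verified-claims shape (RFC 7519 names).
interface Claims { sub?: string; exp?: number }

type Decision =
  | { action: "reject"; status: 401 }
  | { action: "proceed-to-origin" }  // request needs real user data
  | { action: "serve-from-edge" };   // public/cached endpoint, origin skipped

// Decide at the edge whether a request ever reaches origin.
function gate(payload: Claims | null, nowSec: number, needsUserData: boolean): Decision {
  if (!payload || typeof payload.exp !== "number" || payload.exp <= nowSec) {
    return { action: "reject", status: 401 }; // invalid or expired: stop at the edge
  }
  return needsUserData ? { action: "proceed-to-origin" } : { action: "serve-from-edge" };
}
```

The short 15-minute expiry is what makes this safe: a stolen token has a small window, and revocation lag is bounded by the TTL.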

Outcome: Origin traffic reduced significantly. Geo-distributed auth means users see faster responses globally.

Pattern 3: Edge content + origin commerce

Problem: Product detail pages need to be fast globally, but checkout needs strong consistency.

Solution:

  • Product pages rendered at edge from cached product data.
  • Inventory/stock info cached with short TTL (30 seconds).
  • Checkout and purchases hit origin directly — consistency wins over latency.
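In practice this split mostly comes down to per-route Cache-Control headers set by the edge layer. A sketch with illustrative routes and TTLs (only the 30-second inventory TTL comes from the pattern above):

```typescript
// Per-route caching policy for the content/commerce split.
function cacheControlFor(path: string): string {
  if (path.startsWith("/checkout")) {
    return "no-store"; // consistency wins: never cache purchases
  }
  if (path.startsWith("/api/inventory")) {
    return "public, s-maxage=30, stale-while-revalidate=30"; // short stock TTL
  }
  if (path.startsWith("/products/")) {
    return "public, s-maxage=3600, stale-while-revalidate=86400"; // cached content
  }
  return "public, s-maxage=300"; // conservative default
}
```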

Outcome: Browsing feels instant worldwide; purchases are still reliable.

Pattern 4: Edge image transformation

Problem: Every image needs multiple sizes, formats (WebP, AVIF), and DPR variants. Pre-generating all combinations is wasteful.

Solution:

  • Store one high-res master image at origin.
  • Edge Worker transforms on-demand based on query params (?w=800&q=80&fm=webp).
  • CDN caches transformed variants at each edge POP.
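The caching-sensitive part of this pattern is parameter normalization: unless you clamp and snap transform params, ?w=799 and ?w=801 become two distinct cached variants. A sketch with hypothetical allow-lists; the actual pixel work would be delegated to the platform's image API:

```typescript
const WIDTHS = [320, 640, 800, 1280, 1920];      // allow-list bounds variant count
const FORMATS = new Set(["webp", "avif", "jpeg"]);

// `params` is the parsed query string, e.g. { w: "800", q: "80", fm: "webp" }.
// Returns a canonical variant key so each distinct output maps to one cache entry.
function normalizeVariant(params: Record<string, string | undefined>): string {
  const wRaw = Number(params.w ?? "800") || 800;
  // Snap to the nearest allowed width so w=799 and w=801 share a cache entry.
  const w = WIDTHS.reduce((best, cand) =>
    Math.abs(cand - wRaw) < Math.abs(best - wRaw) ? cand : best);
  const q = Math.min(100, Math.max(1, Number(params.q ?? "80") || 80));
  const fm = FORMATS.has(params.fm ?? "") ? (params.fm as string) : "webp";
  return `w=${w},q=${q},fm=${fm}`;
}
```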

Outcome: Minimal origin storage, fast delivery, huge flexibility.

When edge is wrong

Equally important: the cases where edge is the wrong answer even though it seems appealing.

  • "We want to be global." Geographic distribution of users isn't enough reason. You need a specific latency problem to solve.
  • "We want to cut costs." Edge can be cheaper, but it's not guaranteed. Count compute-time + per-invocation costs + data transfer carefully.
  • "We have a monolith and want to modernize." Edge is not a good first step in modernization. Get to a serviceable origin architecture first.
  • "Our users are complaining about speed." Sometimes the answer is a simpler page, better caching at origin, or a CDN without edge compute.

💡 Measure before migrating

Add real-user monitoring (RUM) to capture p50/p95 latency by region before deciding. If 90% of your users are in North America and your origin is in Virginia, edge won't move the needle.

Cost modeling

Rough comparison for 100M requests/month of moderate-complexity logic:

  • AWS Lambda (origin): $500-800
  • Cloudflare Workers (edge): $500-700
  • Vercel Edge Functions: $600-900
  • Traditional servers with CDN caching: $300-1000 (depends heavily on cache hit rate)

For write-heavy workloads, origin usually wins on cost. For read-heavy with high cache hit rates, edge is competitive or cheaper. Always factor in the hidden costs: engineering time, debugging, monitoring infrastructure.

The migration playbook

If you've decided edge is worth it:

  1. Start with one feature. Don't migrate your whole app. Pick one high-leverage, stateless-ish feature — often personalization or auth.
  2. Measure first. Baseline latency, cost, error rate.
  3. Shadow traffic. Run edge in parallel with origin, compare results, confirm correctness.
  4. Gradual rollout. 1% → 10% → 50% → 100% with monitoring at each step.
  5. Decommission the origin path only after edge has been stable for 2+ weeks.

Operational guardrails that prevent surprises

Before scaling edge workloads broadly, put these controls in place:

  • regional error-rate and latency dashboards with alert thresholds
  • explicit fallback paths to origin for each edge feature
  • deployment blast-radius controls (geo, traffic percentage, feature flags)
  • synthetic probes from key markets to validate user-path latency
  • weekly cost-per-request review to detect hidden drift
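The "explicit fallback path" guardrail, sketched: each edge feature is wrapped so that a failing or slow edge path degrades to origin rather than to a user-visible error. The timeout value is illustrative:

```typescript
type Fetcher = () => Promise<string>;

// Try the edge path with a hard deadline; on error or timeout, fall
// back to origin. Alert on the fallback rate in your dashboards.
async function withOriginFallback(
  edgePath: Fetcher,
  originPath: Fetcher,
  timeoutMs = 50,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<string>((_, reject) => {
    timer = setTimeout(() => reject(new Error("edge timeout")), timeoutMs);
  });
  try {
    // A slow edge path is as bad as a broken one, so race against the deadline.
    return await Promise.race([edgePath(), timeout]);
  } catch {
    return originPath();
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```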

Edge wins are real, but only when operational discipline matches architectural ambition.

Closing

Edge compute in 2026 is legitimate, production-ready, and sometimes transformative. It's also not a silver bullet. The best use is surgical: identify the handful of features where origin latency is a real user problem, move those specifically, leave the rest.

If you're not sure where edge fits for your app, start with the decision matrix above. Most features end up staying at origin — and that's fine.


Related resources

See our media edge case study, AWS cost optimization, and how we help with infrastructure optimization.
