
Fix TTFB on Docker: Cut Server Response Time on Containerized Apps

TTFB (Time to First Byte) problems in containerized applications are overwhelmingly infrastructure problems, not application problems. When a request hits your Docker-hosted service and the browser waits 800ms for the first byte, the culprit is almost never slow business logic. It is a fat base image extending cold-start time, a missing startup probe sending traffic to a JIT-compiling runtime, a bridge network adding unnecessary kernel namespace hops, or an autoscaler that let every replica shut down during a quiet period. This guide addresses each of those failure modes in sequence, with real Dockerfile configurations, Kubernetes manifests, and reverse proxy configs. Following all five steps has reduced measured TTFB from 1.4s to under 120ms in production Node.js and Go services running on Kubernetes, Cloud Run, and ECS Fargate.

Expected results

These are representative before/after measurements from a Node.js 20 Express service deployed on Kubernetes (GKE), measured with WebPageTest from a US-East origin against a US-East cluster:

Before: 1.4s TTFB (Poor) — full node:20 image, no startup probe, bridge networking, no reverse proxy, no resource limits

After: 80ms TTFB (Good) — node:20-slim, multi-stage build, startupProbe, Nginx HTTP/2, resource requests set

TL;DR — four highest-impact changes:
  • Switch base image from node:20 to node:20-slim or gcr.io/distroless/nodejs20-debian12.
  • Add a Kubernetes startupProbe so traffic waits for genuine readiness.
  • Put Nginx or Caddy in front for HTTP/2 termination and upstream keepalive.
  • Set min-instances=1 (Cloud Run) or non-zero desiredCount (ECS) to eliminate cold-start TTFB.

Common causes of high TTFB in Docker deployments

Before running a profiler on your application code, audit the container configuration. These infrastructure-layer causes account for the majority of TTFB regressions in containerized workloads:

  • Oversized base images and slow cold starts. The full node:20 image is approximately 350MB compressed. On a new node or scale-out event, pulling and extracting 350MB of layers before the runtime even starts adds hundreds of milliseconds to the first request served by that replica. Slim and distroless images cut this to 60-80MB.
  • Missing startup probes sending traffic to warming runtimes. Node.js (V8), the JVM, and other runtimes with JIT compilers need seconds of warm-up before reaching peak throughput. Without a startupProbe, Kubernetes marks a container ready as soon as the process starts, before V8 or the JVM has compiled hot paths, so the first 10-20 requests hit interpreted code and respond 3-5x slower than steady-state.
  • Bridge networking overhead. Docker's default bridge network routes packets through a virtual Ethernet bridge and NAT layer. For high-throughput services, switching to host networking or Kubernetes hostNetwork: true reduces per-packet latency by eliminating the kernel namespace translation for each forwarded packet.
  • No reverse proxy for HTTP/2 and connection management. If your application container speaks HTTP/1.1 directly to the internet, requests on each browser connection are handled one at a time per TCP connection. A reverse proxy handles HTTP/2 multiplexing externally and uses persistent keepalive connections to the upstream container, amortizing TCP and TLS overhead across many requests.
  • Absent or misconfigured CPU and memory requests and limits. Containers without resources.requests set are placed on nodes by the scheduler with no guarantee of CPU availability. Under burst conditions, the Linux CFS scheduler hands the contended CPU to containers that did declare requests; the starved container's server processing time rises and so does its TTFB.
  • Autoscaler cold starts on serverless container platforms. Cloud Run, ECS Fargate, and similar platforms scale to zero by default. A replica that starts from zero requires a full container pull, process initialization, and runtime warm-up before it can serve the triggering request. That first request routinely measures 2-5s TTFB regardless of application code speed.
  • Inefficient layer ordering causing cache misses on every build. If COPY . . appears before RUN npm ci in a Dockerfile, any source-file change invalidates the dependency layer and forces a full npm ci on every build. Every deploy then waits on a multi-minute rebuild instead of reusing the cached layer from the previous build, which slows rollouts and hotfixes.

Step-by-step fix

Step 1: Use a slim base image with multi-stage builds

The single highest-leverage Dockerfile change is replacing node:20 with node:20-slim (Debian-based, no build tools, ~80MB) or gcr.io/distroless/nodejs20-debian12 (no shell, no package manager, ~65MB). Combine this with a multi-stage build so build-time dependencies (TypeScript compiler, devDependencies, bundler) never ship in the production image.

Alpine-based images (node:20-alpine) are even smaller (~50MB) but use musl libc instead of glibc. Prebuilt native Node.js add-ons compiled against glibc will fail to load or crash at runtime on musl. If your service has no native add-ons, Alpine is safe and fast; otherwise prefer node:20-slim.
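One quick way to check, assuming node_modules is installed locally: native add-ons ship compiled .node binaries, so an empty result from the search below suggests Alpine is safe.
Shell — detect native add-ons before choosing Alpine
# Any hit here is a native add-on; if it was prebuilt against glibc,
# it will not load on a musl-based (Alpine) image
find node_modules -type f -name "*.node" | head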

Layer order matters for cache hits. Always copy package.json and package-lock.json before copying source files. If source changes but package.json does not, Docker reuses the cached npm ci layer and the build completes in seconds instead of minutes.
Dockerfile — Before (single-stage, full image)
# Bad: full image, no layer ordering, devDeps ship to prod
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD ["node", "dist/server.js"]
Dockerfile — After (multi-stage, slim runtime)
# Stage 1: build
FROM node:20-slim AS builder
WORKDIR /app

# Copy manifests first — cached unless deps change
COPY package.json package-lock.json ./
RUN npm ci --include=dev

# Copy source and compile TypeScript
COPY tsconfig.json ./
COPY src ./src
RUN npx tsc --outDir dist

# Stage 2: production runtime
FROM node:20-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production

# Copy only production deps and compiled output
COPY package.json package-lock.json ./
RUN npm ci --omit=dev --ignore-scripts \
    && npm cache clean --force

COPY --from=builder /app/dist ./dist

# Run as non-root for least privilege
RUN addgroup --system appgroup \
    && adduser --system --ingroup appgroup appuser
USER appuser

EXPOSE 3000
CMD ["node", "dist/server.js"]

This pattern reduces a typical Node.js image from ~480MB to ~95MB and brings cold-start time down by roughly 60% on a fresh node. The production stage has no TypeScript compiler, no devDependencies, and no build-time secrets.

Step 2: Enable Docker BuildKit cache mounts

Docker BuildKit (enabled by default in Docker 23+ and required for docker buildx) supports cache mounts that persist build-time directories between builds on the same host. This is most impactful for npm ci and equivalent package manager operations, which re-download packages from the network on every build without a cache mount.

Dockerfile — BuildKit cache mount for npm
# syntax=docker/dockerfile:1.7
FROM node:20-slim AS deps
WORKDIR /app
COPY package.json package-lock.json ./

# --mount=type=cache persists ~/.npm between builds
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev

FROM node:20-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY dist ./dist
USER 1000
EXPOSE 3000
CMD ["node", "dist/server.js"]
Shell — build with BuildKit and inline cache export
# Build with BuildKit and export layer cache to registry
docker buildx build \
  --cache-from type=registry,ref=my-registry/my-app:cache \
  --cache-to   type=registry,ref=my-registry/my-app:cache,mode=max \
  --tag my-registry/my-app:latest \
  --push \
  .

# On CI (GitHub Actions), export cache to gha backend
docker buildx build \
  --cache-from type=gha \
  --cache-to   type=gha,mode=max \
  --tag my-registry/my-app:latest \
  --push \
  .

On a persistent builder, cache mounts bring build times for a typical 150-dependency Node.js service from 4-6 minutes to under 90 seconds on cache hits; on ephemeral CI runners the registry or gha cache exports above do the same job, since --mount=type=cache only persists on the host that ran the build. Faster CI means faster deploys and fewer windows where a misbehaving image is in flight.

Step 3: Add a healthcheck and configure Kubernetes probes

A container that is "running" is not necessarily "ready." Node.js applications routinely require 3-8 seconds to load configuration, open database connection pools, register routes, and pre-compile hot code paths. The Dockerfile HEALTHCHECK instruction and Kubernetes probe configuration are the mechanisms that communicate readiness to the scheduler and load balancer.

The startupProbe, introduced in Kubernetes 1.18, is specifically designed for slow-starting containers. It disables the livenessProbe until the startup check passes, giving the container a window to initialize without being killed for missing a liveness deadline. Set failureThreshold * periodSeconds to match the worst-case startup time you have measured in staging.

Dockerfile — HEALTHCHECK instruction
FROM node:20-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY dist ./dist
USER 1000
EXPOSE 3000

# Lightweight healthcheck endpoint — no DB call, just process liveness
HEALTHCHECK --interval=10s --timeout=3s --start-period=15s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', r => { \
    process.exit(r.statusCode === 200 ? 0 : 1) })" || exit 1

CMD ["node", "dist/server.js"]
YAML — Kubernetes deployment with startupProbe and readinessProbe
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          # startupProbe: allows up to 60s for the container to start
          # before the readinessProbe takes over
          startupProbe:
            httpGet:
              path: /health
              port: 3000
            failureThreshold: 30   # 30 * 2s = 60s max startup window
            periodSeconds: 2
          # readinessProbe: gates traffic to this pod
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 0
            periodSeconds: 5
            failureThreshold: 3
          # livenessProbe: restarts unhealthy pods
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 0
            periodSeconds: 10
            failureThreshold: 3

The /health endpoint in your application should respond to GET with HTTP 200 in under 5ms. Do not query the database in a healthcheck — a slow database should trigger alerting, not cause Kubernetes to cycle your pods and spike TTFB for healthy replicas during the restart window.
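A quick way to sanity-check that budget from inside a running pod: slim and distroless images usually have no curl, but node is always present. A minimal sketch, assuming the my-app Deployment from the manifest above in the current namespace.
Shell — time the /health endpoint from inside the pod
kubectl exec deploy/my-app -- node -e \
  "const t = Date.now(); require('http').get('http://localhost:3000/health', r => { r.resume(); r.on('end', () => console.log('health: ' + (Date.now() - t) + 'ms')); })"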

Step 4: Put a reverse proxy in front for HTTP/2, buffering, and keepalive

Containerized application servers — Express, Fastify, Gin, Actix — are built for performance but are not optimized for the tasks a reverse proxy handles well: TLS termination, HTTP/2 and HTTP/3 negotiation, slow-client buffering, static asset serving, Brotli/gzip compression, and connection keepalive to upstream. Delegating those responsibilities to Nginx, Traefik, or Caddy reduces TTFB and frees application threads for actual business logic.

Nginx is the most widely deployed and best-documented option. Traefik integrates natively with Docker and Kubernetes and auto-discovers services via labels or CRDs. Caddy provides automatic HTTPS (including HTTP/3 via QUIC) with zero configuration and is a strong choice for teams who want a working TLS reverse proxy in under 10 lines of config.

nginx.conf — HTTP/2, upstream keepalive, gzip
upstream app {
    server app:3000;
    # Keep 32 connections open to the upstream container
    keepalive 32;
}

server {
    listen 443 ssl;
    http2 on;
    server_name example.com;

    ssl_certificate     /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Brotli preferred, gzip fallback. The brotli directives require the
    # ngx_brotli module, which stock nginx images do not ship with;
    # remove them or use an image built with the module.
    brotli on;
    brotli_comp_level 4;
    brotli_types text/plain text/css application/json application/javascript;
    gzip on;
    gzip_types text/plain text/css application/json application/javascript;
    gzip_comp_level 4;

    location / {
        proxy_pass         http://app;
        proxy_http_version 1.1;
        proxy_set_header   Connection "";         # Enable upstream keepalive
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;
        proxy_read_timeout 30s;
        proxy_buffering    on;
        proxy_buffer_size  16k;
        proxy_buffers      8 16k;
    }
}
Caddyfile — automatic HTTPS + HTTP/3 (simplest option)
example.com {
    # Caddy provisions TLS automatically via ACME
    # HTTP/3 (QUIC) is enabled by default in Caddy 2.7+
    reverse_proxy app:3000 {
        # Keep connections open to the upstream
        transport http {
            keepalive 30s
            keepalive_idle_conns 32
        }
        health_uri   /health
        health_interval 10s
    }
    encode gzip zstd
}
docker-compose.yml — Nginx + app with bridge network
version: "3.9"
services:
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "443:443"
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      app:
        condition: service_healthy

  app:
    image: my-registry/my-app:latest
    expose:
      - "3000"
    environment:
      NODE_ENV: production
    healthcheck:
      test: ["CMD", "node", "-e",
             "require('http').get('http://localhost:3000/health',
             r=>{process.exit(r.statusCode===200?0:1)})"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 15s

The depends_on with condition: service_healthy ensures Nginx only starts routing after the application container has passed its healthcheck. This prevents the "502 Bad Gateway" window that users see when a deployment races the container startup.

Step 5: Tune resource limits, networking, and cold-start mitigation on serverless platforms

The final layer of TTFB optimization addresses the environment your containers run in: resource allocation, network topology, JIT warm-up, sticky sessions, and cold-start configuration on Cloud Run, ECS Fargate, and other managed container platforms.

Resource requests and limits. Setting resources.requests in Kubernetes guarantees the scheduler places your pod on a node with available CPU before the pod becomes ready. Without a CPU request, the pod falls into the BestEffort QoS class and its cgroup receives minimal CPU weight, so under burst load the CFS (Completely Fair Scheduler) favors containers that did declare requests and your server processing time increases. Set CPU requests to match your p95 steady-state usage; set limits to your p99 peak.
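To ground those numbers, observe steady-state usage before committing values to the manifest. A minimal sketch, assuming metrics-server is installed and the pods carry an app=my-app label:
Shell — observe actual usage before setting requests and limits
# Per-container CPU and memory right now (requires metrics-server)
kubectl top pod -l app=my-app --containers

# Re-run during a representative load test, then set requests near the
# observed p95 and limits near the p99 peak
watch -n 5 kubectl top pod -l app=my-app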

Host vs. bridge networking. Docker's bridge network adds a virtual Ethernet interface and NAT for each container. For services where TTFB is dominated by network processing (not application logic), hostNetwork: true in Kubernetes or --network=host in Docker reduces this overhead. The trade-off is loss of network isolation and potential port conflicts, so this is appropriate for infrastructure-tier services (proxies, load balancers) rather than application containers.
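For a plain Docker host the comparison is a single flag. A minimal sketch; with --network=host the container binds host ports directly, so -p mappings are ignored and port 3000 must be free on the host.
Shell — bridge vs. host networking in plain Docker
# Default bridge network: traffic crosses a veth pair and a NAT layer
docker run -d -p 3000:3000 --name app-bridge my-registry/my-app:latest

# Host network: the container shares the host's network stack, no NAT hop
docker run -d --network=host --name app-host my-registry/my-app:latest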

JIT runtime warm-up. Node.js (V8) and JVM runtimes perform significantly better once their JIT compilers have seen hot code paths. On Kubernetes, you can pre-warm a new pod with a warm-up sidecar that fires synthetic traffic at the application over the pod's shared localhost as soon as it starts; note that a regular init container cannot do this, because init containers must exit before the application container is allowed to start. On Cloud Run, enable CPU always-allocated mode so the runtime remains warm between requests.

YAML — Kubernetes warm-up sidecar (native sidecar, Kubernetes 1.28+)
# restartPolicy: Always turns this init container into a native sidecar:
# it starts before the app container and keeps running alongside it,
# unlike a regular init container, which must exit before the app starts.
initContainers:
  - name: warmup
    image: curlimages/curl:8.7.1
    restartPolicy: Always
    command:
      - sh
      - -c
      - |
        # Containers in a pod share the network namespace, so
        # localhost:3000 reaches the app container directly.
        # Wait for the app to pass its healthcheck, then fire
        # 50 warm-up requests to prime the V8 JIT.
        until curl -sf http://localhost:3000/health; do sleep 1; done
        for i in $(seq 1 50); do
          curl -sf http://localhost:3000/api/v1/status > /dev/null
        done
        # Stay alive so the kubelet does not restart the sidecar in a loop
        while true; do sleep 3600; done

Cold starts on Cloud Run. Cloud Run scales to zero by default. Set --min-instances=1 to keep at least one instance warm. Combine this with --concurrency matching your server's thread pool size so a single instance absorbs short traffic bursts before a scale-out is triggered.

Shell — Cloud Run deployment with min-instances and region selection
# --min-instances=1 keeps one instance warm so requests never wait on a
# cold start; --cpu-boost grants extra CPU during container startup;
# --concurrency is how many requests one instance absorbs before a
# scale-out is triggered. Pick the region closest to your users.
gcloud run deploy my-app \
  --image=my-registry/my-app:latest \
  --region=us-central1 \
  --min-instances=1 \
  --max-instances=20 \
  --concurrency=80 \
  --cpu=1 \
  --memory=512Mi \
  --cpu-boost \
  --port=3000 \
  --set-env-vars="NODE_ENV=production"

Cold starts on ECS Fargate. Set your ECS service's desiredCount to at least 1 and configure Application Auto Scaling with a MinCapacity of 1. Enable ECS Service Connect or AWS App Mesh for service-to-service traffic to avoid DNS resolution overhead on each request.

JSON — ECS service and auto-scaling settings (illustrative; the scaling policy itself is registered separately via Application Auto Scaling)
{
  "ServiceName": "my-app",
  "DesiredCount": 2,
  "DeploymentConfiguration": {
    "MinimumHealthyPercent": 100,
    "MaximumPercent": 200
  },
  "AutoScalingPolicy": {
    "MinCapacity": 1,
    "MaxCapacity": 20,
    "TargetTrackingScalingPolicy": {
      "TargetValue": 60.0,
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    }
  },
  "HealthCheckGracePeriodSeconds": 30
}

Sticky sessions. When your application maintains in-memory state (WebSocket connections, server-side sessions), routing different requests from the same user to different replicas forces repeated cold-path initialization. Enable sticky sessions via an Application Load Balancer (ALB) target group's stickiness configuration or via a Kubernetes Service's sessionAffinity: ClientIP. Note that sticky sessions reduce autoscaling effectiveness; they are most appropriate when the statefulness cannot be externalized to Redis or similar.
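On an ALB, stickiness is a target-group attribute. A minimal AWS CLI sketch, with the target group ARN left as a placeholder variable:
Shell — enable ALB target-group stickiness
# $TG_ARN is a placeholder for your target group's ARN
aws elbv2 modify-target-group-attributes \
  --target-group-arn "$TG_ARN" \
  --attributes \
    Key=stickiness.enabled,Value=true \
    Key=stickiness.type,Value=lb_cookie \
    Key=stickiness.lb_cookie.duration_seconds,Value=3600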

Region selection. For TTFB, geography is physics. A container running in us-east-1 will always respond faster to a browser in New York than one in ap-southeast-1, regardless of optimization. Use CloudFront, Cloud CDN, or Cloudflare in front of your containers for global traffic, with containers deployed in the region(s) where the majority of users are located. See the TTFB fix for Cloudflare Pages for CDN-layer TTFB considerations that complement container-layer fixes.

Verification

Measure TTFB at multiple points in the stack to isolate which layer is responsible:

  • curl TTFB measurement: Use curl -o /dev/null -s -w "%{time_starttransfer}\n" to measure TTFB from the host running curl to your container. Compare measurements taken from inside the cluster (via kubectl exec into a pod) with measurements taken from outside to separate network latency from server processing time.
  • WebPageTest: Run a test from the WebPageTest public instance in the region closest to your users. The Waterfall view shows TTFB as the time from request sent to the first byte of response. Run three consecutive tests and compare median to identify cache-miss cold starts.
  • Kubernetes pod startup time: Run kubectl describe pod <pod-name> and look at the "Started" and "Ready" timestamps; the gap is your initialization time (see the jsonpath snippet after the curl examples below). If it exceeds 10 seconds, investigate what the application is doing during startup.
  • Container image size audit: Run docker image inspect my-app:latest --format '{{.Size}}' and docker history my-app:latest to identify large layers. The dive CLI tool provides an interactive layer-by-layer breakdown.
Shell — measure TTFB with curl
# Measure TTFB (time_starttransfer) with a cold TCP connection
curl -o /dev/null -s \
  -w "dns:%{time_namelookup}s  tcp:%{time_connect}s  tls:%{time_appconnect}s  ttfb:%{time_starttransfer}s  total:%{time_total}s\n" \
  https://example.com/

# Run 10 times and take the 5th-lowest value as an approximate median
for i in $(seq 1 10); do
  curl -o /dev/null -s \
    -w "%{time_starttransfer}\n" \
    https://example.com/
done | sort -n | awk 'NR==5{print "Approx. median TTFB: " $1 "s"}'
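The start-to-ready gap mentioned in the verification list can also be read directly from the pod status instead of eyeballing kubectl describe output. A minimal sketch, with <pod-name> as a placeholder:
Shell — pod start-to-ready gap (initialization time)
# startTime is when the pod started; the Ready condition's transition time
# is when it began receiving traffic. The difference covers image pull,
# process start, and probe-gated warm-up.
kubectl get pod <pod-name> -o jsonpath='{.status.startTime}{"  ->  "}{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}'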

Quick checklist

  • Base image is node:20-slim, node:20-alpine, or distroless
  • Dockerfile uses multi-stage build; devDependencies excluded from production stage
  • COPY package.json package-lock.json appears before COPY src
  • BuildKit cache mounts enabled for package manager directories
  • Kubernetes startupProbe configured with appropriate failureThreshold
  • /health endpoint responds in under 5ms with no database call
  • Nginx, Traefik, or Caddy terminates TLS and HTTP/2 in front of application container
  • CPU and memory requests set in Kubernetes deployment spec
  • Cloud Run: --min-instances=1 set; ECS: MinCapacity is at least 1
  • Containers deployed in the region geographically nearest to primary user base

Common pitfalls

  • Healthcheck endpoint calls the database. This couples TTFB routing decisions to database health. A slow query in the healthcheck causes Kubernetes to mark a healthy pod as unready and route traffic to the remaining replicas, increasing their load and making the problem cascade. Keep /health strictly process-level: check that the event loop is responsive, not that downstream services are.
  • Setting memory limits too low for the JIT compiler. Node.js V8's JIT compiler requires memory headroom above the application heap. Setting a memory limit at or below the RSS the application uses at startup leaves no room for JIT-compiled code to be cached, causing V8 to fall back to interpreted execution and tripling CPU time per request.
  • Using CMD ["npm", "start"] instead of CMD ["node", "dist/server.js"]. Launching Node.js through npm spawns two processes: the npm shell wrapper and Node.js itself. The wrapper does not reliably forward SIGTERM to the child process, which breaks graceful shutdown and forces Kubernetes to wait out the full terminationGracePeriodSeconds before sending SIGKILL, slowing every rollout.
  • Forgetting proxy_http_version 1.1 and proxy_set_header Connection "" in Nginx. Without these two directives, Nginx uses HTTP/1.0 to talk to upstream, which does not support keepalive. Every proxied request opens a new TCP connection to the application container, adding 20-100ms of TCP handshake overhead per request.
  • Enabling sticky sessions without a session-store timeout. Sticky sessions concentrate traffic on a small subset of replicas when user sessions are long-lived. Autoscaling responds to aggregate CPU across all replicas, so a few hot replicas can saturate while others are idle. If your application needs stickiness, set aggressive session expiry and test the scaling behavior under sustained load.

Frequently asked questions

Why is TTFB high on a containerized app when the application code itself is fast?

Container overhead is the most common culprit when application logic is not the bottleneck. Fat base images increase cold-start time, missing startup probes send traffic to containers that are still JIT-compiling, bridge networking adds kernel namespace overhead compared to host networking, and absent resource limits cause CPU throttling under burst load. Fixing the container configuration often cuts TTFB by 200-600ms without changing a single line of application code. Use the complete TTFB guide to understand which layer to instrument first.

Which base image gives the best cold-start time and TTFB?

For the smallest runtime footprint, gcr.io/distroless/nodejs20-debian12 is the leanest option at around 65MB compressed, but it has no shell, which complicates debugging. node:20-slim (around 80MB) is the best practical default: it includes only the OS packages Node requires, has a shell for exec-based debugging, and starts measurably faster than the full node:20 image (around 350MB). node:20-alpine is smaller still but musl libc can cause runtime crashes with native add-ons compiled against glibc.

How does a startupProbe reduce TTFB?

The startupProbe runs only at container startup and gates when Kubernetes begins calling the readinessProbe. Setting a generous failureThreshold (e.g., 30) with a short periodSeconds (e.g., 2s) means the scheduler waits up to 60 seconds for the container to pass, but checks every 2 seconds and adds it to the load balancer the moment it responds. Without a startupProbe, Kubernetes uses the readinessProbe alone from process start, which often produces either premature traffic routing or excessively long waits — both show up as TTFB spikes in real-user monitoring.

Does putting Nginx in front of the application container actually reduce TTFB?

Yes, in several ways. Nginx terminates HTTP/2 and HTTP/3, so the browser gets multiplexed streams even if your app only speaks HTTP/1.1. Nginx buffers slow POST bodies from clients before forwarding to upstream, preventing application threads from blocking on slow uploads. Nginx can serve gzip and Brotli compressed responses without increasing application CPU usage. And Nginx keepalive to upstream reduces TCP handshake overhead on repeated requests from the same browser. The net effect on p95 TTFB is typically 40-150ms depending on connection patterns. For a static-asset perspective, see the server response TTFB fix.

How do I eliminate cold-start TTFB on Cloud Run or ECS Fargate?

The most direct mitigation is minimum instance configuration: set --min-instances=1 on Cloud Run or configure ECS service MinCapacity to never scale to zero. Combine this with a lightweight HTTP healthcheck endpoint that responds in under 5ms so the container passes readiness quickly after a scale-out event. On Cloud Run specifically, enable --cpu-boost to allocate extra CPU during container startup, and set --concurrency to match your application's thread count. If you need serverless-scale-to-zero economics, consider pre-warming via a scheduled Cloud Scheduler job that pings your service every 5 minutes during expected traffic windows. For edge-layer TTFB reduction that complements container warm-up, see the edge functions TTFB fix.
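A minimal sketch of that scheduled ping; the service URL and region are placeholders, and a private Cloud Run service would additionally need an OIDC service account configured on the job.
Shell — Cloud Scheduler warm-up ping every 5 minutes
gcloud scheduler jobs create http warm-my-app \
  --schedule="*/5 * * * *" \
  --uri="https://my-app-example.a.run.app/health" \
  --http-method=GET \
  --location=us-central1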
