Video Tutorial

Real User Monitoring (RUM) Setup: Collect Web Vitals in Production

Video coming soon.

Real User Monitoring (RUM) is the only way to know exactly what Core Web Vitals your visitors experience. Lab tools like Lighthouse run in a controlled environment with a fixed network throttle and a single device profile. They are invaluable for debugging, but they cannot represent the full variance of real-world conditions: the user on a mid-range Android phone in a rural area with a congested LTE connection, the visitor loading your page from a corporate network proxy, or the return visitor whose browser cache is cold after a week away. Only field data collected from actual page loads captures all of that.

This tutorial walks through every layer of a production RUM pipeline: instrumenting your JavaScript with the web-vitals library, delivering metric payloads reliably with the Beacon API, building a lightweight ingest endpoint, aggregating to the 75th percentile that Google uses for ranking, and visualizing results in a dashboard with alerting for regressions. By the end you will have a complete, self-hosted RUM stack that feeds directly into your team's observability workflow.

Pipeline overview: Browser (web-vitals lib, sendBeacon) → Ingest API (POST /api/rum, validate + enrich) → Database (Postgres / BigQuery, p75 aggregation) → Dashboard (time-series charts) → Alerts

Step-by-step walkthrough

Step 1: Install the web-vitals npm package

The web-vitals library is the official JavaScript implementation of Core Web Vitals measurement, maintained by Google Chrome engineers. It weighs approximately 2KB gzipped and measures LCP, CLS, INP, FCP, and TTFB via PerformanceObserver, matching the metric definitions and thresholds reported by CrUX as closely as the web platform allows. Installing it ensures your RUM data uses the same thresholds and timing logic as the field data that feeds into Google Search ranking.

Add it as a production dependency -- not a dev dependency -- since it runs in the user's browser, not in your build pipeline. If you are using a module bundler like webpack, Rollup, or Vite, import only the callbacks you need to keep the bundle as small as possible. If your site does not use a bundler, load the ESM module directly from a CDN such as jsDelivr.

Shell -- install via npm or CDN
# Install as a production dependency
npm install web-vitals

# Or via CDN (no bundler needed)
# Add to your HTML:
# <script type="module">
#   import { onLCP, onCLS, onINP } from
#     'https://cdn.jsdelivr.net/npm/web-vitals@4/dist/web-vitals.js';
# </script>

# Verify the installed version
npm list web-vitals
# web-vitals@4.x.x
Note: Use web-vitals v4 or later. Version 3 introduced INP as a stable metric, and v4 adds extended attribution data helpful for debugging slow interactions.

Step 2: Add callbacks for LCP, CLS, and INP

Each metric function accepts a callback that fires when the metric value is finalized. LCP is finalized once the user interacts with the page (a keypress, click, or scroll) or the page is hidden, since user input stops new largest-contentful-paint candidates from being reported. CLS fires when the page is hidden or unloaded, accumulating all layout shifts up to that point. INP also fires when the page is hidden or unloaded, reporting the slowest interaction observed during the session (technically a high percentile of interactions on pages with many of them).

The callback receives a Metric object with five key fields: name (the metric identifier), value (the raw measurement in milliseconds or unitless score), rating (good, needs-improvement, or poor), delta (the change since the last report -- useful if a callback fires multiple times), and id (a unique string per page session used for deduplication). Register all five metrics even if your dashboard initially focuses on the three Core Web Vitals, since FCP and TTFB provide valuable diagnostic context.

JavaScript -- register metric callbacks
// Standard build:
// import { onLCP, onCLS, onINP, onFCP, onTTFB } from 'web-vitals';

// Attribution build: same callbacks plus debug data (import one or the other)
import {
  onLCP,
  onCLS,
  onINP,
  onFCP,
  onTTFB
} from 'web-vitals/attribution';

function handleMetric(metric) {
  console.log(metric.name, {
    value: metric.value,        // e.g. 2340 (ms for LCP)
    rating: metric.rating,     // 'good' | 'needs-improvement' | 'poor'
    delta: metric.delta,       // change since last callback invocation
    id: metric.id,             // unique per page session
    navigationType: metric.navigationType, // 'navigate' | 'reload' | 'back-forward'
    attribution: metric.attribution,       // debug info (element, url, etc.)
  });
  sendToRUM(metric);
}

onLCP(handleMetric);
onCLS(handleMetric);
onINP(handleMetric);
onFCP(handleMetric);
onTTFB(handleMetric);

Step 3: Use sendBeacon for reliable delivery

Most Core Web Vitals metrics fire during or just before page unload -- specifically when the page transitions to a hidden visibility state. A standard fetch call during unload may be cancelled by the browser before it completes. The Beacon API (navigator.sendBeacon) was designed precisely for this scenario: it queues the request asynchronously and the browser commits to sending it even after the page is gone, without blocking the unload process.

Always check for Beacon API support and fall back to fetch with keepalive: true for browsers that lack support (primarily older iOS Safari versions). Batch multiple metrics into a single beacon if they arrive within a short window to reduce the number of network round trips from high-traffic pages. Keep your payload compact -- include only the fields you actually use in your aggregation pipeline.

JavaScript -- sendBeacon with fetch fallback
function sendToRUM(metric) {
  const payload = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    delta: metric.delta,
    id: metric.id,
    navigationType: metric.navigationType,
    // Enrich with page context
    url: location.href,
    referrer: document.referrer,
    timestamp: Date.now(),
    // Connection info (when available)
    effectiveType: navigator.connection?.effectiveType ?? null,
    deviceMemory: navigator.deviceMemory ?? null,
  });

  const endpoint = '/api/rum';

  // Preferred: the browser commits to sending the beacon even after unload.
  // sendBeacon returns false if the payload could not be queued.
  if (navigator.sendBeacon && navigator.sendBeacon(endpoint, payload)) {
    return;
  }

  // Fallback: keepalive keeps the request alive after navigation
  fetch(endpoint, {
    method: 'POST',
    body: payload,
    keepalive: true,
    headers: { 'Content-Type': 'application/json' },
  });
}
Tip: navigator.sendBeacon has a payload limit (typically 64KB). For most RUM payloads this is not a concern, but avoid including full stack traces or large attribution objects in the beacon body.
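
The function above sends one beacon per metric. A minimal batching sketch -- assuming your ingest endpoint also accepts a JSON array of events -- queues metrics and flushes them together when the page is hidden:

JavaScript -- batch metrics into one beacon (sketch)
const queue = [];

function queueMetric(metric) {
  queue.push({
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    delta: metric.delta,
    id: metric.id,
    url: location.href,
    timestamp: Date.now(),
  });
}

function flushQueue() {
  if (queue.length === 0) return;
  const body = JSON.stringify(queue.splice(0, queue.length));
  if (!(navigator.sendBeacon && navigator.sendBeacon('/api/rum', body))) {
    fetch('/api/rum', { method: 'POST', body, keepalive: true });
  }
}

// visibilitychange to 'hidden' is the most reliable flush point;
// pagehide covers older browsers that skip it
addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') flushQueue();
});
addEventListener('pagehide', flushQueue);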

Step 4: Build a simple ingest API

The ingest endpoint is the server-side receiver for your RUM beacons. It needs to be fast (respond within a few milliseconds so the Beacon API call does not add perceived latency), reliable (should not drop data under high concurrency), and idempotent (safe to receive the same metric twice). Keep this endpoint minimal -- its only job is to validate, enrich, and write to storage. Heavy aggregation should happen asynchronously downstream.

Validate the incoming payload against an allowlist of known metric names (LCP, CLS, INP, FCP, TTFB) and expected value ranges before writing. Enrich with server-side context that the browser cannot provide accurately: the canonical page path (strip query strings and fragments for grouping), a normalized device category inferred from the User-Agent header, and a geographic region from a GeoIP lookup if your stack supports it.

JavaScript -- Node.js / Express ingest endpoint
// api/rum.js (Express or Vercel Edge Function)
const VALID_METRICS = new Set(['LCP', 'CLS', 'INP', 'FCP', 'TTFB']);
const MAX_VALUE = { LCP: 60000, CLS: 5, INP: 60000, FCP: 30000, TTFB: 30000 };

export default async function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).end();
  }

  let data;
  try {
    data = typeof req.body === 'string' ? JSON.parse(req.body) : req.body;
  } catch {
    return res.status(400).json({ error: 'Invalid JSON' });
  }

  const { name, value, rating, delta, id, url, timestamp } = data;

  // Validate
  if (!VALID_METRICS.has(name)) return res.status(400).end();
  if (typeof value !== 'number' || value < 0 || value > MAX_VALUE[name]) {
    return res.status(400).end();
  }

  // Normalize page path for grouping
  let pagePath = '/unknown';
  try {
    pagePath = new URL(url).pathname.replace(/\/$/, '') || '/';
  } catch { /* ignore invalid URLs */ }

  // Infer device category from User-Agent (simplified)
  const ua = req.headers['user-agent'] ?? '';
  const device = /mobile|android|iphone/i.test(ua) ? 'mobile' : 'desktop';

  // Write to your database (pseudo-code). Upsert so a repeat report for
  // the same metric keeps the latest value (see Common issues below)
  await db.query(
    `INSERT INTO rum_events
       (metric_id, metric_name, value, rating, delta, page_path, device, recorded_at)
     VALUES ($1, $2, $3, $4, $5, $6, $7, to_timestamp($8 / 1000.0))
     ON CONFLICT (metric_id, metric_name)
     DO UPDATE SET value = EXCLUDED.value, rating = EXCLUDED.rating,
                   delta = EXCLUDED.delta`,
    [id, name, value, rating, delta, pagePath, device, timestamp]
  );

  // Respond 204 No Content -- no body needed
  res.status(204).end();
}

Step 5: Aggregate to p75 by page and device

Raw RUM data is only useful once aggregated. A single slow page load does not indicate a systemic problem, but a consistent pattern across hundreds of sessions does. Google evaluates Core Web Vitals at the 75th percentile of all page loads for a URL-device combination, so your aggregation query must match this methodology to produce numbers that correlate with your CrUX field data and Search Console reports.

Run the aggregation over a rolling 28-day window to match the CrUX collection window. Group by metric name, page path, and device category at minimum. The query below works in PostgreSQL using the standard PERCENTILE_CONT function. For BigQuery, replace with APPROX_QUANTILES(value, 100)[OFFSET(75)]. Materialize this query into a summary table refreshed hourly so your dashboard does not run a full table scan on every page load.

SQL -- p75 aggregation by page and device (PostgreSQL)
-- Materialize as a view or scheduled job refreshed every hour
CREATE MATERIALIZED VIEW rum_p75_summary AS
SELECT
  metric_name,
  page_path,
  device,
  DATE_TRUNC('day', recorded_at)                    AS day,
  COUNT(*)                                           AS sample_count,
  PERCENTILE_CONT(0.75) WITHIN GROUP (
    ORDER BY value
  )                                                  AS p75_value,
  PERCENTILE_CONT(0.50) WITHIN GROUP (
    ORDER BY value
  )                                                  AS p50_value,
  ROUND(
    100.0 * COUNT(*) FILTER (WHERE rating = 'good')
    / NULLIF(COUNT(*), 0), 1
  )                                                  AS pct_good
FROM rum_events
WHERE recorded_at >= NOW() - INTERVAL '28 days'
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2, 3, 4;

-- REFRESH ... CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX ON rum_p75_summary (metric_name, page_path, device, day);

-- Refresh on a schedule
REFRESH MATERIALIZED VIEW CONCURRENTLY rum_p75_summary;
Tip: Add a WHERE sample_count >= 100 filter in downstream queries to suppress unreliable p75 estimates from low-traffic pages. Report those pages at the section or template level instead.
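
For BigQuery, the same aggregation can be sketched with APPROX_QUANTILES as noted above (the dataset path is illustrative):

SQL -- p75 aggregation in BigQuery (sketch)
SELECT
  metric_name,
  page_path,
  device,
  DATE(recorded_at)                         AS day,
  COUNT(*)                                  AS sample_count,
  APPROX_QUANTILES(value, 100)[OFFSET(75)]  AS p75_value
FROM `your_project.rum.rum_events`  -- illustrative dataset path
WHERE recorded_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 28 DAY)
GROUP BY metric_name, page_path, device, day;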

Step 6: Build a dashboard with time-series charts

A good RUM dashboard answers three questions at a glance: are we in the good, needs-improvement, or poor band for each Core Web Vital today? Are values trending better or worse over the past month? Which pages or segments are dragging down the overall score? Structure your dashboard around a summary view (one card per metric showing the current p75 and a sparkline), a detail view (a full time-series chart with threshold lines for each metric), and a table of the worst-performing pages.

Plot horizontal threshold reference lines at the good/needs-improvement boundaries for each metric: 2500ms for LCP good, 4000ms for LCP poor; 0.1 for CLS good, 0.25 for CLS poor; 200ms for INP good, 500ms for INP poor. Seeing the p75 trend line relative to these boundaries makes regression severity immediately obvious without requiring the viewer to remember thresholds.

JavaScript -- Chart.js time-series with threshold lines
import Chart from 'chart.js/auto';
import annotationPlugin from 'chartjs-plugin-annotation';
// The time scale requires a date adapter, e.g. chartjs-adapter-date-fns
import 'chartjs-adapter-date-fns';
Chart.register(annotationPlugin);

const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000, unit: 'ms' },
  CLS: { good: 0.1,  poor: 0.25, unit: '' },
  INP: { good: 200,  poor: 500,  unit: 'ms' },
};

function buildMetricChart(canvasId, metric, labels, data) {
  const { good, poor, unit } = THRESHOLDS[metric];
  new Chart(document.getElementById(canvasId), {
    type: 'line',
    data: {
      labels,
      datasets: [{
        label: `p75 ${metric}`,
        data,
        borderColor: '#22c55e',
        backgroundColor: 'rgba(34, 197, 94, 0.08)',
        tension: 0.3,
        pointRadius: 3,
      }],
    },
    options: {
      scales: {
        y: { beginAtZero: false },
        x: { type: 'time', time: { unit: 'day' } },
      },
      plugins: {
        annotation: {
          annotations: {
            goodLine: {
              type: 'line', yMin: good, yMax: good,
              borderColor: '#22c55e', borderDash: [4, 4],
              label: { content: `Good (${good}${unit})`, display: true },
            },
            poorLine: {
              type: 'line', yMin: poor, yMax: poor,
              borderColor: '#ef4444', borderDash: [4, 4],
              label: { content: `Poor (${poor}${unit})`, display: true },
            },
          },
        },
      },
    },
  });
}

Step 7: Slice data by geography and browser

Once your baseline dashboard is working, dimensional slicing is where RUM earns its keep over synthetic monitoring. Add country and region data at ingest time using a GeoIP library such as @maxmind/geoip2-node applied to the request IP address, as sketched below. Store the ISO country code and a coarse region bucket (AMER, EMEA, APAC) alongside each event row so you can filter without a join.
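
A minimal enrichment sketch for the ingest handler, assuming the @maxmind/geoip2-node reader API and an illustrative GeoLite2 database path and region mapping:

JavaScript -- GeoIP enrichment at ingest (sketch)
import { Reader } from '@maxmind/geoip2-node';

// Open once at startup; the .mmdb path is an assumption
const geoReader = await Reader.open('/data/GeoLite2-Country.mmdb');

// Illustrative coarse region buckets -- extend as needed
const REGIONS = { US: 'AMER', CA: 'AMER', BR: 'AMER',
                  GB: 'EMEA', DE: 'EMEA', ZA: 'EMEA',
                  IN: 'APAC', JP: 'APAC', AU: 'APAC' };

function geoEnrich(req) {
  // Behind a CDN or proxy, the client IP is usually the first
  // entry in x-forwarded-for
  const ip = (req.headers['x-forwarded-for'] ?? req.socket.remoteAddress ?? '')
    .split(',')[0].trim();
  try {
    const countryCode = geoReader.country(ip).country?.isoCode ?? 'ZZ';
    return { countryCode, region: REGIONS[countryCode] ?? 'OTHER' };
  } catch {
    return { countryCode: 'ZZ', region: 'OTHER' };
  }
}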

Common findings from dimensional analysis: LCP is significantly worse in APAC than AMER, indicating the primary image is not served from an edge cache close to those users; INP is worse on Android than iOS on the same page, pointing to a JavaScript execution bottleneck that typically slower Android hardware makes more pronounced; CLS is worse on 3G connections because font files load late and trigger a layout shift that does not occur on fast broadband. These insights are invisible in aggregate p75 numbers but immediately obvious when you filter to individual segments.

SQL -- segment p75 by country and browser family
-- Identify the worst-performing country/browser combinations for LCP
SELECT
  country_code,
  browser_family,
  COUNT(*)                                          AS sessions,
  ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (
    ORDER BY value
  ))                                                AS p75_lcp_ms,
  CASE
    WHEN PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value) <= 2500 THEN 'good'
    WHEN PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value) <= 4000 THEN 'needs-improvement'
    ELSE 'poor'
  END                                               AS rating
FROM rum_events
WHERE metric_name = 'LCP'
  AND recorded_at >= NOW() - INTERVAL '7 days'
GROUP BY 1, 2
HAVING COUNT(*) >= 50  -- sample gate: aggregate conditions belong in HAVING, not WHERE
ORDER BY p75_lcp_ms DESC
LIMIT 20;

Step 8: Set up alerting on regressions

A dashboard you check manually is no substitute for automated alerting. Regressions frequently happen late on a Friday when a new release ships and no one is watching the dashboard. Configure threshold alerts that trigger when the 7-day rolling p75 for any Core Web Vital degrades by more than 10% compared to the previous 7-day window, or when the value crosses from the good band into needs-improvement. Route critical alerts (poor threshold crossed) to PagerDuty or Slack with enough context to start debugging immediately: which metric, which page, current p75 vs threshold, and a link to the relevant dashboard view.

Also configure a weekly digest that summarizes the top five most-improved and most-degraded pages by metric. This gives the engineering team a continuous feedback loop without alert fatigue from transient spikes. Most teams find that one weekly digest plus real-time alerts only for poor-threshold crossings strikes the right balance.

JavaScript -- Slack webhook alert on regression
// Run this on a cron job (e.g., every 15 minutes)
const DASHBOARD_URL = process.env.DASHBOARD_URL;

async function checkForRegressions() {
  const POOR_THRESHOLDS = { LCP: 4000, CLS: 0.25, INP: 500 };

  const recent = await db.query(`
    SELECT metric_name, page_path, device,
      PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value) AS p75
    FROM rum_events
    WHERE recorded_at >= NOW() - INTERVAL '2 hours'
    GROUP BY 1, 2, 3
    HAVING COUNT(*) >= 20
  `);

  for (const row of recent.rows) {
    const poorThreshold = POOR_THRESHOLDS[row.metric_name];
    if (poorThreshold && row.p75 > poorThreshold) {
      await sendSlackAlert({
        text: [
          `*RUM Regression Detected*`,
          `Metric: ${row.metric_name}`,
          `Page: ${row.page_path} (${row.device})`,
          `Current p75: ${Math.round(row.p75)}`,
          `Poor threshold: ${poorThreshold}`,
          `<${DASHBOARD_URL}?metric=${row.metric_name}&page=${encodeURIComponent(row.page_path)}|View dashboard>`,
        ].join('\n'),
      });
    }
  }
}

async function sendSlackAlert(payload) {
  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    body: JSON.stringify(payload),
    headers: { 'Content-Type': 'application/json' },
  });
}
Tip: Add a minimum sample count gate (at least 20-50 sessions in the window) to each alert query. This prevents false-positive alerts triggered by a handful of outlier beacons during a low-traffic overnight window.
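
The cron job above only covers poor-threshold crossings. The 10% week-over-week comparison described at the start of this step can be sketched against the same rum_events table like this:

SQL -- flag pages whose 7-day p75 degraded >10% week over week (sketch)
WITH windows AS (
  SELECT
    metric_name,
    page_path,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value)
      FILTER (WHERE recorded_at >= NOW() - INTERVAL '7 days')  AS p75_current,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY value)
      FILTER (WHERE recorded_at <  NOW() - INTERVAL '7 days')  AS p75_previous
  FROM rum_events
  WHERE recorded_at >= NOW() - INTERVAL '14 days'
  GROUP BY 1, 2
)
SELECT metric_name, page_path, p75_previous, p75_current
FROM windows
WHERE p75_previous > 0
  AND p75_current > p75_previous * 1.10
ORDER BY p75_current / p75_previous DESC;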

Pro tips

Sample your data on high-traffic sites. If your site receives more than 100,000 page views per day, full collection adds unnecessary database write pressure. Instrument only a random 10-20% of sessions using a deterministic hash of the session ID. Your p75 will be statistically identical to full collection at those volumes, and your storage costs drop by an order of magnitude.
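
A minimal deterministic sampling sketch (the session-ID source and the 10% rate are illustrative):

JavaScript -- deterministic session sampling (sketch)
// FNV-1a hash: the same session ID always lands in the same bucket,
// so every metric from one session is kept or dropped together
function hashToBucket(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) % 100;
}

// Session ID source is an assumption -- reuse your app's existing one
const sessionId = sessionStorage.getItem('rumSession') ?? crypto.randomUUID();
sessionStorage.setItem('rumSession', sessionId);

const SAMPLE_PCT = 10; // keep 10% of sessions
const isSampled = hashToBucket(sessionId) < SAMPLE_PCT;

function maybeSend(metric) {
  if (isSampled) sendToRUM(metric); // sendToRUM from step 3
}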

Use the attribution build for debugging. Import from web-vitals/attribution instead of web-vitals to get structured debug data: the LCP element selector and URL, the INP event target and handler duration breakdown (input delay, processing time, presentation delay), and the CLS shift sources. Store this attribution data in a separate rum_attribution table so you can join it when investigating a specific regression without inflating your main events table.
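
A sketch of pulling a few attribution highlights out of the callbacks. The exact field names differ per metric and have shifted between library versions, and saveAttribution is a hypothetical helper that writes to the separate rum_attribution table:

JavaScript -- extract attribution highlights (sketch)
import { onLCP, onINP } from 'web-vitals/attribution';

onLCP((metric) => {
  const a = metric.attribution ?? {};
  // Selector of the LCP element and its resource URL, when present
  saveAttribution(metric.id, 'LCP', {
    element: a.element ?? null,
    url: a.url ?? null,
  });
});

onINP((metric) => {
  const a = metric.attribution ?? {};
  // Phase breakdown of the slowest interaction (v4-era field names)
  saveAttribution(metric.id, 'INP', {
    target: a.interactionTarget ?? null,
    inputDelay: a.inputDelay ?? null,
    processingDuration: a.processingDuration ?? null,
    presentationDelay: a.presentationDelay ?? null,
  });
});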

Normalize page paths before storing. Strip authentication tokens, session IDs, and personal data from URLs at ingest time, not retrospectively. Apply your URL normalization rules consistently: remove query strings (or allowlist a known set), strip trailing slashes, and collapse paginated routes (/blog/page/2, /blog/page/3) into a single template path if you want to aggregate them. Inconsistent path normalization is the most common reason RUM dashboards show hundreds of one-session pages instead of meaningful URL groups.
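
A sketch of ingest-time normalization under those rules (the pagination pattern is illustrative):

JavaScript -- normalize page paths at ingest (sketch)
// Illustrative rules: drop query/fragment, strip trailing slash,
// collapse paginated blog routes into one template path
function normalizePath(rawUrl) {
  let path;
  try {
    path = new URL(rawUrl).pathname;
  } catch {
    return '/unknown';
  }
  path = path.replace(/\/+$/, '') || '/';
  path = path.replace(/^\/blog\/page\/\d+$/, '/blog/page/:n');
  return path;
}

// normalizePath('https://example.com/blog/page/3?utm_source=x')
//   -> '/blog/page/:n'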

Exclude bots and prerender traffic. Automated crawlers, headless browser tests, and Googlebot can pollute your RUM data with values that are not representative of real users. Filter rows where the User-Agent matches known bot patterns or where the navigationType is prerender. The web-vitals library itself avoids reporting page loads the user never saw, but server-side filtering adds an extra layer of protection at the database level.
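
A minimal server-side filter sketch (the pattern list is illustrative, not exhaustive -- a library such as isbot gives broader coverage):

JavaScript -- server-side bot filter at ingest (sketch)
const BOT_UA = /bot|crawler|spider|headless|lighthouse|pingdom|gtmetrix/i;

function isBotRequest(req) {
  return BOT_UA.test(req.headers['user-agent'] ?? '');
}

// In the ingest handler, drop bot traffic before writing:
// if (isBotRequest(req)) return res.status(204).end();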

Cross-reference with CrUX regularly. Your RUM p75 values should track closely with CrUX data for the same URL. If they diverge significantly (more than 20%), investigate: you may have a sampling bias, a bot-traffic problem, or a difference in how your URL normalization maps to CrUX's URL grouping. The CWV Checker tool can pull CrUX data for comparison so you can validate your RUM pipeline against ground truth.

Common issues

Metrics firing multiple times for the same page session

The web-vitals library may invoke a callback more than once as a metric updates during the session. CLS in particular accumulates layout shifts incrementally and fires on every new shift window if you use the reportAllChanges option. The solution is to always store by metric.id with a unique constraint in your database and use ON CONFLICT DO UPDATE SET value = EXCLUDED.value (upsert) rather than a plain insert. This ensures the final settled value for each session is what you aggregate, not a sum of intermediate reports.

SQL -- upsert pattern for deduplication
-- Create a unique constraint on (metric_id, metric_name)
ALTER TABLE rum_events
  ADD CONSTRAINT rum_events_id_name_unique
  UNIQUE (metric_id, metric_name);

-- Use upsert when inserting
INSERT INTO rum_events (metric_id, metric_name, value, rating, ...)
VALUES ($1, $2, $3, $4, ...)
ON CONFLICT (metric_id, metric_name)
DO UPDATE SET
  value   = EXCLUDED.value,
  rating  = EXCLUDED.rating,
  delta   = EXCLUDED.delta,
  updated_at = NOW();

Missing data from iOS Safari and in-app browsers

iOS Safari uses WebKit, which has historically lagged behind Chromium in PerformanceObserver API support. Specifically, WebKit does not support the LCP and layout-shift entry types, so those users will not report those metrics at all. This creates a survivorship bias: your LCP p75 reflects only the users whose browsers can measure it, skewing the data toward Chromium users. Mitigate this by storing the user agent with each event and filtering your dashboards to show Chromium-only data alongside all-browser data so you can see both perspectives. In-app browsers (Facebook, Instagram, TikTok WebViews) often strip PerformanceObserver access entirely -- the callbacks simply never fire there, so treat the missing data as expected rather than as an error.

RUM values are significantly worse than Lighthouse lab scores

This is expected and normal. Lighthouse runs with a fixed simulated throttle (by default Slow 4G: roughly 1.6 Mbps with 150ms RTT) and mid-tier device emulation. Real users include people on 2G connections, old phones, and congested Wi-Fi. If your RUM p75 LCP is 3500ms but Lighthouse scores 1800ms, that does not mean Lighthouse is broken -- it means a substantial share of your real users experience conditions worse than Lighthouse's simulation. Use the dimensional slicing from step 7 to identify which segments are the worst offenders, then rerun Lighthouse with heavier custom throttling (for example, 3G settings) to replicate those conditions locally for debugging.

Content Security Policy blocking the ingest endpoint

If your site has a strict Content Security Policy (CSP) that uses connect-src 'self', calls from sendBeacon to an external analytics domain will be blocked and silently dropped. Verify your CSP allows connections to your RUM endpoint domain. If you use a third-party RUM service (Sentry, Datadog, Vercel Analytics), add their domain to connect-src. For self-hosted endpoints on the same origin, 'self' is sufficient. Check the browser console for Refused to connect errors if you suspect a CSP issue -- CSP violations are logged there even when the beacon call itself fails silently.
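
For example, a policy that allows same-origin beacons plus a hypothetical third-party RUM domain looks like:

HTTP -- connect-src directive for a RUM endpoint
Content-Security-Policy: connect-src 'self' https://rum.example.com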

Summary

Step | What you do         | Key tool / pattern
1    | Install library     | npm install web-vitals
2    | Register callbacks  | onLCP, onCLS, onINP, onFCP, onTTFB
3    | Deliver reliably    | navigator.sendBeacon + fetch keepalive
4    | Receive and store   | POST /api/rum, validate + upsert
5    | Aggregate to p75    | PERCENTILE_CONT(0.75) by page + device
6    | Visualize trends    | Chart.js time-series + threshold lines
7    | Slice by segment    | GROUP BY country, browser, device
8    | Alert on regression | Cron job + Slack webhook on poor crossing

Frequently asked questions

What is the difference between RUM and synthetic monitoring?

Real User Monitoring (RUM) collects metrics from actual visitors to your production site using instrumentation code like the web-vitals library. Synthetic monitoring runs scripted tests from a controlled environment on a fixed schedule, similar to Lighthouse CI or WebPageTest. RUM captures genuine variance across devices, networks, and geographies. Synthetic monitoring gives you reproducible lab conditions for debugging. Use both: RUM to know what users actually experience, and synthetic testing to catch regressions before they ship.

Why use the 75th percentile for Core Web Vitals?

Google assesses Core Web Vitals at the 75th percentile (p75) of all page loads for a URL. A URL passes a metric only if at least 75% of visits meet the good threshold. This means a single slow experience for one user does not fail the whole page, but a pattern of poor performance that affects a significant share of users does. Reporting at p75 in your own RUM dashboard ensures your monitoring aligns with how Google evaluates your site for ranking purposes.

How much traffic do I need before RUM data is statistically reliable?

As a rough guideline, aim for at least 1,000 page views per URL per metric before treating p75 values as stable. With fewer samples, individual outliers move the percentile significantly. For low-traffic pages, aggregate at the template or section level rather than individual URLs. CrUX itself requires a minimum traffic threshold before showing URL-level data, which is why many low-traffic pages only show origin-level scores.

Can I send web-vitals data to Google Analytics 4?

Yes. Call window.gtag('event', metric.name, { value: Math.round(metric.value), metric_id: metric.id, metric_value: metric.value, metric_delta: metric.delta }) inside your callback. You can then explore the data in the GA4 Explore section using custom dimensions. However, GA4 does not offer a native percentile calculation UI, so for p75 reporting you will need to export to BigQuery and run PERCENTILE_CONT queries there.

How do I prevent duplicate metric reports from the same page view?

The web-vitals library generates a unique metric.id for each page session. Use this id as a deduplication key when inserting into your database. Insert with an ON CONFLICT DO UPDATE upsert (PostgreSQL) or a MERGE statement (BigQuery) keyed on metric_id and metric_name. The library may call a callback multiple times as a metric updates -- always store the latest value for a given id rather than summing multiple reports.

Sara Kim

Observability Engineer at WebVitals.tools

Sara specializes in Real User Monitoring pipelines, distributed tracing, and web performance observability. She has built RUM infrastructure handling billions of metric events per month for large-scale SaaS and e-commerce platforms.