Web QoE Scoring Model

Note

This model is currently in beta and is still being evaluated in practical situations. We welcome feedback on the scoring methodology and curve parameters.

The Web QoE Score is an overall Quality of Experience score for web page loads, expressed on a 0–100 scale. It combines multiple web performance metrics into a single value using Lighthouse-style log-normal scoring curves. The score is provided as the statistic value web_qoe_score.

Background

The scoring methodology is adapted, with slight changes, from Google's Lighthouse performance scoring, which Google updates continuously. In that model, each input metric is scored individually against the complement of a log-normal cumulative distribution function, so that larger (worse) metric values yield lower scores. The per-metric scores are then combined as a weighted average. Each curve is defined by two parameters:

p10: the metric value that produces a score of approximately 90 (the "good" threshold)
median: the metric value that produces a score of approximately 50
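
The per-metric curve can be sketched as follows. This is a minimal illustration that assumes the curves follow Lighthouse's published log-normal formula; the function name `log_normal_score` is hypothetical and not part of any product API. The example values are the PC FCP parameters listed under Scoring Curves.

```python
import math

# Value x with erfc(x) = 0.2; this constant pins the score at p10 to 0.9.
INVERSE_ERFC_ONE_FIFTH = 0.9061938024368232


def log_normal_score(value: float, median: float, p10: float) -> float:
    """Return a score in [0, 1]; lower metric values score higher.

    Hypothetical helper, modeled on Lighthouse's published formula.
    """
    if not 0 < p10 < median:
        raise ValueError("expected 0 < p10 < median")
    if value <= 0:
        return 1.0
    # Distance from the median in log space, scaled so that p10 maps to 0.9.
    standardized = (
        math.log(value / median) * INVERSE_ERFC_ONE_FIFTH / math.log(median / p10)
    )
    # Complement of the log-normal CDF.
    return 0.5 * math.erfc(standardized)


# PC FCP curve: median 1600 ms, p10 934 ms.
print(round(100 * log_normal_score(934, median=1600, p10=934)))   # 90
print(round(100 * log_normal_score(1600, median=1600, p10=934)))  # 50
```

Multiplying the result by 100 gives the per-metric score on the 0–100 scale used in this document.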

Google provides an online Lighthouse scoring calculator, as well as a graphing function for exploring the behavior of the log-normal curves.

Our model uses the following web performance metrics as inputs.

FCP – First Contentful Paint: when the first content appears on screen.
LCP – Largest Contentful Paint: when the page's main content has likely loaded.
CLS – Cumulative Layout Shift: visual stability during loading.
INP – Interaction to Next Paint: responsiveness to user interactions.
TTFB – Time to First Byte: how quickly the server starts to respond (also called Server Response Time).

Unlike Lighthouse, we do not use Total Blocking Time (TBT) as an input metric, as it is not directly measurable in all environments. The Speed Index (SI) is also excluded, as it is difficult to measure.

Metric Sets

The model works with different sets of input metrics, depending on what data the platform provides. The full set comprises FCP, LCP, CLS, INP, and TTFB. If TTFB is unavailable, the model still produces a score from FCP, LCP, CLS, and INP. If only FCP and LCP are available, a separate 2-metric scoring method is used, with curves aligned to the Web Vitals thresholds.

The scoring function therefore accepts exactly one of the following three predefined metric combinations; any other combination produces no score.

Full (5 metrics): FCP + LCP + CLS + INP + TTFB — uses Lighthouse v10 curves and weights.
4-metric: FCP + LCP + CLS + INP (no TTFB) — same Lighthouse curves; TTFB's weight is redistributed proportionally among the remaining four metrics.
2-metric: FCP + LCP only — uses Web Vitals threshold-aligned curves with equal weights (see below).
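
The selection rule above can be sketched as an exact-match lookup. This is an illustration only: the metric keys, set labels, and the function name `select_metric_set` are assumptions, and the sketch takes "any other combination produces no score" literally, without falling back to a smaller set.

```python
from typing import Iterable, Optional

# The three accepted combinations, as listed above. Labels are illustrative.
METRIC_SETS = {
    frozenset({"fcp", "lcp", "cls", "inp", "ttfb"}): "full",
    frozenset({"fcp", "lcp", "cls", "inp"}): "4-metric",
    frozenset({"fcp", "lcp"}): "2-metric",
}


def select_metric_set(available: Iterable[str]) -> Optional[str]:
    """Return the matching set label, or None if no score can be produced."""
    return METRIC_SETS.get(frozenset(available))


print(select_metric_set(["fcp", "lcp", "cls", "inp"]))  # 4-metric
print(select_metric_set(["fcp", "lcp", "ttfb"]))        # None
```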

Scoring Curves

Full and 4-metric sets

The scoring curves for the full and 4-metric sets are as follows:

Metric	PC median	PC p10	Mobile median	Mobile p10	Weight
FCP	1600 ms	934 ms	3000 ms	1800 ms	0.10
LCP	2400 ms	1200 ms	4000 ms	2500 ms	0.25
CLS	0.25	0.1	0.25	0.1	0.25
INP	500 ms	200 ms	500 ms	200 ms	0.25
TTFB	1800 ms	800 ms	1800 ms	800 ms	0.15

When TTFB is unavailable (4-metric set), its weight of 0.15 is redistributed proportionally among FCP, LCP, CLS, and INP. The resulting effective weights are approximately FCP 0.12, LCP 0.29, CLS 0.29, INP 0.29.
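
The redistribution is a simple renormalization: each remaining weight is divided by the sum of the remaining weights (0.85). A minimal sketch, with the dictionary keys and the function name `redistribute` chosen for illustration:

```python
FULL_WEIGHTS = {"fcp": 0.10, "lcp": 0.25, "cls": 0.25, "inp": 0.25, "ttfb": 0.15}


def redistribute(weights: dict, dropped: str) -> dict:
    """Drop one metric and rescale the rest so they sum to 1 again."""
    remaining = {name: w for name, w in weights.items() if name != dropped}
    total = sum(remaining.values())  # 0.85 once TTFB is removed
    return {name: w / total for name, w in remaining.items()}


four_metric_weights = redistribute(FULL_WEIGHTS, "ttfb")
for name, w in four_metric_weights.items():
    print(f"{name}: {w:.3f}")
```

Unrounded, the weights are 0.10 / 0.85 ≈ 0.118 for FCP and 0.25 / 0.85 ≈ 0.294 for each of the others, which is why the two-decimal figures quoted above add up to 0.99 rather than 1.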

Mobile device types use more lenient FCP and LCP curves than PC, reflecting the typically slower rendering on mobile devices. The CLS, INP, and TTFB curves are the same for both device types.

Two-metric scoring

When only FCP and LCP are available, the Lighthouse curves are too harsh to use directly. They were calibrated for a multi-metric set in which CLS, INP, and TTFB, which score about 90 at their "good" thresholds, compensate for the steeper FCP/LCP curves. Used alone, the Lighthouse PC curves for FCP and LCP would give a score of only ~45 for a page that exactly meets the Web Vitals "good" thresholds.

In some cases only FCP and LCP can be collected, for example when interactivity cannot be simulated in automated measurement scenarios. For these cases, the 2-metric set uses curves aligned directly to the Web Vitals thresholds, with equal weights:

Metric	median (= "poor" threshold)	p10 (= "good" threshold)	Weight
FCP	3000 ms	1800 ms	0.50
LCP	4000 ms	2500 ms	0.50

These curves are the same for PC and mobile. They produce scores that align with the standard quality tiers: approximately 90 at the "good" threshold and 50 at the "poor" threshold, with the "needs improvement" range in between.
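
As a check on these tiers, the 2-metric score can be sketched as below. The sketch assumes the Lighthouse-style log-normal formula; `log_normal_score` and `two_metric_score` are hypothetical names used only for illustration.

```python
import math

K = 0.9061938024368232  # erfc(K) = 0.2, which pins the score at p10 to 0.9

# 2-metric curves from the table above: (median, p10) in milliseconds.
TWO_METRIC_CURVES = {"fcp": (3000, 1800), "lcp": (4000, 2500)}


def log_normal_score(value, median, p10):
    """Lighthouse-style score in [0, 1]; hypothetical helper."""
    if value <= 0:
        return 1.0
    return 0.5 * math.erfc(math.log(value / median) * K / math.log(median / p10))


def two_metric_score(fcp_ms, lcp_ms):
    """Equal-weight average of the FCP and LCP scores, on a 0-100 scale."""
    values = {"fcp": fcp_ms, "lcp": lcp_ms}
    scores = [log_normal_score(values[m], *TWO_METRIC_CURVES[m]) for m in values]
    return 100 * sum(scores) / len(scores)


print(round(two_metric_score(1800, 2500)))  # 90 at the "good" thresholds
print(round(two_metric_score(3000, 4000)))  # 50 at the "poor" thresholds
```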

Note

The 2-metric score only reflects loading performance (FCP + LCP). It cannot capture interactivity or visual stability issues. A page with fast paint times but terrible responsiveness will score well in the 2-metric set.

Example Scores

The following table shows example scores for the PC device type across representative performance scenarios.

Scenario	Full (5)	4-metric	2-metric
Excellent	99	98	100
Good (WV thresh)	74	71	90
Needs improvement	37	35	50
Poor	13	11	12

The "Good (WV thresh)" row uses the Web Vitals "good" threshold for each metric (FCP=1.8 s, LCP=2.5 s, CLS=0.1, INP=0.2 s, TTFB=0.8 s).
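
The "Good (WV thresh)" row can be reproduced with the sketch below, which assumes the Lighthouse-style log-normal formula and the PC curves and weights from the Scoring Curves section. The helper names are hypothetical. Dividing by the sum of the weights actually used is what redistributes a missing metric's weight proportionally.

```python
import math

K = 0.9061938024368232  # erfc(K) = 0.2, which pins the score at p10 to 0.9

# PC curves and weights: (median, p10, weight). Times are in milliseconds.
PC_CURVES = {
    "fcp": (1600, 934, 0.10),
    "lcp": (2400, 1200, 0.25),
    "cls": (0.25, 0.1, 0.25),
    "inp": (500, 200, 0.25),
    "ttfb": (1800, 800, 0.15),
}


def log_normal_score(value, median, p10):
    """Lighthouse-style score in [0, 1]; hypothetical helper."""
    if value <= 0:
        return 1.0
    return 0.5 * math.erfc(math.log(value / median) * K / math.log(median / p10))


def weighted_score(values):
    """Weighted average over the supplied metrics, on a 0-100 scale."""
    total_weight = sum(PC_CURVES[m][2] for m in values)
    weighted = sum(
        PC_CURVES[m][2] * log_normal_score(v, PC_CURVES[m][0], PC_CURVES[m][1])
        for m, v in values.items()
    )
    return 100 * weighted / total_weight


good = {"fcp": 1800, "lcp": 2500, "cls": 0.1, "inp": 200, "ttfb": 800}
without_ttfb = {m: v for m, v in good.items() if m != "ttfb"}
print(round(weighted_score(good)))          # 74
print(round(weighted_score(without_ttfb)))  # 71
```

On the PC curves, FCP and LCP at their "good" thresholds score only about 39 and 47, while CLS, INP, and TTFB each score 90. This is why the full and 4-metric results sit well below 90 in this row.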

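The next table lists the input values and resulting scores for a range of typical sites, again for the PC device type.

Scenario	FCP	LCP	CLS	INP	TTFB	Full	4-met	2-met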
Fast CDN-served landing page	0.6s	0.9s	0	0.03s	0.15s	99	99	100
Well-optimized news site	1.2s	2.0s	0.05	0.12s	0.4s	87	85	98
Typical corporate website	2.0s	3.0s	0.12	0.25s	0.9s	66	63	81
Heavy SPA (React/Angular)	2.8s	3.5s	0.03	0.40s	0.6s	62	56	61
Ad-heavy media site	3.5s	5.0s	0.35	0.80s	1.2s	28	20	31
Slow shared hosting blog	4.5s	7.0s	0.20	0.60s	2.5s	31	31	11
Overloaded e-commerce site	5.5s	8.0s	0.45	1.20s	3.5s	10	10	5
Broken/failing site	12s	18s	1.50	3.0s	6.0s	1	0	0

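The following table compares scores for the PC and mobile device types across a subset of the scenarios above, using the full and 2-metric sets. The 2-metric scores are identical for both device types because the 2-metric curves do not depend on the device type.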

Scenario	PC Full	Mobile Full	PC 2-met	Mobile 2-met
Well-optimized news site	87	98	98	98
Typical corporate site	66	83	81	81
Heavy SPA	62	77	61	61
Slow shared hosting	31	33	11	11
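
The PC and mobile difference comes only from the FCP and LCP curves. The sketch below recomputes the "Well-optimized news site" row for the full set, assuming the Lighthouse-style log-normal formula; the data layout and function names are illustrative.

```python
import math

K = 0.9061938024368232  # erfc(K) = 0.2, which pins the score at p10 to 0.9

WEIGHTS = {"fcp": 0.10, "lcp": 0.25, "cls": 0.25, "inp": 0.25, "ttfb": 0.15}

# (median, p10) per metric; only FCP and LCP differ by device type.
DEVICE_CURVES = {
    "pc": {"fcp": (1600, 934), "lcp": (2400, 1200)},
    "mobile": {"fcp": (3000, 1800), "lcp": (4000, 2500)},
}
SHARED_CURVES = {"cls": (0.25, 0.1), "inp": (500, 200), "ttfb": (1800, 800)}


def log_normal_score(value, median, p10):
    """Lighthouse-style score in [0, 1]; hypothetical helper."""
    if value <= 0:
        return 1.0
    return 0.5 * math.erfc(math.log(value / median) * K / math.log(median / p10))


def full_score(values, device):
    """Full (5-metric) score on a 0-100 scale for the given device type."""
    curves = {**SHARED_CURVES, **DEVICE_CURVES[device]}
    return 100 * sum(
        WEIGHTS[m] * log_normal_score(values[m], *curves[m]) for m in WEIGHTS
    )


# "Well-optimized news site": times in milliseconds, CLS unitless.
news_site = {"fcp": 1200, "lcp": 2000, "cls": 0.05, "inp": 120, "ttfb": 400}
print(round(full_score(news_site, "pc")))      # 87
print(round(full_score(news_site, "mobile")))  # 98
```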