Anomalies
The Anomalies report surfaces automated anomaly detection results directly in the Surfmeter Dashboard. The system continuously compares recent measurements against learned baselines and flags episodes of degradation, so you can spot service issues or probe problems without manually sifting through charts.
You can reach the Anomalies report from Reports > Anomalies in the sidebar.
Note
This entry only appears when anomaly detection is enabled for your organization. Ask us if you're interested in trying it out!
How anomaly detection works
A background process runs every 30 minutes and, for each monitored service, computes statistical baselines from a rolling window of recent measurements (default: 7 days). It then checks whether current values deviate significantly from those baselines. Each probe builds its own baseline per service and hostname, so a probe on a slow connection has a slow baseline and is only flagged when it degrades relative to its own history.
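To illustrate per-probe baselines, here is a minimal sketch. It assumes each baseline is the median and median absolute deviation (MAD) of the last 7 days of values for one (probe, service, hostname) key. The record shape, the hostname value, and the function name compute_baselines are hypothetical. MAD is one of the methods listed under the Detection Method filter, but the statistics Surfmeter actually uses for each metric may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

# Documented default rolling window. Using median and MAD as the baseline
# statistics is an assumption for this sketch.
BASELINE_WINDOW = timedelta(days=7)


def compute_baselines(records, now: datetime):
    """Return {(probe, service, hostname): (median, mad)} for the rolling window.

    records: iterable of hypothetical (probe, service, hostname, timestamp, value) tuples.
    """
    groups = defaultdict(list)
    for probe, service, hostname, ts, value in records:
        if now - BASELINE_WINDOW <= ts <= now:
            groups[(probe, service, hostname)].append(value)

    baselines = {}
    for key, values in groups.items():
        center = median(values)
        spread = median(abs(v - center) for v in values)  # median absolute deviation
        baselines[key] = (center, spread)
    return baselines


now = datetime(2024, 1, 8)
records = [
    ("probe-a", "netflix", "example-cdn", now - timedelta(days=1), 40.0),
    ("probe-a", "netflix", "example-cdn", now - timedelta(days=2), 42.0),
    ("probe-a", "netflix", "example-cdn", now - timedelta(days=3), 44.0),
    ("probe-b", "netflix", "example-cdn", now - timedelta(days=1), 8.0),
    ("probe-a", "netflix", "example-cdn", now - timedelta(days=10), 5.0),  # outside window
]
baselines = compute_baselines(records, now)
```

Because probe-a and probe-b are keyed separately, the slower probe-b gets its own lower baseline and is never judged against probe-a's values.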
Detection uses two complementary passes:
- Spike detection looks at a short window (roughly one hour, adapted to the probe's measurement cadence) to catch sudden, acute degradation.
- Shift detection looks at a longer window (6 hours) to catch gradual, sustained drift.
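The window selection for the two passes might look like the sketch below. It assumes the spike window grows beyond one hour only when a probe measures too rarely to produce a minimum number of samples in that hour. MIN_SAMPLES, spike_window, window_values, and pass_windows are hypothetical names, and the real cadence adaptation rule is not documented here.

```python
from datetime import datetime, timedelta

# The 6 h shift window is documented. SPIKE_TARGET reflects "roughly one
# hour"; MIN_SAMPLES and the stretching rule are assumptions.
SPIKE_TARGET = timedelta(hours=1)
SHIFT_WINDOW = timedelta(hours=6)
MIN_SAMPLES = 3


def spike_window(cadence: timedelta) -> timedelta:
    """Stretch the ~1 h spike window so sparse probes still yield MIN_SAMPLES points."""
    return max(SPIKE_TARGET, cadence * MIN_SAMPLES)


def window_values(samples, now: datetime, length: timedelta):
    """Values of (timestamp, value) samples inside [now - length, now]."""
    return [value for ts, value in samples if now - length <= ts <= now]


def pass_windows(samples, now: datetime, cadence: timedelta):
    """Return the values each pass would compare against the baseline."""
    return {
        "spike": window_values(samples, now, spike_window(cadence)),
        "shift": window_values(samples, now, SHIFT_WINDOW),
    }
```

With a 30-minute cadence, the spike window in this sketch becomes 90 minutes, while a probe measuring every 5 minutes keeps the one-hour window.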
A third pass checks for data quality issues: values outside physically plausible bounds (for example, a MOS score above 5 or a negative latency). Such values indicate measurement bugs rather than service degradation.
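A data quality check compares a value against fixed physical bounds instead of a learned baseline. This sketch assumes a simple lookup table. The metric names, the MOS lower bound of 1, and the function data_quality_check are illustrative assumptions; the MOS upper bound of 5 and the rule that latency cannot be negative come from the text above.

```python
# Hypothetical bounds table: (lower, upper), where None means unbounded.
PHYSICAL_BOUNDS = {
    "mos": (1.0, 5.0),
    "latency_ms": (0.0, None),
}


def data_quality_check(metric: str, value: float):
    """Return 'above' or 'below' if the value is physically implausible, else None."""
    bounds = PHYSICAL_BOUNDS.get(metric)
    if bounds is None:
        return None  # no bounds known for this metric
    low, high = bounds
    if low is not None and value < low:
        return "below"
    if high is not None and value > high:
        return "above"
    return None
```

The returned string mirrors the Deviation Direction values that data quality events carry (above or below the physical bounds).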
When the system detects an anomaly, it creates an episode: a single event that tracks the issue from start to resolution. If the same anomaly re-fires in subsequent evaluation cycles, the system extends the existing episode instead of creating a duplicate event. Episodes are resolved automatically once the anomaly stops re-firing (see Episode lifecycle).
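The episode bookkeeping can be sketched as follows, assuming episodes are stored as plain dictionaries keyed by a hypothetical identity tuple. Apart from first_seen_at, which the timeline chart uses, the field names and the function record_firing are assumptions for illustration.

```python
from datetime import datetime


def record_firing(open_episodes: dict, key: tuple, fired_at: datetime) -> dict:
    """Extend the active episode for this key, or start a new one."""
    episode = open_episodes.get(key)
    if episode is not None and episode["status"] == "new":
        # Same anomaly re-fired: extend instead of creating a duplicate.
        episode["last_seen_at"] = fired_at
        episode["occurrences"] += 1
        return episode
    # No active episode for this key (none yet, or the previous one was resolved).
    episode = {
        "key": key,
        "first_seen_at": fired_at,
        "last_seen_at": fired_at,
        "occurrences": 1,
        "status": "new",
    }
    open_episodes[key] = episode
    return episode
```

In this sketch, a firing after an episode has been resolved starts a fresh episode rather than reopening the old one.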
Anomaly list
The list page shows all detected anomaly episodes within the selected time range.
Time range and interval
Use the time range picker and interval selector at the top to adjust the reporting period. The list shows all episodes that overlap the selected window, including long-running episodes that started before the window but are still active within it.
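The overlap rule is an interval intersection test. This sketch assumes an active episode has no end time yet and is treated as open-ended; the function name overlaps and its parameters are hypothetical.

```python
from datetime import datetime
from typing import Optional


def overlaps(first_seen_at: datetime, ended_at: Optional[datetime],
             window_start: datetime, window_end: datetime) -> bool:
    """True if the episode interval intersects [window_start, window_end].

    ended_at is None for an active episode, which is treated as still open.
    """
    if first_seen_at > window_end:
        return False  # episode starts after the window
    return ended_at is None or ended_at >= window_start
```

An episode that started a day before the window but is still active therefore appears in the list, while one that ended before the window does not.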
Filters
Filter controls let you narrow down which anomaly events are displayed. The available filters fall into four categories.
Status and severity:
- Status – Show only active ("new") or closed ("resolved") episodes, or both. Defaults to "new" so you see currently active issues first.
- Severity – Filter by "warning" or "critical". Critical events are those that have persisted long enough to escalate or that showed an extreme initial deviation.
Measurement scope:
- Measurement Type – Filter by type (Video, Web, Network, Speedtest, Conferencing).
- Subject – Filter by service subject (e.g., "netflix", "youtube"). For network measurements, this represents the technology.
- Domain – Filter by target domain.
- Hostname – Filter by target hostname.
Client and location:
- Client Label – Filter by client device name.
- Client Group – Filter by client group.
- Client Tag – Filter by client tag.
- ISP – Filter by Internet Service Provider name.
- ASN – Filter by Autonomous System Number.
- Country – Filter by country.
- City – Filter by city.
Detection details:
- Source – Whether the anomaly was detected on a per-probe basis or across the fleet of end users (Player SDK).
- Detection Pass – Spike, shift, or data quality.
- Detection Method – The statistical method that triggered the anomaly (e.g., MAD, IQR, percentile, proportion, range check).
- Detection Type – Whether the anomaly is per-probe (intra-client), fleet-wide, or a data quality issue.
- Diagnostic Confidence – High, medium, or low, reflecting how much the signal can be trusted based on sample size and fleet coverage.
- Deviation Direction – Whether the value went above or below the baseline (or above/below the physical bounds for data quality events).
- Statistic Name – The specific metric that triggered the anomaly (e.g., p1203_overall_mos, download).
You can hover over the filter badges to see explanations of what each value means and how it relates to the detection process.
Note
Client and location filters (ISP, country, city) only match per-probe anomalies. Fleet-wide (Player SDK) anomalies are aggregated across a whole domain and do not carry these fields.
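The reason location filters exclude fleet-wide events can be sketched simply: when a filter is set and the event has no value for that field, the event cannot match. The event dictionary shape, its field names, and the function matches_location_filter are hypothetical.

```python
def matches_location_filter(event: dict, isp=None, country=None, city=None) -> bool:
    """Apply ISP/country/city filters; events missing a filtered field never match."""
    wanted = {"isp": isp, "country": country, "city": city}
    for field, value in wanted.items():
        if value is None:
            continue  # this filter is not set
        if event.get(field) != value:  # fleet-wide events carry no such field
            return False
    return True
```

With no location filter set, fleet-wide events still pass through and appear in the list.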
Timeline chart
A stacked bar chart shows the number of new anomaly episodes over time, broken down by a grouping dimension you can choose from a dropdown. For example, group by subject to see which services generated the most anomalies, or by severity to see the ratio of warnings to critical events.
The chart buckets episodes by their start time (first_seen_at), so a long-running episode contributes one bar at its onset rather than appearing in every interval.
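Bucketing by onset can be sketched as below. It assumes episodes are dictionaries with a first_seen_at timestamp and a field named after the chosen grouping dimension, and that the caller passes only episodes whose onset lies within the window. The function name bucket_starts is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_starts(episodes, window_start: datetime, interval: timedelta, group_by: str):
    """Count episodes per (bucket_start, group) using first_seen_at only."""
    counts = Counter()
    for ep in episodes:
        # Integer number of whole intervals between window start and onset.
        index = (ep["first_seen_at"] - window_start) // interval
        bucket = window_start + index * interval
        counts[(bucket, ep[group_by])] += 1
    return counts
```

Any later activity on an episode (such as a last_seen_at hours after onset) is ignored, so each episode contributes exactly one count.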
Events table
Below the chart, a paginated table lists individual anomaly episodes. By default, episodes are sorted by last activity (most recently active first). Each row shows:
- Status and severity badges
- When the episode started and was last seen (or resolved)
- How many times the anomaly re-fired (occurrence count)
- The measurement type, service subject or domain, and client
- The affected metric, deviation direction, and metric vs. baseline values
- A brief explanation of the anomaly
Click any row to open the detail page for that episode.
Resolving anomalies
You can manually mark anomaly episodes as resolved directly from the list. Select one or more episodes using the row checkboxes, click the Resolve selected button in the toolbar above the table, and confirm the action in the dialog that appears. Once resolved, the episodes no longer appear in the default "new" filter view.
This is useful for known false positives, expected maintenance windows, or other cases where the automatic resolution sweep has not yet closed the episode.
Note
Only users with the admin, editor, or organization admin role can resolve anomalies.
Anomaly detail page
Clicking an anomaly row in the list opens its detail page, which provides the full context for the episode.
Info card
The top section shows:
- Severity and status badges
- The measurement type, subject, and hostname/domain
- Timeline: when the episode started, when it was last seen or resolved, and how many times it re-fired
- Detection metadata: detection type, pass, method, source, diagnostic confidence, and sample count. Hover over each badge for a brief explanation of what the value means.
Origin card
Shows the client that triggered the anomaly (linked to the client's detail page), along with ISP and location information when available. Fleet-wide anomalies display a "Fleet-wide" badge instead of a client link.
Metric vs. baseline card
A visual comparison of three key values:
- Metric value – The measured value that triggered the anomaly, highlighted in red with a directional arrow.
- Baseline value – The expected value learned from recent history. Hover for an explanation of what the baseline represents for this detection type.
- Threshold – The cutoff that was crossed. For per-probe and fleet anomalies, this is the baseline offset by a multiple of the recent spread (above or below it, depending on the deviation direction). For data quality anomalies, it is an absolute physical bound.
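The two kinds of threshold can be sketched as follows. The multiplier k = 3.0 is an assumed value, the function names are hypothetical, and the sketch assumes the statistical cutoff lies below the baseline for downward deviations. For a data quality event, the cutoff passed to crossed is simply the physical bound.

```python
def statistical_threshold(baseline: float, spread: float, direction: str,
                          k: float = 3.0) -> float:
    """Per-probe or fleet cutoff: baseline shifted by k times the recent spread."""
    if direction == "above":
        return baseline + k * spread
    if direction == "below":
        return baseline - k * spread
    raise ValueError(f"unknown direction: {direction!r}")


def crossed(value: float, cutoff: float, direction: str) -> bool:
    """True if the value lies beyond the cutoff in the given direction."""
    if direction == "above":
        return value > cutoff
    if direction == "below":
        return value < cutoff
    raise ValueError(f"unknown direction: {direction!r}")
```

For example, a baseline of 42 Mbit/s with a spread of 2 gives a downward cutoff of 36 in this sketch, so a 30 Mbit/s measurement would be flagged.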
KPI context chart
A time-series chart of the affected metric, centered on the episode window. The chart shows the same metric for the same client and service, spanning from before the episode started to after it was last seen. This lets you see the degradation in context: the normal behavior before the episode, the anomaly itself, and the recovery (if any).
Affected measurements table
Below the chart, a table lists the individual measurements that fell within the episode window and scope. This mirrors what you would see in the Measurements Explorer if you filtered to the same client, service, and time range.
Actions
From the detail page you can:
- View in Explorer – Jump to the Measurements Explorer pre-filtered to the same client, service, and time window.
- Resolve – Mark this single episode as resolved (same confirmation flow as the list page).
- Copy JSON – Copy the raw anomaly event data to your clipboard for further analysis.
Severity levels
Anomalies are classified as either warning or critical:
- Warning is the initial severity for any anomaly that crosses the detection threshold.
- Critical means the anomaly has either shown an extreme initial deviation or has persisted across multiple evaluation cycles (roughly 2.5 hours for spike events, 3 hours for shift events), indicating a sustained issue rather than a transient fluctuation. Once an episode reaches critical severity, it stays critical even if the deviation eases, preventing flapping between severity levels.
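The escalation rule can be sketched as a small state function. The persistence durations come from the text above; EXTREME_SCORE, the idea of a score measured in units of the recent spread, and the function name next_severity are hypothetical.

```python
from datetime import timedelta

# Approximate persistence before escalation, per detection pass (documented).
ESCALATE_AFTER = {
    "spike": timedelta(hours=2, minutes=30),
    "shift": timedelta(hours=3),
}
EXTREME_SCORE = 6.0  # hypothetical deviation (in spread units) that is critical at once


def next_severity(current: str, detection_pass: str,
                  duration: timedelta, score: float) -> str:
    """Escalate to critical on persistence or extreme deviation; never downgrade."""
    if current == "critical":
        return "critical"  # sticky: no flapping back to warning
    limit = ESCALATE_AFTER.get(detection_pass)  # None for passes without a limit here
    if score >= EXTREME_SCORE or (limit is not None and duration >= limit):
        return "critical"
    return "warning"
```

At a 30-minute evaluation interval, the spike limit corresponds to about five cycles and the shift limit to about six.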
Episode lifecycle
Each anomaly is tracked as an episode with the following lifecycle:
- New – The episode is active. The anomaly has been detected and is still re-firing in subsequent evaluation cycles.
- Resolved – The episode is closed. This happens automatically when the anomaly stops re-firing for a configurable timeout (default: 60 minutes, i.e., two missed evaluation cycles). You can also resolve episodes manually from the list or detail page.
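The automatic resolution sweep might look like this sketch. It assumes episodes are dictionaries with status and last_seen_at fields (hypothetical names) and uses the documented 60-minute default; the function name sweep is also hypothetical.

```python
from datetime import datetime, timedelta

RESOLVE_TIMEOUT = timedelta(minutes=60)  # documented default, configurable


def sweep(episodes, now: datetime, timeout: timedelta = RESOLVE_TIMEOUT):
    """Resolve 'new' episodes that have not re-fired within the timeout."""
    resolved = []
    for ep in episodes:
        if ep["status"] == "new" and now - ep["last_seen_at"] >= timeout:
            ep["status"] = "resolved"
            resolved.append(ep)
    return resolved
```

With evaluations every 30 minutes, an episode last seen at 10:00 is still "new" after the silent 10:30 cycle and is resolved at 11:00, after the second missed cycle.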