MOS Troubleshooting Guide
This guide provides systematic approaches to diagnosing and resolving video playout issues using Surfmeter's comprehensive KPI and KQI suite. The guidance is tailored to different stakeholder perspectives, i.e., whether you are an ISP or an OTT provider.
Primary Diagnostic Flow
When troubleshooting low MOS values, follow this systematic approach:
- Assess Overall MOS Severity:
  - MOS ≥ 4.0: Excellent performance, minor optimization opportunities
  - 3.0 ≤ MOS < 4.0: Good, but investigate potential improvements
  - 2.0 ≤ MOS < 3.0: Noticeable issues requiring attention
  - MOS < 2.0: Critical issues requiring immediate investigation
- Identify Root Cause Category:
  - Stalling Issues: Check p1203_stalling_quality (O.23) – if < 4, stalling is the primary concern
  - Quality Issues: Check p1203_overall_audiovisual_quality (O.35) – if < 4, encoding/bitrate selection is the primary concern
  - Network Issues: Check video_response_time, content_server_hostname, and performance metrics from related network tests (ICMP Ping, Traceroute, etc.)
- Apply Stakeholder-Specific Analysis (see sections below)
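The first two steps of this flow can be sketched in a few lines of Python. This is only an illustration: it assumes each measurement is available as a plain dictionary keyed by the metric names used in this guide, and the function names and band labels are hypothetical, not part of Surfmeter.

```python
from typing import List


def classify_mos(mos: float) -> str:
    """Map an overall MOS value to the severity bands listed above."""
    if mos >= 4.0:
        return "excellent"
    if mos >= 3.0:
        return "good"
    if mos >= 2.0:
        return "noticeable issues"
    return "critical"


def root_cause_categories(measurement: dict) -> List[str]:
    """Return the categories whose P.1203 score is below 4.

    Assumption: a missing score is treated as 5.0, i.e. "no issue".
    """
    causes = []
    if measurement.get("p1203_stalling_quality", 5.0) < 4:
        causes.append("stalling")
    if measurement.get("p1203_overall_audiovisual_quality", 5.0) < 4:
        causes.append("quality")
    return causes


# Example record with made-up values
example = {
    "p1203_stalling_quality": 3.1,
    "p1203_overall_audiovisual_quality": 4.4,
}
print(classify_mos(2.7), root_cause_categories(example))
```

Network issues are left out here on purpose, since they require correlating several metrics and separate network tests rather than checking a single threshold.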
ISP/Network Operator Perspective
As an ISP, your primary goal is to identify network bottlenecks, optimize CDN placement, and ensure adequate bandwidth provisioning. You have little control over the player behavior, but you can optimize your network to improve QoE for affected services.
Stalling-Related Issues (Low O.23)
Key Metrics to Examine:
- initial_loading_delay: High values (>3–5s) indicate first-mile connectivity issues. These values are service-dependent, so you should check the average value for your service.
- total_stalling_time and number_of_stalling_events: Quantify network-induced interruptions. There should be no stalling events at all; if they occur, these hint at severe network or player issues.
- average_buffer_length and min_buffer_length: Low values suggest insufficient bandwidth. You can inspect the PerformanceClientReport's bufferTrace property to see the buffer levels over time.
- video_response_time: High values indicate server or routing issues.
- content_server_ip_address and content_server_as: Identify problematic CDN nodes via their IP address and ASN.
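As a rough sketch, the first two checks can be applied to a single measurement as follows. The dictionary layout, the function name, and the assumption that delays and stalling times are expressed in seconds are illustrative; the baseline should be replaced with the average you observe for the service in question.

```python
def stalling_findings(m: dict, baseline_loading_delay_s: float = 3.0) -> list:
    """Collect human-readable findings for one measurement.

    Assumption: `m` holds the metric names from this guide, with
    delays and stalling times in seconds.
    """
    findings = []
    if m["initial_loading_delay"] > baseline_loading_delay_s:
        findings.append("high initial loading delay")
    if m["number_of_stalling_events"] > 0:
        findings.append(
            f"{m['number_of_stalling_events']} stalling event(s), "
            f"{m['total_stalling_time']:.1f}s in total"
        )
    return findings


# Example record with made-up values
print(stalling_findings({
    "initial_loading_delay": 4.2,
    "number_of_stalling_events": 2,
    "total_stalling_time": 6.5,
}))
```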
Actionable Insights:
- High Initial Loading Delay:
  - Check DNS resolution times in your network
  - Investigate first-hop latency and congestion
  - Consider CDN cache warming for popular content
- Frequent Stalling with Low Buffer:
  - Analyze bandwidth utilization during peak hours
  - Check for congestion at peering points
  - Evaluate QoS policies for video traffic
- Server Response Issues:
  - Investigate routing to specific ASNs (content_server_as)
  - Check peering agreements with content providers
  - Consider local CDN deployment negotiations
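To find the ASNs worth investigating, one option is to rank them by their median video_response_time. The sketch below assumes measurements are dictionaries carrying content_server_as and video_response_time; the function name is hypothetical, and the AS numbers in the example come from the range reserved for documentation.

```python
from collections import defaultdict
from statistics import median


def median_response_time_by_as(measurements):
    """Return (ASN, median video_response_time) pairs, slowest first."""
    by_as = defaultdict(list)
    for m in measurements:
        by_as[m["content_server_as"]].append(m["video_response_time"])
    ranked = [(asn, median(values)) for asn, values in by_as.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Example records with made-up values
samples = [
    {"content_server_as": "AS64500", "video_response_time": 100},
    {"content_server_as": "AS64500", "video_response_time": 300},
    {"content_server_as": "AS64501", "video_response_time": 50},
]
for asn, value in median_response_time_by_as(samples):
    print(asn, value)
```

The median is used instead of the mean so that a single slow request does not dominate the ranking.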
Quality-Related Issues (Low O.35)
Low video quality is typically a symptom of insufficient bandwidth or inadequate bitrate selection by the OTT provider.
Key Metrics to Examine:
- average_video_bitrate vs. available bandwidth: If the bitrate is significantly below the available bandwidth, this points to a misconfigured player.
- largest_played_video_size vs. initial_resolution: Indicates adaptive quality behavior. If the largest played video size is significantly higher than the initial resolution, the player may be too conservative in bitrate selection, and might start with a lower quality than necessary. On the other hand, starting with a high quality may cause longer initial loading times.
- quality_switch_down_count vs. quality_switch_up_count: Shows adaptation aggressiveness. If there are more than one or two quality switches, users may notice them and have a bad experience. Ultimately, the player should be able to switch to a higher quality when the network conditions allow it, and keep it there. Make sure the bandwidth is not fluctuating too much.
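The bitrate and switch-count comparisons can be expressed as two small helpers. This is a sketch under assumptions: the available bandwidth has to come from a separate throughput test, both values must use the same unit (kbit/s here), and the function names are hypothetical.

```python
def bitrate_utilization(average_video_bitrate_kbps: float,
                        available_bandwidth_kbps: float) -> float:
    """Share of the available bandwidth that the video actually used."""
    return average_video_bitrate_kbps / available_bandwidth_kbps


def switching_is_excessive(m: dict, max_switches: int = 2) -> bool:
    """True if up and down switches together exceed the limit of 1 or 2."""
    total = m["quality_switch_down_count"] + m["quality_switch_up_count"]
    return total > max_switches


# Example with made-up values: 2.5 Mbit/s video on a 10 Mbit/s line
print(bitrate_utilization(2500, 10000))
print(switching_is_excessive({
    "quality_switch_down_count": 3,
    "quality_switch_up_count": 2,
}))
```

A low utilization is only meaningful if the service offers renditions with higher bitrates in the first place.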
Actionable Insights:
- Low Bitrate Selection Despite Adequate Bandwidth:
  - Conservative player behavior may be responding to past network issues
  - Investigate packet loss and jitter patterns
  - Consider traffic shaping policies that might affect video streams
- Excessive Quality Switching:
  - Network instability may be causing ABR oscillation
  - Check for Wi-Fi interference or cellular handover issues
  - Analyze quality switch patterns with PerformanceClientReport buffer traces (is the buffer level going down too often?)
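To answer whether the buffer level is going down too often, you can count how many times it drops below a threshold after the initial fill-up. The sketch below assumes the buffer trace has already been reduced to a list of buffer levels in seconds, sampled at regular intervals; the actual bufferTrace structure may differ, and both the function name and the 5-second threshold are illustrative.

```python
def count_buffer_dips(buffer_levels_s, low_threshold_s=5.0):
    """Count how often the buffer falls below the threshold.

    Samples before the buffer first reaches the threshold are treated
    as initial fill-up and ignored.
    """
    start = next(
        (i for i, level in enumerate(buffer_levels_s) if level >= low_threshold_s),
        len(buffer_levels_s),
    )
    dips = 0
    below = False
    for level in buffer_levels_s[start:]:
        is_low = level < low_threshold_s
        if is_low and not below:
            dips += 1  # count only the transition into the low state
        below = is_low
    return dips


# Example trace with made-up values: two dips after the initial fill-up
print(count_buffer_dips([0, 4, 12, 20, 3, 2, 15, 4, 18]))
```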
Advanced Network Analysis
Correlation Analysis:
- Cross-reference content_server_hostname with performance metrics such as ping times to identify problematic CDN nodes
- Analyze performance by time of day or day of the week to identify capacity constraints
- Compare metrics across different customer segments or access products (e.g., residential vs. business, fiber vs. DSL, …)
OTT/Content Provider Perspective
As an OTT, your primary goal is to optimize player behavior, improve encoding efficiency, and enhance user experience. You have very little control over the network, but you can optimize your player and encoding parameters to improve QoE.
Stalling-Related Issues (Low O.23)
Key Metrics to Examine:
- average_buffer_length and buffer trace from PerformanceClientReport: The buffer level should be constantly high after an initial fill-up time. If it is not, this is a sign of insufficient bandwidth or buffer management issues.
- initial_loading_delay vs. initial_resolution: Startup strategy effectiveness. If you are starting with low quality, you should have a low initial loading delay, but the quality should be high enough to not cause visible degradations. If you want to start with high quality, you should expect a higher initial loading delay, but on average it should be lower than 3 seconds.
- p1203_max_mos_ratio: Shows potential vs. actual performance. We measure what the quality would have been if the network conditions were optimal. If this ratio is significantly lower than 1, this is a sign of suboptimal ABR selections.
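As an illustration of how such a ratio can be read, the sketch below divides the achieved MOS by the MOS that would have been possible under optimal conditions. This definition is an assumption made for the example and may not match how Surfmeter computes p1203_max_mos_ratio; the function names and the 0.9 tolerance are likewise hypothetical.

```python
def max_mos_ratio(achieved_mos: float, max_theoretical_mos: float) -> float:
    """Assumed definition: achieved MOS relative to the best possible MOS."""
    return achieved_mos / max_theoretical_mos


def abr_underperforms(ratio: float, tolerance: float = 0.9) -> bool:
    """Flag sessions whose ratio is clearly below 1 (tolerance is illustrative)."""
    return ratio < tolerance


# Example with made-up values
ratio = max_mos_ratio(3.6, 4.5)
print(round(ratio, 2), abr_underperforms(ratio))
```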
Actionable Insights:
- High Initial Loading Delay:
  - Consider starting with lower quality for faster startup
  - Pre-load content in your application before rendering the video player
- Buffer Management Issues:
  - Analyze bufferTrace patterns to optimize target buffer levels
  - Adjust the ABR algorithm's buffer-based decisions
  - Implement more aggressive prefetching for predicted content
Player Optimization (Quality Issues)
Key Metrics to Examine:
- number_of_quality_switches and switch ratios: If the number of quality switches is higher than 1 or 2, this could be visible to the users and cause a bad experience. Ultimately, the player should be able to switch to a higher quality when the network conditions allow it, and keep it there. Make sure the ABR logic is not switching too much.
- initial_resolution vs. largest_played_video_size: Stability vs. quality trade-off. If the initial resolution is significantly lower than the largest played video size, this is typically a sign of insufficient bandwidth or poor bitrate selection.
- average_video_bitrate vs. p1203_max_theoretical_mos: Efficiency analysis. If the average video bitrate is significantly below the theoretical maximum bitrate, this is a sign of insufficient bandwidth.
Actionable Insights:
- Conservative Quality Selection:
  - ABR algorithm may be too cautious. Make sure the ABR logic is not switching too much.
  - Consider bandwidth estimation improvements. Ensure you have quality levels covering the entire bandwidth range.
  - Implement quality ramping strategies for stable connections.
- Excessive Quality Switching:
  - Reduce sensitivity to short-term bandwidth fluctuations.
  - Implement hysteresis in quality switching decisions to avoid oscillations.
  - Consider user preferences for stability vs. quality. There are cases where users prefer a lower quality to avoid stalling, and vice versa. This depends on the content, on what users expect, and on the network conditions. For instance, users may prefer short videos to start quickly, whereas for VoD movies they may prefer a higher initial quality.
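Hysteresis in switching decisions can be sketched as follows: the player switches down as soon as the throughput no longer covers the current rendition, but switches up only when the throughput exceeds the next rendition by a safety margin. This is a minimal illustration rather than a production ABR algorithm; the function name, the bitrate ladder, and the 1.3 margin are assumptions.

```python
def select_rendition(current_kbps, ladder_kbps, throughput_kbps, up_margin=1.3):
    """Pick the next rendition bitrate using a simple hysteresis band."""
    ladder = sorted(ladder_kbps)
    idx = ladder.index(current_kbps)

    # Switch down: choose the highest lower rendition that still fits.
    if throughput_kbps < current_kbps and idx > 0:
        candidates = [b for b in ladder[:idx] if b <= throughput_kbps]
        return candidates[-1] if candidates else ladder[0]

    # Switch up one step, but only with headroom above the next rendition.
    if idx + 1 < len(ladder) and throughput_kbps >= ladder[idx + 1] * up_margin:
        return ladder[idx + 1]

    # Inside the hysteresis band: keep the current rendition.
    return current_kbps


ladder = [1000, 2500, 5000]
for throughput in (5500, 7000, 2000):
    print(throughput, "->", select_rendition(2500, ladder, throughput))
```

With this ladder, a throughput between 2500 and 6500 kbit/s leaves a player at 2500 kbit/s where it is, so small fluctuations around the 5000 kbit/s rendition do not cause repeated switches.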
Encoding and Content Optimization
Key Metrics to Examine:
- p1203_average_video_quality (O.22) per codec and resolution
- dropped_frames: Indicates decoding complexity issues that affect your test hardware. Normally this is nothing to worry about in the context of Surfmeter measurements, as we infer video quality from the codec, bitrate, fps, and resolution.
- Quality metrics correlation with content type and motion: The P.1203 Mode 0 model cannot incorporate aspects of content complexity, so it may not be able to predict the quality of varying content types and/or encoder optimizations. Hence, the model may report lower quality when the bitrate is lower, even though the lower bitrate is just an artifact of lower content complexity.
We suggest using a proper full-reference objective quality metric to assess the quality of your encoding, or using a bitstream-based model like P.1204.3 (which has been shown to correlate well with subjective quality ratings).
Actionable Insights:
- Low Quality Scores Despite High Bitrates:
  - Evaluate codec choice (H.264 vs. H.265 vs. AV1 vs. VP9)
Multi-Stakeholder Scenarios
Content Delivery Network (CDN) Performance
For ISPs: Focus on routing efficiency to CDN nodes.
For OTTs: Focus on CDN selection and failover strategies.
Key Metrics:
- content_server_hostname and content_server_as distribution: Identify problematic CDN nodes via their hostname and ASN.
- video_response_time by CDN node: Identify CDN nodes that are causing high latency.
- Geographic performance patterns: Identify regions where performance is consistently worse.
There are also joint analysis opportunities for player and network interaction:
- Correlate ABR behavior with network characteristics.
- Identify optimal quality ladders for specific network conditions.
- Develop network-aware streaming strategies for specific network types (e.g., satellite, mobile, …)
Tools and Methodologies
Statistical Analysis
- Use percentile analysis (P95, P99) rather than just averages for SLA definitions. Outliers can always happen; it's the consistent performance that matters.
- Implement cohort analysis by network type, geographic region, network access type, etc.
- Track quality trends over time to identify systematic issues.
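The difference between averages and percentiles is easy to see on a small sample. The sketch below uses the nearest-rank method, which is one of several common percentile definitions; the helper name and the sample values are illustrative.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 20 initial loading delays in seconds (made-up): mostly fast, two slow sessions
delays = [1.0] * 18 + [8.0, 9.0]
print("mean:", mean(delays))
print("P95:", percentile(delays, 95))
print("P99:", percentile(delays, 99))
```

Here the mean of 1.75 s looks acceptable, while P95 (8.0 s) and P99 (9.0 s) reveal that some sessions start far too slowly.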
Performance Benchmarking
- Compare p1203_max_mos_ratio across different services and conditions.
- Establish baseline performance metrics for different content types.
- Keep measuring the same VoD content over time to identify systematic issues. Otherwise you might see MOS fluctuations based on the content alone, not the network.