MOS Troubleshooting Guide

This guide provides systematic approaches to diagnosing and resolving video playout issues using Surfmeter's KPI and KQI suite. The guidance is split by stakeholder perspective, depending on whether you are an ISP or an OTT provider.

Primary Diagnostic Flow

When troubleshooting low MOS values, follow this systematic approach:

  1. Assess Overall MOS Severity:

    • MOS ≥ 4.0: Excellent performance, minor optimization opportunities
    • 3.0 ≤ MOS < 4.0: Good, but investigate potential improvements
    • 2.0 ≤ MOS < 3.0: Noticeable issues requiring attention
    • MOS < 2.0: Critical issues requiring immediate investigation
  2. Identify Root Cause Category:

    • Stalling Issues: Check p1203_stalling_quality (O.23) – if < 4, stalling is primary concern
    • Quality Issues: Check p1203_overall_audiovisual_quality (O.35) – if < 4, encoding/bitrate selection is primary concern
    • Network Issues: Check video_response_time, content_server_hostname, and performance metrics from related network tests (ICMP Ping, Traceroute, etc.)
  3. Apply Stakeholder-Specific Analysis (see sections below)
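
The first two steps of this flow can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the `mos` key are hypothetical, the two P.1203 field names are taken from the list above, and the choice to assign boundary values (exactly 3.0 or 4.0) to the upper band is an assumption.

```python
def mos_severity(mos: float) -> str:
    """Map an overall MOS to the severity bands from step 1."""
    if mos >= 4.0:
        return "excellent"
    if mos >= 3.0:
        return "good"
    if mos >= 2.0:
        return "noticeable issues"
    return "critical"


def root_cause_categories(measurement: dict) -> list[str]:
    """Return the step 2 categories whose P.1203 sub-score is below 4."""
    causes = []
    # A missing score defaults to 5.0, i.e. "no problem detected".
    if measurement.get("p1203_stalling_quality", 5.0) < 4.0:
        causes.append("stalling")
    if measurement.get("p1203_overall_audiovisual_quality", 5.0) < 4.0:
        causes.append("quality")
    return causes


sample = {
    "mos": 2.7,  # hypothetical key for the overall MOS
    "p1203_stalling_quality": 3.1,
    "p1203_overall_audiovisual_quality": 4.4,
}
print(mos_severity(sample["mos"]), root_cause_categories(sample))
```

Network issues are left out of the sketch on purpose, since they need data from related tests (ICMP Ping, Traceroute, etc.) rather than a single threshold.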

ISP/Network Operator Perspective

As an ISP, your primary goal is to identify network bottlenecks, optimize CDN placement, and ensure adequate bandwidth provisioning. You have little control over the player behavior, but you can optimize your network to improve QoE for affected services.

Key Metrics to Examine:

  • initial_loading_delay: High values (>3–5s) indicate first-mile connectivity issues. These values are service-dependent, so you should check the average value for your service.
  • total_stalling_time and number_of_stalling_events: Quantify network-induced interruptions. Ideally there are no stalling events at all; any that do occur hint at severe network or player issues.
  • average_buffer_length and min_buffer_length: Low values suggest insufficient bandwidth. You can inspect the PerformanceClientReport's bufferTrace property to see the buffer levels over time.
  • video_response_time: High values indicate server or routing issues.
  • content_server_ip_address and content_server_as: Identify problematic CDN nodes via their IP address and ASN.
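
As a rough illustration of the buffer check, the sketch below computes how often the buffer runs low once the initial fill-up phase is over. The trace format is an assumption: it is modeled here as a list of (time in seconds, buffer level in seconds) pairs, which may differ from the actual structure of bufferTrace in the PerformanceClientReport. The fill-up time and the low-buffer threshold are also placeholder values.

```python
def low_buffer_share(trace, fill_up_s=10.0, low_threshold_s=5.0):
    """Share of samples after the fill-up phase with a buffer below the threshold.

    `trace` is assumed to be a list of (time_s, buffer_level_s) pairs.
    """
    steady = [level for time_s, level in trace if time_s >= fill_up_s]
    if not steady:
        return 0.0
    low = sum(1 for level in steady if level < low_threshold_s)
    return low / len(steady)


# Hypothetical trace: the buffer dips twice after the fill-up phase.
trace = [(0, 0.0), (5, 8.0), (10, 20.0), (15, 4.0), (20, 3.0), (25, 18.0)]
print(low_buffer_share(trace))
```

A share close to zero suggests the buffer stays healthy; a large share is worth correlating with bandwidth utilization at the same time.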

Actionable Insights:

  • High Initial Loading Delay:
    • Check DNS resolution times in your network
    • Investigate first-hop latency and congestion
    • Consider CDN cache warming for popular content
  • Frequent Stalling with Low Buffer:
    • Analyze bandwidth utilization during peak hours
    • Check for congestion at peering points
    • Evaluate QoS policies for video traffic
  • Server Response Issues:
    • Investigate routing to specific ASNs (content_server_as)
    • Check peering agreements with content providers
    • Consider local CDN deployment negotiations

Low video quality is typically a symptom of insufficient bandwidth or inadequate bitrate selection by the OTT provider.

Key Metrics to Examine:

  • average_video_bitrate vs. available bandwidth: If the bitrate is significantly below the available bandwidth, this points to a misconfigured or overly conservative player.
  • largest_played_video_size vs. initial_resolution: Indicates adaptive quality behavior. If the largest played video size is significantly higher than the initial resolution, the player may be too conservative in bitrate selection, and might start with a lower quality than necessary. On the other hand, starting with a high quality may cause longer initial loading times.
  • quality_switch_down_count vs. quality_switch_up_count: Shows adaptation aggressiveness. If the number of quality switches is higher than 1 or 2, this could be visible to the users and cause a bad experience. Ultimately, the player should be able to switch to a higher quality when the network conditions allow it, and keep it there. Make sure the bandwidth is not fluctuating too much.

Actionable Insights:

  • Low Bitrate Selection Despite Adequate Bandwidth:
    • Conservative player behavior may be responding to past network issues
    • Investigate packet loss and jitter patterns
    • Consider traffic shaping policies that might affect video streams
  • Excessive Quality Switching:
    • Network instability causing ABR oscillation
    • Check for Wi-Fi interference or cellular handover issues
    • Analyze quality switch patterns together with the PerformanceClientReport buffer traces (is the buffer level dropping too often?)

Advanced Network Analysis

Correlation Analysis:

  • Cross-reference content_server_hostname with performance metrics such as ping times to identify problematic CDN nodes
  • Analyze performance by time of day or day of the week to identify capacity constraints
  • Compare metrics across different customer segments or access products (e.g., residential vs. business, fiber vs. DSL)
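
A time-of-day breakdown can be sketched with the standard library alone. The record layout is hypothetical: each measurement is assumed to carry an ISO 8601 `timestamp` and an overall `mos` value, and these key names are not Surfmeter field names.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_mos_by_hour(measurements):
    """Group measurements by hour of day and average the MOS per hour."""
    buckets = defaultdict(list)
    for m in measurements:
        hour = datetime.fromisoformat(m["timestamp"]).hour
        buckets[hour].append(m["mos"])
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Hypothetical sample data: one morning and two evening measurements.
measurements = [
    {"timestamp": "2024-05-01T09:15:00", "mos": 4.4},
    {"timestamp": "2024-05-01T20:05:00", "mos": 3.0},
    {"timestamp": "2024-05-02T20:40:00", "mos": 3.4},
]
print(mean_mos_by_hour(measurements))
```

A consistent dip during evening hours would hint at capacity constraints rather than a problem with a specific CDN node.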

OTT/Content Provider Perspective

As an OTT, your primary goal is to optimize player behavior, improve encoding efficiency, and enhance user experience. You have very little control over the network, but you can optimize your player and encoding parameters to improve QoE.

Key Metrics to Examine:

  • average_buffer_length and buffer trace from PerformanceClientReport: The buffer level should stay consistently high after the initial fill-up phase. If it does not, this is a sign of insufficient bandwidth or buffer management issues.
  • initial_loading_delay vs. initial_resolution: Startup strategy effectiveness. If you are starting with low quality, you should have a low initial loading delay, but the quality should be high enough to not cause visible degradations. If you want to start with high quality, you should expect a higher initial loading delay, but on average it should be lower than 3 seconds.
  • p1203_max_mos_ratio: Shows potential vs. actual performance. We measure what the quality would have been under optimal network conditions. If this ratio is significantly lower than 1, this is a sign of suboptimal ABR selections.

Actionable Insights:

  • High Initial Loading Delay:
    • Consider starting with lower quality for faster startup
    • Pre-load content in your application before rendering the video player
  • Buffer Management Issues:
    • Analyze bufferTrace patterns to optimize target buffer levels
    • Adjust ABR algorithm's buffer-based decisions
    • Implement more aggressive prefetching for predicted content

Player Optimization (Quality Issues)

Key Metrics to Examine:

  • number_of_quality_switches and switch ratios: If the number of quality switches is higher than 1 or 2, this could be visible to the users and cause a bad experience. Ultimately, the player should be able to switch to a higher quality when the network conditions allow it, and keep it there. Make sure the ABR logic is not switching too much.
  • initial_resolution vs. largest_played_video_size: Stability vs. quality trade-off. If the initial resolution is significantly lower than the largest played video size, this is typically a sign of insufficient bandwidth or poor bitrate selection.
  • average_video_bitrate vs. p1203_max_theoretical_mos: Efficiency analysis. If the average video bitrate is significantly below the bitrate needed to reach the theoretical maximum MOS, this is a sign of insufficient bandwidth.

Actionable Insights:

  • Conservative Quality Selection:
    • ABR algorithm may be too cautious. Make sure the ABR logic is not switching too much.
    • Consider bandwidth estimation improvements. Ensure you have quality levels covering the entire bandwidth range.
    • Implement quality ramping strategies for stable connections.
  • Excessive Quality Switching:
    • Reduce sensitivity to short-term bandwidth fluctuations.
    • Implement hysteresis in quality switching decisions to avoid oscillations.
    • Consider user preferences for stability vs. quality. There are cases where users prefer a lower quality to avoid stalling, and vice versa. This depends on the content, on what users expect, and on the network conditions. For instance, users may prefer short videos to start playing quickly, whereas for VoD movies they may prefer a higher initial quality.
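
To illustrate what hysteresis in switching decisions means, here is a minimal sketch of a throughput-based level selector. It is not the logic of any particular player: the function name, the bitrate ladder, and the margin values are all assumptions chosen for the example. The idea is that stepping up requires clear headroom above the next level's bitrate, while stepping down only happens once the current level can no longer be sustained.

```python
def select_level(current, throughput_kbps, ladder_kbps,
                 up_margin=1.3, down_margin=1.0):
    """Pick a ladder index with hysteresis (hypothetical margins)."""
    # Step up only when throughput exceeds the next level by up_margin.
    next_level = current + 1
    if (next_level < len(ladder_kbps)
            and throughput_kbps >= ladder_kbps[next_level] * up_margin):
        return next_level
    # Step down only when the current level is no longer sustainable.
    if current > 0 and throughput_kbps < ladder_kbps[current] * down_margin:
        return current - 1
    return current


ladder = [1000, 2500, 5000]  # hypothetical bitrate ladder in kbit/s
level = 0
history = []
# Throughput hovers around the 2500 kbit/s level before rising clearly above it.
for throughput in [2600, 2400, 2700, 3400, 2600]:
    level = select_level(level, throughput, ladder)
    history.append(level)
print(history)
```

Without the up margin, the first three samples would make a naive selector bounce between the two lowest levels. With it, the selector holds the lower level until the throughput is clearly sufficient, and then stays on the higher level when the throughput drops back slightly.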

Encoding and Content Optimization

Key Metrics to Examine:

  • p1203_average_video_quality (O.22) per codec and resolution
  • dropped_frames: Indicates decoding complexity issues that affect your test hardware. Normally this is nothing to worry about in the context of Surfmeter measurements, as we infer video quality from the codec, bitrate, frame rate, and resolution.
  • Quality metrics correlation with content type and motion: The P.1203 Mode 0 model cannot incorporate aspects of content complexity, so it may not be able to predict the quality of varying content types and/or encoder optimizations. Hence, the model may predict lower quality when the bitrate is lower, even though the lower bitrate is just an artifact of lower content complexity.

We suggest using a proper full-reference objective quality metric to assess the quality of your encoding, or using a bitstream-based model like P.1204.3 (which has been shown to correlate well with subjective quality ratings).

Actionable Insights:

  • Low Quality Scores Despite High Bitrates:
    • Evaluate codec choice (H.264 vs. H.265 vs. AV1 vs. VP9)

Multi-Stakeholder Scenarios

Content Delivery Network (CDN) Performance

For ISPs: Focus on routing efficiency to CDN nodes.

For OTTs: Focus on CDN selection and failover strategies.

Key Metrics:

  • content_server_hostname and content_server_as distribution: Identify problematic CDN nodes via their hostname and ASN.
  • video_response_time by CDN node: Identify CDN nodes that are causing high latency.
  • Geographic performance patterns: Check whether issues cluster in specific regions.
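
A per-node comparison of video_response_time could look like the following sketch. The two field names come from the metrics above; the hostnames, the sample values, and the assumption that response times are in milliseconds are illustrative only. The median is used so that a single slow request does not dominate a node's ranking.

```python
from collections import defaultdict
from statistics import median


def slowest_nodes(measurements, top_n=3):
    """Rank CDN nodes by median video_response_time, slowest first."""
    by_node = defaultdict(list)
    for m in measurements:
        by_node[m["content_server_hostname"]].append(m["video_response_time"])
    ranked = sorted(
        ((median(times), host) for host, times in by_node.items()),
        reverse=True,
    )
    return [(host, value) for value, host in ranked[:top_n]]


# Hypothetical measurements against two CDN nodes (times assumed in ms).
samples = [
    {"content_server_hostname": "cdn-a.example.net", "video_response_time": 40},
    {"content_server_hostname": "cdn-a.example.net", "video_response_time": 50},
    {"content_server_hostname": "cdn-a.example.net", "video_response_time": 45},
    {"content_server_hostname": "cdn-b.example.net", "video_response_time": 120},
    {"content_server_hostname": "cdn-b.example.net", "video_response_time": 300},
]
print(slowest_nodes(samples))
```

The same grouping works with content_server_as as the key when the question is about routing to an entire network rather than a single node.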

There are also joint analysis opportunities for player and network interaction:

  • Correlate ABR behavior with network characteristics.
  • Identify optimal quality ladders for specific network conditions.
  • Develop network-aware streaming strategies for specific network types (e.g., satellite, mobile, …)

Tools and Methodologies

Statistical Analysis

  • Use percentile analysis (P95, P99) rather than just averages for SLA definitions. Outliers can always happen; it's the consistent performance that matters.
  • Implement cohort analysis by network type, geographic region, network access type, etc.
  • Track quality trends over time to identify systematic issues.
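
The difference between an average and a percentile is easy to see with a small example. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions; the sample values are made up and stand in for a delay-type metric such as initial_loading_delay in seconds.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical loading delays in seconds, with one severe outlier.
delays = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 9.0]
average = sum(delays) / len(delays)
print(average, percentile(delays, 50), percentile(delays, 95))
```

Here the average is pulled up by a single slow session, while the median shows what a typical session looks like and the high percentile shows how bad the tail gets. For metrics where lower is worse, such as MOS, the low percentiles (for example P5) carry the equivalent information.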

Performance Benchmarking

  • Compare p1203_max_mos_ratio across different services and conditions.
  • Establish baseline performance metrics for different content types.
  • Keep measuring the same VoD content over time to identify systematic issues. Otherwise you might see MOS fluctuations based on the content alone, not the network.