ITU-T P.1203

The ITU-T Recommendation P.1203 is a family of standards that specifies the world's first model to predict the Quality of Experience (QoE) for HTTP Adaptive Streaming (HAS) services. It consists of one main and three sub-recommendations:

  • ITU-T P.1203: Parametric bitstream-based quality assessment of progressive download and adaptive audiovisual streaming services over reliable transport

  • ITU-T P.1203.1: Video quality estimation module (short-term, providing per-one-second output information)

  • ITU-T P.1203.2: Audio quality estimation module (short-term, providing per-one-second output information)

  • ITU-T P.1203.3: Audiovisual integration and integration of final score, reflecting remembered quality for viewing sessions between 30 s and 5 min duration

The standard predicts the QoE in terms of Mean Opinion Scores (MOS) on a scale from 1 to 5, where 1 corresponds to Bad and 5 to Excellent quality. P.1203 is unique among video quality models in that it combines both video quality and delivery quality into an overall session-level QoE score.

We provide the P.1203 MOS as a statistic named p1203_overall_mos. In the standard, it is referred to as output O.46 (see Outputs below).
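As a small illustration of how such a score might be read, the sketch below maps a MOS value to the nearest verbal label of the five-point ACR scale. The intermediate labels (Poor, Fair, Good) come from the standard ACR scale rather than from this page, and both the rounding rule and the `stats` record are assumptions made for the example; only the key `p1203_overall_mos` is taken from the text above.

```python
# Verbal labels of the five-point ACR scale (1 = Bad, 5 = Excellent).
ACR_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def mos_to_label(mos: float) -> str:
    """Return the ACR label nearest to a MOS value in the range [1, 5]."""
    if not 1.0 <= mos <= 5.0:
        raise ValueError(f"MOS must be between 1 and 5, got {mos}")
    # Round half up (an illustrative choice); Python's round() rounds half to even.
    return ACR_LABELS[int(mos + 0.5)]


# Hypothetical statistics record, used only for this example.
stats = {"p1203_overall_mos": 4.2}
print(mos_to_label(stats["p1203_overall_mos"]))
```

Rounding to the nearest label is only a reading aid; the MOS itself remains a continuous value.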

The models described in the standard were created by an international consortium of academic and industrial partners. They were trained and validated on over 1000 audiovisual sequences rated by human viewers, yielding over 25,000 individual ratings. The ratings were captured in standardized subjective tests conducted in dedicated laboratories. The model development was coordinated within ITU-T Study Group 12, and the competition approach ensured that only the best-performing model candidates were standardized.

Modular Structure of P.1203

P.1203 is composed of several modules that each compute different aspects of the overall quality estimation.

The input streams are analyzed separately for audio and video quality. The Pv (P.1203.1) and Pa (P.1203.2) modules each produce a per-one-second MOS value for the video and audio stream, respectively. These values are then integrated over time, taking into account stalling and quality fluctuations during playout. The integration happens in the Pq module (P.1203.3), which predicts the final MOS value (O.46). This MOS value corresponds to the quality rating a user would have given had they watched the video.

The modular structure allows the integration module to be used with other video/audio quality models, under the condition that the combination is validated in terms of prediction accuracy.
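The modular structure can be pictured with a short sketch in which the per-second modules and the integration step are interchangeable functions. All names here are hypothetical, and `placeholder_integration` is a plain average that ignores stalling; it is a stand-in to show the data flow, not the P.1203.3 integration algorithm.

```python
from statistics import mean
from typing import Callable, Sequence

# A per-second quality module maps one second of stream data to a MOS in [1, 5].
PerSecondModel = Callable[[dict], float]
# An integration module maps the per-second video and audio scores to one session MOS.
Integration = Callable[[Sequence[float], Sequence[float]], float]


def placeholder_integration(video_mos: Sequence[float], audio_mos: Sequence[float]) -> float:
    """Stand-in for Pq: a plain average. NOT the P.1203.3 algorithm (no stalling handling)."""
    return mean(list(video_mos) + list(audio_mos))


def run_session(
    seconds: Sequence[dict],
    pv: PerSecondModel,
    pa: PerSecondModel,
    pq: Integration = placeholder_integration,
) -> float:
    """Score each second with the video and audio modules, then integrate."""
    video_mos = [pv(second) for second in seconds]
    audio_mos = [pa(second) for second in seconds]
    return pq(video_mos, audio_mos)


# Toy per-second models returning constant scores, for illustration only.
print(run_session([{}, {}], pv=lambda s: 4.0, pa=lambda s: 2.0))
```

Because the integration step only sees per-second scores, a different video or audio model can be plugged in without changing it, subject to the validation requirement stated above.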

Modes of Operation

P.1203.1, the video quality estimation module, offers four modes of operation, depending on the available information from the audiovisual stream and the required/available computational resources.

P.1203's simplest mode of operation (Mode 0) takes as input the audio and video bitrate, video resolution, frame rate, and stalling events occurring at the client side. Depending on the available data, higher modes of operation increase prediction accuracy at the expense of more computation and of input data that requires more in-depth bitstream inspection.

While Mode 0 has access only to basic data, Mode 1 can inspect the packet headers of the transmitted stream to obtain frame sizes and types. Modes 2 and 3 have access to the bitstream itself; Mode 2 accesses only 2% of the stream to reduce computing effort. Mode 2 is rarely used in practice, since Mode 3 can be calculated rather efficiently on modern hardware.
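The choice of mode follows from the data that is available, which the following minimal sketch makes explicit. The function and parameter names are hypothetical and do not correspond to any real API; the mapping itself only restates the description above.

```python
def select_mode(
    has_frame_info: bool = False,
    has_bitstream: bool = False,
    full_bitstream: bool = True,
) -> int:
    """Pick the highest mode of operation supported by the available input data."""
    if has_bitstream:
        # Mode 3 reads the whole bitstream; Mode 2 only a small part of it.
        return 3 if full_bitstream else 2
    if has_frame_info:
        # Frame sizes and types obtained from packet headers.
        return 1
    # Only bitrate, resolution, frame rate, and stalling events are known.
    return 0


print(select_mode(has_frame_info=True))
```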

Outputs

The model produces various outputs that can be used for diagnostic purposes:

Scope and Limitations

The following limitations apply for P.1203-related KQIs:

  • ITU-T Rec. P.1203 has only been validated for video sequences of up to 5 minutes in length. Hence, a measurement of a video source that is longer than this duration is technically possible but may be considered invalid with respect to the standard.

  • ITU-T Rec. P.1203 has only been validated for video up to 25 fps frame rate. Use of the model for video with higher frame rates is technically possible but will not yield higher quality ratings.

  • ITU-T Rec. P.1203 has only been validated for video up to 1080p resolution. Use of the model for video with higher resolution is technically possible but will not yield higher quality ratings. An Appendix exists for P.1203 that enables the use of up to UHD-1 resolution.
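A measurement pipeline might flag inputs that fall outside these validated ranges. The sketch below is one way to do so; the function name, its parameters, and the warning texts are assumptions for illustration, and the 2160-line limit used when the appendix applies reflects the UHD-1 frame height.

```python
from typing import List


def check_scope(
    duration_s: float,
    fps: float,
    height_px: int,
    uhd_appendix: bool = False,
) -> List[str]:
    """Return a warning for each input that exceeds the validated scope."""
    warnings = []
    if duration_s > 5 * 60:
        warnings.append("duration exceeds 5 minutes; result may be invalid per the standard")
    if fps > 25:
        warnings.append("frame rate exceeds 25 fps; no higher rating will result")
    # With the appendix, resolutions up to UHD-1 (2160 lines) are covered.
    max_height = 2160 if uhd_appendix else 1080
    if height_px > max_height:
        warnings.append(f"resolution exceeds {max_height}p; no higher rating will result")
    return warnings


for warning in check_scope(duration_s=600, fps=60, height_px=2160):
    print(warning)
```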

Amendments for Impact of Stalling and Low Quality

During the deployment of ITU-T Rec. P.1203, it was discovered that the impact of low audiovisual quality and stalling on the overall MOS was too low compared to what users of the model would expect. In certain edge cases, the model predicted too high a MOS for sessions with a long initial loading delay or long stalling. A set of modifications has been proposed and is available in Surfmeter.

In order to increase the impact of stalling events and very low audiovisual quality, ITU-T Rec. P.1203.3 has been officially updated with Amendment 1 "Adjustment of the audiovisual quality". This amendment is available in the Surfmeter software.

Note

The amendment is enabled by default.