ITU-T P.1204

The ITU-T Recommendation P.1204 is a family of standards that specifies video quality models for sequences of up to 4K/UHD resolution. Unlike P.1203, which predicts integral quality for longer streaming sessions, P.1204 focuses on short-term video quality assessment for video segments of 5–10 seconds duration. P.1204 is not a successor to P.1203, but rather provides alternative Pv (video quality estimation) modules that can be used standalone or integrated with P.1203.3 for session-level QoE prediction.

The P.1204 series was developed through a competition within ITU-T Study Group 12 in collaboration with the Video Quality Experts Group (VQEG), using a large subjective test dataset of approximately 5,000 test sequences rated by human viewers. For details on the development process, see Raake et al.: Multi-Model Standard for Bitstream-, Pixel-Based and Hybrid Video Quality Assessment of UHD/4K: ITU-T P.1204.

P.1204 Model Types

The P.1204 standard series comprises several model types, each using different input information (see Model Classification):

Recommendation | Released | Model Type                  | Reference Required | Input Information
P.1204.1       | 2025     | Bitstream Mode 0 (Metadata) | No                 | Bitrate, resolution, framerate, codec
P.1204.3       | 2020     | Bitstream Mode 3            | No                 | Full bitstream (QP, frame sizes, motion vectors)
P.1204.4       | 2020     | Pixel-based (RR/FR)         | Yes (reduced)      | Reference + processed pixels
P.1204.5       | 2020     | Hybrid (NR)                 | No                 | Metadata + processed pixels

All models support H.264, H.265, and VP9, with resolutions from 240p to 2160p and frame rates from 15 to 60 fps. The P.1204 models output quality predictions on the 5-point ACR MOS scale, providing both per-segment scores (O.27) and per-one-second scores (O.22) suitable for integration with P.1203.3.
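The validated scope above can be checked before a segment is passed to a model. The following is a minimal sketch, assuming a hypothetical `SegmentInfo` container and hypothetical helper names (`in_scope`, `clamp_mos`) that are not part of any P.1204 implementation. Only the codec list, the resolution and frame-rate ranges, and the 1–5 MOS scale are taken from the text.

```python
from dataclasses import dataclass

# Scope as stated in the text above.
SUPPORTED_CODECS = {"h264", "h265", "vp9"}
MIN_HEIGHT, MAX_HEIGHT = 240, 2160
MIN_FPS, MAX_FPS = 15.0, 60.0


@dataclass(frozen=True)
class SegmentInfo:
    """Hypothetical container for the metadata of one video segment."""
    codec: str
    height: int  # encoded frame height in pixels, e.g. 1080
    fps: float   # encoded frame rate


def in_scope(seg: SegmentInfo) -> bool:
    """Return True if the segment falls inside the stated validation scope."""
    return (
        seg.codec.lower() in SUPPORTED_CODECS
        and MIN_HEIGHT <= seg.height <= MAX_HEIGHT
        and MIN_FPS <= seg.fps <= MAX_FPS
    )


def clamp_mos(score: float) -> float:
    """Keep a score on the 5-point ACR MOS scale (1 = bad, 5 = excellent)."""
    return min(5.0, max(1.0, score))
```

A caller could, for example, skip or flag segments for which `in_scope` returns `False` rather than report a score outside the range the models were validated for.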

In the following sections, we provide an overview of the two main P.1204 models that can be used in practice when working with Surfmeter tools: P.1204.1 (metadata-based) and P.1204.3 (bitstream-based).

ITU-T P.1204.1

ITU-T Rec. P.1204.1 specifies a metadata-based (Mode 0) video quality model that uses only basic encoding parameters—bitrate, resolution, framerate, and codec type—to predict video quality. This makes it suitable for scenarios where deep bitstream inspection is not possible, such as encrypted streams or lightweight monitoring applications.

The model architecture follows P.1204.3's degradation-based approach, computing three degradation components:

  • Quantization degradation (Dq): Estimates encoding quality loss by synthesizing the quantization parameter (QP) from bitrate, resolution, and framerate metadata
  • Upscaling degradation (Du): Models quality loss from spatial upscaling when encoding resolution is lower than display resolution
  • Temporal degradation (Dt): Accounts for jerkiness when encoding framerate is lower than display framerate
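The structure of the three components listed above can be illustrated with a toy sketch. Everything numeric here is an assumption: the functional forms, the weights, and the subtractive combination on the 1–5 scale are placeholders chosen for illustration and are not the coefficients or equations of the Recommendation. The sketch only shows which metadata feeds each component and that Du and Dt vanish when no upscaling or frame-rate conversion takes place.

```python
import math

MOS_MIN, MOS_MAX = 1.0, 5.0


def quantization_degradation(bitrate_kbps, width, height, fps, weight=1.0):
    """Toy Dq: fewer bits per pixel -> larger degradation (placeholder form)."""
    bits_per_pixel = bitrate_kbps * 1000.0 / (width * height * fps)
    return weight * max(0.0, -math.log10(bits_per_pixel))


def upscaling_degradation(coding_res, display_res, weight=1.0):
    """Toy Du: grows with the pixel-count ratio, zero without upscaling."""
    coding_pixels = coding_res[0] * coding_res[1]
    display_pixels = display_res[0] * display_res[1]
    scale = max(1.0, display_pixels / coding_pixels)
    return weight * math.log(scale)


def temporal_degradation(coding_fps, display_fps, weight=1.0):
    """Toy Dt: grows with the frame-rate ratio, zero at equal frame rates."""
    ratio = max(1.0, display_fps / coding_fps)
    return weight * math.log(ratio)


def toy_mos(bitrate_kbps, coding_res, display_res, coding_fps, display_fps):
    """Assumed combination: subtract all degradations from the scale maximum."""
    dq = quantization_degradation(bitrate_kbps, coding_res[0], coding_res[1], coding_fps)
    du = upscaling_degradation(coding_res, display_res)
    dt = temporal_degradation(coding_fps, display_fps)
    return min(MOS_MAX, max(MOS_MIN, MOS_MAX - (dq + du + dt)))


if __name__ == "__main__":
    uhd = (3840, 2160)
    for kbps in (2000, 8000, 20000):
        print(kbps, round(toy_mos(kbps, uhd, uhd, 60, 60), 2))
```

A real implementation replaces the placeholder forms with the codec-specific functions and coefficients defined in the Recommendation.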

P.1204.1 has been validated for H.264, H.265, and VP9 codecs, with resolutions up to 4K/UHD-1 (3840×2160) for PC/TV displays and QHD (2560×1440) for mobile/tablet devices, and framerates up to 60 fps.

ITU-T P.1204.3

ITU-T Rec. P.1204.3 specifies a bitstream-based (Mode 3) video quality model that performs deep inspection of the encoded video stream. Unlike the metadata-based P.1204.1, this model extracts detailed information from the bitstream including quantization parameters (QP), frame sizes, frame types, and motion vectors. The model computes quality predictions using per-frame QP values (directly extracted from the bitstream rather than estimated), enabling more accurate assessment of compression artifacts. It also incorporates frame-level complexity effects based on motion information.

P.1204.3 achieves higher prediction accuracy than P.1204.1 because it has access to the actual encoding parameters, and in some cases it even outperforms the state-of-the-art VMAF model. It has been validated for H.264, H.265, and VP9 codecs, with resolutions up to 4K/UHD-1 (3840×2160) and framerates from 15 to 60 fps.

For technical details, see Rao et al.: P.1204.3 – An ITU-T Recommendation for Bitstream-based Video Quality Assessment Supporting Modern Video Codecs and Resolutions.

Note

P.1204.3 is not used in Surfmeter by default due to its higher computational requirements. The metadata-based P.1204.1/AVQBits|M0 model provides a good balance between accuracy and performance for most use cases.

Extensions and Variants for Pv/Pa Modules

ITU-T Rec. P.1203.1 and P.1203.2 (the video and audio modules) have been developed for the H.264 video codec and MPEG-4 AAC (AAC-LC, HE-AAC) and AC-3 audio codecs only.

To obtain a full P.1203-type evaluation (i.e., a session score) for video services that use codecs other than those listed above, specific extensions and variants have been developed in collaboration with TU Ilmenau to enhance AVEQ's monitoring features. These variants are summarized in the following table:

Type | Extension/Module | Scope | Used by default?
Video (Pv) | ITU-T P.1204.1 Mode 0 Video Quality Module | Codecs: H.264, H.265, VP9; FPS: 12–60; Resolution: 240p–2160p | ✅ *
Video (Pv) | AVQBits M0 – P.1204.3-based Mode 0 Video Quality Module | Codecs: H.264, H.265, VP9, AV1; FPS: 12–60; Resolution: 240p–2160p |
Video (Pv) | Retrained P.1203.1 Coefficients | Codecs: H.264, H.265, VP9, AV1; FPS: 12–24; Resolution: 240p–1080p |
Video (Pv) | Open-Source P.1203.1 Codec Extension | Codecs: H.264, H.265, VP9; FPS: 12–24; Resolution: 240p–1080p |

* P.1204.1 is a subset of AVQBits|M0 in terms of supported codecs, so it is enabled by definition.

These extensions may be used with the existing Pq component for final quality integration. They are described in the following sections.

Video Extensions

P.1204.1 Mode 0 Video Quality Module

ITU-T Rec. P.1204.1 specifies a metadata-based (Mode 0) video quality model. For technical details on the model, see the sections above. For background on the P.1204 series, see Raake et al.: Multi-Model Standard for Bitstream-, Pixel-Based and Hybrid Video Quality Assessment of UHD/4K: ITU-T P.1204.

P.1204.1 uses the AVQBits|M0 model internally, with the exception of the AV1 coefficients, so the next section describes the model behavior in more detail.

AVQBits|M0 (P.1204.3-based Mode 0 Video Quality Module)

AVQBits|M0 is a Mode 0 model (using only metadata), but it is architecturally based on the ITU-T Rec. P.1204.3 model, which operates on the actual video bitstream (Mode 3). For background on P.1204.3, read Rao et al.'s paper.

AVQBits|M0 synthesizes the assumed quantization parameter (QP) based on video codec metadata, and derives a MOS score with metadata only. The module is available on GitHub. It covers the H.264, H.265, VP9, and – in our AVEQ-supplied variant – the AV1 codecs, and resolutions up to 2160p and framerates up to 60 fps.

Details on the model are available in Rao et al.'s paper: AVQBits—Adaptive Video Quality Model Based on Bitstream Information for Various Video Applications.

We recommend this model for any video service that matches the above scope. Its accuracy is very high, with a correlation of around 0.890 on the AVT-VQDB-UHD1 database, and it is also very fast to compute. Compared to the original P.1203.1 model, it is more accurate for low-bitrate encodings and, consequently, for evaluating bitrate ladders. More performance statistics are available in the paper linked above.

The following figure shows the behavior of the models for different bitrates (kBit/s), codecs, and resolutions, all assuming a frame rate of 60 fps and a target display resolution of 2160p.

Retrained P.1203.1 Codec Coefficients

To add support for H.265, VP9, and AV1, TU Ilmenau retrained the coefficients of the P.1203.1 model functions. Specifically, the modified coefficients are a1 through a4 and q1 through q3 (see Table A.1 and Table B.1 in ITU-T Rec. P.1203.1).

The retraining was performed using a newly generated set of test sequences based on the publicly available AVT-VQDB-UHD1 video database. 17 sources were encoded with H.265, VP9, and AV1, for a total of 1581 video sequences, with resolutions from 360p to 1080p and bitrates from 100 kbps to 16 Mbps. The encoders used were libx265, libvpx-vp9, and libaom-av1. Two-pass encoding was enabled with a specific encoder speed preset (medium for HEVC, 2 for VP9, 4 for AV1). The framerate was kept constant at 30 fps. VMAF scores, calculated with the VMAF 0.6.1 model, served as ground truth for retraining the coefficients.
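The size of the encoding grid can be cross-checked with a short calculation. The sketch below records the encoder settings named above in a hypothetical `ENCODERS` table and derives the number of encoding conditions per source and codec. It assumes a uniform design, i.e., every source was encoded with the same number of resolution and bitrate combinations for each codec, which the text does not state explicitly.

```python
# Encoder settings as named in the text; the dictionary layout is illustrative.
ENCODERS = {
    "h265": {"encoder": "libx265", "speed_preset": "medium"},
    "vp9": {"encoder": "libvpx-vp9", "speed_preset": "2"},
    "av1": {"encoder": "libaom-av1", "speed_preset": "4"},
}

NUM_SOURCES = 17
TOTAL_SEQUENCES = 1581


def conditions_per_source_and_codec(total: int, sources: int, codecs: int) -> int:
    """Resolution/bitrate combinations per source and codec (uniform design assumed)."""
    per_cell, remainder = divmod(total, sources * codecs)
    if remainder:
        raise ValueError("total is not divisible by sources * codecs")
    return per_cell


if __name__ == "__main__":
    print(conditions_per_source_and_codec(TOTAL_SEQUENCES, NUM_SOURCES, len(ENCODERS)))
```

Under that assumption, 1581 sequences across 17 sources and 3 codecs correspond to 31 encoding conditions per source and codec.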

Note

This extension is enabled by default for services that do not use H.264.

Open-Source P.1203.1 Codec Extension

To support the codecs H.265 and VP9, a publicly available extension to ITU-T Rec. P.1203.1 can be used. This implementation has been developed by TU Ilmenau and is available on GitHub. Here, a linear mapping is applied to each calculated O.22 score to compensate for the improved efficiency of H.265 and VP9 compared to H.264. The method is described in more detail at the given URL.
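The per-score linear mapping described above can be sketched as follows. The slope and intercept values in this example are hypothetical placeholders, not the coefficients of the published extension, and clamping the result to the 1–5 MOS range is an assumption made for this sketch.

```python
from typing import Iterable, List


def map_o22_scores(scores: Iterable[float], slope: float, intercept: float) -> List[float]:
    """Apply the same linear mapping to every per-second O.22 score.

    The result is clamped to the 1-5 MOS range (an assumption of this sketch).
    """
    return [min(5.0, max(1.0, slope * s + intercept)) for s in scores]


if __name__ == "__main__":
    # Hypothetical H.264-domain scores for a four-second segment.
    o22_h264 = [2.0, 3.5, 4.0, 4.8]
    # Placeholder coefficients; the real values ship with the extension.
    print(map_o22_scores(o22_h264, slope=1.0, intercept=0.5))
```

Because the mapping is applied to each O.22 score independently, the rest of the P.1203 pipeline can consume the mapped scores unchanged.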

Note

This extension is currently not enabled by default due to its lower accuracy in comparison to the retrained coefficients.

Audio Extensions

No quality model is currently available for Opus, so the same coefficients as for HE-AAC are used for this codec. The impact on the overall audio quality, and consequently on the audiovisual quality, is considered negligible.
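This fallback amounts to a codec alias lookup. The following sketch uses hypothetical identifiers (`NATIVE_AUDIO_CODECS`, `AUDIO_CODEC_ALIASES`, `coefficient_set_for`) and hypothetical codec keys, so it illustrates the idea rather than the actual implementation.

```python
# Codecs with their own coefficient sets, as listed for P.1203.2 in the text.
NATIVE_AUDIO_CODECS = {"aaclc", "heaac", "ac3"}

# Codecs without a dedicated model borrow the coefficients of another codec.
AUDIO_CODEC_ALIASES = {"opus": "heaac"}


def coefficient_set_for(codec: str) -> str:
    """Return the key of the coefficient set to use for an audio codec."""
    key = codec.lower()
    key = AUDIO_CODEC_ALIASES.get(key, key)
    if key not in NATIVE_AUDIO_CODECS:
        raise ValueError(f"unsupported audio codec: {codec}")
    return key
```

With this layout, adding dedicated Opus support later would only require removing the alias and registering a new coefficient set.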

An updated version of the model with support for Opus is currently being investigated.