WebRTC Transmission Modes and Quality Analysis

Introduction

Your Surfmeter probes are currently deployed to measure Google Meet and Microsoft Teams in 1-to-1 call scenarios with two participants. They send one artificially generated audio and video stream in each direction, enabling detailed analysis of transmission modes and quality metrics from both the sender and the recipient side.

The available KPIs are documented in our docs; please refer to them for a comprehensive list.

When analyzing your measurement data, the question may arise as to which side of the transmission is responsible for poor quality (visible in low Mean Opinion Scores, packet loss, AV desync, etc.). The first indicator to examine is the candidateType field in the ICE candidate information from both probes. The second indicator is the congruency of quality metrics: you should expect codec consistency between sent and received streams in 1-to-1 calls, as the platforms typically do not transcode in our measurement scenario.

Current observations, based on measurements with Surfmeter probes, show the following consistent patterns:

  • Google Meet 1:1 calls: SFU architecture with remote candidates showing Google IP addresses and host candidate type
  • Microsoft Teams 1:1 calls: TURN relay architecture with remote candidates showing Microsoft relay server IPs and relay candidate type

This document explains how to interpret these fields in the context of different WebRTC architectures.

WebRTC Transmission Modes Overview

This section describes the main WebRTC transmission modes, their characteristics, and typical use cases.

Direct Peer-to-Peer (P2P)

Media flows directly between the two clients without any intermediate server processing.

graph LR
    A[Probe A] <-->|Direct RTP/RTCP<br/>Video/Audio Streams| B[Probe B]

    style A fill:#4CAF50,stroke:#2E7D32,color:#fff
    style B fill:#4CAF50,stroke:#2E7D32,color:#fff

Direct peer-to-peer connections offer the lowest achievable latency, typically ranging from 20 to 100 milliseconds, delivering the best possible quality since no transcoding occurs in the media path. The infrastructure cost remains minimal as no intermediate servers process the media streams. However, this approach is inherently limited by NAT and firewall traversal capabilities, which explains why it cannot be universally deployed.

TURN Relay (Server Relay without Transcoding)

Media is relayed through a TURN (Traversal Using Relays around NAT) server but not re-encoded. The server simply forwards packets.

graph TB
    A[Probe A] -->|Upload Stream| TURN[TURN Relay Server]
    TURN -->|Forward Stream<br/>No Transcoding| B[Probe B]
    B -->|Upload Stream| TURN
    TURN -->|Forward Stream<br/>No Transcoding| A

    style A fill:#FF9800,stroke:#E65100,color:#fff
    style B fill:#FF9800,stroke:#E65100,color:#fff
    style TURN fill:#2196F3,stroke:#0D47A1,color:#fff

A TURN relay introduces higher latency compared to direct P2P connections, adding approximately 10-50 milliseconds per network hop. Despite this latency increase, the relay preserves media quality since no transcoding occurs. The server simply forwards packets without decoding or re-encoding them. This approach becomes necessary when NAT configurations or firewalls prevent direct connections between endpoints.

Services typically deploy TURN relays for 1-to-1 calls when direct peer-to-peer connectivity fails due to restrictive NAT configurations or corporate firewall policies.

SFU (Selective Forwarding Unit)

A Selective Forwarding Unit (SFU) is a central server that receives media streams from all participants and forwards them to other participants without decoding/re-encoding. Each client sends one stream and receives N-1 streams.

graph TB
    P1[Participant 1<br/>Sends: 1 stream<br/>Receives: 3 streams] -->|Upload| SFU[SFU Server<br/>Selective Forwarding<br/>No Transcoding]
    P2[Participant 2<br/>Sends: 1 stream<br/>Receives: 3 streams] -->|Upload| SFU
    P3[Participant 3<br/>Sends: 1 stream<br/>Receives: 3 streams] -->|Upload| SFU
    P4[Participant 4<br/>Sends: 1 stream<br/>Receives: 3 streams] -->|Upload| SFU

    SFU -->|Forward P2,P3,P4 streams| P1
    SFU -->|Forward P1,P3,P4 streams| P2
    SFU -->|Forward P1,P2,P4 streams| P3
    SFU -->|Forward P1,P2,P3 streams| P4

    style P1 fill:#9C27B0,stroke:#4A148C,color:#fff
    style P2 fill:#9C27B0,stroke:#4A148C,color:#fff
    style P3 fill:#9C27B0,stroke:#4A148C,color:#fff
    style P4 fill:#9C27B0,stroke:#4A148C,color:#fff
    style SFU fill:#2196F3,stroke:#0D47A1,color:#fff

The architecture scales effectively to accommodate many participants, though client bandwidth ultimately limits the practical number of concurrent streams. The server assumes responsibility for routing and bandwidth multiplication, intelligently distributing streams to participants based on their needs and capabilities.

The Selective Forwarding Unit achieves moderate latency similar to TURN relay while preserving quality through a no-transcoding approach, though this changes when simulcast or SVC encoding techniques come into play (see MDN), or when an MCU sits behind the SFU.

SFU architecture is used in various scenarios, sometimes even for 1-to-1 calls. Multi-party conferences with 3 to 50 or more participants typically use an SFU across platforms, as do many webinar scenarios.

MCU (Multipoint Control Unit)

A Multipoint Control Unit (MCU) is a central server that receives all streams, decodes them, mixes/composes them into a single stream (or multiple optimized streams), and sends the result to each participant. Here is an example:

graph TB
    P1[Participant 1<br/>Sends: 1 stream<br/>Receives: 1 mixed stream] -->|Upload VP9| MCU[MCU Server<br/>Decode All<br/>Compose Layout<br/>Encode Mixed Stream]
    P2[Participant 2<br/>Sends: 1 stream<br/>Receives: 1 mixed stream] -->|Upload H.264| MCU
    P3[Participant 3<br/>Sends: 1 stream<br/>Receives: 1 mixed stream] -->|Upload VP8| MCU
    P4[Participant 4<br/>Sends: 1 stream<br/>Receives: 1 mixed stream] -->|Upload H.264| MCU

    MCU -->|Mixed/Composed<br/>Stream H.264| P1
    MCU -->|Mixed/Composed<br/>Stream H.264| P2
    MCU -->|Mixed/Composed<br/>Stream H.264| P3
    MCU -->|Mixed/Composed<br/>Stream H.264| P4

    style P1 fill:#F44336,stroke:#B71C1C,color:#fff
    style P2 fill:#F44336,stroke:#B71C1C,color:#fff
    style P3 fill:#F44336,stroke:#B71C1C,color:#fff
    style P4 fill:#F44336,stroke:#B71C1C,color:#fff
    style MCU fill:#FF5722,stroke:#BF360C,color:#fff

Multipoint Control Units introduce higher latency ranging from 50 to 200 milliseconds due to the decode and encode operations required to mix media streams. This transcoding process may degrade quality through the introduction of compression artifacts and generational loss.

Despite these quality tradeoffs, the MCU architecture achieves excellent scalability because clients only send and receive a single stream regardless of the total participant count. This comes at the cost of the highest server CPU consumption among all WebRTC architectures. On the other hand, the MCU's processing capabilities enable features like layout composition, supporting picture-in-picture arrangements and grid layouts by actually rendering and composing the video streams server-side.

Services deploy MCU architecture primarily for large conferences with 50 or more participants, or recording scenarios where a single mixed output simplifies capture.

Architectures Used by Google Meet

Historically, Google Meet employed a progressive connectivity strategy for 1-to-1 calls, establishing a direct peer-to-peer connection using STUN for NAT traversal. This approach was optimized for the best possible quality when network conditions permitted, but P2P support appears to have been removed with the transition from Legacy Calls to Meet Calls (see FAQ) in September 2025.

Current observations from Surfmeter measurements show that Google Meet now uses SFU architecture even for 1-to-1 calls. Both probes consistently show Google-owned IP addresses in their remote candidates with candidate type host. This indicates that both endpoints connect to Google's media servers rather than directly to each other. The media servers present their own network interfaces (hence host type) and forward streams between participants using SFU architecture.

For group meetings with multiple participants, Google Meet continues to use the same SFU architecture, where each participant sends their media stream once to a Google media server, which then selectively forwards the relevant streams to other participants. This approach is scalable and efficient for conferences of any size.

Architectures Used by Microsoft Teams

Microsoft Teams routes calls through relay servers in most cases. Per the Citrix docs, it "relies on Media Processor servers in Microsoft 365 for meetings or multiparty calls." Furthermore, it uses Transport Relays when two peers have no direct connectivity, or when a participant "does not have direct connectivity to the media processor."

Current observations from Surfmeter measurements show that Microsoft Teams consistently uses a TURN relay architecture for 1-to-1 calls. In our measurements, both probes show Microsoft-owned IP addresses in their remote candidates with candidate type relay. This indicates that media is forwarded through Microsoft's TURN infrastructure without transcoding, which preserves quality while ensuring connectivity even in restrictive network environments.

Note that Microsoft Teams may use different architectures for group meetings with multiple participants, potentially employing SFU or MCU architectures depending on the scenario and participant count. However, in our current 1-to-1 call measurements, TURN relay is the observed architecture.

Identifying Transmission Modes

This section provides step-by-step instructions on how to analyze Surfmeter probe data to determine the WebRTC transmission mode in use.

The candidateType field in localCandidate and remoteCandidate is the primary indicator for identifying the transmission mode. The following table summarizes the candidate types:

| Candidate Type | Description | What it means |
| --- | --- | --- |
| host | Direct local network interface | Direct connection attempt, no intermediary |
| srflx | Server Reflexive (STUN) | Connection through NAT, but direct path |
| relay | TURN relay | Connection must go through a relay server |
| prflx | Peer Reflexive | Learned during connectivity checks (rare) |

When analyzing data from Probe A (endpoint 1) and Probe B (endpoint 2), use the following table to determine the transmission mode:

| Probe A Local | Probe A Remote | Probe B Local | Probe B Remote | Mode | Explanation |
| --- | --- | --- | --- | --- | --- |
| host | host | host | host | Direct P2P | Both on the same local network when local IPs are used in the remote candidate |
| host/srflx | host/srflx | host/srflx | host/srflx | Direct P2P | Direct connection through NAT(s) – check IPs to confirm |
| host/srflx | host | host/srflx | host | SFU | Both remotes are server IPs |
| host/srflx | relay | host/srflx | relay | TURN Relay | Both sides see the relay as remote |

Note

  • When relay appears in remote candidate on both probes, the connection uses a TURN server.
  • When both probes show host in remote candidates but the IPs belong to a third party (e.g., Google, not the other probe), this indicates SFU architecture, not P2P.
  • Always verify with IP address analysis (see next section) to distinguish between true P2P and SFU connections.
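
To make the mapping concrete, here is a minimal Python sketch of the candidate-type logic from the table and notes above. The input dictionaries and key names (local_type, remote_type) are illustrative placeholders, not actual Surfmeter report fields:

# Minimal sketch: derive the transmission mode from the candidateType values
# reported by both probes. Key names are illustrative, not Surfmeter fields.

def classify_by_candidate_type(probe_a: dict, probe_b: dict) -> str:
    remotes = {probe_a["remote_type"], probe_b["remote_type"]}

    # Both probes see a relay as their remote: media goes through a TURN server.
    if remotes == {"relay"}:
        return "TURN relay"

    # Both remotes are host/srflx: either true P2P or an SFU presenting its own
    # interface. The IP analysis below is needed to tell these apart.
    if remotes <= {"host", "srflx"}:
        return "P2P or SFU - verify remote IPs"

    return "unclear - inspect candidates manually"

print(classify_by_candidate_type(
    {"local_type": "srflx", "remote_type": "relay"},
    {"local_type": "host", "remote_type": "relay"},
))  # -> "TURN relay"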

You can compare IP addresses from localCandidate.ip and remoteCandidate.ip across both probes to determine the connection architecture:

Scenario 1: Direct P2P Connection

  • Probe A's remoteCandidate.ip matches Probe B's localCandidate.ip (or its public IP)
  • Probe B's remoteCandidate.ip matches Probe A's localCandidate.ip (or its public IP)
  • The probes are directly connected to each other

Scenario 2: SFU

  • Both probes' remoteCandidate.ip point to third-party server IPs (e.g., Google-owned IPs)
  • The remote IPs may be the same server or different servers in the same infrastructure
  • Remote candidate type is typically host (the server's own interface)

Scenario 3: TURN Relay

  • Both probes' remoteCandidate.ip point to relay server IPs (e.g., Microsoft Teams relay servers)
  • Remote candidate type is relay

Scenario 4: MCU

  • Multiple participants connecting to the same server IP
  • Transcoding occurs (check codec differences in quality metrics)
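
The following sketch illustrates the IP cross-check for these scenarios. The field names (local_ip, public_ip, remote_ip, remote_type) are simplified stand-ins for the localCandidate/remoteCandidate data exposed by the probes, and the example IP addresses are made up:

# Minimal sketch: distinguish direct P2P from server-based architectures by
# cross-comparing IPs from both probes. Field names are illustrative.

def classify_by_ip(probe_a: dict, probe_b: dict) -> str:
    a_known = {probe_a["local_ip"], probe_a["public_ip"]}
    b_known = {probe_b["local_ip"], probe_b["public_ip"]}

    # Scenario 1: each probe's remote IP is the other probe's local or public IP.
    if probe_a["remote_ip"] in b_known and probe_b["remote_ip"] in a_known:
        return "Direct P2P"

    # Scenarios 2/3: both remotes belong to third-party infrastructure;
    # the candidate type separates SFU (host) from TURN relay (relay).
    if {probe_a["remote_type"], probe_b["remote_type"]} == {"relay"}:
        return "TURN relay"
    return "SFU (third-party media servers)"

print(classify_by_ip(
    {"local_ip": "10.0.0.2", "public_ip": "198.51.100.7",
     "remote_ip": "142.250.0.10", "remote_type": "host"},
    {"local_ip": "10.1.0.5", "public_ip": "203.0.113.9",
     "remote_ip": "142.250.0.11", "remote_type": "host"},
))  # -> "SFU (third-party media servers)"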

How to Detect Transcoding

Note

These data are currently only exposed in debugging mode and not in standard Surfmeter reports.

Comparing p1203ClientReportSent and p1203ClientReportReceived indicates whether transcoding may have occurred. The key metrics to compare are:

| Metric | Location in Report | Significance |
| --- | --- | --- |
| Codec | I13.segments[].codec (video), I11.segments[].codec (audio) | Different codec = transcoding occurred |
| Resolution | I13.segments[].resolution | Different resolution = scaling/transcoding |
| Bitrate | I13.segments[].bitrate, I11.segments[].bitrate | Large discrepancy = transcoding or packet loss |
| Frame Rate | I13.segments[].fps | Different FPS = transcoding or frame dropping |

If the sent codec equals the received codec, it is likely that no transcoding happened; if they differ, transcoding definitely occurred. The video bitrates should also match within ±5% to account for network variations. Larger differences may indicate transcoding or significant packet loss.

Note that for audio, we have observed large bitrate differences, which could be due to variable bitrate encoding like Opus Discontinuous Transmission (DTX). Therefore, audio bitrate differences alone are not a reliable indicator of transcoding.
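
As an illustration of the comparison rules above, the following sketch checks video codec equality and the ±5% bitrate tolerance. The report structure is simplified for illustration; real p1203ClientReportSent/Received data would need to be flattened into these fields first:

# Minimal sketch of the sent-vs-received comparison for one video segment.

def detect_transcoding(sent_video: dict, received_video: dict,
                       bitrate_tolerance: float = 0.05) -> str:
    # A codec mismatch is conclusive: the media must have been re-encoded.
    if sent_video["codec"] != received_video["codec"]:
        return "transcoding occurred (codec mismatch)"

    # Bitrates should match within roughly +/-5%; larger gaps hint at
    # transcoding or significant packet loss, but are not conclusive.
    rel_diff = abs(sent_video["bitrate"] - received_video["bitrate"]) / sent_video["bitrate"]
    if rel_diff > bitrate_tolerance:
        return "possible transcoding or significant packet loss"

    return "no transcoding detected"

print(detect_transcoding(
    {"codec": "vp9", "bitrate": 1_500_000},
    {"codec": "vp9", "bitrate": 1_480_000},
))  # -> "no transcoding detected"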

Packet Loss Analysis

An important question when interpreting the results is: when you see packet loss, where exactly did the packets get lost? The answer depends critically on whether P2P, a TURN relay, or an SFU architecture is used. This section explores that question.

Consider that RTP packets have sequence numbers (1, 2, 3, 4, …). The receiver can detect gaps in sequence (e.g., it receives 1, 2, 4, 5 → packet 3 is lost). Therefore, packetsLost at receiver = number of expected but never-arrived packets.
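
A minimal sketch of this gap detection, ignoring sequence-number wrap-around and reordering for brevity:

def packets_lost(received_seq: list[int]) -> int:
    # Expected = span of sequence numbers seen; lost = expected but never received.
    expected = max(received_seq) - min(received_seq) + 1
    return expected - len(set(received_seq))

print(packets_lost([1, 2, 4, 5]))  # -> 1 (packet 3 never arrived)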

Direct P2P

In a simple P2P case, there is only one network path (A → B). Therefore, any loss seen at B must have occurred on that single path:

graph LR
    A[Probe A<br/>packetsSent: 1000] -->|Network Path<br/>50 packets lost| B[Probe B<br/>packetsReceived: 950<br/>packetsLost: 50]

    style A fill:#4CAF50,stroke:#2E7D32,color:#fff
    style B fill:#4CAF50,stroke:#2E7D32,color:#fff

TURN Relay

A more complex case is when a TURN relay is used. Now there are two network legs, but RTCP is end-to-end:

graph LR
    A[Probe A<br/>packetsSent: 1000] -->|Leg 1<br/>A → TURN<br/>Loss here?| TURN[TURN Server<br/>UDP Forwarding<br/>No RTP termination]
    TURN -->|Leg 2<br/>TURN → B<br/>Loss here?| B[Probe B<br/>packetsReceived: 950<br/>packetsLost: 50]
    B -.->|RTCP feedback| A

    style A fill:#FF9800,stroke:#E65100,color:#fff
    style B fill:#FF9800,stroke:#E65100,color:#fff
    style TURN fill:#2196F3,stroke:#0D47A1,color:#fff

SFU

With an SFU, there are two separate RTP sessions with independent RTCP feedback per leg:

graph LR
    A[Probe A<br/>packetsSent: 1000<br/>retransmittedPacketsSent: 50] -->|RTP Session 1<br/>A → SFU<br/>50 packets lost| SFU[SFU Server<br/>RTP Termination<br/>Separate sessions]
    SFU -->|RTP Session 2<br/>SFU → B<br/>50 packets lost| B[Probe B<br/>packetsReceived: 950<br/>packetsLost: 50]
    SFU -.->|RTCP: Session 1| A
    B -.->|RTCP: Session 2| SFU

    style A fill:#9C27B0,stroke:#4A148C,color:#fff
    style B fill:#9C27B0,stroke:#4A148C,color:#fff
    style SFU fill:#2196F3,stroke:#0D47A1,color:#fff

The critical difference between these architectures lies in how they handle RTP sessions and RTCP feedback. TURN relays operate at the transport layer and simply forward UDP packets without terminating the RTP session. This means RTCP feedback travels end-to-end through the relay, so when Probe B sees packetsLost: 50, it cannot tell which leg lost them. A packet lost on the path from A to TURN looks identical to one lost from TURN to B from the receiver's perspective.

SFUs operate differently at the media layer. They terminate RTP sessions and establish separate connections with each participant, creating distinct sessions A↔SFU and SFU↔B. Each leg has its own RTCP feedback loop, which means packet loss on each leg can be measured independently. When Probe B sees packetsLost: 50 in an SFU scenario, it specifically measures loss from SFU to B only, not from A through the SFU to B. This provides much better isolation of where problems occur.

Interpreting Packet Loss Metrics

With probes at both ends in TURN relay scenarios, you cannot definitively isolate which leg caused the loss. The loss could be on A's upload path or on B's download path, i.e., anywhere along the route from A through the relay to B. In SFU scenarios, however, each probe's packet loss statistics directly measure its own leg to or from the SFU.

This means Probe B's packetsLost specifically indicates loss from the SFU to B, and Probe A's packetsLost specifically indicates loss from the SFU to A. Similarly, retransmittedPacketsSent from A relates to the upload path from A to the SFU, and vice versa for B. This directional separation immediately identifies which probe's connection has issues.
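
The following sketch summarizes this attribution logic. The counter names mirror the WebRTC stats used above (packetsLost, retransmittedPacketsSent), but the overall input structure is illustrative rather than a literal Surfmeter report format:

# Minimal sketch: attribute packet loss to a network leg depending on architecture.

def attribute_loss(mode: str, probe_a: dict, probe_b: dict) -> dict:
    if mode == "SFU":
        # Each leg has its own RTCP loop, so every counter maps to one leg.
        return {
            "SFU -> B (B download)": probe_b["packetsLost"],
            "SFU -> A (A download)": probe_a["packetsLost"],
            "A -> SFU (A upload, via retransmissions)": probe_a["retransmittedPacketsSent"],
            "B -> SFU (B upload, via retransmissions)": probe_b["retransmittedPacketsSent"],
        }
    # P2P or TURN relay: RTCP runs end-to-end, so each counter covers the whole
    # path in one direction and the lossy leg cannot be identified.
    return {
        "A -> relay -> B (leg unknown)": probe_b["packetsLost"],
        "B -> relay -> A (leg unknown)": probe_a["packetsLost"],
    }

print(attribute_loss(
    "SFU",
    {"packetsLost": 10, "retransmittedPacketsSent": 50},
    {"packetsLost": 50, "retransmittedPacketsSent": 5},
))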

Analysis Decision Tree

This flowchart summarizes the decision process for analyzing Surfmeter probe data to determine the WebRTC transmission mode and whether transcoding occurred.

flowchart TD
    Start([START: Analyze Surfmeter Probes]) --> CheckRelay{Is candidateType<br/>'relay' on either side?}

    CheckRelay -->|YES| UsingRelay[Using TURN Relay]
    UsingRelay --> CheckCodecRelay{Codec/Resolution<br/> identical?}
    CheckCodecRelay -->|YES| TurnRelay[TURN RELAY<br/>No transcoding]
    CheckCodecRelay -->|NO| McuBehindTurn[UNCLEAR<br/>Transcoding cannot happen behind TURN]

    CheckRelay -->|NO| NotRelay[No Relay in Candidate Type]
    NotRelay --> CheckServerIP{Remote IPs are<br/>third-party servers?}

    CheckServerIP -->|YES| ConnectingServer[Connecting to Media Server]
    ConnectingServer --> CheckCodecServer{Codec/Resolution<br/>identical?}
    CheckCodecServer -->|YES| SFU[SFU<br/>Selective Forwarding]
    CheckCodecServer -->|NO| MCU[MCU<br/>Transcoding<br/>]

    CheckServerIP -->|NO| CheckP2P{Remote IPs match<br/>opposite local IPs?}
    CheckP2P -->|YES| DirectP2P[DIRECT P2P]
    CheckP2P -->|NO| Unknown[UNCLEAR]

    ConnectingServer --> CheckParticipants{Multiple<br/>participants >2?}
    CheckParticipants -->|YES| SfuOrMcu{Codec identical?}
    SfuOrMcu -->|YES| SFU2[SFU Mode<br/>3+ party conference]
    SfuOrMcu -->|NO| MCU2[MCU Mode<br/>Large conference]

    style Start fill:#2196F3,stroke:#0D47A1,color:#fff
    style TurnRelay fill:#FF9800,stroke:#E65100,color:#fff
    style DirectP2P fill:#4CAF50,stroke:#2E7D32,color:#fff
    style SFU fill:#9C27B0,stroke:#4A148C,color:#fff
    style SFU2 fill:#9C27B0,stroke:#4A148C,color:#fff
    style MCU fill:#F44336,stroke:#B71C1C,color:#fff
    style MCU2 fill:#F44336,stroke:#B71C1C,color:#fff
    style McuBehindTurn fill:#9E9E9E,stroke:#424242,color:#fff
    style Unknown fill:#9E9E9E,stroke:#424242,color:#fff

The flowchart uses color coding to indicate different transmission modes:

  • 🟢 indicates Direct P2P connections with best quality and lowest latency, though these are uncommon in current measurements.
  • 🟠 represents TURN Relay architecture with good quality and moderate latency.
  • 🟣 indicates SFU architecture with no transcoding and moderate latency.
  • 🔴 signifies MCU architecture with transcoding and higher latency, which is rare and not applicable in our current measurements.
  • ⚪ indicates unknown configurations that need investigation.
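
For completeness, here is the decision tree condensed into a single function. The boolean inputs are assumed to have been derived from the candidate-type, IP, and codec checks described earlier; they are not direct Surfmeter fields:

# Minimal sketch of the decision tree above as one function.

def decide_mode(remote_types: set[str], remote_ips_are_third_party: bool,
                remote_ips_match_peer: bool, codec_identical: bool) -> str:
    if "relay" in remote_types:
        # TURN relay; a codec mismatch here would be unexpected.
        return "TURN relay" if codec_identical else "unclear"
    if remote_ips_are_third_party:
        # Media server in the path: forwarding only (SFU) vs. transcoding (MCU).
        return "SFU" if codec_identical else "MCU"
    if remote_ips_match_peer:
        return "direct P2P"
    return "unclear"

print(decide_mode({"host"}, True, False, True))  # -> "SFU"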

Further Reading

For a deeper technical understanding of the underlying protocols, the MDN Web Docs on WebRTC provide comprehensive guides and references for WebRTC technologies.