Buffering delay and MPEG-2 Transport stream

As we discussed in the previous post “What are CBR, VBV, and CPB?”, a video decoder has a buffer to compensate for the varying coded size of each picture. Here is an actual example of the buffer occupancy of an MPEG-2 video stream at the beginning of the stream.


The first picture is decoded (= removed from the buffer) at time zero. Before that decoding time, the coded stream has been entering the buffer for more than 0.5 second.

The time interval between the arrival of the first byte of a coded picture and the decoding of that picture is called the buffering delay. The delay is essentially the time between these two events:

  • The STB starts to receive the coded picture.
  • The STB decodes the video frame.

The maximum buffering delay occurs when a picture is decoded at the moment the buffer is completely full. This means the buffer size and the bit rate together control the maximum buffering delay.

max_buffering_delay (second) = buffer_size (bytes) * 8 / bit_rate (bits/second)
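The formula above is easy to check with a quick sketch. The numbers below (a roughly 1.8 MB buffer at 15 Mbps) are hypothetical, chosen only to illustrate the arithmetic:

```python
def max_buffering_delay(buffer_size_bytes: int, bit_rate_bps: int) -> float:
    """Worst-case buffering delay in seconds: the time to fill the whole
    buffer at the stream's bit rate."""
    return buffer_size_bytes * 8 / bit_rate_bps

# Hypothetical example: 1,835,008-byte buffer, 15 Mbps stream.
print(max_buffering_delay(1_835_008, 15_000_000))  # ~0.98 seconds
```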

As we discussed, at a given bit rate you can increase the video quality by increasing the buffer size. However, this trick has its own costs: it makes the STB more expensive and the buffering delay longer.

In the case of MPEG-2 video, the maximum buffering delay is almost always less than 0.7 second. In the case of H.264, however, it's common to see very long delays, such as 4 to 10 seconds, especially in Web streaming contexts.

When multiplexing the video stream into a transport stream, however, such a long delay causes a problem. While the MPEG-2 TS standard allows up to a 10-second delay for H.264 video, almost all STBs accept only 1 to 2 seconds. Therefore, if you want to construct an H.264 stream for transport stream broadcast, you should restrict the CPB buffer size so that the buffering delay is less than one second.
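Inverting the earlier formula gives the constraint to hand to the encoder: the largest CPB size that keeps the worst-case delay under a target. The function name and example numbers are mine, just for illustration:

```python
def max_cpb_size_bytes(bit_rate_bps: int, target_delay_s: float) -> int:
    """Largest CPB size (in bytes) whose worst-case buffering delay
    stays at or below the target delay."""
    return int(bit_rate_bps * target_delay_s / 8)

# Hypothetical 8 Mbps H.264 broadcast stream with a one-second delay budget:
print(max_cpb_size_bytes(8_000_000, 1.0))  # 1000000 bytes
```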


About Moto

Engineer who likes coding
This entry was posted in Video.

8 Responses to Buffering delay and MPEG-2 Transport stream

  1. Pingback: HRD Emulator in HTML5 | CODE: Sequoia

  2. Eland says:

    Very interesting article… My questions are: a) Can the init_cpb_removal_delay for an H.264 stream be limited to approximately the same value as for an MPEG-2 video stream delivered in an MPEG-2 TS? You mention that it is typically larger (4-10 seconds) for H.264 and only 0.7 s for MPEG-2 – is that a constraint inherent to the encoding? But you do mention in the last paragraph that it can be lowered into the range of 1-2 s, which suggests it can be made lower during encoding. I have typically seen the delay be larger for MPEG-4 streams, although it varies through the stream. Thanks for the nice article.

    • Moto says:

      The value of init_cpb_removal_delay is a parameter to an H.264 encoder. It's usually exposed in the user interface as “buffer size” or “maximum delay”. So, yes, you can limit the init_cpb_removal_delay to approximately the same value as an MPEG-2 video stream.

      MPEG-2 video often uses less than one second of delay because it is typically used in real-time or hardware playback contexts. For example:

      – TV broadcasting (= transport stream) requires the maximum delay to be one second (ISO/IEC 13818-1).
      – DVD has limited bandwidth (about 10 Mbps), and a delay of more than one second is not a pleasant experience for viewers.

      For these reasons, the MPEG-2 video standard (ISO/IEC 13818-2, 6.3.9) limits the maximum delay through the length of the vbv_delay field in the picture header. The vbv_delay is coded in 16 bits carrying 90 kHz ticks, which means 65535 / 90000 = 0.73 seconds is the maximum. You can use the so-called FFFF-VBV mode to work around the limitation, but such streams would not work with transport streams or DVD anyway.
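The arithmetic behind that 0.73-second ceiling is just the 16-bit field width over the 90 kHz clock:

```python
VBV_CLOCK_HZ = 90_000          # vbv_delay is measured in 90 kHz ticks
MAX_VBV_DELAY_TICKS = 0xFFFF   # 16-bit field; the all-ones value also
                               # signals the "FFFF-VBV" workaround mode

max_delay_s = MAX_VBV_DELAY_TICKS / VBV_CLOCK_HZ
print(round(max_delay_s, 2))   # 0.73 seconds
```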

      Now, encoder developers found that a longer delay could increase the video quality at a given bit rate. Around 2003, when the H.264 standard was published, Web video delivery became practical. The Internet offers best-effort bandwidth that is usually much higher than the video bit rate. In such a context, it's not necessary to limit the maximum delay; instead, we can get better video quality by giving the encoder the flexibility of a longer delay.

      Thus, H.264's init_cpb_removal_delay uses variable-length coding, which removes the 16-bit limitation. We often see 4-10 second delays in MP4-wrapped H.264 streams delivered over Flash, for example.

      The system layer (transport stream) was also extended to support a maximum T-STD delay of 10 seconds to multiplex such long-latency streams (ISO/IEC 13818-1:2007). Not many STBs support such a delay, though.

      That said, I am not fully convinced of the benefit of such a long delay. It may result in better video quality for the first few minutes, but I observe that the VQ eventually becomes the same as with a 1-second delay configuration. A long delay is definitely not an inherent requirement of H.264; H.264 can perform quite well with a one-second delay at a reasonable bit rate.

      Today, it's common to deliver the same content over different delivery methods, namely MP4 in Flash, adaptive bit rate streaming, and broadcasting over transport stream. It's desirable to reuse the H.264 stream across various container formats so that we can avoid expensive re-encoding. Therefore, I would limit the delay to one to two seconds even for H.264.

      Sorry for lengthy comment. Hope it answers your question.

      • Eland says:

        Sorry for my delay (I thought I would get an auto mail since I am subscribed to the blog 🙂 Thank you for the terrific reply and fine insight into the topic. Keep up the great work you are doing – I just saw your note on HEVC and I am looking forward to reading that too. Best regards

  3. Eland says:

    Could you also explain intuitively why it needs to be larger for MPEG-4? Thanks again for your fine work.

  4. Bond says:

    When you say, increased buffering delay for H.264 will result in better video quality, what do you exactly mean in terms of operation of the encoder?
    Does it mean that, because the encoder has more frames to look at before encoding a frame, it can make a better estimate on how to distribute the bits (effectively, QP value estimates) when encoding different frames and hence the better quality?

    • Moto says:

      Here is the theory: a longer delay is equivalent to a larger CPB size. That means the encoder can allocate more bits to a complex picture without causing an underflow error. In other words, the encoder has more flexibility in distributing bits across pictures, hence better video quality. Does it make an observable difference? Well, that's another question.
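A toy leaky-bucket sketch makes this point concrete: with a small CPB, a single very complex picture underflows the buffer, while a larger CPB (i.e. a longer delay) absorbs it. All numbers here are made up for illustration:

```python
def underflows(cpb_bits: int, bit_rate: int, fps: float,
               picture_sizes: list[int]) -> bool:
    """Simulate CPB fullness picture by picture; return True if removing
    any picture would underflow the buffer."""
    fullness = cpb_bits                # assume decoding starts with a full CPB
    fill_per_picture = bit_rate / fps  # bits arriving per picture interval
    for size in picture_sizes:
        if size > fullness:
            return True                # not enough bits buffered: underflow
        fullness = min(fullness - size + fill_per_picture, cpb_bits)
    return False

# Ten easy pictures followed by one very complex one, at 3 Mbps / 30 fps:
pictures = [50_000] * 10 + [900_000]
print(underflows(500_000, 3_000_000, 30, pictures))    # small CPB: True
print(underflows(2_000_000, 3_000_000, 30, pictures))  # larger CPB: False
```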

  5. JK says:

    1. I have a question. Does the encoder also need to buffer for max_buffering_delay before sending the data over the network?
    2. Will the glass-to-glass delay depend upon max_buffering_delay?
