Understand H.264 Time Code

SMPTE 12M specifies time code counting rules only for broadcast frame rates such as 29.97 fps. Given the frame rate and the counting rule in use, we can calculate timing information from a time code. However, SMPTE 12M offers no generic way to derive timing information for other frame rates such as 12.5 fps.

The H.264 standard (ISO/IEC 14496-10 Table D 2.2) addresses the problem with a flexible time code syntax.

First, H.264 defines a concept called “clockTimestamp”. It is a tick count, in units of a time_scale Hz clock, measured from some unspecified point in time at which clockTimestamp is zero. For example, when time_scale is 60, clockTimestamp=120 indicates that exactly two seconds have passed since clockTimestamp=0. The value of time_scale is coded in the video usability information (E.1.1) and is used together with num_units_in_tick to specify the field rate. For example, a 29.97 fps sequence would have time_scale = 60000 and num_units_in_tick = 1001, so that time_scale / num_units_in_tick is the field rate (59.94 Hz).
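As a minimal sketch of these relationships, the field rate, the frame rate, and the elapsed time for a given clockTimestamp can be computed with exact rational arithmetic. The function names are illustrative and are not taken from the standard.

```python
from fractions import Fraction

def field_rate(time_scale: int, num_units_in_tick: int) -> Fraction:
    """Field rate in Hz, assuming time_scale / num_units_in_tick is the field rate."""
    return Fraction(time_scale, num_units_in_tick)

def frame_rate(time_scale: int, num_units_in_tick: int) -> Fraction:
    """Frame rate in Hz, counting two fields per frame."""
    return field_rate(time_scale, num_units_in_tick) / 2

def elapsed_seconds(clock_timestamp: int, time_scale: int) -> Fraction:
    """Seconds elapsed since the point where clockTimestamp is zero."""
    return Fraction(clock_timestamp, time_scale)

print(float(frame_rate(60000, 1001)))  # approximately 29.97
print(elapsed_seconds(120, 60))        # 2
```

Using Fraction avoids the rounding that a float such as 29.97 would introduce when the rate is really 30000/1001.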

The formula to calculate clockTimestamp from time code (hH:mM:sS:nFrames) is (*1):

$clockTimestamp=((hH * 60 + mM) * 60 + sS) * time\_scale + nFrames * (num\_units\_in\_tick * 2) + tOffset$

The key component here is tOffset, which is coded as a variable-length signed integer syntax element called time_offset. It allows an H.264 time code to represent exact timing information without specifying a time code counting rule such as NTSC drop frame time code. In the extreme case, you could leave the time code at 00:00:00:00 and store the entire timing information in tOffset. Of course, that is not a very good idea because it takes many bits to code tOffset. For example, to represent the two-hour point by tOffset alone when time_scale is 50, the value of tOffset is $2 * 60min * 60sec * 50Hz = 360000$ , which needs $\lceil \log _2 (360000) \rceil = 19$ bits to code. A better approach is to store a non-drop frame time code in the hH:mM:sS:nFrames syntax and store only the remaining offset in tOffset.
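The formula and the tOffset-only example can be checked with a short sketch. The function name and argument order are assumptions made for illustration, and the factor of 2 is hard-coded as in the simplified formula.

```python
def clock_timestamp(hh, mm, ss, n_frames, time_scale, num_units_in_tick, t_offset=0):
    """clockTimestamp from a time code, with the factor of 2 hard-coded."""
    seconds = (hh * 60 + mm) * 60 + ss
    return seconds * time_scale + n_frames * (num_units_in_tick * 2) + t_offset

# Two hours at time_scale = 50 and num_units_in_tick = 1 (25 fps).
via_time_code = clock_timestamp(2, 0, 0, 0, 50, 1)
via_offset_only = clock_timestamp(0, 0, 0, 0, 50, 1, t_offset=360000)

print(via_time_code == via_offset_only)  # True
print((360000).bit_length())             # 19 magnitude bits, not counting a sign bit
```

Both calls describe the same instant; the difference is only in how many bits end up in tOffset.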

However, there is still a problem: the number of bits needed to code time_offset grows over time when the frame rate is not an integer. For example, suppose you have a 2-hour movie at 29.97 fps. The clockTimestamp at 60000 Hz is

$clockTimestamp = 2hour * 60min * 60second * 60000Hz = 432000000 ticks$

The number of frames in two hours is $2 * 60 * 60 * (30000 / 1001) \simeq 215784$ , so the non-drop frame (NDF) time code is 01:59:52:24. The value of tOffset is:

$432000000 ticks - (((1 * 60 + 59) * 60 + 52) * 60000 + 24 * (1001 * 2)) = 431952$

As $\lceil \log _2 (431952) \rceil = 19$ , we need at least 19 bits to code the offset.
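The arithmetic of this example can be reproduced with a small sketch. The helper ndf_time_code is a hypothetical name, and the two-hour point is taken as exactly 432000000 ticks, as in the text.

```python
TIME_SCALE = 60000
NUM_UNITS_IN_TICK = 1001

def ndf_time_code(frame_index, nominal_fps=30):
    """Non-drop frame label: count nominal_fps frames in every second."""
    total_seconds, n_frames = divmod(frame_index, nominal_fps)
    total_minutes, ss = divmod(total_seconds, 60)
    hh, mm = divmod(total_minutes, 60)
    return hh, mm, ss, n_frames

# Whole frames in two hours at 30000/1001 fps.
frame_index = (2 * 60 * 60 * 30000) // 1001
hh, mm, ss, n_frames = ndf_time_code(frame_index)

target_ticks = 2 * 60 * 60 * TIME_SCALE
coded_ticks = ((hh * 60 + mm) * 60 + ss) * TIME_SCALE + n_frames * (NUM_UNITS_IN_TICK * 2)
t_offset = target_ticks - coded_ticks

print(frame_index)               # 215784
print((hh, mm, ss, n_frames))    # (1, 59, 52, 24)
print(t_offset)                  # 431952
print(t_offset.bit_length())     # 19
```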

The solution is to use a special time code counting rule, such as NTSC drop frame (DF) time code, that keeps the value of tOffset small.
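To illustrate, here is a sketch that labels 29.97 fps frames with the usual NTSC drop frame rule (skip frame labels 0 and 1 at the start of every minute that is not a multiple of 10) and measures the resulting tOffset. It assumes frame 0 sits at clockTimestamp 0 and takes each frame's actual time as frame_index * 2002 ticks; the function names are illustrative.

```python
TIME_SCALE = 60000
NUM_UNITS_IN_TICK = 1001
TICKS_PER_FRAME = NUM_UNITS_IN_TICK * 2

def df_time_code(frame_index):
    """Label a 29.97 fps frame index with NTSC drop frame time code."""
    blocks, rem = divmod(frame_index, 17982)   # 17982 frames per 10-minute block
    dropped = 18 * blocks                      # 9 minutes x 2 skipped labels per block
    if rem >= 2:
        dropped += 2 * ((rem - 2) // 1798)     # 1798 frames in each dropping minute
    label = frame_index + dropped
    total_seconds, n_frames = divmod(label, 30)
    total_minutes, ss = divmod(total_seconds, 60)
    hh, mm = divmod(total_minutes, 60)
    return hh, mm, ss, n_frames

def t_offset(frame_index):
    """Actual ticks minus the ticks implied by the drop frame label."""
    hh, mm, ss, n_frames = df_time_code(frame_index)
    actual = frame_index * TICKS_PER_FRAME
    coded = ((hh * 60 + mm) * 60 + ss) * TIME_SCALE + n_frames * TICKS_PER_FRAME
    return actual - coded

print(df_time_code(215784))   # (2, 0, 0, 0)
print(t_offset(215784))       # -432

# Worst-case magnitude over the first two hours of frames.
worst = max(abs(t_offset(k)) for k in range(215785))
print(worst, worst.bit_length())
```

At the two-hour mark the offset is a few hundred ticks rather than several hundred thousand, and the scan reports how large it gets at any frame in between.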

Technically, we can use any time code counting scheme here. No matter what counting scheme is used, it’s just an optimization to reduce the number of bits needed to code tOffset. We can always recover the accurate timing information without knowing the counting scheme. Practically, however, it’s convenient if we can enforce a certain counting scheme so that an application doesn’t have to calculate the clockTimestamp every time.

The H.264 standard lets a stream signal its time code counting rule through the counting_type syntax element, which takes the following values.

The value of 0 is used for integer frame rates such as 25 fps, where the application doesn’t have to care about tOffset at all. The value of 1 is the so-called non-drop frame time code, but it requires many bits for the time_offset syntax element when the frame rate is not an integer. The value of 2 can be used for a “half frame rate” such as 12.5 fps. The basic idea is to toggle the frame counter between 0…12 and 1…12 in alternate seconds so that the overall frame count averages 12.5 per second. This reduces the number of bits needed for time_offset significantly. The value of 3 is similar, but toggles between 0…12 and 0…11. The value of 4 is the so-called NTSC DF time code.
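The effect of the value 2 at 12.5 fps can be seen in a small simulation. This is a sketch under stated assumptions: time_scale = 25 and num_units_in_tick = 1, frame 0 at clockTimestamp 0, and the zero value dropped in odd seconds. The function names are illustrative.

```python
TIME_SCALE = 25        # field rate of 25 Hz, i.e. 12.5 fps
TICKS_PER_FRAME = 2    # num_units_in_tick (1) * 2
MAX_FPS = 13           # Ceil(25 / 2), so n_frames ranges over 0..12

def labels_no_drop(num_frames):
    """Value 1 style: every second counts n_frames 0..12."""
    return [divmod(k, MAX_FPS) for k in range(num_frames)]

def labels_drop_zero(num_frames):
    """Value 2 style: odd seconds skip n_frames 0, giving 0..12 then 1..12."""
    labels, second, n = [], 0, 0
    for _ in range(num_frames):
        labels.append((second, n))
        n += 1
        if n == MAX_FPS:
            second += 1
            n = second % 2   # 1 in odd seconds (zero dropped), 0 in even seconds
    return labels

def offsets(labels):
    """tOffset per frame: actual ticks minus ticks implied by the label."""
    return [k * TICKS_PER_FRAME - (s * TIME_SCALE + n * TICKS_PER_FRAME)
            for k, (s, n) in enumerate(labels)]

frames = 1250  # 100 seconds of video at 12.5 fps
print(max(offsets(labels_no_drop(frames))))             # 96, and still growing
print(sorted(set(offsets(labels_drop_zero(frames)))))   # [-1, 0]
```

Without dropping, tOffset grows by one tick per labelled second; with the zero value dropped in alternate seconds it never leaves the range -1 to 0.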

Note that these schemes are not limited to “half frame rate” or “NTSC DF Time Code”. In fact, there is a flag called cnt_dropped_flag which signifies the previous time code value is dropped – and you can set the flag at any time code allowed by the counting_type. For example, you can use counting_type=2 for “quarter frame rate” such as 6.25 fps.

In conclusion, the H.264 time code syntax has three major features. First, it allows precise recovery of timing information from the time code at any frame rate and with any frame counting scheme. Second, it does so without consuming a lot of bits. Third, it carries the frame counting scheme so that an application can know the semantics of the time code without calculating the timing information every time.

(*1) I made one simplification. In the standard, the factor of 2 that multiplies num_units_in_tick is specified as $1 + nuit\_field\_based\_flag$ . It’s a leftover from older editions, where the standard was unclear about whether time_scale / num_units_in_tick was the frame rate or the field rate. Today, everyone agrees that time_scale / num_units_in_tick is a field rate and therefore nuit_field_based_flag shall be always 1. See also: http://lists.mpegif.org/pipermail/mp4-tech/2005-July/005700.html

2 Responses to Understand H.264 Time Code

Takayuki Goto says:

September 10, 2012 at 9:12 pm

Thank you for exposition about framerate!

＞ Today, everyone agrees that time_scale / num_units_in_tick is a field rate and therefore nuit_field_based_flag shall be always 1.
Is there any reference ?
I have read the mail(http://web.archive.org/web/20071114044732/http://lists.mpegif.org/pipermail/mp4-tech/2005-July/005700.html) referenced by this article.

But, I can’t understand the way to determine framerate of bitstreams with fixed_frame_rate=0.
If a bitstream has fixed_frame_rate=1, it is constrained to Δtfi,dpb(n) = Δto,dpb(n) / DeltaTfiDivisor (E-34).
(if (pic_struct_present_flag,field_pic_flag)=(0,0) then DeltaTfiDivisor=2)
But, if a bitstream has no fixed_frame_rate flag, how do you understand that the value of time_scale/num_units_in_tick represent the field rate ?

Moto says:

September 11, 2012 at 8:59 am

Hi Goto san,

You are right that, based on E-34 and the text that follows it, we cannot say time_scale / num_units_in_tick is the field rate when the fixed_frame_rate flag is zero.

In that case, we can look at D-2, which defines the upper bound of n_frames (= MaxFPS).

MaxFPS = Ceil(time_scale / (2 * num_units_in_tick))

If time_scale / num_units_in_tick were the frame rate, MaxFPS would come out at about half the actual frame rate, and n_frames could not cover every frame in a second. Thus, time_scale / num_units_in_tick should be a field rate.
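The D-2 argument can be checked numerically with a short sketch. The function name max_fps is illustrative, and the last call models a hypothetical stream that signals 29.97 fps as a frame rate (time_scale = 30000, num_units_in_tick = 1001) rather than as a field rate.

```python
from fractions import Fraction
from math import ceil

def max_fps(time_scale: int, num_units_in_tick: int) -> int:
    """MaxFPS = Ceil(time_scale / (2 * num_units_in_tick)), computed exactly."""
    return ceil(Fraction(time_scale, 2 * num_units_in_tick))

print(max_fps(60000, 1001))  # 30: n_frames can count 0..29
print(max_fps(25, 1))        # 13: n_frames can count 0..12
print(max_fps(30000, 1001))  # 15: too small to label 30 frames in a second
```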
