## Understand H.264 Time Code

SMPTE 12M specifies time code counting rules only for broadcast frame rates such as 29.97 fps. With the frame rate and the counting rule in use, we can recover the timing information from a time code. Generally speaking, however, SMPTE 12M offers no generic way to get the timing information for other frame rates such as 12.5 fps.

The H.264 standard (ISO/IEC 14496-10, clause D.2.2) addresses the problem with a flexible time code syntax.

First, H.264 specifies the concept of “clockTimestamp”. It is a tick count in time_scale Hz from some unspecified point in time at which clockTimestamp is zero. For example, clockTimestamp = 120 indicates that exactly two seconds have passed since the point clockTimestamp = 0 when time_scale is 60. The value of time_scale is coded in the video usability information (E.1.1) and is used together with num_units_in_tick to specify the field rate. For example, a 29.97 fps sequence would have time_scale = 60000 and num_units_in_tick = 1001, so that time_scale / num_units_in_tick is the field rate.

The formula to calculate clockTimestamp from a time code (hH:mM:sS:nFrames) is (*1):

$clockTimestamp=((hH * 60 + mM) * 60 + sS) * time\_scale + nFrames * (num\_units\_in\_tick * 2) + tOffset$

The key component here is tOffset, which is coded as a variable-length signed integer syntax element called time_offset. It allows an H.264 time code to represent the exact timing information without specifying a time code counting rule such as NTSC drop frame. Ultimately, for example, you could leave the time code at 00:00:00:00 and store the entire timing information in tOffset. Of course, that is not a very good idea, because coding tOffset would then require many bits. For example, to represent the two-hour point by tOffset alone when time_scale is 50, the value of tOffset is $2 * 60min * 60sec * 50Hz = 360000$, which needs $\lceil \log _2 (360000) \rceil = 19$ bits to code. A better approach is to store a non-drop frame time code in the hH:mM:sS:nFrames syntax and store only the offset in tOffset.

However, there is still a problem: the number of bits needed to code time_offset grows over time when the frame rate is not an integer. For example, suppose you have a 2-hour movie at 29.97 fps. The clockTimestamp at 60000 Hz is

$clockTimestamp = 2hour * 60min * 60second * 60000Hz = 432000000 ticks$

The number of frames in two hours is $2 * 60 * 60 * (30000 / 1001) \simeq 215784$, so the non-drop frame (NDF) time code is 01:59:52:24. The value of tOffset is:

$432000000 - (((1 * 60 + 59) * 60 + 52) * 60000 + 24 * (1001 * 2)) = 431952$

As $\lceil \log _2 (431952) \rceil = 19$, we still need at least 19 bits to code the offset, even though we stored a time code.

The solution is to use a special time code counting rule, such as NTSC DF time code, so that the value of tOffset stays small.

Technically, we can use any time code counting scheme here. Whatever scheme is used, it is just an optimization to reduce the number of bits needed to code tOffset; we can always recover the accurate timing information without knowing the counting scheme. Practically, however, it is convenient to enforce a certain counting scheme so that an application doesn’t have to calculate the clockTimestamp every time.

The H.264 standard allows a stream to specify the following time code counting rules (the counting_type syntax element):

• 0: for integer frame rates such as 25 fps; the application doesn’t have to care about tOffset at all.
• 1: the so-called non-drop frame time code; it requires many bits for the time_offset syntax.
• 2: usable for “half frame rates” such as 12.5 fps. The basic idea is to toggle the frame counter between 0…12 and 1…12 so that the overall frame count averages 12.5 fps. It reduces the number of bits needed for time_offset significantly.
• 3: similar, but toggling between 0…12 and 0…11.
• 4: the so-called NTSC DF time code.

Note that these schemes are not limited to “half frame rates” or NTSC DF time code. In fact, there is a flag called cnt_dropped_flag which signifies that the previous time code value was dropped – and you can set the flag at any time code allowed by the counting_type. For example, you can use counting_type = 2 for a “quarter frame rate” such as 6.25 fps.

In conclusion, the H.264 time code syntax has three major features. First, it allows the precise recovery of timing information from the time code at any frame rate and with any frame counting scheme. Second, it does so without consuming many bits. Third, it carries the frame counting scheme, so that an application can know the semantics of the time code without calculating the timing information every time.

(*1) I made one simplification. In the standard, the factor of 2 multiplied by num_units_in_tick is specified as $1 + nuit\_field\_based\_flag$. It is a leftover from the old editions, where the standard was unclear about whether time_scale / num_units_in_tick was the frame rate or the field rate. Today, everyone agrees that time_scale / num_units_in_tick is the field rate, and therefore nuit_field_based_flag shall always be 1. See also: http://lists.mpegif.org/pipermail/mp4-tech/2005-July/005700.html


## Base64 encoder (C++)

A Base64 encoder converts arbitrary binary data into a “printable” character sequence so that it can be carried over textual protocols such as XML. Here is a simple function, based on RFC 2045 section 6.8, that produces Base64 text from a byte buffer.


#include <string>     // std::wstring
#include <stdexcept>  // std::logic_error

std::wstring ToBase64(unsigned char const* buffer, size_t size){
    using std::wstring;
    static wchar_t const* base64Table =
        L"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    size_t base64Size = (size + 2 - ((size + 2) % 3)) / 3 * 4;
    wstring result(base64Size, L'=');

    unsigned char const* s = buffer;  // source pointer
    size_t di = 0;                    // destination index
    for(size_t i = 0; i < size / 3; i++){
        // input: 0000_0000 0000_0000 0000_0000
        //
        // out1:  0000_00
        // out2:         00 0000
        // out3:                _0000 00
        // out4:                        00_0000

        result[di++] = base64Table[s[0] >> 2];
        result[di++] = base64Table[((s[0] << 4) | (s[1] >> 4)) & 0x3f];
        result[di++] = base64Table[((s[1] << 2) | (s[2] >> 6)) & 0x3f];
        result[di++] = base64Table[s[2] & 0x3f];
        s += 3;
    }

    size_t remainSize = size % 3;
    switch(remainSize){
    case 0:
        break;
    case 1:
        result[di++] = base64Table[s[0] >> 2];
        result[di++] = base64Table[(s[0] << 4) & 0x3f];
        break;
    case 2:
        result[di++] = base64Table[s[0] >> 2];
        result[di++] = base64Table[((s[0] << 4) | (s[1] >> 4)) & 0x3f];
        result[di++] = base64Table[(s[1] << 2) & 0x3f];
        break;
    default:
        throw std::logic_error("Should not happen.");
    }
    return result;
}


## How many tests shall we run?

As many as possible, of course. However, as I discussed in the previous post, we don’t have to run all the tests to estimate the overall success rate. This is especially useful for release scheduling: you can get the automation test success rate daily, or even hourly, by running just a part of the test suite.

So, how many test cases shall we run to get a good estimate? Here is a plot of the Agresti-Coull formula at the 95% confidence level. The X-axis is the number of test runs, the Y-axis is the error range, and each line corresponds to a different success rate.

The graph shows that roughly 150 to 250 test cases result in a $\pm5\%$ to $\pm10\%$ confidence interval. Even if you run more tests, the interval doesn’t shrink much.

In conclusion: in the development phase, pick 150 to 250 test cases randomly from the automation test suite, then run the success count through the Agresti-Coull formula. It gives you a quick estimate of the product’s stability.


## Statistics of partial test cases run

In software development, the number of test cases increases dramatically, especially when you generate test cases automatically. It becomes a problem when the execution time of the test suite is the bottleneck of development. For example, assume you have a user interface with 20 check boxes. The number of test cases to cover all possible combinations is roughly one million (= 2^20). If one test takes one second, the whole test suite takes about 12 days to complete!

The problem led me to the idea of using statistics. Ultimately, we want to say something like:

“I’ve performed n test cases (n small enough to run quickly) and the success rate was 98%. Therefore, the overall success rate is estimated to be 98% +/- 3% at a confidence level of 95%.”

The test suite can be considered a Bernoulli trial where the probability p is the success rate of the overall test suite. The binomial proportion confidence interval is known to be:

$\hat{p} \pm z_{0.95} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

where $\hat{p}$ is the proportion of successes in the Bernoulli trial and $n$ is the sample size (see Binomial proportion confidence interval).

For example, assume you pick 100 test cases randomly from the test suite and the success rate is 80% (80 of the 100 cases passed). You can say the estimated success rate of the test suite at the 95% confidence level is:

$0.8 \pm z_{0.95} \sqrt{\frac{0.8(1 - 0.8)}{100}}$
$= 0.8 \pm 0.0784$
$= 80\% \pm 7.8\%$

Now, does it really work? (Actually, it doesn’t. Read on!)

To test this, I wrote a simple Ruby script. It plots the estimated success rate and the error range when the actual success rate is 95% and the sample size is 1000. The script loops 64 times.

SAMPLE_SIZE = 1000
PROPORTION = 0.95
TRIAL_SIZE = 64

(1..TRIAL_SIZE).each {|trial|
  success = 0
  (1..SAMPLE_SIZE).each {
    if rand() < PROPORTION
      success += 1
    end
  }

  p = success.to_f / SAMPLE_SIZE
  e = 1.96 * Math.sqrt(p * (1 - p) / SAMPLE_SIZE)
  puts "#{trial} #{p} #{p - e} #{p + e}"
}


Here is the plot. You can see most of the estimates correctly bracket the true success rate (95%). Actually, this is expected: 95% of the estimates should hit the true value, because the confidence level is 95%.

Let’s verify the confidence level. Increase the number of trials to get better accuracy, and record the number of successful estimates to show the overall estimation accuracy.

SAMPLE_SIZE = 1000
PROPORTION = 0.95
TRIAL_SIZE = 1000

hit = 0
(1..TRIAL_SIZE).each {|trial|
  success = 0
  (1..SAMPLE_SIZE).each {
    if rand() < PROPORTION
      success += 1
    end
  }

  p = success.to_f / SAMPLE_SIZE
  e = 1.96 * Math.sqrt(p * (1 - p) / SAMPLE_SIZE)
  if (p - e < PROPORTION) and (PROPORTION < p + e)
    hit += 1
  end
}
puts "Successful Estimate: #{hit} / #{TRIAL_SIZE} = #{hit * 100.0 / TRIAL_SIZE} %"


Here is the result of three runs. It meets the expectation (95% confidence level).

Successful Estimate: 950 / 1000 = 95.0 %
Successful Estimate: 947 / 1000 = 94.7 %
Successful Estimate: 951 / 1000 = 95.1 %

So far so good. However, a 95% test success rate is pretty bad from a product quality perspective. At the end of product development, almost all tests should pass (say, 99.9%). Let’s see how the Ruby script estimates the success rate in that situation.

SAMPLE_SIZE = 1000
PROPORTION = 0.999
TRIAL_SIZE = 1000

hit = 0
(1..TRIAL_SIZE).each {|trial|
  success = 0
  (1..SAMPLE_SIZE).each {
    if rand() < PROPORTION
      success += 1
    end
  }

  p = success.to_f / SAMPLE_SIZE
  e = 1.96 * Math.sqrt(p * (1 - p) / SAMPLE_SIZE)
  if (p - e < PROPORTION) and (PROPORTION < p + e)
    hit += 1
  end
}
puts "Successful Estimate: #{hit} / #{TRIAL_SIZE} = #{hit * 100.0 / TRIAL_SIZE} %"


The three runs produce a pretty bad result.

Successful Estimate: 653 / 1000 = 65.3 %
Successful Estimate: 622 / 1000 = 62.2 %
Successful Estimate: 636 / 1000 = 63.6 %

The plot shows that the estimated range is sometimes zero.

The problem is that the formula used to calculate the binomial proportion confidence interval is not suitable when the value of p is extreme (very close to 0 or 1). We can use a better formula: the Agresti-Coull interval.

$\tilde{n} = n + z^2_{0.95}$
$\tilde{p} = \frac{X + z^2_{0.95} / 2}{\tilde{n}}$
$\tilde{p} \pm z_{0.95} \sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}}$

Let’s see the plot of the Agresti-Coull formula. The Ruby script looks like this.

SAMPLE_SIZE = 1000
PROPORTION = 0.999
TRIAL_SIZE = 64

(1..TRIAL_SIZE).each {|trial|
  success = 0
  (1..SAMPLE_SIZE).each {
    if rand() < PROPORTION
      success += 1
    end
  }

  nt = SAMPLE_SIZE + 1.96 ** 2
  p = (success + (1.96 ** 2) / 2) / nt
  e = 1.96 * Math.sqrt(p * (1 - p) / nt)
  puts "#{trial} #{p} #{p - e} #{p + e}"
}


The plot is here. You can see the majority of the estimates accurately hit the real success rate.

The actual successful estimate rate is shown here:

Successful Estimate: 975 / 1000 = 97.5 %
Successful Estimate: 988 / 1000 = 98.8 %
Successful Estimate: 982 / 1000 = 98.2 %

To make sure the Agresti-Coull formula also works when the success rate is low (30%), here is a plot of both the simple estimate (green) and Agresti-Coull (red) over 1000 trials. You can see both results match pretty well.

In conclusion, here is the Ruby snippet that meets the original goal. Given many test cases, you randomly pick N tests; SUCCESS of them pass. The estimated overall success rate at the 95% confidence level is:

N = ... # Number of test cases actually performed
SUCCESS = ... # Number of successful test cases

nt = N + 1.96 ** 2
pt = (SUCCESS + (1.96 ** 2) / 2) / nt
e = 1.96 * Math.sqrt(pt * (1 - pt) / nt)

puts "The estimated success rate is #{pt * 100}% with error range +/- #{e * 100}% at the confidence level 95%"


## Automatic debugging and API monitoring

The debugger allows you to set a break point with commands which are executed when the break point is hit. For example, the following debugging commands show the stack trace whenever the CreateFileW API is called and write it into a log file.

* Whenever CreateFileW is called, get the stack trace

.logopen log.txt
bp kernel32!CreateFileW "kb;gc"


How about logging the result code of the function? Unfortunately, any command that resumes program execution after a breakpoint (such as g or t) ends the execution of the command list. For example, the following command doesn’t work as expected.

* !! Doesn't work !!
* Whenever CreateFileW is called, get the stack trace and the last error code on return.

.logopen log.txt
bp kernel32!CreateFileW "kb;pt;!gle;gc"


Here is a solution. At the function-call break point, we can get the function’s return address at poi(@esp). Set another break point there before continuing.

* Whenever CreateFileW is called, get the stack trace and the last error code on return.

.logopen log.txt
bp kernel32!CreateFileW "kb;bp poi(@esp) \"!gle;gc\";gc"


There is another problem: we should disable the second break point after it is hit; otherwise it would trigger an unexpected program break later.

Also, to make it a little more fun, let’s display the arguments to the function as well.

bp[1] kernel32!CreateFileW ".echo CreateFileW;db poi(@esp+4);bp[101] poi(@esp) \"bd[101];r;gc\";gc"


You can apply this technique to make a poor man’s API tracer. It’s sometimes more convenient than !logext, because we can trace user functions in addition to the Win32 functions.

Here is a debug command list to trace some of file I/O related functions.

bp[10] kernel32!GetFileAttributesA ".echo GetFileAttributesA;db poi(@esp+4);bp[51] poi(@esp) \"bd[51];r;!gle;gc\";gc"
bp[11] kernel32!CreateFileA ".echo CreateFileA;db poi(@esp+4);bp[50] poi(@esp) \"bd[50];r;gc\";gc"
bp[12] kernel32!GetShortPathNameA ".echo GetShortPathNameA;.echo long_name;db poi(@esp+4);bp[52] poi(@esp) \"bd[52];r;.echo short_name;db poi(@esp-8);gc\";gc"
bp[13] kernel32!DeleteFileA ".echo DeleteFileA;db poi(@esp+4);bp[53] poi(@esp) \"bd[53];r;gc\";gc"
bp[14] kernel32!GetLongPathNameA  ".echo GetLongPathNameA;.echo short_name;db poi(@esp+4);bp[54] poi(@esp) \"bd[54];r;.echo long_name;db poi(@esp-8);gc\";gc"
bp[15] kernel32!FindFirstFileW ".echo FindFirstFileW;db poi(@esp+4) L200;bp[55] poi(@esp) \"bd[55];r;!gle;.echo dstamt!WIN32_FIND_DATAW;dt poi(@esp-4) dstamt!WIN32_FIND_DATAW;gc\";gc"
bp[16] kernel32!GetFullPathNameA  ".echo GetFullPathNameA;.echo input_name;db poi(@esp+4);bp[55] poi(@esp) \"bd[55];r;.echo result_name;db poi(@esp-8);gc\";gc"
bp[17] kernel32!SetCurrentDirectoryA  ".echo SetCurrentDirectoryA;db poi(@esp+4);bp[56] poi(@esp) \"bd[56];r;gc\";gc"
bp[18] kernel32!GetVolumePathNameW ".echo GetVolumePathNameW;.echo input_name;db poi(@esp+4);bp[57] poi(@esp) \"bd[57];r;.echo result_name;db poi(@esp-8);gc\";gc"
bp[19] Shlwapi!PathFindFileNameW  ".echo PathFindFileNameW;.echo input_name;db poi(@esp+4);bp[58] poi(@esp) \"bd[58];r;.echo result_name;db @eax;gc\";gc"
bp[20] kernel32!GetTempPathW ".echo GetTempPathW;bp[59] poi(@esp) \"bd[59];r;gc\";gc"


## How does boost::phoenix work?

boost::phoenix looks simply mysterious – it doesn’t look like C++.

for_each(v.begin(), v.end(),
    if_(arg1 > 5)
    [
        cout << arg1 << ", "
    ]
);


It’s the standard std::for_each. The third parameter is usually a functor, but boost::phoenix lets you inline the code right there.

To understand the mystery, let’s write a simple boost::phoenix from scratch.

The basic idea is to build a function object with syntax similar to plain C++. Let’s take the simplest std::count_if example: std::count_if takes a functor, and here the functor always returns true. It is defined as true_.

#include <iostream>
#include <algorithm>

using namespace std;

struct bool_{
    bool_(bool v) : v(v){}
    bool operator()(int&) { return v;}
    bool v;
};
static const bool_ true_(true);
static const bool_ false_(false);

int main(){
    int a[] = {1, 2, 3, 4, 5};
    int* beg = a;
    int* end = a + sizeof(a) / sizeof(a[0]);
    ptrdiff_t ret = count_if(beg, end, true_);
    cout << ret << endl;
}


It’s not useful until we support operators. How do we write an expression that returns true when the argument value is an odd number? In other words, how do we write an expression like this?

count_if(beg, end, _1 % 2);

The functor _1 should return the value of the argument.

struct argument_t {
    int operator()(int& v){
        return v;
    }
};
static const argument_t _1;


The next step is to define the % operator, which takes an argument_t object as the left-hand side and an integer as the right-hand side. The operator returns a functor which performs the modulo operation when the argument is supplied.

struct mod_composite {
    argument_t numerator;
    int denominator;

    mod_composite(argument_t a, int b)
        :numerator(a),
         denominator(b)
    {
    }

    int operator()(int& v){
        return numerator(v) % denominator;
    }
};

mod_composite operator%(argument_t a, int b) {
    return mod_composite(a, b);
}


The whole program is below:

#include <iostream>
#include <algorithm>

using namespace std;

struct argument_t {
    int operator()(int& v){
        return v;
    }
};

struct mod_composite {
    argument_t numerator;
    int denominator;

    mod_composite(argument_t a, int b)
        :numerator(a),
         denominator(b)
    {
    }

    int operator()(int& v){
        return numerator(v) % denominator;
    }
};

mod_composite operator%(argument_t a, int b) {
    return mod_composite(a, b);
}

static const argument_t _1;

int main(){
    int a[] = {1, 2, 3, 4, 5};
    int* beg = a;
    int* end = a + sizeof(a) / sizeof(a[0]);
    ptrdiff_t ret = count_if(beg, end, _1 % 2);

    cout << ret << endl;
}


The solution has a problem. mod_composite::operator() is hard-coded to pass the argument to the numerator functor. How do we write an expression with the argument on the right-hand side – for example, 12 % _1, which is non-zero (true) exactly when the argument is not a factor of 12?

    ptrdiff_t ret = count_if(beg, end, 12 % _1);


The first step is to write a dedicated type for all the terms of the expression. The object accepts the argument and handles it differently depending on the internal type.

template <typename T>
struct term_t {
    term_t(){}
    term_t(T v) : v(v){}
    int operator()(int& arg);
    T v;
};
template <typename T>
int term_t<T>::operator()(int& arg){
    return v(arg);
}
template <>
int term_t<int>::operator()(int& arg){
    return v;
}


The operator now takes term_t objects instead of the internal types.

template <typename LT, typename RT>
mod_composite<LT, RT> operator%(term_t<LT> a, term_t<RT> b) {
    return mod_composite<LT, RT>(a, b);
}


As a result, the mod_composite object can unconditionally apply the argument to both terms.

template <typename LT, typename RT>
struct mod_composite {
    term_t<LT> numerator;
    term_t<RT> denominator;

    mod_composite(term_t<LT> a, term_t<RT> b)
        :numerator(a),
         denominator(b)
    {
    }

    int operator()(int& v){
        return numerator(v) % denominator(v);
    }
};


Here is the whole source code.

#include <iostream>
#include <algorithm>

using namespace std;

template <typename T>
struct term_t {
    term_t(){}
    term_t(T v) : v(v){}
    int operator()(int& arg);
    T v;
};
template <typename T>
int term_t<T>::operator()(int& arg){
    return v(arg);
}
template <>
int term_t<int>::operator()(int& arg){
    return v;
}

struct argument_t {
    int operator()(int& v){
        return v;
    }
};

template <typename LT, typename RT>
struct mod_composite {
    term_t<LT> numerator;
    term_t<RT> denominator;

    mod_composite(term_t<LT> a, term_t<RT> b)
        :numerator(a),
         denominator(b)
    {
    }

    int operator()(int& v){
        return numerator(v) % denominator(v);
    }
};

template <typename LT, typename RT>
mod_composite<LT, RT> operator%(term_t<LT> a, term_t<RT> b) {
    return mod_composite<LT, RT>(a, b);
}

static const term_t<argument_t> _1;

// Raw values such as 12 must be wrapped into term_t explicitly, because
// template argument deduction does not consider implicit conversions.
template <typename T>
term_t<T> val_(T v) {
    return term_t<T>(v);
}

int main(){
    int a[] = {1, 2, 3, 4, 5};
    int* beg = a;
    int* end = a + sizeof(a) / sizeof(a[0]);
    ptrdiff_t ret = count_if(beg, end, val_(12) % _1);

    cout << ret << endl;
}


## Buffering delay and MPEG-2 Transport stream

As we discussed in the previous post “What are CBR, VBV, and CPB?”, a video decoder has a buffer to compensate for the different coded sizes of each picture. Here is an actual example of the buffer occupancy of an MPEG-2 video stream at the beginning of the stream.

The first picture is decoded (= removed from the buffer) at time zero. Before the decoding time, the coded stream has been entering the buffer for more than 0.5 second.

The time interval between the arrival of the first byte of a coded picture and the decoding of that picture is called the buffering delay. The delay is essentially the time between these two events:

• The STB starts to receive the coded picture.
• The STB decodes the video frame.

The maximum buffering delay occurs when a picture is decoded at the moment the buffer fullness equals the buffer size. That means the buffer size and the bit rate together determine the maximum buffering delay:

max_buffering_delay (second) = buffer_size (bytes) * 8 / bit_rate (bits/second)

As we discussed, at a given bit rate you can increase the video quality by increasing the buffer size. However, this trick has its own cost: it makes the STB more expensive and the buffering delay longer.

In the case of MPEG-2 video, the maximum buffering delay is almost always less than 0.7 second. In the case of H.264, however, it’s common to see very long delays, such as 4 to 10 seconds, especially in a Web streaming context.

When multiplexing such a video stream into a transport stream, however, a long delay causes a problem. While the MPEG-2 TS standard allows up to a 10-second delay for H.264 video, almost all STBs accept only 1 to 2 seconds. Therefore, if you construct an H.264 stream for transport stream broadcast, you should restrict the CPB buffer size so that the buffering delay is less than one second.
