HEVC, also known as H.265 or MPEG-H part 2(ISO/IEC 23008-2), is just around the corner. A good overview was published. The reference implementation is available. Active discussion can be read in the JCT-VC document management system. The Final Draft International Standard (FDIS) is expected be produced in January 2013.
However, as always, the standard writers love cryptic acronyms. Probably the first acronyms which discourage standard readers are block structure coding terminologies, namely CTU, CU, CTB, CB, PB, and TB.
They are basically replacement of Macroblocks and blocks in prior standards. Unlike 10 years ago, we have much higher frame sizes to deal with. 4K production became practical and people start talking about 8K. Even mobile device has higher than HD frame size such as 2048 x 1530. We need larger macroblocks to efficiently encode the motion vectors for these frame size. On the other hand, small detail is still important and we sometimes want to perform prediction and transformation at the granularity of 4×4.
How could we support wide variety of block sizes in efficient manner? That’s a challenge HEVC is trying to solve with those acronyms.
Let’s start from the higher level. Suppose we have a picture to encode. HEVC divides the picture into CTUs (Coding Tree Unit).
The width and height of CTU are signaled in a sequence parameter set, meaning that all the CTUs in a video sequence have the same size: 64×64, 32×32, or 16×16.
We need to understand an important naming convention here. In HEVC standard, if something is called xxxUnit, it indicates a coding logical unit which is in turn encoded into an HEVC bit stream. On the other hand, if something is called xxxBlock, it indicates a portion of video frame buffer where a process is target to.
CTU – Coding Tree Unit is therefore a logical unit. It usually consists of three blocks, namely luma (Y) and two chroma samples (Cb and Cr), and associated syntax elements. Each block is called CTB (Coding Tree Block).
Each CTB still has the same size as CTU – 64×64, 32×32, or 16×16. Depending on a part of video frame, however, CTB may be too big to decide whether we should perform inter-picture prediction or intra-picture prediction. Thus, each CTB can be differently split into multiple CBs (Coding Blocks) and each CB becomes the decision making point of prediction type. For example, some CTBs are split to 16×16 CBs while others are split to 8×8 CBs. HEVC supports CB size all the way from the same size as CTB to as small as 8×8.
The following picture illustrates how 64×64 CTB can be split into CBs.
CB is the decision point whether to perform inter-picture or intra-picture prediction. More precisely, the prediction type is coded in CU (Coding Unit). CU consists of three CBs (Y, Cb, and Cr) and associated syntax elements.
CB is good enough for prediction type decision, but it could still be too large to store motion vectors (inter prediction) or intra prediction mode. For example, a very small object like snowfall may be moving in the middle of 8×8 CB – we want to use different MVs depending on the portion in CB.
Thus, PB was introduced. Each CB can be split to PBs differently depending on the temporal and/or spatial predictability.
Once the prediction is made, we need to code residual (difference between predicted image and actual image) with DCT-like transformation. Again, CB could be too big for this because a CB may contains both a detailed part (high frequency) and a flat part (low frequency). Therefore, each CB can be differently split into TBs (Transform Block). Note that TB doesn’t have to be aligned with PB. It is possible and often makes sense to perform single transform across residuals from multiple PBs, vise versa.
Let’s read a draft standard text regarding to these terminologies. They should make more sense now.
CTU (coding tree unit): A coding tree block of luma samples, two corresponding coding tree blocks of chroma samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to code the samples. The division of a slice into coding tree units is a partitioning.
CTB (coding tree block): An NxN block of samples for some value of N. The division of one of the arrays that compose a picture that has three sample arrays or of the array that compose a picture in monochrome format or a picture that is coded using three separate colour planes into coding tree blocks is a partitioning.
CB (coding block): An NxN block of samples for some value of N. The division of a coding tree block into coding blocks is a partitioning.
CU (coding unit): A coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to code the samples. The division of a coding tree unit into coding units is a partitioning.
PB (prediction block): A rectangular MxN block of samples on which the same prediction is applied. The division of a coding block into prediction blocks is a partitioning.
TB (transform block): A rectangular MxN block of samples on which the same transform is applied. The division of a coding block into transform blocks is a partitioning.