HEVC, also known as H.265 or MPEG-H Part 2 (ISO/IEC 23008-2), is just around the corner. A good overview has been published, the reference implementation is available, and active discussion can be followed in the JCT-VC document management system. The Final Draft International Standard (FDIS) is expected to be produced in January 2013.
However, as always, the standard writers love cryptic acronyms. Probably the first ones to discourage readers of the standard are the block-structure coding terms: CTU, CU, CTB, CB, PB, and TB.
They are basically replacements for the macroblocks and blocks of prior standards. Unlike 10 years ago, we have much larger frame sizes to deal with. 4K production has become practical and people have started talking about 8K. Even mobile devices have frame sizes larger than HD, such as 2048 x 1530. We need blocks larger than the traditional macroblock to encode motion vectors efficiently at these frame sizes. On the other hand, small detail is still important, and we sometimes want to perform prediction and transformation at a granularity of 4×4.
How can we support a wide variety of block sizes in an efficient manner? That’s the challenge HEVC is trying to solve with those acronyms.
Let’s start from the highest level. Suppose we have a picture to encode. HEVC divides the picture into CTUs (Coding Tree Units).
The width and height of the CTU are signaled in the sequence parameter set, meaning that all the CTUs in a video sequence have the same size: 64×64, 32×32, or 16×16.
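As a rough illustration of that division, the sketch below counts how many CTUs are needed to cover a picture. The function name `ctu_grid` and the 1920×1080 example are illustrative assumptions, not taken from the standard text.

```python
import math

def ctu_grid(pic_width, pic_height, ctu_size):
    """Return (columns, rows, total) of CTUs needed to cover a picture."""
    if ctu_size not in (16, 32, 64):
        raise ValueError("CTU size must be 16, 32, or 64")
    # Round up: CTUs on the right and bottom edges may only partly overlap the picture.
    cols = math.ceil(pic_width / ctu_size)
    rows = math.ceil(pic_height / ctu_size)
    return cols, rows, cols * rows

for size in (64, 32, 16):
    print(size, ctu_grid(1920, 1080, size))
# 64 (30, 17, 510)
# 32 (60, 34, 2040)
# 16 (120, 68, 8160)
```

Note that 1080 is not a multiple of 64, so the last row of 64×64 CTUs extends past the bottom edge of the picture.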
We need to understand an important naming convention here. In the HEVC standard, if something is called xxxUnit, it is a logical coding unit which is in turn encoded into the HEVC bitstream. On the other hand, if something is called xxxBlock, it is the portion of the video frame buffer that a process targets.
The CTU (Coding Tree Unit) is therefore a logical unit. It usually consists of three blocks, namely one block of luma samples (Y) and two blocks of chroma samples (Cb and Cr), plus associated syntax elements. Each block is called a CTB (Coding Tree Block).
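To make the Unit/Block distinction concrete, here is a minimal sketch that lists the sample dimensions of the CTBs belonging to one CTU. The chroma subsampling formats are not discussed in this post. They are brought in here as an assumption, because in the common 4:2:0 format the chroma CTBs cover the same picture area as the luma CTB with half as many samples in each dimension. The function name `ctb_sizes` is hypothetical.

```python
def ctb_sizes(ctu_size, chroma_format="4:2:0"):
    """Return {plane: (width, height)} in samples for the CTBs of one CTU."""
    # (width divisor, height divisor) applied to the chroma planes
    divisors = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}
    sizes = {"Y": (ctu_size, ctu_size)}
    if chroma_format == "4:0:0":
        # Monochrome: the CTU contains a single luma CTB.
        return sizes
    div_w, div_h = divisors[chroma_format]
    chroma = (ctu_size // div_w, ctu_size // div_h)
    sizes["Cb"] = chroma
    sizes["Cr"] = chroma
    return sizes

print(ctb_sizes(64))           # {'Y': (64, 64), 'Cb': (32, 32), 'Cr': (32, 32)}
print(ctb_sizes(64, "4:0:0"))  # {'Y': (64, 64)}
```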
Each luma CTB has the same size as the CTU: 64×64, 32×32, or 16×16 (with chroma subsampling such as 4:2:0, the chroma CTBs cover the same picture area with fewer samples). Depending on the part of the video frame, however, a CTB may be too big a unit for deciding whether to perform inter-picture or intra-picture prediction. Thus, each CTB can be split differently into multiple CBs (Coding Blocks), and each CB becomes the decision point for the prediction type. For example, some CTBs are split into 16×16 CBs while others are split into 8×8 CBs. HEVC supports CB sizes all the way from the size of the CTB down to 8×8.
The following picture illustrates how a 64×64 CTB can be split into CBs.
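The same idea can be sketched in code as a recursive quadtree split. This is only an illustration: a real encoder typically decides whether to split by comparing rate-distortion costs, whereas the rule `contains_detail` below is a made-up stand-in that keeps splitting whichever block contains one "detailed" sample position. The function names are hypothetical.

```python
def split_ctb(x, y, size, should_split, min_cb_size=8):
    """Recursively split a square block into CBs; returns a list of (x, y, size)."""
    if size > min_cb_size and should_split(x, y, size):
        half = size // 2
        cbs = []
        # Visit the four quadrants: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                cbs.extend(split_ctb(x + dx, y + dy, half, should_split, min_cb_size))
        return cbs
    return [(x, y, size)]

def contains_detail(x, y, size):
    """Hypothetical split rule: split any block containing the sample at (40, 8)."""
    px, py = 40, 8
    return x <= px < x + size and y <= py < y + size

cbs = split_ctb(0, 0, 64, contains_detail)
for cb in cbs:
    print(cb)
print(len(cbs), "CBs")  # 10 CBs
```

With this rule the 64×64 CTB ends up as three 32×32 CBs, three 16×16 CBs, and four 8×8 CBs clustered around the detailed position, while the flat areas stay as large blocks.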
The CB is the decision point for whether to perform inter-picture or intra-picture prediction. More precisely, the prediction type is coded in the CU (Coding Unit). A CU consists of three CBs (Y, Cb, and Cr) and associated syntax elements.
A CB is good enough for the prediction type decision, but it can still be too large to store motion vectors (inter prediction) or an intra prediction mode. For example, a very small object such as a snowflake may be moving in the middle of an 8×8 CB, in which case we want to use different MVs for different portions of the CB.
Thus, the PB (Prediction Block) was introduced. Each CB can be split into PBs differently depending on its temporal and/or spatial predictability.
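As a sketch of what "split differently" can look like, the snippet below lists the PB rectangles for each partition shape. The mode names (2Nx2N, 2NxN, Nx2N, NxN and the asymmetric 2NxnU, 2NxnD, nLx2N, nRx2N) follow the naming used in the HEVC specification rather than anything stated in this post, and the sketch deliberately ignores the standard's restrictions on which modes are allowed for which CB sizes and prediction types. The function name `prediction_blocks` is hypothetical.

```python
def prediction_blocks(cb_size, mode):
    """Return the PBs of a CB as (x, y, width, height), relative to the CB's top-left."""
    s = cb_size        # full side (2N)
    n = cb_size // 2   # half side (N)
    q = cb_size // 4   # quarter side, used by the asymmetric modes
    layouts = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN":  [(0, 0, s, n), (0, n, s, n)],
        "Nx2N":  [(0, 0, n, s), (n, 0, n, s)],
        "NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],
    }
    return layouts[mode]

print(prediction_blocks(16, "2NxN"))   # [(0, 0, 16, 8), (0, 8, 16, 8)]
print(prediction_blocks(16, "2NxnU"))  # [(0, 0, 16, 4), (0, 4, 16, 12)]
```

Each PB would then carry its own motion vectors or prediction mode, so a CB with a small moving object at its top edge could use, for example, 2NxnU.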
Once the prediction is made, we need to code the residual (the difference between the predicted image and the actual image) with a DCT-like transform. Again, a CB could be too big for this because it may contain both a detailed part (high frequency) and a flat part (low frequency). Therefore, each CB can be split differently into TBs (Transform Blocks). Note that a TB doesn’t have to be aligned with a PB. It is possible, and often makes sense, to perform a single transform across the residuals from multiple PBs, and vice versa.
Let’s read the draft standard text for these terms. The definitions should make more sense now.
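The independence of the two partitionings can be checked with a small rectangle-overlap sketch. The layouts below are illustrative assumptions for a 16×16 CB, not taken from the standard, and the helper names are hypothetical.

```python
def overlaps(a, b):
    """True if rectangles a and b, each (x, y, width, height), share any samples."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def pbs_per_tb(pbs, tbs):
    """Map each TB to the list of PBs whose residual it covers."""
    return {tb: [pb for pb in pbs if overlaps(tb, pb)] for tb in tbs}

# Case 1: two side-by-side PBs, one 16x16 TB spanning both of them.
two_pbs = [(0, 0, 8, 16), (8, 0, 8, 16)]
one_tb = [(0, 0, 16, 16)]
print(pbs_per_tb(two_pbs, one_tb))
# {(0, 0, 16, 16): [(0, 0, 8, 16), (8, 0, 8, 16)]}

# Case 2 (the reverse): one 16x16 PB whose residual is coded with four 8x8 TBs.
one_pb = [(0, 0, 16, 16)]
four_tbs = [(0, 0, 8, 8), (8, 0, 8, 8), (0, 8, 8, 8), (8, 8, 8, 8)]
print(pbs_per_tb(one_pb, four_tbs))
```

In the first case a single transform covers the residuals of both PBs. In the second case every TB maps back to the same PB.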
CTU (coding tree unit): A coding tree block of luma samples, two corresponding coding tree blocks of chroma samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to code the samples. The division of a slice into coding tree units is a partitioning.
CTB (coding tree block): An NxN block of samples for some value of N. The division of one of the arrays that compose a picture that has three sample arrays or of the array that compose a picture in monochrome format or a picture that is coded using three separate colour planes into coding tree blocks is a partitioning.
CB (coding block): An NxN block of samples for some value of N. The division of a coding tree block into coding blocks is a partitioning.
CU (coding unit): A coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to code the samples. The division of a coding tree unit into coding units is a partitioning.
PB (prediction block): A rectangular MxN block of samples on which the same prediction is applied. The division of a coding block into prediction blocks is a partitioning.
TB (transform block): A rectangular MxN block of samples on which the same transform is applied. The division of a coding block into transform blocks is a partitioning.
Good summary on CTU/CU/PU
Excellent post, clears every thing
I actually started reading the HEVC review paper, felt lost pretty fast. But this post save my day…!
what is frame buffer ? what is the importance of frame buffer ? And How do we configure ?
Excellent post. Great work.
Thank you for the informative and understandable post
I am working on HEVC motion estimation as my Final year project.
I have extracted frames of video and I have the pixel value of a particular frame like this:
How to partition this value further into PUs? Need Help. Reply as soon as possible
hello am also doing my final year project on the same. did you get the help you needed? please if you did am also in need… this is my email firstname.lastname@example.org
What a great post, thats the idea of a good teacher make something complex easy to understand. Thanks for share. Which tool did you use for generated graphics?
It’s really an execellent summary and explaination for newbies to HEVC
Congratulations, thank you very much for this great explanation.
Great summary. Thanks man
Well written, both concise and clear, for one with some prior art experience.
Could you explain me rule of decision splitting CTB? How identify best size of CB?
Very Useful. thank you for the pictorial representation
SIR from where i can get H,265 matlab codec