What you call “H.264 stream” is probably not the real raw H.264 stream – it’s probably “H.264 byte stream”. It’s important to use the proper terminology to discuss stream validity or multiplexing matters.
Let’s start from the SODB (String Of Data Bits). This is the real raw H.264 stream. The syntax is specified ISO/IEC 14496-10 in a form of bit string syntax.
SODB is not very convenient because it’s a bit stream, not a byte stream. It means the number of bits for one syntax may not be byte-aligned and therefore difficult to process. This is why the standard introduces RBSP (Raw Byte Sequence Payload). RBSP stores SODB in a byte stream so that the first bit of each syntax is always aligned at the first bit of a byte.
The problem of RBSP is that RBSP may contain any byte pattern. Since it doesn’t have special synchronization byte sequence, to find the synchronization point in RBSP (For example, to find the first byte of IDR picture), you may have to parse every single bit syntax from the beginning of the file.
NAL Unit is another wrapper layer to prevent certain byte pattern from occurring in the stream. When RBSP has any of 0x000000, 0x000001, 0x000002, and 0x000003, they are converted to 0x00000300, 0x00000301, 0x00000302, and 0x00000303 respectively in NAL unit. Therefore, we can use any of 0x000000, 0x000001, 0x000002, or 0x000003 as a special synchronization byte sequence.
NAL Unit also adds one byte header to indicate the type of the NAL Unit.
Byte Stream Format
Although NAL Unit allows to put a synchronization byte sequence, it doesn’t have any yet. The standard defines another wrapper to add three or four bytes synchronization byte pattern: the Byte Stream Format. The byte stream format puts a synchronization byte sequence (0x000001 or 0x00000001) before every NAL Unit. The byte stream format is used as the elementary stream of H.264 in transport stream.
The original intention of the standard is to let applications to pick a suitable format. For example, MPEG Transport Stream uses byte stream format to allow decoder to find the NAL Unit easily. Other container format such as AVI where the length of the header is stored in a packet would use NAL Unit or even RBSP to reduce the overhead of synchronization bytes. However, in real world, almost all storage and delivery formats use byte stream format. It’s due to practical reasons such as to simplify the re-wrapping process.
Nonetheless, to avoid any confusion, it’s better idea to use the term “H.264 Byte Stream Format” instead of “H.264 raw stream” or “H.264 stream”.