When you define a structure like this, the integer field is not placed immediately after the byte field; it is aligned to a 4-byte boundary.
struct example {
    unsigned char v1;
    unsigned int v2;
};
This is because the x86 architecture incurs a performance penalty when accessing non-aligned data. Other CPU architectures, such as Itanium, do not even allow non-aligned access.
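The padding can be checked directly with offsetof. The following is a minimal sketch, not code from the original test; the struct name and the helper function names are made up for illustration, and the exact offsets depend on the compiler and target.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical struct name; the two fields match the example above. */
struct example {
    unsigned char v1;
    unsigned int  v2;
};

/* Offset of v2 from the start of the struct, in bytes. */
size_t v2_offset(void)
{
    return offsetof(struct example, v2);
}

/* Number of padding bytes the compiler inserted between v1 and v2. */
size_t padding_after_v1(void)
{
    return offsetof(struct example, v2) - sizeof(unsigned char);
}
```

With a typical x86 compiler and a 4-byte unsigned int, v2_offset() is expected to be 4 and padding_after_v1() to be 3, but that is an assumption about the platform rather than a guarantee of the language.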
So how large is the slowdown on x86? Here is a benchmark.
The test accesses buffers of various sizes and measures the elapsed time. The graph shows the average of 10 runs.
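The original benchmark code is not shown, so the following is only a sketch of how such a read test might be structured. The function names, the use of clock(), and the use of memcpy for the integer load are all assumptions; an offset of 0 gives aligned reads when the buffer itself is aligned, and an offset of 1 gives non-aligned reads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Sum `count` 32-bit integers starting at base + offset.
 * memcpy keeps the access well defined in C even when the address is
 * not aligned; x86 compilers typically turn it into a single load.
 * The buffer must hold at least offset + count * 4 bytes. */
uint32_t sum_u32(const unsigned char *base, size_t offset, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t v;
        memcpy(&v, base + offset + i * sizeof v, sizeof v);
        sum += v;
    }
    return sum;
}

/* Average CPU time in seconds for one pass, measured over `runs` passes. */
double time_reads(const unsigned char *base, size_t offset,
                  size_t count, int runs)
{
    volatile uint32_t sink = 0;  /* keeps the loop from being optimized away */
    clock_t start = clock();
    for (int r = 0; r < runs; r++)
        sink += sum_u32(base, offset, count);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC / runs;
}
```

Calling time_reads(buf, 0, n, 10) and time_reads(buf, 1, n, 10) for a range of buffer sizes would give the two curves to compare. clock() has coarse resolution, so small buffers would probably need many more passes or a finer timer.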
In the read test, there is little performance difference between aligned and non-aligned access. Accessing the data byte by byte to avoid the alignment issue is actually the worst choice: it is far slower than non-aligned integer access.
In the integer write test, however, aligned and non-aligned access differ by almost a factor of two. Byte-by-byte access makes sense here: it is slightly faster than non-aligned integer access.
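For reference, the two ways of handling an integer at an arbitrary address might look like the sketch below. These helpers are hypothetical and are not taken from the benchmark; the byte-by-byte versions assume little-endian byte order to match x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Integer write at an arbitrary (possibly non-aligned) address. */
void store_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Byte-by-byte write, least significant byte first (little-endian). */
void store_u32_bytes(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v);
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Byte-by-byte read, assembling the value from four single-byte loads. */
uint32_t load_u32_bytes(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

An optimizing compiler may merge the four byte operations back into a single 32-bit access, so a benchmark comparing these variants would need to inspect the generated code or lower the optimization level.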
Now, here is a question. You can see a spike at 512KB in non-aligned access. Where does it come from?